Section 5.6 Analysis Files
The analysis files for Section 5.6 Weather Drivers come in two parts: the Python code to produce the results comparisons and a set of pickle files containing the example results.
There are four files included in the Python Files:
- results2pickle.py: a script to convert the standard XLSX or JSON output file to a pickle file
- save_full_WD_pickles_by_variable.py: a script to transfer the values from all of the result pickle files to files organized by variable
- WD_analysis_by_variable.py: a script to perform the analysis and produce the spreadsheet and chart files
- wd_config.yaml: a configuration file to set the options for the analysis script
Weather Drivers Test Suite Analysis Instructions
The following instructions explain how to run the Python code to generate the pickle files and the analysis charts and graphs. Pickle files are a Python-specific method of saving variables from memory to disk, which speeds up subsequent analysis.
The following instructions assume you have Python installed with the required packages and can open a command prompt in the directory containing the Python analysis code. See the section on Python Setup/Installation if you need to set up Python.
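To make the idea of a pickle file concrete, here is a minimal sketch of saving a Python object to disk and reading it back with the standard library. The `results` dictionary, the helper names, and the file name `example.p` are made up for illustration; the Weather Drivers tools store pandas dataframes, but the round trip works the same way.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for parsed test results (not the real data layout).
results = {"info": {"software": "ExampleTool"}, "values": [1.5, 2.5, 3.0]}


def save_pickle(obj, path):
    """Write a Python object to disk in pickle format."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_pickle(path):
    """Read a pickled Python object back into memory."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip through a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = Path(tmp_dir) / "example.p"
    save_pickle(results, pickle_path)
    restored = load_pickle(pickle_path)

print(restored == results)
```

Loading a pickle skips the parsing work needed for XLSX or JSON, which is where the speed-up comes from. Only load pickle files from sources you trust, since unpickling can execute arbitrary code.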
The basic workflow consists of three steps:
1) Convert trial result pickles to "pickles by variable"
2) Convert user results to pickle
3) Run the analysis on the pickles
Steps 2 and 3 are repeated as the user trial results are updated or as changes are made to analysis options in the configuration file.
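The workflow can also be assembled from Python. The sketch below only builds the three command lines as argument lists; it does not run them. The script names and the -v, -u, and -c options come from the commands in this section, while the function name `build_workflow_commands` is hypothetical, and the rule that the user pickle has the same base name with a .p extension is an assumption based on the examples here.

```python
from pathlib import Path


def build_workflow_commands(user_results=None, config=None):
    """Return the workflow steps as a list of argument lists.

    user_results: optional path to the user's .xlsx or .json results file.
    config: optional path to an alternate yaml config file (-c option).
    """
    extra = ["-c", config] if config else []

    # Step 1: convert the trial result pickles to pickles by variable.
    commands = [["python", "save_full_WD_pickles_by_variable.py", "-v", *extra]]

    if user_results:
        # Assumption: results2pickle.py writes <base name>.p for the input file.
        user_pickle = str(Path(user_results).with_suffix(".p"))
        # Step 2: convert the user results to a pickle.
        commands.append(["python", "results2pickle.py", "-v", *extra, user_results])
        # Step 3: run the analysis, including the user results.
        commands.append(
            ["python", "WD_analysis_by_variable.py", "-v", *extra, "-u", user_pickle]
        )
    else:
        # Step 3 without user results.
        commands.append(["python", "WD_analysis_by_variable.py", "-v", *extra])
    return commands


for command in build_workflow_commands("userfile.xlsx"):
    print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` from the directory that holds the scripts. Since step 1 only needs to run once, a repeated run would normally skip the first entry.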
1) Convert trial result pickles to "Pickles by Variable"
This step generates a new set of pickles from the trial result pickles that are faster to read and process. It only needs to be done once after downloading the trial result pickles above.
python save_full_WD_pickles_by_variable.py -v
The output pickles are placed in the directory WD_pickles_by_variable, creating it if it doesn't exist. The "-v" option gives verbose output. The default input and output directories can be changed by editing the wd_config.yaml file.
2) Create a pickle file of user test results
Create a pickle file from the test results generated by the user (user.xlsx, user.json). This step is required to compare test results to trial runs and is necessary any time the user test results change. It creates a pickle file named user.p in the current directory.
python results2pickle.py -v userfile.xlsx
python results2pickle.py -v userfile.json
3) Run the analysis
Run the analysis and create the CSV table and PDF plots, which are written to the WD_output directory (created if it doesn't exist). If comparing user results to trial results, this step needs to be re-run any time the user test results (and therefore user.p) change.
python WD_analysis_by_variable.py -v
python WD_analysis_by_variable.py -v -u userfile.p
The default input and output directories, types of analysis, and many other settings can be changed by editing the wd_config.yaml file.
To use an alternate yaml config file, add the option -c config.yaml to any program command line. It is recommended that you copy wd_config.yaml and edit the copy, rather than writing a config file from scratch or editing wd_config.yaml without a backup.
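Making the working copy can be scripted so the original config is never touched. This is a minimal sketch: the function name `copy_config` and the target name `my_wd_config.yaml` are hypothetical, and the demonstration writes a placeholder file whose setting name is invented rather than taken from the real wd_config.yaml.

```python
import shutil
import tempfile
from pathlib import Path


def copy_config(source="wd_config.yaml", target="my_wd_config.yaml"):
    """Copy the default config to a new file and return the new path.

    Refuses to overwrite an existing target so earlier edits are not lost.
    """
    source, target = Path(source), Path(target)
    if not source.is_file():
        raise FileNotFoundError(f"config file not found: {source}")
    if target.exists():
        raise FileExistsError(f"refusing to overwrite: {target}")
    shutil.copy2(source, target)
    return target


# Demonstration in a temporary directory with a placeholder config file.
# The setting name below is invented for the demo, not a real option.
with tempfile.TemporaryDirectory() as tmp_dir:
    original = Path(tmp_dir) / "wd_config.yaml"
    original.write_text("example_setting: 1\n")
    copy_path = copy_config(original, Path(tmp_dir) / "my_wd_config.yaml")
    copied_text = copy_path.read_text()
    original_untouched = original.read_text() == copied_text

print(copied_text)
```

After editing the copy, pass it to any of the tools, for example `python WD_analysis_by_variable.py -v -c my_wd_config.yaml`.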
More information about the individual Python tools and their command line options can be found at the end of this document, in the section More Information about the Python Code.
Python Setup/Installation
The Weather Drivers Python code requires Python 3.6 or greater.
- If you have a working Anaconda Python system installation and do not want a separate environment, your system should be ready to run the Weather Drivers python code.
- If you have a working Anaconda Python installation but want to create a new environment to separate your workflow from other python development, skip to Create Minimal Conda Environment below.
- If you are an experienced Python developer/user with a working Python installation that is not Anaconda/Miniconda, you can skip to Pip Setup Instructions.
- If you are inexperienced with python and/or have no python installation, the use of Miniconda is recommended. Follow the Installing Miniconda instructions below.
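Whichever route applies, the version requirement stated above can be confirmed from within Python. A minimal sketch; the function name `python_is_supported` is hypothetical and not part of the Weather Drivers code.

```python
import sys

# Minimum version stated in this document.
MINIMUM_VERSION = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if the major.minor version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version is sufficient:", sys.version.split()[0])
else:
    print("Python 3.6 or greater is required; found", sys.version.split()[0])
```

Running `python --version` at the command prompt gives the same information without a script.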
Installing Miniconda
Miniconda is a minimal Python installation that includes conda (a package manager), Python, and a few other necessary packages. Download the latest install file from the web site and follow the install instructions. (https://docs.conda.io/en/latest/miniconda.html)
Note: With MS Windows, using the default install instructions sets up a "user" install that does not require admin privileges and is only accessible by the user who is logged in. It is also separate from any other Python installations.
For further installation or analysis, you will need to open an Anaconda prompt. In Windows, look for Anaconda Prompt in the Anaconda3 folder of the Windows Start Menu.
If you only plan to use this Python setup for Weather Drivers analysis, there is no need to create a separate environment and you can skip to the Install Conda Packages section below.
If you may use Miniconda for other Python development or analysis, it is recommended that you create a separate environment for Weather Drivers. Follow the instructions under Create Minimal Conda Environment below.
Create Minimal Conda Environment
If you want a Python system set up specifically for Weather Drivers analysis, it is recommended that you create a new environment with only the required packages. If you just installed Miniconda and only plan to use Python for Weather Drivers analysis, you can skip creating and activating a new environment and go to Install Conda Packages.
If you might use your Miniconda install for other projects, a separate environment is recommended and you can follow the instructions below.
- Use the following commands at the Anaconda prompt to create and activate a new environment named "WD" for Weather Drivers analysis (you can change the name to whatever you want). The first command creates a minimal environment with a current version of Python.
conda create --name WD python
conda activate WD
Note: You will need to "activate" the environment every time you open a new Anaconda prompt to make sure you are "in" the correct environment.
Install Conda Packages
Next, install the required Python packages in your new environment (activate it first) or in your bare Miniconda install. First, though, it is good practice to update conda (do this even if you just installed Miniconda). The joblib package is only needed on a PC to run the “fast” option for converting results to pickle files. If you get an error, rerun the install command without joblib.
- From an Anaconda prompt, enter the following commands.
conda update conda
conda install joblib pandas openpyxl seaborn pyyaml
That’s it. Your Python setup is complete.
Pip Setup Instructions
From your command prompt, enter the following pip command to install the required packages. The joblib package is only needed on a PC to run the “fast” option for converting results to pickle files. If you get an error, rerun the command without joblib.
pip install joblib pandas openpyxl seaborn pyyaml
That’s it. Your Python setup is complete.
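After either the conda or the pip route, a quick check can confirm that the packages are importable. This is a sketch rather than part of the Weather Drivers code; the function name `find_missing_packages` is hypothetical. Note that the pyyaml package is imported under the name `yaml`.

```python
import importlib.util

# Install names from the commands above, mapped to their import names.
REQUIRED_PACKAGES = {
    "joblib": "joblib",      # only needed for the "fast" conversion option
    "pandas": "pandas",
    "openpyxl": "openpyxl",
    "seaborn": "seaborn",
    "pyyaml": "yaml",        # installed as pyyaml, imported as yaml
}


def find_missing_packages(required=REQUIRED_PACKAGES):
    """Return the install names of packages that cannot be imported."""
    return [
        install_name
        for install_name, import_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


missing = find_missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If only joblib is reported missing, the tools should still work without the “fast” option, as noted above.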
More Information about the Python Code
The Python routines use a common yaml config file for setting input and output directories and other test settings. The default name is wd_config.yaml. Users should edit a copy and then identify the alternate config file at the command line using the "-c" option.
All tools accept the "-v" verbose option. Using "-v" is generally recommended so you can see which stage the conversion or analysis has reached.
Converting Outputs to Pickles (results2pickle.py)
If you have outputs from the Weather Drivers test suite, they first need to be converted to a pickle file using results2pickle.py. This tool converts either JSON or XLSX files to pickles and automatically detects the file type from the file extension.
usage: python results2pickle.py filename
filename - name of file to convert. Must be .xlsx or .json format
-c config_file: alternate yaml_config file that sets variable names, software names, and pickle directories.
-p pickle_dir: directory to store generated pickle files. Overrides directory in the yaml config file
-v/-vv: verbose and very verbose; prints output as the program runs. Without -v only errors print to the screen
-f: fast. On Windows this spawns a VBA script which runs Excel, saves to CSV, and processes the CSV files (see notes below)
-y: yes to overwrite pickle files without prompt
program output: a pickle file with a python list/dictionary of dataframes with info and test results.
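The documented command line can be mirrored with `argparse` to show how the options fit together. This is an illustrative sketch of the interface described above, not the actual source of results2pickle.py: the destination names (`config_file`, `verbosity`, and so on), the default values, and the case-insensitive extension check are assumptions.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser that mirrors the options listed in this document."""
    parser = argparse.ArgumentParser(prog="results2pickle.py")
    parser.add_argument("filename", help="file to convert (.xlsx or .json)")
    parser.add_argument("-c", dest="config_file", default="wd_config.yaml",
                        help="alternate yaml config file")
    parser.add_argument("-p", dest="pickle_dir", default=None,
                        help="directory for generated pickle files")
    # action="count" lets -v and -vv select verbose and very verbose output.
    parser.add_argument("-v", dest="verbosity", action="count", default=0,
                        help="verbose (-v) or very verbose (-vv) output")
    parser.add_argument("-f", dest="fast", action="store_true",
                        help="use the fast conversion path")
    parser.add_argument("-y", dest="overwrite", action="store_true",
                        help="overwrite pickle files without prompting")
    return parser


def detect_format(filename):
    """Return 'xlsx' or 'json' based on the file extension."""
    suffix = Path(filename).suffix.lower()  # assumption: case-insensitive match
    if suffix not in (".xlsx", ".json"):
        raise ValueError(f"unsupported file type: {filename}")
    return suffix.lstrip(".")


args = build_parser().parse_args(["-vv", "-y", "userfile.xlsx"])
print(args.verbosity, args.overwrite, detect_format(args.filename))
```

Counting repeated -v flags is one common way to support both -v and -vv with a single option.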
- JSON files process fairly quickly (10s of seconds)
- Smaller XLSX files (under 15MB or so) process fairly quickly (10s of seconds)
- Larger XLSX files (over 15MB) can take much longer (10s to 100s of seconds). The -f option can speed up processing by 2x to 5x depending on file size, but it is more fragile than the default “slow” version: formatting errors or extra/blank data cells in the .xlsx file can cause it to crash. Users should test the “fast” option by comparing analysis results with those created using the “slow” version. If the results match, users can subsequently use the “fast” version with confidence.
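One way to carry out the fast-versus-slow check is to run the analysis once on each pickle and compare the resulting CSV tables cell by cell. The sketch below assumes the two runs were written to separate CSV files; the file names and table contents in the demonstration are invented, and the numeric tolerance is an arbitrary choice.

```python
import csv
import math
import tempfile
from pathlib import Path


def read_rows(path):
    """Read a CSV file into a list of rows (each row a list of strings)."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def cells_match(a, b, rel_tol=1e-9):
    """Compare two cells numerically when possible, otherwise as text."""
    try:
        return math.isclose(float(a), float(b), rel_tol=rel_tol)
    except ValueError:
        return a == b


def tables_match(path_a, path_b, rel_tol=1e-9):
    """Return True if both CSV files have the same shape and matching cells."""
    rows_a, rows_b = read_rows(path_a), read_rows(path_b)
    if len(rows_a) != len(rows_b):
        return False
    for row_a, row_b in zip(rows_a, rows_b):
        if len(row_a) != len(row_b):
            return False
        if not all(cells_match(a, b, rel_tol) for a, b in zip(row_a, row_b)):
            return False
    return True


# Demonstration with invented tables standing in for two analysis runs.
with tempfile.TemporaryDirectory() as tmp_dir:
    slow = Path(tmp_dir) / "slow.csv"
    fast = Path(tmp_dir) / "fast.csv"
    other = Path(tmp_dir) / "other.csv"
    slow.write_text("case,value\nA,1.50\n")
    fast.write_text("case,value\nA,1.5\n")
    other.write_text("case,value\nA,1.6\n")
    same = tables_match(slow, fast)
    different = tables_match(slow, other)

print(same, different)
```

Comparing numerically rather than as text avoids false mismatches from formatting differences such as 1.50 versus 1.5.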
Converting Pickles by Program to Pickles by Variable (save_full_WD_pickles_by_variable.py)
The save_full_WD_pickles_by_variable.py routine converts the pickles saved by program to pickles saved by variable. These “by-variable” pickles are much smaller and load much faster, which speeds up subsequent analysis, especially when only a few variables or a subset of tests is analyzed.
usage: python save_full_WD_pickles_by_variable.py
-c config_file: alternate yaml_config file
-p output_pickle_dir: directory to store the “by-variable” pickles. Overrides directory in the yaml config file
-v : verbose output, prints output as the program runs. Without -v only errors print to the screen
-y: yes to overwrite pickle files without prompt
program output: a large set of pickle files
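The regrouping idea can be illustrated with plain dictionaries. This sketch assumes a nested layout of program, then variable, then values, which is a simplification chosen for the example; the real pickles hold dataframes, and the program and variable names below are invented.

```python
from collections import defaultdict


def regroup_by_variable(results_by_program):
    """Turn {program: {variable: values}} into {variable: {program: values}}."""
    by_variable = defaultdict(dict)
    for program, variables in results_by_program.items():
        for variable, values in variables.items():
            by_variable[variable][program] = values
    return dict(by_variable)


# Invented example data: two programs reporting overlapping variables.
by_program = {
    "ProgramA": {"temperature": [1.0, 2.0], "humidity": [0.5]},
    "ProgramB": {"temperature": [1.1, 2.1]},
}
by_variable = regroup_by_variable(by_program)
print(sorted(by_variable))
```

With one file per variable, an analysis that needs only `temperature` can load that single entry instead of every program's full result set.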
Analyze the Results (WD_analysis_by_variable.py)
The analysis charts and tables are produced using the WD_analysis_by_variable.py routine. This file uses the same yaml config file as the other tools.
usage: python WD_analysis_by_variable.py
-c config_file: alternate yaml_config file.
-o output_dir: alternate output directory for output results.
-p pickle_dir: alternate location for by-variable pickle files.
-v: verbose output.
program output: a CSV format analysis table and a large number of PDF files containing analysis plots.
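After a run, a short script can confirm what was written to the output directory. This sketch assumes the default WD_output directory named earlier in this document and that the outputs use .csv and .pdf extensions; the function name `summarize_outputs` and the file names in the demonstration are invented.

```python
import tempfile
from pathlib import Path


def summarize_outputs(output_dir="WD_output"):
    """Return the CSV and PDF file names found in the output directory."""
    output_dir = Path(output_dir)
    if not output_dir.is_dir():
        raise FileNotFoundError(f"output directory not found: {output_dir}")
    return {
        "csv": sorted(path.name for path in output_dir.glob("*.csv")),
        "pdf": sorted(path.name for path in output_dir.glob("*.pdf")),
    }


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("analysis_table.csv", "plot_a.pdf", "plot_b.pdf", "notes.txt"):
        (Path(tmp_dir) / name).touch()
    summary = summarize_outputs(tmp_dir)

print(summary)
```

If an alternate directory was given with the -o option, pass that directory to the function instead.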