About betavioplot
=================

Author: Alex Godfrey
Date: March 2015

Purpose
-------

The goal of betavioplot is to facilitate the comparison of observations from multiple classes made along a single measured dimension by giving a sense of the shape of the data's distribution within each class while still showing individual points. It was designed specifically to work well when...

* the data might not be unimodally distributed;
* there are very different numbers of observations between classes;
* the number of observations in some (or all) classes might be very small.

betavioplot uses random numbers drawn from beta distributions to spread the data points into a violin-like shape, hence the name "beta-violin" plot.

Dependencies
------------

betavioplot was developed using Python 2.7; its functionality has not been tested using other versions of Python. betavioplot requires `numpy` and `matplotlib` to be installed.

Using betavioplot
=================

Data file format
----------------

Data should be supplied in an uncompressed text file with one line for each class of data. The format of each line should be

`\t\n`

For example:

    apples,r 1.5,1.2,3.0,4,8
    oranges,b 1,3,10,3,2,1.1
    lemons,y 0.5,0.1,0.2,0.3,0.8

Each line holds a class name, optionally followed by a comma and a color, and then the comma-separated values for that class. The above example contains data for three classes, 'apples', 'oranges', and 'lemons'. The points in these classes will be colored red, blue, and yellow, respectively. It is not necessary to specify colors in the data file unless one wants finer control over the class colors. See the description of 'col_default' in the plotting parameters section below for more on colors.

Running betavioplot with a control file (recommended)
-----------------------------------------------------

Running betavioplot with a control file gives fuller control over the plotting parameters. All plotting parameters, as well as paths to input and output files, are set in the control file. One argument-value pair is allowed per line of the control file, separated by a ':'.
All whitespace will be removed except when set inside of single or double quotation marks. On any line, a '#' and all characters after it are treated as a comment. See the template control file `betavioplot_template.ctl` for details. The only two required parameters in the control file are 'datafile' and 'outfile'; the default values of other parameters will be used if not specified by the user.

Once the control file has been set up, the program can be run from the command line with

`python betavioplot.py -c control.ctl`

Running betavioplot without a control file
------------------------------------------

betavioplot can be run from the command line without a control file by specifying the data file with '-d' and, optionally, the output plot file with '-o'. Only the names of the classes and their colors can be controlled by this method.

`python betavioplot.py -d data.txt -o my_plot.pdf`

Plotting Parameters
-------------------

All of these parameters can be set in the control file. If they are not set or are left commented out, the default values will be used.
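
As an illustration of the control-file syntax described above (one `name : value` pair per line, whitespace dropped outside quotation marks, '#' starting a comment), here is a minimal parsing sketch. It is not betavioplot's own parser. The function names `parse_control_line` and `parse_control_text` and the file names `fruit.txt` and `fruit.pdf` are hypothetical, and two behaviors are assumptions rather than documented facts: the quotation marks themselves are stripped from values, and a '#' starts a comment even inside quotes.

```python
def parse_control_line(line):
    """Return (name, value) for one control-file line, or None if it is empty."""
    # Assumption: everything from the first '#' onward is a comment,
    # even if the '#' sits inside quotation marks.
    line = line.split('#', 1)[0]
    cleaned = []
    quote = None  # the quote character we are currently inside, if any
    for ch in line:
        if quote is not None:
            if ch == quote:
                quote = None        # closing quote: drop it (assumption)
            else:
                cleaned.append(ch)  # inside quotes, keep spaces too
        elif ch in ('"', "'"):
            quote = ch              # opening quote: drop it (assumption)
        elif not ch.isspace():
            cleaned.append(ch)      # outside quotes, whitespace is removed
    text = ''.join(cleaned)
    if not text:
        return None
    # Split on the first ':' only, so values may contain colons.
    name, sep, value = text.partition(':')
    if not sep:
        raise ValueError('expected "name : value", got %r' % line)
    return name, value


def parse_control_text(text):
    """Collect all name/value pairs from control-file text into a dict."""
    params = {}
    for line in text.splitlines():
        pair = parse_control_line(line)
        if pair is not None:
            params[pair[0]] = pair[1]
    return params


# Hypothetical control file using parameter names from this README.
example = """
# hypothetical control file
datafile : fruit.txt
outfile  : fruit.pdf      # output plot
y_label  : 'Mass (kg)'
alpha    : 0.5
"""
params = parse_control_text(example)
```

All values come back as strings; converting `alpha` to a float or `filled` to a boolean, and filling in defaults for missing parameters, would be a separate step.
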

### Input/Output (Required) ###

* datafile : file containing data formatted as described above
* outfile : file to write plot to

### Plot Labels (Optional) ###

* x_label : label for the x-axis; enclose in single or double quotes if there are spaces
* y_label : label for the y-axis; enclose in single or double quotes if there are spaces
* title : title for the plot; enclose in single or double quotes if there are spaces

### Plotting Parameters (Optional) ###

* fig_height : figure height in inches (0 < fig_height; default = 5)
* fig_width : figure width in inches (0 < fig_width; default = 7)
* y_min : the lower bound of the y-axis (default: set automatically)
* y_max : the upper bound of the y-axis (default: set automatically)
* alpha : transparency of the points (0 < alpha <= 1; default = 0.5)
* marker : the shape of the points; see http://matplotlib.org/api/markers_api.html for options (default = 'o' (a circle))
* col_default : the default color for a class when none is specified. Several types of input are accepted. Basic built-in colors: r (red), b (blue), g (green), c (cyan), y (yellow), m (magenta), k (black), w (white). Shades of gray can be specified with a float on the range [0, 1], where values closer to 1 are lighter (e.g., 0.8 encodes a light gray). The hex encoding of an RGB color is also accepted, using the prefix 'hex_' (e.g., hex_efefab). Finally, there are two built-in color palettes, 'dark2' (as in the ColorBrewer Dark2 qualitative palette) and 'dark9', which have 8 and 9 colors, respectively. If a color palette is set by the user in the control file, the classes will rotate through the colors in the palette and any color information in the data file will be ignored.
* filled : ['True' | 'False'] if True, plotted points are filled; otherwise, plotted points are open (default = False)
* vert_labs : ['True' | 'False'] if True, class labels are oriented vertically
* show_median : ['True' | 'False'] if True, the median value for each class is plotted as a line
* med_color : the color of the line drawn at the median of each class; see 'col_default' for how to specify colors; 'dark2' and 'dark9' palettes cannot be used (default = 'k' (black))
* med_stroke : the thickness of the median line in points (0 <= med_stroke; default = 0.75)
* taper : determines how strongly clusters of points taper to create a violin shape; larger values taper the points more strongly; 1 causes there to be no tapering (1 <= taper)
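
To give a feel for why a value of 1 could mean "no tapering", the sketch below draws horizontal offsets from a symmetric Beta(taper, taper) distribution using only the standard library. This is an assumption made for illustration, not betavioplot's actual algorithm, which this document does not spell out. The function name `beta_offsets` and its `width` and `seed` arguments are hypothetical. The relevant fact is that Beta(1, 1) is the uniform distribution, while larger equal shape parameters concentrate draws near the center.

```python
import random


def beta_offsets(n_points, taper=2.0, width=0.8, seed=None):
    """Return n_points horizontal offsets in [-width/2, width/2].

    Offsets are draws from a symmetric Beta(taper, taper) distribution,
    shifted and scaled from [0, 1]. taper == 1 gives a uniform spread;
    larger values pull the offsets toward 0 (the class center).
    """
    if taper < 1:
        raise ValueError('taper must be >= 1')
    rng = random.Random(seed)
    return [(rng.betavariate(taper, taper) - 0.5) * width
            for _ in range(n_points)]


# The 'apples' class from the example data file.
values = [1.5, 1.2, 3.0, 4, 8]
offsets = beta_offsets(len(values), taper=3.0, seed=0)
# Place the class at x = 1 and spread its points horizontally.
points = [(1 + dx, y) for dx, y in zip(offsets, values)]
```

The resulting `points` could be passed to a scatter plot; in this simplified version every point of a class shares the same spread, regardless of how the values cluster.
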