18. Equivalence Testing Bounds Plots
A statistical hypothesis is an assertion about the distribution function for one or more random variables. Generally, two hypotheses are considered, a null hypothesis denoted by H0, and an alternative denoted by either H1 or sometimes Ha. Most often H0 is a statement about the absence of an effect, e.g., no difference between two means. However, it is not required to specify the absence of an effect, and some hypothesis tests are constructed such that the null hypothesis is that there is an effect. Two types of errors can occur in this setting: a type I error happens when H0 is rejected but it is actually true, and a type II error occurs when it is not rejected even though it is false. Most often, a type I error is considered the more serious error, so tests are constructed a priori to control for this type of error. Power is the probability of rejecting the null hypothesis when it is false (i.e., the probability of not making a type II error). If the power is close to 1, the hypothesis test is good at detecting a false null hypothesis. Power is increased when a researcher increases sample size, as well as when a researcher selects stronger effect sizes and significance levels. An equivalence test is where H0 specifies that there is a difference, or that there is an effect. For example, that the difference between two means is not zero. In model validation, equivalence tests can help address the question of “Does the same code produce equivalent output using different computers, compilers, and/or input data?” In the equivalence testing framework, two one-sided tests are often used (Mara and Cribbie 2012). Two one-sided tests for paired data require a user-defined margin, or equivalence interval, within which the samples are considered equivalent. The choice of the equivalence interval is based on practical considerations and the research questions being addressed; in addition, the equivalence interval need not be symmetric. For a conclusive result, both one-sided tests must be rejected, concluding the difference of the means falls within the upper and lower equivalence bounds.
MET employs CI’s along with the user-defined tolerance bounds, which can be used to employ an equivalence test. In this case, H0 is accepted if the CI bounds fall completely between the tolerance bounds, and H0 is rejected otherwise.
Selection of options to produce the plot proceeds approximately counter-clockwise around the METviewer window. The steps to create a series plot are:
Select the desired database from the “Select databases” pulldown menu at the top margin of the METviewer window.
There are a number of tabs just under the database pulldown menu. Select the ‘Series’ tab, which is the leftmost tab.
Select the type of MET statistics that will be used to create the series plot. Click on the “Plot Data” pulldown menu which is located under the tabs. The list contains “Stat”, “MODE”, or “MODE-TD”. Select “Stat”.
Select the desired variable to calculate statistics for in the “Y1 Axis Variables” tab. The first pulldown menu in the “Y Dependent (Forecast) Variables” section lists the variables available in the selected database.
Select the desired statistic to calculate in the second pulldown menu which is to the right of the variable menu. This lists the available attribute statistics in the selected dataset. Multiple statistics can be selected and they will each be plotted as a separate line on the plot.
Select the Y1 Series Variable from the first pulldown menu in that section. There are many options. “MODEL” is used in the included example. In the second pulldown menu to the right of the first are the series variable options, for example, different models. For this plot at least two values of the Series Variable is needed.
Groups are nor supported for this plot type.
“Independent Variable” is optional and will be used as “Fixed Variable”.
Select the type of statistics summary by selecting either “Summary” or “Aggregation Statistics” button in the “Statistics” section. Aggregated statistics may be selected for certain varieties of statistics. The selection can be made from the leftmost dropdown menu in the “Statistics” section. By default, the median value of all statistics will be plotted. Using the dropdown menu, the mean or sum may be selected instead. Choosing this option will cause a single statistic to be calculated from the individual database lines.
Select Lower and Upper equivalence bounds in the Common section. The default values are -0.001 and 0.001
Create ETB derived curve by opening a pop-up menu with ‘Add Derived Curve” button in the bottom of the ‘Series Formatting” table. Select curves of interest and check ‘Equivalence Testing Bounds” radio button. Since the data for both curves should be similar, the event equalisation is enforced.
Now enough information has been entered to produce a graph. To do this, click the “Generate Plot” button at the top of the METviewer window (this is in red text). Typically, if a plot is not produced, it is because the database selected does not contain the correct type of data or the data is not similar. Also, it is imperative to check the data used for the plot by selecting the “R data” tab on the right hand side, above the plot area. The data from the database that is being used to calculate the statistics is listed in this tab. This tab should be checked to avoid the accidental accumulation of inappropriate database lines. For example, it does not make sense to accumulate statistics over different domains, thresholds, models, etc.
There are many other options for plots, but these are the basics.
The image below shows an example of the plot and set-up options for a series plot in METviewer. This example uses the database “mv_hrrr_sppmp_test” to plot “Stat” output for seven ensemble members. The Frequency bias over the East domain is plotted for 3-hour precipitation accumulation exceeding 0.254 cm. Appropriate titles and labels have been entered in the titles and labels tab shown below the plot. Colors and line formatting are shown across the bottom menu of the plot. The values here are the defaults.
Here is the associated xml for this example. It can be copied into an empty file and saved to the desktop then uploaded into the system by clicking on the “Load XML” button in the upper-right corner of the GUI. This XML can be downloaded from this link: etb_plot_xml.xml.
<?xml version="1.0" encoding="UTF-8" standalone="no"?> <plot_spec> <connection> <host>dakota.rap.ucar.edu:3306</host> <database>mv_hrrr_sppmp_test</database> <user>******</user> <password>******</password> <management_system>mariadb</management_system> </connection> <rscript>/usr/local/bin/Rscript</rscript> <folders> <r_tmpl>/opt/vxwww/tomcat/webapps/metviewer_dev//R_tmpl</r_tmpl> <r_work>/opt/vxwww/tomcat/webapps/metviewer_dev//R_work</r_work> <plots>/opt/vxwww/tomcat/webapps/metviewer_output_dev//plots</plots> <data>/opt/vxwww/tomcat/webapps/metviewer_output_dev//data</data> <scripts>/opt/vxwww/tomcat/webapps/metviewer_output_dev//scripts</scripts> </folders> <plot> <template>series_plot.R_tmpl</template> <dep> <dep1> <fcst_var name="APCP_03"> <stat>FAR</stat> </fcst_var> </dep1> <dep2/> </dep> <series1> <field name="model"> <val>HRRR_sppmp_mem1_ctrlvx_hrconus</val> <val>HRRR_sppmp_mem2_ctrlvx_hrconus</val> <val>HRRR_sppmp_mem3_ctrlvx_hrconus</val> <val>HRRR_sppmp_mem4_ctrlvx_hrconus</val> </field> </series1> <series2/> <plot_fix> <field equalize="true" name="vx_mask"> <set name="vx_mask_0"> <val>EAST</val> </set> </field> <field equalize="true" name="fcst_thresh"> <set name="fcst_thresh_1"> <val>>=0.254</val> </set> </field> </plot_fix> <plot_cond/> <indep equalize="true" name="fcst_lead"/> <plot_stat>median</plot_stat> <tmpl> <data_file>plot_20210115_095414.data</data_file> <plot_file>plot_20210115_095414.png</plot_file> <r_file>plot_20210115_095414.R</r_file> <title>ETB</title> <x_label>Difference between models</x_label> <y1_label/> <y2_label/> <caption/> <job_title/> <keep_revisions>false</keep_revisions> <listdiffseries1>list(c("HRRR_sppmp_mem1_ctrlvx_hrconus APCP_03 FAR","HRRR_sppmp_mem2_ctrlvx_hrconus APCP_03 FAR","ETB"),c("HRRR_sppmp_mem3_ctrlvx_hrconus APCP_03 FAR","HRRR_sppmp_mem4_ctrlvx_hrconus APCP_03 FAR","ETB"))</listdiffseries1> <listdiffseries2>list()</listdiffseries2> </tmpl> <execution_type>Python</execution_type> <event_equal>true</event_equal> <vert_plot>false</vert_plot> <x_reverse>false</x_reverse> <num_stats>false</num_stats> <indy1_stag>false</indy1_stag> <indy2_stag>false</indy2_stag> <grid_on>true</grid_on> <sync_axes>false</sync_axes> <dump_points1>false</dump_points1> <dump_points2>false</dump_points2> <log_y1>false</log_y1> <log_y2>false</log_y2> <varianceinflationfactor>false</varianceinflationfactor> <plot_type>png16m</plot_type> <plot_height>8.5</plot_height> <plot_width>11</plot_width> <plot_res>72</plot_res> <plot_units>in</plot_units> <mar>c(8,4,5,4)</mar> <mgp>c(1,1,0)</mgp> <cex>1</cex> <title_weight>2</title_weight> <title_size>1.4</title_size> <title_offset>-2</title_offset> <title_align>0.5</title_align> <xtlab_orient>1</xtlab_orient> <xtlab_perp>-0.75</xtlab_perp> <xtlab_horiz>0.5</xtlab_horiz> <xtlab_freq>0</xtlab_freq> <xtlab_size>1</xtlab_size> <xlab_weight>1</xlab_weight> <xlab_size>1</xlab_size> <xlab_offset>2</xlab_offset> <xlab_align>0.5</xlab_align> <ytlab_orient>1</ytlab_orient> <ytlab_perp>0.5</ytlab_perp> <ytlab_horiz>0.5</ytlab_horiz> <ytlab_size>1</ytlab_size> <ylab_weight>1</ylab_weight> <ylab_size>1</ylab_size> <ylab_offset>-2</ylab_offset> <ylab_align>0.5</ylab_align> <grid_lty>3</grid_lty> <grid_col>#cccccc</grid_col> <grid_lwd>1</grid_lwd> <grid_x>listX</grid_x> <x2tlab_orient>1</x2tlab_orient> <x2tlab_perp>1</x2tlab_perp> <x2tlab_horiz>0.5</x2tlab_horiz> <x2tlab_size>0.8</x2tlab_size> <x2lab_size>0.8</x2lab_size> <x2lab_offset>-0.5</x2lab_offset> <x2lab_align>0.5</x2lab_align> <y2tlab_orient>1</y2tlab_orient> <y2tlab_perp>0.5</y2tlab_perp> <y2tlab_horiz>0.5</y2tlab_horiz> <y2tlab_size>1</y2tlab_size> <y2lab_size>1</y2lab_size> <y2lab_offset>1</y2lab_offset> <y2lab_align>0.5</y2lab_align> <legend_box>o</legend_box> <legend_inset>c(0, -.25)</legend_inset> <legend_ncol>3</legend_ncol> <legend_size>0.8</legend_size> <caption_weight>1</caption_weight> <caption_col>#333333</caption_col> <caption_size>0.8</caption_size> <caption_offset>3</caption_offset> <caption_align>0</caption_align> <ci_alpha>0.05</ci_alpha> <eqbound_low>-0.5</eqbound_low> <eqbound_high>-0.5</eqbound_high> <plot_ci>c("none","none","none","none","none","none")</plot_ci> <show_signif>c(FALSE,FALSE,FALSE,FALSE,FALSE,FALSE)</show_signif> <plot_disp>c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE)</plot_disp> <colors>c("#ff0000FF","#8000ffFF","#00aaffFF","#0040ffFF","#000000FF","#0000ffFF")</colors> <pch>c(20,20,20,20,20,20)</pch> <type>c("b","b","b","b","b","b")</type> <lty>c(1,1,1,1,1,1)</lty> <lwd>c(1,1,1,1,1,1)</lwd> <con_series>c(1,1,1,1,1,1)</con_series> <order_series>c(1,2,3,4,5,6)</order_series> <plot_cmd/> <legend>c("","","","","","")</legend> <create_html>TRUE</create_html> <y1_lim>c()</y1_lim> <x1_lim>c()</x1_lim> <y1_bufr>0.04</y1_bufr> <y2_lim>c()</y2_lim> </plot> </plot_spec>