Difference between revisions of "Significance of E. Coli Evolution Experiments"
Revision as of 16:39, March 25, 2009
Blount, Borland, and Lenski[1] claimed that a key evolutionary innovation was observed during a laboratory experiment. That claim is false. The claim was based on incorrect measurements of statistical significance. Rather than using a test from the statistics literature, a flawed test was contrived and used to measure significance. The flawed test (“mean mutation generation”) produced artificially low p-values.
Test Statistics
The test statistic used in Blount, Borland, and Lenski was the mean mutation generation. For example, the experiment one mean mutation generation is $(30500 + 31500 + 2 \times 32500)/4 = 31750$ (see Tables 1 and 2 of the paper).
The null hypothesis from the paper was a constant mutation rate over all generations. However, measurements of mean mutation generation can fail to observe deviations from that null hypothesis. Consider an experiment where the mutation probabilities for generations 1, 2, and 3 are $p_1$, $p_2$, and $p_3$, respectively. If the mutation probabilities per generation are $p_1 = p_2 = p_3$, then the mean mutation generation is 2. However, if the mutation probabilities are $p_1 = p_3$ and $p_2 = 0$, then the mean mutation generation is still 2. In the latter example the experiment has deviated from the null hypothesis, but the mean mutation generation is insensitive to the change. Because this test can fail to observe deviations from the null hypothesis, levels of statistical significance computed using this test (p-values) are meaningless.
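The insensitivity described above can be checked numerically. The following is a minimal sketch, assuming that "mean mutation generation" means the expected generation number of a mutation given the per-generation probabilities; the function name `mean_mutation_generation` and the specific probability values are illustrative choices, not taken from the paper.

```python
def mean_mutation_generation(probs):
    """Expected generation number of a mutation.

    probs[g-1] is the (assumed) mutation probability in generation g,
    with generations numbered from 1.
    """
    total = sum(probs)
    weighted = sum(g * p for g, p in enumerate(probs, start=1))
    return weighted / total


# Illustrative values only: a constant rate versus a rate that vanishes
# in generation 2 but is symmetric around it.
constant_rate = [1 / 3, 1 / 3, 1 / 3]
varying_rate = [0.5, 0.0, 0.5]

mean_constant = mean_mutation_generation(constant_rate)
mean_varying = mean_mutation_generation(varying_rate)

print(mean_constant, mean_varying)
```

Both means come out equal to 2 (up to floating-point rounding), even though the second set of probabilities is far from constant.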
If the trials from an experiment have two outcomes (e.g. success or failure), then the chi-square test for independence over $k$ experiments can be written

$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - n_i \hat{p})^2}{n_i \hat{p} (1 - \hat{p})}$$

where $x_i$ is the number of successes (e.g. mutations) in the i-th experiment, $n_i$ is the number of trials in the i-th experiment, and $\hat{p}$ is the fraction of trials that are successful across all experiments. The chi-square statistic is, on average, at a minimum when all success probabilities are equal ($p_i = \bar{p}$ for $i = 1, \dots, k$) and will, on average, increase whenever the data follows any other hypothesis ($p_i \neq \bar{p}$ for some $i$, where $\bar{p}$ is the mean of the success probabilities). Thus the chi-square test is an effective hypothesis test for the data and hypotheses from Blount et al.
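As a worked step, the two-outcome form can be recovered from the usual "observed minus expected" Pearson sum. The derivation below is a sketch that assumes the notation $x_i$ (successes in the i-th experiment), $n_i$ (trials in the i-th experiment), $\hat{p}$ (pooled success fraction), and $k$ (number of experiments); each experiment contributes one success cell and one failure cell.

```latex
\begin{align*}
\chi^2
  &= \sum_{i=1}^{k} \left[
       \frac{(x_i - n_i \hat{p})^2}{n_i \hat{p}}
     + \frac{\bigl((n_i - x_i) - n_i (1 - \hat{p})\bigr)^2}{n_i (1 - \hat{p})}
     \right] \\
  &= \sum_{i=1}^{k} (x_i - n_i \hat{p})^2
     \left[ \frac{1}{n_i \hat{p}} + \frac{1}{n_i (1 - \hat{p})} \right] \\
  &= \sum_{i=1}^{k} \frac{(x_i - n_i \hat{p})^2}{n_i \hat{p} (1 - \hat{p})}
\end{align*}
```

The second line uses the fact that the failure-cell deviation $(n_i - x_i) - n_i(1 - \hat{p})$ equals $-(x_i - n_i \hat{p})$, so both cells share the same squared numerator.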
Experiment One Data
The data from experiment one of the paper is shown below (see Table 1 of the paper). The expected outcomes under the null hypothesis (no evolutionary innovation occurs) are also shown.
| Generation | Trials | Mutants | Statics | Expected Mutants | Expected Statics |
|---|---|---|---|---|---|
| 0 | 6 | 0 | 6 | 0.333 | 5.667 |
| 10000 | 6 | 0 | 6 | 0.333 | 5.667 |
| 20000 | 6 | 0 | 6 | 0.333 | 5.667 |
| 25000 | 6 | 0 | 6 | 0.333 | 5.667 |
| 27500 | 6 | 0 | 6 | 0.333 | 5.667 |
| 29000 | 6 | 0 | 6 | 0.333 | 5.667 |
| 30000 | 6 | 0 | 6 | 0.333 | 5.667 |
| 30500 | 6 | 1 | 5 | 0.333 | 5.667 |
| 31000 | 6 | 0 | 6 | 0.333 | 5.667 |
| 31500 | 6 | 1 | 5 | 0.333 | 5.667 |
| 32000 | 6 | 0 | 6 | 0.333 | 5.667 |
| 32500 | 6 | 2 | 4 | 0.333 | 5.667 |
| Total | 72 | 4 | 68 | 4 | 68 |
When the flawed test is used to compute the significance of this data, the p-value is 0.0085 (see Table 2 of the paper). This p-value is considered statistically significant. However, when the data is analyzed using a standard method (the chi-square test) the p-value is 0.19. This p-value is much larger than the one from the paper and indicates that there is no reason to reject the null hypothesis. The chi-square test p-value for experiment two is small (0.0004). However, experiment three is not statistically significant because its p-value is 0.22.
The chi-square test is a common statistical method.[2] It can be implemented in Microsoft Excel. If the numbers from the last four columns of the experiment one data table (excluding the “totals” row) are entered into Excel in rows 1-12 and columns A-D, then the p-value can be computed by entering “=CHITEST(A1:B12,C1:D12)” into any empty cell of the spreadsheet.
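The same calculation can be reproduced outside Excel. The sketch below uses only the Python standard library and the experiment one counts from the table above; the helper names `chi_square_statistic` and `chi_square_sf` are hypothetical, and the tail probability is computed from a power series for the incomplete gamma function, which is adequate for moderate values like this one but is not a general-purpose implementation.

```python
import math


def chi_square_statistic(successes, trials):
    """Two-outcome chi-square statistic with a pooled success fraction."""
    p_hat = sum(successes) / sum(trials)
    return sum(
        (x - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat))
        for x, n in zip(successes, trials)
    )


def chi_square_sf(x, dof):
    """Upper-tail probability P(X >= x) for a chi-square variable."""
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z):
    # P(a, z) = z^a e^(-z) / Gamma(a + 1) * sum_n z^n / ((a+1)...(a+n))
    term, total, n = 1.0, 1.0, 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return 1.0 - lower


# Experiment one: 12 sampled generations, 6 trials each (from the table).
trials = [6] * 12
mutants = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 2]

stat = chi_square_statistic(mutants, trials)
dof = len(mutants) - 1  # (rows - 1) * (columns - 1) for a 12 x 2 table
p_value = chi_square_sf(stat, dof)

print(f"chi-square = {stat:.3f}, dof = {dof}, p-value = {p_value:.2f}")
```

With 11 degrees of freedom, the statistic is 252/17 (about 14.82), and the resulting p-value agrees with the 0.19 quoted above to two decimal places.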
Experiment Three Data
The experiment three data from Blount et al. is shown in the table below. The expected numbers of mutants under the null hypothesis (constant mutation rate) are also shown.
| Generation | Trials | Mutants | Statics | Expected Mutants | Expected Statics |
|---|---|---|---|---|---|
| 0 | 200 | 0 | 200 | 0.571 | 199.429 |
| 10000 | 200 | 0 | 200 | 0.571 | 199.429 |
| 20000 | 200 | 0 | 200 | 0.571 | 199.429 |
| 25000 | 200 | 0 | 200 | 0.571 | 199.429 |
| 27500 | 200 | 2 | 198 | 0.571 | 199.429 |
| 29000 | 200 | 0 | 200 | 0.571 | 199.429 |
| 30000 | 200 | 2 | 198 | 0.571 | 199.429 |
| 30500 | 200 | 0 | 200 | 0.571 | 199.429 |
| 31000 | 200 | 0 | 200 | 0.571 | 199.429 |
| 31500 | 200 | 0 | 200 | 0.571 | 199.429 |
| 32000 | 200 | 1 | 199 | 0.571 | 199.429 |
| 32500 | 200 | 1 | 199 | 0.571 | 199.429 |
| Total | 2800 | 8 | 2792 | 8 | 2792 |
Comparison of p-Values
The following table compares the p-values reported in Table 2 of Blount et al. to the chi-square p-values for the same experiments. For experiments one and three, the chi-square p-values are much larger than the "mean generation" test p-values from the paper.
| | Experiment 1 | Experiment 2 | Experiment 3 |
|---|---|---|---|
| p-Value from Paper | 0.0085 | 0.0007 | 0.082 |
| Chi-square p-value | 0.19 | 0.0004 | 0.22 |
Applying the chi-square test to the Blount data violates the conditions under which the chi-square test produces accurate results. The number of 'expected mutants' is below one in all cells, and the chi-square test is not to be used when expected cell counts are less than five.[3][4][5] The total number of mutants, four in the first replay and eight in the third, falls below the lower cutoff for the chi-square test, which is not considered to produce reliable results for n of less than twenty.[6][7]
Under conditions of low n and low expected cell counts, the chi-square test is too conservative and results in estimated p-values that are too high.[8][9][10]
Furthermore, the chi-square test assumes that there are no relationships between categories; it should not be used when the same underlying population is tested repeatedly over time, as is done in the Blount experiment.[11][12] Thus the chi-square p-value of 0.0004 calculated for replay experiment 2 cannot be interpreted as supporting Blount's conclusions, since this low p-value was derived through the inappropriate application of the test.
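The expected-count and sample-size conditions cited above can be checked directly against the experiment one table. The sketch below is only an illustration: the thresholds of five (expected cell count) and twenty (total n) are the rules of thumb quoted in the text, and the variable names are hypothetical.

```python
SMALL_EXPECTED = 5  # rule-of-thumb minimum expected count per cell (from the text)
MIN_TOTAL = 20      # rule-of-thumb minimum n (from the text)

# Experiment one: 12 sampled generations, 6 trials each.
trials = [6] * 12
mutants = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 2]

# Expected counts under the null hypothesis of a constant mutation rate.
p_hat = sum(mutants) / sum(trials)
expected_mutants = [n * p_hat for n in trials]
expected_statics = [n * (1 - p_hat) for n in trials]

all_cells = expected_mutants + expected_statics
cells_below = sum(e < SMALL_EXPECTED for e in all_cells)
total_below = sum(mutants) < MIN_TOTAL

print(f"{cells_below} of {len(all_cells)} expected cells are below {SMALL_EXPECTED}")
print(f"total mutants = {sum(mutants)}, below {MIN_TOTAL}: {total_below}")
```

Every expected-mutant cell falls below the threshold, while the expected-static cells (about 5.667 each) do not.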
References
1. http://www.pnas.org/content/105/23/7899.full.pdf
2. Mathematical Statistics with Applications by Wackerly, Mendenhall, and Scheaffer, Section 14.4.
3. http://www.okstate.edu/ag/agedcm4h/academic/aged5980a/5980/newpage28.htm
4. http://www.wellesley.edu/Psychology/Psych205/chisquareindep.html
5. http://www.graphpad.com/www/Book/Choose.htm
6. http://faculty.chass.ncsu.edu/garson/PA765/chisq.htm
7. http://www.basic.northwestern.edu/statguidefiles/gf-dist_ass_viol.html
8. http://www.graphpad.com/www/Book/Choose.htm
9. http://mysite.du.edu/~jcalvert/econ/chisquar.htm
10. http://www.basic.northwestern.edu/statguidefiles/gf-dist_ass_viol.html
11. http://faculty.chass.ncsu.edu/garson/PA765/chisq.htm
12. http://www.okstate.edu/ag/agedcm4h/academic/aged5980a/5980/newpage28.htm
See Also
http://www.sciencenews.org/index/feature/activity/view/id/40006/title/Molecular_Evolution
http://sciencenews.org/view/generic/id/40649/title/FOR_KIDS_Hitting_the_redo_button_on_evolution


