Difference between revisions of "Talk:Significance of E. Coli Evolution Experiments"
(Problems with chi-squared test when assumptions are violated)
FredFerguson (Talk | contribs) (Literature on Monte Carlo tests)
Revision as of 20:53, March 11, 2009
SJohnson, your assessment, while a sound application of the chi-squared test, is unfortunately incorrect. Monte Carlo resampling gives a more accurate p-value than the chi-squared test. You may research the literature (i.e. publications in statistical mathematics; many actually compare Monte Carlo with chi-squared) to discover that this method is commonly used in advanced statistical work and that it is more accurate than the chi-squared test.--Able806 17:00, 4 March 2009 (EST)
- It doesn’t make sense to compare the chi-square test, which is a specific statistical hypothesis test, to Monte Carlo methods, which can be used for anything from fluid motion modeling to p-value computations. You can use Monte Carlo methods to compute the p-values of the chi-square test!
- Monte Carlo methods involve the generation of random realizations. Your broad claim that Monte Carlo methods are “more accurate” than the chi-square test is obviously incorrect because the accuracy of Monte Carlo methods always depends on the number of random realizations generated. When p-values are small, Monte Carlo methods are notoriously inaccurate unless the number of realizations generated is enormous.
- Which publications compare Monte Carlo to chi-square and show that the former is more accurate? Could you provide specific examples? Thanks. SJohnson 18:50, 4 March 2009 (EST)
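As a rough illustration of the point about realization counts (a sketch, not taken from the paper or from either commenter's calculations): a Monte Carlo p-value is a binomial proportion, so its standard error is approximately sqrt(p(1-p)/N). The function name and the example values of p and N below are assumptions chosen only for illustration.

```python
import math

def mc_pvalue_std_error(p, n_realizations):
    """Standard error of a Monte Carlo p-value estimate.

    The estimate is the fraction of n_realizations simulated statistics
    that are at least as extreme as the observed one, i.e. a binomial
    proportion with success probability p.
    """
    return math.sqrt(p * (1.0 - p) / n_realizations)

# Hypothetical true p-values and realization counts, for illustration only.
for p in (0.2, 0.01, 0.001):
    for n in (1_000, 100_000):
        se = mc_pvalue_std_error(p, n)
        print(f"p={p:<6} N={n:<7} std.err={se:.5f} relative={se / p:.1%}")
```

The relative error grows as p shrinks, so a small p-value needs many more realizations than a moderate one to reach the same relative precision.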
- In furtherance of SJohnson's remarks with respect to rarely occurring events, the use of the basic Monte Carlo method is plainly incorrect for modeling a rarely occurring event, as the Lenski paper did. This has long been pointed out in Flaws in Richard Lenski Study. I know evolutionists will never admit a flaw in anything promoting their pet theory, but this (and other) flaws in that paper are undeniable.
- Watch how evolutionists defended obvious errors in the Lenski paper, and then realize why the Piltdown Man fraud was taught for 40 years without evolutionists admitting it was a hoax.--Andy Schlafly 09:55, 5 March 2009 (EST)
- Andy, how exactly is the Monte Carlo method incorrect to use in this case? I have seen it used in publications with much smaller datasets.--Able806 10:29, 5 March 2009 (EST)
- Able806, I'm interested in looking at the publications you mentioned that use Monte Carlo methods to analyze small data sets. Could you provide some examples? Thanks. SJohnson 16:41, 5 March 2009 (EST)
- SJohnson, here are two papers, 1 and 2. Most are in chemistry and genetics where you find the observed to be much smaller and have to use the MCM. You can search on the subject as well and find that the way Lenski performed the test is the standard for microbiological genetic analysis.--Able806 10:19, 11 March 2009 (EDT)
- Able806, you still seem to miss the point about how inappropriate the Monte Carlo method (as used in the Lenski paper) is for evaluating rarely occurring events. You need to open your mind to be productive. If you simply cling to a view that Lenski (who I don't think has any meaningful education in statistics) must somehow be right, then you're not going to make any progress in understanding the flaws.--Andy Schlafly 17:07, 5 March 2009 (EST)
- Andy, you still have not answered what you find inappropriate about his use of the Monte Carlo method? I am a reasonable person and with evidence I do have an open mind. I provided examples last week, with a working model, showing that Monte Carlo is better than the chi-square in this case. I have also shown where the Chi-Square was inappropriate due to the occurrence size as well. So if you have any evidence that Monte Carlo should not be used in the way that Lenski used please let it be shown.--Able806 10:19, 11 March 2009 (EDT)
Sjohnson, I believe you just proved my point. In the literature of mean and covariance structure analysis, non-central chi-square distribution is commonly used to describe the behavior of the likelihood ratio statistic under alternative hypothesis; it is widely believed that the non-central chi-square distribution is justified by statistical theory. Actually, when the null hypothesis is not trivially violated, the non-central chi-square distribution cannot describe the LR statistic well even when data are normally distributed and the sample size is large. Monte Carlo results compare the strength of the normal distribution against that of the non-central chi-square distribution. In an association analysis comparing cases and controls with respect to allele frequencies at a highly polymorphic locus, a potential problem is that the conventional chi-squared test may not be valid for a large, sparse contingency table. Reliance on statistics with known asymptotic distribution is unnecessary, as Monte Carlo simulations can be performed to estimate the significance level of the test statistic.
Here is a link to a great page that provides an interactive example of why the Chi Squared test would provide poor results compared to the Monte Carlo in relation to the Lenski data workup.
Something you may have overlooked is that the data set is actually too small to use the chi square method correctly. It is often accepted that if any of the analyzed data falls under 10 for a particular cell of the data set then the Yates correction needs to be applied; unfortunately the Yates correction can overcorrect, thus skewing the p-value. Lenski seemed to understand this by supporting his Monte Carlo p-value results with the Fisher z-transformation p-value.
I hope this helps.--Able806 10:27, 5 March 2009 (EST)
- I’m still waiting to hear which literature says that “Monte Carlo resampling” is “more accurate than the chi-squared test”. The page mentioned above [1] is a discussion of why statisticians “fail to reject the null” rather than “accepting the null” when the p-value is above 0.05 or so. The page says nothing about superiority of Monte Carlo methods. Why were alternate hypothesis distributions mentioned? Only the null hypothesis distribution is used to calculate a p-value. Yates’s correction is for 2x2 contingency tables [2]. It doesn’t apply in this case. Finally, what the heck do “covariance structure analysis” and “allele frequencies at a highly polymorphic locus” have to do with this problem? SJohnson 16:38, 5 March 2009 (EST)
- SJohnson, I am looking for this paper for you, I cited it for one of my past publications dealing with allele frequencies (I believe it came from the Duke Biostatistics group). To answer your question about allele frequencies, that is the issue at hand, more about the genetics than the math, but it is the item being studied. So you stated that Yates can not be used and statistics says the number of occurrences is too small to evaluate using the Chi-Squared test so what would you recommend instead of the Monte-Carlo Method?
- Regarding the "Fisher z-transformation p-value" from the paper, garbage in garbage out. If the p-values were bad to begin with, then why would a combination of them be meaningful? SJohnson 10:49, 9 March 2009 (EDT)
- You are assuming that p-values are wrong based on a test that is inappropriate in this case due to data limitations. Did you perform a z-transformation on the chi-squared for the three data groups?--Able806 10:19, 11 March 2009 (EDT)
- There's a large literature on various kinds of Monte Carlo test, a very short summary of which is that they're inevitably more accurate than parametric tests (e.g. F, t, chi-squared, etc.) because they don't make assumptions about the distribution of the data under the null hypothesis. See for example Introduction to the Bootstrap by B. Efron and R. Tibshirani and The Jack-knife, the Bootstrap and Other Resampling Plans, also by Efron. They're certainly applicable to small datasets and their accuracy is really only limited by the number of samples you care to take. E.g. 1000 M-C samples would give you a pretty accurate idea about significance at the alpha<1% level. (That book should answer SJohnson's questions of 18:50 on 4/3/09 and 16:38 on 5/3/09 about accuracy and Aschlafly's comment of 17:07 on 5/3/09 about appropriateness of Monte Carlo tests.) FredFerguson 16:53, 11 March 2009 (EDT)
Quick question for SJohnson: How many degrees of freedom did you choose when calculating the p-value? I'd like to know upon what condition you base that number. Thanks.--Argon 11:05, 5 March 2009 (EST)
- The degree of freedom for a contingency table is rows minus one times columns minus one. That is, $\mathrm{DOF} = (r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns. Here’s a pretty good tutorial I came across: [3]. For the experiments from [4], the DOFs are 11, 11, and 13. For experiment one, the chi-square test statistic is
  $$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \approx 14.82$$
- where $O_i$ is the observed value and $E_i$ is the expected null hypothesis value. So if you have MS Excel, another way to arrive at the p-value of 0.19 is to type “=CHIDIST(14.82,11)” into a cell. Cheers! SJohnson 16:38, 5 March 2009 (EST)
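The Excel calculation quoted above can be cross-checked without a spreadsheet. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, the 2 x 12 table shape is an assumed example that happens to give 11 degrees of freedom, and the series-based tail probability is an illustrative stand-in for a proper statistics library.

```python
import math

def contingency_dof(rows, cols):
    """Degrees of freedom of an r x c contingency table: (r - 1)(c - 1)."""
    return (rows - 1) * (cols - 1)

def chi2_sf(x, dof, terms=500):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the power series for the regularized lower incomplete gamma
    function P(a, z) with a = dof/2 and z = x/2, then returns 1 - P.
    Adequate for moderate x; not a replacement for a statistics library.
    """
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, terms):
        term *= z / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

# Assumed table shape (2 rows, 12 columns) and the statistic quoted above.
dof = contingency_dof(2, 12)
print(f"dof = {dof}, p = {chi2_sf(14.82, dof):.3f}")
```

The result should agree with the quoted p-value of roughly 0.19 for a statistic of 14.82 on 11 degrees of freedom.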
- OK, thanks for the info. From what I'd calculated and looked up in tables, the numbers seemed close to a df=11 for a chi-square of ~14. (Aside: With terms having 17/3 in the denominator in the figures above, were you using the test of independence? I was using Pearson's test for fit of a distribution which returns a chi-squared value of 14 and roughly matched the p-values you reported, assuming the df was 11).
- Also, the first sentence of the article reads: "Blount, Borland, and Lenski[1] claimed that a key evolutionary innovation was observed during a laboratory experiment. That claim is false." A small correction: There were several claims in the paper. The 'key evolutionary innovation' was acquiring the ability to utilize citrate as a food source. That claim was demonstrated multiple times. The claim that pertains to this statistics discussion was that the Cit+ phenotype arose in a multi-step process, first requiring a rare, pre-adaptive mutation before additional mutation(s) led to the subsequent development of citrate utilization.--Argon 20:46, 5 March 2009 (EST)
- My biology-degreed wife assures me that mutation does not necessarily mean that evolution occurred. What the paper claimed is that evolution (a “key innovation”) occurred in the lab. The key innovation supposedly increased the mutation rate. In the experiments, the observed mutation rate increased after generation 31,000, but not enough to make a statistically significant claim that the rate is not constant. The analysis in the paper was similar to flipping a coin ten times, counting six heads and claiming that the coin must be biased against tails. In reality, there’s nothing surprising about a fair coin producing slightly more of one outcome than the other. Just like there's nothing surprising about there being slightly more mutations in later generations than early generations given the null hypothesis (constant mutation rate). SJohnson 10:46, 9 March 2009 (EDT)
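The coin-flip analogy can be made concrete with an exact binomial tail probability. This is a minimal sketch of that arithmetic only (the function name is hypothetical); it does not reproduce any calculation from the paper.

```python
from math import comb

def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability that a fair coin gives at least six heads in ten flips.
p_six_or_more = binom_upper_tail(6, 10)
print(f"P(at least 6 heads in 10 fair flips) = {p_six_or_more:.3f}")
```

The probability is 386/1024, about 0.377, so six heads in ten flips is far from evidence of a biased coin at any conventional significance level.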
- SJohnson, not to say anything about your wife, but has she had a 400 level molecular genetics course (most general biology degrees do not cover the detail unless they are specialized)? If so, she would have mentioned that if the mutation passes to the offspring and is selectively beneficial to the population then it is a step of evolution as long as the conditions continue through the sharing of the mutation with the population and the environment is such that it reduces the growth rate of the non-transformed population. While not all mutations are signs that evolution occurred, the mutations that pass to offspring and provide a benefit compared to other offspring are very strong indicators. In the case of this paper the population that evolved the cit+ was able to metabolize a chemical in their environment which allowed for an adaptation advantage compared to the non-transformed colonies.--Able806 10:19, 11 March 2009 (EDT)
Let’s go back to the beginning. There appears to be confusion about the difference between test statistics and methods for computing p-values. As is noted at the beginning of the page [5], the fundamental problem with the paper is that it used a flawed test statistic, not that it used Monte Carlo methods to find the p-value for that flawed statistic.
Every hypothesis test uses a test statistic to reduce the data to a single number. The p-value for the test statistic can be calculated analytically (as I’ve done for the chi-square test statistic) or by Monte Carlo methods. In the paper, Monte Carlo methods were used to compute the p-value of the “mutation generation” test statistic. The key problem with the analysis from the paper is that it doesn’t work to use a weighted average to test for variations in mutation rate. This is like trying to use the sample variance to test for an increase in the mean in Gaussian-distributed data. A statistic should be selected based on the null and alternate hypothesis distributions of the data. The chi-square test (unlike the weighted average from the paper) is a reasonable choice for data that mutates at a constant rate under the null hypothesis, but mutates at varying rates under the alternate hypothesis.
Able806, you made a good point about the contingency table cell frequencies being relatively low, but were wrong when you said “the data set is actually too small to use the chi square method correctly”. In the low cell frequency case the chi-square test is still effective, but the null hypothesis distribution of the chi-square statistic starts to look less like the chi-square distribution. Thus, p-values calculated using the chi-square distribution may be a bit off. However, Monte Carlo p-values are always imperfect as well because it's impossible to generate an infinite number of random realizations. There are imperfections in p-values generated by both analytic and Monte Carlo methods. However, low cell frequencies do not explain the >20x and >2.5x differences between chi-square p-values and p-values from the paper for experiments one and three. The reason for those huge differences was the use of the flawed test statistic (“mutation generation”) in the paper. SJohnson 16:38, 5 March 2009 (EST)
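The distinction between a test statistic and the method used to get its p-value can be sketched in code: the same chi-square statistic is kept, but its p-value is estimated by simulation instead of from the chi-square distribution. This is a simplified illustration, not the procedure from the paper. It assumes a goodness-of-fit setting with a fixed number of mutations spread over equally likely cells, and the counts, function names, and seed are all hypothetical.

```python
import random

def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def monte_carlo_pvalue(observed, probs, n_realizations=10_000, seed=0):
    """Estimate P(statistic >= observed statistic) under the null by simulation."""
    rng = random.Random(seed)
    total = sum(observed)
    expected = [total * p for p in probs]
    stat_obs = chi_square_stat(observed, expected)
    cells = range(len(probs))
    hits = 0
    for _ in range(n_realizations):
        # Draw one null realization: scatter `total` events over the cells.
        counts = [0] * len(probs)
        for cell in rng.choices(cells, weights=probs, k=total):
            counts[cell] += 1
        if chi_square_stat(counts, expected) >= stat_obs - 1e-12:
            hits += 1
    # The +1 terms count the observed data as one realization,
    # so the estimate is never exactly zero.
    return (hits + 1) / (n_realizations + 1)

# Hypothetical mutation counts in 12 equally likely generation groups.
observed = [0, 0, 1, 0, 2, 1, 0, 3, 1, 4, 2, 3]
probs = [1 / 12] * 12
print(f"Monte Carlo p-value: {monte_carlo_pvalue(observed, probs):.3f}")
```

With low expected counts, the simulated p-value and the one from the chi-square distribution can differ somewhat, but both answer the same question because they share the same statistic.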
- SJohnson, the chi-squared test is a valuable statistical tool, but the limitations of the test must be acknowledged. The chi-squared test can only produce valid results if the assumptions that underlie the test are not violated. As an analogy, Newtonian models of motion fail to produce accurate results as velocities approach the speed of light; under those circumstances one must switch to a theory that accounts for relativistic effects.
- It seems that you have simply dismissed the widely-acknowledged fact that the chi-squared test is inappropriate for use in situations where n in any cell is less than a threshold number. Different authors set different thresholds, but all are well above the numbers seen in your chi-squared analysis - even the most liberal guidelines advise against the chi-squared test when any expected cell frequency is less than one or more than 20% of the table cells have expected frequencies below 5; others require that expected values in all cells must be more than 5. With smaller amounts of data, the test is insensitive and errs on the side of rejecting the hypothesis. If you attempt your chi-squared statistical analysis with a program that is more sophisticated than MS Excel (as I did), you get an error message indicating that the results are invalid due to low expected cell counts.
- That issue aside, there are other reasons that the chi-squared test is inappropriate here. As the links above point out, the categories tested must be truly independent; one example is that you can't use the chi-squared test to compare age and ability to kick a field goal by testing the same experimental group twice, one year apart; you have to test one group of age A and a different group of age B. In the case of the Blount paper, the categories are not independent. Even if there were adequate numbers to address the low-expected-frequency problem, this would make the chi-squared an invalid test in this case.
- There are other significant problems with the use of the chi-squared test in this circumstance, but they can wait until you address these first major problems.--ElyM 12:18, 11 March 2009 (EDT)
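The rule of thumb quoted in this comment (no expected count below 1, and no more than 20% of expected counts below 5) is easy to check mechanically. The sketch below assumes expected counts computed under independence from the table margins; the function names and the 2 x 4 table are hypothetical and are not the data from the paper.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def small_count_check(table, min_expected=1.0, low=5.0, max_low_fraction=0.20):
    """Apply the rule of thumb: no expected count below min_expected,
    and at most max_low_fraction of expected counts below low."""
    cells = [e for row in expected_counts(table) for e in row]
    below_min = sum(e < min_expected for e in cells)
    low_fraction = sum(e < low for e in cells) / len(cells)
    return {
        "cells_below_min": below_min,
        "fraction_below_low": low_fraction,
        "passes": below_min == 0 and low_fraction <= max_low_fraction,
    }

# Hypothetical 2 x 4 table (mutant / non-mutant counts per generation group).
table = [[0, 1, 2, 5],
         [72, 71, 70, 67]]
print(small_count_check(table))
```

A failed check indicates only that the chi-square distribution may approximate the statistic's null distribution poorly; it does not say by how much the p-value is off.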
Misinterpretation of test
SJohnson, your analysis misinterprets the test. You say the null hypothesis is that this mutation cannot happen. They saw a mutation (4 mutations, in fact, in the data set you show) so the null hypothesis (as you state it) is disproved. That's perfectly straightforward.
I don't know what the "mean mutation generation" test is, but what you're doing when you apply a chi-squared test to this dataset is to test if the mutations are evenly distributed throughout the generations. Your test says they are, so there's no strong evidence to suppose that mutations are likely to occur in one generation rather than another in the series of tests. Blount's test says they aren't, so it's more likely that the mutation will occur later in the series of tests. I can't tell which test is right without knowing more about the test that Blount used.
But that point (the foregoing paragraph) has no bearing at all on the null hypothesis, as you describe it. The mutation appeared, so that means the hypothesis that the mutation can't happen is disproved. Very simple. FredFerguson 21:10, 8 March 2009 (EDT)
- I never said that “the null hypothesis is that this mutation cannot happen”. The chi-square test statistic I'm using wouldn’t be defined if the null hypothesis mutation rate was zero because the expected-value term $E_i$ in the denominator of the statistic (see above equation) would be zero.
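- The test statistic from the paper is the average of the generation numbers of observed mutations, $\bar{g} = \frac{1}{n}\sum_{k=1}^{n} g_k$, where $g_k$ is the generation number of the $k$-th of $n$ observed mutations.
- For experiment one, the resulting number is the one shown in Table 2 of the paper. SJohnson 10:46, 9 March 2009 (EDT)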
- SJohnson, the way you're calculating the chi-squared statistic implies that you're testing the null hypothesis of a constant mutation rate over time against an alternative hypothesis of a mutation rate which varies over time. FredFerguson 11:02, 9 March 2009 (EDT)




