Difference between revisions of "Statistics"

Revision as of 18:31, May 17, 2007

Statistics

Major approaches
Frequency probability
Bayesian inference
Non-parametric statistics
Common methods
Analysis of variance
Chi-Square test
Student's t-test
Z test
Linear regression
Bayesian model selection
Bootstrapping

Statistics is the application of mathematics to the understanding of data. It covers every stage of working with data, from initial collection through analysis to the conclusions and interpretations drawn from it. It is used across research-oriented disciplines, from physics, chemistry and biology to economics, anthropology and psychology, as well as in business and government.

Statistics analyzes data in two primary ways. The first, descriptive statistics, describes and summarizes the data; typical summaries include the mean, standard deviation and standard error, each of which is an example of a statistic. The second, inferential statistics, attempts to infer relationships between the collected data and various hypotheses or populations. Together, descriptive and inferential statistics make up applied statistics. There is also a discipline called mathematical statistics, which is concerned with the theoretical basis of the subject.

Statistics takes its name from the state: the term originally referred to the collection and study of information about a state, which rulers used to manage its affairs.

Frequentist Approaches

Frequentist approaches are often referred to as classical approaches because they are long established and are the most widely used methods of statistical analysis. The heart of these approaches is to understand probability as a relative frequency: the ratio of the number of times a particular outcome occurs to the total number of occasions on which it could occur. For example, a frequentist would describe how often a coin turns up heads as the ratio of the number of heads to the total number of flips.

Descriptive statistics

Frequentist approaches to descriptive statistics mostly involve averaging. For example, the mean of a sample is calculated as the sum of all observations divided by the number of observations. The standard deviation is based on the average squared deviation of the observations from the mean, and the standard error of the mean is the standard deviation divided by the square root of the number of observations.

These methods stem from the view of data as ratios or probabilities.
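
As a small illustration of these averaging-based summaries, the sketch below computes the mean, the sample standard deviation and the standard error of the mean using their conventional definitions. The function name `describe` and the data are made up for the example, and the n - 1 divisor for the variance is an assumption (the usual convention for a sample rather than a whole population).

```python
import math

def describe(sample):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample variance: average squared deviation, using the n - 1 divisor.
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    sd = math.sqrt(variance)
    # Standard error of the mean: standard deviation scaled by sqrt(n).
    se = sd / math.sqrt(n)
    return mean, sd, se

# Hypothetical observations, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, sd, se = describe(data)
print(mean, sd, se)
```

For this data the mean is 5.0; the standard error shrinks as more observations are added, because it divides by the square root of the sample size.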

Inferential statistics

Frequentist approaches to inferential statistics primarily involve comparing descriptive statistics of two data sets to determine whether they are significantly different. One of the most common approaches is to test a given data set against a null hypothesis, that is, against the data that would be expected if the values were the result of random chance alone. For example, if a coin came up heads 9 times and tails 1 time in 10 flips, you would compare the observed number of heads, 9, to the number of heads expected if chance alone were operating, 5.
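
The coin example can be turned into a concrete test. The sketch below is one possible way to do it, an exact two-sided binomial test against the null hypothesis of a fair coin; the text does not say which test to use, so the choice of test, the function name `binomial_two_sided_p` and the parameter `p_null` are assumptions made for illustration.

```python
from math import comb

def binomial_two_sided_p(heads, flips, p_null=0.5):
    """Exact two-sided p-value under the null hypothesis P(heads) = p_null.

    Sums the probabilities of every outcome that is no more likely
    than the observed one.
    """
    def pmf(k):
        return comb(flips, k) * p_null**k * (1 - p_null)**(flips - k)

    observed = pmf(heads)
    # Small relative tolerance so ties are not lost to rounding.
    return sum(pmf(k) for k in range(flips + 1)
               if pmf(k) <= observed * (1 + 1e-9))

expected_heads = 10 * 0.5             # 5 heads expected under chance alone
p_value = binomial_two_sided_p(9, 10)
print(expected_heads, p_value)        # p_value is 22/1024, about 0.0215
```

A p-value this small means that a result at least as extreme as 9 heads in 10 flips would be unusual if the coin were fair.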

Testing against the null hypothesis is sometimes referred to as an omnibus test, since it tests the idea that a given data set is the result of anything other than chance. It is often more desirable to test specific data sets against each other.

Common frequentist methods

Pearson product-moment correlation coefficient

Bayesian Approaches

Bayesian statistics is a method of applying Bayes' theorem to data analysis. One of the biggest differences between Bayesian and frequentist approaches is that Bayesians attempt to determine the probability that a given hypothesis is true given the data, while frequentists attempt to determine the probability of getting the data given that a particular hypothesis is true.

Bayesian approaches are becoming more and more popular in science because what most people are interested in is the probability of the proposed hypothesis, not the probability of the data. Proponents also argue that Bayesian methods need not rely on fixed assumptions about the data such as normality and homogeneity of variance, although a model for the likelihood of the data must still be chosen. However, Bayesian methods have come under fire from many frequentist proponents. This led to a very heated debate in statistical circles about the respective validity of the two approaches, though the debate has largely died down. The primary complaint leveled at Bayesian statistics is that it must use a prior probability of a hypothesis in its analysis. This prior is intended to build contextual information into the analysis, but critics may see it as subjective or arbitrary. Commonly used prior distributions include the uniform distribution and the beta distribution.
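
For reference, the theorem behind this contrast can be written as follows. The symbols H (hypothesis) and D (data) are notation chosen here, not taken from the article.

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
```

Here P(H | D) is the posterior probability of the hypothesis given the data, P(D | H) is the likelihood of the data given the hypothesis (the quantity frequentist methods focus on), P(H) is the prior probability, and P(D) is a normalizing constant.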

Descriptive statistics

All Bayesian methods use Bayes' equation; this applies to both descriptive and inferential statistics. To find quantities such as the mean and standard deviation, a prior probability must first be assigned to every candidate value of the mean and standard deviation. In practice this usually means assigning uniform probabilities to values equally spaced between what we think are the minimum and maximum values of the statistic we are interested in. The number of values depends on the grid density: a denser grid gives more accurate results at the cost of more computation time. The likelihood of each value is then calculated from the data, and Bayes' equation is used to assign a posterior probability to each value. These posterior probabilities can be plotted as a probability density function (PDF) to show the probability of each value given the data, or the value with the highest posterior probability can simply be chosen.
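
The grid procedure described above can be sketched in a few lines. To keep the example short it estimates a single parameter, the probability that a coin lands heads, rather than a mean and standard deviation; that simplification, the function name `grid_posterior` and the data (9 heads in 10 flips) are assumptions made for illustration.

```python
def grid_posterior(heads, flips, grid_size=101):
    """Grid approximation of the posterior for a coin's heads probability."""
    # Equally spaced candidate values between the minimum (0) and maximum (1).
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Uniform prior: every candidate value is equally probable beforehand.
    prior = [1.0 / grid_size] * grid_size
    # Likelihood of the observed flips for each candidate value.
    likelihood = [t**heads * (1 - t)**(flips - heads) for t in grid]
    # Bayes' equation: posterior is prior times likelihood, normalized.
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)
    posterior = [u / evidence for u in unnormalized]
    return grid, posterior

grid, posterior = grid_posterior(9, 10)
# Value with the highest posterior probability.
best_index = max(range(len(grid)), key=posterior.__getitem__)
map_estimate = grid[best_index]
print(map_estimate)
```

Increasing `grid_size` gives a finer picture of the posterior but requires proportionally more likelihood evaluations, which is the accuracy versus computation trade-off mentioned in the text.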

Inferential statistics

Inferential statistics in Bayesian methods looks much the same as descriptive statistics, since both use Bayes' equation and the same basic approach. To compare two means, you would calculate the posterior PDF of the mean for each data set and then use the distribution of their difference to work out the probability that the means differ.

To compare hypotheses, Bayesian model selection is often used. Each hypothesis you want to test is assigned a prior probability, and the likelihood of the data given each hypothesis being tested is calculated. You can then use Bayes' equation to determine the relative probabilities that each hypothesis is correct. This method almost always yields relative probabilities, since calculating an absolute probability would require knowing every possible hypothesis. Usually this is not possible, but sometimes the set of hypotheses is small enough that all of them can be tested.
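
A minimal sketch of this model-selection procedure for the coin example follows. The two hypotheses (a fair coin and a coin biased toward heads with probability 0.9), their equal priors and the function name `model_posteriors` are all assumptions chosen for illustration.

```python
from math import comb

def model_posteriors(heads, flips, hypotheses, priors):
    """Relative posterior probability of each hypothesis given coin-flip data.

    hypotheses maps a name to its assumed probability of heads;
    priors maps the same names to prior probabilities.
    """
    # Likelihood of the data under each hypothesis (binomial model).
    likelihoods = {
        name: comb(flips, heads) * p**heads * (1 - p)**(flips - heads)
        for name, p in hypotheses.items()
    }
    # Bayes' equation, normalized over the hypotheses under consideration only.
    unnormalized = {name: priors[name] * likelihoods[name] for name in hypotheses}
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}

posteriors = model_posteriors(
    heads=9,
    flips=10,
    hypotheses={"fair": 0.5, "biased": 0.9},
    priors={"fair": 0.5, "biased": 0.5},
)
print(posteriors)
```

With these inputs the fair-coin hypothesis ends up with a posterior of roughly 0.025. The probabilities are relative: they sum to 1 only because no hypotheses other than these two were considered.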

Because of the large number of calculations needed for model selection, Bayesian approaches have only become practical and popular with the advent of computers. Even with the most modern computers available, many Bayesian models remain computationally intractable. Recent developments in applying Markov chain Monte Carlo methods to these problems have led to promising results.

Non-parametric and Bootstrapping methods

One of the greatest problems with frequentist approaches to statistics is that they often rely on prior assumptions about how the data look and how they were collected. Most commonly, the data must be normally distributed and have homogeneity of variance. Different statistical methods are more or less robust to violations of these assumptions, and some techniques have attempted to avoid them altogether. The end result, though, is usually a significant loss of power and an increased likelihood of error.

Non-parametric statistics are any of the many methods that attempt to define descriptive characteristics or make inferential claims without the need for tightly confined parameters. The main goal is to eliminate the need for assumptions without sacrificing power and accuracy.

Bootstrapping is a particularly popular non-parametric approach. It is computationally costly and has only recently become feasible for most data sets. It involves sampling with replacement from the given data set, perhaps as many as 100,000 times, in order to determine means, errors, best fits and comparisons of data sets.
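
The resampling idea can be sketched as follows for the simplest case, estimating a mean and its error. The function name `bootstrap_mean`, the data, the number of resamples and the use of a percentile interval are assumptions made for illustration; a fixed seed is used only so that the run is repeatable.

```python
import random
import statistics

def bootstrap_mean(sample, n_resamples=10_000, seed=0):
    """Bootstrap the mean: returns (estimate, standard error, 95% interval)."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Sampling with replacement: draw n values from the original sample.
        resample = rng.choices(sample, k=n)
        means.append(statistics.fmean(resample))
    means.sort()
    # Percentile interval: cut off 2.5% of the resampled means at each end.
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return statistics.fmean(means), statistics.stdev(means), (lower, upper)

# Hypothetical observations, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
estimate, std_error, (lower, upper) = bootstrap_mean(data)
print(estimate, std_error, lower, upper)
```

No distributional assumption such as normality is made here; the spread of the resampled means stands in for the sampling distribution, at the cost of repeating the calculation thousands of times.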

@@ Line 43: / Line 43: @@
 ==Bayesian Approaches==
-[[Bayesian inference | Bayesian statistics]] is a method of applying [[Bayes equation]] to data analysis. One of the biggest difference between Bayesian approaches and frequentist approaches is that Bayesians attempt to determine the probability that a given hypothesis is true given the data, while frequentist attempt to define the probability of getting the data given that a particular hypothesis is true.
+[[Bayesian inference | Bayesian statistics]] is a method of applying [[Bayes theorem]] to data analysis. One of the biggest difference between Bayesian approaches and frequentist approaches is that Bayesians attempt to determine the probability that a given hypothesis is true given the data, while frequentist attempt to define the probability of getting the data given that a particular hypothesis is true.
 Bayesian approaches are becoming more and more popular in science because what most people are interested in is the probability of the proposed hypothesis, not the probability of the data. It also does not need to make prior assumptions about the data such as [[Normal distribution | normality]] and [[homogeneity of variance]]. However, Bayesian methods have come under fire from many frequentist proponents. This has led to very heated debate in statistical circles, though this has largely died now, about the respective validity of both methods. The primary complaint leveled at Bayesian statistics is that it must use a [[prior probability]] of a hypothesis in its analysis. This prior is intended to build contextual information into the analysis, but it may be seen by its critics as subjective or arbitrary.  Commonly used prior distributions include the [[uniform distribution]] and [[beta distribution]].
