Statistics
Statistics is the application of mathematics to the understanding of data. It covers every stage of working with data, from the initial collection through the analysis to the conclusions and interpretations drawn from it. It is used in all research-oriented disciplines, from physics, chemistry and biology to economics, anthropology and psychology, as well as in many other fields. It is also used in business and government.
Statistics analyzes data in two primary ways. The first, descriptive statistics, describes and summarizes the data, often with quantities such as the mean, standard deviation, or standard error. The second, inferential statistics, attempts to infer relationships between the collected data and various hypotheses or populations. Together, descriptive and inferential statistics make up applied statistics. There is also a discipline called mathematical statistics, which is concerned with the theoretical basis of the subject.
Statistics takes its name from the word "state": the subject originally dealt with collecting and analyzing data about a state, which rulers and governments used to manage affairs of state.
Frequentist Approaches
Frequentist approaches are often referred to as classical approaches because they are the oldest and most widely used methods of statistical analysis. The heart of these approaches is to understand data as a relative frequency: the ratio of the number of times a particular outcome occurs to the total number of opportunities for it to occur. For example, a frequentist would describe how often a coin turns up heads as the ratio of the number of heads to the total number of flips.
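The coin example can be sketched in a few lines of Python. This is only an illustration: the function name relative_frequency, the recorded flips, and the simulated fair coin are assumptions made for the example, not part of any standard method.

```python
import random

def relative_frequency(flips):
    """Return the fraction of flips that came up heads ('H')."""
    return flips.count("H") / len(flips)

# Hypothetical record of ten flips: 6 heads and 4 tails
flips = list("HHTHTTHHTH")
print(relative_frequency(flips))  # 0.6

# Simulate a fair coin (seeded so the run is repeatable).
# With many flips the relative frequency settles near 0.5.
rng = random.Random(0)
simulated = [rng.choice("HT") for _ in range(10_000)]
print(relative_frequency(simulated))
```

The second part illustrates the frequentist idea that a probability is the value the relative frequency approaches as the number of trials grows.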
Descriptive statistics
Frequentist approaches to descriptive statistics mostly involve averaging. For example, the mean of a sample is calculated as the sum of all observations divided by the number of observations. The standard deviation is a kind of average of how far the observations lie from the mean, and the standard error of the mean is the standard deviation divided by the square root of the number of observations.
These methods stem from the view of data as ratios or probabilities.
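As a rough sketch, these quantities can be computed with Python's standard library. The observations are made up for illustration. The code uses the usual definitions: the sample standard deviation with an n - 1 denominator, and the standard error of the mean as the standard deviation divided by the square root of n.

```python
import math
import statistics

# Hypothetical sample of five observations
observations = [4.0, 8.0, 6.0, 5.0, 7.0]

n = len(observations)
mean = sum(observations) / n              # sum of observations / number of observations
sd = statistics.stdev(observations)       # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)                    # standard error of the mean

print(mean)          # 6.0
print(round(sd, 3))  # 1.581
print(round(se, 3))  # 0.707
```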
Inferential statistics
Frequentist approaches to inferential statistics primarily involve comparing descriptive statistics of two data sets to determine whether they are significantly different. One of the most common approaches is to test a given data set against a null hypothesis, that is, against the data that would be expected if the values were the result of random chance alone. For example, if a coin came up heads 9 times and tails 1 time in 10 flips, you would compare the observed number of heads, 9, to the number of heads expected if chance alone were operating, 5.
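One way to make the coin comparison concrete is an exact binomial test. The sketch below is an illustration rather than the only way to run such a test. The function names are hypothetical, and "at least as far from the expected count as the observed result" is one common convention for a two-sided p-value.

```python
from math import comb

def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p_value(heads, flips, p=0.5):
    """Probability, under the null hypothesis, of a head count at least
    as far from the expected count as the one observed."""
    expected = flips * p
    observed_gap = abs(heads - expected)
    return sum(
        binomial_pmf(k, flips, p)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )

p_value = two_sided_p_value(9, 10)
print(round(p_value, 4))  # 22/1024, about 0.0215
```

A p-value this small means that a result as extreme as 9 heads in 10 flips would be rare if the coin were fair, so the null hypothesis would be rejected at the conventional 0.05 level.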
Testing against the null hypothesis is sometimes referred to as an omnibus test, since it only asks whether a given data set is the result of anything other than chance. Often it is more useful to test specific data sets against each other.
Bayesian Approaches
Bayesian statistics is a method of applying Bayes' equation to data analysis. One of the biggest differences between Bayesian approaches and frequentist approaches is that Bayesians attempt to determine the probability that a given hypothesis is true given the data, while frequentists attempt to determine the probability of getting the data given that a particular hypothesis is true.
Bayesian approaches are becoming more and more popular in science because what most people are interested in is the probability of a proposed hypothesis, not the probability of the data. However, Bayesian methods have come under fire from many frequentist proponents, and there is a heated debate in statistical circles about the respective validity of the two methods. The primary complaint leveled at Bayesian statistics is that it must use a prior probability of a hypothesis in its analysis. Often this prior is not known outright and is assigned seemingly arbitrary values based on a particular distribution, such as the uniform distribution or the beta distribution.
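For reference, Bayes' equation can be written as follows. The symbols H (for a hypothesis) and D (for the data) are notation chosen here for illustration.

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
```

Here P(H) is the prior probability of the hypothesis, P(D | H) is the likelihood of the data given the hypothesis (the quantity frequentist methods focus on), P(D) is the overall probability of the data, and P(H | D) is the posterior probability of the hypothesis given the data.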
Descriptive statistics
Bayesian methods all use Bayes' equation; this applies to both descriptive and inferential statistics. To find quantities such as the mean and standard deviation, a prior probability must first be assigned to every candidate value of the mean and standard deviation. In practice this usually means assigning uniform probabilities to values equally spaced between what we think are the minimum and maximum values of the statistic we are interested in. The number of values depends on the grid density: a denser grid gives greater accuracy but takes longer to compute. The likelihood of each value is then calculated from the data, and Bayes' equation is used to assign a posterior probability to each value. These posterior probabilities can be plotted as a probability density function (PDF) to show the probability of each value given the data, or the value with the highest posterior probability can simply be chosen.
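The grid procedure can be sketched as follows for the mean alone. To keep the example short it makes assumptions not stated above: the data are treated as normally distributed with a known standard deviation (sigma), so only the mean needs a grid. The function name, the data, and the grid limits are all hypothetical.

```python
import math

def grid_posterior_mean(data, lo, hi, n_grid=201, sigma=1.0):
    """Posterior probability of each candidate mean on an evenly spaced grid.

    Assumes normally distributed data with known standard deviation sigma
    and a uniform prior over the grid (illustrative simplifications).
    """
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    prior = [1.0 / n_grid] * n_grid  # uniform prior

    # Log-likelihood of the data for each candidate mean (constants dropped)
    log_like = [sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data) for mu in grid]

    # Bayes' equation: posterior is proportional to prior * likelihood.
    # Subtracting the peak log-likelihood avoids numerical underflow.
    peak = max(log_like)
    unnormalised = [p * math.exp(ll - peak) for p, ll in zip(prior, log_like)]
    total = sum(unnormalised)
    return grid, [u / total for u in unnormalised]

data = [4.0, 8.0, 6.0, 5.0, 7.0]
grid, posterior = grid_posterior_mean(data, lo=0.0, hi=10.0)

# The value with the highest posterior probability
best = grid[posterior.index(max(posterior))]
print(round(best, 2))  # 6.0
```

With a uniform prior, the most probable mean coincides with the ordinary sample mean. A denser grid (a larger n_grid) gives a finer answer at the cost of more computation.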
Inferential statistics
Inferential statistics in Bayesian methods looks much the same as descriptive statistics, since both use Bayes' equation and the same basic approach. To compare two means, you would calculate the PDF of the mean for each data set and then use the two PDFs to find the distribution of the difference between the means, which gives the probability that they differ.
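A minimal sketch of such a comparison is given below. It rests on several assumptions made only for illustration: both data sets are normally distributed with a known standard deviation, the priors are uniform over the grid, and the two data sets are independent. The data and names are hypothetical.

```python
import math

def posterior_on_grid(data, grid, sigma=1.0):
    """Posterior over candidate means, assuming a uniform prior and
    normally distributed data with known standard deviation sigma."""
    log_like = [sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data) for mu in grid]
    peak = max(log_like)
    weights = [math.exp(ll - peak) for ll in log_like]
    total = sum(weights)
    return [w / total for w in weights]

# Candidate means from 0 to 10 in steps of 0.05
grid = [i * 0.05 for i in range(201)]

# Two hypothetical data sets
post_a = posterior_on_grid([6.1, 7.2, 6.8, 7.5, 6.9], grid)
post_b = posterior_on_grid([5.0, 5.9, 5.4, 6.1, 5.6], grid)

# Probability that the mean of A is greater than the mean of B:
# add up the joint probability of every grid pair where A's mean is larger.
prob_a_greater = sum(
    pa * pb
    for i, pa in enumerate(post_a)
    for j, pb in enumerate(post_b)
    if i > j
)
print(round(prob_a_greater, 3))
```

A value close to 1 indicates that, given the data and the assumptions, the first mean is very probably larger than the second.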
To compare hypotheses, Bayesian model selection is often used. Each hypothesis you want to test is assigned a prior probability, and then the likelihood of the data given each hypothesis being tested is calculated. You can then use Bayes' equation to determine the relative probabilities that each hypothesis is correct. This method almost always yields relative probabilities, since calculating an absolute probability would require knowing every possible hypothesis. Usually this is not possible, but sometimes the set of hypotheses is small enough that all of them can be tested.
Because of the large number of calculations needed for model selection, Bayesian approaches have only become practical and popular with the advent of computers. Even with the most modern computers available, many Bayesian models remain computationally intractable. Recent developments in applying Markov chain Monte Carlo methods to these problems have led to promising results.
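As an illustration, model selection can be sketched for a coin that came up heads 9 times in 10 flips. The two hypotheses (a fair coin, and a coin biased to land heads 90% of the time) and their equal priors are assumptions chosen for the example.

```python
from math import comb

def binomial_likelihood(heads, flips, p):
    """Likelihood of the observed head count if P(heads) = p."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)

# Hypothetical hypotheses: each names a value for P(heads)
hypotheses = {"fair coin": 0.5, "biased coin": 0.9}
priors = {"fair coin": 0.5, "biased coin": 0.5}  # equal prior probabilities

heads, flips = 9, 10

# Bayes' equation: posterior is proportional to prior * likelihood
unnormalised = {
    name: priors[name] * binomial_likelihood(heads, flips, p)
    for name, p in hypotheses.items()
}
evidence = sum(unnormalised.values())
posteriors = {name: value / evidence for name, value in unnormalised.items()}

for name, prob in posteriors.items():
    print(name, round(prob, 3))
```

The result is about 0.025 for the fair coin and 0.975 for the biased coin. These probabilities are relative: they sum to 1 only across the hypotheses listed, so a hypothesis left out of the comparison is implicitly given zero probability.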
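To give a flavor of Markov chain Monte Carlo, the sketch below uses the Metropolis algorithm, one of the simplest methods in that family. It estimates the probability of heads for a coin that came up heads 9 times in 10 flips, under a uniform prior. The function names, step size, number of samples, and burn-in length are all illustrative choices.

```python
import math
import random

def log_posterior(p, heads=9, flips=10):
    """Log of the unnormalised posterior for P(heads) = p under a uniform prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # values outside (0, 1) are impossible
    return heads * math.log(p) + (flips - heads) * math.log(1.0 - p)

def metropolis(n_samples, step=0.1, start=0.5, seed=0):
    """Random-walk Metropolis sampler (a basic MCMC method)."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current)
    samples = []
    for _ in range(n_samples):
        # Propose a small random move from the current value
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

samples = metropolis(20_000)
kept = samples[2_000:]  # discard early samples as burn-in
estimate = sum(kept) / len(kept)
print(round(estimate, 2))
```

For this simple problem the exact posterior is a beta distribution with mean 10/12, so the estimate can be checked against a known answer. The appeal of such samplers is that they avoid evaluating the posterior at every point of a dense grid, which becomes impractical when a model has many parameters.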