# Statistic

## Revision as of 15:02, 13 February 2013

A **statistic** is a function of a distributed variable. Notionally, it is a calculation made on a set of numbers, typically drawn as a sample from some presumed underlying probability distribution, and usually used to estimate something about the distribution from which the sample is taken. The use of a **statistic** to characterize a set of observations is generally justified by its asymptotic behavior: a given **statistic** characterizes the underlying phenomenon only probabilistically (this consideration is the genesis of confidence intervals in classical statistics) and is considered accurate only in the limit as the number of observations increases without bound. Note, however, that the use of confidence intervals is somewhat problematic, since their calculation rests on assumptions about the nature of the true underlying distribution, and those assumptions may or may not hold.

For example, suppose a random sample of three children is chosen from a particular class, and their heights are measured as 1.42 m, 1.54 m, and 1.48 m; then the arithmetic mean of these heights is 1.48 m. We might then go on to use this value of 1.48 m to represent the average height of a child in that class.
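The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch using the standard `statistics` module; the variable names `heights` and `sample_mean` are illustrative choices, not part of the original text.

```python
from statistics import mean

# Heights of the three sampled children, as given in the example above.
heights = [1.42, 1.54, 1.48]

# The arithmetic mean is the sum of the observations divided by their count.
sample_mean = mean(heights)

print(round(sample_mean, 2))  # 1.48
```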

Clearly, the validity and reliability of such estimates depend heavily on a range of factors, such as the type of distribution, the size of the sample, and the sampling method used.
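The dependence on sample size can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that observations are drawn from a Uniform(0, 1) distribution, whose true mean is 0.5; the function name `sample_mean_error` and the chosen sample sizes are hypothetical. In a typical run the error of the sample mean shrinks as the sample grows, although any single run can deviate from that trend.

```python
import random

def sample_mean_error(n, rng):
    """Absolute error of the mean of n Uniform(0, 1) draws (true mean is 0.5)."""
    draws = [rng.random() for _ in range(n)]
    return abs(sum(draws) / len(draws) - 0.5)

# A fixed seed makes the run repeatable.
rng = random.Random(0)

for n in (10, 1_000, 100_000):
    print(n, sample_mean_error(n, rng))
```

The same experiment with a different distribution or a biased sampling method would behave differently, which is the point made in the paragraph above.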

### Formal Definition:

Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample of size $n$ from some distribution. A statistic calculated on the sample is defined to be any function of the set of values $X_1, X_2, X_3, \ldots, X_n$ involving no unknown quantities.[1]

The point of this definition is to ensure that evaluating the statistic on an observed sample yields an actual numerical value, rather than a formula that still involves unknown quantities.
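One way to picture the distinction is to compare a function that needs only the sample with one that also needs an unknown population quantity. The sketch below is illustrative only; the names `sample_mean`, `centred_mean`, and `mu` are assumptions introduced here.

```python
def sample_mean(xs):
    """A statistic: it depends only on the observed values."""
    return sum(xs) / len(xs)

def centred_mean(xs, mu):
    """Not a statistic when mu is an unknown population mean:
    it cannot be evaluated from the sample alone."""
    return sum(xs) / len(xs) - mu

sample = [1.42, 1.54, 1.48]

# The sample alone is enough to produce a number.
print(sample_mean(sample))

# centred_mean(sample) would raise a TypeError, because a value for mu
# must be supplied from outside the sample.
```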

### Examples of Statistics:

- Arithmetic mean
- Median
- Standard deviation
- Pearson's measure of skewness, $= 3\,(\text{mean} - \text{median}) / \text{standard deviation}$
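The statistics listed above can all be computed with Python's standard `statistics` module. The following is a sketch under stated assumptions: the data set is made up for illustration, `pearson_skewness` is a hypothetical helper name, and `stdev` is the sample standard deviation (divisor $n - 1$), which the text does not specify.

```python
from statistics import mean, median, stdev

def pearson_skewness(xs):
    """Pearson's measure of skewness: 3 * (mean - median) / standard deviation.

    Uses the sample standard deviation (divisor n - 1); this is an assumption.
    """
    return 3 * (mean(xs) - median(xs)) / stdev(xs)

# Hypothetical data, chosen only to illustrate the calculations.
data = [2, 3, 5, 7, 11, 13]

print("mean:", mean(data))
print("median:", median(data))
print("standard deviation:", stdev(data))
print("Pearson skewness:", pearson_skewness(data))
```

For a symmetric data set such as `[1, 2, 3, 4, 5]` the mean and median coincide, so this measure of skewness is zero.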

### References

- [1] Francis, A. (2005). Advanced Level Statistics. Stanley Thornes.