# Difference between revisions of "Statistic"

## Latest revision as of 09:05, 24 June 2016

A **statistic** is a function of a random variable (or of several, such as the values in a sample). In practice it is a calculation made on a set of numbers, typically a sample drawn from some presumed underlying probability distribution, and it is usually used to estimate some property of the distribution from which the sample is taken.

Using a **statistic** to characterize a set of observations is generally justified by its asymptotic behavior. A given **statistic** characterizes the underlying phenomenon accurately only in a probabilistic sense (this consideration is the origin of confidence intervals in classical statistics), and it is considered exact only in the limit as the number of observations increases without bound. Confidence intervals are themselves somewhat problematic, however, because their calculation rests on assumptions about the nature of the true underlying distribution, and those assumptions may or may not hold.
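A small simulation can make the asymptotic argument concrete. The sketch below (Python, standard library only) draws samples of increasing size from a hypothetical normal population with true mean 1.5 and standard deviation 0.1, computes the sample mean, and attaches an approximate 95% confidence interval. The population parameters, the fixed seed, and the helper name `mean_with_ci` are illustrative assumptions; the interval uses the textbook normal approximation, which is exactly the kind of presumption about the underlying distribution mentioned above.

```python
import math
import random
import statistics


def mean_with_ci(sample, z=1.96):
    """Return the sample mean and an approximate 95% confidence interval.

    Hypothetical helper. The interval is mean +/- z * s / sqrt(n), which
    presumes the sampling distribution of the mean is close to normal.
    """
    n = len(sample)
    m = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    half_width = z * s / math.sqrt(n)
    return m, (m - half_width, m + half_width)


# Assumed population for illustration: normal, mean 1.5, standard deviation 0.1.
TRUE_MEAN = 1.5
rng = random.Random(42)  # fixed seed so repeated runs give the same samples

for n in (10, 100, 1_000, 10_000):
    sample = [rng.gauss(TRUE_MEAN, 0.1) for _ in range(n)]
    m, (lo, hi) = mean_with_ci(sample)
    print(f"n={n:>6}  mean={m:.4f}  95% CI=({lo:.4f}, {hi:.4f})  width={hi - lo:.4f}")
```

The exact figures depend on the seed, but the interval width shrinks roughly in proportion to 1/√n, which is the sense in which the statistic becomes "accurate in the limit".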

For example, suppose a random sample of three children is chosen from a particular class, and their heights are measured as 1.42 m, 1.54 m and 1.48 m. The arithmetic mean of these heights is (1.42 + 1.54 + 1.48)/3 = 1.48 m, and we might then use this value of 1.48 m to estimate the average height of a child in that class.
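The same arithmetic can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative, and the heights are the three values from the example.

```python
import statistics

# Heights of the three sampled children, in metres (values from the example).
heights = [1.42, 1.54, 1.48]

# The sample mean is a statistic: it is computed from the observed values only.
sample_mean = statistics.mean(heights)
print(f"{sample_mean:.2f} m")  # prints "1.48 m"
```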

Clearly the validity and reliability of such estimates depend heavily on a range of factors, such as the type of distribution, the sample size, and the sampling method used.

## Formal Definition

Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample of size $n$ from some distribution. A statistic calculated on the sample is defined to be any function of the values $X_1, X_2, X_3, \ldots, X_n$ that involves no unknown quantities.[1]

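The point of this definition is to ensure that, once the sample has been observed, the calculation yields an actual numerical value rather than an expression that still depends on unknown parameters of the distribution. For example, the sample mean is a statistic, whereas the difference between the sample mean and the (unknown) population mean is not.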
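One way to read the definition in code: a statistic is any function whose only input is the observed sample. The Python sketch below models this as a callable from a sequence of numbers to a number; the type alias `Statistic` and the functions `sample_range` and `deviation_from_true_mean` are hypothetical names chosen for illustration, not part of any library.

```python
import statistics
from typing import Callable, Sequence

# Modelling choice (assumption): a statistic is a callable that takes only
# the observed sample and returns a number.
Statistic = Callable[[Sequence[float]], float]


def sample_range(xs: Sequence[float]) -> float:
    """Largest minus smallest observation: depends on the data alone."""
    return max(xs) - min(xs)


examples: dict[str, Statistic] = {
    "mean": statistics.mean,
    "median": statistics.median,
    "range": sample_range,
}


def deviation_from_true_mean(xs: Sequence[float], mu: float) -> float:
    """NOT a statistic: it needs the unknown population mean mu as an input."""
    return statistics.mean(xs) - mu


sample = [1.42, 1.54, 1.48]
for name, stat in examples.items():
    print(name, round(stat(sample), 4))
```

The second function cannot be evaluated from the sample alone, because `mu` is unknown in practice; that is precisely what the definition rules out.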

## Examples of Statistics

- Arithmetic mean
- Median
- Standard deviation
- Pearson's measure of skewness, $3(\text{mean} - \text{median})/\text{standard deviation}$
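All four examples can be computed with Python's standard `statistics` module. The sketch below is illustrative only: `pearson_skewness` is a hypothetical helper, the data set is made up, and it assumes the sample standard deviation (divisor n − 1) since the formula does not specify which version to use.

```python
import statistics
from typing import Sequence


def pearson_skewness(xs: Sequence[float]) -> float:
    """Pearson's measure of skewness: 3 * (mean - median) / standard deviation.

    Assumes the sample standard deviation (n - 1 divisor).
    """
    s = statistics.stdev(xs)  # raises StatisticsError for fewer than two values
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3 * (statistics.mean(xs) - statistics.median(xs)) / s


data = [1.0, 2.0, 3.0, 10.0]  # hypothetical right-skewed sample
print("mean:", statistics.mean(data))
print("median:", statistics.median(data))
print("standard deviation:", round(statistics.stdev(data), 4))
print("Pearson skewness:", round(pearson_skewness(data), 4))
```

Here the single large value (10) pulls the mean (4.0) above the median (2.5), so the measure comes out positive, at about 1.10. A symmetric sample such as `[1.0, 2.0, 3.0]` gives 0.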

## References

1. Francis, A. (2005). *Advanced Level Statistics*. Stanley Thornes.