Difference between revisions of "Hot-Deck Imputation"

From Conservapedia

Revision as of 14:30, May 21, 2009

Hot-Deck Imputation replaces missing data with comparable data from the same set. "Hot-deck imputation is a means of imputing data, using the data from other observations in the sample at hand." [1] For example, suppose census officials were unable to count the number of people in a given house and decided to fill in the missing data using hot-deck imputation. They would use the data from a similar house in the same area, and substitute the number of people in that house for the missing data. Hot-deck imputation is one of the most widely used imputation methods. [2]

This paper discusses various types of imputation as well as their benefits and problems: http://nces.ed.gov/StatProg/2002/appendixb3.asp

The Flaws of Imputation

All methods of imputation are less than ideal. "Kalton and Kasprzyk, 1982...cautioned that imputation methods do not necessarily lead to a reduction in bias, relative to the incomplete data set. And, they warned against the danger of analysts treating the "complete" cases as actual responses, thus overstating the precision of the survey estimates." [3]

"One problem occurs with [hot-deck imputation] when several records with missing values occur together on the file. This results in the current donor value being assigned to multiple records, thus leading to a lack of precision in the survey estimates (Kalton and Kasprzyk, 1986)."[4] The method by which the donor value is selected could easily skew results, particularly in areas where there are typically many blank values, such as densely populated cities. If the population of these areas (which tend to be Democratic) were exaggerated, reapportionment could assign more Representatives to their state. Ultimately, this would result in more liberal politicians in Congress.
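
The donor-reuse problem in the quotation can be illustrated with a minimal sketch of a sequential hot deck, in which each blank takes the most recently reported value. The function name, the starting donor, and the household sizes are hypothetical and are not taken from the cited paper.

```python
from typing import List, Optional


def sequential_hot_deck(values: List[Optional[int]], start_donor: int) -> List[int]:
    """Fill each missing value (None) with the most recently reported value."""
    donor = start_donor  # hypothetical value used until a real report is seen
    filled = []
    for v in values:
        if v is None:
            filled.append(donor)  # the current donor is reused for every blank in a run
        else:
            donor = v             # a reported value becomes the new donor
            filled.append(v)
    return filled


# Hypothetical household sizes in file order; None marks a household with no count.
household_sizes = [2, 5, None, None, None, 1, 3]
filled = sequential_hot_deck(household_sizes, start_donor=2)
print(filled)  # [2, 5, 5, 5, 5, 1, 3]
```

Because the three blanks sit together on the file, the single value 5 is assigned to all of them, which is the situation the quotation describes.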

Finally, the results of hot-deck imputations must be treated carefully. These data sets appear to be complete, but in reality, they are not. "Regardless of the specifics, all hot-deck procedures take imputed values from a respondent in the same data file, thus yielding imputations that are valid, although not necessarily internally consistent for the respondent values. In order to evaluate the hot-deck imputation used for any specific data collection, detailed information is required." [5]

Imputation in the United States Census

"Although imputation was used in the 1940 and 1950 censuses to determine characteristics of the population, it was not used to determine the actual population count for apportionment purposes until 1960." [6]

Hot-Deck Imputation may be used in the 2010 census. It is controversial for two reasons. First, the Constitution requires an "actual Enumeration." Second, this type of imputation could easily be used to gerrymander districts in a fashion that would favor Obama's reelection.

This paper from the U.S. Bureau of the Census describes the Hot-Deck method in detail: http://analytics.ncsu.edu/sesug/1999/075.pdf

In surveying adults on probation, the Census Bureau follows six requirements:

1. We impute age, race and gender independently.
2. For race, we first try to base imputation on other data (e.g., ethnicity) for the same person.
3. We impute all remaining missing values using only "good" (unimputed) data for another person in the same group, or ctrlnum.
4. We use each "good" data value only once for imputation.
5. We work backwards over the ctrlnum; then, if necessary, we work forwards.
6. If no "good" data can be used from within the ctrlnum, we assign some type of "out of range" value.
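
Requirements 1 and 3 through 6 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Bureau's actual program: the record layout, the function name hot_deck_field, and the sentinel -1 are hypothetical, and requirement 2 (basing race on other data for the same person) is left out because the source does not describe the mapping.

```python
from typing import Dict, List, Optional

OUT_OF_RANGE = -1  # assumed sentinel; the source only says "some type of out of range value"

Record = Dict[str, Optional[int]]


def hot_deck_field(records: List[Record], field: str) -> List[int]:
    """Impute one field; call once per field so fields are imputed independently."""
    original = [r[field] for r in records]  # only unimputed ("good") values may donate
    used = set()                            # donor positions already used (requirement 4)
    result = []
    for i, rec in enumerate(records):
        if original[i] is not None:
            result.append(original[i])
            continue
        # Requirement 5: search backwards from the record first, then forwards.
        candidates = list(range(i - 1, -1, -1)) + list(range(i + 1, len(records)))
        donor = next(
            (j for j in candidates
             if records[j]["ctrlnum"] == rec["ctrlnum"]  # same group (requirement 3)
             and original[j] is not None
             and j not in used),
            None,
        )
        if donor is None:
            result.append(OUT_OF_RANGE)     # requirement 6
        else:
            used.add(donor)
            result.append(original[donor])
    return result


# Hypothetical records in file order; None marks a missing age.
records = [
    {"ctrlnum": 1, "age": 34},
    {"ctrlnum": 1, "age": None},
    {"ctrlnum": 1, "age": None},
    {"ctrlnum": 1, "age": 51},
    {"ctrlnum": 2, "age": None},
]
ages = hot_deck_field(records, "age")
print(ages)  # [34, 34, 51, 51, -1]
```

The second record takes 34 from the record before it. The third cannot reuse that donor, so it looks forwards and takes 51. The last record has no donor in its own ctrlnum and receives the out-of-range value.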

Utah v. Evans

Utah v. Evans, 182 F. Supp. 2d 1165 (2001), is the leading court case about the use of Hot-Deck Imputation in the census. After the use of Hot-Deck techniques in the 2000 census, a congressional seat that Utah would otherwise have received was apportioned to North Carolina. Utah challenged this on the grounds that Hot-Deck Imputation is a form of sampling and "violated various statutory provisions and the Constitution."

The District Court for the District of Utah, Central Division, stated, "We begin by noting that section 195 does not preclude the Census Bureau from the use of every type of statistical methodology in arriving at apportionment figures during a decennial census. Instead, it prohibits only 'the use of the statistical method known as "sampling."'"

"[H]ot deck imputation is not sampling. Sampling is the selection of a subset of units from a larger population in such a way that each unit of the population has a known chance of selection. Sampling is used where a scientifically selected set of units can be used to represent the entire population from which they are drawn."

The Court concluded "that the Constitution does not prohibit the use of narrowly tailored statistical methodologies, such as hot deck imputation, for the purpose of improving the accuracy of the decennial census and furthering 'the constitutional goal of equal representation.'"

Problems with Hot-Deck Imputation

Quoting Article I of the Constitution, the Utah v. Evans court said, "The final part of the sentence says that the “actual Enumeration” shall take place “in such Manner as” Congress itself “shall by Law direct,” thereby suggesting the breadth of congressional methodological authority, rather than its limitation." The text of the Constitution and the court's interpretation of that text both state clearly that the census is the responsibility of Congress. But President Obama is having the White House run the 2010 census. This is an unconstitutional, dangerous violation of Separation of Powers.

The Possibility of Manipulation

In addressing the potential of manipulation of Hot-Deck Imputation, the court said, "The Court need not decide here the precise methodological limits foreseen by the Census Clause. It need say only that in this instance, where all efforts have been made to reach every household, where the methods used consist not of statistical sampling but of inference, where that inference involves a tiny percent of the population, where the alternative is to make a far less accurate assessment of the population, and where consequently manipulation of the method is highly unlikely, those limits are not exceeded."

This statement seems to contradict itself. If hot-deck imputation is used for such a "tiny percent of the population", how can its omission cause a "far less accurate assessment of the population"? In order to skew the results that much, hot-deck imputation would have to be used for much more than a "tiny percent". The court is exaggerating the impact of hot-deck imputation. If used only when necessary, the data collected from hot-deck imputation may be negligible.

As the court says above, hot-deck imputation is valuable only when used out of absolute necessity. The problem is that there is no way to make sure that its use is saved for situations which truly require it. The lack of laws about the use of hot-deck imputation opens the opportunity for its misuse and manipulation.

The court also said, "Utah has not claimed that the Bureau has used imputation to manipulate results. It has not explained how census-taking that fills in ultimate blanks through imputation is more susceptible to manipulation than census-taking that fills in ultimate blanks with a zero." But when one fills in blanks with zero, there is only one choice, and no opportunity for bias. Imputation uses a number from the same set, but with so many numbers to choose from, there is much room for error or manipulation.
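
The contrast drawn here can be put in numbers with a small sketch. The household sizes and the number of blanks are hypothetical, and the sketch assumes that any counted household may serve as a donor, more than once if need be.

```python
reported = [1, 2, 2, 4, 6]  # hypothetical sizes of households that were counted
blanks = 3                  # hypothetical number of households with no count

# Filling blanks with zero gives exactly one possible total.
zero_fill_total = sum(reported)

# With imputation, the total depends on which donors are chosen.
lowest_imputed_total = sum(reported) + blanks * min(reported)
highest_imputed_total = sum(reported) + blanks * max(reported)

print(zero_fill_total, lowest_imputed_total, highest_imputed_total)  # 15 18 33
```

Zero-filling always yields 15, while the imputed total can fall anywhere from 18 to 33 depending on the donors selected.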

References

  1. http://analytics.ncsu.edu/sesug/1999/075.pdf
  2. http://nces.ed.gov/StatProg/2002/appendixb3.asp
  3. http://nces.ed.gov/StatProg/2002/appendixb3.asp
  4. http://nces.ed.gov/StatProg/2002/appendixb3.asp
  5. http://nces.ed.gov/StatProg/2002/appendixb3.asp
  6. Utah v. Evans, 182 F. Supp. 2d 1165 (2001)