This question is going to be a bit more vague than I usually ask on StackExchange sites, but it keeps coming up in my life, so I'm going to ask it.

Suppose I have a dataset of N datapoints, each a vector of K values. For each pair of columns of the data I compute some association measure, like correlation. For every pair whose association measure exceeds some threshold I declare, "These fields are associated: something's going on here!"

Then somebody will ask, "How big is K?" I'll say, "K = 100,000," and they'll dismiss the correlations as merely a result of "data dredging." But if I'm making far fewer comparisons (say K = 10), people will be less likely to level that charge against my findings. With K columns there are K(K−1)/2 pairs. So K = 100,000 means roughly 5 billion tests, while K = 10 means only 45.

Data dredging is the cherry-picking of multiple statistical tests on a data set to demonstrate a promising or attractive finding. It leads to a spurious excess of false-positive, statistically significant results. In this example, the given relationship is the maximum correlation. A related piece is entitled "Data Snooping, Dredging and Fishing: The Dark Side of …".
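The procedure described above is "correlate every pair of columns and flag the ones over a threshold." Below is a minimal sketch of it, run on pure noise. The sketch makes several assumptions:

- **Data.** It draws K independent standard-normal columns, so every true correlation is zero and every flagged pair is a false positive.
- **Threshold.** The threshold is a two-sided test on the Fisher z-statistic atanh(r)·√(N−3). This is a standard large-sample approximation I chose, not necessarily the measure the post has in mind.
- **Correction.** The optional Bonferroni correction (testing each pair at α/m) is one common remedy, shown for comparison.
- **Names.** The function names `pearson` and `dredge`, plus the values N = 100 and `seed`, are hypothetical.

```python
import itertools
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dredge(n, k, alpha=0.05, bonferroni=False, seed=0):
    """Return (number of pair tests, number of pairs flagged as 'associated').

    Hypothetical helper: generates n rows of k independent N(0, 1) columns,
    so any flagged pair is a false positive produced by multiple testing.
    """
    rng = random.Random(seed)
    cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    m = k * (k - 1) // 2                      # one test per pair of columns
    level = alpha / m if bonferroni else alpha  # Bonferroni: alpha split across m tests
    # Two-sided critical value for the Fisher z-statistic atanh(r) * sqrt(n - 3),
    # which is approximately standard normal when the true correlation is zero.
    z_crit = NormalDist().inv_cdf(1 - level / 2)
    flagged = 0
    for i, j in itertools.combinations(range(k), 2):
        r = pearson(cols[i], cols[j])
        if abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit:
            flagged += 1
    return m, flagged


if __name__ == "__main__":
    for k in (10, 100):
        m, raw = dredge(n=100, k=k)
        _, corrected = dredge(n=100, k=k, bonferroni=True)
        print(f"K={k}: {m} pair tests, {raw} flagged at alpha=0.05, "
              f"{corrected} flagged after Bonferroni")
```

At α = 0.05 you should expect about 5% of the m pairs to be flagged even though nothing is going on. That is roughly 2 of 45 pairs at K = 10, but about 250 of 4,950 at K = 100. The count of tests grows quadratically in K, which is why a large K invites the "data dredging" objection.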