statistics are hard
01 Aug 2009
I am reading Freakonomics for a book club thing. A minor quibble with one part. There’s a section where they look for evidence of racial discrimination on dating websites by comparing what people who list “no racial preference” say with who they actually email:
The white men who said that race didn’t matter sent 90 percent of their e-mail queries to white women. The white women who said race didn’t matter sent about 97 percent of their e-mail queries to white men.
Is it possible that race really didn’t matter for these white women and men and that they simply never happened to browse a nonwhite date that interested them?
The damning conclusion-as-a-question above doesn’t necessarily follow from the numbers presented. It’s meaningless without the racial breakdown of the dating websites in question. That is: this large skew reflects, first of all, the fact that … most people on those dating websites are probably white. I don’t have that data (they probably didn’t either). But at worst, we can assume the dating website demographics match the US, which is around 74% white. That would mean white men chose white women 90% of the time from a pool that is already about 74% white. (Technically I should look up the demographics for race + sex, but I am too lazy.) This still indicates a preference, but it doesn’t justify saying that they “never happened to browse a nonwhite date that interested them”. It merely indicates a gap of about 16 percentage points between what they chose and what the pool alone would predict. If you really wanted to find hypocrisy in someone saying that they have no racial preference, you’d have to find a way to isolate the data such that the choice between white and non-white was 50/50 every time.
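To make the baseline comparison concrete, here is a minimal sketch in Python. It assumes, as the post does, that the pool of available dates is 74% white; that figure is a stand-in for the real site demographics, which nobody here has. The function name `preference_gap` is made up for illustration, and the odds ratio is an extra way of reading the same numbers that the post itself doesn't use.

```python
def preference_gap(observed_share, baseline_share):
    """Compare an observed share of emails against a baseline pool share.

    Returns a tuple of:
      - the gap in percentage points (observed minus baseline)
      - the odds ratio of the observed share relative to the baseline
    """
    gap_points = (observed_share - baseline_share) * 100
    observed_odds = observed_share / (1 - observed_share)
    baseline_odds = baseline_share / (1 - baseline_share)
    return gap_points, observed_odds / baseline_odds


# ASSUMPTION: the site's pool matches the rough US figure used in the post.
ASSUMED_WHITE_SHARE = 0.74

# Observed shares are the ones quoted from the book.
men_gap, men_odds = preference_gap(0.90, ASSUMED_WHITE_SHARE)
women_gap, women_odds = preference_gap(0.97, ASSUMED_WHITE_SHARE)

print(f"white men:   {men_gap:.0f} points over baseline, odds ratio {men_odds:.1f}")
print(f"white women: {women_gap:.0f} points over baseline, odds ratio {women_odds:.1f}")
```

Swapping in a different `ASSUMED_WHITE_SHARE` shows how much the size of the apparent preference depends on the demographics of the pool, which is the whole quibble.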
Like I said, it’s a minor quibble. This happens to me a lot when I read pop science. Maybe I am too comfortable reading annoyingly annotated sociology texts, but this sort of thing irks me. Now that I’ve found them skirting past statistical subtleties to make a point, I feel like I need to read their book even more carefully, like I would a more academic text, except without the benefit of sourced data. It’s also somewhat ironic, in that this snippet comes directly after a section discussing “experts” using information asymmetry to their advantage.