Though data-driven academics often insist that the heart of science is raw numbers, the truth is that people are influenced more by the presentation than by actual data.
By Christopher Koski
Course: Language, Gender, and Sexuality (Ling 2400)
Advisor: Prof. Kira Hall, TA Ayden Parish
LURA 2018
Such is the art of statistical analysis: Two people with two different motivations can take two identical data sets and produce facts and figures fitting two entirely different narratives. Here's a personal favorite of mine, courtesy of Reuters. I generally consider Reuters an alright news source, but take a look at this graph:
All of the data points in this graph are correct, at least according to the Florida Department of Law Enforcement. The presentation of the data, however, is misleading. This graph seems to show a slowly increasing number of gun deaths right up until 2005, when Florida's new Stand Your Ground law coincides with a drastic decline in firearm-related fatalities. (This law basically states that you can use lethal force when you believe you are in danger of death or serious injury.) However, the "decline" indicated in the graph is the exact opposite of what the data actually show, which is a drastic increase in the number of murders committed using firearms after 2005. The author, C. Chan, apparently decided to display the graph vertically mirrored, with the values on the y-axis increasing as you go down, rather than up. I believe that this representation was intentional.
Unfortunately, most presentation problems are less easy to spot than this example. The one I want to talk about today is a commonly cited statistic in research on language, gender, and sexuality. In fact, Penny Eckert and Sally McConnell-Ginet, the authors of a textbook that we used in the course Language, Gender, and Sexuality (Ling 2400), also cite this statistic: About half of all men and half of all women are between the heights of 5'4" and 5'10". Eckert and McConnell-Ginet use this figure to question why we do not see more heterosexual relationships in which the man is shorter than the woman. Given the statistic, with half of men and women between 5'4" and 5'10", it seems like we should see more couples where the woman is taller.
Again, the data I have available to me (sourced from the CDC) corroborate this claim. The catch is that those numbers mean a whole lot less than they appear to.
The statistic itself is vague at best. Note that this statement doesn't actually promise any overlap: the only two things that need to be true are that half of all men are between 5'4" and 5'10" and that half of all women fulfill the same condition. Let's look at some possible height distributions for men (blue) and women (red), based solely on the fact that half of each group must be between 5'4" and 5'10":
This graph is not representative of reality, but it is a feasible distribution of height if our only consideration is placing half of all men and half of all women between 5'4" and 5'10" (or 64 and 70 inches, respectively). In this graph, there is basically no overlap between heights, despite our height constraint. The odds of a relationship where the woman is taller are more or less negligible on this graph. Here's another possibility that works within the height rule:
This graph also fits the height constraint, but this time the overlap includes just about everyone. If this graph were a truthful representation of height distribution, then we would expect the proportion of relationships where the woman is taller to be around 40%!
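To make the contrast between the two scenarios above concrete, here is a minimal Python sketch. It does not reproduce the graphs in this piece: the means and standard deviations are hypothetical values picked only so that roughly half of each group falls between 64 and 70 inches, and modeling height as a normal distribution is itself an assumption.

```python
from statistics import NormalDist

LOW, HIGH = 64.0, 70.0  # 5'4" and 5'10" in inches


def share_in_range(dist, low=LOW, high=HIGH):
    """Fraction of a normal distribution that falls between low and high."""
    return dist.cdf(high) - dist.cdf(low)


def prob_woman_taller(men, women):
    """P(woman taller) for one independent draw from each distribution."""
    # The difference of two independent normal variables is itself normal.
    difference = women - men
    return 1.0 - difference.cdf(0.0)


# Hypothetical (mean, standard deviation) pairs in inches. They are NOT taken
# from any dataset; they were chosen only to satisfy the "half between 64 and
# 70 inches" rule in two very different ways.
scenarios = {
    "little overlap": (NormalDist(70.0, 1.0), NormalDist(64.0, 1.0)),
    "heavy overlap": (NormalDist(68.0, 4.33), NormalDist(66.0, 4.33)),
}

for name, (men, women) in scenarios.items():
    print(
        f"{name}: men in range {share_in_range(men):.2f}, "
        f"women in range {share_in_range(women):.2f}, "
        f"P(woman taller) {prob_woman_taller(men, women):.4f}"
    )
```

Both scenarios put about half of each group inside the range, yet the chance that the woman is taller is close to zero in the first and well above a third in the second. The rule on its own does not pin down the answer.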
Neither of these graphs is an accurate representation of height distribution, but the fact that there is so much room for error in the 5'4" to 5'10" rule means that any claims we make based on that statistic alone are difficult to justify. Based on the CDC's data on height distribution, here is a more accurate graph:
The presentation suggests that there is a good amount of overlap here, but looks can be deceiving: The actual probability of a randomly selected male-female pair having a taller woman is only around 7%, based on a simulation of 40,000 random pairings. This doesn't even take into account that people tend to date within their demographic, and the overlap within each community is even smaller. (I'll spare you the graphs for that, but they're easy enough to generate from the attached dataset.)
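For readers who want to try a random-pairing simulation of this kind, here is a rough Python sketch. It is not the simulation used for the figure above: it draws heights from normal distributions rather than from the attached dataset, and the means and standard deviations in `MEN` and `WOMEN` are assumed, approximate values for adult heights in inches. The function name `simulate_taller_woman` and the fixed seed are likewise only illustrative.

```python
import random

# Assumed, approximate adult heights in inches as (mean, standard deviation).
# These are illustrative stand-ins, not values read from the attached dataset.
MEN = (69.3, 2.9)
WOMEN = (63.8, 2.8)


def simulate_taller_woman(n_pairs=40_000, seed=0):
    """Estimate the share of random man-woman pairs in which she is taller."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    taller = 0
    for _ in range(n_pairs):
        man = rng.gauss(*MEN)      # one random man's height
        woman = rng.gauss(*WOMEN)  # one random woman's height
        if woman > man:
            taller += 1
    return taller / n_pairs


estimate = simulate_taller_woman()
print(f"Share of pairs with a taller woman: {estimate:.3f}")
```

Under these assumed parameters the estimate comes out below ten percent, the same order of magnitude as the figure quoted above. The exact value depends on the inputs, so it should not be expected to match.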
So what's the point of all this? I don't actually disagree with the assertion that people select for height when dating: A study by Yancey and Emerson (2014) notes that 37% of sampled men say that they will only date women shorter than them, while 55% of sampled women say that they will only date men taller than them. I am also pretty sure that Eckert and McConnell-Ginet aren't trying to mislead anybody with their statement; this figure has been bandied about in many gender studies over the years, so it's most likely just a part of the lexicon now, or a part of the "hall of mirrors," as Eckert and McConnell-Ginet would say.
I do, however, think that honesty is an important part of academia, and this includes providing people with a complete picture of the truth. When we write an academic article, textbook, or any other sort of scientific journalism, we have a duty to produce a precise, fact-based account of the claims we make. We live in an age where an enormous amount of data is generated, interpreted, and presented every day, and there are people who will use their position as statistical purveyors to mislead the masses.
When you see a strange or vague claim, try to dig a bit deeper; you may be surprised at what you find.
References
Yancey, G., & Emerson, M. O. (2014). Does Height Matter? An Examination of Height Preferences in Romantic Coupling. Journal of Family Issues, 37(1), 53-73. doi:10.1177/0192513x13519256
Anthropometric Reference Data for Children and Adults: United States, 2007–2010. (2012, October). Retrieved from
Eckert, P., & McConnell-Ginet, S. (2013). Language and gender. Cambridge, England: Cambridge University Press.