Making sense with numbers
I've been reading the Knight Foundation Magic of Music Final Report, as mentioned recently on Drew McManus' Adaptistration. On page 32, the report says:
In trying to profile the factors that might predict a ticket buyer, one statistic stood out: 74 percent of them had played an instrument or sung in a chorus at some time in their lives.
They conclude that participatory education programs (getting children to play instruments or sing) "were strongly correlated with later concert attendance" (p. 33).
At first glance, that's quite impressive. Does it mean that all you have to do is get someone to play the violin in school and you've got a 74% chance of having created a concert-goer twenty years in the future?
I don't think so. Let's think about what that number means.
Before we get started, I must say that I'm certainly not trying to pick on the Knight Foundation or Dr. Thomas Wolf, the author of the report. The report didn't say what I inferred two paragraphs back; it just stated a number and concluded there was a strong relationship. I'm writing this simply because it's an example of a type of statistic I see quoted from time to time, in which the number measured and quoted may not be the number you want to know. I certainly welcome anyone who can help me refine or correct my thinking.
To help you judge my reasoning, I'll walk through it from top to bottom. Please comment if you see places where I went astray.
The report seems to say that the probability of someone having played an instrument or sung in a group, given that they were a ticket purchaser, was 0.74:
I = the event of having played an instrument or sung in a chorus
T = the event of being a ticket buyer
Pr(I|T) = 0.74
That's likely not what you really care about, though. It would be much more interesting to know the probability of a person being a ticket buyer, given that they played an instrument or sang in a group:
Pr(T|I) = ?
By Bayes' theorem,
           Pr(I|T) * Pr(T)
Pr(T|I) = -----------------
                Pr(I)
It's not clear what the report means by all its terms. For example, are ticket buyers just the ones who made purchases, or do they include those who came along with the purchaser? Even if I knew their definitions, I don't have their data set to duplicate the calculations they may have made.
Ignoring all that uncertainty for a moment, I checked out the arts section of the Statistical Abstract of the United States, as the Knight study was US-based.
According to table 1226, 11.6% of persons 18 and over attended a classical music performance in the 12 months prior to a 2002 NEA survey.
Pr(T) = 0.116
We now need the percentage of people who have sung in a choir or played an instrument. Again, it's hard to know how the Knight study defined that. The Gallup 2003 "American Attitudes Toward Music" study says 37% of surveyed Americans 12 and up played a musical instrument. Let's say that
Pr(I) = 0.37
although, because it doesn't include singing, it may understate the percentage. (Actually, because the data was taken for a different purpose and using a different process, it may under- or overstate the percentage. Table 19 of the NEA report indicates that 33.9% of U.S. adults had taken a music lesson at some time in their lives; as that's close to 37%, a number in this range seems at least reasonable.)
           0.74 * 0.116
Pr(T|I) = --------------
               0.37
Pr(T|I) = 0.23
If my data and reasoning are correct, that means that just under a quarter of those who played an instrument or sang in a choir are likely to attend classical music performances. That's still a big fraction, but it's not nearly as dramatic as the 74%.
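For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same calculation. The function name `posterior` and the variable names are my own labels for illustration; the three input numbers are the ones quoted above, with all the caveats about mismatched definitions still applying.

```python
def posterior(p_i_given_t: float, p_t: float, p_i: float) -> float:
    """Return Pr(T|I) = Pr(I|T) * Pr(T) / Pr(I) (Bayes' theorem)."""
    if not 0.0 < p_i <= 1.0:
        raise ValueError("Pr(I) must be greater than 0 and at most 1")
    return p_i_given_t * p_t / p_i


# Inputs quoted in the text; the definitions behind them may not line up exactly.
P_I_GIVEN_T = 0.74  # Knight report: ticket buyers who had played or sung
P_T = 0.116         # Statistical Abstract, table 1226: attended a classical performance
P_I = 0.37          # Gallup 2003: played a musical instrument

p_t_given_i = posterior(P_I_GIVEN_T, P_T, P_I)
print(f"Pr(T|I) = {p_t_given_i:.2f}")  # prints "Pr(T|I) = 0.23"
```

Swapping in a different estimate for any of the three inputs is a quick way to see how far the answer moves.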
What are the lessons of this exercise?
- The probability of X, given that Y happened, isn't the same as the probability of Y, given that X happened. Often surveys and other statistical sources give us the former, when we are more interested in the latter.
- It's possible to convert one into the other using Bayes' theorem. Bayes' theorem is handy for other reasons, too; for one, you can use it to update your belief about the probability of a certain event as you collect new data.
- You can often find free data that enables you to make that conversion.
- Data aren't really free if you count the effort you have to put into understanding and reconciling them. Even though I found my data in credible places, I still don't know for sure whether differences in the way terms were defined or data were collected make a difference in the conclusion I drew.
- Triangulation can help. By discovering that Gallup and NEA surveys gave similar results, I can be somewhat more sure of my conclusions.
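The triangulation point can be made concrete with a rough sketch that reruns the calculation with each of the two estimates of Pr(I) quoted earlier. This assumes, as I did above, that either figure can stand in for "played an instrument or sang in a chorus," which neither survey measured directly; the dictionary labels are just my own descriptions.

```python
P_I_GIVEN_T = 0.74  # Knight report: ticket buyers who had played or sung
P_T = 0.116         # Statistical Abstract, table 1226

# Two candidate stand-ins for Pr(I); neither matches the Knight definition exactly.
p_i_estimates = {
    "Gallup 2003 (played an instrument)": 0.37,
    "NEA table 19 (took a music lesson)": 0.339,
}

# Apply Bayes' theorem once per estimate of Pr(I).
results = {
    source: P_I_GIVEN_T * P_T / p_i
    for source, p_i in p_i_estimates.items()
}

for source, value in results.items():
    print(f"{source}: Pr(T|I) = {value:.2f}")
```

With these inputs the two answers come out at roughly 0.23 and 0.25, so the conclusion doesn't hinge on which of the two surveys I picked.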
While I'm interested in sustainability of the arts, I'm also interested in making sense of the situations, especially the organizational and societal situations, in which we find ourselves. Calculations such as this one, using Bayes' theorem, can help us make sense of these types of numbers. If we're not likely to do the math or have the necessary data ourselves, at least we can be aware of the issues and ask those giving us the numbers to interpret them appropriately.
I invite your questions and feedback on this and any of my postings. That's how I (and, presumably, we) learn.