Monday, November 20, 2006

Making sense with numbers

Understanding numbers is a big part of making sense of the world. Sometimes, though, numbers don't mean what they might seem to mean.

I've been reading the Knight Foundation Magic Of Music Final Report, as mentioned recently on Drew McManus' Adaptistration. On page 32, the report says:

In trying to profile the factors that might predict a ticket buyer, one statistic stood out: 74 percent of them had played an instrument or sung in a chorus at some time in their lives.

They conclude that participatory education programs (getting children to play instruments or sing) "were strongly correlated with later concert attendance" (p. 33).

At first glance, that's quite impressive. Does that say all you have to do is get someone to play the violin in school and you've got a 74% chance of having created a concert-goer twenty years in the future?

I don't think so. Let's think about what that number means.

Before we get started, I must say that I'm certainly not trying to pick on the Knight Foundation or Dr. Thomas Wolf, the author of the report. The report didn't say what I inferred two paragraphs back; it just stated a number and concluded there was a strong relationship. I'm writing this simply because it's an example of a type of statistic I see quoted from time to time, in which the number measured and quoted may not be the number you want to know. I certainly welcome anyone who can help me refine or correct my thinking.

To help you judge my reasoning, I'll walk through it from top to bottom. Please comment if you see places where I went astray.

The report seems to say that the probability of someone having played an instrument or sung in a group, given that they were a ticket purchaser, was 0.74:


I = the incidence of having played an instrument or sung in a chorus
T = the incidence of being a ticket buyer

Pr(I|T) = 0.74


That's likely not what you really care about, though. It would be much more interesting to know the probability of a person being a ticket buyer, given that they played an instrument or sang in a group:


Pr(T|I) = ?


By Bayes' theorem,


Pr(T|I) = Pr(I|T) Pr(T) / Pr(I)


It's not clear what the report means by all its terms. For example, are ticket buyers just the ones who made purchases, or does it include those who came along with the purchaser? Even if I knew their definitions, I don't have their data set to duplicate the calculations they may have made.

Ignoring all that uncertainty for a moment, I checked out the arts section of the Statistical Abstract of the United States, as the Knight study was US-based.

According to table 1226, 11.6% of persons 18 and over attended a classical music performance in the 12 months prior to a 2002 NEA survey.

Thus


Pr(T) = 0.116


We now need the percentage of people who have sung in a choir or played an instrument. Again, it's hard to know how the Knight study defined that. The Gallup 2003 "American Attitudes Toward Music" study says 37% of surveyed Americans 12 and up played a musical instrument. Let's say that


Pr(I) = 0.37


although, because it doesn't include singing, it may understate the percentage. (Actually, because the data was taken for a different purpose and using a different process, it may under- or overstate the percentage. Table 19 of the NEA report indicates that 33.9% of U.S. adults had taken a music lesson at some time in their lives; as that's close to 37%, a number in this range seems at least reasonable.)

Thus


Pr(T|I) = (0.74 * 0.116) / 0.37

Pr(T|I) = 0.23


If my data and reasoning are correct, that means that just under a quarter of those who played an instrument or sang in a choir are likely to attend classical music performances. That's still a big fraction, but it's not nearly as dramatic as the 74%.
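The arithmetic above can be sketched in a few lines of Python. This is only an illustration under the same assumptions as the text: the function name `bayes_reverse` and the variable names are invented for this example, and the three inputs are the Knight, Statistical Abstract, and Gallup figures quoted above, with all the caveats about mismatched definitions still applying.

```python
def bayes_reverse(p_i_given_t: float, p_t: float, p_i: float) -> float:
    """Return Pr(T|I) from Pr(I|T), Pr(T), and Pr(I) using Bayes' theorem."""
    if not 0.0 < p_i <= 1.0:
        raise ValueError("Pr(I) must be greater than 0 and at most 1")
    return p_i_given_t * p_t / p_i


# Inputs quoted in the post (assumed to describe comparable populations).
p_i_given_t = 0.74   # Knight report: ticket buyers who played or sang
p_t = 0.116          # Statistical Abstract, table 1226: attended a classical performance
p_i = 0.37           # Gallup 2003: play a musical instrument

p_t_given_i = bayes_reverse(p_i_given_t, p_t, p_i)
print(f"Pr(T|I) = {p_t_given_i:.2f}")  # prints: Pr(T|I) = 0.23
```

Keeping the three inputs as separate named values makes it easy to swap in a different source for any one of them and see how much the answer moves.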

What are the lessons of this exercise?


  1. The probability of X, given that Y happened, isn't the same as the probability of Y, given that X happened. Often surveys and other statistical sources give us the former, when we are more interested in the latter.
  2. It's possible to convert one into the other using Bayes' theorem. Bayes' theorem is handy for other reasons, too; for one, you can use it to update your belief about the probability of a certain event as you collect new data.
  3. You can often find free data that enables you to make that conversion.
  4. Data aren't really free, if you count the effort you have to put into understanding and reconciling the data. Even though I found my data in credible places, I still don't know for sure if differences in the way terms were defined or data was collected make a difference in the conclusion I drew.
  5. Triangulation can help. By discovering that Gallup and NEA surveys gave similar results, I can be somewhat more sure of my conclusions.
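Lesson 1 may be easier to see by counting people instead of probabilities. The sketch below imagines 1,000 adults and applies the three percentages used in this post; the population size and the variable names are assumptions made purely for illustration, and the counts are only as good as the percentages behind them.

```python
# Imagine 1,000 adults (a hypothetical round number, chosen for readability).
population = 1000

ticket_buyers = population * 0.116         # attended a classical performance
players = population * 0.37                # played an instrument (Gallup figure)
buyers_who_played = ticket_buyers * 0.74   # ticket buyers who played or sang (Knight figure)

# The same group of people, divided by two different denominators.
share_of_buyers_who_played = buyers_who_played / ticket_buyers  # Pr(I|T)
share_of_players_who_buy = buyers_who_played / players          # Pr(T|I)

print(f"Ticket buyers: {ticket_buyers:.0f}, players: {players:.0f}, both: {buyers_who_played:.0f}")
print(f"Pr(I|T) = {share_of_buyers_who_played:.2f}")
print(f"Pr(T|I) = {share_of_players_who_buy:.2f}")
```

The group in the numerator is identical in both ratios; only the denominator changes, and because there are roughly three times as many players as ticket buyers, the second ratio is roughly a third of the first.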


While I'm interested in sustainability of the arts, I'm also interested in making sense of the situations, especially the organizational and societal situations, in which we find ourselves. Calculations such as this one, using Bayes' theorem, can help us make sense of these types of numbers. If we're not likely to do the math or have the necessary data ourselves, at least we can be aware of the issues and ask those giving us the numbers to interpret them appropriately.

I invite your questions and feedback on this and any of my postings. That's how I (and, presumably, we) learn.

3 Comments:

Blogger Chip said...

Even the 23% may be overstated.

The foundation study referred to people who played an instrument at some time in their lives.

The Gallup survey appears to be asking about people who play an instrument now.

I took band in high school, but right now I can't honestly say I play an instrument. Haven't for the last 25 years or so. Because there are probably many others like me, the denominator is probably larger than .37 and the ratio therefore <.23

18 December, 2006 14:55  
Blogger Bill Harris said...

Chip, thanks for your comment. Yes, it's very hard to link data taken by different organizations at different times for different purposes; you may well be correct.

I think it's even challenging to link data taken by the same organization at different times for varying purposes.

18 December, 2006 16:15  
Blogger Bill Harris said...

Chip, my first response to your comment was a bit generic, so I checked the Gallup report again. Questions 5 and 6 (slides 14 and 15) seem to give the detail that supports the 37% reported in the highlight on slide 6. Since we don't know the size of households surveyed, we can't get much useful ratio data from question 6, but question 5 ("Do you, yourself, play a musical instrument of any kind?", answered with the green bars on slide 15) does seem to give a potentially useful ratio.

As you note, there are problems. For one, some people (you, for example) might answer "No" even when you used to play an instrument. It does seem likely that the actual number would be larger—perhaps much larger—than 0.37 for just that reason. However, Table 19 of the NEA report cited above does indicate that 33.9% had taken a music lesson at some time, which suggests the number could be lower, not higher.

For another, the 37% refers to respondents who were, according to slide 4, twelve years of age or older. Few twelve-year-olds would be likely ticket purchasers, so the population over which I is calculated is likely different from the population over which T is calculated. I'm not sure how that might affect the results.

Finally (for this comment), non-responses were not included in the results. I don't know what kind of bias, if any, that might have introduced into the result.

So where does that leave us? Were it not for the NEA's Table 19, I would likely agree with your assessment that Pr(T|I) is likely even lower than 0.23. If I use the NEA data for Pr(I), Pr(T|I) becomes 0.25.
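A quick way to see how sensitive the answer is to the choice of Pr(I) is to recompute it for several candidate values. In the sketch below, the first two values are the NEA and Gallup figures discussed in this thread; the third, 0.50, is a purely hypothetical value, not taken from any survey, included only to show the direction of Chip's argument. The dictionary labels and variable names are likewise assumptions for illustration.

```python
P_I_GIVEN_T = 0.74   # Knight report
P_T = 0.116          # Statistical Abstract, table 1226

# Candidate values for Pr(I). The last entry is HYPOTHETICAL, not survey data.
candidates = {
    "NEA Table 19 (took a music lesson)": 0.339,
    "Gallup 2003 (plays an instrument)": 0.37,
    "hypothetical higher value": 0.50,
}

# Bayes' theorem applied once per candidate denominator.
results = {label: P_I_GIVEN_T * P_T / p_i for label, p_i in candidates.items()}

for label, p_t_given_i in results.items():
    print(f"Pr(I) = {candidates[label]:.3f} ({label}): Pr(T|I) = {p_t_given_i:.2f}")
```

Since Pr(I) sits in the denominator, any increase in it lowers Pr(T|I) proportionally, so the three cases bracket the "1/5 to 1/4" range and show how it would drop further if far more people had played at some point in their lives.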

At this point, I draw two conclusions.

First, this is a great example of the cost of data. Trying to use existing, free data (as I did for this column), even when it's gathered by (different) reputable organizations, is problematic and should be treated with caution.

Second, the fraction of those having played an instrument who become ticket buyers is smaller than one might conclude from a quick reading of the Knight Report, on the order of 1/5 to 1/4, and I'm not overly confident in setting tight confidence bounds on our estimate of that fraction. This is consistent with the qualitative tone of the original analysis ("The number you care about is quite a bit less than 74%"), and your comment has pointed out added uncertainty in its precise value. Thus skepticism may be warranted, not because the Knight number is wrong, but because we don't have enough data to do our own confirmation of what the likely number might be.

Thanks, Chip, for taking the time to add to this discussion.

19 December, 2006 13:32  
