Thursday, July 26, 2007

If you can say it, it's done

Even in this day and age, computing is a problem. How many of us take the time to do some of the calculations mentioned here when faced with business or economic data, and how many of us just read the analyst's summary and take the analyst's advice?

To some degree, that's because it takes time and effort to double-check such work, and that only gets worse if the subject is complex. It's also because the tools we have aren't always set up to help us do such things on the fly, and we're often on the fly (or in meetings, which can be just as challenging).

That's one reason I've encouraged some of you who are interested to learn alternative approaches.

At least one APLer, Randy MacDonnell, has written about APL, "If you can say it, it's done." The same is true, of course, about J, its descendant. I had occasion recently to write a program to calculate whether a certain Monte Carlo simulation was done. I found a quotation by Andrew Gelman describing the Gelman-Rubin statistic:

For any given parameter, R-hat is the estimated posterior variance of the parameter, based on the mixture of all the simulated sequences, divided by the average of the variances within each sequence.

That looked easy enough, so I just wrote it down:

R=: var @: , % mean @: var

In English, that's "the variance of the entire set of data" (var @: ,) "divided by" (%) "the mean of the variance of each data sequence" (mean @: var).

"If you can say it, it's done."
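For readers without J at hand, the same ratio can be sketched in Python. This is only a rough translation of the one-liner above, not the full Gelman-Rubin diagnostic from the papers. It assumes the sample (n - 1) variance, which the post does not specify, and the function name r_hat and the toy data are made up for illustration.

```python
from statistics import mean, variance


def r_hat(chains):
    """Pooled variance of all draws divided by the mean within-chain variance.

    Assumption: sample variance (n - 1 denominator), as statistics.variance uses.
    """
    # "the variance of the entire set of data"
    pooled = [x for chain in chains for x in chain]
    # "the mean of the variance of each data sequence"
    within = mean([variance(chain) for chain in chains])
    return variance(pooled) / within


# Hypothetical data: three short simulated sequences of one parameter.
chains = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [3.0, 4.0, 5.0, 6.0],
]
print(r_hat(chains))
```

The Python version needs a helper function and an explicit flattening step, which is roughly the bookkeeping the J phrase leaves unsaid.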

And you thought this was a blog about business, not programming, right? You were right. While J is a language that can be used by programmers, it's also a language that can be used by you and me to express quantitative ideas more powerfully and concisely than a spreadsheet. If you're ever interested in numerical answers from a spreadsheet, you could be interested in J. Perhaps, for some of you, it's worth downloading and trying out. Much as in learning a foreign (human) language, you won't be able to do much at first, but, eventually, you might be surprised what you can do. In a way, it's as much about thinking as about computing, and yet you can process some pretty large data sets with pretty concise "programs," too.

Thanks to Randy and Andrew for the quotations. For those of you interested in the Gelman-Rubin statistic, Andrew has pointed me to two papers giving more information: his "Inference from Iterative Simulation Using Multiple Sequences" with Donald Rubin and his "General Methods for Monitoring Convergence of Iterative Simulations" with Steve Brooks.



Blogger Devon said...

Bill -

this is a good little essay that's relevant to two separate things I'm doing: figuring out how to minimize the number of Monte Carlo simulations and, separately, writing a paper for APL2007 on the advantages of terse, array-based code.

One quibble: you say that J is a descendant of APL - but don't they both have the same father? This would make it more of a younger sibling (brother? sister?) of APL.


Devon McCormick

27 July, 2007 13:21  
Blogger Bill Harris said...

Thanks for dropping by, Devon. You're probably right on the genealogy (jenealogy?) of J. I'm interested in your MC work; are you doing Markov Chain Monte Carlo simulations?

27 July, 2007 13:45  
