Sunday, August 12, 2007

Weighting Studies in Meta-Analysis

Entry for 11 August 2007:

[This is one of my technical research entries, but it also shows how my mind works.]

Beth and I are working away on the meta-analysis of Person-Centred and Experiential therapy outcome research. So far, most of what Beth has been doing is re-analyzing the data I previously analyzed, in order to check my calculations and ratings. This is so that we will be able to calculate inter-rater reliability, which is considered the standard for meta-analysis. Of course, doing this with someone else means that various things I have been doing my own way for years (through the previous generations of the analysis) now come into question and have to be discussed and clarified. This is a good process, but sometimes challenging.

For example, should we use weighting in estimating effect sizes? That is, should we give some studies more weight than others in calculating the overall effect sizes? I learned in grad school that weighting rarely improves measurement, and so I have tended to distrust such approaches. I have been using weighting by sample size as one of the two ways of estimating overall effects; this appears to be the most common form of weighting, but it does raise questions about significance testing, because it creates statistical nonindependence problems. (That’s because when you weight by sample size, the unit for significance testing becomes the client rather than the study, and clients within studies tend to share variance for various reasons, including having the same therapist, being in the same therapy, and filling out the same instruments at the same times.)
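To make the contrast concrete, here is a minimal sketch in Python of the two weighting schemes (the effect sizes and sample sizes are made up for illustration, not taken from our data):

```python
# Minimal sketch: unit weighting vs. sample-size weighting of per-study
# effect sizes. All numbers are hypothetical.

effect_sizes = [0.95, 0.60, 1.10, 0.40]  # hypothetical per-study effect sizes (d)
sample_sizes = [12, 48, 25, 140]         # hypothetical per-study ns

# Unit weighting: every study counts equally; the unit of analysis is the study.
unit_mean = sum(effect_sizes) / len(effect_sizes)

# Sample-size weighting: each study counts in proportion to its n, so the
# effective unit becomes the client -- and clients within a study share
# variance (same therapist, same instruments), hence the nonindependence problem.
n_weighted_mean = (sum(d * n for d, n in zip(effect_sizes, sample_sizes))
                   / sum(sample_sizes))

print(f"unit-weighted: {unit_mean:.2f}; n-weighted: {n_weighted_mean:.2f}")
```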

There are complicated methods for correcting sample size estimates, but these are cumbersome. In some ways, they also seem to me to miss the point, which is that the true unit in meta-analysis is the study. The original idea of meta-analysis, as developed by Gene Glass, was that meta-analysis works with populations of studies. This makes sense to me because there are clearly wild differences between studies in clients, therapists, treatments, methods and effects. But of course, some of the studies in the meta-analysis we are doing have a sample size of five (our lower limit), while one or two of the larger German studies in our sample have samples of 1400. Given this, both weighting all studies the same (“unit weighting”) and weighting by sample size seem problematic.

This is probably why Bruce Wampold recommends weighting by the inverse of the error: Smaller samples tend to have less reliable error estimates (that is, the error estimates have more error), which also creates bias in effect estimates (for this reason, I’ve been using a separate correction for small sample bias). Using the inverse of the error (1 divided by the standard deviation or standard error of the mean) gives more weight to studies with small error. Such studies tend to be more tightly designed and to sample clients more narrowly, which would give more weight to controlled studies and less weight to naturalistic studies.
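In code, inverse-error weighting looks something like the sketch below (made-up numbers again). Conventional meta-analytic practice often uses inverse variance (1/SE²) rather than 1/SE; the sketch follows the 1/SE version described above. The small-sample correction shown is the common Hedges-style approximation, offered only as an example of that kind of correction, not necessarily the one I actually use:

```python
# Sketch of inverse-error weighting: weight each study by 1/SE.
# All numbers are hypothetical.

effect_sizes = [0.95, 0.60, 1.10, 0.40]
standard_errors = [0.40, 0.15, 0.30, 0.08]  # tighter designs -> smaller SEs

weights = [1.0 / se for se in standard_errors]
inv_error_mean = (sum(d * w for d, w in zip(effect_sizes, weights))
                  / sum(weights))

# One common small-sample bias correction (the Hedges' g approximation),
# shown as an example only; df = n1 + n2 - 2 for a two-group comparison.
def small_sample_correct(d, df):
    return d * (1 - 3 / (4 * df - 1))

print(f"inverse-error weighted mean: {inv_error_mean:.2f}")
```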

Hmm… whatever you do, there is always a problem. If you believe that there is a single True Effect Size, then you will want to weight by inverse error or sample size, in the belief that doing so will lead you closer to Truth. If you are more of a relativist, a methodological pluralist, or even a critical realist, then you will be suspicious of the search for singular Truth, and will be more interested in looking at things from various angles, examining the points of convergence and divergence. Thus, it makes sense to me to estimate overall effect sizes in at least two different ways: unit weighting (treating all studies as equal) and sample-size weighting. When I have done this in the past, it has sometimes produced virtually identical results, and at other times slightly different results (a difference of .13 sd is the largest I’ve found so far), indicating (a) that weighting didn’t make much difference; and (b) that the estimates are fairly robust. Relying on a single weighting solution doesn’t provide this kind of information!
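Here is what that two-estimate strategy looks like in miniature (self-contained sketch, hypothetical numbers again): compute both means and treat the gap between them as information in its own right.

```python
# Sketch of the "estimate it two ways" strategy: unit-weighted vs.
# sample-size-weighted means, plus the gap between them. Hypothetical values.

def two_estimates(ds, ns):
    """Return (unit-weighted mean, sample-size-weighted mean)."""
    unit = sum(ds) / len(ds)
    by_n = sum(d * n for d, n in zip(ds, ns)) / sum(ns)
    return unit, by_n

unit, by_n = two_estimates([0.95, 0.60, 1.10, 0.40], [12, 48, 25, 140])
print(f"unit = {unit:.2f}, by n = {by_n:.2f}, gap = {abs(unit - by_n):.2f} sd")
# A small gap suggests the overall estimate is robust to the weighting choice;
# a larger gap is itself worth reporting.
```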
