Psychological Bulletin
Copyright 2006 by the American Psychological Association
2006, Vol. 132, No. 5, 667–691
0033-2909/06/$12.00
DOI: 10.1037/0033-2909.132.5.667
A Meta-Analytic Review of Obesity Prevention Programs for Children
and Adolescents: The Skinny on Interventions That Work
Eric Stice, Heather Shaw, and C. Nathan Marti
University of Texas at Austin
This meta-analytic review summarizes obesity prevention programs and their effects and investigates
participant, intervention, delivery, and design features associated with larger effects. A literature search
identified 64 prevention programs seeking to produce weight gain prevention effects, of which 21%
produced significant prevention effects that were typically pre- to post effects. Larger effects emerged for
programs that targeted children and adolescents (vs. preadolescents) and females, programs that were
relatively brief, programs that solely targeted weight control versus other health behaviors (e.g.,
smoking), programs evaluated in pilot trials, and programs in which participants self-selected
into the intervention. Other factors, including mandated improvements in diet and exercise, sedentary
behavior reduction, delivery by trained interventionists, and parental involvement, were not associated
with significantly larger effects.
Keywords: obesity, prevention, meta-analysis, moderators
Eric Stice and Heather Shaw, Department of Psychology, University of Texas at Austin; C. Nathan Marti, Department of Educational Administration, University of Texas at Austin.
Preparation of this article was supported by research grants (MH/DK61957 and MH70699) from the National Institutes of Health.
We are very grateful to Amy Greenwold, Krista Heim, and David Huh for their assistance with the literature search and article preparation.
Correspondence concerning this article should be addressed to Eric Stice, who is now at Oregon Research Institute, 1715 Franklin Boulevard, Eugene, OR, 97403. E-mail: estice@ori.org
Obesity in adulthood results in an increased risk for future death from all causes, coronary heart disease, atherosclerotic cerebrovascular disease, and colorectal cancer, as well as serious medical problems including hyperlipidemia, hypertension, gallbladder disease, and diabetes mellitus (Calle, Thun, Petrelli, Rodriguez, & Heath, 1999). Obesity in childhood and adolescence has also been associated with serious medical problems, including high blood pressure, adverse lipoprotein profiles, diabetes mellitus, atherosclerotic cerebrovascular disease, coronary heart disease, colorectal cancer, and death from all causes, as well as lower educational attainment and poverty (Dietz, 1998). The prevalence of obesity has increased sharply over the last 3 decades; currently, 65% of adults are classified as overweight or obese (Hedley et al., 2004). The prevalence of obesity has risen even more sharply among adolescents and young adults (Hedley et al., 2004), which is alarming because obesity persists into adulthood for 70% of obese adolescents (Magarey, Daniels, Boulton, & Cockington, 2003). Obesity also carries a high fiscal cost; roughly $100 billion per year is spent on obesity-related health care (Wolf, 1998).
Unfortunately, successful treatments for obesity have been elusive. For adults, the current treatment of choice only results in about a 10% reduction in body weight, and virtually all patients regain this weight within a few years of treatment (Jeffery et al., 2000). Obesity treatments for children and adolescents have yielded similar effects, though behavioral family-based interventions have produced more persistent weight loss effects (Epstein, Valoski, Wing, & McCurley, 1990; Flodmark, Ohlsson, Ryden, & Sveger, 1993). Compounding matters, only about 10% of obese children and adolescents seek weight loss treatment (e.g., French, Perry, Leon, & Fulkerson, 1994). Accordingly, much effort has been devoted to developing and evaluating obesity prevention programs, in the hope that this strategy will more effectively curb this pernicious public health problem.
Studies have evaluated four major types of interventions that were expected to produce weight gain prevention effects. These include (a) multifocus cardiovascular disease prevention programs that targeted obesity along with other risk factors for cardiovascular disease (e.g., hypertension and smoking), (b) prevention programs that focused solely on the prevention of obesity or weight gain, (c) interventions designed to solely increase physical activity, and (d) eating disorder prevention programs that promoted use of healthy weight-management skills.
Although numerous evaluations of weight gain prevention programs have been conducted, their results have not been comprehensively reviewed and analyzed with meta-analytic procedures. Several excellent narrative reviews exist (e.g., Dietz & Gortmaker, 2001; Schmitz & Jeffery, 2000; Story, 1999), but meta-analytic techniques were not used to empirically describe effect sizes or investigate potential moderators of intervention effects. One meta-analytic review has been published (Campbell, Waters, O’Meara, Kelly, & Summerbell, 2003), but it used extensive exclusionary criteria that rendered it impossible to examine moderators of intervention effects (only 10 trials were included). Thus, the overarching goal of this article is to address this important gap in the literature. The first aim of this review is to provide a summary of these prevention programs and their effects. The second aim is to examine participant, intervention, delivery, and design features that are associated with larger intervention effects. Given the heterogeneity in the effects from these interventions, it is important
to systematically consider the moderators associated with interventions that produced the largest effects. The third aim is to discuss promising directions for future research in light of the findings from completed trials.
Putative Moderators of Intervention Effects
A unique feature of meta-analyses is that they permit empirical examination of factors associated with variation in effect sizes. Elucidating factors that moderate prevention program effects is informative because it highlights aspects of the participants, intervention, program delivery, and research design that are associated with stronger intervention effects. This information should increase the yield of future prevention efforts by identifying the conditions under which optimal prevention effects occur. As well, this information might identify particular subgroups of individuals for whom alternative obesity prevention programs need to be developed. Analyses of moderators of intervention effects should also advance general theories regarding effective routes to alter maladaptive health behaviors and attitudes. Accordingly, we investigated several potential moderators of intervention effects that were selected on the basis of theory, prior findings, and previous literature reviews.
Participant Features
Participant Age
Researchers have hypothesized that obesity prevention programs are more effective when they are delivered to middle school or high school students versus grade school students (Baranowski, Cullen, Nicklas, Thompson, & Baranowski, 2002). Younger children may find it difficult to grasp the concepts and skills taught in the interventions. They may also be less likely to impact the food purchases made by adults (when eating at home or restaurants). Thus, we hypothesized that effects would be significantly larger for interventions offered to adolescents versus children.
Participant Gender
Results from prior trials suggest that obesity prevention programs that promoted a healthier lower calorie diet (Perry et al., 1998) and those that also attempted to increase physical activity and/or decrease sedentary behavior (Gortmaker et al., 1999; Vandongen et al., 1995) produced larger effects for females than for males. However, another obesity prevention program that promoted healthy lower calorie diets and increased physical activity found significantly stronger effects for males than for females (Kain, Uauy, Vio, Cerda, & Leyton, 2004), and one obesity treatment trial found that an intervention solely aimed at increasing activity and decreasing sedentary behaviors was more effective for boys than girls, though an intervention focusing solely on increasing activity level was equally effective for boys and girls (Epstein, Paluch, & Raynor, 2001). Although these findings may represent chance findings because most trials did not report that intervention effects were moderated by gender, there was more evidence that obesity prevention programs produced larger effects for females than for males. This finding may have emerged because sociocultural pressures for thinness are greater for females (Thompson, Heinberg, Altabe, & Tantleff-Dunn, 1999), which may amplify the effects of obesity prevention programs for this population. In support, more females than males are dissatisfied with their bodies, and the vast majority of adolescent females with body image concerns are dissatisfied because they feel overweight (Thompson et al., 1999). In contrast, the reasons males give for body image concerns are more heterogeneous, and nearly half who indicate that they are dissatisfied with their weight actually wish to gain weight. In addition, females are at higher risk for onset of obesity than males (Solomon & Manson, 1997). Because there was more evidence that obesity prevention programs produce larger effects for females than males, we hypothesized that intervention effects for prevention programs might be larger for females.
Participant Ethnicity
There is also reason to believe that ethnicity might moderate obesity prevention effects. On the one hand, there is evidence that Black and Hispanic individuals show elevated rates of overweight and obesity as well as greater increases in weight over development, relative to other ethnic groups (e.g., Burke & Bild, 1996; Kimm et al., 2001), suggesting that programs targeting these high-risk groups might be more effective because there is a greater opportunity to show a prevention effect. On the other hand, overweight and obesity are less stigmatized and are associated with less body dissatisfaction for certain ethnic minority groups (e.g., Duncan, Anton, Newton, & Perri, 2003), particularly Black women, which might attenuate the effects of obesity prevention programs for these populations. Thus, we hypothesized that intervention effects would be different for programs primarily targeting high-risk ethnic minority participants versus those primarily targeting low-risk ethnic groups.
Risk Status of Participants
More generally, we have hypothesized (Stice & Shaw, 2004) that interventions are more effective when offered to high-risk participants (selected prevention programs) versus all individuals in a population (universal prevention programs). In the obesity prevention field, selected interventions have been directed at a variety of groups at elevated risk for future weight gain, including Black and Hispanic individuals, students with other cardiovascular disease risk factors (e.g., hypertension), overweight or obese individuals, 1st-year college students, and females with body dissatisfaction. Theoretically, these high-risk individuals are more motivated to engage in the prevention program content and thus are more likely to benefit. It is also likely that low-risk individuals have less room for change on the outcomes (a floor effect). One narrative review comparing selected and universal school-wide obesity prevention programs concluded that selected interventions may be more effective in reducing pediatric obesity than universal interventions (Resnicow, 1993). In addition, prevention programs for eating pathology (Killen et al., 1993), depression (Clarke et al., 1995), anxiety (Lowry-Webster, Barrett, & Dadds, 2001), behavior problems (Stoolmiller, Eddy, & Reid, 2000), and substance abuse (Murphy et al., 2001) have often produced stronger effects for high-risk subsamples than for the full sample of individuals enrolled in these universal prevention programs. Thus, we hypothesized that intervention effects would be larger for selected programs versus universal programs. Because the key distinction
between selected and universal programs is that the former are offered to high-risk individuals, we use the term risk status of participants to refer to this moderator.
Intervention Features
Intervention Duration
Previous meta-analyses of prevention programs for other problem behaviors have suggested that longer duration multisession interventions produced superior effects to very brief interventions (Rooney & Murray, 1996; Stice & Shaw, 2004). Theoretically, interventions with a longer duration afford a greater opportunity for presentation of information and behavioral change skills. We hypothesized that intervention effects would be stronger for prevention programs with a longer versus shorter duration.
Parental Involvement
It has also been suggested that parental involvement leads to more favorable results in obesity prevention, as the family is thought to be key to developing a psychosocial environment that is conducive to healthy eating and physical activity (Story, 1999). Parents are usually responsible for determining food offerings in and away from the home, at least through a certain age, as well as influencing exercise and recreation. Obesity treatment trials have suggested that both child and adolescent weight loss programs are more effective when at least one parent is involved (Epstein, Wing, Koeske, & Valoski, 1987; Golan, Weizman, Apter, & Fainaru, 1998). Therefore, we hypothesized that obesity prevention programs with parental involvement would have larger effects than those without parental involvement.
Psychoeducational Content
Because research has suggested that psychoeducational content is ineffective in producing behavioral change (Helweg-Larsen & Collins, 1997; Larimer & Cronce, 2002), we hypothesized that psychoeducational programs would be associated with weaker intervention effects. Indirect support for this hypothesis was provided by a recent meta-analysis which found that eating disorder prevention programs with psychoeducational content are less effective than those without this content (Stice & Shaw, 2004).
Dietary Improvement
One implication from the energy balance model of obesity is that a reduction in fat and sugar intake and an increase in fruit and vegetable intake will decrease the risk for future weight gain (Epstein, Gordy, et al., 2001). Although virtually all obesity prevention programs recommend consumption of low-fat diets, we differentiated between programs that directly manipulated dietary change as part of the intervention and those that did not. We reasoned that distinguishing between interventions that actually manipulated diet and those that did not would provide the most sensitive test of this moderator. Another benefit is that this coding scheme captures environmental manipulations of the food environment, which is useful because theorists have suggested that the food environment plays a key role in obesity promotion (Wadden, Brownell, & Foster, 2002). The most common example was interventions that directly changed the nutritional content of school lunches (e.g., Donnelly et al., 1996; Luepker et al., 1996). We hypothesized that interventions that involved a direct improvement to dietary intake should produce stronger intervention effects than those that did not.
Increased Activity
Another implication from the energy balance model of obesity is that increased physical activity will decrease risk for future weight gain (Wadden, Vogt, Foster, & Anderson, 1998). Although most obesity prevention programs recommend regular physical activity, we distinguish prevention programs that directly manipulated physical activity from those that simply recommended it, because we felt this would provide a more sensitive test of this potential moderator. The most common example of programs that manipulated physical activity was school-based interventions that administered a physical education class for students in the intervention condition but not the control condition (e.g., Dwyer, Coonan, Leitch, Hetzel, & Baghurst, 1983; McMurray et al., 2002). We hypothesized that programs that directly increased physical activity would have larger intervention effects than those that did not increase activity.
Reduced Sedentary Behavior
A third implication of the energy balance model of obesity is that interventions that reduce sedentary behavior, such as TV viewing and video game use, should also decrease risk for future weight gain. Indeed, it has been theorized that more effective obesity prevention programs focused on reducing sedentary behavior (Baranowski et al., 2002), and TV viewing is considered one of the most modifiable causes of obesity in children (Robinson, 1999). We hypothesized that larger effects would emerge for programs that focused on reducing sedentary behaviors than for programs that did not target this risk factor.
Number of Behavior Targets
Our review of the literature suggested that the number of health behaviors targeted in an intervention was inversely related to the magnitude of intervention effects for obesity. Specifically, it appeared that interventions that attempted to change a broad array of health behaviors, such as body weight, blood pressure, cholesterol, and smoking, were less effective than programs that focused solely on body weight. Our clinical experience from designing and evaluating prevention programs also suggests that interventions focusing on a few concepts are more effective than those focusing on a broader array of concepts. It may be that the greater the complexity of the message relayed by the intervention, the more difficult it is for participants to process, store, and retrieve information presented in the programs. Consistent with this general impression, a review of school-based cardiovascular disease prevention trials concluded that broad-based programs targeting multiple health behaviors aimed at reducing risks for cardiovascular disease have not been effective for reducing obesity in children (Resnicow & Robinson, 1997). We hypothesized that programs targeting multiple health behaviors would have smaller effects than those solely targeting weight change.
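The intervention-feature distinctions described in this section (for example, programs that directly manipulated diet versus those that only recommended it) can be pictured as a simple coding record. The sketch below is illustrative only and is not the authors' coding instrument: the class name, field names, direction table, and example program are all hypothetical, and each feature is assumed to be coded as a yes/no value. The direction table records only which way each feature was hypothesized to shift effect sizes, as stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterventionFeatures:
    """Hypothetical coding record for one prevention program (illustration only)."""
    longer_duration: bool       # longer (True) versus brief (False) program
    parental_involvement: bool  # parents involved in the program
    psychoeducational: bool     # program relies on psychoeducational content
    diet_manipulated: bool      # diet directly changed, not merely recommended
    activity_manipulated: bool  # activity directly increased, not merely recommended
    sedentary_reduction: bool   # program targets TV viewing, video games, etc.
    weight_only: bool           # weight is the sole target (no other health behaviors)


# Hypothesized direction for each feature, taken from the text:
# +1 = larger effects expected when the feature is present,
# -1 = smaller effects expected when the feature is present.
HYPOTHESIZED_DIRECTION = {
    "longer_duration": +1,
    "parental_involvement": +1,
    "psychoeducational": -1,
    "diet_manipulated": +1,
    "activity_manipulated": +1,
    "sedentary_reduction": +1,
    "weight_only": +1,
}


def favorable_features(program: InterventionFeatures) -> list[str]:
    """List the features on which a program falls on the hypothesized-favorable side."""
    names = []
    for name, direction in HYPOTHESIZED_DIRECTION.items():
        present = getattr(program, name)
        if (direction > 0 and present) or (direction < 0 and not present):
            names.append(name)
    return names


# Hypothetical example: a long-running program that changes school lunches
# as part of a broader, multi-behavior intervention.
school_lunch_program = InterventionFeatures(
    longer_duration=True,
    parental_involvement=False,
    psychoeducational=False,
    diet_manipulated=True,
    activity_manipulated=False,
    sedentary_reduction=False,
    weight_only=False,
)
print(favorable_features(school_lunch_program))
```

Coding each feature separately, rather than assigning programs to a single type, is what would allow each hypothesized moderator to be tested on its own.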
Delivery Features
Teachers Versus Professional Interventionists
Researchers have suggested that obesity prevention programs are more effective when delivered by dedicated interventionists versus classroom teachers (Baranowski et al., 2002). Theoretically, teachers are not able to devote as much time and energy to providing interventions as dedicated interventionists because teachers have classroom responsibilities that take precedence. Moreover, dedicated interventionists are typically able to provide the intervention several times per school year, allowing them to develop and refine their presentation strategies, whereas teachers typically will only provide the intervention once per year. In addition, teachers rarely receive the amount of specialized training and detailed supervision provided to dedicated interventionists. Thus, we hypothesized that intervention effects would be significantly larger for programs delivered by dedicated interventionists versus classroom teachers.
Didactic Versus Interactive Format
Meta-analytic reviews of substance abuse (Tobler et al., 2000) and eating disorder (Stice & Shaw, 2004) prevention programs have found that interactive programs produced larger intervention effects than didactic programs. Theoretically, participants in interactive programs show greater intervention effects because this format helps participants engage in the program content, which facilitates skill acquisition and attitudinal change. Interactive programs are also more likely to involve exercises that allow participants to apply the skills taught in the intervention, which should enhance skill acquisition (e.g., particular sports). We predicted that interactive programs would be more effective than didactic programs.
Design Features
Pilot Study
Our review of the prevention and treatment literature for obesity and eating disorders suggested that larger intervention effects were often observed for pilot trials of a new intervention relative to large demonstration trials. Such a pattern of effects might occur because interventionists are more passionate about new prevention programs or because demonstration trials are more methodologically rigorous and are therefore more immune to experimenter effects (e.g., because they more often use blinded assessors and minimal intervention control conditions). Thus, we hypothesized that intervention effects would be significantly larger for pilot evaluations of new interventions.
Recruitment Method
Our experience suggests that intervention effects are often larger when prevention programs are delivered solely to participants who have actively self-selected into trials in response to recruitment efforts, such as media advertisements, relative to when prevention programs are offered to all individuals in a defined population (e.g., a particular school). Presumably this is because the former strategy recruits individuals who are more motivated to achieve weight gain prevention effects and therefore engage more effectively in the prevention program. Thus, we hypothesized that intervention effects would be significantly larger for self-presenting volunteers than for participants recruited through population-based recruitment efforts.
Random Assignment
We theorized that trials that randomly assigned participants to condition might produce larger intervention effects than trials that used alternative approaches to allocating participants to treatment condition, such as matching. We reasoned that because random assignment is the best approach to generating groups that are equivalent on any potential confounding variables at baseline (with sufficiently large sample sizes), it should therefore minimize the chances that any of these confounding variables are correlated with treatment condition, which should thus maximize the ability to detect intervention effects if they really occur (i.e., randomization maximizes the signal-to-noise ratio reflected in inferential tests of the intervention effects). Accordingly, we hypothesized that intervention effects may be greater for trials that used random assignment relative to other approaches to assigning participants to condition. However, because the proper analysis of intervention effects involves tests of differential change across conditions, which adjusts for any initial differences at baseline on the outcome, we suspected that this effect might not reach statistical significance. Consistent with this expectation, random assignment did not emerge as a significant moderator of effect sizes in our meta-analysis of eating disorder prevention programs (Stice & Shaw, 2004).
Nested Data Modeled Incorrectly
Virtually all parametric inferential tests, such as repeated measures analysis of variance, growth curve, and survival models, used to test for intervention effects within randomized trials assume independence of errors. However, when participants are nested within schools, classes, or group-based interventions, the assumption of independence may not hold (Baldwin, Murray, & Shadish, 2005). Participants within these nested groups may be more similar than participants from across these groups, which can artificially reduce the error terms used to test for intervention effects, which increases risk for a false positive finding. Thus, we hypothesized that studies that did not model the nested nature of the data in the trial would produce artificially larger effect sizes for the interventions relative to studies that modeled the nested nature of the data.
Potential Artifacts
We also investigated three variables that might produce artifacts for the effect sizes and bias our estimates of effect size moderators, with the goal of including these variables as covariates in the models if necessary. First, our review of the eating disorder prevention field suggested that interventions tend to produce larger effect sizes when they are compared with assessment-only or waitlist control conditions relative to when they are compared with active interventions that are credible and structurally matched to the intervention in terms of contact hours (Stice & Shaw, 2004). Theoretically, this pattern of findings occurs because the active
comparison groups more effectively control for demand characteristics, participant expectancies, and other nonspecific factors that contribute to intervention effects. Thus, we tested whether type of control condition was systematically related to the intervention effect sizes. Second, because effect sizes for prevention programs tend to be smaller when longer follow-up periods are examined relative to shorter follow-up periods or pretest to posttest designs for prevention programs (Stice & Shaw, 2004), we tested whether follow-up length was related to effect size magnitude. Third, because prior meta-analyses have found that unpublished studies often have smaller effects than published studies (Lipsey & Wilson, 2001), we investigated whether publication status was related to intervention effect sizes.
Method
Sample of Studies
Following the recommendations of Lipsey and Wilson (2001), we used five procedures to retrieve published and unpublished trials of obesity prevention programs. First, a computer search was performed on PsycINFO, MEDLINE, Dissertation Abstracts International, and Cumulative Index to Nursing and Allied Health Literature for the years 1980–2005 (through October) with the following keywords: obesity weight, cardiovascular disease, prevention, preventive, and intervention. Two research assistants and a professional librarian performed independent searches to increase the odds that all relevant articles would be retrieved. Eric Stice and Heather Shaw reviewed the products of all three searches to identify pertinent articles. Second, the tables of contents for journals that commonly publish articles in this area were reviewed for this same period (e.g., Preventive Medicine, Journal of Pediatrics, Health Education Quarterly). Third, we consulted narrative reviews of the obesity prevention field to search for additional citations of relevance. Fourth, the reference sections of all identified articles were examined. Finally, established obesity prevention researchers were contacted and asked for copies of unpublished articles (under review or in press) describing prevention trials.
Inclusion and Exclusion Criteria
The defining feature of a successful obesity prevention program is that it results in significantly less weight gain or risk for obesity onset than observed in the control group. Thus, we only included trials that used some type of proxy measure of body fat as an outcome. Most trials used the body mass index (BMI = kg/m²) as the primary proxy measure of body fat, but a few studies, particularly older ones, used skinfold thickness. It is important to note that BMI is not a direct measure of body fat. Although this proxy measure tends to show high correlations with the most precise measures of body fat (r = .80–.90), such as dual energy x-ray absorptiometry (DEXA; Dietz & Robinson, 1998), it has been found to show lower agreement with DEXA measures in large epidemiology samples (r = .71; Ellis, Abrams, & Wong, 1999). Nonetheless, because the BMI is easy to measure, shows high test–retest reliability, is inexpensive, and correlates with health risk markers and diseases, such as elevated blood pressure, adverse lipoprotein profiles, atherosclerotic lesions, serum insulin levels, and diabetes mellitus, it is considered the measurement of choice for large-scale studies (Dietz & Robinson, 1998; Freedman & Perry, 2000).
As noted previously, we included trials that were primarily conceptualized as evaluations of obesity prevention programs, as well as trials that evaluated other interventions that were expected to result in less weight gain or risk for obesity onset but that were not primarily conceptualized as obesity prevention programs (e.g., certain physical activity interventions, eating disorder prevention programs, and psychoeducational interventions). A prior meta-analysis indicated that certain eating disorder prevention programs and psychoeducational interventions produced significant weight gain prevention effects (Stice & Shaw, 2004). We included a wide variety of interventions that were expected to produce weight gain prevention effects in the hope that it would maximize our chances of identifying participant, intervention, delivery, and design features that are associated with the most efficacious obesity prevention programs. If multiple reports of the same trial were published, we selected the one with the longest follow-up period.
This meta-analysis focused solely on effect sizes for weight gain prevention effects, as assessed by differential change in body fat measures. We did not include effect sizes for changes in self-reported dietary intake or physical activity, because numerous trials have found significant intervention effects for self-reported dietary intake and physical activity, but no significant effects for weight change (e.g., Baranowski et al., 2003; Luepker et al., 1996; Puska et al., 1982). According to the energy balance model of adiposity, any true reduction in caloric intake and/or increase in physical expenditure should be accompanied by concomitant changes in body mass. Therefore, we interpreted this pattern of findings as suggesting that self-report measures of dietary intake and physical activity are of questionable validity, at least within the context of the demand characteristics of obesity prevention trials. This interpretation dovetails with studies that have found that people underreport caloric intake and overreport activity level (Bandini, Schoeller, Dyr, & Dietz, 1990; Lichtman et al., 1992).
We focused exclusively on prevention programs that were evaluated in controlled trials. We included trials in which participants were randomly assigned to an intervention; to active interventions that were not focused on weight gain prevention (e.g., a general parent training intervention); or to usual-programming (e.g., standard physical education classes), waitlist, or assessment-only control conditions. We also included trials in which some relevant comparison group was used (e.g., matched controls) in a quasi-experimental design. Random assignment to condition is optimal because it is the best approach to generating comparison groups that are equated on any potential confounding variables at baseline (Shadish, Cook, & Campbell, 2002). Because many confounds are unknown, random assignment is preferable to the use of control groups that are matched to the intervention group on preselected dimensions. Nonetheless, carefully selected comparison groups can permit useful inferences regarding intervention effects if analyses test for significant differences in change over time across conditions (i.e., controlled for initial between-group differences on the outcome; Shadish et al., 2002). We excluded trials that compared only active interventions, because it seemed inappropriate to compare them with trials that used a control condition and because it is difficult to determine whether a lack of differential change across active interventions signifies that both prevention programs were effective or that neither was effective.
We also focused exclusively on studies that tested whether the change in the outcomes over time was significantly greater in the intervention group versus the control group. This could take the form of a Time × Condition interaction in a repeated-measures analysis of variance model, an analysis of covariance model that controlled for initial levels of the outcome variable, or a growth curve model that controlled for initial levels of the outcome (e.g., the effects were conditional upon the intercept value of the dependent variable coded to reflect the level of the outcome at baseline; Stice & Shaw, 2004). It is necessary to control for initial levels of the outcome variable because otherwise the analyses are not providing a test of differential change over time across conditions. Verifying that the groups do not differ at baseline on the outcome variable does not solve this problem because the objective is to model change from baseline to intervention termination or follow-up, rather than just to conduct between-subjects tests of the groups at termination or follow-up. If the intervention group had higher initial BMI scores than the control group, the analyses may not detect a true intervention effect (a Type II error), whereas if the control group had higher initial BMI scores than the intervention group, the analyses might erroneously suggest that an intervention effect was present when it was not (a Type I error). We also included trials that used logistic
regression or survival models to test whether the rates of onset of obesity or overweight were significantly less in the intervention condition versus the control condition if initially obese or overweight participants, respectively, were excluded from the analyses (Willett & Singer, 1993). Studies that only tested for significant changes within condition were not included because this type of analysis does not test whether the changes in the intervention condition are significantly greater than the changes in the control condition. With this latter approach, there is no way to separate the effects of the intervention from those of alternative sources, such as regression to the mean or measurement artifacts.

We excluded trials that were described as obesity treatment programs by the authors because the purpose of the present report was to provide a meta-analytic review of programs that sought to prevent future weight gain or obesity onset. Nonetheless, we included evaluations of programs that sought to prevent future weight gain in overweight or obese samples if they were not referred to as treatment programs by the authors. More generally, we did not exclude studies solely because the average BMI of participants fell above conventional cutoffs for overweight or obese (e.g., over 25 or 30 for young adult samples).

We also restricted our focus to trials that targeted children and adolescents because of our interest in determining whether effective interventions have been designed for developing individuals. We believe that obesity prevention programs should be implemented before most individuals will show onset of obesity. However, we used a broad view of adolescence and included trials with a mean participant age of up to 22 years because this captured college-based obesity prevention programs. College-aged individuals are still developing self-regulation skills, particularly with regard to dietary and exercise behaviors. In addition, many developmental psychologists consider adolescence to span from approximately age 12 through age 24 because most individuals in the United States have not settled into adult roles by their early 20s (Arnett, 2000).

Effect Size Estimation Procedures

We calculated effect sizes for tests of differential change in BMI and risk for obesity onset across the intervention and control conditions because virtually all of the prevention trials included BMI as a primary outcome. Although other proxy measures of adiposity were used in several trials, such as skinfold thickness and waist-to-hip ratios, these latter outcomes were operationalized inconsistently and were collected in only a subset of the trials. We considered averaging the effect sizes from these various adiposity proxy measures, but we noted that the intervention effects for these various outcomes were often contradictory and were concerned that averaging across diverse measures would introduce unnecessary error variance into the analyses. Furthermore, the measurement error is considerably lower for the BMI relative to alternative proxy body fat measures, including waist circumference, triceps skinfold, and subscapular skinfold measures (Freedman & Perry, 2000). In the four studies that did not collect BMI data, effect sizes were calculated for alternative proxy measures of body fat; Dwyer et al. (1983) used skin-fold measures, Eliakim, Makowski, Brasel, and Cooper (2000) used MRI estimations of percent body fat, and Gutin et al. (1995) and Gutin and Owens (1999) used DEXA estimations of percent body fat.

The correlation coefficient (r) was selected as the index of effect size because of its similar interpretation across different combinations of interval, ordinal, and nominal variables (Pearson’s r, Spearman’s rho, and point biserial; Rosenthal, 1991). Furthermore, this effect size preserved the valence of the effects (unlike measures such as eta squared). Cohen’s (1988) criteria for small (r = .10), medium (r = .30), and large (r = .50) effects were used.1

If effect sizes were reported in Cohen’s (1988) d, we converted them to r with the formula provided on page 20 of Rosenthal (1991). If effects were reported as odds ratios, they were converted to r with the formula provided on page 194 of Lipsey and Wilson (2001). If no effect sizes were reported, we generated them directly by calculating Cohen’s d with the means and standard deviations (from the control group at baseline) reported in the article, which we then converted to r using the Rosenthal formula, or we reconstituted the data using weighted probability values to estimate a chi-square test that provided an odds ratio, which we then converted to r using the Lipsey and Wilson formula. If none of these options for generating effect sizes was possible, we estimated effect sizes from the exact p values reported by the authors using the formula provided on page 19 of Rosenthal (1991). If exact p values were not reported, we generated them from the test statistics (e.g., F) and degrees of freedom using Microsoft Excel.

We were able to use the methods described previously to generate effect sizes or estimates of effect sizes for all trials that reported significant intervention effects and for most trials that reported nonsignificant effects. However, for the two trials that reported nonsignificant effects and did not provide any other data with which to estimate the effect size (Fardy et al., 1996; Willet, 1995), we used full information maximum likelihood estimation to impute the missing effect sizes because this approach produces more accurate and efficient parameter estimates than listwise deletion or alternative imputation approaches such as mean substitution (Schafer & Graham, 2002). We selected this approach over the more common strategy of assuming an effect size of zero (Lipsey & Wilson, 2001) because more precise estimates of these missing values can be generated using the conditional probabilities between effect sizes and effect size moderators from the trials that provided complete data on these variables.

Operationalization and Coding of Effect Size Moderators

Table 1 lists the numeric values used to code each moderator, the operationalization of each moderator, and relevant descriptive statistics describing the distribution of the moderators.2 We coded certain moderators two ways in an effort to ensure that we were not missing the effects of a moderator, because we did not operationalize it optimally. First, in addition to coding the average age of participants in the study, we also coded the age range of participants, to determine whether studies focusing on a narrow age range may be better able to deliver an intervention that is developmentally appropriate. Second, with regard to participant ethnicity, we coded both the percentage of participants who were Black or Hispanic (a continuous variable), because these two groups are at particularly high risk for obesity, and the dominant ethnic group represented in the samples (a nominal variable). Third, with regard to intervention duration, we coded both the total amount of intervention hours and the total length of the intervention in weeks because these two aspects of intervention duration varied somewhat independently (the r between these two dimensions was only .50). Fourth, with regard to psychoeducational content, we coded both whether each intervention contained psychoeducational content (to stay parallel with the coding used for the other intervention content codes) and whether the intervention included only psychoeducational content, to explore the possibility that these latter types of interventions were uniquely associated with small intervention effects.

1 We did not focus on effect sizes, such as Cohen’s (1988) d, which focus on posttest mean differences across conditions without correcting for pretest mean differences. Such effect size estimates are not able to rule out the possibility that differences at baseline between the conditions, even if nonsignificant, artificially amplified or attenuated effect size estimates. This theoretically has the effect of introducing greater error variance in effect size estimates and therefore decreases power in analyses testing heterogeneity of treatment effects and moderators of treatment effects.

2 It might be noted that only 55% of the trials that did not use random assignment to condition used matching to create the groups, suggesting that the variable reflecting random assignment was not simply a surrogate for matching, which would have complicated the interpretation of the former moderator.
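
The conversions described under Effect Size Estimation Procedures can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes the common equal-group-size form of the d-to-r conversion, r = d / sqrt(d² + 4), and the r = Z / sqrt(N) estimate from an exact one-tailed p value, both usually attributed to Rosenthal (1991). The function names and input values are hypothetical, and the odds ratio conversion is omitted because the text does not reproduce that formula.

```python
import math
from statistics import NormalDist


def d_to_r(d: float) -> float:
    """Convert Cohen's d to r (assumes equal group sizes)."""
    return d / math.sqrt(d * d + 4.0)


def p_to_r(p_one_tailed: float, n_total: int, positive: bool = True) -> float:
    """Estimate r from an exact one-tailed p value via r = Z / sqrt(N).

    Z is the standard normal deviate for the p value; the sign must be
    supplied by the analyst because a p value carries no direction.
    """
    z = NormalDist().inv_cdf(1.0 - p_one_tailed)
    r = z / math.sqrt(n_total)
    return r if positive else -r


def cohen_label(r: float) -> str:
    """Label |r| using Cohen's (1988) cutoffs of .10, .30, and .50."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"


if __name__ == "__main__":
    # Hypothetical inputs, not values taken from any trial in the review.
    for d in (0.10, 0.20, 0.65):
        r = d_to_r(d)
        print(f"d = {d:.2f} -> r = {r:.3f} ({cohen_label(r)})")
    print(f"one-tailed p = .025, N = 100 -> r = {p_to_r(0.025, 100):.3f}")
```

Because the p-based estimate depends on the total sample size, the same p value implies a smaller r in a larger trial, which is one reason this route is reserved for cases where no other data are available.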
One aspect of our coding system was constrained by the distribution of a certain moderator across studies. Specifically, although we were interested in testing whether the intervention effects were significantly larger for females than males, only 33% of the trials that we located reported effect sizes separately for the sexes (and only 21% provided a direct test of whether sex moderated the intervention effects). Accordingly, we tested whether interventions offered solely to females were more effective than those offered solely to males or those offered to both sexes. We took this approach because (a) this variable emerged as a significant predictor of eating disorder prevention program effects (Stice & Shaw, 2004), (b) our initial review of the findings suggested that effects were larger for female-only trials, and (c) this allowed us to include all trials in the analyses. Because only two interventions were offered solely to males, we did not feel comfortable estimating an average effect for these two trials.

There were also a number of other potential moderators that we were unable to code because insufficient information was provided in the articles and reports. We were unable to code average attendance because only 44% of the studies reported this variable. We were unable to code the socioeconomic status of the sample because parallel information (e.g., average parental income) was reported in only 35% of studies. We were unable to code the method of handling missing data (e.g., listwise deletion [completer analysis], last observation carried forward, full information maximum likelihood estimation imputation) because less than 40% of the studies reported this information.

We used a consensus approach to coding the effect size moderators. Eric Stice and Heather Shaw were each responsible for coding certain moderators but consulted with each other when questions regarding the coding of particular studies arose. Although this approach allowed for a refinement of the coding system and served to increase interrater agreement, we did not use the consensus approach on all data points or double code all studies. Thus, we examined intercoder agreement by having Eric Stice and Heather Shaw code all of the moderators for a randomly selected 30% of the trials examined in this meta-analytic review.

Results

Descriptive Statistics

The literature search identified 46 trials that met the inclusion criteria, in which 61 different obesity prevention programs were evaluated (12 trials evaluated more than 1 prevention program, and 3 prevention programs were evaluated in 2 trials), resulting in a total of 64 effect sizes for this review. Of these 64 prevention programs, 30 were universal, and 34 were selected. The majority focused on both males and females (n = 48), but 14 focused solely on females, and 2 focused solely on males. The majority of these interventions were school-based programs (84%). A total of 51 of the 64 prevention programs used random assignment to condition, of which 13% were randomized at the participant level, 2% were randomized at the group level, and 85% were randomized at the school level. Brief descriptions of the samples, program content, and intervention effects are provided in Tables 2 and 3 for universal and selected prevention programs, respectively. Figure 1 provides a flowchart showing the number of studies that were omitted because of the various exclusionary criteria.

To assess interrater agreement between the two coders responsible for abstracting effect sizes and moderators, we calculated the interclass correlation coefficient for continuous variables and kappa (κ) coefficients for nominal variables (see Table 4). The interclass correlation coefficients ranged from a low of .95 (for the effect size estimates) to 1.0 (for 80% of the continuous variables examined in this report). The κ coefficients ranged from .87 (for whether nested data was modeled incorrectly) to 1.00 (for 75% of the nominal variables examined in this report). These analyses indicate that there was high interrater agreement.

Tables 5 and 6 report the magnitude of effect sizes and provide the participant, intervention, delivery, and design features that were investigated as potential moderators of intervention effects. The effect sizes reflect analyses performed on the entire samples used in these studies, versus effect sizes for various subgroups such as the different genders, because such subgroup analyses were not consistently reported across trials.

Average Effect Size and Effect Size Heterogeneity

Analyses were conducted on the effect size for change in BMI in the intervention condition versus the control condition. We first converted Pearson’s rs to z scores to avoid problematic standard error estimates (Hedges & Olkin, 1985). We then used the SPSS macro developed by Lipsey and Wilson (2001) to estimate the overall inverse variance weighted average effect size for random effects models. All mean values were computed with this method. The average effect size across all studies was very small (r = .04) but was significantly larger than zero (z = 2.94, p < .01). The rs for the effect sizes ranged from −.24 to .50. Only 13 of these interventions (1 of which was evaluated in two trials), or 21% of the 61 programs evaluated, found significant positive intervention effects based on an alpha level of .05 (Dwyer et al., 1983; Eliakim et al., 2000; Fitzgibbon et al., 2004; Gutin & Owens, 1999; Killen et al., 1988; Lionis et al., 1991; Manios, Moschandreas, Hatzis, & Kafatos, 2002; Robinson, 1999; Stice, Orjada, & Tristan, 2006; Stice & Ragan, 2002; Stice, Shaw, Burton, & Wade, 2006; Tamir et al., 1990). One intervention (Alexandrov, Maslennikova, Kulikovm, Propirnij, & Perova, 1992) reported a significant negative effect, which represented either a chance finding or an iatrogenic effect.

There was significant heterogeneity in effect sizes (Q = 204.41, p < .001), indicating that there was variability across the effect sizes produced by the interventions (i.e., that effects were not equivalent across trials). The heterogeneity in the effects suggests that there may be participant, intervention, delivery, and design features that account for the variability in effect sizes.

Moderator Analyses

Two moderators could not be examined because of severe restrictions in range; because only two studies used credible active control conditions, and because we located only two unpublished reports, we did not consider type of control condition or publication status3 further. Two potential confounding variables were not examined because they did not show significant relations to effect sizes: preliminary univariate analyses indicated that length of follow-up (z = 1.58, p = .11, β = 0.18) and the age range of participants in the trials (z = .80, p = .42, β = 0.10) were not significantly related to effect size magnitude. Within this context, it should be noted that preliminary analyses also indicated that

3 Even though there were only two unpublished trials included in the present meta-analysis, we confirmed that there was no evidence that the unpublished studies had significantly different effect sizes relative to published studies (z = .03, p = .82, β = 0.03).
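
The pooling steps described under Average Effect Size and Effect Size Heterogeneity (Fisher's r-to-z transformation, inverse variance weighting, the Q test, and a random effects mean) can be sketched as follows. This is an illustration under stated assumptions rather than a reimplementation of the SPSS macro: it assumes a sampling variance of 1 / (n − 3) for each transformed correlation and a method-of-moments (DerSimonian-Laird) estimate of the between-study variance, neither of which is specified in the text. The function name and the example data are hypothetical.

```python
import math


def pool_random_effects(rs, ns):
    """Pool correlations with inverse-variance weights on Fisher's z scale.

    rs: study correlations; ns: study sample sizes (each > 3).
    Returns the pooled r, its z test, the Q statistic, df, and tau^2.
    """
    zs = [math.atanh(r) for r in rs]        # Fisher r-to-z transformation
    vs = [1.0 / (n - 3) for n in ns]        # assumed sampling variance of z
    w = [1.0 / v for v in vs]               # fixed-effect weights
    sum_w = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum_w

    # Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(rs) - 1

    # Method-of-moments between-study variance, truncated at zero.
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    sum_w_re = sum(w_re)
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum_w_re
    se = math.sqrt(1.0 / sum_w_re)

    return {
        "r": math.tanh(z_re),               # back-transform to r
        "z_test": z_re / se,
        "Q": q,
        "df": df,
        "tau2": tau2,
    }


if __name__ == "__main__":
    # Hypothetical effect sizes and sample sizes, for illustration only.
    result = pool_random_effects([-0.24, 0.04, 0.50], [100, 100, 100])
    for key, value in result.items():
        print(f"{key}: {value:.4f}")
```

Q is referred to a chi-square distribution with k − 1 degrees of freedom; when it exceeds that expectation, the estimated between-study variance is positive and the random effects weights become more equal across studies.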
Table 1
Operationalization and Descriptive Statistics for Moderators
[Table body not recoverable from the extracted text. Legible column headings: Moderator, Value, Coding, Descriptive statistic. Legible moderator labels, listed under participant and intervention features: age (M, range), gender (% female), ethnicity (% Black/Hispanic, dominant ethnic group), risk status, duration (hr, weeks), parental involvement, psychoeducational content, psychoeducational content only, dietary improvement, physical activity increase, and reduced sedentary behavior. The table continues on a second page.]