Cardiorespiratory fitness was estimated using the Fitnessgram PACER test (98). The PACER test has been validated to estimate maximal oxygen consumption (98). A study by Morrow et al. (233) assessed the inter-rater reliability of physical education teachers and expert administrators (n = 23) in 1,010 elementary school students. Teacher/teacher reliability was reported to have 82% agreement (modified kappa = 0.64, P < 0.001) and expert/expert reliability had 96% agreement (modified kappa = 0.92, P < 0.001).
The PACER is a multistage 20-meter shuttle run that requires participants to run for as long as they can while the pace gets faster each minute. Two lines were measured 20 meters apart and marked with cones. The Fitnessgram audio application was used for this assessment. Participants placed their feet behind the starting line on the “ready” signal given by a research assistant. The research assistant then began the audio recording. At the word “start”, participants were asked to run to the other cone before the first beep sounded. At the sound of the second beep, participants turned around and ran back to the first line. Participants had to wait for the beep before they began each run. A triple beep indicated the end of a minute and notified the participants that the pace would increase. Participants continued running back and forth between the start and end lines until the end of the PACER, or until they had two misses (i.e., they were not able to reach the opposite line before the beep sounded). Research assistants crossed off lap numbers on the Physical and Fitness Data Sheet. The total score was the number of laps completed before the second miss. Scores were also dichotomized as meeting or not meeting the Healthy Fitness Zone, using Fitnessgram standards based on age and gender (98).
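The scoring rule described above (laps completed before the second miss) can be sketched in a few lines of Python. This is an illustrative sketch only, not the scoring procedure used in the study: the function name `pacer_score`, the boolean lap record, and the `count_first_miss` option are hypothetical. The text does not state whether the lap on which the first miss occurred was credited, so the sketch exposes that choice as a parameter rather than assuming one answer.

```python
def pacer_score(lap_results, count_first_miss=True):
    """Return the number of laps credited before the second miss.

    lap_results: iterable of booleans, one per lap attempted;
                 True means the line was reached before the beep.
    count_first_miss: hypothetical option, whether the lap on which the
                 first miss occurred is credited (not stated in the text).
    """
    misses = 0
    laps = 0
    for reached in lap_results:
        if not reached:
            misses += 1
            if misses == 2:
                break  # the test ends at the second miss
            if not count_first_miss:
                continue  # do not credit the first missed lap
        laps += 1
    return laps


# Ten successful laps, one miss, three more laps, then a second miss.
record = [True] * 10 + [False] + [True] * 3 + [False]
print(pacer_score(record))
print(pacer_score(record, count_first_miss=False))
```

Classifying the score against the Healthy Fitness Zone would additionally require the age- and gender-specific Fitnessgram standards (98), which are not reproduced here.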
Muscular Fitness
To assess muscular fitness, we included a full-body series of resistance exercises appropriate for the pediatric population (front squat, push-up, lunge, bent-over row, shoulder press, calf raise, and curl-up) that was used in a recent cognition study by Kao et al. (32). For each exercise, participants were asked to complete as many repetitions as possible with correct form in 30 seconds, using either a self-selected medicine ball or body weight. A strength index score was calculated for each exercise that took the medicine ball weight, body weight, and repetition number into account (i.e., strength index = [body weight + medicine ball]/number of repetitions).
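As a worked illustration of the strength index, the short Python sketch below applies the formula exactly as it is written above. The function name, the use of kilograms, and the treatment of body-weight-only exercises as a medicine ball weight of zero are assumptions made for the example; they are not taken from the study protocol.

```python
def strength_index(body_weight_kg, repetitions, medicine_ball_kg=0.0):
    """Strength index as written in the text:
    (body weight + medicine ball) / number of repetitions.

    medicine_ball_kg defaults to 0 for body-weight-only exercises
    (an assumption for this sketch).
    """
    if repetitions <= 0:
        # The formula is undefined when no repetitions were completed.
        raise ValueError("repetitions must be a positive number")
    return (body_weight_kg + medicine_ball_kg) / repetitions


# Hypothetical child: 30.0 kg body weight, 2.0 kg medicine ball, 15 repetitions.
print(round(strength_index(30.0, 15, medicine_ball_kg=2.0), 2))
```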
Covariate and Moderator Measures
Physical Measures
Physical measures included weight and height measured at baseline. Weight was measured using an electronic, portable scale (Scaletronix 5125 Model, White Plains, NY). The participant was asked to remove his or her shoes and any excess clothing. He or she was instructed to stand on the taped “X” mark on the scale, and weight was recorded to the nearest tenth of a kg. A portable stadiometer (Shorr Height Measuring Board, Olney, MD) was used to measure height. With shoes removed, the participant stood with his or her back and heels against the board, with feet together and head positioned neutrally so that the lower margin of the orbit was parallel to the floor. Height was recorded to the nearest tenth of a cm. Weight and height were each measured at least twice, in alternating order (i.e., weight measurement #1, height measurement #1, weight measurement #2, height measurement #2). A third measurement was taken if the difference between the first two readings was greater than 0.3 kg for weight and/or 0.5 cm for height. Average weight and height were used to calculate body mass index (BMI). BMI percentiles were then calculated based on birth date, measurement date, and BMI (2).
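The repeat-measurement rule and the BMI calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's data-processing code: the function names are hypothetical, and averaging all available readings (including a third, when taken) is an assumption, since the text does not specify how a third reading entered the average. Differences are rounded to one decimal place before comparison because readings were recorded to the nearest tenth and binary floating point would otherwise misjudge differences that sit exactly on the tolerance.

```python
from statistics import mean

WEIGHT_TOLERANCE_KG = 0.3  # from the text
HEIGHT_TOLERANCE_CM = 0.5  # from the text


def needs_third_reading(first, second, tolerance):
    """True if the first two readings differ by more than the tolerance."""
    # Round to the recording precision (one decimal) to avoid float artifacts.
    return round(abs(first - second), 1) > tolerance


def bmi_from_readings(weights_kg, heights_cm):
    """BMI (kg/m^2) from the mean of all readings taken (assumption)."""
    weight_kg = mean(weights_kg)
    height_m = mean(heights_cm) / 100.0
    return weight_kg / height_m ** 2


# Hypothetical readings for one child.
weights = [32.1, 32.5]
if needs_third_reading(weights[0], weights[1], WEIGHT_TOLERANCE_KG):
    weights.append(32.3)  # third reading taken
heights = [140.2, 140.4]
print(round(bmi_from_readings(weights, heights), 1))
```

Converting BMI to an age- and sex-specific percentile requires reference growth data and is not shown.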
Physical Activity
Physical activity was assessed objectively with accelerometry. At each measurement time point, a GT3X+ accelerometer, programmed to store data in 15-second epochs, was worn by each participant on his or her right hip for seven consecutive days. Non-wear time was classified using a modified version of the Choi et al. algorithm (234), in which non-wear was defined as at least 30 minutes of consecutive zero counts. The Evenson et al. (105) activity count thresholds for children were used to classify accelerometer counts by intensity and to derive continuous variables of the percentage of time spent in sedentary, light, moderate, and vigorous intensity physical activity.
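The processing steps above can be illustrated with a simplified Python sketch. It is not the software or algorithm used in the study. Two assumptions are worth flagging: first, non-wear is detected only as runs of at least 30 minutes of consecutive zero counts, without the additional windowing rules of the full Choi et al. algorithm; second, the cut points in `CUT_POINTS` are the commonly cited Evenson thresholds rescaled to 15-second epochs, given here from memory and to be checked against reference (105) before use. All function names are hypothetical.

```python
EPOCH_SECONDS = 15
NONWEAR_EPOCHS = 30 * 60 // EPOCH_SECONDS  # 30 minutes = 120 epochs

# Upper bounds in counts per 15-second epoch (assumed values, verify
# against the Evenson et al. reference); anything above is "vigorous".
CUT_POINTS = [("sedentary", 25), ("light", 573), ("moderate", 1002)]


def wear_mask(counts):
    """Return a list of booleans, False for epochs inside a non-wear run."""
    mask = [True] * len(counts)
    run_start = None
    # A non-zero sentinel closes a run of zeros that reaches the end.
    for i, count in enumerate(list(counts) + [1]):
        if count == 0:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= NONWEAR_EPOCHS:
                for j in range(run_start, i):
                    mask[j] = False
            run_start = None
    return mask


def classify(count):
    """Assign one epoch to an intensity category."""
    for label, upper in CUT_POINTS:
        if count <= upper:
            return label
    return "vigorous"


def percent_time(counts):
    """Percentage of wear-time epochs spent in each intensity category."""
    worn = [c for c, keep in zip(counts, wear_mask(counts)) if keep]
    totals = {"sedentary": 0, "light": 0, "moderate": 0, "vigorous": 0}
    for count in worn:
        totals[classify(count)] += 1
    n = len(worn)
    return {k: (100.0 * v / n if n else 0.0) for k, v in totals.items()}


# Hypothetical data: 30 minutes of non-wear followed by four worn epochs.
example = [0] * 120 + [10, 10, 100, 600]
print(percent_time(example))
```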
Demographic Measures
Parents and guardians completed an online Demographic Data Questionnaire (Appendix G) at baseline to collect information on race and ethnicity, presence of an education/learning disability, parental income, parental education, parental height and weight, and child handedness. If a parent or guardian was unable to complete this questionnaire online, a printed copy was made available to them.
Statistical Analyses
Sample Size and Power Calculation
All statistical analyses were completed in Stata (Stata 15.1, College Station, TX), with statistical significance set at P < 0.05. Previously published youth intervention studies have reported a small to moderate effect size on changes in cognition and academic performance (2). The primary aim of this study was to determine whether we could recruit and retain a sample of elementary school students, and whether the intervention was acceptable to the participants. However, for our secondary aim, a power calculation was run in Stata to estimate the sample size needed to detect a meaningful change in one of our cognitive outcomes (inhibition/attention). Using an analysis of covariance (ANCOVA) model and assuming a 0.80 correlation between baseline and post-scores, a sample size of 46 children would give us 95% confidence and 80% power to detect a moderate effect size (Cohen’s d = 0.5) in inhibition/attention. Based on attrition rates from previous school-based studies conducted in our laboratory, we anticipated a 10% loss to follow-up, so we planned to recruit 50 participants (25 per school) (216).
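For readers who wish to see how a figure of this size arises, the sketch below uses a standard large-sample approximation: the per-group sample size for a two-sample comparison is multiplied by (1 − ρ²), where ρ is the baseline-to-post correlation, to reflect the variance reduction from adjusting for baseline in ANCOVA. This is an approximation written in Python for illustration, not the Stata routine used in the study; the function name is hypothetical, and exact software output may differ slightly because it typically uses the t distribution rather than the normal distribution.

```python
from math import ceil
from statistics import NormalDist


def ancova_total_n(d, rho, alpha=0.05, power=0.80):
    """Approximate total N for a two-group ANCOVA with a baseline covariate.

    d: standardized effect size (Cohen's d)
    rho: correlation between baseline and post-scores
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = z.inv_cdf(power)
    # Per-group n for an unadjusted two-sample comparison of means.
    n_unadjusted = 2 * (z_alpha + z_beta) ** 2 / d ** 2
    # Adjusting for baseline shrinks the residual variance by (1 - rho^2).
    n_per_group = ceil(n_unadjusted * (1 - rho ** 2))
    return 2 * n_per_group


print(ancova_total_n(d=0.5, rho=0.80))  # 46
```

Under these assumptions the approximation gives 23 children per group, or 46 in total, before any allowance for loss to follow-up.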
Descriptive Statistics
Descriptive statistics were calculated for the overall sample and for each of the two groups. To assess differences between the groups at baseline, two-sample t-tests were used for continuous variables and chi-square tests were used for categorical variables.
Aims and Hypotheses