Session V - Fracture Healing


Friday, October 13, 2000 Session V, Paper #28, 8:44 am

Type II Error Rates (Beta Errors) of Randomized Trials in Orthopaedic Trauma

Paul Tornetta, III, MD; Heather Locher, MSc; Mohit Bhandari, MD, MSc; Boston Medical Center, Boston, MA

Introduction: Although there is agreement that randomized trials are the best design for assessing treatment effectiveness, they are subject to beta (Type II) error if their sample size is insufficient. Beta (Type II) error is the probability of concluding that no difference exists between treatment groups when, in fact, there is a difference. For example, a trial may find "no difference" in infection rates between methods of treatment simply because its sample is too small to detect a difference that actually exists. By convention, the scientific community accepts a beta error rate of 20% (β = 0.20), which corresponds to a study power of 80%. Most investigators agree that studies with beta error rates greater than 20% (study power less than 80%) carry an unacceptably high risk of false-negative results. Therefore, while a randomized trial may limit bias through randomization or even blinding, its results are subject to error when its sample size is not sufficiently large. Underpowered studies may lead to the incorrect conclusion that 2 techniques have the same outcome or complication rate and thus mislead the reader.
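To make the infection-rate example concrete, the following is a minimal sketch of a sample-size estimate for comparing two proportions, using the common normal approximation. The function name `patients_per_group` and the infection rates of 10% and 5% are hypothetical values chosen for illustration; they are not taken from the reviewed trials, and the formula is not necessarily the one the authors used.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect p1 vs p2.

    Normal-approximation formula for two independent proportions with
    equal group sizes and a two-sided test (an assumed, textbook method).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical infection rates: 10% with one treatment, 5% with the other.
print(patients_per_group(0.10, 0.05))
```

Under these assumptions the sketch calls for roughly 430 patients per group, which illustrates how a small trial can report "no difference" even when a clinically relevant difference exists.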

Purpose: Our purpose was twofold: 1) to evaluate beta (Type II) error rates and study power (1 − β) for primary and secondary outcomes of published randomized trials involving fracture care from 1968 to 1999, and 2) to compare the relative study power of those trials that used dichotomous versus continuous variables as their primary endpoints.

Materials and Methods: Eligibility Criteria: Included studies were required to 1) be published, 2) be described as randomized trials, 3) involve the care of patients with fractures, either operative or conservative, and 4) contain sufficient outcome information to calculate study power. Search Strategy: Computer database searches (Medline, Pubmed, Cochrane) were performed independently by 2 investigators to identify all potentially relevant study titles. Additional strategies to identify articles included 1) hand searches of the Journal of Orthopaedic Trauma, Clinical Orthopaedics and Related Research, Acta Orthopaedica Scandinavica and the Journal of Trauma from 1968 to 1999 and 2) bibliography searches of those potentially relevant articles. Data Abstraction: Baseline information obtained from each eligible study included date of publication, journal, geographic location, number of centers, number of patients, therapies compared and outcome measures. Important outcome measures were both dichotomous (e.g., % re-operation, % nonunion, % implant failure, % infection, % mortality) and continuous (e.g., time to union, functional outcome, patient satisfaction, range of motion). Study Power: For each study, a standard power calculation was performed on the primary and secondary outcomes. As an example, for continuous outcome variables, we required Zα = 1.96 (two-sided α = 0.05), the study sample size, an estimate of the pooled standard deviation of the population, and the observed difference in effect. Similar methodology was used for dichotomous variables. Acceptable study power was agreed a priori to be 80% or greater (Type II error ≤ 0.20). Validity of Power Calculations: To ensure accuracy, 2 investigators independently performed power calculations on a random sample of 30 articles. The remaining power calculations were not performed until 100% agreement was obtained. Inconsistencies between study results and power calculations were also examined independently by 2 investigators as a final check for validity.
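The power calculation described above can be sketched as follows. This is an illustrative normal-approximation version that takes the same inputs the abstract lists (Zα = 1.96, sample size, pooled standard deviation, observed difference); the abstract does not give the exact formula used, so the function names, the assumption of equal group sizes, and the example numbers are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Critical value for a two-sided test at alpha = 0.05 (about 1.96).
Z_ALPHA = NormalDist().inv_cdf(0.975)


def power_continuous(diff, pooled_sd, n_per_group):
    """Approximate power to detect a difference in means (equal groups assumed)."""
    se = pooled_sd * sqrt(2 / n_per_group)  # standard error of the difference
    return NormalDist().cdf(abs(diff) / se - Z_ALPHA)


def power_dichotomous(p1, p2, n_per_group):
    """Approximate power to detect a difference in proportions (equal groups assumed).

    Assumes at least one proportion is strictly between 0 and 1,
    otherwise the standard error is zero.
    """
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return NormalDist().cdf(abs(p1 - p2) / se - Z_ALPHA)


# Hypothetical inputs: a 2-week difference in time to union with a pooled
# SD of 6 weeks, and infection rates of 10% vs 5%, in modestly sized trials.
print(f"continuous:  {power_continuous(2.0, 6.0, 45):.2f}")
print(f"dichotomous: {power_dichotomous(0.10, 0.05, 50):.2f}")
```

Both hypothetical trials fall well short of the 80% threshold. The approximation ignores the negligible chance of rejecting in the wrong direction, which is a common simplification.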

Results: Literature Search: We identified 620 potentially relevant citations from Medline, of which only 187 studies were eligible. Two additional articles were found in the Pubmed search. Hand searches identified another 7 articles. Thus, a total of 196 randomized trials in orthopaedic trauma were included in the power analysis. Study Characteristics: A total of 18,498 patients were randomized in 32 different journal articles. The mean size of the trials was 95 patients (S.D. = 79, range = 10-607). The largest shares of trials were performed in North America (19.2%) and the United Kingdom (18.8%), and trials typically involved a single center (91.3%). Power Analysis: Of the 196 manuscripts reviewed, 90 reported no significant difference between their study groups; 74 reported significant differences, such that power analysis was not relevant; and 32 did not provide enough information to be evaluated. In the 90 articles that had sufficient data available and reported no significant difference, 269 end points were identified: 143 primary and 126 secondary. Only 5 of the 90 manuscripts reporting no significant difference included any information related to a power analysis. The average power for dichotomous end points was only 26% and for continuous end points was 33%. Power for primary versus secondary outcomes was 26% vs 25% for dichotomous end points and 46% vs 21% for continuous end points. Beta Error Rate: Of the trials not reaching statistical significance, 91% of those with dichotomous end points and 85% of those with continuous end points were subject to beta error (Table 1).
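One way to put the reported mean trial size in context is to ask what standardized difference a trial of that size could detect with 80% power. The sketch below is an illustration only: it assumes the 95 patients are split into two roughly equal groups of 47, uses a normal approximation, and the function name `detectable_effect_size` is hypothetical. None of this is part of the authors' analysis.

```python
from math import sqrt
from statistics import NormalDist


def detectable_effect_size(n_per_group, alpha=0.05, power=0.80):
    """Smallest difference in means, in units of the pooled SD, that a
    two-group trial can detect at the given two-sided alpha and power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(2 / n_per_group)


# Assumed equal allocation of the mean trial size (95 patients) into 2 groups.
print(round(detectable_effect_size(47), 2))
```

Under these assumptions, a trial of average size would reach 80% power only for differences of more than about half a standard deviation, so smaller true differences would usually go undetected.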

Discussion and Conclusions: The beta (Type II) error rate for randomized trials in orthopaedic trauma is exceedingly high, averaging 90%. Only 5 of 90 (5.6%) studies that did not document a statistically significant difference in their outcome variables even examined whether the study was sufficiently powered to support its conclusions. Power was slightly higher for primary outcomes using continuous variables than for the other outcomes measured; however, the sample size in this group was only 27 studies. These findings indicate that the vast majority of randomized trials in orthopaedic trauma are not of sufficient sample size to draw accurate conclusions. Thus, when a surgeon reads papers that conclude that there is no difference between treatment methods, he or she must be highly skeptical to avoid drawing incorrect conclusions. Furthermore, it is incumbent upon journal reviewers to insist that a power analysis be included in all studies reporting no difference between treatment groups.