Evaluation of the Consensus Process in Decision-Making for Submitted Abstracts to Annual Meetings of the Orthopaedic Trauma Association

Session I - Basic Science

Thurs., 10/9/03 Paper #1, 1:10 PM

Paul Tornetta, III, MD¹; Mohit Bhandari, MD²; David C. Templeman, MD³

¹Boston Medical Center, Boston, Massachusetts, USA;
²Department of Clinical Epidemiology and Biostatistics, McMaster Health Sciences, Hamilton, Ontario, Canada;
³Hennepin Medical Center, Minneapolis, Minnesota, USA

Purpose: Only a small proportion of the abstracts submitted to the annual meeting of the Orthopaedic Trauma Association (OTA) can be accepted for podium presentation. Annual program committee members must ensure that the selection of abstracts is free from bias and transparent to investigators. No previous studies have evaluated the abstract selection process at an orthopaedic subspecialty meeting. A critical examination of the current selection process will inform members of this issue and provide a framework for ongoing quality assurance at orthopaedic conferences. We examined the consistency of reviewers in grading abstracts submitted for podium presentations at the 2001 and 2002 Annual Meetings of the OTA and evaluated whether the grades of the actual podium presentations at the meeting were consistent with the grades based upon abstracts only.

Methods: Reviewers (2001, N = 8; 2002, N = 9) independently graded, in a blinded manner, all abstracts submitted to the OTA for presentation (2001, N = 440; 2002, N = 420). Abstracts submitted by members of the review panel were independently adjudicated by six reviewers who were not members of the committee. Prior to final decision making, all reviewers met to discuss the abstracts submitted for oral presentation. Discussion of each abstract varied with the variability of its scores and continued until consensus regarding the pre-meeting ranking of papers was achieved. During the meeting, unblinded reviewers independently re-graded the podium presentations (1 to 5, with 5 being the highest quality). Intra-class correlation coefficients (ICC) were used as a measure of agreement among reviewers' grades before and after the oral presentations. Additionally, we examined the change in ranking of papers before and after their presentation at the meeting. Pearson correlations were used to examine the associations between pre- and post-meeting ranks.
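For readers unfamiliar with the two statistics named above, the following is a minimal sketch of how an ICC and a Pearson correlation can be computed. It rests on assumptions: the abstract does not state which ICC form was used, so the sketch assumes the two-way random-effects, single-rater form commonly written ICC(2,1). The function names, the grade matrix, and the rank lists are hypothetical illustrations and are not data from the study.

```python
from math import sqrt


def icc_2_1(scores):
    """Two-way random-effects, single-rater ICC, i.e. ICC(2,1).

    ASSUMPTION: the abstract does not say which ICC form was used.
    scores[i][j] is the grade reviewer j gave to abstract i.
    """
    n = len(scores)       # number of abstracts (rows)
    k = len(scores[0])    # number of reviewers (columns)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into abstracts, reviewers, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-abstract mean square
    msc = ss_cols / (k - 1)               # between-reviewer mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL grades (1 to 5) for five abstracts by four reviewers.
grades = [
    [4, 3, 5, 2],
    [2, 3, 1, 4],
    [5, 4, 4, 3],
    [1, 2, 3, 2],
    [3, 5, 2, 4],
]
print("ICC(2,1):", round(icc_2_1(grades), 2))

# HYPOTHETICAL pre- and post-meeting ranks for six papers.
pre_rank = [1, 2, 3, 4, 5, 6]
post_rank = [3, 1, 6, 2, 4, 5]
print("Pearson r:", round(pearson_r(pre_rank, post_rank), 2))
```

An ICC near 1 means reviewers grade the same abstracts alike, while values near 0 mean that most of the variation in grades comes from the reviewers rather than from the abstracts.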

Results: Among the 440 abstracts reviewed in 2001 and the 420 abstracts in 2002, the inter-reviewer reliability for abstract review was 0.23 (95% CI = 0.19 to 0.27) and 0.27 (95% CI = 0.22 to 0.32), respectively. Agreement on the abstracts selected for presentation was no better than for those that were not. Despite disagreements about the quality of the abstracts, reviewers achieved consensus by discussions in a face-to-face meeting to determine the program. Unblinded review of the 67 and 73 podium presentations during the meetings did not improve agreement: ICC = 0.22 (95% CI = 0.12 to 0.36). The correlation between pre- and post-meeting ranking of the papers was r = 0.25. Of the 2002 papers that ultimately ranked in the top 20 after the full presentation of the papers, 15 papers had originally been ranked below 20 in the initial grading (range of rankings, 1 to 62). Only one of the top three papers of the meeting was originally ranked in the top three prior to the meeting (range of rankings, 1 to 24).

Discussion and Conclusion: We report the following: 1) reviewers, when independently adjudicating abstracts in a blinded manner, have low inter-observer agreement in scores; 2) the ranking of papers based upon pre-meeting abstract review is poorly correlated with the final ranking after podium presentations have occurred; and 3) the process of achieving consensus among reviewers is the most desirable method for limiting bias and improving the abstract selection process. Program committees should consider adding a "consensus meeting" to rank all articles in light of our findings that reviewers independently achieve poor agreement. Our finding that correlation between pre- and post-meeting ranks was poor further supports the influence of podium presentations on final ranking.