Winner of the 2013 Division D Significant Contribution to Educational Measurement and Research Methodology award
Overview.- A Statistical Perspective on Equating Test Scores (Alina A. von Davier).- Part I: Research Questions and Data Collection Designs.- Equating Test Scores: Toward Best Practices (Neil J. Dorans, Tim P. Moses, and Daniel R. Eignor).- Scoring and Scaling Educational Tests (Michael J. Kolen, Ye Tong, and Robert L. Brennan).- Statistical Models for Vertical Linking (James E. Carlson).- An Empirical Example of Change Analysis by Linking Longitudinal Item Response Data From Multiple Tests (John J. McArdle and Kevin J. Grimm).- How to Average Equating Functions, If You Must (Paul W. Holland and William E. Strawderman).- New Approaches to Equating With Small Samples (Samuel A. Livingston and Sooyeon Kim).- Part II: Measurement and Equating Models.- Using Exponential Families for Equating (Shelby J. Haberman).- An Alternative Continuization Method: The Continuized Log-Linear Method (Tianyou Wang).- Equating Through Alternative Kernels (Yi-Hsuan Lee and Alina A. von Davier).- A Bayesian Nonparametric Model for Test Equating (George Karabatsos and Stephen G. Walker).- Generalized Equating Functions for NEAT Designs (Haiwen H. Chen, Samuel A. Livingston, and Paul W. Holland).- Local Observed-Score Equating (Wim J. van der Linden).- A General Model for IRT Scale Linking and Scale Transformations (Matthias von Davier and Alina A. von Davier).- Linking With Nonparametric IRT Models (Xueli Xu, Jeff A. Douglas, and Young-Sun Lee).- Part III: Evaluation.- Applications of Asymptotic Expansion in Item Response Theory Linking (Haruhiko Ogasawara).- Evaluating the Missing Data Assumptions of the Chain and Poststratification Equating Methods (Sandip Sinharay, Paul W. Holland, and Alina A. von Davier).- Robustness of IRT Observed-Score Equating (C. A. W. Glas and Anton A. Beguin).- Hypothesis Testing of Equating Differences in the Kernel Equating Framework (Frank Rijmen, Yanxuan Qu, and Alina A. von Davier).- Applying Time-Series Analysis to Detect Scale Drift (Deping Li, Shuhong Li, and Alina A. von Davier).
The goal of this book is to emphasize the formal statistical features of the practice of equating, linking, and scaling. The book encourages viewing equating from a statistical perspective and discusses the quality of equating results in those terms (new models, robustness, fit, testing hypotheses, statistical monitoring), as opposed to placing the focus on policy and its implications, which, although very important, represent a different side of the equating practice. The book contributes to establishing "equating" as a theoretical field, a view that has not often been offered before. The tradition in the practice of equating has been to present the knowledge and skills needed as a craft, which implies that only with years of experience under the guidance of a knowledgeable practitioner could one acquire the required skills. This book challenges this view by indicating how a good equating framework, a sound understanding of the assumptions that underlie the psychometric models, and the use of statistical tests and statistical process control tools can help the practitioner navigate the difficult decisions in choosing the final equating function.
This book provides a valuable reference for several groups: (a) statisticians and psychometricians interested in the theory behind equating methods, in the use of model-based statistical methods for data smoothing, and in the evaluation of the equating results in applied work; (b) practitioners who need to equate tests, including those with these responsibilities in testing companies, state testing agencies, and school districts; and (c) instructors in psychometric, measurement, and psychology programs.