
Kernel equating: A framework of observed score equating
WANG Shaojie, ZHANG Minqiang
School of Psychology, South China Normal University, Guangzhou 510631, China
Received: 2019-05-12; Publication date: 2020-04-26; Release date: 2020-03-27
Corresponding author: ZHANG Minqiang, E-mail: 2640726401@qq.com
Funding: General Project of the National Social Science Fund of China (BHA180141)
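Abstract: The kernel equating procedure consists of five steps: presmoothing, estimating the score probabilities, continuization, equating, and evaluating the equating results. The method combines the advantages of linear and equipercentile equating, and each step is highly extensible and accommodating. Because it smooths and continuizes the score distributions, it can reduce the random error of equating, and concepts specific to the framework, such as the standard error of equating difference (SEED), provide reliable tools for evaluating the results. Its performance depends on factors such as the continuization method and the bandwidth selection method, and new methods built on kernel equating offer fresh perspectives for the development of equating. Future research could focus on extending and refining the kernel equating framework, updating its procedure, and combining and comparing equating methods.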
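The continuization and equating steps listed above can be sketched in Python for an equivalent-groups (EG) design. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the Gaussian kernel, skips presmoothing by treating raw relative frequencies as the score probabilities, fixes the bandwidth at a hypothetical `h = 0.6` instead of selecting it with a penalty function, and omits the evaluation step (SEE/SEED). The frequencies and all function names are invented for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def moments(scores, probs):
    """Mean and variance of a discrete score distribution."""
    mu = sum(x * p for x, p in zip(scores, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(scores, probs))
    return mu, var

def continuized_cdf(x, scores, probs, h):
    """Gaussian-kernel continuized CDF F_h(x).

    The shrink factor a = sqrt(var / (var + h^2)) makes the continuous
    distribution keep the mean and variance of the discrete one.
    """
    mu, var = moments(scores, probs)
    a = math.sqrt(var / (var + h * h))
    return sum(p * normal_cdf((x - a * xj - (1.0 - a) * mu) / (a * h))
               for xj, p in zip(scores, probs))

def inverse_cdf(u, scores, probs, h, tol=1e-10):
    """Solve F_h(y) = u by bisection; F_h is continuous and strictly increasing."""
    lo, hi = min(scores) - 1.0, max(scores) + 1.0
    while continuized_cdf(lo, scores, probs, h) > u:  # widen bracket downward
        lo -= hi - lo
    while continuized_cdf(hi, scores, probs, h) < u:  # widen bracket upward
        hi += hi - lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if continuized_cdf(mid, scores, probs, h) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kernel_equate(x, scores_x, r, h_x, scores_y, s, h_y):
    """Hypothetical helper: e_Y(x) = G_hY^{-1}(F_hX(x)) in an EG design."""
    return inverse_cdf(continuized_cdf(x, scores_x, r, h_x), scores_y, s, h_y)

# Hypothetical frequencies for two 0-5 point forms; raw relative
# frequencies stand in for presmoothed score probabilities r and s.
scores = list(range(6))
freq_x = [2, 5, 9, 12, 8, 4]
freq_y = [1, 4, 8, 13, 10, 4]
r = [f / sum(freq_x) for f in freq_x]
s = [f / sum(freq_y) for f in freq_y]

h = 0.6  # fixed bandwidth, chosen for illustration only
for x in scores:
    print(x, round(kernel_equate(x, scores, r, h, scores, s, h), 3))
```

Equating a form to itself with the same bandwidth returns each score unchanged, which is a quick sanity check. Replacing the fixed `h` with a data-driven bandwidth, or the raw frequencies with log-linear presmoothed ones, would bring the sketch closer to the full procedure.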
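Table 1 Correspondence between common CTT equating methods and kernel equating methods
Equating design | CTT equating | Kernel equating |
---|---|---|
EG | Equipercentile equating | Kernel equating (optimal bandwidth) |
EG | Linear equating | Kernel equating (large bandwidth, $h_X > 10\sigma_X$; likewise below) |
NEAT | Chained equipercentile equating | Kernel chain equating (optimal bandwidth) |
NEAT | Post-stratification equipercentile equating (frequency estimation) | Kernel post-stratification equating (optimal bandwidth) |
NEAT | Chained linear equating | Kernel chain equating (large bandwidth) |
NEAT | Tucker equating | Kernel post-stratification equating (large bandwidth, under specific conditions) |
NEAT | Levine observed-score equating | - |
Note. CTT = classical test theory; EG = equivalent-groups design; NEAT = nonequivalent groups with anchor test design.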
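Why the large-bandwidth rows in Table 1 reduce to linear equating can be seen from the Gaussian-kernel continuization of von Davier, Holland and Thayer (2004). The sketch below assumes $r_j = P(X = x_j)$, $V \sim N(0, 1)$ independent of $X$, $\Phi$ the standard normal CDF, and $G_{h_Y}$ the continuized CDF of $Y$ built the same way; because $\operatorname{Var}[X(h_X)] = a_X^2(\sigma_X^2 + h_X^2) = \sigma_X^2$, the mean and variance are preserved for every bandwidth, and only the shape tends to normal as $h_X$ grows.

$$
\begin{aligned}
X(h_X) &= a_X\,(X + h_X V) + (1 - a_X)\,\mu_X, \qquad a_X = \sqrt{\frac{\sigma_X^2}{\sigma_X^2 + h_X^2}},\\
F_{h_X}(x) &= \sum_j r_j\,\Phi\!\left(\frac{x - a_X x_j - (1 - a_X)\mu_X}{a_X h_X}\right)
 = \sum_j r_j\,\Phi\!\left(\frac{x - \mu_X}{a_X h_X} - \frac{x_j - \mu_X}{h_X}\right),\\
&\text{as } h_X \to \infty:\quad a_X \to 0,\;\; a_X h_X \to \sigma_X
 \;\Rightarrow\; F_{h_X}(x) \to \Phi\!\left(\frac{x - \mu_X}{\sigma_X}\right),\\
e_Y(x) &= G_{h_Y}^{-1}\big(F_{h_X}(x)\big) \;\to\; \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)
 \quad (\text{with } h_Y \to \infty \text{ as well}).
\end{aligned}
$$

The limit is exactly the linear equating function, so a bandwidth that is large relative to $\sigma_X$, such as the $h_X > 10\sigma_X$ used in Table 1, makes kernel equating behave practically like linear equating.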
Full-text PDF download:
http://journal.psych.ac.cn/xlkxjz/CN/article/downloadArticleFile.do?attachType=PDF&id=5044