Kernel equating: A framework of observed score equating



WANG Shaojie, ZHANG Minqiang, LI Tuoyu, LIANG Zhengyan

School of Psychology, South China Normal University, Guangzhou 510631, China

Received: 2019-05-12; Published: 2020-04-26; Online: 2020-03-27
Corresponding author: ZHANG Minqiang, E-mail: 2640726401@qq.com

Funding: General Project of the National Social Science Fund of China (BHA180141)


Abstract

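The kernel equating procedure has five steps: presmoothing, estimating the score probabilities, continuization, equating, and evaluating the equating results. The method combines the advantages of linear equating and equipercentile equating, and each step is flexible enough to accommodate alternative techniques. Smoothing and continuization can reduce the random error of equating. Concepts specific to the framework, such as the standard error of equating difference (SEED), provide reliable tools for evaluating results. The continuization method and the bandwidth selection method, among other factors, can affect its performance. New methods built on kernel equating offer fresh perspectives for the development of equating. Future research could extend and refine the kernel equating framework, update its procedure, and combine and compare equating methods.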
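The continuization and bandwidth-selection steps mentioned above can be illustrated with a short Python sketch. This is a minimal example under stated assumptions and does not reproduce the implementation in any kernel equating software. It uses the Gaussian kernel and takes already presmoothed score probabilities $r_j$ as given. Each score is rescaled with $a_X = \sqrt{\sigma_X^2/(\sigma_X^2 + h_X^2)}$ so that the continuized distribution keeps the mean and variance of the discrete one. The bandwidth $h_X$ is chosen by a grid search over a single penalty term, $\sum_j (r_j - f_{h_X}(x_j))^2$, often written PEN1. The function names, the grid, and the binomial example distribution are hypothetical.

```python
import math

def norm_cdf(z):
    # Standard normal CDF via math.erf (no third-party dependencies)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    # Standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def moments(scores, probs):
    # Mean and variance of a discrete score distribution
    mu = sum(x * p for x, p in zip(scores, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(scores, probs))
    return mu, var

def continuize(scores, probs, h):
    """Gaussian-kernel continuization with bandwidth h; returns (cdf, pdf).

    The factor a shrinks every score toward the mean, so the continuous
    distribution keeps the mean and variance of the discrete one.
    """
    mu, var = moments(scores, probs)
    a = math.sqrt(var / (var + h * h))

    def z(x, xj):
        return (x - a * xj - (1.0 - a) * mu) / (a * h)

    def cdf(x):
        return sum(p * norm_cdf(z(x, xj)) for xj, p in zip(scores, probs))

    def pdf(x):
        return sum(p * norm_pdf(z(x, xj)) for xj, p in zip(scores, probs)) / (a * h)

    return cdf, pdf

def pen1(scores, probs, h):
    # PEN1: squared gaps between each score probability and the density at that score
    _, pdf = continuize(scores, probs, h)
    return sum((p - pdf(xj)) ** 2 for xj, p in zip(scores, probs))

def select_bandwidth(scores, probs, grid=None):
    # Grid search over h using PEN1 only (an assumption; practical criteria may add terms)
    if grid is None:
        grid = [0.05 * k for k in range(2, 101)]  # h = 0.10, 0.15, ..., 5.00
    return min(grid, key=lambda h: pen1(scores, probs, h))

# Hypothetical (already presmoothed) score probabilities on a 0-10 test
scores = list(range(11))
probs = [math.comb(10, k) * 0.6 ** k * 0.4 ** (10 - k) for k in scores]

h_opt = select_bandwidth(scores, probs)
cdf, pdf = continuize(scores, probs, h_opt)
print(f"selected h = {h_opt:.2f}, F(5.5) = {cdf(5.5):.4f}")
```

Practical bandwidth selection in kernel equating often adds further terms to this criterion. They are omitted here to keep the sketch short, so the selected bandwidth may differ from what dedicated software reports.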


Table 1. Common CTT equating methods and their kernel equating counterparts

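Equating design	CTT equating	Kernel equating
EG	Equipercentile equating	Kernel equating (optimal bandwidth)
EG	Linear equating	Kernel equating (large bandwidth, $h_X > 10\sigma_X$; the same applies below)
NEAT	Chained equipercentile equating	Kernel chained equating (optimal bandwidth)
NEAT	Post-stratification equipercentile equating	Kernel post-stratification equating (optimal bandwidth)
NEAT	Chained linear equating	Kernel chained equating (large bandwidth)
NEAT	Tucker equating	Kernel post-stratification equating (large bandwidth, under specific conditions)
NEAT	Levine observed-score equating	-
Note. EG = equivalent-groups design; NEAT = nonequivalent groups with anchor test design; "-" = no kernel counterpart listed.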
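The correspondence in Table 1 for the EG design can be checked numerically. The sketch below is a hypothetical Python example, not the authors' procedure. It computes the kernel equating function $e_Y(x) = G_{h_Y}^{-1}(F_{h_X}(x))$ with Gaussian continuization twice: once with a small bandwidth ($h = 0.6$, fixed for illustration rather than selected by a criterion) and once with a bandwidth far above $10\sigma$. It then compares both results with classical linear equating. The two binomial score distributions for forms X and Y and all function names are assumptions.

```python
import math

def norm_cdf(z):
    # Standard normal CDF via math.erf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def moments(scores, probs):
    # Mean and standard deviation of a discrete score distribution
    mu = sum(x * p for x, p in zip(scores, probs))
    sd = math.sqrt(sum(p * (x - mu) ** 2 for x, p in zip(scores, probs)))
    return mu, sd

def kernel_cdf(x, scores, probs, h):
    # Gaussian-kernel continuized CDF; preserves the discrete mean and variance
    mu, sd = moments(scores, probs)
    a = sd / math.sqrt(sd * sd + h * h)
    return sum(p * norm_cdf((x - a * xj - (1.0 - a) * mu) / (a * h))
               for xj, p in zip(scores, probs))

def kernel_quantile(u, scores, probs, h):
    # Invert the continuized CDF by bisection (it is strictly increasing)
    mu, sd = moments(scores, probs)
    lo, hi = min(scores) - 10.0 * sd - 1.0, max(scores) + 10.0 * sd + 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, scores, probs, h) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kernel_equate(x, sx, px, hx, sy, py, hy):
    # EG design: e_Y(x) = G_hY^{-1}(F_hX(x))
    return kernel_quantile(kernel_cdf(x, sx, px, hx), sy, py, hy)

def linear_equate(x, sx, px, sy, py):
    # Classical linear equating: match means and standard deviations
    mx, dx = moments(sx, px)
    my, dy = moments(sy, py)
    return my + dy / dx * (x - mx)

# Hypothetical score distributions for forms X and Y on a 0-20 test
s = list(range(21))
px = [math.comb(20, k) * 0.55 ** k * 0.45 ** (20 - k) for k in s]
py = [math.comb(20, k) * 0.60 ** k * 0.40 ** (20 - k) for k in s]
_, sd_x = moments(s, px)
_, sd_y = moments(s, py)

for x in (5, 10, 15):
    small = kernel_equate(x, s, px, 0.6, s, py, 0.6)              # illustrative small h
    large = kernel_equate(x, s, px, 20 * sd_x, s, py, 20 * sd_y)  # h well above 10 * sd
    lin = linear_equate(x, s, px, s, py)
    print(f"x={x:2d}  small h: {small:.3f}  large h: {large:.3f}  linear: {lin:.3f}")
```

With the large bandwidth each continuized distribution is essentially normal with the original mean and variance, so the kernel result agrees with linear equating up to small numerical differences. With the small bandwidth the result follows the equipercentile relationship between the two forms instead.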


