
Change point analysis: A new method to detect aberrant responses in psychological and educational testing

ZHANG Longfei, WANG Xiaowen, CAI Yan, TU Dongbo
School of Psychology, Jiangxi Normal University, Nanchang 330022, China
Received: 2019-10-12; Online: 2020-09-15; Published: 2020-07-24
Corresponding author: TU Dongbo, E-mail: tudongbo@aliyun.com
Funding: National Natural Science Foundation of China (31960186, 31760288, 31660278)






Abstract


Change point analysis (CPA) has only recently been introduced into psychological and educational measurement. Compared with traditional methods, CPA not only detects examinees with aberrant responses but also locates the change point automatically and precisely, so that response data can be cleaned efficiently. Its rationale is to test whether the response sequence contains a point (the change point) that divides it into two segments with different statistical properties, using a person-fit statistic (PFS) to quantify the difference between the two subsequences. Future work could extend single-change-point analysis to multiple change points, incorporate auxiliary information such as response times, construct nonparametric indices, and generalize existing indices to polytomously scored or multidimensional tests, so as to broaden the applicability and improve the power of CPA.
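To make the abstract's description concrete, below is a minimal sketch (in Python) of single-change-point detection for one examinee with a likelihood-ratio person-fit statistic: the ability is estimated separately on the two subsequences created by each candidate change point, and the point with the largest likelihood-ratio statistic is taken as the estimated change point. The Rasch model, the grid-search ability estimation, and the simulated item difficulties and responses are assumptions made for illustration, not the article's implementation or data.

```python
# A minimal sketch (not the article's implementation): single-change-point
# detection for one examinee with a likelihood-ratio person-fit statistic
# under a Rasch model with known item difficulties. All numbers below
# (difficulties, abilities, test length) are made up for illustration.
import numpy as np

def rasch_loglik(theta, x, b):
    """Log-likelihood of 0/1 responses x given ability theta and difficulties b."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return float(np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)))

def mle_theta(x, b, grid=np.linspace(-4, 4, 161)):
    """Crude grid-search maximum-likelihood estimate of theta (enough for a sketch)."""
    return float(grid[np.argmax([rasch_loglik(t, x, b) for t in grid])])

def cpa_lmax(x, b):
    """Return (L_max, k_hat): the largest likelihood-ratio statistic over all
    candidate change points k (sequence split after item k) and its location."""
    n = len(x)
    ll_full = rasch_loglik(mle_theta(x, b), x, b)          # one ability for the whole test
    stats = []
    for k in range(1, n):                                  # candidate change points
        ll_split = (rasch_loglik(mle_theta(x[:k], b[:k]), x[:k], b[:k]) +
                    rasch_loglik(mle_theta(x[k:], b[k:]), x[k:], b[k:]))
        stats.append(2.0 * (ll_split - ll_full))           # LR statistic at split k
    k_hat = int(np.argmax(stats)) + 1
    return max(stats), k_hat

# Toy example: normal responding on the first 25 items, random guessing afterwards.
rng = np.random.default_rng(0)
b = rng.normal(size=40)                                    # hypothetical item difficulties
p_normal = 1.0 / (1.0 + np.exp(-(1.0 - b[:25])))           # true theta = 1.0 before the change
x = np.concatenate([rng.binomial(1, p_normal), rng.binomial(1, 0.5, 15)])
l_max, k_hat = cpa_lmax(x, b)
print(f"L_max = {l_max:.2f}, estimated change point after item {k_hat}")
```

In practice the observed $L_{\max}$ would be compared with a critical value obtained analytically or by simulation, and, as Table 1 below notes, estimated change points that fall on the very first or last items should be treated with caution.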



Figure 1. CUSUM charts for three examinees.


Table 1. A comprehensive comparison of CUSUM and CPA

Core idea
  CUSUM: Cumulatively sums, item by item in administration order, the residuals between observed and expected item scores.
  CPA: Searches for a point that divides the response sequence into two segments with different statistical properties.

Person-fit statistics (PFS)
  CUSUM: The one-sided indices $C_{j}^{+}$ and $C_{j}^{-}$ and the two-sided index $C^{T}$, based on averaged weighted item residuals.
  CPA: The two-sided indices $L_{\max}$ (likelihood-ratio test), $W_{\max}$ (Wald test), $S_{\max}$ (score test), and $R_{\max}$ (weighted residuals), together with their one-sided forms.

One-sided vs. two-sided indices (both methods)
  Use a one-sided index when the target effect is specified before detection; use a two-sided index when the target effect is unspecified or no particular direction is required.

Advantages
  CUSUM: Produces a chart that can be used for process monitoring.
  CPA: Locates the change point automatically and precisely.

Disadvantages
  CUSUM: The change point must be located by visually inspecting the chart, which is less accurate.
  CPA: A change point falling on the first or last few items of the sequence is hard to locate.

Applicable situations
  CUSUM: Model parameters before and after the change point are known.
  CPA: Model parameters before and after the change point are unknown. $L_{\max}$, $W_{\max}$, and $S_{\max}$ are suited to high-stakes (educational) tests, whereas $R_{\max}$ is suited to low-stakes (psychological) tests.
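For contrast with the CPA column above, here is a similarly minimal sketch of the CUSUM idea from the left column of Table 1: residuals between observed and expected item scores are accumulated in administration order, the extrema of the upper and lower cumulative sums serve as one-sided summaries, and the larger of their absolute values as a two-sided summary. The Rasch expected scores, the scaling by test length, and the exact form of the statistics are simplifying assumptions; the cited $C_{j}^{+}$, $C_{j}^{-}$, and $C^{T}$ indices use specific weightings not reproduced here.

```python
# A simplified sketch of the CUSUM column in Table 1 (not the exact C_j^+ / C_j^- / C^T
# definitions from the literature): accumulate observed-minus-expected residuals in
# item-administration order and track the upper and lower extrema.
import numpy as np

def cusum_person_fit(x, b, theta):
    """One-sided (upper, lower) and two-sided CUSUM summaries for 0/1 responses x."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))   # expected item scores at ability theta
    resid = (x - p) / len(x)                 # per-item residuals, scaled by test length
    upper = lower = 0.0                      # cumulative sums reset at zero
    c_plus = c_minus = 0.0                   # running extrema = the one-sided statistics
    for r in resid:                          # accumulate in item-administration order
        upper = max(0.0, upper + r)          # drifts up when observed > expected
        lower = min(0.0, lower + r)          # drifts down when observed < expected
        c_plus, c_minus = max(c_plus, upper), min(c_minus, lower)
    return c_plus, c_minus, max(c_plus, abs(c_minus))

# Example: an examinee who answers far fewer items correctly than expected
# late in the test, so the lower cumulative sum drifts downward (cf. Figure 1).
rng = np.random.default_rng(1)
b = rng.normal(size=40)                      # hypothetical item difficulties
theta = 0.5                                  # ability assumed known, as in the CUSUM setting
p_true = 1.0 / (1.0 + np.exp(-(theta - b)))
x = np.concatenate([rng.binomial(1, p_true[:25]), np.zeros(15, dtype=int)])
print(cusum_person_fit(x, b, theta))
```

Unlike the CPA statistics, these summaries only flag the examinee; locating the change point still requires inspecting the plotted cumulative path (as in Figure 1), which is the disadvantage listed in Table 1.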

Full-text PDF download:

http://journal.psych.ac.cn/xlkxjz/CN/article/downloadArticleFile.do?attachType=PDF&id=5146