
Make adaptive testing know examinees better: The item selection strategies based on recommender systems

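WANG Pujue1, LIU Hongyun1,2
1. Faculty of Psychology, Beijing Normal University
2. Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing 100875
Received: 2018-06-10; Publication date: 2019-08-21; Release date: 2019-07-24
Corresponding author: LIU Hongyun, E-mail: hyliu@bnu.edu.cn

Funding: National Natural Science Foundation of China (31571152); Joint Construction Project of Beijing Municipality and Central Universities in Beijing (019-105812); National Education Examination Research Plan, 2017 project (GJK2017015)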

Abstract


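Drawing on the idea of collaborative filtering in recommender systems, we propose two CAT item selection strategies that make use of data from previous examinees: direct examinee-based recommendation (DEBR) and indirect examinee-based recommendation (IEBR). In two simulation studies, using different item banks and tests of different lengths, we compared the two recommendation strategies with two traditional item selection strategies (FMI and BAS) in terms of measurement precision and item exposure control, and examined the factors that affect the performance of the recommendation strategies. The results show that both recommendation strategies controlled item exposure better than the two traditional strategies, with measurement precision no worse than that of BAS. Of the two, DEBR leans toward measurement precision, whereas IEBR gives the best item exposure control. The characteristics and quality of the existing examinee data are the main factors affecting the performance of the recommendation strategies.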
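The abstract builds on user-based collaborative filtering: predict how the current examinee would respond to an item from the responses of previous examinees who answered similarly. The sketch below illustrates only that generic idea. It is not the DEBR or IEBR procedure, whose details are not given here. The similarity measure (simple matching on commonly answered items), the neighborhood size `k`, the selection rule (pick the unadministered item whose predicted probability of a correct answer is closest to 0.5), and all function names are hypothetical choices made for illustration.

```python
import math
from typing import Dict, Iterable, List, Optional

# A response record maps item IDs to 1 (correct) or 0 (incorrect);
# items the examinee never saw are simply absent.
Responses = Dict[str, int]


def similarity(a: Responses, b: Responses) -> float:
    """Share of agreeing responses on the items both examinees answered.

    Hypothetical choice: simple matching; cosine or Pearson similarity
    are common alternatives in user-based collaborative filtering.
    """
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    agree = sum(1 for item in common if a[item] == b[item])
    return agree / len(common)


def predict_correct(current: Responses, history: List[Responses],
                    item: str, k: int = 5) -> Optional[float]:
    """Similarity-weighted mean response of the k most similar previous
    examinees who answered `item`; None if no neighbor is informative."""
    answered = [h for h in history if item in h]
    scored = sorted(((similarity(current, h), h) for h in answered),
                    key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(w for w, _ in scored)
    if total == 0:
        return None  # cold start: nothing to borrow from previous examinees
    return sum(w * h[item] for w, h in scored) / total


def select_next_item(current: Responses, history: List[Responses],
                     bank: Iterable[str], k: int = 5) -> Optional[str]:
    """Hypothetical selection rule: among unadministered items, pick the one
    whose predicted probability of a correct answer is closest to 0.5."""
    best_item, best_gap = None, math.inf
    for item in bank:
        if item in current:
            continue  # already administered to this examinee
        p = predict_correct(current, history, item, k)
        if p is None:
            continue
        gap = abs(p - 0.5)
        if gap < best_gap:
            best_item, best_gap = item, gap
    return best_item


history = [
    {"i1": 1, "i2": 1, "i3": 0, "i4": 1},
    {"i1": 1, "i2": 0, "i3": 0, "i4": 0},
    {"i1": 0, "i2": 0, "i3": 1, "i4": 1},
]
current = {"i1": 1, "i2": 1}
print(select_next_item(current, history, ["i1", "i2", "i3", "i4"]))  # i4
```

Here the first previous examinee matches the current one perfectly, so their responses carry the most weight: the predicted probability is 0 for `i3` and 2/3 for `i4`, so `i4` is chosen. With an empty response record every similarity is 0, which shows the cold-start problem that any examinee-based strategy has to handle.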


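Table 1. Performance of the item selection strategies with the simulated item bank
| Test length (items) | Item selection strategy | MSE | MAE | Ability estimate correlation | χ² | Test overlap rate | Underexposed items | Overexposed items | Examinee call rate |
|---|---|---|---|---|---|---|---|---|---|
| 20 | Random selection | 0.323 | 0.449 | 0.829 | 2.595 | 5.56% | 0 | 0 | – |
| 20 | FMI | 0.090 | 0.234 | 0.954 | 127.852 | 40.80% | 315 | 41 | – |
| 20 | DEBR (FMI) | 0.141 | 0.291 | 0.930 | 66.341 | 21.83% | 22 | 29 | 14.12% |
| 20 | IEBR (FMI) | 0.242 | 0.383 | 0.872 | 8.712 | 7.09% | 1 | 2 | 2.53% |
| 20 | BAS | 0.224 | 0.370 | 0.882 | 14.164 | 9.00% | 46 | 6 | – |
| 20 | DEBR (BAS) | 0.217 | 0.365 | 0.884 | 11.246 | 8.25% | 44 | 4 | 4.25% |
| 20 | IEBR (BAS) | 0.222 | 0.369 | 0.882 | 11.187 | 8.15% | 42 | 4 | 4.66% |
| 40 | Random selection | 0.198 | 0.354 | 0.890 | 4.572 | 11.05% | 0 | 0 | – |
| 40 | FMI | 0.052 | 0.178 | 0.974 | 118.335 | 45.72% | 240 | 80 | – |
| 40 | DEBR (FMI) | 0.089 | 0.228 | 0.956 | 95.045 | 34.38% | 37 | 78 | 19.77% |
| 40 | IEBR (FMI) | 0.126 | 0.277 | 0.937 | 7.571 | 11.80% | 0 | 15 | 5.19% |
| 40 | BAS | 0.126 | 0.278 | 0.932 | 18.962 | 15.03% | 14 | 36 | – |
| 40 | DEBR (BAS) | 0.125 | 0.276 | 0.933 | 15.930 | 14.27% | 13 | 27 | 6.98% |
| 40 | IEBR (BAS) | 0.128 | 0.280 | 0.931 | 12.012 | 13.25% | 14 | 17 | 7.22% |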
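The precision and exposure columns in the tables can be computed from simulation output roughly as follows. This is a sketch under stated assumptions. Ability estimates and true abilities are plain lists. Each examinee's administered items form a set of item indices in a fixed-length test. The χ² column uses the index Σ_j (r_j − L/n)² / (L/n) that is common in the CAT literature, where r_j is the exposure rate of item j, L is the test length and n is the bank size. Test overlap is the mean proportion of items shared across all pairs of examinees. The paper's under- and over-exposure thresholds are not stated in this excerpt, so `low` and `high` must be supplied by the caller. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def mse(est: Sequence[float], true: Sequence[float]) -> float:
    """Mean squared error of ability estimates."""
    return sum((e - t) ** 2 for e, t in zip(est, true)) / len(true)


def mae(est: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error of ability estimates."""
    return sum(abs(e - t) for e, t in zip(est, true)) / len(true)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between estimated and true abilities."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def exposure_rates(tests: List[Set[int]], bank_size: int) -> List[float]:
    """Proportion of examinees who saw each item (items indexed 0..n-1)."""
    counts = [0] * bank_size
    for administered in tests:
        for j in administered:
            counts[j] += 1
    return [c / len(tests) for c in counts]


def chi_square_exposure(rates: Sequence[float], test_length: int) -> float:
    """Sum over items of (r_j - L/n)^2 / (L/n); 0 means perfectly even exposure."""
    expected = test_length / len(rates)
    return sum((r - expected) ** 2 / expected for r in rates)


def overlap_rate(tests: List[Set[int]], test_length: int) -> float:
    """Mean proportion of shared items over all pairs of examinees."""
    pairs = list(combinations(tests, 2))
    shared = sum(len(a & b) for a, b in pairs)
    return shared / (len(pairs) * test_length)


def exposure_flags(rates: Sequence[float], low: float, high: float) -> Tuple[int, int]:
    """Counts of under-exposed (r < low) and over-exposed (r > high) items.
    Thresholds are caller-supplied assumptions, not the paper's values."""
    under = sum(1 for r in rates if r < low)
    over = sum(1 for r in rates if r > high)
    return under, over


tests = [{0, 1}, {0, 2}, {0, 1}]  # three examinees, L = 2, bank of n = 4
rates = exposure_rates(tests, bank_size=4)
print(chi_square_exposure(rates, test_length=2))  # approximately 1.111
print(overlap_rate(tests, test_length=2))          # approximately 0.667
```

Random selection spreads exposure evenly, which drives χ² toward 0. A rule that keeps reusing the same items pushes some r_j far above L/n and inflates both χ² and test overlap, which is the pattern separating FMI from random selection in the table above.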

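Table 2. Performance of the item selection strategies in a simulated real-world scenario
| Item selection strategy | MSE | MAE | Ability estimate correlation | χ² | Test overlap rate | Underexposed items | Overexposed items | Examinee call rate |
|---|---|---|---|---|---|---|---|---|
| Random selection | 0.320 | 0.440 | 0.830 | 2.551 | 8.02% | 0 | 0 | – |
| FMI | 0.152 | 0.307 | 0.922 | 150.511 | 58.48% | 214 | 33 | – |
| DEBR (FMI) | 0.190 | 0.341 | 0.901 | 101.793 | 40.81% | 53 | 38 | 25.04% |
| DEBR (FMI+DEBR) | 0.233 | 0.380 | 0.875 | 47.426 | 21.10% | 29 | 35 | 12.69% |
| IEBR (FMI) | 0.265 | 0.408 | 0.855 | 43.395 | 19.63% | 0 | 24 | 5.24% |
| IEBR (FMI+IEBR) | 0.274 | 0.414 | 0.852 | 11.830 | 8.19% | 0 | 0 | 2.86% |
| BAS | 0.259 | 0.404 | 0.861 | 42.965 | 19.48% | 20 | 27 | – |
| DEBR (BAS) | 0.253 | 0.395 | 0.869 | 43.449 | 19.65% | 12 | 33 | 9.75% |
| DEBR (BAS+DEBR) | 0.262 | 0.403 | 0.865 | 39.684 | 18.29% | 13 | 26 | 9.51% |
| IEBR (BAS) | 0.266 | 0.408 | 0.858 | 37.491 | 17.49% | 17 | 24 | 9.96% |
| IEBR (BAS+IEBR) | 0.267 | 0.407 | 0.855 | 25.305 | 13.07% | 8 | 18 | 5.13% |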
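In both tables FMI has by far the largest χ² values and test overlap rates. This excerpt does not expand the abbreviation. Assuming FMI denotes the usual maximum Fisher information rule, the sketch below shows its core step under a two-parameter logistic (2PL) model. The `(a, b)` item representation, the omission of the 1.7 scaling constant, and the function names are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence, Set, Tuple

# Hypothetical item parameters (a, b): discrimination and difficulty under a
# 2PL model written without the 1.7 scaling constant.
Item = Tuple[float, float]


def p_correct(theta: float, item: Item) -> float:
    """2PL probability of a correct response at ability theta."""
    a, b = item
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_info(theta: float, item: Item) -> float:
    """2PL item information at theta: a^2 * P * (1 - P)."""
    a, _ = item
    p = p_correct(theta, item)
    return a * a * p * (1.0 - p)


def select_max_info(theta_hat: float, bank: Sequence[Item],
                    used: Set[int]) -> Optional[int]:
    """Index of the unused item with the largest information at theta_hat.
    Greedy and exposure-blind: it ignores how often an item has been used."""
    best, best_info = None, -math.inf
    for j, item in enumerate(bank):
        if j in used:
            continue
        info = fisher_info(theta_hat, item)
        if info > best_info:
            best, best_info = j, info
    return best


bank = [(0.8, 0.0), (2.0, 0.0), (2.0, 1.5), (1.2, -0.5)]
print(select_max_info(0.0, bank, used=set()))  # 1
print(select_max_info(0.0, bank, used={1}))    # 3
```

Because the rule is greedy, every examinee whose provisional estimate sits near 0 receives item 1 first. Repeated over many examinees, this concentrates exposure on a few highly discriminating items, which is consistent with the large FMI χ² values reported above.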

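Figure 1. Changes in item exposure rates across two rounds of recommendation-based item selection when the first batch of data is generated by FMI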

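Figure 2. Changes in item exposure rates across two rounds of recommendation-based item selection when the first batch of data is generated by BAS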

Full-text PDF:

http://journal.psych.ac.cn/xlxb/CN/article/downloadArticleFile.do?attachType=PDF&id=4515