Make adaptive testing know examinees better: The item selection strategies based on recommender systems
WANG Pujue 1, LIU Hongyun 1,2 (corresponding author)
1. Faculty of Psychology, Beijing Normal University
2. Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, 100875, China
Received: 2018-06-10
Online: 2019-08-21
Published: 2019-07-24
Contact: LIU Hongyun, E-mail: hyliu@bnu.edu.cn
Funding: National Natural Science Foundation of China (31571152); Joint Construction Project of Beijing Municipality and Central Universities in Beijing (019-105812); National Education Examinations Research Plan, 2017 project (GJK2017015)
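Abstract: Drawing on the idea of collaborative filtering in recommender systems, this study proposes two CAT item selection strategies that can make use of data from previous examinees: direct examinee-based recommendation (DEBR) and indirect examinee-based recommendation (IEBR). Two simulation studies, covering different item banks and test lengths, compared the two recommendation-based strategies with two traditional strategies (FMI and BAS) in terms of measurement precision and item exposure control, and examined the factors that affect the performance of the recommendation-based strategies. The results show that both recommendation-based strategies control item exposure better than the two traditional strategies, while their measurement precision is no worse than that of BAS. DEBR puts more weight on precision, whereas IEBR gives the best control of item exposure. The characteristics and quality of the existing examinee data are the main factors affecting the performance of the recommendation-based strategies.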
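The collaborative-filtering idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' DEBR or IEBR procedure, whose details are not given on this page; it only shows a generic examinee-based (user-based) recommendation step. The function name `recommend_next_item`, its parameters, the agreement-based similarity, and the voting rule are all assumptions made for illustration.

```python
import numpy as np

def recommend_next_item(current, past, k=2):
    """Return the index of a recommended item, or None if no neighbour helps.

    current: dict mapping item index -> 0/1 score for the examinee in progress.
    past:    float array (examinees x items) of 0/1 scores, NaN = not administered.
    Hypothetical helper: a generic user-based collaborative-filtering step.
    """
    if not current:
        return None  # nothing to compare on yet
    taken = np.array(sorted(current))
    scores = np.array([current[i] for i in taken], dtype=float)

    # Similarity = share of the current examinee's items that a past examinee
    # answered with the same score (NaN never matches, so unseen items count as misses).
    sims = (past[:, taken] == scores).mean(axis=1)

    # Keep the k most similar past examinees that agree on at least one item.
    order = np.argsort(-sims, kind="stable")[:k]
    neighbours = order[sims[order] > 0]
    if neighbours.size == 0:
        return None

    # Vote: how many neighbours were administered each not-yet-taken item?
    votes = (~np.isnan(past[neighbours])).sum(axis=0)
    votes[taken] = 0
    if votes.max() == 0:
        return None
    return int(np.argmax(votes))  # ties go to the lowest item index

nan = np.nan
past = np.array([
    [1, 0, 1, nan, nan],
    [1, 0, 1, 1, nan],
    [0, 1, nan, nan, 1],
])
next_item = recommend_next_item({0: 1, 1: 0}, past, k=2)
```

When the function returns `None`, a CAT would need to fall back on a conventional selection rule; how the paper handles that case is not stated here.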
Figures/Tables (4)
Table 1. Performance of each item selection strategy with the simulated item bank
Item selection strategy | Mean squared error | Mean absolute error | Correlation of ability estimates | Chi-square | Test overlap rate | Underexposed items | Overexposed items | Examinee call rate |
---|---|---|---|---|---|---|---|---|
Fixed length, 20 items | | | | | | | | |
Random selection | 0.323 | 0.449 | 0.829 | 2.595 | 5.56% | 0 | 0 | |
FMI | 0.090 | 0.234 | 0.954 | 127.852 | 40.80% | 315 | 41 | |
DEBR (FMI) | 0.141 | 0.291 | 0.930 | 66.341 | 21.83% | 22 | 29 | 14.12% |
IEBR (FMI) | 0.242 | 0.383 | 0.872 | 8.712 | 7.09% | 1 | 2 | 2.53% |
BAS | 0.224 | 0.370 | 0.882 | 14.164 | 9.00% | 46 | 6 | |
DEBR (BAS) | 0.217 | 0.365 | 0.884 | 11.246 | 8.25% | 44 | 4 | 4.25% |
IEBR (BAS) | 0.222 | 0.369 | 0.882 | 11.187 | 8.15% | 42 | 4 | 4.66% |
Fixed length, 40 items | | | | | | | | |
Random selection | 0.198 | 0.354 | 0.890 | 4.572 | 11.05% | 0 | 0 | |
FMI | 0.052 | 0.178 | 0.974 | 118.335 | 45.72% | 240 | 80 | |
DEBR (FMI) | 0.089 | 0.228 | 0.956 | 95.045 | 34.38% | 37 | 78 | 19.77% |
IEBR (FMI) | 0.126 | 0.277 | 0.937 | 7.571 | 11.80% | 0 | 15 | 5.19% |
BAS | 0.126 | 0.278 | 0.932 | 18.962 | 15.03% | 14 | 36 | |
DEBR (BAS) | 0.125 | 0.276 | 0.933 | 15.930 | 14.27% | 13 | 27 | 6.98% |
IEBR (BAS) | 0.128 | 0.280 | 0.931 | 12.012 | 13.25% | 14 | 17 | 7.22% |
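The indices reported in the table can be computed from simulation output along the following lines. This is a sketch under stated assumptions, not the paper's code: the chi-square and test overlap definitions are the ones commonly used in the CAT literature, the function names are hypothetical, and the under- and over-exposure cutoffs (`low`, `high`) are placeholder values because this page does not give the thresholds the authors used.

```python
import numpy as np

def precision_metrics(theta_true, theta_hat):
    """MSE, MAE and correlation between true and estimated abilities."""
    theta_true = np.asarray(theta_true, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    err = theta_hat - theta_true
    return {
        "mse": float(np.mean(err ** 2)),
        "mae": float(np.mean(np.abs(err))),
        "corr": float(np.corrcoef(theta_true, theta_hat)[0, 1]),
    }

def exposure_metrics(administered, low=0.02, high=0.2):
    """Exposure indices from a boolean matrix (examinees x items).

    low / high are assumed cutoffs for under- and over-exposed items.
    """
    administered = np.asarray(administered, dtype=bool)
    n_examinees, n_items = administered.shape
    counts = administered.sum(axis=0)          # times each item was used
    rates = counts / n_examinees               # exposure rate per item
    mean_len = administered.sum() / n_examinees
    expected = mean_len / n_items              # rate under uniform exposure

    # Chi-square distance between observed and uniform exposure rates.
    chi2 = float(np.sum((rates - expected) ** 2 / expected))

    # Test overlap: average share of items two examinees have in common,
    # taken over all examinee pairs.
    shared = np.sum(counts * (counts - 1) / 2)
    n_pairs = n_examinees * (n_examinees - 1) / 2
    overlap = float(shared / (n_pairs * mean_len))

    return {
        "chi2": chi2,
        "overlap": overlap,
        "underexposed": int(np.sum(rates < low)),
        "overexposed": int(np.sum(rates > high)),
    }

demo = exposure_metrics(
    [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [1, 0, 1, 0],
     [1, 0, 0, 1]],
    low=0.3, high=0.9,
)
```

In the toy matrix, every examinee takes two of four items and the first item is given to everyone, so it is the only one counted as overexposed at the assumed cutoff of 0.9.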
Table 2. Performance of each item selection strategy in the simulated real-world setting
Item selection strategy | Mean squared error | Mean absolute error | Correlation of ability estimates | Chi-square | Test overlap rate | Underexposed items | Overexposed items | Examinee call rate |
---|---|---|---|---|---|---|---|---|
Random selection | 0.320 | 0.440 | 0.830 | 2.551 | 8.02% | 0 | 0 | |
FMI | 0.152 | 0.307 | 0.922 | 150.511 | 58.48% | 214 | 33 | |
DEBR (FMI) | 0.190 | 0.341 | 0.901 | 101.793 | 40.81% | 53 | 38 | 25.04% |
DEBR (FMI+DEBR) | 0.233 | 0.380 | 0.875 | 47.426 | 21.10% | 29 | 35 | 12.69% |
IEBR (FMI) | 0.265 | 0.408 | 0.855 | 43.395 | 19.63% | 0 | 24 | 5.24% |
IEBR (FMI+IEBR) | 0.274 | 0.414 | 0.852 | 11.830 | 8.19% | 0 | 0 | 2.86% |
BAS | 0.259 | 0.404 | 0.861 | 42.965 | 19.48% | 20 | 27 | |
DEBR (BAS) | 0.253 | 0.395 | 0.869 | 43.449 | 19.65% | 12 | 33 | 9.75% |
DEBR (BAS+DEBR) | 0.262 | 0.403 | 0.865 | 39.684 | 18.29% | 13 | 26 | 9.51% |
IEBR (BAS) | 0.266 | 0.408 | 0.858 | 37.491 | 17.49% | 17 | 24 | 9.96% |
IEBR (BAS+IEBR) | 0.267 | 0.407 | 0.855 | 25.305 | 13.07% | 8 | 18 | 5.13% |
![](http://journal.psych.ac.cn/xlxb/fileup/0439-755X/FIGURE/2019-51-9/Images/0439-755X-51-9-1057/img_1.png)
Figure 1. Change in item exposure rates over two rounds of recommendation-based item selection when the initial data are generated by FMI
![](http://journal.psych.ac.cn/xlxb/fileup/0439-755X/FIGURE/2019-51-9/Images/0439-755X-51-9-1057/img_2.png)
Figure 2. Change in item exposure rates over two rounds of recommendation-based item selection when the initial data are generated by BAS
Full-text PDF download:
http://journal.psych.ac.cn/xlxb/CN/article/downloadArticleFile.do?attachType=PDF&id=4515