Make adaptive testing know examinees better: The item selection strategies based on recommender systems

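WANG Pujue¹, LIU Hongyun¹,²

1. Faculty of Psychology, Beijing Normal University
2. Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing 100875, China

Received: 2018-06-10. Online: 2019-08-21. Published: 2019-07-24.
Corresponding author: LIU Hongyun. E-mail: hyliu@bnu.edu.cn

Funding: * National Natural Science Foundation of China (31571152); Joint Project of Beijing Municipality and Central Universities in Beijing (019-105812); National Education Examinations Research Plan, 2017 project (GJK2017015)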

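Abstract

Drawing on the idea of collaborative filtering from recommender systems, this study proposes two CAT item selection strategies that can make use of data from previous examinees: direct examinee-based recommendation (DEBR) and indirect examinee-based recommendation (IEBR). Two simulation studies, covering different item banks and test lengths, compared the two recommendation-based strategies with two traditional strategies (FMI and BAS) on measurement precision and control of item exposure rates, and examined the factors that affect the performance of the recommendation-based strategies. The results show that both recommendation-based strategies control item exposure better than the two traditional strategies, with measurement precision no worse than that of BAS. DEBR puts more weight on selection precision, while IEBR gives the best control of item exposure. The characteristics and quality of the existing examinee data are the main factors affecting the performance of the recommendation-based strategies.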
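The abstract does not spell out how DEBR or IEBR work, so the following is only a generic illustration of user-based collaborative filtering applied to item selection, not the authors' algorithm. The function names, the agreement-based similarity measure, the neighbourhood size `k`, and the toy response data are all assumptions made for this sketch.

```python
from collections import Counter

def similarity(current, previous):
    """Share of matching responses on items both examinees answered (0.0 if none in common)."""
    common = [item for item in current if item in previous]
    if not common:
        return 0.0
    agree = sum(current[item] == previous[item] for item in common)
    return agree / len(common)

def recommend_item(current, history, k=2):
    """Pick the unseen item answered most often by the k most similar previous examinees."""
    ranked = sorted(history, key=lambda prev: similarity(current, prev), reverse=True)
    votes = Counter()
    for prev in ranked[:k]:
        for item in prev:
            if item not in current:
                votes[item] += 1
    if not votes:
        return None  # nothing to recommend; a real CAT would fall back to another rule
    best = max(votes.values())
    # Break ties by the smallest item id so the result is deterministic.
    return min(item for item, n in votes.items() if n == best)

# Hypothetical response records: item id -> scored response (1 correct, 0 incorrect).
history = [
    {1: 1, 2: 0, 3: 1, 7: 1},
    {1: 1, 2: 0, 4: 0, 7: 0},
    {1: 0, 2: 1, 5: 1, 6: 0},
]
current = {1: 1, 2: 0}
print(recommend_item(current, history))
```

In this toy data the first two previous examinees agree with the current examinee on both shared items, and both of them went on to answer item 7, so item 7 is recommended. A strategy of this kind depends entirely on the stored examinee records, which is in line with the abstract's point that the characteristics and quality of existing data drive performance.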

Figures and tables (4)

Table 1. Performance of each item selection strategy with the simulated item bank

Item selection strategy	Mean squared error	Mean absolute error	Correlation of ability estimates	Chi-square	Test overlap rate	Underexposed items	Overexposed items	Examinee call rate
Fixed length: 20 items
Random selection	0.323	0.449	0.829	2.595	5.56%	0	0	-
FMI	0.090	0.234	0.954	127.852	40.80%	315	41	-
DEBR (FMI)	0.141	0.291	0.930	66.341	21.83%	22	29	14.12%
IEBR (FMI)	0.242	0.383	0.872	8.712	7.09%	1	2	2.53%
BAS	0.224	0.370	0.882	14.164	9.00%	46	6	-
DEBR (BAS)	0.217	0.365	0.884	11.246	8.25%	44	4	4.25%
IEBR (BAS)	0.222	0.369	0.882	11.187	8.15%	42	4	4.66%
Fixed length: 40 items
Random selection	0.198	0.354	0.890	4.572	11.05%	0	0	-
FMI	0.052	0.178	0.974	118.335	45.72%	240	80	-
DEBR (FMI)	0.089	0.228	0.956	95.045	34.38%	37	78	19.77%
IEBR (FMI)	0.126	0.277	0.937	7.571	11.80%	0	15	5.19%
BAS	0.126	0.278	0.932	18.962	15.03%	14	36	-
DEBR (BAS)	0.125	0.276	0.933	15.930	14.27%	13	27	6.98%
IEBR (BAS)	0.128	0.280	0.931	12.012	13.25%	14	17	7.22%
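The first three result columns are the usual precision measures in CAT simulation studies. A minimal sketch of how such values are commonly computed follows, assuming the correlation column is the Pearson correlation between true and estimated abilities (the page does not state the exact definitions used). The function name and the toy ability values are made up for illustration.

```python
from math import sqrt

def precision_metrics(true_theta, est_theta):
    """Return (MSE, MAE, Pearson r) between true and estimated abilities."""
    n = len(true_theta)
    errors = [e - t for t, e in zip(true_theta, est_theta)]
    mse = sum(err ** 2 for err in errors) / n
    mae = sum(abs(err) for err in errors) / n
    mean_t = sum(true_theta) / n
    mean_e = sum(est_theta) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true_theta, est_theta))
    var_t = sum((t - mean_t) ** 2 for t in true_theta)
    var_e = sum((e - mean_e) ** 2 for e in est_theta)
    return mse, mae, cov / sqrt(var_t * var_e)

# Hypothetical true and estimated abilities for four simulated examinees.
true_theta = [-1.0, 0.0, 1.0, 2.0]
est_theta = [-0.5, 0.0, 0.5, 2.0]
mse, mae, corr = precision_metrics(true_theta, est_theta)
print(mse, mae, round(corr, 3))
```

Lower MSE and MAE and a higher correlation indicate more precise ability estimation, which is how the precision columns of the table are read.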

Table 2. Performance of each item selection strategy under simulated real-world conditions

Item selection strategy	Mean squared error	Mean absolute error	Correlation of ability estimates	Chi-square	Test overlap rate	Underexposed items	Overexposed items	Examinee call rate
Random selection	0.320	0.440	0.830	2.551	8.02%	0	0	-
FMI	0.152	0.307	0.922	150.511	58.48%	214	33	-
DEBR (FMI)	0.190	0.341	0.901	101.793	40.81%	53	38	25.04%
DEBR (FMI+DEBR)	0.233	0.380	0.875	47.426	21.10%	29	35	12.69%
IEBR (FMI)	0.265	0.408	0.855	43.395	19.63%	0	24	5.24%
IEBR (FMI+IEBR)	0.274	0.414	0.852	11.830	8.19%	0	0	2.86%
BAS	0.259	0.404	0.861	42.965	19.48%	20	27	-
DEBR (BAS)	0.253	0.395	0.869	43.449	19.65%	12	33	9.75%
DEBR (BAS+DEBR)	0.262	0.403	0.865	39.684	18.29%	13	26	9.51%
IEBR (BAS)	0.266	0.408	0.858	37.491	17.49%	17	24	9.96%
IEBR (BAS+IEBR)	0.267	0.407	0.855	25.305	13.07%	8	18	5.13%
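The chi-square and test overlap columns summarise how evenly items are exposed. The sketch below uses definitions that are common in the CAT literature: chi-square as the distance of observed exposure rates from the uniform rate L/N, and test overlap as the mean share of items that two examinees have in common. The page does not say which formulas the authors used, so the code is not expected to reproduce the tabled values. Function names and the toy administration data are assumptions.

```python
from itertools import combinations

def exposure_rates(administered, bank_size):
    """Fraction of examinees who saw each item; items are numbered 0..bank_size-1."""
    counts = [0] * bank_size
    for test in administered:
        for item in test:
            counts[item] += 1
    return [c / len(administered) for c in counts]

def chi_square(rates, test_length):
    """Sum of squared deviations from the uniform rate L/N, scaled by L/N."""
    expected = test_length / len(rates)
    return sum((r - expected) ** 2 / expected for r in rates)

def overlap_rate(administered, test_length):
    """Mean proportion of shared items over all pairs of examinees."""
    pairs = list(combinations(administered, 2))
    shared = sum(len(set(a) & set(b)) for a, b in pairs)
    return shared / (len(pairs) * test_length)

# Hypothetical data: three examinees, a 4-item bank, tests of length 2.
administered = [[0, 1], [0, 2], [0, 3]]
rates = exposure_rates(administered, bank_size=4)
print(rates)
print(chi_square(rates, test_length=2))
print(overlap_rate(administered, test_length=2))
```

In the toy data every examinee sees item 0, so each pair of tests shares one of its two items and the overlap rate is 0.5. Smaller chi-square and overlap values mean exposure is spread more evenly across the bank.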

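Figure 1. Change in item exposure rates over two rounds of recommendation-based item selection when the first batch of data is generated by FMI

Figure 2. Change in item exposure rates over two rounds of recommendation-based item selection when the first batch of data is generated by BAS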
