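Topic: How to make model-free feature screening approaches for full data applicable to the missing response case?
Speaker: Professor Wang Qihua
Host: Professor Lin Huazhen
Time: May 28, 2015, 5:00-6:00 PM
Venue: Academic Conference Room B212, Tongbo Building
Organizers: Center of Statistical Research; School of Statistics; Office of Scientific Research
About the speaker: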
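Wang Qihua is a research professor and doctoral supervisor at the Chinese Academy of Sciences, a recipient of the National Science Fund for Distinguished Young Scholars, a Distinguished Professor under the Ministry of Education's **** Award Program, a member of the Chinese Academy of Sciences "Hundred Talents Program", and an Elected Member of the International Statistical Institute (ISI). Research interests include survival analysis, missing data analysis, semiparametric and nonparametric statistical inference, and high-dimensional data analysis. Professor Wang has published more than one hundred papers, over 80 of them in leading international journals such as the Journal of the American Statistical Association (JASA), the Annals of Statistics (Ann. Statist.), and Biometrika, and serves on the editorial boards of several international and domestic journals. Professor Wang has visited the Hong Kong University of Science and Technology; the University of California, Davis; the University of California, Los Angeles; Yale University; the University of Washington, Seattle; Carleton University (Canada); Humboldt University (Germany); and the Australian National University, among others.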
Abstract:
It is quite challenging to develop model-free feature screening approaches directly for missing response problems, since existing standard missing data analysis methods cannot be applied directly to the high-dimensional case. This paper develops a novel technique that borrows information from the missingness indicators so that any feature screening procedure for ultrahigh-dimensional covariates with full data can be applied to the missing response case.
The technique rests on a proof that the set of active predictors for the response is a subset of the set of active predictors for the product of the response and the missingness indicator, a product that equals the response when it is observed and zero when it is missing. Any standard model-free feature screening procedure that has the screening property for full data can then be applied to estimate the latter set. Hence, the probability that the estimated set contains the latter set, and therefore the former, tends to one. It is also shown that the complete case (CC) approach preserves the screening property of any feature screening approach that has this property for full data. As an alternative, a two-step approach is developed for obtaining a feature screening estimator of the active predictor set of interest. A simulation study was conducted to compare the proposed methods with the CC approach, and a real data analysis was used to illustrate the proposed method. Both the simulation study and the real data analysis indicate that the proposed zero imputation feature screening method outperforms the CC method and the two-step method.
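
The zero imputation idea described in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation. It assumes distance correlation as one possible model-free marginal utility (the abstract allows any full-data screening procedure), and the function names `distance_correlation` and `zero_imputation_screen`, the `observed` indicator array, and the `keep` parameter are all hypothetical names introduced here for illustration.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D samples (V-statistic form)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Pairwise absolute distances within each sample.
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double-center each distance matrix.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0.0:
        return 0.0
    # max(...) guards against tiny negative values from rounding.
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def zero_imputation_screen(X, y, observed, keep):
    """Rank predictors by their marginal dependence on delta * Y.

    X        : (n, p) covariate matrix, fully observed
    y        : (n,) response; entries where `observed` is False are ignored
    observed : (n,) boolean missingness indicator (True = response observed)
    keep     : number of top-ranked predictors to retain (an assumed tuning choice)
    """
    observed = np.asarray(observed, dtype=bool)
    # Product of response and missingness indicator: missing responses become 0.
    z = np.where(observed, np.asarray(y, dtype=float), 0.0)
    scores = np.array([distance_correlation(X[:, j], z) for j in range(X.shape[1])])
    selected = np.argsort(-scores)[:keep]
    return selected, scores
```

A call such as `zero_imputation_screen(X, y, observed, keep=20)` returns the indices of the retained predictors together with all marginal scores. Because the retained set targets the active predictors of the product rather than of the response alone, it may also include predictors that only drive the missingness.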