School of Management, University of Science and Technology of China, Hefei 230026, China
Received 2 August 2019; Revised 21 October 2019
Foundation items: Supported by National Natural Science Foundation of China (11601501, 11671374, and 71731010), Anhui Provincial Natural Science Foundation (1708085QA02), and Fundamental Research Funds for the Central Universities (WK2040160028)
Corresponding author: LI Yang, E-mail: tjly@mail.ustc.edu.cn
Abstract: Precision matrix inference is of fundamental importance in high-dimensional data analysis for measuring conditional dependence. Despite the fast-growing literature, approaches that conduct simultaneous inference for the precision matrix at low computational cost are still urgently needed. In this paper, we apply a bootstrap-assisted procedure to conduct simultaneous inference for the high-dimensional precision matrix based on the recent de-biased nodewise Lasso estimator, which does not require the irrepresentability condition and is easy to implement with low computational cost. Furthermore, we summarize a unified framework for constructing simultaneous confidence intervals for the high-dimensional precision matrix under the sub-Gaussian case. We show that, as long as certain precision matrix estimation error bounds are satisfied, our procedure can be built on different precision matrix estimation methods, which offers great flexibility. Besides, distinct from the earlier Bonferroni-Holm procedure, this bootstrap method is asymptotically nonconservative. The numerical results confirm the theoretical properties and the computational advantage of our method.
Keywords: precision matrix; high dimensionality; bootstrap-assisted; confidence intervals; simultaneous inference; de-biased
WANG Yue, LI Yang, ZHENG Zemin
Nowadays, high-dimensional data, often referred to as "small n, large p" data, are growing extremely rapidly. Graphical models have been extensively used as a solid tool to measure the conditional dependence structure between variables, with applications ranging from genetics, proteins, and brain networks to social networks, online marketing, and portfolio optimization. It is well known that the edges of a Gaussian graphical model (GGM) are encoded by the corresponding entries of the precision matrix[1]. While most existing work concentrates on estimation and individual inference for the precision matrix, simultaneous inference methods are generally reckoned to be more useful in practical applications because of their valid reliability guarantees. Therefore, there is an urgent need to develop approaches for making inference on groups of entries of the precision matrix.
Individual inference for the precision matrix has been widely studied in the literature. Ref.[2] first advocated multiple testing for conditional dependence in GGMs with false discovery rate control. Unfortunately, that method cannot be applied to construct confidence intervals directly. To address this issue, based on the so-called de-biased or de-sparsified procedure, Refs.[3-4] removed the bias term of the initial Lasso-type penalized estimators and achieved an asymptotically normal distribution for each entry of the precision matrix. The difference is that Ref.[3] adopted the graphical Lasso as the initial Lasso-type penalized estimator, whereas Ref.[4] focused on the nodewise Lasso. Both followed Refs.[5-8], which proposed de-biasing steps for inference in high-dimensional linear models.
While most recent studies have focused on individual inference in the high-dimensional regime, simultaneous inference remains largely unexplored. Refs.[9-11] proposed the multiplier bootstrap method. Building on individual confidence intervals, Ref.[12] constructed simultaneous confidence intervals by applying a bootstrap scheme to high-dimensional linear models. Distinct from the earlier Bonferroni-Holm procedure, this bootstrap method is asymptotically nonconservative because it accounts for the correlation among the test statistics. More recently, Ref.[13] considered combinatorial inference aimed at testing the global structure of the graph, but at the cost of heavy computation, and it is limited to the Gaussian case.
Motivated by these concerns, we develop a bootstrap-assisted procedure to conduct simultaneous inference for the high-dimensional precision matrix based on the de-biased nodewise Lasso estimator. Moreover, we summarize a unified framework for performing simultaneous inference for the high-dimensional precision matrix. Our method imitates Ref.[12] but generalizes the bootstrap-assisted scheme to graphical models, and we conclude with a general theory showing that our method is applicable as long as the precision matrix estimator satisfies some common conditions. The major contributions of this paper are threefold. First, we develop a bootstrap-assisted procedure for simultaneous inference on the high-dimensional precision matrix, which is adaptive to the dimension of the concerned component and accounts for the dependence within the de-biased nodewise Lasso estimators, which the Bonferroni-Holm procedure cannot attain. Second, our method is easy to implement and enjoys nice computational efficiency without loss of accuracy. Last, we provide theoretical guarantees for constructing simultaneous confidence intervals for the precision matrix under a unified framework. We prove that our simultaneous testing procedure asymptotically achieves the preassigned significance level even when the model is sub-Gaussian and the dimension is exponentially larger than the sample size.
Notations. For a vector $\boldsymbol{x}=(x_{1}, \cdots, x_{p})^{\mathrm{T}}$, denote $\|\boldsymbol{x}\|_{q}=(\sum_{j=1}^{p}|x_{j}|^{q})^{1 / q}$ for $q \geqslant 1$, $\|\boldsymbol{x}\|_{0}=|\{j: x_{j} \neq 0\}|$, and $\|\boldsymbol{x}\|_{\infty}=\max _{j}|x_{j}|$. For a matrix $\boldsymbol{A}$, $\|\boldsymbol{A}\|_{\max }=\max _{j, k}|A_{j k}|$ denotes the entrywise maximum norm, and $\lambda_{\min }(\boldsymbol{A})$ and $\lambda_{\max }(\boldsymbol{A})$ denote its smallest and largest eigenvalues. We write $[p]=\{1, \cdots, p\}$.
1 Methodology
1.1 Model setting
Under the graphical model framework, denote by X an n×p random design matrix with p covariates. Assume that X has independent sub-Gaussian rows X(i), that is, there exists a constant K such that
$\sup \limits_{\boldsymbol{\alpha} \in \mathbb{R}^{p}:\|\boldsymbol{\alpha}\|_{2} \leqslant 1} \mathbb{E} \exp \left(\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)}\right|^{2} / K^{2}\right) \leqslant 2.$ | (1) |
1.2 De-biased nodewise Lasso estimator
Let $\widehat{\boldsymbol{\varSigma}}=\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X} / n$ be the sample covariance matrix and let $\widehat{\boldsymbol{\varTheta}}$ be an initial estimator of the precision matrix $\boldsymbol{\varTheta}=\boldsymbol{\varSigma}^{-1}$. The de-biased estimator is defined as
$\breve{\boldsymbol{\varTheta}}=\widehat{\boldsymbol{\varTheta}}-\widehat{\boldsymbol{\varTheta}}^{\mathrm{T}}\left(\widehat{\boldsymbol{\varSigma}} \widehat{\boldsymbol{\varTheta}}-\boldsymbol{I}_{p}\right).$ |
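Concretely, the de-biasing step is a single matrix correction. A minimal sketch in Python, assuming the sample covariance $\widehat{\boldsymbol{\varSigma}}=\boldsymbol{X}^{\mathrm{T}}\boldsymbol{X}/n$ (the function name `debias` is ours, not the authors'):

```python
import numpy as np

def debias(Theta_hat, X):
    """One-step de-biased estimator: Theta_hat - Theta_hat^T (Sigma_hat Theta_hat - I_p)."""
    n, p = X.shape
    Sigma_hat = X.T @ X / n  # sample covariance (rows assumed mean zero)
    return Theta_hat - Theta_hat.T @ (Sigma_hat @ Theta_hat - np.eye(p))
```

If the initial estimator inverts $\widehat{\boldsymbol{\varSigma}}$ exactly (only possible when $p<n$), the correction term vanishes and the estimator is unchanged; the correction matters precisely when a penalized, biased initial estimator is used.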
For each entry $(j, k)$, the estimation error decomposes as
$\begin{aligned}\sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right) &=-\sqrt{n} \boldsymbol{\varTheta}_{j}^{\mathrm{T}}(\widehat{\boldsymbol{\varSigma}}-\boldsymbol{\varSigma}) \boldsymbol{\varTheta}_{k}+\varDelta_{j k} \\&=-\sum\limits_{i=1}^{n}\left(\boldsymbol{\varTheta}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\varTheta}_{k}-\varTheta_{j k}\right) / \sqrt{n}+\varDelta_{j k},\end{aligned}$ |
Various estimators of $\boldsymbol{\varTheta}$ can serve as the initial estimator $\widehat{\boldsymbol{\varTheta}}$; here we adopt the nodewise Lasso[4]. For each $j \in[p]$, define
$\widehat{\boldsymbol{\gamma}}_{j}=\arg \min \limits_{\boldsymbol{\gamma} \in \mathbb{R}^{p-1}}\left\{\frac{1}{n}\left\|\boldsymbol{X}_{j}-\boldsymbol{X}_{-j} \boldsymbol{\gamma}\right\|_{2}^{2}+2 \lambda_{j}\|\boldsymbol{\gamma}\|_{1}\right\} .$ |
${\widehat{\boldsymbol{\varGamma}}}_{j}=\left(-\widehat{\gamma}_{j, 1}, \cdots,-\widehat{\gamma}_{j, j-1}, 1,-\widehat{\gamma}_{j, j+1}, \cdots,-\widehat{\gamma}_{j, p}\right)^{\mathrm{T}}, $ |
$\widehat{\tau}_{j}^{2}=\left\|\boldsymbol{X}_{j}-\boldsymbol{X}_{-j} \widehat{\boldsymbol{\gamma}}_{j}\right\|_{2}^{2} / n+\lambda_{j}\left\|\widehat{\boldsymbol{\gamma}}_{j}\right\|_{1}.$ |
$\widehat{\boldsymbol{\varTheta}}_{j}=\widehat{\boldsymbol{\varGamma}}_{j} / \widehat{\tau}_{j}^{2}.$ |
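The displays above translate directly into code. A sketch using scikit-learn's Lasso (our choice of solver, not the authors' implementation; note that sklearn minimizes $\|y-Z\gamma\|_2^2/(2n)+\alpha\|\gamma\|_1$, so its `alpha` plays the role of $\lambda_j$ up to a constant factor):

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_lasso(X, lam):
    """Nodewise Lasso estimate of the precision matrix, built row by row."""
    n, p = X.shape
    Theta_hat = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        # regress X_j on the remaining columns X_{-j}
        fit = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
        fit.fit(X[:, others], X[:, j])
        gamma_j = fit.coef_
        resid = X[:, j] - X[:, others] @ gamma_j
        tau2_j = resid @ resid / n + lam * np.abs(gamma_j).sum()
        row = np.zeros(p)
        row[j] = 1.0
        row[others] = -gamma_j        # Gamma_j: 1 in position j, -gamma_j elsewhere
        Theta_hat[j] = row / tau2_j   # Theta_hat_j = Gamma_j / tau_j^2
    return Theta_hat
```

Each of the p Lasso regressions is independent of the others, which is why the nodewise construction parallelizes trivially and keeps the computational cost low.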
1.3 Simultaneous confidence intervals
We extend the de-biased nodewise Lasso estimator to construct confidence intervals for any subset of the entries of the precision matrix. Specifically, we are interested in deriving the distribution of
$T_{E}=: \max \limits_{(j, k) \in E} \sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right).$ | (2) |
To approximate the distribution of $T_{E}$, we consider the multiplier bootstrap statistic
$W_{E}=: \max \limits_{(j, k) \in E}\left|\sum\limits_{i=1}^{n} \widehat{Z}_{i j k} e_{i} / \sqrt{n}\right|,$ | (3) |
where $e_{1}, \cdots, e_{n}$ are i.i.d. $N(0,1)$ multipliers independent of the data, $Z_{i j k}=-\left(\boldsymbol{\varTheta}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\varTheta}_{k}-\varTheta_{j k}\right)$, and $\widehat{Z}_{i j k}$ is its plug-in counterpart with $\boldsymbol{\varTheta}$ replaced by $\widehat{\boldsymbol{\varTheta}}$.
The bootstrap critical value is then defined as the conditional quantile
$c_{1-\alpha, E}=\inf \left\{t \in \mathbb{R}: \mathbb{P}_{e}\left(W_{E} \leqslant t\right) \geqslant 1-\alpha\right\},$ | (4) |
where $\mathbb{P}_{e}$ denotes the probability with respect to the multipliers $e_{i}$ conditional on the data.
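Computationally, the critical value in Eq.(4) only requires redrawing the multipliers $e_i$; no model refitting is involved. A minimal sketch (the array layout and function name are our assumptions):

```python
import numpy as np

def bootstrap_critical_value(Z_hat, alpha=0.05, B=500, rng=None):
    """Estimate the bootstrap quantile c_{1-alpha,E}.

    Z_hat : (n, m) array whose columns hold Zhat_{ijk} for the m entries (j,k) in E.
    Each bootstrap draw computes W_E = max_{(j,k)} |sum_i Zhat_{ijk} e_i| / sqrt(n).
    """
    rng = rng or np.random.default_rng()
    n, _ = Z_hat.shape
    W = np.empty(B)
    for b in range(B):
        e = rng.standard_normal(n)            # multipliers e_i ~ N(0, 1)
        W[b] = np.max(np.abs(e @ Z_hat)) / np.sqrt(n)
    return np.quantile(W, 1.0 - alpha)        # empirical (1 - alpha)-quantile
```

The B bootstrap draws cost O(Bnm) floating-point operations in total, which is the source of the low computational cost emphasized above.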
Remark 1.1 The Bonferroni-Holm adjustment states that if an experimenter is testing p hypotheses on a set of data, then the significance level used for each hypothesis separately is 1/p times what it would be if only one hypothesis were tested. In contrast, the bootstrap uses the quantile of the multiplier bootstrap statistic to asymptotically estimate the quantile of the target statistic, and thereby takes the dependence among the test statistics into account. Thus the procedure with the Bonferroni-Holm adjustment is on the conservative side, while the bootstrap attains a level closer to the preassigned significance level.
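The conservativeness gap is easy to see numerically. In the extreme case of perfectly correlated statistics, the maximum behaves like a single statistic, yet a union-bound-style correction still guards against m separate chances. A toy check (our illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
m, reps = 100, 100_000
z = rng.standard_normal(reps)
# Perfectly correlated case: the max of m identical |N(0,1)| statistics is just |Z|,
# so the exact two-sided 5% cutoff stays near the single-test value 1.96.
exact_cut = np.quantile(np.abs(z), 0.95)
# Independent case: the much larger cutoff that a union-bound correction targets.
max_indep = np.abs(rng.standard_normal((reps, m))).max(axis=1)
indep_cut = np.quantile(max_indep, 0.95)
print(exact_cut, indep_cut)
```

Applying the larger cutoff to strongly dependent statistics yields coverage well above the nominal level; that excess is exactly the conservativeness the bootstrap avoids by estimating the quantile of the actual maximum.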
1.4 A unified theory for confidence intervals
Define the parameter set
$\mathcal{M}(s)=\left\{\boldsymbol{\varTheta} \in \mathbb{R}^{p \times p}: 1 / L \leqslant \lambda_{\min }(\boldsymbol{\varTheta}) \leqslant \lambda_{\max }(\boldsymbol{\varTheta}) \leqslant L, \max \limits_{j \in[p]}\left\|\boldsymbol{\varTheta}_{j}\right\|_{0} \leqslant s,\|\boldsymbol{\varTheta}\|_{1} \leqslant C\right\}$ |
We say that an initial estimator $\widehat{\boldsymbol{\varTheta}}$ satisfies event $H$ if the following estimation error bounds hold:
$\|\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta}\|_{\max }=O_{p}(\sqrt{\log p / n}),$ |
$\|\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta}\|_{1}=O_{p}(s \sqrt{\log p / n}),$ |
$\left\|\widehat{\boldsymbol{\varSigma}} \widehat{\boldsymbol{\varTheta}}-\boldsymbol{I}_{p}\right\|_{\max }=O_{p}(\sqrt{\log p / n}).$ |
2 Theoretical properties
Before giving the theoretical properties, we list two technical conditions.
(A1) Assume that
(A2) Assume that $B_{n}^{2}(\log (p n))^{7} / n \leqslant C_{1} n^{-c_{1}}$ for some constants $C_{1}, c_{1}>0$, where $B_{n} \geqslant 1$ is a sequence of constants.
Proposition 2.1 (Lemma 1 of Ref.[4]) Consider the sub-Gaussian model and let
$\sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right)=-\sqrt{n}\, \boldsymbol{\varTheta}_{j}^{\mathrm{T}}(\widehat{\boldsymbol{\varSigma}}-\boldsymbol{\varSigma}) \boldsymbol{\varTheta}_{k}+\varDelta_{j k},$ |
$\lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \mathbb{P}\left\{\max \limits_{(j, k) \in[p] \times[p]}\left|\Delta_{j k}\right| \geqslant O\left(\frac{s \log p}{\sqrt{n}}\right)\right\}=0.$ |
$\begin{array}{c}\lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \mid \mathbb{P}\left(\sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right) / \widehat{\sigma}_{j k} \leqslant z\right)- \\\Phi(z) \mid=0,\end{array}$ |
Based on the asymptotic normality properties established in Proposition 2.1, we have the following simultaneous confidence intervals for multiple entries Θjk.
Theorem 2.1 Assume that conditions (A1)-(A2) hold. Then for any $E \subseteq[p] \times[p]$, we have
$\begin{array}{c}\lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s) } \sup \limits_{\alpha \in(0,1)}\mid \mathbb{P}\left(\sqrt{n}\left\|\breve{\boldsymbol{\varTheta}}_{E}-\boldsymbol{\varTheta}_{E}\right\|_{\max } \leqslant c_{1-\alpha, E}\right)- \\(1-\alpha) \mid=0,\end{array}$ |
Theorem 2.1 states that we can approximate the (1-α)-quantile of $T_{E}$ by the bootstrap critical value $c_{1-\alpha, E}$.
Next, we extend the above theory to a more general case and conclude with a unified theory for precision matrix inference.
Theorem 2.2 Assume that event $H$ holds. Then we have
$\begin{array}{c}\lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s) } \sup \limits_{\alpha \in(0,1)}\mid \mathbb{P}\left(\sqrt{n}\left\|\breve{\boldsymbol{\varTheta}}_{E}-\boldsymbol{\varTheta}_{E}\right\|_{\max } \leqslant c_{1-\alpha, E}\right)- \\(1-\alpha) \mid=0,\end{array}$ |
We conclude with general results covering both individual and simultaneous inference.
Theorem 2.3 Assume that event $H$ holds. Then we have
(A)(Individual inference)
$\lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)}\left|\mathbb{P}\left(\sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right) / \widehat{\sigma}_{j k} \leqslant z\right)-\Phi(z)\right|=0,$ |
(B)(Simultaneous inference)
$\begin{array}{c}\lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s) } \sup \limits_{\alpha \in(0,1)}\mid \mathbb{P}\left(\sqrt{n}\left\|\breve{\boldsymbol{\varTheta}}_{E}-\boldsymbol{\varTheta}_{E}\right\|_{\max } \leqslant c_{1-\alpha, E}\right)- \\(1-\alpha) \mid=0,\end{array}$ |
Theorem 2.3 presents general conclusions for both individual and simultaneous confidence intervals. That is, our inferential procedures work with any estimation method for the precision matrix as long as its estimation error satisfies event H.
3 Numerical studies
In this section, we investigate the finite-sample performance of the proposed methods and provide a comparison with the simultaneous confidence intervals for the de-biased graphical Lasso; the two are denoted by S-NL and S-GL, respectively. We present two numerical examples and evaluate the methods by estimated average coverage probabilities (avgcov) and average confidence interval lengths (avglen) over two cases: the support set S and its complement Sc. For convenience, we only consider the Gaussian setting. The implementations of the de-biased nodewise Lasso and the de-biased graphical Lasso follow Ref.[4]. Throughout the simulations, the significance level is set at α=0.05, and the coverage probabilities and interval lengths are calculated by averaging over 100 simulation runs and 500 Monte Carlo replications. For additional comparison, we also record the individual confidence intervals for the de-biased nodewise Lasso and the de-biased graphical Lasso, denoted by I-NL and I-GL, respectively.
3.1 Numerical example 1: band structure
We start with a numerical example with a setting similar to that in Ref.[3]. We consider the precision matrix Θ with a band structure, where
Table 1 Averaged coverage probabilities and lengths over the support set S and its complement Sc in Section 3.1
In terms of avgcov and avglen, it is clear that our proposed S-NL method outperforms the alternatives, with higher avgcov and shorter avglen in most settings. Although the avglen over Sc may be slightly longer in some cases, the coverage probabilities on S approach the nominal level of 95%. Moreover, the advantage becomes more evident as p and ρ increase. Compared with individual confidence intervals, simultaneous confidence intervals have longer lengths and lower coverage probabilities. This is reasonable because the multiplicity adjustment inevitably sacrifices some accuracy.
3.2 Numerical example 2: nonband structure
For the second numerical example, we use the same setup as simulation example 1 in Ref.[16] to test the performance of S-NL in more general cases. We generate the precision matrix in two steps. First, we create a band matrix Θ0 in the same way as in Section 3.1. Second, we randomly permute the rows and columns of Θ0 to obtain the precision matrix Θ, which no longer has the band structure. Then we sample the rows of the n×p data matrix X as i.i.d. copies from the multivariate Gaussian distribution N(0, Σ), where Σ=Θ-1. Throughout this simulation, we fix the sample size n=200 and the dimensionality p=1 000, and consider a range of ρ=0.2, 0.3, 0.4, 0.5.
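The two-step construction can be sketched as follows. Since the band specification of Section 3.1 is abbreviated in this excerpt, the exact band values below (1 on the diagonal, ρ on the first off-diagonals) are our assumption:

```python
import numpy as np

def band_precision(p, rho):
    """Tridiagonal band precision matrix: 1 on the diagonal, rho on the first off-diagonals."""
    return np.eye(p) + rho * (np.eye(p, k=1) + np.eye(p, k=-1))

def nonband_gaussian_sample(n, p, rho, rng):
    """Step 1: band matrix Theta0. Step 2: random row/column permutation.
    Then sample the rows of X i.i.d. from N(0, Sigma) with Sigma = Theta^{-1}."""
    Theta0 = band_precision(p, rho)
    perm = rng.permutation(p)
    Theta = Theta0[np.ix_(perm, perm)]   # same permutation on rows and columns
    Sigma = np.linalg.inv(Theta)
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    return X, Theta
```

Applying the same permutation to rows and columns preserves symmetry and the eigenvalues, so Θ remains positive definite whenever Θ0 is (here, whenever |ρ| < 0.5).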
The simulation results summarized in Table 2 also illustrate that our method achieves the preassigned significance level asymptotically and behaves better than the others in most cases. Moreover, our method is very robust, especially for large ρ.
Table 2 Averaged coverage probabilities and lengths over the support set S and its complement Sc in Section 3.2
4 Discussions
In this paper, we apply a bootstrap-assisted procedure to make valid simultaneous inference for the high-dimensional precision matrix based on the recent de-biased nodewise Lasso estimator. In addition, we summarize a unified framework for constructing simultaneous confidence intervals for the high-dimensional precision matrix under the sub-Gaussian case. As long as certain estimation error bounds are satisfied, our procedure can be built on different precision matrix estimation methods, which offers great flexibility. Further, this method can be extended to more general settings, such as the functional graphical model, where the samples consist of functional data. We leave this problem for future investigation.
Appendix
A.1 Preliminaries
We first provide a brief overview of the results for the nodewise Lasso and the Gaussian approximation in the following propositions.
Proposition A.1 (Theorem 1 of Ref.[4], asymptotic normality) Suppose that
$\lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)}\left|\mathbb{P}\left(\sqrt{n}\left(\breve{\varTheta}_{i j}-\varTheta_{i j}\right) / \sigma_{i j} \leqslant z\right)-\varPhi(z)\right|=0.$ |
$\lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \mathbb{P}\left(\max \limits_{i, j=1, \cdots, p}\left|\widehat{\sigma}_{i j}^{2}-\sigma_{i j}^{2}\right| \geqslant \eta\right)=0.$ |
$c_{1} \leqslant \sum\limits_{i=1}^{n} \mathbb{E} x_{i j}^{2} / n \leqslant C_{1},$ |
$\begin{array}{l}\max \limits_{r=1,2} \sum\limits_{i=1}^{n} \mathbb{E}\left(\left|x_{i j}\right|^{2+r} / B_{n}^{r}\right) / n+\mathbb{E}\left(\exp \left(\left|x_{i j}\right| / B_{n}\right)\right) \\\ \ \ \ \ \ \ \ \ \ \leqslant 4,\end{array}$ |
$\rho:=\sup \limits_{t \in \mathbb{R}}\left|P\left(T_{0} \leqslant t\right)-P\left(Z_{0} \leqslant t\right)\right| \leqslant C n^{-c} \rightarrow 0.$ |
$T_{E}=: \max \limits_{(j, k) \in E} \sqrt{n}\left(\breve{\Theta}_{j k}-\Theta_{j k}\right), W_{E}=: \max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n} \widehat{Z}_{i j k} e_{i} / \sqrt{n} ,$ |
$T_{0}=: \max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n} Z_{i j k} / \sqrt{n}, W_{0}=: \max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n} Z_{i j k} e_{i} / \sqrt{n},$ |
$\mathbb{P}\left(c_{1-\alpha, W} \leqslant c_{\left(1-\alpha+\xi_{2}\right), W_{0}}+\xi_{1}\right) \geqslant 1-\xi_{2} ,$ |
$\mathbb{P}\left(c_{1-\alpha, W_{0}} \leqslant c_{1-\alpha+{\rm{\pi }}(\nu), W}\right) \geqslant 1-\mathbb{P}(\Gamma>\nu).$ |
$\begin{array}{c}\lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s) } \sup \limits_{\alpha \in(0,1)}\mid \mathbb{P}\left(\sqrt{n}\left\|\breve{\boldsymbol{\varTheta}}_{E}-\boldsymbol{\varTheta}_{E}\right\|_{\max } \leqslant c_{1-\alpha, E}\right)- \\(1-\alpha) \mid=0,\end{array}$ |
A.3 Proof of Theorem 2.3
To enhance readability, we split the proof into three steps: providing the bound on the bias term, establishing the asymptotic normality, and verifying the variance consistency.
Step 1
$\begin{aligned}\breve{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta} &= \widehat{\boldsymbol{\varTheta}}-\widehat{\boldsymbol{\varTheta}}^{\mathrm{T}}\left(\widehat{\boldsymbol{\varSigma}} \widehat{\boldsymbol{\varTheta}}-\boldsymbol{I}_{p}\right)-\boldsymbol{\varTheta} \\&=-\boldsymbol{\varTheta}(\widehat{\boldsymbol{\varSigma}}-\boldsymbol{\varSigma}) \boldsymbol{\varTheta}-\left(\boldsymbol{\varTheta} \widehat{\boldsymbol{\varSigma}}-\boldsymbol{I}_{p}\right)(\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta})- \\&\ \ \ \ (\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta})^{\mathrm{T}}\left(\widehat{\boldsymbol{\varSigma}} \widehat{\boldsymbol{\varTheta}}-\boldsymbol{I}_{p}\right) \\&=: \boldsymbol{Z}+\boldsymbol{\varDelta}_{1}+\boldsymbol{\varDelta}_{2}.\end{aligned}$ |
Step 2 The conclusion follows directly from Theorem 1 of Ref.[4].
Step 3 The conclusion follows directly from Lemma 2 of Ref.[4].
The subsequent proof is similar to Theorem 2.1, thus we omit the details.
A.4 Lemmas and their proofs
The following lemmas will be used in the proof of the main theorem.
Lemma A.1 Assume that conditions (A1)-(A4) hold. Then for any $E \subseteq[p] \times[p]$ we have
$\begin{array}{l}\sup \limits_{t \in \mathbb{R}} \mid \mathbb{P}\left(\max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n} Z_{i j k} / \sqrt{n} \leqslant t\right)- \\\ \ \ \ \mathbb{P}\left(\max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n} Y_{i j k} / \sqrt{n} \leqslant t\right) \mid \leqslant C_{0} n^{-c_{0}},\end{array}$ |
Proof The proof is based on verifying the conditions of Corollary 2.1 of Ref.[9]. To be concrete, we need to prove the following condition (E.1).
$c_{1} \leqslant \sum\limits_{i=1}^{n} \mathbb{E} Z_{i j k}^{2} / n \leqslant C_{1} ,$ |
$\begin{array}{l}\max \limits_{r=1,2} \sum\limits_{i=1}^{n} \mathbb{E}\left(\left|Z_{i j k}\right|^{2+r} / B_{n}^{r}\right) / n+ \\\ \ \ \ \ \ \ \ \ \ \ \ \mathbb{E}\left(\exp \left(\left|Z_{i j k}\right| / B_{n}\right)\right) \leqslant 4,\end{array}$ |
$\max \limits_{r=1,2} \mathbb{E}\left|Z_{i j k}\right|^{2+r} / B_{n}^{r}+\mathbb{E} \exp \left(\left|Z_{i j k}\right| / B_{n}\right) \leqslant 4,$ |
Lemma A.2 Let V and Y be centered Gaussian random vectors in $\mathbb{R}^{p}$ with covariance matrices $\varSigma^{V}$ and $\varSigma^{Y}$, and let $\varDelta_{0}=\max \limits_{1 \leqslant j, k \leqslant p}\left|\varSigma_{j k}^{V}-\varSigma_{j k}^{Y}\right|$. Then
$\begin{array}{c}\sup \limits_{t \in \mathbb{R}} \mid \mathbb{P}\left(\max \limits_{1 \leqslant j \leqslant p} V_{j} \leqslant t\right)-\mathbb{P}\left(\max \limits_{1 \leqslant j \leqslant p} Y_{j} \leqslant t\right) \mid \\\leqslant C \varDelta_{0}^{1 / 3}\left(1 \vee \log \left(p / \varDelta_{0}\right)\right)^{2 / 3},\end{array}$ |
Proof The proof is the same as that of Lemma 3.1 of Ref.[9].
Lemma A.3 Suppose that there are some constants 0 < c1 < C1 such that
$\mathbb{P}\left(c_{1-\alpha, W_{0}} \leqslant c_{1-\alpha+{\rm{\pi }}(\nu), Y_{0}}\right) \geqslant 1-\mathbb{P}(\Gamma>\nu) ,$ |
$\mathbb{P}\left(c_{1-\alpha, Y_{0}} \leqslant c_{1-\alpha+{\rm{\pi }}(\nu), W_{0}}\right) \geqslant 1-\mathbb{P}(\Gamma>\nu),$ |
Lemma A.4 Assume that conditions (A1)-(A4) hold. Then for any (j, k)∈E we have
$\mathbb{P}\left(\left|T_{E}-T_{0}\right|>\xi_{1}\right)<\xi_{2},$ |
$\mathbb{P}\left(\mathbb{P}_{e}\left(\left|W_{E}-W_{0}\right|>\xi_{1}\right)>\xi_{2}\right)<\xi_{2},$ |
Proof
Bounds for $\left|T_{E}-T_{0}\right|$: Recall that
$\left|T_{E}-T_{0}\right| \leqslant \max \limits_{(j, k) \in E}\left|\varDelta_{j k}\right|$ |
$\mathbb{P}\left\{\max \limits_{(j, k) \in E}\left|\varDelta_{j k}\right| \geqslant O\left(\frac{s \log p}{\sqrt{n}}\right)\right\}=o(1),$ |
Bounds for $\left|W_{E}-W_{0}\right|$:
$\left|W_{E}-W_{0}\right| \leqslant \max \limits_{(j, k) \in E}\left|\sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right) e_{i} / \sqrt{n}\right|.$ |
$\begin{array}{l}\mathbb{E}\left(A_{n}\right) \leqslant \mathbb{E}_{X} \sqrt{\mathbb{E}_{e}\left[\sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right) e_{i} / \sqrt{n}\right]^{2}} \leqslant\\\mathbb{E}_{X} \sqrt{\sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right)^{2} / n} \leqslant \sqrt{\mathbb{E}_{X}\left[\sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right)^{2} / n\right]}.\end{array}$ |
$\begin{array}{c}\mathbb{P}\left(\mathbb{P}_{e}\left(\left|W_{E}-W_{0}\right|>\xi_{1}\right)>\xi_{2}\right) \leqslant \\\mathbb{E}\left[\mathbb{P}_{e}\left(\left|W_{E}-W_{0}\right|>\xi_{1}\right)\right] / \xi_{2}= \\\mathbb{P}\left(\left|W_{E}-W_{0}\right|>\xi_{1}\right) / \xi_{2} \leqslant \xi_{2}^{2} / \xi_{2}=\xi_{2},\end{array}$ |
Lemma A.5
$\max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right)^{2} / n=o_{p}(1).$ |
Since $(a-b)^{2} \leqslant 2\left(a^{2}+b^{2}\right)$, we have
For the first part, it follows from triangle inequality that
For the second part, it is obvious that
$\left|I_{2}\right| \leqslant O_{p}(\log p / n),$ |
Combining these bounds, we conclude that $\max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right)^{2} / n=O_{p}\left(s^{2} \log p \log (n p) / n\right)=o_{p}(1)$.
Lemma A.6 Assume that conditions (A1)-(A4) hold. Let
$\|\widehat{\boldsymbol{\varSigma}}-\boldsymbol{\varSigma}\|_{\max } \leqslant O_{p}(\sqrt{\log p / n}).$ |
$\left\|\boldsymbol{X}_{i} \boldsymbol{X}_{j}\right\|_{\psi_{1}} \leqslant 2\left\|\boldsymbol{X}_{i}\right\|_{\psi_{2}}\left\|\boldsymbol{X}_{j}\right\|_{\psi_{2}} \leqslant 2 c^{-2}$ |
Lemma A.7 Let
$\max \limits_{i \in[n]}\left\|\boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}}\right\|_{\max }<O_{p}(\log (n p)).$ |
Lemma A.8 Let
$\begin{array}{c}\mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \beta-\mathbb{E} \boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{r} / \\\left(2 M^{2} K^{2}\right)^{r} \leqslant r ! / 2.\end{array}$ |
$\mathbb{E} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\alpha}\right|^{2} /(M K)^{2}} \leqslant 2 \text { and } \mathbb{E} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{2} /(M K)^{2}} \leqslant 2.$ |
$\begin{array}{c}\mathbb{E} \mathrm{e}^{\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right| /(M K)^{2}} \leqslant \mathbb{E}\left[\mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\alpha}\right|^{2} /\left(2(M K)^{2}\right)} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{2} /\left(2(M K)^{2}\right)}\right] \leqslant \\\left\{\mathbb{E} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\alpha}\right|^{2} /(M K)^{2}}\right\}^{1 / 2}\left\{\mathbb{E} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{2} /(M K)^{2}}\right\}^{1 / 2} \leqslant 2 .\end{array}$ |
$\begin{array}{c}1+\frac{1}{r !} \mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{r} /(M K)^{2 r} \leqslant \\\mathbb{E} \mathrm{e}^{\mid \boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta} \mid /(M K)^{2}}.\end{array}$ |
$\begin{array}{c}\mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}-\mathbb{E} \boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{r} /(M K)^{2 r} \leqslant \\2^{r-1} \mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{r} /(M K)^{2 r} \leqslant \\2^{r-1} r !\left(\mathbb{E} \mathrm{e}^{\mid \alpha^{\mathrm{T}} X_{(i)} X_{(i)}^{\mathrm{T}} \boldsymbol{\beta} \mid /(M K)^{2}}-1\right)=2^{r-1} r !=\frac{r !}{2} 2^{r}.\end{array}$ |
$\mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \beta-\mathbb{E} \boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} {\beta}\right|^{r} /\left(2 M^{2} K^{2}\right)^{r} \leqslant \frac{r !}{2}.$ |
References
[1] | Lauritzen S L. Graphical Models[M]. New York: Oxford University Press, 1996. |
[2] | Liu W. Gaussian graphical model estimation with false discovery rate control[J]. Ann Statist, 2013, 41(6): 2948-2978. |
[3] | Janková J, van de Geer S. Confidence intervals for high-dimensional inverse covariance estimation[J]. Electron J Statist, 2015, 9: 1205-1229. |
[4] | Janková J, Van de Geer S. Honest confidence regions and optimality in high-dimensional precision matrix estimation[J]. Test, 2017, 26(1): 143-162. DOI:10.1007/s11749-016-0503-5 |
[5] | Bühlmann P. Statistical significance in high-dimensional linear models[J]. Bernoulli, 2013, 19(4): 1212-1242. |
[6] | Javanmard A, Montanari A. Confidence intervals and hypothesis testing for high-dimensional regression[J]. J Mach Learn Res, 2014, 15: 2869-2909. |
[7] | van de Geer S, Bühlmann P, Ritov Y, et al. On asymptotically optimal confidence regions and tests for high-dimensional models[J]. Ann Statist, 2014, 42(3): 1166-1202. |
[8] | Zhang C, Zhang S. Confidence intervals for low dimensional parameters in high dimensional linear models[J]. J R Stat Soc Ser B Stat Methodol, 2014, 76(1): 217-242. DOI:10.1111/rssb.12026 |
[9] | Chernozhukov V, Chetverikov D, Kato K. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors[J]. Ann Statist, 2013, 41(6): 2786-2819. |
[10] | Chernozhukov V, Chetverikov D, Kato K. Comparison and anti-concentration bounds for maxima of Gaussian random vectors[J]. Probability Theory and Related Fields, 2015, 162: 47-70. DOI:10.1007/s00440-014-0565-9 |
[11] | Chernozhukov V, Chetverikov D, Kato K. Central limit theorems and bootstrap in high dimensions[J]. Annals of Probability, 2016, 45(4): 2309-2352. |
[12] | Zhang X, Cheng G. Simultaneous inference for high-dimensional linear models[J]. J Amer Statist Assoc, 2016, 112(2): 757-768. |
[13] | Neykov M, Lu J, Liu H. Combinatorial inference for graphical models[J]. Ann Statist, 2019, 47(6): 795-827. |
[14] | Cai T, Liu W, Luo X. A constrained l1 minimization approach to sparse precision matrix estimation[J]. J Amer Statist Assoc, 2011, 106(494): 594-607. DOI:10.1198/jasa.2011.tm10155 |
[15] | Liu W D, Luo X. Fast and adaptive sparse precision matrix estimation in high dimensions[J]. Journal of Multivariate Analysis, 2015, 135(4): 153-162. |
[16] | Fan Y, Lv J. Innovated scalable efficient estimation in ultra-large gaussian graphical models[J]. Ann Statist, 2016, 44(5): 2098-2126. |
[17] | Vershynin R. Introduction to the non-asymptotic analysis of random matrices[M]//Eldar Y C, Kutyniok G. Compressed sensing: theory and applications. Cambridge: Cambridge University Press, 2012. |