
Targeted local angle-based multi-category support vector machine


KANG Wenjia1, LIN Wenhui2, ZHANG Sanguo1
1. School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China;
2. Technology Research Institute, Aerospace Information Co., Ltd., Beijing 100195, China
Abstract: The support vector machine (SVM), a classic classification algorithm in machine learning, has long been popular among data scientists. Whether the data are linearly separable or not, the traditional SVM solves binary classification problems well: given training samples, it finds the optimal decision boundary by maximizing the minimum margin and then predicts the class of new samples. Real data, however, are more complex and diverse. On the one hand, data often contain more than two classes, and many effective multi-category SVM algorithms have appeared in recent years. On the other hand, the feature sets of data from different fields may contain a relatively special variable (called the targeted variable) that should be singled out and treated separately so that its important influence on the final classification result is preserved. Considering both aspects, we propose the targeted local angle-based multi-category support vector machine (TLAMSVM) to solve multi-category problems with a targeted variable. It performs multi-category classification within the angle-based large-margin framework, which has a clearer geometric interpretation, and introduces a varying-coefficient model that handles the influence of the targeted variable through a suitable local smoothing function. We apply TLAMSVM to simulated and real data sets. Numerical results show that, compared with multi-category SVMs that use neither the varying-coefficient idea nor the angle-based multi-category framework, TLAMSVM achieves higher prediction accuracy.
Keywords: targeted local smoothing; multi-category support vector machine; angle-based large-margin classification framework
Machine learning, as a new research orientation, plays a significant role in the age of big data. A widely used classification method in machine learning is the support vector machine (SVM)[1]. It is a classic margin-based classifier that seeks the most appropriate hyperplane in the feature space by solving an optimization problem of the form (loss + regularization)[2]. In the past decades, SVM and its derived algorithms have held an important place in many fields[3] on account of their good prediction accuracy and computational efficiency.
The typical SVM handles binary problems with a single discriminant function. On the theoretical side, it has been proved to enjoy good asymptotic convergence rates and Fisher consistency. Nevertheless, in realistic applications, multi-category problems are more common than binary ones. For example, the diagnosis of cancer patients may be divided into several categories according to the malignancy grade of cancer cells or tissues. Multi-category support vector machines (MSVM) were therefore proposed. Apart from conventional strategies such as one-versus-one (OvO) and one-versus-rest (OvR)[4-5], which train a sequence of binary SVMs, there are many simultaneous multi-classification methods that consider all k classes in a single optimization, learn k fitting functions, and classify according to a specific prediction rule, such as Crammer and Singer[6-7]. Since a binary classifier needs only one function, k-1 functions should suffice for simultaneous methods; the classification function space can be reduced to k-1 dimensions by adding a sum-to-zero constraint, $\sum_{j=1}^{k} f_{j}=0$, but this constraint makes the computation less efficient. Zhang and Liu[8] proved that the number of classification functions can still be reduced to k-1 even without the sum-to-zero constraint. From a geometric viewpoint, the large margin in MSVM is not as easy to interpret as that in binary SVM. To address this, Zhang and Liu[9] proposed the angle-based large-margin classification technique, which uses k-1 classification functions and no sum-to-zero constraint. When this technique is applied to SVMs, its only drawback is the lack of Fisher consistency. In 2016, a modified method, the reinforced angle-based multi-category support vector machine (RAMSVM)[10], was proposed; RAMSVM is Fisher consistent and performs well in numerical experiments.
Apart from extending SVM to more than two categories, studying the interplay among covariates is also a worthwhile direction. For example, SVM is a popular algorithm in disease research for predicting the disease risk status of patients. Wei et al.[11] noted that some variables may contribute little to prediction by themselves but play significant roles when coordinated with other correlated variables. Ladicky and Torr[12] therefore proposed applying local linear smoothing SVM over the entire feature space. Because that approach does not smooth successfully in high-dimensional feature spaces, Chen et al.[13] developed the targeted local smoothing kernel weighted support vector machine (KSVM), which only requires the data to be plentiful along the targeted dimension. By optimizing a kernel-weighted problem it obtains an age-dependent prediction rule for discriminating disease risk and achieves quite good results on binary classification problems. It is worth applying such a locally weighted tool to MSVM.
In this study we make two main contributions. First, we combine the angle-based multi-category classification framework of RAMSVM with the targeted local smoothing method of KSVM and propose a new MSVM model, the targeted local angle-based multi-category support vector machine (TLAMSVM), which offers better classification predictions for certain data sets. Second, TLAMSVM provides a more detailed analysis of a sample according to the value of its targeted variable. Before introducing TLAMSVM, section 1 reviews the RAMSVM and KSVM models on which it is built. Section 2 presents the TLAMSVM model in detail, followed by its computation via the coordinate descent method. Section 3 reports experimental results on two simulated and two real data sets and illustrates the strengths of TLAMSVM. Section 4 concludes the article with a discussion.
1 Background and related work
1.1 Reinforced angle-based multi-category support vector machine (RAMSVM)
For a k-class data set, the angle-based large-margin classification framework defines a simplex with k vertices W1, W2, …, Wk in the (k-1)-dimensional space, one vertex for each class,
$W_{j}=\left\{\begin{array}{ll}(k-1)^{-\frac{1}{2}}\, \boldsymbol{1}, & j=1, \\ -\dfrac{1+\sqrt{k}}{(k-1)^{3 / 2}}\, \boldsymbol{1}+\sqrt{\dfrac{k}{k-1}}\, e_{j-1}, & 2 \leqslant j \leqslant k,\end{array}\right.$
where Wj denotes the jth vertex, j=1, 2, …, k, 1 denotes the vector whose elements are all 1, and ej denotes the unit vector whose elements are all 0 except the jth one; Wj, 1, $e_{j} \in \mathbb{R}^{k-1}$. It can be verified that ‖Wj‖=1 and that the angle between any two vertex vectors is the same. Each observation x is mapped to f(x)=(f1(x), …, fk-1(x))∈$\mathbb{R}^{k-1}$. Consequently there are k angles ∠(f(x), Wj), and argminj∈{1, …, k}∠(f(x), Wj) is taken as the predicted category. Equivalently, the prediction rule can be written as $\hat{y}(x)=\operatorname{argmax}_{j \in\{1, \cdots, k\}}\left\langle f(x), W_{j}\right\rangle$.
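For concreteness, the following is a minimal numpy sketch (not from the paper; the function names are ours) of the simplex vertices defined above and of the angle-based prediction rule.

```python
import numpy as np

def simplex_vertices(k):
    """Vertices W_1, ..., W_k of the regular simplex in R^(k-1)."""
    W = np.zeros((k, k - 1))
    W[0] = (k - 1) ** (-0.5) * np.ones(k - 1)
    for j in range(2, k + 1):
        e = np.zeros(k - 1)
        e[j - 2] = 1.0
        W[j - 1] = -(1 + np.sqrt(k)) / (k - 1) ** 1.5 * np.ones(k - 1) \
                   + np.sqrt(k / (k - 1)) * e
    return W

def predict_label(f_x, W):
    """Angle-based rule: the class whose vertex has the largest inner product with f(x)."""
    return int(np.argmax(W @ f_x)) + 1          # labels coded 1, ..., k

k = 4
W = simplex_vertices(k)
assert np.allclose(np.linalg.norm(W, axis=1), 1.0)   # each vertex has unit norm
print(predict_label(np.array([0.3, -0.2, 0.5]), W))
```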
According to the regular form of a large-margin classification algorithm, we add a penalty term J(·) to the loss function. So the optimization problem becomes
$\min\limits_{f \in F} \frac{1}{n}\sum\limits_{i=1}^{n} l\left(\left\langle f(x_{i}), W_{y_{i}}\right\rangle\right)+\lambda J(f),$ (1)
here l(·) is a loss function that measures the cost of wrong predictions. Nevertheless, a typical loss function such as the hinge loss l(u)=[1-u]+ is not Fisher consistent in this framework. Inspired by Lee et al.[7], RAMSVM uses a convex combination of loss functions to overcome this drawback, and (1) becomes
$\min\limits_{f \in F} \frac{1}{n}\sum\limits_{i=1}^{n}\Big\{(1-\gamma)\sum\limits_{j \neq y_{i}}\big[1+\langle f(x_{i}), W_{j}\rangle\big]_{+}+\gamma\big[(k-1)-\langle f(x_{i}), W_{y_{i}}\rangle\big]_{+}\Big\}+\lambda J(f),$ (2)
here γ∈[0, 1] is the convex combination parameter. It has been proved that the loss function of RAMSVM is Fisher consistent when γ∈[0, 0.5][10]. Moreover, RAMSVM is stable and nearly reaches its optimal classification accuracy when γ=0.5, so we set γ=0.5 by default in what follows.
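As a sketch of how the loss in (2) is evaluated for one observation (our own illustration, reusing the simplex matrix from the previous snippet):

```python
import numpy as np

def ramsvm_loss(f_x, y, W, gamma=0.5):
    """Reinforced angle-based loss in (2) for a single observation.

    f_x : (k-1,) value of the classification function vector at x
    y   : true label in {1, ..., k}
    W   : (k, k-1) simplex vertex matrix
    """
    k = W.shape[0]
    inner = W @ f_x                                   # <f(x), W_j> for all j
    hinge = np.maximum(0.0, 1.0 + inner)              # [1 + <f(x), W_j>]_+
    wrong = np.delete(hinge, y - 1).sum()             # sum over j != y
    right = max(0.0, (k - 1) - inner[y - 1])          # [(k-1) - <f(x), W_y>]_+
    return (1 - gamma) * wrong + gamma * right
```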
1.2 Kernel weighted support vector machine (KSVM)
Take binary medical data as an example. Let Y denote the set of labels {1, -1}, indicating whether or not a subject is at risk of the disease, let "age" be the targeted variable U, and let X contain the other features. The purpose is to train a classifier that depends on the age information. First, we modify the typical discriminant score as follows:
$\alpha(U)+\boldsymbol{X}^{\mathrm{T}} \boldsymbol{\beta}(U), $ (3)
where α(U) is a baseline without a specific functional form and β(U) is a vector of coefficients of X that vary with age. Common sense suggests that subjects with similar ages and similar features should have similar disease risk, so β(U) varies smoothly as the age U changes smoothly. Some features may provide more information for classification at older ages while others do the reverse; from this perspective, each physical sign can be understood better. After computing the score in (3), a subject with a negative score is classified as risk-free and one with a positive score is assigned to the risk group.
After choosing an appropriate loss function, the targeted local smoothing kernel weighted support vector machine (KSVM) model trains a classifier via solving
$\begin{array}{*{20}{c}}{\mathop {\min }\limits_{\alpha \left( {{u_0}} \right), \beta \left( {{u_0}} \right)} \sum\limits_i {{K_{{h_n}}}} \left( {{U_i} - {u_0}} \right) \cdot }\\{l\left\{ {{Y_i}, {X_i};\alpha \left( {{u_0}} \right), \beta \left( {{u_0}} \right)} \right\} + \lambda \left\| {\beta \left( {{u_0}} \right)} \right\|, }\end{array}$
where Khn(·) is a kernel density function that pools information from subjects whose ages lie around u0 within a certain bandwidth hn. In other words, KSVM trains a separate SVM at every u0, with the ith sample weighted by Khn(Ui-u0).
After getting the estimated values of coefficients $ \hat{\alpha}(u)$ and $\hat{\boldsymbol{\beta}}(u) $, KSVM model predicts the disease risk status for any new subjects with age u and feature x via $\hat d(\mathit{\boldsymbol{x}}, u) = {\mathop{\rm sign}\nolimits} \left\{ {\widehat \alpha (u) + {\mathit{\boldsymbol{x}}^{\rm{T}}}\mathit{\boldsymbol{\widehat \beta }}(u)} \right\} $.
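A rough illustration of the local weighting idea is given below, using scikit-learn's SVC with per-sample weights as a stand-in for the weighted hinge problem above; the actual KSVM estimator in Ref.[13] penalizes ‖β(u0)‖ directly, and the function and parameter names here are ours.

```python
import numpy as np
from sklearn.svm import SVC

def gaussian_weight(u, u0, h):
    """Smoothing kernel K_h(u - u0); a Gaussian kernel is one common choice."""
    return np.exp(-0.5 * ((u - u0) / h) ** 2)

def fit_local_svm(X, y, u, u0, h, C=1.0):
    """Train one weighted binary SVM centred at u0 (a rough stand-in for KSVM)."""
    w = gaussian_weight(u, u0, h)
    clf = SVC(kernel="linear", C=C)
    clf.fit(X, y, sample_weight=w)     # each subject weighted by K_h(U_i - u0)
    return clf

# prediction for a new subject with age u_new and features x_new:
# fit_local_svm(X, y, u, u_new, h).predict(x_new.reshape(1, -1))
```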
2 Methodology and computation
In this section we first show how the multi-category local smoothing kernel weighted optimization problem of TLAMSVM is designed, and then list the main computational steps of TLAMSVM using the coordinate descent method. We take the linearly separable case as an example; the non-separable case can be handled with the kernel trick.
2.1 Targeted local angle-based multi-category support vector machine (TLAMSVM)
Assume there are n observations, each with p+1 features, one of which is the targeted variable U. For simplicity the intercept is included in X. Denote the feature values of the ith observation by Xi=(1, Xi(1), Xi(2), …, Xi(p))T and let X(j)=(X1(j), X2(j), …, Xn(j)) be the values of the jth feature over all observations. The sample matrix X(p+1)×n can then be written as
$\mathit{\boldsymbol{X}} = \left( {{X_1}, {X_2}, \cdots , {X_n}} \right) = {\left( {1, {X^{(1)}}, {X^{(2)}}, \cdots , {X^{(p)}}} \right)^{\rm{T}}}.$ (4)
Assume the classification function vector at U=u0 is
${f_q}\left( {{u_0}, \mathit{\boldsymbol{X}}} \right) = {\mathit{\boldsymbol{X}}^{\rm{T}}}{\mathit{\boldsymbol{\beta }}_q}\left( {{u_0}} \right)\;\;\;(q = 1, \cdots , k - 1), $ (5)
where βq(u0)=(αq(u0), βq(1)(u0), βq(2)(u0), …, βq(p)(u0))T. We use the convex combination loss in the model, so the optimization problem of the TLAMSVM model with targeted variable U is
$\min\limits_{f \in F} \sum\limits_{i} K_{h_{n}}(U_{i}-u_{0})\Big\{(1-\gamma)\sum\limits_{j \neq y_{i}}\big[1+\langle f(u_{0}, X_{i}), W_{j}\rangle\big]_{+}+\gamma\big[(k-1)-\langle f(u_{0}, X_{i}), W_{y_{i}}\rangle\big]_{+}\Big\}+\frac{\lambda}{2} J(f),$ (6)
where {(1-γ)∑j≠yi[1+〈f(u0, Xi), Wj〉]+ + γ[(k-1)-〈f(u0, Xi), Wyi〉]+} is the loss of our model, and the weight on the loss of each observation, Khn(Ui-u0), is a smoothing kernel function with bandwidth hn, as in the KSVM model. To cover the more general situation, we introduce a "soft margin" that allows part of the samples to violate the constraints
$\begin{array}{c}{-\left\langle f\left(u_{0}, X_{i}\right), W_{j}\right\rangle> 1 \text { and } 2-k+} \\ {\left\langle f\left(u_{0}, X_{i}\right), W_{y_{i}}\right\rangle> 1}, \end{array}$
that is to say, we introduce slack variables ξi, j (ξi, j≥0) into the constraints and obtain
$\begin{array}{c}{-\left\langle f\left(u_{0}, X_{i}\right), W_{j}\right\rangle> 1-\xi_{i, j}, (i=1, \cdots, n)}, \\ {2-k+\left\langle f\left(u_{0}, X_{i}\right), W_{y_{i}}\right\rangle> 1-\xi_{i, j}}, \\ {\left(i=1, \cdots, n, j \neq y_{i}\right).}\end{array}$ (7)
We take the penalty as $J=\sum_{q=1}^{k-1} \boldsymbol{\beta}_{q}^{\mathrm{T}}\left(u_{0}\right) \boldsymbol{\beta}_{q}\left(u_{0}\right)$, plug (7) into (6), and after some simplification obtain
$\begin{array}{l}\min\limits_{\boldsymbol{\beta}_{q}(u_{0}),\, \xi_{i, j}} \dfrac{n\lambda}{2}\sum\limits_{q=1}^{k-1}\boldsymbol{\beta}_{q}^{\mathrm{T}}(u_{0})\,\boldsymbol{\beta}_{q}(u_{0})+\sum\limits_{i} K_{h_{n}}(U_{i}-u_{0})\Big[\gamma\xi_{i, y_{i}}+(1-\gamma)\sum\limits_{j \neq y_{i}}\xi_{i, j}\Big]\\ \text{s.t.}\ \ \xi_{i, j} \geqslant 0\ (i=1, \cdots, n,\ j=1, \cdots, k),\\ \qquad \xi_{i, y_{i}}+\langle f(u_{0}, X_{i}), W_{y_{i}}\rangle-(k-1) \geqslant 0\ (i=1, \cdots, n),\\ \qquad \xi_{i, j}-\langle f(u_{0}, X_{i}), W_{j}\rangle-1 \geqslant 0\ (i=1, \cdots, n,\ j \neq y_{i}).\end{array}$ (8)
Obviously, (8) is a constrained optimization problem, whose Lagrangian function could be defined as
$\begin{aligned} L=& \frac{n \lambda}{2} \sum\limits_{q=1}^{k-1} \boldsymbol{\beta}_{q}^{\mathrm{T}}(u_{0}) \boldsymbol{\beta}_{q}(u_{0})+ \sum\limits_{i=1}^{n} K_{h_{n}}(U_{i}-u_{0})\Big[\gamma \xi_{i, y_{i}}+(1-\gamma) \sum\limits_{j \neq y_{i}} \xi_{i, j}\Big]-\\ & \sum\limits_{i=1}^{n} \alpha_{i, y_{i}}\big[\xi_{i, y_{i}}+\langle f(u_{0}, X_{i}), W_{y_{i}}\rangle-(k-1)\big]- \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{k} \tau_{i, j} \xi_{i, j}-\\ & \sum\limits_{i=1}^{n} \sum\limits_{j \neq y_{i}} \alpha_{i, j}\big[\xi_{i, j}-\langle f(u_{0}, X_{i}), W_{j}\rangle-1\big]. \end{aligned}$
Here αi, j and τi, j are Lagrangian multipliers. After simplification, we get
$\begin{aligned} L=& \frac{n \lambda}{2} \sum\limits_{q=1}^{k-1} \boldsymbol{\beta}_{q}^{\mathrm{T}}\left(u_{0}\right) \boldsymbol{\beta}_{q}\left(u_{0}\right)+\\ & \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{k}\left[A_{i, j}-\tau_{i, j}-\alpha_{i, j}\right] \xi_{i, j}+\sum\limits_{i=1}^{n} \alpha_{i, y_{i}}(k-1)+\\&{\sum\limits_{i = 1}^n {\sum\limits_{j \ne {y_i}} {{\alpha _{i, j}}} } - \sum\limits_{i = 1}^n {{\alpha _{i, {y_i}}}} \left\langle {f\left( {{u_0}, {X_i}} \right), {W_{{y_i}}}} \right\rangle + }\\&{\sum\limits_{i = 1}^n {\sum\limits_{j \ne {y_i}} {{\alpha _{i, j}}} } \left\langle {f\left( {{u_0}, {X_i}} \right), {W_j}} \right\rangle , }\end{aligned}$ (9)
where Ai, j=Khn(Ui-u0)[γI(j=yi)+(1-γ)I(jyi)].
Denote the last two parts of (9) by L1 and L2,
$\begin{aligned} L_{1} &=-\sum\limits_{i=1}^{n} \alpha_{i, y_{i}}\left\langle f\left(u_{0}, X_{i}\right), W_{y_{i}}\right\rangle \\ &=-\sum\limits_{i=1}^{n} \sum\limits_{q=1}^{k-1} \alpha_{i, y_{i}}\left(\boldsymbol{X}_{i}^{\mathrm{T}} \boldsymbol{\beta}_{q}\left(u_{0}\right)\right) W_{y_{i}, q} \\ L_{2} &=\sum\limits_{i=1}^{n} \sum\limits_{j \neq y_{i}} \alpha_{i, j}\left\langle f\left(u_{0}, X_{i}\right), W_{j}\right\rangle \\ &=\sum\limits_{i=1}^{n} \sum\limits_{q=1}^{k-1} \sum\limits_{j \neq y_{i}} \alpha_{i, j}\left(\boldsymbol{X}_{i}^{\mathrm{T}} \boldsymbol{\beta}_{q}\left(u_{0}\right)\right) W_{j, q} .\end{aligned}$
Take partial derivatives of L with respect to βq(u0) and ξi, j and set them to zero,
$\begin{array}{*{20}{l}}{\frac{{\partial L}}{{\partial {\xi _{i, j}}}} = {A_{i, j}} - {\tau _{i, j}} - {\alpha _{i, j}} = 0}\\{(i = 1, \cdots , n;j = 1, \cdots , k), }\end{array}$ (10)
$\frac{\partial L}{\partial \boldsymbol{\beta}_{q}(u_{0})}=n\lambda\, \boldsymbol{\beta}_{q}(u_{0})-\sum\limits_{i=1}^{n} \alpha_{i, y_{i}} W_{y_{i}, q} X_{i}+\sum\limits_{i=1}^{n} \sum\limits_{j \neq y_{i}} \alpha_{i, j} W_{j, q} X_{i}=0.$ (11)
From (11) one obtains
${\mathit{\boldsymbol{\beta }}_q}\left( {{u_0}} \right) = \frac{1}{{n\lambda }}\left[ {\sum\limits_{i = 1}^n {{\alpha _{i, {y_i}}}} {W_{{y_i}, q}}{X_i} - \sum\limits_{i = 1}^n {\sum\limits_{j \ne {y_i}} {{\alpha _{i, j}}} } {W_{j, q}}{X_i}} \right]$ (12)
Substituting (12) into L1 and L2 gives
$L_{1}+L_{2}=\sum\limits_{q=1}^{k-1}\sum\limits_{i=1}^{n}\left(\boldsymbol{X}_{i}^{\mathrm{T}}\boldsymbol{\beta}_{q}(u_{0})\right)\Big[\sum\limits_{j \neq y_{i}} \alpha_{i, j} W_{j, q}-\alpha_{i, y_{i}} W_{y_{i}, q}\Big]=-n\lambda\sum\limits_{q=1}^{k-1}\boldsymbol{\beta}_{q}^{\mathrm{T}}(u_{0})\,\boldsymbol{\beta}_{q}(u_{0}).$ (13)
Plugging (10), (11), and (13) into (9), we simplify the Lagrangian function to
$L=-\frac{n \lambda}{2} \sum\limits_{q=1}^{k-1} \boldsymbol{\beta}_{q}^{\mathrm{T}}(u_{0}) \boldsymbol{\beta}_{q}(u_{0})+\sum\limits_{i=1}^{n} \alpha_{i, y_{i}}(k-1)+\sum\limits_{i=1}^{n} \sum\limits_{j \neq y_{i}} \alpha_{i, j}.$ (14)
Plugging (12) into (14) and maximizing the Lagrangian function, we obtain the dual problem
$\begin{array}{l}\min\limits_{\alpha_{i, j}} \dfrac{1}{2 n \lambda}\sum\limits_{q=1}^{k-1}\Big[\sum\limits_{i=1}^{n} \alpha_{i, y_{i}} W_{y_{i}, q} X_{i}-\sum\limits_{i=1}^{n} \sum\limits_{j \neq y_{i}} \alpha_{i, j} W_{j, q} X_{i}\Big]^{\mathrm{T}}\Big[\sum\limits_{i=1}^{n} \alpha_{i, y_{i}} W_{y_{i}, q} X_{i}-\sum\limits_{i=1}^{n} \sum\limits_{j \neq y_{i}} \alpha_{i, j} W_{j, q} X_{i}\Big]-\\ \qquad \sum\limits_{i=1}^{n} \alpha_{i, y_{i}}(k-1)-\sum\limits_{i=1}^{n} \sum\limits_{j \neq y_{i}} \alpha_{i, j}\\ \text{s.t.}\ \ 0 \leqslant \alpha_{i, j} \leqslant A_{i, j}.\end{array}$ (15)
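To make the box constraint in (15) concrete, here is a small sketch (ours, not the authors') computing the upper bounds A_{i,j} defined after (9); a Gaussian smoothing kernel is assumed for illustration and the labels are assumed to be coded 1, …, k.

```python
import numpy as np

def box_bounds(U, y, u0, h, k, gamma=0.5):
    """Upper bounds A[i, j] = K_h(U_i - u0) * [gamma * I(j == y_i) + (1 - gamma) * I(j != y_i)]
    for the dual variables alpha_{i,j} in (15)."""
    Kh = np.exp(-0.5 * ((np.asarray(U) - u0) / h) ** 2) / (h * np.sqrt(2 * np.pi))  # Gaussian K_h
    A = np.full((len(U), k), 1.0 - gamma)
    A[np.arange(len(U)), np.asarray(y) - 1] = gamma      # the column j = y_i gets weight gamma
    return A * Kh[:, None]
```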
2.2 Computation using coordinate descent method
The coordinate descent (CD) method is a remarkable tool for solving large-scale problems and is mainly used for convex optimization. It treats each dimension of the solution vector as a coordinate and selects one coordinate at a time for iterative optimization. The iterative process consists of outer loops and inner loops: within each outer loop, every inner loop solves a single-variable sub-optimization problem, so only one scalar needs to be optimized at a time.
Assume that $\boldsymbol{w} \in \mathbb{R}^{n}$ is the solution vector of an optimization problem and set a total of k iterations. Starting from an initialized w0, we obtain w1, w2, …, wk in turn. Each iteration is an outer loop, which in turn iterates over the dimensions of w=[w1, w2, …, wn]T, so each outer loop contains n inner loops. The ith outer loop and jth inner loop generate wi, j,
$\boldsymbol{w}^{i, j}=\left[w_{1}^{i+1}, \cdots, w_{j-1}^{i+1}, w_{j}^{i}, \cdots, w_{n}^{i}\right]^{\mathrm{T}}.$
Notice that wi, 1=wi and wi, n+1=wi+1. We update wi, j to wi, j+1 by solving the single-variable subproblem
$\min\limits_{z} f\left(w_{1}^{i+1}, \cdots, w_{j-1}^{i+1}, w_{j}^{i}+z, w_{j+1}^{i}, \cdots, w_{n}^{i}\right) \equiv \min\limits_{z} f\left(\boldsymbol{w}^{i, j}+z \boldsymbol{e}_{j}\right),$
where ej=[0, …, 0, 1, 0, …, 0]T. For the inner loop, many tools can be used, such as Newton's method and CMLS.
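A toy illustration of the scheme just described is given below: a generic coordinate-descent loop in Python, where the scalar subproblem is solved with scipy's minimize_scalar (Newton's method or CMLS could be used instead). The names and the toy objective are ours.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def coordinate_descent(f, w0, n_outer=50, tol=1e-8):
    """Generic coordinate descent: each inner loop solves a single-variable subproblem."""
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(n_outer):                       # outer loops
        w_old = w.copy()
        for j in range(len(w)):                    # inner loops over coordinates
            obj = lambda z: f(np.concatenate([w[:j], [w[j] + z], w[j + 1:]]))
            res = minimize_scalar(obj)             # one-dimensional optimization in z
            w[j] += res.x
        if np.max(np.abs(w - w_old)) < tol:
            break
    return w

# toy usage: minimize a separable quadratic
f = lambda w: np.sum((w - np.array([1.0, -2.0, 0.5])) ** 2)
print(coordinate_descent(f, np.zeros(3)))          # -> approximately [1, -2, 0.5]
```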
Now we present the computational procedure of TLAMSVM using coordinate descent method.
Algorithm 1 Coordinate descent method
1: Initialize w0
2: for i=0, 1, …, k do
3:   for j=1, 2, …, n do
4:     fix w1i+1, …, wj-1i+1, wj+1i, …, wni
5:     solve the single-variable subproblem
6:   end for
7: end for


Step 1. Initialize the solution vectors $\hat{\boldsymbol{B}}_{j}=\boldsymbol{0}$, j=1, …, k-1.
Step 2. Assume the estimated value in mth iteration is $ \left(\hat{\boldsymbol{B}}_{1}^{(m)}, \cdots, \hat{\boldsymbol{B}}_{k-1}^{(m)}\right)$, and each element of that is
$\begin{aligned} \hat{\boldsymbol{B}}_{j}^{(m)}=&\left(\hat{\alpha}_{j}^{(m)}\left(u_{0}\right), \hat{\boldsymbol{\beta}}_{j}^{(m), (1)}\left(u_{0}\right)\right., \\ & \hat{\boldsymbol{\beta}}_{j}^{(m), (2)}\left(u_{0}\right), \cdots, \hat{\boldsymbol{\beta}}_{j}^{(m), (p)}\left(u_{0}\right) )^{\mathrm{T}}.\end{aligned}$
Fix $\hat{\boldsymbol{B}}_{1}^{(m)}, \cdots, \hat{\boldsymbol{B}}_{j-1}^{(m)}, \hat{\boldsymbol{B}}_{j+1}^{(m-1)}, \cdots, \hat{\boldsymbol{B}}_{k-1}^{(m-1)}$ and update $\hat{\boldsymbol{B}}_{j}^{(m-1)}$ to obtain the new $\hat{\boldsymbol{B}}_{j}^{(m)}$.
Therefore, in the (m+1)th step, for i=1, …, n, j=1, …, k-1, and l=1, …, p, we set
${G_{i, j}}(z) = \sum\limits_{q < l} {{X_{i, q}}} \hat B_j^{(m + 1), (q)} + z{X_{i, l}} + \sum\limits_{q > l} {{X_{i, q}}} \hat B_j^{(m), (q)}, $
where $\hat B_j^{(m + 1), (q)} $ is the qth element of $\hat{\boldsymbol{B}}_{j}^{(m+1)} $, q=1, …, p+1, Gi, j(z) is a function of $z \in \mathbb{R} $. Hence we have
$\begin{array}{c}{F_{i, j, -l}(z)=\left(\boldsymbol{X}_{i}^{\mathrm{T}} \hat{\boldsymbol{B}}_{1}^{(m+1)}, \cdots, \boldsymbol{X}_{i}^{\mathrm{T}} \hat{\boldsymbol{B}}_{j-1}^{(m+1)}\right.}, \\ {G_{i, j}(z), \boldsymbol{X}_{i}^{\mathrm{T}} \hat{\boldsymbol{B}}_{j+1}^{(m)}, \cdots, \boldsymbol{X}_{i}^{\mathrm{T}} \hat{\boldsymbol{B}}_{k-1}^{(m)} )^{\mathrm{T}}}.\end{array}$
Write $\hat B_j^{(m + 1), (l)}$ in the form of (16) below. This is a single-variable subproblem, for which many standard methods are available (a rough sketch of one such update is given after Step 3),
$\hat{B}_{j}^{(m+1), (l)}=\operatorname{argmin}_{z}\left[\lambda z^{2}+\frac{1}{n} \Sigma_{i=1}^{n} l\left\{\left\langle F_{i, j, -l}(z), W_{y_{i}}\right\rangle\right\}\right].$ (16)
Step 3. Repeat above until all estimated values are convergent.
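Below is a rough sketch of one coordinate update of Step 2, Eq. (16). It is our own illustration, not the authors' code: it reuses ramsvm_loss and the simplex matrix from the sketches in section 1, makes the kernel weights Khn(Ui-u0) of (6) explicit (they are left implicit in (16)), evaluates the reinforced loss on the full function vector, and solves the scalar problem with scipy's minimize_scalar.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def update_coefficient(X, y, U, u0, h, B, j, l, W, lam, gamma=0.5):
    """One inner CD update: re-optimize the l-th coefficient of beta_j(u0), as in (16).

    X : (n, p+1) design matrix with a leading column of ones
    B : (k-1, p+1) current coefficients, row q holding beta_q(u0)^T
    W : (k, k-1) simplex vertices; ramsvm_loss() is the loss sketched in section 1.1
    """
    Kh = np.exp(-0.5 * ((U - u0) / h) ** 2)      # smoothing weights (Gaussian, for illustration)
    n = len(y)

    def objective(z):
        Bz = B.copy()
        Bz[j, l] = z                             # candidate value for the coefficient
        F = X @ Bz.T                             # (n, k-1) classification function vectors
        loss = sum(Kh[i] * ramsvm_loss(F[i], y[i], W, gamma) for i in range(n))
        return lam * z ** 2 + loss / n

    B[j, l] = minimize_scalar(objective).x
    return B
```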
After obtaining $\hat{\boldsymbol{\beta}}_{q}(u)=\left(\hat{\alpha}_{q}(u), \hat{\beta}_{q}^{(1)}(u), \cdots, \hat{\beta}_{q}^{(p)}(u)\right)^{\mathrm{T}}$, each component of the classification function vector is computed as
$\begin{aligned} f_{q}(u, \boldsymbol{X})=& \boldsymbol{X}^{\mathrm{T}} \hat{\boldsymbol{\beta}}_{q}(u)=\left(1, X^{(1)}, X^{(2)}, \cdots, X^{(p)}\right)\cdot \\ &\left(\hat{\alpha}_{q}(u), \hat{\boldsymbol{\beta}}_{q}^{(1)}(u), \cdots, \hat{\boldsymbol{\beta}}_{q}^{(p)}(u)\right)^{\mathrm{T}}. \end{aligned}$
The classification function vector is
$f(u, \boldsymbol{X})=\left(f_{1}(u, \boldsymbol{X}), f_{2}(u, \boldsymbol{X}), \cdots, f_{k-1}(u, \boldsymbol{X})\right).$
Therefore, any new sample (u′, X′) is mapped to f(u′, X′), which forms k angles ∠(f(u′, X′), Wj) with the vertex vectors. We then assign its label as
$\hat y = {{\mathop{\rm argmin}\nolimits} _{j \in \{ 1, \cdots , k\} }}\angle \left( {f\left( {{u^\prime }, {X^\prime }} \right), {W_j}} \right).$
3 Numerical study
In this section we use numerical experiments to verify the effect of the TLAMSVM model and choose the f1 score to measure its classification results. The f1 score is a common criterion for binary classification problems; here we generalize it to multi-category cases by computing, for each class, the f1 score that treats the current class as positive and all other classes as negative, and taking the average of these scores as the final result for the classifier.
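A small sketch of this class-averaged (macro) f1 score is given below; it is equivalent to scikit-learn's f1_score(..., average="macro"). The function name is ours.

```python
import numpy as np

def macro_f1(y_true, y_pred, k):
    """Macro-averaged f1: one-vs-rest f1 per class, then the unweighted mean.
    y_true, y_pred : numpy integer arrays with labels coded 1, ..., k."""
    scores = []
    for c in range(1, k + 1):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
        scores.append(f1)
    return float(np.mean(scores))
```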
It is worth noting that, for non-linear problems, TLAMSVM needs both a Mercer kernel function K(·) and a local smoothing kernel function Khn(·). The difference is clear: the former constructs non-linear decision boundaries, while the latter aggregates information near u0. In the numerical experiments we use linear, Gaussian, and polynomial Mercer kernels, and uniform, Gaussian, and Epanechnikov (ep for short) local smoothing kernel functions.
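For reference, the three local smoothing kernel functions can be written as the short sketch below; the 1/h normalization is one common convention and is absorbed into the sample weights in practice.

```python
import numpy as np

def uniform_kernel(t):
    return 0.5 * (np.abs(t) <= 1)

def epanechnikov_kernel(t):
    return 0.75 * np.maximum(0.0, 1.0 - t ** 2)

def gaussian_kernel(t):
    return np.exp(-0.5 * t ** 2) / np.sqrt(2 * np.pi)

def K_h(u, u0, h, kernel=epanechnikov_kernel):
    """Local smoothing weight K_h(u - u0) = K((u - u0) / h) / h."""
    return kernel((u - u0) / h) / h
```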
3.1 Simulation results
We generate two simulated data sets, S1 and S2. S1 has 1 000 samples with 3-class labels and 3 features; S2 has 1 500 samples with 5-class labels and 7 features. We carry out 10 simulation runs for each setting and report averaged results. For both S1 and S2 the targeted variable Ui is generated from U(0, 1). The label of S1 is defined by (17); the label of S2 is generated in the same way, and its formula is omitted to save space.
$Y_{i}=\left\{\begin{array}{ll}{1} & {\text { if } U_{i}^{2}+U_{i}-1+\epsilon_{i} \leqslant U_{\frac{1}{3}}} \\ {2} & {\text { if } U_{\frac{1}{3}} \leqslant U_{i}^{2}+U_{i}-1+\epsilon_{i} \leqslant U_{\frac{2}{3}}} \\ {3} & {\text { if } U_{i}^{2}+U_{i}-1+\epsilon_{i}>U_{\frac{2}{3}}}\end{array}\right.$ (17)
here $U_{\frac{1}{3}}$ and $U_{\frac{2}{3}}$ are the tertiles of U and $\epsilon_{i} \sim N(0, 0.1)$. Given Yi and Ui, the features of S1, Xi=(Xi1, Xi2)T, are generated from
${\mathit{\boldsymbol{X}}_i}|{Y_i}, {U_i}\sim MVN\left\{ {\beta \left( {{U_i}} \right){Y_i}, {\sigma ^2}I} \right\}$
where $\boldsymbol{\beta}(u)=\left(\sin (4 \pi u), 2 \mathrm{e}^{-20(u-0.5)^{2}}\right)^{\mathrm{T}}$.
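A sketch of how S1 could be generated following (17) and the MVN specification above is given below. The random seed, the reading of N(0, 0.1) as a variance, and the value of σ (the text states σ = 2 only for S2) are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 1000, 2.0                           # sigma is only stated for S2; reused here as an assumption

U = rng.uniform(0, 1, n)                       # targeted variable
eps = rng.normal(0, np.sqrt(0.1), n)           # noise; N(0, 0.1) read as variance 0.1
score = U ** 2 + U - 1 + eps
q1, q2 = np.quantile(U, [1 / 3, 2 / 3])        # tertiles of U
Y = np.where(score <= q1, 1, np.where(score <= q2, 2, 3))

beta = np.column_stack([np.sin(4 * np.pi * U), 2 * np.exp(-20 * (U - 0.5) ** 2)])
X = beta * Y[:, None] + sigma * rng.standard_normal((n, 2))   # X_i | Y_i, U_i ~ MVN(beta(U_i) * Y_i, sigma^2 I)
```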
As for S2, we generate the features Xi=(Xi1, Xi2, Xi3, Xi4, Xi5, Xi6)T from (18) with σ=2,
$\begin{aligned} &\boldsymbol{X}_{i} \sim MVN\left(\mu^{(k)} \boldsymbol{\eta}(U_{i})\, I_{\{X_{i} \in \text{class } k\}},\; \sigma^{2} I\right), \\ &\boldsymbol{\eta}(U_{i})=\left(2 U_{i},\; U_{i}^{2}+2 U_{i}-1,\; \cos(2 \pi U_{i}),\; 4 \mathrm{e}^{(U_{i}-0.3)^{2}},\; \ln\Big(\frac{1}{U_{i}^{2}}\Big),\; \sin(3 \pi U_{i})\right)^{\mathrm{T}}. \end{aligned}$ (18)
$\begin{array}{c}{\mu^{(1)}=(5, 5, 0, 0, 0, 0), \mu^{(2)}=(0, 5, 5, 0, 0, 0)} \\ {\mu^{(3)}=(0, 0, 5, 5, 0, 0), \mu^{(4)}=(0, 0, 0, 5, 5, 0)} \\ {\mu^{(5)}=(0, 0, 0, 0, 5, 5)}\end{array}$
We mainly compare three models: KMSVM, RAMSVM, and TLAMSVM. KMSVM is a multi-category generalization of KSVM that combines the OvO multi-classification framework with the KSVM method, while RAMSVM does not use the varying-coefficient model and treats the targeted variable as an ordinary feature. Both TLAMSVM and RAMSVM use the angle-based multi-category framework. In each model we use 5-fold cross validation to determine the parameters λ and hn.
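A generic sketch of this 5-fold selection of (λ, hn) is given below; fit and score stand for whatever training and evaluation routines are used (for instance fitting TLAMSVM and computing the f1 score), and all names here are ours.

```python
import numpy as np
from sklearn.model_selection import KFold

def select_parameters(X, y, U, fit, score, lambdas, bandwidths, n_splits=5, seed=0):
    """5-fold cross validation over (lambda, h_n); `fit` and `score` are user-supplied callables."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    best, best_score = None, -np.inf
    for lam in lambdas:
        for h in bandwidths:
            scores = []
            for tr, te in kf.split(X):
                model = fit(X[tr], y[tr], U[tr], lam, h)
                scores.append(score(model, X[te], y[te], U[te]))
            if np.mean(scores) > best_score:
                best, best_score = (lam, h), np.mean(scores)
    return best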
Table 1 and Table 2 record the effects of model RAMSVM and TLAMSVM on simulated data S1 and S2.
Table 1  Comparison of f1 score and training time between RAMSVM and TLAMSVM on data set S1

f1 score
  Mercer kernel   RAMSVM   TLAMSVM(uniform)   TLAMSVM(ep)   TLAMSVM(gaussian)
  linear          0.82     0.83               0.85          0.86
  gaussian        0.87     0.94               0.92          0.89
  polynomial      0.84     0.82               0.81          0.79
Time/s
  linear          7        43                 53            42
  gaussian        2        19                 33            18
  polynomial      7        59                 71            49


Table 2  Comparison of f1 score and training time between RAMSVM and TLAMSVM on data set S2

f1 score
  Mercer kernel   RAMSVM   TLAMSVM(uniform)   TLAMSVM(ep)   TLAMSVM(gaussian)
  linear          0.87     0.93               0.94          0.91
  gaussian        0.54     0.80               0.80          0.74
  polynomial      0.80     0.76               0.81          0.78
Time/s
  linear          15       61                 53            42
  gaussian        3        18                 32            19
  polynomial      7        59                 71            49

Since RAMSVM does not need a local smoothing kernel function to build many MSVMs indexed by the targeted variable, it has an advantage in computation time. In terms of predictive accuracy, however, TLAMSVM performs better regardless of which Mercer kernel or local smoothing kernel function is used.
Table 3 and Table 4 record the f1 scores and training times of KMSVM and TLAMSVM on the simulated data sets S1 and S2.
Table 3  Comparison of f1 score and training time between KMSVM and TLAMSVM on data set S1

f1 score
  Mercer kernel   model     uniform   ep     gaussian
  linear          KMSVM     0.29      0.42   0.24
                  TLAMSVM   0.83      0.85   0.86
  gaussian        KMSVM     0.46      0.24   0.29
                  TLAMSVM   0.94      0.92   0.89
  polynomial      KMSVM     0.51      0.33   0.47
                  TLAMSVM   0.82      0.81   0.79
Time/s
  linear          KMSVM     24        31     30
                  TLAMSVM   43        53     42
  gaussian        KMSVM     26        37     33
                  TLAMSVM   19        33     18
  polynomial      KMSVM     26        33     31
                  TLAMSVM   59        71     49


Table 4  Comparison of f1 score and training time between KMSVM and TLAMSVM on data set S2

f1 score
  Mercer kernel   model     uniform   ep     gaussian
  linear          KMSVM     0.33      0.24   0.22
                  TLAMSVM   0.93      0.94   0.91
  gaussian        KMSVM     0.49      0.39   0.41
                  TLAMSVM   0.80      0.80   0.74
  polynomial      KMSVM     0.38      0.35   0.29
                  TLAMSVM   0.76      0.81   0.78
Time/s
  linear          KMSVM     63        71     62
                  TLAMSVM   61        53     42
  gaussian        KMSVM     14        38     19
                  TLAMSVM   18        32     19
  polynomial      KMSVM     72        87     58
                  TLAMSVM   59        71     49

TLAMSVM provides better predictions than KMSVM with the same Mercer or local smoothing kernel function and similar training time. The values in Table 5 are the average prediction results of RAMSVM and TLAMSVM on S1.
Table 5  Comparison of f1 score between RAMSVM and TLAMSVM on data set S1

  marker     Mercer kernel   RAMSVM   TLAMSVM(uniform)   TLAMSVM(ep)   TLAMSVM(gaussian)
  Xi1        linear          0.49     0.85               0.86          0.87
             gaussian        0.69     0.91               0.91          0.91
             polynomial      0.44     0.87               0.87          0.88
  Xi2        linear          0.85     0.85               0.81          0.87
             gaussian        0.87     0.91               0.91          0.92
             polynomial      0.86     0.84               0.74          0.85
  multiple   linear          0.82     0.83               0.85          0.86
             gaussian        0.86     0.94               0.92          0.89
             polynomial      0.84     0.82               0.81          0.79

Here we compare three cases with different feature spaces: Xi1 only, Xi2 only, and both. The f1 score of TLAMSVM is much larger than that of RAMSVM when a single feature variable is used, so TLAMSVM does not rely on the number of feature variables in the model as much as RAMSVM does. Likewise, TLAMSVM gives better results with any kernel function in all three cases.
3.2 Real data analysis
In this subsection we demonstrate the classification performance of TLAMSVM on two real data sets, the Cervical Cancer data and the Protein Localization Sites data, denoted D1 and D2. Both come from the University of California, Irvine machine learning repository. The Cervical Cancer data were collected at Hospital Universitario de Caracas from 858 patients and include demographic information, the patients' habits, and their historic medical records. The Protein Localization Sites data come from research on an expert system that uses information from the amino acid sequence and source origin to predict protein localization sites; it has 1 484 samples and 10 categories.
3.2.1 Application to Cervical Cancer data set
Cervical cancer remains one of the leading causes of death among patients with malignant gynecologic disease. It can be caused by a number of factors, for instance viral infection, sexual behavior, the number of pregnancies, and smoking habits. From the viewpoint of computer-aided diagnosis, several methods are used to detect pre-cancerous cervical lesions; the four most frequently used examinations are colposcopy with acetic acid (Hinselmann), colposcopy with Lugol iodine (Schiller), cytology, and biopsy[14]. The data set records the results of these four tests. From them we artificially generate a response variable y∈{1, 2, 3} that reflects the confidence of the diagnosis: y=1 when all four test results are normal, y=2 when only one or two of the test results are normal, and y=3 for the remaining situations. We thus obtain a 3-category data set.
After some common data preprocessing, we obtain a data set without missing values or outliers, and we reduce the number of variables in the model from 31 to 12 by computing the conditional information entropy and information gain of each variable. We treat the feature "age" as the targeted variable. Because of the class imbalance of the data, we use a combination of undersampling and oversampling techniques (ENN+SMOTE) to balance the training data. ENN removes a sample when its class differs from the class of the majority of its k nearest neighbors. SMOTE[15] is a basic oversampling method that reduces the difference in size between categories: for each sample of the minority class, connect it to its k nearest neighbors of the same class, which gives k line segments, and generate one synthetic sample at random on each segment, yielding k new minority-class samples.
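A minimal sketch of the SMOTE step just described is given below (our own illustration; a ready-made combination of SMOTE and ENN is also available in the imbalanced-learn package, as noted in the final comment).

```python
import numpy as np

def smote(X_min, k=5, seed=0):
    """Basic SMOTE for one minority class: interpolate each sample towards each of its
    k nearest same-class neighbours to create k synthetic samples."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for i, x in enumerate(X_min):
        d = np.linalg.norm(X_min - x, axis=1)
        neighbours = np.argsort(d)[1:k + 1]            # skip the sample itself
        for j in neighbours:
            t = rng.uniform()                          # random point on the connecting segment
            synthetic.append(x + t * (X_min[j] - x))
    return np.vstack(synthetic)

# combined resampling with imbalanced-learn:
# from imblearn.combine import SMOTEENN
# X_res, y_res = SMOTEENN().fit_resample(X, y)
```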
Table 6 shows the numerical results of TLAMSVM and RAMSVM with different Mercer kernel functions and local smoothing kernel functions on the cervical cancer data set. With the same local smoothing kernel function, the polynomial Mercer kernel gives the best classification accuracy; with the same Mercer kernel function, the Epanechnikov and Gaussian local kernels give relatively better predictions. On average, the prediction accuracy of TLAMSVM is slightly better than that of RAMSVM.
Table 6  Summary of accuracy and training time of TLAMSVM and RAMSVM on Cervical Cancer data set

accuracy
  Mercer kernel   RAMSVM   TLAMSVM(uniform)   TLAMSVM(ep)   TLAMSVM(gaussian)
  linear          0.68     0.83               0.86          0.87
  gaussian        0.62     0.68               0.56          0.47
  polynomial      0.87     0.87               0.90          0.89
Time/s
  linear          7        43                 53            42
  gaussian        2        18                 33            18
  polynomial      7        59                 71            49

For any given value u of the targeted variable we can obtain the corresponding estimated coefficients β(u). Figure 1 is a line chart of the coefficients in the first dimension; it shows how the coefficients of five feature variables vary as "age" changes.
Fig. 1 β1(u) vs. the targeted variable (age) on D1

From Fig. 1 we can see that the influence of the feature "Smokes" is small when age is between 20 and 40 but increases considerably at older ages. That is, for young patients the information about smoking habit is not as important as other variables, whereas for middle-aged patients the smoking habit offers more information for classification.
Figure 2 shows the inner products between the classification function vector of a sample and each simplex vertex, 〈f(x), Wj〉 (j=1, 2, 3), as the targeted variable U takes different values.
Fig. 2 Predictions for the three given samples on D1 (solid: 〈f(x), W1〉, dashed: 〈f(x), W2〉, dotted: 〈f(x), W3〉)

From Section 2 we know that a sample is classified to the category with the maximal inner product value, so at any fixed age the class corresponding to the top line is the prediction result. Figure 2 shows how different values of the targeted variable affect the final classification of three given samples. The three subgraphs give the prediction details of three samples belonging to class-1, class-2, and class-3, whose ages are 29, 52, and 51, respectively.
First, TLAMSVM gives the correct prediction for each sample according to the prediction rule. Second, with identical feature variables, different values of the targeted variable lead to different decisions for the same sample; that is, with the other physiological characteristics fixed, assuming the patient to be at different ages would produce different diagnoses. Take the third sample as an example: the patient is 51 years old and the true label is 3, meaning a high risk of cervical cancer. If we regarded the same physiological characteristics as coming from a patient of another age, the judgement would be entirely different: a patient aged 20 to 30 would be diagnosed as class-1 (no cervical cancer), and a patient over 75 as class-2 (not confirmed). It can thus be observed that TLAMSVM grasps the essential information of a sample by studying the relation between the targeted variable and the other feature variables, and thereby provides better predictions.
3.2.2 Application to Protein Localization Sites data set
The Protein Localization Sites data set has 8 features, which are scores of methods measuring information about the protein or amino acid sequence. It has 10 classes representing different protein localization sites, such as CYT (cytosolic or cytoskeletal), NUC (nuclear), and MIT (mitochondrial). We use RAMSVM and TLAMSVM to train classifiers on these data.
Figure 3 is based on D2 (Protein Localization Sites data).
Fig. 3 Estimated values for the coefficients of the first three dimensions on D2 (solid: feature 'gvh', dashed: feature 'alm', dotted: feature 'mit', dashed-dotted: feature 'nuc')

We set "mcg" (McGeoch's method for signal sequence recognition) as the targeted variable. Because of space limitations, Fig. 3 only shows the coefficients β(u) in the first, second, and third dimensions, which are enough to illustrate the variability of the coefficients in TLAMSVM.
Figure 4 is the inner product graph for a particular sample whose feature "mcg" equals 0.45 and whose true label is class-1. From Fig. 4 we see that TLAMSVM gives the correct prediction (class-1) for this sample, and that the classification result would change if the "mcg" value of this sample were different.
Fig. 4 Predictions for a given sample under different targeted variables (mcg) on D2

Table 7 records the performance of TLAMSVM and RAMSVM on the Protein Localization Sites data set. From Table 7 we draw the same conclusion as for the Cervical Cancer data: the classification accuracy of TLAMSVM is better than that of RAMSVM, which is consistent with the simulation results, and the prediction rule of the model again yields the correct classification results.
Table 7  Summary of accuracy and training time of TLAMSVM and RAMSVM on Protein Localization Sites data set

accuracy
  Mercer kernel   RAMSVM   TLAMSVM(uniform)   TLAMSVM(ep)   TLAMSVM(gaussian)
  linear          0.78     0.87               0.85          0.84
  gaussian        0.76     0.89               0.84          0.85
  polynomial      0.76     0.86               0.85          0.85
Time/s
  linear          8        32                 33            32
  gaussian        11       33                 33            32
  polynomial      9        31                 32            31

4 Discussion
In this study the TLAMSVM model is proposed to solve multi-category classification problems and to give better predictions for individual samples. In the numerical study we first predict disease risk and show its dependence on age; building an age-specific predictive model helps to study the appropriate time of intervention and to find potential features that support individualized treatment of a disease, and the fitted coefficients β(u) describe how sensitive each feature variable is to disease risk at each age. We then obtain good results on the protein localization sites data and make more "personalized" predictions for samples with different "mcg" values. Finally, the simulation results and real data analysis show that different choices of Mercer kernel and smoothing kernel functions lead only to minor differences in prediction accuracy.
We model the feature variables whose effects depend on the targeted variable, but features with stable effects are easily ignored. In future work we will therefore add features with invariant effects to the classification function, for example α(U)+XTβ(U)+ZTγ.
References
[1] Cortes C, Vapnik V. Support vector networks[J]. Machine Learning, 1995, 20(3): 273-297.
[2] Yau G X, Zhang C. Multi-category angle-based classifier refit[EB/OL]. (2016-07-19)[2017-08-12]. https://arxiv.org/abs/1607.05709.
[3] Moguerza J M, Muñoz A. Support vector machines with applications[J]. Statistical Science, 2006, 21(3): 322-336. DOI:10.1214/088342306000000493
[4] Allwein E L, Schapire R E, Singer Y. Reducing multiclass to binary: a unifying approach for margin classifiers[C]//Proceedings of the International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann, 2000: 9-16.
[5] Hastie T, Tibshirani R. Classification by pairwise coupling[C]//Conference on Advances in Neural Information Processing Systems. MIT Press, 1998: 507-513.
[6] Crammer K, Singer Y. On the algorithmic implementation of multiclass kernel-based vector machines[J]. J Machine Learning Res, 2001, 2(2): 265-292.
[7] Lee Y, Lin Y, Wahba G. Multicategory support vector machines:theory and application to the classification of microarray data and satellite radiance data[J]. Journal of the American Statistical Association, 2004, 99(465): 67-81. DOI:10.1198/016214504000000098
[8] Zhang C, Liu Y, Wu Z. On the effect and remedies of shrinkage on classification probability estimation[J]. American Statistician, 2013, 67(3): 134-142. DOI:10.1080/00031305.2013.817356
[9] Zhang C, Liu Y. Multicategory angle-based large-margin classification[J]. Biometrika, 2014, 101(3): 625-640.
[10] Zhang C, Liu Y, Wang J, et al. Reinforced angle-based multicategory support vector machines[J]. Journal of Computational and Graphical Statistics, 2016, 25(3): 806-825. DOI:10.1080/10618600.2015.1043010
[11] Wei Z, Wang K, Qu H Q, et al. From disease association to risk assessment:an optimistic view from genome-wide association studies on type 1 diabetes[J]. Plos Genetics, 2009, 5(10): e1000678. DOI:10.1371/journal.pgen.1000678
[12] Ladicky L, Torr P H S. Locally linear support vector machines[C]//Proceedings of the International Conference on Machine Learning. Omnipress, 2011: 985-992.
[13] Chen T, Wang Y, Chen H, et al. Targeted local support vector machine for age-dependent classification[J]. Journal of the American Statistical Association, 2014, 109(507): 1174-1187. DOI:10.1080/01621459.2014.881743
[14] Fernandes K, Cardoso J S, Fernandes J. Transfer learning with partial observability applied to cervical cancer screening[C]//Pattern Recognition and Image Analysis. 2017: 243-250.
[15] Chawla N V, Bowyer K W, Hall L O, et al. SMOTE:synthetic minority over-sampling technique[J]. Journal of Artificial Intelligence Research, 2002, 16(1): 321-357.

