1. Introduction
Accurate device modeling is essential for circuit simulation and design. In 1996, BSIM3 Version 3 (commonly abbreviated as BSIM3v3) was established by SEMATECH as the first industry-wide standard of its kind[1]. It has since been widely used by most semiconductor and IC design companies worldwide for device modeling and CMOS IC design. Though the BSIM model is accurate, adjusting it for non-ideal effects takes a long time. Meanwhile, Moore's law is approaching its end and many new devices that need to be studied have emerged[2–4]. It is unwise to invest a huge amount of resources in modeling a new device when we only want to examine its circuit characteristics rather than put it to commercial use. Efficient modeling methods are therefore needed for these purposes.
Many methods have been proposed for similar purposes. Gustavsen proposed a number of black-box macro-modeling methods for large complex systems such as high-voltage and electromagnetic systems[5]. The whole system is viewed as a black box with inputs and outputs, and statistical regression models are built to describe it. By using this method, the complexity of the system is greatly reduced[6–8]. The same idea can be used in semiconductor device modeling, since a semiconductor device can also be viewed as a black box. If an ideal statistical regression model is found, we can save a lot of time and money. For example, the carbon nanotube field-effect transistor (CNT-FET) is a promising successor to the MOS-FET, generating much less heat while running as fast[9]. But CNT-FETs show different I–V characteristics under different manufacturing processes, which makes it hard to build a universal physical model for all of them. Black-box statistical modeling is a practical choice for efficiently modeling different CNT-FETs[10, 11].
The keys to statistical modeling are the choice of model and the extent to which the model can fit the true data. Numerous statistical models have been proposed for this task. Ordinary least squares (OLS) and its regularized variants (Ridge and LASSO) are the most commonly used regression approaches[12]. But they are mainly used for linear regression and may not suit semiconductor device modeling, since devices are highly nonlinear. In this paper, we propose two numerical methods for semiconductor device modeling, single-pole denominator–numerator fitting (single-pole MRR) and double-pole MRR. They have strong nonlinear curve fitting ability and good numerical stability, both of which are critical in semiconductor device modeling.
2. Background
Many methods have been proposed for the curve fitting task. Suppose we have $P$-dimensional input attributes and one-dimensional outputs observed from the curve, i.e. data points with $x_i \in \mathbb{R}^P$, $y_i \in \mathbb{R}$, $P \in \mathbb{N}^+$. Ordinary least squares finds the coefficient vector that minimizes the residual norm,
$$w^* = \arg\min_w \left\| y - X^T w \right\|_2. \tag{1}$$
If $X^TX$ is invertible, the optimal solution of the coefficient $w$ is $w^* = (X^TX)^{-1}X^Ty$[1]. The LASSO is a shrinkage method for OLS, which adds an L1 penalty to the objective function,
$$w^* = \arg\min_w \left\| y - X^T w \right\|_2 + \lambda \sum_{j=1}^{P} |w_j|. \tag{2}$$
It can shrink some of the parameters to exactly zero if the positive penalty parameter $\lambda$ is large enough.
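For concreteness, a minimal numerical sketch of Eqs. (1) and (2) is given below (the synthetic data, variable names and regularization value are illustrative choices of ours, not taken from the cited references); the data matrix stores one sample per row, so $X^T w$ in the text corresponds to `X @ w` in code.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic example: 200 samples, P = 3 input attributes (hypothetical data).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

# OLS closed form: w* = (X^T X)^{-1} X^T y, valid when X^T X is invertible.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# LASSO: the OLS objective plus an L1 penalty that can shrink coefficients to zero.
w_lasso = Lasso(alpha=0.05, fit_intercept=False).fit(X, y).coef_

print(w_ols, w_lasso)
```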
Compared with these linear models, the rational model $y = X^T w_{\rm n} / X^T w_{\rm d}$ has a much stronger nonlinear fitting ability. Its coefficients are obtained by solving
$$w^* = \arg\min_w \left\| y - \frac{N(X)}{D(X)} \right\|_2. \tag{3}$$
However, Eq. (3) cannot be solved directly in closed form as there are unknown parameters in the denominator.
Since the 1950s, considerable effort has been devoted to the development of methods for parameter extraction of rational functions. Levy, Sanathanan and Koerner, Lawrence and Rogers, and Stahl presented various techniques by posing linear least squares problems, and Pintelon and Guillaume analyzed these and several other techniques[13–16]. Vector Fitting (VF), introduced by Gustavsen and Semlyen and based on a partial-fraction basis, has been widely accepted as a robust modeling method for approximating frequency-domain responses[17–19].
In this work, we propose single-pole MRR and double-pole MRR methods based on vector fitting, which are well suited to nonlinear curve fitting tasks. The power of MRR is demonstrated by numerical examples involving an artificially created two-variable function, the SMIC 40 nm NMOS DC characteristics, a CNT-FET, and an LNA performance model.
3. Algorithm
3.1 Single-pole MRR
Consider the rational function approximation,
$$y = \frac{N(X)}{D(X)} = \frac{w_0 + X^T w_{\rm n}}{1 - X^T w_{\rm d}}, \tag{4}$$
where $X \in \mathbb{R}^P$ is the vector of input attributes, $w_0$ is a scalar, and $w_{\rm n}, w_{\rm d} \in \mathbb{R}^P$ are the numerator and denominator coefficient vectors.
We solve Eq. (4) by transforming it into an iterated OLS problem. The coefficients are defined by
$$w_{\rm n}^*, w_{\rm d}^* = \arg\min_{w_{\rm n}, w_{\rm d}} \left\| y - \frac{w_0 + X^T w_{\rm n}}{1 - X^T w_{\rm d}} \right\|_2. \tag{5}$$
Suppose that during the iteration we have a sequence of denominator estimates $\{w_{\rm d}^t\},\ t = 0, 1, \ldots$, with $w_{\rm d}^0 = 0$. Given $w_{\rm d}^{t-1}$ from the previous step, we write $w_{\rm d}^t = w_{\rm d}^{t-1} + w_{\rm d}^+$ and multiply the residual of step $t$, $\left\| y - \frac{w_0 + X^T w_{\rm n}^t}{1 - X^T w_{\rm d}^t} \right\|_2$, by the weighting factor $\left\| \frac{1 - X^T w_{\rm d}^t}{1 - X^T w_{\rm d}^{t-1}} \right\|_2$, which gives
$$\begin{split}
& \left\| \frac{1 - X^T w_{\rm d}^t}{1 - X^T w_{\rm d}^{t-1}} \right\|_2 * \arg\min_{w_{\rm n}, w_{\rm d}} \left\| y - \frac{w_0 + X^T w_{\rm n}^t}{1 - X^T w_{\rm d}^t} \right\|_2 \\
& \qquad = \arg\min_{w_{\rm n}, w_{\rm d}} \left\| \left( 1 - \frac{X^T w_{\rm d}^+}{1 - X^T w_{\rm d}^{t-1}} \right) \left( y - \frac{w_0 + X^T w_{\rm n}^t}{1 - X^T w_{\rm d}^t} \right) \right\|_2 \\
& \qquad = \arg\min_{w_{\rm n}, w_{\rm d}} \left\| y - \frac{X^T w_{\rm d}^+ \, y}{1 - X^T w_{\rm d}^{t-1}} - \frac{w_0 + X^T w_{\rm n}^t}{1 - X^T w_{\rm d}^{t-1}} \right\|_2 \\
& \qquad = \arg\min_{w_{\rm n}, w_{\rm d}} \left\| y - \frac{w_0 + X^T w_{\rm n}^t + X^T y \, w_{\rm d}^+}{1 - X^T w_{\rm d}^{t-1}} \right\|_2 .
\end{split} \tag{6}$$
Here, in the last equation, a new OLS problem is formed whose feature vector is
$$\left[ \frac{1}{1 - X^T w_{\rm d}^{t-1}},\ \frac{X^T}{1 - X^T w_{\rm d}^{t-1}},\ \frac{X^T y}{1 - X^T w_{\rm d}^{t-1}} \right]$$
and whose coefficient vector is $\left[ w_0^t,\ w_{\rm n}^t,\ w_{\rm d}^+ \right]$. Solving this OLS problem yields $w_{\rm d}^+$, and the denominator coefficients are updated as $w_{\rm d}^t = w_{\rm d}^{t-1} + w_{\rm d}^+$. The iteration is repeated with the new $w_{\rm d}^t$ until the sequence of $w_{\rm d}$ converges. When $w_{\rm d}^t$ converges to a constant vector $C$, we have $w_{\rm d}^+ = w_{\rm d}^t - w_{\rm d}^{t-1} = 0$ and $\left\| \frac{1 - X^T w_{\rm d}^t}{1 - X^T w_{\rm d}^{t-1}} \right\|_2 = 1$, so the weighting factor disappears and Eq. (6) reduces to the original problem of Eq. (5). The converged coefficients are then taken as $w_{\rm n}^*, w_{\rm d}^*$ of Eq. (4).
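The derivation above maps directly onto an iterated least-squares loop. The following Python sketch is our own minimal implementation of single-pole MRR under that reading (the function names, convergence test and data layout are our choices, not part of the original formulation); `X` holds one sample per row, e.g. polynomial features of the device voltages, so $X^T w$ in the text becomes `X @ w` here.

```python
import numpy as np

def single_pole_mrr(X, y, n_iter=30, tol=1e-12):
    """Iterated-OLS fit of y ~ (w0 + X w_n) / (1 - X w_d), following Eq. (6).

    X : (m, P) matrix of input features; y : (m,) observed outputs.
    Returns (w0, w_n, w_d).
    """
    m, P = X.shape
    w_d = np.zeros(P)                        # w_d^0 = 0
    w0, w_n = 0.0, np.zeros(P)
    for _ in range(n_iter):
        denom = 1.0 - X @ w_d                # 1 - X^T w_d^{t-1}, one value per sample
        # Linearized feature matrix of Eq. (6): columns [1, X, y*X],
        # each row divided by the previous denominator.
        A = np.hstack([np.ones((m, 1)), X, y[:, None] * X]) / denom[:, None]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        w0, w_n, w_d_plus = coef[0], coef[1:P + 1], coef[P + 1:]
        w_d = w_d + w_d_plus                 # w_d^t = w_d^{t-1} + w_d^+
        if np.linalg.norm(w_d_plus) < tol:   # converged: weighting factor -> 1
            break
    return w0, w_n, w_d

def mrr_predict(X, w0, w_n, w_d):
    """Evaluate the fitted single-pole rational model of Eq. (4)."""
    return (w0 + X @ w_n) / (1.0 - X @ w_d)
```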
3.2 Double-pole MRR
Consider the rational function approximation,
$$y = \frac{N(X)}{D(X)} = \frac{w_{01} + X^T w_{\rm n1}}{1 - X^T w_{\rm d1}} + \frac{w_{02} + X^T w_{\rm n2}}{1 - X^T w_{\rm d2}}. \tag{7}$$
Though we can still multiply the residual by a factor $\left\| \frac{1 - X^T w_{\rm d1}^t}{1 - X^T w_{\rm d1}^{t-1}} * \frac{1 - X^T w_{\rm d2}^t}{1 - X^T w_{\rm d2}^{t-1}} \right\|_2$ as in the single-pole case, the cross term $X^T w_{\rm d1} X^T w_{\rm d2}$ then appears and the problem can no longer be linearized into an OLS problem. Instead, at step $t$ we require
$$\begin{split}
& \frac{w_{01}^t + X^T w_{\rm n1}^t}{1 - X^T w_{\rm d1}^{t-1}} + \frac{w_{02}^t + X^T w_{\rm n2}^t}{1 - X^T w_{\rm d2}^{t-1}} \\
& \quad = \left( 1 - \frac{X^T w_{\rm d1}^+}{1 - X^T w_{\rm d1}^{t-1}} - \frac{X^T w_{\rm d2}^+}{1 - X^T w_{\rm d2}^{t-1}} \right) y,
\end{split} \tag{8}$$
$$\frac{w_{01}^t + X^T w_{\rm n1}^t}{1 - X^T w_{\rm d1}^{t-1}} + \frac{w_{02}^t + X^T w_{\rm n2}^t}{1 - X^T w_{\rm d2}^{t-1}} + \frac{X^T y \, w_{\rm d1}^+}{1 - X^T w_{\rm d1}^{t-1}} + \frac{X^T y \, w_{\rm d2}^+}{1 - X^T w_{\rm d2}^{t-1}} = y. \tag{9}$$
Here $w_{\rm d1}^{t-1} + w_{\rm d1}^+ = w_{\rm d1}^t$ and $w_{\rm d2}^{t-1} + w_{\rm d2}^+ = w_{\rm d2}^t$. Eq. (9) is again an OLS problem whose feature vector is
$$\left[ \frac{1}{1 - X^T w_{\rm d1}^{t-1}},\ \frac{X^T}{1 - X^T w_{\rm d1}^{t-1}},\ \frac{1}{1 - X^T w_{\rm d2}^{t-1}},\ \frac{X^T}{1 - X^T w_{\rm d2}^{t-1}},\ \frac{X^T y}{1 - X^T w_{\rm d1}^{t-1}},\ \frac{X^T y}{1 - X^T w_{\rm d2}^{t-1}} \right]$$
and whose coefficient vector is $\left[ w_{01}^t,\ w_{\rm n1}^t,\ w_{02}^t,\ w_{\rm n2}^t,\ w_{\rm d1}^+,\ w_{\rm d2}^+ \right]$. Solving it yields $w_{\rm d1}^+$ and $w_{\rm d2}^+$, from which $w_{\rm d1}, w_{\rm d2}$ are updated, and the iteration is repeated until $w_{\rm d1}, w_{\rm d2}$ converge. As in the single-pole case, once $w_{\rm d1}, w_{\rm d2}$ converge we have $w_{\rm d1}^+ = w_{\rm d2}^+ = 0$ and Eq. (9) reduces to the original approximation problem of Eq. (7). Unlike single-pole MRR, where $w_{\rm d}^0 = 0$ is a natural starting point, the two denominators must be initialized with different values; otherwise the two partial fractions coincide and the OLS feature matrix becomes rank-deficient.
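A single iteration of the double-pole update can be sketched in the same style (again our own illustrative code, with the same caveats as for the single-pole sketch); note that `w_d1` and `w_d2` must be passed in with different values, e.g. zeros and a small random vector, to avoid the rank deficiency discussed above.

```python
import numpy as np

def double_pole_mrr_step(X, y, w_d1, w_d2):
    """One iteration of the double-pole update of Eq. (9) (illustrative sketch)."""
    m, P = X.shape
    d1 = 1.0 - X @ w_d1                                 # 1 - X^T w_d1^{t-1}
    d2 = 1.0 - X @ w_d2                                 # 1 - X^T w_d2^{t-1}
    # Feature columns in the same order as the text:
    # [1/d1, X/d1, 1/d2, X/d2, y*X/d1, y*X/d2].
    A = np.hstack([
        1.0 / d1[:, None], X / d1[:, None],             # -> w_01, w_n1
        1.0 / d2[:, None], X / d2[:, None],             # -> w_02, w_n2
        (y[:, None] * X) / d1[:, None],                 # -> w_d1^+
        (y[:, None] * X) / d2[:, None],                 # -> w_d2^+
    ])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    w_01, w_n1 = coef[0], coef[1:P + 1]
    w_02, w_n2 = coef[P + 1], coef[P + 2:2 * P + 2]
    w_d1_plus, w_d2_plus = coef[2 * P + 2:3 * P + 2], coef[3 * P + 2:]
    # Return the updated denominators w_d^t = w_d^{t-1} + w_d^+.
    return w_01, w_n1, w_02, w_n2, w_d1 + w_d1_plus, w_d2 + w_d2_plus
```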
3.3 Data preprocessing method
3.3.1 Normalization
The key step of MRR is accurately solving the over-determined least-squares problem arising from Eq. (7). However, because the condition number of such a problem is poor, solving the normal equation has reduced numerical stability and may result in large errors in the solution. Besides, if there are orders-of-magnitude differences between y and x, the feature matrix A may become rank-deficient, which also degrades the solution. To circumvent these cases, normalizing the input attributes x and the target output y before the parameter extraction procedure is usually of great help.
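The exact scaling is not critical; as one plausible choice consistent with the description above (the paper does not specify which normalization it uses), a min-max normalization that also returns the quantities needed to undo the scaling could look as follows.

```python
import numpy as np

def minmax_normalize(A):
    """Scale columns (or a 1-D target) to [0, 1]; also return (lo, span) so
    the fitted model's predictions can be mapped back to the original units."""
    A = np.asarray(A, dtype=float)
    lo, hi = A.min(axis=0), A.max(axis=0)
    span = np.where(hi - lo == 0.0, 1.0, hi - lo)   # guard constant columns
    return (A - lo) / span, (lo, span)

# Typical usage before parameter extraction:
# X_n, (x_lo, x_span) = minmax_normalize(X)
# y_n, (y_lo, y_span) = minmax_normalize(y)
```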
3.3.2 Logarithm transformation
Semiconductor device I–V characteristic values may be very small and vary over a wide range; e.g., the Id of an NMOS-FET can be as small as 1 × 10⁻¹³ A. Fitting the raw current lets the largest values dominate the squared error, so taking the logarithm of the output before parameter extraction, and exponentiating the fitted model afterwards, compresses this range and treats all operating regions more evenly.
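A minimal sketch of this transformation (the current values shown are hypothetical) is:

```python
import numpy as np

# Hypothetical drain-current samples spanning many orders of magnitude (in A).
i_d = np.array([3.2e-13, 8.5e-10, 4.1e-7, 2.3e-5, 9.8e-4])

# Fit the model to log10(Id) instead of Id so that small and large currents
# contribute comparably to the squared error ...
y_train = np.log10(i_d)

# ... and undo the transform when evaluating the fitted model.
i_d_recovered = 10.0 ** y_train
```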
4. Experiment
4.1 Fitting an artificially created function
To illustrate the validity of the proposed method, we first consider an artificially created two-variable function; the original surface, the single-pole MRR fitting result and the step NMSEs of both MRR methods are shown in Fig. 1.
Figure 1. (Color online) The original function and MRR fitting function. (a) Original artificial function. (b) Single-pole MRR fitting result. (c) Step NMSEs of single-pole MRR. (d) Step NMSEs of double-pole MRR.
4.2 Fitting I–V characteristics of SMIC 40 nm NMOS-FET
DC characteristics of the NMOS-FET are used to show the performance of MRR, compared with OLS and LASSO. BSIM has been widely used in industry for years and, to some extent, represents the authentic physical behavior of the NMOS-FET, so we use Cadence and SPICE to obtain DC simulation data of an SMIC 40 nm NMOS-FET with a channel width of 1 μm and a channel length of 40 nm. In this case we choose two independent variables, Vd and Vg, and the dependent variable is Id.
4.2.1 Analysis of fitting result
Fig. 2 shows the fitting results of the different algorithms on this dataset. We plot the Id–Vd and Id–Vg curves to show the fitting performance, and Table 1 lists the number of parameters and the NMSE of each fit. Single-pole MRR with a sextic polynomial, double-pole MRR with a quartic polynomial and OLS with a nonic polynomial are used for contrast, so that the number of parameters in each model is close. Because of its regularization property, we choose the nonic polynomial for LASSO to include as many meaningful features as possible; the optimal regularization parameter is selected by 10-fold cross validation.
Table 1. NMSE and number of parameters of the different algorithms.
Algorithm | Single-pole MRR (sextic polynomial) | Double-pole MRR (quartic polynomial) | OLS (nonic polynomial) | LASSO (nonic polynomial)
Number of parameters | 55 | 58 | 55 | 36
NMSE | 3.0157 × 10⁻⁸ | 2.631 × 10⁻⁶ | 4.357 × 10⁻⁴ | 5.082 × 10⁻⁴
Figure 2. (Color online) The fitting results of different algorithms. (a) Id–Vd curve by single-pole MRR. (b) Id–Vg curve by single-pole MRR. (c) Id–Vd curve by double-pole MRR. (d) Id–Vg curve by double-pole MRR. (e) Id–Vd curve by OLS. (f) Id–Vg curve by OLS. (g) Id–Vd curve by LASSO. (h) Id–Vg curve by LASSO.
Figure 3. (Color online) Zoom-in pictures of Id–Vg curve fitting by different algorithms. (a) Zoom-in of Id–Vg curve by single-pole MRR, Vg ranges from 0.15 to 0.45 V. (b) Zoom-in of Id–Vg curve by single-pole MRR, Vg ranges from 0.65 to 1.1 V. (c) Zoom-in of Id–Vg curve by double-pole MRR, Vg ranges from 0.15 to 0.45 V. (d) Zoom-in of Id–Vg curve by double-pole MRR, Vg ranges from 0.65 to 1.1 V. (e) Zoom-in of Id–Vg curve by OLS, Vg ranges from 0.15 to 0.45 V. (f) Zoom-in of Id–Vg curve by OLS, Vg ranges from 0.65 to 1.1 V. (g) Zoom-in of Id–Vg curve by LASSO, Vg ranges from 0.15 to 0.45 V. (h) Zoom-in of Id–Vg curve by LASSO, Vg ranges from 0.65 to 1.1 V.
Table 1 details the NMSE of the four algorithms. As the data values are very small, the normalized mean square error (NMSE) of Eq. (10) is used, where $t$ denotes the observed data, $y$ the model prediction and $K$ the number of data points:
$$\mathrm{NMSE} = \frac{1}{K} \sum_{i=1}^{K} \left[ \frac{t_i}{\max(t)} - \frac{y_i}{\max(t)} \right]^2. \tag{10}$$
Single-pole MRR with the sextic polynomial achieves the lowest NMSE of 3.0157 × 10⁻⁸.
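Eq. (10) translates directly into a short helper function (a sketch; the argument names are ours):

```python
import numpy as np

def nmse(t, y):
    """Normalized mean square error of Eq. (10): both the observed data t and
    the prediction y are scaled by max(t) before averaging the squared error."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    scale = np.max(t)
    return np.mean(((t - y) / scale) ** 2)
```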
Although single-pole MRR reaches the minimum NMSE in these results, double-pole MRR uses a lower-degree (cubic) polynomial and the fewest parameters to reach the second-best fit. In fact, it is the best model among the low-degree polynomials (degree 1, 2, 3); double-pole MRR with a cubic polynomial reaches an NMSE of 1.593 × 10⁻⁵. This is very helpful when the number of input attributes is large and a polynomial of high degree cannot be used due to the limit of CPU memory, e.g. a sextic polynomial of 12 input attributes requires 1237 GB of CPU memory.
4.2.2 Analysis of convergence
The convergence of the algorithm is critical, since our parameter extraction method is based on iteration. As analyzed above, the stability of every least squares problem is improved after data normalization. Single-pole MRR shows very good numerical stability for polynomials of all degrees, while double-pole MRR shows good numerical stability only for polynomials of low degree; when the polynomial degree is high, double-pole MRR may suffer from rank deficiency and its numerical stability decreases. We record the sequential NMSE of the two MRR iterations and plot the result in Fig. 4 on a log scale. Fig. 4(a) shows the step NMSEs of the globally optimal fitting results for the 40 nm MOSFET, where single-pole MRR adopts a sextic polynomial and double-pole MRR adopts a quartic polynomial. Fig. 4(b) shows the step NMSEs of the two MRR methods when both adopt a quadratic polynomial; the best NMSEs reached with the quadratic polynomial can be read from this figure.
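The step-NMSE curves of Fig. 4 can be reproduced by recording the NMSE of Eq. (10) after every iteration; a sketch based on the single-pole loop above (our own code, with illustrative names) is:

```python
import numpy as np

def single_pole_mrr_step_nmse(X, y, n_iter=30):
    """Run the single-pole MRR iteration and record the NMSE after every step."""
    m, P = X.shape
    w_d = np.zeros(P)
    history = []
    for _ in range(n_iter):
        denom = 1.0 - X @ w_d
        A = np.hstack([np.ones((m, 1)), X, y[:, None] * X]) / denom[:, None]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        w0, w_n, w_d = coef[0], coef[1:P + 1], w_d + coef[P + 1:]
        y_hat = (w0 + X @ w_n) / (1.0 - X @ w_d)
        history.append(np.mean(((y - y_hat) / np.max(y)) ** 2))  # Eq. (10)
    return history
```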
4.3 Fitting CNT-FET
In this section, we apply single-pole MRR to CNT-FET DC behavior modeling. Fig. 5 visualizes the measured Id of the CNT-FET as a function of Vg and Vd. Fig. 6 shows the fitted |Id|–|Vd| and |Id|–|Vg| curves together with the step NMSEs of both MRR methods.
Figure 4. (Color online) Step NMSEs of two MRR methods in different polynomial degrees. (a) Step NMSEs of MRR methods in the situation of a polynomial of high degree. (b) Step NMSEs of MRR methods in the situation of a polynomial of low degree.
The normalized mean square error (NMSE) of Id for the CNT-FET fit is given by the step-NMSE curves in Figs. 6(e) and 6(f).
Figure 5. (Color online) The observed CNT-FET data point surface and MRR fitting surface. (a) Original data of CNT-FET. (b) Single-pole MRR fitting result.
Figure 6. (Color online) |Id|–|Vd|, |Id|–|Vg| curves and step NMSEs by MRR methods. (a) |Id|–|Vd| curve by single-pole MRR. (b) |Id|–|Vg| curve by single-pole MRR. (c) Zoom-in |Id|–|Vg| curve, Vg ranges from 0 to 0.8 V. (d) Zoom-in |Id|–|Vg| curve, Vg ranges from 1 to 2 V. (e) Step NMSEs of single-pole MRR. (f) Step NMSEs of double-pole MRR.
4.4 Fitting LNA performance model
In this section we apply MRR methods to model LNA performance characteristics and demonstrate our algorithm on multi-variable (more than two input variables) regression. Generally, a given circuit performance is obtained by solving the KCL and KVL equations. The solution is accurate but very time-consuming. For behavioral simulation and optimization of an RF circuit, a direct mathematical mapping between design parameters and performance is highly useful.
We chose an LNA working at 4 GHz and used Cadence to obtain simulation data for modeling. Figs. 7(a)–7(c) show the fitting results of three LNA indicators, NF (noise factor), gain and power, by single-pole MRR. Figs. 7(d)–7(f) are the step NMSEs of single-pole MRR and Figs. 7(g)–7(i) are the step NMSEs of double-pole MRR; the performance of the two MRR methods is close. We sort the sample points in increasing order and record the index sequence, then plot the data points, predictions and errors according to the sorted index sequence in one figure. We can see that the training result is good. The NMSEs of NF, gain and power are 0.0031, 0.00710, and 3.5887 × 10⁻⁷, respectively.
Figure 7. (Color online) LNA performance models fitted by MRR methods. (a) Noise factor fitting by single-pole MRR. (b) Gain fitting by single-pole MRR. (c) Power fitting by single-pole MRR. (d) Step NMSEs of single-pole MRR for noise factor. (e) Step NMSEs of single-pole MRR for gain. (f) Step NMSEs of single-pole MRR for power. (g) Step NMSEs of double-pole MRR for noise factor. (h) Step NMSEs of double-pole MRR for gain. (i) Step NMSEs of double-pole MRR for power.
5. Conclusion
This paper proposes a family of numerical methods, MRR, to approximate an unknown system and extract model parameters. We first use single-pole MRR to fit an artificial function, and the result is extremely good. Then we compare the performance of single-pole MRR, double-pole MRR, OLS and LASSO on the SMIC 40 nm NMOS-FET DC characteristics dataset. The results show that single-pole MRR has the highest fitting precision and that double-pole MRR performs better than single-pole MRR when only low-degree polynomials can be used. The MRR methods have a more powerful nonlinear curve fitting ability than OLS and LASSO and are shown to be numerically stable. CNT-FET and LNA performance indicators are also modeled, and their fitting results are good as well. There are, however, two key points in using MRR methods. First, users have to pay close attention to the numerical stability of the MRR methods; we carried out one artificial-function fitting task and three device-model fitting tasks for the convergence analysis of the two MRR methods, and the results show that single-pole MRR has better numerical stability than double-pole MRR. Second, the dataset used for curve fitting should be well-distributed and not sparse; MRR methods are powerful in fitting highly nonlinear functions but can also overfit if the dataset is ill-distributed. Our results show that the MRR methods are good choices for statistical modeling of semiconductor devices as well as other highly nonlinear curve fitting tasks.