
The Relationship between Deterministic and Ensemble Mean Forecast Errors Revealed by Global and Local Attractor Radii


Jie FENG1,
Jianping LI2,3,*,
Jing ZHANG4,
Deqiang LIU5,6,
Ruiqiang DING7,8

Corresponding author: Jianping LI, ljp@bnu.edu.cn
1.School of Meteorology, University of Oklahoma, Norman, OK 73072, USA
2.College of Global Change and Earth System Science (GCESS), Beijing Normal University, Beijing 100875, China
3.Laboratory for Regional Oceanography and Numerical Modeling, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
4.Cooperative Institute for Research in the Atmosphere, GSD/ESRL/OAR/NOAA, Boulder, CO 80305, USA
5.Fujian Meteorological Observatory, Fuzhou 350001, China
6.Wuyishan National Park Meteorological Observatory, Wuyishan 354306, China
7.State Key Laboratory of Numerical Modeling for Atmospheric Sciences and Geophysical Fluid Dynamics, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China
8.Plateau Atmosphere and Environment Key Laboratory of Sichuan Province, Chengdu University of Information Technology, Chengdu 610225, China
Manuscript received: 2018-06-06
Manuscript revised: 2018-09-07
Manuscript accepted: 2018-10-10
Abstract: It has been demonstrated that ensemble mean forecasts, in the context of the sample mean, have higher forecasting skill than deterministic (or single) forecasts. However, few studies have focused on quantifying the relationship between their forecast errors, especially in individual prediction cases. The characteristics of deterministic and ensemble mean forecasts have also rarely been examined from the perspective of the attractors of dynamical systems. In this paper, two attractor statistics, namely the global and local attractor radii (GAR and LAR, respectively), are applied to reveal the relationship between deterministic and ensemble mean forecast errors. Practical forecast experiments are implemented in a perfect model scenario with the Lorenz96 model, and the numerical results are used for verification. The saturated sample mean errors of deterministic and ensemble mean forecasts can be expressed by the GAR and the attractor radius, respectively, and their ratio is found to approach $\sqrt{2}$ with lead time. Meanwhile, LAR can provide the expected ratio of the ensemble mean and deterministic forecast errors in individual cases.
Keywords: attractor radius,
ensemble forecasting,
ensemble mean,
forecast error saturation





1. Introduction
The atmospheric model, as a nonlinear chaotic system, is sensitive to initial and model-related errors. In other words, any arbitrarily small errors in the initial conditions will grow with time, finally causing the loss of most forecast information (Thompson, 1957; Lorenz, 1963, 1965). The deterministic (or single) forecast, the most widely used product in weather forecasting, can provide useful and informative predictions within a certain predictability limit, but is unable to give a quantitative estimate of forecast reliability. Ensemble forecasting, in contrast, is a feasible approach to supplying quantitative reliability information for forecasts in the form of probabilities (Leith, 1974; Toth and Kalnay, 1993, 1997; Molteni et al., 1996). Another general advantage of ensemble forecasting over deterministic forecasts is that forecast errors can be efficiently reduced by nonlinear filtering, i.e., by taking the arithmetic mean of an ensemble of forecasts (Leith, 1974; Szunyogh and Toth, 2002).
Over the past several decades, numerous studies have demonstrated the overall higher forecasting skill of the ensemble mean over that of deterministic forecasts by using numerical models with different degrees of complexity (Houtekamer and Derome, 1995; Toth and Kalnay, 1997; Buizza et al., 1999; Wang and Bishop, 2003; Wei et al., 2008; Zheng et al., 2009; Feng et al., 2014; Duan and Huo, 2016). However, quantitative estimations and comparisons of sample-mean deterministic and ensemble mean forecast errors are subject to sampling errors as a result of limited numbers of forecast samples and ensemble members, especially in complex numerical weather prediction models. It is even more challenging to compare the deterministic and ensemble mean forecasting skill for specific weather and climate events, due to the sampling uncertainties from the day-to-day variation of the underlying flow (Toth and Kalnay, 1997; Corazza et al., 2003).
The forecasts and the corresponding verifying references (generally the analysis states) are evolving states of the model and reference attractors, respectively. Therefore, forecast errors are essentially distances between states in attractor space. Li et al. (2018) proposed two statistics, namely the global and local attractor radii (GAR and LAR, respectively), that characterize the average distances between states on an attractor. GAR measures the average distance between two randomly selected states on an attractor, while LAR quantifies the average distance of all states on the attractor from a given state. For complex nonlinear dynamical systems, e.g., the atmosphere, GAR and LAR can be estimated simply from a long time series of observed states. Moreover, GAR is found to be a more accurate criterion for measuring the predictability limit than the traditional saturation value of the sample mean deterministic forecast errors; the latter, because of model errors, usually overestimates the actual error size that totally chaotic forecasts should have on average (Li and Ding, 2015; Li et al., 2018). In this study, GAR and LAR are further used to interpret the differences between deterministic and ensemble mean forecast errors in both sample-mean and single-case contexts without running numerical forecasts. This is expected to provide a reference for the verification and assessment of the skill of deterministic and ensemble mean forecasts.
The paper is organized as follows: Section 2 briefly introduces the definitions of GAR and LAR used in this study and the relevant theories. The experimental setup is presented in section 3. Section 4 displays and analyzes the roles of the attractor statistics in interpreting the relationship between deterministic and ensemble mean forecast errors. A discussion and conclusions are provided in section 5.

2. Definitions of GAR and LAR
The definitions of GAR and LAR are based on the premise that a compact attractor has an invariant probability density function and marginal density function. Following Li et al. (2018), consider ${x}=(x_1,x_2,\ldots,x_n)$ to be a state vector on a compact attractor $\mathcal{A}$. $R_{{\rm L},i}$ is the LAR of a given state ${x}_i$ on the attractor $\mathcal{A}$, defined by \begin{equation} \label{eq1} R_{{\rm L},i}=R_{\rm L}({x}_i)=\sqrt{E(\|{x}_i-{x}\|^2)},\quad {x}_i,{x}\in \mathcal{A} , \ \ (1)\end{equation} where $E(\cdot)$ represents the mathematical expectation and $\|\cdot\|$ is the L2 norm of a vector. Geometrically, LAR measures the average root-mean-square (RMS) distance of all states on the attractor from a given state. Assuming that $\mathcal{A}$ does not vary with time, $R_{{\rm L},i}$ is an invariant quantity for a specific state ${x}_i$. In particular, if ${x}_i$ is chosen to be the mean state ${x}_{\rm E}$ of $\mathcal{A}$, the corresponding LAR, $R_{\rm E}=\sqrt{E(\|{x}_{\rm E}-{x}\|^2)}$, is defined as the attractor radius. It has the same form as the standard deviation (SD) in statistics, which measures the variability of a variable.
Theorem 1: Let $d_i$ and $d_j$ denote the RMS distances of two states ${x}_i$ and ${x}_j$ on $\mathcal{A}$ from the mean state ${x}_{\rm E}$, and let $R_{{\rm L},i}$ and $R_{{\rm L},j}$ represent the LAR of ${x}_i$ and ${x}_j$, respectively. Then they satisfy the following relationship: \begin{equation} R_{{\rm L},i}>R_{{\rm L},j},\ {\rm if}\ d_{i}>d_{j} . \ \ (2)\end{equation}
This means that the minimum value of LAR is exactly the attractor radius. The proof of Theorem 1 is given in the Appendix.
The RMS of LARs over all states on $\mathcal{A}$ is defined as the GAR: \begin{equation} \label{eq2} R_{\rm G}=\sqrt{E(R_{\rm L}^2)}=\sqrt{E(\|{x}-{y}\|^2)} , \ \ (3)\end{equation} where x and y are two randomly selected state vectors from $\mathcal{A}$. GAR is an estimate of the average RMS distance between any two states on the same attractor space.
Theorem 2: The GAR $R_{\rm G}$ and the attractor radius $R_{\rm E}$ of a compact attractor $\mathcal{A}$ satisfy the constant proportional relationship \begin{equation} \label{eq3} R_{\rm G}=\sqrt{2}R_{\rm E} . \ \ (4)\end{equation}
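Although the full proof of Theorem 2 is not reproduced here, it can be sketched in one line from the identity $R_{{\rm L},i}^2=\|{x}_i-{x}_{\rm E}\|^2+R_{\rm E}^2$ derived in the Appendix: taking the expectation over all states ${x}$ on $\mathcal{A}$ in the definition of GAR gives \begin{eqnarray*} R_{\rm G}^2=E(R_{\rm L}^2)=E(\|{x}-{x}_{\rm E}\|^2)+R_{\rm E}^2=2R_{\rm E}^2 . \end{eqnarray*}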
The two statistics, GAR and LAR, and their relevant theorems will be applied to the quantitative estimation of deterministic and ensemble mean forecast errors.
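As an illustration (not part of the original study), the following minimal Python sketch estimates the attractor radius, LAR and GAR of a scalar variable from a long time series; the synthetic Gaussian series, with mean 2.22 and SD 3.63, is only a stand-in for a record of observed states.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a long record of one variable on the attractor.
x = 2.22 + 3.63 * rng.standard_normal(200_000)

x_mean = x.mean()                            # mean state x_E
R_E = np.sqrt(np.mean((x - x_mean) ** 2))    # attractor radius (equals the SD)

def local_attractor_radius(xi):
    """Eq. (1): RMS distance of all states in the series from a given state xi."""
    return np.sqrt(np.mean((xi - x) ** 2))

# Eq. (3): RMS distance between randomly paired states on the same attractor.
i, j = rng.integers(0, x.size, (2, 500_000))
R_G = np.sqrt(np.mean((x[i] - x[j]) ** 2))

print(f"R_E = {R_E:.2f}, R_G = {R_G:.2f}, R_G/R_E = {R_G / R_E:.3f}  (Theorem 2: ~1.414)")
print(f"LAR at the mean state:   {local_attractor_radius(x_mean):.2f}  (minimum, equals R_E)")
print(f"LAR 2 SDs from the mean: {local_attractor_radius(x_mean + 2 * R_E):.2f}")
```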

3. Experimental setup
In our experiments, the simple Lorenz96 model (Lorenz, 1996) is used so that a large sample of forecast cases and ensemble members can be generated at low computational cost, which greatly reduces the sampling noise in the estimation of forecast errors. The Lorenz96 model is a 40-variable model and has been widely used to investigate theories and methods of ensemble prediction and data assimilation (e.g., Lorenz and Emanuel, 1998; Ott et al., 2004; Basnarkov and Kocarev, 2012; Feng et al., 2014). The model is given by \begin{equation} \label{eq4} dx_k/dt=(x_{k+1}-x_{k-2})x_{k-1}-x_k+F , \ \ (5)\end{equation} where $x_k$ ($k=1,2,\ldots,40$) represents the state variables and F is a forcing constant, with the cyclic boundary conditions $x_{-1}=x_{39}$, $x_0=x_{40}$ and $x_{41}=x_1$. For F=8 the model behaves chaotically. The equations are integrated with a fourth-order Runge-Kutta scheme using a time step of 0.05 time units (tu).
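For reference, a minimal Python implementation of the Lorenz96 model with the fourth-order Runge-Kutta scheme might look as follows; the spin-up and run lengths are illustrative only and not the exact settings of the study.

```python
import numpy as np

F = 8.0     # forcing constant; F = 8 gives chaotic behavior
DT = 0.05   # time step in time units (tu)

def lorenz96_tendency(x, forcing=F):
    """dx_k/dt = (x_{k+1} - x_{k-2}) x_{k-1} - x_k + F with cyclic indices."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def rk4_step(x, dt=DT):
    """One fourth-order Runge-Kutta step of the Lorenz96 model."""
    k1 = lorenz96_tendency(x)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2)
    k4 = lorenz96_tendency(x + dt * k3)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def integrate(x0, n_steps):
    """Integrate n_steps steps and return the trajectory of shape (n_steps+1, 40)."""
    traj = np.empty((n_steps + 1, x0.size))
    traj[0] = x0
    for n in range(n_steps):
        traj[n + 1] = rk4_step(traj[n])
    return traj

# Example: spin up from a slightly perturbed F-state, then save a short trajectory.
x0 = np.full(40, F)
x0[19] += 0.01                        # break the symmetry to trigger chaos
spun_up = integrate(x0, 20_000)[-1]   # 1000 tu spin-up at dt = 0.05
truth = integrate(spun_up, 2_000)     # a short sample trajectory of "true" states
```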
After an initial spin-up stage of 1000 tu, the model is run freely for a sufficiently long time ($10^4$ tu, i.e., $2\times10^5$ time steps) to generate the true states used as references for the forecasts. There are a total of $2\times10^5$ forecast cases, each initiated from one of these true states. If the initial true state is denoted by ${x}_{\rm t}$, the initial analysis state ${x}_{\rm a}$ is given by superposing an analysis error ${\delta}$ on ${x}_{\rm t}$: \begin{equation} {x}_{\rm a}={x}_{\rm t}+{\delta} . \ \ (6)\end{equation}
For simplicity, each element of the analysis error ${\delta}$ is drawn from a Gaussian distribution with mean 0 and SD 1, and the RMS amplitude of ${\delta}$ is then rescaled to 0.1, which is about 3% of the climatic SD of ${x}_{\rm t}$ (3.63). Each ensemble perturbation is generated in the same way as the analysis error but from a different realization of noise, and its RMS amplitude is likewise rescaled to 0.1. A total of $2.5\times10^5$ ensemble perturbations are produced in each case and are added to and subtracted from the analysis ${x}_{\rm a}$ to generate $N=5\times10^5$ initial ensemble members ($2.5\times10^5$ pairs), so that their mean remains equal to ${x}_{\rm a}$. The deterministic and ensemble forecasts in each case are obtained by integrating the analysis state and the initial ensemble members for 10 tu with the same model that generates the truth (i.e., a perfect model scenario).
Although the analyses in our study are not generated through commonly used data assimilation approaches, the initial ensemble perturbations have the same probability distribution as the analysis errors and are thus expected to sample the analysis errors optimally. Moreover, the ensemble size ($5\times10^5$) is much larger than the model dimension (40). These two design choices eliminate the possible effects of suboptimal initial ensemble perturbations and of a limited ensemble size on the ensemble mean skill.
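A sketch of this initial-condition setup is given below, with a far smaller ensemble than the $2.5\times10^5$ pairs used in the study; `x_true` stands for one 40-variable state of the nature run, and the helper names are ours, not the study's.

```python
import numpy as np

rng = np.random.default_rng(1)
TARGET_RMS = 0.1    # RMS amplitude of analysis errors and initial perturbations
N_PAIRS = 50        # the study uses 2.5e5 pairs; kept small in this sketch

def rescale_to_rms(v, rms=TARGET_RMS):
    """Rescale a perturbation so that its RMS amplitude equals `rms`."""
    return v * rms / np.sqrt(np.mean(v ** 2))

def make_analysis(x_true):
    """Eq. (6): superpose a Gaussian analysis error of fixed RMS size on the truth."""
    return x_true + rescale_to_rms(rng.standard_normal(x_true.size))

def make_ensemble(x_analysis, n_pairs=N_PAIRS):
    """Paired (+/-) perturbations keep the ensemble mean exactly at the analysis."""
    perts = np.array([rescale_to_rms(rng.standard_normal(x_analysis.size))
                      for _ in range(n_pairs)])
    return np.concatenate([x_analysis + perts, x_analysis - perts])

# Usage with a 40-variable true state (e.g., one state of the nature run):
x_true = 8.0 + rng.standard_normal(40)
x_a = make_analysis(x_true)
members = make_ensemble(x_a)
assert np.allclose(members.mean(axis=0), x_a)   # ensemble centred on the analysis
```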

4. Results
4.1. GAR and LAR of the Lorenz96 model
Owing to the ergodicity of attractors, the evolving states of chaotic dynamical systems visit different regions of the attractor with stable probabilities in the long run (Farmer et al., 1983; Eckmann and Ruelle, 1985; Zou et al., 1985; Li and Chou, 1997). We first examine the probability distribution on the attractor of the Lorenz96 system to display the long-term behavior of the system. Figure 1 shows the probability distribution of variable $x_1$ (the choice of variable has no effect on the results because of the homogeneity of the Lorenz96 model). The probability distribution of $x_1$ clearly becomes invariant once the time series is sufficiently long ($2.5\times10^6$ tu here). The mean value and the SD of the attractor are 2.22 and 3.63, respectively.
Figure 1. Probability (%) distribution of variable $x_1$ of the Lorenz96 model over the attractor set for time series of different lengths ($5\times10^2$, $2.5\times10^3$, $5\times10^3$, $2.5\times10^4$, $5\times10^4$, $5\times10^5$ and $2.5\times10^6$ tu, in order).


Figure 2 shows the variation of LAR (red solid line) as a function of the value of $x_1$, calculated from a $2.5\times10^6$ tu time series. The probability distribution of the system (black solid line) is also shown as a reference. LAR clearly depends on the specific state on the attractor: states farther from the mean state occur with smaller probability and have larger LAR, in accordance with Theorem 1. When $x_1$ approaches the mean state, LAR reaches its minimum value, namely the attractor radius $R_{\rm E}$, which equals the SD (3.63). In addition, the $R_{\rm G}$ of variable $x_1$ (5.13), calculated as the RMS of $R_{\rm L}$ over all states on the attractor, is exactly $\sqrt{2}$ times $R_{\rm E}$, as stated by Theorem 2.
Figure 2. Variation of LAR (red solid line) as a function of the value of $x_1$, together with the probability distribution of variable $x_1$ (black solid line). The $x_1$ value with the lowest LAR (red dashed line) coincides with the mean state (black dashed line), 2.22.



4.2. Evolution of ensemble mean and deterministic forecast states
The differences between deterministic and ensemble mean forecast errors are essentially associated with their differing forecast states. Therefore, the statistical characteristics of the deterministic and ensemble mean forecast states are analyzed before their forecast errors are compared. Each panel of Fig. 3 illustrates the probability distribution of the deterministic (black line) and ensemble mean (blue line) forecast states over all cases at the same lead time. The probability distribution of the deterministic forecasts is always consistent with that of the reference (red line) from 0.5 to 6 tu, since they belong to the same attractor. In contrast, the probability distribution of the ensemble mean states develops a narrower range and a higher peak as time increases. In other words, the ensemble mean forecasts tend, on the whole, to move toward the climatic mean value (2.22) with lead time because of the nonlinear smoothing effect of the arithmetic mean of the forecast ensemble (Toth and Kalnay, 1997). Finally, when all forecast members become chaotic at sufficiently long lead times, their ensemble mean would, without exception, equal the climatic mean in any individual case. This indicates that the ensemble mean reduces the forecast error compared with deterministic forecasts, but at the expense of losing information and variability in the forecasts. On the other hand, according to these characteristics of the forecast states, the saturation value of the sample mean ensemble mean forecast errors is expected to be consistent with the attractor radius, while the deterministic forecast errors should saturate at the level of GAR. This conclusion is verified by the forecast experiments in section 4.3.
Figure 3. Probability (%) distribution of the ensemble mean (blue line; right-hand scale) and deterministic (black line; left-hand scale) forecast states and true states (red line; left-hand scale) over all $2\times10^5$ samples as a function of lead time.



4.3. Sample mean forecast errors
Figure 4 shows the RMS error of $x_1$ for the deterministic and ensemble mean forecasts as a function of lead time, averaged over all cases. Within the initial 1 tu, the deterministic and ensemble mean forecasts have similar errors because the approximately linear growth of the paired positive and negative initial perturbations cancels in the ensemble mean. After 1 tu, the ensemble mean forecasts retain smaller errors than the deterministic ones, and the difference increases continuously with lead time. Finally, the deterministic and ensemble mean forecast errors both enter the nonlinear saturation stage and reach 5.13 and 3.63, respectively; the former equals the GAR and the latter the attractor radius. The ratio of the saturation values is $\sqrt{2}$, as anticipated from Theorem 2 and the discussion in section 4.2. This is also consistent with the conclusions of Leith (1974) and Kalnay (2003).
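For completeness, the sample mean error curves of Fig. 4 can be approximated with a sketch like the one below, which reuses the hypothetical `integrate`, `rk4_step`, `make_analysis` and `make_ensemble` helpers (and `spun_up` state) from the sketches in section 3 and deliberately uses far fewer cases and members than the actual experiments.

```python
import numpy as np

rng = np.random.default_rng(2)
N_CASES, N_STEPS = 20, 200           # 200 steps x 0.05 tu = 10 tu of forecast range

nature = integrate(spun_up, 20_000)  # a long nature run providing the true states
starts = rng.integers(0, nature.shape[0] - N_STEPS, N_CASES)

mse_det = np.zeros(N_STEPS + 1)
mse_ens = np.zeros(N_STEPS + 1)
for s in starts:
    x_det = make_analysis(nature[s])     # deterministic forecast starts from the analysis
    members = make_ensemble(x_det)       # ensemble centred on the same analysis
    for n in range(N_STEPS + 1):
        truth_n = nature[s + n]
        mse_det[n] += np.mean((x_det - truth_n) ** 2)
        mse_ens[n] += np.mean((members.mean(axis=0) - truth_n) ** 2)
        x_det = rk4_step(x_det)
        members = np.array([rk4_step(m) for m in members])

rmse_det = np.sqrt(mse_det / N_CASES)   # expected to saturate near the GAR (~5.13)
rmse_ens = np.sqrt(mse_ens / N_CASES)   # expected to saturate near R_E (~3.63)
```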
Figure 4. RMS error averaged over $2\times10^5$ samples for the deterministic (black solid line) and ensemble mean (red solid line) forecasts as a function of time. The dashed lines are the saturation values, 5.13 and 3.63, of the deterministic and ensemble mean forecast errors, respectively.



4.4. Forecast errors in individual cases
In comparison with sample mean forecasts, the forecasts of a specific weather or climate event are strongly influenced by the evolving dynamics (Ziehmann et al., 2000; Corazza et al., 2003), and it is thus difficult to estimate the expected values of both deterministic and ensemble mean forecast errors. LAR is a feasible statistic for estimating the expected deterministic and ensemble mean forecast errors in individual cases without running practical forecasts. As the nonlinearity in the forecasts intensifies, the ensemble mean approaches the mean state (see Fig. 3), while the deterministic forecast tends to be a random state on the attractor; their expected errors for a true state ${x}_i$ therefore approach $\|{x}_i-{x}_{\rm E}\|$ and $R_{{\rm L},i}$, respectively. Referring to the definition of LAR in Eq. (1), the ratio r of the expected ensemble mean to deterministic forecast errors for a specific true state ${x}_i$ can thus be expressed as \begin{equation} r=\frac{\|{x}_i-{x}_{\rm E}\|}{R_{{\rm L},i}}=\frac{\|{x}_i-{x}_{\rm E}\|}{\sqrt{\|{x}_i-{x}_{\rm E}\|^2+R_{\rm E}^2}} . \ \ (7)\end{equation}
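As a quick numerical illustration of Eq. (7) (a sketch, not from the paper; the mean 2.22 and attractor radius 3.63 of $x_1$ are taken from section 4.1):

```python
import numpy as np

X_MEAN, R_E = 2.22, 3.63    # mean state and attractor radius of x1 (section 4.1)

def expected_error_ratio(x_obs):
    """Eq. (7): expected ensemble mean / deterministic error ratio at long lead
    times for a case whose verifying true value of x1 is x_obs."""
    d = abs(x_obs - X_MEAN)
    return d / np.sqrt(d ** 2 + R_E ** 2)

for k in (0.5, 1.0, 2.0, 3.0):       # events k SDs away from the mean state
    print(f"{k:.1f} SD: r = {expected_error_ratio(X_MEAN + k * R_E):.2f}")
# 1 SD -> ~0.71 and 2 SD -> ~0.89, matching the range quoted for Fig. 5
```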
Figure 5 shows the variation of r as a function of the true state $x_1$. The ensemble mean has its maximum advantage over the deterministic forecast when the truth (or the observed state) is close to the climatic mean state. As the truth deviates from the mean state, the superiority of the ensemble mean over the deterministic forecast diminishes rapidly. For an event between 1 and 2 SDs, r ranges approximately from 0.7 to 0.9. Once the event lies beyond 2 SDs, r is close to 0.95 or higher, which means that the ensemble mean and deterministic forecasts perform very similarly. This indicates that the ensemble mean has little advantage over deterministic forecasts in predicting the variability of extreme events, and that the overall better performance of the former (see Fig. 4) originates from its higher skill for neutral events. Given a long time series of a variable, the distribution of r can be estimated in advance and used as a reference for the deterministic and ensemble mean forecast skill in individual cases, especially for long-range forecasts.
Figure 5. Ratio between the expected values of the ensemble mean (e_EM) and deterministic (e_Det) forecast errors of $x_1$ as a function of the observed value of $x_1$. The red dashed line indicates the mean state, and the black dashed lines mark 1 and 2 SDs from the mean.


To verify the above result, the practical errors of the deterministic and ensemble mean forecasts are compared. The forecast skill is assessed with the true states divided into three categories: neutral (within 1 climatic SD), weak extreme (between 1 and 2 SDs) and strong extreme (beyond 2 SDs) events. Figure 6 compares the deterministic and ensemble mean forecast errors for the three groups of events at lead times of 1, 2, 3 and 4 tu. At 1 tu the deterministic and ensemble mean forecast errors lie within similar ranges; at later times, the range of the ensemble mean errors, owing to the nonlinear filtering, is evidently smaller than that of the deterministic forecast errors. After 1 tu, for both the deterministic and the ensemble mean forecasts, the forecast errors of extreme events are overall larger than those of neutral events at the same lead time, as shown in Table 1, which is essentially related to the distribution of LAR on an attractor. At a long lead time (4 tu), the ratios between the average ensemble mean and deterministic forecast errors are 0.54 (1.69 vs 3.11), 0.87 (4.19 vs 4.79) and 0.99 (7.23 vs 7.33) for neutral, weak extreme and strong extreme events, respectively, which lie within the range of the expected ratio in Fig. 5. At shorter lead times, the errors of the deterministic and ensemble mean forecasts become closer for neutral and weak extreme events, but the ensemble mean performs much worse (about a 20% error increase at 1 and 2 tu) for strong extreme events. For more extreme events at a given lead time, the ensemble mean forecasts are less likely to have small RMS errors, especially at longer lead times (see Figs. 6c, f, i and l).
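The event classification and category-mean error ratios used here could be computed with a small helper of the following form (a sketch with hypothetical inputs: 1-D arrays of verifying $x_1$ values and of per-case ensemble mean and deterministic errors at a fixed lead time).

```python
import numpy as np

X_MEAN, SD = 2.22, 3.63
LABELS = ["neutral", "weak extreme", "strong extreme"]   # <1 SD, 1-2 SD, >2 SD

def error_ratio_by_category(x_true, err_ens, err_det):
    """Group cases by how far the verifying truth lies from the mean (in SDs)
    and return the mean ensemble-mean/deterministic error ratio per category."""
    x_true = np.asarray(x_true)
    err_ens = np.asarray(err_ens)
    err_det = np.asarray(err_det)
    d = np.abs(x_true - X_MEAN) / SD
    cat = np.digitize(d, [1.0, 2.0])          # 0, 1, 2 for the three categories
    return {LABELS[c]: float(np.mean(err_ens[cat == c]) / np.mean(err_det[cat == c]))
            for c in range(3) if np.any(cat == c)}
```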
Figure 6. RMS error of the ensemble mean and the deterministic forecasts for neutral events within 1 SD at (a) 1 tu, (d) 2 tu, (g) 3 tu and (j) 4 tu. The second and third columns are the same as the first, but for weak extreme states between 1 and 2 SDs and strong extreme states beyond 2 SDs, respectively.



5. Discussion and conclusions
In this study, we investigate the quantitative relationship between the errors of deterministic and ensemble mean forecasts using the Lorenz96 model as an example. Instead of evaluating the results from a large number of forecast samples, as most studies do, the skill of deterministic and ensemble mean forecasts is compared using two statistics defined on attractors, namely the global and local attractor radii (GAR and LAR, respectively). GAR and LAR quantitatively describe the average distances among states on the same attractor, which are found to be closely related to forecast errors. The saturated sample mean errors of deterministic and ensemble mean forecasts with a perfect model can be approximately estimated by the GAR and the attractor radius, respectively, and their ratio equals $\sqrt{2}$. Moreover, the expected ratio between the ensemble mean and deterministic forecast errors in individual cases can be quantified by the LAR-related statistics. The results indicate that the superiority of the ensemble mean over deterministic forecasts decreases significantly from neutral to strong extreme events.
GAR and LAR can be applied to practical weather and climate predictions. Since GAR and LAR are independent of specific forecast models and are derived from the attractor of observed states, they can provide objective and accurate criteria for quantifying the predictability of sample mean forecasts and of individual cases in operations, respectively. The deviations of GAR and LAR between observed and model states may indicate the level of model deficiencies and provide guidance for improving model performance.
The relative performance of deterministic and ensemble mean forecasts revealed by GAR and LAR will not change for practical weather and climate forecasts with model errors. However, GAR and LAR calculated from the observed states may introduce bias when used to estimate the expected errors of deterministic and ensemble mean forecasts in imperfect prediction models. It may be more appropriate to use the other two attractor statistics introduced by Li et al. (2018), namely the global and local average distances (GAD and LAD, respectively), which are similar to GAR and LAR but estimate the average distance between states on two different attractors. The application of GAD and LAD to practical deterministic and ensemble mean forecasts will be studied further in the future.
Since neutral events occur with large probability, the ensemble mean can still provide a valuable reference most of the time. However, the filtering effect of the ensemble mean results in an inherent disadvantage in predicting extreme events, which cannot easily be overcome. In operations, each ensemble forecast usually has not only amplitude errors but also positional errors when predicting specific flow patterns, e.g., a trough. Therefore, the ensemble mean may have stronger smoothing effects than in our theoretical results with a simple model, and thus be even less capable of capturing extreme flow features. To identify extreme weather, the performance of deterministic forecast models needs further improvement toward higher spatial resolution and more accurate model physics and parameterizations. Additionally, more efficient post-processing methods for ensemble forecast members need to be developed to extract more accurate probabilistic forecast information.

APPENDIX
This appendix gives the proof of Theorem 1. $R_{{\rm L},i}$ and $R_{{\rm L},j}$ are the local attractor radii of the compact attractor $\mathcal{A}$ at states ${x}_i$ and ${x}_j$, respectively, and ${x}_{\rm E}$ and $R_{\rm E}$ are the mean state and attractor radius of $\mathcal{A}$. Based on Eq. (1), $R_{{\rm L},i}$ can be expanded as follows: \begin{eqnarray*} R_{{\rm L},i}^2&=&E(\|{x}_i-{x}\|^2) ,\quad {x}_i,{x}\in \mathcal{A} ,\\ &=&E(\|{x}\|^2)-2{x}_i^{\rm T}E({x})+\|{x}_i\|^2 ,\\ &=&\|{x}_i\|^2-2{x}_{\rm E}^{\rm T}{x}_i+(\|{x}_{\rm E}\|^2+R_{\rm E}^2) ,\\ &=&\|{x}_i-{x}_{\rm E}\|^2+R_{\rm E}^2 , \end{eqnarray*} where $E(\|{x}\|^2)=\|{x}_{\rm E}\|^2+R_{\rm E}^2$ follows from the definition of the attractor radius.
Hence $R_{{\rm L},i}$ reaches its minimum value $R_{\rm E}$, i.e., the attractor radius, when ${x}_i={x}_{\rm E}$; and if $d_i>d_j$, then $R_{{\rm L},i}>R_{{\rm L},j}$, where $d_i$ and $d_j$ denote the RMS distances of ${x}_i$ and ${x}_j$ from the mean state ${x}_{\rm E}$.
