Over the past several decades, numerous studies using numerical models of varying complexity have demonstrated the overall higher forecasting skill of the ensemble mean relative to deterministic forecasts (Houtekamer and Derome, 1995; Toth and Kalnay, 1997; Buizza et al., 1999; Wang and Bishop, 2003; Wei et al., 2008; Zheng et al., 2009; Feng et al., 2014; Duan and Huo, 2016). However, quantitative estimates and comparisons of sample-mean deterministic and ensemble mean forecast errors are subject to sampling errors arising from the limited numbers of forecast samples and ensemble members, especially in complex numerical weather prediction models. It is even more challenging to compare deterministic and ensemble mean forecasting skill for specific weather and climate events, owing to the sampling uncertainties introduced by the day-to-day variation of the underlying flow (Toth and Kalnay, 1997; Corazza et al., 2003).
Forecasts and their corresponding verifying references (generally the analysis states) are evolving states of the model and reference attractors, respectively. Forecast errors are therefore essentially distances between states in attractor space. Li et al. (2018) proposed two statistics, the global and local attractor radii (GAR and LAR, respectively), which characterize the average distances between states on attractors. GAR measures the average distance between two randomly selected states on an attractor, while LAR quantifies the average distance of all states on the attractor from a given state. For complex nonlinear dynamical systems, e.g., the atmosphere, GAR and LAR can be estimated simply from a long time series of observed states. Moreover, GAR has been found to be a more accurate criterion for measuring the predictability limit than the traditional saturation value of the sample-mean deterministic forecast errors; the latter, because of model errors, usually overestimates the actual average error size that fully chaotic forecasts should have (Li and Ding, 2015; Li et al., 2018). In our study, GAR and LAR are further used to interpret the differences between deterministic and ensemble mean forecast errors in both sample-mean and single-case contexts, without running numerical forecasts. This is expected to provide a reference for the verification and assessment of deterministic and ensemble mean forecast skill.
The paper is organized as follows: Section 2 briefly introduces the definitions of GAR and LAR used in this study and the relevant theories. The experimental setup is presented in section 3. Section 4 displays and analyzes the roles of the attractor statistics in interpreting the relationship between deterministic and ensemble mean forecast errors. A discussion and conclusions are provided in section 5.
Theorem 1: Let $d_i$ and $d_j$ denote the RMS distances of two states ${x}_i$ and ${x}_j$ on $\mathcal{A}$ from the mean state ${x}_{\rm E}$. Let $R_{{\rm L},i}$ and $R_{{\rm L},j}$ represent the LAR of ${x}_i$ and ${x}_j$, respectively. Then, they satisfy the following relationship: \begin{equation} R_{{\rm L},i}>R_{{\rm L},j},\ {\rm if}\ d_{i}>d_{j} . \ \ (2)\end{equation}
This means that the minimum value of LAR is exactly the attractor radius. The proof of Theorem 1 is given in the Appendix.
The RMS of LARs over all states on $\mathcal{A}$ is defined as the GAR: \begin{equation} \label{eq2} R_{\rm G}=\sqrt{E(R_{\rm L}^2)}=\sqrt{E(\|{x}-{y}\|^2)} , \ \ (3)\end{equation} where x and y are two randomly selected state vectors from $\mathcal{A}$. GAR is an estimate of the average RMS distance between any two states on the same attractor space.
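Both statistics can be estimated directly from a long series of states. The following is a minimal sketch (not the study's model): a Gaussian surrogate series stands in for the attractor states, GAR is estimated by random pairing, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Surrogate "attractor": a long scalar time series of observed states.
x = rng.standard_normal(100_000)

x_E = x.mean()                          # mean state
R_E = np.sqrt(np.mean((x - x_E)**2))    # attractor radius (= SD for a scalar)

def local_attractor_radius(xi, states):
    """LAR: RMS distance of all states on the attractor from state xi."""
    return np.sqrt(np.mean((states - xi)**2))

# GAR: RMS distance between two randomly selected states, estimated
# here by drawing random index pairs instead of all O(N^2) pairs.
i, j = rng.integers(0, x.size, size=(2, 200_000))
R_G = np.sqrt(np.mean((x[i] - x[j])**2))

# The LAR at the mean state recovers the attractor radius,
# and R_G / R_E should be close to sqrt(2) (Theorem 2).
print(local_attractor_radius(x_E, x), R_E, R_G / R_E)
```

The same estimators apply unchanged to a vector time series if the squared differences are summed over the state dimension before averaging.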
Theorem 2: A constant proportional relationship exists between $R_{\rm G}$ and the attractor radius $R_{\rm E}$ of a compact attractor $\mathcal{A}$: \begin{equation} \label{eq3} R_{\rm G}=\sqrt{2}R_{\rm E} . \ \ (4)\end{equation}
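A brief justification of this relationship, under the assumption that the two randomly selected states ${x}$ and ${y}$ on $\mathcal{A}$ are independent with mean state ${x}_{\rm E}$ and attractor radius $R_{\rm E}$, is: \begin{eqnarray*} R_{\rm G}^2&=&E(\|{x}-{y}\|^2)\\ &=&E(\|{x}-{x}_{\rm E}\|^2)+E(\|{y}-{x}_{\rm E}\|^2)-2E[({x}-{x}_{\rm E})^{\rm T}({y}-{x}_{\rm E})]\\ &=&2R_{\rm E}^2 , \end{eqnarray*} where the cross term vanishes because the two states are independent.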
The two statistics, GAR and LAR, and their relevant theorems will be applied to the quantitative estimation of deterministic and ensemble mean forecast errors.
After an initial spin-up stage of 1000 tu, the model is run freely for a sufficiently long time ($10^4$ tu, i.e., $2\times 10^5$ time steps) to generate the true states used as references for the forecasts. A total of $2\times 10^5$ cases are initiated, one from each true state. If the initial true state is denoted by ${x}_{\rm t}$, the initial analysis state ${x}_{\rm a}$ is given by superposing an analysis error ${\delta}$ on ${x}_{\rm t}$: \begin{equation} {x}_{\rm a}={x}_{\rm t}+{\delta} . \ \ (6)\end{equation}
For simplicity, each element of the analysis error ${\delta}$ is drawn from a Gaussian distribution with expectation 0 and SD 1. The RMS size of ${\delta}$ is then rescaled to 0.1, which is about 3% of the climatic SD of ${x}_{\rm t}$ (3.63). Each ensemble perturbation is generated in the same way as the analysis error, but with different realizations of the noise, and the RMS size of each perturbation is likewise rescaled to 0.1. A total of $2.5\times 10^5$ ensemble perturbations are produced in each case; each is added to and subtracted from the analysis ${x}_{\rm a}$ to generate $N=5\times 10^5$ initial ensemble members ($2.5\times 10^5$ pairs), so that their mean remains equal to ${x}_{\rm a}$. The deterministic and ensemble forecasts in each case are derived by integrating the analysis states and initial ensemble members for 10 tu using the same model that generated the truth (i.e., a perfect-model scenario).
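The initial-condition setup described above can be sketched as follows. The dimension ($n=40$) and RMS size (0.1) follow the text; the small pair count and the stand-in true state are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40          # model dimension, as in the text
n_pairs = 100   # illustrative only; the study uses 2.5e5 pairs

def rescale(v, size=0.1):
    """Rescale a perturbation vector to a prescribed RMS size."""
    return v * size / np.sqrt(np.mean(v**2))

x_t = rng.standard_normal(n)             # a stand-in true state
delta = rescale(rng.standard_normal(n))  # analysis error, RMS = 0.1
x_a = x_t + delta                        # analysis state, Eq. (6)

# Paired perturbations: each is added to and subtracted from x_a.
perts = np.array([rescale(rng.standard_normal(n)) for _ in range(n_pairs)])
members = np.concatenate([x_a + perts, x_a - perts])  # 2 * n_pairs members

# The +/- pairing keeps the ensemble mean exactly at the analysis.
print(np.allclose(members.mean(axis=0), x_a))
```

The pairing is what makes the initial ensemble mean coincide with the analysis, so any later drift of the ensemble mean away from the deterministic forecast comes from nonlinear error growth, not from the initialization.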
Although the analyses in our study are not generated through commonly used data assimilation approaches, the initial ensemble perturbations have the same probability distribution as the analysis errors and are thus expected to sample the analysis errors optimally. Moreover, the ensemble size ($5\times 10^5$) is much larger than the model dimension (40). These two design choices eliminate possible effects of suboptimal initial ensemble members and limited ensemble size on the ensemble mean skill.

Figure 2 shows the variation of LAR (red solid line) as a function of the value of $x_1$, calculated with a $2.5\times 10^6$ tu time series. The probability distribution of the system (black solid line) is also given as a reference. LAR clearly depends on the specific state on the attractor: states at larger distances from the mean state are less likely to occur and have larger LAR, consistent with Theorem 1. When $x_1$ moves to the mean state, the minimal value of LAR, namely the attractor radius $R_{\rm E}$, is reached, equal to the SD (3.63). Additionally, the $R_{\rm G}$ of variable $x_1$ (5.13), calculated as the RMS of $R_{\rm L}$ over all given states on the attractor, is exactly $\sqrt 2$ times $R_{\rm E}$, as stated by Theorem 2.
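The state dependence of LAR seen in Fig. 2 can be reproduced qualitatively with a surrogate series. The sketch below uses a Gaussian series with the mean (2.22) and SD (3.63) quoted for $x_1$; it does not reproduce the study's model, only the behavior stated in Theorem 1.

```python
import numpy as np

rng = np.random.default_rng(2)
# Gaussian surrogate for the x1 climatology (mean 2.22, SD 3.63).
x = 2.22 + 3.63 * rng.standard_normal(500_000)
x_E = x.mean()
R_E = x.std()

def lar(xi):
    """Local attractor radius of the series at state value xi."""
    return np.sqrt(np.mean((x - xi)**2))

# LAR is minimal at the mean state, where it equals the attractor
# radius, and grows with the state's distance from the mean.
print(lar(x_E), R_E)
print(lar(x_E + 1.0) < lar(x_E + 3.0) < lar(x_E + 6.0))
```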

4.2. Evolution of ensemble mean and deterministic forecast states
The differences between deterministic and ensemble mean forecast errors are essentially associated with their differing forecast states. Therefore, the statistical characteristics of the deterministic and ensemble mean forecast states are analyzed before their forecast errors are compared. Each panel of Fig. 3 illustrates the probability distribution of the deterministic (black line) and ensemble mean (blue line) forecast states over all cases at the same lead time. The probability distribution of the deterministic forecasts remains consistent with that of the reference (red line) from 0.5 to 6 tu, since both lie on the same attractor. In contrast, the probability distribution of the ensemble mean states develops a narrower range and a higher peak as time increases. In other words, the ensemble mean forecasts tend, on the whole, to move toward the climatic mean value (2.22) with lead time because of the nonlinear smoothing effect of the arithmetic mean of the forecast ensemble (Toth and Kalnay, 1997). Finally, when all forecast members become fully chaotic at sufficiently long lead times, their ensemble mean converges to the climatic mean in every individual case. This indicates that the ensemble mean reduces the forecast error relative to deterministic forecasts, but at the expense of losing information and variability in the forecasts. Accordingly, from the characteristics of the forecast states, the sample-mean error of ensemble mean forecasts can be expected to saturate at the attractor radius, while that of deterministic forecasts saturates at the GAR. This expectation is verified by the forecast experiments in section 4.3.
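The smoothing effect described above can be illustrated in any chaotic system. The toy sketch below uses the logistic map (not the study's model): an ensemble started from tightly clustered states decorrelates, and its mean drifts toward the climatological mean of the map (about 0.5).

```python
import numpy as np

rng = np.random.default_rng(4)
# Tight initial ensemble around 0.3 (spread 1e-6), as a stand-in
# for small analysis-error perturbations.
ens = 0.3 + 1e-6 * rng.standard_normal(10_000)

means = []
for step in range(60):
    ens = 4.0 * ens * (1.0 - ens)   # fully chaotic logistic map step
    means.append(ens.mean())

# Early on, members agree and the mean tracks a single trajectory;
# once members decorrelate, the ensemble mean approaches the
# climatological mean of the map.
print(means[0], means[-1])
```

The ensemble mean itself is no longer a state of the system at late times, which is exactly the loss of variability discussed above.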
4.3. Sample mean forecast errors
Figure 4 shows the RMS error of $x_1$ for deterministic and ensemble mean forecasts as a function of lead time, averaged over all cases. Within the first 1 tu, the deterministic and ensemble mean forecasts have similar errors, because the approximately linear growth of the paired positive and negative initial perturbations cancels in the ensemble mean. After 1 tu, the ensemble mean forecasts retain smaller errors than the deterministic ones, and the difference increases continuously with lead time. Finally, the deterministic and ensemble mean forecast errors both enter the nonlinear saturation stage, reaching 5.13 and 3.63, respectively. The former equals the GAR and the latter the attractor radius, so the ratio of the saturation values is $\sqrt 2$, as anticipated in section 4.2. This is also consistent with the conclusions of Leith (1974) and Kalnay (2003).
4.4. Forecast errors in individual cases
In comparison with sample mean forecasts, the forecast of a specific weather or climate event is strongly influenced by the evolving dynamics (Ziehmann et al., 2000; Corazza et al., 2003), and it is thus difficult to estimate the expected values of both deterministic and ensemble mean forecast errors. LAR is a feasible statistic for estimating these expected errors in individual cases without running practical forecasts. As the nonlinearity in the forecasts intensifies, the ensemble mean approaches the mean state (see Fig. 3), while the deterministic forecast tends toward a random state on the attractor. Hence, for a true state ${\textbf{x}}_i$, the expected ensemble mean error is $\|{\textbf{x}}_i-{\textbf{x}}_{\rm E}\|$, while the expected deterministic error is the LAR $R_{{\rm L},i}$. Referring to the definition of LAR in Eq. (2), the ratio r of the expected ensemble mean forecast error to the expected deterministic forecast error can be expressed as: \begin{equation} r=\frac{\|{\textbf{x}}_i-{\textbf{x}}_{\rm E}\|}{R_{{\rm L},i}}=\frac{\|{\textbf{x}}_i-{\textbf{x}}_{\rm E}\|}{\sqrt{\|{\textbf{x}}_i-{\textbf{x}}_{\rm E}\|^2+R_{\rm E}^2}} . \ \ (7)\end{equation}Figure 5 shows the variation of r as a function of the true state $x_1$. The ensemble mean has its maximum advantage over the deterministic forecast when the truth (or the observed state) is close to the climatic mean state. As the truth deviates from the mean state, the superiority of the ensemble mean over deterministic forecasts diminishes rapidly. For an event within 1 to 2 SDs, r ranges approximately from 0.7 to 0.9. Once the event lies beyond 2 SDs, r approaches 0.95, meaning the ensemble mean and deterministic forecasts perform very similarly. This indicates that the ensemble mean has no advantage over deterministic forecasts in predicting the variability of extreme events, and the overall better performance of the former (see Fig. 4) originates from its higher skill for neutral events.
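The expected ratio between ensemble mean and deterministic errors discussed above can be evaluated directly. A minimal sketch, with the distance $d$ of the truth from the mean state measured in units of the climatic SD (so $R_{\rm E}=1$); the helper name is illustrative:

```python
import numpy as np

def error_ratio(d, R_E=1.0):
    """Expected ensemble-mean error (d) over expected deterministic
    error (LAR of the true state, sqrt(d^2 + R_E^2))."""
    return d / np.sqrt(d**2 + R_E**2)

# The ratio approaches 1 as the event becomes more extreme.
for d in (1.0, 2.0, 3.0):
    print(d, round(float(error_ratio(d)), 3))
```

At 1, 2 and 3 SDs this gives about 0.71, 0.89 and 0.95, matching the ranges quoted for Fig. 5.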
Given a long-term time series of a variable, the distribution of r can be estimated in advance and used as a reference for the skill of deterministic and ensemble mean forecasts in individual cases, especially for long-range forecasts.

To verify the above result, the practical errors of deterministic and ensemble mean forecasts are compared. The forecast skill is assessed against the truth, divided into three categories: neutral (within 1 climatic SD), weak extreme (between 1 and 2 SDs), and strong extreme (beyond 2 SDs) events. Figure 6 compares the deterministic and ensemble mean forecast errors for the three groups of events at lead times of 1, 2, 3 and 4 tu. At 1 tu, the deterministic and ensemble mean forecast errors span similar ranges; at later times, the range of the ensemble mean errors, owing to the nonlinear filtering, is evidently smaller than that of the deterministic forecast errors. After 1 tu, for both the deterministic and ensemble mean forecasts, the errors for an extreme event are overall larger than those for a neutral event at the same lead time, as shown in Table 1, which is essentially related to the distribution of LAR on the attractor. At a long lead time (4 tu), the ratios between the average ensemble mean and deterministic forecast errors are 0.54 (1.69 vs 3.11), 0.87 (4.19 vs 4.79) and 0.99 (7.23 vs 7.33) for neutral, weak extreme and strong extreme events, respectively, which fall within the range of the expected ratio in Fig. 5. At shorter lead times, the errors of deterministic and ensemble mean forecasts are closer for neutral and weak extreme events, but the ensemble mean performs much worse (about a 20% error increase at 1 and 2 tu) for strong extreme events. For more extreme events at a given lead time, the ensemble mean forecasts are less likely to have small RMS errors, especially at longer lead times (see Figs. 6c, f, i and l).

GAR and LAR can be applied to practical weather and climate predictions. Since GAR and LAR are independent of specific forecast models and are derived from the attractor of observed states, they provide objective and accurate criteria for quantifying the predictability of sample-mean forecasts and of individual cases in operations, respectively. Deviations of GAR and LAR between observed and modeled states may indicate the level of model deficiency and guide further model development.
The relative performance of deterministic and ensemble mean forecasts revealed by GAR and LAR will not change for practical weather and climate forecasts with model errors. However, GAR and LAR calculated from observed states may introduce biases when used to estimate the expected errors of deterministic and ensemble mean forecasts from imperfect prediction models. It may be more appropriate to use two other attractor statistics introduced by Li et al. (2018), namely the global and local average distances (GAD and LAD, respectively), which are analogous to GAR and LAR but measure the average distance between states on two different attractors. The application of GAD and LAD to practical deterministic and ensemble mean forecasts will be studied further in the future.
Since neutral events occur with large probability, the ensemble mean can still provide a valuable reference most of the time. However, the filtering effect of the ensemble mean results in an inherent disadvantage for predicting extreme events, which cannot easily be overcome. In operations, each ensemble forecast usually has not only amplitude errors but also positional errors when predicting specific flow patterns, e.g., a trough. Therefore, the ensemble mean may have stronger smoothing effects than in our theoretical results with a simple model, and thus be even less capable of capturing extreme flow features. To identify extreme weather, the performance of deterministic forecast models needs further improvement toward higher spatial resolution and more accurate model physics and parameterizations. Additionally, more efficient post-processing methods for ensemble forecast members need to be developed to extract more accurate probabilistic forecast information.
APPENDIX
This appendix provides the proof of Theorem 1. $R_{{\rm L},i}$ and $R_{{\rm L},j}$ are the local attractor radii of the compact attractor $\mathcal{A}$ at states ${x}_i$ and ${x}_j$, respectively. ${x}_{\rm E}$ and $R_{\rm E}$ are the mean state and attractor radius of $\mathcal{A}$. Based on Eq. (2), the expression for $R_{{\rm L},i}$ can be derived as follows: \begin{eqnarray*} R_{{\rm L},i}^2&=&E(\|{x}_i-{x}\|^2) ,\quad {x}_i,{x}\in \mathcal{A} ,\\ &=&E(\|{x}\|^2)-2{x}_i^{\rm T}E({x})+\|{x}_i\|^2 ,\\ &=&(\|{x}_{\rm E}\|^2+R_{\rm E}^2)-2{x}_i^{\rm T}{x}_{\rm E}+\|{x}_i\|^2 ,\\ &=&\|{x}_i-{x}_{\rm E}\|^2+R_{\rm E}^2 . \end{eqnarray*}$R_{{\rm L},i}$ reaches its minimal value $R_{\rm E}$, i.e., the attractor radius, when ${x}_i={x}_{\rm E}$; and if $d_i>d_j$, then $R_{{\rm L},i}>R_{{\rm L},j}$, where $d_i$ and $d_j$ denote the RMS distances of ${x}_i$ and ${x}_j$ from the mean state ${x}_{\rm E}$.
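The appendix identity can also be checked numerically. The sketch below uses Gaussian surrogate states in three dimensions (an arbitrary illustration, not the study's model); the identity is exact for any sample once the sample mean and sample attractor radius are used.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((200_000, 3))           # surrogate attractor states
x_E = X.mean(axis=0)                            # sample mean state
R_E2 = np.mean(np.sum((X - x_E)**2, axis=1))    # squared attractor radius

x_i = np.array([1.0, -2.0, 0.5])                # an arbitrary fixed state
lar2 = np.mean(np.sum((X - x_i)**2, axis=1))    # squared LAR at x_i

# The two sides of R_{L,i}^2 = ||x_i - x_E||^2 + R_E^2 agree
# to floating-point precision.
print(lar2, float(np.sum((x_i - x_E)**2) + R_E2))
```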