1. State Key Laboratory of Numerical Modeling for Atmospheric Sciences and Geophysical Fluid Dynamics, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China
2. College of Earth Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
3. Institute of Space Weather, School of Math and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China
4. College of Physics and Optoelectronic Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
5. College of Global Change and Earth System Sciences, Beijing Normal University, Beijing 100875, China
Manuscript received: 2019-02-26
Manuscript revised: 2019-04-18
Manuscript accepted: 2019-04-30
Abstract: A new method to quantify the predictability limit of ensemble forecasting is presented using the Kullback-Leibler (KL) divergence (also called the relative entropy), which provides a measure of the difference between the probability distributions of ensemble forecasts and local reference (true) states. The KL divergence is applicable to a non-normal distribution of ensemble forecasts, which is a substantial improvement over the previous method using the ensemble spread. An example from the three-variable Lorenz model illustrates that the KL divergence can effectively quantify the predictability limit of ensemble forecasting. On this basis, the KL divergence is used to investigate the dependence of the predictability limit of ensemble forecasting on the initial states and on the magnitude of initial errors. The local predictability limit of ensemble forecasting varies considerably with the initial states, as well as with the magnitude of initial errors. Further research is needed to examine real-world applications of the KL divergence in measuring the predictability of ensemble weather forecasts.
Keywords: predictability, ensemble forecasting, Kullback-Leibler divergence
2.1. KL divergence
The KL divergence measures the difference between two probability distributions P and Q (Kullback and Leibler, 1951). For discrete probability distributions P and Q, the KL divergence from Q to P is defined as \begin{equation} \label{eq1} D_{\rm KL}(P\|Q)=\sum_i{P(i)\log\frac{P(i)}{Q(i)}} , \ \ (1)\end{equation} where "\|" denotes "relative to", and Eq. (1) is equivalent to \begin{equation} \label{eq2} D_{\rm KL}(P\|Q)=-\sum_i {P(i)\log\frac{Q(i)}{P(i)}} . \ \ (2)\end{equation} For distributions P and Q of a continuous random variable x, the KL divergence is defined as \begin{equation} \label{eq3} D_{\rm KL}(P\|Q)=\int_{-\infty}^\infty p(x)\log\frac{p(x)}{q(x)}{\rm d}x , \ \ (3)\end{equation} where p and q represent the probability densities of P and Q. The KL divergence is always non-negative, and $D_{\rm KL}(P\|Q)$ is zero if and only if P=Q.
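As a concrete illustration of Eq. (1), the Python sketch below estimates the discrete KL divergence from two sets of samples via histograms on shared bins. The sample data, bin edges, and the small floor used to avoid division by zero are illustrative choices, not part of the original formulation.

import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Eq. (1): D_KL(P||Q) = sum_i P(i) log[P(i)/Q(i)] for discrete P, Q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0                          # terms with P(i) = 0 contribute nothing
    q_safe = np.clip(q[mask], eps, None)  # floor to avoid division by zero
    return float(np.sum(p[mask] * np.log(p[mask] / q_safe)))

# Example: estimate P and Q from samples by binning them on shared edges.
rng = np.random.default_rng(0)
samples_p = rng.normal(0.0, 1.0, 10000)
samples_q = rng.normal(0.5, 1.2, 10000)
bins = np.linspace(-6.0, 6.0, 61)
p_hist, _ = np.histogram(samples_p, bins=bins)
q_hist, _ = np.histogram(samples_q, bins=bins)
print(kl_divergence(p_hist, q_hist))   # non-negative; zero only if P equals Q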
2.2. Local attractor radius
Let xi be a specific state on a compact attractor Ω; then the local attractor radius (LAR, RL) with respect to the state xi is defined by Li et al. (2018) as \begin{equation} \label{eq4} R_L({x}_i)=\sqrt{E(\|{x}_i-{x}\|^2)} ,\quad {x}_i,{x}\in\Omega , \ \ (4)\end{equation} where the norm $\| \|$ represents the L2-norm and E denotes the expectation. The LAR measures the root-mean-square distance between a specific state xi and all other states on an attractor. In terms of the LAR, the local attractor with respect to the state xi can be defined as the subset of all states on the attractor whose distance to xi is less than the LAR. Li et al. (2018) showed that the LAR can be used as an objective metric to quantify the local predictability limit of forecast models. In the present study, the LAR is used to define the local attractor with respect to a specific reference state and to construct the probability distributions of local reference (true) states. An example from the three-variable Lorenz system illustrates the spatial structure of the LAR over the Lorenz attractor. The three-variable Lorenz system is \begin{equation} \label{eq5} \left\{ \begin{array}{l} \dfrac{{\rm d}X}{{\rm d}t}=-\sigma X+\sigma Y \\ \dfrac{{\rm d}Y}{{\rm d}t}=rX-Y-XZ\\ \dfrac{{\rm d}Z}{{\rm d}t}=XY-bZ \end{array} \right., \ \ (5)\end{equation} where σ=10, r=28, and b=8/3, for which the system exhibits chaotic behavior (Lorenz, 1963). Figure 1 shows a projection of the LAR over the Lorenz attractor in the x-y plane. The LAR varies widely over the attractor, with a minimum of ~15 and a maximum exceeding 35. The LAR is not randomly distributed but exhibits a distinct organization in phase space, consistent with the results of Li et al. (2018). The LAR is antisymmetric with respect to the x- or y-axis, with minimum values at the intersection of the two wings and maximum values at the outermost rims. As the LAR varies over the attractor, the local attractor with respect to a specific state also changes with that state.
Figure 1. Projection of the LAR over the Lorenz attractor in the x-y plane.
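For readers wishing to reproduce a figure like Fig. 1, the sketch below estimates the LAR of Eq. (4) by integrating the Lorenz system (5) and computing the root-mean-square distance from a chosen state to states sampled on the attractor. The integration scheme (fourth-order Runge-Kutta), time step, spin-up length, and sample size are assumptions made for illustration.

import numpy as np

SIGMA, R, B = 10.0, 28.0, 8.0 / 3.0    # parameters of Eq. (5)

def lorenz(state):
    x, y, z = state
    return np.array([-SIGMA * x + SIGMA * y,
                     R * x - y - x * z,
                     x * y - B * z])

def rk4_step(state, dt=0.01):
    """One fourth-order Runge-Kutta step of the Lorenz system."""
    k1 = lorenz(state)
    k2 = lorenz(state + 0.5 * dt * k1)
    k3 = lorenz(state + 0.5 * dt * k2)
    k4 = lorenz(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def sample_attractor(n_states, spinup=5000, dt=0.01):
    """Discard a transient, then collect n_states points on the attractor."""
    state = np.array([-5.76, -0.29, 30.5])
    for _ in range(spinup):
        state = rk4_step(state, dt)
    states = np.empty((n_states, 3))
    for i in range(n_states):
        state = rk4_step(state, dt)
        states[i] = state
    return states

def local_attractor_radius(x_i, attractor_states):
    """Eq. (4): R_L(x_i) = sqrt(E ||x_i - x||^2) over attractor states x."""
    sq_dist = np.sum((attractor_states - x_i) ** 2, axis=1)
    return np.sqrt(sq_dist.mean())

attractor = sample_attractor(50000)
print(local_attractor_radius(attractor[0], attractor))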
2.3. Calculation of the KL divergence in ensemble forecasting
The KL divergence defined in Eqs. (1)-(3) quantifies the difference between two probability distributions, P and Q. To compute the KL divergence in ensemble forecasting, it is necessary to estimate the probability distribution of local reference (true) states (hereafter P) and the probability distribution of ensemble forecasts (hereafter Q). For a specific reference state xi, we first calculate the LAR of xi. We can then obtain the subset of all states on the attractor whose distance to the reference state is less than the LAR. Finally, the probability distribution P of local reference (true) states is obtained from this subset of states on the local attractor. When N random perturbations are added to or subtracted from the reference state, N ensemble forecasts can be generated by the prediction model, from which the probability distribution Q of ensemble forecasts is obtained. Once both P and Q are available, the KL divergence can be computed directly. Because the reference state and the ensemble forecasts change with the forecast time, the KL divergence also varies with the forecast time. By examining the evolution of the KL divergence with the forecast time, we can quantitatively estimate the predictability limit of ensemble forecasting.
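The procedure above can be sketched as follows in Python, reusing the kl_divergence, rk4_step, sample_attractor, and local_attractor_radius helpers from the earlier sketches. The ensemble size, perturbation amplitude, and the choice to histogram only the X variable on shared bins are illustrative simplifications, not prescriptions from the text.

import numpy as np

def kl_vs_forecast_time(x0, attractor, n_members=1000, eps=1e-3,
                        n_steps=2000, dt=0.01,
                        bins=np.linspace(-25.0, 25.0, 51)):
    """Evolution of D_KL(P||Q) with forecast time for one reference state x0."""
    rng = np.random.default_rng(0)
    # Random initial perturbations of amplitude eps superimposed on x0.
    pert = rng.normal(size=(n_members, 3))
    pert *= eps / np.linalg.norm(pert, axis=1, keepdims=True)
    members = x0 + pert
    truth = np.array(x0, dtype=float)
    kl_curve = np.empty(n_steps)
    for t in range(n_steps):
        truth = rk4_step(truth, dt)
        members = np.array([rk4_step(m, dt) for m in members])
        # P: states on the local attractor of the current reference state.
        r_l = local_attractor_radius(truth, attractor)
        local = attractor[np.linalg.norm(attractor - truth, axis=1) < r_l]
        p_hist, _ = np.histogram(local[:, 0], bins=bins)
        # Q: the ensemble forecasts at the same forecast time.
        q_hist, _ = np.histogram(members[:, 0], bins=bins)
        kl_curve[t] = kl_divergence(p_hist, q_hist)
    return kl_curve

# Usage: kl = kl_vs_forecast_time(attractor[0], attractor)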
2.4. Nonlinear local Lyapunov exponent method
The nonlinear local Lyapunov exponent (NLLE), which is a nonlinear extension of the existing linear finite-time or local Lyapunov exponents (Yoden and Nomura, 1993; Boffetta et al., 1998; Ziehmann et al., 2000), measures the mean growth rate of initial errors in nonlinear dynamical systems without linearizing the nonlinear equations of motion (Ding and Li, 2007; Ding et al., 2008a; Li and Ding, 2011). The NLLE and its derivative (i.e., the mean relative growth of the initial error) have been widely applied to quantitatively determine the limit of dynamic predictability of weather and climate variables (Ding et al., 2008b, 2010, 2011, 2015), exhibiting superior performance to the existing linear finite-time or local Lyapunov exponents. A brief description of the NLLE method is given in Appendix A. Note that the NLLE method is based on nonlinear error dynamics, whereas the KL divergence is based on probability and information theory, so some differences exist between the two methods. For example, the NLLE method uses the root-mean-square error as the measure of error and therefore depends on the dimension of the variables. In contrast, the KL divergence uses the difference between two probability distributions as the measure of uncertainty and therefore does not depend on the dimension of the variables. This may be one advantage of the KL divergence over the NLLE method. Nevertheless, although the NLLE method determines the predictability limit from the evolution of initial errors while the KL divergence determines it from the evolution of forecast probability distributions, the predictability limit is an intrinsic property of a given dynamical system that does not depend on the specific method used (Lorenz, 1969; Mu et al., 2017), so the predictability limits of ensemble forecasting derived from the KL divergence and from error evolution should be consistent (see Fig. 2). We therefore compare the predictability limits of ensemble forecasting derived from the KL divergence and the NLLE; their consistency would support the effectiveness of the KL divergence in measuring the predictability of ensemble forecasting.
Figure 2. For the initial state on the Lorenz attractor x01 (-5.76, -0.29, 30.5): (a) the KL divergence and (b) the mean error growth obtained using the NLLE method with ε = 10^-3, as functions of time t. In (a), the time at which the KL divergence reaches its maximum value is indicated by the red dashed line. In (b), the average value of the nonlinear stochastic fluctuation states of the mean error is indicated by the black dashed line, and the time at which the error growth enters the nonlinear stochastic fluctuation states is indicated by the red dashed line.
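A minimal sketch of the consistency check in Fig. 2, assuming a KL divergence curve and a mean error-growth curve sampled at a constant interval dt: the local predictability limit is read off (a) as the time at which the KL divergence reaches its maximum and (b) as the time at which the mean error growth first reaches the average value of its nonlinear stochastic fluctuation states. Treating the last 20% of the record as the saturated segment is an assumption made for illustration.

import numpy as np

def limit_from_kl(kl_curve, dt=0.01):
    """Time at which the KL divergence curve reaches its maximum (Fig. 2a)."""
    return np.argmax(kl_curve) * dt

def limit_from_error_growth(mean_error, dt=0.01, saturated_fraction=0.2):
    """Time at which the mean error first reaches its saturation level (Fig. 2b)."""
    mean_error = np.asarray(mean_error, dtype=float)
    n_sat = max(1, int(len(mean_error) * saturated_fraction))  # assumed saturated tail
    saturation_level = mean_error[-n_sat:].mean()
    return np.argmax(mean_error >= saturation_level) * dt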
APPENDIX A
Introduction to the NLLE method
Consider a general n-dimensional nonlinear dynamical system whose evolution is governed by \begin{equation} \frac{{\rm d}{x}}{{\rm d}t}={F}({x}) , \ \ (A1)\end{equation} where x=[x1(t),x2(t),…,xn(t)]T is the state vector at time t, the superscript T denotes the transpose, and F represents the dynamics. The evolution of a small error δ=[δ1(t),δ2(t),…,δn(t)]T, superimposed on a state x, is governed by the following nonlinear equation: \begin{equation} \frac{{\rm d}{\delta}}{{\rm d}t}={J}({x}){\delta}+{G}({x},{\delta}) , \ \ (A2)\end{equation} where J(x)δ are the tangent linear terms and G(x,δ) are the high-order nonlinear terms of the error δ. Without a linear approximation, the solutions of Eq. (A2) can be obtained by numerical integration along the reference solution x from t=t0 to t0+τ: \begin{equation} {\delta}_1={\eta}({x}_0,{\delta}_0,\tau){\delta}_0 , \ \ (A3) \end{equation} where δ1=δ(t0+τ), x0=x(t0), δ0=δ(t0), and η(x0,δ0,τ) is the nonlinear propagator. The NLLE is then defined as \begin{equation} \lambda({x}_0,{\delta}_0,\tau)=\frac{1}{\tau}\ln\frac{\|{\delta}_1\|}{\|{\delta}_0\|} , \ \ (A4)\end{equation} where λ(x0,δ0,τ) depends in general on the initial state x0 in phase space, the initial error δ0, and the time τ. The NLLE differs from the existing local or finite-time Lyapunov exponents defined from linear error dynamics, which depend solely on the initial state x0 and the time τ, and not on the initial error δ0. Assuming that all initial perturbations have amplitude ε and random directions, i.e., that they lie on an n-dimensional spherical surface centered at the initial point x0, we have \begin{equation} {\delta}_0^{\rm T}{\delta}_0=\varepsilon^2 . \ \ (A5) \end{equation} The local ensemble mean of the NLLE over a large number of random initial perturbations is given by \begin{equation} \bar{\lambda}({x}_0,\tau)=\langle{\lambda({x}_0,{\delta}_0,\tau)}\rangle_N , \ \ (A6)\end{equation} where $\langle\ \rangle_N$ denotes the local ensemble average over samples of sufficiently large size N ($N\to\infty$). Here, $\bar\lambda(x_0,\tau)$ characterizes the average growth rate of random perturbations superimposed on x0 within a finite time τ. For a fixed time τ, $\bar\lambda(x_0,\tau)$ depends on x0 and reflects the local error growth dynamics of the attractor. The mean local relative growth of the initial error can be obtained by \begin{equation} \bar{E}({x}_0,\tau)=e^{[\bar{\lambda}({x}_0,\tau)\tau]} . \ \ (A7) \end{equation} For a given initial state x0, $\bar{E}(x_0,\tau)$ initially increases with time τ and eventually reaches a state of nonlinear stochastic fluctuation, which means that the error growth has saturated at a constant average value. At that point, almost all information on the initial state is lost and the prediction becomes meaningless. If the local predictability limit is defined as the time at which the error reaches the average value of the nonlinear stochastic fluctuation states, the predictability limit of the system at x0 can be quantitatively determined.
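Under the same illustrative assumptions as the earlier sketches (and reusing rk4_step), the Python code below estimates Eqs. (A4)-(A7) for the Lorenz system by integrating both the reference and the perturbed trajectories with the full nonlinear model; the ensemble size, perturbation amplitude, and time step are assumptions.

import numpy as np

def mean_relative_growth(x0, eps=1e-3, n_members=1000, n_steps=2000, dt=0.01):
    """Mean relative growth E_bar(x0, tau) of Eq. (A7) as a function of tau."""
    rng = np.random.default_rng(0)
    # Eq. (A5): random perturbations of amplitude eps on a sphere around x0.
    pert = rng.normal(size=(n_members, 3))
    pert *= eps / np.linalg.norm(pert, axis=1, keepdims=True)
    ref = np.array(x0, dtype=float)
    perturbed = x0 + pert
    e_bar = np.empty(n_steps)
    for t in range(n_steps):
        ref = rk4_step(ref, dt)
        perturbed = np.array([rk4_step(m, dt) for m in perturbed])
        tau = (t + 1) * dt
        # Eq. (A4) for each member, then the local ensemble mean, Eq. (A6).
        lam = np.log(np.linalg.norm(perturbed - ref, axis=1) / eps) / tau
        e_bar[t] = np.exp(lam.mean() * tau)          # Eq. (A7)
    return e_bar

# Usage: e_bar = mean_relative_growth(np.array([-5.76, -0.29, 30.5]))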