1. Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters/Key Laboratory of Meteorological Disaster, Ministry of Education/Joint International Research Laboratory of Climate and Environment Change, Nanjing University of Information Science and Technology, Nanjing 210044, China
2. Key Laboratory of Mesoscale Severe Weather/Ministry of Education and School of Atmospheric Sciences, Nanjing University, Nanjing 210093, China
3. Center for Analysis and Prediction of Storms, University of Oklahoma, Norman, Oklahoma 73072, USA
Manuscript received: 2017-04-26
Manuscript revised: 2017-10-08
Manuscript accepted: 2017-10-27
Abstract: A dual-resolution (DR) version of a regional ensemble Kalman filter (EnKF)-3D ensemble variational (3DEnVar) coupled hybrid data assimilation system is implemented as a prototype for the operational Rapid Refresh forecasting system. The DR 3DEnVar system combines a high-resolution (HR) deterministic background forecast with lower-resolution (LR) EnKF ensemble perturbations used for flow-dependent background error covariance to produce a HR analysis. The computational cost is substantially reduced by running the ensemble forecasts and EnKF analyses at LR. The DR 3DEnVar system is tested with 3-h cycles over a 9-day period using a 40/~13-km grid spacing combination. The HR forecasts from the DR hybrid analyses are compared with forecasts launched from HR Gridpoint Statistical Interpolation (GSI) 3D variational (3DVar) analyses, and single LR hybrid analyses interpolated to the HR grid. With the DR 3DEnVar system, a 90% weight for the ensemble covariance yields the lowest forecast errors and the DR hybrid system clearly outperforms the HR GSI 3DVar. Humidity and wind forecasts are also better than those launched from interpolated LR hybrid analyses, but the temperature forecasts are slightly worse. The humidity forecasts are improved most. For precipitation forecasts, the DR 3DEnVar always outperforms HR GSI 3DVar.
It also outperforms the LR 3DEnVar, except for the initial forecast period and lower thresholds.
Keywords: dual-resolution 3D ensemble variational data assimilation system, Rapid Refresh forecasting system
3.1. Model and domain configuration
In the RAP hybrid DA system, WRF is used as the forecast model. The Model Evaluation Tools package (Brown et al., 2009), developed by the Developmental Testbed Center, is used for forecast verification.
Figure 2. Example of the horizontal distributions of (a) sounding, profiler and VAD (Velocity Azimuth Display), (b) surface stations over land and for ships, (c) GPS-PW (Precipitable Water) and GPS-RO (Radio Occultation), and (d) aircraft observations at 0000 UTC 8 May.
As stated earlier, the DR hybrid system uses a 40/~13-km horizontal grid spacing combination. The LR domain for the ensemble, at 40-km grid spacing, covers North America with 207 × 207 horizontal grid points (bold box in Fig. 2a), while the HR domain, at ~13-km grid spacing, has 616 × 616 horizontal grid points covering roughly the same area (the HR domain is not plotted in Fig. 2). Both domains have 50 vertical levels extending up to 10 hPa at the model top, using terrain-following hydrostatic-pressure-based vertical coordinates that stretch with height (Skamarock et al., 2008). The static background error statistics used in this study are those provided in the GSI system (Hu et al., 2016), calculated from NCEP North American Mesoscale (NAM) model forecasts using the National Meteorological Center method. The error statistics depend on latitude and sigma level only; they are interpolated to the analysis grid within GSI. The flow-dependent BECs are derived from the ensemble forecasts provided by the EnKF system at 40-km grid spacing.
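As a quick arithmetic check on the statement that the two grids cover roughly the same domain, the side length of each grid can be computed from the number of points and the grid spacing. This is only a sketch: the function name is hypothetical, and 13.0 km is a nominal stand-in for the "~13 km" quoted in the text, not the exact operational value.

```python
def domain_extent_km(n_points: int, spacing_km: float) -> float:
    """Distance between the first and last grid point along one axis."""
    return (n_points - 1) * spacing_km


# LR ensemble grid: 207 points at 40 km (values from the text).
lr_extent = domain_extent_km(207, 40.0)

# HR grid: 616 points; 13.0 km is an assumed nominal spacing for "~13 km".
hr_extent_nominal = domain_extent_km(616, 13.0)

# Spacing at which 616 points would span exactly the LR extent.
matching_spacing = lr_extent / (616 - 1)

print(f"LR extent:  {lr_extent:.0f} km")
print(f"HR extent (nominal 13 km): {hr_extent_nominal:.0f} km")
print(f"HR spacing matching the LR extent: {matching_spacing:.2f} km")
```

With these numbers the LR side length is 8240 km, and a spacing of about 13.4 km would make the 616-point grid span the same distance, which is consistent with the "~13 km" description.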
3.2. Observations for assimilation and verification
As in Pan et al. (2014), the operational data stream of RAP, excluding satellite radiance data, is assimilated in the DR hybrid system. The distributions of the data at 0000 UTC 8 May are shown in Fig. 2. Eighteen-hour forecasts from the analyses are verified against surface and sounding data in the HR domain. The surface variables verified are surface pressure, 2-m relative humidity (RH), 2-m temperature (T), and 10-m zonal and meridional wind components (U and V, respectively); the sounding variables are RH, T, U and V.
3.3. Verification techniques
The root-mean-square error (RMSE) is used to evaluate the forecasts, and the bootstrap resampling method (Candille et al., 2007; Buehner and Mahidjiba, 2010; Schwartz and Liu, 2014), following Pan et al. (2014), is used to determine the statistical significance of error differences. The RMSEs are first calculated against the observations at given levels and forecast hours, and then aggregated over all cycles for each forecast hour for skill evaluation. To assess the statistical significance, bootstrap resampling is performed: new samples are created by randomly drawing from the dataset 3000 times, allowing the same data to be drawn more than once. From the resamples, we calculate the aggregated RMSEs along with a two-tailed confidence interval from 5% to 95%. As in Pan et al. (2014), RMSE differences are first calculated between a specific experiment and its benchmark. The bootstrap method is then applied to the RMSE differences, with confidence intervals from 5% to 95%, to determine the significance of the improvement. When the confidence interval of the RMSE differences lies entirely below/above zero, the experiment is significantly better/worse than the benchmark experiment at the 90% confidence level. Additional discussion on the use of the bootstrap method for calculating the statistical significance of forecast differences can be found in Pan et al. (2014), Schwartz and Liu (2014), and Xue et al. (2013).
The Gilbert skill score (GSS; Gandin and Murphy, 1992) is used to evaluate the precipitation forecast skill of 12-h deterministic forecasts against the 4-km NCEP Stage IV precipitation data (Lin and Mitchell, 2005) over the CONUS (Conterminous United States) domain, where the data are available.
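The bootstrap significance test described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' verification code: the resampling unit is taken to be one RMSE difference per DA cycle (72 cycles for 3-h cycling from 0000 UTC 8 May to 2100 UTC 16 May), the aggregation is a simple mean, and the function names are hypothetical.

```python
import numpy as np


def bootstrap_interval(rmse_diff, n_resamples=3000, lower=5.0, upper=95.0, seed=0):
    """Percentile bootstrap interval for the mean RMSE difference.

    rmse_diff: one value per cycle, experiment RMSE minus benchmark RMSE
               (assumed resampling unit; negative means the experiment is better).
    """
    diffs = np.asarray(rmse_diff, dtype=float)
    rng = np.random.default_rng(seed)
    # Draw n_resamples samples of the same size, with replacement.
    idx = rng.integers(0, diffs.size, size=(n_resamples, diffs.size))
    resampled_means = diffs[idx].mean(axis=1)
    lo, hi = np.percentile(resampled_means, [lower, upper])
    return float(lo), float(hi)


def verdict(lo, hi):
    """Classify the difference at the 90% level from the 5%-95% interval."""
    if hi < 0.0:
        return "significantly better"
    if lo > 0.0:
        return "significantly worse"
    return "not significant"


# Synthetic example: 72 cycles with a small, consistent RMSE reduction.
rng = np.random.default_rng(42)
synthetic_diff = -0.10 + 0.02 * rng.standard_normal(72)
lo, hi = bootstrap_interval(synthetic_diff)
print(verdict(lo, hi))
```

The synthetic differences are invented purely to exercise the code; they are not results from the experiments.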
3.4. Experimental design
As in Zhu et al. (2013) and Pan et al. (2014), the test period of 8-16 May 2010, which contained active episodes of convection, is used. All DA experiments start at 0000 UTC 8 May and end at 2100 UTC 16 May with continuous three-hourly cycles. The initial fields and boundary conditions are interpolated from operational GFS analyses and forecasts. Random perturbations are created by the random CV3 option in the WRF DA system (Barker, 2005; Barker et al., 2012). They are added to the GFS analysis at 0000 UTC 8 May 2010 to start the ensemble forecasts for the EnKF, and to the GFS forecasts to create perturbed ensemble boundary conditions (Torn et al., 2006).
Pan et al. (2014) suggested that the BEC weighting factor, one of the important tuning parameters in a hybrid algorithm, has a great impact on the performance of the hybrid DA system. To examine the performance of the DR hybrid system and its sensitivity to the weighting factor, three DR hybrid experiments (HyDR05, HyDR09 and HyDR10) are run with 50%, 90% and 100% weights given to the ensemble BEC, respectively. Because HyDR09 produces the best analyses and forecasts among all DR hybrid experiments, it is also called the DR hybrid control experiment, named HyDR_Ctl (Table 1). The well-tuned 3DEnVar experiment (Hybrid1W_Ctl) at the 40-km grid from Pan et al. (2014) is adopted as a benchmark, and the same well-tuned EnKF experiment with 40 members from Pan et al. (2014) is used in this study to provide the LR ensemble perturbations for the DR hybrid experiments. Hybrid1W_Ctl employed equal weights for the static and flow-dependent covariances, a horizontal covariance localization scale of 300 km, and a vertical covariance localization scale of 0.3 in terms of the natural logarithm of pressure. Experiment HyLR_HRF (Table 1) involves forecasts initialized every cycle from fields interpolated from the analyses of Hybrid1W_Ctl.
A HR GSI 3DVar DA experiment, named VarHR_HRF, is also run at the ~13-km resolution (Table 1). In other words, HR forecasts from the LR hybrid control experiment and from the HR 3DVar analyses are used as references for evaluating forecasts from the DR DA experiments. The analyses of DR DA may benefit from the HR background forecasts, which contain more detailed flow structures. The configuration of HyDR05 is the same as that of HyLR_HRF, except that the deterministic background forecasts of HyDR05 are performed on the ~13-km grid instead of the 40-km grid. In HyLR_HRF, forecasts are run at ~13-km grid spacing from interpolated 40-km analyses of Hybrid1W_Ctl. The comparison between HyDR05 and HyLR_HRF therefore isolates the impact of the increased background forecast resolution. For variational minimization in either 3DVar or 3DEnVar, two outer-loop iterations and 100 inner-loop iterations are used. Evaluations are mainly based on forecasts on the ~13-km grid, launched from either the HR analyses or fields interpolated from LR 40-km analyses. All experiments are listed in Table 1.
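The experiments described in this section can be summarized as a small configuration structure. The sketch below is reconstructed only from the prose (it is not a copy of Table 1), the class and field names are hypothetical, 13.0 km is a nominal stand-in for "~13 km", and an ensemble weight of 0.0 is assumed to represent pure 3DVar.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Experiment:
    name: str
    method: str             # "3DEnVar" (hybrid) or "3DVar"
    analysis_dx_km: float   # grid spacing of the analysis and its background
    forecast_dx_km: float   # grid spacing of the verified forecasts
    ensemble_weight: float  # fraction of the BEC taken from the ensemble


LR_DX_KM = 40.0
HR_DX_KM = 13.0  # nominal value assumed for "~13 km"

EXPERIMENTS = {
    e.name: e
    for e in [
        Experiment("HyLR_HRF", "3DEnVar", LR_DX_KM, HR_DX_KM, 0.5),
        Experiment("HyDR05", "3DEnVar", HR_DX_KM, HR_DX_KM, 0.5),
        Experiment("HyDR09", "3DEnVar", HR_DX_KM, HR_DX_KM, 0.9),
        Experiment("HyDR10", "3DEnVar", HR_DX_KM, HR_DX_KM, 1.0),
        Experiment("VarHR_HRF", "3DVar", HR_DX_KM, HR_DX_KM, 0.0),
    ]
}
EXPERIMENTS["HyDR_Ctl"] = EXPERIMENTS["HyDR09"]  # alias used in the text

# Settings stated in the text; the ensemble settings apply to the hybrid runs.
SHARED = {
    "ensemble_dx_km": 40.0,
    "ensemble_members": 40,
    "outer_loops": 2,
    "inner_iterations": 100,
    "cycle_interval_h": 3,
}


def differing_fields(a: Experiment, b: Experiment) -> list:
    """Names of the configuration fields (other than the name) that differ."""
    da, db = asdict(a), asdict(b)
    return sorted(k for k in da if k != "name" and da[k] != db[k])


print(differing_fields(EXPERIMENTS["HyDR05"], EXPERIMENTS["HyLR_HRF"]))
print(differing_fields(EXPERIMENTS["HyDR05"], EXPERIMENTS["HyDR09"]))
```

Comparing pairs this way makes the experimental logic explicit: HyDR05 and HyLR_HRF differ only in the analysis/background grid spacing, while HyDR05 and HyDR09 differ only in the ensemble covariance weight.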
4.1. Sensitivity to the covariance weighting factor in DR hybrid experiments
In Pan et al. (2014), the lowest RMSEs were obtained when using 50% ensemble BEC in their 40-km control hybrid experiment, Hybrid1W_Ctl. At ~13-km grid spacing, smaller-scale features can be captured, which tend to be more transient and hence more flow-dependent. The analysis may therefore benefit from a higher weight for the ensemble covariance. Experiments HyDR05, HyDR09 (also named HyDR_Ctl) and HyDR10 are compared to examine the impact of flow-dependent covariance in the DR hybrid system.
Figure 3. Aggregated 3-h forecast RMSEs along with confidence error bars at different height levels verified against sounding data for (a) RH, (b) T, (c) U, and (d) V for experiments HyDR05, HyDR09, and HyDR10. The error bars represent the two-tailed 90% confidence interval (5% on the left and 95% on the right) using the bootstrap distribution method.
The aggregated 3-h forecast RMSEs verified against sounding data are shown in Fig. 3. The RMSEs at each pressure level were obtained by averaging values within a layer 50 hPa above and below that pressure from all cycles, except for the topmost and lowest levels. The 3-h forecasts are also used as the background in each DA cycle, so their errors can be used as a proxy for the quality of the DA. The results show that HyDR09 has the smallest RMSEs for RH, U and V at almost all levels. For T, the RMSEs from HyDR09 are higher than those from HyDR05 above 800 hPa. The performance of HyDR10 is comparable to or worse than that of HyDR05 for RH below 600 hPa, and for T, U and V at all levels. These results indicate that, with the DR hybrid 3DEnVar system, when the grid spacing of the hybrid analysis and the background forecast is decreased from the 40 km used in Pan et al. (2014) to ~13 km, optimum results are obtained when the weight for the ensemble BECs is 90% (among the weights examined), instead of the 50% found for the single-resolution LR case. This may be because of the increased level of flow dependency of the background errors at HR, as suggested earlier. Raising the weighting factor for the flow-dependent covariances means that more mesoscale information can be involved in the DA. However, the sensitivity of the T forecast skill to the weighting factor is opposite to that of RH, U and V at the middle to upper levels.
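The role of the weighting factor examined in this section can be illustrated with a toy calculation in which the hybrid covariance is formed explicitly as a weighted sum of a static and an ensemble covariance. This is a conceptual sketch only: operational variational systems do not build these matrices explicitly, covariance localization is omitted, and the function names and the five-variable state are assumptions made for illustration.

```python
import numpy as np


def ensemble_covariance(members):
    """Sample covariance from an (n_members, n_state) array of forecasts."""
    members = np.asarray(members, dtype=float)
    perts = members - members.mean(axis=0)  # ensemble perturbations
    return perts.T @ perts / (members.shape[0] - 1)


def hybrid_covariance(b_static, b_ens, ens_weight):
    """Weighted sum: ens_weight = 0.5, 0.9, 1.0 mimic HyDR05, HyDR09, HyDR10."""
    return (1.0 - ens_weight) * b_static + ens_weight * b_ens


def single_obs_increment(b, obs_index, innovation, obs_var):
    """Analysis increment from one observation of state element obs_index."""
    return b[:, obs_index] * innovation / (b[obs_index, obs_index] + obs_var)


# Toy setup: 5 state variables, 40 members (the ensemble size used in the text).
rng = np.random.default_rng(0)
b_static = np.eye(5)
b_ens = ensemble_covariance(rng.standard_normal((40, 5)))

for w in (0.5, 0.9, 1.0):
    b_hyb = hybrid_covariance(b_static, b_ens, w)
    dx = single_obs_increment(b_hyb, obs_index=2, innovation=1.0, obs_var=1.0)
    print(w, np.round(dx, 3))
```

With a diagonal static covariance, the single-observation increment is confined to the observed variable; as the ensemble weight increases, the increment spreads to other variables according to the ensemble-estimated correlations, which is the flow-dependent information the hybrid scheme is designed to exploit.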
4.2. Comparison of DR hybrid DA with HR 3DVAR
In this section, we compare the performance of experiment HyDR_Ctl (i.e., HyDR09), which uses hybrid 3DEnVar, with that of VarHR_HRF, which uses the pure 3DVar DA method run at HR (see Table 1). The 9-day aggregated RMSEs of the 3-h forecasts verified against sounding data at all levels are shown in Fig. 4. VarHR_HRF underperforms HyDR_Ctl, with significantly larger errors for RH, U and V at most levels, while the errors for T are comparable. The overall domain- and level-aggregated RMSEs verified against sounding and surface data are shown in Figs. 5 and 6, respectively, for analyses (hour 0) and forecasts at 3-h intervals up to 18 hours. HyDR_Ctl significantly outperforms VarHR_HRF for all variables at the analysis time and throughout the entire forecast period. The RMSEs of all variables are noticeably lower in the analyses than in the forecasts, and forecast errors increase quickly in the first three hours before becoming more stable thereafter; such rapid error growth is likely associated with fast small-scale error growth. Overall, the DR coupled EnKF-3DEnVar hybrid scheme significantly outperforms the 3DVar scheme for all variables at all forecast hours when verified against soundings and surface observations. The results suggest the efficacy of using a DR configuration for a hybrid DA system.
Figure 4. As in Fig. 3 but for experiments HyLR_HRF, HyDR_Ctl, VarHR_HRF and HyDR05. The error bars indicate the two-tailed 90% confidence interval using the bootstrap method with 5% on the left and 95% on the right.
4.3. Impact of HR background forecast
The impacts of the HR background forecast are investigated by comparing HyLR_HRF with HyDR05, which differ only in the resolution of the background forecasts. HyDR_Ctl is also included in this section to assess the impacts of HR background forecasts and flow-dependent covariance. The 3-h forecast RMSEs verified against sounding data (Fig. 4) show that HyDR05 underperforms HyLR_HRF for RH and performs comparably for T, U and V at most levels. When using a higher weighting factor of 90% for flow-dependent covariances in HyDR_Ctl, the RMSEs are smaller than those from HyDR05 for RH at all levels, except for T at 1000-800 hPa, and U and V at 500-300 hPa. These results suggest that HyDR_Ctl benefits from the HR background combined with 90% flow-dependent covariance. The comparisons among HyLR_HRF, HyDR_Ctl and HyDR05 of the domain-aggregated RMSEs for the analyses and forecasts up to 18 hours against sounding data are shown in Fig. 5. The RMSEs of HyDR05 are comparable to or slightly worse than those of HyLR_HRF, while the RMSEs of HyDR_Ctl are significantly smaller than those of HyLR_HRF for all variables except T. At the lower resolution of 40 km, the best analyses and forecasts were obtained in Pan et al. (2014) when equal weights were given to the static and flow-dependent covariances in the hybrid DA. As the grid resolution increases, the smooth static covariance becomes less appropriate, which may explain why the higher ensemble covariance weight of 90% used in HyDR_Ctl is beneficial.
Figure 5. The bar chart in each frame shows the RMSEs of forecasts verified against sounding data, aggregated over the entire domain and over the nine-day period. The lower panel shows the 90% confidence interval of the RMSE differences between HyDR_Ctl and VarHR_HRF or HyDR05 and HyLR_HRF for (a) RH, (b) T, (c) U, and (d) V, for different forecast hours. If the interval does not include zero, the difference is statistically significant at the 90% confidence level. The error bars in the histograms represent the two-tailed 90% confidence interval, with 5% at the bottom and 95% at the top, using the bootstrap distribution method.
Figure 6. The bar chart in the upper panel of each frame shows the RMSEs of forecasts verified against surface station observations, aggregated over the entire domain and over the nine-day period, for (a) surface pressure, (b) 2-m RH, (c) 2-m T, (d) 10-m U, and (e) 10-m V for different forecast hours. Confidence error bars represent the two-tailed 90% confidence interval (5% at the bottom and 95% at the top) using the bootstrap distribution method. The lower panel of each frame shows the 90% confidence interval of the RMSE differences between HyDR_Ctl, VarHR_HRF or HyDR05 and HyLR_HRF.
The impacts of the HR background forecasts are further examined by verifying analyses and forecasts up to 18 hours against surface data (Fig. 6). HyDR_Ctl has significantly smaller RMSEs than HyLR_HRF for 2-m RH and 10-m U and V from the analysis time and throughout the entire forecast period, and for surface pressure except at a few forecast hours. Large differences are found in the RH errors between the HR 3DVar/DR hybrid and the LR 3DEnVar hybrid (Fig. 6b), suggesting that the DA of the surface moisture field can benefit significantly from the increased background resolution, given the better resolution of terrain and mesoscale boundary layer structures. For 2-m T, the smaller errors at the analysis time in HyDR_Ctl and VarHR_HRF than in HyLR_HRF indicate a better fit of the analyses to surface T observations. However, after three hours of forecasting, the T forecast errors in HyDR_Ctl and VarHR_HRF become larger than those in HyLR_HRF. The results seem to suggest that the humidity and wind fields benefit more from the higher background resolution with increasing flow-dependent covariance, while this is not necessarily the case for the temperature forecasts, at least when verified against conventional data in terms of RMSE. Experiments with various combinations of resolutions used in the analysis and forecasting steps shed some light on such complex behaviors (not shown), but are not enough to fully answer the questions. The results also imply a need for multi-scale DA algorithms that explicitly treat observations and background errors representing different scales (Li et al., 2015) and use scale-dependent (Buehner and Shlyaeva, 2015) and/or multi-scale covariance localization (Miyoshi and Kondo, 2013).
4.4. Precipitation forecast skill
In this section, the precipitation forecasts from HyLR_HRF, HyDR_Ctl, VarHR_HRF and HyDR05 are verified against the 4-km NCEP Stage IV precipitation data. The GSS (Gandin and Murphy, 1992), also known as the equitable threat score, is calculated, as in Pan et al. (2014), for the 0.1, 1.25 and 2.5 mm h⁻¹ thresholds. The GSSs are shown in Fig. 7. HyDR_Ctl outperforms VarHR_HRF for all thresholds and all forecast hours, which suggests that the analysis method is important for precipitation forecasting skill. The results are consistent with those of Schwartz (2016), who examined DR hybrid DA with a 20/4-km grid combination. With the HR forecasts, HyDR05 has better skill than HyLR_HRF after five hours at the threshold of 0.1 mm h⁻¹, and at one to eight hours at the threshold of 2.5 mm h⁻¹. With more flow-dependent covariance being used, HyDR_Ctl shows the best skill among all experiments after 10 hours at the 0.1 mm h⁻¹ threshold, and generally at all hours at the 1.25 and 2.5 mm h⁻¹ thresholds. The results indicate that, for precipitation, especially heavier precipitation, there is a clear benefit to running the hybrid DA at HR (relative to the LR hybrid), and to using ensemble-derived flow-dependent covariance (relative to 3DVar). The improved precipitation forecasts are consistent with reduced errors in the analyses and forecasts of humidity.
Figure 7. Aggregated precipitation GSSs of 13-km forecasts as a function of forecast length for thresholds of (a) 0.1 mm h⁻¹, (b) 1.25 mm h⁻¹ and (c) 2.5 mm h⁻¹.
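For reference, the GSS used in this section can be computed from a 2 × 2 contingency table of forecast and observed threshold exceedances. The sketch below uses the standard definition of the score; the function name, the use of "greater than or equal to" for the exceedance test, and the assumption that the forecast and the observations are already on a common grid are choices made for illustration rather than details taken from the paper.

```python
import numpy as np


def gilbert_skill_score(forecast, observed, threshold):
    """GSS (equitable threat score) for one precipitation threshold.

    forecast, observed: arrays of hourly precipitation on the same grid.
    threshold: exceedance threshold in the same units (e.g. mm per hour).
    """
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    total = f.size
    # Hits expected by chance given the forecast and observed frequencies.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denom = hits + misses + false_alarms - hits_random
    if denom == 0:
        return float("nan")  # event never forecast or observed
    return float((hits - hits_random) / denom)


# Tiny synthetic fields, evaluated at the three thresholds used in the text.
obs = np.array([0.0, 0.5, 2.0, 3.0])
fcst = np.array([0.0, 1.5, 2.0, 0.0])
for thr in (0.1, 1.25, 2.5):
    print(thr, gilbert_skill_score(fcst, obs, thr))
```

A perfect forecast gives a GSS of 1, and a forecast with no more hits than expected by chance gives a value of zero or below. The synthetic fields above are invented only to exercise the function.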