2.1. VarQC with non-Gaussian observational error distribution mode
VarDA assumes that the observation and background errors are independent and Gaussian-distributed, and that the probability of outliers (gross errors) is zero, implying that all outliers can be removed with the CQC strategy. However, not all outliers result from gross errors; outliers that are not gross errors contain useful information and thus should not be removed (Hampel, 2001). Moreover, because CQC is imperfect and its rejection threshold is ambiguous, outliers are likely to remain among the assimilated observations. As a result, outliers that pass CQC produce innovation distributions with long, non-Gaussian tails; that is, the estimated innovation frequencies decay more slowly with increasing innovation magnitude than a Gaussian distribution predicts (Fig. 1).

Figure 1. Statistics of innovations (normalized by the observational error) for (a) aircraft-reported temperature, (b) radiosonde pressure, (c) radiosonde horizontal wind, and (d) surface humidity. The red, blue, and black lines are, respectively, the Gaussian, Gaussian plus flat, and Huber norm distributions fitted to the histograms of normalized innovations. The panel titles also indicate the fitted left and right transition points of the Huber distribution and the observation sample size (S). The observations are obtained from the domain in Fig. 2 from 1 July to 30 September 2013.
In this study, we consider only the non-Gaussianity of observation errors caused by outliers. It is well known that improving the consistency between the assumed observational error distribution mode (OEDM) and the actual one yields a more accurate posterior solution in DA (Fowler and Van Leeuwen, 2013; Legrand et al., 2016).
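The long-tail diagnosis of Fig. 1 amounts to comparing how often large normalized innovations occur with how often a Gaussian distribution predicts they should. The Python sketch below illustrates that comparison. It assumes the innovations are already normalized by the observation error; the function names and sample values are hypothetical, and this is not the fitting procedure used to produce Fig. 1.

```python
import math
from typing import Sequence


def gaussian_two_sided_tail(k: float) -> float:
    """P(|x| > k) for a standard normal x, i.e. 2 * Phi(-k) = erfc(k / sqrt(2))."""
    return math.erfc(k / math.sqrt(2.0))


def empirical_two_sided_tail(normalized: Sequence[float], k: float) -> float:
    """Fraction of normalized innovations whose magnitude exceeds k."""
    if not normalized:
        raise ValueError("need at least one innovation")
    return sum(1 for v in normalized if abs(v) > k) / len(normalized)


def tail_excess_ratio(normalized: Sequence[float], k: float) -> float:
    """Observed tail fraction divided by the Gaussian prediction.

    Ratios well above 1 mean the innovation distribution decays more slowly
    than a Gaussian beyond k, i.e. it has a long (heavy) tail.
    """
    return empirical_two_sided_tail(normalized, k) / gaussian_two_sided_tail(k)


# Hypothetical normalized innovations (observation minus background, divided by sigma_o).
sample = [0.1, -0.4, 0.8, -1.2, 0.3, 2.1, -3.5, 0.0, 1.5, -0.7]
print(f"ratio beyond 3 sigma: {tail_excess_ratio(sample, 3.0):.1f}")
```

Evaluating the ratio at several values of k traces how quickly the observed tail departs from the Gaussian decay.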
To obtain a better analysis, the contaminated Gaussian distribution (CGD; Tukey, 1960) was proposed to reduce the impacts of outliers on the posterior analysis while absorbing their useful information. Compared with the Gaussian distribution, the CGD is a better OEDM. In general, the CGD is written as a weighted sum of a Gaussian distribution (the “main” distribution) and some other distribution (the “perturbation” distribution; e.g., a Gaussian distribution with a different mean and variance). The CGD is expressed mathematically as:
where
One possible choice of the perturbation distribution in Eq. (1) is the flat (box-car) distribution (Lahoz and Schneider, 2014). When used as an OEDM, the Gaussian plus flat distribution is consistent with long-tailed observations, so these observations can be used effectively; otherwise, they would have been removed by the gross check of CQC. Following Eq. (1), the Gaussian plus flat distribution
where F is the flat perturbation distribution (Andersson and Järvinen, 1999).
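For reference, the standard Gaussian plus flat formulation of Andersson and Järvinen (1999) can be written in terms of the normalized departure x (innovation divided by the observation error), a prior gross-error probability A, and a flat distribution of half-width d in units of the observation error. The Python sketch below evaluates that density and the resulting VarQC cost function. The symbols A, d, and γ follow that standard formulation and are assumptions here, not necessarily the notation of this paper's equations.

```python
import math


def flat_gamma(a: float, d: float) -> float:
    """gamma = A * sqrt(2*pi) / ((1 - A) * 2d).

    a: prior probability of gross error (0 < a < 1)
    d: half-width of the flat distribution, in units of the observation error
    """
    return a * math.sqrt(2.0 * math.pi) / ((1.0 - a) * 2.0 * d)


def gaussian_plus_flat_pdf(x: float, a: float, d: float) -> float:
    """(1 - A) * N(0, 1) + A * Uniform(-d, d), evaluated at normalized departure x."""
    gauss = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    flat = 1.0 / (2.0 * d) if abs(x) <= d else 0.0
    return (1.0 - a) * gauss + a * flat


def flat_varqc_cost(x: float, a: float, d: float) -> float:
    """J_QC(x) = -ln[(gamma + exp(-x^2/2)) / (gamma + 1)].

    Equal to -ln of the density for |x| <= d, shifted so that J_QC(0) = 0.
    It behaves like the Gaussian cost x^2/2 near zero but is bounded above by
    ln((gamma + 1) / gamma), so a single large departure cannot dominate.
    """
    g = flat_gamma(a, d)
    return -math.log((g + math.exp(-0.5 * x * x)) / (g + 1.0))
```

The bounded cost is what lets a Gaussian plus flat OEDM keep long-tail observations without letting them dominate the minimization; the price is a non-convex cost function.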
Using Eq. (2), we can derive an observational cost function (
where
Besides the flat distribution, the Laplace distribution can serve as the perturbation distribution; it is frequently used in this role because of its importance in robust statistics (Huber, 2011). The Gaussian plus Laplace distribution, a special CGD referred to here as the Huber distribution (Huber, 1972; Tavolato and Isaksen, 2015), is defined as
where c is the transition value and
In other fields, methods based on the Huber distribution have yielded more accurate and robust solutions than methods assuming Gaussian-distributed errors (Guitton and Symes, 2003; Huber, 2011). This robustness suggests that the Huber distribution might alleviate the negative impacts of outliers on the optimization analyses. Therefore, VarQC using the Huber distribution (hereafter “Huber-VarQC”) is potentially more promising than Flat-VarQC.
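The Huber norm keeps the quadratic Gaussian cost near the center and switches to a linear cost beyond the transition points. Because the fitted left and right transition points in Fig. 1 differ, the Python sketch below allows asymmetric transition points c_left < 0 < c_right for a normalized departure x. It is a generic form of the Huber norm under these assumptions, not a transcription of this paper's equations, and the function names are hypothetical.

```python
def huber_cost(x: float, c_left: float, c_right: float) -> float:
    """Huber-norm cost for normalized departure x.

    Quadratic (x^2 / 2) between the transition points c_left < 0 < c_right and
    linear outside them; value and slope are continuous at both points.
    """
    if not c_left < 0.0 < c_right:
        raise ValueError("expected c_left < 0 < c_right")
    if c_left <= x <= c_right:
        return 0.5 * x * x
    c = c_right if x > c_right else c_left
    return c * x - 0.5 * c * c


def huber_gradient(x: float, c_left: float, c_right: float) -> float:
    """dJ/dx: equal to x inside the transition points, clipped to them outside."""
    return min(max(x, c_left), c_right)
```

With c = ±1.5, for example, a departure of 3 costs 3.375 instead of the Gaussian 4.5, and the gradient, which sets how strongly the observation pulls the analysis, never exceeds the transition value. Unlike the Gaussian plus flat cost, the Huber cost remains convex.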
2.2. Non-Gaussian observational errors in the GRAPES m3DVAR
To illustrate the non-Gaussian error characteristics of commonly assimilated observations, we compared several observation types against the GRAPES background to obtain innovation statistics. The observations are obtained from the Global Telecommunication System (GTS) and span July to September 2013. The observation types are surface observations (SYNOP), radiosonde observations (TEMP), automated aircraft reports (AIREP), ship observations (SHIP), and satellite winds (SATOB). Before examining the innovation statistics, note that in this study we assume the non-Gaussianity of the innovations is due only to non-Gaussian observation errors. For instance, if the innovations follow a Gaussian plus flat CGD, then the observation errors follow a similar CGD. Accordingly, the transition points of the Huber distribution of the observation errors can be determined from the Huber distribution fitted to the innovations.

Figure 1 shows the result of fitting several distributions (Gaussian, Gaussian plus flat, and Huber norm) to the estimated innovation distributions of several variables from different observation types. Near the center of the innovation histograms, the innovations of AIREP temperature (Fig. 1a) and TEMP pressure and wind (Figs. 1b and 1c) are consistent with all three distributions. Toward the tails, however, the Gaussian plus flat and Huber distributions fit the histograms better than the Gaussian distribution, and the Huber distribution fits the long tails best. Similar long tails and fitting characteristics are found for the innovations of the other observed variables (temperature, pressure, and wind) of the various observation types (not shown), except for specific humidity.
In other words, the CGD better fits the innovation statistics of pressure, temperature, and wind observations.
However, none of the three profiles reasonably fits the distribution of specific humidity innovations (Fig. 1d), an issue also described by Pires et al. (2010) and Tavolato and Isaksen (2015). Although the Huber distribution fits the left tail reasonably, its left transition point of 0.85 implies an unreasonably high contamination rate (~20%); a reasonable contamination rate for conventional observations would be below 10% (Hampel, 1977). A detailed discussion of the VarQC parameters can be found in our companion paper.
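The text does not state how the ~20% figure follows from the transition point. One standard way to relate the two, assumed here, is Huber's minimax relation for an ε-contaminated normal distribution, 2φ(c)/c − 2Φ(−c) = ε/(1 − ε), where φ and Φ are the standard normal density and distribution function. The Python sketch below evaluates this relation in both directions; the function names are hypothetical. Applied to |c| = 0.85 it gives a contamination rate of roughly 20%, consistent with the value quoted above, while a rate of 10% corresponds to |c| ≈ 1.14.

```python
import math


def std_normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def _odds(c: float) -> float:
    """Left side of 2*phi(c)/c - 2*Phi(-c) = eps / (1 - eps); decreasing in c > 0."""
    return 2.0 * std_normal_pdf(c) / c - 2.0 * std_normal_cdf(-c)


def contamination_from_transition(c: float) -> float:
    """Contamination rate eps implied by a transition point c (its magnitude is used)."""
    if c == 0.0:
        raise ValueError("transition point must be non-zero")
    r = _odds(abs(c))
    return r / (1.0 + r)


def transition_from_contamination(eps: float, tol: float = 1e-10) -> float:
    """Transition-point magnitude implied by a contamination rate 0 < eps < 1 (bisection)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    target = eps / (1.0 - eps)
    lo, hi = 1e-6, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if _odds(mid) > target:
            lo = mid  # implied odds too large: the transition point must move outward
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under this assumption, keeping the contamination rate below 10% requires transition points whose magnitude exceeds about 1.14.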
3.1. Model configurations
The GRAPES model version 3.0 (Chen et al., 2008; Zhang and Shen, 2008; Ma et al., 2009; He et al., 2019a), the new-generation operational numerical forecast system of the China Meteorological Administration, and its three-dimensional variational assimilation system have been applied to modeling many weather phenomena, including extreme weather events, typhoons, sandstorms, and floods (Xu et al., 2012; An et al., 2016; Wang et al., 2016). We have implemented two VarQC methods (Flat-VarQC and Huber-VarQC) within the GRAPES m3DVAR system. Here we describe the model configurations; the posterior analyses of the mass fields from these VarQC methods are examined in section 4.

Figure 2 shows the domain used in the simulation experiments. The domain is defined by a 351 × 251 horizontal grid with a meridional and zonal spacing of 20 km, and by 31 vertical levels with a model-top pressure of 10 hPa. Operational forecasts from the Global Forecast System (GFS) provide the initial and lateral boundary conditions (ICs and LBCs) for running the GRAPES model.
Figure 2. Simulation domain (10°–60°N, 70°–140°E) used in running the GRAPES model for the real-data experiments of the VarQC methods. The red shading shows the verification domain (18°–40°N, 100°–125°E).
We selected a region over east China (shaded red in Fig. 2) to validate the VarQC methods established in this paper. West China was avoided because its high terrain, particularly the Tibetan Plateau, induces complex thermodynamic and dynamical effects (Wang and Zeng, 2012; Bao and Zhang, 2013; He et al., 2019b) that make accurate simulations difficult to obtain.
3.2. Idealized experiments
Both idealized and real-data experiments are performed. Tables 1 and 2 describe the configurations of the idealized experiments, which examine how robustly the different VarQC methods handle outliers. The control experiment, CTRL1, assimilates the actual pressure observations from 12 sounding sites, referred to as “normal pressures”. Some of the normal pressure observations at the 500-hPa and 850-hPa levels are then replaced with outlier values (Table 1); we refer to the resulting observations as “outlier pressures”. Any observed pressure that differs from the pressure of its specified level is treated as an outlier; for example, an observation at the 500-hPa level whose pressure is not 500 hPa is an outlier. Each outlier is created by adding to or subtracting from the level pressure a random draw from a uniform distribution on [1, 2] hPa. The pressure observation errors taken from the observation reports at 500 hPa (0.7766), 850 hPa (0.7975), and other levels (not shown) are identical in all idealized experiments. We examine the impact of assimilating these outlier pressures with and without VarQC using the experiments listed in Table 2 and discussed in section 4.1. The CTRL1 and CTRL1-Outlier experiments do not use any VarQC algorithm, whereas the Flat-Outlier and Huber-Outlier experiments use the Flat-VarQC and Huber-VarQC methods, respectively.

Sounding site number | Pressure at the 500-hPa level (observation error: 0.7766) | Pressure at the 850-hPa level (observation error: 0.7975) |
1 | 501.085 | 851.633 |
2 | 500.000 | 848.206 |
3 | 498.441 | 851.506 |
4 | 500.000 | 851.002 |
5 | 501.408 | 848.148 |
6 | 500.000 | 851.356 |
7 | 500.000 | 851.612 |
8 | 500.000 | 851.001 |
9 | 500.000 | 851.165 |
10 | 500.000 | 851.679 |
11 | 500.000 | 850.000 |
12 | 500.000 | 850.000 |
Table 1. Observed and constructed outlier pressures (units: hPa) at the sounding sites. The outliers are the values that differ from the level pressure (500 or 850 hPa).
Experiment name | VarQC schemes | Observations |
CTRL1 | Without VarQC | Normal pressures |
CTRL1-Outlier | Without VarQC | Outlier pressures |
Flat-Outlier | With Flat-VarQC | Outlier pressures |
Huber-Outlier | With Huber-VarQC | Outlier pressures |
Table 2. Summary of the idealized experiments with different variational quality control schemes.
Of the 12 assimilated sounding sites, 10 are scattered over East China near the middle and lower reaches of the Yangtze River (Fig. 3a); the remaining two are on the Korean Peninsula. Because observations that are sufficiently close to one another can be used for “buddy checks” (Auligné, 2014) during VarQC, the observations of sites 11 and 12, which are far from the other 10 sites, were not turned into outliers (Table 1). The pressure values of sites 1, 3, and 5 at the 500-hPa level and of sites 1 to 10 at the 850-hPa level were constructed as outliers (Table 1). Note that these outliers would not be rejected by CQC; in other words, they are assimilated in every experiment except CTRL1. Apart from CTRL1, which assimilates the normal pressures from the 12 sounding sites without VarQC, the other three experiments assimilate the same outlier pressures with different quality control methods.
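The outlier construction can be sketched in Python as follows. The Table 1 values are copied from the text; the helper names, the random seed, and the 1e-6 hPa tolerance used to decide whether a pressure equals its level are illustrative assumptions.

```python
import random

# Table 1 pressures (hPa) for sites 1-12.
TABLE1_500HPA = [501.085, 500.000, 498.441, 500.000, 501.408, 500.000,
                 500.000, 500.000, 500.000, 500.000, 500.000, 500.000]
TABLE1_850HPA = [851.633, 848.206, 851.506, 851.002, 848.148, 851.356,
                 851.612, 851.001, 851.165, 851.679, 850.000, 850.000]


def make_outlier(level_hpa: float, rng: random.Random) -> float:
    """Shift the level pressure by a U[1, 2] hPa draw with a random sign."""
    magnitude = rng.uniform(1.0, 2.0)
    sign = rng.choice((-1.0, 1.0))
    return level_hpa + sign * magnitude


def is_outlier(observed_hpa: float, level_hpa: float, tol: float = 1e-6) -> bool:
    """Any pressure that differs from its level's pressure counts as an outlier."""
    return abs(observed_hpa - level_hpa) > tol


def outlier_sites(pressures, level_hpa):
    """1-based site numbers whose pressure is an outlier at the given level."""
    return [i + 1 for i, p in enumerate(pressures) if is_outlier(p, level_hpa)]


rng = random.Random(2015)  # arbitrary seed, for reproducibility only
print(make_outlier(500.0, rng))
print(outlier_sites(TABLE1_500HPA, 500.0))  # sites 1, 3, 5
print(outlier_sites(TABLE1_850HPA, 850.0))  # sites 1-10
```

Every outlier in Table 1 lies between 1 and 2 hPa from its level, as the construction requires.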
Figure 3. (a) Positions of the sounding sites used in the robustness experiments. (b) Vertical profiles of the RMSE of geopotential height for CTRL1 (red line), CTRL1-Outlier (green line), Flat-Outlier (blue line), and Huber-Outlier (black line). (c) Vertical distribution of observation weights at the sounding sites in the Huber-Outlier experiment. The top (bottom) of the x-axis shows the total number of assimilated pressures (the site number).
3.3. Real-data experiments
The configurations of the three real-data VarQC assimilation experiments are listed in Table 3. These experiments span all of August 2015 and use the transition points fitted to the 2013 training observations. Unlike the idealized experiments, which assimilated only pressure observations, the real-data experiments assimilate GTS observations of the TEMP, SYNOP, AIREP, SHIP, and SATOB types using a 6-hour assimilation window. All three experiments are cold-started, with analyses performed each day at 0000, 0600, 1200, and 1800 UTC. The experiments retain the original background quality check (BgQC) threshold, so that the impacts of the long-tailed observations that pass it (identified in Fig. 1) can be evaluated.

Experiment name | Quality control | Observations |
CTRL2 | Without VarQC | GTS observations |
Flat-VarQC | With Flat-VarQC | GTS observations |
Huber-VarQC | With Huber-VarQC | GTS observations |
Table 3. Summary of simulation experiments for different variational quality control schemes.
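The cycling schedule of the real-data experiments can be written down explicitly. The Python sketch below enumerates the analysis times for August 2015 and attaches a 6-hour window to each; centring the window on the analysis time is our assumption, since the text gives only the window length, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def analysis_times(start: datetime, end_exclusive: datetime,
                   step_hours: int = 6) -> Iterator[datetime]:
    """Analysis times every step_hours; 0000/0600/1200/1800 UTC when start is 0000 UTC."""
    t = start
    while t < end_exclusive:
        yield t
        t += timedelta(hours=step_hours)


def assimilation_window(t: datetime, length_hours: int = 6) -> Tuple[datetime, datetime]:
    """Assumed window centred on t: (t - 3 h, t + 3 h) for a 6-hour window."""
    half = timedelta(hours=length_hours / 2)
    return t - half, t + half


cycles = list(analysis_times(datetime(2015, 8, 1), datetime(2015, 9, 1)))
print(len(cycles), cycles[0], cycles[-1])
```

For the 31 days of August this yields 124 cold-started analyses per experiment.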
In this study, Flat-VarQC is activated from the first iteration of the 3DVAR cost-function minimization in every assimilation cycle. This differs from earlier work, in which the VarQC modification of the 3DVAR cost function was introduced only after a specified number of minimization iterations (Andersson and Järvinen, 1999) to prevent convergence problems arising from the non-convexity of the VarQC cost function. We could activate Flat-VarQC at the first iteration because we did not encounter convergence problems in most cases; the only exception appears in Fig. 8b. Convergence problems were largely avoided because the innovations at the first iteration were relatively small, so the starting point of the minimization of the Flat-VarQC-modified cost function should lie within or near the convex region containing its global minimum. Future work could investigate whether Flat-VarQC should be activated at a later iteration.
Note that in the real-data experiments (Table 3), the specific humidity innovations cannot be fitted effectively by either the Gaussian plus flat or the Huber norm OEDM because of their unknown non-Gaussian characteristics (Pires et al., 2010). To reduce the risk of the Huber-VarQC experiment producing worse analyses than CTRL2 while keeping VarQC active for specific humidity, we chose the OEDM closer to the traditionally prescribed Gaussian observation error distribution, namely the Gaussian plus flat OEDM. Thus, in the Huber-VarQC experiment, specific humidity observations are assimilated with the Gaussian plus flat OEDM and all other observations with the Huber norm OEDM; in other words, the Huber-VarQC experiment uses a hybrid of the two OEDMs.
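The per-variable choice of OEDM in the three real-data experiments can be summarized as a small lookup. In the Python sketch below, the experiment names follow Table 3, while the variable labels (e.g., "specific_humidity"), the enum, and the function name are hypothetical.

```python
from enum import Enum
from typing import Optional


class Oedm(Enum):
    GAUSSIAN_PLUS_FLAT = "Gaussian plus flat"
    HUBER = "Huber norm"


def oedm_for(experiment: str, variable: str) -> Optional[Oedm]:
    """Observation error distribution mode used for one variable.

    Returns None when no VarQC is applied (plain Gaussian treatment).
    """
    if experiment == "CTRL2":
        return None
    if experiment == "Flat-VarQC":
        return Oedm.GAUSSIAN_PLUS_FLAT
    if experiment == "Huber-VarQC":
        # Hybrid: specific humidity keeps the Gaussian plus flat OEDM because its
        # innovations are not well fitted by either non-Gaussian profile.
        if variable == "specific_humidity":
            return Oedm.GAUSSIAN_PLUS_FLAT
        return Oedm.HUBER
    raise ValueError(f"unknown experiment: {experiment}")
```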
4.1. Robustness of variational quality control
Current VarQC methods rely on contaminated Gaussian distributions to handle outliers robustly. To explore the actual robustness of the two VarQC methods, four experiments (Table 2) were designed to assimilate pressure observations with and without the outliers listed in Table 1. Figure 3a shows the positions of the sounding sites used in the idealized experiments, which assimilate the same number of pressure observations at vertical levels up to 10 hPa (Fig. 3c). Over the domain in Fig. 3a, the ERA-Interim reanalysis pressure-level data were used to estimate the root-mean-square errors (RMSEs) of the posterior geopotential height fields of the four experiments (Fig. 3b).

Compared with CTRL1, the RMSEs of CTRL1-Outlier are substantially larger near 500 hPa (400–600 hPa) and 850 hPa (700–1000 hPa). These increases occur because CTRL1-Outlier assimilated the constructed outlier observations near these levels (Table 1), whereas CTRL1 assimilated their normal counterparts. In other words, replacing some normal pressures with outlier pressures and assimilating the “contaminated” dataset without VarQC degraded the posterior geopotential height field.
When the outlier pressures are assimilated with Flat-VarQC (Flat-Outlier), the posterior geopotential height field is also degraded relative to CTRL1, although Flat-Outlier’s RMSEs are slightly smaller than those of CTRL1-Outlier near 500 hPa. Thus, even with the Flat-VarQC method, including the outlier pressures degraded the posterior geopotential height field. In contrast, Huber-Outlier showed no increase in geopotential height RMSE when the outlier pressures were included; more encouragingly, near 500 hPa its RMSEs are even smaller than those of CTRL1. These results indicate that the Huber-VarQC method is highly robust against outliers.
To explain this robustness, we examined the analysis weights of Huber-VarQC. The analysis weights of the assimilated observations in Flat-VarQC [Eq. (5)] and Huber-VarQC [Eq. (9)] are recomputed by the VarQC 3DVAR system at each iteration from the VarQC parameters and the intermediate innovation; the weights from the last iteration are analyzed in this study. Following Andersson and Järvinen (1999), we classify observations with weights in (0, 0.25] as erroneous, in (0.25, 0.5] as possibly erroneous, in (0.5, 0.75] as suspicious, and in (0.75, 1] as valid.
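The weight classes map directly onto left-open intervals. A minimal Python sketch, with a hypothetical function name:

```python
def classify_weight(w: float) -> str:
    """Quality class of a last-iteration VarQC weight w in (0, 1]."""
    if not 0.0 < w <= 1.0:
        raise ValueError("VarQC weights are expected to lie in (0, 1]")
    if w <= 0.25:
        return "erroneous"
    if w <= 0.5:
        return "possibly erroneous"
    if w <= 0.75:
        return "suspicious"
    return "valid"


weights = [1.0, 0.8, 0.6, 0.4, 0.1]
print([classify_weight(w) for w in weights])
```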
The last-iteration weights of Huber-Outlier show that all the constructed outlier pressures were down-weighted (Fig. 3c). More specifically, the weights of the 850-hPa pressure observations at sites 2 and 5 are reduced from 1 to within (0.5, 0.75], identifying them as suspicious. The weight of the 850-hPa pressure observation at site 8 fell from 1 to within (0.25, 0.5], so the Huber-VarQC method identified it as possibly erroneous. Similarly, some 500-hPa pressure observations are identified as suspicious (site 4), possibly erroneous (sites 3 and 8), and erroneous (sites 1 and 5). In other words, the relatively small weights assigned by Huber-VarQC mitigated the negative impacts of assimilating the outlier pressures.
Note that although both Huber-VarQC and Flat-VarQC adjusted the pressure observation weights, Huber-Outlier substantially outperformed Flat-Outlier. Although Flat-VarQC’s weight adjustments slightly reduced the RMSE near 500 hPa relative to CTRL1-Outlier, Flat-VarQC assigned all pressure observations weights within (0.75, 1] (not shown). Such large weights indicate that the outliers were not well identified, most likely because the Gaussian plus flat distribution matches the actual observation error distribution less well than the Huber distribution does. Moreover, Huber-VarQC also reduced observation weights at levels without outliers (Fig. 3c); for example, it identified the innovation of the 925-hPa observation (whose pressure equals 925 hPa) at site 4 (red dot) as erroneous. In other words, the VarQC algorithms also assign small weights to observations where the background field is likely of poor quality, so high-quality initial conditions are required when initiating the VarQC assimilation system. Consequently, because Flat-VarQC had difficulty both in mitigating the negative impacts of the outlier pressures and in dealing with poor-quality background fields, Flat-Outlier did not perform as well as Huber-Outlier.
In summary, the idealized experiments show that Huber-VarQC recognizes outliers substantially better than Flat-VarQC, suggesting that it handles outliers more robustly. The Flat-Outlier experiment still produced a slightly better posterior analysis than CTRL1-Outlier because Flat-VarQC did adjust the observation weights. Finally, the experiments indicate that, in the absence of VarQC, outliers that pass CQC degrade the posterior analysis of the current GRAPES m3DVAR system relative to an analysis generated without outliers.
4.2. Observational weight features
The impacts of the Flat- and Huber-VarQC methods were also examined with real data. Figure 4a shows the three-month statistics for surface pressure in the same format as Fig. 1. The three distributions (Gaussian, Gaussian plus flat, and Huber) fit the right tail of the innovation distribution similarly well. However, because the left tail (green histogram bars in Fig. 4a) is fatter than the right tail, none of these profiles fits the innovation distribution well. This is expected, since fitting symmetric distributions to asymmetric statistics inevitably fails in part. Although it is uncertain whether the histogram-estimated probability density of the left tail is correct, current DA with a Gaussian OEDM treats observations in this regime as valid (weights equal to 1), so their inclusion could introduce uncertainty into the posterior analysis. Furthermore, the histogram in Fig. 4a shows a sharp cut-off at a normalized innovation magnitude of 4, owing to the threshold of the current background quality check. This confirms that observation errors after CQC are not guaranteed to be Gaussian-distributed.

Figure 4. (a) As in Fig. 1 but for surface pressure observations from July to September 2013. (b) Statistics of innovations and weights for surface pressure at 0600 UTC 10 August 2015 in a Huber-VarQC test in which the background quality check threshold coefficient is relaxed from 4 to 16. (c, d) Statistics of innovations and weights for aircraft-reported temperature assimilated by (c) Flat-VarQC and (d) Huber-VarQC. The colored dots represent the magnitude of the observation weights in the VarQC methods and correspond to the right y-axes. The theoretical weight curves of the two VarQC methods are shown in the bottom-right subplots of panels (c) and (d).
The colors of the theoretical weight curves are consistent with the weight colorbar at the bottom of the figure.
To examine how well Huber-VarQC handles the uncertainty discussed in the previous paragraph, a special Huber-VarQC test was performed for 0600 UTC 10 August 2015. Given the robustness of Huber-VarQC, this test used the configuration in Table 3 except that the background quality check threshold coefficient was relaxed from 4 to 16. As seen in Fig. 4b, the relaxation extended the green histogram to stronger negative normalized innovations, indicating that substantially more surface pressure observations were assimilated. More importantly, most of the weights of the observations in the green histogram are less than 0.25 (Fig. 4b), so these observations are identified as erroneous. Huber-VarQC can therefore assign weights according to observation quality, absorbing the usable information of outliers while rejecting their harmful information. This suggests that, with Huber-VarQC, the BgQC threshold limits can be relaxed to assimilate more observations, which can also substantially relieve the uncertainty introduced by assimilating observations with strongly negative normalized innovations (the left tail in Fig. 4a). The sensitivity of the posterior analysis to different threshold limits can be studied in the future. Nonetheless, this test indicates that Huber-VarQC can robustly handle outlier surface pressure observations.
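The interplay between the BgQC threshold and the Huber weights can be illustrated with a short Python sketch. It assumes that the background check compares the magnitude of the normalized innovation x with the threshold coefficient k (the operational GRAPES check may normalize differently), it uses a hypothetical symmetric transition point of ±1.5, and it takes the Huber weight as the clipped Huber gradient divided by x. The function names are hypothetical.

```python
def passes_bgqc(x: float, k: float) -> bool:
    """Assumed background check: keep the observation if |x| <= k."""
    return abs(x) <= k


def huber_weight(x: float, c_left: float, c_right: float) -> float:
    """Huber-VarQC weight: 1 between the transition points, c / x outside them."""
    if c_left <= x <= c_right:
        return 1.0
    c = c_right if x > c_right else c_left
    return c / x


x = -10.0  # a strongly negative normalized surface-pressure innovation
print(passes_bgqc(x, 4.0), passes_bgqc(x, 16.0))  # rejected at k = 4, kept at k = 16
print(huber_weight(x, c_left=-1.5, c_right=1.5))  # small weight once assimilated
```

With these illustrative values, the observation is rejected at k = 4 but assimilated at k = 16 with a weight of 0.15, which falls in the erroneous class: it still contributes to the analysis, but only weakly.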
To examine how effectively the two VarQC methods adjust the observation weights, we assimilated conventional observations with each method at 0600 UTC 10 August 2015, this time without relaxing the BgQC threshold limits. This time was chosen at random from the long-period assimilation experiments. Figures 4c and 4d show the last-iteration weights of the aircraft-reported temperature observations produced by Flat-VarQC and Huber-VarQC, respectively.
As seen in Figs. 4c and 4d, the weights assigned by the two VarQC methods have different characteristics. First, within the Gaussian domain, Flat-VarQC’s analysis weights are all slightly less than 1, whereas Huber-VarQC’s are exactly 1. Beyond the Gaussian domain, Flat-VarQC’s weights decrease steeply, whereas Huber-VarQC’s weights decrease smoothly with increasing innovation. In other words, Flat-VarQC’s weights display an “n” shape [Eq. (5), subplot in Fig. 4c], whereas Huber-VarQC’s display a “π” shape [Eq. (9), subplot in Fig. 4d]. Both VarQC methods can therefore effectively adjust the analysis weights of real observations.
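The contrasting “n” and “π” shapes follow from the form of the weights. Assuming the standard Gaussian plus flat formulation (weight equal to one minus the posterior probability of gross error, with γ as in Andersson and Järvinen, 1999) and a Huber weight equal to the clipped gradient divided by x, the Python sketch below evaluates both. The parameter values (A = 0.01, d = 5, c = ±1.5) and function names are hypothetical.

```python
import math


def flat_weight(x: float, a: float, d: float) -> float:
    """Gaussian plus flat weight: exp(-x^2/2) / (gamma + exp(-x^2/2)).

    Equivalent to 1 - P, where P is the posterior probability of gross error.
    Slightly below 1 even at x = 0, and drops sharply where exp(-x^2/2) ~ gamma.
    """
    gamma = a * math.sqrt(2.0 * math.pi) / ((1.0 - a) * 2.0 * d)
    e = math.exp(-0.5 * x * x)
    return e / (gamma + e)


def huber_weight(x: float, c_left: float, c_right: float) -> float:
    """Huber weight: exactly 1 inside the transition points, c / x outside."""
    if c_left <= x <= c_right:
        return 1.0
    c = c_right if x > c_right else c_left
    return c / x


for x in (0.0, 1.0, 2.0, 3.0, 4.0, 6.0):
    print(f"x={x:4.1f}  flat={flat_weight(x, 0.01, 5.0):.3f}  "
          f"huber={huber_weight(x, -1.5, 1.5):.3f}")
```

The flat weight crosses 0.5 where exp(−x²/2) = γ, that is, at |x| = √(2 ln(1/γ)), and falls from near 1 to near 0 within a narrow band around that point. The Huber weight instead decays as c/|x|, so large departures keep a small but non-zero influence.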
4.3. Optimization of analysis increment
The analysis increment is the difference between the analysis and the background, so its magnitude indicates how much the observations correct the background after quality control and data assimilation. This increment response also reveals the influence of the variationally quality-controlled observations. Figure 5 shows the differences in 850-hPa geopotential height increment magnitudes between Flat-VarQC/Huber-VarQC and CTRL2. The increment magnitudes of the temperature field behave similarly (not shown).

Figure 5. Differences in analysis increment magnitudes for geopotential height (units: gpm) at the 850-hPa level between the Flat-VarQC/Huber-VarQC and CTRL2 experiments at 0600 UTC 16 August 2015: (a) Flat-VarQC minus CTRL2; (b) Huber-VarQC minus CTRL2. The small circles indicate the VarQC weights of pressure observations at TEMP, SYNOP, and SHIP sites. Black circles represent erroneous observations with weights within (0, 0.25], purple circles possibly erroneous observations with weights within (0.25, 0.5], blue circles suspicious observations with weights within (0.5, 0.75], and grey circles valid observations with weights within (0.75, 1].
The two VarQC methods change the geopotential height analysis increment relative to CTRL2 in distinctly different ways. Although the signs of the increment magnitude differences in Fig. 5 agree between the two methods in most regions, in regions where both methods have larger increment magnitudes than CTRL2 (red shading in Figs. 5a and 5b), Flat-VarQC’s increment magnitudes are noticeably larger than Huber-VarQC’s. Likewise, in regions where both methods have smaller increment magnitudes than CTRL2 (green shading in Figs. 5a and 5b), Huber-VarQC’s increment magnitudes are noticeably smaller than Flat-VarQC’s. Taken together, these differences imply that Flat-VarQC generally produces larger increment magnitudes than Huber-VarQC.
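The quantity mapped in Fig. 5 can be stated precisely. Assuming that the VarQC and CTRL2 analyses share the same background, the Python sketch below computes, at each grid point, the increment magnitude of a VarQC analysis minus that of CTRL2; positive values correspond to the red shading and negative values to the green shading. The function name is hypothetical, and grids are flattened to plain lists for brevity.

```python
from typing import List, Sequence


def increment_magnitude_difference(analysis_qc: Sequence[float],
                                   analysis_ctrl: Sequence[float],
                                   background: Sequence[float]) -> List[float]:
    """|x_a(VarQC) - x_b| - |x_a(CTRL2) - x_b| at each grid point.

    Positive: the VarQC analysis departs further from the background than CTRL2.
    Negative: VarQC corrects the background less than CTRL2 does.
    """
    if not len(analysis_qc) == len(analysis_ctrl) == len(background):
        raise ValueError("fields must have the same number of grid points")
    return [abs(q - b) - abs(c - b)
            for q, c, b in zip(analysis_qc, analysis_ctrl, background)]
```

Because magnitudes are compared, an increment of opposite sign but equal size yields zero difference; the figure therefore shows how strongly, not in which direction, each analysis corrects the background.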
The observational weights generated by Flat-VarQC and Huber-VarQC are also plotted in Fig. 5. Although the two VarQC methods use different observational error distributions, both reduce observation weights at the same or adjacent stations. The Flat-VarQC observations with weights in (0.75, 1] (grey circles) are in locations similar to those of Huber-VarQC. However, the weights differ: observations with weights below 0.75 are spaced further apart in Flat-VarQC than in Huber-VarQC, and Flat-VarQC’s sub-0.75 weights tend to be smaller than Huber-VarQC’s. From these results, we infer that Huber-VarQC is more inclusive than Flat-VarQC.
In addition, Fig. 5a reveals that the absolute differences between the increment magnitudes of Flat-VarQC and CTRL2 are larger around observations with reduced weights, particularly around sites with severe weight reduction. A similar pattern appears for Huber-VarQC (Fig. 5b). In other words, for both VarQC methods, the stronger the weight reduction, the greater the change in increment magnitude around the affected sites. As shown in the next section, this pattern improved the analyzed geopotential height field relative to CTRL2.
4.4. Improvement of mass field
Improving the initial conditions is critical to improving model forecasts. Figure 6 shows the differences between the CTRL2, Flat-VarQC, and Huber-VarQC analyses and the ERA-Interim reanalysis for 850-hPa geopotential height at 0600 UTC 16 August 2015, after one DA cycle. We use the ERA-Interim reanalysis as the validating truth, so smaller differences indicate analyses closer to ERA-Interim. All three experiments differ noticeably from ERA-Interim, especially over the middle-west and northern parts of the domain in Fig. 6, where the maximum difference is about 20 geopotential meters (gpm). Over other areas, the GRAPES analyses are within about 5 gpm of ERA-Interim.

Figure 6. Differences between the posterior analyses of geopotential height (units: gpm) of the (a) CTRL2, (b) Flat-VarQC, and (c) Huber-VarQC experiments and the ERA-Interim reanalysis at the 850-hPa level at 0600 UTC 16 August 2015. Circles indicate the weight magnitudes at sites, as in Fig. 5.
We now compare the experiments with one another. Relative to CTRL2, the differences between the VarQC experiments and ERA-Interim are smaller over most regions (Figs. 6b and 6c). These regions, including those marked by black ellipses, lie near reduced-weight observations. Moreover, the Huber-VarQC posterior analysis is closer to the truth than that of Flat-VarQC. For example, Flat-VarQC identified only one erroneous observation (black circle) in the ellipse in the bottom-left corner of Fig. 6b, and its resulting analysis over Sichuan Province was further from ERA-Interim than that of CTRL2. In contrast, over the same region Huber-VarQC reduced the weights of four observations (one possibly erroneous and three suspicious), yielding a smaller deviation from ERA-Interim than both CTRL2 and Flat-VarQC. More notably, Huber-VarQC reduced the geopotential height deviation from ERA-Interim to about 9 gpm (compared with about 13 gpm in the other experiments) without treating any of those observations as erroneous.
Besides the region marked by the lower-left black ellipse, the Huber-VarQC results are also closer to ERA-Interim than the Flat-VarQC results in the other regions marked by black ellipses. This suggests that the increment response (Fig. 5) to the weight reduction of observations brought the geopotential height closer to ERA-Interim in the VarQC experiments, particularly in Huber-VarQC. Temperature behaves similarly (not shown), and similar improvements are found for geopotential height at most other pressure levels (Figs. 7 and 8).
Figure 7. Vertical RMSE profiles of posterior geopotential height for CTRL2 (red line), Flat-VarQC (blue line), and Huber-VarQC (black line) at 0600 UTC 16 August 2015.
Figure 8. Time evolution of the RMSE differences calculated as Flat-VarQC (blue line) and Huber-VarQC (black line) minus CTRL2 for posterior geopotential height at (a) 850 hPa and (b) 500 hPa, spanning August 2015.
Figure 7 shows the vertical profiles of geopotential height RMSE for the three experiments at 0600 UTC 16 August 2015, using the ERA-Interim reanalysis as the benchmark. The RMSEs from the two VarQC experiments are noticeably smaller than those of CTRL2 at most levels, and Huber-VarQC has the smallest RMSE. The best performance of Huber-VarQC is consistent with the idealized experiments (Fig. 3b). This is because Huber-VarQC can assimilate outliers more robustly than Flat-VarQC while mitigating the negative impacts of non-Gaussian observation errors. Furthermore, the VarQC experiments perform better at lower-to-middle levels than at middle-to-upper levels (200–500 hPa), a contrast that is also seen in the long-period DA experiments (Fig. 9). This is probably because the Huber-VarQC method in the GRAPES m3DVAR system used the same transition points at all levels, following the discussion by Tavolato and Isaksen (2015).
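A vertical RMSE profile such as that in Fig. 7 is computed level by level against the reference analysis. The sketch below is a minimal illustration under stated assumptions: the analysis and ERA-Interim fields are assumed to be already interpolated to the same grid, each level is given as a flat list of gridpoint values over the verification region, and a plain unweighted RMSE is used (an operational verification might instead apply area weights). The function names and toy numbers are hypothetical.

```python
import math


def rmse(values, truth):
    """Root-mean-square difference between two equal-length sequences."""
    if len(values) != len(truth) or not values:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((v - t) ** 2 for v, t in zip(values, truth)) / len(values))


def rmse_profile(analysis, reference):
    """RMSE at each pressure level, ordered from the surface upward.

    analysis, reference: dicts mapping pressure level (hPa) to a list of
    gridpoint values over the verification region, on a common grid.
    """
    return {lev: rmse(analysis[lev], reference[lev]) for lev in sorted(reference, reverse=True)}


# Toy geopotential heights (gpm) at two gridpoints, for illustration only.
profile = rmse_profile(
    analysis={850: [1500.0, 1510.0], 500: [5800.0, 5790.0]},
    reference={850: [1503.0, 1506.0], 500: [5800.0, 5800.0]},
)
for level, value in profile.items():
    print(f"{level} hPa: RMSE = {value:.2f} gpm")
```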
Figure 9. As in Fig. 7 but for the mean RMSE spanning August 2015.
To confirm that the Huber-VarQC method generally yields lower RMSEs than Flat-VarQC, both VarQC methods are cycled at 0000, 0600, 1200, and 1800 UTC each day for the 31 days of August 2015. The configurations of these continuous VarQC experiments and of the continuous control experiment (CTRL2) are listed in Table 3. The RMSEs of posterior geopotential height relative to ERA-Interim over eastern China (the red shaded region in Fig. 2) are shown in Figs. 8 and 9. The RMSE differences at 850 hPa (Fig. 8a) and 500 hPa (Fig. 8b) are calculated as Flat-VarQC or Huber-VarQC minus CTRL2. The more negative the RMSE difference, the better that VarQC experiment performs relative to CTRL2.
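The sign convention of Fig. 8 can be stated compactly: at each analysis time the experiment-minus-CTRL2 RMSE difference is computed, and negative values mean the experiment is closer to ERA-Interim. The following sketch, with hypothetical function names and made-up RMSE values, computes such a difference series and the fraction of analysis times at which an experiment beats the control. With four analyses per day over 31 days, a real series would contain 124 values.

```python
def rmse_differences(rmse_exp, rmse_ctrl):
    """Experiment-minus-control RMSE at each analysis time (negative = experiment better)."""
    if len(rmse_exp) != len(rmse_ctrl):
        raise ValueError("series must cover the same analysis times")
    return [e - c for e, c in zip(rmse_exp, rmse_ctrl)]


def fraction_improved(diffs):
    """Fraction of analysis times at which the experiment beats the control."""
    return sum(1 for d in diffs if d < 0) / len(diffs)


CYCLES_PER_DAY = 4  # 0000, 0600, 1200, and 1800 UTC
N_CYCLES = CYCLES_PER_DAY * 31  # August 2015

# Toy RMSE values (gpm) for four consecutive analyses, for illustration only.
diffs = rmse_differences([10.0, 9.5, 12.0, 8.0], [10.5, 9.0, 12.4, 8.8])
share = fraction_improved(diffs)
print(N_CYCLES, [round(d, 2) for d in diffs], share)
```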
The 850-hPa geopotential height RMSE differences from the two VarQC experiments (Fig. 8a) are negative at most times, indicating that both VarQC experiments outperform CTRL2 at 850 hPa. Furthermore, Huber-VarQC has smaller 850-hPa RMSEs than Flat-VarQC most of the time. The improvement at 500 hPa is less clear-cut than at 850 hPa, but the RMSEs of Huber-VarQC (Fig. 8b) are still smaller than those of CTRL2 at most times. In general, the RMSEs of Flat-VarQC are also lower than those of CTRL2, but at 1800 UTC 26 August Flat-VarQC shows a large positive RMSE difference of about 2.8 relative to CTRL2. This event is probably caused by convergence problems in the cost-function minimization. These findings suggest that both VarQC methods tested here generally improve the low-level geopotential height analysis, especially Huber-VarQC, and that the improvements are weaker at middle-to-upper levels.
Figure 9 shows the mean vertical RMSE profiles of geopotential height from the long-period experiments over August 2015. Both Flat-VarQC and Huber-VarQC improve geopotential height at low-to-middle levels (500–1000 hPa), with Huber-VarQC producing the larger improvement. At middle-to-upper levels, the RMSEs of both VarQC methods are similar to that of the control experiment. The temperature RMSE profiles show similar but weaker improvements (not shown). These results indicate that this initial version of VarQC needs further improvement. For instance, the left and right transition points of the Huber-VarQC method could vary with height, whereas they are fixed in these experiments. Further study is needed to refine how these parameters are set.
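As one illustration of the refinement suggested above, the Huber transition point could be made a function of pressure. The sketch below is purely hypothetical: the anchor values are invented for illustration and were neither fitted nor used in this study, the interpolation is linear in ln p, and a single symmetric transition point stands in for the separate left and right transition points of the Huber-VarQC scheme.

```python
import bisect
import math

# Hypothetical anchors: (pressure in hPa, transition point in units of
# observation error), ordered by increasing pressure. Illustrative only.
ANCHORS = [(200.0, 2.5), (500.0, 2.0), (850.0, 1.5), (1000.0, 1.5)]


def transition_point(p_hpa, anchors=ANCHORS):
    """Interpolate the transition point linearly in ln(p); clamp outside the anchor range."""
    pressures = [p for p, _ in anchors]
    if p_hpa <= pressures[0]:
        return anchors[0][1]
    if p_hpa >= pressures[-1]:
        return anchors[-1][1]
    i = bisect.bisect_left(pressures, p_hpa)
    (p0, a0), (p1, a1) = anchors[i - 1], anchors[i]
    frac = (math.log(p_hpa) - math.log(p0)) / (math.log(p1) - math.log(p0))
    return a0 + frac * (a1 - a0)


def huber_weight_at_level(d, p_hpa):
    """Symmetric Huber weight for normalized innovation d using a level-dependent transition point."""
    a = transition_point(p_hpa)
    return 1.0 if abs(d) <= a else a / abs(d)


for p in (250.0, 500.0, 700.0, 925.0):
    print(p, round(transition_point(p), 3), round(huber_weight_at_level(3.0, p), 3))
```

In practice, such anchor values would presumably have to be estimated level by level from innovation statistics like those in Fig. 1.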