In variational and ensemble Kalman filter (EnKF)-based data assimilation techniques, the analysis is obtained by maximizing the likelihood of the probability density function (PDF) of the true atmospheric state, given the observations and an a priori background estimate. Common data assimilation systems theoretically require the PDFs of the analysis, background, and observation errors to be Gaussian and unbiased. If this assumption is not satisfied, the data assimilation process can produce unrealistic analyses (Errico et al., 2000; Ravela et al., 2007).
The non-Gaussian (NG) nature of the background errors can result from nonlinearity in the time integration of the model (Bocquet et al., 2010), especially from the highly nonlinear physical processes in numerical weather prediction (NWP) (Auligné et al., 2011). Displacement errors of meteorological features may also lead to NG in the background errors (Lawson and Hansen, 2005). Among the model variables commonly used in data assimilation systems, hydrometeors tend to have the highest degree of nonlinearity and the lowest predictability (Fabry and Sun, 2010; Fabry, 2010). Thus, NG is inevitable when the control variables of data assimilation systems include hydrometeors, which is of vital importance for assimilating cloud-sensitive observations such as radar reflectivity and cloudy satellite radiances (Errico et al., 2007).
Various studies have focused on how to include hydrometeors as control variables in data assimilation systems in order to analyze hydrometeors directly from cloud-sensitive observations. The total water mixing ratio (TWMR), which is the sum of the humidity-related variable and hydrometeors, is used as the control variable in many studies (Xiao et al., 2007; Liu et al., 2009; Yang et al., 2016; Li et al., 2017). However, TWMR is essentially a humidity control variable, and it is often limited by the simple and incomplete microphysical process employed to separate the hydrometeor increments from the total humidity increments. Hydrometeor mixing ratios are also chosen as the control variables in some data assimilation systems (Gao and Stensrud, 2012; Wang et al., 2013; Chen et al., 2015). Hydrometeor mixing ratios are easy to implement in a data assimilation system, but the NG of the hydrometeor background errors has not been taken into consideration in past studies. Some other studies chose the logarithm of hydrometeor mixing ratios as the control variables (Boukabara et al., 2011; Michel et al., 2011; Liu et al., 2020), but did not include a thorough discussion of the NG of the logarithm of hydrometeors. Recently, some researchers have used reflectivity as the control variable in data assimilation systems (Wang and Wang, 2017), but its application is limited to radar reflectivity assimilation with pure ensemble background error covariances. To obtain more Gaussian hydrometeor control variables, Hólm and Gong (2010) explored how to extend the humidity control variable transform method (Hólm, 2002) of the European Centre for Medium-Range Weather Forecasts (ECMWF) to include hydrometeors. In their research, the normalized hydrometeors were selected as the control variable candidates, but the exact formulation of the normalization was not given and needs further investigation.
In this study, a new Gaussian transform method is proposed with the objective of constructing more Gaussian hydrometeor control variables in variational data assimilation systems. The new Gaussian transform is a modification of the Softmax function (Bridle, 1990) and is named the Quasi-Softmax function. This article is organized as follows. In section 2, the Quasi-Softmax method, the D'Agostino test (D'Agostino, 1970), the configuration of the experiments, and the description of statistical samples are presented. In section 3, the transformed hydrometeors are discussed from the perspective of spatial distribution, NG, and characteristics of background errors. Finally, the conclusions are drawn in section 4.
2.1. Quasi-Softmax function
For a set of samples


This transformation is called the Softmax function (Bridle, 1990), which is commonly used in neural networks. The numerator of Eq. (1) is the exponential function of

Considering that the magnitude of hydrometeor mixing ratios is relatively small and the typical non-precipitation region may cover a large area, the calculated denominator in Softmax function may be very close at different levels, making it possible that the vertical distribution characteristics of hydrometeor mixing ratios may be lost after transformation. To handle this issue, a modification has been made to the Softmax function, renamed as the Quasi-Softmax function. With the Quasi-Softmax function, the original hydrometeor mixing ratio


In Eq. (2),




where K is the number of vertical levels. Compared to the original Softmax function, the denominator of Quasi-Softmax function becomes the sum of the exponential function of









2.2. D'Agostino test
The degree to which samples deviate from being truly Gaussian can be detected from the skewness and kurtosis of the PDF. The skewness measures the asymmetry of the PDF about its mean, while the kurtosis measures how peaked the distribution is. For a given sample
where
























and

Positive (negative)









The

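The test described in this subsection can be sketched in a few lines of standard-library Python. This is a minimal illustration, assuming that the K² statistic used in the figures is the usual omnibus form, the sum of the squared skewness and kurtosis z-scores; the function name `dagostino_k2` and the sample data are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist


def dagostino_k2(sample):
    """Return (z_skew, z_kurt, K2) for a 1-D sample; NaN where undefined."""
    n = len(sample)
    if n < 8:
        raise ValueError("the skewness test needs at least 8 values")
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    if m2 == 0.0:
        # e.g. a clear-sky point where every ensemble member holds the same value
        return (math.nan, math.nan, math.nan)
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    g1 = m3 / m2 ** 1.5   # sample skewness (0 for a symmetric sample)
    b2 = m4 / m2 ** 2     # sample kurtosis (about 3 for a Gaussian sample)

    # z-score of the skewness (D'Agostino, 1970)
    y = g1 * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + math.sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    z_skew = delta * math.asinh(y / alpha)

    # z-score of the kurtosis (Anscombe-Glynn transformation)
    mean_b2 = 3.0 * (n - 1) / (n + 1)
    var_b2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    x = (b2 - mean_b2) / math.sqrt(var_b2)
    skew_b2 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
               * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / skew_b2 * (2.0 / skew_b2 + math.sqrt(1.0 + 4.0 / skew_b2 ** 2))
    denom = 1.0 + x * math.sqrt(2.0 / (a - 4.0))
    if denom == 0.0:
        return (z_skew, math.nan, math.nan)
    cube_root = math.copysign(((1.0 - 2.0 / a) / abs(denom)) ** (1.0 / 3.0), denom)
    z_kurt = (1.0 - 2.0 / (9.0 * a) - cube_root) / math.sqrt(2.0 / (9.0 * a))

    return (z_skew, z_kurt, z_skew ** 2 + z_kurt ** 2)


# Hypothetical 80-value samples: one symmetric and near-Gaussian, one strongly skewed.
n_members = 80
gaussian_like = [NormalDist().inv_cdf((i + 0.5) / n_members) for i in range(n_members)]
skewed = [math.exp(v) for v in gaussian_like]
print(dagostino_k2(gaussian_like)[2], dagostino_k2(skewed)[2])
```

Under the Gaussian hypothesis, K² approximately follows a chi-squared distribution with two degrees of freedom, so values well above about 5.99 (the 5% critical value) indicate a clear departure from Gaussianity. The NaN returned for a constant sample is a design choice of this sketch for points where all members are identical.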
2.3. Background error covariance modeling
In this study, we focus on the variational data assimilation (DA) method, in which the background error covariance is static, homogeneous, and isotropic. The control variable transforms (CVTs) method (Barker et al., 2004), which is commonly employed to model the background error covariance in variational DA systems, is used in this study. With the CVTs method, the square root of the background error
where














The Gaussian transform is conducted before the physical transform, and it is applied to the full model variables rather than perturbations.
2.4. Statistical samples and experimental configurations
In this study, a heavy rainfall case that occurred in the middle and lower reaches of the Yangtze River from late June to early July 2016 was studied. This event resulted in great economic losses in China. The period from 0600 to 1800 UTC 2 July 2016 was selected as the period of interest. The 12-h accumulated precipitation for this period in the simulation domain is shown in Fig. 1a, as reported by the China Hourly Merged Precipitation Analysis (CHMPA; Shen et al., 2014). Figure 1b shows the brightness temperature of channel 8 of the Himawari-8 Advanced Himawari Imager (AHI) valid at 1800 UTC 2 July 2016, where the cold colors indicate the cloudy regions, which correspond well to the precipitation areas shown in Fig. 1a.
Figure 1. (a) Observed 12-h accumulated precipitation (units: mm) from 0600 UTC to 1800 UTC 2 July 2016 in the study domain, (b) the brightness temperature (K) of channel 8 from Himawari-8 AHI valid at 1800 UTC 2 July 2016, and (c) the vertical profiles of qc, qi, qr, and qs (g kg⁻¹) from one ensemble member valid at 1800 UTC 2 July 2016.
The Weather Research and Forecasting (WRF) model V3.8.1 (Skamarock et al., 2008) is used as the NWP model in this study. The horizontal grid spacing is 4 km, and the number of horizontal grid points is 550×450. The number of vertical levels is 51, and the model top is set to 10 hPa. The following physics parameterization schemes are adopted: the WRF single-moment 6-class microphysics scheme (WSM6); the Rapid Radiative Transfer Model for GCMs (RRTMG) shortwave and longwave radiation schemes; and the Mellor-Yamada-Janjić (MYJ) boundary layer scheme. No cumulus parameterization is employed.
Considering that hydrometeors evolve rapidly with time, in this study we chose to use an ensemble sample to calculate hydrometeor background errors, as in previous studies (Michel et al., 2011; Legrand et al., 2016). To obtain the statistical samples of background errors for hydrometeors, an 80-member ensemble forecast was carried out, initialized from an 80-member ensemble analysis valid at 0600 UTC 2 July 2016. The 80-member ensemble analysis was provided by the EnKF system of NCEP's operational Global Data Assimilation System (GDAS). The 12-h forecasts of the 80-member ensemble valid at 1800 UTC 2 July 2016 were used as the statistical samples, and the background errors of hydrometeors were approximated by the deviations of each ensemble member from the ensemble mean. Figure 1c shows the vertical profiles of qc, qi, qr, and qs from one ensemble member valid at 1800 UTC 2 July 2016.
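The ensemble-based approximation of the background errors can be illustrated with a small standard-library sketch for a single grid point. It assumes, following the statement in section 2.3 that the Gaussian transform acts on full model variables rather than perturbations, that the transformed errors are the deviations of the transformed members from their own mean. The member values, the floor `Q_FLOOR`, and the function names are hypothetical.

```python
import math

Q0 = 1.0e-3        # kg kg-1, reference value from the Log10 row of Table 1
Q_FLOOR = 1.0e-12  # hypothetical floor so that log10 is defined when q = 0


def log10_transform(q):
    """Log10 control variable of Table 1, with an assumed floor for q = 0."""
    return math.log10(max(q, Q_FLOOR) / Q0)


def error_samples(members, transform=None):
    """Deviations of each member from the ensemble mean at one grid point.

    If a transform is given, it is applied to the full values first and the
    mean is taken in transformed space.
    """
    values = [transform(v) for v in members] if transform else list(members)
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def sample_sd(perturbations):
    """Sample standard deviation of zero-mean perturbations (divisor n - 1)."""
    n = len(perturbations)
    return math.sqrt(sum(p * p for p in perturbations) / (n - 1))


# Made-up snow mixing ratios (kg kg-1) from five members at one grid point.
qs_members = [0.0, 2.0e-5, 1.5e-4, 4.0e-4, 9.0e-4]

raw_errors = error_samples(qs_members)
log_errors = error_samples(qs_members, transform=log10_transform)
print(sample_sd(raw_errors), sample_sd(log_errors))
```

In the study the same operation would be repeated at every grid point over the 80 members; five values are used here only to keep the example short.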
















This study aims to find a Gaussian transform method that yields more Gaussian hydrometeor control variables in data assimilation systems. Four experiments are designed, as detailed in Table 1. The experiment Origin uses the original hydrometeors as a benchmark. It has been pointed out that a logarithmic transform, such as the base-10 logarithm (Log10), can bring the PDFs of background errors for some variables closer to Gaussian (Errico et al., 2007; Fletcher and Zupanski, 2007), so the experiment Log10 employs the logarithm of hydrometeors as in Michel et al. (2011). The experiment Softmax uses the Softmax function, and the experiment Q_softmax employs the newly constructed Quasi-Softmax function.
| Experiments | Hydrometeor control variables |
| --- | --- |
| Origin | ${q_{i,j,k}}$ |
| Log10 | $\lg \left(\dfrac{{{q_{i,j,k}}}}{{{q_{\rm{0}}}}}\right);{q_0} = {10^{ - 3}}{\rm{kg}}\;{\rm{k}}{{\rm{g}}^{ - 1}} $ |
| Softmax | $\dfrac{{\exp (\beta {q_{i,j,k}})}}{{\displaystyle\sum {\exp (\beta {q_{i,j,k}})} }}$ |
| Q_softmax | $\frac{{\exp (\beta {q_{i,j,k}})}}{{\displaystyle\sum\limits_{{{\overline q }_{i,j}} > 0} {\exp (\beta {{\overline q }_{i,j}}){\rm{ - }}\displaystyle\sum\limits_{{q_{i,j,k}} > 0} {\exp (\beta {{\overline q }_{i,j}})} } }},{\overline q _{i,j}} = \dfrac{1}{K}\displaystyle\sum\limits_{k = 1}^K {{q_{i,j,k}}} $ |
Table 1. Four experiments and their hydrometeor control variables.
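The difference between a level-wise and a level-independent denominator can be explored with a toy sketch. It rests on two assumptions that are not confirmed by the text: the Softmax sum is taken over the horizontal points of each model level, and the Quasi-Softmax denominator is reduced to its first term in Table 1, the sum of exp(β q̄) over columns with a positive vertical mean. The second term of the Table 1 denominator is deliberately not reproduced, so `quasi_softmax_simplified` is only a stand-in and not the formulation of the study. The value of `BETA` and the toy field are also hypothetical.

```python
import math


def softmax_per_level(q, beta):
    """Softmax of q[k][p], normalized over the horizontal points p of each level k."""
    out = []
    for level in q:
        exps = [math.exp(beta * v) for v in level]
        denom = sum(exps)
        out.append([e / denom for e in exps])
    return out


def quasi_softmax_simplified(q, beta):
    """Simplified variant: one denominator for all levels, built from column means."""
    n_levels = len(q)
    n_points = len(q[0])
    column_mean = [sum(q[k][p] for k in range(n_levels)) / n_levels
                   for p in range(n_points)]
    # Only columns that contain hydrometeors contribute to the denominator.
    denom = sum(math.exp(beta * m) for m in column_mean if m > 0.0)
    if denom == 0.0:
        raise ValueError("no column contains hydrometeors")
    return [[math.exp(beta * v) / denom for v in level] for level in q]


BETA = 1.0e3  # hypothetical scaling factor; the value of beta is an assumption

# Toy field (kg kg-1): 3 levels x 4 horizontal points; point 0 is a clear column.
q_toy = [
    [0.0, 1.0e-4, 2.0e-4, 0.0],
    [0.0, 6.0e-4, 9.0e-4, 3.0e-4],
    [0.0, 0.0, 1.0e-4, 0.0],
]

sm = softmax_per_level(q_toy, BETA)
qsm = quasi_softmax_simplified(q_toy, BETA)
level_means_sm = [sum(level) / len(level) for level in sm]
level_means_qsm = [sum(level) / len(level) for level in qsm]
print(level_means_sm)
print(level_means_qsm)
```

In this toy case every level of the level-wise Softmax averages to 1/4, so differences in magnitude between levels disappear, whereas the level-independent denominator keeps the middle level largest. Both functions also map the zero values of the clear column to non-zero values.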
3.1. Spatial distribution of transformed hydrometeors
To evaluate the impacts of the different transform methods on the spatial distribution of hydrometeors, the horizontal and vertical distributions of hydrometeors before and after transformation are studied in this subsection.
Figure 2 shows the horizontal distribution of the various transformed




































Figure 2. Transformed
Compared to the horizontal distribution, the characteristics of the vertical distribution of hydrometeors are greatly changed by the transforms. Figure 3 shows the vertical profiles of qc, qi, qr, and qs from one sample for the four experiments.




Figure 3. The vertical profiles of (a) qc, (b) qi, (c) qr, and (d) qs for Origin (10⁻⁵ kg kg⁻¹), Log10 (kg kg⁻¹), Softmax (10⁻⁶ kg kg⁻¹) and Q_softmax (10⁻⁵ kg kg⁻¹) from one sample.
3.2. NG of the background errors for transformed hydrometeors
The spatial distributions of the hydrometeors for the four experiments were compared above, and the results show that the distribution characteristics of the original hydrometeors are similar to those in Q_softmax. The NG of the background errors of hydrometeors for the different transform methods is further studied in this subsection.
An example of the horizontal structures of NG is given for qs at model level 25 (~300 hPa) in Fig. 4.

























Figure 4. K² of qs at model level 25 (~300 hPa) for (a) Origin, (b) Log10, (c) Softmax and (d) Q_softmax.
The horizontal distribution of K² for qi at the same model level is shown in Fig. 5.




















Figure 5. K² of qi at model level 25 (~300 hPa) for (a) Origin, (b) Log10, (c) Softmax and (d) Q_softmax.
The vertical profiles of K², which measure the NG in the four experiments, are shown in Fig. 6 for qc, qi, qr, and qs.















































Figure 6. Vertical profiles of K² of (a) qc, (b) qi, (c) qr, and (d) qs for the four experiments. For each level, values are averaged over the horizontal domain.
3.3. Characteristics of BE for transformed hydrometeors
In the previous two subsections, the spatial distribution characteristics and the NG of the background errors of the hydrometeors for the four experiments were discussed. It was shown that the transformed hydrometeors in Q_softmax exhibit a reasonable spatial distribution and are the most Gaussian variables among the four experiments. In this subsection, the background error characteristics of hydrometeors for the four experiments are discussed to further evaluate whether the background errors of the transformed hydrometeors are reasonable. The horizontal and vertical variances and the horizontal length scale are discussed in turn.
In data assimilation, the weight given to the observations in the analysis depends on the relative size of the background errors and the observation errors, so the variance of the background errors plays a vital role. Figure 7 shows the horizontal standard deviation (SD) of qs at the 25th model level for the four experiments.

Figure 7. Horizontal standard deviation of the transformed qs at the 25th model level for (a) Origin (10⁻⁴ kg kg⁻¹), (b) Log10 (kg kg⁻¹), (c) Softmax (10⁻⁷ kg kg⁻¹) and (d) Q_softmax (10⁻⁶ kg kg⁻¹).
Figure 8 shows the vertical profiles of the SD of the hydrometeors for the four experiments. For the original hydrometeor mixing ratios, the vertical distribution of the SD is similar to the vertical distribution of the hydrometeors themselves. The SD is larger at the levels where the hydrometeor mixing ratios are greater, which means that the uncertainty of hydrometeors is larger at these levels. The SD values in experiment Log10 are almost the same at all levels for the four hydrometeors, which may not properly represent the vertical characteristics of the background errors of hydrometeors. The vertical characteristics of the SD in Softmax are similar to those in Origin. The vertical profile of the SD for hydrometeors in Q_softmax is also close to that in Origin except for


Figure 8. Vertical standard deviation profile of (a) qc, (b) qi, (c) qr, and (d) qs for Origin (10⁻⁵ kg kg⁻¹), Log10 (10 kg kg⁻¹), Softmax (10⁻⁸ kg kg⁻¹) and Q_softmax (10⁻⁷ kg kg⁻¹).
The horizontal length scale is an important parameter that determines how far the information from observations is spread in control variable space. Figure 9 shows the horizontal length scales of the four hydrometeors for the four experiments. The length scales of the original hydrometeors are all < 8 km (2 model grid lengths), which means that observations containing hydrometeor information are spread over a much shorter distance than for common control variables such as wind, temperature, and humidity. For












Figure 9. Horizontal length scale (km) for (a) qc, (b) qi, (c) qr, and (d) qs for Origin, Log10, Softmax and Q_softmax.
Firstly, the horizontal and vertical distribution characteristics of hydrometeors for the four different transform methods were discussed. The horizontal distribution characteristics of the original hydrometeor mixing ratios were kept in experiments Log10, Softmax and Q_softmax, but the vertical distribution characteristics varied considerably among these three experiments. The Log10 and Softmax methods changed the vertical distribution greatly, while the Quasi-Softmax method largely preserved the vertical structures of hydrometeors.
The D'Agostino test was used to diagnose the NG of the background errors of the transformed hydrometeors. The original hydrometeors showed great NG, especially in the intersection areas between cloudy and clear regions, where the greatest uncertainty occurred. The Log10 method slightly improved the NG in the intersection areas but increased the NG in cloudy areas. The Softmax method improved the NG considerably in the intersection areas between cloudy and clear regions, but did not help in cloudy areas. The Quasi-Softmax method decreased the NG of hydrometeors significantly, and the transformed hydrometeors were much closer to a Gaussian distribution. The Softmax and Quasi-Softmax methods introduced a new problem: hydrometeors in clear areas are transformed to non-zero values, indicating that the transformation should not be carried out in clear areas. These and other issues should be taken into consideration and will be explored in future work.
The new Gaussian transform was added to the CVTs of the background error covariances, and the characteristics of

In this study, the new Gaussian transform method was only evaluated by measuring the NG of the transformed hydrometeors and diagnosing the characteristics of the background errors. The new Gaussian transform method will be implemented in a data assimilation system, and its application to the assimilation of radar reflectivity or satellite radiance retrievals will be studied in the near future. We also note that some Gaussian transform methods have been applied to other fields, such as the Gaussian anamorphosis method applied to precipitation variables (Lien et al., 2013; Kotsuki et al., 2017). It is worth exploring the application of this method to hydrometeors. In addition, some assimilation techniques based on a non-Gaussian framework, such as particle filters, have been developed and applied in the recent decade (Poterjoy, 2016; Buehner and Jacques, 2020; Kawabata and Ueno, 2020), and it is also worth exploring their handling of NG in the assimilation of cloud-sensitive observations at convective scales.
Acknowledgements. This research was funded by the National Key Research and Development Program of China (Grant No. 2017YFC1502102), the National Natural Science Foundation of China (Grant No. 42075148), and the Graduate Research and Innovation Projects of Jiangsu Province (Grant No. KYCX20_0910). The numerical calculations of this study were supported by the High-Performance Computing Center of Nanjing University of Information Science and Technology (NUIST).
