The study domain is located at 35.125°–47°N and 103°–126.875°E, which roughly covers northeast China. The grid size is 96 × 192 (lat × lon). The study domain and terrain features are shown in Fig. 1. In this paper, the ECMWF-IFS grid forecast data will also be referred to as the "forecast data", which provides predictor variables (inputs to CU-net); the ERA5 provides target variables (correct answers for outputs from CU-net).
Figure1. Study domain. The color bar represents the terrain altitude.
This study uses 14 years (2005–18) of ECMWF-IFS forecast and ERA5 data. The 2005–16 data were used as the training dataset, and the 2017 and 2018 data were used as the validation and test datasets, respectively. As the correction was performed at 24 h intervals from 24–240 h, we needed to train 10 models corresponding to each correction, i.e., 24 h, 48 h, 72 h, and so on, up to 240 h. For example, the input of the 24 h correction model included observation data (ERA5) and the 24 h forecast data of ECMWF-IFS at the issue time t, whereas the label data, or ground truth data, corresponded to the observation data at t + 24 h. Table 1 shows the training, validation, and testing data sample statistics for the 10 (24–240 h) models.
Lead time | Number of training examples |
24 h | 8760 |
48 h | 8758 |
72 h | 8756 |
96 h | 8754 |
120 h | 8752 |
144 h | 8750 |
168 h | 8748 |
192 h | 8746 |
216 h | 8744 |
240 h | 8742 |
Table1. Statistics of the training, validation, and testing datasets for 10 models. There are 730 validation and 730 testing examples for each lead time.
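The lead-time dependence of the counts in Table 1 can be reproduced with a short sketch. The issue-time convention is not stated in the text, so the assumption below (two forecast issues per day over the 12 training years, with two examples lost per additional day of lead time because the label at t + lead must still fall inside the training period) is ours; it is consistent with every row of the table.

```python
# Sketch (assumption): Table 1's counts follow from two forecast issue
# times per day over the 12 training years (2005-16), minus two examples
# per extra day of lead time.
ISSUES_PER_DAY = 2      # assumed 0000 and 1200 UTC issue times
TRAINING_DAYS = 4380    # 12 years x 365 days (leap days ignored here)

def num_training_examples(lead_hours: int) -> int:
    """Training examples available for one lead-time model."""
    lead_days = lead_hours // 24
    return ISSUES_PER_DAY * TRAINING_DAYS - ISSUES_PER_DAY * (lead_days - 1)

counts = {h: num_training_examples(h) for h in range(24, 241, 24)}
# counts[24] -> 8760, counts[48] -> 8758, ..., counts[240] -> 8742
```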
3.1. Deep learning model
The basic structure of the CU-net model proposed in this study is similar to U-net, which is based on CNN. A standard CNN consists of convolution, pooling, fully connected, and activation layers (Zeiler and Fergus, 2014). The convolution kernel of the convolution layer is similar to the filters used in image processing, such as the Sobel and Roberts filters, which have pre-determined weights known to be good for a certain operation (e.g., smoothing or edge detection). However, the weights of the convolution kernel in the CNN are learned autonomously through network training. The output of a convolution layer is a feature map. Pooling is a downsampling operation, often inserted between two convolutional layers. Pooling layers allow shallow convolutional layers to learn fine-scale features and deeper convolutional layers to learn large-scale features.

Compared to traditional convolutional networks, U-net has a large number of feature channels in both the downsampling and upsampling parts, resulting in a u-shaped network architecture as shown in Fig. 3. The downsampling side is constructed by stacking downsampling convolutional modules (downconv) on the left side, and the upsampling side is constructed by stacking upsampling convolutional modules (upconv) on the right side. Different layers receive data with different spatial resolutions on the two sides. The downconv modules accomplish the encoding process, which combines low-level features to obtain high-level features. High-level features have a higher level of abstraction, as they have passed through more convolutions and non-linear activations, but also a coarser spatial resolution due to the pooling operation. The downsampling and upsampling factors in this study are always 2 (i.e., the spatial resolution is always either halved or doubled).
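The effect of the fixed factor of 2 on the 96 × 192 grid can be illustrated with a minimal sketch that only tracks feature-map sizes through the encoder and decoder. The network depth of 4 is an illustrative assumption; the text only fixes the down/upsampling factor.

```python
# Minimal sketch of how spatial resolution changes through a u-shaped
# network with fixed down/upsampling factor 2, for the 96 x 192 grid
# used in this study. The depth of 4 is an assumption for illustration.
def resolution_path(h, w, depth=4, factor=2):
    """Return the (h, w) feature-map sizes along the encoder, then the decoder."""
    encoder = [(h, w)]
    for _ in range(depth):          # downconv: halve the resolution
        h, w = h // factor, w // factor
        encoder.append((h, w))
    decoder = []
    for _ in range(depth):          # upconv: double the resolution
        h, w = h * factor, w * factor
        decoder.append((h, w))
    return encoder, decoder

enc, dec = resolution_path(96, 192)
# enc: [(96, 192), (48, 96), (24, 48), (12, 24), (6, 12)]
# dec ends back at (96, 192), matching the input grid
```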
Figure3. Architecture of the CU-net. The stacked downsampling convolutional modules (downconv) on the left side accomplish the encoding process, whereas the stacked upsampling convolutional modules (upconv) on the right side are responsible for the decoding process. The green arrows represent skip connections, which can preserve fine-scale information from shallower layers leading to better predictions. The data flow through the U-net from the top left to the bottom, then to the top right.
The upconv modules perform the decoding process, which reconstructs the compressed information layer by layer and finally transforms it into predictions. The encoding and decoding processes are necessary because they turn the images into high-level features, which are better predictors than using raw pixel values, then they transform high-level features into final predictions. The input (pt+Δt and yt) is fed directly to a standard convolutional layer (conv), which is the first convolutional layer on the encoding side. On the decoding side, the last convolutional layer is used to output the correction result. It should be noted that the data flow through CU-net from the top left (i.e., first layer) to the bottom, then to the top right (i.e., last layer). The other convolutional layers on the decoding side deal with intermediate representations. The benefit of having so many intermediate representations is that different convolutional layers can detect features at different spatial resolutions.
The green arrows in Fig. 3 represent skip connections, which mean that the features in the encoding process are reused in the decoding process through the concat layer. The purpose of skip connections is to enable U-net to preserve fine-scale information from shallower layers. Without skip connections, U-net would amount to simple downsampling followed by simple upsampling, leading to worse predictions, because upsampling can never recover all the fine-scale information lost during downsampling, which means that U-net would destroy a lot of fine-scale information.
The detailed structures of downconv and upconv are shown in Fig. 4, where downconv and upconv are mainly composed of convolution, pooling, and activation layers. The only change to U-net in this study is that “sub-pixel” is used to replace “interpolation” in the upconv module. Shi et al. (2016) has shown that “sub-pixel” is superior to “interpolation”, as the interpolation method only uses handcrafted filters for upsampling, whereas the sub-pixel layer is capable of learning a better and more complex mapping filter for upsampling.
Figure4. Illustration of the detailed structures of downconv, upconv, and sub-pixel. The concat layer appends the channels from the skip connection to channels produced by the sub-pixel operation.
In the upconv module, the sub-pixel layer is responsible for the upsampling operation, which increases the data resolution by r times (r is set to 2 in this study). Through a series of upconv modules, the network will finally yield 96×192 (lat×lon) images, which have the same size as the input (i.e., the output grid has the same size as the input grid), which allows CU-net to make a prediction at every grid point. After data with a size of C × W × H pass through the sub-pixel layer, their size becomes (C × r × r) × W × H. Here, C represents the number of channels; W and H are the width and height of a feature map. Then the data are reshaped to C × (W × r) × (H × r), which leads to an increase in the data resolution by r times. The concat layer in the upconv module combines the features from the encoding process through a skip connection and the features from the decoding process through a sub-pixel operation. Assuming that the size of the feature map in the decoding process is C1 × W × H and the size of the feature map in the encoding process is C2 × W × H, the data size becomes (C1 + C2) × W × H after passing through the concat layer. The concat layer just appends the channels from the encoding process (skip connection) to the channels from the decoding process (sub-pixel operation).
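The (C × r × r) × W × H → C × (W × r) × (H × r) reshape described above can be made concrete with a pure-Python sketch. Nested lists stand in for real tensors here; this mirrors the index arithmetic of a sub-pixel (pixel-shuffle) layer such as PyTorch's PixelShuffle, not an actual deep learning implementation.

```python
# Pure-Python sketch of the sub-pixel reshape: a tensor of size
# (C*r*r) x W x H becomes C x (W*r) x (H*r). Each group of r*r input
# channels is interleaved into an r x r block of output pixels.
def pixel_shuffle(x, r):
    """x is a list of C*r*r channels, each a W x H nested list."""
    c_in = len(x)
    w, h = len(x[0]), len(x[0][0])
    c_out = c_in // (r * r)
    out = [[[0.0] * (h * r) for _ in range(w * r)] for _ in range(c_out)]
    for c in range(c_in):
        co, rem = c // (r * r), c % (r * r)
        di, dj = rem // r, rem % r          # sub-pixel position inside the block
        for i in range(w):
            for j in range(h):
                out[co][i * r + di][j * r + dj] = x[c][i][j]
    return out

# Four 1x1 channels with r = 2 become one 2x2 channel.
y = pixel_shuffle([[[1.0]], [[2.0]], [[3.0]], [[4.0]]], 2)
# y -> [[[1.0, 2.0], [3.0, 4.0]]]
```

The concat layer then simply appends the C2 skip-connection channels to the C1 channels produced by this operation, giving C1 + C2 channels of the same W × H size.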
As illustrated in Fig. 3, the input of the network is the observation data of the 2-m temperature yt and the 24 h forecast pt+Δt of the 2m-T at the issue time t; the output is the corrected 24 h forecast pt+Δt of the 2m-T. The observation data yt+Δt at t + 24 h is used as the ground truth during training.
Following the original U-net, the rectified linear unit (ReLU) is used as the activation function in CU-net (Nair and Hinton, 2010). The ReLU function is defined as: f(x) = max(0, x). The convolution filter size is 3 × 3, which is commonly used. The Adam optimizer (Kingma and Ba, 2015) and max pooling are used, and the pooling filter size is 2 × 2, which means that the pooling layer always reduces the size of each feature map by a factor of 2. In this study, the number of epochs is set to 50, which controls the number of complete passes through the training dataset. The learning rate is a hyper-parameter that controls how much the weights of the model are adjusted with respect to the loss gradient. It is typically set by the user and can be hard to choose well; past experience often helps in finding an appropriate value. The learning rate is 0.001 in this study. Batch normalization is useful for training very deep neural networks, as it normalizes the inputs to a layer. Large neural networks trained on relatively small datasets tend to overfit the training data, but the dropout technique can mitigate this effect. As our model is not very large or deep, no batch normalization or dropout is used. The number of CU-net parameters (i.e., all the weights in the conv, upconv, and downconv layers) is 14,791,233. The loss function of the CU-net model is defined as the mean squared error:

MSE = (1/N) Σ_{i=1}^{N} (y_i − y′_i)²,
where y is the ground truth (i.e., ERA5 data); y′ is the corrected forecast; and N is the number of grid points in one batch. To minimize the overhead and make maximum use of the GPU memory, the batch size, which indicates how many data samples are fed into the model for training at each time, is set to 32 in this study. Therefore, the size of N is 32×96×192.
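The loss and the size of N can be sketched directly from the definitions above; flat lists stand in for the actual tensors used during training.

```python
# Minimal sketch of the mean squared error loss over one batch, where N
# is the total number of grid points in the batch.
def mse_loss(y_true, y_pred):
    """Mean squared error over all grid points in a batch."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

# With batch size 32 on the 96 x 192 grid, N = 32 * 96 * 192 = 589824.
N = 32 * 96 * 192
```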
In summary, CU-net has symmetric u-shaped network architecture. After the ECMWF-IFS forecast data are input to CU-net, its encoding process turns the images into high-level features, which are better predictors than using low-level features or raw pixel values, then its decoding process transforms high-level features into final predictions. CU-net has many intermediate representations, whose benefit is that different convolutional layers can detect features at different spatial resolutions. It also uses skip connections to reuse the features from the encoding process in the decoding process to preserve fine-scale information from shallower layers. As the output grid has the same size as the input grid, CU-net is able to make a correction at every grid point. The above advantages are beneficial in correcting the forecast bias of numerical weather prediction models and make CU-net useful in this atmospheric application.
3.2. ANO method
As a traditional correction method, the ANO method is used for comparison with our CU-net technique in this study. The basic correction process of ANO is as follows. Given the coordinates (i, j) of a grid point, the model climate mean at (i, j) (i.e., the average value of all p_{i,j} over n years) is given by:

p̄_{i,j} = (1/n) Σ_{k=1}^{n} p_{i,j,k}.

The climate mean of observations is given by:

ȳ_{i,j} = (1/n) Σ_{k=1}^{n} y_{i,j,k}.

The corrected value is as follows:

pc_{i,j} = p_{i,j} − (p̄_{i,j} − ȳ_{i,j}),

where pc_{i,j} represents the corrected forecast value at (i, j); p_{i,j} is the original forecast value; and p̄_{i,j} and ȳ_{i,j} are the climate means of the forecasts and observations, respectively.
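The ANO procedure amounts to removing the mean bias between the model and observed climates at each grid point. The sketch below assumes this standard anomaly-correction formulation; the variable names are ours, not the paper's.

```python
# Sketch of the anomaly-based (ANO) correction at one grid point (i, j):
# subtract the model's climate mean and add back the observed climate mean.
def ano_correct(forecast, past_forecasts, past_observations):
    """Correct one forecast value at a grid point.

    past_forecasts / past_observations: values at the same grid point,
    date, and lead time over the previous n years.
    """
    n = len(past_forecasts)
    model_clim = sum(past_forecasts) / n     # climate mean of the model
    obs_clim = sum(past_observations) / n    # climate mean of observations
    return forecast - (model_clim - obs_clim)

# A model that runs 1 degree too warm on average is pulled back down:
# ano_correct(10.0, [9.0, 11.0], [8.0, 10.0]) -> 9.0
```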
4.1. Correction results for 2m-T
Figure 5 shows the RMSE spatial distribution of the corrected 24 h forecast for 2m-T in all seasons in 2018. Significance tests were conducted on the data for the whole year, and significant grid points (at the 95% confidence level) are represented in Fig. 5 with stippling.
Figure5. Root mean square error (RMSE) distributions of the corrected 24 h forecast of 2m-T in different seasons in 2018. The left column represents the forecast errors of ECMWF, whereas the middle and right columns are for the corrected products based on ANO and CU-net, respectively. In each panel, stippled points denote places where differences with respect to ECMWF-IFS are statistically significant at the 95% level. The number on the right gives the percentage of stippled points among all points, i.e., the grid points on which the correction methods improve over ECMWF-IFS.
The forecast RMSE of ECMWF-IFS (as shown in the left column of Fig. 5) was relatively large in spring and winter, and smaller in summer and autumn; the error over the ocean was very small, whereas the error over complex terrain was relatively large. Both ANO (middle column of Fig. 5) and CU-net (right column of Fig. 5) had smaller RMSE than the raw IFS output; however, CU-net outperformed ANO in every season, as well as for the whole year. Over areas with complex terrain, the RMSE of the ANO method exceeded 2.0°C, whereas CU-net reduced this to about 1.5°C.
Figure 6a shows the RMSE values for temperature for each model in different seasons. In all seasons, CU-net had smaller RMSEs than ANO, which in turn outperformed ECMWF-IFS. Table 2 shows the bias, mean absolute error (MAE), correlation coefficient, and RMSE values of the corrected 24 h forecast for 2m-T in 2018 using ECMWF, ANO, and CU-net. The confidence intervals are at the 95% confidence level. ANO achieves better performance than ECMWF-IFS, but CU-net has the best performance in terms of all evaluation metrics.
Figure6. RMSE of the corrected 24 h forecast in all seasons in 2018 for 2m-T (a), 2m-RH (b), 10m-WS (c), and 10m-WD (d). The confidence intervals at the 95% confidence level are shown with black error bars.
Score | ECMWF-IFS | ANO | CU-net |
RMSE | (1.68, 1.75) | (1.47, 1.52) | (1.21, 1.25) |
Bias | (0.27, 0.40) | (−0.17, −0.10) | (0.07, 0.14) |
MAE | (1.26, 1.31) | (1.11, 1.15) | (0.91, 0.94) |
CC | (0.95, 0.96) | (0.95, 0.96) | (0.96, 0.97) |
Table2. Bias, MAE, correlation coefficient (CC), and RMSE of the corrected 24 h forecast for 2m-T. The confidence intervals are at the 95% confidence level.
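The four scores reported in Tables 2–6 follow their standard definitions, which can be sketched as below. The paper's own computation may differ in detail (e.g., area weighting over the grid), so this is an illustrative implementation only.

```python
# Sketch of the four verification scores (standard definitions) over
# paired lists of observations and forecasts.
import math

def scores(obs, fcst):
    n = len(obs)
    bias = sum(f - o for o, f in zip(obs, fcst)) / n
    mae = sum(abs(f - o) for o, f in zip(obs, fcst)) / n
    rmse = math.sqrt(sum((f - o) ** 2 for o, f in zip(obs, fcst)) / n)
    mo, mf = sum(obs) / n, sum(fcst) / n
    cov = sum((o - mo) * (f - mf) for o, f in zip(obs, fcst))
    sd_o = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sd_f = math.sqrt(sum((f - mf) ** 2 for f in fcst))
    cc = cov / (sd_o * sd_f)                 # Pearson correlation coefficient
    return {"bias": bias, "mae": mae, "rmse": rmse, "cc": cc}

# A forecast that is uniformly 1 unit too high has bias = MAE = RMSE = 1
# but a perfect correlation of 1.
s = scores([0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0])
```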
Figure 7 shows an example 24 h forecast case at 1200 UTC on 11 January 2018. It is obvious that the corrected result using CU-net is more consistent with the observation (ERA5). It should be noted that ERA5 is reanalysis data, which is smoother than ECMWF and ANO. As CU-net uses ERA5 as the ground truth to perform correction, its result also seems smooth.
Figure7. Illustration of 24 h 2m-T forecast at 1200 UTC on 11 January 2018: (a) ECMWF; (b) corrected forecast using ANO; (c) corrected forecast using CU-net; (d) ERA5.
For longer-term forecasts of 2m-T, Fig. 8 shows the change in the RMSE of CU-net and ANO according to different forecast lead times (24–240 h). CU-net achieved the smallest RMSE for all forecast lead times. Even for the 240 h forecast, CU-net had a percentage decrease of 10.75%, compared to almost 0% for ANO.
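The "percentage decrease" quoted here and in the following subsections is, under the usual convention, the relative RMSE reduction of a corrected forecast with respect to ECMWF-IFS; the paper does not write the formula out, so this reading is an assumption.

```python
# Relative RMSE reduction of a corrected forecast versus the reference.
def percent_decrease(rmse_ref, rmse_corrected):
    return 100.0 * (rmse_ref - rmse_corrected) / rmse_ref

# e.g., reducing RMSE from 2.0 to 1.5 is a 25% decrease:
# percent_decrease(2.0, 1.5) -> 25.0
```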
Figure8. Change in the RMSE of the corrected 24 h forecast for 2m-T with respect to different forecast lead times. The evaluation was performed on the testing data (2018). The confidence intervals at the 95% confidence level are shown with black error bars.
4.2. Correction results for 2m-RH
Figure 9 shows the RMSE spatial distribution of the corrected 24 h forecast for 2m-RH. The same significance tests as in Fig. 5 were conducted for the data from different seasons. Compared to the ECMWF results in the left column of Fig. 9, both ANO and CU-net exhibited improved forecast accuracy; however, CU-net was superior to ANO for every season, as well as for the entire year. In Fig. 9, the areas marked in red represent a large RMSE of about 0.14. For the winter season, over the red areas, ANO and CU-net reduced the RMSE to 0.12 and <0.1, respectively.
Figure9. Same as Fig. 5, but for 2m-RH.
Figure10. Illustration of 24 h 2m-RH forecast at 1200 UTC on 19 October 2018: (a) ECMWF; (b) corrected forecast using ANO; (c) corrected forecast using CU-net; (d) ERA5.
Figure 6b shows the RMSE values for each model in different seasons for 2m-RH. The confidence intervals are at the 95% confidence level. ANO achieved positive correction performance in spring, autumn, and winter, but had negative performance during the summer. By contrast, CU-net achieved better performance in all seasons than ANO and ECMWF-IFS. Table 3 shows the bias, MAE, correlation coefficient, and RMSE values of the corrected 24 h forecast for 2m-RH. CU-net achieved the best performance for all four evaluation metrics.
Score | ECMWF-IFS | ANO | CU-net |
RMSE | (8.80, 9.11) | (8.32, 8.53) | (6.83, 7.03) |
Bias | (1.38, 1.92) | (0.08, 0.48) | (−0.22, 0.09) |
MAE | (6.47, 6.69) | (6.19, 6.36) | (5.09, 5.23) |
CC | (0.88, 0.89) | (0.88, 0.89) | (0.91, 0.92) |
Table3. Same as Table 2 but for 2m-RH.
Figure 10 shows an example 24 h forecast case at 1200 UTC on 19 October 2018, to illustrate that the corrected result using CU-net is more consistent with the observation (ERA5).
For longer-term forecasts of 2m-RH, Fig. 11 shows the change in the RMSE of CU-net and ANO, according to different forecast lead times (24–240 h). Similar to the results of the 2m-T correction discussed above, CU-net achieved the smallest RMSE for all forecast lead times. For the 240 h forecast correction, ANO showed a percentage decrease of –2.26% compared to 20.14% for CU-net.
Figure11. Same as Fig. 8, but for 2m-RH.
4.3. Correction results for 10m-WS
Figure 12 shows the RMSE spatial distribution of the corrected 24 h forecast for 10m-WS. Significance tests were also conducted in the same way as in Fig. 5. Compared to the results of ECMWF in the left column of Fig. 12, ANO showed no improvement, whereas CU-net showed obvious improvements in all seasons.
Figure12. Same as Fig. 5, but for 10m-WS.
Figure 6c shows the RMSE values for each model in different seasons for 10m-WS. ANO improved over the ECMWF only in autumn and had negative performance for all other seasons, whereas CU-net achieved improvement in all seasons. Table 4 shows the bias, MAE, correlation coefficient, and RMSE values of the corrected 24 h forecasts for 10m-WS using ECMWF, ANO, and CU-net. The confidence intervals are at the 95% confidence level. Again, CU-net has the best correction performance in terms of all evaluation metrics.
Score | ECMWF-IFS | ANO | CU-net |
RMSE | (1.05, 1.08) | (1.06, 1.09) | (0.76, 0.79) |
Bias | (−0.22, −0.20) | (0.13, 0.15) | (−0.01, 0.02) |
MAE | (0.76, 0.79) | (0.80, 0.82) | (0.55, 0.57) |
CC | (0.84, 0.86) | (0.84, 0.85) | (0.89, 0.90) |
Table4. Same as Table 2 but for 10m-WS.
Figure 13 shows the 24 h forecast at 1200 UTC on 15 March 2018; the red ellipse marks a region where ECMWF and ANO show obvious errors, whereas the CU-net correction is more consistent with the observations.
Figure13. Illustration of 24 h 10m-WS forecast at 1200 UTC on 15 March 2018: (a) ECMWF; (b) corrected forecast using ANO; (c) corrected forecast using CU-net; (d) ERA5. The red ellipse indicates that ECMWF and ANO have obvious error while CU-net’s correction is more consistent with the observations.
For longer-term forecasts of 10m-WS, Fig. 14 shows the change in the RMSEs of CU-net and ANO according to the forecast lead time. Similar to the results of the 2m-T and 2m-RH corrections, CU-net achieved the smallest RMSE for all forecast lead times. In general, ANO did not perform well for 10m-WS forecast correction and had no positive correction effect at any lead time, whereas CU-net continued to provide positive results as the forecast lead time approached 240 h.
Figure14. Same as Fig. 8, but for 10m-WS.
4.4. Correction results for 10m-WD
Figure 15 shows the RMSE spatial distribution of the corrected 24 h forecast for 10m-WD. Significance tests were also conducted in the same way as in Fig. 5. Similar to the results for 10m-WS correction, ANO showed no improvement and in some cases showed worse results. CU-net showed improvements in all seasons. Notably, the correction of wind direction has been a challenging issue, as described in previous studies (Bao et al., 2010).
Figure15. Same as Fig. 5, but for 10m-WD.
Figure 6d shows the RMSE values for each model in different seasons for 10m-WD. ANO did not show a positive performance in any season. By contrast, CU-net achieved a positive performance in all seasons. Table 5 shows the bias, MAE, correlation coefficient, and RMSE values of the corrected 24 h forecast for 10m-WD. The confidence intervals are at the 95% confidence level. Figure 16 shows the forecast results on 18 April 2018. Similar to previous experiments, CU-net’s correction is more consistent with the observation although it has a smoothing effect.
Score | ECMWF-IFS | ANO | CU-net |
RMSE | (38.22, 39.41) | (38.84, 40.09) | (30.98, 32.24) |
Bias | (23.11, 24.05) | (23.82, 24.79) | (17.77, 18.67) |
MAE | (23.11, 24.05) | (23.81, 24.79) | (17.77, 18.67) |
CC | (0.60, 0.62) | (0.59, 0.61) | (0.68, 0.70) |
Table5. Same as Table 2 but for 10m-WD.
Figure16. Illustration of 24 h 10m-WD forecast at 1200 UTC on 18 April 2018: (a) ECMWF; (b) corrected forecast using ANO; (c) corrected forecast using CU-net; (d) ERA5.
For longer-term forecasts of 10m-WD, Fig. 17 shows the change in the RMSE of CU-net and ANO according to different forecast lead times. Again, CU-net achieved the smallest RMSE for all forecast lead times. Similar to the 10m-WS correction, ANO did not perform well for the 10m-WD forecast correction and did not have a positive correction effect for any lead time. Although CU-net continued to provide positive results as the forecast lead time increased, its performance continued to degrade from 18.57% (24 h) to 3.7% (240 h).
Figure17. Same as Fig. 8, but for 10m-WD.
4.5. Using reliability curves for conditional forecast verification
In order to determine whether CU-net has conditional bias, we use reliability curves to further evaluate its performance for each forecast value. The 24 h forecast results are used for analysis. As shown in Fig. 18, the reliability curve has forecast values on the x-axis and the mean observation on the y-axis. The black line is the perfect-reliability line (x = y), which represents the observation ERA5 (correct answers).
Figure18. Reliability curves of the corrected 24 h forecast in 2018 for 2m-T (a), 2m-RH (b), 10m-WS (c), and 10m-WD (d).
Figures 18a and b show that for 2m-T and 2m-RH, CU-net performs better than the other methods. For 10m-WS, as shown in Fig. 18c, CU-net achieves good overall performance, though it slightly underestimates whenever it predicts more than about 6.5 m s⁻¹. For 10m-WD, Fig. 18d shows that CU-net fits the diagonal line distinctly better than the other models. Moreover, all panels show that each model has some conditional bias, with CU-net having the smallest.
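The reliability-curve construction described above can be sketched as follows: bin the forecast values and compare the mean observation in each bin with the bin's mean forecast. The bin edges below are an illustrative choice, not the paper's.

```python
# Sketch of a reliability curve: one (mean forecast, mean observation)
# point per non-empty forecast bin. On a perfectly reliable model the
# two coordinates coincide (the x = y line in Fig. 18).
def reliability_curve(fcst, obs, edges):
    points = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        pairs = [(f, o) for f, o in zip(fcst, obs) if lo <= f < hi]
        if pairs:
            mf = sum(f for f, _ in pairs) / len(pairs)
            mo = sum(o for _, o in pairs) / len(pairs)
            points.append((mf, mo))
    return points

pts = reliability_curve([1.0, 1.0, 3.0], [0.5, 1.5, 3.0], [0.0, 2.0, 4.0])
# pts -> [(1.0, 1.0), (3.0, 3.0)]: both bins lie on the perfect-reliability line
```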
4.6. Correction results after incorporating terrain information
This section describes an additional experiment that was conducted to test whether including terrain information in the proposed CU-net model could offer further improvements in the correction (Steinacker et al., 2006). The terrain data Q (a grid of orographic height), along with pt+Δt and yt, were input into the CU-net model, as shown in Fig. 3; the new model with terrain data is referred to as TCU-net. The experimental results of the 24 h forecast correction are shown in Table 6. The confidence intervals are at the 95% confidence level. It can be seen that TCU-net improved the performance, as all four weather variables showed smaller RMSEs after including the terrain information.
Variables | ECMWF | CU-net | TCU-net |
2m-T (°C) | (1.68, 1.75) | (1.21, 1.25) | (1.17, 1.21) |
2m-RH (%) | (8.80, 9.11) | (6.83, 7.03) | (6.60, 6.77) |
10m-WS (m s⁻¹) | (1.05, 1.08) | (0.76, 0.79) | (0.76, 0.79) |
10m-WD (°) | (38.22, 39.41) | (30.98, 32.24) | (30.78, 32.05) |
Table6. RMSE of the corrected 24 h forecast in 2018 for four weather variables using ECMWF, CU-net, and TCU-net. The confidence intervals are at the 95% confidence level.