2.1. Data
The TC best-track dataset over the WNP, containing the maximum sustained surface wind speed and location (longitude and latitude) information at 6-hour intervals, was obtained from the Shanghai Typhoon Institute (STI) of the China Meteorological Administration (CMA). In this study, TC intensity is defined as the maximum two-minute average 10-m wind speed (V). TCs with V ≥ 17 m s⁻¹ were selected as samples to develop the LGEM. All data over land were excluded since the maximum potential intensity (MPI) included in the LGEM is limited to the ocean. TC samples during 1982–2014 were used to construct the LGEM, while those that occurred during 2015–17 were used as independent samples to evaluate the prediction skill of the LGEM. Figure 1 shows the numbers of training and test samples; the training samples account for more than 90% of the total. To further evaluate the performance of the LGEM, the official real-time forecast data of TC intensity from the CMA during 2015–19 were obtained from the TC operational database at the STI.
Figure 1. The numbers of training and testing samples for different forecast times at 6-h intervals.
Over the past decade, the WNP Intensity Prediction Scheme developed by the STI (WIPS; Chen et al., 2011) has been continuously operating and has generally shown good skill among the CMA’s operational intensity forecast models (Chen et al., 2019). In this study, we used the same inputs as the operational WIPS model, including potential predictors and MPI. Following the WIPS model, we used the 6-hourly reanalysis data with a horizontal resolution of 2.5° × 2.5° from the National Centers for Environmental Prediction and National Center for Atmospheric Research (NCEP/NCAR) (Kalnay et al., 1996) to calculate the various environmental predictors. Note that the location of the TC center is also needed to calculate the predictors in this study. The weekly optimum interpolation (OI) SST V2 data at a horizontal resolution of 1° × 1° from the National Oceanic and Atmospheric Administration (NOAA) (Reynolds et al., 2002) were used to calculate the ocean predictors after linear interpolation to 6-hourly data. Furthermore, the NCEP Global Forecasting System (GFS) forecast fields (Yang et al., 2006) during 2017–19 were also used for additional applications.
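The linear interpolation of weekly SST to 6-hourly values mentioned above can be sketched as follows. This is only a minimal illustration, assuming equally spaced weekly values at a single grid point; the function name `interpolate_weekly_to_6h` and the sample values are hypothetical and are not taken from the WIPS or LGEM code.

```python
def interpolate_weekly_to_6h(weekly_sst):
    """Linearly interpolate equally spaced weekly values to 6-hourly values.

    weekly_sst: list of SST values (deg C), one per week, at one grid point.
    Returns a list running from the first to the last weekly value inclusive.
    """
    steps_per_week = 7 * 24 // 6  # 28 six-hour steps in one week
    if len(weekly_sst) < 2:
        return list(weekly_sst)
    out = []
    for left, right in zip(weekly_sst[:-1], weekly_sst[1:]):
        for k in range(steps_per_week):
            frac = k / steps_per_week
            out.append(left + frac * (right - left))
    out.append(weekly_sst[-1])  # close the series with the final weekly value
    return out


# Hypothetical weekly SSTs (deg C) for three consecutive weeks.
sst_6h = interpolate_weekly_to_6h([28.0, 29.4, 28.7])
```

Two weekly intervals yield 2 × 28 + 1 = 57 six-hourly values, with the original weekly values preserved at the interval boundaries.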
2.2. Methodology
2.2.1. The LGE
Following DeMaria (2009), the generalized prediction equation for TC intensity (V) based on the LGE can be written as
$$\frac{dV}{dt} = \kappa V - \beta \left(\frac{V}{V_{\mathrm{mpi}}}\right)^{n} V, \tag{1}$$
where dV/dt is the intensity tendency, Vmpi is the MPI, κ is the time-dependent growth rate, and β and n are two positive constants that determine the magnitude of diffusive processes caused by the ocean and atmosphere. The TC intensity tendency is mainly determined by the growth and the diffusion processes. The first term on the right-hand side of the equation is the intensity growth term, which is determined by the degree of (un)favorable environmental factors. The second term reflects the diffusive processes, which include the increase in friction that occurs along with the intensity growth and the damping process that occurs when the TC moves into colder SSTs or an otherwise unfavorable atmospheric environment. For simplicity, the 6-h forward difference will be used to approximate V every six hours from 6 to 168 h.
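The forward-difference integration described above can be illustrated with a short sketch. It assumes the LGE form of DeMaria (2009), dV/dt = κV − β(V/Vmpi)^n V, and uses β = 0.023 h⁻¹ and n = 2.3 (the values reported in section 3.3) as defaults. The function names, the clamp at zero, and the sample inputs are assumptions made for illustration and are not part of the published scheme.

```python
def lge_tendency(v, kappa, v_mpi, beta=0.023, n=2.3):
    """Intensity tendency dV/dt (m/s per hour) from the assumed LGE form."""
    return kappa * v - beta * (v / v_mpi) ** n * v


def integrate_lge(v0, kappas, v_mpis, dt_hours=6.0, beta=0.023, n=2.3):
    """Step V forward with a forward (Euler) difference.

    kappas and v_mpis hold one value per step (kappa in 1/h, MPI in m/s).
    Returns the predicted V after each step.
    """
    v = v0
    track = []
    for kappa, v_mpi in zip(kappas, v_mpis):
        v = v + dt_hours * lge_tendency(v, kappa, v_mpi, beta, n)
        v = max(v, 0.0)  # assumption: keep the wind speed non-negative
        track.append(v)
    return track


# Hypothetical 7-day forecast: 28 six-hour steps with constant kappa and MPI.
forecast = integrate_lge(25.0, [0.03] * 28, [70.0] * 28)
```

With constant κ and MPI, the intensity relaxes toward the steady state V = Vmpi (κ/β)^(1/n), at which the growth and diffusion terms balance.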
2.2.2. LightGBM
In this study, we applied a step-wise regression (SWR) method and an ML method for the LGE-based TC intensity forecast. Here, the ML method used is LightGBM, which is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms (Ke et al., 2017). It originates from the gradient boosting decision tree (GBDT) but significantly improves scalability and reduces computational time by adopting a leaf-wise tree growth strategy and introducing novel techniques. Previous studies have demonstrated that LightGBM offers good prediction performance, requires little computational time, and is a promising ML method (Ju et al., 2019; Zhang et al., 2019). In addition, since the average lifetime of TCs is about one week, the number of samples rapidly decreases from 21330 to 3905 for the predictions every six hours from 6 h to 168 h (seven days; Fig. 1). LightGBM handles such large changes in sample size well. Therefore, we apply it to the LGEM construction and compare its prediction performance with that of conventional regression.
2.2.3. RMSE
Here, the root-mean-square error (RMSE) was used to evaluate the intensity prediction skill of the LGEM. The RMSE is calculated as
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(f_i - o_i\right)^{2}}, \tag{2}$$
where f_i is the forecast V of the i-th sample, o_i is the corresponding observed V, and m is the number of samples.
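As a minimal sketch of this metric, the RMSE of paired forecast and observed intensities can be computed as below. The function name `rmse` and the sample values are illustrative assumptions.

```python
import math


def rmse(forecasts, observations):
    """Root-mean-square error between paired forecasts and observations."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    m = len(forecasts)
    squared = sum((f - o) ** 2 for f, o in zip(forecasts, observations))
    return math.sqrt(squared / m)


# Hypothetical forecast and best-track intensities (m/s) at one lead time.
error = rmse([30.0, 40.0], [33.0, 44.0])
```

In this example the two errors are 3 and 4 m s⁻¹, so the RMSE is the square root of (9 + 16)/2.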
2.2.4. POD and FAR
The skill of TC rapid intensification and rapid weakening forecasts was evaluated using the probability of detection (POD) and the false alarm rate (FAR) (Wilks, 2006). The POD is the percentage of rapid intensification or rapid weakening events that are correctly forecast. The FAR is the number of times that an event is forecast to occur but does not, divided by the total number of times that an event does not occur.
To quantify the relative importance of the potential predictors in affecting TC intensity changes, we employed the Lindeman, Merenda, and Gold method (LMG; Lindeman, 1980) of the relaimpo package (Groemping, 2006) within the R environment for statistical computing (R Core Team, 2013). The LMG method takes the average of the sequential sums of squares over all orderings of regressors, which addresses both the direct effects and those effects adjusted for other regressors in the model.
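The POD and FAR defined in this subsection can be sketched from paired yes/no forecasts and observations. The FAR below follows the definition given in the text (false alarms divided by all non-events), which differs from the false alarm ratio that divides by the number of "yes" forecasts. The function name and the sample sequences are hypothetical.

```python
def pod_and_far(forecast_events, observed_events):
    """Return (POD, FAR) from paired boolean forecasts and observations.

    POD = hits / (hits + misses)
    FAR = false alarms / (false alarms + correct negatives)
    A score is NaN when its denominator is zero.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for forecast, observed in zip(forecast_events, observed_events):
        if forecast and observed:
            hits += 1
        elif observed:
            misses += 1
        elif forecast:
            false_alarms += 1
        else:
            correct_negatives += 1
    events = hits + misses
    non_events = false_alarms + correct_negatives
    pod = hits / events if events else float("nan")
    far = false_alarms / non_events if non_events else float("nan")
    return pod, far


# Hypothetical sequence of six forecast/observation pairs.
pod, far = pod_and_far(
    [True, True, False, False, True, False],
    [True, False, True, False, False, False],
)
```

Here one of two observed events is detected and two of four non-events are falsely forecast, so both scores equal 0.5.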
3.1. Predictor selection
Factors affecting TC intensity vary from basin to basin. DeMaria (2009) constructed the North Atlantic and eastern North Pacific LGEMs based on the predictors from the simple Statistical Hurricane Intensity Prediction Scheme (SHIPS). As mentioned above, the potential predictors in this study were selected based on the WIPS. As shown in Table 1, these predictors include the climatology and persistence predictors and the atmospheric and oceanic predictors for each 6-h forecast interval out to 168 h (seven days). Similar to the WIPS, all of these were derived along the TC tracks. The MPI was estimated using the equation of Knaff et al. (2005). Moreover, we tested three other common MPI formulas over the WNP as inputs (DeMaria and Kaplan, 1994; Baik and Paek, 1998; Zeng et al., 2007). The results show that the MPI of Knaff et al. (2005), when used in the LGE-based model, generally yields better skill in forecasting TC intensity than the others, so it was selected in this study. Following Knaff et al. (2005), the maximum value of MPI is set to 95 m s⁻¹ (185 kt) to avoid unreasonable MPI values.
Predictors | Units | Description |
VWS | m s⁻¹ | The averaged vertical wind shear between 200 and 850 hPa within a radius of 5 degrees of the TC center |
CMV | m s⁻¹ | The meridional component of TC moving speed |
TMP20 | K | The averaged 200-hPa temperature within a radius of 5–10 degrees of the TC center |
VOR85_lon | ° | The longitude of the greatest vorticity at 850 hPa in the range of 2 degrees (4 degrees) of the TC center at 0–24 h (>24 h) forecasts |
VOR85_lat | ° | The latitude of the greatest vorticity at 850 hPa in the range of 2 degrees (4 degrees) of the TC center at 0–24 h (>24 h) forecasts |
RH5030 | % | The averaged relative humidity at 500–300 hPa within a radius of 5–10 degrees of the TC center |
RH8570 | % | The averaged relative humidity at 850–700 hPa within a radius of 5–10 degrees of the TC center |
PENV | hPa | The averaged sea level pressure within a radius of 5–10 degrees of the TC center |
SST | °C | Sea surface temperature at the TC center |
H50 | gpm | The 500-hPa geopotential height at the TC center |
AT850 | K | The 850-hPa temperature difference between the left and right semicircles relative to the TC moving path |
DIV20 | s⁻¹ | The averaged divergence at 200 hPa within a radius of 5–10 degrees of the TC center |
DIV85 | s⁻¹ | The averaged divergence at 850 hPa within a radius of 5–10 degrees of the TC center |
AV85 | s⁻¹ | The averaged absolute vorticity at 850 hPa within a radius of 5–10 degrees of the TC center |
AU | m s⁻¹ | The averaged zonal wind at 200 hPa within a radius of 0–5 degrees of the TC center |
MPI | m s⁻¹ | Maximum potential intensity |
DV12 | m s⁻² | Previous 12-h intensity change |
Table 1. Description of the potential predictors.
Since the predictors are vital to a statistical model, we first reexamine them using correlation and relative importance analyses. Note that all of the predictors, as well as the predictands, were normalized before they were further analyzed. Figure 2 illustrates the scatter distributions of the potential predictors and the 24-h TC intensity tendency from 1982 to 2014. As expected, these predictors show correlations with the 24-h TC intensity change that are significant at the 99% confidence level, except for the average 200-hPa divergence (DIV20). Most notable is the strong correlation between the MPI and the 24-h TC intensity change, with a correlation coefficient of 0.48. There are two reasons why the relationship is examined only for the 24-h TC intensity change. First, the 24-h centered time difference is used to determine β and n and to calculate the “observed” κ, as described in section 3.3, which is consistent with DeMaria (2009). Second, the 6-h forward difference is used to predict TC intensity, as indicated in section 2.2.1, which means that the predictors at the six hours preceding each forecast time are also important. Compared to the 24-h TC intensity change, the 6-h TC intensity change shows similar correlations with the potential predictors (not shown).
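The normalization and correlation steps can be sketched as follows. The text does not specify the normalization, so the z-score standardization used here is an assumption, as are the function names and the sample values.

```python
import math


def standardize(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    m = len(values)
    mean = sum(values) / m
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / m)
    if std == 0.0:
        raise ValueError("cannot standardize a constant series")
    return [(x - mean) / std for x in values]


def pearson_r(x, y):
    """Pearson correlation as the mean product of standardized values."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


# Hypothetical predictor values and 24-h intensity changes (m/s).
mpi = [55.0, 62.0, 70.0, 48.0, 80.0]
dv24 = [2.0, 5.0, 6.0, -3.0, 9.0]
r = pearson_r(mpi, dv24)
```

Because the Pearson coefficient is invariant under linear rescaling, normalizing the predictors and predictands does not change the correlations themselves; it only puts the variables on a common scale for the later regression and relative-importance analyses.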
Figure 2. Scatter plots of environmental factors and 24-h TC intensity changes. The regressed line is marked in each subplot, and the corresponding correlation coefficient is shown in the lower right corner.
Further, we calculated the relative contribution of each factor that affects TC intensity change using the LMG method introduced in section 2. As shown in Fig. 3, among all of the factors, the previous 12-h intensity change (DV12), MPI, the latitude of the greatest vorticity at 850 hPa (VOR85_lat), and SST contribute the most to TC intensity changes, with contributions of 33.0%, 8.3%, 5.6%, and 5.5%, respectively, all of which are statistically significant above the 95% bootstrap confidence level. In contrast, the absolute vorticity at 850 hPa, the 850-hPa temperature difference between the right and left semicircles relative to the TC track, and the 500-hPa geopotential height contribute the least to TC intensity. Based on the above correlation and relative importance analyses, the following optimal predictors were selected to construct the LGEM: DV12, MPI, VWS, AU, TMP20, VOR85_lat, VOR85_lon, RH8570, RH5030, and SST, each of which contributes more than 0.5%.
Figure 3. Distribution of relative importance (%) of potential predictors.
3.2. Construction of the LGEM over the WNP
With the optimal predictors and the LGE introduced in section 2, the LGE-based TC intensity forecast scheme over the WNP is developed from the TC best-track data and the reanalysis data. A separate set of submodules is used to predict TC intensity every six hours, from 6 h to 168 h.
Figure 4 summarizes the workflow for constructing the LGEM. The workflow consists of three parts: data preprocessing, model development, and model prediction. In the data preprocessing, the optimal predictors and predictands every six hours from 0 to 168 h were calculated using the historical CMA TC best-track data, NCEP/NCAR reanalysis, and NOAA SST data during 1982–2017. The training dataset during 1982–2014 is used to build the LGEM by fitting the two constant parameters β and n and estimating the growth rate κ. The two constants are determined by the least-squares method, which makes the growth rate regressed from the optimal predictors as close as possible to the "observed" growth rate. The growth rate is then estimated with the SWR and LightGBM, respectively. Furthermore, the testing dataset during 2015–17 is used to assess the performance of the LGEM by predicting κ and then the TC intensity. Finally, the CMA real-time forecast dataset of TC intensity is compared to the LGEM to further evaluate its forecast potential.
Figure 4. A schematic diagram of the prediction system of LGEM, including data preprocessing, model development, and model prediction.
3.3. Estimation of the constants β and n
In order to determine the values of β and n, Eq. (1) can be rewritten as
$$\kappa = \frac{1}{V}\frac{dV}{dt} + \beta \left(\frac{V}{V_{\mathrm{mpi}}}\right)^{n}, \tag{3}$$
where dV/dt was calculated from the best-track intensities of TCs over water during 1982–2014 using a 24-h centered time difference, similar to DeMaria (2009). First, we discretized β from 0 to 0.05 at intervals of 0.001 and n from 0 to 5 at increments of 0.1, guided by the values over the Atlantic (DeMaria, 2009), for which the final values of β and n were 1/24 and 2.5. Using historical observed TC intensity and MPI data, we can calculate the "observed" κ (denoted as κ1) with Eq. (3). Then, we can also obtain the estimated κ (denoted as κ2) from the regression equations using the above optimal predictors derived from reanalysis data. κ1 and κ2 were recalculated for each pair of β and n, and the final pair was chosen by minimizing the total square error between κ1 and κ2. Figure 5 shows the distribution of total square errors of the growth rate κ between observation and regression as a function of β and n based on the samples during 1982–2014. The total square error reaches a minimum of 18.035 when the values of β and n are 0.023 h⁻¹ and 2.3, respectively, which are very close to their counterparts over the Atlantic (β = 0.025 h⁻¹ and n = 2.6). This suggests that although the factors that affect TC intensity changes over the WNP differ from those over the Atlantic basin, the values of β and n are similar.
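The grid search over β and n can be sketched on synthetic data. This is a simplified illustration under stated assumptions: it uses the rearranged LGE, κ = (1/V) dV/dt + β(V/Vmpi)^n, replaces the multi-predictor regression with a least-squares line on a single hypothetical predictor, and assumes dV/dt has already been computed. All names and the synthetic samples are invented for the example and are not the authors' code or data.

```python
import random


def observed_kappa(v, dvdt, v_mpi, beta, n):
    """Rearranged LGE: kappa = (1/V) dV/dt + beta * (V / Vmpi)**n."""
    return dvdt / v + beta * (v / v_mpi) ** n


def line_fit_sse(x, y):
    """Sum of squared residuals of the least-squares line y ~ a + b * x."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def grid_search_beta_n(v, dvdt, v_mpi, predictor, betas, ns):
    """Return (sse, beta, n) minimizing the error between kappa1 and its fit."""
    best = None
    for beta in betas:
        for n in ns:
            kappa1 = [observed_kappa(vi, di, mi, beta, n)
                      for vi, di, mi in zip(v, dvdt, v_mpi)]
            sse = line_fit_sse(predictor, kappa1)
            if best is None or sse < best[0]:
                best = (sse, beta, n)
    return best


# Synthetic samples generated with known constants (illustration only).
rng = random.Random(0)
true_beta, true_n = 0.023, 2.3
v = [rng.uniform(17.0, 60.0) for _ in range(100)]
v_mpi = [rng.uniform(60.0, 95.0) for _ in range(100)]
x = [rng.uniform(-1.0, 1.0) for _ in range(100)]
kappa_true = [0.01 + 0.02 * xi for xi in x]  # kappa is linear in the predictor
dvdt = [k * vi - true_beta * (vi / mi) ** true_n * vi
        for k, vi, mi in zip(kappa_true, v, v_mpi)]

betas = [i * 0.001 for i in range(51)]  # 0 to 0.05 in steps of 0.001
ns = [j * 0.1 for j in range(51)]       # 0 to 5 in steps of 0.1
sse, beta_hat, n_hat = grid_search_beta_n(v, dvdt, v_mpi, x, betas, ns)
```

Because the synthetic κ is exactly linear in the predictor, the residual error vanishes (up to rounding) only at the grid point matching the constants used to generate the data, so the search recovers them. With real data the minimum is nonzero, as in the value of 18.035 reported above.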
Figure 5. The distribution of total square errors of the growth rate κ between observation and regression as a function of β and n.
3.4. Estimation of the growth rate κ
According to DeMaria (2009), the growth rate κ is a function of large-scale variables and persistence predictors, which are time-dependent. After determining the constant parameters β and n, we can obtain the values of the “observed” κ using Eq. (3). Then, the SWR and LightGBM were each trained to predict κ using the optimal predictors and the “observed” κ. As mentioned above, the training dataset during 1982–2014 was used to learn the relationship between the predictors and κ. As a result, a separate set of regression models and a separate set of LightGBM models were built to predict κ every six hours from 6 to 168 h. Using these two sets of models and the testing dataset during 2015–17, we can predict κ at each forecast time. Given that κ and the other parameters in Eq. (1) are known, the LGEM with a forward-time-differencing scheme was used to predict the intensity (V) at each forecast time.
4.1. Case study demonstration
The test cases are Typhoon Maysak (201504) and Typhoon Champi (201525), both of which lasted for more than 10 days over the WNP and experienced rapid intensification, but exhibited different tracks and intensity changes. Figures 6a–6d show the tracks and intensities of these two TCs. Maysak formed east of Pohnpei on 27 March as a tropical storm, intensified to a super typhoon on 31 March with an intensity of 65 m s⁻¹, and weakened to a tropical storm before striking the Philippines. Champi formed northeast of the Marshall Islands on 13 October, intensified to a typhoon on 16 October, and reached a peak intensity of 55 m s⁻¹ on 18 October. Then, Champi started to weaken but experienced a short-lived re-intensification on 22 October. It became an extratropical cyclone on 25 October before fully dissipating on 28 October.
Figure 6. (a, b) Tracks for Maysak and Champi in 2015 and (c, d) the corresponding intensity (blue) and the calculated growth rate κ (red) at 6-h intervals based on the CMA best-track data; the 7-day forecasts of the intensity (unmarked color lines) for (e, g) Maysak and (f, h) Champi in 2015 at different forecast times with 6-h intervals based on (e, f) SWR-based and (g, h) LightGBM-based LGEMs and the corresponding CMA best-track intensity (red dotted line). In (e–h), the unmarked color lines denote 7-day TC intensity predictions at 6-h intervals, and the first point of each line indicates the initial forecast time.
Figures 6c and 6d show the evolution of the observed values of the growth rate κ for these two TCs. For Typhoon Maysak, κ maintained a high positive value during the early stages of TC genesis and development, and then reached a second maximum 6–12 hours before Maysak reached peak intensity. Afterwards, κ gradually decayed and became negative during the decaying period. The evolution of κ in Typhoon Champi is similar to that in Typhoon Maysak, but κ in Typhoon Champi also experienced another peak before the TC re-intensification. It should be noted that in the early stages, although the value of κ is large because environmental factors are conducive to TC development, the net effect of κ is relatively small because the TC intensity is small. At the development and peak stages, the changes in κ are consistent with, and tend to lead, those in TC intensity, which suggests that the effect of κ is vital. This indicates that κ in Eq. (1) is indeed a reasonable representation of the processes promoting TC development.
Figures 6e–6h show the maximum winds from the 7-day forecasts of the SWR-based and LightGBM-based LGEMs and the CMA best track for Typhoon Maysak and Typhoon Champi. Both LGEMs reproduce the intensity evolution of the corresponding TCs reasonably well. It is worth noting that the LightGBM-based scheme demonstrates better skill in predicting the rapid intensification and re-intensification of the TCs, with a smaller mean bias and a smaller spread than the SWR-based scheme. In contrast, the SWR-based scheme incurs large errors in predicting TC peak intensity. To further compare the forecast performance, we calculated the RMSEs of the two LGEMs for the two cases at lead times from 24 to 168 h every 24 h. As shown in Table 2, the RMSEs of the LightGBM-based scheme are smaller than those of the SWR-based scheme except for the 144-h and 168-h forecasts for Typhoon Champi. We also compared the forecasts of the LGEM with those from the CMA (not shown) and found that the LGEM forecasts generally show better skill at every forecast time. The evidence suggests that the LGEM, especially the ML scheme, is promising for predicting TC intensity.
Forecast time (h) | Maysak, LGEM (SWR) | Maysak, LGEM (LightGBM) | Champi, LGEM (SWR) | Champi, LGEM (LightGBM) |
24 | 5.1 | 5.0 | 5.3 | 3.7 |
48 | 6.1 | 4.9 | 8.0 | 4.7 |
72 | 7.3 | 6.1 | 9.5 | 5.4 |
96 | 9.2 | 7.5 | 10.3 | 7.7 |
120 | 9.0 | 6.5 | 9.8 | 9.1 |
144 | 8.3 | 4.9 | 9.1 | 9.8 |
168 | 8.0 | 4.3 | 8.5 | 10.4 |
Table 2. RMSEs of intensity forecasts for Maysak and Champi in 2015 at 24, 48, 72, 96, 120, 144, and 168 h forecasts. Smaller RMSEs between the two methods are shown in boldface.
4.2. Comprehensive verifications
To confirm the results from the above case study, we further examine the forecast performance of the LGEM based on the 2015–17 TC samples, which include 80 TCs. First, we calculated the RMSEs of the 7-day dV/dt forecasts in Eq. (1) from the SWR-based and LightGBM-based LGEMs at 6-h intervals for the independent cases during 2015–17. Since a forward-time-differencing scheme every 6 h from 6 to 168 h was used to predict V at each forecast time, dV/dt denotes the rate of TC intensity change between the forecast time and 6 h before the forecast time. Generally, the RMSEs of the dV/dt forecasts at 6–168 h are similar, ranging from 1.09 × 10⁻⁴ m s⁻² to 1.38 × 10⁻⁴ m s⁻² for the LightGBM-based LGEM and from 1.07 × 10⁻⁴ m s⁻² to 1.32 × 10⁻⁴ m s⁻² for the SWR-based LGEM. The small changes in the RMSEs of the dV/dt forecasts among different forecast times suggest that the LGEM has good potential for making longer-range TC intensity forecasts (DeMaria, 2009; Cangialosi et al., 2020), and that the errors at longer forecast times might be due to the accumulation of errors in the TC intensity forecasts.
Figure 7 displays the RMSEs of the 7-day intensity forecasts from the two LGEMs and the 5-day forecasts from the CMA at 24-h intervals for independent cases during 2015–17. In general, the RMSE increases with forecast time for all three kinds of forecasts. A prominent feature in Fig. 7 is that the CMA forecast errors were larger than those from both the SWR-based and LightGBM-based LGEMs at all forecast times. The differences between the SWR-based LGEM and the CMA forecasts were statistically significant above the 95% confidence level at 48 h and 120 h, and those between the LightGBM-based LGEM and the CMA forecasts were statistically significant above the 95% confidence level at 24–120 h. This indicates a good potential for the LGEM to produce reliable TC intensity forecasts.
Another interesting feature is that the LightGBM-based LGEM showed smaller errors than the SWR-based LGEM for all of the forecast periods except the 168-h forecast, suggesting an advantage for the LightGBM method in improving TC intensity forecasts compared to the conventional SWR method.
Figure 7. Averaged RMSEs (m s⁻¹) of the 7-day intensity forecasts from the SWR-based and LightGBM-based LGEMs and the 5-day forecasts from the CMA at 24-h intervals for independent cases during 2015–17.
It is interesting and important to evaluate the performance of the LGEM in forecasting TC rapid intensification and rapid weakening. Here, we used the POD and the FAR for this evaluation based on the testing dataset during 2015–17. To increase the sample size, we defined rapid intensification and rapid weakening as 24-h intensity changes of DV24 ≥ 12 m s⁻¹ and DV24 ≤ −12 m s⁻¹, respectively. A total of 182 rapid intensification events and 162 rapid weakening events occurred during 2015–17. Since the LightGBM-based LGEM has better skill at 24-h forecasts than the SWR-based LGEM (Fig. 7), we only examined the performance of the LightGBM-based model. For the 2015–17 WNP samples, the PODs of the TC rapid intensification and rapid weakening forecasts were 35% and 41%, while the corresponding FARs were 29% and 13%, respectively. These scores are valid at the 24-h lead time. The POD of rapid intensification forecasts for WNP TCs based on the LGEM is generally comparable to that for Atlantic hurricanes from the NHC official forecasts during 2015–17 (Fig. 6 of Cangialosi et al., 2020).
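The event definition used here (a 24-h intensity change of at least 12 m s⁻¹ in magnitude) can be sketched for a 6-hourly intensity series as follows. The function name, the label strings, and the sample series are hypothetical and serve only to illustrate the thresholding.

```python
def label_rapid_changes(v_series, step_hours=6, threshold=12.0):
    """Label each time that has a value 24 h later as 'RI', 'RW', or None.

    v_series: intensities (m/s) at regular intervals of step_hours.
    'RI' means DV24 >= threshold; 'RW' means DV24 <= -threshold.
    """
    lag = 24 // step_hours  # number of steps spanning 24 h
    labels = []
    for i in range(len(v_series) - lag):
        dv24 = v_series[i + lag] - v_series[i]
        if dv24 >= threshold:
            labels.append("RI")
        elif dv24 <= -threshold:
            labels.append("RW")
        else:
            labels.append(None)
    return labels


# Hypothetical 6-hourly intensities (m/s) for one TC.
labels = label_rapid_changes([20, 23, 26, 30, 35, 40, 38, 30, 25, 22])
```

Applying the same labeling to the forecast and best-track series gives the paired yes/no sequences from which the POD and FAR are counted.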
We further evaluate the spatial distribution of differences in RMSEs between the CMA and the LightGBM-based LGEM forecasts as shown in Fig. 8. The positive difference indicates better skill for the LightGBM-based LGEM forecasts compared to those of the CMA operational forecasts. The differences in RMSEs in Fig. 8 show nearly spatially uniform positive values at all forecast times, which suggests that the LGEM can potentially improve upon current official forecasts from the CMA. The improvement of the LGEM compared to the CMA forecasts is particularly noteworthy in coastal regions since the intensity forecasts for TCs in the coastal regions are of great importance for disaster prevention.
Figure 8. The spatial distribution (m s⁻¹) of differences in RMSEs between the CMA and the LightGBM-based LGEM forecasts during 2015–17 at (a) 24, (b) 48, (c) 72, (d) 96, and (e) 120 h.
Figure 9 presents the spatial distribution of RMSEs for the LightGBM-based LGEM forecasts at 144 h and 168 h. Both show that RMSEs over most of the WNP are smaller than 11 m s⁻¹, except over the high latitudes southeast of Japan, where the RMSE is slightly larger. Compared to the RMSE of the current CMA operational forecasts at 120 h (Fig. 8), the LGEM is promising at longer forecast times. In this sense, the LGEM shows potential for extending the CMA forecast length from the current five days to seven days.
Figure 9. The spatial distribution (m s⁻¹) of RMSEs for the LightGBM-based LGEM forecasts during 2015–17 at (a) 144 and (b) 168 h.