1. Institute of Atmospheric Environment, China Meteorological Administration, Shenyang 110016, China
2. Regional Climate Center of Shenyang, Liaoning Province Meteorological Administration, Shenyang 110016, China
3. Key Opening Laboratory for Northeast China Cold Vortex Research, China Meteorological Administration, Shenyang 110016, China
4. Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters/Key Laboratory of Meteorological Disaster, Ministry of Education/International Joint Research Laboratory on Climate and Environment Change, Nanjing University of Information Science and Technology, Nanjing 210044, China
5. Liaoning Provincial Meteorological Service Center, Shenyang 110016, China
6. Climate Change Research Center, Institute of Atmospheric Physics, and Nansen-Zhu International Research Centre, Chinese Academy of Sciences, Beijing 100029, China
Manuscript received: 2020-04-29
Manuscript revised: 2020-09-30
Manuscript accepted: 2020-10-09
Abstract: Classifying the activity paths of the Northeast China Cold Vortex (NCCV) is an important way to analyze its characteristics in detail. Based on daily precipitation data for the northeastern China (NEC) region and six-hourly ERA-Interim atmospheric circulation and temperature fields, the NCCV processes during early summer (June) from 1979 to 2018 were objectively identified. The NCCV processes were then classified using a machine learning method (k-means) according to characteristic parameters derived from the activity path information. The rationality of the classification results was verified from two aspects: (1) the atmospheric circulation configuration of the NCCV on the various paths; and (2) its influences on the climate conditions in the NEC. The results showed that the activity paths of the NCCV could be divided into four types according to the generation origin, movement direction, and movement velocity of the NCCV: the type generated over the eastern Mongolian Plateau and moving eastward (eastward movement type, or type A); the type generated over the upper reaches of the Lena River and moving southeastward over a long distance (southeast long-distance movement type, or type B); the type generated near Lake Baikal and moving eastward with little movement (eastward less-movement type, or type C); and the type generated over eastern Siberia and moving southward with little movement (southward less-movement type, or type D). The atmospheric circulation configurations and climate impacts of the NCCV differed clearly among the four path types, which indicated that the classification results were reasonable.
Keywords: northeastern China, early summer, Northeast China Cold Vortex, classification of activity paths, machine learning method, k-means clustering, high-pressure blocking
Abstract (translated from the Chinese version): Classifying the activity paths of the Northeast China Cold Vortex is an important means of analyzing its characteristics in detail. Using daily precipitation data for the NEC region and six-hourly ECMWF ERA-Interim atmospheric circulation and temperature fields, the NCCV processes in early summer (June) from 1979 to 2018 were objectively identified. Characteristic parameters were defined from the activity paths of the identified processes, and the paths were then objectively classified with a machine learning method (k-means). The rationality of the classification was verified from two aspects: the atmospheric circulation configuration of each path type and its influence on the climate of the NEC region. The results show that, considering the generation origin, movement direction, and movement velocity together, the NCCV activity paths can be divided into four types: generated over the eastern Mongolian Plateau and moving eastward (eastward movement type, or type A); generated over the upper reaches of the Lena River and moving southeastward over a long distance (southeast long-distance movement type, or type B); generated near Lake Baikal and moving eastward with little movement (eastward less-movement type, or type C); and generated over eastern Siberia and moving southward with little movement (southward less-movement type, or type D). In terms of both the circulation configuration and the influence on the NEC climate, the four types of NCCV processes differ clearly, and these differences are consistent with the path characteristics, which indicates that the classification is reasonable. Regarding the influence on the NEC climate, types A and B cause low temperatures and increased precipitation over most of the NEC region, whereas types C and D mainly cause low temperatures, with increased precipitation only in parts of the region (for example, the northern part).
Keywords (translated from the Chinese version): northeastern China, early summer, Northeast China Cold Vortex, classification of activity paths, machine learning, k-means clustering, blocking high
2.1. Data
The data used in this study included daily precipitation observations from 208 stations in the NEC region from June 1979 to June 2018, which were provided by the National Meteorological Information Center. In addition, the six-hourly atmospheric circulation fields of the ECMWF ERA-Interim reanalysis (resolution: 1° × 1°) from 1979 to 2018 were used. Figure 1 shows the spatial distribution of the 208 meteorological stations in the NEC region.
Figure 1. Distribution of the 208 stations in the NEC region.
2.2. Research methods
This study mainly used a machine learning method (k-means clustering) and a composite analysis method.
2.2.1. Objective identification method for the NCCV system
Step 1. Tracing the geopotential height contours: Using the six-hourly ECMWF ERA-Interim data from June 1979 to June 2018, geopotential height contours were traced on the 500 hPa isobaric surface in the range of 500 to 600 dgpm at an interval of 4 dgpm, and the longitude and latitude values of each contour were output.
Step 2. Screening the geopotential height contours: From the results obtained in Step 1, the closed contours in the range of 30°N to 80°N and 85°E to 150°E were screened out.
Step 3. Identification of the center of the NCCV system: The center of the innermost closed contour of a system was defined as the NCCV center, and the average of all the longitudes and latitudes along that innermost contour was taken as the center longitude and center latitude. Only the NCCV systems with a center in the range of 30°N to 60°N and 95°E to 140°E were examined in this study. If there was a low-pressure center on the 500 hPa isobaric surface, the center corresponding to the low-pressure center with a temperature below 0°C was selected as the NCCV center. Next, the time, latitude, and longitude values of the NCCV center were output.
Step 4. Identification of the NCCV durations: If two adjacent time levels both had NCCV centers and the distance between the two centers was less than 800 km, the centers were regarded as belonging to the same NCCV system. The durations of the NCCV systems formed from the centers screened in Step 3 were counted, and the systems with a duration of 72 hours or longer were retained. Then, the NCCV processes and the related variables were output.
Step 5. Rationality verification of the objective identification results: The identified NCCV processes were checked against the NEC precipitation processes to determine whether they matched in time and geographical location.
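The linking rule in Step 4 can be illustrated with a short sketch. The Python below is only a minimal illustration under stated assumptions, not the procedure actually used in this study: it assumes at most one candidate center per six-hourly time level, measures the separation with the haversine great-circle distance, and takes the duration as the time between the first and last center of a track. The function names and the input layout are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def link_tracks(centers, max_km=800.0, step_h=6, min_hours=72):
    """Group centers into tracks and keep those lasting at least min_hours.

    centers: list of (hour, lat, lon) tuples sorted by hour, with at most
    one center per time level (a simplifying assumption of this sketch).
    """
    tracks, current = [], []
    for center in centers:
        if current:
            prev = current[-1]
            adjacent = center[0] - prev[0] == step_h
            close = great_circle_km(prev[1], prev[2], center[1], center[2]) < max_km
            if not (adjacent and close):
                # The chain is broken: close the current track, start a new one.
                tracks.append(current)
                current = []
        current.append(center)
    if current:
        tracks.append(current)
    # Duration is taken as last time minus first time (assumed definition).
    return [t for t in tracks if t[-1][0] - t[0][0] >= min_hours]
```

Calling `link_tracks` on a time-ordered list of centers returns only the tracks that satisfy both the 800 km rule and the 72-hour threshold; handling several simultaneous centers would need an additional matching step that is not sketched here.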
2.2.2. NCCV activity path classification based on a k-means method
2.2.2.1. Selection of the characteristic parameters of the NCCV activity paths
The parameters used in the clustering process were chosen to represent the characteristics of the NCCV activity paths, namely the generation origins, movement directions, and movement velocities. The longitude and latitude of the starting point and the longitude of the ending point of each NCCV process were selected to represent the generation origin and movement direction of the NCCV system. The average latitude and longitude of each NCCV process were calculated to represent the center position of the process. In addition, the variance of the diagonal (VOD) was calculated by using a formula in which $ x_i $ and $ y_i $ respectively represent the latitude and longitude of the cold vortex center during the NCCV process, $ \bar{x} $ and $ \bar{y} $ represent the mean latitude and longitude of the process, and $ n $ is the total number of cold vortex centers of the process. The VOD represents the relationship between the longitudinal and latitudinal movement distances of the NCCV: the larger the VOD is, the straighter the movement track and the larger the movement distance will be. The latitude of the ending point was not selected because including the longitude and latitude of both the starting and ending points could have caused confusion in the clustering process. An analysis of all the NCCV paths showed that the NCCVs mainly moved in an east-west direction. The longitude differences between the starting and ending points were therefore pronounced and were retained in full, whereas the latitude differences between the starting and ending points were relatively small. For this reason, the latitude of the ending point was removed from the parameter selection and was represented indirectly by the latitude of the starting point and the average latitude.
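To make the parameter selection concrete, the sketch below collects the start, end, and mean position parameters of a single track. It is a minimal illustration with hypothetical names and an assumed input layout; the VOD is deliberately left out because its formula is not reproduced in this text, so the dictionary is not the complete parameter set used for the clustering.

```python
def path_features(track):
    """Return start/end/mean position parameters for one NCCV track.

    track: list of (lat, lon) center positions in time order
    (hypothetical layout). The end-point latitude is intentionally
    not included, following the parameter selection described above.
    """
    lats = [point[0] for point in track]
    lons = [point[1] for point in track]
    return {
        "start_lon": lons[0],
        "start_lat": lats[0],
        "end_lon": lons[-1],
        "mean_lat": sum(lats) / len(lats),
        "mean_lon": sum(lons) / len(lons),
    }
```

For a track that starts at 50°N, 110°E and ends at 46°N, 120°E, the function keeps both longitudes but only the starting latitude, and the mean latitude carries the remaining information about the meridional position.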
2.2.2.2. Z-score transformation method
Because the characteristic parameters had two different dimensions (units), the data had to be normalized before clustering so that differences in magnitude between the parameters would not dominate the results. In this study, the Z-score standardization method was adopted. After processing, each parameter had a mean of 0 and a standard deviation of 1, that is, the same scale as the standard normal distribution. The conversion function can be written as follows:
$$ Z = \frac{x - \mu}{\delta} $$
where $ Z $ is the value after the Z-score transformation; $ x $ is the original value; $ \mu $ represents the mean value; and $ \delta $ indicates the standard deviation.
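As an illustration of the Z-score transformation, the following sketch standardizes each parameter column separately. It is a minimal example, not the code used in this study; the function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say whether the population or sample form was used.

```python
import statistics


def z_score_columns(rows):
    """Standardize each column of a table to mean 0 and standard deviation 1.

    rows: list of equal-length lists, one row per NCCV process and one
    column per characteristic parameter. Uses the population standard
    deviation (an assumption); a constant column would divide by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]
```

After the transformation, a longitude column in degrees and a column in any other unit contribute on the same scale to the Euclidean distances used in the clustering.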
2.2.2.3. Brief introduction to the k-means clustering method and determination of the clustering numbers
k-means clustering originated in the field of signal processing and is an unsupervised clustering method in machine learning. The Euclidean distance is used to measure the similarity between samples, and the data are clustered according to the degree of similarity. The method is widely used in many fields because it is intuitive and fast. As k-means clustering cannot determine the number of clusters by itself, this study set the number of clusters to each integer between 2 and 9 and compared the silhouette coefficients of the resulting clusterings. The silhouette coefficient is calculated from the dissimilarity between the inside and the outside of each cluster, and its value lies between -1 and 1. Because it reflects both the cohesion and the separation of the clusters, the silhouette coefficient is an important measure of clustering quality: the closer its value is to 1, the better the classification result (Wang et al., 2018). Figure 2 indicates that the silhouette coefficient was largest when the number of clusters was four. Therefore, the number of clusters of the NCCV system was set as four in this study.
Figure 2. k-means clustering numbers and the corresponding silhouette coefficient.
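The selection of the cluster number can be sketched in a few lines of standard-library Python. This is a simplified illustration rather than the implementation used in this study: the function names are hypothetical, and the deterministic farthest-point seeding is an assumption made only to keep the example reproducible, since the text does not say how the initial centroids were chosen.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (assumed): start at points[0], then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations with Euclidean distance; returns one label per point."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to become empty.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (value between -1 and 1)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_values=range(2, 10)):
    """Return the k with the largest silhouette coefficient and all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```

Here `points` would be the standardized parameter vectors (as tuples), and `choose_k` mirrors the comparison of cluster numbers from 2 to 9. In practice, a library implementation with several random restarts would normally be preferred over this bare-bones version.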
5.1. Atmospheric circulation backgrounds on the first days of the NCCV processes
To characterize the atmospheric circulation background of each NCCV type on the generation day (representing the generation origin information), composite maps of the 500 hPa geopotential height fields and 850 hPa wind fields on the first days of all the NCCV processes of each path type were produced (Fig. 6). The origin locations of the type A NCCV systems (eastern part of the Mongolian Plateau) were controlled by negative height anomalies that appeared in combination with the Okhotsk Sea blocking high to the northeast. The Okhotsk Sea blocking high acted to maintain and strengthen the negative anomaly centers. In addition, a relatively weak positive height anomaly center was located to the west of Lake Baikal, upstream of the NCCV origin area, which was also conducive to the enhancement of the vortex over the origin of this NCCV type. Since the downstream Okhotsk Sea blocking was stronger than the upstream positive height anomaly, the atmospheric circulation configuration of the eastward movement NCCV is referred to as the “east blocking type”. The origin locations of the type B NCCV systems (upper reaches of the Lena River) were controlled by negative height anomalies that extended southeastward through the NEC region to the Sea of Japan, corresponding to the activity paths of the southeast long-distance movement type. The negative height anomaly center over the origin region occurred in conjunction with the Yenisei River blocking to the northwest and the East Siberian high to the northeast. Because of the strong upstream Yenisei River blocking, the atmospheric circulation configuration corresponding to the southeast long-distance movement NCCV is referred to as the “west blocking type” in this study.
The origin locations of the type C NCCV systems (near Lake Baikal) were controlled by a negative height anomaly, which appeared in combination with the upstream Ob River blocking and the downstream Okhotsk Sea-Japan Sea blocking. Although this distribution pattern was similar to the “west blocking type” on the whole, its eastern blocking was stronger and covered a larger area, which obstructed the eastward movement of the NCCV (short movement distance). It therefore corresponded to the eastward less-movement paths, and the atmospheric circulation configuration corresponding to the eastward less-movement NCCV is referred to as the “double blocking type”. The origin locations of the type D NCCV systems (south of East Siberia) were controlled by negative height anomalies that extended slightly southward, and the corresponding NCCV systems moved in a slightly southward direction. These negative height anomalies appeared in combination with the blocking extending from central Siberia to East Siberia on the northern side, and the atmospheric circulation configuration corresponding to the southward less-movement NCCV is referred to in this study as the “north blocking type”.
Figure 6. Composite graph for the atmospheric circulation fields on the first day of (a) type A, (b) type B, (c) type C, and (d) type D of the NCCV system processes. The isolines indicate the 500 hPa geopotential height (units: gpm), and the arrows denote the 850 hPa vector wind field.
The 850 hPa wind fields corresponding to each type of NCCV system showed cyclonic rotation near the low-value centers of the height fields. With the exception of the southward less-movement type, the cyclone centers of the other three types were located east of the low-value centers of the height fields. The wind fields confirmed that the NCCVs were deep systems with the baroclinic characteristic of tilting westward with height. The above analysis showed that the atmospheric circulation patterns on the first days of the four types of NCCV processes were clearly different and were consistent with the NCCV activity paths, which indicated that the classification results were reasonable.
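The composite analysis behind these maps amounts to averaging a field over the selected days and, for anomalies, subtracting a reference mean. The sketch below is a minimal illustration on small nested lists; the function name is hypothetical, and the choice of reference climatology is an assumption, since the reference period is not stated in this section.

```python
def composite_anomaly(fields, climatology):
    """Average the selected fields and subtract a reference climatology.

    fields: list of 2-D grids (lists of rows), one per selected day,
    for example the first day of every process of one path type.
    climatology: 2-D grid of the same shape (assumed reference mean).
    """
    n_fields = len(fields)
    n_rows, n_cols = len(climatology), len(climatology[0])
    return [
        [
            sum(field[i][j] for field in fields) / n_fields - climatology[i][j]
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]
```

Negative values in the result mark the low-value (cold vortex) regions of the composite, and positive values mark the blocking highs discussed above.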
5.2. Atmospheric circulation backgrounds of all the occurrence days of the NCCV processes
To clarify the overall atmospheric circulation background of each NCCV type, composite maps of the atmospheric circulation fields on all the NCCV occurrence days of each path type were constructed (Fig. 7). The positions of the low-value centers in the height fields differed among the composites of the four NCCV types. The low-value center of the type A height field was located directly over the NEC region, which may have had major impacts on the climate conditions of the area, and the area north of the low-value center displayed positive height anomalies. The low-value center of the type B height field was located over the eastern Mongolian Plateau, to the northwest of the NEC region, and the high-value center was located northwest of Lake Baikal. The low-value center of the type C height field was located near Lake Baikal. Height anomalies were present on its northwestern and eastern sides, and the positive anomalies on the eastern side shortened the eastward movement distance of that type of cold vortex. The low-value center of the type D height field was located near Sakhalin Island, with positive height anomalies on its northwestern and northern sides. It should be noted that the negative height anomaly centers of the first two types were relatively far from the generation origins, which was consistent with the relatively large movement distances of those two types of NCCV systems. Those of the latter two types were very close to the generation origins, which was consistent with the small movement distances (less movement) of those two types.
Figure 7. As in Fig. 6 but for the atmospheric circulation fields on all of the NCCV occurrence days.
The 850 hPa wind fields corresponding to each type of NCCV system displayed cyclonic rotation near the low-value centers of the height fields. With the exception of type D, the cyclone centers of the other three types were located east of the low-value centers of the height fields, which indicated the baroclinic characteristic of tilting westward with height.
5.3. Atmospheric circulation backgrounds on the peak days of the NCCV processes
To clarify the atmospheric circulation background of each NCCV type on the peak days, Fig. 8 shows the composite maps of the atmospheric circulation fields on the days with the lowest central geopotential heights of all the NCCV processes. The negative anomaly centers of type A and type B were near the NEC region. This indicated that these cold vortexes were located near the NEC region on their strongest days, which may have had major impacts on the climate conditions of the area. In contrast, the negative anomaly centers of type C and type D were located relatively far from the NEC region. Specifically, they were observed in the northwestern and northeastern sections of the NEC region, respectively. These NCCV processes therefore mainly affected the climate conditions in the northern parts of the NEC region. The 850 hPa wind field characteristics corresponding to each NCCV type were consistent with those on the first days and on all the occurrence days.
Figure 8. As in Fig. 6 but for the atmospheric circulation fields on the days with the lowest central geopotential heights of the NCCV processes.
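Selecting the peak time of a process, defined above as the time with the lowest central geopotential height, is a one-line minimum search. The sketch below is illustrative only; the record layout and the field names `time` and `center_height` are hypothetical.

```python
def peak_record(process):
    """Return the record with the lowest central geopotential height.

    process: list of dicts with the hypothetical keys 'time' and
    'center_height' (central 500 hPa geopotential height). If several
    records share the minimum, the earliest one in the list is returned.
    """
    return min(process, key=lambda record: record["center_height"])
```

The fields at the selected times of all processes of one type can then be averaged to form the peak-day composite.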
6.1. Influences on the air temperature values
Figure 9 shows the composite maps of the NEC region’s temperature anomalies on the dates when the centers of the NCCV systems were within the NEC geographic range (38°N to 53°N, 116°E to 135°E). When type A NCCV systems occurred, the air temperatures in the majority of the NEC region were low, and the low-temperature centers were located in the middle sections of the region. In the case of type B NCCVs, the air temperatures in the entire NEC region were low, and the low-temperature centers were located in the northwestern sections, which was consistent with the trajectories of the southeast long-distance movement type lying mainly in the northwestern sections of the NEC region. When type C NCCV systems occurred, the air temperatures in the northern section of the NEC region were low. Since that type of NCCV was characterized by less movement, the low-temperature centers were located in the northwestern sections of the NEC region. In the case of type D NCCVs, the air temperatures in the northern and middle sections of the NEC region were low, and the low-temperature centers were located in the northeastern sections. The above analysis showed that the four types of NCCV processes led to low-temperature anomalies in different areas of the NEC region, and that the locations of the low temperatures corresponded well with the activity trajectories of the four types of NCCV systems.
Figure 9. Composite graphs of the air temperature anomalies in the NEC region corresponding to the dates when the center locations of (a) type A, (b) type B, (c) type C, and (d) type D NCCVs occurred within the NEC geographical area (units: °C).
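The date selection used for these composites can be sketched as a simple box test on the vortex center. This is a minimal illustration with hypothetical names; treating the box boundaries as inclusive is an assumption, since the text does not state how centers lying exactly on the boundary were handled.

```python
# NEC geographic range quoted in the text (38°N to 53°N, 116°E to 135°E).
NEC_BOX = {"lat_min": 38.0, "lat_max": 53.0, "lon_min": 116.0, "lon_max": 135.0}


def in_nec(lat, lon, box=NEC_BOX):
    """True if the center lies inside the box (boundaries inclusive, assumed)."""
    return box["lat_min"] <= lat <= box["lat_max"] and box["lon_min"] <= lon <= box["lon_max"]


def dates_in_nec(track):
    """Return the sorted unique dates on which the center was inside the box.

    track: list of (date, lat, lon) tuples, possibly several per day
    because the centers are available every six hours.
    """
    return sorted({date for date, lat, lon in track if in_nec(lat, lon)})
```

The station temperature anomalies on the returned dates are what would be averaged for each path type to obtain a composite of this kind.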
6.2. Influences on the precipitation values
As for the air temperatures, the influences of the various NCCV paths on precipitation in the NEC region were investigated using a composite analysis method. Figure 10 shows the composite maps of the NEC precipitation anomaly percentages on the dates when the centers of the NCCV systems were within the geographical range of the NEC region. When type A NCCVs occurred, the majority of the NEC region experienced above-normal precipitation, and the centers of maximum rainfall were located in the middle and eastern sections of the region. When type B NCCVs occurred, precipitation was above normal in the majority of the region, and the centers of maximum rainfall were located to the north and south of the central section of the NEC region. The amount of rain in some areas reached more than double the average, and the rainy area was larger than that of type A. It is worth noting that, although type B had fewer trajectories than type A, its precipitation anomalies were larger in both magnitude and extent. When type C NCCV systems appeared, above-normal precipitation occurred in the northern and eastern sections of the NEC region. The centers of high rainfall were located in the northern part of the region, and the area that received more than double the average precipitation was larger. Among those sections, the largest precipitation increases occurred in the southern part of the NEC region, which may have been due to other influencing factors. When type D NCCV systems developed, precipitation increased in the northeastern parts of the NEC region. These observations showed that the four types of NCCV processes led to abnormal precipitation in different areas of the NEC region, and that the positions of the high-rainfall areas corresponded well with the activity trajectories of the four types of NCCV systems.
Figure 10. As in Fig. 9 but for the precipitation anomaly percentages.
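The precipitation anomaly percentage can be illustrated with a small worked example. The definition below, the event mean minus the climatological mean divided by the climatological mean, is the commonly used form and is assumed here, since the text does not state the formula; the function name is hypothetical.

```python
def precip_anomaly_percent(event_values, climatology_mean):
    """Percentage anomaly of the mean precipitation on the selected days.

    event_values: daily precipitation at one station on the NCCV days.
    climatology_mean: reference daily mean at the same station (assumed
    to be positive; the reference period is not specified in the text).
    """
    if climatology_mean <= 0:
        raise ValueError("climatology_mean must be positive")
    event_mean = sum(event_values) / len(event_values)
    return (event_mean - climatology_mean) / climatology_mean * 100.0
```

Under this definition, a station receiving 6 mm and 10 mm on two NCCV days against a reference of 4 mm per day has an anomaly of 100%, which corresponds to precipitation of double the average.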
Based on the analyses of the impacts of the four types of NCCV processes on air temperature and precipitation in the NEC region, it was concluded that the areas through which each type of NCCV path passed corresponded well with the areas where the climate elements of the NEC region were abnormal. These findings further support the rationality of the classification of the NCCV activity paths proposed in this study.