CN110930282B - Local rainfall type analysis method based on machine learning - Google Patents
- Publication number
- CN110930282B (application CN201911243213.6A)
- Authority
- CN
- China
- Prior art keywords
- rainfall
- cluster
- matrix
- distance
- thread
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
- G06Q50/265—Personal security, identity or safety
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
Landscapes
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Educational Administration (AREA)
- Tourism & Hospitality (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Health & Medical Sciences (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a machine-learning-based method for analyzing local rain patterns, comprising the following steps: 1) collecting, processing and storing data; 2) automatically extracting rainfall events; 3) generating a local rainfall sample set; 4) clustering the rainfall events with GPU acceleration; and 5) analyzing the resulting cluster trees to obtain representative rain patterns. The method automatically extracts rainfall events from station observations in a basin and then uses machine learning to identify the most representative rainfall processes, which serve as the representative rain patterns of local rainfall. This greatly reduces the manual analysis workload and avoids differences caused by subjective judgment. The results are also region-specific and can support the analysis of critical rainfall for mountain torrents and the numerical simulation of urban waterlogging.
Description
Technical Field
The invention belongs to the field of water conservancy engineering, specifically flood control forecasting, and relates in particular to a machine-learning-based local rain pattern analysis method.
Background
In recent years, extreme rainstorms have occurred frequently in China. Local rainstorms are sudden and short-lived, and they are the main triggers of mountain torrents and urban waterlogging. Besides total rainfall and rainfall intensity, a local rainstorm is characterized by its rain pattern (rain type), which describes how rainfall intensity is distributed over time. The rain pattern is one of the main disaster-causing characteristics of a rainstorm: two rainstorms with the same total rainfall and intensity can still cause different disasters.
Because torrential floods in hilly areas rise and fall steeply and are difficult to forecast in real time, mountain torrent warnings are currently issued mainly with the dynamic critical rainfall method. Meanwhile, China is urbanizing rapidly, most cities adopt low flood control and drainage standards, and waterlogging disasters are frequent; the impact of rainstorm waterlogging is usually assessed by numerical simulation. Research has shown that the rain pattern directly affects both the critical rainfall determined for mountain torrent disasters and the maximum extent and depth of urban waterlogging.
At present, when determining critical rainfall for mountain torrents or simulating urban waterlogging, the local rainstorm pattern is mainly derived with the same-frequency analysis method or taken from a design rain pattern. Commonly used design rain patterns include the Chicago, Huff, Pilgrim, and Yen & Chow patterns. The same-frequency method requires considerable manual intervention, and its results are subjective because experts select different samples and interpret them differently according to their experience. Design rain patterns such as the Chicago, Huff, and Yen & Chow patterns were generalized by foreign researchers from rainstorm samples of particular regions, so they differ to some extent from actual rainfall processes, and no design rain pattern is currently accepted as a general basis for design.
Disclosure of Invention
The invention aims to solve these problems by providing a local rain pattern analysis method.
A local rainfall type analysis method based on machine learning comprises the following steps:
1) Collecting, processing and storing data: collect rainfall data from the hydrological and meteorological stations in the basin to be analyzed and convert them to equal time intervals;
2) Automatically extracting rainfall events: read the continuous rainfall time series of each station from the database in turn, divide them into independent rainfall events, and produce one rainfall time series per event;
3) Generating a local rainfall sample set: build a sample set from the event time series extracted in step 2). Each element of the set is an independent, standardized rainfall event, so the set has as many elements as there are rainfall events. Then divide the sample set into subsets according to event duration;
4) Clustering rainfall events with GPU acceleration: cluster each subset generated in step 3) to produce one cluster tree per subset, as follows: 4-1. generate initial clusters: treat each element of the subset as an initial cluster; 4-2. compute the distance matrix: the matrix is N × N, where N is the number of rainfall events in the subset, and element (i, j) is the distance between cluster i and cluster j, which measures the similarity of rainfall events i and j; the DTW distance is used as the similarity measure, and a smaller distance means greater similarity; the computation of the matrix is accelerated with GPU parallel computing; 4-3. merge clusters based on the distance matrix from step 4-2: find and merge the two closest clusters, renumber the clusters, compute the distance between the new cluster and every other cluster, and update the distance matrix; 4-4. repeat step 4-3 until all clusters have been merged into one, which yields a cluster tree; 4-5. repeat steps 4-2 to 4-4 so that a cluster tree is generated for each subset of the sample set;
5) Analyzing the cluster trees generated in step 4): take the root node as layer 1 of a cluster tree, so that layer n contains n nodes, each node being one cluster with one cluster center. For a given layer, traverse its node clusters and compute, for each node cluster, the distance matrix of the rainfall events it contains. This matrix has size $m_i \times m_i$, $i = 1, \ldots, n$, where n is the number of nodes in the layer, $m_i$ is the number of rainfall events in the i-th node, and element (j, k) is the DTW distance between rainfall events j and k. Then sum each row of the matrix; the rainfall event whose row has the smallest sum is the cluster center of that node cluster, i.e. a representative rain pattern of local rainfall in the basin.
Further, the rainfall data in step 1) span at least 10 years.
Further, the rainfall events in step 2) are divided as follows. A time threshold is set: if a dry interval in the rainfall record exceeds the threshold, the rainfall on either side is treated as two separate rainfall processes; if it is below the threshold, the rainfall is treated as a single process. A magnitude threshold is also set: a rainfall process whose total rainfall is below this threshold is regarded as trace rainfall and discarded.
Further, the rainfall events in step 3) are standardized with a formula in which n is the length of the rainfall event, $P'_i$ is a point of the standardized rainfall sequence, $P_j$ is a point of the original rainfall sequence, and i and j are the time indices of the standardized and original sequences, respectively.
Further, the subsets in step 3) are formed as follows: the full range of event durations in the rainfall event set is divided into several intervals, and the events whose durations fall into the same interval form one subset.
Further, the DTW distance is calculated as follows: for time series $X = \{x_1, x_2, \ldots, x_i, \ldots, x_m\}$ and $Y = \{y_1, y_2, \ldots, y_j, \ldots, y_n\}$, where m and n are the lengths of the two series, find a warping path $W = \{w_1, w_2, \ldots, w_k, \ldots, w_K\}$, $\max(m, n) \le K \le m + n - 1$, that represents the mapping between X and Y; the k-th element of W is $w_k = (i, j)$, which pairs the i-th element of X with the j-th element of Y. The cumulative distance of point (i, j) is $\gamma(i, j) = d(x_i, y_j) + \min\{\gamma(i-1, j-1), \gamma(i-1, j), \gamma(i, j-1)\}$. Given the initial condition $\gamma(1, 1) = d(x_1, y_1)$, the cumulative distance matrix is computed iteratively, and $\mathrm{DTW}(X, Y) = \gamma(m, n)$ is the DTW distance between X and Y.
Further, the GPU parallel computation used in step 4-2 to accelerate the matrix calculation is as follows: one thread is assigned to compute the DTW distance of each matrix element. First, the number of threads per thread block is set; for a two-dimensional matrix, each thread block can be given (tb, tb) threads, where $tb^2$ must be smaller than the maximum number of threads per block allowed by the GPU. Second, the number of thread blocks per thread grid is set; for a two-dimensional matrix, each grid can be given (bg, bg) thread blocks, where $bg = \lceil N / tb \rceil$, N is the number of samples in the rainfall event subset, and bg must be smaller than the maximum number of thread blocks per grid allowed by the GPU. After the threads are allocated, the GPU computes the DTW distance of every matrix element and returns the results to CPU memory, which completes the distance matrix calculation.
Beneficial effects of the invention:
Rainfall events are extracted automatically from station observations in a basin (region), and a machine learning method then identifies the most representative rainfall processes as the representative rain patterns of local rainfall. This greatly reduces the manual analysis workload and avoids differences caused by subjective judgment.
The invention is described in further detail below with reference to the drawings and specific embodiments.
Drawings
FIG. 1 is the overall flow chart of the method of the invention;
FIG. 2 illustrates the interpolation of rainfall data;
FIG. 3 shows the dynamic warping path between two time series;
FIG. 4 shows the full set of rainfall events;
FIG. 5 is rainfall sample example 1;
FIG. 6 is rainfall sample example 2;
FIG. 7 is rainfall sample example 3;
FIG. 8 is rainfall sample example 4;
FIG. 9 is rainfall sample example 5;
FIG. 10 is rainfall sample example 6;
FIG. 11 is the cluster tree of subset 1;
FIG. 12 is the cluster tree of subset 2;
FIG. 13 is the cluster tree of subset 3;
FIG. 14 is the cluster tree of subset 4;
FIG. 15 is the cluster tree of subset 5;
FIG. 16 is the cluster tree of subset 6;
FIG. 17 is cluster 1 of subset 1;
FIG. 18 is cluster 2 of subset 1;
FIG. 19 is cluster 3 of subset 1;
FIG. 20 is cluster 4 of subset 1;
FIG. 21 is cluster 5 of subset 1;
Detailed Description
Example 1
A local rainfall type analysis method based on machine learning comprises the following steps:
1) data collection, processing and storage
Data collection: collect rainfall observations from the hydrological and meteorological stations in the basin (region) to be analyzed. Cluster analysis requires a large amount of data, so the rainfall records should span at least 10 years.
Data processing: convert the rainfall data into time series with equal time steps. If the original data are not at equal intervals, interpolate them, preferably along the cumulative rainfall curve: as shown in FIG. 2, first build the cumulative rainfall curve from the original series, then read off the equal-interval rainfall time series $\{P'_1, P'_2, P'_3, \ldots, P'_{12}\}$.
Data storage: store the processed equal-interval rainfall time series in a database.
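The interpolation along the cumulative rainfall curve described above can be sketched as follows. This is a minimal Python illustration, not the patent's implementation; it assumes linear interpolation on the mass curve, times in hours, and that each original record holds the rainfall accumulated since the previous record. The function name `resample_equal_interval` is hypothetical.

```python
from bisect import bisect_right


def resample_equal_interval(times, amounts, step, t_start, t_end):
    """Resample irregular rainfall records to equal time steps (sketch).

    times[k]   -- end time (h) of record k, strictly increasing
    amounts[k] -- rainfall (mm) accumulated since the previous record
    t_start    -- time at which the cumulative curve starts at 0 mm
    """
    # Cumulative rainfall (mass) curve through the original records
    curve_t, curve_p = [t_start], [0.0]
    for t, p in zip(times, amounts):
        curve_t.append(t)
        curve_p.append(curve_p[-1] + p)

    def cumulative_at(t):
        # Linear interpolation on the mass curve, clamped at both ends
        if t <= curve_t[0]:
            return curve_p[0]
        if t >= curve_t[-1]:
            return curve_p[-1]
        k = bisect_right(curve_t, t)  # curve_t[k-1] <= t < curve_t[k]
        t0, t1 = curve_t[k - 1], curve_t[k]
        p0, p1 = curve_p[k - 1], curve_p[k]
        return p0 + (p1 - p0) * (t - t0) / (t1 - t0)

    # Differencing the curve at equal steps gives P'_1, P'_2, ...
    n_steps = round((t_end - t_start) / step)
    cum = [cumulative_at(t_start + s * step) for s in range(n_steps + 1)]
    return [cum[s + 1] - cum[s] for s in range(n_steps)]
```

For example, records of 4 mm by hour 2, 2 mm by hour 3 and 6 mm by hour 6 resample to a constant 2 mm per hour over six hourly steps, and the total depth of 12 mm is preserved.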
2) Automatic extraction of rainfall events
Read the rainfall time series of each station from the database in turn and divide it into independent rainfall events. For a rainfall time series $\{P_1, P_2, P_3, \ldots, P_t\}$ with time stamps $\{T_1, T_2, T_3, \ldots, T_t\}$, the division works as follows. Set a time threshold $Th_T$: when the interval $T_j - T_i$ between two rainfall records exceeds $Th_T$, they belong to two separate rainfall processes; otherwise they belong to the same process. This divides the continuous record into rainfall events automatically. Set a magnitude threshold $Th_A$: a rainfall process whose total rainfall is below $Th_A$ is regarded as trace rainfall and discarded. Traversing the rainfall time series of every station in this way yields N rainfall sequences $\{P_{i1}, P_{i2}, \ldots, P_{ik}\}$ with time stamp sequences $\{T_{i1}, T_{i2}, \ldots, T_{ik}\}$, where $i = 1, \ldots, N$, N is the number of rainfall events, and k is the number of time steps in the event.
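A minimal sketch of this division rule follows. It assumes an equal-interval series in which a period counts as wet when its rainfall is positive, and it measures the gap $T_j - T_i$ between consecutive wet periods. Dry periods inside an event are kept so that the event's temporal shape is preserved; that choice and the name `split_events` are assumptions, not details from the source.

```python
def split_events(rain, times, th_t, th_a):
    """Split an equal-interval rainfall series into independent events.

    rain[k] -- rainfall (mm) in period k; times[k] -- its time stamp (h).
    A gap longer than th_t between consecutive wet periods starts a new
    event; events whose total is below th_a are dropped as trace rainfall.
    Returns a list of (rain_values, time_stamps) tuples.
    """
    wet = [k for k, p in enumerate(rain) if p > 0]
    if not wet:
        return []
    groups = [[wet[0], wet[0]]]  # [first wet index, last wet index]
    for k in wet[1:]:
        if times[k] - times[groups[-1][1]] > th_t:
            groups.append([k, k])  # gap exceeds Th_T: start a new event
        else:
            groups[-1][1] = k  # same event: extend it
    events = []
    for a, b in groups:
        p = rain[a:b + 1]  # keeps dry periods inside the event
        if sum(p) >= th_a:  # magnitude threshold Th_A
            events.append((p, times[a:b + 1]))
    return events
```

With `th_t=6` and `th_a=10`, the series `[0, 5, 3, 0, 0, 0, 0, 0, 0, 0, 4, 8, 0, 1]` on hourly stamps splits into two bursts; the first (8 mm in total) is discarded as trace rainfall and only `([4, 8, 0, 1], [10, 11, 12, 13])` remains.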
3) Generating local rainfall sample sets
Use the N rainfall sequences $\{P_{i1}, P_{i2}, \ldots, P_{ik}\}$, $i = 1, \ldots, N$, extracted in step 2), where N is the number of rainfall events and k is the number of time steps in an event, to generate a sample set with N elements, each element being an independent rainfall event. Standardize each rainfall event with a formula in which n is the length of the rainfall event, $P'_i$ is a point of the standardized rainfall sequence, and $P_j$ is a point of the original rainfall sequence. Then divide the sample set into subsets according to the duration of each rainfall event.
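The standardization formula itself is not given in the text of this step. The embodiment below standardizes the cumulative rainfall process of each event, so one plausible reading, assumed here and not confirmed by the source, is the dimensionless mass curve $P'_i = \sum_{j=1}^{i} P_j \big/ \sum_{j=1}^{n} P_j$. The sketch implements that assumed form; `standardize_event` is a hypothetical name.

```python
from itertools import accumulate


def standardize_event(rain):
    """Standardized cumulative rainfall curve of one event (assumed form).

    Assumes P'_i = (P_1 + ... + P_i) / (P_1 + ... + P_n): the curve is
    non-decreasing, ends at 1, and no longer depends on the total depth.
    """
    total = sum(rain)
    if total <= 0:
        raise ValueError("event contains no rainfall")
    return [c / total for c in accumulate(rain)]
```

Under this assumption, two events with the same temporal shape but different depths map to the same sample, which is what lets the clustering compare rain patterns rather than rainfall amounts.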
4) Rainfall event cluster analysis based on GPU acceleration
Cluster analysis is performed on each subset of the sample set generated in step 3) to generate a cluster tree. The clustering steps are as follows:
4-1. Generate initial clusters, treating each element of the subset as an initial cluster: for a subset of N data objects $D = \{x_1, x_2, \ldots, x_N\}$, set the initial cluster set $C = \{C_1, C_2, \ldots, C_N\}$, where $C_j = \{x_j\}$.
4-2. Compute the first distance matrix $\mathrm{Matrix}_F$. The matrix is N × N, where N is the number of rainfall events in the subset, and element (i, j) is the similarity between cluster i and cluster j; in the first distance matrix every cluster holds a single event, so element (i, j) is the similarity between rainfall events i and j. The DTW distance is used as the similarity measure: the smaller the distance, the stronger the similarity. The DTW distance is calculated as follows:
For time series $X = \{x_1, x_2, \ldots, x_i, \ldots, x_m\}$ and $Y = \{y_1, y_2, \ldots, y_j, \ldots, y_n\}$, find a warping path W that represents the mapping between X and Y, as shown in FIG. 3, where $W = \{w_1, w_2, \ldots, w_k, \ldots, w_K\}$, $\max(m, n) \le K \le m + n - 1$, and the k-th element $w_k = (i, j)$ pairs the i-th element of X with the j-th element of Y. The warping path is subject to three constraints:
- Boundary: the path starts at the first element of the matrix and ends at the opposite corner, i.e. $w_1 = (1, 1)$ and $w_K = (m, n)$.
- Continuity: the path advances by at most one step in each direction, i.e. for $w_k = (a, b)$ and $w_{k-1} = (a', b')$, $a - a' \le 1$ and $b - b' \le 1$.
- Monotonicity: the path never moves backward in time, i.e. for $w_k = (a, b)$ and $w_{k-1} = (a', b')$, $a - a' \ge 0$ and $b - b' \ge 0$.
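The three path constraints can be checked mechanically. The following is a small sketch using 1-based (i, j) pairs as in the text; `is_valid_warping_path` is a hypothetical helper. It also rejects a path that repeats a cell, which is consistent with the bound $K \le m + n - 1$.

```python
def is_valid_warping_path(path, m, n):
    """Check boundary, continuity and monotonicity for 1-based (i, j) pairs."""
    if not path or path[0] != (1, 1) or path[-1] != (m, n):
        return False  # boundary: w_1 = (1, 1) and w_K = (m, n)
    for (a_prev, b_prev), (a, b) in zip(path, path[1:]):
        da, db = a - a_prev, b - b_prev
        if not (0 <= da <= 1 and 0 <= db <= 1):
            return False  # continuity (<= 1) and monotonicity (>= 0)
        if da == 0 and db == 0:
            return False  # a repeated cell is not a step along the path
    return True
```

For instance, `[(1, 1), (2, 2), (2, 3), (3, 3)]` is a valid path for m = n = 3, whereas `[(1, 1), (3, 2), (3, 3)]` skips a row and fails the continuity check.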
Many paths satisfy these constraints; the one sought is the path with the minimum warping cost:
$W^{*} = \arg\min_{W} \sum_{k=1}^{K} d(w_k)$
where $d(w_k)$ is the distance between the two elements paired by $w_k$.
By the principle of dynamic programming, if point (i, j) lies on the optimal path, then the sub-path from (1, 1) to (i, j) is itself optimal, so the optimal path from (1, 1) to (m, n) can be built up recursively from locally optimal sub-paths. To find it, construct an m × n matrix whose element (i, j) is the distance between points $x_i$ and $y_j$, $d(x_i, y_j) = (x_i - y_j)^2$, and define the cumulative distance of point (i, j) as:
$\gamma(i, j) = d(x_i, y_j) + \min\{\gamma(i-1, j-1), \gamma(i-1, j), \gamma(i, j-1)\}$
With the initial condition $\gamma(1, 1) = d(x_1, y_1)$, the cumulative distance matrix is computed iteratively, and $\mathrm{DTW}(X, Y) = \gamma(m, n)$ is the DTW distance between X and Y. The best matching path is recovered by tracing back through the cumulative distance matrix from $\gamma(m, n)$.
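The recursion and trace-back above can be sketched in plain Python as follows. It uses $d(x_i, y_j) = (x_i - y_j)^2$ as in the text and pads the matrix with a row and column of infinity so that $\gamma(1, 1) = d(x_1, y_1)$ follows from the general recursion. The function name `dtw` and the tie-breaking in the trace-back are illustrative choices, not details from the source.

```python
def dtw(x, y):
    """DTW distance gamma(m, n) and the optimal warping path (1-based pairs)."""
    m, n = len(x), len(y)
    inf = float("inf")
    # g[i][j] holds gamma(i, j); row 0 and column 0 are padding
    g = [[inf] * (n + 1) for _ in range(m + 1)]
    g[0][0] = 0.0  # makes g[1][1] equal d(x_1, y_1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            g[i][j] = d + min(g[i - 1][j - 1], g[i - 1][j], g[i][j - 1])

    # Trace back from (m, n), stepping to the cheapest admissible predecessor
    i, j = m, n
    path = [(i, j)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (g[i - 1][j - 1], i - 1, j - 1),
            (g[i - 1][j], i - 1, j),
            (g[i][j - 1], i, j - 1),
        )
        path.append((i, j))
    path.reverse()
    return g[m][n], path
```

For example, `dtw([0, 1, 2], [0, 1, 1, 2])` returns a distance of 0 with the path `[(1, 1), (2, 2), (2, 3), (3, 4)]`: warping lets the single 1 in X match both 1s in Y at no cost, whereas a point-by-point comparison would not even be defined for series of different lengths.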
Computing the first distance matrix takes $O(N^2)$ DTW evaluations, where N is the number of elements in the subset, and each DTW evaluation takes $O(m \cdot n)$ time, where m and n are the lengths of the two rainfall time series. The first distance matrix therefore often takes a very long time to compute, and conventional serial computation struggles with large data volumes.
The matrix elements can be computed independently of one another, which makes them well suited to parallel computation, so the first matrix is computed in parallel on the GPU. Specifically, one thread is assigned to compute the DTW distance of each matrix element. First, the number of threads per thread block is set; the maximum depends on the GPU. For a two-dimensional matrix, each thread block is given (tb, tb) threads, where $tb^2$ must be smaller than the maximum number of threads per block allowed by the GPU. Second, the number of thread blocks per thread grid is set; for a two-dimensional matrix, each grid is given (bg, bg) thread blocks, where $bg = \lceil N / tb \rceil$ must be smaller than the maximum number of thread blocks per grid allowed by the GPU, and N is the number of samples in the subset. After the threads are allocated, the GPU computes the DTW distance of every matrix element and returns the results to CPU memory, which completes the first distance matrix.
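The thread layout can be illustrated without a GPU by a serial emulation of the two-dimensional grid. In this sketch, which is not CUDA code, each (block, thread) pair computes one matrix cell from the usual global index `blockIdx * blockDim + threadIdx`, and threads that fall outside the N × N matrix do nothing. The grid size `bg = ceil(N / tb)` is the usual choice for covering the matrix and is assumed here; the limits of 1024 threads per block and 65535 blocks per grid dimension are assumed defaults typical of CUDA GPUs, and the real device limits should be queried. `launch_config`, `emulate_kernel` and the `dist` callable are hypothetical.

```python
import math


def launch_config(n, tb, max_threads_per_block=1024, max_blocks_per_dim=65535):
    """Return bg for a (bg, bg) grid of (tb, tb) blocks covering an n x n matrix."""
    if tb * tb > max_threads_per_block:
        raise ValueError("tb * tb exceeds the threads-per-block limit")
    bg = math.ceil(n / tb)  # enough blocks per dimension to cover n rows/columns
    if bg > max_blocks_per_dim:
        raise ValueError("bg exceeds the blocks-per-grid limit")
    return bg


def emulate_kernel(series, dist, tb=16):
    """Serially emulate the 2-D launch: each thread fills at most one cell."""
    n = len(series)
    bg = launch_config(n, tb)
    out = [[0.0] * n for _ in range(n)]
    for bx in range(bg):
        for by in range(bg):
            for tx in range(tb):
                for ty in range(tb):
                    # Global indices, as blockIdx * blockDim + threadIdx in CUDA
                    i = bx * tb + tx
                    j = by * tb + ty
                    if i < n and j < n:  # threads past the matrix edge stay idle
                        out[i][j] = dist(series[i], series[j])
    return out
```

With tb = 16, the 1051-event subset needs a 66 × 66 grid of blocks; in a real kernel, `dist` would be the per-thread DTW computation.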
4-3. Merge clusters based on the first distance matrix: find the two closest clusters $C_{i^*}$ and $C_{j^*}$ and merge them, $C_{i^*} = C_{i^*} \cup C_{j^*}$; renumber the clusters, delete row $j^*$ and column $j^*$ of the current distance matrix M, compute the distances between the new cluster and the other clusters, and update the distance matrix.
4-4. Repeat the previous step until all clusters have been merged into one, which yields a cluster tree.
4-5. Repeat steps 4-2 to 4-4 to generate a cluster tree for each subset of the sample set.
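Steps 4-1 and 4-3 to 4-4 amount to agglomerative hierarchical clustering on a precomputed distance matrix. The text does not say how the distance from a merged cluster to the others is computed, so this sketch assumes average linkage (the size-weighted mean of the two old rows); single or complete linkage would change only that update line. `agglomerate` is a hypothetical name, and the naive closest-pair search is for clarity, not speed.

```python
def agglomerate(dist):
    """Agglomerative clustering on a precomputed N x N distance matrix.

    Assumes average linkage. Returns the merge history as a list of
    (cluster_a, cluster_b, distance) with clusters as frozensets of indices.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]  # step 4-1
    d = [row[:] for row in dist]  # current distance matrix M
    history = []
    while len(clusters) > 1:
        # Find the closest pair (i*, j*), i* < j*
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if best is None or d[i][j] < best[0]:
                    best = (d[i][j], i, j)
        dij, i, j = best
        history.append((clusters[i], clusters[j], dij))
        # Average linkage: distance from the merged cluster to cluster k
        ni, nj = len(clusters[i]), len(clusters[j])
        for k in range(len(clusters)):
            if k not in (i, j):
                d[i][k] = d[k][i] = (ni * d[i][k] + nj * d[j][k]) / (ni + nj)
        clusters[i] = clusters[i] | clusters[j]
        # Delete row j* and column j*, which renumbers the later clusters
        del clusters[j]
        del d[j]
        for row in d:
            del row[j]
    return history
```

Called once per subset's distance matrix (step 4-5), the returned history is the cluster tree: the last merge is the root, and undoing merges from the end walks down the tree.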
5) Clustering center extraction and local rainfall pattern analysis
Analyze the cluster trees generated in step 4). Take the root node as layer 1 of the cluster tree, so that layer n contains n nodes, each node being one cluster with one cluster center. Traverse the node clusters layer by layer downward from the root and, for each node cluster, compute the distance matrix $\mathrm{Matrix}_D$ of the rainfall events it contains. The matrix has size $m_i \times m_i$, $i = 1, \ldots, n$, where n is the number of nodes in the layer, i is the node index, and $m_i$ is the number of rainfall events in node i; element (j, k) is the DTW distance between rainfall events j and k (computed as in step 4). After computing $\mathrm{Matrix}_D$, sum each of its rows; the rainfall event whose row has the smallest sum is the cluster center, i.e. a representative rain pattern of local rainfall in the basin (region).
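Cutting a tree at a layer and picking each node's center can be sketched as below. The sketch assumes that layer n is the state after the first N − n merges (so the root, reached after all N − 1 merges, is layer 1) and takes the merge history as an ordered list whose items start with the two merged clusters as frozensets of event indices. Both function names are hypothetical, and `dist` stands for any DTW distance function between two events.

```python
def clusters_at_layer(n_items, merges, layer):
    """Node clusters on a given layer of the cluster tree (root = layer 1).

    merges: merge history in order; each item starts with the two merged
    clusters (frozensets of event indices). Layer n is taken to be the n
    clusters that exist after the first n_items - n merges.
    """
    if not 1 <= layer <= n_items:
        raise ValueError("layer must lie between 1 and n_items")
    clusters = {frozenset([i]) for i in range(n_items)}
    for a, b, *_ in merges[: n_items - layer]:
        clusters -= {a, b}
        clusters.add(a | b)
    return sorted(clusters, key=min)


def cluster_center(members, dist):
    """Member with the smallest row sum of pairwise distances (the medoid)."""
    best, best_sum = None, float("inf")
    for i in sorted(members):
        row_sum = sum(dist(i, j) for j in members)  # one row of Matrix_D
        if row_sum < best_sum:
            best, best_sum = i, row_sum
    return best
```

Because the center is always an actual member of the cluster, each representative rain pattern is an observed rainfall event rather than an averaged curve.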
In this embodiment, local rain patterns are analyzed for a sub-basin of the Yangtze River basin. The sub-basin covers 572 km² and contains 18 rainfall stations whose records span 15 years.
With the thresholds $Th_A = 10$ mm and $Th_T = 6$ h, together with a threshold $Th_L$, automatic extraction yields 3092 rainfall events lasting from 3 to 120 hours, as shown in FIG. 4.
The cumulative rainfall process of each rainfall event is computed and standardized, and the standardized cumulative series are taken as samples. The samples are divided into 6 subsets by duration, using the intervals [3, 6), [6, 12), [12, 24), [24, 48), [48, 96) and [96, 192) hours; the subsets contain 521, 874, 1051, 542, 97 and 7 samples, respectively. Some of the rainfall processes and samples are shown in FIGS. 5 to 10.
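The duration-based split can be sketched with the interval edges from this embodiment. The sketch assumes hourly series, so an event's duration in hours equals its length, and it simply leaves out events outside [3, 192) hours. `split_by_duration` is a hypothetical name.

```python
from bisect import bisect_right

# Interval edges (hours) from the embodiment: [3, 6), [6, 12), ..., [96, 192)
EDGES = [3, 6, 12, 24, 48, 96, 192]


def split_by_duration(events, edges=EDGES):
    """Group standardized events into subsets by duration (hourly data assumed)."""
    subsets = [[] for _ in range(len(edges) - 1)]
    for ev in events:
        k = bisect_right(edges, len(ev)) - 1  # index of the interval holding len(ev)
        if 0 <= k < len(subsets):
            subsets[k].append(ev)
    return subsets
```

Half-open intervals make the split unambiguous: a 6-hour event falls into [6, 12), not [3, 6).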
Clustering each subset yields the cluster trees shown in FIGS. 11 to 16.
Cluster centers are extracted at layer 5 of each cluster tree. Taking the first subset, i.e. rainfall events lasting 3 to 5 hours, as an example, its cluster tree yields 5 representative rain patterns for 3- to 5-hour rainfall in the basin; the 5 clusters and their cluster centers (the representative rain patterns) are shown in FIGS. 17 to 26.
Claims (5)
1. A machine-learning-based local rain pattern analysis method, characterized in that it comprises the following steps:
1) collecting, processing and storing data: collecting rainfall data from the hydrological and meteorological stations in the basin to be analyzed and converting them to equal time intervals;
2) automatically extracting rainfall events: reading the continuous rainfall time series of each station from the database in turn, dividing them into independent rainfall events, and generating one rainfall time series per event;
3) generating a local rainfall sample set: generating a sample set from the event time series extracted in step 2), wherein each element of the sample set is an independent, standardized rainfall event and the number of elements equals the number of rainfall events; and dividing the sample set into subsets according to event duration, wherein the full range of event durations in the rainfall event set is divided into several intervals and the events whose durations fall into the same interval form one subset;
4) clustering rainfall events with GPU acceleration: clustering each subset of the sample set generated in step 3) to generate a plurality of cluster trees, the clustering comprising: 4-1. generating initial clusters by treating each element of the subset as an initial cluster; 4-2. computing a distance matrix of size N × N, where N is the number of rainfall events in the subset and element (i, j) is the distance between cluster i and cluster j, representing the similarity of rainfall events i and j, wherein the DTW distance is used as the similarity measure, a smaller distance meaning greater similarity, and the matrix computation is accelerated by GPU parallel computing; 4-3. merging clusters based on the distance matrix from step 4-2: finding and merging the two closest clusters, renumbering the clusters, computing the distance between the new cluster and every other cluster, and updating the distance matrix; 4-4. repeating step 4-3 until all clusters have been merged into one, thereby generating a cluster tree; 4-5. repeating steps 4-2 to 4-4 so that a cluster tree is generated for each subset; wherein the DTW distance is calculated as follows: for time series $X = \{x_1, x_2, \ldots, x_i, \ldots, x_m\}$ and $Y = \{y_1, y_2, \ldots, y_j, \ldots, y_n\}$, where m and n are the lengths of the two series, a warping path $W = \{w_1, w_2, \ldots, w_k, \ldots, w_K\}$, $\max(m, n) \le K \le m + n - 1$, is found to represent the mapping between X and Y, the k-th element $w_k = (i, j)$ pairing the i-th element of X with the j-th element of Y; the cumulative distance of point (i, j) is $\gamma(i, j) = d(x_i, y_j) + \min\{\gamma(i-1, j-1), \gamma(i-1, j), \gamma(i, j-1)\}$; given the initial condition $\gamma(1, 1) = d(x_1, y_1)$, the cumulative distance matrix is computed iteratively, and $\mathrm{DTW}(X, Y) = \gamma(m, n)$ is the DTW distance between X and Y;
5) analyzing the cluster trees generated in step 4): taking the root node as layer 1 of a cluster tree, so that layer n contains n nodes, each node being one cluster with one cluster center; traversing the node clusters of a given layer and computing, for each node cluster, the distance matrix of the rainfall events it contains, of size $m_i \times m_i$, $i = 1, \ldots, n$, where n is the number of nodes in the layer, $m_i$ is the number of rainfall events in the i-th node, and element (j, k) is the DTW distance between rainfall events j and k; and summing each row of the matrix, wherein the rainfall event whose row has the smallest sum is the cluster center of the node cluster, i.e. a representative rain pattern of local rainfall in the basin.
2. The machine learning-based local rainfall pattern analysis method according to claim 1, characterized in that: the rainfall data in step 1) span 10 years or more.
3. The machine learning-based local rainfall pattern analysis method according to claim 1, characterized in that: the rainfall division in step 2) comprises: setting a time threshold, so that when the dry interval within a rainfall process exceeds the threshold the process is treated as two rainfall processes, and when the interval is less than the threshold it is treated as one rainfall process; and setting a magnitude threshold, so that a rainfall process whose total rainfall is below the magnitude threshold is regarded as micro rainfall and excluded from the analysis.
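The two thresholds of claim 3 can be illustrated with the sketch below. It assumes a uniformly sampled rainfall record (one value per time step), measures the dry interval in time steps, and splits only when the interval strictly exceeds the time threshold; the claim does not say how an interval exactly equal to the threshold is handled, so that choice is an assumption. `split_events` is a hypothetical name.

```python
def split_events(rain, gap_threshold, depth_threshold):
    """Split a rainfall record into events and drop micro rainfall.

    rain: rainfall depth per time step (uniform step assumed).
    gap_threshold: dry interval, in time steps, above which two events are split.
    depth_threshold: minimum total depth for an event to be kept.
    """
    spans = []       # [start, end] index pairs of wet-to-wet events
    current = None   # the event currently being extended
    dry_run = 0      # consecutive dry steps since the last wet step
    for t, p in enumerate(rain):
        if p > 0:
            if current is not None and dry_run <= gap_threshold:
                current[1] = t              # short break: same rainfall process
            else:
                if current is not None:
                    spans.append(current)   # long break: close the previous event
                current = [t, t]
            dry_run = 0
        else:
            dry_run += 1
    if current is not None:
        spans.append(current)
    # Magnitude threshold: discard events whose total rainfall is too small.
    return [rain[s:e + 1] for s, e in spans if sum(rain[s:e + 1]) >= depth_threshold]
```

For example, with `gap_threshold=2` the record `[0, 1, 2, 0, 0, 0, 3, 0, 1, 0]` yields the events `[1, 2]` and `[3, 0, 1]`: the three-step dry spell splits them, while the single dry step inside the second event does not.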
4. The machine learning-based local rainfall pattern analysis method according to claim 1, characterized in that: the standardization of a rainfall event in step 3) uses the following formula: … where n is the length of the rainfall event, $P_i'$ is the standardized rainfall sequence point, and $P_j$ is the original rainfall sequence point.
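The standardization formula of claim 4 did not survive in this text. Purely as an assumption consistent with the variables the claim names ($n$, $P_i'$, $P_j$), the sketch below uses a dimensionless cumulative mass curve, $P_i' = \sum_{j=1}^{i} P_j \big/ \sum_{j=1}^{n} P_j$, which rescales every event to end at 1 so that events of different depth can be compared by shape. It is not necessarily the patent's formula, and `standardize` is a hypothetical name.

```python
from itertools import accumulate


def standardize(event):
    """Cumulative fraction of total event rainfall at each time step.

    ASSUMPTION: P'_i = sum_{j<=i} P_j / sum_{j<=n} P_j; the claim's own
    formula is not preserved in the source text.
    """
    total = sum(event)
    if total <= 0:
        raise ValueError("event must have positive total rainfall")
    return [c / total for c in accumulate(event)]
```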
5. The machine learning-based local rainfall pattern analysis method according to claim 1, characterized in that: in step 4-2), the specific method of accelerating the matrix calculation with GPU parallel computing is as follows: one thread is assigned to each matrix element to compute its DTW distance. First, threads are allocated to the thread blocks: for a two-dimensional matrix operation, each thread block may be given (tb, tb) threads, where $tb^2$, the number of threads in the block, must be smaller than the maximum number of threads per block allowed by the GPU. Second, thread blocks are allocated to the thread grid: for a two-dimensional matrix operation, the grid may be given (bg, bg) thread blocks, where … (N is the number of samples in the rainfall event sample subset), and bg must be smaller than the maximum number of thread blocks per grid allowed by the GPU. After the threads are allocated, the GPU computes the DTW distance of every matrix element and returns the results to CPU memory, completing the calculation of the distance matrix.
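The thread layout of claim 5 can be emulated serially to check the index mapping before writing a real kernel. This is a sketch under stated assumptions: the formula for bg is missing from the text, so $bg = \lceil N / tb \rceil$ is assumed (the smallest grid that covers an $N \times N$ matrix); "smaller than" for $tb^2$ is read as "not exceeding"; 1024 threads per block and 65535 blocks per grid dimension are typical CUDA limits used only as assumed defaults. The loops mirror `blockIdx`/`threadIdx` in a CUDA kernel; `launch_config` and `emulate_kernel` are hypothetical names.

```python
import math


def launch_config(n_samples, max_threads_per_block=1024, max_blocks_per_dim=65535):
    """Choose (tb, tb) threads per block and (bg, bg) blocks per grid.

    tb: largest integer with tb * tb <= max_threads_per_block.
    ASSUMPTION: bg = ceil(N / tb), so that bg * tb >= N covers every row and column.
    """
    tb = math.isqrt(max_threads_per_block)
    bg = math.ceil(n_samples / tb)
    if bg > max_blocks_per_dim:
        raise ValueError("subset too large for one grid; split it further")
    return tb, bg


def emulate_kernel(events, dist_fn, max_threads_per_block=1024):
    """Serially emulate a 2-D grid in which each thread fills one matrix element."""
    n = len(events)
    tb, bg = launch_config(n, max_threads_per_block)
    out = [[0.0] * n for _ in range(n)]
    for by in range(bg):                  # blockIdx.y
        for bx in range(bg):              # blockIdx.x
            for ty in range(tb):          # threadIdx.y
                for tx in range(tb):      # threadIdx.x
                    row = by * tb + ty
                    col = bx * tb + tx
                    if row < n and col < n:   # guard threads past the matrix edge
                        out[row][col] = dist_fn(events[row], events[col])
    return out
```

With 100 samples and 1024 threads per block this gives tb = 32 and bg = 4, i.e. 128 × 128 threads of which the edge guard idles those beyond row or column 99. Passing a DTW function as `dist_fn` reproduces the distance matrix the GPU would return to CPU memory.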
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911243213.6A CN110930282B (en) | 2019-12-06 | 2019-12-06 | Local rainfall type analysis method based on machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110930282A CN110930282A (en) | 2020-03-27 |
CN110930282B CN110930282B (en) | 2020-10-09 |
Family ID: 69858279
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911243213.6A Active CN110930282B (en) | 2019-12-06 | 2019-12-06 | Local rainfall type analysis method based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110930282B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112232574B (en) * | 2020-10-21 | 2022-06-14 | 成都理工大学 | Debris flow disaster rainfall threshold automatic partitioning method based on support vector machine |
CN112508237B (en) * | 2020-11-20 | 2022-07-08 | 北京师范大学 | Rain type region division method based on data analysis and real-time rain type prediction method |
CN113435661B (en) * | 2021-07-14 | 2022-03-22 | 珠江水利委员会珠江水利科学研究院 | Method, device, medium, and apparatus for estimating rainstorm peak position |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IL207116A (en) * | 2009-08-10 | 2014-12-31 | Stats Llc | System and method for location tracking |
CN102819677B (en) * | 2012-07-30 | 2014-12-10 | 河海大学 | Rainfall site similarity evaluation method on basis of single rainfall type |
CN104732092B (en) * | 2015-03-25 | 2018-07-24 | 河海大学 | A kind of consistent area's analysis method of hydrology rainfall based on cluster |
CN105954821B (en) * | 2016-04-20 | 2017-08-25 | 中国水利水电科学研究院 | A kind of typical catchment choosing method for numerical value atmospheric model |
CN106250667A (en) * | 2016-06-29 | 2016-12-21 | 中国地质大学(武汉) | The monitoring method of a kind of landslide transition between states of paddling and device |
CN106295576B (en) * | 2016-08-12 | 2017-12-12 | 中国水利水电科学研究院 | A kind of water source type analytic method based on nature geography characteristic |
CN106484971B (en) * | 2016-09-23 | 2019-07-02 | 北京清控人居环境研究院有限公司 | A kind of automatic identifying method of drainage pipeline networks monitoring point |
CN106781291B (en) * | 2016-12-29 | 2019-03-26 | 哈尔滨工业大学深圳研究生院 | A kind of rain-induced landslide method for early warning and device based on displacement |
CN107679644A (en) * | 2017-08-28 | 2018-02-09 | 河海大学 | A kind of website Rainfall data interpolating method based on rain types feature |
CN107908835B (en) * | 2017-10-27 | 2020-05-22 | 中国地质大学(武汉) | Method for analyzing landslide dynamic response condition under multiple influence factors |
CN108009596B (en) * | 2017-12-26 | 2020-04-14 | 中国水利水电科学研究院 | Method and device for determining rainfall characteristics |
CN108846573B (en) * | 2018-06-12 | 2021-04-09 | 河海大学 | Watershed hydrological similarity estimation method based on time series kernel distance |
CN109376940B (en) * | 2018-11-02 | 2021-08-17 | 中国水利水电科学研究院 | Method and device for acquiring rainfall spatial-temporal distribution rule in rainfall process |
- 2019-12-06: CN201911243213.6A filed in CN (CN110930282B, status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN110930282A (en) | 2020-03-27 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110930282B (en) | Local rainfall type analysis method based on machine learning | |
KR102009373B1 (en) | Estimation method of flood discharge for varying rainfall duration | |
CN111027763B (en) | Basin flood response similarity analysis method based on machine learning | |
CN109033599B (en) | Soil erosion influence factor importance analysis method based on random forest | |
CN111027764B (en) | Flood forecasting method suitable for runoff data lack basin based on machine learning | |
CN110646867A (en) | Urban drainage monitoring and early warning method and system | |
CN112001610A (en) | Method and device for treating agricultural non-point source pollution | |
CN108388957B (en) | Medium and small river flood forecasting method and forecasting system based on multi-feature fusion technology | |
CN109918364B (en) | Data cleaning method based on two-dimensional probability density estimation and quartile method | |
CN111080107A (en) | Basin flood response similarity analysis method based on time series clustering | |
CN113435630B (en) | Basin hydrological forecasting method and system with self-adaptive runoff yield mode | |
CN115829812B (en) | Carbon sink measurement method and system based on ecological system simulation | |
CN107748940B (en) | Power-saving potential quantitative prediction method | |
CN112330065A (en) | Runoff forecasting method based on basic flow segmentation and artificial neural network model | |
CN111008259A (en) | River basin rainfall similarity searching method | |
CN112347652B (en) | Heavy rain high risk division method based on linear moment frequency analysis of hydrological region | |
Otache et al. | ARMA modelling of Benue River flow dynamics: comparative study of PAR model | |
CN115495991A (en) | Rainfall interval prediction method based on time convolution network | |
CN110110339A (en) | A kind of hydrologic forecast error calibration method and system a few days ago | |
CN109285219A (en) | A kind of grid type hydrological model grid calculation order encoding method based on DEM | |
CN117114194A (en) | Method and device for determining carbon sink quantity and optimizing carbon sink benefit and related equipment | |
CN110968929A (en) | Wind power plant wind speed prediction method and device and electronic equipment | |
CN114564487A (en) | Meteorological raster data updating method combining forecast prediction | |
CN115293391A (en) | Runoff prediction method and system based on mixed runoff production mode | |
CN117828312B (en) | Method for managing watershed hydrologic environment and related equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |