CN110930282B - Local rainfall type analysis method based on machine learning - Google Patents

Info

Publication number
CN110930282B
Authority
CN
China
Prior art keywords
rainfall
cluster
matrix
distance
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911243213.6A
Other languages
Chinese (zh)
Other versions
CN110930282A (en
Inventor
王帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Institute of Water Resources and Hydropower Research
Original Assignee
China Institute of Water Resources and Hydropower Research
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Institute of Water Resources and Hydropower Research filed Critical China Institute of Water Resources and Hydropower Research
Priority to CN201911243213.6A priority Critical patent/CN110930282B/en
Publication of CN110930282A publication Critical patent/CN110930282A/en
Application granted granted Critical
Publication of CN110930282B publication Critical patent/CN110930282B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services
    • G06Q50/265Personal security, identity or safety
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a local rainfall type analysis method based on machine learning, which comprises the following steps: 1) collecting, processing, and storing data; 2) automatically extracting rainfall events; 3) generating a local rainfall sample set; 4) performing GPU-accelerated cluster analysis on the rainfall events; 5) analyzing the generated clustering trees to obtain representative rainfall types. The method automatically extracts rainfall events from station observation data collected in the basin and then uses machine learning to identify the most representative rainfall processes, which serve as the representative rainfall types of local rainfall. This greatly reduces the workload of manual analysis and avoids differences caused by subjective judgment. The results are also specific to the region and can support the analysis of critical rainfall for mountain torrents and the numerical simulation of urban waterlogging.

Description

Local rainfall type analysis method based on machine learning
Technical Field
The invention belongs to the technical field of water conservancy engineering, in particular to flood control forecasting, and specifically relates to a local rainfall type analysis method based on machine learning.
Background
In recent years, extreme rainstorm events have been frequent in China. Local rainstorms are sudden and short-lived and are a main trigger of mountain torrents and urban waterlogging. Besides rainfall amount and rainfall intensity, the rainfall type describes the rainstorm process: it shows how rainfall intensity is distributed over time and is one of the main disaster-causing characteristics of a rainstorm event. Even rainstorm processes with the same rainfall amount and intensity differ in their disaster-causing characteristics when their rainfall types differ.
Because storm floods in hilly areas rise and fall steeply and are difficult to forecast in real time, mountain torrent early warning currently relies mainly on dynamic critical rainfall. Meanwhile, China is urbanizing rapidly, the flood control and drainage standards adopted by most cities are low, and waterlogging disasters are frequent; the impact of rainstorm waterlogging is usually assessed by numerical simulation. Research has shown that the rainfall type directly affects the determination of critical rainfall for mountain torrent disasters and of the maximum extent and maximum depth of urban waterlogging.
At present, when determining the critical rainfall for mountain torrents and numerically simulating urban waterlogging, the local rainstorm rainfall type is mainly derived by the same-frequency analysis method or taken from a design rainfall type. Commonly used design rainfall types include the Chicago, Huff, Pilgrim & Cordery, and Yen & Chow rainfall types. The same-frequency method requires considerable human intervention, and its results are subjective because samples and expert experience differ. Design rainfall types such as the Chicago, Huff, and Yen & Chow types were generalized by foreign scholars from rainstorm samples of particular regions and differ to some degree from the actual rainfall process; at present, no rainfall type is generally accepted as a design basis.
Disclosure of Invention
The invention aims to provide a local rainfall type analysis method that addresses the above problems.
A local rainfall type analysis method based on machine learning comprises the following steps:
1) collecting, processing and storing data: collecting rainfall data from hydrological and meteorological stations in the basin to be analyzed and processing them into equal time periods;
2) automatic extraction of rainfall events: sequentially reading the continuous rainfall time series of every station in the database, dividing each series into independent rainfall events, and generating the rainfall time series of a plurality of rainfall events;
3) generating a local rainfall sample set: generating a sample set from the rainfall time series of the events extracted in step 2), wherein each element of the sample set is an independent, standardized rainfall event and the number of elements equals the number of rainfall events; dividing the sample set into a plurality of subsets according to the duration of each rainfall event;
4) rainfall event cluster analysis based on GPU acceleration: performing cluster analysis on each subset of the sample set generated in step 3) to generate a plurality of clustering trees, the specific steps of the cluster analysis being:
4-1. generating initial clusters: treating each element of the subset as an initial cluster;
4-2. calculating a distance matrix: the matrix size is $(N \times N)$, where $N$ is the number of rainfall events in the subset; element $(i, j)$ of the matrix is the distance between cluster $i$ and cluster $j$ and represents the similarity of rainfall event $i$ and rainfall event $j$; the DTW distance is used as the similarity measure, a smaller distance meaning a stronger similarity; the calculation of the matrix is accelerated by GPU parallel computing;
4-3. merging clusters based on the distance matrix of step 4-2: finding and merging the two clusters with the shortest distance, renumbering the clusters, calculating the distance between the new cluster and each other cluster, and updating the distance matrix;
4-4. repeating step 4-3 until all clusters are merged into one cluster, thereby generating a clustering tree;
4-5. repeating steps 4-2 to 4-4 so that a corresponding clustering tree is generated for each subset of the sample set;
5) analyzing the clustering trees generated in step 4): taking the root node as layer 1 of the clustering tree, the $n$-th layer of the clustering tree contains $n$ nodes, each node being one cluster with one cluster center; traversing the node clusters of the given layer and calculating, for each node cluster, the distance matrix of the rainfall events it contains, of size $(m_i \times m_i)$, $i = 1, \dots, n$, where $n$ is the number of nodes in the layer and $m_i$ is the number of rainfall events contained in the $i$-th node; element $(i, j)$ of the matrix is the DTW distance between rainfall event $i$ and rainfall event $j$; after the distance matrix of a node cluster is calculated, each of its rows is summed, and the rainfall event corresponding to the row with the smallest sum is the cluster center of that node cluster, i.e. a representative rainfall type of local rainfall in the basin.
Further, the time span of the rainfall data in step 1) covers at least 10 years.
Further, the method for dividing rainfall events in step 2) is as follows: a time threshold is set; when the intermission within a rainfall process exceeds the threshold, the rainfall is regarded as two rainfall processes, and when the intermission is below the threshold, it is regarded as one rainfall process; a magnitude threshold is also set, and when the total rainfall of a rainfall process is below the magnitude threshold, the process is regarded as trace rainfall and is not considered.
Further, the standardized processing method of the rainfall event in the step 3) comprises the following steps:
Figure BDA0002306825470000031
where $n$ is the length of the rainfall event, $P'_i$ is a point of the standardized rainfall sequence, $P_j$ is a point of the original rainfall sequence, and $i$ and $j$ are the time indices of the standardized sequence and the original sequence, respectively.
Further, the subset dividing method in step 3) is as follows: the total duration range is divided into a plurality of time intervals according to the durations of the events in the rainfall event set, and the events whose rainfall durations fall in the same interval are extracted to form one subset.
Further, the DTW distance is calculated as follows: for time series $X = \{x_1, x_2, \dots, x_i, \dots, x_m\}$ and $Y = \{y_1, y_2, \dots, y_j, \dots, y_n\}$, where $m$ and $n$ are the lengths of the two time series, a warping path $W = \{w_1, w_2, \dots, w_k, \dots, w_K\}$ with $\max(n, m) \le K \le n + m - 1$ is found to represent the mapping between $X$ and $Y$. The $k$-th element of $W$ is written $w_k = (i, j)$ and denotes the correspondence between the $i$-th element of $X$ and the $j$-th element of $Y$. The cumulative distance at point $(i, j)$ is calculated as $\gamma(i, j) = d(x_i, y_j) + \min\{\gamma(i-1, j-1), \gamma(i-1, j), \gamma(i, j-1)\}$. Given the initial condition $\gamma(1, 1) = d(x_1, y_1)$, the cumulative distance matrix is obtained by iterative calculation,
Figure BDA0002306825470000041
i.e. the DTW distance of time series X and Y.
Further, the specific method for accelerating the matrix calculation by GPU parallel computing in step 4-2 of step 4) is as follows: one thread is assigned to each matrix element to compute its DTW distance. First, threads are allocated to thread blocks: for a two-dimensional matrix operation, each thread block is given (tb, tb) threads, where $tb^2$, the number of threads contained in a thread block, must be smaller than the maximum number of threads per thread block allowed by the GPU. Second, thread blocks are allocated to the thread grid: for a two-dimensional matrix operation, the thread grid can be given (bg, bg) thread blocks, where
Figure BDA0002306825470000042
$N$ is the number of samples in the rainfall event sample subset, and bg must be smaller than the maximum number of thread blocks per thread grid allowed by the GPU. After thread allocation is completed, the GPU computes the DTW distance of every matrix element and returns the results to CPU memory, which completes the calculation of the distance matrix.
The beneficial effects of the invention are as follows:
Rainfall events are automatically extracted from station observation data collected in the basin (region), and a machine learning method then identifies the most representative rainfall processes, which serve as the representative rainfall types of local rainfall. This greatly reduces the workload of manual analysis and avoids differences caused by subjective judgment.
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Drawings
FIG. 1 is an overall flow chart of the method of the present invention;
FIG. 2 is a schematic illustration of rainfall data interpolation;
FIG. 3 shows the dynamic warping path of the time series;
FIG. 4 shows the full set of rainfall events;
FIG. 5 is rainfall sample example 1;
FIG. 6 is rainfall sample example 2;
FIG. 7 is rainfall sample example 3;
FIG. 8 is rainfall sample example 4;
FIG. 9 is rainfall sample example 5;
FIG. 10 is rainfall sample example 6;
FIG. 11 is the clustering tree of subset 1;
FIG. 12 is the clustering tree of subset 2;
FIG. 13 is the clustering tree of subset 3;
FIG. 14 is the clustering tree of subset 4;
FIG. 15 is the clustering tree of subset 5;
FIG. 16 is the clustering tree of subset 6;
FIG. 17 is cluster 1 of subset 1;
FIG. 18 is cluster 2 of subset 1;
FIG. 19 is cluster 3 of subset 1;
FIG. 20 is cluster 4 of subset 1;
FIG. 21 is cluster 5 of subset 1;
FIG. 22 is representative rainfall type 1 of subset 1;
FIG. 23 is representative rainfall type 2 of subset 1;
FIG. 24 is representative rainfall type 3 of subset 1;
FIG. 25 is representative rainfall type 4 of subset 1;
FIG. 26 is representative rainfall type 5 of subset 1.
Detailed Description
Example 1
A local rainfall type analysis method based on machine learning comprises the following steps:
1) data collection, processing and storage
Data collection: collect rainfall observation data from hydrological and meteorological stations in the basin (region) to be analyzed. Because cluster analysis needs a large amount of data, the rainfall data should span at least 10 years.
Data processing: convert the rainfall data into an equal-period time series. If the original data have unequal periods, interpolate them, preferably along the cumulative rainfall curve: as shown in FIG. 2, the cumulative rainfall curve is first built from the original sequence, and the equal-period rainfall time series $\{P'_1, P'_2, P'_3, \dots, P'_{12}\}$ is then read from it.
Data storage: store the processed equal-period rainfall time series in a database.
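The interpolation along the cumulative rainfall curve can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `to_equal_periods`, the use of hours measured from the start of the record, the convention that each timestamp marks the end of its observation interval, and linear interpolation between curve points are all assumptions.

```python
import numpy as np

def to_equal_periods(times_h, amounts_mm, step_h=1.0):
    """Resample irregular rainfall observations to equal periods.

    times_h    -- end time of each observation interval, in hours from the
                  start of the record, ascending (assumed convention)
    amounts_mm -- rainfall depth observed in each interval
    step_h     -- length of the equal period (hypothetical default: 1 h)
    """
    times = np.asarray(times_h, dtype=float)
    cumulative = np.cumsum(np.asarray(amounts_mm, dtype=float))

    # Cumulative rainfall curve, starting from zero depth at time zero.
    curve_t = np.concatenate(([0.0], times))
    curve_p = np.concatenate(([0.0], cumulative))

    # Read the curve on an equal-period grid (linear interpolation assumed).
    n_steps = int(np.ceil(times[-1] / step_h))
    grid = step_h * np.arange(n_steps + 1)
    cumulative_on_grid = np.interp(grid, curve_t, curve_p)

    # Differences of the cumulative curve give the rainfall of each period.
    return np.diff(cumulative_on_grid)

hourly = to_equal_periods([0.5, 2.0, 3.0], [1.0, 3.0, 2.0])
```

Because the series is obtained by differencing the cumulative curve, the total rainfall depth of the original record is preserved.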
2) Automatic extraction of rainfall events
Read the rainfall time series of each station from the database in turn and divide it into independent rainfall events. Taking the rainfall time series $\{P_1, P_2, P_3, \dots, P_t\}$ and its corresponding timestamp sequence $\{T_1, T_2, T_3, \dots, T_t\}$ as an example, the division method is as follows. Set a time threshold $Th_T$: when the intermission $T_j - T_i$ within the rainfall process exceeds $Th_T$, the rainfall is regarded as two rainfall processes; when it does not exceed $Th_T$, it is regarded as one rainfall process. This gives an automatic, continuous division into rainfall events. Set a magnitude threshold $Th_A$: when the total rainfall of a rainfall process is below $Th_A$, the process is regarded as trace rainfall and is not considered. Traversing the rainfall time series of every station in this way yields $N$ rainfall sequences $\{P_{i1}, P_{i2}, \dots, P_{ik}\}$ and their timestamp sequences $\{T_{i1}, T_{i2}, \dots, T_{ik}\}$, where $i = 1, \dots, N$, $N$ is the number of rainfall events, and $k$ is the number of periods in the corresponding rainfall event.
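The division rule can be sketched as below. The function name `extract_events` and its parameter names are hypothetical, and the intermission is taken here to be the time between two consecutive periods with non-zero rainfall, which is one reading of $T_j - T_i$. The default values follow the thresholds quoted later in the embodiment (6 h and 10 mm).

```python
def extract_events(times_h, rain_mm, gap_threshold_h=6.0, min_total_mm=10.0):
    """Split one station's equal-period series into independent rainfall events.

    Returns a list of (timestamps, depths) pairs. Dry periods that lie inside
    an event are kept so that the event stays an equal-period series.
    """
    wet = [k for k, depth in enumerate(rain_mm) if depth > 0]
    if not wet:
        return []

    # Group wet periods: a dry gap longer than the time threshold starts a new event.
    groups = []
    start = prev = wet[0]
    for k in wet[1:]:
        if times_h[k] - times_h[prev] > gap_threshold_h:
            groups.append((start, prev))
            start = k
        prev = k
    groups.append((start, prev))

    # Drop trace rainfall: events whose total depth is below the magnitude threshold.
    events = []
    for first, last in groups:
        depths = list(rain_mm[first:last + 1])
        if sum(depths) >= min_total_mm:
            events.append((list(times_h[first:last + 1]), depths))
    return events

times = list(range(20))
rain = [5, 6, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
events = extract_events(times, rain)
```

In this example the 8-hour dry spell separates two candidate events, and the second one (1 mm in total) is discarded as trace rainfall.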
3) Generating local rainfall sample sets
Using the $N$ rainfall sequences $\{P_{i1}, P_{i2}, \dots, P_{ik}\}$, $i = 1, \dots, N$, extracted in step 2), where $N$ is the number of rainfall events and $k$ is the number of periods in the corresponding rainfall event, a sample set containing $N$ elements is generated, each element being an independent rainfall event. Using the formula
Figure BDA0002306825470000081
the rainfall events are standardized, where $n$ is the length of the rainfall event, $P'_i$ is a point of the standardized rainfall sequence, and $P_j$ is a point of the original rainfall sequence. The sample set is then divided into a plurality of subsets according to the duration of each rainfall event.
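The standardization formula itself is an image that is not reproduced in this text. One reading that is consistent with the symbol definitions and with the embodiment, which uses the standardized cumulative rainfall process as the sample, is $P'_i = \sum_{j=1}^{i} P_j / \sum_{j=1}^{n} P_j$. The sketch below implements that reading; both the formula and the function name `normalize_event` are assumptions, not the confirmed content of the patent figure.

```python
import numpy as np

def normalize_event(depths):
    """Normalized cumulative rainfall curve of one event (assumed formula).

    Each output point is the rainfall accumulated up to that period divided by
    the event total, so the curve rises monotonically from a value above 0 to 1.
    """
    p = np.asarray(depths, dtype=float)
    total = p.sum()
    if total <= 0:
        raise ValueError("a rainfall event must have a positive total depth")
    return np.cumsum(p) / total

sample = normalize_event([5, 6, 0, 0, 2])
```

Under this reading, events of different total depth become comparable in shape, which is what a rainfall type describes.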
4) Rainfall event cluster analysis based on GPU acceleration
Performing cluster analysis based on each subset of the sample set generated in step 3) to generate a cluster tree. The clustering analysis comprises the following specific steps:
4-1. Generate initial clusters, taking each element of the subset as an initial cluster: for a subset of $N$ data objects $D = \{x_1, x_2, \dots, x_N\}$, set the initial cluster set $C = \{C_1, C_2, \dots, C_N\}$, where $C_j = \{x_j\}$;
4-2. Calculate the first distance matrix $Matrix_F$. The matrix size is $(N \times N)$, where $N$ is the number of rainfall events contained in the subset. Element $(i, j)$ of the matrix is the similarity between cluster $i$ and cluster $j$; for the first distance matrix, element $(i, j)$ is the similarity between rainfall event $i$ and rainfall event $j$. The DTW (dynamic time warping) distance is used as the similarity measure: the smaller the distance, the stronger the similarity. The DTW distance is calculated as follows:
For time series $X = \{x_1, x_2, \dots, x_i, \dots, x_m\}$ and $Y = \{y_1, y_2, \dots, y_j, \dots, y_n\}$, a warping path $W = \{w_1, w_2, \dots, w_k, \dots, w_K\}$ with $\max(n, m) \le K \le n + m - 1$ is found to represent the mapping between $X$ and $Y$, as shown in FIG. 3. The $k$-th element of $W$ is written $w_k = (i, j)$ and denotes the correspondence between the $i$-th element of $X$ and the $j$-th element of $Y$. The choice of warping path is subject to three constraints: the path starts at the first element of the matrix and ends at the diagonally opposite element, i.e. $w_1 = (1, 1)$ and $w_K = (m, n)$; the path is continuous at every step, i.e. for $w_k = (a, b)$ and $w_{k-1} = (a', b')$, $a - a' \le 1$ and $b - b' \le 1$; and the path is monotonic along the time axis, i.e. for $w_k = (a, b)$ and $w_{k-1} = (a', b')$, $a - a' \ge 0$ and $b - b' \ge 0$.
Many paths satisfy these constraints; the one sought here is the path with the minimum warping cost, that is:
Figure BDA0002306825470000091
where $d(w_k)$ is the distance between the two elements paired by $w_k$.
According to the idea of dynamic programming, if the point $(i, j)$ lies on the optimal path, then the sub-path from $(1, 1)$ to $(i, j)$ is itself a locally optimal solution; the optimal path from $(1, 1)$ to $(m, n)$ can therefore be obtained by recursively searching for locally optimal solutions from the start point $(1, 1)$ to the end point $(m, n)$. To find the optimal path conveniently, a distance matrix is constructed whose element $(i, j)$ is the distance $d(x_i, y_j) = (x_i - y_j)^2$ between the time series points $x_i$ and $y_j$. The cumulative distance at point $(i, j)$ is defined as:
$\gamma(i, j) = d(x_i, y_j) + \min\{\gamma(i-1, j-1), \gamma(i-1, j), \gamma(i, j-1)\}$
Given the initial condition $\gamma(1, 1) = d(x_1, y_1)$, the cumulative distance matrix can be obtained by iterative calculation.
Figure BDA0002306825470000101
This is the DTW distance between time series $X$ and $Y$; the best matching path can be obtained by searching the cumulative distance matrix backward from the point $\gamma(m, n)$.
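The recurrence can be sketched in a few lines. The function name `dtw_distance` is hypothetical, and because the final formula is an image that is not reproduced here, returning $\gamma(m, n)$ directly (without a square root or path-length normalization) is an assumption. The matrix is padded with an infinite border so that $\gamma(1, 1) = d(x_1, y_1)$ falls out of the same recurrence.

```python
import numpy as np

def dtw_distance(x, y):
    """Cumulative DTW cost gamma(m, n) with squared pointwise distance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = len(x), len(y)

    # gamma[i, j] holds the cumulative distance at point (i, j), 1-based;
    # row 0 and column 0 are an infinite border, with gamma[0, 0] = 0.
    gamma = np.full((m + 1, n + 1), np.inf)
    gamma[0, 0] = 0.0

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            gamma[i, j] = d + min(gamma[i - 1, j - 1],  # diagonal step
                                  gamma[i - 1, j],      # step in X only
                                  gamma[i, j - 1])      # step in Y only
    return float(gamma[m, n])

same_shape = dtw_distance([0, 1, 2], [0, 1, 1, 2])
offset = dtw_distance([0, 0], [1, 1])
```

The first call shows why DTW suits rainfall events of different lengths: a series that only lingers longer at one value still has zero distance to the shorter one.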
The time complexity of the calculation of the first distance matrix is
Figure BDA0002306825470000102
where $N$ is the number of elements in the sample set. The time complexity of one DTW distance calculation is $O(m \cdot n)$, where $m$ and $n$ are the lengths of the rainfall event time series. Computing the first distance matrix is therefore often very time-consuming, and traditional methods struggle to meet the needs of large-volume data analysis.
The calculations of the individual elements are independent of one another and therefore suited to parallel computing, so GPU parallel computing is used to accelerate the calculation of the first distance matrix. The specific method is as follows: one thread is assigned to each matrix element to compute its DTW distance. First, threads are allocated to thread blocks; the maximum number of threads per thread block depends on the GPU. For a two-dimensional matrix operation, each thread block is given (tb, tb) threads, where $tb^2$, the number of threads contained in a thread block, must be smaller than the maximum number of threads per thread block allowed by the GPU. Second, thread blocks are allocated to the thread grid: for a two-dimensional matrix operation, the thread grid can be given (bg, bg) thread blocks, where
Figure BDA0002306825470000103
and bg must be smaller than the maximum number of thread blocks per thread grid allowed by the GPU; $N$ is the number of samples in the subset. After thread allocation is completed, the GPU computes the DTW distance of every matrix element and returns the results to CPU memory, which completes the calculation of the first distance matrix.
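The thread layout can be illustrated on the CPU without any GPU library. The sketch below is an emulation only: the function names are hypothetical, the limit of 1024 threads per block is an assumed typical value rather than a figure from the patent, and $bg = \lceil N / tb \rceil$ is an assumption because the formula for bg is an image that is not reproduced here. The sketch also lets $tb^2$ equal the limit, reading the text's "smaller than" as "not exceeding".

```python
import math

def launch_config(n_samples, max_threads_per_block=1024):
    """Choose a (tb, tb) block and a (bg, bg) grid covering an N x N matrix."""
    tb = math.isqrt(max_threads_per_block)   # largest tb with tb * tb <= limit
    bg = math.ceil(n_samples / tb)           # assumed formula for bg
    return tb, bg

def thread_to_element(block_idx, thread_idx, tb, n_samples):
    """Map one thread to the matrix element (i, j) it would compute.

    Threads that fall outside the N x N matrix return None; a real kernel
    would make the same bounds check and exit early.
    """
    bx, by = block_idx
    tx, ty = thread_idx
    i = bx * tb + tx
    j = by * tb + ty
    if i < n_samples and j < n_samples:
        return (i, j)
    return None

tb, bg = launch_config(1051)   # 1051 is the size of the largest subset in the embodiment
```

With these assumptions the largest subset of the embodiment needs a 33 x 33 grid of 32 x 32 blocks; the threads in the last row and column of blocks that fall outside the matrix stay idle.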
4-3. Merge clusters based on the first distance matrix: find the two closest clusters $C_{i^*}$ and $C_{j^*}$ and merge them, $C_{i^*} = C_{i^*} \cup C_{j^*}$; renumber the clusters; delete row $j^*$ and column $j^*$ of the current distance matrix $M$; calculate the distance between the new cluster and each of the other clusters; and update the distance matrix.
4-4. Repeat the previous step until all clusters are merged into one cluster, thereby generating a clustering tree.
4-5. Repeat steps 4-2 to 4-4 to generate a corresponding clustering tree for each subset of the sample set.
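The merging loop of steps 4-3 and 4-4 can be sketched as follows. The text does not say how the distance between a merged cluster and the other clusters is computed, so average linkage is used here purely as an assumption; the function names are hypothetical. For brevity the sketch recomputes linkage from the original pairwise matrix instead of updating the matrix in place, which is simpler but slower than the row and column update described above.

```python
import numpy as np

def build_cluster_tree(dist):
    """Agglomerative clustering; returns the merge history, bottom up.

    Each entry is (members_a, members_b, linkage_distance).
    Average linkage is an assumption, not taken from the source.
    """
    d = np.asarray(dist, dtype=float)
    clusters = {k: [k] for k in range(len(d))}
    merges = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                link = d[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or link < best[0]:
                    best = (link, a, b)
        link, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), float(link)))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges

def clusters_at_layer(n_items, merges, layer):
    """Layer n of the tree (root = layer 1) holds n clusters."""
    clusters = [[k] for k in range(n_items)]
    for a, b, _ in merges[:n_items - layer]:
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return sorted(sorted(c) for c in clusters)

# Toy distance matrix for four events that form two obvious pairs.
toy = [[0, 1, 10, 11],
       [1, 0, 9, 10],
       [10, 9, 0, 1],
       [11, 10, 1, 0]]
history = build_cluster_tree(toy)
layer2 = clusters_at_layer(4, history, 2)
```

Replaying all but the last `layer - 1` merges is what "taking layer n" of the clustering tree amounts to in this representation.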
5) Clustering center extraction and local rainfall pattern analysis
Analyze the clustering tree generated in step 4). Taking the root node as layer 1 of the clustering tree, the $n$-th layer of the tree contains $n$ nodes; each node is one cluster and contains one cluster center. Traverse the node clusters layer by layer downward from the root node and, for each node cluster, calculate the distance matrix $Matrix_D$ of the rainfall events it contains. The matrix size is $(m_i \times m_i)$, $i = 1, \dots, n$, where $n$ is the number of nodes in the layer, $i$ is the node index, and $m_i$ is the number of rainfall events contained in the node. Element $(i, j)$ of the matrix is the DTW distance between rainfall event $i$ and rainfall event $j$ (calculated as described for the DTW distance in step 4). First calculate the distance matrix $Matrix_D$, then sum each of its rows; the rainfall event corresponding to the row with the smallest sum is the cluster center, i.e. a representative rainfall type of local rainfall in the basin (region).
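The row-sum rule picks the event with the smallest total distance to the other events of its cluster. A minimal sketch, assuming the pairwise DTW distances of the subset are already available as a matrix (the function name `cluster_center` is hypothetical):

```python
import numpy as np

def cluster_center(dist, members):
    """Return the member whose row of within-cluster distances sums lowest.

    dist    -- full pairwise distance matrix of the subset
    members -- indices of the rainfall events in one node cluster
    """
    sub = np.asarray(dist, dtype=float)[np.ix_(members, members)]
    row_sums = sub.sum(axis=1)
    return members[int(np.argmin(row_sums))]

# Toy distances between events placed at 0, 1, 2 and 10 on a line.
points = np.array([0.0, 1.0, 2.0, 10.0])
toy = np.abs(points[:, None] - points[None, :])
center = cluster_center(toy, [1, 2, 3])
```

Because the center is always an observed rainfall event rather than an average of events, the representative rainfall type is a process that actually occurred in the basin.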
In this embodiment, local rainfall type analysis is carried out for a sub-basin of the Yangtze River basin. The basin area is 572 square kilometers, there are 18 rainfall stations in the basin, and the station rainfall data span 15 years.
With the thresholds set to $Th_A$ = 10 mm and $Th_T$ = 6 h, together with the threshold $Th_L$, automatic extraction yields 3092 rainfall events with durations from 3 to 120 hours, as shown in FIG. 4.
The cumulative rainfall process of each rainfall event is calculated and standardized, and the standardized cumulative time series are used as samples. The samples are divided by rainfall duration into 6 subsets using the intervals [3, 6), [6, 12), [12, 24), [24, 48), [48, 96), and [96, 192) hours; the subsets contain 521, 874, 1051, 542, 97, and 7 samples, respectively. Some of the rainfall processes and samples are shown in FIGS. 5-10.
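Assigning an event to its duration subset is a lookup in the half-open intervals listed above. The sketch below uses the interval edges of this embodiment; the function names and the (duration, event) pair layout are hypothetical.

```python
from bisect import bisect_right

# Interval edges in hours: [3, 6), [6, 12), [12, 24), [24, 48), [48, 96), [96, 192)
EDGES_H = [3, 6, 12, 24, 48, 96, 192]

def subset_index(duration_h, edges=EDGES_H):
    """Zero-based subset index for a duration, or None if it is out of range."""
    if not edges[0] <= duration_h < edges[-1]:
        return None
    return bisect_right(edges, duration_h) - 1

def split_by_duration(events, edges=EDGES_H):
    """Group (duration_h, event) pairs into one list per duration interval."""
    subsets = [[] for _ in range(len(edges) - 1)]
    for duration_h, event in events:
        k = subset_index(duration_h, edges)
        if k is not None:
            subsets[k].append(event)
    return subsets

groups = split_by_duration([(3, "a"), (5, "b"), (6, "c"), (120, "d"), (2, "e")])
```

Grouping by duration first keeps the events compared inside one clustering tree on a similar time scale.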
Cluster analysis of each subset yields the clustering trees shown in FIGS. 11-16.
Cluster centers are extracted at layer 5 of each generated clustering tree. Taking the first subset (rainfall events lasting 3 to 5 hours) as an example, the clustering tree generated from it gives 5 representative rainfall types for 3- to 5-hour rainfall in the basin; the 5 clusters and their cluster centers (representative rainfall types) are shown in FIGS. 17-26.

Claims (5)

1. A local rainfall type analysis method based on machine learning, characterized in that the method comprises the following steps:
1) collecting, processing and storing data: collecting rainfall data from hydrological and meteorological stations in the basin to be analyzed and processing them into equal time periods;
2) automatic extraction of rainfall events: sequentially reading the continuous rainfall time series of every station in the database, dividing each series into independent rainfall events, and generating the rainfall time series of a plurality of rainfall events;
3) generating a local rainfall sample set: generating a sample set from the rainfall time series of the events extracted in step 2), wherein each element of the sample set is an independent, standardized rainfall event and the number of elements equals the number of rainfall events; dividing the sample set into a plurality of subsets according to the duration of each rainfall event; the subset dividing method in step 3) being: dividing the total duration range into a plurality of time intervals according to the durations of the events in the rainfall event set, and extracting the events whose rainfall durations fall in the same interval to form one subset;
4) rainfall event cluster analysis based on GPU acceleration: performing cluster analysis on each subset of the sample set generated in step 3) to generate a plurality of clustering trees, the specific steps of the cluster analysis being: 4-1. generating initial clusters: treating each element of the subset as an initial cluster; 4-2. calculating a distance matrix: the matrix size is $(N \times N)$, where $N$ is the number of rainfall events in the subset; element $(i, j)$ of the matrix is the distance between cluster $i$ and cluster $j$ and represents the similarity of rainfall event $i$ and rainfall event $j$; the DTW distance is used as the similarity measure, a smaller distance meaning a stronger similarity; the calculation of the matrix is accelerated by GPU parallel computing; 4-3. merging clusters based on the distance matrix of step 4-2: finding and merging the two clusters with the closest distance, renumbering the clusters, calculating the distance between the new cluster and each other cluster, and updating the distance matrix; 4-4. repeating step 4-3 until all clusters are merged into one cluster, thereby generating a clustering tree; 4-5. repeating steps 4-2 to 4-4 so that a corresponding clustering tree is generated for each subset of the sample set;
the DTW distance being calculated as follows: for time series $X = \{x_1, x_2, \dots, x_i, \dots, x_m\}$ and $Y = \{y_1, y_2, \dots, y_j, \dots, y_n\}$, where $m$ and $n$ are the lengths of the two time series, a warping path $W = \{w_1, w_2, \dots, w_k, \dots, w_K\}$ with $\max(n, m) \le K \le n + m - 1$ is found to represent the mapping between $X$ and $Y$; the $k$-th element of $W$ is written $w_k = (i, j)$ and denotes the correspondence between the $i$-th element of $X$ and the $j$-th element of $Y$; the cumulative distance at point $(i, j)$ is calculated as $\gamma(i, j) = d(x_i, y_j) + \min\{\gamma(i-1, j-1), \gamma(i-1, j), \gamma(i, j-1)\}$; given the initial condition $\gamma(1, 1) = d(x_1, y_1)$, the cumulative distance matrix is obtained by iterative calculation,
Figure FDA0002618783110000021
namely the DTW distance between the time sequence X and the time sequence Y;
5) analyzing the clustering tree generated in the step 4): taking the root node as the 1 st layer of the clustering tree, the nth layer of the clustering tree comprises n nodes, each node is 1 cluster and comprises 1 clustering center, traversing and searching the node clusters of the given layer, calculating the distance matrix of the rainfall events contained in each node cluster, and the matrix size is (m)i×mi) N, n is the number of the nodes in the layer, miThe number of rainfall events contained in the ith node is, and the element (i, j) of the matrix is the DTW distance between the rainfall event i and the rainfall event j; and calculating a distance matrix of rainfall events contained in each node cluster, and then calculating the sum of all rows of the node clusters, wherein the rainfall event corresponding to the row index with the minimum sum is the clustering center of the node cluster, namely the representative rainfall type of the local rainfall of the watershed.
2. The machine learning-based local rainfall type analysis method according to claim 1, characterized in that: the time span of the rainfall data in step 1) is 10 years or more.
3. The machine learning-based local rainfall type analysis method according to claim 1, characterized in that: the rainfall events in step 2) are divided as follows: a time threshold is set; when the dry interval within a rainfall process exceeds the threshold, the process is treated as two rainfall processes, and when the dry interval is less than the threshold, it is treated as one rainfall process; a magnitude threshold is also set, and when the total rainfall of one rainfall process is lower than the magnitude threshold, the process is regarded as micro rainfall and is not considered.
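The division rule of claim 3 can be sketched as a single pass over a rainfall series. This is a minimal illustration, assuming an evenly spaced series of rainfall depths, a time threshold expressed as a number of consecutive dry time steps, and a dry interval exactly equal to the threshold being treated as part of the same event (the claim does not say how equality is handled). The function and parameter names are hypothetical.

```python
def split_rainfall_events(series, gap_threshold, min_total):
    """Split a rainfall series into events and drop micro-rainfall events.

    series        -- rainfall depth per time step (0 means dry)
    gap_threshold -- dry steps above this count separate two events
    min_total     -- events with a smaller total depth are discarded
    """
    events = []
    current = []  # the event being built
    dry_run = 0   # consecutive dry steps since the last wet step
    for value in series:
        if value > 0:
            if current and dry_run > gap_threshold:
                # Dry interval exceeds the threshold: close the previous event.
                events.append(current)
                current = []
            elif current:
                # Short dry interval: keep it inside the same event.
                current.extend([0.0] * dry_run)
            current.append(value)
            dry_run = 0
        elif current:
            dry_run += 1
    if current:
        events.append(current)
    # Magnitude threshold: discard micro rainfall.
    return [event for event in events if sum(event) >= min_total]


# Hypothetical hourly series with one short gap, two long gaps and one tiny event.
hourly = [0, 2, 3, 0, 1, 0, 0, 0, 0.2, 0, 0, 0, 5, 4]
events = split_rainfall_events(hourly, gap_threshold=2, min_total=1.0)
```

Leading and trailing dry steps are not stored, so each returned event starts and ends with a wet time step.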
4. The machine learning-based local rainfall type analysis method according to claim 1, characterized in that: the rainfall events in step 3) are standardized as follows:
Figure FDA0002618783110000022
where n is the length of the rainfall event, $P_i'$ is the standardized rainfall sequence point, and $P_j$ is the original rainfall sequence point.
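The formula image for claim 4 is not reproduced in the text, so the following is only a sketch under an explicit assumption: each point is divided by the event total, $P_i' = P_i / \sum_{j=1}^{n} P_j$, which is one reading consistent with the symbols n, $P_i'$ and $P_j$ defined above. The function name `normalize_event` is hypothetical.

```python
def normalize_event(event):
    """Scale a rainfall event so that its points sum to 1.

    Assumption: standardization divides each point by the event total;
    the exact formula of the claim is not available in the text.
    """
    total = sum(event)
    if total <= 0:
        raise ValueError("a rainfall event must have a positive total depth")
    return [point / total for point in event]


# Hypothetical event of four time steps.
shape = normalize_event([2.0, 3.0, 0.0, 5.0])
```

Under this assumption, events of different magnitude but similar temporal distribution map to similar sequences, so the DTW distance compares their shape rather than their depth.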
5. The machine learning-based local rainfall type analysis method according to claim 1, characterized in that: in step 4-2 of step 4), the specific method for accelerating the matrix calculation with GPU parallel computing is as follows: one thread is assigned to each matrix element to calculate its DTW distance; first, threads are allocated to the thread blocks: for two-dimensional matrix operations, (tb, tb) threads are allocated to each thread block, where the number of threads in the thread block, $tb^2$, is required to be smaller than the maximum number of threads per thread block allowed by the GPU; second, thread blocks are allocated to the thread grid: for two-dimensional matrix operations, (bg, bg) thread blocks may be allocated to each thread grid, where
Figure FDA0002618783110000031
N is the number of samples in the rainfall event sample subset, and bg is required to be smaller than the maximum number of thread blocks per thread grid allowed by the GPU; after thread allocation is complete, the GPU calculates the DTW distance for each element of the matrix and returns the results to CPU memory, thereby completing the calculation of the distance matrix.
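The thread allocation of claim 5 can be illustrated without a GPU by computing the launch dimensions and the matrix element each thread would handle. This is a sketch under assumptions: the formula image for bg is not reproduced in the text, so bg is assumed to be N / tb rounded up, the default limits are illustrative placeholders that should be replaced by values queried from the actual device, and the function names are hypothetical.

```python
def launch_config(n_samples, tb, max_threads_per_block=1024, max_blocks_per_dim=65535):
    """Return ((tb, tb), (bg, bg)) for an n_samples x n_samples distance matrix.

    Assumption: bg = ceil(n_samples / tb), so the grid covers every element.
    The default limits are illustrative, not read from a real device.
    """
    if tb <= 0 or n_samples <= 0:
        raise ValueError("tb and n_samples must be positive")
    if tb * tb > max_threads_per_block:
        raise ValueError("tb * tb exceeds the threads allowed per block")
    bg = (n_samples + tb - 1) // tb  # integer ceiling division
    if bg > max_blocks_per_dim:
        raise ValueError("bg exceeds the blocks allowed per grid dimension")
    return (tb, tb), (bg, bg)


def element_for_thread(block_idx, thread_idx, tb, n_samples):
    """Matrix element (row, col) handled by one thread, or None if out of range."""
    row = block_idx[0] * tb + thread_idx[0]
    col = block_idx[1] * tb + thread_idx[1]
    if row < n_samples and col < n_samples:
        return row, col
    return None  # padding thread in an edge block: nothing to compute


# Hypothetical subset of 1000 rainfall events with 16 x 16 threads per block.
block_dim, grid_dim = launch_config(1000, 16)
```

Because bg is rounded up, edge blocks contain threads that fall outside the matrix, which is why the mapping function includes a bounds check before any distance would be computed.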
CN201911243213.6A 2019-12-06 2019-12-06 Local rainfall type analysis method based on machine learning Active CN110930282B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911243213.6A CN110930282B (en) 2019-12-06 2019-12-06 Local rainfall type analysis method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911243213.6A CN110930282B (en) 2019-12-06 2019-12-06 Local rainfall type analysis method based on machine learning

Publications (2)

Publication Number Publication Date
CN110930282A CN110930282A (en) 2020-03-27
CN110930282B true CN110930282B (en) 2020-10-09

Family

ID=69858279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911243213.6A Active CN110930282B (en) 2019-12-06 2019-12-06 Local rainfall type analysis method based on machine learning

Country Status (1)

Country Link
CN (1) CN110930282B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232574B (en) * 2020-10-21 2022-06-14 成都理工大学 Debris flow disaster rainfall threshold automatic partitioning method based on support vector machine
CN112508237B (en) * 2020-11-20 2022-07-08 北京师范大学 Rain type region division method based on data analysis and real-time rain type prediction method
CN113435661B (en) * 2021-07-14 2022-03-22 珠江水利委员会珠江水利科学研究院 Method, device, medium, and apparatus for estimating rainstorm peak position

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL207116A (en) * 2009-08-10 2014-12-31 Stats Llc System and method for location tracking
CN102819677B (en) * 2012-07-30 2014-12-10 河海大学 Rainfall site similarity evaluation method on basis of single rainfall type
CN104732092B (en) * 2015-03-25 2018-07-24 河海大学 A kind of consistent area's analysis method of hydrology rainfall based on cluster
CN105954821B (en) * 2016-04-20 2017-08-25 中国水利水电科学研究院 A kind of typical catchment choosing method for numerical value atmospheric model
CN106250667A (en) * 2016-06-29 2016-12-21 中国地质大学(武汉) The monitoring method of a kind of landslide transition between states of paddling and device
CN106295576B (en) * 2016-08-12 2017-12-12 中国水利水电科学研究院 A kind of water source type analytic method based on nature geography characteristic
CN106484971B (en) * 2016-09-23 2019-07-02 北京清控人居环境研究院有限公司 A kind of automatic identifying method of drainage pipeline networks monitoring point
CN106781291B (en) * 2016-12-29 2019-03-26 哈尔滨工业大学深圳研究生院 A kind of rain-induced landslide method for early warning and device based on displacement
CN107679644A (en) * 2017-08-28 2018-02-09 河海大学 A kind of website Rainfall data interpolating method based on rain types feature
CN107908835B (en) * 2017-10-27 2020-05-22 中国地质大学(武汉) Method for analyzing landslide dynamic response condition under multiple influence factors
CN108009596B (en) * 2017-12-26 2020-04-14 中国水利水电科学研究院 Method and device for determining rainfall characteristics
CN108846573B (en) * 2018-06-12 2021-04-09 河海大学 Watershed hydrological similarity estimation method based on time series kernel distance
CN109376940B (en) * 2018-11-02 2021-08-17 中国水利水电科学研究院 Method and device for acquiring rainfall spatial-temporal distribution rule in rainfall process

Also Published As

Publication number Publication date
CN110930282A (en) 2020-03-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant