CN114911846A - FAD and DTW-based hydrological time sequence similarity searching method - Google Patents
- Publication number
- CN114911846A
- Authority
- CN
- China
- Prior art keywords
- sequence
- dtw
- fad
- subsequences
- distance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/2477: Temporal data queries
- G06F16/2453: Query optimisation
- G06F16/2462: Approximate or statistical queries
- G06F16/2465: Query processing support for facilitating data mining operations in structured databases
- G06F16/2474: Sequence data queries, e.g. querying versioned data
- Y02A10/40: Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping
(All G06F16 entries fall under G06F16/245, Query processing, in G06F, Electric digital data processing; Y02A10/40 falls under Y02A10/00, Technologies for adaptation to climate change at coastal zones; at river basins.)
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Probability & Statistics with Applications (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Fuzzy Systems (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a hydrological time series similarity search method based on FAD and DTW, comprising the following steps. First, a pre-acquired time series is smoothed by wavelet transform. Second, the start point, end point and local extreme points of the time series are selected as feature points, a semantic label is assigned to each data segment between adjacent feature points, and the series is represented as a sequence of semantic symbols. Then, a derivative estimate is computed for each point of the subsequences in the preliminary candidate set and of the sequence to be queried, yielding derivative estimation sequences; these are converted into symbolic representation sequences and finally into the feature sequences of the candidate subsequences and of the sequence to be queried. After this data representation stage, FAD is first used to find subsequences with similar trends, and DTW is then used for exact matching to obtain the final similar subsequences. By combining the characteristics of FAD and DTW for similarity search over historical time series, the method greatly improves search efficiency.
Description
Technical Field
The invention belongs to the technical field of hydrological data mining, and particularly relates to a hydrological time series similarity search method based on FAD and DTW.
Background
Hydrological time series similarity search aims to find, for a given time series, similar subsequences in historical time series. Discovering similar data in a time series database reveals the patterns and trends of change in the data and provides a basis for effective prediction. Research on hydrological time series similarity search therefore has important practical significance for flood forecasting and flood control scheduling.
The main problems in hydrological time series similarity search include time series feature representation, similarity measurement, and subsequence matching. Many researchers have studied time series similarity with different methods, achieved certain results, and applied them in the hydrology field. Similarity measures for hydrological time series mainly include the Euclidean distance, the dynamic time warping (DTW) distance, and related improved algorithms (such as DTW-SS and FastDTW). The Euclidean distance is simple and easy to understand, but is only suitable for comparing time series of equal length. DTW achieves high measurement accuracy by warping the time axis, but because it matches point by point, its time complexity is high (O(nm) for sequences of lengths n and m in the standard dynamic-programming formulation). A similarity search method is therefore needed that greatly reduces time complexity while preserving query accuracy.
Disclosure of Invention
Purpose of the invention: To overcome the shortcomings of the prior art, the invention provides a hydrological time series similarity search method based on FAD and DTW which, by combining related data mining techniques, improves query efficiency while preserving query accuracy.
Technical solution: The invention discloses a hydrological time series similarity search method based on FAD and DTW, comprising the following steps:
Step S1: To eliminate noise in the original time series, smooth the historical time series and the sequence to be queried by wavelet transform;
Step S2: Select the start point, the end point and local extreme points satisfying certain conditions in the smoothed time series as feature points, assign the semantic labels rise (U), hold (B) and decline (D) to the data segments between adjacent feature points, and represent the historical time series and the sequence to be queried as semantic symbols;
Step S3: From the historical series, select the subsequences with the same semantics as the sequence to be queried as a preliminary candidate set;
Step S4: Compute the derivative estimate of each point of the subsequences in the preliminary candidate set and of the sequence to be queried to obtain derivative estimation sequences, convert these into symbolic representation sequences, and finally obtain the feature sequences corresponding to the subsequences in the preliminary candidate set and to the sequence to be queried;
Step S5: Using the FAD similarity measure, approximately match the feature sequence of the sequence to be queried against each feature subsequence in the preliminary candidate set in turn, and select the top M subsequences with similar trends according to the FAD distance;
Step S6: Perform exact DTW matching between the query sequence and the M approximate subsequences to obtain the top N subsequences with the smallest DTW distance, namely the best similar subsequences;
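Step S1 does not fix the wavelet family, decomposition level, or thresholding rule. As a minimal sketch under those open assumptions, the Python below applies a single-level Haar transform, soft-thresholds the detail coefficients, and reconstructs the series; the function name `haar_smooth` and the `threshold` parameter are hypothetical, not taken from the patent.

```python
import math
from typing import List, Sequence


def haar_smooth(x: Sequence[float], threshold: float) -> List[float]:
    """Single-level Haar wavelet denoising (illustrative sketch, not the patent's exact procedure).

    Detail coefficients are soft-thresholded; the approximation is kept unchanged.
    """
    n = len(x)
    # Pad odd-length input by repeating the last value so samples pair up.
    xs = list(x) + ([x[-1]] if n % 2 else [])
    r2 = math.sqrt(2.0)
    approx = [(xs[i] + xs[i + 1]) / r2 for i in range(0, len(xs), 2)]
    detail = [(xs[i] - xs[i + 1]) / r2 for i in range(0, len(xs), 2)]
    # Soft thresholding shrinks small (noise-like) detail coefficients toward zero.
    shrunk = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    out: List[float] = []
    for a, d in zip(approx, shrunk):
        # Inverse Haar step for each pair of samples.
        out.extend([(a + d) / r2, (a - d) / r2])
    return out[:n]  # drop the padding sample, if any
```

With `threshold=0` the input is reconstructed up to rounding; with a very large threshold every pair of samples is replaced by its mean. A production version would more likely use a multi-level decomposition and a data-driven threshold.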
Step S2 represents the time series as semantic symbols, and further comprises:
Step S2.1: Given a time series $T = (x_1, x_2, \ldots, x_n)$, a data point $x_m$ of $T$ is called an extreme point if one of the following conditions is satisfied:
(1) $m = 1$ or $m = n$;
(2) $x_m \ge x_{m-1}$ and $x_m \ge x_{m+1}$, where $1 < m < n$;
(3) $x_m \le x_{m-1}$ and $x_m \le x_{m+1}$, where $1 < m < n$;
Step S2.2: Assign the semantic labels rise (U), hold (B) and decline (D) to the data segments between adjacent extreme points, and represent the historical time series and the sequence to be queried as semantic symbols;
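Steps S2.1 and S2.2 can be sketched in Python as follows. The extreme-point test follows conditions (1) to (3) literally (with 0-based indices). The text mentions an extra "certain condition" for extreme points and does not spell out how U, B or D is chosen, so the sketch assumes each segment is labelled by the sign of the change between its end points, with a hypothetical tolerance `tol` deciding when the change counts as hold (B).

```python
from typing import List, Sequence


def extreme_points(x: Sequence[float]) -> List[int]:
    """0-based indices satisfying conditions (1)-(3) of step S2.1."""
    n = len(x)
    points = []
    for m in range(n):
        if m == 0 or m == n - 1:
            points.append(m)  # condition (1): start or end point
        elif x[m] >= x[m - 1] and x[m] >= x[m + 1]:
            points.append(m)  # condition (2): local maximum (plateaus included)
        elif x[m] <= x[m - 1] and x[m] <= x[m + 1]:
            points.append(m)  # condition (3): local minimum (plateaus included)
    return points


def semantic_labels(x: Sequence[float], points: Sequence[int], tol: float = 0.0) -> List[str]:
    """Label each segment between adjacent extreme points as U, B or D.

    ASSUMPTION: the label is the sign of the end-to-end change; `tol` is a
    hypothetical tolerance below which the segment counts as hold (B).
    """
    labels = []
    for a, b in zip(points, points[1:]):
        change = x[b] - x[a]
        if change > tol:
            labels.append("U")
        elif change < -tol:
            labels.append("D")
        else:
            labels.append("B")
    return labels


series = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 2.0]
pts = extreme_points(series)
print(pts, semantic_labels(series, pts))  # [0, 2, 3, 5, 6] ['U', 'B', 'D', 'U']
```

Because conditions (2) and (3) use non-strict inequalities, every point of a flat plateau qualifies as an extreme point, which is what produces the hold (B) segment in the example.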
Step S4 converts the time series into feature sequences, and further comprises:
Step S4.1: Given a time series $T = (x_1, x_2, \ldots, x_n)$, convert the original time series into a derivative estimation sequence, where each derivative estimate is calculated by the following formula:
where $X_h$ is a data point of the time series $T = (x_1, x_2, \ldots, x_n)$;
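The derivative-estimate formula of step S4.1 is not reproduced in this text. As an assumption, the sketch below substitutes the widely used estimate from Keogh and Pazzani's Derivative DTW, $d_i = \frac{(x_i - x_{i-1}) + (x_{i+1} - x_{i-1})/2}{2}$, reusing the second and penultimate estimates at the end points; the patent's own formula may differ.

```python
from typing import List, Sequence


def derivative_estimates(x: Sequence[float]) -> List[float]:
    """Keogh-Pazzani style derivative estimate (assumed substitute for the patent's formula)."""
    if len(x) < 3:
        raise ValueError("need at least 3 points")
    inner = [
        ((x[i] - x[i - 1]) + (x[i + 1] - x[i - 1]) / 2) / 2
        for i in range(1, len(x) - 1)
    ]
    # The estimate is undefined at the ends; reuse the nearest interior values.
    return [inner[0]] + inner + [inner[-1]]


print(derivative_estimates([0.0, 2.0, 2.0, 0.0]))  # [1.5, 1.5, -0.5, -0.5]
```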
Step S4.2: After the derivative estimation sequence is obtained, the derivative values are divided into different symbol values according to their distribution; these symbols reflect the trend information of the time series. The conversion formula for the symbolic representation sequence is as follows:
where $R_h$ is the symbolic representation of the $h$-th derivative estimate, the parameter $\varepsilon$ ($\varepsilon \ge 0$) is the trend-variation threshold used to judge the amplitude of data change, and the parameter $\lambda$ is the number of symbols used to represent the original sequence;
Step S4.3: Transform the resulting symbolic representation sequence into a feature sequence, where $S_j = (R_j, k_j)$, $R_j$ is a symbol in the feature sequence, and $k_j$ is the number of adjacent points with the same symbol.
Step S4.4: Obtain, according to the above steps, the feature sequences corresponding to the subsequences in the preliminary candidate set and to the sequence to be queried.
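Step S4.3 amounts to a run-length encoding of the symbol sequence: each maximal run of equal symbols becomes one pair $(R_j, k_j)$. A minimal Python sketch using `itertools.groupby`, with the function name `feature_sequence` chosen for illustration:

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def feature_sequence(symbols: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse runs of equal symbols into (R_j, k_j) pairs."""
    return [(symbol, sum(1 for _ in run)) for symbol, run in groupby(symbols)]


print(feature_sequence([0, 0, 1, 1, 1, -1]))  # [(0, 2), (1, 3), (-1, 1)]
```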
Step S5 measures, in turn, the FAD similarity between each subsequence in the preliminary candidate set and the sequence to be queried, and further comprises:
Step S5.1: Let the first feature sequence be that of a subsequence in the preliminary candidate set and the second be that of the sequence to be queried.
If corresponding segments of the two feature sequences are represented by different symbols, i.e., the two segments have different trends, the distance between them is:
$D(S1_i, S2_j) = 1, \quad (R1_i \ne R2_j)$
where $S1_i$ and $S2_j$ are segments of the first and second feature sequences, respectively, and $R1_i$ and $R2_j$ are the corresponding symbols of $S1_i$ and $S2_j$;
Step S5.2: If corresponding segments of the two feature sequences are represented by the same symbol, i.e., both segments have a similar trend, the distance between them depends mainly on their difference in length and is calculated as follows:
where $k1_i$ and $k2_j$ are the numbers of points in $S1_i$ and $S2_j$, respectively, and $\gamma$ is an adjustable parameter that sets the ratio between same-symbol and different-symbol distances. Because the distance between segments with the same symbol must be smaller than that between segments with different symbols, $0 \le D(S1_i, S2_j) < 1$ and $\gamma \in [0, 1]$.
Step S5.3: Because the time series may differ in length and FAD performs time warping, some segments in one sequence may have no counterpart in the other. Such segments are treated as dissimilar to any segment of the other sequence, with distance:
$D(-, S_i) = 1$
Step S5.4: Combining steps S5.1 to S5.3, the distance between two segments is summarized as follows.
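Neither the same-symbol formula of step S5.2 nor the combined formula of step S5.4 is reproduced in this text. The sketch below fills both with labelled assumptions: the same-symbol distance is taken as $\gamma \, |k1_i - k2_j| / \max(k1_i, k2_j)$, which stays in $[0, 1)$ for $\gamma \in [0, 1]$ as step S5.2 requires, and segments are aligned with an edit-distance-style dynamic program in which matching two segments costs $D(S1_i, S2_j)$ and leaving a segment unmapped costs 1 (step S5.3). It illustrates the structure only, not the patent's exact FAD.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # (symbol R, run length k)


def segment_distance(s1: Segment, s2: Segment, gamma: float = 0.5) -> float:
    (r1, k1), (r2, k2) = s1, s2
    if r1 != r2:
        return 1.0  # step S5.1: different trends
    # HYPOTHETICAL step S5.2 formula: scaled relative length difference, in [0, gamma).
    return gamma * abs(k1 - k2) / max(k1, k2)


def fad_distance(f1: Sequence[Segment], f2: Sequence[Segment], gamma: float = 0.5) -> float:
    """Edit-distance-style alignment of two feature sequences (assumed form of step S5.4)."""
    n, m = len(f1), len(f2)
    # dp[i][j]: cost of aligning the first i segments of f1 with the first j of f2.
    dp: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = float(i)  # step S5.3: each unmapped segment costs 1
    for j in range(1, m + 1):
        dp[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + segment_distance(f1[i - 1], f2[j - 1], gamma),
                dp[i - 1][j] + 1.0,  # segment of f1 left unmapped
                dp[i][j - 1] + 1.0,  # segment of f2 left unmapped
            )
    return dp[n][m]


print(fad_distance([(1, 4)], [(1, 2)]))  # 0.25
```

Because the cost depends only on the number of segments, not the number of raw points, this stage is much cheaper than point-level DTW on long series.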
Step S5.5: According to the FAD distance values, select the 50 subsequences with the smallest distance to form the candidate set to be matched in the subsequent step.
Step S6 measures, in turn, the DTW similarity between the sequence to be queried and each subsequence in the candidate set to be matched, and further comprises:
Step S6.1: Compute the DTW distance between the sequence to be queried and each subsequence in the candidate set to be matched, and take the 4 subsequences with the smallest DTW distance as the best similar subsequences. The DTW distance is calculated as follows:
where $Q$ is the sequence to be queried, $Y$ is a subsequence in the candidate set to be matched, and
$D_{base}(q_i, y_j)$ is the base distance between the $i$-th time point of $Q$ and the $j$-th time point of $Y$, measured by the Euclidean distance.
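The DTW formula itself is not reproduced in this text. The sketch below uses the standard DTW recurrence, $D(i, j) = D_{base}(q_i, y_j) + \min\{D(i-1, j-1), D(i-1, j), D(i, j-1)\}$, where the Euclidean base distance reduces to $|q_i - y_j|$ for scalar samples; whether the patent adds a warping window or normalization is not stated.

```python
from typing import List, Sequence


def dtw_distance(q: Sequence[float], y: Sequence[float]) -> float:
    """Standard unconstrained DTW with |q_i - y_j| as the base distance."""
    n, m = len(q), len(y)
    inf = float("inf")
    # D[i][j]: minimal cumulative cost of aligning q[:i] with y[:j].
    D: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            base = abs(q[i - 1] - y[j - 1])  # Euclidean distance between scalar points
            D[i][j] = base + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


print(dtw_distance([1.0, 2.0, 3.0], [1.0, 1.0, 2.0, 3.0]))  # 0.0
```

The nested loop makes the O(nm) cost discussed in the Background explicit, which is why the method applies DTW only to the FAD shortlist.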
Step S6.2: Output the final result set of similar sequences.
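Steps S5.5 to S6.2 amount to a two-stage filter-and-refine search. The sketch below shows only that control flow; the FAD and DTW functions are passed in as parameters so the block stays self-contained, and the candidate tuple layout `(id, feature_sequence, raw_sequence)` and the function name `two_stage_search` are hypothetical. The defaults M = 50 and N = 4 follow steps S5.5 and S6.1.

```python
import heapq
from typing import Any, Callable, List, Sequence, Tuple

Candidate = Tuple[Any, Any, Any]  # (identifier, feature sequence, raw subsequence)


def two_stage_search(
    query_features: Any,
    query_series: Any,
    candidates: Sequence[Candidate],
    fad: Callable[[Any, Any], float],
    dtw: Callable[[Any, Any], float],
    m: int = 50,
    n: int = 4,
) -> List[Any]:
    """Keep the m candidates closest by FAD, then return ids of the n closest by DTW."""
    # Stage 1 (cheap, trend level): approximate matching on feature sequences.
    shortlist = heapq.nsmallest(m, candidates, key=lambda c: fad(query_features, c[1]))
    # Stage 2 (expensive, point level): exact DTW matching on the survivors only.
    best = heapq.nsmallest(n, shortlist, key=lambda c: dtw(query_series, c[2]))
    return [c[0] for c in best]


def toy_distance(u: float, v: float) -> float:
    """Stand-in distance for the usage example only."""
    return abs(u - v)


toy = [("a", 1, 10), ("b", 2, 1), ("c", 9, 0)]
print(two_stage_search(0, 0, toy, toy_distance, toy_distance, m=2, n=1))  # ['b']
```

In the toy run, "c" has the best DTW score but is discarded at the FAD stage; this trade-off is the source of the speed-up, since DTW is evaluated only for the m survivors.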
Beneficial effects: Compared with the prior art, the invention has the following advantages:
Building on existing similarity measures, the invention considers both the morphological and the numerical characteristics of hydrological time series and performs similarity search by combining trend-feature-based FAD approximate matching with DTW exact matching, so that similar sequences in a river basin can be mined effectively.
Compared with conventional DTW, FAD_DTW mitigates the high computational cost that DTW incurs from point-to-point matching: by first selecting subsequences with similar morphological trends, it greatly reduces the candidate set for subsequent similarity matching and effectively improves query efficiency, which is of practical significance for flood forecasting and flood control scheduling.
Drawings
FIG. 1 is a diagram of the overall steps in an embodiment of the present invention;
FIG. 2 is a diagram illustrating the conversion of a symbolic representation sequence into a feature sequence in an embodiment;
FIGS. 3 and 4 show the similar subsequences obtained by the FAD_DTW method in the two experiments of the embodiment;
FIGS. 5 and 6 show the similar subsequences obtained by the DTW-SS method in the two experiments of the embodiment;
FIG. 7 compares the query times of FAD_DTW and DTW-SS as the number of years in the historical series increases, in the embodiment;
FIG. 8 compares the query times of FAD_DTW and DTW-SS as the length of the sequence to be queried increases, in the embodiment.
Detailed Description
The technical solution of the present invention is described in detail below, but the scope of the present invention is not limited to the embodiments.
As shown in FIG. 1, the FAD and DTW-based hydrological time series similarity search method of this embodiment comprises the following steps:
Step S1: Hydrological data from Tunxi station in the Tunxi basin are selected as the data set, from which the sequence to be queried Q and the historical time series S are obtained. Both are smoothed by wavelet transform to obtain the smoothed sequence to be queried Q' and the smoothed historical time series S'.
Step S2: Select the start point, the end point and local extreme points satisfying certain conditions in the sequence to be queried Q' as feature points, assign the semantic labels rise (U), hold (B) and decline (D) to the data segments between adjacent feature points, and represent the historical time series and the sequence to be queried as semantic symbols;
Step S2.1: For the sequence to be queried $Q' = \{x_1, x_2, \ldots, x_n\}$, a data point $x_m$ ($m \le n$) is called an extreme point if one of the following conditions is satisfied:
(1) $m = 1$ or $m = n$;
(2) $x_m \ge x_{m-1}$ and $x_m \ge x_{m+1}$, where $1 < m < n$;
(3) $x_m \le x_{m-1}$ and $x_m \le x_{m+1}$, where $1 < m < n$;
Step S2.2: Extract the extreme points of Q' according to the conditions in step S2.1 to obtain the extreme point sequence Q'', and symbolize Q''. For Q'', the pattern between each pair of adjacent data points forms a new sequence $Q''' = \{q_1, q_2, \ldots, q_n\}$, where $q_i \in \{U, B, D\}$ denotes a rising, holding or falling trend, respectively; Q''' is the semantic pattern representation of Q';
Step S3: Extract extreme points from the historical time series S in the same way as in step S2 to obtain the semantic pattern representation S' of S;
Step S4: From the historical sequence S', select the subsequences with the same semantics as the sequence to be queried Q' as the preliminary candidate set Z;
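Step S4 of the embodiment can be read as exact matching of the query's U/B/D label string against every window of the historical label string. The sketch below assumes that reading; `hist_points` holds the extreme-point indices that bound the historical segments (one more entry than labels), and the function name `semantic_candidates` is hypothetical. Each returned pair is an inclusive start and end index into the smoothed historical series.

```python
from typing import List, Sequence, Tuple


def semantic_candidates(
    hist_labels: Sequence[str],
    hist_points: Sequence[int],
    query_labels: Sequence[str],
) -> List[Tuple[int, int]]:
    """Index ranges of historical subsequences whose label pattern equals the query's."""
    if not query_labels:
        raise ValueError("query must contain at least one segment")
    if len(hist_points) != len(hist_labels) + 1:
        raise ValueError("expected one more extreme point than segment labels")
    k = len(query_labels)
    target = list(query_labels)
    matches = []
    for s in range(len(hist_labels) - k + 1):
        if list(hist_labels[s:s + k]) == target:
            # Segment s starts at extreme point s; segment s+k-1 ends at extreme point s+k.
            matches.append((hist_points[s], hist_points[s + k]))
    return matches


print(semantic_candidates(["U", "D", "U", "D"], [0, 3, 5, 8, 10], ["U", "D"]))  # [(0, 5), (5, 10)]
```

Exact label matching is deliberately coarse: it prunes the search space before any numeric distance is computed, leaving finer discrimination to the FAD and DTW stages.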
Step S5: Compute the derivative estimate of each point in the sequence to be queried Q' to obtain its derivative estimation sequence, convert that into a symbolic representation sequence, and finally obtain the feature sequence corresponding to Q'.
Step S5.1: Obtain the derivative estimation sequence. The derivative estimate is calculated as follows:
Step S5.2: The derivative values are divided into different symbol values according to their distribution; these symbols reflect the trend information of the time series. The conversion formula for the symbolic representation sequence is as follows:
where $R_h$ is the symbolic representation of the $h$-th derivative estimate, the parameter $\varepsilon$ ($\varepsilon \ge 0$) is the trend-variation threshold used to judge the amplitude of data change, and the parameter $\lambda$ ($\lambda \ge 1$, an integer) is the number of symbols used to represent the original sequence. For example, the original sequence can be converted into a sequence composed of symbols such as -3, -2, -1, 0, 1, 2, 3;
Step S5.3: Transform the resulting symbolic representation sequence into the feature sequence of the sequence to be queried, where $S_j = (R_j, k_j)$, $R_j$ is a symbol in the feature sequence, and $k_j$ is the number of adjacent points with the same symbol. FIG. 2 shows the whole transformation process;
Step S6: Following the procedure for obtaining feature sequences in step S5, compute the derivative estimates of all subsequences in the candidate set Z in the same way and obtain the corresponding set of feature subsequences.
Step S7: Compute, in turn, the FAD distance between the feature sequence of the sequence to be queried and each feature subsequence, and according to these FAD distances select the 50 subsequences whose trends are closest to that of the sequence to be queried to form the data set S' to be matched.
Step S7.1: Compare each feature subsequence of the set with the feature sequence of the sequence to be queried. If corresponding segments of the two feature sequences are represented by different symbols, i.e., the two segments have different trends, the distance between them is:
$D(S1_i, S2_j) = 1, \quad (R1_i \ne R2_j)$
Step S7.2: If corresponding segments of the two feature sequences are represented by the same symbol, i.e., both segments have a similar trend, the distance between them depends mainly on their difference in length and is calculated as follows:
where $k1_i$ and $k2_j$ are the numbers of points in $S1_i$ and $S2_j$, respectively, and $\gamma$ is an adjustable parameter that sets the ratio between same-symbol and different-symbol distances. Because the distance between segments with the same symbol must be smaller than that between segments with different symbols, $0 \le D(S1_i, S2_j) < 1$ and $\gamma \in [0, 1]$.
Step S7.3: Because the time series may differ in length and FAD performs time warping, some segments in one sequence often have no counterpart in the other. Such segments are treated as dissimilar to any segment of the other sequence, with distance:
$D(-, S_i) = 1$
Step S7.4: Combining steps S7.1 to S7.3, the distance between two segments is summarized as follows.
Step S8: Compute the DTW distance between the sequence to be queried and each subsequence in the candidate set S', and take the 4 subsequences with the smallest DTW distance as the best similar subsequences. The DTW distance is calculated as follows:
where $Q$ is the sequence to be queried, $Y$ is a subsequence in the candidate set to be matched, and
$D_{base}(q_i, y_j)$ is the base distance between the $i$-th time point of $Q$ and the $j$-th time point of $Y$, measured by the Euclidean distance.
To verify the effect of the invention, two sets of experimental data from Tunxi station in the Tunxi basin were used, and the results were compared with those of the DTW-SS method to assess speed and accuracy. The similar subsequences returned by the two methods are listed in Table 1 and Table 2. FIG. 3 and FIG. 4 show the top 4 matches found by the FAD_DTW method for the two query sequences, and FIG. 5 and FIG. 6 show the top 4 matches found by the DTW-SS method. The query times of both methods are shown in FIG. 7 and FIG. 8. The tables and figures show that the FAD_DTW algorithm of this embodiment preserves query accuracy while achieving query efficiency clearly superior to that of the DTW-SS method.
TABLE 1 FAD_DTW similarity matching results
TABLE 2 DTW-SS similarity matching results
Claims (5)
1. A hydrological time series similarity search method based on FAD and DTW, characterized by comprising the following steps:
a data preparation stage, specifically comprising:
step S1: to eliminate noise in the original time series, smoothing the historical time series and the sequence to be queried by wavelet transform;
step S2: selecting the start point, the end point and local extreme points satisfying certain conditions in the smoothed time series as feature points, assigning the semantic labels rise (U), hold (B) and decline (D) to the data segments between adjacent feature points, and representing the historical time series and the sequence to be queried as semantic symbols;
step S3: selecting, from the historical series, subsequences with the same semantics as the sequence to be queried as a preliminary candidate set;
step S4: computing the derivative estimate of each point of the subsequences in the preliminary candidate set and of the sequence to be queried to obtain derivative estimation sequences, converting these into symbolic representation sequences, and finally obtaining the feature sequences corresponding to the subsequences in the preliminary candidate set and to the sequence to be queried;
a similarity search stage, specifically comprising:
step S5: approximately matching, in turn, the feature sequence of the sequence to be queried against the feature subsequences in the preliminary candidate set using the FAD similarity measure, and selecting the top M subsequences with similar trends according to the FAD distance;
step S6: performing exact DTW matching between the query sequence and the M approximate subsequences to obtain the top N subsequences with the smallest DTW distance, namely the best similar subsequences.
2. The FAD and DTW-based hydrological time series similarity search method according to claim 1, wherein step S2 is implemented as follows:
step S2.1: given a time series $T = (x_1, x_2, \ldots, x_n)$, a data point $x_m$ of $T$ is called an extreme point if one of the following conditions is satisfied:
(1) $m = 1$ or $m = n$;
(2) $x_m \ge x_{m-1}$ and $x_m \ge x_{m+1}$, where $1 < m < n$;
(3) $x_m \le x_{m-1}$ and $x_m \le x_{m+1}$, where $1 < m < n$;
step S2.2: assigning the semantic labels rise (U), hold (B) and decline (D) to the data segments between adjacent extreme points, and representing the historical time series and the sequence to be queried as semantic symbols.
3. The FAD and DTW-based hydrological time series similarity search method according to claim 1, wherein step S4 is implemented as follows:
step S4.1: given a time series $T = (x_1, x_2, \ldots, x_n)$, converting the original time series into a derivative estimation sequence by equation (1),
where $X_h$ is a data point of the time series $T = (x_1, x_2, \ldots, x_n)$;
step S4.2: after the derivative estimation sequence is obtained, dividing the derivative values into different symbol values according to their distribution; these symbols reflect the trend information of the time series. The conversion formula for the symbolic representation sequence is as follows:
where $R_h$ is the symbolic representation of the $h$-th derivative estimate, the parameter $\varepsilon$ ($\varepsilon \ge 0$) is the trend-variation threshold used to judge the amplitude of data change, and the parameter $\lambda$ is the number of symbols used to represent the original sequence;
step S4.3: transforming the resulting symbolic representation sequence into a feature sequence, where $S_j = (R_j, k_j)$, $R_j$ is a symbol in the feature sequence, and $k_j$ is the number of adjacent points with the same symbol;
step S4.4: obtaining, according to the above steps, the feature sequences corresponding to the subsequences in the preliminary candidate set and to the sequence to be queried.
4. The FAD and DTW-based hydrological time series similarity search method according to claim 1, wherein the FAD similarity measure of step S5 is implemented as follows:
step S5.1: given two feature sequences, if corresponding segments of the two sequences are represented by different symbols, indicating that the two segments have different trends, the distance between them is:
$D(S1_i, S2_j) = 1, \quad (R1_i \ne R2_j)$
where $S1_i$ and $S2_j$ are segments of the first and second feature sequences, respectively, and $R1_i$ and $R2_j$ are the corresponding symbols of $S1_i$ and $S2_j$;
step S5.2: if corresponding segments of the two sequences are represented by the same symbol, indicating that the two segments have similar trends, the distance between them depends mainly on their difference in length and is calculated as follows:
where $k1_i$ and $k2_j$ are the numbers of points in $S1_i$ and $S2_j$, respectively, and $\gamma$ is an adjustable parameter that sets the ratio between same-symbol and different-symbol distances. Because the distance between segments with the same symbol must be smaller than that between segments with different symbols, $0 \le D(S1_i, S2_j) < 1$ and $\gamma \in [0, 1]$.
step S5.3: because the time series may differ in length and FAD performs time warping, some segments in one sequence may have no counterpart in the other. Such segments are treated as dissimilar to any segment of the other sequence, with distance:
$D(-, S_i) = 1$
step S5.4: combining steps S5.1 to S5.3, the distance between two segments is summarized as follows.
5. The FAD and DTW-based hydrological time series similarity search method according to claim 1, wherein the DTW similarity measure of step S6 is implemented as follows:
step S6.1: computing the DTW distance between the sequence to be queried and each subsequence in the candidate set to be matched, and taking the 4 subsequences with the smallest DTW distance as the best similar subsequences. The DTW distance is calculated as follows:
where $X$ and $Y$ are the two time series compared by DTW, and $D_{base}(x_i, y_j)$ is the base distance between the $i$-th time point of $X$ and the $j$-th time point of $Y$, measured by the Euclidean distance.
step S6.2: outputting the final result set of similar sequences.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210531963.9A CN114911846A (en) | 2022-05-17 | 2022-05-17 | FAD and DTW-based hydrological time sequence similarity searching method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210531963.9A CN114911846A (en) | 2022-05-17 | 2022-05-17 | FAD and DTW-based hydrological time sequence similarity searching method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114911846A (en) | 2022-08-16 |
Family
ID=82766136
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210531963.9A Pending CN114911846A (en) | 2022-05-17 | 2022-05-17 | FAD and DTW-based hydrological time sequence similarity searching method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114911846A (en) |
2022
- 2022-05-17 CN CN202210531963.9A patent/CN114911846A/en active Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115729981A (en) * | 2022-11-29 | 2023-03-03 | 中国长江电力股份有限公司 | Similar water regime data mining method based on editing distance and application thereof |
CN115729981B (en) * | 2022-11-29 | 2024-02-13 | 中国长江电力股份有限公司 | Editing distance-based similar water condition data mining method and application thereof |
CN115994137A (en) * | 2023-03-23 | 2023-04-21 | 无锡弘鼎软件科技有限公司 | Data management method based on application service system of Internet of things |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114911846A (en) | FAD and DTW-based hydrological time sequence similarity searching method | |
CN111242377B (en) | Short-term wind speed prediction method integrating deep learning and data denoising | |
Hong et al. | SSDTW: Shape segment dynamic time warping | |
WO2023226292A1 (en) | Method for extracting relation from text, relation extraction model, and medium | |
CN111949707B (en) | Shadow field-based hidden Markov model non-invasive load decomposition method | |
CN110837736B (en) | Named entity recognition method of Chinese medical record based on word structure | |
CN103559232A (en) | Music humming searching method conducting matching based on binary approach dynamic time warping | |
Muthumanickam et al. | Shape grammar extraction for efficient query-by-sketch pattern matching in long time series | |
CN111125380B (en) | Entity linking method based on RoBERTa and heuristic algorithm | |
CN113836341B (en) | Remote sensing image retrieval method based on unsupervised converter balanced hash | |
CN117766021A (en) | Deep learning algorithm for predicting protein-polypeptide binding site | |
CN111916064A (en) | End-to-end neural network speech recognition model training method | |
JP2003132088A (en) | Time series data retrieval system | |
CN104484425A (en) | Color image searching method based on multiple features | |
CN112767922B (en) | Speech recognition method for contrast predictive coding self-supervision structure joint training | |
CN113515540A (en) | Query rewriting method for database | |
CN117033657A (en) | Information retrieval method and device | |
CN118378135B (en) | Gas well effusion classification and prediction method based on frequency channel conversion and self-supervision | |
CN115565177A (en) | Character recognition model training method, character recognition device, character recognition equipment and medium | |
CN113486668A (en) | Electric power knowledge entity identification method, device, equipment and medium | |
CN111507103B (en) | Self-training neural network word segmentation model using partial label set | |
Bammer et al. | Invariance and stability of Gabor scattering for music signals | |
CN114648152B (en) | Building energy consumption prediction method and system based on state constraint and time-frequency characteristics | |
Chatzigeorgakidis et al. | MultiCast: Zero-Shot Multivariate Time Series Forecasting Using LLMs | |
CN116562294A (en) | Bridge text small sample named entity recognition method based on prompt learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication |