CN108647737A - Cluster-based adaptive time series anomaly detection method and device - Google Patents


Info

Publication number
CN108647737A
Authority
CN
China
Prior art keywords
prefix trees
test
time series
path
extraction
Legal status
Pending
Application number
CN201810471537.4A
Other languages
Chinese (zh)
Inventor
王宏志
杜冠宏
万晓珑
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Application filed by Harbin Institute of Technology
Priority to CN201810471537.4A
Publication of CN108647737A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Abstract

The present invention relates to the technical field of data processing and provides a cluster-based adaptive time series anomaly detection method and device. The method includes: reducing the dimensionality of, and symbolizing, the time series in a training set with the SAX method to obtain symbolized time series; building a prefix tree from the symbolized time series; and extracting test subsequences from a test set with a sliding window, marking an extracted test subsequence as normal when it matches a path in the prefix tree. The invention reduces time and space complexity through dimensionality reduction, and during testing the model can modify itself to adapt to new patterns in the data. It is suited to data sets that exhibit some regularity or periodicity.

Description

Cluster-based adaptive time series anomaly detection method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a cluster-based adaptive time series anomaly detection method and device.
Background
Time series are ubiquitous and widely used in fields such as medical analysis, weather forecasting and stock index prediction. For many years, researchers have studied time series anomaly detection and data cleaning. Anomaly detection is crucial to data analysis and status checking. For example, intrusion detection systems depend heavily on anomaly detection, because the system state is treated as a time series of state parameters. Hawkins defines an outlier as an observation that deviates, obviously or statistically, from the other observations. This definition can clearly be used to design a detection scheme for time series outliers.
Several popular anomaly detection approaches currently exist:
1. Unsupervised learning methods, such as KNN. Unsupervised approaches are usually too simple: although general-purpose, they are often not very accurate and frequently misidentify anomalies.
2. Algorithms tailored to specific industries such as finance or medicine, such as HTM. These algorithms are usually applicable only to data from their own field; once applied outside that field, their detection rate is severely limited.
3. Methods based on statistical regularities of the data, such as IMR. These methods must first find the statistical regularities in the data: for one-dimensional time series, typically a limit on the rate of change or on the vertical variation; for multidimensional time series, the statistical relationships between sequences, such as a normal distribution, from which a clustering algorithm is then designed. The drawback of this class of methods is that the statistical properties of the data must be known in advance, and the help of domain experts may be needed.
4. Machine learning algorithms, such as SVM and HMM. Machine learning is an emerging approach to time series anomaly detection; the field is still at an early stage but has seen some development. These methods detect anomalies in the test data with a trained model.
However, on the one hand, the above methods occupy a large amount of space when processing time series. On the other hand, if the data is a periodic time series, most of the above methods have limitations: for example, its statistical regularities cannot be expressed, and such series are not easy to handle with common unsupervised approaches. Furthermore, a time series may drift over time. For example, because of temperature, certain properties of oil may change slightly while the time series still remains within the normal range. In such cases, the previous methods may not be applicable.
Summary of the invention
The technical problem to be solved by the present invention is to provide, in view of one or more of the above defects of the prior art, a cluster-based adaptive time series anomaly detection method and device.
To solve the above technical problem, the present invention provides a cluster-based adaptive time series anomaly detection method, including:
1) reducing the dimensionality of, and symbolizing, the time series in the training set with the SAX method to obtain symbolized time series;
2) building a prefix tree from the symbolized time series;
3) extracting test subsequences from the test set with a sliding window, and marking an extracted test subsequence as normal when it matches a path in the prefix tree.
Optionally, building the prefix tree from the symbolized time series in step 2) includes:
scanning the symbolized time series with a sliding window to obtain subsequences of characters of equal length, and building the prefix tree from the subsequences extracted from the training data.
Optionally, in step 3), when a test subsequence extracted from the test set matches no path in the prefix tree, the extracted test subsequence is marked as abnormal.
Optionally, the prefix tree includes red paths and black paths; in step 2), the red paths of the prefix tree are built from the symbolized time series; in step 3), when a test subsequence extracted from the test set matches no path in the prefix tree, the minimum distance between the extracted test subsequence and all red paths from the root to the leaf nodes of the prefix tree is further calculated. If the minimum distance is less than a preset threshold, the extracted test subsequence is marked as normal and inserted into the prefix tree as a black path; otherwise it is marked as abnormal. The method then returns to step 3) to extract the next test subsequence for detection.
Optionally, in step 3), a black path is converted into a red path when its count is detected to reach a preset value.
The present invention also provides a cluster-based adaptive time series anomaly detection device, including at least: a discrete processing unit, a prefix tree construction unit and an anomaly detection unit;
the discrete processing unit is configured to reduce the dimensionality of, and symbolize, the time series in the training set with the SAX method to obtain symbolized time series;
the prefix tree construction unit is configured to build a prefix tree from the symbolized time series;
the anomaly detection unit is configured to extract test subsequences from the test set with a sliding window and, when an extracted test subsequence matches a path in the prefix tree, to mark it as normal.
Optionally, the prefix tree construction unit scans the symbolized time series with a sliding window to obtain subsequences of characters of equal length, and builds the prefix tree from the subsequences extracted from the training data.
Optionally, when a test subsequence extracted from the test set matches no path in the prefix tree, the anomaly detection unit marks it as abnormal.
Optionally, the prefix tree includes red paths and black paths; the prefix tree construction unit builds the red paths of the prefix tree from the symbolized time series; when a test subsequence extracted from the test set matches no path in the prefix tree, the anomaly detection unit further calculates the minimum distance between the extracted test subsequence and all red paths from the root to the leaf nodes of the prefix tree. If the minimum distance is less than a preset threshold, it marks the test subsequence as normal and inserts it into the prefix tree as a black path; otherwise it marks it as abnormal. The anomaly detection unit then extracts the next test subsequence for detection.
Optionally, the anomaly detection unit converts a black path into a red path when it detects that the black path's count has reached a preset value.
The cluster-based adaptive time series anomaly detection method and device provided by the embodiments of the present invention have at least the following beneficial effects:
1. The invention reduces time and space complexity through dimensionality reduction. Storing time series in a small, fixed data structure greatly reduces the storage space they require, while preserving the fluctuation and properties of the time series as far as possible.
2. During testing, the model can modify itself to adapt to new patterns. The invention can therefore accurately identify time series anomalies, and can also adapt to certain normal trend changes as the time series evolves.
Description of the drawings
Fig. 1 is a flowchart of the cluster-based adaptive time series anomaly detection method provided by Embodiment one of the present invention;
Fig. 2 is a schematic diagram of building a prefix tree from subsequences extracted from training data;
Fig. 3 is a flowchart of the anomaly detection step in the cluster-based adaptive time series anomaly detection method provided by Embodiment three of the present invention;
Fig. 4 is a schematic diagram of the cluster-based adaptive time series anomaly detection device provided by Embodiment four of the present invention.
In the figures: 401: discrete processing unit; 402: prefix tree construction unit; 403: anomaly detection unit.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Embodiment one
As shown in Fig. 1, the cluster-based adaptive time series anomaly detection method provided by this embodiment of the present invention may include the following steps:
Step S101: Reduce the dimensionality of, and symbolize, the time series in the training set with the SAX (Symbolic Aggregate approXimation) method to obtain symbolized time series. SAX is a symbolic representation of time series on which distances can be measured; see J. Lin et al., "A Symbolic Representation of Time Series, with Implications for Streaming Algorithms". The method assumes that the raw data follows a normal distribution. Lines perpendicular to the x-axis divide the area under the Gaussian curve into σ equal-area parts (σ is the preset alphabet size, generally 3 to 10). The dividing points, called breakpoints, thus simply determine σ equal-sized regions under the Gaussian curve, and each segment of the time series falling in a region is replaced by that region's character. In this way, several consecutive timestamps of the time series (e.g., 50 timestamps) are converted into one character, which reduces the high-dimensional time series to a low-dimensional one while symbolizing it. The SAX distance between any two characters can be calculated from a pre-computed lookup table.
Step S102: Build a prefix tree from the symbolized time series. Specifically, the invention scans the time series with a fixed-size sliding window and cuts it into discrete symbols, converting the time series data into subsequences of characters of equal length, and then builds the prefix tree from these training subsequences extracted from the training data, as shown in Fig. 2. The training subsequences in Fig. 2 are abcda, abcac, abcab, bacad and bacab. Following the construction rules of a prefix tree, the first node is the root, which contains no character, and every other node contains exactly one character. Concatenating the characters along the path from the root to a node gives the string corresponding to that node, i.e., a subsequence. Subsequences that begin with the same characters share the nodes for that common prefix: in Fig. 2, abcda, abcac and abcab share the same a node, the same b node and the same c node.
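Step S102 can be sketched as a character trie built from fixed-length windows. This is a minimal illustration, assuming a window step of 1 (consistent with i = i + 1 in Embodiment three) and a nested-dict representation; `sliding_windows` and `PrefixTree` are hypothetical names. Building the tree from the five Fig. 2 subsequences reproduces the shared a-b-c prefix.

```python
def sliding_windows(symbols, L):
    """All length-L subsequences of a symbolized series (step 1 assumed)."""
    return [symbols[i:i + L] for i in range(len(symbols) - L + 1)]

class PrefixTree:
    """Character trie: the root holds no character, every other node holds one."""

    def __init__(self):
        self.root = {}  # maps a character to its child node (another dict)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})  # reuse the node if the prefix exists

    def node_count(self):
        """Number of non-root nodes."""
        stack, total = [self.root], 0
        while stack:
            node = stack.pop()
            total += len(node)
            stack.extend(node.values())
        return total

    def paths(self, node=None, prefix=""):
        """Yield the string spelled by every root-to-leaf path."""
        node = self.root if node is None else node
        if not node:
            yield prefix
        for ch, child in node.items():
            yield from self.paths(child, prefix + ch)

tree = PrefixTree()
for w in ["abcda", "abcac", "abcab", "bacad", "bacab"]:  # Fig. 2
    tree.insert(w)
print(tree.node_count())  # 14 nodes instead of 25 characters
```

Because every training subsequence has the same length L, each root-to-leaf path is exactly one stored subsequence, and the saving comes only from shared prefixes.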
Step S103: Extract test subsequences from the test set with a sliding window, and when an extracted test subsequence matches a path in the prefix tree, mark it as normal. In the present invention, "matching" means that the extracted test subsequence is identical to the subsequence represented by some path in the prefix tree; for example, the test subsequence abcac matches the training subsequence abcac in Fig. 2. The time series of the test set must also be reduced and symbolized with SAX and converted by the sliding window into subsequences (the test subsequences) before being matched against the paths of the prefix tree. In step S103, when an extracted test subsequence matches no path in the prefix tree, it is marked as abnormal. Because the prefix tree is built from training data, which is normal, it is an accurate model of the time series. So if a test subsequence extracted from the test set matches a path of the prefix tree, it is marked as normal; otherwise it is considered abnormal, because it differs too much from the original patterns.
The present invention proposes a new method for anomaly detection: it symbolizes the time series to reduce the input dimensionality, then constructs a prefix tree to build the model and test the data. The invention applies the SAX method to discretize the data and calculates the distance between discretized data for outlier detection. Because time series are high-dimensional and voluminous, the invention reduces time and space complexity through dimensionality reduction. Storing time series in a small, fixed data structure greatly reduces the storage space they require, while preserving the fluctuation and properties of the time series as far as possible.
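The exact-match rule of Embodiment one can be sketched as follows, assuming the test series has already been symbolized with the same alphabet. A window is labeled normal only if it spells a complete root-to-leaf path; `build_tree`, `matches` and `label_exact` are hypothetical names.

```python
def build_tree(words):
    """Nested-dict prefix tree; every non-root node holds one character."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
    return root

def matches(root, word):
    """True if `word` spells a complete root-to-leaf path of the tree."""
    node = root
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return not node  # must stop at a leaf, not partway down a longer path

def label_exact(root, test_symbols, L):
    """Embodiment one: slide a length-L window over the symbolized test series."""
    return ["normal" if matches(root, test_symbols[i:i + L]) else "abnormal"
            for i in range(len(test_symbols) - L + 1)]

tree = build_tree(["abcda", "abcac", "abcab", "bacad", "bacab"])  # Fig. 2
print(label_exact(tree, "abcacab", 5))  # ['normal', 'abnormal', 'abnormal']
```

Each lookup walks at most L nodes, so the cost per window does not grow with the number of stored subsequences.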
Embodiment two
Embodiment two optimizes the cluster-based adaptive time series anomaly detection method provided by Embodiment one: the prefix tree built includes red paths and black paths. Specifically, in the flow:
In step S102, the red paths of the prefix tree are built from the symbolized time series.
In step S103, test subsequences are extracted from the test set with a sliding window; when an extracted test subsequence matches a path in the prefix tree, it is marked as normal. When it matches no path in the prefix tree, the minimum distance between the extracted test subsequence and all red paths from the root to the leaf nodes of the prefix tree is further calculated. If the minimum distance is less than the preset threshold, the extracted test subsequence is marked as normal and inserted into the prefix tree as a black path; if the minimum distance is not less than the preset threshold, it is marked as abnormal. When the count of a black path reaches a preset value, the black path is converted into a red path. After step S103, the next test subsequence is extracted and checked, until all test subsequences in the test set have been checked. It should be understood that "red path" and "black path" in the present invention only distinguish the two kinds of path; a person skilled in the art may also refer to them by other names.
In Embodiment two, the red paths in the prefix tree are built from training data and therefore represent the normal patterns of the time series. However, some patterns may not appear in the training data and yet still be normal. To handle this, recall that Hawkins defines an outlier as an observation that deviates excessively from the other observations. In other words, an observation similar to other observations can be marked as a normal pattern. Similarly, a pattern that occurs frequently in the time series data may be considered normal. Therefore, during testing, if no red path in the prefix tree exactly matches the test pattern and the calculated minimum distance is below the threshold, a new black path is inserted into the prefix tree. The black path is the new pattern itself, and its terminal node records how many times the pattern has occurred in the test data set. When this count reaches a certain proportion, the black path is considered normal and converted into a red path.
The invention scans the time series data in O(n) time. When a test subsequence matches no path, it is compared with every path in the prefix tree, so each comparison step costs O(k), where k is the number of paths in the prefix tree. The worst case is when no two subsequences in the prefix tree share a character and no test subsequence matches any path in the prefix tree; then each of the O(n) test subsequences is compared with O(k) paths, and since k can itself grow to O(n), the time complexity is O(n^2).
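The red/black bookkeeping of Embodiment two can be sketched by storing a colour and a counter at the end of each complete path. This is a minimal illustration under assumptions that the text leaves open: an end-of-path key `"$"`, a black path starting at count 1 because its insertion is its first sighting, and promotion once the count reaches `promote_after`. All names are hypothetical.

```python
class ColoredPrefixTree:
    """Prefix tree whose complete paths are tagged 'red' (normal model) or 'black' (provisional)."""
    END = "$"  # hypothetical end-of-path key; never used as a SAX letter

    def __init__(self, promote_after):
        self.root = {}
        self.promote_after = promote_after  # hypothetical stand-in for the preset value

    def _walk(self, word, create):
        node = self.root
        for ch in word:
            if ch not in node:
                if not create:
                    return None
                node[ch] = {}
            node = node[ch]
        return node

    def add(self, word, color):
        """Red paths come from training data; a black path starts at count 1 (first sighting)."""
        node = self._walk(word, create=True)
        node[self.END] = {"color": color, "count": 1 if color == "black" else 0}

    def info(self, word):
        node = self._walk(word, create=False)
        return None if node is None else node.get(self.END)

    def hit_black(self, word):
        """Count one more occurrence of a black path; promote it once the count reaches the preset value."""
        meta = self.info(word)
        meta["count"] += 1
        if meta["count"] >= self.promote_after:
            meta["color"] = "red"
        return meta["color"]

    def paths(self, color, node=None, prefix=""):
        """Yield every complete path of the given colour."""
        node = self.root if node is None else node
        for key, child in node.items():
            if key == self.END:
                if child["color"] == color:
                    yield prefix
            else:
                yield from self.paths(color, child, prefix + key)

tree = ColoredPrefixTree(promote_after=3)
tree.add("abcac", "red")     # learned from training data
tree.add("abcbc", "black")   # close to a red path, seen once during testing
tree.hit_black("abcbc")      # count 2, still black
tree.hit_black("abcbc")      # count 3, promoted to red
print(sorted(tree.paths("red")))  # ['abcac', 'abcbc']
```

Only red paths take part in the distance comparison, so a new pattern influences later decisions only after it has recurred often enough.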
Embodiment three
On the basis of Embodiment two, the present invention provides Embodiment three of the cluster-based adaptive time series anomaly detection method, which gives the detailed flow of step S103 as follows:
Step S301: Start;
Step S302: Let i = 1;
Step S303: Extract the test subsequence Y[i, i+L-1] from the test set with a time window of length L;
Step S304: Check whether Y[i, i+L-1] matches a red path in the prefix tree; if so, go to step S305, otherwise go to step S306;
Step S305: Mark Y[i, i+L-1] as normal and go to step S316;
Step S306: Check whether Y[i, i+L-1] matches a black path in the prefix tree; if so, go to step S307, otherwise go to step S311;
Step S307: Mark Y[i, i+L-1] as normal;
Step S308: Increment the count of the matched black path;
Step S309: Check whether the count of the matched black path exceeds the preset value; if so, go to step S310, otherwise go to step S316;
Step S310: Convert the matched black path into a red path and go to step S316;
Step S311: Calculate the minimum distance between Y[i, i+L-1] and all red paths from the root to the leaf nodes of the prefix tree. In the present invention, distance means the SAX distance between two symbolized subsequences, computed according to the SAX method: the character distance at each position of the two subsequences is calculated and the results are summed. For example, for the subsequences abcde and acdee, each of length 5 with characters at positions 0 to 4, the SAX distance between them equals the sum of the SAX distances at each position: between a and a at position 0, b and c at position 1, c and d at position 2, d and e at position 3, and e and e at position 4. The SAX distance between any two characters can be obtained by looking it up in a pre-computed SAX table, such as Table 1.
Table 1
Note: σ is the alphabet size, i.e., the number of characters (regions); β denotes the corresponding breakpoint values. For an alphabet size of σ = 3, the characters are a, b and c, with reference numbers 3, 2 and 1 respectively.
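The values of Table 1 are not reproduced here. Assuming it is the standard SAX breakpoint table (the σ - 1 quantiles that split N(0, 1) into σ equal-area regions, as described in step S101), it can be regenerated with the Python standard library; `breakpoint_table` is a hypothetical helper, and the rounding to two decimals matches the 0.67 used in the example below.

```python
from statistics import NormalDist

def breakpoint_table(sizes=range(3, 11)):
    """For each alphabet size sigma, the breakpoints beta_1 < ... < beta_{sigma-1}
    that split N(0, 1) into sigma equal-area regions, rounded to 2 decimals."""
    nd = NormalDist()
    return {s: [round(nd.inv_cdf(k / s), 2) for k in range(1, s)] for s in sizes}

for sigma, row in breakpoint_table().items():
    print(sigma, row)  # e.g. sigma = 4 gives beta_1..beta_3 = -0.67, 0.0, 0.67
```

Since β_1 is always the lowest breakpoint, the table does not depend on whether the letters are numbered upward or, as in the note above, downward.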
SAX distance calculation method:

$$
\mathrm{Cell}_{r,c} = \begin{cases} 0, & |r - c| \le 1 \\ \beta_{\max(r,c)-1} - \beta_{\min(r,c)}, & \text{otherwise} \end{cases}
$$

where $\mathrm{Cell}_{r,c}$ is the SAX distance between characters r and c, $|r - c|$ is the absolute difference of their reference numbers, and $\max(r,c)$ and $\min(r,c)$ are the larger and smaller of their reference numbers.
For example, if σ = 4, i.e., the alphabet contains the four characters a, b, c and d with reference numbers d = 1, c = 2, b = 3 and a = 4, then the distance between characters a and d is $\mathrm{Cell}_{a,d} = \beta_3 - \beta_1 = 0.67 - (-0.67) = 1.34$.
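The cell formula and the position-wise sum of step S311 can be sketched as below, assuming the reverse numbering of the note to Table 1 (a = σ, ..., last letter = 1) and unrounded Gaussian breakpoints; the function names are hypothetical. With unrounded breakpoints the a-d distance for σ = 4 is about 1.349, matching the 1.34 obtained from the two-decimal table values. Note that the widely used SAX MINDIST scales a square root of summed squared cells; this sketch instead follows the plain sum described in step S311.

```python
from functools import lru_cache
from statistics import NormalDist

@lru_cache(maxsize=None)
def betas(sigma):
    """beta_1 < ... < beta_{sigma-1}: equal-area breakpoints of N(0, 1)."""
    nd = NormalDist()
    return tuple(nd.inv_cdf(k / sigma) for k in range(1, sigma))

def ref_number(ch, sigma):
    """Numbering from the note to Table 1: 'a' -> sigma, 'b' -> sigma - 1, ..., last letter -> 1."""
    return sigma - (ord(ch) - ord("a"))

def cell(x, y, sigma):
    """Cell_{r,c}: 0 for equal or adjacent letters, else beta_{max(r,c)-1} - beta_{min(r,c)}."""
    r, c = ref_number(x, sigma), ref_number(y, sigma)
    if abs(r - c) <= 1:
        return 0.0
    b = betas(sigma)  # Python index k - 1 holds beta_k
    return b[max(r, c) - 2] - b[min(r, c) - 1]

def sax_distance(u, v, sigma):
    """Word distance as described in step S311: sum of per-position cell distances."""
    if len(u) != len(v):
        raise ValueError("subsequences must have equal length")
    return sum(cell(x, y, sigma) for x, y in zip(u, v))

print(round(cell("a", "d", 4), 2))        # 1.35
print(sax_distance("abcde", "acdee", 5))  # 0.0, every position is equal or adjacent
```

The step S311 example (abcde against acdee) therefore has distance zero: the two words differ only by neighbouring letters, which SAX treats as indistinguishable.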
The present invention therefore calculates the SAX distance between Y[i, i+L-1] and every red path in the prefix tree, and uses the minimum of these distances for the subsequent decision.
Step S312: Check whether the minimum distance is less than the preset threshold; if so, go to step S313, otherwise go to step S315;
Step S313: Mark Y[i, i+L-1] as normal;
Step S314: Insert Y[i, i+L-1] into the prefix tree as a new black path and go to step S316;
Step S315: Mark Y[i, i+L-1] as abnormal and go to step S316;
Step S316: Let i = i + 1;
Step S317: Check whether i > m - L + 1 holds, where m is the length of the test time series; if so, go to step S318, otherwise go to step S303;
Step S318: Output the labeled time series and the updated prefix tree;
Step S319: End.
The pseudocode of the algorithm is as follows:
Algorithm: Data Adaptive Clustering (DAC)
Input: window size L, length of time series m, prefix tree T
Output: labeled time series, updated prefix tree T
In the above algorithm, a sequence that exactly matches a red path in the prefix tree is marked as normal (lines 3-4, case 1). If the test subsequence matches a black path, the count of that black path is checked against the predefined value; exceeding it means the black path can be converted into a red path (lines 5-9, case 2). Otherwise, case 3 applies: the sequence either generates a new black path (lines 14-15) or is marked as abnormal (line 17). In line 12, the function GetMinDis returns the minimum distance between Y[i, i+L-1] and all red paths from the root to the leaf nodes of the prefix tree.
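Since the numbered pseudocode itself is not reproduced above, the following Python sketch follows the flowchart of steps S301 to S319 instead. It is an illustration under assumptions, not the authors' code: the end-of-path key `"$"`, the parameters `threshold` and `promote_after`, the initial black-path count of 1 and the ">=" promotion test (the "reaches a preset value" wording of claim 5) are all hypothetical choices, and the letter distance reuses the Table 1 cell formula with a plain per-position sum.

```python
from functools import lru_cache
from statistics import NormalDist

END = "$"  # hypothetical end-of-path key holding {"color": ..., "count": ...}

@lru_cache(maxsize=None)
def betas(sigma):
    nd = NormalDist()
    return tuple(nd.inv_cdf(k / sigma) for k in range(1, sigma))

def char_dist(x, y, sigma):
    """Table 1 cell distance, numbering a -> sigma, ..., last letter -> 1."""
    r, c = sigma - (ord(x) - ord("a")), sigma - (ord(y) - ord("a"))
    if abs(r - c) <= 1:
        return 0.0
    b = betas(sigma)
    return b[max(r, c) - 2] - b[min(r, c) - 1]

def word_dist(u, v, sigma):
    return sum(char_dist(x, y, sigma) for x, y in zip(u, v))

def insert(tree, word, color, count):
    node = tree
    for ch in word:
        node = node.setdefault(ch, {})
    node[END] = {"color": color, "count": count}

def lookup(tree, word):
    """Metadata of the complete path `word`, or None if it is not in the tree."""
    node = tree
    for ch in word:
        node = node.get(ch)
        if node is None:
            return None
    return node.get(END)

def red_paths(node, prefix=""):
    for key, child in node.items():
        if key == END:
            if child["color"] == "red":
                yield prefix
        else:
            yield from red_paths(child, prefix + key)

def dac(test_symbols, L, tree, sigma, threshold, promote_after):
    """Label each length-L window of the symbolized test series; updates `tree` in place."""
    labels = []
    for i in range(len(test_symbols) - L + 1):             # S302, S316, S317
        y = test_symbols[i:i + L]                           # S303
        meta = lookup(tree, y)
        if meta is not None and meta["color"] == "red":     # S304 -> S305 (case 1)
            labels.append("normal")
        elif meta is not None:                              # S306 -> S307 (case 2)
            labels.append("normal")
            meta["count"] += 1                              # S308
            if meta["count"] >= promote_after:              # S309
                meta["color"] = "red"                       # S310
        else:                                               # case 3
            d = min((word_dist(y, p, sigma) for p in red_paths(tree)),
                    default=float("inf"))                   # S311, GetMinDis
            if d < threshold:                               # S312
                labels.append("normal")                     # S313
                insert(tree, y, "black", 1)                 # S314
            else:
                labels.append("abnormal")                   # S315
    return labels, tree                                     # S318

tree = {}
insert(tree, "abc", "red", 0)  # one training pattern
labels, tree = dac("abcdab", 3, tree, sigma=4, threshold=1.0, promote_after=2)
print(labels)
```

Here "bcd" is within the threshold of the red path "abc" and becomes a black path, while "cda" and "dab" are too far from every red path and are labeled abnormal.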
Embodiment four
As shown in Fig. 4, the cluster-based adaptive time series anomaly detection device provided by this embodiment of the present invention may include: a discrete processing unit 401, a prefix tree construction unit 402 and an anomaly detection unit 403;
Discrete processing unit 401 is configured to reduce the dimensionality of, and symbolize, the time series in the training set with the SAX method to obtain symbolized time series. The operation performed by discrete processing unit 401 is the same as step S101 above;
Prefix tree construction unit 402 is configured to build a prefix tree from the symbolized time series. The operation performed by prefix tree construction unit 402 is the same as step S102 above;
Anomaly detection unit 403 is configured to extract test subsequences from the test set and, when an extracted test subsequence matches a path in the prefix tree, to mark it as normal. The operation performed by anomaly detection unit 403 is the same as step S103 above.
Preferably, prefix tree construction unit 402 scans the time series with a fixed-size window and cuts it into discrete symbols, converting the time series into subsequences of characters of equal length, and builds the prefix tree from the subsequences extracted from the training data.
Preferably, when a test subsequence extracted from the test set matches no path in the prefix tree, anomaly detection unit 403 marks it as abnormal.
Embodiment five
Embodiment five optimizes the cluster-based adaptive time series anomaly detection device provided by Embodiment four: the prefix tree built includes red paths and black paths. Prefix tree construction unit 402 builds the red paths of the prefix tree from the symbolized time series. When a test subsequence extracted from the test set matches no path in the prefix tree, anomaly detection unit 403 further calculates the minimum distance between the extracted test subsequence and all red paths from the root to the leaf nodes of the prefix tree. If the minimum distance is less than the preset threshold, it marks the test subsequence as normal and inserts it into the prefix tree as a black path; otherwise it marks it as abnormal. Anomaly detection unit 403 then extracts the next test subsequence for detection. When anomaly detection unit 403 detects that a black path's count has reached the preset value, it converts the black path into a red path.
Because the invention stores a large number of time series in a single tree, and many of them share the same prefixes, the method can greatly reduce space consumption while recording the shape of the raw data. The method also exploits the advantages of clustering and accounts for potential time series patterns that did not appear in the training data. Although the worst case rarely occurs in real time series data sets, the invention is intended for data sets that have some regularity or periodicity, or at least are not entirely irregular.
Experimental tests show that the method of the invention can detect irregular fluctuations, which most prior-art statistical methods, including combined estimation methods, cannot. The method is particularly sensitive to abnormal patterns, spikes, continuous errors and missing features in periodic time series. In addition, tests of the method on cumulative white-noise sequences show that it gradually adapts to the time series of the test data set, progressively reducing false anomaly detections on the cumulative, irregular data as testing proceeds.
It should further be noted that the cluster-based adaptive time series anomaly detection device provided by the embodiments of the present invention may be implemented in software, or in hardware, or in a combination of hardware and software. Taking a software implementation as an example, as shown in Fig. 4, the device, as a logical entity, is formed by the CPU of the host equipment reading the corresponding computer program instructions from non-volatile memory into memory and running them.
Finally, it should be noted that the above embodiments merely illustrate, rather than limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art will understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced with equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. a kind of auto-adaptive time sequence variation detection method based on cluster, which is characterized in that including:
1) dimensionality reduction and symbolism are carried out to the time series in training set by SAX methods, obtains the time series of symbolism;
2) prefix trees are built according to the time series of symbolism;
3) Test segment is extracted from test set by sliding window, and judges the Test segment extracted in test set with before When sewing a route matching in tree, by the Test segment of the extraction labeled as normal.
2. according to the method described in claim 1, it is characterized in that, being built according to the time series of symbolism in the step 2) Prefix trees, including:
The time series that the symbolism is scanned by sliding window, obtains the subsequence with equal length character, and according to The subsequence structure prefix trees extracted from training data.
3. according to the method described in claim 1, it is characterized in that, the test extracted in judging test set in the step 3) When subsequence mismatches with the either path in prefix trees, by the Test segment of the extraction labeled as abnormal.
4. The method according to claim 1, characterized in that the prefix tree comprises red paths and black paths;
in step 2), the red paths of the prefix tree are built from the symbolized time series;
in step 3), when an extracted test subsequence from the test set matches no path in the prefix tree, the minimum distance between the extracted test subsequence and all red root-to-leaf paths of the prefix tree is further calculated; if this minimum distance is less than a preset threshold, the extracted test subsequence is labeled as normal and is inserted into the prefix tree as a black path; if it is not less than the preset threshold, the extracted test subsequence is labeled as abnormal; the method then returns to step 3) to extract the next test subsequence for detection.
5. The method according to claim 4, characterized in that, in step 3), a black path is converted into a red path when the number of times it has been detected reaches a preset value.
6. A cluster-based adaptive time series anomaly detection device, characterized by comprising at least: a discretization processing unit, a prefix tree construction unit, and an anomaly detection unit;
the discretization processing unit is configured to perform dimensionality reduction and symbolization on the time series in a training set using the SAX method, to obtain symbolized time series;
the prefix tree construction unit is configured to build a prefix tree from the symbolized time series;
the anomaly detection unit is configured to extract test subsequences from a test set using a sliding window and, when an extracted test subsequence matches a path in the prefix tree, to label the extracted test subsequence as normal.
7. The device according to claim 6, characterized in that the prefix tree construction unit scans the symbolized time series with a sliding window to obtain subsequences of equal character length, and builds the prefix tree from the subsequences extracted from the training data.
8. The device according to claim 6, characterized in that, when an extracted test subsequence from the test set matches no path in the prefix tree, the anomaly detection unit labels the extracted test subsequence as abnormal.
9. The device according to claim 6, characterized in that the prefix tree comprises red paths and black paths;
the prefix tree construction unit builds the red paths of the prefix tree from the symbolized time series;
when an extracted test subsequence from the test set matches no path in the prefix tree, the anomaly detection unit further calculates the minimum distance between the extracted test subsequence and all red root-to-leaf paths of the prefix tree; if this minimum distance is less than a preset threshold, it labels the extracted test subsequence as normal and inserts it into the prefix tree as a black path; if it is not less than the preset threshold, it labels the extracted test subsequence as abnormal; the anomaly detection unit then extracts the next test subsequence for detection.
10. The device according to claim 9, characterized in that the anomaly detection unit converts a black path into a red path when the number of times that black path has been detected reaches a preset value.
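The claims give neither code nor parameter values. As a minimal sketch of the SAX discretization in step 1) of claim 1 (the discretization processing unit of claim 6), assuming z-normalization, Piecewise Aggregate Approximation (PAA) with equal-width frames, and a four-letter alphabet whose breakpoints split the standard normal distribution into equiprobable regions, the following shows how a numeric series becomes a symbol string. The function names `znormalize`, `paa` and `sax` and the default alphabet are hypothetical, not taken from the patent.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev

def znormalize(series):
    """Scale a series to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)  # flat series: avoid division by zero
    return [(x - mu) / sigma for x in series]

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame."""
    if len(series) % n_segments:
        raise ValueError("this sketch assumes len(series) % n_segments == 0")
    w = len(series) // n_segments
    return [mean(series[i * w:(i + 1) * w]) for i in range(n_segments)]

def sax(series, n_segments, alphabet="abcd"):
    """Map a numeric series to a SAX word of length n_segments."""
    a = len(alphabet)
    # Breakpoints cut N(0, 1) into `a` equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    # bisect_right gives the index of the region each PAA value falls into.
    return "".join(alphabet[bisect_right(breakpoints, v)]
                   for v in paa(znormalize(series), n_segments))
```

For example, the ramp `sax([1, 2, 3, 4, 5, 6, 7, 8], 4)` yields `"abcd"`: each PAA frame averages two points, and the four frame means fall into the four Gaussian regions in increasing order. In a full pipeline the training series would be discretized into one long word, which the sliding window of claim 2 then scans.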
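Claims 2 and 3 (and 7 and 8) describe scanning the symbol string with a sliding window, storing the equal-length subsequences in a prefix tree, and labeling a test subsequence normal when it follows some root-to-leaf path and abnormal otherwise. A minimal sketch, assuming a step of one symbol, a nested-dict trie, and that the test series has already been symbolized with the same SAX settings; the helper names are hypothetical.

```python
def sliding_windows(symbols, width, step=1):
    """Equal-length subsequences of a symbol string."""
    return [symbols[i:i + width]
            for i in range(0, len(symbols) - width + 1, step)]

def build_prefix_tree(subsequences):
    """Nested-dict trie; every root-to-leaf path is one training subsequence."""
    root = {}
    for sub in subsequences:
        node = root
        for ch in sub:
            node = node.setdefault(ch, {})
    return root

def matches_path(tree, word):
    """True if `word` follows a complete root-to-leaf path of the tree."""
    node = tree
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return not node  # an empty dict marks a leaf

def label_test(tree, test_symbols, width):
    """Label each sliding-window test subsequence as normal or abnormal."""
    return [(w, "normal" if matches_path(tree, w) else "abnormal")
            for w in sliding_windows(test_symbols, width)]
```

Because every stored subsequence has the same length, all leaves sit at the same depth, so "matches a path" reduces to walking the trie symbol by symbol and ending on a leaf. Shared prefixes such as `ab` in `abc` and `abd` are stored once, which keeps the tree compact when training windows overlap heavily.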
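Claims 4 and 5 (and 9 and 10) add the adaptive part: training paths are red, an unmatched test subsequence close enough to some red path is accepted and inserted as a black path, and a black path that keeps being detected is promoted to red. The claims do not name a distance measure, so this sketch assumes a Euclidean distance over alphabet indices (SAX MINDIST would be another option); the alphabet, the `threshold` and `promote_at` parameters, the promotion counter starting at zero on insertion, and all names are hypothetical.

```python
import math

ALPHABET = "abcd"  # assumed SAX alphabet

class Node:
    """Prefix-tree node; `color` and `hits` are only meaningful on leaves."""
    def __init__(self):
        self.children = {}
        self.color = None  # "red", "black", or None for internal nodes
        self.hits = 0      # times a black leaf has been matched

def insert(root, word, color):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
    node.color = color
    return node

def find_leaf(root, word):
    """Return the colored leaf reached by `word`, or None if there is none."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return None
    return node if node.color is not None else None

def red_paths(node, prefix=""):
    """Yield every root-to-leaf word whose leaf is red."""
    if node.color == "red":
        yield prefix
    for ch, child in node.children.items():
        yield from red_paths(child, prefix + ch)

def symbol_distance(u, v):
    # Assumption: Euclidean distance between alphabet indices.
    return math.sqrt(sum((ALPHABET.index(a) - ALPHABET.index(b)) ** 2
                         for a, b in zip(u, v)))

def detect(root, word, threshold, promote_at):
    """Label one test word and update the tree (sketch of claims 4-5)."""
    leaf = find_leaf(root, word)
    if leaf is not None:
        if leaf.color == "black":
            leaf.hits += 1
            if leaf.hits >= promote_at:
                leaf.color = "red"  # frequently seen black path becomes red
        return "normal"
    d_min = min((symbol_distance(word, p) for p in red_paths(root)),
                default=math.inf)
    if d_min < threshold:
        insert(root, word, "black")  # near known behaviour: accept, remember
        return "normal"
    return "abnormal"
```

For instance, with red paths `abc` and `abd` and a threshold of 1.5, the unseen word `abb` lies at distance 1 from `abc`, so it is labeled normal and stored as a black path, while `ddd` is farther than 1.5 from every red path and is labeled abnormal. Only red paths serve as references for the distance check, so a black path influences later decisions only after it has been promoted.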

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810471537.4A CN108647737A (en) 2018-05-17 2018-05-17 A kind of auto-adaptive time sequence variation detection method and device based on cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810471537.4A CN108647737A (en) 2018-05-17 2018-05-17 A kind of auto-adaptive time sequence variation detection method and device based on cluster

Publications (1)

Publication Number Publication Date
CN108647737A (en) 2018-10-12

Family

ID=63756345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810471537.4A Pending CN108647737A (en) 2018-05-17 2018-05-17 A kind of auto-adaptive time sequence variation detection method and device based on cluster

Country Status (1)

Country Link
CN (1) CN108647737A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110071913A (en) * 2019-03-26 2019-07-30 同济大学 A kind of time series method for detecting abnormality based on unsupervised learning
CN110532297A (en) * 2019-08-01 2019-12-03 河海大学 A kind of symbolism Hydrological Time Series abnormal patterns detection method based on hierarchical clustering
CN110929800A (en) * 2019-11-29 2020-03-27 四川万益能源科技有限公司 Business body abnormal electricity utilization detection method based on sax algorithm
CN110929800B (en) * 2019-11-29 2022-10-21 四川万益能源科技有限公司 Business body abnormal electricity utilization detection method based on sax algorithm
CN111612082A (en) * 2020-05-26 2020-09-01 河北小企鹅医疗科技有限公司 Method and device for detecting abnormal subsequence in time sequence
CN117435676A (en) * 2023-07-13 2024-01-23 南京电力设计研究院有限公司 Building energy management method based on subsequence mining and directed weighted graph clustering

Similar Documents

Publication Publication Date Title
CN108647737A (en) A kind of auto-adaptive time sequence variation detection method and device based on cluster
Ponti et al. A decision cognizant Kullback–Leibler divergence
Gala et al. Active learning of neuron morphology for accurate automated tracing of neurites
Sharpnack et al. Changepoint detection over graphs with the spectral scan statistic
Parker An analysis of performance measures for binary classifiers
CN104216349B (en) Utilize the yield analysis system and method for the sensing data of manufacturing equipment
CN108182433A (en) A kind of meter reading recognition methods and system
US20210406727A1 (en) Managing defects in a model training pipeline using synthetic data sets associated with defect types
CN107545273A (en) A kind of local outlier detection method based on density
Li et al. Outlier detection using structural scores in a high-dimensional space
CN110389971A (en) A kind of multi-Sensor Information Fusion Approach based on cloud computing
Wang et al. Plant leaf tooth feature extraction
CN110717602B (en) Noise data-based machine learning model robustness assessment method
Al‐Tahhan et al. Accurate automatic detection of acute lymphatic leukemia using a refined simple classification
CN111811567A (en) Equipment detection method based on curve inflection point comparison and related device
Zhang et al. Spectral radius-based interval principal component analysis (SR-IPCA) for fault detection in industrial processes with imprecise data
CN109145764B (en) Method and device for identifying unaligned sections of multiple groups of detection waveforms of comprehensive detection vehicle
CN111651340B (en) Alarm data rule mining method and device and electronic equipment
CN112949735A (en) Liquid hazardous chemical substance volatile concentration abnormity discovery method based on outlier data mining
CN116628611A (en) Visual analysis method and system for association of abnormal modes of machine tool operation data
Vignon Inference in morphological taxonomy using collinear data and small sample sizes: Monogenean sclerites (Platyhelminthes) as a case study
Strobl et al. Mitigating pathogenesis for target discovery and disease subtyping
Li et al. Control chart pattern recognition under small shifts based on multi-scale weighted ordinal pattern and ensemble classifier
CN112101468A (en) Method for judging abnormal sequence in sequence combination
CN112765011B (en) Quality control state judging method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181012