CN108647737A - Clustering-based adaptive time series anomaly detection method and device - Google Patents
- Publication number
- CN108647737A (application number CN201810471537.4A)
- Authority
- CN
- China
- Prior art keywords
- prefix trees
- test
- time series
- path
- extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Abstract
The present invention relates to the technical field of data processing and provides a clustering-based adaptive time series anomaly detection method and device. The method comprises: reducing the dimensionality of and symbolizing the time series in a training set by the SAX method to obtain a symbolized time series; building a prefix tree from the symbolized time series; and extracting test subsequences from a test set with a sliding window and, when an extracted test subsequence is judged to match a path in the prefix tree, marking that test subsequence as normal. The invention reduces time and space complexity through dimensionality reduction, and the model can modify itself during testing to adapt to new patterns, so the method is suited to data sets that have a certain regularity or periodicity.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a clustering-based adaptive time series anomaly detection method and device.
Background
Time series are ubiquitous and are widely used in fields such as medical analysis, weather forecasting and stock prediction. For many years researchers have studied time series anomaly detection and data cleaning. Anomaly detection is essential to data analysis and status checking. For example, intrusion detection systems depend heavily on anomaly detection, because the system state is treated as a time series of state parameters. Hawkins defines an outlier as an observation that deviates obviously, or statistically, from the other observations. Clearly, this definition of a time series outlier can be used to design a detection scheme.
Several anomaly detection methods are currently popular:
1. Unsupervised learning methods, such as KNN. Unsupervised methods are generally too simple: although they are general-purpose, their precision is often low and they often misidentify anomalies.
2. Algorithms tied to specific industries such as finance and medicine, for example HTM. These algorithms are often usable only on data from the corresponding field; outside that field, their recognition rate is severely limited.
3. Methods based on statistical rules of the data, such as IMR. These methods must first find the statistical rules in the data. For a one-dimensional time series, this usually means finding limits on the rate of change of speed or on the change in the vertical direction; for multi-dimensional time series, statistical rules between the sequences, such as a normal distribution, can be found, and a clustering algorithm is then designed from those rules. The drawback of this kind of method is that the statistical properties of the data must be known in advance, and help from domain experts may also be needed.
4. Machine learning algorithms, such as SVM and HMM. Machine learning is an emerging approach to studying time series anomalies; the field is still young but has seen some development, and anomalies in the data under test are detected by training a model.
However, on the one hand, the above methods occupy a large amount of space when processing time series. On the other hand, if the data is a periodic time series, most of the above methods have limitations: for example, the statistical rules of the series cannot be expressed, and anomalies in the time series cannot simply be detected with common unsupervised methods. Furthermore, a time series may change somewhat as time passes. For example, certain properties of oil may change slightly under the influence of temperature while the time series still remains within the normal range. In such cases the earlier methods may be less applicable.
Summary of the invention
The technical problem to be solved by the present invention is to provide, in view of one or more of the above defects of the prior art, a clustering-based adaptive time series anomaly detection method and device.
To solve the above technical problem, the present invention provides a clustering-based adaptive time series anomaly detection method, comprising:
1) reducing the dimensionality of and symbolizing the time series in a training set by the SAX method to obtain a symbolized time series;
2) building a prefix tree from the symbolized time series;
3) extracting a test subsequence from a test set with a sliding window and, when the test subsequence extracted from the test set is judged to match a path in the prefix tree, marking the extracted test subsequence as normal.
Optionally, building the prefix tree from the symbolized time series in step 2) comprises: scanning the symbolized time series with a sliding window to obtain subsequences having the same number of characters, and building the prefix tree from the subsequences extracted from the training data.
Optionally, in step 3), when the test subsequence extracted from the test set is judged not to match any path in the prefix tree, the extracted test subsequence is marked as abnormal.
Optionally, the prefix tree comprises red paths and black paths. In step 2), the red paths of the prefix tree are built from the symbolized time series. In step 3), when the test subsequence extracted from the test set is judged not to match any path in the prefix tree, the minimum distance between the extracted test subsequence and all red paths from the root to the leaf nodes of the prefix tree is further calculated. When the minimum distance is less than a preset threshold, the extracted test subsequence is marked as normal and inserted into the prefix tree as a black path; when the minimum distance is not less than the preset threshold, the extracted test subsequence is marked as abnormal. The method then returns to step 3) to extract and test the next test subsequence.
Optionally, in step 3), a black path is converted into a red path when its count is detected to reach a preset value.
The present invention also provides a clustering-based adaptive time series anomaly detection device, comprising at least a discrete processing unit, a prefix tree construction unit and an anomaly detection unit;
the discrete processing unit is configured to reduce the dimensionality of and symbolize the time series in a training set by the SAX method to obtain a symbolized time series;
the prefix tree construction unit is configured to build a prefix tree from the symbolized time series;
the anomaly detection unit is configured to extract a test subsequence from a test set with a sliding window and, when the test subsequence extracted from the test set is judged to match a path in the prefix tree, to mark the extracted test subsequence as normal.
Optionally, the prefix tree construction unit scans the symbolized time series with a sliding window to obtain subsequences having the same number of characters, and builds the prefix tree from the subsequences extracted from the training data.
Optionally, when the anomaly detection unit judges that the test subsequence extracted from the test set does not match any path in the prefix tree, it marks the extracted test subsequence as abnormal.
Optionally, the prefix tree comprises red paths and black paths. The prefix tree construction unit builds the red paths of the prefix tree from the symbolized time series. When the anomaly detection unit judges that the test subsequence extracted from the test set does not match any path in the prefix tree, it further calculates the minimum distance between the extracted test subsequence and all red paths from the root to the leaf nodes of the prefix tree. When the minimum distance is less than a preset threshold, the extracted test subsequence is marked as normal and inserted into the prefix tree as a black path; when the minimum distance is not less than the preset threshold, the extracted test subsequence is marked as abnormal. The anomaly detection unit then extracts and tests the next test subsequence.
Optionally, the anomaly detection unit converts a black path into a red path when it detects that the count of the black path reaches a preset value.
Implementing the clustering-based adaptive time series anomaly detection method and device provided by the embodiments of the present invention has at least the following advantages:
1. The invention reduces time and space complexity through dimensionality reduction. The time series is stored in a small, fixed data structure, which greatly reduces the storage space the time series requires while preserving, as far as possible, the integrity of the fluctuations and properties of the time series.
2. While the invention is testing data, the model can modify itself and adapt to new patterns. The invention can therefore identify time series anomalies accurately and can also adapt to certain normal trend changes as the time series evolves.
Description of the drawings
Fig. 1 is a flowchart of the clustering-based adaptive time series anomaly detection method provided by embodiment one of the present invention;
Fig. 2 is a schematic diagram of building a prefix tree from the subsequences extracted from the training data;
Fig. 3 is a flowchart of the anomaly detection step in the clustering-based adaptive time series anomaly detection method provided by embodiment three of the present invention;
Fig. 4 is a schematic diagram of the clustering-based adaptive time series anomaly detection device provided by embodiment four of the present invention.
In the figures: 401: discrete processing unit; 402: prefix tree construction unit; 403: anomaly detection unit.
Detailed description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative work fall within the scope of protection of the present invention.
Embodiment one
As shown in Fig. 1, the clustering-based adaptive time series anomaly detection method provided by this embodiment of the present invention may comprise the following steps:
Step S101: Reduce the dimensionality of and symbolize the time series in the training set by the SAX (symbolic aggregate approximation) method to obtain a symbolized time series. SAX is a method for measuring time series distance; for details see "A Symbolic Representation of Time Series, with Implications for Streaming Algorithms" by J. Lin et al. The method assumes that the raw data follows a normal distribution. The Gaussian curve is divided by straight lines perpendicular to the x axis into σ parts of equal area (σ is the preset alphabet size, typically 3 to 10). The dividing points are given by a number of breakpoints, so the equal-sized regions under the Gaussian curve are easy to determine. Each segment of the time series that falls within a region is then replaced by one character. In this way, several consecutive timestamps of the time series (for example, 50 timestamps) are converted into one character, which reduces the high-dimensional time series to a low-dimensional one and symbolizes it at the same time. The SAX distance between any two characters can be calculated from a table provided in advance.
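The symbolization in step S101 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `sax_breakpoints` and `sax_symbolize` are hypothetical, the z-normalization and per-segment averaging follow the usual SAX recipe from the cited paper, the breakpoints are computed from the standard normal quantile function rather than read from a table, and labeling the lowest region "a" is an arbitrary choice of this sketch.

```python
from statistics import NormalDist, mean, pstdev

def sax_breakpoints(alphabet_size):
    """Breakpoints that cut the standard normal curve into equal-area regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]

def sax_symbolize(series, segment_len, alphabet_size):
    """Turn every block of segment_len points into one character.

    Assumptions of this sketch: the series is z-normalized first, each block
    is summarized by its mean, and trailing points that do not fill a whole
    block are dropped.
    """
    mu = mean(series)
    sigma = pstdev(series) or 1.0  # avoid division by zero on constant data
    z = [(x - mu) / sigma for x in series]
    cuts = sax_breakpoints(alphabet_size)
    symbols = []
    for start in range(0, len(z) - segment_len + 1, segment_len):
        avg = mean(z[start:start + segment_len])
        region = sum(avg > cut for cut in cuts)  # 0 is the lowest region
        symbols.append(chr(ord("a") + region))
    return "".join(symbols)
```

With `segment_len=50` and an alphabet size between 3 and 10, this corresponds to the "50 timestamps become one character" example in the text.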
Step S102: Build a prefix tree from the symbolized time series. Specifically, the invention scans the time series with a sliding window of fixed size and cuts it into discrete symbols, converting the time series data into subsequences with the same number of characters, and then builds the prefix tree from these training subsequences extracted from the training data, as shown in Fig. 2. The training subsequences extracted from the training data in Fig. 2 are abcda, abcac, abcab, bacad and bacab. Under the construction rules of a prefix tree, the first node is the root, which contains no character, and every child node other than the root contains one character. Concatenating the characters on the path from the root to a node gives the character string corresponding to that node, that is, a subsequence. Characters that repeat consecutively starting from the first character occupy only one node each: for example, abcda and abcac in Fig. 2 share the same a node, the same b node and the same c node.
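The construction in step S102 can be sketched as follows, using the five training subsequences of Fig. 2. This is only an illustrative sketch: the nested-dictionary representation and the names `sliding_windows`, `build_prefix_tree` and `count_nodes` are assumptions of the example, as is the window step of one character.

```python
def sliding_windows(symbols, length):
    """All subsequences of the given length, moving one character at a time."""
    return [symbols[i:i + length] for i in range(len(symbols) - length + 1)]

def build_prefix_tree(subsequences):
    """Nested-dict prefix tree: the root holds no character, each key is one character."""
    root = {}
    for seq in subsequences:
        node = root
        for ch in seq:
            # Reuse the child if the prefix already exists, otherwise create it.
            node = node.setdefault(ch, {})
    return root

def count_nodes(node):
    """Number of character nodes below the given node (the root is not counted)."""
    return sum(1 + count_nodes(child) for child in node.values())

training = ["abcda", "abcac", "abcab", "bacad", "bacab"]
tree = build_prefix_tree(training)
```

For these five subsequences the 25 characters are stored in 14 character nodes, because shared prefixes such as abc and baca are stored only once.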
Step S103: Extract a test subsequence from the test set with a sliding window and, when the test subsequence extracted from the test set is judged to match a path in the prefix tree, mark the extracted test subsequence as normal. In the present invention, "match" means that the test subsequence extracted from the test set is identical to the subsequence represented by some path in the prefix tree. For example, the test subsequence abcac matches the training subsequence abcac in Fig. 2. The time series of the test set must likewise be reduced in dimensionality and symbolized by the SAX method and then converted into subsequences (the test subsequences) by the sliding window before being matched against the paths of the prefix tree. In step S103, when the test subsequence extracted from the test set is judged not to match any path in the prefix tree, the extracted test subsequence is marked as abnormal. Because the prefix tree is built from training data, it is a normal, accurate model that represents the time series. Therefore, if a sequence extracted from the test set, that is, a test subsequence, matches a path in the prefix tree, the test subsequence is marked as normal; otherwise it is considered abnormal, because it differs too much from the original patterns.
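The exact-match test of step S103 can be sketched as below. The sketch assumes the same nested-dictionary prefix tree as a stand-in for the patent's data structure; `matches_path` and `label_windows` are hypothetical names, and the window is assumed to advance one character at a time.

```python
def build_prefix_tree(subsequences):
    """Nested-dict prefix tree; each key is one character."""
    root = {}
    for seq in subsequences:
        node = root
        for ch in seq:
            node = node.setdefault(ch, {})
    return root

def matches_path(tree, subsequence):
    """True only if the subsequence equals a complete root-to-leaf path."""
    node = tree
    for ch in subsequence:
        if ch not in node:
            return False
        node = node[ch]
    return not node  # an empty dict means a leaf was reached

def label_windows(tree, symbols, length):
    """Label every window of the symbolized test series as normal or abnormal."""
    labels = []
    for i in range(len(symbols) - length + 1):
        window = symbols[i:i + length]
        status = "normal" if matches_path(tree, window) else "abnormal"
        labels.append((window, status))
    return labels

tree = build_prefix_tree(["abcda", "abcac", "abcab", "bacad", "bacab"])
result = label_windows(tree, "abcacb", 5)
```

Here the window abcac is labeled normal because it is one of the training paths, while bcacb is labeled abnormal because no path starts with bc.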
The present invention proposes a new method for solving the anomaly detection problem: it symbolizes the time series and reduces the dimensionality of the input, and it constructs a prefix tree to build the model and test the data. The invention applies the SAX method, introduces discretization of the data, and computes the spatial distance between discretized data to detect outliers. Because time series are high-dimensional and large in volume, the invention reduces time and space complexity through dimensionality reduction. The invention stores the time series in a small, fixed data structure, which greatly reduces the storage space the time series requires while preserving, as far as possible, the integrity of the fluctuations and properties of the time series.
Embodiment two
Embodiment two is an optimization of the clustering-based adaptive time series anomaly detection method provided by embodiment one, in which the prefix tree that is built comprises red paths and black paths. In terms of the flow:
In step S102, the red paths of the prefix tree are built from the symbolized time series.
In step S103, a test subsequence is extracted from the test set with a sliding window and, when the test subsequence extracted from the test set is judged to match a path in the prefix tree, the extracted test subsequence is marked as normal. When the test subsequence extracted from the test set is judged not to match any path in the prefix tree, the minimum distance between the extracted test subsequence and all red paths from the root to the leaf nodes of the prefix tree is further calculated. When the minimum distance is less than a preset threshold, the extracted test subsequence is marked as normal and inserted into the prefix tree as a black path; when the minimum distance is not less than the preset threshold, the extracted test subsequence is marked as abnormal. When the count of a black path reaches a preset value, the black path is converted into a red path. After step S103, the next test subsequence is extracted and tested, until anomaly detection has been completed for all test subsequences in the test set. It should be understood that "red path" and "black path" in the present invention serve only to distinguish the two types of path, and a person of ordinary skill in the art may refer to the two types of path by other names.
In embodiment two, the red paths in the prefix tree are built from the training data and therefore represent the normal patterns of the time series. However, some patterns may not occur in the training data and yet still be normal. How can this problem be avoided? Hawkins defines an anomaly as an observation that deviates too much from the other observations. In other words, an observation that is similar to other observations can be marked as a normal pattern. Similarly, if certain patterns occur frequently in the time series data, they can be considered normal. Therefore, during data testing, if no red path in the prefix tree exactly matches the test pattern but the calculated minimum distance is less than the threshold, a new black path can be inserted into the prefix tree. The black path is the new pattern itself, and its terminal node holds the count of occurrences of that pattern in the test data set. When the count reaches a certain proportion, the black path is considered normal and is converted into a red path.
The invention needs O(n) time to scan the time series data. When an anomaly is found, the test sequence is compared with every path in the prefix tree, so the time complexity of the comparison is O(k), where k is the number of paths in the prefix tree. The worst case is that no subsequences in the prefix tree share a character and no test sequence matches any path in the prefix tree; in that case the time complexity is O(n^2).
Embodiment three
On the basis of embodiment two, the present invention provides the clustering-based adaptive time series anomaly detection method of embodiment three, in which the detailed flow of step S103 is as follows:
Step S301: Start;
Step S302: Let i = 1;
Step S303: Extract the test subsequence Y[i, i+L-1] from the test set using a time window of length L;
Step S304: Determine whether the test subsequence Y[i, i+L-1] matches a red path in the prefix tree; if so, go to step S305, otherwise go to step S306;
Step S305: Mark the test subsequence Y[i, i+L-1] as normal, and go to step S316;
Step S306: Determine whether the test subsequence Y[i, i+L-1] matches a black path in the prefix tree; if so, go to step S307, otherwise go to step S311;
Step S307: Mark the test subsequence Y[i, i+L-1] as normal;
Step S308: Increment the count of the matched black path by 1;
Step S309: Determine whether the count of the matched black path is greater than the preset value; if so, go to step S310, otherwise go to step S316;
Step S310: Convert the matched black path into a red path, and go to step S316;
Step S311: Calculate the minimum distance between the test subsequence Y[i, i+L-1] and all red paths from the root to the leaf nodes of the prefix tree. Distance in the present invention means the SAX distance between two symbolized subsequences, calculated according to the SAX method. The SAX distance is obtained by calculating the character distance at each position of the two subsequences and summing the results. For example, the subsequences abcde and acdee each have length 5 and contain characters at positions 0 to 4, so the SAX distance between the two subsequences equals the sum of the SAX distances at each position: calculate the SAX distance between a and a at position 0, between b and c at position 1, between c and d at position 2, between d and e at position 3, and between e and e at position 4, and then sum these 5 SAX distances. The SAX distance between any two characters can be obtained by looking it up in a SAX table provided in advance, such as Table 1.
Table 1
Note: σ is the alphabet size, that is, the number of characters into which the range is divided, and β denotes the corresponding breakpoint values. If the alphabet size is σ = 3, there are three characters a, b and c, numbered 3, 2 and 1 respectively.
SAX distance calculation:
$$\mathrm{Cell}_{r,c} = \begin{cases} 0, & |r-c| \le 1 \\ \beta_{\max(r,c)-1} - \beta_{\min(r,c)}, & \text{otherwise} \end{cases}$$
where $\mathrm{Cell}_{r,c}$ is the SAX distance between characters r and c, $|r-c|$ is the absolute value of the difference between the numbers of characters r and c, $\max(r,c)$ is the larger of the numbers of characters r and c, and $\min(r,c)$ is the smaller of the two.
For example, if σ = 4, there are 4 characters a, b, c and d, numbered d = 1, c = 2, b = 3 and a = 4, and the distance between character a and character d is $\mathrm{Cell}_{a,d} = \beta_3 - \beta_1 = 0.67 - (-0.67) = 1.34$.
The invention therefore calculates the SAX distance between the test subsequence Y[i, i+L-1] and every red path in the prefix tree, and takes the minimum of these distances for the subsequent judgment.
Step S312: Determine whether the minimum distance is less than the preset threshold; if so, go to step S313, otherwise go to step S315;
Step S313: Mark the test subsequence Y[i, i+L-1] as normal;
Step S314: Insert the test subsequence Y[i, i+L-1] into the prefix tree as a new black path, and go to step S316;
Step S315: Mark the test subsequence Y[i, i+L-1] as abnormal, and go to step S316;
Step S316: Let i = i + 1;
Step S317: Determine whether i > m-L+1 holds; if so, go to step S318, otherwise go to step S303, where m is the length of the time series of the test set;
Step S318: Output the labeled time series and the updated prefix tree;
Step S319: End.
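The SAX distance used in step S311 can be sketched as follows. The sketch makes several assumptions that are not in the original text: the function names are hypothetical, the breakpoints β are computed from the standard normal quantile function instead of being read from Table 1, and characters are numbered as in the σ = 4 example (a has the highest number, the last letter has number 1).

```python
from statistics import NormalDist

def breakpoints(alphabet_size):
    """beta[0] is β1, beta[1] is β2, and so on (standard normal quantiles)."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]

def char_number(ch, alphabet_size):
    """Numbering used in the text: for σ = 4, a = 4, b = 3, c = 2, d = 1."""
    return alphabet_size - (ord(ch) - ord("a"))

def char_distance(r, c, alphabet_size):
    """SAX distance between two characters given by their 1-based numbers."""
    if abs(r - c) <= 1:
        return 0.0  # identical or adjacent characters have distance 0
    beta = breakpoints(alphabet_size)
    # β_{max(r,c)-1} minus β_{min(r,c)}, shifted to 0-based list indices.
    return beta[max(r, c) - 2] - beta[min(r, c) - 1]

def sax_distance(s, t, alphabet_size):
    """Sum of the per-position character distances of two equal-length subsequences."""
    return sum(
        char_distance(char_number(x, alphabet_size),
                      char_number(y, alphabet_size),
                      alphabet_size)
        for x, y in zip(s, t)
    )
```

With unrounded breakpoints the distance between a and d for σ = 4 comes out at about 1.349, in line with the 1.34 obtained from the rounded value 0.67. For abcde and acdee with σ = 5, every position holds identical or adjacent characters, so the summed distance is 0.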
The pseudocode of the algorithm is as follows:
Algorithm: Data Adaptive Clustering (DAC)
Input: window size L; time series length m; prefix tree T
Output: the labeled time series; the updated T
In the above algorithm, a sequence that exactly matches a red path in the prefix tree is marked as normal (lines 3-4, case 1). If the test subsequence matches a black path, the algorithm checks whether the count of that black path is higher than the predefined value, which means that the path can be converted into a red path (lines 5-9, case 2). Otherwise, case 3 occurs, in which the sequence either generates a new black path (lines 14-15) or is marked as abnormal (line 17). In line 12, the function GetMinDis returns the minimum distance between the test subsequence Y[i, i+L-1] and all red paths from the root to the leaf nodes of the prefix tree.
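The three cases described above can be sketched as a single detection loop. This is a hedged sketch rather than the patent's pseudocode: the names `dac_detect`, `promote_count` and `sax_distance` are hypothetical, the paths are kept in a plain dictionary keyed by the path string instead of a real prefix tree (which would additionally share common prefixes), a new black path is assumed to start with a count of 1, and letters are indexed from a upward, which yields the same distances as the numbering in the text because the normal breakpoints are symmetric.

```python
from statistics import NormalDist

def sax_distance(s, t, alphabet_size):
    """Sum of per-position SAX character distances (0 for equal or adjacent letters)."""
    nd = NormalDist()
    beta = [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    total = 0.0
    for x, y in zip(s, t):
        lo, hi = sorted((ord(x) - ord("a"), ord(y) - ord("a")))
        if hi - lo > 1:
            total += beta[hi - 1] - beta[lo]
    return total

def dac_detect(symbols, length, paths, threshold, promote_count, alphabet_size):
    """Label each window; `paths` maps path -> {"color", "count"} and is updated in place."""
    labels = []
    for i in range(len(symbols) - length + 1):
        window = symbols[i:i + length]
        entry = paths.get(window)
        if entry is not None and entry["color"] == "red":
            labels.append("normal")                # case 1: exact red match
        elif entry is not None:
            labels.append("normal")                # case 2: exact black match
            entry["count"] += 1
            if entry["count"] > promote_count:
                entry["color"] = "red"             # seen often enough: promote
        else:                                      # case 3: no exact match
            reds = [p for p, e in paths.items() if e["color"] == "red"]
            min_dist = min(
                (sax_distance(window, p, alphabet_size) for p in reds),
                default=float("inf"),
            )
            if min_dist < threshold:
                labels.append("normal")
                paths[window] = {"color": "black", "count": 1}
            else:
                labels.append("abnormal")
    return labels

paths = {"aa": {"color": "red", "count": 0}}
labels = dac_detect("aabab", 2, paths, threshold=0.5, promote_count=1, alphabet_size=4)
```

In this small run the windows ab and ba are first accepted as new black paths because they are close to the red path aa, and the second occurrence of ab pushes its count above `promote_count`, so it is promoted to red.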
Embodiment four
As shown in Fig. 4, the clustering-based adaptive time series anomaly detection device provided by this embodiment of the present invention may comprise a discrete processing unit 401, a prefix tree construction unit 402 and an anomaly detection unit 403;
the discrete processing unit 401 is configured to reduce the dimensionality of and symbolize the time series in the training set by the SAX method to obtain a symbolized time series. The operation performed by the discrete processing unit 401 is the same as step S101 above;
the prefix tree construction unit 402 is configured to build a prefix tree from the symbolized time series. The operation performed by the prefix tree construction unit 402 is the same as step S102 above;
the anomaly detection unit 403 is configured to extract a test subsequence from the test set and, when the test subsequence extracted from the test set is judged to match a path in the prefix tree, to mark the extracted test subsequence as normal. The operation performed by the anomaly detection unit 403 is the same as step S103 above.
Preferably, the prefix tree construction unit 402 scans the time series with a window of fixed size and cuts it into discrete symbols, converting the time series into subsequences with the same number of characters, and builds the prefix tree from the subsequences extracted from the training data.
Preferably, when the anomaly detection unit 403 judges that the test subsequence extracted from the test set does not match any path in the prefix tree, it marks the extracted test subsequence as abnormal.
Embodiment five
Embodiment five is an optimization of the clustering-based adaptive time series anomaly detection device provided by embodiment four, in which the prefix tree that is built comprises red paths and black paths. The prefix tree construction unit 402 builds the red paths of the prefix tree from the symbolized time series. When the anomaly detection unit 403 judges that the test subsequence extracted from the test set does not match any path in the prefix tree, it further calculates the minimum distance between the extracted test subsequence and all red paths from the root to the leaf nodes of the prefix tree. When the minimum distance is less than a preset threshold, the extracted test subsequence is marked as normal and inserted into the prefix tree as a black path; when the minimum distance is not less than the preset threshold, the extracted test subsequence is marked as abnormal. The anomaly detection unit 403 then extracts and tests the next test subsequence. The anomaly detection unit 403 converts a black path into a red path when it detects that the count of the black path reaches a preset value.
Because a large number of time series are stored in one tree and many of them share the same prefix, the method of the invention can greatly reduce space consumption while recording the shape of the original data. At the same time, the method exploits the advantages of clustering and takes into account potential time series patterns that did not appear in the training data. Although the worst case seldom occurs in real time series data sets, the invention is intended to be applied to data sets that have a certain regularity or periodicity, or that at least are not irregular in pattern.
Experimental tests show that the method of the invention can detect irregular fluctuations, whereas most statistical methods in the prior art, including combined estimation methods, cannot achieve such detection. The method is particularly sensitive to abnormal patterns, spikes, consecutive errors and missing features in periodic time series. In addition, tests of the method on accumulated white-noise sequences show that it gradually adapts to the time series of the test data set and gradually reduces false detections of anomalies when tested on accumulated anomaly-free data.
It should further be noted that the clustering-based adaptive time series anomaly detection device provided by the embodiments of the present invention may be implemented in software, in hardware, or in a combination of software and hardware. Taking a software implementation as an example, as shown in Fig. 4, the device in the logical sense is formed by the CPU of the equipment in which it resides reading the corresponding computer program instructions from non-volatile memory into main memory and running them.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. a kind of auto-adaptive time sequence variation detection method based on cluster, which is characterized in that including:
1) dimensionality reduction and symbolism are carried out to the time series in training set by SAX methods, obtains the time series of symbolism;
2) prefix trees are built according to the time series of symbolism;
3) Test segment is extracted from test set by sliding window, and judges the Test segment extracted in test set with before
When sewing a route matching in tree, by the Test segment of the extraction labeled as normal.
2. according to the method described in claim 1, it is characterized in that, being built according to the time series of symbolism in the step 2)
Prefix trees, including:
The time series that the symbolism is scanned by sliding window, obtains the subsequence with equal length character, and according to
The subsequence structure prefix trees extracted from training data.
3. according to the method described in claim 1, it is characterized in that, the test extracted in judging test set in the step 3)
When subsequence mismatches with the either path in prefix trees, by the Test segment of the extraction labeled as abnormal.
4. according to the method described in claim 1, it is characterized in that, the prefix trees include red path and black path;
The red path in prefix trees is built according to the time series of symbolism in the step 2);
When either path in the step 3) in the Test segment and prefix trees extracted in judging test set mismatches,
Further calculate the extraction Test segment and the prefix trees between all red paths from tree root to leaf node
Minimum range, when less than predetermined threshold value, by the Test segment of the extraction labeled as normal, and by the test subsequence of the extraction
Row, which are inserted into prefix trees, is used as black path, by the Test segment of the extraction labeled as abnormal when not less than predetermined threshold value;
3) the next Test segment of extraction is gone to step to be detected.
5. The method according to claim 4, characterized in that, in step 3), a black path is converted into a red path when its count is detected to reach a preset value.
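The adaptive behaviour of claims 4 and 5 can be sketched as below. Several points are assumptions rather than details from the claims: the distance is taken to be the Hamming distance between symbol strings, the count of a black path is taken to be the number of test subsequences that have matched it (including the one that created it), and the class name `AdaptiveTrieDetector` and its parameters are hypothetical.

```python
RED, BLACK = "red", "black"


def hamming(x, y):
    """Number of positions at which two equal-length words differ (assumed metric)."""
    return sum(a != b for a, b in zip(x, y))


class AdaptiveTrieDetector:
    def __init__(self, train_words, threshold, promote_after):
        self.root = {}
        self.threshold = threshold
        self.promote_after = promote_after
        for w in train_words:           # training subsequences form red paths
            self._insert(w, RED)

    def _insert(self, word, color):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        # Leaf metadata lives under the key None, which no symbol can equal.
        return node.setdefault(None, {"color": color, "count": 0})

    def _find(self, word):
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return None
        return node.get(None)

    def _red_paths(self, node=None, prefix=""):
        node = self.root if node is None else node
        for key, child in node.items():
            if key is None:
                if child["color"] == RED:
                    yield prefix
            else:
                yield from self._red_paths(child, prefix + key)

    def _touch_black(self, meta):
        meta["count"] += 1
        if meta["count"] >= self.promote_after:
            meta["color"] = RED         # black path promoted to red

    def check(self, word):
        meta = self._find(word)
        if meta is not None:            # exact match with an existing path
            if meta["color"] == BLACK:
                self._touch_black(meta)
            return "normal"
        dist = min((hamming(word, p) for p in self._red_paths()),
                   default=float("inf"))
        if dist < self.threshold:       # close to a red path: adopt as black
            self._touch_black(self._insert(word, BLACK))
            return "normal"
        return "abnormal"


det = AdaptiveTrieDetector(["abc", "bcd"], threshold=2, promote_after=2)
results = [det.check(w) for w in ["abd", "dbd", "abd", "dbd"]]
```

In this run `"dbd"` is first rejected because it is two symbols away from every red path; after `"abd"` has been seen twice and promoted to red, the same `"dbd"` is within the threshold and is accepted, which is the self-adaptation the claims describe.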
6. An adaptive time series anomaly detection device based on clustering, characterized by comprising at least: a discretization unit, a prefix tree construction unit, and an anomaly detection unit;
the discretization unit is configured to perform dimensionality reduction and symbolization on the time series in the training set by the SAX method to obtain a symbolized time series;
the prefix tree construction unit is configured to build a prefix tree from the symbolized time series;
the anomaly detection unit is configured to extract a test subsequence from the test set by a sliding window and, when the extracted test subsequence is determined to match a path in the prefix tree, to mark the extracted test subsequence as normal.
7. The device according to claim 6, characterized in that the prefix tree construction unit scans the symbolized time series with a sliding window to obtain subsequences having the same number of characters, and builds the prefix tree from the subsequences extracted from the training data.
8. The device according to claim 6, characterized in that the anomaly detection unit marks the extracted test subsequence as abnormal when the test subsequence extracted from the test set is determined not to match any path in the prefix tree.
9. The device according to claim 6, characterized in that the prefix tree comprises red paths and black paths;
the prefix tree construction unit builds the red paths of the prefix tree from the symbolized time series;
when the test subsequence extracted from the test set is determined not to match any path in the prefix tree, the anomaly detection unit further calculates the minimum distance between the extracted test subsequence and all red paths from the root to the leaf nodes of the prefix tree; when the minimum distance is less than a preset threshold, the extracted test subsequence is marked as normal and inserted into the prefix tree as a black path; when the minimum distance is not less than the preset threshold, the extracted test subsequence is marked as abnormal; the anomaly detection unit then extracts and detects the next test subsequence.
10. The device according to claim 9, characterized in that the anomaly detection unit converts a black path into a red path when it detects that the count of the black path reaches a preset value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810471537.4A CN108647737A (en) | 2018-05-17 | 2018-05-17 | A kind of auto-adaptive time sequence variation detection method and device based on cluster |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810471537.4A CN108647737A (en) | 2018-05-17 | 2018-05-17 | A kind of auto-adaptive time sequence variation detection method and device based on cluster |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108647737A (en) | 2018-10-12 |
Family
ID=63756345
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810471537.4A Pending CN108647737A (en) | 2018-05-17 | 2018-05-17 | A kind of auto-adaptive time sequence variation detection method and device based on cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108647737A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110071913A (en) * | 2019-03-26 | 2019-07-30 | 同济大学 | A kind of time series method for detecting abnormality based on unsupervised learning |
CN110532297A (en) * | 2019-08-01 | 2019-12-03 | 河海大学 | A kind of symbolism Hydrological Time Series abnormal patterns detection method based on hierarchical clustering |
CN110929800A (en) * | 2019-11-29 | 2020-03-27 | 四川万益能源科技有限公司 | Business body abnormal electricity utilization detection method based on sax algorithm |
CN111612082A (en) * | 2020-05-26 | 2020-09-01 | 河北小企鹅医疗科技有限公司 | Method and device for detecting abnormal subsequence in time sequence |
CN117435676A (en) * | 2023-07-13 | 2024-01-23 | 南京电力设计研究院有限公司 | Building energy management method based on subsequence mining and directed weighted graph clustering |
- 2018-05-17 CN CN201810471537.4A patent/CN108647737A/en active Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110071913A (en) * | 2019-03-26 | 2019-07-30 | 同济大学 | A kind of time series method for detecting abnormality based on unsupervised learning |
CN110532297A (en) * | 2019-08-01 | 2019-12-03 | 河海大学 | A kind of symbolism Hydrological Time Series abnormal patterns detection method based on hierarchical clustering |
CN110929800A (en) * | 2019-11-29 | 2020-03-27 | 四川万益能源科技有限公司 | Business body abnormal electricity utilization detection method based on sax algorithm |
CN110929800B (en) * | 2019-11-29 | 2022-10-21 | 四川万益能源科技有限公司 | Business body abnormal electricity utilization detection method based on sax algorithm |
CN111612082A (en) * | 2020-05-26 | 2020-09-01 | 河北小企鹅医疗科技有限公司 | Method and device for detecting abnormal subsequence in time sequence |
CN117435676A (en) * | 2023-07-13 | 2024-01-23 | 南京电力设计研究院有限公司 | Building energy management method based on subsequence mining and directed weighted graph clustering |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108647737A (en) | A kind of auto-adaptive time sequence variation detection method and device based on cluster | |
Ponti et al. | A decision cognizant Kullback–Leibler divergence | |
Gala et al. | Active learning of neuron morphology for accurate automated tracing of neurites | |
Sharpnack et al. | Changepoint detection over graphs with the spectral scan statistic | |
Parker | An analysis of performance measures for binary classifiers | |
CN104216349B (en) | Utilize the yield analysis system and method for the sensing data of manufacturing equipment | |
CN108182433A (en) | A kind of meter reading recognition methods and system | |
US20210406727A1 (en) | Managing defects in a model training pipeline using synthetic data sets associated with defect types | |
CN107545273A (en) | A kind of local outlier detection method based on density | |
Li et al. | Outlier detection using structural scores in a high-dimensional space | |
CN110389971A (en) | A kind of multi-Sensor Information Fusion Approach based on cloud computing | |
Wang et al. | Plant leaf tooth feature extraction | |
CN110717602B (en) | Noise data-based machine learning model robustness assessment method | |
Al‐Tahhan et al. | Accurate automatic detection of acute lymphatic leukemia using a refined simple classification | |
CN111811567A (en) | Equipment detection method based on curve inflection point comparison and related device | |
Zhang et al. | Spectral radius-based interval principal component analysis (SR-IPCA) for fault detection in industrial processes with imprecise data | |
CN109145764B (en) | Method and device for identifying unaligned sections of multiple groups of detection waveforms of comprehensive detection vehicle | |
CN111651340B (en) | Alarm data rule mining method and device and electronic equipment | |
CN112949735A (en) | Liquid hazardous chemical substance volatile concentration abnormity discovery method based on outlier data mining | |
CN116628611A (en) | Visual analysis method and system for association of abnormal modes of machine tool operation data | |
Vignon | Inference in morphological taxonomy using collinear data and small sample sizes: Monogenean sclerites (Platyhelminthes) as a case study | |
Strobl et al. | Mitigating pathogenesis for target discovery and disease subtyping | |
Li et al. | Control chart pattern recognition under small shifts based on multi-scale weighted ordinal pattern and ensemble classifier | |
CN112101468A (en) | Method for judging abnormal sequence in sequence combination | |
CN112765011B (en) | Quality control state judging method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181012 |