CN113282876B - Method, device and equipment for generating one-dimensional time sequence data in anomaly detection - Google Patents

Publication number
CN113282876B
CN113282876B CN202110817079.7A CN202110817079A CN113282876B CN 113282876 B CN113282876 B CN 113282876B CN 202110817079 A CN202110817079 A CN 202110817079A CN 113282876 B CN113282876 B CN 113282876B
Authority
CN
China
Prior art keywords
differentiation
curve
depth
preset
curves
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110817079.7A
Other languages
Chinese (zh)
Other versions
CN113282876A (en)
Inventor
蔡志平
王承禹
周桐庆
余广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202110817079.7A
Publication of CN113282876A
Application granted
Publication of CN113282876B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3419 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
    • G06F11/3423 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time where the assessed time is active or idle time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/20 Drawing from basic elements, e.g. lines or circles
    • G06T11/203 Drawing of straight lines or curves

Abstract

The application relates to a method and a device for generating one-dimensional time series data in anomaly detection. The method comprises the following steps: acquiring time series data requirement information for anomaly detection from a business system, the information comprising two preset endpoints, a first differentiation depth, a second differentiation depth and a plurality of similarity intervals; recursing on the two preset endpoints using a random midpoint displacement method and obtaining a control curve when the recursion reaches the first differentiation depth; acquiring the preset second differentiation depth and continuing the differentiation recursion on the control curve to the second differentiation depth using a random midpoint displacement differentiation method, generating a plurality of differentiation curves; comparing the differentiation curves with the control curve according to the plurality of preset similarity intervals to obtain similar curves of a plurality of categories; and sampling the similar curves to obtain one-dimensional time series data of a plurality of categories. The method allows the shape of the time series to be varied and addresses the problem of insufficient time series data.

Description

Method, device and equipment for generating one-dimensional time sequence data in anomaly detection
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, and a device for generating one-dimensional time series data in anomaly detection.
Background
Currently, large companies that provide internet-based services need to monitor the real-time performance of their internet systems closely, because even a short service outage or quality degradation can cause significant traffic loss. These real-time performance data (e.g., search response time, CPU usage) are typically collected and stored as time series. To keep the service running smoothly, a time series anomaly detection system is usually used to monitor the time series data so that faults can be removed promptly.
However, the amount of time series data in a large company is extremely large, and the overhead of training a model for each KPI and monitoring its behavior is significant. Because many KPIs have similar shape attributes, current practice typically clusters time series data by shape and trains one unified model per type of data for anomaly detection. Model training requires sample data, and the models monitored by a time series anomaly detection system require time series as that sample data.
Disclosure of Invention
In view of the above, it is necessary to provide a method and an apparatus for generating one-dimensional time series data in anomaly detection that can address the problem of insufficient time series data.
A method for generating one-dimensional time series data in anomaly detection comprises the following steps:
acquiring time series data requirement information for anomaly detection from a business system; the time series data requirement information comprises: two preset endpoints, a first differentiation depth, a second differentiation depth and a plurality of similarity intervals;
recursing on the two preset endpoints using a random midpoint displacement method, and obtaining a control curve when the recursion reaches the first differentiation depth; shared control points are obtained when the recursion reaches the first differentiation depth; the shared control points comprise the two preset endpoints and the differentiation points generated by recursing to the first differentiation depth; the shared control points are connected in sequence from left to right to obtain the control curve;
acquiring the preset second differentiation depth, and continuing the differentiation recursion on the control curve to the second differentiation depth using a random midpoint displacement differentiation method to generate a plurality of differentiation curves;
comparing the differentiation curves with the control curve according to the plurality of preset similarity intervals to obtain similar curves of a plurality of categories;
and sampling the similar curves at a preset sampling rate to obtain one-dimensional time series data of a plurality of categories.
In one embodiment, recursing on the two preset endpoints using the random midpoint displacement method and obtaining the control curve when the recursion reaches the first differentiation depth, wherein the shared control points are obtained when the recursion reaches the first differentiation depth, comprises:
[Formula: image not reproduced in this text]
wherein the symbols of the formula, which are likewise images not reproduced in this text, denote in order: two adjacent differentiation points generated before the first differentiation depth; the differentiation point generated when the recursion reaches the first differentiation depth; the center point of the two adjacent points; a vector; a random displacement vector perpendicular to that vector; the lengths of two line segments; and the included angle between two line segments.
In one embodiment, continuing the differentiation recursion on the control curve to the second differentiation depth using the random midpoint displacement differentiation method to generate a plurality of differentiation curves comprises: performing the differentiation recursion on the control curve to the second differentiation depth a plurality of times using the random midpoint displacement differentiation method to obtain a plurality of control point sets; and connecting the points in each control point set in sequence from left to right to obtain the plurality of differentiation curves.
In one embodiment, comparing the differentiation curves with the control curve according to the plurality of preset similarity intervals to obtain similar curves of a plurality of categories comprises: calculating the similarity between each differentiation curve and the control curve, and assigning each differentiation curve to the preset similarity interval that contains its similarity, to obtain similar curves of a plurality of categories.
In one embodiment, calculating the similarity of the differentiation curve and the control curve comprises:
obtaining the similarity of the differentiation curve and the control curve from the dynamic time warping (DTW) cost and the root mean square error between the differentiation curve and the control curve.
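The two quantities named in this embodiment can be illustrated with a minimal sketch, taking the dynamic-time cost to be the standard dynamic time warping (DTW) cost. The text does not state how the two values are combined into one similarity, so the sketch returns them separately. The function names `dtw_cost` and `rmse`, and the choice of absolute difference as the local DTW cost, are assumptions made for illustration only.

```python
import math

def dtw_cost(a, b):
    """Classic O(len(a) * len(b)) DTW cost with |x - y| as the local cost (assumed)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)   # row for the empty prefix of `a`
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Extend the cheapest of the three admissible predecessor cells.
            cur[j] = abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[-1]

def rmse(a, b):
    """Root mean square error of two equally long sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
```

DTW tolerates local stretching along the time axis, while the root mean square error compares the curves point by point, so the two measures penalize different kinds of shape difference.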
In one embodiment, the sampling is equidistant.
An apparatus for generating one-dimensional time series data in anomaly detection, the apparatus comprising:
a data acquisition module, used for acquiring time series data requirement information for anomaly detection from a business system; the time series data requirement information comprises: two preset endpoints, a first differentiation depth, a second differentiation depth and a plurality of similarity intervals;
a control curve acquisition module, used for recursing on the two preset endpoints using a random midpoint displacement method and obtaining a control curve when the recursion reaches the first differentiation depth; shared control points are obtained when the recursion reaches the first differentiation depth; the shared control points comprise the two preset endpoints and the differentiation points generated by recursing to the first differentiation depth; the shared control points are connected in sequence from left to right to obtain the control curve;
a differentiation curve acquisition module, used for acquiring the preset second differentiation depth and continuing the differentiation recursion on the control curve to the second differentiation depth using a random midpoint displacement differentiation method to generate a plurality of differentiation curves;
a curve comparison module, used for comparing the differentiation curves with the control curve according to the plurality of preset similarity intervals to obtain similar curves of a plurality of categories;
and a sampling module, used for sampling the similar curves at a preset sampling rate to obtain one-dimensional time series data of a plurality of categories.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring time series data requirement information for anomaly detection from a business system; the time series data requirement information comprises: two preset endpoints, a first differentiation depth, a second differentiation depth and a plurality of similarity intervals;
recursing on the two preset endpoints using a random midpoint displacement method, and obtaining a control curve when the recursion reaches the first differentiation depth; shared control points are obtained when the recursion reaches the first differentiation depth; the shared control points comprise the two preset endpoints and the differentiation points generated by recursing to the first differentiation depth; the shared control points are connected in sequence from left to right to obtain the control curve;
acquiring the preset second differentiation depth, and continuing the differentiation recursion on the control curve to the second differentiation depth using a random midpoint displacement differentiation method to generate a plurality of differentiation curves;
comparing the differentiation curves with the control curve according to the plurality of preset similarity intervals to obtain similar curves of a plurality of categories;
and sampling the similar curves at a preset sampling rate to obtain one-dimensional time series data of a plurality of categories.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring time series data requirement information for anomaly detection from a business system; the time series data requirement information comprises: two preset endpoints, a first differentiation depth, a second differentiation depth and a plurality of similarity intervals;
recursing on the two preset endpoints using a random midpoint displacement method, and obtaining a control curve when the recursion reaches the first differentiation depth; shared control points are obtained when the recursion reaches the first differentiation depth; the shared control points comprise the two preset endpoints and the differentiation points generated by recursing to the first differentiation depth; the shared control points are connected in sequence from left to right to obtain the control curve;
acquiring the preset second differentiation depth, and continuing the differentiation recursion on the control curve to the second differentiation depth using a random midpoint displacement differentiation method to generate a plurality of differentiation curves;
comparing the differentiation curves with the control curve according to the plurality of preset similarity intervals to obtain similar curves of a plurality of categories;
and sampling the similar curves at a preset sampling rate to obtain one-dimensional time series data of a plurality of categories.
In the above method and device for generating one-dimensional time series data in anomaly detection, time series data requirement information for anomaly detection is acquired from a business system; the information comprises two preset endpoints, a first differentiation depth, a second differentiation depth and a plurality of similarity intervals. First, recursion is carried out on the two preset endpoints using the random midpoint displacement method, and a control curve is obtained when the recursion reaches the first differentiation depth: the shared control points obtained at that depth, namely the two preset endpoints and the differentiation points generated by the recursion, are connected in sequence from left to right. The preset second differentiation depth is then acquired, and the control curve is differentiated to the second differentiation depth by the random midpoint displacement differentiation method to obtain a plurality of control point sets; connecting the points in each control point set from left to right gives a plurality of differentiation curves. Similarity intervals between the control curve and the differentiation curves are set according to the different numbers of control points, the similarity of each differentiation curve to the control curve is calculated, and each differentiation curve is assigned to the corresponding similarity interval, giving similar curves of a plurality of categories; this ensures that the differentiation curves within one interval are similar curves of one category. Finally, the similar curves are sampled at a preset sampling rate to obtain one-dimensional time series data of the plurality of categories.
According to the invention, a plurality of differentiation curves are generated by midpoint displacement differentiation, similarity intervals between the control curve and the differentiation curves are set, and the differentiation curves are divided into the corresponding similarity intervals, so that curves in the same interval have high shape similarity and curves in different intervals have low shape similarity. Sampling the similar curves within one similarity interval therefore yields one-dimensional time series data with controllable shape similarity. The shape of the time series can be changed easily and sample data can be generated dynamically, which benefits model training in a time series anomaly detection system and addresses the problem of insufficient time series data.
Drawings
FIG. 1 is a schematic flow chart illustrating a method for generating one-dimensional time-series data in anomaly detection according to an embodiment;
FIG. 2 is a schematic diagram of a process of a random midpoint displacement method used in one embodiment;
FIG. 3 is a diagram showing the effect of the random midpoint displacement differentiation method used in one embodiment;
FIG. 4 is a graph of dynamic time warping cost versus differentiation depth for differentiation curves generated by the random midpoint displacement differentiation method used in one embodiment;
FIG. 5 is a graph of the root mean square error between the control curve and the differentiation curves generated by the random midpoint displacement differentiation method used in one embodiment, versus differentiation depth;
FIG. 6 is a block diagram showing a configuration of a one-dimensional time-series data generating apparatus in abnormality detection according to an embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, there is provided a method for generating one-dimensional time-series data in anomaly detection, including the steps of:
Step 102: acquiring time series data requirement information for anomaly detection from a business system; the time series data requirement information comprises: two preset endpoints, a first differentiation depth, a second differentiation depth and a plurality of similarity intervals.
The business system may be an internet system, in which case the time series data obtained from it are network performance index data. The business system may also be a plant status detection system, in which case the time series data may be status data collected by sensors.
Step 104: recursing on the two preset endpoints using a random midpoint displacement method, and obtaining a control curve when the recursion reaches the first differentiation depth.
Shared control points are obtained when the recursion reaches the first differentiation depth; the shared control points comprise the two preset endpoints and the differentiation points generated by recursing to the first differentiation depth; the shared control points are connected in sequence from left to right to obtain the control curve.
The random midpoint displacement method is a quick and convenient way to generate a shape and to add detail to an existing shape. Recursion means that a procedure or function calls itself, directly or indirectly, in its own definition or description. The random midpoint displacement method is applied recursively to the two preset endpoints until the first differentiation depth is reached, generating differentiation points. The first differentiation depth is a particular depth in the random midpoint displacement process; different differentiation depths generate different numbers of differentiation points, and the words "first" and "second" are used only to distinguish the differentiation depths. The two preset endpoints and the differentiation points are the shared control points, and the curve obtained by connecting the shared control points in sequence from left to right is the control curve.
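A minimal sketch of the recursion described in step 104 follows, written iteratively with one pass per depth level. The perpendicular displacement follows the description; the decay factor `roughness`, the uniform distribution of the displacement, and the clamp that keeps each new point strictly between its neighbours in time are assumptions of this sketch and are not taken from the patent. The name `midpoint_displace` is hypothetical.

```python
import random

def midpoint_displace(points, depth, roughness=0.5, rng=None):
    """Refine a left-to-right polyline by `depth` rounds of midpoint displacement."""
    rng = rng or random.Random()
    pts = list(points)
    for level in range(depth):
        scale = roughness ** level          # assumed decay of the displacement
        refined = [pts[0]]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            dx, dy = x1 - x0, y1 - y0
            # Assumed clamp: keeps the new x strictly between x0 and x1.
            limit = scale if dy == 0 else min(scale, 0.45 * dx / abs(dy))
            t = rng.uniform(-limit, limit)
            # Move the midpoint along (-dy, dx), which is perpendicular to the segment.
            refined.append(((x0 + x1) / 2 - dy * t, (y0 + y1) / 2 + dx * t))
            refined.append((x1, y1))
        pts = refined
    return pts

# Two preset endpoints refined to depth 3 give the shared control points.
control = midpoint_displace([(0.0, 0.0), (1.0, 0.0)], depth=3, rng=random.Random(0))
```

Because every round keeps the existing points and inserts one new point per segment, the returned list is already ordered from left to right and can be connected directly to form the control curve.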
Step 106: acquiring the preset second differentiation depth, and continuing the differentiation recursion on the control curve to the second differentiation depth using a random midpoint displacement differentiation method to generate a plurality of differentiation curves.
When the differentiation recursion on the control curve reaches the second differentiation depth, a plurality of new differentiation points have been generated. The new differentiation points form a plurality of control point sets, and connecting the points in each control point set in sequence from left to right generates a plurality of differentiation curves with different shapes. The greater the differentiation depth, the more new differentiation points are generated, and the larger the shape difference between the resulting differentiation curves and the control curve.
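The following sketch illustrates one reading of step 106: each differentiation curve starts from the same shared control points and continues the midpoint displacement from the first to the second differentiation depth with its own random stream. The helper names `refine` and `differentiate`, the seeding scheme, the displacement scale `0.5 ** level` and the clamp are assumptions for illustration, not details given in the patent.

```python
import random

def refine(points, scale, rng):
    """One round of midpoint displacement; existing points are kept unchanged."""
    out = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        limit = scale if dy == 0 else min(scale, 0.45 * dx / abs(dy))  # assumed clamp
        t = rng.uniform(-limit, limit)
        out.append(((x0 + x1) / 2 - dy * t, (y0 + y1) / 2 + dx * t))
        out.append((x1, y1))
    return out

def differentiate(control, first_depth, second_depth, n_curves, seed=0):
    """Grow `n_curves` curves from shared control points, one RNG stream each."""
    curves = []
    for k in range(n_curves):
        rng = random.Random(seed + k)
        pts = list(control)
        for level in range(first_depth, second_depth):
            pts = refine(pts, 0.5 ** level, rng)
        curves.append(pts)
    return curves

control = [(0.0, 0.0), (0.5, 0.3), (1.0, 0.0)]   # toy control curve at depth 1
curves = differentiate(control, first_depth=1, second_depth=4, n_curves=3)
```

Every generated curve still passes through all shared control points, so the curves differ only in the points added after the first differentiation depth.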
Step 108: comparing the differentiation curves with the control curve according to the plurality of preset similarity intervals to obtain similar curves of a plurality of categories.
Because the random midpoint displacement differentiation method is random, the shape similarity between a differentiation curve and the control curve is also random. A plurality of similarity intervals are therefore preset, the shape similarity between each differentiation curve and the control curve is calculated, and the differentiation curves whose shape similarities fall in the same similarity interval are classified into one category, giving similar curves of a plurality of categories. When the time series are used to train a clustering algorithm, the category corresponding to the similarity interval can serve as the label of the time series, so the time series do not need to be labeled again and the time cost of manual labeling is saved.
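The interval assignment in step 108 can be sketched as follows. The half-open interval convention, the example interval bounds and the function names `assign_category` and `group_by_interval` are hypothetical; the similarity values are assumed to have been computed beforehand.

```python
def assign_category(similarity, intervals):
    """Return the index of the half-open interval [low, high) holding `similarity`."""
    for index, (low, high) in enumerate(intervals):
        if low <= similarity < high:
            return index
    return None  # falls outside every preset interval

def group_by_interval(similarities, intervals):
    """Map category index -> list of curve indices whose similarity falls in it."""
    groups = {index: [] for index in range(len(intervals))}
    for curve_id, similarity in enumerate(similarities):
        category = assign_category(similarity, intervals)
        if category is not None:
            groups[category].append(curve_id)
    return groups

intervals = [(0.0, 0.2), (0.2, 0.5), (0.5, 1.0)]     # hypothetical intervals
groups = group_by_interval([0.05, 0.45, 0.2, 0.9, 1.3], intervals)
```

The category index returned for each curve can be used directly as its label, which is the labeling shortcut described above.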
Step 110: sampling the similar curves at a preset sampling rate to obtain one-dimensional time series data of a plurality of categories.
Two similar curves in the same category can form a piecewise function; sampling this piecewise function at equal intervals yields a segment of one-dimensional time series data of that category with a random shape. In the same way, sampling the similar curves of different categories at equal intervals yields a plurality of one-dimensional time series with different shapes.
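Equidistant sampling of a curve given by its control points can be sketched as below, assuming the points are joined by straight line segments and have strictly increasing x coordinates. The function name `sample_equidistant` and the parameter `n_samples` (standing in for the preset sampling rate) are hypothetical.

```python
def sample_equidistant(points, n_samples):
    """Sample a piecewise-linear curve at `n_samples` equally spaced x positions."""
    if n_samples < 2:
        raise ValueError("need at least two samples")
    x_start, x_end = points[0][0], points[-1][0]
    step = (x_end - x_start) / (n_samples - 1)
    series, seg = [], 0
    for i in range(n_samples):
        x = min(x_start + i * step, x_end)
        # Advance to the segment whose right end is not left of x.
        while seg < len(points) - 2 and x > points[seg + 1][0]:
            seg += 1
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        series.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return series

series = sample_equidistant([(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)], 5)
```

The returned list contains only the y values, which is the one-dimensional time series; the x positions are implied by the sampling interval.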
In one embodiment, as shown in FIG. 2, recursing on the two preset endpoints using the random midpoint displacement method and obtaining the control curve when the recursion reaches the first differentiation depth, wherein the shared control points are obtained when the recursion reaches the first differentiation depth, comprises:
[Formula: image not reproduced in this text]
wherein the symbols of the formula, which are likewise images not reproduced in this text, denote in order: two adjacent differentiation points generated before the first differentiation depth; the differentiation point generated when the recursion reaches the first differentiation depth; the center point of the two adjacent points; a vector; a random displacement vector perpendicular to that vector; the lengths of two line segments; and the included angle between two line segments. The three points named at the end of the passage (symbols not reproduced) are connected in sequence from left to right to obtain the control curve.
In one embodiment, continuing the differentiation recursion on the control curve to the second differentiation depth using the random midpoint displacement differentiation method to generate a plurality of differentiation curves comprises: performing the differentiation recursion on the control curve to the second differentiation depth a plurality of times using the random midpoint displacement differentiation method to obtain a plurality of control point sets; and connecting the points in each control point set in sequence from left to right to obtain the plurality of differentiation curves.
The second differentiation depth is a differentiation depth greater than the first differentiation depth in the random midpoint displacement process. The random midpoint displacement differentiation method is explained below with reference to FIG. 2. Let
Figure 793041DEST_PATH_IMAGE040
denote the maximum recursion depth of the random midpoint displacement method, let
Figure 429559DEST_PATH_IMAGE041
denote the differentiation depth, and let
Figure 316743DEST_PATH_IMAGE042
denote the shared depth, where
Figure 343605DEST_PATH_IMAGE043
. Let
Figure 997440DEST_PATH_IMAGE044
denote the set of control points generated at recursion depth i during random midpoint displacement, where
Figure 347519DEST_PATH_IMAGE045
. Then clearly
Figure 733501DEST_PATH_IMAGE046
and
Figure 575555DEST_PATH_IMAGE047
, where |·| denotes the number of elements in a set. Let
Figure 705185DEST_PATH_IMAGE048
and
Figure 660503DEST_PATH_IMAGE049
denote the two control point sets finally generated by one random midpoint displacement process at differentiation depth
Figure 545282DEST_PATH_IMAGE050
. The set of control points they share can be expressed as
Figure 546736DEST_PATH_IMAGE051
. Thus, the number of control points that differ between differentiation curves of the same random midpoint displacement process (denoted m) and the differentiation depth
Figure 897034DEST_PATH_IMAGE052
are related as follows:
Figure 97071DEST_PATH_IMAGE053
. As the above formula indicates, the number of control points that differ between the differentiation curves increases with the differentiation depth, and the shape difference between the differentiation curves and the control curve grows accordingly, as shown in FIG. 3. By this property, controlling the differentiation depth
Figure 90435DEST_PATH_IMAGE054
makes it possible to generate multiple differentiation curves whose shape similarity to the control curve is controllable.
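The subdivision described above can be illustrated with a minimal Python sketch. It is not the displacement formula of this application, whose equations are not reproduced in this text: as a simplifying assumption, each midpoint is displaced vertically by uniform noise whose amplitude halves at every level. The names `subdivide`, `generate_curves`, `scale`, and `roughness` are hypothetical.

```python
import random


def subdivide(points, depth, scale, rng, roughness=0.5):
    """Apply `depth` rounds of random midpoint displacement to a polyline.

    points: list of (x, y) tuples ordered from left to right.
    scale:  displacement amplitude of the first round (assumed parameter).
    """
    for _ in range(depth):
        refined = [points[0]]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            # Assumption: vertical uniform noise instead of the patent's
            # displacement perpendicular to the segment.
            mid = ((x0 + x1) / 2, (y0 + y1) / 2 + rng.uniform(-scale, scale))
            refined += [mid, (x1, y1)]
        points = refined
        scale *= roughness  # smaller displacements at deeper levels
    return points


def generate_curves(p_left, p_right, first_depth, second_depth, n_curves, seed=0):
    """Build one control curve, then several curves that share its points."""
    rng = random.Random(seed)
    control = subdivide([p_left, p_right], first_depth, 1.0, rng)
    extra = second_depth - first_depth
    start_scale = 0.5 ** first_depth
    curves = [subdivide(control, extra, start_scale, rng) for _ in range(n_curves)]
    return control, curves


control, curves = generate_curves((0.0, 0.0), (1.0, 0.0), 2, 5, n_curves=3)
differing = len(set(curves[0]) - set(control))
print(len(control), len(curves[0]), differing)  # 5 33 28
```

In this sketch a polyline refined to depth d from two endpoints has 2^d + 1 points, so every curve keeps the 5 shared control points and adds 28 points of its own; a larger gap between the two depths leaves more points free to differ.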
In one embodiment, comparing the differentiation curves with the control curve according to a plurality of preset similarity intervals to obtain similar curves of a plurality of categories comprises: calculating the similarity between each differentiation curve and the control curve, and assigning each differentiation curve to the similarity interval that contains its similarity, thereby obtaining similar curves of a plurality of categories.
Differentiation curves whose shape similarity to the control curve falls in the same similarity interval are grouped into one category, so that similar curves of both high and low similarity are obtained in a single classification pass. When the time series are used to train a clustering algorithm, the category corresponding to each similarity interval can serve as the label of the time series, so the labels need not be set again and the time cost of manual labeling is saved.
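The interval-based grouping can be sketched as follows. The half-open interval convention `[lo, hi)`, the function name `classify`, and the example interval bounds are assumptions made for illustration; the application does not fix them in this text.

```python
def classify(similarities, intervals):
    """Return, for each similarity value, the index of the interval [lo, hi)
    that contains it, or None when no preset interval matches."""
    labels = []
    for s in similarities:
        label = next(
            (k for k, (lo, hi) in enumerate(intervals) if lo <= s < hi),
            None,
        )
        labels.append(label)
    return labels


# Hypothetical preset similarity intervals; the index doubles as the label.
intervals = [(0.0, 0.1), (0.1, 0.3), (0.3, 1.0)]
labels = classify([0.05, 0.2, 0.7, 1.5], intervals)
print(labels)  # [0, 1, 2, None]
```

The interval index can then be stored alongside each sampled series as its category label.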
In one embodiment, calculating the similarity between the differentiation curve and the control curve comprises:
obtaining the similarity between the differentiation curve and the control curve according to the dynamic time warping (DTW) cost and the root mean square error (RMSE) between the differentiation curve and the control curve.
The dynamic time warping cost, also called the DTW cost, is obtained by computing a matrix of distances between every pair of points of the two sequences and finding a path from the upper-left corner to the lower-right corner of the matrix that minimizes the sum of the elements on the path; this minimum sum measures the similarity between the two sequences. The root mean square error (RMSE) is the square root of the mean square error (MSE), which is the expected value of the squared difference between the estimated value and the actual value; a smaller value indicates a smaller deviation between the two. The similarity is obtained by calculating the DTW cost and the RMSE between the differentiation curve and the control curve.
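Both measures can be sketched in a few lines of Python. This is the textbook dynamic programming form of DTW with absolute difference as the point distance, which is an assumption; how the application combines the two values into a single similarity is not stated here, so the sketch returns them separately.

```python
import math


def dtw_cost(a, b):
    """Minimum accumulated |a[i] - b[j]| along a monotone warping path from
    the first pair of points to the last pair."""
    n, m = len(a), len(b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def rmse(a, b):
    """Root mean square error between two equally long sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


print(dtw_cost([0, 1, 2], [0, 1, 1, 2]))  # 0.0
print(rmse([1, 2, 3], [1, 2, 6]))
```

The first call shows why DTW suits shape comparison: repeating a value stretches the sequence in time but leaves the cost at zero, whereas RMSE requires equal lengths and compares values position by position.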
As shown in FIGS. 4 and 5, the dynamic time warping cost (DTW cost, one similarity measure) and the root mean square error (RMSE, another similarity measure) between the differentiation curve and the control curve both increase as the differentiation depth increases. On this basis, similarity intervals between the differentiation curves and the control curve can be specified, and differentiation curves whose similarities fall in the same interval are classified into one category.
In one embodiment, the sampling is equidistant.
The piecewise functions formed by the similar curves of the multiple categories are sampled equidistantly to obtain multiple groups of one-dimensional time series data with different shapes. The generated time series are dynamic sample data of variable shape that a time series anomaly detection system can use for model training.
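Equidistant sampling of a piecewise linear curve can be sketched as below. The function name `sample_equidistant` is hypothetical, and the sampling rate is expressed as a number of samples `n_samples` over the curve's x range, which is an assumption about how the preset sampling rate is specified.

```python
from bisect import bisect_right


def sample_equidistant(points, n_samples):
    """Sample a piecewise linear curve at n_samples equally spaced x values.

    points: list of (x, y) control points with strictly increasing x.
    """
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    step = (xs[-1] - xs[0]) / (n_samples - 1)
    series = []
    for k in range(n_samples):
        x = min(xs[0] + k * step, xs[-1])  # guard against rounding overshoot
        i = min(bisect_right(xs, x), len(xs) - 1)  # right end of the segment
        t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
        series.append(ys[i - 1] + t * (ys[i] - ys[i - 1]))
    return series


curve = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]
series = sample_equidistant(curve, 5)
print(series)  # [0.0, 0.5, 1.0, 0.5, 0.0]
```

Each similar curve sampled this way yields one one-dimensional series of fixed length, regardless of how many control points the curve has.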
It should be understood that, although the steps in the flowchart of FIG. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in FIG. 1 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 6, there is provided a one-dimensional time series data generating apparatus in anomaly detection, including: a data acquisition module 601, a control curve acquisition module 602, a differentiation curve acquisition module 603, a curve comparison module 604, and a sampling module 605, wherein:
the data acquisition module 601 is configured to acquire time series data demand information for anomaly detection from a service system; the time series data demand information comprises: two preset endpoints, a first differentiation depth, a second differentiation depth, and a plurality of similarity intervals;
the control curve acquisition module 602 is configured to recurse on the two preset endpoints using a random midpoint displacement method and obtain a control curve when recursion reaches the first differentiation depth; shared control points are obtained when recursion reaches the first differentiation depth; the shared control points include: the differentiation points generated when recursion reaches the first differentiation depth and the two preset endpoints; the shared control points are connected in sequence from left to right to obtain the control curve;
the differentiation curve acquisition module 603 is configured to acquire the preset second differentiation depth and generate a plurality of differentiation curves by recursively differentiating the control curve to the second differentiation depth using a random midpoint displacement differentiation method;
the curve comparison module 604 is configured to compare the differentiation curves with the control curve according to the plurality of preset similarity intervals to obtain similar curves of a plurality of categories;
the sampling module 605 is configured to sample the similar curves according to a preset sampling rate to obtain one-dimensional time series data of a plurality of categories.
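The demand information received by the data acquisition module can be pictured as a small validated record. The following sketch is only an illustration: the class name `DemandInfo`, its field names, and the validation rules (the second depth exceeds the first, the endpoints are ordered from left to right, every interval is non-empty) are assumptions drawn from the description rather than a structure defined by this application.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]
Interval = Tuple[float, float]


@dataclass(frozen=True)
class DemandInfo:
    """Hypothetical container for the time series data demand information."""

    left: Point
    right: Point
    first_depth: int
    second_depth: int
    intervals: Tuple[Interval, ...]

    def __post_init__(self):
        # The second differentiation depth must exceed the first one.
        if not 0 <= self.first_depth < self.second_depth:
            raise ValueError("require 0 <= first_depth < second_depth")
        # Endpoints are connected from left to right, so x must increase.
        if self.left[0] >= self.right[0]:
            raise ValueError("left endpoint must lie left of right endpoint")
        if any(lo >= hi for lo, hi in self.intervals):
            raise ValueError("each similarity interval needs lo < hi")


info = DemandInfo((0.0, 0.0), (1.0, 0.0), 2, 5, ((0.0, 0.1), (0.1, 0.3)))
print(info.second_depth - info.first_depth)  # 3
```

Validating the record once at the entry point lets the downstream modules assume consistent depths and intervals.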
In one embodiment, the control curve acquisition module 602 is further configured to recurse on the two preset endpoints using the random midpoint displacement method and obtain the control curve when recursion reaches the first differentiation depth, wherein obtaining the shared control points when recursion reaches the first differentiation depth comprises:
Figure 516868DEST_PATH_IMAGE055
where
Figure 988301DEST_PATH_IMAGE056
and
Figure 42845DEST_PATH_IMAGE057
are differentiation points generated before the first differentiation depth,
Figure 394060DEST_PATH_IMAGE058
and
Figure 104528DEST_PATH_IMAGE059
are adjacent points,
Figure 441968DEST_PATH_IMAGE060
is the differentiation point generated when recursion reaches the first differentiation depth,
Figure 491964DEST_PATH_IMAGE061
is the midpoint of
Figure 561551DEST_PATH_IMAGE062
and
Figure 821631DEST_PATH_IMAGE063
,
Figure 323282DEST_PATH_IMAGE064
is a vector,
Figure 555680DEST_PATH_IMAGE065
is a random displacement vector perpendicular to the vector
Figure 858485DEST_PATH_IMAGE066
,
Figure 481227DEST_PATH_IMAGE067
is the length of line segment
Figure 098154DEST_PATH_IMAGE068
,
Figure 247375DEST_PATH_IMAGE069
is the length of line segment
Figure 845716DEST_PATH_IMAGE070
, and
Figure 018071DEST_PATH_IMAGE071
is the included angle between line segment
Figure 235426DEST_PATH_IMAGE072
and line segment
Figure 442416DEST_PATH_IMAGE073
.
In one embodiment, the differentiation curve acquisition module 603 is further configured to recursively differentiate the control curve to the second differentiation depth multiple times using the random midpoint displacement differentiation method to obtain a plurality of control point sets, and to connect the points in each control point set in sequence from left to right to obtain the plurality of differentiation curves.
In one embodiment, the curve comparison module 604 is further configured to calculate the similarity between each differentiation curve and the control curve, and to assign each differentiation curve to the preset similarity interval that contains its similarity, thereby obtaining similar curves of a plurality of categories.
In one embodiment, the curve comparison module 604 is further configured to obtain the similarity between the differentiation curve and the control curve according to the dynamic time warping (DTW) cost and the root mean square error (RMSE) between the differentiation curve and the control curve.
In one embodiment, the sampling module 605 is further configured to sample equidistantly.
For specific limitations of the one-dimensional time series data generating apparatus in anomaly detection, reference may be made to the above limitations of the one-dimensional time series data generation method in anomaly detection, which are not repeated here. The modules of the apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a one-dimensional time-series data generation method in abnormality detection. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in FIG. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In an embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the method in the above embodiments when the processor executes the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method in the above-mentioned embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any combination that contains no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (8)

1. A method for generating one-dimensional time-series data in anomaly detection, the method comprising:
acquiring time sequence data demand information for anomaly detection from a service system; the time series data requirement information comprises: two preset endpoints, a first differentiation depth, a second differentiation depth and a plurality of similarity intervals;
recursing on the two preset endpoints using a random midpoint displacement method, and obtaining a control curve when recursion reaches the first differentiation depth; obtaining shared control points when recursion reaches the first differentiation depth; the shared control points include: the differentiation points generated when recursion reaches the first differentiation depth and the two preset endpoints; the shared control points are connected in sequence from left to right to obtain the control curve;
acquiring the preset second differentiation depth, and generating a plurality of differentiation curves by recursively differentiating the control curve to the second differentiation depth using a random midpoint displacement differentiation method;
comparing the differentiation curve with the control curve according to a plurality of preset similarity intervals to obtain similar curves of a plurality of categories;
sampling the similar curve according to a preset sampling rate to obtain one-dimensional time sequence data of a plurality of categories;
wherein comparing the differentiation curves with the control curve according to the plurality of preset similarity intervals to obtain similar curves of a plurality of categories comprises:
calculating the similarity between each differentiation curve and the control curve, and assigning each differentiation curve to the corresponding similarity interval according to that similarity, to obtain similar curves of a plurality of categories.
2. The method of claim 1, wherein the two preset endpoints are recursed on using a random midpoint displacement method and the control curve is obtained when recursion reaches the first differentiation depth, and wherein obtaining the shared control points when recursion reaches the first differentiation depth comprises:
Figure 810466DEST_PATH_IMAGE001
where
Figure 350032DEST_PATH_IMAGE002
and
Figure 137859DEST_PATH_IMAGE003
are differentiation points generated before the first differentiation depth,
Figure 446481DEST_PATH_IMAGE002
and
Figure 910829DEST_PATH_IMAGE003
are adjacent points,
Figure 937691DEST_PATH_IMAGE004
is the differentiation point generated when recursion reaches the first differentiation depth,
Figure 529209DEST_PATH_IMAGE005
is the midpoint of
Figure 692337DEST_PATH_IMAGE002
and
Figure 078319DEST_PATH_IMAGE003
,
Figure 858056DEST_PATH_IMAGE006
is a vector,
Figure 987686DEST_PATH_IMAGE007
is a random displacement vector perpendicular to the vector
Figure 254589DEST_PATH_IMAGE008
,
Figure 077051DEST_PATH_IMAGE009
is the length of line segment
Figure 078505DEST_PATH_IMAGE010
,
Figure 746247DEST_PATH_IMAGE011
is the length of line segment
Figure 883967DEST_PATH_IMAGE012
, and
Figure 877331DEST_PATH_IMAGE013
is the included angle between line segment
Figure 349769DEST_PATH_IMAGE014
and line segment
Figure 821202DEST_PATH_IMAGE015
.
3. The method of claim 1, wherein generating a plurality of differentiation curves by recursively differentiating the control curve to the second differentiation depth using a random midpoint displacement differentiation method comprises:
recursively differentiating the control curve to the second differentiation depth multiple times using the random midpoint displacement differentiation method to obtain a plurality of control point sets;
and connecting the points in each control point set in sequence from left to right to obtain the plurality of differentiation curves.
4. The method of claim 1, wherein calculating the similarity between the differentiation curve and the control curve comprises:
obtaining the similarity between the differentiation curve and the control curve according to the dynamic time warping (DTW) cost and the root mean square error (RMSE) between the differentiation curve and the control curve.
5. The method of claim 1, wherein the sampling is equidistant sampling.
6. An apparatus for generating one-dimensional time-series data in abnormality detection, the apparatus comprising:
the data acquisition module is used for acquiring time sequence data demand information for anomaly detection from a service system; the time series data requirement information comprises: two preset endpoints, a first differentiation depth, a second differentiation depth and a plurality of similarity intervals;
the control curve acquisition module is used for recursing on the two preset endpoints using a random midpoint displacement method and obtaining a control curve when recursion reaches the first differentiation depth; shared control points are obtained when recursion reaches the first differentiation depth; the shared control points include: the differentiation points generated when recursion reaches the first differentiation depth and the two preset endpoints; the shared control points are connected in sequence from left to right to obtain the control curve;
the differentiation curve acquisition module is used for acquiring the preset second differentiation depth and generating a plurality of differentiation curves by recursively differentiating the control curve to the second differentiation depth using a random midpoint displacement differentiation method;
the curve comparison module is used for comparing the differentiation curve with the control curve according to a plurality of preset similarity intervals to obtain similar curves of a plurality of categories;
the sampling module is used for sampling the similar curve according to a preset sampling rate to obtain one-dimensional time sequence data of a plurality of categories;
and the curve comparison module is further used for calculating the similarity between each differentiation curve and the control curve, and assigning each differentiation curve to the corresponding preset similarity interval according to that similarity, to obtain similar curves of a plurality of categories.
7. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
CN202110817079.7A 2021-07-20 2021-07-20 Method, device and equipment for generating one-dimensional time sequence data in anomaly detection Active CN113282876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110817079.7A CN113282876B (en) 2021-07-20 2021-07-20 Method, device and equipment for generating one-dimensional time sequence data in anomaly detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110817079.7A CN113282876B (en) 2021-07-20 2021-07-20 Method, device and equipment for generating one-dimensional time sequence data in anomaly detection

Publications (2)

Publication Number Publication Date
CN113282876A CN113282876A (en) 2021-08-20
CN113282876B true CN113282876B (en) 2021-10-01

Family

ID=77286895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110817079.7A Active CN113282876B (en) 2021-07-20 2021-07-20 Method, device and equipment for generating one-dimensional time sequence data in anomaly detection

Country Status (1)

Country Link
CN (1) CN113282876B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114726749B (en) * 2022-03-02 2023-10-31 阿里巴巴(中国)有限公司 Data anomaly detection model acquisition method, device, equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3474237A (en) * 1966-10-03 1969-10-21 Automation Ind Inc Strain gage rosette calculator
JP2007173907A (en) * 2005-12-19 2007-07-05 Nippon Telegr & Teleph Corp <Ntt> Abnormal traffic detection method and device
CN110473084A (en) * 2019-07-17 2019-11-19 中国银行股份有限公司 A kind of method for detecting abnormality and device
CN110909046A (en) * 2019-12-02 2020-03-24 上海舵敏智能科技有限公司 Time series abnormality detection method and device, electronic device, and storage medium
CN112037182A (en) * 2020-08-14 2020-12-04 中南大学 Locomotive running gear fault detection method and device based on time sequence image and storage medium
CN112819386A (en) * 2021-03-05 2021-05-18 中国人民解放军国防科技大学 Method, system and storage medium for generating time series data with abnormity

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7613576B2 (en) * 2007-04-12 2009-11-03 Sun Microsystems, Inc. Using EMI signals to facilitate proactive fault monitoring in computer systems
CN107133343B (en) * 2017-05-19 2018-04-13 哈工大大数据产业有限公司 Big data abnormal state detection method and device based on time series approximate match
CN108982106B (en) * 2018-07-26 2020-09-22 安徽大学 Effective method for rapidly detecting kinetic mutation of complex system
CN111506637B (en) * 2020-06-17 2020-11-27 北京必示科技有限公司 Multi-dimensional anomaly detection method and device based on KPI (Key Performance indicator) and storage medium
CN111967508A (en) * 2020-07-31 2020-11-20 复旦大学 Time series abnormal point detection method based on saliency map
CN112685476A (en) * 2021-01-06 2021-04-20 银江股份有限公司 Periodic multivariate time series anomaly detection method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
《时间序列异常检测算法的研究与应用》;吕玉红;《中国优秀硕士学位论文全文数据库信息科技辑》;20180323;全文 *

Also Published As

Publication number Publication date
CN113282876A (en) 2021-08-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant