CN113361965A - Time series exception handling method, device and equipment and readable storage medium - Google Patents

Info

Publication number
CN113361965A
Authority
CN
China
Prior art keywords
time
time sequence
point
sequence
series
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110747260.5A
Other languages
Chinese (zh)
Inventor
张紫婷
朱红燕
莫林林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202110747260.5A priority Critical patent/CN113361965A/en
Publication of CN113361965A publication Critical patent/CN113361965A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0631: Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311: Scheduling, planning or task assignment for a person or group
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/10: Office automation; Time management
    • G06Q10/103: Workflow collaboration or project management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/20: Administration of product repair or maintenance

Abstract

The invention relates to the technical field of financial technology (Fintech), and discloses a time series exception handling method, device, equipment and readable storage medium. The time series exception handling method comprises the following steps: acquiring a time series, and taking the time points in the time series as segmentable time points; traversing the segmentable time points so as to segment the time series into two subsequences at each segmentable time point in turn, fitting a linear function to each of the two subsequences according to a regression tree algorithm, and determining the error sum corresponding to fitting the two subsequences; and taking the segmentable time point with the minimum error sum as the optimal segmentation point of the time series, and taking the optimal segmentation point as the abnormal occurrence point of the time series. The method can accurately find the time point at which the trend of the time series changes, and improves the accuracy of time series exception handling.

Description

Time series exception handling method, device and equipment and readable storage medium
Technical Field
The invention relates to the technical field of financial technology (Fintech), in particular to a time series exception handling method, a time series exception handling device, time series exception handling equipment and a readable storage medium.
Background
With the development of computer technology, more and more technologies (big data, distributed computing, artificial intelligence, etc.) are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). Because of the security and universality requirements of the financial industry, higher requirements are also placed on time series exception handling technology.
In the field of intelligent operation and maintenance, a time series needs to be processed in various ways so that it can be analyzed. One such scenario is detecting that the trend of the time series has changed because of an abnormal event. For example, the response delay of a certain interface averages 10 milliseconds; on a certain day a developer changes the interface and modifies the related code, after which the response delay of the interface becomes abnormal and its trend changes.
The response delay of the interface is a time series. To detect the time point at which an abnormality occurs, the existing technical solution is to set a relevant rule, to consider the time series abnormal when its values meet a threshold specified by the rule, and to regard the time point at which the abnormality is detected as the abnormality occurrence point. For example, under the 3-sigma rule, values that exceed the threshold given by the 3-sigma rule are treated as abnormal values. However, after an abnormality occurs, the response delay does not rise to the threshold immediately but rises or changes gradually until it reaches the threshold. It is therefore difficult to accurately find the time point at which the abnormality occurred, and the abnormality occurrence point found by conventional time series exception handling techniques has low accuracy.
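The detection lag described above can be illustrated with a minimal sketch. The function name `first_three_sigma_alarm`, the baseline length, and the synthetic delay values are all assumptions made for illustration; the text does not specify how the 3-sigma threshold is computed, so the sketch assumes it is derived from the mean and standard deviation of a stable baseline period.

```python
import statistics


def first_three_sigma_alarm(values, baseline_len):
    """Return the first index after the baseline whose value lies outside
    mean +/- 3 * std of the baseline, or None if no value does."""
    baseline = values[:baseline_len]
    mean = statistics.fmean(baseline)
    std = statistics.pstdev(baseline)
    lower, upper = mean - 3 * std, mean + 3 * std
    for i in range(baseline_len, len(values)):
        if values[i] < lower or values[i] > upper:
            return i
    return None


# Hypothetical response delays (ms): a stable baseline around 10 ms, then a
# gradual rise of 0.5 ms per step starting at index 50 (the true change point).
stable = [9.0 if i % 2 == 0 else 11.0 for i in range(50)]
rising = [10.0 + 0.5 * (k + 1) for k in range(20)]
delays = stable + rising

alarm_index = first_three_sigma_alarm(delays, baseline_len=50)
print(alarm_index)  # the alarm fires several points after index 50
```

In this synthetic series the trend changes at index 50, but the threshold is only crossed a few points later, which is the inaccuracy the rule-based approach is said to suffer from.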
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The main object of the invention is to provide a time series exception handling method, device, equipment and readable storage medium, so as to solve the technical problem that the abnormality occurrence point found by existing time series exception handling techniques has low accuracy.
In order to achieve the above object, the present invention provides a time series exception handling method, which includes the following steps:
acquiring a time series, and taking the time points in the time series as segmentable time points;
traversing the segmentable time points so as to segment the time series into two subsequences at each segmentable time point in turn, fitting a linear function to each of the two subsequences according to a regression tree algorithm, and determining the error sum corresponding to fitting the two subsequences;
and taking the segmentable time point with the minimum error sum as the optimal segmentation point of the time series, and taking the optimal segmentation point as the abnormal occurrence point of the time series.
Optionally, before the step of using the optimal segmentation point as the abnormality occurrence point of the time series, the method further includes:
fitting a linear function to the time series according to a regression tree algorithm to obtain an error value corresponding to fitting the time series;
and if the difference between the error value corresponding to fitting the time series and the error sum corresponding to fitting the two subsequences is greater than a set value, and the time lengths of the two subsequences after segmentation are both greater than a preset threshold, taking the optimal segmentation point as the abnormal occurrence point of the time series.
Optionally, after the step of taking the segmentable time point with the minimum error sum as the optimal segmentation point of the time series and taking the optimal segmentation point as the abnormal occurrence point of the time series, the method further includes:
segmenting the time series into two subsequences at the optimal segmentation point to obtain a first subsequence and a second subsequence;
and taking the first subsequence and the second subsequence each as the time series, and returning to the step of taking the time points in the time series as segmentable time points, until the difference between the error value corresponding to fitting the time series and the error sum corresponding to the segmented time series is less than or equal to the set value, or the time length of a segmented subsequence is less than or equal to the preset threshold.
Optionally, the step of fitting a linear function to the time series according to a regression tree algorithm to obtain an error value corresponding to the time series includes:
fitting a linear function to the time sequence according to a loss function corresponding to the regression tree algorithm;
and minimizing the function value of the loss function to obtain the minimum function value of the loss function, and taking the minimum function value as the error value corresponding to fitting the time series.
Optionally, the time series exception handling method further includes:
acquiring a constraint parameter on the number of segments of the time series;
determining different regression tree models according to different constraint parameters, and determining the abnormal occurrence points corresponding to the time series according to the objective function corresponding to each regression tree model;
and minimizing the objective function to obtain a minimum objective function value, and taking the abnormal occurrence point corresponding to the minimum objective function value as the final abnormal segmentation point of the time series.
Optionally, the step of minimizing the objective function to obtain a minimum objective function value includes:
determining the target subsequences obtained by segmenting the time series at the abnormal occurrence points corresponding to the time series;
determining the objective function value of the objective function according to the error sum corresponding to fitting the target subsequences and the number of target subsequences;
and determining the minimum objective function value among the objective function values corresponding to the respective objective functions.
Optionally, before the step of acquiring the time series, the method further includes:
acquiring an initial time sequence;
and according to a preset time window, performing data smoothing processing on the initial time sequence to obtain the time sequence.
In order to achieve the above object, the present invention also provides a time-series exception handling apparatus including:
the acquisition module is used for acquiring a time series and taking the time points in the time series as segmentable time points;
the traversal module is used for traversing the segmentable time points, segmenting the time sequence into two subsequences according to the segmentable time points in sequence, respectively fitting linear functions to the two segmented subsequences according to a regression tree algorithm, and determining the error sum corresponding to the two fitted subsequences;
and the abnormal occurrence point determining module is used for taking the segmentable time point with the minimum error sum as the optimal segmentation point of the time sequence and taking the optimal segmentation point as the abnormal occurrence point of the time sequence.
Further, to achieve the above object, the present invention also provides a time-series exception handling apparatus including: the system comprises a memory, a processor and a time-series exception handling program which is stored on the memory and can run on the processor, wherein when the time-series exception handling program is executed by the processor, the steps of the time-series exception handling method are realized.
In addition, to achieve the above object, the present invention further provides a readable storage medium, on which a time-series exception handling program is stored, and when the time-series exception handling program is executed by a processor, the time-series exception handling program implements the steps of the time-series exception handling method as described above.
According to the invention, a time series is acquired, and the time points in the time series are taken as segmentable time points; the segmentable time points are traversed so as to segment the time series into two subsequences at each segmentable time point in turn, a linear function is fitted to each of the two subsequences according to a regression tree algorithm, and the error sum corresponding to fitting the two subsequences is determined; the segmentable time point with the minimum error sum is taken as the optimal segmentation point of the time series, and the optimal segmentation point is taken as the abnormal occurrence point of the time series. In other words, the time series is segmented at each of its time points to obtain two subsequences, a linear function is fitted to each subsequence, the error sum of the two subsequences is obtained, and the optimal segmentation point of the time series is found from these error sums. The optimal segmentation point found in this way is the time point at which the trend of the time series changes, that is, the abnormal segmentation point of the time series. The time point at which the trend changes can thus be found accurately, which solves the technical problem that the abnormality occurrence point found by existing time series exception handling techniques has low accuracy.
Drawings
FIG. 1 is a schematic diagram of a time-series exception handling apparatus in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a time series exception handling method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a time series exception handling method according to a second embodiment of the present invention;
FIG. 4 is a schematic diagram of a system configuration of an embodiment of a time-series exception handling apparatus according to the present invention;
FIGS. 5a-5c are exemplary diagrams of exception handling performed on a time series according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a time-series exception handling device in a hardware operating environment according to an embodiment of the present invention.
The time series exception handling device in the embodiment of the present invention may be a PC, or may be a mobile terminal device having a display function, such as a smart phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a portable computer, or the like.
As shown in fig. 1, the time-series exception handling apparatus may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the time-series exception handling apparatus may further include a camera, a Radio Frequency (RF) circuit, sensors (such as light sensors, motion sensors, and other sensors), an audio circuit, a WiFi module, and the like.
Those skilled in the art will appreciate that the time series exception handling apparatus configuration shown in fig. 1 does not constitute a limitation of the time series exception handling apparatus and may include more or less components than those shown, or some of the components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a time-series exception handler.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be used to invoke a time series exception handler stored in the memory 1005.
In this embodiment, the time-series exception handling apparatus includes: the system comprises a memory 1005, a processor 1001 and a time-series exception handler stored on the memory 1005 and operable on the processor 1001, wherein when the processor 1001 calls the time-series exception handler stored in the memory 1005, the following operations are performed:
acquiring a time series, and taking the time points in the time series as segmentable time points;
traversing the segmentable time points so as to segment the time series into two subsequences at each segmentable time point in turn, fitting a linear function to each of the two subsequences according to a regression tree algorithm, and determining the error sum corresponding to fitting the two subsequences;
and taking the segmentable time point with the minimum error sum as the optimal segmentation point of the time series, and taking the optimal segmentation point as the abnormal occurrence point of the time series.
Further, the processor 1001 may call a time series exception handler stored in the memory 1005, and also perform the following operations:
fitting a linear function to the time series according to a regression tree algorithm to obtain an error value corresponding to fitting the time series;
and if the difference between the error value corresponding to fitting the time series and the error sum corresponding to fitting the two subsequences is greater than a set value, and the time lengths of the two subsequences after segmentation are both greater than a preset threshold, taking the optimal segmentation point as the abnormal occurrence point of the time series.
Further, the processor 1001 may call a time series exception handler stored in the memory 1005, and also perform the following operations:
segmenting the time series into two subsequences at the optimal segmentation point to obtain a first subsequence and a second subsequence;
and taking the first subsequence and the second subsequence each as the time series, and returning to the step of taking the time points in the time series as segmentable time points, until the difference between the error value corresponding to fitting the time series and the error sum corresponding to the segmented time series is less than or equal to the set value, or the time length of a segmented subsequence is less than or equal to the preset threshold.
Further, the processor 1001 may call a time series exception handler stored in the memory 1005, and also perform the following operations:
fitting a linear function to the time sequence according to a loss function corresponding to the regression tree algorithm;
and minimizing the function value of the loss function to obtain the minimum function value of the loss function, and taking the minimum function value as the error value corresponding to fitting the time series.
Further, the processor 1001 may call a time series exception handler stored in the memory 1005, and also perform the following operations:
acquiring a constraint parameter on the number of segments of the time series;
determining different regression tree models according to different constraint parameters, and determining the abnormal occurrence points corresponding to the time series according to the objective function corresponding to each regression tree model;
and minimizing the objective function to obtain a minimum objective function value, and taking the abnormal occurrence point corresponding to the minimum objective function value as the final abnormal segmentation point of the time series.
Further, the processor 1001 may call a time series exception handler stored in the memory 1005, and also perform the following operations:
determining the target subsequences obtained by segmenting the time series at the abnormal occurrence points corresponding to the time series;
determining the objective function value of the objective function according to the error sum corresponding to fitting the target subsequences and the number of target subsequences;
and determining the minimum objective function value among the objective function values corresponding to the respective objective functions.
Further, the processor 1001 may call a time series exception handler stored in the memory 1005, and also perform the following operations:
acquiring an initial time sequence;
and according to a preset time window, performing data smoothing processing on the initial time sequence to obtain the time sequence.
Referring to fig. 2, fig. 2 is a schematic flowchart of a first embodiment of the time series exception handling method of the present invention.
In this embodiment, the time series exception handling method includes the following steps:
Step S10, acquiring a time series, and taking the time points in the time series as segmentable time points;
In this embodiment, a time series (also called a dynamic series) is a sequence formed by arranging the values of the same statistical index in the order in which they occur. The time series may be the response delay of an interface, or any other parameter that can be recorded over time; this embodiment is explained using the response delay of an interface. When the interface becomes abnormal or is changed, its response delay changes, that is, the time series corresponding to the response delay changes: either the data in the time series changes at a certain point, or the trend of the data changes at a certain point. The time series therefore needs to be processed to find the time point at which the abnormality occurs.
To perform exception handling on the time series and find the time point at which the abnormality occurs, the time series is first acquired and each time point in it is taken as a segmentable time point, so that the time series can subsequently be segmented at each time point and the segmented series analyzed. The segmentable time points include every time point in the time series.
Further, before the step of obtaining the time series, the method further includes:
Step S11, acquiring an initial time series;
Step S12, performing data smoothing on the initial time series according to a preset time window to obtain the time series.
In this embodiment, before exception handling is performed on the time series, the data is smoothed to remove outliers from the initial time series, so as to avoid overfitting when a time series containing outliers is fitted during exception handling. Further, a suitable preset time window w can be selected according to the shape of the data or from experience; the data in the initial time series is gathered according to the preset time window and a sliding average is taken over the window, and the resulting sequence of sliding averages is the time series. For example, a sliding average over a window of 7 days (one week) reduces the influence of holidays on the data; the average is taken for each window to obtain the smoothed sequence.
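The smoothing step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `sliding_average` is hypothetical, and the window is assumed to be a trailing window that advances one point at a time, which the text does not specify.

```python
def sliding_average(values, window):
    """Smooth a series with a sliding mean over `window` consecutive points.

    Returns one average per full window, so the output has
    len(values) - window + 1 entries.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        # Slide the window forward by one point: add the new value,
        # drop the oldest one.
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed


# Hypothetical daily values smoothed with a 7-day window.
daily = [1, 2, 3, 4, 5, 6, 7, 8]
weekly_smoothed = sliding_average(daily, window=7)
```

A larger window removes more outliers but also blurs the exact position of a trend change, so the choice of w is a trade-off.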
Step S20, traversing the segmentable time points so as to segment the time series into two subsequences at each segmentable time point in turn, fitting a linear function to each of the two subsequences according to a regression tree algorithm, and determining the error sum corresponding to fitting the two subsequences;
In this embodiment, the segmentable time points are traversed so that, for each time point in the time series in turn, the time series is segmented into two subsequences at that time point; a linear function is then fitted to each of the two subsequences according to a regression tree algorithm, and the error sum corresponding to fitting the two subsequences is determined. Specifically, after the time series is segmented into two subsequences, a linear function is fitted to each subsequence using the loss function of the regression tree algorithm. The loss function is minimized, and its minimum value is the error value obtained by fitting that subsequence. The error values obtained by fitting the two subsequences are then summed to obtain the error sum corresponding to fitting the two subsequences. Fitting a linear function to a subsequence with the regression tree algorithm means fitting the subsequence to a linear equation, that is, to a straight line.
The time lengths of the subsequences obtained by segmenting the time series must all be greater than a preset threshold; that is, a segmentation is admissible only if the time lengths of both resulting subsequences are greater than the preset threshold.
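Step S20 can be sketched as follows. The text does not state the loss function, so this sketch assumes a squared-error loss minimized by ordinary least squares; the names `linear_fit_error` and `best_split`, the use of a point count for "time length", and the convention that split index i is the first point of the right-hand subsequence are all assumptions made for illustration.

```python
def linear_fit_error(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def best_split(xs, ys, min_len=1):
    """Try every admissible split index i (left = [:i], right = [i:]).

    Both parts must contain more than `min_len` points. Returns
    (index, error_sum) for the smallest error sum, or (None, inf)
    if no split is admissible.
    """
    best_idx, best_err = None, float("inf")
    for i in range(min_len + 1, len(xs) - min_len):
        err = (linear_fit_error(xs[:i], ys[:i])
               + linear_fit_error(xs[i:], ys[i:]))
        if err < best_err:
            best_idx, best_err = i, err
    return best_idx, best_err


# Hypothetical series: flat at 10, then a jump followed by a rising trend.
xs = list(range(40))
ys = [10.0] * 20 + [15.0 + 2.0 * (x - 20) for x in range(20, 40)]
split_idx, split_err = best_split(xs, ys, min_len=2)
```

Refitting both lines from scratch at every candidate point is quadratic in the series length; that is acceptable for a sketch, and running sums could be used if the series is long.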
Step S30, taking the segmentable time point with the minimum error sum as the optimal segmentation point of the time series, and taking the optimal segmentation point as the abnormal occurrence point of the time series.
In this embodiment, after the error values obtained by fitting the subsequences are summed to obtain the error sum corresponding to fitting the two subsequences, the segmentable time point with the minimum error sum is taken as the optimal segmentation point of the time series. A minimum error sum means that, among all ways of segmenting the time series into two subsequences, the two subsequences at this point are fitted best. The segmentable time point with the minimum error sum is therefore the optimal segmentation point of the time series, and the optimal segmentation point is in turn the abnormal occurrence point of the time series.
For ease of understanding, the time series exception handling method is illustrated below using a time series data set. Referring to the time series shown in fig. 5a, for the data set D corresponding to the time series, the time points contained in D are taken as segmentable time points so as to determine the optimal segmentation point among them. After the first traversal of the segmentable time points, the series is split at $x = 20$ into two subsequences, $D_1$ ($x \in [0, 20]$) and $D_2$ ($x \in (20, 40]$), and the minimized error sum of the two subsequences is obtained. The error sum is calculated as $L_0 = L_{D_1} + L_{D_2}$, where $L_{D_1}$ is the loss function corresponding to subsequence $D_1$ and $L_{D_2}$ is the loss function corresponding to subsequence $D_2$. The result of the first segmentation is shown in fig. 5b, from which it can be seen that segmentation can continue so as to fit more subsequences. Next, the next segmentation point is found recursively. For subsequence $D_1$, the search proceeds as for the time series D: $x \in [0, 20]$ is traversed to find the optimal segmentation point. Assuming that $x = 10$ is the optimal segmentation point for $D_1$, subsequence $D_1$ is split at $x = 10$ into two subsequences. For subsequence $D_2$, further segmentation does not reduce the error by more than the set value, so $D_2$ is not split again. The original series is finally split into 3 subsequences, as shown in fig. 5c.
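The recursive procedure in this example can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the loss is assumed to be the squared error of a least-squares line, "time length" is taken as a point count, a split index denotes the first point of the right-hand subsequence, and the names `segment`, `min_gain` and `min_len` as well as the data are hypothetical (the data is not that of figs. 5a-5c).

```python
def linear_fit_error(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def segment(xs, ys, min_gain, min_len, offset=0):
    """Recursively split a series; return the sorted split indices.

    A split is kept only if it lowers the fitting error by more than
    `min_gain` and leaves more than `min_len` points on each side.
    """
    whole_err = linear_fit_error(xs, ys)
    best_idx, best_err = None, float("inf")
    for i in range(min_len + 1, len(xs) - min_len):
        err = (linear_fit_error(xs[:i], ys[:i])
               + linear_fit_error(xs[i:], ys[i:]))
        if err < best_err:
            best_idx, best_err = i, err
    if best_idx is None or whole_err - best_err <= min_gain:
        return []  # stop: no admissible split, or the gain is too small
    left = segment(xs[:best_idx], ys[:best_idx], min_gain, min_len, offset)
    right = segment(xs[best_idx:], ys[best_idx:], min_gain, min_len,
                    offset + best_idx)
    return left + [offset + best_idx] + right


# Hypothetical series with three regimes: flat at 0, flat at 5, then rising.
xs = list(range(40))
ys = [0.0] * 10 + [5.0] * 10 + [100.0 + 10.0 * (x - 20) for x in range(20, 40)]
split_points = segment(xs, ys, min_gain=1.0, min_len=2)
```

On this data the first pass separates the rising part from the rest, the left part is then split once more, and the rising part is left intact because splitting it yields no further error reduction, mirroring the D, $D_1$, $D_2$ walk-through above.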
In the time sequence exception handling method provided by this embodiment, a time point in a time sequence is taken as a sharable time point by obtaining the time sequence; traversing the segmentable time points to segment the time sequence into two subsequences according to the segmentable time points in sequence, respectively fitting linear functions to the two segmented subsequences according to a regression tree algorithm, and determining the error sum corresponding to the fitting of the two subsequences; and taking the sharable time point with the minimum error sum as the optimal sharable point of the time sequence, and taking the optimal sharable point as the abnormal occurrence point of the time sequence. According to the method, the time sequence is segmented according to each time point in the time sequence to obtain two subsequences, then the subsequences are fitted with linear functions respectively, the error sum corresponding to the two subsequences is obtained, the optimal segmentation point of the time sequence is found according to the error sum of the two subsequences, the found optimal segmentation point of the time sequence is the time point when the trend of the time sequence changes, namely the abnormal segmentation point of the time sequence, the time point when the trend changes in the time sequence can be accurately found, and the technical problem that the accuracy of the time point when the abnormality occurs, which is found by processing the time sequence through the existing time sequence abnormality processing technology, is low is solved.
Based on the first embodiment, a second embodiment of the time series exception handling method of the present invention is provided. Referring to fig. 3, in this embodiment, before step S30, the method further includes:
step S31, fitting a linear function to the time series according to a regression tree algorithm, to obtain the error value corresponding to fitting the time series;
step S32, if the difference between the error value corresponding to fitting the time series and the error sum corresponding to fitting the two subsequences is greater than a set value, and the time lengths of both subsequences after segmentation are greater than a preset threshold, taking the optimal segmentation point as an abnormality occurrence point of the time series.
In this embodiment, a linear function is fitted to the time series according to the regression tree algorithm, to obtain the error value corresponding to fitting the time series. Fitting a linear function to the time series means fitting it with a linear equation, that is, first fitting the whole time series as a single straight line. After the error values obtained by fitting the two subsequences have been summed to give the error sum, the error value corresponding to fitting the time series is compared with that error sum, so that the fit after segmentation is compared with the fit of the original time series before segmentation. The difference between the two is calculated, and if the difference is greater than the set value and the time lengths of both subsequences after segmentation are greater than the preset threshold, the optimal segmentation point is taken as an abnormality occurrence point of the time series.
It should be noted that the optimal segmentation point is an abnormality occurrence point only if both of the following conditions are satisfied: (1) the difference between the error value corresponding to fitting the time series and the error sum corresponding to fitting the two subsequences is greater than the set value; (2) the time lengths of both subsequences after segmentation are greater than the preset threshold, that is, no subsequence obtained by segmenting the time series may have a time length less than or equal to the preset threshold. In other words, once the optimal segmentation point is determined, if segmenting at that point improves the fit over the original time series by no more than the set value, the optimal segmentation point is not an abnormality occurrence point; likewise, if either subsequence obtained by segmenting at that point is too short, the optimal segmentation point is not an abnormality occurrence point.
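The two conditions can be written as a single predicate. The sketch below is illustrative only: the function name `is_anomaly_point` and its parameter names are hypothetical, and the numeric arguments in the usage lines are made-up inputs, not values taken from this document.

```python
def is_anomaly_point(whole_error, split_error_sum, left_length, right_length,
                     set_value, length_threshold):
    """Return True only if both acceptance conditions hold.

    whole_error      -- error value of fitting one line to the unsegmented series
    split_error_sum  -- error sum of fitting one line to each of the two subsequences
    left_length, right_length -- time lengths of the two subsequences
    set_value        -- minimum error reduction required (condition 1)
    length_threshold -- time length that both subsequences must exceed (condition 2)
    """
    error_reduction = whole_error - split_error_sum
    reduces_enough = error_reduction > set_value
    long_enough = left_length > length_threshold and right_length > length_threshold
    return reduces_enough and long_enough


# Hypothetical inputs: a large error reduction with two long subsequences is accepted.
accepted = is_anomaly_point(120.0, 15.0, 21, 20, set_value=50.0, length_threshold=5)
# The same reduction is rejected when one subsequence is too short.
rejected_short = is_anomaly_point(120.0, 15.0, 3, 38, set_value=50.0, length_threshold=5)
# A small reduction is rejected even when both subsequences are long.
rejected_small = is_anomaly_point(120.0, 100.0, 21, 20, set_value=50.0, length_threshold=5)
print(accepted, rejected_short, rejected_small)
```

Note that both comparisons are strict, matching the wording "greater than" in conditions (1) and (2).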
The principle of fitting a linear function to a time series is as follows. Given the data set of a time series $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where x is the feature and y is the true regression value, fitting a linear function to the time series means fitting the elements of the data set with a linear function f(x) and computing the parameters of that function, where $f(x) = \theta_1 x + \theta_0 = \theta^T x$, $x \in (x_1, x_2, \ldots, x_n)$, and θ is the parameter of the linear function. The regression tree algorithm can partition the feature: taking the time points of the time series as the feature, the time series can be recursively segmented into M subsequences $R_1, R_2, \ldots, R_M$, and each subsequence can be fitted independently with its own linear function. The parameter θ of each linear segment is obtained by minimizing the corresponding loss function, with the mean square error (MSE) used as the loss function. For the regression model, the overall MSE is the sum of the MSEs of the M subsequences, and the loss function to be minimized is as follows:
Figure BDA0003142241610000111
When the loss function is minimized, the parameter θ of f(x) is obtained as $\theta = (X^T X)^{-1} X^T Y$, where X and Y are the matrix representations of x and y in the data set D, respectively: X is a matrix of m rows and n columns (m is the number of samples in the data set D, and n is the number of features of x), and Y is a matrix of m rows and 1 column.
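As a worked illustration of the closed-form solution $\theta = (X^T X)^{-1} X^T Y$, the sketch below solves it by hand for a single time feature. It assumes, purely for illustration, that each row of X is `[x_i, 1]`, so that the second parameter plays the role of the intercept; the function names `normal_equation_line` and `mean_squared_error` and the sample data are hypothetical.

```python
def normal_equation_line(xs, ys):
    """Solve theta = (X^T X)^{-1} X^T Y for rows X_i = [x_i, 1].

    Returns (slope, intercept). The 2x2 matrix X^T X is inverted explicitly.
    """
    # Entries of X^T X = [[a, b], [b, d]].
    a = sum(x * x for x in xs)
    b = sum(xs)
    d = float(len(xs))
    # Entries of X^T Y = [p, q].
    p = sum(x * y for x, y in zip(xs, ys))
    q = sum(ys)
    det = a * d - b * b
    if det == 0:
        raise ValueError("X^T X is singular; at least two distinct x values are needed")
    # Inverse of [[a, b], [b, d]] is (1 / det) * [[d, -b], [-b, a]].
    slope = (d * p - b * q) / det
    intercept = (a * q - b * p) / det
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """MSE of the fitted line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical points lying exactly on y = 2x + 1.
sample_x = [0.0, 1.0, 2.0, 3.0]
sample_y = [1.0, 3.0, 5.0, 7.0]
slope, intercept = normal_equation_line(sample_x, sample_y)
print(slope, intercept, mean_squared_error(sample_x, sample_y, slope, intercept))
```

For more than one feature the same formula applies, but a general linear solver would normally replace the explicit 2x2 inverse.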
In this embodiment, the fit after segmentation is compared with the fit of the original time series before segmentation. Segmentation is accepted only if it improves the fit over the original time series by more than the set value and the time length of each subsequence after segmentation is greater than the preset threshold. This avoids poor segmentations and further improves the validity and accuracy of the optimal segmentation point.
Further, after the step S30, the method further includes:
step S40, according to the optimal segmentation point, segmenting the time sequence into two subsequences to obtain a first subsequence and a second subsequence;
step S50, taking the first subsequence and the second subsequence each as the time series, and executing the step of taking the time points in the time series as segmentable time points, until the difference between the error value corresponding to fitting the time series and the error sum corresponding to fitting the subsequences after segmentation is less than or equal to the set value, or the time length of a subsequence after segmentation is less than or equal to the preset threshold.
In this embodiment, after the optimal segmentation point is obtained and the time series is segmented at that point, exception handling is further performed on the resulting subsequences, so as to find the abnormality occurrence points contained in the two subsequences and thereby further mine the abnormality occurrence points contained in the time series.
Specifically, after the optimal segmentation point corresponding to the time series is obtained, the time series is segmented at that point into two subsequences, namely a first subsequence and a second subsequence. The first subsequence and the second subsequence are then each taken as the time series, and the step of taking the time points in the time series as segmentable time points is executed in a loop, so as to search for abnormality occurrence points in the first subsequence and in the second subsequence respectively, until either stopping condition is met. The stopping conditions are: (1) the difference between the error value corresponding to fitting the time series and the error sum corresponding to fitting the subsequences after segmentation is less than or equal to the set value; (2) the time length of a subsequence after segmentation is less than or equal to the preset threshold. As soon as either stopping condition is met, segmentation of that subsequence stops, that is, the loop ends.
In this embodiment, after the time series is segmented at the optimal segmentation point, the optimal segmentation point of each resulting subsequence is searched for in turn, so that exception handling is performed on every subsequence. The other abnormality occurrence points in the time series can thus be mined accurately, which extends the exception handling of the time series.
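The loop described above is naturally expressed as a recursion. The following sketch is an illustration under stated assumptions rather than the implementation of this embodiment: the names `fit_line_sse` and `find_change_points` are hypothetical, the sum of squared errors stands in for the fitting error, the number of points stands in for the time length of a subsequence, and the three-level synthetic series is invented to resemble the fig. 5a to fig. 5c example.

```python
def fit_line_sse(xs, ys):
    """Fit a least-squares line to (xs, ys) and return its sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def find_change_points(xs, ys, set_value, length_threshold):
    """Recursively segment the series; return the last x of each accepted left part."""
    n = len(xs)
    whole_error = fit_line_sse(xs, ys)
    best_index, best_error = None, float("inf")
    for i in range(1, n):  # candidate split: xs[:i] and xs[i:]
        error_sum = fit_line_sse(xs[:i], ys[:i]) + fit_line_sse(xs[i:], ys[i:])
        if error_sum < best_error:
            best_index, best_error = i, error_sum
    if best_index is None:
        return []
    # Stopping condition (1): the error reduction does not exceed the set value.
    if whole_error - best_error <= set_value:
        return []
    # Stopping condition (2): a subsequence is not longer than the threshold.
    if best_index <= length_threshold or n - best_index <= length_threshold:
        return []
    left = find_change_points(xs[:best_index], ys[:best_index], set_value, length_threshold)
    right = find_change_points(xs[best_index:], ys[best_index:], set_value, length_threshold)
    return left + [xs[best_index - 1]] + right


# Hypothetical series with level changes after x = 10 and after x = 20.
xs = list(range(41))
ys = [0.0 if x <= 10 else 10.0 if x <= 20 else 1000.0 for x in xs]
change_points = find_change_points(xs, ys, set_value=1.0, length_threshold=3)
print(change_points)
```

In this synthetic series the large jump after x = 20 is found first, the left part is then segmented again at x = 10, and the right part is left whole because further segmentation does not reduce the error.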
Further, before the step S32, the method further includes:
step S321, fitting a linear function to the time sequence according to a loss function corresponding to the regression tree algorithm;
step S322, minimizing the function value of the loss function to obtain the minimum function value corresponding to the loss function, and taking the minimum function value as the error value corresponding to fitting the time series.
In this embodiment, the specific steps of fitting a linear function to the time series according to the regression tree algorithm are as follows. The loss function corresponding to the regression tree algorithm is obtained, and a linear function is fitted to the time series according to that loss function, that is, the time series is first fitted as a single straight line described by a linear equation. The loss function is then minimized so that the fit to the time series is as good as possible. The minimum function value of the loss function obtained in this way is taken as the error value corresponding to fitting the time series.
Further, the time series exception handling method further includes:
step S100, acquiring a constraint parameter of the time series split number;
step S200, determining different regression tree models according to different constraint parameters, and determining the abnormality occurrence points corresponding to the time series according to the objective functions corresponding to the regression tree models;
and step S300, minimizing the objective function to obtain a minimum objective function value, and taking an abnormal occurrence point corresponding to the minimum objective function value as a final abnormal segmentation point of the time sequence.
In this embodiment, the exception handling process described above completes the construction of the regression tree model, but different models yield different numbers M of subsequences. It is expected that every split node is a true time point of trend change. To find the time points of trend change in the time series while preventing overfitting, a penalty term is added to the loss function to obtain an objective function, and the abnormality occurrence points of the time series are determined according to this new objective function; for the specific steps of determining the abnormality occurrence points, refer to the first embodiment. The penalty term is the product of the constraint parameter, which constrains the number of splits of the time series, and the number of subsequences in the regression tree model after the time series is split. For the M subsequences obtained by splitting, the larger the number of subsequences, the more likely overfitting becomes, so the new objective function is defined as follows:
Figure BDA0003142241610000131
where $\lambda M$ is the penalty term, λ is the constraint parameter on the number of splits of the time series, M is the number of subsequences into which the time series is split, and
Figure BDA0003142241610000132
is the loss function.
Further, minimizing the objective function requires considering both the loss function and the number of subsequences obtained by splitting. The larger the value of the parameter λ, the more strongly the model constrains the number of node splits. During parameter tuning, the value of λ may be fixed while different set values and preset thresholds are tried as inputs, giving different model results; the model with the smallest objective function value is selected as the final result, and the abnormality occurrence points corresponding to that smallest objective function value are taken as the final abnormal segmentation points of the time series.
In this embodiment, exception handling is performed on the time series through regression tree models corresponding to different constraint parameters, and the abnormality occurrence point with the smallest objective function value is taken as the final abnormal segmentation point of the time series. Because the constraint parameter constrains the number of splits of the time series, that is, the number of subsequences after segmentation, overfitting of the time series can be prevented, which solves the technical problem that the existing regression tree algorithm overfits very easily when used to find trend changes.
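The model selection step can be illustrated as follows. The sketch assumes that the objective is the loss plus the product of λ and the number of subsequences, as the description of the penalty term states; the exact form in the figure is not reproduced here. The names `objective_value` and `select_model`, the dictionary keys, and all numbers in the candidate list are hypothetical inputs made up for the illustration, not results.

```python
def objective_value(error_sum, num_subsequences, lam):
    """Loss plus penalty: error_sum + lam * num_subsequences (assumed form)."""
    return error_sum + lam * num_subsequences


def select_model(candidates, lam):
    """Return the candidate model whose objective value is smallest."""
    return min(
        candidates,
        key=lambda c: objective_value(c["error_sum"], c["num_subsequences"], lam),
    )


# Hypothetical model results obtained with a fixed lambda and different
# (set value, preset threshold) pairs; the numbers are invented.
candidates = [
    {"set_value": 50.0, "length_threshold": 10, "error_sum": 12.0, "num_subsequences": 2},
    {"set_value": 10.0, "length_threshold": 5, "error_sum": 4.0, "num_subsequences": 3},
    {"set_value": 0.1, "length_threshold": 1, "error_sum": 3.5, "num_subsequences": 8},
]

chosen = select_model(candidates, lam=5.0)
unpenalized = select_model(candidates, lam=0.0)
print(chosen["num_subsequences"], unpenalized["num_subsequences"])
```

With the penalty switched off, the most finely segmented candidate always wins on loss alone, which is the overfitting behaviour the penalty term is meant to counter.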
Further, the step S300 includes:
step S3001, determining a target subsequence obtained by segmenting the time sequence according to the abnormal occurrence point corresponding to the time sequence;
step S3002, determining the objective function value corresponding to the objective function according to the error sum corresponding to fitting the target subsequences and the number of target subsequences;
step S3003, determining a minimum objective function value among the objective function values according to the objective function value corresponding to the objective function.
In this embodiment, the objective function is minimized to obtain the minimum objective function value, and the abnormality occurrence point corresponding to the minimum objective function value is taken as the final abnormal segmentation point of the time series, as follows. Different regression tree models may determine different abnormality occurrence points for the time series, each of which is an optimal segmentation point; the time series is segmented at these abnormality occurrence points to obtain the target subsequences. A linear function is then fitted to each target subsequence through the loss function corresponding to the regression tree algorithm, and the loss function is minimized so that its minimum value gives the error value corresponding to fitting that target subsequence. The error sum over all target subsequences is calculated from these error values, and the objective function value is calculated from the error sum and the number of target subsequences. Finally, the abnormality occurrence point corresponding to the minimum objective function value is taken as the final abnormality occurrence point of the time series.
In this embodiment, by minimizing the objective function, the minimum objective function value is determined among the objective function values corresponding to the target subsequences determined by the different regression tree models. The target subsequences and abnormality occurrence points determined by the best-performing regression tree model can thus be found, which improves the fit of the regression tree model to the time series and prevents overfitting.
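Steps S3001 to S3003 can be sketched as a function that turns a set of abnormality occurrence points into an objective function value. As before, this is an assumption-laden illustration: `fit_line_sse` and `objective_for_change_points` are hypothetical names, the sum of squared errors stands in for the fitting error, the objective is assumed to be the error sum plus λ times the number of target subsequences, and the series and candidate point sets are synthetic.

```python
from bisect import bisect_right


def fit_line_sse(xs, ys):
    """Fit a least-squares line to (xs, ys) and return its sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def objective_for_change_points(xs, ys, change_points, lam):
    """Segment after each change point, sum the fitting errors, add lam * count.

    xs must be sorted in ascending order. Each change point is the last x value
    of a target subsequence.
    """
    boundaries = [bisect_right(xs, cp) for cp in sorted(change_points)] + [len(xs)]
    error_sum, count, start = 0.0, 0, 0
    for end in boundaries:
        if end > start:  # skip empty pieces caused by duplicate or out-of-range points
            error_sum += fit_line_sse(xs[start:end], ys[start:end])
            count += 1
            start = end
    return error_sum + lam * count


# Hypothetical series: flat on [0, 20], then a jump followed by a rising trend.
xs = list(range(41))
ys = [5.0 if x <= 20 else 10.0 + 2.0 * (x - 20) for x in xs]

# Hypothetical candidate point sets, as if produced by different regression tree models.
candidate_sets = [[], [20], [10, 20]]
values = [objective_for_change_points(xs, ys, cps, lam=1.5) for cps in candidate_sets]
best_set = candidate_sets[values.index(min(values))]
print(best_set)
```

The extra point at x = 10 does not lower the error sum on this series, so it only adds to the penalty and the single point at x = 20 is preferred.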
In addition, an embodiment of the present invention further provides a time-series exception handling apparatus, and referring to fig. 4, the time-series exception handling apparatus includes:
an obtaining module 100, configured to obtain a time series and take the time points in the time series as segmentable time points;
a traversal module 200, configured to traverse the segmentable time points so as to segment the time series into two subsequences at each segmentable time point in turn, fit a linear function to each of the two subsequences according to a regression tree algorithm, and determine the error sum corresponding to fitting the two subsequences;
and an abnormality occurrence point determining module 300, configured to take the segmentable time point with the smallest error sum as the optimal segmentation point of the time series, and take the optimal segmentation point as the abnormality occurrence point of the time series.
Further, the time-series exception handling apparatus includes:
the time sequence fitting module is used for fitting a linear function to the time sequence according to a regression tree algorithm to obtain an error value corresponding to the time sequence;
and the judging module is used for taking the optimal segmentation point as an abnormal occurrence point of the time sequence if the difference value between the error value corresponding to the fitting of the time sequence and the error sum corresponding to the fitting of the two sub-sequences is larger than a set value and the time lengths of the two sub-sequences after segmentation are both larger than a preset threshold value.
Further, the time-series exception handling apparatus further includes:
the segmentation module is used for segmenting the time sequence into two subsequences according to the optimal segmentation point to obtain a first subsequence and a second subsequence;
and a loop module, configured to take the first subsequence and the second subsequence each as the time series, and execute the step of taking the time points in the time series as segmentable time points, until the difference between the error value corresponding to fitting the time series and the error sum corresponding to fitting the subsequences after segmentation is less than or equal to the set value, or the time length of a subsequence after segmentation is less than or equal to the preset threshold.
Further, the abnormality occurrence point determining module is further configured to:
fitting a linear function to the time sequence according to a loss function corresponding to the regression tree algorithm;
and minimizing the function value of the loss function to obtain the minimum function value corresponding to the loss function, and taking the minimum function value as an error value corresponding to the fitting time sequence.
Further, the time-series exception handling apparatus further includes:
the constraint parameter acquisition module is used for acquiring the constraint parameters of the split number of the time series;
a regression tree model determining module, configured to determine different regression tree models according to different constraint parameters, so as to determine the abnormality occurrence points corresponding to the time series according to the objective functions corresponding to the regression tree models;
and the final abnormal dividing point determining module is used for minimizing the objective function to obtain a minimum objective function value, and taking an abnormal occurrence point corresponding to the minimum objective function value as the final abnormal dividing point of the time sequence.
Further, the final abnormal segmentation point determination module is further configured to:
determining a target subsequence obtained by segmenting the time sequence according to the abnormal occurrence point corresponding to the time sequence;
determining the objective function value corresponding to the objective function according to the error sum corresponding to fitting the target subsequences and the number of target subsequences;
and determining the minimum objective function value in the objective function values according to the objective function values corresponding to the objective functions.
Further, the time-series exception handling apparatus further includes:
the initial time sequence acquisition module is used for acquiring an initial time sequence;
and the data smoothing processing module is used for performing data smoothing processing on the initial time sequence according to a preset time window to obtain the time sequence.
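The document does not specify which smoothing method the data smoothing processing module uses. As one possible reading, the sketch below applies a trailing moving average over a preset window; the function name `smooth`, the choice of a moving average, the handling of the first few points, and the sample values are all assumptions made for illustration.

```python
def smooth(values, window):
    """Trailing moving average: each output is the mean of the last `window` inputs.

    For the first window - 1 points, the mean is taken over the points seen so far.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            running_sum -= values[i - window]  # drop the point that left the window
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


# Hypothetical initial time series and window size.
initial_series = [1.0, 2.0, 3.0, 4.0, 5.0]
time_series = smooth(initial_series, window=3)
print(time_series)
```

A larger window suppresses more noise but also blurs the exact position of a trend change, so the window size interacts with the preset length threshold used during segmentation.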
Furthermore, an embodiment of the present invention further provides a readable storage medium, where a time-series exception handling program is stored, and when executed by a processor, the time-series exception handling program implements the steps of the time-series exception handling method according to any one of the above.
The specific embodiment of the readable storage medium of the present invention is substantially the same as the embodiments of the time series exception handling method, and will not be described in detail herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A time series exception handling method is characterized by comprising the following steps:
acquiring a time series, and taking the time points in the time series as segmentable time points;
traversing the segmentable time points so as to segment the time series into two subsequences at each segmentable time point in turn, fitting a linear function to each of the two subsequences according to a regression tree algorithm, and determining the error sum corresponding to fitting the two subsequences;
and taking the segmentable time point with the smallest error sum as the optimal segmentation point of the time series, and taking the optimal segmentation point as the abnormality occurrence point of the time series.
2. The time-series abnormality processing method according to claim 1, wherein before the step of taking the optimal division point as the abnormality occurrence point of the time-series, the method further comprises:
fitting a linear function to the time sequence according to a regression tree algorithm to obtain an error value corresponding to the time sequence;
and if the difference value between the error value corresponding to the fitting time sequence and the error sum corresponding to the fitting of the two sub-sequences is larger than a set value, and the time lengths of the two sub-sequences after segmentation are both larger than a preset threshold value, taking the optimal segmentation point as an abnormal occurrence point of the time sequence.
3. The time series exception handling method according to claim 2, wherein after the step of taking the segmentable time point with the smallest error sum as the optimal segmentation point of the time series and taking the optimal segmentation point as the abnormality occurrence point of the time series, the method further comprises:
segmenting the time series into two subsequences at the optimal segmentation point to obtain a first subsequence and a second subsequence;
and taking the first subsequence and the second subsequence each as the time series, and executing the step of taking the time points in the time series as segmentable time points, until the difference between the error value corresponding to fitting the time series and the error sum corresponding to fitting the subsequences after segmentation is less than or equal to the set value, or the time length of a subsequence after segmentation is less than or equal to the preset threshold.
4. The method for processing the time series exception according to claim 2, wherein the step of fitting the time series with a linear function according to a regression tree algorithm to obtain the error value corresponding to the fitted time series comprises:
fitting a linear function to the time sequence according to a loss function corresponding to the regression tree algorithm;
and minimizing the function value of the loss function to obtain the minimum function value corresponding to the loss function, and taking the minimum function value as an error value corresponding to the fitting time sequence.
5. The time-series exception handling method according to claim 1, further comprising:
acquiring a constraint parameter of the split number of the time sequence;
determining different regression tree models according to different constraint parameters, and determining the abnormality occurrence points corresponding to the time series according to the objective functions corresponding to the regression tree models;
and minimizing the objective function to obtain a minimum objective function value, and taking an abnormal occurrence point corresponding to the minimum objective function value as a final abnormal segmentation point of the time sequence.
6. The method of time series exception handling according to claim 5, wherein said step of minimizing said objective function to obtain a minimum objective function value comprises:
determining a target subsequence obtained by segmenting the time sequence according to the abnormal occurrence point corresponding to the time sequence;
determining the objective function value corresponding to the objective function according to the error sum corresponding to fitting the target subsequences and the number of target subsequences;
and determining the minimum objective function value in the objective function values according to the objective function values corresponding to the objective functions.
7. The time-series exception handling method according to any one of claims 1 to 6, wherein said step of obtaining a time series is preceded by:
acquiring an initial time sequence;
and according to a preset time window, performing data smoothing processing on the initial time sequence to obtain the time sequence.
8. A time-series exception handling apparatus, characterized in that the time-series exception handling apparatus comprises:
an acquisition module, configured to acquire a time series and take the time points in the time series as segmentable time points;
a traversal module, configured to traverse the segmentable time points, segment the time series into two subsequences at each segmentable time point in turn, fit a linear function to each of the two subsequences according to a regression tree algorithm, and determine the error sum corresponding to fitting the two subsequences;
and an abnormality occurrence point determining module, configured to take the segmentable time point with the smallest error sum as the optimal segmentation point of the time series, and take the optimal segmentation point as the abnormality occurrence point of the time series.
9. A time series exception handling device, characterized by comprising: a memory, a processor, and a time series exception handling program stored on the memory and executable on the processor, wherein the time series exception handling program, when executed by the processor, implements the steps of the time series exception handling method according to any one of claims 1 to 7.
10. A readable storage medium, characterized in that a time series exception handling program is stored on the readable storage medium, and the time series exception handling program, when executed by a processor, implements the steps of the time series exception handling method according to any one of claims 1 to 7.
CN202110747260.5A 2021-06-30 2021-06-30 Time series exception handling method, device and equipment and readable storage medium Pending CN113361965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110747260.5A CN113361965A (en) 2021-06-30 2021-06-30 Time series exception handling method, device and equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110747260.5A CN113361965A (en) 2021-06-30 2021-06-30 Time series exception handling method, device and equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN113361965A true CN113361965A (en) 2021-09-07

Family

ID=77537961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110747260.5A Pending CN113361965A (en) 2021-06-30 2021-06-30 Time series exception handling method, device and equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113361965A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116149896A (en) * 2023-03-27 2023-05-23 阿里巴巴(中国)有限公司 Time sequence data abnormality detection method, storage medium and electronic device
CN116149896B (en) * 2023-03-27 2023-07-21 阿里巴巴(中国)有限公司 Time sequence data abnormality detection method, storage medium and electronic device

Similar Documents

Publication Publication Date Title
US20190325197A1 (en) Methods and apparatuses for searching for target person, devices, and media
US11625433B2 (en) Method and apparatus for searching video segment, device, and medium
CN111694718A (en) Method and device for identifying abnormal behavior of intranet user, computer equipment and readable storage medium
CN107590143B (en) Time series retrieval method, device and system
CN113361965A (en) Time series exception handling method, device and equipment and readable storage medium
CN112306605A (en) RPA-based application program operation method, device and storage medium
US9769357B1 (en) Border detection in videos
CN108961316A (en) Image processing method, device and server
CN114708287A (en) Shot boundary detection method, device and storage medium
CN113435328A (en) Video clip processing method and device, electronic equipment and readable storage medium
CN110096605B (en) Image processing method and device, electronic device and storage medium
US20190050672A1 (en) INCREMENTAL AUTOMATIC UPDATE OF RANKED NEIGHBOR LISTS BASED ON k-th NEAREST NEIGHBORS
CN106844727B (en) Mass image characteristic data distributed acquisition processing and grading application system and method
CN116664335B (en) Intelligent monitoring-based operation analysis method and system for semiconductor production system
CN116635911A (en) Action recognition method and related device, storage medium
CN116935280A (en) Behavior prediction method and system based on video analysis
CN110968835A (en) Approximate quantile calculation method and device
CN113590447B (en) Buried point processing method and device
CN105830437A (en) Method and system for identifying background in monitoring system
KR102153674B1 (en) A method for classifying sql query, a method for detecting abnormal occurrence, and a computing device
CN113570070A (en) Streaming data sampling and model updating method, device, system and storage medium
CN114357449A (en) Abnormal process detection method and device, electronic equipment and storage medium
CN111343502B (en) Video processing method, electronic device and computer readable storage medium
CN113946717A (en) Sub-map index feature obtaining method, device, equipment and storage medium
CN116089788B (en) Online missing data processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination