CN112783744A - Data detection method, device, equipment and storage medium - Google Patents
- Publication number
- CN112783744A (application number CN202110116094.9)
- Authority
- CN
- China
- Prior art keywords
- data
- time
- algorithm model
- time sequence
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
- G06F11/3447—Performance evaluation by modeling
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Biology (AREA)
- Quality & Reliability (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Probability & Statistics with Applications (AREA)
- Testing And Monitoring For Control Systems (AREA)
Abstract
The embodiments of the present application disclose a data detection method, apparatus, device, and storage medium. The method comprises: acquiring time series data for a preset time period; judging whether the time series data changes periodically; selecting a target algorithm model corresponding to the time series data according to the judgment result; and determining a data detection result of the time series data by using the target algorithm model. Because the data is identified automatically and the algorithm is matched to the characteristics of time series data from different scenarios in a complex environment, the accuracy of data detection is effectively improved.
Description
Technical Field
The present application relates to computer technology, and in particular, but not exclusively, to a data detection method, apparatus, device, and storage medium.
Background
Cloud computing is a product of development and fusion of traditional computers and network technologies such as grid computing, virtualization, load balancing and the like. Cloud computing in the narrow sense is a mode of delivery and use of IT (Information Technology) infrastructure to obtain required resources through a network in an on-demand, easily scalable manner. Cloud computing in a broad sense is a mode of delivery and use of services, and a desired service is obtained through a network in an on-demand, easily extensible manner. In a cloud computing scenario, each host device or virtual machine may generate time series data reflecting performance indicators of the host device or virtual machine.
In the related art, a single data detection algorithm is often applied to all kinds of time series data, for example an unsupervised algorithm based on data distribution. However, such an unsupervised algorithm generally adapts poorly to the data changes that may occur in high-dimensional or single-dimensional time series data, so the detection result is prone to false alarms.
Disclosure of Invention
In view of this, embodiments of the present application provide a data detection method, apparatus, device, and storage medium.
In a first aspect, an embodiment of the present application provides a data detection method, where the method includes: acquiring time sequence data of a preset time period; judging whether the time sequence data is in periodic variation or not; selecting a target algorithm model corresponding to the time sequence data according to a judgment result; and determining a data detection result of the time-series data by using the target algorithm model.
In one embodiment, the determining whether the time-series data changes periodically includes: performing frequency domain conversion on the time sequence data to obtain frequency spectrum data; and judging whether the time sequence data is periodically changed or not according to the frequency spectrum data.
Because periodicity is difficult to detect in the time domain but easy to uncover in the frequency domain, the time-domain time series data can be converted to the frequency domain, and whether the time series data is periodic can be judged from the resulting frequency spectrum data. This bridges time-domain analysis and frequency-domain analysis and improves the accuracy and efficiency of the periodicity determination.
In one embodiment, the determining whether the time-series data changes periodically according to the spectrum data includes: analyzing the frequency spectrum data to obtain a periodicity confidence of the time series data; when the periodicity confidence is greater than a specific confidence threshold, determining that the judgment result is that the time series data changes periodically; and when the periodicity confidence is less than or equal to the confidence threshold, determining that the judgment result is that the time series data changes non-periodically.
The period of the time sequence data is obtained by analyzing the frequency spectrum data, and the reliability of the predicted period is analyzed through the periodicity confidence, so that the accuracy of periodicity determination is further improved.
In one embodiment, the selecting a target algorithm model corresponding to the time-series data according to the judgment result includes: selecting the target algorithm model as a periodic algorithm model under the condition that the time sequence data are in periodic variation according to the judgment result; and selecting the target algorithm model as at least one non-periodic algorithm model when the judgment result shows that the time sequence data are in non-periodic variation.
According to the periodic characteristics of the time series data, namely whether the time series data are periodically changed or not, the target algorithm model is selected to be a periodic algorithm model or at least one non-periodic algorithm model, so that the accuracy and pertinence of data detection can be improved.
In one embodiment, the at least one aperiodic algorithm model includes at least one of an unsupervised algorithm model, a statistical algorithm model, and a novelty detection algorithm model.
When the judgment result is that the time series data changes non-periodically, at least one aperiodic algorithm model among an unsupervised algorithm model, a statistical algorithm model, and a novelty detection algorithm model can be used to determine the data detection result of the time series data, which improves the flexibility and diversity of data detection.
In one embodiment, the time series data of the preset time period is time series data collected in a first preset time period; the determining a data detection result of the time-series data by using the target algorithm model comprises: when the judgment result is that the time series data changes periodically, acquiring historical time series data collected in a second preset time period, where the second preset time period is before the first preset time period and the time series data and the historical time series data are the same kind of time series data; inputting the historical time series data into the periodic algorithm model; determining expected time series data output by the periodic algorithm model; and comparing the expected time series data with the time series data to obtain the data detection result of the time series data.
Under the condition that the time sequence data are in periodic variation according to the judgment result, the trained periodic algorithm model is adopted for data detection, so that the periodic variation data can be adapted to the periodic algorithm model, and the accuracy and the efficiency of data detection are improved.
In one embodiment, the determining data detection results of the time-series data using the target algorithm model includes: when the time sequence data do not change periodically, inputting the time sequence data into a plurality of non-periodic algorithm models respectively; determining data detection sub-results output by each aperiodic algorithm model; the plurality of aperiodic algorithmic models are models that analyze the time series data based on different dimensions; and determining the data detection result of the time-series data according to a plurality of data detection sub-results.
Under the condition that the judgment result is that the time sequence data is in aperiodic change, the trained plurality of aperiodic algorithm models are adopted for data detection, and the data detection result is analyzed according to each data detection sub-result, so that the data in aperiodic change can be adapted to the aperiodic algorithm models, and the accuracy and the efficiency of data detection are improved.
In one embodiment, the determining a data detection result of the time-series data from a plurality of the data detection sub-results comprises: determining a weight corresponding to each aperiodic algorithm model; and determining the data detection result of the time-series data according to the weight and a plurality of data detection sub-results.
The final data detection result is determined by determining the weight of the aperiodic algorithm model and according to the weight and the data detection sub-result corresponding to each aperiodic algorithm model, so that the accuracy and the efficiency of data detection are further improved.
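The weighting described above can be illustrated with a minimal sketch. The application does not state how the weights are chosen or how they are combined with the sub-results, so the weighted vote below, the function name `combine_sub_results`, and the `threshold` parameter are assumptions made only for illustration.

```python
from typing import Dict, List, Set


def combine_sub_results(
    sub_results: Dict[str, Set[int]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> List[int]:
    """Weighted vote over the indices flagged by each aperiodic model.

    sub_results maps a (hypothetical) model name to the set of indices it
    flagged; weights maps the same names to non-negative weights.
    A point is reported when its normalized weighted score reaches threshold.
    """
    total = sum(weights[name] for name in sub_results)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    scores: Dict[int, float] = {}
    for name, flagged in sub_results.items():
        for index in flagged:
            scores[index] = scores.get(index, 0.0) + weights[name] / total
    return sorted(index for index, score in scores.items() if score >= threshold)


# Illustrative sub-results from three hypothetical models.
example = combine_sub_results(
    {"statistical": {3, 7}, "density": {7}, "novelty": {1, 7}},
    {"statistical": 2, "density": 1, "novelty": 1},
)
print(example)
```

With these illustrative weights, a point flagged only by the lowest-weighted models is dropped, while a point flagged by the highest-weighted model alone still reaches the threshold.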
In a second aspect, an embodiment of the present application provides a data detection apparatus, including: the acquisition module is used for acquiring time sequence data of a preset time period; the judging module is used for judging whether the time sequence data is in periodic change or not; the first determining module is used for selecting a target algorithm model corresponding to the time sequence data according to the judgment result; and the second determining module is used for determining the data detection result of the time sequence data by utilizing the target algorithm model.
In a third aspect, an embodiment of the present application provides a computer device, including a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor executes the computer program to implement steps in the data detection method according to any one of the embodiments of the present application.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the data detection method according to any one of the embodiments of the present application.
In the embodiment of the application, the periodicity of the time series data is judged before the data detection, the target algorithm model corresponding to the time series data is determined according to whether the data has the periodicity, and the data detection is performed by using the target algorithm model, so that the automatic data identification and algorithm adaptation are performed according to the characteristics of the time series data in different scenes in a complex environment, and the accuracy of the data detection is effectively improved.
Drawings
Fig. 1 is a schematic flowchart of a data detection method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another data detection method according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a data detection apparatus according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the present application.
Detailed Description
The technical solution of the present application is further elaborated below with reference to the drawings and the embodiments.
Fig. 1 is a schematic flow chart of an implementation of a data detection method provided in an embodiment of the present application, and as shown in fig. 1, the method includes:
step 102: acquiring time sequence data of a preset time period;
the time sequence data of the preset time period can be data collected at different moments in a first preset time period in any field, and the time sequence data can reflect the change state or degree of things, phenomena, signals and the like along with time; the first preset time period may be a time period from a certain historical time to a current time, and may be considered as a current detection cycle; the time series data may be data generated by each host device or virtual machine in a cloud computing scenario and used for reflecting a performance index of the host device or virtual machine, where the performance index may be a resource usage of an operating system corresponding to each host device or virtual machine, or may be throughput, response time, and the like of an application installed on each host device or virtual machine. Taking time series data generated when a computer system in the internet field operates as an example, the time series data may reflect throughput, response time, queue depth, and the like of an application program, and the time series data may also reflect resource usage of an operating system, such as a CPU (Central Processing Unit) load, a memory load, a disk load, a process number, and the like.
Step 104: judging whether the time sequence data is in periodic variation or not;
wherein, the periodic variation, also called cyclic fluctuation, is a wave-shaped or oscillatory variation around the long-term trend presented in the time series data; i.e. the values in the time series data are repeated periodically.
Step 106: selecting a target algorithm model corresponding to the time sequence data according to a judgment result;
The judgment result is either that the time series data changes periodically or that it does not; time series data that does not change periodically may be referred to as non-periodic. Different target algorithm models can be determined for the different judgment results. The target algorithm model may be a pre-constructed algorithm model that has already been trained, and may be a machine learning algorithm model.
Step 108: and determining a data detection result of the time-series data by using the target algorithm model.
The time series data can be input into the target algorithm model, and the target algorithm model outputs the corresponding data detection result. The data detection result can be the abnormal data in the time series data, or the abnormal data points in the time series data, where the abnormal data points form an abnormal interval of the time series data. An abnormal data point can be a data point in the time series data that is inconsistent with the other data points, for example a data point that deviates from the mean of the other data by more than two standard deviations, a sudden rise or fall, or a value that exceeds the historical maximum or falls below the historical minimum.
In the embodiment of the application, the periodicity of the time series data is judged before the data detection, the target algorithm model corresponding to the time series data is determined according to whether the data has the periodicity, and the data detection is performed by using the target algorithm model, so that the automatic data identification and algorithm adaptation are performed by aiming at the characteristics of the time series data in different scenes in the complex environment, and the accuracy and pertinence of the data detection are effectively improved.
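The four steps of Fig. 1 can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the periodicity test and the models are passed in as plain callables, each model is assumed to return the indices of the abnormal data points, and the union used for the aperiodic branch is a placeholder for whatever combination an embodiment actually uses. All names are hypothetical.

```python
from typing import Callable, List, Sequence

# Assumption: a model is any callable returning indices of abnormal points.
Model = Callable[[Sequence[float]], List[int]]


def detect(
    series: Sequence[float],
    is_periodic: Callable[[Sequence[float]], bool],
    periodic_model: Model,
    aperiodic_models: Sequence[Model],
) -> List[int]:
    """Judge periodicity, select the target model(s), and run detection."""
    if is_periodic(series):
        # Periodic branch: a single periodic algorithm model is the target.
        return sorted(set(periodic_model(series)))
    # Aperiodic branch: every aperiodic model is run; the union of their
    # outputs is a placeholder combination rule.
    flagged = set()
    for model in aperiodic_models:
        flagged.update(model(series))
    return sorted(flagged)


# Toy usage with stand-in callables.
result = detect(
    [1.0, 2.0, 3.0],
    is_periodic=lambda s: False,
    periodic_model=lambda s: [0],
    aperiodic_models=[lambda s: [1], lambda s: [1, 2]],
)
print(result)
```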
The embodiment of the present application further provides a data detection method, where the method includes:
step S202: acquiring time sequence data of a preset time period;
step S204: performing frequency domain conversion on the time sequence data to obtain frequency spectrum data;
The time domain describes a mathematical function or a physical signal as a function of time; for example, the time-domain waveform of a signal expresses how the signal changes over time, and the time series data is data in the time domain. The frequency domain is a coordinate system used to describe the characteristics of a signal in terms of frequency. Frequency domain conversion converts a time signal or a spatial signal into a representation made up of frequency components; performing frequency domain conversion on the time series data yields the frequency spectrum data. Frequency domain conversion methods may include Fourier analysis, the Laplace transform, the Z transform, and the like, where Fourier analysis may include the Fourier series, which can be used to convert periodic signals, and the Fourier transform, which can be used to convert non-periodic signals.
Step S206: judging whether the time sequence data are periodically changed or not according to the frequency spectrum data;
step S208: selecting a target algorithm model corresponding to the time sequence data according to a judgment result;
step S210: and determining a data detection result of the time-series data by using the target algorithm model.
In the embodiment of the application, because some periods are difficult to detect in the time domain but easy to uncover in the frequency domain, the time-domain time series data can be converted to the frequency domain, and whether the time series data is periodic can be judged from the resulting frequency spectrum data. This bridges time-domain analysis and frequency-domain analysis and improves the accuracy and efficiency of the periodicity determination.
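As a minimal sketch of the frequency domain conversion, the following uses a naive discrete Fourier transform written with the Python standard library. The application does not prescribe a particular transform or implementation, so the choice of the DFT, the removal of the mean, and the function names are assumptions for illustration only; a production system would more likely use an FFT routine.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def amplitude_spectrum(series: Sequence[float]) -> List[float]:
    """Amplitudes |X[k]| for k = 0 .. n//2 of the mean-removed series.

    Naive O(n^2) DFT, adequate for short illustrative inputs.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff))
    return spectrum


def dominant_period(series: Sequence[float]) -> Tuple[float, float]:
    """Return (period in samples, share of spectral power at the peak)."""
    amplitudes = amplitude_spectrum(series)
    power = [a * a for a in amplitudes[1:]]  # skip the k = 0 (mean) term
    total = sum(power)
    if total == 0:
        return (float("inf"), 0.0)  # constant series: no oscillation at all
    k = max(range(len(power)), key=power.__getitem__) + 1
    return (len(series) / k, power[k - 1] / total)


# A sine wave that repeats every 8 samples, observed for 64 samples.
wave = [math.sin(2 * math.pi * t / 8) for t in range(64)]
period, share = dominant_period(wave)
print(period, round(share, 3))
```

A single sharp peak that carries most of the spectral power suggests a periodic series; a flat or diffuse spectrum suggests a non-periodic one.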
The embodiment of the present application further provides a data detection method, where the method includes:
step S302: acquiring time sequence data of a preset time period;
step S304: performing frequency domain conversion on the time sequence data to obtain frequency spectrum data;
step S306: analyzing the frequency spectrum data to obtain the periodicity confidence coefficient of the time series data;
Spectrum analysis can find the amplitude, power, phase, or intensity of a signal at different frequencies. Spectrum analysis can be performed on the frequency spectrum data: by observing the power spectrum of the discrete signal, the phase and amplitude of the signal at a characteristic frequency can be determined, which reveals the frequency-domain characteristics of the signal. On that basis it can be judged whether the time series data changes periodically, and the period of the time series data can be predicted when it does.
The periodicity confidence may refer to the degree of confidence, or reliability, of the predicted period, and periodicity confidence analysis is an important part of the overall signal analysis. A confidence analysis framework for determining the periodicity confidence can be constructed using Monte Carlo statistical methods and a first-order autoregressive process, in order to analyze the confidence level of the predicted period.
Step S308: when the periodicity confidence is greater than a specific confidence threshold, determining that the judgment result is that the time series data changes periodically;
wherein the confidence threshold may be 85%, 90%, etc.
Step S310: when the periodicity confidence is less than or equal to the confidence threshold, determining that the judgment result is that the time series data changes non-periodically.
Step S312: selecting the target algorithm model as a periodic algorithm model under the condition that the time sequence data are in periodic variation according to the judgment result;
step S314: selecting the target algorithm model as at least one non-periodic algorithm model under the condition that the judgment result is that the time sequence data are in non-periodic variation;
wherein the at least one aperiodic algorithmic model comprises at least one of an unsupervised algorithmic model, a statistical algorithmic model, and a novelty detection algorithmic model.
Step S316: and determining a data detection result of the time-series data by using the target algorithm model.
In the embodiment of the application, the period of the time series data is obtained by analyzing the frequency spectrum data, and the reliability of the predicted period is analyzed through the periodic confidence, so that the accuracy of periodic determination is further improved; according to the periodic characteristics of the time series data, namely whether the time series data are periodically changed or not, the target algorithm model is selected to be a periodic algorithm model or at least one non-periodic algorithm model, so that the accuracy and pertinence of data detection can be improved.
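One possible reading of the Monte Carlo and first-order autoregressive confidence framework is sketched below. The application does not give the construction, so everything here is an assumption: the test statistic (share of spectral power in the strongest frequency), the AR(1) null model fitted from the lag-1 autocorrelation, the number of surrogates, and all function names. The confidence is taken to be the fraction of simulated AR(1) series whose spectral peak is weaker than the observed one.

```python
import cmath
import math
import random
from typing import Sequence


def peak_power_share(series: Sequence[float]) -> float:
    """Share of (mean-removed) spectral power held by the strongest frequency."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power.append(abs(coeff) ** 2)
    total = sum(power)
    return max(power) / total if total > 0 else 0.0


def lag1_autocorrelation(series: Sequence[float]) -> float:
    """Lag-1 autocorrelation, used as the AR(1) coefficient of the null model."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((x - mean) ** 2 for x in series)
    return num / den if den > 0 else 0.0


def periodicity_confidence(
    series: Sequence[float], n_surrogates: int = 200, seed: int = 0
) -> float:
    """Fraction of AR(1) surrogates whose spectral peak is weaker than observed."""
    rng = random.Random(seed)
    phi = max(-0.99, min(0.99, lag1_autocorrelation(series)))
    observed = peak_power_share(series)
    weaker = 0
    for _ in range(n_surrogates):
        x, surrogate = 0.0, []
        for _ in range(len(series)):
            x = phi * x + rng.gauss(0.0, 1.0)  # AR(1) step, unit-variance noise
            surrogate.append(x)
        if peak_power_share(surrogate) < observed:
            weaker += 1
    return weaker / n_surrogates


def is_periodic(series: Sequence[float], threshold: float = 0.9, **kwargs) -> bool:
    """Periodic only when the confidence is strictly greater than the threshold."""
    return periodicity_confidence(series, **kwargs) > threshold


sine = [math.sin(2 * math.pi * t / 12) for t in range(48)]
print(is_periodic(sine, threshold=0.9, n_surrogates=50))
```

The strict comparison in `is_periodic` mirrors the steps above, where a confidence equal to the threshold is treated as non-periodic. The default of 0.9 is one of the example threshold values given in the text.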
The embodiment of the present application further provides a data detection method, where the method includes:
step S402: acquiring time sequence data of a preset time period;
step S404: performing frequency domain conversion on the time sequence data to obtain frequency spectrum data;
step S406: analyzing the frequency spectrum data to obtain the periodicity confidence coefficient of the time series data;
step S408: when the periodicity confidence is greater than a specific confidence threshold, determining that the judgment result is that the time series data changes periodically;
Step S410: when the periodicity confidence is less than or equal to the confidence threshold, determining that the judgment result is that the time series data changes non-periodically.
Step S412: under the condition that the time sequence data are periodically changed, determining that the target algorithm model is a periodic algorithm model;
wherein the periodic algorithm model is a trained algorithm model, and may be a model for analyzing time series data that contains a linear trend and periodic fluctuation; the periodic algorithm model can be a time-series-based anomaly detection model, which judges abnormal data points according to the correlation between data points at adjacent moments; the periodic algorithm model can be a Long Short-Term Memory (LSTM) artificial neural network model, an Autoregressive Integrated Moving Average (ARIMA) model, a Holt-Winters model, or the like; there may be one or more periodic algorithm models.
Step S414: acquiring historical time series data collected in a second preset time period; the second preset time period is before the first preset time period; the time series data and the historical time series data are the same kind of time series data;
step S416: inputting historical time series data into the periodic algorithm model;
the historical time-series data may be data collected at different times within a second preset time period, the second preset time period may be a previous cycle, and the time-series data and the historical time-series data are the same kind of data collected in different time periods.
Step S418: determining expected time series data output by the periodic algorithm model;
wherein, when there is one periodic algorithm model, one set of expected time series data is obtained; when there are multiple periodic algorithm models, the expected time series data output by each periodic algorithm model is obtained.
Step S420: and comparing the expected time sequence data with the time sequence data to obtain a data detection result of the time sequence data.
The expected time series data can be compared with the time series data. If data points in the time series data are inconsistent with the corresponding points of the expected time series data, those data points are determined to be abnormal data points, and their positions are determined to be abnormal intervals in the time series data. The data detection result is then that abnormal data points exist in the time series data, and the abnormal intervals can be located.
In the embodiment of the application, the trained periodic algorithm model is adopted to perform data detection under the condition that the time series data are in periodic change according to the judgment result, so that the periodically changed data can be adapted to the periodic algorithm model, and the accuracy and the efficiency of data detection are improved.
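The compare-with-expectation step can be sketched as follows. Note the substitution: instead of a trained LSTM, ARIMA, or Holt-Winters model, the sketch uses a seasonal-naive forecaster that simply repeats the last full period of the historical data. The `period` and `tolerance` parameters, the assumption that the current window starts immediately after the history ends, and the function names are all illustrative and not taken from the application.

```python
from typing import List, Sequence, Tuple


def seasonal_naive_expected(
    history: Sequence[float], period: int, horizon: int
) -> List[float]:
    """Expected values obtained by repeating the last full period of history.

    Stand-in for a trained periodic model; assumes the current window
    begins right after the last historical sample.
    """
    if period <= 0 or len(history) < period:
        raise ValueError("history must cover at least one full period")
    last_cycle = list(history[-period:])
    return [last_cycle[i % period] for i in range(horizon)]


def abnormal_intervals(
    actual: Sequence[float], expected: Sequence[float], tolerance: float
) -> List[Tuple[int, int]]:
    """Merge consecutive points with |actual - expected| > tolerance.

    Returns inclusive (start, end) index pairs.
    """
    n = min(len(actual), len(expected))
    intervals: List[Tuple[int, int]] = []
    start = None
    for i in range(n):
        if abs(actual[i] - expected[i]) > tolerance:
            if start is None:
                start = i
        elif start is not None:
            intervals.append((start, i - 1))
            start = None
    if start is not None:
        intervals.append((start, n - 1))
    return intervals


history = [0.0, 1.0, 2.0, 3.0] * 3                   # previous detection cycle
current = [0.0, 1.0, 9.0, 9.0, 0.0, 1.0, 2.0, 3.0]   # current detection cycle
expected = seasonal_naive_expected(history, period=4, horizon=len(current))
print(abnormal_intervals(current, expected, tolerance=0.5))
```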
The embodiment of the present application further provides a data detection method, where the method includes:
step S502: acquiring time sequence data of a preset time period;
step S504: performing frequency domain conversion on the time sequence data to obtain frequency spectrum data;
step S506: analyzing the frequency spectrum data to obtain the periodicity confidence coefficient of the time series data;
step S508: when the periodicity confidence is greater than a specific confidence threshold, determining that the judgment result is that the time series data changes periodically;
Step S510: when the periodicity confidence is less than or equal to the confidence threshold, determining that the judgment result is that the time series data changes non-periodically.
Step S512: under the condition that the time sequence data do not periodically change, determining the target algorithm model as a plurality of non-periodic algorithm models;
step S514: inputting the time-series data into a plurality of non-periodic algorithm models, respectively; the plurality of aperiodic algorithmic models are models that analyze the time series data based on different dimensions;
wherein the aperiodic algorithmic model is an algorithmic model that has been trained, and the aperiodic algorithmic model may be a model for analyzing time series data that does not exhibit periodic fluctuations; the aperiodic algorithmic model may include distance-based anomaly detection models, density-based outlier detection models, and the like.
Further, the distance-based anomaly detection model, which can also be called a statistics-based anomaly detection model, can judge abnormal points according to the distribution of the time series data, focusing on data points whose values are too large or too small as measured by indexes of data variation; common variation indexes include the range, the interquartile range, the mean deviation, the standard deviation, the coefficient of variation, and the like; outliers can also be found, for example, by using a k-means clustering algorithm to extract the data points that are farthest from the center of their class or that do not belong to any class.
The density-based outlier detection model is generally built on top of distance, and obtains a density mainly by combining two parameters: the distance between data points in the time series data and the number of data points within a certain range; the density-based outlier detection model may be the Local Outlier Factor (LOF) algorithm; the density-based outlier detection model may include an unsupervised algorithm model, which judges abnormal data points according to the distribution and distribution density of the time series data itself, and a novelty detection algorithm model, which judges abnormal data points according to the distribution of the time series data, with an emphasis on abnormalities in the internal pattern of the values.
Step S516: determining data detection sub-results output by each aperiodic algorithm model;
step S518: and determining the data detection result of the time-series data according to a plurality of data detection sub-results.
Wherein the plurality of data detection sub-results can be the same or different; as is apparent from the above description, aperiodic algorithm models based on different dimensions, such as distance and density, emphasize different aspects when performing anomaly analysis, and thus the data detection result of the time-series data can be determined by comprehensively analyzing the plurality of data detection sub-results.
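As a minimal sketch of the statistics-based model described above, the following flags values that fall outside fences derived from the interquartile range. The function name `iqr_outliers`, the multiplier `k = 1.5` and the sample CPU load values are illustrative assumptions and do not come from the application.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical CPU load samples with one excessively large value.
cpu_load = [41, 43, 40, 42, 44, 39, 95, 41, 42, 40]
print(iqr_outliers(cpu_load))
```

Other variation indexes named in the text, such as the standard deviation, could be substituted for the interquartile range in the same structure.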
In the embodiment of the application, under the condition that the judgment result is that the time sequence data is in aperiodic change, the trained plurality of aperiodic algorithm models are adopted for data detection, and the data detection result is analyzed according to each data detection sub-result, so that the aperiodic algorithm models can be adapted to the data in aperiodic change, and the accuracy and the efficiency of data detection are improved.
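The density-based model mentioned in this embodiment can be illustrated with a simplified Local Outlier Factor computation on one-dimensional values. This is only a sketch under stated assumptions: the function name `local_outlier_factors` and the choice `k = 2` are hypothetical, the values are assumed to be distinct, and ties in the k-distance neighbourhood are ignored, unlike a full LOF implementation.

```python
def local_outlier_factors(values, k=2):
    """Simplified LOF scores for 1-D data; scores well above 1 suggest outliers."""
    n = len(values)
    if n <= k:
        raise ValueError("need more than k data points")
    # For each point, keep its k nearest neighbours as (distance, index) pairs.
    neighbours = []
    for i in range(n):
        dists = sorted((abs(values[i] - values[j]), j) for j in range(n) if j != i)
        neighbours.append(dists[:k])
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neighbours]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(k / sum(reach))
    # LOF: average density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

samples = [1.0, 1.1, 1.2, 1.3, 1.4, 5.0]
scores = local_outlier_factors(samples)
print(max(range(len(samples)), key=scores.__getitem__))
```

A point in a sparse region has a much lower density than its neighbours, so its score is large; points inside a cluster score close to 1.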
The embodiment of the present application further provides a data detection method, where the method includes:
step S602: acquiring time sequence data of a preset time period;
step S604: performing frequency domain conversion on the time sequence data to obtain frequency spectrum data;
step S606: analyzing the frequency spectrum data to obtain the periodicity confidence coefficient of the time series data;
step S608: when the periodic confidence is larger than a specific confidence threshold, determining that the time series data are periodically changed according to the judgment result;
step S610: and determining that the time-series data are in non-periodic variation according to the judgment result when the periodic confidence is smaller than or equal to the confidence threshold.
Step S612: under the condition that the time sequence data do not periodically change, determining the target algorithm model as a plurality of non-periodic algorithm models;
step S614: inputting the time-series data into a plurality of non-periodic algorithm models, respectively;
step S616: determining data detection sub-results output by each aperiodic algorithm model; the plurality of aperiodic algorithmic models are models that analyze the time series data based on different dimensions;
step S618: voting is carried out on the data detection sub-results to obtain a voting result;
step S620: and determining the data detection result of the time sequence data according to the voting result.
Wherein, the aperiodic algorithm model can be a classification algorithm or a regression algorithm; for example, given a student's study time as input, a classification algorithm can predict whether the student passes an examination, whereas a regression algorithm can predict the student's examination score.
Inputting the time series data into a classification algorithm yields a decision surface for classifying the data in the time series data, so that, for example, whether an abnormal data point exists in the time series data can be predicted; inputting the time series data into a regression algorithm yields a best-fit line that approximates each data point in the time series data as closely as possible, so that, for example, outlier data points in the time series data can be predicted.
In one embodiment, the aperiodic algorithm models can be five classification algorithms for predicting whether there is an anomaly in the time-series data, and the five data detection sub-results include three results of "abnormal" and two results of "not abnormal"; voting on the five data detection sub-results gives 2 votes for "not abnormal" and 3 votes for "abnormal", and the result with the larger number of votes can be determined as the data detection result of the time series data, that is, the data detection result of the time series data can be considered abnormal.
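The voting in steps S618 and S620 can be sketched as a simple majority vote. The function name `majority_vote`, the label strings and the order of the five sub-results are illustrative assumptions; only the 3-to-2 split follows the example in the text.

```python
from collections import Counter

def majority_vote(sub_results):
    """Return the label with the most votes and the per-label vote counts."""
    counts = Counter(sub_results)
    # most_common(1) gives the label with the highest count; on a tie it
    # returns the label that was encountered first.
    label, _ = counts.most_common(1)[0]
    return label, dict(counts)

# Hypothetical outputs of five classification models.
sub_results = ["abnormal", "not abnormal", "abnormal", "not abnormal", "abnormal"]
result, counts = majority_vote(sub_results)
print(result, counts)
```

With an odd number of binary classifiers a tie cannot occur; with an even number, a tie-breaking rule would need to be chosen.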
In the embodiment of the application, under the condition that the judgment result is that the time sequence data is in aperiodic change, a plurality of aperiodic algorithm models are adopted for data detection, and each data detection sub-result is voted to obtain the data detection result, so that the accuracy and the efficiency of data detection are improved.
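The periodicity judgment in steps S604 to S610 can be sketched as follows. The application does not define how the periodicity confidence is computed, so this sketch assumes one possible definition: the share of the non-constant spectral power carried by the strongest frequency bin. The function names and the threshold of 0.5 are likewise hypothetical.

```python
import cmath
import math

def power_spectrum(series):
    """Power of each non-zero frequency bin of a plain discrete Fourier transform."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant component
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def periodicity_confidence(series):
    """Return (confidence, period in samples); the confidence lies in [0, 1]."""
    spectrum = power_spectrum(series)
    total = sum(spectrum)
    if total == 0:
        return 0.0, None  # constant series: no periodic component
    peak = max(range(len(spectrum)), key=spectrum.__getitem__)
    period = len(series) / (peak + 1)  # bin index k = peak + 1
    return spectrum[peak] / total, period

def is_periodic(series, threshold=0.5):
    """Periodic when the confidence is greater than the threshold (S608, S610)."""
    confidence, _ = periodicity_confidence(series)
    return confidence > threshold

wave = [math.sin(2 * math.pi * t / 8) for t in range(64)]
print(is_periodic(wave), periodicity_confidence(wave)[1])
```

A strong trend concentrates power in the lowest frequency bin, so a practical implementation would likely remove the trend before applying a measure of this kind.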
The embodiment of the present application further provides a data detection method, where the method includes:
step S702: acquiring time sequence data of a preset time period;
step S704: performing frequency domain conversion on the time sequence data to obtain frequency spectrum data;
step S706: analyzing the frequency spectrum data to obtain the periodicity confidence coefficient of the time series data;
step S708: when the periodic confidence is larger than a specific confidence threshold, determining that the time series data are periodically changed according to the judgment result;
step S710: and determining that the time-series data are in non-periodic variation according to the judgment result when the periodic confidence is smaller than or equal to the confidence threshold.
Step S712: under the condition that the time sequence data do not periodically change, determining the target algorithm model as a plurality of non-periodic algorithm models;
step S714: inputting the time-series data into a plurality of non-periodic algorithm models, respectively;
step S716: determining data detection sub-results output by each aperiodic algorithm model; the plurality of aperiodic algorithmic models are models that analyze the time series data based on different dimensions;
step S718: determining an average of a plurality of said data detection sub-results;
In one embodiment, the aperiodic algorithm models can be five regression algorithms for predicting abnormal intervals of the time-series data, and the five data detection sub-results each reflect a set of CPU utilization values, for example $(A_{11}, A_{12}, A_{13}, A_{14}, A_{15})$, $(A_{21}, A_{22}, A_{23}, A_{24}, A_{25})$, $(A_{31}, A_{32}, A_{33}, A_{34}, A_{35})$, $(A_{41}, A_{42}, A_{43}, A_{44}, A_{45})$ and $(A_{51}, A_{52}, A_{53}, A_{54}, A_{55})$; suppose $B_1 = (A_{11} + A_{21} + A_{31} + A_{41} + A_{51})/5$, $B_2 = (A_{12} + A_{22} + A_{32} + A_{42} + A_{52})/5$, $B_3 = (A_{13} + A_{23} + A_{33} + A_{43} + A_{53})/5$, $B_4 = (A_{14} + A_{24} + A_{34} + A_{44} + A_{54})/5$ and $B_5 = (A_{15} + A_{25} + A_{35} + A_{45} + A_{55})/5$; then the average value of the 5 data detection sub-results can be $B = (B_1, B_2, B_3, B_4, B_5)$.
Step S720: and determining the data detection result of the time series data according to the average value.
Wherein the data detection result may be $B = (B_1, B_2, B_3, B_4, B_5)$.
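The averaging in steps S718 and S720 amounts to an element-wise mean over the sub-results. In the sketch below, the function name `average_sub_results` and the CPU utilization numbers are hypothetical; each inner list stands for one model's sub-result.

```python
def average_sub_results(sub_results):
    """Element-wise mean of equally long sub-results, one list per model."""
    if len({len(r) for r in sub_results}) != 1:
        raise ValueError("all sub-results must be non-empty and of equal length")
    # zip(*...) groups the i-th value of every model, i.e. one column per B_i.
    return [sum(column) / len(sub_results) for column in zip(*sub_results)]

# Hypothetical CPU utilization values (percent) from five regression models.
A = [
    [10, 20, 30, 40, 50],
    [20, 30, 40, 50, 60],
    [30, 40, 50, 60, 70],
    [40, 50, 60, 70, 80],
    [50, 60, 70, 80, 90],
]
B = average_sub_results(A)
print(B)
```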
In the embodiment of the application, under the condition that the judgment result is that the time sequence data is in aperiodic change, a plurality of aperiodic algorithm models are adopted for data detection, and the average value of each data detection sub-result is calculated to obtain the data detection result, so that the accuracy and the efficiency of data detection are improved.
The embodiment of the present application further provides a data detection method, where the method includes:
step S802: acquiring time sequence data of a preset time period;
step S804: performing frequency domain conversion on the time sequence data to obtain frequency spectrum data;
step S806: analyzing the frequency spectrum data to obtain the periodicity confidence coefficient of the time series data;
step S808: when the periodic confidence is larger than a specific confidence threshold, determining that the time series data are periodically changed according to the judgment result;
step S810: and determining that the time-series data are in non-periodic variation according to the judgment result when the periodic confidence is smaller than or equal to the confidence threshold.
Step S812: under the condition that the time sequence data do not periodically change, determining the target algorithm model as a plurality of non-periodic algorithm models;
step S814: inputting the time-series data into a plurality of non-periodic algorithm models, respectively;
step S816: determining data detection sub-results output by each aperiodic algorithm model; the plurality of aperiodic algorithmic models are models that analyze the time series data based on different dimensions;
step S818: determining a weight corresponding to each aperiodic algorithm model;
wherein, each aperiodic algorithm model can be given a corresponding weight in advance according to the prediction accuracy of each aperiodic algorithm model to the abnormal data points of the time series data, the higher the prediction accuracy of the abnormal data points can be, the higher the weight can be given to the aperiodic algorithm model, and the prediction accuracy of the abnormal data points can be determined by inputting the historical time series data into the aperiodic algorithm model for multiple times and analyzing the data detection sub-results output by the aperiodic algorithm model.
Step S820: and determining the data detection result of the time-series data according to the weight and a plurality of data detection sub-results.
In one embodiment, in the case that the aperiodic algorithm models are classification algorithms, the data detection sub-result of the aperiodic algorithm model with the highest weight among the plurality of aperiodic algorithm models may be determined as the data detection result of the time-series data; alternatively, the average of the probabilities with which the aperiodic algorithm models predict the time-series data to be of each class may be used as the criterion, and the class with the highest average probability may be taken as the final data detection result.
In one embodiment, assume that there are 5 aperiodic algorithm models in total, that type X represents the existence of abnormal data points in the time-series data, and that type Y represents the absence of abnormal data points in the time-series data. In the data detection sub-result of model 1, the probability of type X is 98% and the probability of type Y is 2%; for model 2, the probabilities are 49% and 51%; for model 3, 40% and 60%; for model 4, 91% and 9%; and for model 5, 32% and 68%. In the case that only the voting result is used as the basis for the data detection result, models 2, 3 and 5 all vote for type Y, so the final data detection result is that no abnormal data point exists in the time series data.
However, models 1 and 4 obviously classify with much higher confidence. In the case that the average value of the data detection sub-results of models 1 to 5 is used as the basis for the data detection result, the probability of type X is (0.98 + 0.49 + 0.40 + 0.91 + 0.32)/5 = 0.62 and the probability of type Y is (0.02 + 0.51 + 0.60 + 0.09 + 0.68)/5 = 0.38, so the final data detection result is that abnormal data points exist in the time series data.
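The contrast between counting votes and averaging probabilities in this example can be reproduced with a short sketch. The function names `hard_vote` and `soft_vote` and the decision cut-off of 0.5 are illustrative assumptions; the probabilities are the type X probabilities of models 1 to 5 given above.

```python
def hard_vote(probs_x):
    """Each model votes X when its own probability of X exceeds 0.5."""
    votes_x = sum(1 for p in probs_x if p > 0.5)
    votes_y = len(probs_x) - votes_x
    return "X" if votes_x > votes_y else "Y"

def soft_vote(probs_x):
    """Average the probabilities of X across models, then decide."""
    mean_x = sum(probs_x) / len(probs_x)
    return ("X" if mean_x > 0.5 else "Y"), mean_x

probs_x = [0.98, 0.49, 0.40, 0.91, 0.32]  # models 1 to 5
print(hard_vote(probs_x))
print(soft_vote(probs_x))
```

Counting votes discards how confident each model is, which is why the two rules disagree on these numbers.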
In another embodiment, in the case where the aperiodic algorithm model is a regression algorithm, a weighted average of the respective data detection sub-results may be calculated with reference to a method of calculating an average of the respective data detection sub-results, and the weighted average may be determined as the data detection result of the time-series data.
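For the regression case, a weighted average might look like the sketch below. The application only states that more accurate models receive higher weights; deriving the weights by normalising accuracy scores so that they sum to 1 is an assumption made here, as are the function names and the sample numbers.

```python
def weights_from_accuracy(accuracies):
    """Normalise per-model accuracy scores into weights that sum to 1."""
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("accuracies must have a positive sum")
    return [a / total for a in accuracies]

def weighted_average(sub_results, weights):
    """Element-wise weighted mean of the sub-results, one list per model."""
    if len(sub_results) != len(weights):
        raise ValueError("one weight is required per sub-result")
    return [sum(w * value for w, value in zip(weights, column))
            for column in zip(*sub_results)]

# Hypothetical accuracies measured on historical time series data.
weights = weights_from_accuracy([0.9, 0.6, 0.5])
# Hypothetical sub-results of three regression models.
sub_results = [[50, 52, 90], [48, 50, 70], [47, 55, 60]]
print(weighted_average(sub_results, weights))
```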
In the embodiment of the application, the final data detection result is determined by determining the weight of the aperiodic algorithm model and according to the weight corresponding to each aperiodic algorithm model and the data detection sub-result, so that the accuracy and the efficiency of data detection are further improved.
The embodiment of the present application further provides a data detection method, where the method includes:
step S902: acquiring time sequence data of a preset time period;
step S904: sequentially determining autocorrelation coefficients among the sub-time sequence data according to the sequence of time delay among the sub-time sequence data of the time sequence data from small to large; the autocorrelation coefficients are used for measuring the correlation between the sub-time sequence data;
In one embodiment, assume that the time series data is C = [1, 2, 3, 5, 7, 8, 10, 13, 14, 16, 20]; when the time delay is 1, the compared sub-time series data may include [1, 2, 3, 5, 7, 8, 10, 13, 14, 16] and [2, 3, 5, 7, 8, 10, 13, 14, 16, 20]; when the time delay is 2, the compared sub-time series data may include [1, 2, 3, 5, 7, 8, 10, 13, 14] and [3, 5, 7, 8, 10, 13, 14, 16, 20]; the autocorrelation coefficients between the sub-time series data at different time delays can be calculated using the Pearson correlation coefficient.
Step S906: and judging whether the time sequence data is in periodic change or not according to the autocorrelation coefficient.
In addition to converting the time series data into the frequency domain by Fourier transform for processing, the periodicity can be examined starting from the statistical characteristics of the observed signal itself: each data point in the time series data may be compared with other data points separated from it by a fixed time interval (or delay) and the differences between them examined, and the period of the time series data may be found by means such as data segmentation, binning, and maximizing the correlation between data; the period extraction method may be a Phase Dispersion Minimization (PDM) algorithm or a Structure Function (SF) algorithm.
Step S908: selecting a target algorithm model corresponding to the time sequence data according to a judgment result;
step S910: and determining a data detection result of the time-series data by using the target algorithm model.
In the embodiment of the application, the periodicity of the time series data can be judged by starting from the statistical characteristics of the time series data, a test function is not needed to simulate the data, the non-uniformity adaptability of the time series data is strong, the calculation speed is high, and the implementation is simple.
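The computation in step S904 can be sketched with the example series C given in this embodiment. The function names `lagged_pairs`, `pearson` and `autocorrelation` are hypothetical, and returning 0.0 for a constant sub-series is a convention chosen here.

```python
import math
import statistics

def lagged_pairs(series, lag):
    """Split a series into the two sub-series compared at a given time delay."""
    if lag < 1 or lag >= len(series) - 1:
        raise ValueError("lag must leave at least two points in each sub-series")
    return series[:-lag], series[lag:]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sub-series
    return cov / math.sqrt(var_x * var_y)

def autocorrelation(series, lag):
    """Autocorrelation coefficient at a time delay, via Pearson correlation."""
    return pearson(*lagged_pairs(series, lag))

C = [1, 2, 3, 5, 7, 8, 10, 13, 14, 16, 20]
for lag in (1, 2):
    print(lag, lagged_pairs(C, lag), round(autocorrelation(C, lag), 3))
```

Because C rises steadily, its coefficients are high at every small delay; this reflects a trend rather than a repeating pattern, so a high coefficient at a single delay is not by itself evidence of a period.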
The embodiment of the present application further provides a data detection method, where the method includes:
step S1002: acquiring time sequence data of a preset time period;
step S1004: sequentially determining autocorrelation coefficients among the sub-time sequence data according to the sequence of time delay among the sub-time sequence data of the time sequence data from small to large; the autocorrelation coefficients are used for measuring the correlation between the sub-time sequence data;
step S1006: determining that the time-series data are periodically changed according to the judgment result when the autocorrelation coefficient is larger than a specific coefficient threshold value;
In one embodiment, autocorrelation coefficients may be computed over a sufficient number of time delays; if a sufficiently large autocorrelation coefficient is found, that is, one greater than a specific coefficient threshold, the time series data may be determined to be periodically varying, and the time delay corresponding to that autocorrelation coefficient may be taken as the period.
Step S1008: determining that the time-series data are non-periodically changed according to the judgment result when the autocorrelation coefficient is less than or equal to the coefficient threshold;
step S1010: selecting a target algorithm model corresponding to the time sequence data according to a judgment result;
step S1012: and determining a data detection result of the time-series data by using the target algorithm model.
In the embodiment of the application, the self-correlation coefficient is compared with the coefficient threshold value to judge the periodicity of the time sequence data, so that the accuracy, efficiency and convenience of periodic judgment are further improved.
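The threshold comparison in steps S1006 and S1008 might be sketched as follows. The application only requires a coefficient greater than the threshold; this sketch additionally assumes that the coefficient must first fall to or below the threshold and then reach a local peak above it, so that the naturally high correlation at very small delays is not mistaken for a period. The names, the threshold of 0.8 and the default maximum delay are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def detect_period(series, threshold=0.8, max_lag=None):
    """Return (is_periodic, period) by scanning time delays from small to large."""
    if max_lag is None:
        max_lag = len(series) // 2
    coeffs = {lag: pearson(series[:-lag], series[lag:])
              for lag in range(1, max_lag + 1)}
    dropped = False  # becomes True once a coefficient is at or below the threshold
    for lag in range(1, max_lag):
        r = coeffs[lag]
        if r <= threshold:
            dropped = True
        elif dropped and r > coeffs[lag - 1] and r >= coeffs[lag + 1]:
            return True, lag  # first local peak above the threshold
    return False, None

wave = [math.sin(2 * math.pi * t / 8) for t in range(48)]
ramp = list(range(20))
print(detect_period(wave))
print(detect_period(ramp))
```

A steadily rising series such as `ramp` keeps a high coefficient at every delay and never drops below the threshold, so under this assumption it is treated as non-periodic.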
In the related art, the anomaly detection for the time series data can be generally performed by the following method:
one method is an unsupervised algorithm based on data distribution, and the algorithm generally cannot be adapted to conditions such as data change, integral migration and the like which may occur in high-dimensional time sequence data or single-dimensional time sequence data, so that false alarm is easily caused; the high-dimensional time sequence data and the single-dimensional time sequence data refer to the number of utilized indexes, for example, indexes (such as indexes of throughput, response time, queue depth and the like) of multiple dimensions are required to be jointly analyzed in a complex system, and part of scenes are analyzed aiming at the single index (such as CPU load).
Another approach is a detection algorithm based on a periodic assumption, which generally assumes that in an actual environment, data exhibits data changes in units of days/weeks; however, the algorithm cannot accurately process non-periodic data.
Based on the above, data characteristics in an actual environment are often complex and the data environment is not visible in advance, so that algorithms cannot be matched to the data by manual screening; to address this, fig. 2 shows a data detection method provided by the embodiment of the application. The data detection method may include the following steps:
step 202: acquiring time sequence data of a preset time period;
Data flows in the environment can be collected in the current detection period to obtain time series data (also called time sequence data) for each target index; the target index can be, for example, the throughput, response time or queue depth of an application program.
Step 204: judging whether the time sequence data has periodicity or not;
Periodicity verification and extraction can be performed on the time series data: frequency domain conversion can first be performed on the time series data to obtain the corresponding frequency spectrum data, the periodicity confidence of the time series data can be obtained by performing spectrum analysis on the frequency spectrum data, and the data is then routed according to the periodicity confidence; if the time series data exhibits high periodicity, step 206 is executed; otherwise, steps 208, 210 and 212 are executed;
step 206: performing data detection on the time sequence data by adopting a periodic algorithm model;
if the time sequence data shows higher periodicity, the historical time sequence data in the previous period can be reused through a periodic algorithm to judge the expected time sequence data in the current detection period; after the actual data (i.e., time-series data) is generated, the time-series data is compared with the expected time-series data, so that a first data detection sub-result of the time-series data, i.e., an abnormal section of the time-series data, can be obtained.
Step 208: performing data detection on the time sequence data by adopting an unsupervised algorithm to obtain a second data detection sub-result;
step 210: performing data detection on the time sequence data by adopting a statistical algorithm to obtain a third data detection sub-result;
step 212: performing data detection on the time sequence data by adopting a novelty detection algorithm to obtain a fourth data detection sub-result;
if the time-series data do not exhibit periodicity, a non-periodic algorithm model can be used for performing data anomaly detection on the time-series data, and the non-periodic algorithm model can be an unsupervised algorithm, a statistical algorithm and a novelty detection algorithm.
The unsupervised algorithm is the counterpart of the supervised algorithm, and refers to a machine learning algorithm that performs tasks such as clustering or classification based on the data itself, without manually labeled data; the unsupervised algorithm can identify abnormal data points in the time series data according to the distribution and the distribution density of the time series data, so as to obtain a second data detection sub-result of the time series data.
The statistical algorithm can judge abnormal data points according to data distribution, and can emphatically judge data points with overlarge or undersize values in the time series data to obtain a third data detection sub-result of the time series data.
The novelty detection algorithm can judge abnormal data points according to the data distribution of the time series data, can emphatically judge the form abnormality inside numerical values in the time series data, and obtains a fourth data detection sub-result of the time series data.
Step 214: voting or integrating the data detection sub-results;
the first data detection sub-result to the fourth data detection sub-result obtained by the periodic algorithm model and the aperiodic algorithm model can be voted or integrated; in the case of performing data detection on the time-series data by using a periodic algorithm model, the weight of the non-periodic algorithm model may be adjusted to 0; in this case, since the weight of the aperiodic algorithm model is 0, the final voting result of the data detection sub-result is the first data detection sub-result; under the condition of carrying out data detection on the time sequence data by adopting a non-periodic algorithm model, different weights can be respectively given to an unsupervised algorithm, a statistical algorithm and a novelty detection algorithm in the non-periodic algorithm model, and the weight of the periodic algorithm model is adjusted to be 0; the weights of the unsupervised algorithm, the statistical algorithm and the novelty detection algorithm are respectively the weights of the second detection sub-result, the third detection sub-result and the fourth detection sub-result; in this case, since the weight of the periodic algorithm model is 0, the voting result of the data detection sub-result is the voting result of the second to fourth detection sub-results, and the final voting result of the data detection sub-result can be obtained according to the second to fourth detection sub-results and the corresponding weights.
Step 216: and determining the data detection result of the time sequence data according to the voting result.
Through the above data judgment and routing, anomaly detection is performed on the time series data by a periodic algorithm model or by non-periodic algorithm models, and voting integration is performed according to the weights preset for the unsupervised algorithm, the statistical algorithm and the novelty detection algorithm, so that a final anomaly detection result can be generated.
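The weight adjustment described in step 214 can be sketched as a weighted vote in which the models of the branch that was not taken receive a weight of 0. The model names, the preset weights and the rule that the heavier side wins are illustrative assumptions rather than values from the application.

```python
def effective_weights(is_periodic, aperiodic_weights):
    """Zero out the models of the branch that was not used for detection."""
    if is_periodic:
        weights = {name: 0.0 for name in aperiodic_weights}
        weights["periodic"] = 1.0
    else:
        weights = dict(aperiodic_weights)
        weights["periodic"] = 0.0
    return weights

def weighted_vote(sub_results, weights):
    """sub_results maps a model name to True (abnormal) or False (normal)."""
    abnormal = sum(weights.get(m, 0.0) for m, flag in sub_results.items() if flag)
    normal = sum(weights.get(m, 0.0) for m, flag in sub_results.items() if not flag)
    return abnormal > normal

# Hypothetical preset weights for the three non-periodic algorithms.
preset = {"unsupervised": 0.5, "statistical": 0.3, "novelty": 0.2}

# Non-periodic branch: only the second to fourth sub-results carry weight.
sub_results = {"unsupervised": True, "statistical": False, "novelty": True}
print(weighted_vote(sub_results, effective_weights(False, preset)))

# Periodic branch: the first sub-result decides alone.
print(weighted_vote({"periodic": True}, effective_weights(True, preset)))
```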
It should be understood that, in another embodiment, different weights may be given to only the unsupervised algorithm, the statistical algorithm and the novelty detection algorithm in the aperiodic algorithm model, and in the case of performing data detection on the time-series data by using the periodic algorithm model, the final voting result of the data detection sub-result is directly the first data detection sub-result of the periodic algorithm model; and under the condition that the aperiodic algorithm model is adopted to carry out data detection on the time sequence data, the voting result of the data detection sub-result is the voting result of the second data detection sub-result to the fourth data detection sub-result.
The embodiment of the application can automatically identify data characteristics and match suitable algorithms across the varied data characteristics and online/offline environments found in complex deployments, can be adapted to a large number of environments, and effectively improves on the detection effect of any single anomaly detection algorithm.
The embodiment of the application can realize the automatic combination of data judgment and an anomaly detection algorithm, including periodic judgment and model integrated detection.
Based on the foregoing embodiments, the present application provides a data detection apparatus, where the apparatus includes units and modules included in the units, and may be implemented by a processor in a computer device; of course, the implementation can also be realized through a specific logic circuit; in implementation, the processor may be a Central Processing Unit (CPU), a Microprocessor (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 3 is a schematic structural diagram of a data detection apparatus according to an embodiment of the present application, and as shown in fig. 3, the apparatus 300 includes an obtaining module 301, a judging module 302, a first determining module 303, and a second determining module 304, where:
an obtaining module 301, configured to obtain time series data of a preset time period;
a judging module 302, configured to judge whether the time series data changes periodically;
a first determining module 303, configured to select, according to a determination result, a target algorithm model corresponding to the time-series data;
a second determining module 304, configured to determine a data detection result of the time-series data by using the target algorithm model.
In some embodiments, the judging module 302 includes: the conversion submodule is used for carrying out frequency domain conversion on the time sequence data to obtain frequency spectrum data; and the judging submodule is used for judging whether the time sequence data is in periodic change or not according to the frequency spectrum data.
In some embodiments, the judging submodule includes: the analysis unit is used for analyzing the frequency spectrum data to obtain the periodicity confidence of the time series data; a first determining unit, configured to determine that the time-series data changes periodically as a result of the determination when the periodicity confidence is greater than a specific confidence threshold; a second determining unit configured to determine that the time-series data changes aperiodically as a result of the determination if the periodicity confidence is less than or equal to the confidence threshold.
In some embodiments, the first determining module 303 includes: the first determining submodule is used for selecting the target algorithm model as a periodic algorithm model under the condition that the judgment result is that the time sequence data is in periodic change; and the second determining submodule is used for selecting the target algorithm model as at least one non-periodic algorithm model under the condition that the judgment result is that the time sequence data do not periodically change.
In some embodiments, the at least one aperiodic algorithm model comprises at least one of an unsupervised algorithm model, a statistical algorithm model, and a novelty detection algorithm model.
In some embodiments, the time-series data of the preset time period is time-series data acquired within a first preset time period; a second determination module 304, comprising: the acquisition submodule is used for acquiring historical time sequence data acquired in a second preset time period under the condition that the time sequence data are periodically changed according to the judgment result; the second preset time period is before the first preset time period; the time sequence data and the historical time sequence data are the same time sequence data; a first input submodule for inputting historical time series data into the periodic algorithm model; a first output sub-module for determining expected time series data output by the periodic algorithm model; and the comparison submodule is used for comparing the expected time sequence data with the time sequence data to obtain a data detection result of the time sequence data.
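The comparison performed by these submodules can be sketched as follows. The application does not specify the periodic algorithm model, so a phase-by-phase mean over the historical periods stands in for it here; the function names, the tolerance and the sample values are all hypothetical.

```python
def expected_from_history(history, period):
    """Average the historical values phase by phase to form the expected series."""
    if period <= 0 or len(history) == 0 or len(history) % period != 0:
        raise ValueError("history must contain a whole number of periods")
    cycles = [history[i:i + period] for i in range(0, len(history), period)]
    return [sum(column) / len(cycles) for column in zip(*cycles)]

def abnormal_intervals(actual, expected, tolerance):
    """Return (start, end) index ranges where actual deviates beyond the tolerance."""
    intervals, start = [], None
    n = min(len(actual), len(expected))
    for i in range(n):
        if abs(actual[i] - expected[i]) > tolerance:
            if start is None:
                start = i  # an abnormal interval opens here
        elif start is not None:
            intervals.append((start, i - 1))  # the interval closed at i - 1
            start = None
    if start is not None:
        intervals.append((start, n - 1))
    return intervals

history = [10, 20, 30, 20] * 3  # three earlier periods of length 4
expected = expected_from_history(history, period=4)
actual = [11, 19, 55, 48]  # data collected in the current period
print(abnormal_intervals(actual, expected, tolerance=5))
```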
In some embodiments, the second determining module 304 includes: the second input submodule is used for respectively inputting the time sequence data into a plurality of non-periodic algorithm models under the condition that the time sequence data are in non-periodic variation; the second output submodule is used for determining a data detection sub-result output by each aperiodic algorithm model; the plurality of aperiodic algorithmic models are models that analyze the time series data based on different dimensions; and the third determining submodule is used for determining the data detection result of the time-series data according to a plurality of data detection sub-results.
In one embodiment, the third determining sub-module includes: the voting unit is used for voting the data detection sub-results to obtain voting results; and a third determining unit configured to determine a data detection result of the time-series data according to the voting result.
In one embodiment, the third determining sub-module includes: an average value determining unit for determining an average value of a plurality of the data detection sub-results; and the fourth determining unit is used for determining the data detection result of the time series data according to the average value.
In one embodiment, the third determining sub-module includes: the weight determining unit is used for determining the weight corresponding to each aperiodic algorithm model; a fifth determining unit configured to determine a data detection result of the time-series data according to the weight and the plurality of data detection sub-results.
The above description of the apparatus embodiments, similar to the above description of the method embodiments, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be noted that, in the embodiment of the present application, if the data detection method is implemented in the form of a software functional module and sold or used as a standalone product, the data detection method may also be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or a part contributing to the related art may be embodied in the form of a software product stored in a storage medium, and including a plurality of instructions for enabling a computer device (which may be a mobile phone, a tablet computer, a desktop computer, a personal digital assistant, a navigator, a digital phone, a video phone, a television, a sensing device, etc.) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, an embodiment of the present application provides a computer device. Fig. 4 is a schematic diagram of a hardware entity of the computer device in the embodiment of the present application; as shown in fig. 4, the hardware entity of the computer device 400 includes a memory 401 and a processor 402, wherein the memory 401 stores a computer program that can run on the processor 402, and the processor 402 implements the steps of the data detection method of the above embodiments when executing the computer program.
The Memory 401 is configured to store instructions and applications executable by the processor 402, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 402 and modules in the computer device 400, and may be implemented by a FLASH Memory (FLASH) or a Random Access Memory (RAM).
Correspondingly, the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the data detection method provided in the above embodiments.
It should be noted here that the above descriptions of the storage medium and device embodiments are similar to the description of the method embodiments and have similar advantageous effects. For technical details not disclosed in the storage medium and device embodiments of the present application, refer to the description of the method embodiments of the present application.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be made through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, each unit may serve separately as one unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be completed by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the related art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a plurality of instructions for enabling a computer device (which may be a mobile phone, a tablet computer, a desktop computer, a personal digital assistant, a navigator, a digital phone, a video phone, a television, a sensing device, etc.) to execute all or part of the methods described in the embodiments of the present application. In both cases, the storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disk.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments. Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict. The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The above description covers only embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (11)
1. A data detection method, the method comprising:
acquiring time-series data for a preset time period;
judging whether the time-series data changes periodically;
selecting a target algorithm model corresponding to the time-series data according to the judgment result;
and determining a data detection result for the time-series data by using the target algorithm model.
2. The method of claim 1, wherein the judging whether the time-series data changes periodically comprises:
performing frequency-domain conversion on the time-series data to obtain spectrum data;
and judging, according to the spectrum data, whether the time-series data changes periodically.
3. The method of claim 2, wherein the judging, according to the spectrum data, whether the time-series data changes periodically comprises:
analyzing the spectrum data to obtain a periodicity confidence of the time-series data;
when the periodicity confidence is greater than a specific confidence threshold, determining that the judgment result is that the time-series data changes periodically;
and when the periodicity confidence is less than or equal to the confidence threshold, determining that the judgment result is that the time-series data changes non-periodically.
4. The method according to any one of claims 1 to 3, wherein the selecting a target algorithm model corresponding to the time-series data according to the judgment result comprises:
when the judgment result is that the time-series data changes periodically, selecting a periodic algorithm model as the target algorithm model;
and when the judgment result is that the time-series data changes non-periodically, selecting at least one non-periodic algorithm model as the target algorithm model.
5. The method of claim 4, wherein the at least one non-periodic algorithm model comprises at least one of an unsupervised algorithm model, a statistical algorithm model, and a novelty detection algorithm model.
6. The method according to claim 4, wherein the time-series data for the preset time period is time-series data collected in a first preset time period;
and the determining a data detection result for the time-series data by using the target algorithm model comprises:
when the judgment result is that the time-series data changes periodically, acquiring historical time-series data collected in a second preset time period, the second preset time period being before the first preset time period, and the time-series data and the historical time-series data being the same time-series data;
inputting the historical time-series data into the periodic algorithm model;
determining expected time-series data output by the periodic algorithm model;
and comparing the expected time-series data with the time-series data to obtain the data detection result for the time-series data.
7. The method of claim 4, wherein the determining a data detection result for the time-series data by using the target algorithm model comprises:
when the time-series data does not change periodically, inputting the time-series data into each of a plurality of non-periodic algorithm models;
determining a data detection sub-result output by each non-periodic algorithm model, the plurality of non-periodic algorithm models being models that analyze the time-series data along different dimensions;
and determining the data detection result for the time-series data according to the plurality of data detection sub-results.
8. The method of claim 7, wherein the determining the data detection result for the time-series data according to the plurality of data detection sub-results comprises:
determining a weight corresponding to each non-periodic algorithm model;
and determining the data detection result for the time-series data according to the weights and the plurality of data detection sub-results.
9. A data detection apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire time-series data for a preset time period;
a judging module, configured to judge whether the time-series data changes periodically;
a first determining module, configured to select a target algorithm model corresponding to the time-series data according to the judgment result;
and a second determining module, configured to determine a data detection result for the time-series data by using the target algorithm model.
10. A computer device comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor implements the steps of the data detection method of any one of claims 1 to 8 when executing the program.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the data detection method according to any one of claims 1 to 8.
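To make the flow of claims 1 to 8 concrete, the following Python sketch chains the claimed steps: a frequency-domain periodicity check, model selection, comparison against expected values for periodic data, and a weighted combination of sub-results for non-periodic data. It is an illustration under stated assumptions, not the implementation of the application. The confidence measure (share of non-DC spectral power in the strongest frequency bin), the 0.5 thresholds, the averaged-history periodic model, the z-score and interquartile-range models, and all function names are hypothetical choices made for this example; the claims do not specify any of them.

```python
import numpy as np


def periodicity_confidence(series):
    """Assumed measure: share of non-DC spectral power held by the strongest bin."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                         # remove the constant (DC) component
    power = np.abs(np.fft.rfft(x)) ** 2      # frequency-domain conversion (claim 2)
    power = power[1:]                        # ignore the zero-frequency bin
    total = power.sum()
    return float(power.max() / total) if total > 0 else 0.0


def is_periodic(series, threshold=0.5):
    """Claim 3: periodic only if the confidence is strictly greater than the threshold."""
    return periodicity_confidence(series) > threshold


def detect_periodic(history, current, tolerance=1.0):
    """Claim 6: derive expected values from earlier data and compare with current data.

    Assumed periodic model: element-wise mean of earlier windows, so
    len(history) must be a whole multiple of len(current).
    """
    current = np.asarray(current, dtype=float)
    windows = np.asarray(history, dtype=float).reshape(-1, len(current))
    expected = windows.mean(axis=0)
    return np.abs(current - expected) > tolerance


def zscore_model(series, k=3.0):
    """Assumed statistical model: flag points more than k standard deviations from the mean."""
    x = np.asarray(series, dtype=float)
    std = x.std()
    if std == 0:
        return np.zeros(x.shape, dtype=bool)
    return np.abs(x - x.mean()) / std > k


def iqr_model(series, k=1.5):
    """Assumed statistical model: flag points outside the interquartile-range fences."""
    x = np.asarray(series, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    spread = q3 - q1
    return (x < q1 - k * spread) | (x > q3 + k * spread)


def detect_non_periodic(series, models, weights, vote_threshold=0.5):
    """Claims 7 and 8: weight each model's sub-result and combine them."""
    weights = np.asarray(weights, dtype=float)
    sub_results = np.array([model(series) for model in models], dtype=float)
    score = weights @ sub_results / weights.sum()   # weighted vote per data point
    return score >= vote_threshold


def detect(current, history=None, threshold=0.5):
    """Claim 1: judge periodicity, select the model family, return anomaly flags."""
    if is_periodic(current, threshold):
        if history is None:
            raise ValueError("history is required when the data is periodic")
        return detect_periodic(history, current)
    return detect_non_periodic(current, [zscore_model, iqr_model], [0.5, 0.5])
```

In this sketch, `detect(current, history)` returns one Boolean flag per data point. With equal weights and a vote threshold of 0.5, a point is flagged when either non-periodic model flags it; a higher threshold would require the models to agree. One limitation of the assumed confidence measure is that a very large spike spreads power across the whole spectrum, so a periodic window containing such a spike may fall below the threshold and be routed to the non-periodic models instead.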
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110116094.9A CN112783744A (en) | 2021-01-28 | 2021-01-28 | Data detection method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110116094.9A CN112783744A (en) | 2021-01-28 | 2021-01-28 | Data detection method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112783744A (en) | 2021-05-11 |
Family
ID=75759231
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110116094.9A Pending CN112783744A (en) | 2021-01-28 | 2021-01-28 | Data detection method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112783744A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009199533A (en) * | 2008-02-25 | 2009-09-03 | Nec Corp | Operation management device, operation management system, information processing method, and operation management program |
CN110750429A (en) * | 2019-09-06 | 2020-02-04 | 平安科技(深圳)有限公司 | Abnormity detection method, device, equipment and storage medium of operation and maintenance management system |
CN110851338A (en) * | 2019-09-23 | 2020-02-28 | 平安科技(深圳)有限公司 | Abnormality detection method, electronic device, and storage medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113536066A (en) * | 2021-07-16 | 2021-10-22 | 全球能源互联网研究院有限公司 | Data anomaly detection algorithm determination method and device and computer equipment |
CN115495274A (en) * | 2022-11-15 | 2022-12-20 | 阿里云计算有限公司 | Exception handling method based on time sequence data, network equipment and readable storage medium |
CN115495274B (en) * | 2022-11-15 | 2023-03-07 | 阿里云计算有限公司 | Exception handling method based on time sequence data, network equipment and readable storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN113792453B (en) | Digital twinning-based partial discharge monitoring system, method and device | |
D'Isanto et al. | An analysis of feature relevance in the classification of astronomical transients with machine learning methods | |
JP6783443B2 (en) | Information processing equipment, information processing systems, information processing methods, programs, and recording media | |
CN111046286A (en) | Object recommendation method and device and computer storage medium | |
CN114297036B (en) | Data processing method, device, electronic equipment and readable storage medium | |
CN112561082A (en) | Method, device, equipment and storage medium for generating model | |
JP2001502831A (en) | A method for classifying the statistical dependence of measurable time series | |
CN112783744A (en) | Data detection method, device, equipment and storage medium | |
CN111105786B (en) | Multi-sampling-rate voice recognition method, device, system and storage medium | |
CN113961765B (en) | Searching method, searching device, searching equipment and searching medium based on neural network model | |
CN118094118B (en) | Data set quality evaluation method, system, electronic equipment and storage medium | |
CN109002810A (en) | Model evaluation method, Radar Signal Recognition method and corresponding intrument | |
CN113986674A (en) | Method and device for detecting abnormity of time sequence data and electronic equipment | |
CN116451139B (en) | Live broadcast data rapid analysis method based on artificial intelligence | |
JP2019105871A (en) | Abnormality candidate extraction program, abnormality candidate extraction method and abnormality candidate extraction apparatus | |
CN112785067A (en) | Data prediction method and device, equipment and storage medium | |
CN117041017A (en) | Intelligent operation and maintenance management method and system for data center | |
JP2021043477A (en) | Demand forecasting device, demand forecasting method, and program | |
AU2021251463B2 (en) | Generating performance predictions with uncertainty intervals | |
CN114266601A (en) | Marketing strategy determination method and device, terminal equipment and storage medium | |
CN113836240A (en) | Time sequence data classification method and device, terminal equipment and storage medium | |
CN117056709A (en) | Training method and device of time sequence prediction model, storage medium and electronic equipment | |
CN113934585A (en) | Disk failure prediction method and device, equipment and storage medium | |
KR101181326B1 (en) | System and Method for distinguishing chaff echoes | |
Ji et al. | Active region-based flare forecasting with sliding window multivariate time series forest classifiers |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210511 |