CN111726341A - Data detection method and device, electronic equipment and storage medium

Info

Publication number
CN111726341A
Authority
CN
China
Prior art keywords
data
time period
unit time
abnormal
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010491155.5A
Other languages
Chinese (zh)
Other versions
CN111726341B (en)
Inventor
庄伟�
史忠伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuba Co Ltd
Original Assignee
Wuba Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuba Co Ltd
Priority to CN202010491155.5A
Publication of CN111726341A
Application granted
Publication of CN111726341B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Debugging And Monitoring (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a data detection method, a data detection device, electronic equipment and a storage medium, and relates to the technical field of computers. The method comprises the following steps: for any unit time period, obtaining predicted data for the unit time period according to historical time data corresponding to the unit time period, wherein the historical time data is real data within a preset time period before the unit time period; obtaining candidate abnormal data based on the real data and the predicted data in each unit time period; and obtaining final abnormal data from the candidate abnormal data according to a preset filtering rule. The method has the beneficial effect of improving the accuracy of the data detection result.

Description

Data detection method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data detection method and apparatus, an electronic device, and a storage medium.
Background
In every industry, field, and channel, a complete set of risk control measures is needed to keep things developing in a good direction and to reduce losses. Risk control systems have emerged to solve the various problems that arise in business. While a risk control system is running, a monitoring system sends alarm information to the relevant personnel according to alarm rules, so that technicians can step in promptly and prevent accidents. Early warning is therefore an essential function of a risk control system.
In the related art, a risk control system generally performs monitoring and early warning based on a rule engine. However, the strategies in a rule engine can hardly hit every type of abnormal data, so the accuracy of the data detection results is insufficient; moreover, the strategies rely on hard thresholds to raise alarms, so they generalize poorly.
Disclosure of Invention
The embodiments of the invention provide a data detection method, a data detection device, electronic equipment and a storage medium, which aim to solve the problems of insufficient accuracy and insufficient generalization of data detection results in related technologies such as existing risk control systems.
To solve the above technical problem, the invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides a data detection method, including:
for any unit time period, obtaining predicted data for the unit time period according to historical time data corresponding to the unit time period, wherein the historical time data is real data within a preset time period before the unit time period;
obtaining candidate abnormal data based on the real data and the predicted data in each unit time period;
and obtaining final abnormal data from the candidate abnormal data according to a preset filtering rule.
Optionally, the step of obtaining candidate abnormal data based on the real data and the predicted data in each unit time period includes:
performing anomaly detection on the real data in each unit time period based on the real data and the predicted data in each unit time period to obtain initial abnormal data;
and obtaining the initial abnormal data that meets a preset threshold condition as the candidate abnormal data.
Optionally, the step of performing anomaly detection on the real data in each unit time period based on the real data and the predicted data in each unit time period to obtain initial abnormal data includes:
based on the real data and the predicted data in each unit time period, performing anomaly detection on the real data in each unit time period through at least one of the following comparison strategies to obtain initial abnormal data;
wherein the comparison strategies comprise:
comparing the development trend of the data difference in the unit time period with the development trend of the data difference in a preset time period before the unit time period;
comparing the data difference in the unit time period with the peak value of the data difference in a preset time period before the unit time period;
comparing the data difference in the unit time period with the lowest value of the data difference in a preset time period before the unit time period;
comparing the average variance of the data differences in the unit time period and the N most recent unit time periods before it with the average variance of the data differences in a preset time period before the unit time period, wherein N is a positive integer;
comparing the degree of divergence of the data differences in the unit time period and the M most recent unit time periods before it with the degree of divergence of the data differences in a preset time period before the unit time period, wherein M is a positive integer;
wherein the data difference is the difference between the real data and the predicted data in the same unit time period.
Optionally, the step of obtaining the initial abnormal data that meets a preset threshold condition as the candidate abnormal data includes:
for each piece of initial abnormal data, sorting the initial abnormal data and its corresponding historical time data in chronological order to obtain a data sequence;
in response to the data sequence following a normal distribution, obtaining the candidate abnormal data according to a threshold condition set for the initial abnormal data based on the Pauta criterion (3σ rule);
and in response to the data sequence not following a normal distribution, obtaining the candidate abnormal data according to a threshold condition set based on the local outlier factor (LOF) algorithm.
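The two threshold branches above can be sketched as follows. This is only an illustration under stated assumptions: the normality decision is passed in as a flag (`is_normal`) because the text does not say which normality test is used, the names `exceeds_pauta`, `lof_scores`, `is_candidate` and the cut-off `lof_threshold=2.0` are hypothetical, and the LOF routine is a plain one-dimensional version without tie handling.

```python
from statistics import mean, pstdev


def exceeds_pauta(sequence, index, k=3.0):
    """Pauta (3-sigma) check: is sequence[index] more than k sigma from the mean?"""
    mu, sigma = mean(sequence), pstdev(sequence)
    return sigma > 0 and abs(sequence[index] - mu) > k * sigma


def lof_scores(values, k=2):
    """Plain local outlier factor for 1-D values (no tie handling; needs len > k)."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    # k nearest neighbours of each point, excluding the point itself
    neigh = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
             for i in range(n)]
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]

    def lrd(i):
        # local reachability density: inverse of the mean reachability distance
        reach = mean(max(k_dist[j], dist[i][j]) for j in neigh[i])
        return 1.0 / max(reach, 1e-12)  # guard against duplicate points

    density = [lrd(i) for i in range(n)]
    return [mean(density[j] for j in neigh[i]) / density[i] for i in range(n)]


def is_candidate(history, value, is_normal, lof_threshold=2.0):
    """Decide whether one piece of initial abnormal data becomes a candidate."""
    sequence = list(history) + [value]  # chronological order, newest last
    last = len(sequence) - 1
    if is_normal:  # assumption: the normality test is done by the caller
        return exceeds_pauta(sequence, last)
    return lof_scores(sequence)[last] > lof_threshold
```

Note that the 3σ branch needs a reasonably long sequence: with fewer than about a dozen points, no single value can lie more than three population standard deviations from the mean.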
Optionally, the step of obtaining the predicted data for each unit time period according to the historical time data includes:
for each unit time period, obtaining time series features of the historical time data corresponding to the unit time period;
generating a time series model corresponding to the unit time period according to the time series features;
obtaining the predicted data for the unit time period based on the time series model;
wherein the time series features comprise at least one of a period feature, a trend feature, a seasonal feature, an autocorrelation feature, a skewness feature, a kurtosis feature, and a non-linear feature characterizing the degree of divergence.
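As an illustration of what a few of these features look like numerically, the sketch below computes a trend slope, lag autocorrelation, skewness and kurtosis from a list of historical values. The function name `series_features` and the specific formulas (population moments, least-squares slope against the time index) are assumptions chosen for the example; the text does not prescribe how each feature is calculated.

```python
from statistics import mean, pstdev


def series_features(values, lag=1):
    """Compute a few simple time series features from historical values."""
    n = len(values)
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("constant series: features are undefined")
    centered = [x - mu for x in values]
    sum_sq = sum(c * c for c in centered)

    # Standardised third and fourth moments
    skewness = sum(c ** 3 for c in centered) / (n * sigma ** 3)
    kurtosis = sum(c ** 4 for c in centered) / (n * sigma ** 4)

    # Autocorrelation at the given lag
    autocorr = sum(centered[i] * centered[i - lag] for i in range(lag, n)) / sum_sq

    # Least-squares slope of the values against the time index 0..n-1
    t_mean = (n - 1) / 2
    slope = (sum((i - t_mean) * c for i, c in enumerate(centered))
             / sum((i - t_mean) ** 2 for i in range(n)))

    return {"trend_slope": slope, "autocorr": autocorr,
            "skewness": skewness, "kurtosis": kurtosis}
```

Features such as these could then guide the choice or parameterization of the time series model; that mapping is left open here, as it is in the text.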
In a second aspect, an embodiment of the present invention further provides a data detection apparatus, including:
a data prediction module, configured to obtain, for any unit time period, predicted data for the unit time period according to historical time data corresponding to the unit time period, wherein the historical time data is real data within a preset time period before the unit time period;
a candidate data acquisition module, configured to obtain candidate abnormal data based on the real data and the predicted data in each unit time period;
and an abnormal data acquisition module, configured to obtain final abnormal data from the candidate abnormal data according to a preset filtering rule.
Optionally, the candidate data acquisition module includes:
an initial data acquisition submodule, configured to perform anomaly detection on the real data in each unit time period based on the real data and the predicted data in each unit time period to obtain initial abnormal data;
and a candidate data acquisition submodule, configured to obtain the initial abnormal data that meets a preset threshold condition as the candidate abnormal data.
Optionally, the initial data acquisition submodule is further configured to perform, based on the real data and the predicted data in each unit time period, anomaly detection on the real data in each unit time period through at least one of the following comparison strategies, so as to obtain initial abnormal data;
wherein the comparison strategies comprise:
comparing the development trend of the data difference in the unit time period with the development trend of the data difference in a preset time period before the unit time period;
comparing the data difference in the unit time period with the peak value of the data difference in a preset time period before the unit time period;
comparing the data difference in the unit time period with the lowest value of the data difference in a preset time period before the unit time period;
comparing the average variance of the data differences in the unit time period and the N most recent unit time periods before it with the average variance of the data differences in a preset time period before the unit time period, wherein N is a positive integer;
comparing the degree of divergence of the data differences in the unit time period and the M most recent unit time periods before it with the degree of divergence of the data differences in a preset time period before the unit time period, wherein M is a positive integer;
wherein the data difference is the difference between the real data and the predicted data in the same unit time period.
Optionally, the candidate data acquisition submodule is specifically configured to:
for each piece of initial abnormal data, sort the initial abnormal data and its corresponding historical time data in chronological order to obtain a data sequence;
in response to the data sequence following a normal distribution, obtain the candidate abnormal data according to a threshold condition set for the initial abnormal data based on the Pauta criterion (3σ rule);
and in response to the data sequence not following a normal distribution, obtain the candidate abnormal data according to a threshold condition set based on the local outlier factor (LOF) algorithm.
Optionally, the data prediction module includes:
a data feature acquisition submodule, configured to obtain, for each unit time period, time series features of the historical time data corresponding to the unit time period;
a time series model construction submodule, configured to generate a time series model corresponding to the unit time period according to the time series features;
a data prediction submodule, configured to obtain the predicted data for the unit time period based on the time series model;
wherein the time series features comprise at least one of a period feature, a trend feature, a seasonal feature, an autocorrelation feature, a skewness feature, a kurtosis feature, and a non-linear feature characterizing the degree of divergence.
In a third aspect, an embodiment of the present invention additionally provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the data detection method as described above.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the steps of the data detection method as described above.
In the embodiment of the invention, the data in each unit time period is predicted from the historical time data corresponding to that unit time period, candidate abnormal data is selected based on the real data and the predicted data, and the candidate abnormal data is then subjected to personalized intelligent detection, thereby improving the accuracy of the data detection result.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without inventive labor.
FIG. 1 is a flow chart of the steps of a method of data detection in an embodiment of the invention;
FIG. 2 is a flow chart of steps of another method of data detection in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data detection apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another data detection apparatus in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a hardware structure of an electronic device in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
The data detection method provided by the embodiment of the invention is described in detail.
Referring to fig. 1, a flowchart illustrating steps of a data detection method according to an embodiment of the present invention is shown.
Step 110, for any unit time period, obtaining predicted data in the unit time period according to historical time data corresponding to the unit time period, wherein the historical time data is real data in a preset time period before the unit time period.
Step 120, obtaining candidate abnormal data based on the real data and the predicted data in each unit time period.
Step 130, obtaining final abnormal data from the candidate abnormal data according to a preset filtering rule.
A risk policy monitoring system typically tracks many risk control indexes, but because risk policies have different periods and trends in different scenarios and at different times, alarm policies cannot practically be configured by hand for each case. For example, for a certain network platform, risk policy monitoring covers risk control indexes such as the interception volume of each service line of the main site (e.g., yellow pages, real estate, recruitment, used cars) under different policies, the interception volume of illegal posts, the interception volume counted by violation type, and the interception volume of harassing chat messages. It is difficult for a person to judge accurately and quickly whether each index is abnormal, and manual judgment is highly subjective and unstable, which easily affects the accuracy of the result. Therefore, the embodiment of the invention provides a model that predicts in real time based on historical data and intelligently identifies abnormal data in combination with real data, so that data anomalies can be detected at an early stage. This plays a key role in keeping monitored data consistent and protecting enterprises from malicious attacks.
Specifically, the predicted data for each unit time period may be obtained from the historical time data. The length of the unit time period may be customized according to requirements, and the embodiment of the present invention is not limited in this respect. For example, the unit time period may be set to one day, in which case each unit time period is one day. For each unit time period, the historical time data may be any data generated before that unit time period. In general, however, the closer data is to the unit time period, the more strongly it correlates with the data in that unit time period. Therefore, in the embodiment of the present invention, for each unit time period, the real data within a preset time period before it may be taken as its historical time data and used to predict the data in that unit time period. The preset time period may be set in advance according to requirements, and the embodiment of the present invention is not limited in this respect. For example, the preset time period may be set to 6 months, one week, or the like.
In addition, in practical applications, data can be divided into multiple types, and the type of data to be predicted may differ according to the detection requirements of different application scenarios. However, the data dimensions of the historical time data should either be the same as the data dimensions of the predicted data to be produced, or include them.
For example, assume that the daily access volume is being detected. For any unit time period (for example, any day), the predicted data may be a predicted value of the access volume in that unit time period, and the corresponding historical time data may be the real daily access volume within the preset time period before that unit time period.
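To make the daily access volume example concrete, the sketch below predicts the next day's value from historical daily values. The averaging of same-weekday values is a stand-in chosen for illustration only; the text leaves the actual prediction model open, and the name `predict_next` and the parameters `season` and `cycles` are hypothetical.

```python
from statistics import mean


def predict_next(history, season=7, cycles=4):
    """Predict the next value as the average of the values that sit at the
    same position in up to `cycles` earlier seasons (e.g. the same weekday)."""
    if len(history) < season:
        # Not enough data for one full season: fall back to the plain mean.
        return mean(history)
    # Walking backwards from the newest value, pick every `season`-th entry,
    # starting with the value exactly one season before the predicted period.
    same_phase = history[::-1][season - 1::season][:cycles]
    return mean(same_phase)
```

With `history` holding the real access volume for each day of the preset time period (oldest first), `predict_next(history)` returns the predicted access volume for the following day.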
For any unit time period, once that unit time period has arrived, the real data in it can be acquired. After both the predicted data and the real data are available, candidate abnormal data can be obtained based on the real data and the predicted data in the corresponding unit time period. The conditions that candidate abnormal data must meet can be customized according to requirements, and the embodiment of the present invention is not limited in this respect.
For example, the real data and the predicted data in the same unit time period may be compared, and if the difference between the real data and the predicted data exceeds a preset threshold, the real data in the corresponding unit time period may be determined as the candidate abnormal data.
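A minimal sketch of this difference-threshold check, assuming real and predicted values are stored in dictionaries keyed by unit time period (the function name `flag_by_difference` and the data layout are illustrative, not taken from the text):

```python
def flag_by_difference(real, predicted, threshold):
    """Return the unit time periods whose |real - predicted| exceeds threshold."""
    return [
        period
        for period in real
        if period in predicted and abs(real[period] - predicted[period]) > threshold
    ]


# Example: daily access volume, with a hypothetical threshold of 200 visits
real = {"2020-06-01": 1000, "2020-06-02": 1800}
predicted = {"2020-06-01": 980, "2020-06-02": 1020}
flagged = flag_by_difference(real, predicted, threshold=200)
```

Here only the second day is flagged, because its real value deviates from the prediction by 780 while the first day deviates by only 20.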
Alternatively, with reference to the historical time data, candidate abnormal data may be obtained by: comparing the development trend of the predicted data of the unit time period relative to the historical time data with the normal development trend of the real data relative to the historical time data; comparing the predicted data of the unit time period with the peak value of the historical time data; comparing the predicted data of the unit time period with the lowest value of the historical time data; comparing the fluctuation variance of the predicted data over the unit time period and the N most recent unit time periods before it with the average variance of the corresponding historical time data; comparing the degree of divergence of the predicted data over the unit time period and the N most recent unit time periods before it with the average degree of divergence of the corresponding historical time data; and so on.
Alternatively, with reference to the historical time data, whether each piece of real data is abnormal may be judged according to the Pauta criterion (3σ rule) or the local outlier factor (LOF) algorithm on the basis of the real data and the predicted data in each unit time period, so as to obtain candidate abnormal data; and so on.
Of course, in the embodiment of the present invention, the condition that the candidate abnormal data needs to satisfy may be set according to a requirement, and the embodiment of the present invention is not limited thereto.
However, the above-mentioned manner of obtaining candidate abnormal data is mainly based on comparing the real data with the predicted data, or based on comparing the relevant attribute parameters obtained from the real data and the predicted data, and in practical application, some normal fluctuation may occur in the data in some specific time periods, but it is difficult to predict the fluctuation generated by the data when prediction is performed based on the historical time data. For example, for the website access amount, if the website advertises in a certain time period or performs a preferential activity, the website access amount of the corresponding website in the corresponding time period is obviously increased, but at this time, if the website access amount in the corresponding time period is predicted based on historical time data, it is difficult to accurately predict the increase condition, so that the predicted data in the corresponding time period is obviously lower than the real data, and thus the real data in the corresponding time period is misjudged as candidate abnormal data, which affects the accuracy of the data detection result.
Therefore, in the embodiment of the present invention, after obtaining the candidate abnormal data, to avoid data misjudgment and improve the accuracy of the data detection result, the final abnormal data may be obtained from the candidate abnormal data according to a preset filtering rule. The filtering rule may be set by a user according to a specific application scenario, and the embodiment of the present invention is not limited thereto. For example, the filtering rule may be set according to a specific time such as holidays, operation activities set under different businesses, and the like.
Also, for each candidate abnormal data, if it satisfies a preset filtering rule, it may be considered as not abnormal data and filtered out, and if it does not satisfy the preset filtering rule, it may be considered as abnormal data.
For example, it is assumed that the filtering rule is set according to a specific holiday, if the unit time period corresponding to the candidate abnormal data is a preset holiday, the candidate abnormal data is determined to be normal data, otherwise, the corresponding candidate abnormal data can be determined to be abnormal data.
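The holiday rule in this example can be sketched as a simple set lookup. The calendar `EXEMPT_DAYS` and the function name `apply_filter_rule` are hypothetical; a real deployment would presumably load holidays and planned operation activities from configuration.

```python
from datetime import date

# Hypothetical exemption calendar: holidays and planned promotion days.
EXEMPT_DAYS = {date(2020, 10, 1), date(2020, 11, 11)}


def apply_filter_rule(candidates, exempt_days=EXEMPT_DAYS):
    """Split candidate periods into (final_abnormal, filtered_out)."""
    final_abnormal = [d for d in candidates if d not in exempt_days]
    filtered_out = [d for d in candidates if d in exempt_days]
    return final_abnormal, filtered_out
```

A candidate that falls on an exempt day satisfies the filtering rule and is treated as normal; every other candidate is kept as final abnormal data.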
Referring to fig. 2, in the embodiment of the present invention, the step 120 may further include:
step 121, performing anomaly detection on the real data in each unit time period based on the real data and the predicted data in each unit time period to obtain initial anomaly data;
and step 122, acquiring initial abnormal data meeting a preset threshold condition as the candidate abnormal data.
In the embodiment of the present invention, in order to improve the efficiency of the data detection process, abnormality detection may be performed on the real data in each unit time period in advance based on the real data and the predicted data in each unit time period, so as to obtain initial abnormal data, and further detect the initial abnormal data, so as to obtain initial abnormal data meeting a preset threshold condition, as the candidate abnormal data.
During anomaly detection, the conditions that abnormal data must meet can be customized according to requirements, and whether the real data in each unit time period is initial abnormal data can be determined from the predicted data and the real data in that unit time period.
For example, in the abnormality detection process, the real data and the predicted data in the same unit time period may be compared as described above, and if the difference between the real data and the predicted data exceeds a preset threshold, the real data in the corresponding unit time period may be determined as the initial abnormal data.
Alternatively, with reference to the historical time data, initial abnormal data may be obtained by: comparing the development trend of the predicted data of the unit time period relative to the historical time data with the normal development trend of the real data relative to the historical time data; comparing the predicted data of the unit time period with the peak value of the historical time data; comparing the predicted data of the unit time period with the lowest value of the historical time data; comparing the fluctuation variance of the predicted data over the unit time period and the N most recent unit time periods before it with the average variance of the corresponding historical time data; comparing the degree of divergence of the predicted data over the unit time period and the N most recent unit time periods before it with the average degree of divergence of the corresponding historical time data; and so on.
For example, if the trend of the predicted data relative to the historical time data of a certain unit time period is opposite to the normal trend of the real data relative to the historical time data, the real data of the corresponding unit time period can be considered as the initial abnormal data; or, if the predicted data of a certain unit time period is higher than the peak value in the historical time data, or the predicted data of the corresponding unit time period is lower than the lowest value in the historical time data, the real data of the corresponding unit time period can be determined as the initial abnormal data; or, if the difference between the fluctuation variance of the predicted data in a certain unit time period and the latest N unit time periods before the unit time period and the average variance of the historical time data corresponding to the unit time period exceeds a preset variance threshold, and/or the difference between the divergence degree of the predicted data in the corresponding unit time period and the latest N unit time periods before the unit time period and the average divergence degree of the historical time data corresponding to the unit time period exceeds a preset divergence degree threshold, the real data of the corresponding unit time period can be determined as the initial abnormal data; and so on.
In addition, in the embodiment of the present invention, in order to conveniently acquire and record the initial abnormal data, an abnormal database may be pre-constructed to record the initial abnormal data obtained through the abnormal detection.
After the initial abnormal data is obtained, the initial abnormal data that meets a preset threshold condition may be further selected from it as the candidate abnormal data. Note that the initial abnormal data is real data rather than predicted data, and the predicted data is itself derived from the historical time data; in other words, this step further examines data that was actually generated, so that the final abnormal data can eventually be obtained.
The threshold condition may be set by a user according to a requirement and a specific application scenario, and the embodiment of the present invention is not limited.
For example, a threshold condition may be set based on the degree of deviation of each piece of initial abnormal data from its corresponding predicted data, such that the L pieces of initial abnormal data with the greatest deviation are taken as the candidate abnormal data, and so on. The specific value of L may be customized according to requirements, and the embodiment of the present invention is not limited in this respect.
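A minimal sketch of this top-L selection, assuming the deviation of each piece of initial abnormal data is measured as the absolute gap between its real value and the predicted value for the same unit time period (the function name `top_l_by_deviation` and the dictionary layout are illustrative assumptions):

```python
def top_l_by_deviation(initial, predicted, l):
    """Return the l unit time periods with the largest |real - predicted|.

    `initial` maps each flagged unit time period to its real value;
    `predicted` maps unit time periods to predicted values.
    """
    ranked = sorted(
        initial,
        key=lambda period: abs(initial[period] - predicted[period]),
        reverse=True,
    )
    return ranked[:l]


# Example with three flagged days and L = 2
initial = {"d1": 150, "d2": 400, "d3": 90}
predicted = {"d1": 100, "d2": 100, "d3": 100}
selected = top_l_by_deviation(initial, predicted, l=2)
```

The deviations here are 50, 300 and 10, so the two days with the largest deviation are selected in descending order.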
Optionally, in an embodiment of the present invention, the step 121 may further include: based on the real data and the predicted data in each unit time period, carrying out anomaly detection on the real data in each unit time period through at least one of the following comparison strategies to obtain initial anomaly data;
wherein the comparison strategy comprises:
comparing the development trend of the data difference value in the unit time period with the development trend of the data difference value in a preset time period before the unit time period, wherein the data difference value is the difference value between real data and predicted data in the same unit time;
the data difference value in the unit time period is compared with the peak value of the data difference value in a preset time period before the unit time period;
the data difference value in the unit time period is compared with the lowest value of the data difference value in a preset time period before the unit time period;
the average variance of the data difference values in the unit time period and the nearest N unit time periods before the unit time period is compared with the average variance of the data difference values in a preset time period before the unit time period, wherein N is a positive integer;
and the divergence degree of the data difference values in the unit time period and the M nearest unit time periods before the unit time period is compared with the divergence degree of the data difference values in the preset time period before the unit time period, wherein M is a positive integer.
The comparison strategies may specifically be applied as follows:
for the development trend, if the development trend of the data difference value in the unit time period is the same as the development trend of the data difference value in the preset time period before the unit time period, the real data in the corresponding unit time period is considered to pass the comparison strategy; otherwise, it is considered not to pass the comparison strategy;
for the peak value, if the data difference value in the unit time period is not higher than the peak value of the data difference values in the preset time period before the unit time period, the real data in the corresponding unit time period is considered to pass the comparison strategy; otherwise, it is considered not to pass the comparison strategy;
for the lowest value, if the data difference value in the unit time period is not lower than the lowest value of the data difference values in the preset time period before the unit time period, the real data in the corresponding unit time period is considered to pass the comparison strategy; otherwise, it is considered not to pass the comparison strategy;
for the average variance, if the difference between the average variance of the data difference values in a certain unit time period and the N nearest unit time periods before it and the average variance of the data difference values in the preset time period before the unit time period is within a preset variance threshold, the real data in the corresponding unit time period is considered to pass the comparison strategy; otherwise, it is considered not to pass the comparison strategy;
for the divergence degree, if the difference between the divergence degree of the data difference values in a certain unit time period and the M nearest unit time periods before it and the divergence degree of the data difference values in the preset time period before the unit time period is within a preset divergence degree threshold, the real data in the corresponding unit time period is considered to pass the comparison strategy; otherwise, it is considered not to pass the comparison strategy.
Moreover, if anomaly detection is performed on the real data in each unit time period through a plurality of comparison strategies, the real data in a certain unit time period is determined not to be initial abnormal data only if it passes all of the comparison strategies used for the anomaly detection; if the real data in a certain unit time period fails at least one of those comparison strategies, it can be determined to be initial abnormal data.
The specific values of N and M may be set by the user as required, and the embodiment of the present invention is not limited thereto. Generally speaking, the value of M needs to ensure that the total duration of any unit time period and the M unit time periods immediately before it is less than the preset time period; correspondingly, the value of N needs to ensure that the total duration of any unit time period and the N unit time periods immediately before it is less than the preset time period.
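The combination rule above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names `failed_strategies` and `is_initial_anomaly` are hypothetical, the "average variance" is taken to be the population variance of each window, the divergence degree is taken to be the range, and the trend strategy is omitted because the text does not fix how a trend is measured.

```python
from statistics import pvariance

def failed_strategies(history, recent, var_threshold, disp_threshold):
    """Return the names of the comparison strategies that are NOT passed.

    history: data difference values (real minus predicted) over the preset
             time period before the unit time period.
    recent:  difference values of the N most recent unit time periods,
             followed by the current unit time period as the last element.
    Thresholds and window contents are illustrative assumptions.
    """
    current = recent[-1]
    failed = []
    # Peak strategy: the current difference must not exceed the historical peak.
    if current > max(history):
        failed.append("peak")
    # Lowest-value strategy: it must not fall below the historical minimum.
    if current < min(history):
        failed.append("lowest")
    # Variance strategy: the two variances must differ by at most the threshold.
    if abs(pvariance(recent) - pvariance(history)) > var_threshold:
        failed.append("variance")
    # Divergence strategy, using the range as the measure (an assumption).
    recent_range = max(recent) - min(recent)
    history_range = max(history) - min(history)
    if abs(recent_range - history_range) > disp_threshold:
        failed.append("dispersion")
    return failed

def is_initial_anomaly(history, recent, var_threshold, disp_threshold):
    # Failing at least one strategy marks the real data as initial abnormal data.
    return bool(failed_strategies(history, recent, var_threshold, disp_threshold))
```

Returning the list of failed strategies, rather than only a boolean, makes it easy to record in the anomaly database why a given unit time period was flagged.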
The divergence degree may also be referred to as a measure of dispersion. In the embodiment of the present invention, the divergence degree may be characterized in any available manner, and the embodiment of the present invention is not limited thereto. For example, the divergence degree may be obtained through the interquartile range (IQR). The interquartile range, also known as the midspread, the middle 50%, or the H-spread, is equal to the difference between the 75th percentile and the 25th percentile, i.e., IQR = Q3 - Q1. For a sequence of length 2n or 2n + 1 sorted from small to large, Q1 is the median of the n smallest numbers, i.e., Q1 lies at the 25% position of the sorted sequence, and Q3 is the median of the n largest numbers, i.e., Q3 lies at the 75% position of the sorted sequence. The IQR reflects how concentrated the data is: the smaller the IQR, the more concentrated the data is near the center; the larger the IQR, the more the data spreads toward both ends. Alternatively, the divergence degree can also be characterized by the quartile coefficient of dispersion, which is defined as (Q3 - Q1)/(Q3 + Q1). Alternatively, the divergence degree can also be characterized by the range: in statistics, the range of a set is the difference between its maximum value and its minimum value; the larger the range, the more dispersed the data, and the smaller the range, the more concentrated the data; and so on.
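As a minimal sketch of the three measures just described, the following Python code computes the quartiles as the medians of the n smallest and n largest values, following the definition given above. The function names are hypothetical and chosen only for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) for a sequence of length 2n or 2n + 1.

    Q1 is the median of the n smallest values and Q3 is the median of the
    n largest values; for odd lengths the middle element is excluded.
    """
    if len(values) < 2:
        raise ValueError("at least two values are required")
    ordered = sorted(values)
    n = len(ordered) // 2
    return median(ordered[:n]), median(ordered[-n:])

def interquartile_range(values):
    q1, q3 = quartiles(values)
    return q3 - q1

def quartile_coefficient_of_dispersion(values):
    # Undefined when Q3 + Q1 == 0; the caller is assumed to avoid that case.
    q1, q3 = quartiles(values)
    return (q3 - q1) / (q3 + q1)

def value_range(values):
    return max(values) - min(values)
```

Note that library percentile functions may interpolate differently, so their quartiles can differ slightly from the median-of-halves definition used here.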
Optionally, in an embodiment of the present invention, the step 122 further includes:
step 1221, for each initial abnormal data, sorting historical time data corresponding to the initial abnormal data and the initial abnormal data according to a time sequence to obtain a data sequence;
step 1222, in response to the data sequence satisfying a normal distribution, obtaining the candidate abnormal data according to a threshold condition set for the initial abnormal data based on the Pauta criterion;
step 1223, in response to the data sequence not satisfying a normal distribution, obtaining the candidate abnormal data according to a threshold condition set based on a local outlier factor algorithm.
The initial abnormal data is thus screened again on the way to obtaining the final abnormal data. In the embodiment of the present invention, in order to make the finally screened abnormal data as accurate as possible, a different screening method may be assigned to each initial abnormal data. Meanwhile, in order to improve the detection efficiency of abnormal data, the distribution form of each initial abnormal data can be obtained from the initial abnormal data and its corresponding historical time data, and the threshold condition is set by a different method depending on the distribution form that the data satisfies.
Specifically, for each initial abnormal data, the historical time data corresponding to the initial abnormal data and the initial abnormal data itself may be sorted in time order to obtain a data sequence, and it is then detected whether the data sequence satisfies a normal distribution. For initial abnormal data whose data sequence satisfies a normal distribution, a threshold condition may be set based on the Pauta criterion to determine whether the initial abnormal data is candidate abnormal data; for initial abnormal data whose data sequence does not satisfy a normal distribution, a threshold condition may be set based on a local outlier factor algorithm to determine whether the initial abnormal data is candidate abnormal data.
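The text does not say how normality is detected. As one possible illustration only, the sketch below uses the Jarque-Bera statistic, whose asymptotic p-value under a chi-square distribution with two degrees of freedom is exp(-JB/2), to choose between the two threshold-setting methods. The choice of test, the significance level `alpha`, and the function names are all assumptions, and the asymptotic p-value is only approximate for short sequences.

```python
import math
from statistics import NormalDist, mean

def jarque_bera_p_value(sequence):
    """Approximate p-value of the Jarque-Bera normality test."""
    n = len(sequence)
    mu = mean(sequence)
    m2 = sum((x - mu) ** 2 for x in sequence) / n
    if m2 == 0:
        raise ValueError("a constant sequence has no defined skewness")
    m3 = sum((x - mu) ** 3 for x in sequence) / n
    m4 = sum((x - mu) ** 4 for x in sequence) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    return math.exp(-jb / 2.0)

def choose_threshold_method(sequence, alpha=0.05):
    """Return "k_sigma" if normality is not rejected, otherwise "lof"."""
    return "k_sigma" if jarque_bera_p_value(sequence) > alpha else "lof"

# Illustrative inputs: evenly spaced normal quantiles, and a sequence
# dominated by a single spike.
bell_shaped = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
spiky = [1.0, 2.0] * 10 + [100.0]
```

Any other normality test could be substituted; only the dispatch between the two branches reflects the steps described above.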
The Pauta criterion may also be referred to as the 3σ criterion. Under the Pauta criterion, a group of measured data is assumed to contain only random errors; the standard deviation of the data is calculated, and an interval is determined according to a certain probability; any error falling outside this interval is considered a gross error rather than a random error, and the data containing such an error is rejected. Here σ can be understood as the standard deviation of the data sequence.
Assume that the arithmetic mean of the data in a certain data sequence is μ. If the value of the initial abnormal data corresponding to the data sequence lies within three standard deviations of the mean, i.e., within the interval (μ - 3σ, μ + 3σ), the initial abnormal data is considered a normal value; otherwise, the initial abnormal data can be considered a bad value containing a gross error, i.e., a candidate abnormal value. The threshold condition in this case may be the interval (μ - 3σ, μ + 3σ) described above.
In addition, in the embodiment of the present invention, the above 3σ may also be customized as Kσ, and accordingly the threshold condition may be (μ - Kσ, μ + Kσ), where the value of K may be set by the user, for example, 1, 2, 3, and so on. In this case, depending on the required sensitivity, it may be checked whether a given initial abnormal data lies within the threshold condition corresponding to K = 2 or K = 1. It should be noted that a corresponding data sequence may be obtained for each initial abnormal data, and if that data sequence satisfies a normal distribution, the threshold condition of the corresponding initial abnormal data may be set in the foregoing manner. Since σ and μ are not identical across different data sequences, the threshold conditions corresponding to different initial abnormal data may also differ, and the embodiment of the present invention is not limited thereto.
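A minimal Python sketch of the Kσ threshold condition follows. It assumes that σ is the population standard deviation of the data sequence and that the data sequence already contains the initial abnormal data, as described above; the function names are hypothetical.

```python
from statistics import mean, pstdev

def k_sigma_interval(sequence, k=3.0):
    """Return the open interval (mu - k*sigma, mu + k*sigma) for a data sequence."""
    mu = mean(sequence)
    sigma = pstdev(sequence)  # population standard deviation (an assumption)
    return mu - k * sigma, mu + k * sigma

def is_candidate_anomaly(sequence, value, k=3.0):
    """True if value falls outside the K-sigma interval of its data sequence."""
    low, high = k_sigma_interval(sequence, k)
    return not (low < value < high)

# Usage: 20 historical values followed by the initial abnormal data.
history = [10.0, 11.0, 9.0, 10.0, 12.0, 8.0, 10.0, 11.0, 9.0, 10.0,
           10.0, 11.0, 9.0, 10.0, 10.0, 9.0, 11.0, 10.0, 10.0, 10.0]
flagged_at_3_sigma = is_candidate_anomaly(history + [30.0], 30.0)
flagged_at_1_sigma = is_candidate_anomaly(history + [12.0], 12.0, k=1.0)
```

Because the suspect value itself contributes to μ and σ, a short sequence can never exceed three population standard deviations with a single point, so a smaller K or a longer history may be needed when few data points are available.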
For data sequences that do not satisfy the normal distribution, the threshold condition of the corresponding initial abnormal data cannot be set in the manner described above. In this case, a threshold condition may be set based on the Local Outlier Factor (LOF) algorithm. Specifically, for all the initial abnormal data whose data sequences do not satisfy the normal distribution, the distribution of the deviation measure may be obtained based on the LOF algorithm, the P regions with the lowest density are then obtained, and each low-density region is examined in turn; the initial abnormal data identified in this way is the candidate abnormal data. The value of P can be set by the user as required, and the embodiment of the present invention is not limited thereto. The threshold condition in this case may be the P low-density regions with the lowest density.
For example, the LOF algorithm may be used to obtain a local outlier factor (LOF value) for the initial abnormal data corresponding to each data sequence that does not satisfy the normal distribution, and whether the initial abnormal data is an outlier may be determined by checking whether its LOF value is close to 1. If the LOF value is much greater than 1, the data is regarded as an outlier, i.e., candidate abnormal data; if the LOF value is close to 1, the corresponding initial abnormal data is not regarded as candidate abnormal data; and so on.
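The LOF value mentioned above can be sketched in plain Python for one-dimensional data as follows. This is an illustrative implementation of the standard LOF definition, not the specific procedure of the embodiment; the neighborhood size `k` and the function name are assumptions, ties at the k-th distance are broken arbitrarily, and the sequence is assumed to contain fewer than k + 1 identical values so that no local density is infinite.

```python
def lof_scores(points, k=3):
    """Return the local outlier factor of every value in a 1-D sequence."""
    n = len(points)
    if n <= k:
        raise ValueError("need more than k points")

    def dist(i, j):
        return abs(points[i] - points[j])

    neighbors, k_distance = [], []
    for i in range(n):
        # Indices of the other points, ordered by distance to point i.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))
        neighbors.append(order[:k])
        k_distance.append(dist(i, order[k - 1]))

    def local_reachability_density(i):
        # Reachability distance of i from o: max(k-distance(o), d(i, o)).
        reach = [max(k_distance[o], dist(i, o)) for o in neighbors[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]
    # LOF: average ratio of the neighbors' density to the point's own density.
    return [sum(lrd[o] for o in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Usage: six tightly grouped values and one isolated value.
scores = lof_scores([1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 5.0], k=3)
```

In this example the isolated value receives a score far above 1, while the grouped values stay close to 1, which matches the decision rule described above. How far above 1 counts as "much greater" remains a user-chosen threshold.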
Referring to fig. 2, in an embodiment of the present invention, the step 110 may further include:
step 111, acquiring time series characteristics of historical time data corresponding to the unit time periods for each unit time period; wherein the time series feature comprises at least one of a period feature, a trend feature, a seasonal feature, an autocorrelation feature, a skewness feature, a kurtosis feature, and a non-linear feature.
Step 112, generating a time series model corresponding to the unit time period according to the time series characteristics.
Step 113, acquiring the prediction data in the unit time period based on the time series model.
In the embodiment of the invention, in order to improve the accuracy of the predicted data obtained by each prediction, for each unit time period, the time series characteristics of the historical time data corresponding to the corresponding unit time period can be obtained; wherein the time series feature comprises at least one of a period feature, a trend feature, a seasonal feature, an autocorrelation feature, a skewness feature, a kurtosis feature, and a non-linear feature. And then, according to the time series characteristics, generating a time series model corresponding to the corresponding unit time period. Thereby, the prediction data in the corresponding unit time period can be obtained based on the time series model.
In the embodiment of the present invention, the time series model may be generated by any available method, and the embodiment of the present invention is not limited thereto. For example, a tree model (e.g., a regression tree model) may be used with a regression algorithm to build a time series model for predicting the data of the next unit time period, and so on. Moreover, because the corresponding historical time data differs between unit time periods, the fitted time series model may also differ between unit time periods. In addition, in the embodiment of the present invention, the time series characteristics may be obtained in any available manner, and the embodiment of the present invention is not limited thereto.
For example, feature decomposition may be performed on the historical time data at an offline stage, then modeling and fitting may be performed on the time series features obtained through the decomposition to obtain a time series model generated through simulation, and in addition, the time series model obtained through simulation may be stored in a preset prediction model database to facilitate subsequent use.
The period feature can be any feature data that characterizes the periodicity of the data, such as whether the historical time data changes periodically, the period of that change, and the like; the trend feature can be any feature data that characterizes the trend of the data, such as the type of the trend, the period over which the trend changes, and the like; the seasonal feature can be any feature data that characterizes the seasonal variation of the data, such as how the data changes in different seasons; the autocorrelation feature can be a feature that characterizes the correlation between data of different time periods, and can be obtained, for example, through an autocorrelation function (ACF); the skewness feature can also be understood as a skewness coefficient, which is a characteristic number describing how far a distribution deviates from symmetry; when different units of measurement are used, the formula for the skewness coefficient may differ, and it can be set as required, which is not limited by the embodiment of the invention; the kurtosis feature can also be understood as a kurtosis coefficient, which is an index reflecting how sharp or flat the peak of a frequency distribution curve is and which measures how strongly the data is concentrated at the center; the non-linear feature can be any feature data expressed in a non-linear manner, such as feature data characterizing the divergence degree of the data, and so on.
The present invention extracts multi-dimensional time series features such as period, trend, seasonality, and autocorrelation from historical time data, fits a time series model using a machine learning method, predicts the data of the next unit time period with the model, flags candidate abnormal data using a unified anomaly detection algorithm, and then performs personalized intelligent detection and alarming on the candidate abnormal data, thereby ensuring the accuracy of the alarms.
Moreover, compared with monitoring and early warning based on a rule engine, intelligent risk-control early warning based on a prediction model and anomaly detection can intelligently raise alarms on the monitoring data of all service lines, can predict the future data values of each scenario, can dynamically adjust the system (for example, its risk strategy) according to the predicted values, can raise alarms on abnormal data according to service requirements, and so on, and is therefore applicable to more deployment scenarios.
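Three of the features listed above have standard moment-based definitions, which the following Python sketch illustrates. The formulas used here (population moments, non-excess kurtosis, and the usual sample autocorrelation) are common conventions chosen as assumptions, since the text leaves the exact formulas open; the function names are hypothetical.

```python
from statistics import mean

def central_moment(values, order):
    """Population central moment of the given order."""
    mu = mean(values)
    return sum((v - mu) ** order for v in values) / len(values)

def skewness(values):
    # Third standardized moment: 0 for a symmetric distribution.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    # Fourth standardized moment: 3 for a normal distribution.
    return central_moment(values, 4) / central_moment(values, 2) ** 2

def autocorrelation(values, lag):
    """Sample autocorrelation function (ACF) at the given positive lag."""
    mu = mean(values)
    numerator = sum((values[t] - mu) * (values[t + lag] - mu)
                    for t in range(len(values) - lag))
    denominator = sum((v - mu) ** 2 for v in values)
    return numerator / denominator

# Usage: a series that repeats every 4 unit time periods.
series = [0.0, 1.0, 0.0, -1.0] * 6
acf_at_period = autocorrelation(series, 4)
acf_at_half_period = autocorrelation(series, 2)
```

A strongly positive autocorrelation at some lag, as at lag 4 here, is one simple hint that the historical time data has a period of that length, which could feed the period feature.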
Referring to fig. 3, a schematic structural diagram of a data detection apparatus in an embodiment of the present invention is shown.
The data detection device of the embodiment of the invention comprises: a data prediction module 210, a candidate data acquisition module 220, and an anomalous data acquisition module 230.
The functions of the modules and the interaction relationship between the modules are described in detail below.
The data prediction module 210 is configured to, for any unit time period, obtain prediction data in the unit time period according to historical time data corresponding to the unit time period, where the historical time data is real data in a preset time period before the unit time period;
a candidate data obtaining module 220, configured to obtain candidate abnormal data based on the real data and the predicted data in each unit time period;
an abnormal data obtaining module 230, configured to obtain final abnormal data from the candidate abnormal data according to a preset filtering rule.
Referring to fig. 4, in the embodiment of the present invention, the candidate data obtaining module 220 may further include:
an initial data obtaining sub-module 221, configured to perform anomaly detection on the real data in each unit time period based on the real data and the predicted data in each unit time period, and obtain initial anomaly data;
the candidate data obtaining sub-module 222 is configured to obtain initial abnormal data that meets a preset threshold condition, as the candidate abnormal data.
Optionally, in an embodiment of the present invention, the initial data obtaining sub-module is further configured to perform, based on the real data and the predicted data in each unit time period, abnormality detection on the real data in each unit time period through at least one of the following comparison strategies, so as to obtain initial abnormal data;
wherein the comparison strategy comprises:
the development trend of the data difference value in the unit time period is compared with the development trend of the data difference value in a preset time period before the unit time period;
the data difference value in the unit time period is compared with the peak value of the data difference value in a preset time period before the unit time period;
the data difference value in the unit time period is compared with the lowest value of the data difference value in a preset time period before the unit time period;
the average variance of the data difference values in the unit time period and the nearest N unit time periods before the unit time period is compared with the average variance of the data difference values in a preset time period before the unit time period, wherein N is a positive integer;
the divergence degree of the data difference values in the unit time period and the M nearest unit time periods before the unit time period is compared with the divergence degree of the data difference values in the preset time period before the unit time period, wherein M is a positive integer;
the data difference is the difference between real data and predicted data in the same unit time.
Optionally, in an embodiment of the present invention, the candidate data obtaining sub-module is specifically configured to:
for each initial abnormal data, sorting the historical time data corresponding to the initial abnormal data and the initial abnormal data in time order to obtain a data sequence;
in response to the data sequence satisfying a normal distribution, obtaining the candidate abnormal data according to a threshold condition set for the initial abnormal data based on the Pauta criterion;
and in response to the data sequence not satisfying a normal distribution, obtaining the candidate abnormal data according to a threshold condition set based on a local outlier factor algorithm.
Referring to fig. 4, in an embodiment of the present invention, the data prediction module 210 further includes:
the data feature obtaining sub-module 211 is configured to obtain, for each unit time period, a time series feature of historical time data corresponding to the unit time period; wherein the time series feature comprises at least one of a period feature, a trend feature, a seasonal feature, an autocorrelation feature, a skewness feature, a kurtosis feature, and a non-linear feature characterizing a divergence degree.
a time series model construction sub-module 212, configured to generate a time series model corresponding to the unit time period according to the time series characteristics;
a data prediction sub-module 213, configured to obtain the prediction data in the unit time period based on the time series model.
The data detection device provided by the embodiment of the present invention can implement each process implemented in the method embodiments of fig. 1 to fig. 2, and is not described herein again to avoid repetition.
Fig. 5 is a schematic diagram of a hardware structure of an electronic device implementing various embodiments of the present invention.
The electronic device 300 includes, but is not limited to: radio frequency unit 301, network module 302, audio output unit 303, input unit 304, sensor 305, display unit 306, user input unit 307, interface unit 308, memory 309, processor 310, and power supply 311. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 5 does not constitute a limitation of the electronic device, and that the electronic device may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 301 may be used for receiving and sending signals during message transmission and reception or during a call; specifically, it receives downlink data from a base station and forwards the data to the processor 310 for processing, and it also transmits uplink data to the base station. In general, the radio frequency unit 301 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 301 can also communicate with a network and other devices through a wireless communication system.
The electronic device provides wireless broadband internet access to the user via the network module 302, such as assisting the user in sending and receiving e-mails, browsing web pages, and accessing streaming media.
The audio output unit 303 may convert audio data received by the radio frequency unit 301 or the network module 302 or stored in the memory 309 into an audio signal and output as sound. Also, the audio output unit 303 may also provide audio output related to a specific function performed by the electronic apparatus 300 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 303 includes a speaker, a buzzer, a receiver, and the like.
The input unit 304 is used to receive audio or video signals. The input Unit 304 may include a Graphics Processing Unit (GPU) 3041 and a microphone 3042, and the Graphics processor 3041 processes image data of a still picture or video obtained by an image capturing apparatus (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 306. The image frames processed by the graphic processor 3041 may be stored in the memory 309 (or other storage medium) or transmitted via the radio frequency unit 301 or the network module 302. The microphone 3042 may receive sounds and may be capable of processing such sounds into audio data. The processed audio data may be converted into a format output transmittable to a mobile communication base station via the radio frequency unit 301 in case of the phone call mode.
The electronic device 300 also includes at least one sensor 305, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that adjusts the brightness of the display panel 3061 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 3061 and/or the backlight when the electronic device 300 is moved to the ear. As one type of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of an electronic device (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), and vibration identification related functions (such as pedometer, tapping); the sensors 305 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.
The display unit 306 is used to display information input by the user or information provided to the user. The Display unit 306 may include a Display panel 3061, and the Display panel 3061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 307 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 307 includes a touch panel 3071 and other input devices 3072. The touch panel 3071, also referred to as a touch screen, may collect touch operations by a user on or near the touch panel 3071 (e.g., operations by a user on or near the touch panel 3071 using a finger, a stylus, or any other suitable object or accessory). The touch panel 3071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 310, and receives and executes commands sent by the processor 310. In addition, the touch panel 3071 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. The user input unit 307 may include other input devices 3072 in addition to the touch panel 3071. Specifically, the other input devices 3072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described herein.
Further, the touch panel 3071 may be overlaid on the display panel 3061, and when the touch panel 3071 detects a touch operation on or near the touch panel, the touch operation is transmitted to the processor 310 to determine the type of the touch event, and then the processor 310 provides a corresponding visual output on the display panel 3061 according to the type of the touch event. Although in fig. 5, the touch panel 3071 and the display panel 3061 are implemented as two separate components to implement the input and output functions of the electronic device, in some embodiments, the touch panel 3071 and the display panel 3061 may be integrated to implement the input and output functions of the electronic device, which is not limited herein.
The interface unit 308 is an interface for connecting an external device to the electronic apparatus 300. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 308 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the electronic apparatus 300 or may be used to transmit data between the electronic apparatus 300 and the external device.
The memory 309 may be used to store software programs as well as various data. The memory 309 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the mobile phone, and the like. Further, the memory 309 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 310 is a control center of the electronic device, connects various parts of the whole electronic device by using various interfaces and lines, performs various functions of the electronic device and processes data by operating or executing software programs and/or modules stored in the memory 309 and calling data stored in the memory 309, thereby performing overall monitoring of the electronic device. Processor 310 may include one or more processing units; preferably, the processor 310 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 310.
The electronic device 300 may further include a power supply 311 (such as a battery) for supplying power to various components, and preferably, the power supply 311 may be logically connected to the processor 310 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system.
In addition, the electronic device 300 includes some functional modules that are not shown, and are not described in detail herein.
Preferably, an embodiment of the present invention further provides an electronic device, including: the processor 310, the memory 309, and a computer program stored in the memory 309 and capable of running on the processor 310. When executed by the processor 310, the computer program implements each process of the data detection method embodiments and achieves the same technical effects; to avoid repetition, details are not described here again.
The embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the data detection method embodiments and achieves the same technical effects; to avoid repetition, details are not described here again. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in practice. A plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A method for data detection, comprising:
for any unit time period, obtaining predicted data in the unit time period according to historical time data corresponding to the unit time period, wherein the historical time data is real data in a preset time period before the unit time period;
acquiring candidate abnormal data based on the real data and the predicted data in each unit time period;
and acquiring final abnormal data from the candidate abnormal data according to a preset filtering rule.
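As an illustrative aside (not part of the claim), the three steps of claim 1 can be sketched in Python. Everything concrete here is an assumption made for the example: the forecaster is a plain mean over the preceding window, a period becomes a candidate when the real value deviates from the prediction by more than a relative `tolerance`, and the "preset filtering rule" is a hypothetical minimum-value filter. The claim itself does not prescribe any of these choices.

```python
from statistics import mean


def predict(history):
    """Placeholder forecaster (assumption): mean of the historical real data."""
    return mean(history)


def detect(series, window=4, tolerance=0.5, min_real=0.0):
    """Return the indices of the final abnormal unit time periods.

    series    -- real data, one value per unit time period
    window    -- number of preceding periods used as historical time data
    tolerance -- relative deviation that makes a period a candidate (assumed rule)
    min_real  -- hypothetical filtering rule: drop candidates whose real value
                 is below this level
    """
    candidates = []
    for t in range(window, len(series)):
        predicted = predict(series[t - window:t])   # step 1: predicted data
        difference = series[t] - predicted          # real minus predicted
        if abs(difference) > tolerance * abs(predicted):
            candidates.append(t)                    # step 2: candidate abnormal data
    # step 3: apply the (hypothetical) preset filtering rule
    return [t for t in candidates if series[t] >= min_real]
```

For instance, `detect([10, 11, 9, 10, 10, 30, 10, 10])` flags only index 5, the period whose real value is three times its prediction.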
2. The method according to claim 1, wherein the step of obtaining candidate abnormal data based on the real data and the predicted data in each of the unit time periods includes:
performing anomaly detection on the real data in each unit time period based on the real data and the predicted data in each unit time period to obtain initial anomaly data;
and acquiring initial abnormal data meeting a preset threshold condition as the candidate abnormal data.
3. The method according to claim 2, wherein the step of performing anomaly detection on the real data in each unit time period based on the real data and the predicted data in each unit time period to obtain initial anomaly data comprises:
based on the real data and the predicted data in each unit time period, carrying out anomaly detection on the real data in each unit time period through at least one of the following comparison strategies to obtain initial anomaly data;
wherein the comparison strategies comprise:
the development trend of the data difference value in the unit time period is compared with the development trend of the data difference value in a preset time period before the unit time period;
the data difference value in the unit time period is compared with the peak value of the data difference value in a preset time period before the unit time period;
the data difference value in the unit time period is compared with the lowest value of the data difference value in a preset time period before the unit time period;
the average variance of the data difference values in the unit time period and the nearest N unit time periods before the unit time period is compared with the average variance of the data difference values in a preset time period before the unit time period, wherein N is a positive integer;
the divergence degree of the data difference values in the unit time period and the nearest M unit time periods before the unit time period is compared with the divergence degree of the data difference values in a preset time period before the unit time period, wherein M is a positive integer;
the data difference is the difference between the real data and the predicted data in the same unit time period.
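As an illustrative aside (not part of the claim), three of the comparisons listed above can be sketched in Python on the data differences (real minus predicted). The function names, the use of population variance for the variance comparison, and the `ratio` parameter are assumptions for the example only; the claim does not fix how each comparison is scored.

```python
from statistics import pvariance


def data_differences(real, predicted):
    """Data difference per unit time period: real value minus predicted value."""
    return [r - p for r, p in zip(real, predicted)]


def exceeds_peak(current_diff, past_diffs):
    """Peak comparison: is the current difference above the historical maximum?"""
    return current_diff > max(past_diffs)


def below_lowest(current_diff, past_diffs):
    """Lowest-value comparison: is the current difference below the historical minimum?"""
    return current_diff < min(past_diffs)


def variance_jump(recent_diffs, past_diffs, ratio=2.0):
    """Variance comparison (assumed scoring): does the variance of the most
    recent differences exceed `ratio` times the historical variance?"""
    return pvariance(recent_diffs) > ratio * pvariance(past_diffs)
```

A period would be reported as initial abnormal data when at least one of the selected comparisons returns `True`.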
4. The method according to claim 2 or 3, wherein the step of acquiring initial abnormal data satisfying a preset threshold condition as the candidate abnormal data comprises:
for each piece of initial abnormal data, sorting the initial abnormal data and the historical time data corresponding to the initial abnormal data in time order to obtain a data sequence;
in response to the data sequence satisfying a normal distribution, acquiring the candidate abnormal data according to a threshold condition set for the initial abnormal data based on the Pauta criterion;
and in response to the data sequence not satisfying a normal distribution, acquiring the candidate abnormal data according to a threshold condition set based on a local outlier factor algorithm.
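As an illustrative aside (not part of the claim), the branch between the two threshold conditions can be sketched in Python. Several choices here are assumptions for the example: normality is checked with a Jarque-Bera style statistic against the approximate 95% chi-square cutoff of 5.99, the 3σ bound is computed from the historical data only, the local outlier factor is a simplified one-dimensional version that uses exactly `k` neighbours, and the cutoff `lof_threshold=1.5` is arbitrary.

```python
from statistics import mean, pstdev

JB_CRITICAL = 5.99  # approx. 95% chi-square quantile with 2 degrees of freedom


def looks_normal(seq):
    """Jarque-Bera style check (assumed stand-in for the normality test)."""
    n, mu, sigma = len(seq), mean(seq), pstdev(seq)
    if sigma == 0:
        return False
    skew = sum((x - mu) ** 3 for x in seq) / (n * sigma ** 3)
    kurt = sum((x - mu) ** 4 for x in seq) / (n * sigma ** 4)
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4) < JB_CRITICAL


def k_neighbors(values, i, k):
    """Indices of the k nearest neighbours of values[i] (ties broken by index)."""
    ranked = sorted((abs(values[i] - v), j) for j, v in enumerate(values) if j != i)
    return [j for _, j in ranked[:k]]


def k_distance(values, i, k):
    """Distance from values[i] to its k-th nearest neighbour."""
    return abs(values[i] - values[k_neighbors(values, i, k)[-1]])


def lrd(values, i, k):
    """Local reachability density of values[i]."""
    nn = k_neighbors(values, i, k)
    reach = sum(max(k_distance(values, j, k), abs(values[i] - values[j])) for j in nn)
    return len(nn) / max(reach, 1e-12)  # guard against duplicate points


def lof(values, i, k):
    """Simplified local outlier factor; values near 1 indicate an inlier."""
    nn = k_neighbors(values, i, k)
    return sum(lrd(values, j, k) for j in nn) / (len(nn) * lrd(values, i, k))


def is_candidate(history, value, k=2, lof_threshold=1.5):
    """Decide whether one piece of initial abnormal data becomes a candidate."""
    seq = list(history) + [value]  # data sequence in time order
    if looks_normal(seq):
        # 3-sigma rule, with mean and deviation taken from the history (assumption)
        return abs(value - mean(history)) > 3 * pstdev(history)
    return lof(seq, len(seq) - 1, k) > lof_threshold
```

In practice a library implementation of both the normality test and the local outlier factor would normally replace the hand-written helpers; they are spelled out here only to keep the sketch self-contained.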
5. The method of claim 1, wherein the step of obtaining the predicted data in the unit time period according to the historical time data comprises:
for each unit time period, acquiring time series features of the historical time data corresponding to the unit time period;
generating a time series model corresponding to the unit time period according to the time series features;
acquiring the predicted data in the unit time period based on the time series model;
wherein the time series feature comprises at least one of a period feature, a trend feature, a seasonal feature, an autocorrelation feature, a skewness feature, a kurtosis feature, and a non-linear feature characterizing a divergence degree.
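As an illustrative aside (not part of the claim), a few of the listed features can be computed with standard formulas, sketched here in Python. The function names and the choice of a least-squares slope as the trend feature are assumptions for the example; the claim does not say how each feature is measured or how the time series model is built from them. The sketch assumes a non-constant series.

```python
from statistics import mean, pstdev


def trend_slope(values):
    """Trend feature (assumed form): least-squares slope against the time index."""
    n = len(values)
    t_mean, v_mean = (n - 1) / 2, mean(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def autocorrelation(values, lag):
    """Autocorrelation feature at the given lag (lag >= 1)."""
    mu = mean(values)
    num = sum((values[t] - mu) * (values[t - lag] - mu) for t in range(lag, len(values)))
    den = sum((v - mu) ** 2 for v in values)
    return num / den


def skewness(values):
    """Skewness feature: third standardized moment."""
    mu, sigma = mean(values), pstdev(values)
    return sum((v - mu) ** 3 for v in values) / (len(values) * sigma ** 3)


def kurtosis(values):
    """Kurtosis feature: fourth standardized moment (3 for a normal distribution)."""
    mu, sigma = mean(values), pstdev(values)
    return sum((v - mu) ** 4 for v in values) / (len(values) * sigma ** 4)


def extract_features(values, lag=1):
    """Collect the features for one unit time period's historical time data."""
    return {
        "trend": trend_slope(values),
        "autocorrelation": autocorrelation(values, lag),
        "skewness": skewness(values),
        "kurtosis": kurtosis(values),
    }
```

A strongly positive autocorrelation at some lag is one simple hint of a period or seasonal feature at that lag, which could then guide the choice of time series model.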
6. A data detection apparatus, comprising:
the data prediction module is used for acquiring, for any unit time period, predicted data in the unit time period according to historical time data corresponding to the unit time period, wherein the historical time data is real data in a preset time period before the unit time period;
the candidate data acquisition module is used for acquiring candidate abnormal data based on the real data and the predicted data in each unit time period;
and the abnormal data acquisition module is used for acquiring final abnormal data from the candidate abnormal data according to a preset filtering rule.
7. The apparatus of claim 6, wherein the candidate data acquisition module comprises:
the initial data acquisition submodule is used for carrying out anomaly detection on the real data in each unit time period based on the real data and the predicted data in each unit time period to acquire initial anomaly data;
and the candidate data acquisition submodule is used for acquiring initial abnormal data meeting a preset threshold condition and taking the initial abnormal data as the candidate abnormal data.
8. The apparatus according to claim 7, wherein the initial data obtaining sub-module is further configured to perform anomaly detection on the real data in each unit time period through at least one of the following comparison strategies based on the real data and the predicted data in each unit time period to obtain initial anomaly data;
wherein the comparison strategies comprise:
the development trend of the data difference value in the unit time period is compared with the development trend of the data difference value in a preset time period before the unit time period;
the data difference value in the unit time period is compared with the peak value of the data difference value in a preset time period before the unit time period;
the data difference value in the unit time period is compared with the lowest value of the data difference value in a preset time period before the unit time period;
the average variance of the data difference values in the unit time period and the nearest N unit time periods before the unit time period is compared with the average variance of the data difference values in a preset time period before the unit time period, wherein N is a positive integer;
the divergence degree of the data difference values in the unit time period and the nearest M unit time periods before the unit time period is compared with the divergence degree of the data difference values in a preset time period before the unit time period, wherein M is a positive integer;
the data difference is the difference between the real data and the predicted data in the same unit time period.
9. The apparatus according to claim 7 or 8, wherein the candidate data acquisition sub-module is specifically configured to:
for each piece of initial abnormal data, sorting the initial abnormal data and the historical time data corresponding to the initial abnormal data in time order to obtain a data sequence;
in response to the data sequence satisfying a normal distribution, acquiring the candidate abnormal data according to a threshold condition set for the initial abnormal data based on the Pauta criterion;
and in response to the data sequence not satisfying a normal distribution, acquiring the candidate abnormal data according to a threshold condition set based on a local outlier factor algorithm.
10. The apparatus of claim 6, wherein the data prediction module comprises:
the data feature acquisition submodule is used for acquiring, for each unit time period, the time series features of the historical time data corresponding to the unit time period;
the time series model construction submodule is used for generating a time series model corresponding to the unit time period according to the time series features;
the data prediction submodule is used for acquiring the predicted data in the unit time period based on the time series model;
wherein the time series feature comprises at least one of a period feature, a trend feature, a seasonal feature, an autocorrelation feature, a skewness feature, a kurtosis feature, and a non-linear feature characterizing a divergence degree.
11. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, carries out the steps of the data detection method according to any one of claims 1 to 5.
12. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when executed by a processor, carries out the steps of the data detection method according to any one of claims 1 to 5.
CN202010491155.5A 2020-06-02 2020-06-02 Data detection method and device, electronic equipment and storage medium Active CN111726341B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010491155.5A CN111726341B (en) 2020-06-02 2020-06-02 Data detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010491155.5A CN111726341B (en) 2020-06-02 2020-06-02 Data detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111726341A (en) 2020-09-29
CN111726341B (en) 2022-10-14

Family

ID=72565550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010491155.5A Active CN111726341B (en) 2020-06-02 2020-06-02 Data detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111726341B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328789A (en) * 2020-11-06 2021-02-05 广州笑脸教育科技有限公司 Data processing method and system based on block chain
CN112732693A (en) * 2021-01-18 2021-04-30 深圳市宇航智造技术有限公司 Intelligent internet of things data acquisition method, device, equipment and storage medium
CN112925950A (en) * 2021-01-27 2021-06-08 中国人民大学 Data quality control method and system for continuous star catalogue data
CN113094408A (en) * 2021-03-19 2021-07-09 深圳力维智联技术有限公司 Air quality monitoring method and device based on pigeon flock and computer storage medium
CN113342502A (en) * 2021-06-30 2021-09-03 招商局金融科技有限公司 Method and device for diagnosing performance of data lake, computer equipment and storage medium
CN113965805A (en) * 2021-10-22 2022-01-21 北京达佳互联信息技术有限公司 Prediction model training method and device and target video editing method and device
CN116582702A (en) * 2023-07-11 2023-08-11 成都工业职业技术学院 Network video play amount prediction method, system and medium based on big data

Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070055477A1 (en) * 2005-09-02 2007-03-08 Microsoft Corporation Web data outlier detection and mitigation
US20100030544A1 (en) * 2008-07-31 2010-02-04 Mazu Networks, Inc. Detecting Outliers in Network Traffic Time Series
CN104486353A (en) * 2014-12-26 2015-04-01 北京神州绿盟信息安全科技股份有限公司 Security incident detecting method and device based on flow
CN106612202A (en) * 2015-10-27 2017-05-03 网易(杭州)网络有限公司 Method and system for pre-estimate and judgment of amount brushing of online game channel
CN107222780A (en) * 2017-06-23 2017-09-29 中国地质大学(武汉) A kind of live platform comprehensive state is perceived and content real-time monitoring method and system
CN107315647A (en) * 2017-06-26 2017-11-03 广州视源电子科技股份有限公司 Outlier detection method and system
CN108667856A (en) * 2018-08-10 2018-10-16 广东电网有限责任公司 A kind of network anomaly detection method, device, equipment and storage medium
US20180337836A1 (en) * 2011-11-07 2018-11-22 Netflow Logic Corporation Method and system for confident anomaly detection in computer network traffic
CN108920336A (en) * 2018-05-25 2018-11-30 麒麟合盛网络技术股份有限公司 A kind of service abnormity prompt method and device based on time series
CN109032829A (en) * 2018-07-23 2018-12-18 腾讯科技(深圳)有限公司 Data exception detection method, device, computer equipment and storage medium
CN109560984A (en) * 2018-11-13 2019-04-02 苏宁易购集团股份有限公司 A kind of network service response time method for detecting abnormality and device
CN109587008A (en) * 2018-12-28 2019-04-05 华为技术服务有限公司 Detect the method, apparatus and storage medium of abnormal flow data
CN109800483A (en) * 2018-12-29 2019-05-24 北京城市网邻信息技术有限公司 A kind of prediction technique, device, electronic equipment and computer readable storage medium
CN109902265A (en) * 2019-02-28 2019-06-18 西南石油大学 A kind of underground method for early warning based on hidden Markov model
CN110032670A (en) * 2019-04-17 2019-07-19 腾讯科技(深圳)有限公司 Method for detecting abnormality, device, equipment and the storage medium of time series data
CN110086649A (en) * 2019-03-19 2019-08-02 深圳壹账通智能科技有限公司 Detection method, device, computer equipment and the storage medium of abnormal flow
CN110210508A (en) * 2018-12-06 2019-09-06 北京奇艺世纪科技有限公司 Model generating method, anomalous traffic detection method, device, electronic equipment, computer readable storage medium
CN110286656A (en) * 2019-05-07 2019-09-27 清华大学 A kind of the false-alarm filter method and device of wrong data tolerance
CN110377447A (en) * 2019-07-17 2019-10-25 腾讯科技(深圳)有限公司 A kind of abnormal deviation data examination method, device and server
US20190369570A1 (en) * 2018-05-30 2019-12-05 Mitsubishi Electric Us, Inc. System and method for automatically detecting anomalies in a power-usage data set
CN110808962A (en) * 2019-10-17 2020-02-18 奇安信科技集团股份有限公司 Malformed data packet detection method and device
US20200110689A1 (en) * 2018-10-08 2020-04-09 Acer Cyber Security Incorporated Method and device for detecting abnormal operation of operating system
CN111130940A (en) * 2019-12-26 2020-05-08 众安信息技术服务有限公司 Abnormal data detection method and device and server
CN111143169A (en) * 2019-12-30 2020-05-12 杭州迪普科技股份有限公司 Abnormal parameter detection method and device, electronic equipment and storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328789A (en) * 2020-11-06 2021-02-05 广州笑脸教育科技有限公司 Data processing method and system based on block chain
CN112732693A (en) * 2021-01-18 2021-04-30 深圳市宇航智造技术有限公司 Intelligent internet of things data acquisition method, device, equipment and storage medium
CN112732693B (en) * 2021-01-18 2021-08-17 深圳市宇航智造技术有限公司 Intelligent internet of things data acquisition method, device, equipment and storage medium
CN112925950A (en) * 2021-01-27 2021-06-08 中国人民大学 Data quality control method and system for continuous star catalogue data
CN113094408A (en) * 2021-03-19 2021-07-09 深圳力维智联技术有限公司 Air quality monitoring method and device based on pigeon flock and computer storage medium
CN113342502A (en) * 2021-06-30 2021-09-03 招商局金融科技有限公司 Method and device for diagnosing performance of data lake, computer equipment and storage medium
CN113965805A (en) * 2021-10-22 2022-01-21 北京达佳互联信息技术有限公司 Prediction model training method and device and target video editing method and device
CN116582702A (en) * 2023-07-11 2023-08-11 成都工业职业技术学院 Network video play amount prediction method, system and medium based on big data
CN116582702B (en) * 2023-07-11 2023-09-15 成都工业职业技术学院 Network video play amount prediction method, system and medium based on big data

Also Published As

Publication number Publication date
CN111726341B (en) 2022-10-14

Similar Documents

Publication Publication Date Title
CN111726341B (en) Data detection method and device, electronic equipment and storage medium
US11405268B2 (en) Fine grained network management to edge device features
EP3644219A1 (en) Human face feature point tracking method, device, storage medium and apparatus
KR20200085490A (en) Service providing system and method for detecting sensor abnormality using neural network model, and non-transitory computer readable medium having computer program recorded thereon
CN111614634B (en) Flow detection method, device, equipment and storage medium
CN109154965A (en) The system and method confirmed for the threat event in using the discrete time reference of 3D abstract modeling
CN111753520B (en) Risk prediction method and device, electronic equipment and storage medium
CN112350974A (en) Safety monitoring method and device of Internet of things and electronic equipment
CN110659179B (en) Method and device for evaluating system running condition and electronic equipment
CN113128693A (en) Information processing method, device, equipment and storage medium
CN115145788A (en) Detection data generation method and device for intelligent operation and maintenance system
CN112256732B (en) Abnormality detection method and device, electronic equipment and storage medium
CN112256748A (en) Abnormity detection method and device, electronic equipment and storage medium
CN113052198A (en) Data processing method, device, equipment and storage medium
CN116227917A (en) Method and device for processing flood prevention risk of building, electronic equipment and storage medium
CN113836241B (en) Time sequence data classification prediction method, device, terminal equipment and storage medium
CN114581230A (en) Money laundering behavior detection method, device and medium in flow chart
CN113360908A (en) Data processing method, violation recognition model training method and related equipment
CN111818548A (en) Data processing method, device and equipment
CN116128689A (en) Monitoring model building method and device, electronic equipment and storage medium
US20230421639A1 (en) Fine grained network management to edge device features
CN118013416A (en) Method and device for detecting risk operation, electronic equipment and storage medium
CN117076295A (en) Database detection method, system, electronic equipment and storage medium
CN116227325A (en) Electrical appliance fault prediction method and device based on neuron model
CN207020790U (en) Defence area supervising device and intelligent security guard cloud platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant