CN111447193A - Method and device for anomaly detection of real-time data stream - Google Patents

Info

Publication number
CN111447193A
Authority
CN
China
Prior art keywords
detected
time
real
item
data information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010206736.XA
Other languages
Chinese (zh)
Other versions
CN111447193B (en)
Inventor
庄静扬
李万强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wangsu Science and Technology Co Ltd
Original Assignee
Wangsu Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wangsu Science and Technology Co Ltd
Priority to CN202010206736.XA
Publication of CN111447193A
Application granted
Publication of CN111447193B
Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/50 Testing arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method and a device for anomaly detection on a real-time data stream. Index calculation is performed on real-time data information of an item to be detected to obtain a real-time result for each index of the item; first-period historical data information of the item is acquired from historical data; the first-period historical data information is input into a preset model to obtain a prediction result for each index of the item; and whether the real-time data information is abnormal is determined from the real-time result and the prediction result of each index. By decoupling data calculation from algorithm prediction, the scheme makes it possible to determine quickly whether an index to be detected is abnormal, which improves the efficiency of anomaly detection.

Description

Method and device for anomaly detection of real-time data stream
Technical Field
Embodiments of the invention relate to the field of anomaly detection, and in particular to a method and a device for anomaly detection on real-time data streams.
Background
At present, anomaly detection techniques are used to determine whether an index to be detected is abnormal, so that a timely response can be made and the risks caused by the abnormal index can be avoided. Anomaly detection techniques are therefore receiving great attention in many industries.
However, existing anomaly detection techniques have the following problem:
when anomaly detection is performed on an index to be detected, the data calculation stage and the algorithm prediction stage are usually coupled together, which makes it difficult to connect the index to the algorithm prediction stage and makes detection time-consuming.
In summary, the prior art does not provide a method for rapidly detecting an abnormal index.
Disclosure of Invention
The invention provides a method and a device for anomaly detection on a real-time data stream, which solve the problem that the prior art cannot rapidly detect an abnormal index.
In a first aspect, an embodiment of the present invention provides a method for performing anomaly detection on a real-time data stream, where the method includes: index calculation is carried out on the real-time data information of the item to be detected, and a real-time result of each index in the item to be detected is obtained; acquiring first time period historical data information of the item to be detected from historical data; inputting the historical data information of the first time period into a preset model to obtain a prediction result of each index in the item to be detected; and determining whether the real-time data information is abnormal or not according to the real-time result and the prediction result.
Based on this scheme, the real-time result of each index of the item to be detected is obtained by performing index calculation on the acquired real-time data information; the prediction result of each index is obtained by inputting the acquired first-period historical data information into a preset model; and whether the real-time data information is abnormal is determined by combining the real-time result and the prediction result of each index. Because index calculation and index prediction are decoupled into two separate processes, whether the index to be detected is abnormal can be determined rapidly, which improves the efficiency of anomaly detection.
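The decoupling described above can be illustrated with a minimal Python sketch. All names here (`compute_realtime`, `predict_from_history`, `detect`) are hypothetical, and the "model" is a deliberately trivial stand-in that takes the largest value seen in the historical windows; the patent does not specify the model's internals. The point is only that the calculation stage and the prediction stage are independent functions that meet at the comparison step.

```python
from statistics import mean
from typing import Callable, Dict, List

Indices = Dict[str, float]


def compute_realtime(samples: List[float]) -> Indices:
    """Data-calculation stage: derive each index from the real-time samples."""
    return {"avg": mean(samples), "sum": sum(samples)}


def predict_from_history(history: List[List[float]]) -> Indices:
    """Prediction stage (stand-in model, an assumption): for each index,
    use the largest value observed across the historical windows."""
    return {
        "avg": max(mean(window) for window in history),
        "sum": max(sum(window) for window in history),
    }


def detect(
    samples: List[float],
    history: List[List[float]],
    calc: Callable[[List[float]], Indices] = compute_realtime,
    model: Callable[[List[List[float]]], Indices] = predict_from_history,
) -> bool:
    """Return True if any real-time index exceeds its predicted value."""
    realtime = calc(samples)      # stage 1: independent of the model
    predicted = model(history)    # stage 2: independent of the calculation
    return any(realtime[name] > predicted[name] for name in realtime)
```

Because `calc` and `model` are passed in separately, either stage can be replaced (for example, a different preset model per item) without touching the other.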
In a possible implementation method, the acquiring of the real-time data information of the item to be detected includes: receiving a search statement input by a user, wherein the search statement is used for setting the item to be detected; and acquiring the real-time data information of the item to be detected from a real-time data stream according to the search statement.
Based on this scheme, the search statement input by the user sets the item to be detected, so the real-time data information can be acquired quickly from the real-time data stream according to the search statement. A separate real-time calculation program therefore does not need to be written for each item to be detected, which greatly reduces the programming difficulty and improves the data processing speed.
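As an illustration only, the sketch below shows how a search statement could select the records of one item from a stream. The patent does not define the statement syntax; the `key=value` form, the record fields, and the function names `parse_search` and `select` are all assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]


def parse_search(statement: str) -> Dict[str, str]:
    """Parse a hypothetical 'key=value key=value' search statement into filters."""
    filters: Dict[str, str] = {}
    for token in statement.split():
        key, sep, value = token.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed search term: {token!r}")
        filters[key] = value
    return filters


def select(stream: Iterable[Record], statement: str) -> Iterator[Record]:
    """Lazily yield only the records that match every filter in the statement."""
    filters = parse_search(statement)
    for record in stream:
        if all(record.get(key) == value for key, value in filters.items()):
            yield record
```

With this shape, adding a new item to be detected means writing a new statement such as `item=bandwidth`, not a new calculation program.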
In a possible implementation method, performing index calculation on real-time data information of an item to be detected to obtain a real-time result of each index in the item to be detected includes: determining a calculation factor of each index of the item to be detected; and calling the calculation factors of the computing platform, and performing index calculation on the real-time data information to obtain a real-time result of each index.
Based on this scheme, once the calculation factor of each index of the item to be detected has been determined, the calculation factors stored in the computing platform are called to perform index calculation on the real-time data information, yielding the real-time result of each index. This greatly improves the efficiency with which the real-time results are calculated.
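One way to picture stored calculation factors is a registry of reusable functions keyed by index name. The following is a sketch under that assumption; the registry `FACTORS`, the mapping `ITEM_INDICES`, and the function `realtime_results` are hypothetical names, and the choice of average and sum for a bandwidth item is illustrative.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical registry standing in for the calculation factors
# stored on the computing platform.
FACTORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "avg": mean,
    "sum": sum,
    "max": max,
}

# Hypothetical mapping from an item to be detected to the indexes it uses.
ITEM_INDICES: Dict[str, List[str]] = {
    "bandwidth": ["avg", "sum"],
}


def realtime_results(item: str, values: Sequence[float]) -> Dict[str, float]:
    """Look up the factors for each index of the item and apply them."""
    return {name: FACTORS[name](values) for name in ITEM_INDICES[item]}
```

Since the factors are looked up, not re-implemented per item, supporting a new item only requires a new entry in the item-to-index mapping.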
In a possible implementation method, before inputting the first period historical data information into a preset model, determining the preset model of the item to be detected; wherein the preset model of the item to be detected is determined by: acquiring second time period historical data information of the item to be detected from the historical data; determining a data characteristic of the second period historical data information; and determining a preset model of the item to be detected from a plurality of preset models according to the data characteristics of the second time period historical data information.
Based on this scheme, the preset model for the item to be detected is determined from the plurality of preset models according to the data characteristics of the second-period historical data information acquired from the historical data, so that the prediction result obtained by processing the first-period historical data information with the preset model is more realistic and reasonable.
In one possible implementation, determining the data characteristic of the second period historical data information includes: clustering the second period historical data information; and determining the data characteristics of the second time period historical data information according to the clustering result.
Based on this scheme, the second-period historical data information is clustered, and its data characteristics are determined from the clustering result. Clustering allows the data characteristics of the second-period historical data information to be identified to a large extent.
In a possible implementation method, determining a preset model of the item to be detected from a plurality of preset models according to the data characteristics of the second-period historical data information includes: if the data characteristics of the second-period historical data information conform to a smooth curve type, determining that the preset model of the item to be detected is a numerical model; and if the data characteristics of the second-period historical data information conform to a stable linear type, determining that the preset model of the item to be detected is a proportional model.
Based on this scheme, items to be detected whose second-period historical data information has different data characteristics correspond to different preset models. Specifically: if the data characteristics conform to a smooth curve type, the preset model of the item to be detected is determined to be a numerical model; if they conform to a stable linear type, the preset model is determined to be a proportional model. Setting a preset model suited to each data characteristic helps determine, in combination with historical data, whether the current real-time data stream is abnormal.
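The selection rule above amounts to a lookup from data characteristic to preset model. The sketch below encodes only that mapping; the enum and function names are hypothetical, and the internals of the numerical and proportional models are not shown because the text does not describe them here.

```python
from enum import Enum


class DataCharacteristic(Enum):
    SMOOTH_CURVE = "smooth_curve"
    STABLE_LINEAR = "stable_linear"


class PresetModel(Enum):
    NUMERICAL = "numerical"
    PROPORTIONAL = "proportional"


# Mapping stated in the text: smooth curve -> numerical model,
# stable linear -> proportional model.
_SELECTION = {
    DataCharacteristic.SMOOTH_CURVE: PresetModel.NUMERICAL,
    DataCharacteristic.STABLE_LINEAR: PresetModel.PROPORTIONAL,
}


def select_preset_model(characteristic: DataCharacteristic) -> PresetModel:
    """Return the preset model assigned to the given data characteristic."""
    try:
        return _SELECTION[characteristic]
    except KeyError:
        raise ValueError(f"no preset model for {characteristic!r}") from None
```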
In one possible implementation method, acquiring the first-period historical data information of the item to be detected from the historical data includes: acquiring the first-period historical data information of the item to be detected from the historical data of a first time period, where the first time period is the time period closest to the current moment. Acquiring the second-period historical data information of the item to be detected from the historical data includes: acquiring the second-period historical data information of the item to be detected from the historical data of a second time period, where the second time period is shorter than the first time period.
Based on this scheme, the preset model is trained with the first-period historical data information, so that in actual use it can accurately predict whether the real-time data information in the real-time data stream is abnormal. The preset model is selected using the second-period historical data information; because that information is very close to the real-time data information, the selected preset model is well suited to predicting the real-time data information of the item to be detected.
In one possible implementation, determining whether the real-time data information is abnormal according to the real-time result and the prediction result includes: if the real-time result of each index is not greater than the prediction result of each index, determining that the real-time data information is normal; and if at least one real-time result in the real-time results of the indexes is larger than the corresponding prediction result, determining that the real-time data information is abnormal.
Based on this scheme, whether the real-time data information is abnormal is determined by the following rule: if the real-time result of every index is not greater than the corresponding prediction result, the real-time data information is determined to be normal; if at least one real-time result is greater than its corresponding prediction result, the real-time data information is determined to be abnormal.
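The rule can be written directly as a per-index comparison. In this sketch the function names are hypothetical, and reporting which indexes exceeded their predictions, as well as raising an error when a prediction is missing, are design choices of the example and are not stated in the text.

```python
from typing import Dict, List


def exceeded_indices(
    realtime: Dict[str, float], predicted: Dict[str, float]
) -> List[str]:
    """Return the names of indexes whose real-time result is greater than
    the corresponding prediction result."""
    missing = realtime.keys() - predicted.keys()
    if missing:
        # Assumption: an index without a prediction is treated as an error.
        raise KeyError(f"no prediction for indices: {sorted(missing)}")
    return [name for name, value in realtime.items() if value > predicted[name]]


def is_abnormal(realtime: Dict[str, float], predicted: Dict[str, float]) -> bool:
    """Abnormal if at least one index exceeds its prediction; a real-time
    result equal to the prediction counts as normal."""
    return bool(exceeded_indices(realtime, predicted))
```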
In a possible implementation method, before determining the preset model of the item to be detected, the method further includes: sending a prediction service calling request to a routing gateway through an HTTP interface, where the prediction service calling request includes the item to be detected. Determining the preset model of the item to be detected includes: the routing gateway determines a service interface corresponding to the preset model of the item to be detected, and sends the prediction service calling request to the preset model of the item to be detected through the service interface.
Based on this scheme, a prediction service calling request is sent to the routing gateway through an HTTP interface so that the preset model of the item to be detected can be determined; after the preset model has been determined, the routing gateway determines the service interface corresponding to the preset model of the item to be detected and sends the prediction service calling request to that preset model through the service interface.
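The routing step can be sketched as an in-process dispatch table. This is an assumption-laden illustration: the HTTP transport is omitted entirely, the class `RoutingGateway`, its methods, and the request fields `item` and `history` are hypothetical, and each service interface is modelled as a plain callable.

```python
from typing import Callable, Dict, List

# A service interface is modelled as a callable from history to predictions.
PredictFn = Callable[[List[float]], Dict[str, float]]


class RoutingGateway:
    """Maps an item to be detected to its preset model, then forwards the
    request to that model's service interface."""

    def __init__(self) -> None:
        self._item_to_model: Dict[str, str] = {}
        self._service_interfaces: Dict[str, PredictFn] = {}

    def register_model(self, model: str, interface: PredictFn) -> None:
        self._service_interfaces[model] = interface

    def bind_item(self, item: str, model: str) -> None:
        if model not in self._service_interfaces:
            raise KeyError(f"unknown model: {model}")
        self._item_to_model[item] = model

    def handle(self, request: dict) -> Dict[str, float]:
        """Route a prediction service calling request by its item field."""
        model = self._item_to_model[request["item"]]
        return self._service_interfaces[model](request["history"])
```

In a deployment matching the text, `handle` would sit behind an HTTP endpoint and each interface would call a separate prediction service; those parts are left out here.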
In a second aspect, an embodiment of the present invention provides an apparatus for performing anomaly detection on a real-time data stream, where the apparatus includes: the real-time result determining unit is used for carrying out index calculation on the real-time data information of the item to be detected to obtain the real-time result of each index in the item to be detected; the first time period historical data information acquisition unit is used for acquiring the first time period historical data information of the item to be detected from historical data; the prediction result determining unit is used for inputting the first period historical data information into a preset model to obtain the prediction result of each index in the item to be detected; and the abnormity determining unit is used for determining whether the real-time data information is abnormal according to the real-time result and the prediction result.
Based on this scheme, the real-time result of each index of the item to be detected is obtained by performing index calculation on the acquired real-time data information; the prediction result of each index is obtained by inputting the acquired first-period historical data information into the preset model of the item to be detected; and whether the real-time data information is abnormal is determined by combining the real-time result and the prediction result of each index. Because index calculation and index prediction are decoupled into two separate processes, whether the index to be detected is abnormal can be determined rapidly, which improves the efficiency of anomaly detection.
In a possible implementation method, the apparatus further includes a real-time data information obtaining unit, configured to: receiving a search statement input by a user, wherein the search statement is used for setting the item to be detected; and acquiring the real-time data information of the item to be detected from a real-time data stream according to the search statement.
Based on this scheme, the search statement input by the user sets the item to be detected, so the real-time data information can be acquired quickly from the real-time data stream according to the search statement. A separate real-time calculation program therefore does not need to be written for each item to be detected, which greatly reduces the programming difficulty and improves the data processing speed.
In a possible implementation method, the real-time result determining unit is specifically configured to: determine a calculation factor of each index of the item to be detected; and call the calculation factors of the computing platform and perform index calculation on the real-time data information to obtain a real-time result of each index.
Based on this scheme, once the calculation factor of each index of the item to be detected has been determined, the calculation factors stored in the computing platform are called to perform index calculation on the real-time data information, yielding the real-time result of each index. This greatly improves the efficiency with which the real-time results are calculated.
In a possible implementation method, the apparatus further includes a preset model determining unit, configured to: acquiring second time period historical data information of the item to be detected from the historical data; determining a data characteristic of the second period historical data information; and determining a preset model of the item to be detected from a plurality of preset models according to the data characteristics of the second time period historical data information.
Based on this scheme, the preset model for the item to be detected is determined from the plurality of preset models according to the data characteristics of the second-period historical data information acquired from the historical data, so that the prediction result obtained by processing the first-period historical data information with the preset model is more realistic and reasonable.
In a possible implementation method, the preset model determining unit is specifically configured to: clustering the second period historical data information; and determining the data characteristics of the second time period historical data information according to the clustering result.
Based on this scheme, the second-period historical data information is clustered, and its data characteristics are determined from the clustering result. Clustering allows the data characteristics of the second-period historical data information to be identified to a large extent.
In a possible implementation method, the preset model determining unit is specifically configured to: if the data characteristics of the second-period historical data information conform to a smooth curve type, determine that the preset model of the item to be detected is a numerical model; and if the data characteristics of the second-period historical data information conform to a stable linear type, determine that the preset model of the item to be detected is a proportional model.
Based on this scheme, items to be detected whose second-period historical data information has different data characteristics correspond to different preset models. Specifically: if the data characteristics conform to a smooth curve type, the preset model of the item to be detected is determined to be a numerical model; if they conform to a stable linear type, the preset model is determined to be a proportional model. Setting a preset model suited to each data characteristic helps determine, in combination with historical data, whether the current real-time data stream is abnormal.
In a possible implementation method, the first-period historical data information acquisition unit is specifically configured to: acquire the first-period historical data information of the item to be detected from the historical data of a first time period, where the first time period is the time period closest to the current moment. The device further includes a second-period historical data information acquisition unit, which is specifically configured to: acquire the second-period historical data information of the item to be detected from the historical data of a second time period, where the second time period is shorter than the first time period.
Based on this scheme, the preset model is trained with the first-period historical data information, so that in actual use it can accurately predict whether the real-time data information in the real-time data stream is abnormal. The preset model is selected using the second-period historical data information; because that information is very close to the real-time data information, the selected preset model is well suited to predicting the real-time data information of the item to be detected.
In a possible implementation method, the abnormality determining unit is specifically configured to: if the real-time result of each index is not greater than the prediction result of each index, determining that the real-time data information is normal; and if at least one real-time result in the real-time results of the indexes is larger than the corresponding prediction result, determining that the real-time data information is abnormal.
Based on this scheme, whether the real-time data information is abnormal is determined by the following rule: if the real-time result of every index is not greater than the corresponding prediction result, the real-time data information is determined to be normal; if at least one real-time result is greater than its corresponding prediction result, the real-time data information is determined to be abnormal.
In a possible implementation method, the preset model determining unit is further configured to: send a prediction service calling request to a routing gateway through an HTTP interface, where the prediction service calling request includes the item to be detected. The preset model determining unit is specifically configured to: determine a service interface corresponding to the preset model of the item to be detected, and send the prediction service calling request to the preset model of the item to be detected through the service interface.
Based on this scheme, a prediction service calling request is sent to the routing gateway through an HTTP interface so that the preset model of the item to be detected can be determined; after the preset model has been determined, the routing gateway determines the service interface corresponding to the preset model of the item to be detected and sends the prediction service calling request to that preset model through the service interface.
In a third aspect, an embodiment of the present invention provides a computing device, including:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory and performing, in accordance with the obtained program, the method according to any implementation of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the method according to any implementation of the first aspect.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 shows a system architecture for performing anomaly detection on a real-time data stream according to an embodiment of the present invention;
Fig. 2 shows a method for performing anomaly detection on a real-time data stream according to an embodiment of the present invention;
Fig. 3 shows a device for performing anomaly detection on a real-time data stream according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
With the rapid development of real-time services, detecting anomalies in real-time data streams quickly and accurately has become important. Taking a live broadcast service as an example, the customer has high requirements for the bandwidth, stall rate, error rate and the like of the live broadcast service; if an abnormality of the current live broadcast service can be detected quickly during the broadcast, a targeted adjustment scheme can be provided to the customer so that the customer's requirements for the live broadcast service are met. In the embodiment of the invention, the real-time result of the item to be detected is obtained by calculating the real-time data information in real time, while the prediction result of the item to be detected at the current moment is estimated from the first-period historical data information. On the one hand, separating real-time calculation from prediction calculation improves detection efficiency; on the other hand, the first-period historical data allows the current real-time data information to be predicted accurately, which makes the anomaly detection result more accurate.
Fig. 1 shows a system architecture for performing anomaly detection on a real-time data stream according to an embodiment of the present invention.
Referring to fig. 1, the system architecture includes: computing platform 101, routing gateway 102, prediction service 103.
The computing platform 101 is configured to compute a real-time result of each index of the item to be detected from the real-time data information about the item that is acquired from the real-time data stream. The indexes can be set by the computing platform according to the detection requirement; for example, if the item to be detected is bandwidth, its indexes may be the average bandwidth, the total bandwidth, and the like. The computing platform 101 stores the calculation factor of each index, so that real-time calculation can be realized.
The routing gateway 102 is configured to determine a preset model. It is also used for transferring information between the computing platform 101 and the prediction service 103: it passes the real-time results of each index of the item to be detected, as calculated by the computing platform 101, to the prediction service 103; and after the prediction service 103 has concluded whether the real-time data information of the item to be detected is abnormal, the routing gateway 102 sends the conclusion to the computing platform 101.
In the embodiment of the application, on the one hand, the real-time result of each index of the item to be detected is obtained through the computing platform 101; on the other hand, the prediction result of each index of the item to be detected is obtained by model prediction from the historical data information of the item. Whether the real-time data information of the item to be detected is abnormal is then determined by comparing the real-time result of each index with the corresponding prediction result.
The items to be detected in the embodiment of the application may be of many types, and different items to be detected use different preset models. The embodiment of the application therefore provides a plurality of preset models, and when determining whether the real-time data information of any item to be detected is abnormal, the historical data information of that item needs to be input into the corresponding preset model.
The routing gateway 102 determines how the historical data information of the item to be detected is routed to the matching preset model.
The prediction service 103 is configured to process the historical data information of the item to be detected to obtain a prediction result of each index of the item; it is also used for comparing the real-time result of each index of the item to be detected with the corresponding prediction result to determine whether the real-time data information of the item is abnormal.
One possible implementation is through a cluster of application container engines (Docker Cluster). Various prediction models can be loaded in the Docker Cluster; in general, these prediction models have a degree of generality. In a specific implementation, the prediction models may be obtained by training on a large amount of historical data stored on the big data storage platform 104. The trained models are stored in the model storage server 105, which avoids preset models being lost, or not being found in time, because they were stored in an unorganized way. The Docker Cluster may load the prediction models from the model storage server 105.
In one possible implementation, the system further comprises an alarm platform 106 and a visualization platform 107. The alarm platform 106 is used for displaying alarm information to prompt service personnel to check the abnormality and react in time; the visualization platform 107 is used for displaying the real-time result and the prediction result corresponding to an abnormal index of the item to be detected, so that a user can conveniently tune the sensitivity of the preset model on the visualization platform.
As shown in fig. 2, a method for performing anomaly detection on a real-time data stream according to an embodiment of the present invention includes the following steps:
Step 201, performing index calculation on the real-time data information of the item to be detected to obtain a real-time result of each index of the item to be detected.
Step 202, obtaining the first period historical data information of the item to be detected from the historical data.
Step 203, inputting the first period historical data information into a preset model to obtain a prediction result of each index of the item to be detected.
Step 204, determining whether the real-time data information is abnormal according to the real-time result and the prediction result.
Based on this scheme, the real-time result of each index of the item to be detected is obtained by performing index calculation on the acquired real-time data information; the prediction result of each index is obtained by inputting the acquired first period historical data information into the preset model of the item to be detected; and whether the real-time data information is abnormal is determined by combining the real-time result and the prediction result of each index. Because the two processes of index calculation and index prediction are decoupled, whether the item to be detected is abnormal can be determined rapidly, and the efficiency of anomaly detection is improved.
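As an illustration only, the four steps can be sketched in Python as follows. The function names, the two indexes chosen, and in particular `placeholder_model` are assumptions introduced for this sketch; the embodiment does not disclose the internals of the preset model, so a trivial stand-in is used here.

```python
from statistics import mean

# Calculation factors for two example indexes (assumed subset).
FACTORS = {"mean": mean, "sum": sum}


def placeholder_model(history):
    """Hypothetical stand-in for a preset model.

    Predicts, for each index, 1.2 times the largest value that index
    took over the historical snapshots. This is NOT the patented model.
    """
    return {
        name: 1.2 * max(factor(snapshot) for snapshot in history)
        for name, factor in FACTORS.items()
    }


def detect(real_time_values, first_period_history, preset_model=placeholder_model):
    # Step 201: index calculation on the real-time data information.
    real_time = {name: factor(real_time_values) for name, factor in FACTORS.items()}
    # Steps 202-203: feed the first period historical data to the preset model.
    predicted = preset_model(first_period_history)
    # Step 204: an index is abnormal when its real-time result exceeds the prediction.
    abnormal = [name for name in real_time if real_time[name] > predicted[name]]
    return real_time, predicted, abnormal
```

Because index calculation and prediction are separate calls, either side can be replaced without touching the other, which is the decoupling described above.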
The real-time data stream contains all data information at the current time point, such as bandwidth data, status code data, stuck rate data, error rate data, retransmission ratio data, and the like; the item to be detected may be any one of these kinds of data information. The items to be detected can be set by service personnel according to actual working requirements.
Step 201 described above may be performed by the computing platform 101 of fig. 1. The real-time data information of the item to be detected is the state data information of the item to be detected at the current time point. As a simple example, the embodiment of the present invention is described by taking the case where the item to be detected is the bandwidth. For example, the item to be detected is the bandwidth of the domain name "163.com", and it is assumed that the domain name is detected as being used in several different regions at the current time point, such as the 3 regions of Beijing, Shanghai and Fujian. The real-time data information about the bandwidth of the domain name "163.com" is then the bandwidth values of the domain name in these 3 regions, for example 70M, 30M and 20M respectively.
Optionally, in step 201, the obtaining of the real-time data information of the item to be detected includes: receiving a search statement input by a user, wherein the search statement is used for setting the item to be detected; and acquiring the real-time data information of the item to be detected from a real-time data stream according to the search statement.
Taking the bandwidth as the item to be detected, the user writes a Structured Query Language (SQL) statement, which is used to acquire the real-time data information of the bandwidth from the real-time data stream. The SQL statement may include, for example, a domain name and a bandwidth value. For the domain name "163.com", the bandwidth values differ between regions; if 3 regions are involved, namely Beijing, Shanghai and Fujian, then querying the bandwidth of the domain name "163.com" with the SQL statement yields 3 bandwidth values for the current time point, for example 70M, 30M and 20M respectively.
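The effect of such a search statement can be imitated with a short Python sketch. The record layout (`domain`, `region`, `bandwidth_mbps`) and the function name are hypothetical; the embodiment does not specify the schema of the real-time data stream, and an actual deployment would express this selection as an SQL statement on the computing platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamRecord:
    # Assumed fields of one record in the real-time data stream.
    domain: str
    region: str
    bandwidth_mbps: float


def select_bandwidth(records, domain):
    """Return the bandwidth values of `domain`, one per matching record."""
    return [r.bandwidth_mbps for r in records if r.domain == domain]


# Snapshot of the stream at the current time point (values from the example above;
# the "example.org" record is made up to show that other domains are filtered out).
snapshot = [
    StreamRecord("163.com", "Beijing", 70.0),
    StreamRecord("163.com", "Shanghai", 30.0),
    StreamRecord("163.com", "Fujian", 20.0),
    StreamRecord("example.org", "Beijing", 5.0),
]

values = select_bandwidth(snapshot, "163.com")
```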
Optionally, in step 201, performing index calculation on the real-time data information of the item to be detected to obtain a real-time result of each index in the item to be detected, where the method includes: determining a calculation factor of each index of the item to be detected; and calling the calculation factors of the calculation platform, and performing index calculation on the real-time data information to obtain a real-time result of each index.
As an example, the Flink computing platform stores the calculation factors of the indexes. The indexes may include, for example, the average value, the sum, the range and the variance of the item to be detected, each computed from the real-time data information. The calculation factor is the specific calculation formula of each index.
Taking the bandwidth as the item to be detected and the domain name "163.com" mentioned above: when the index is the average value of the bandwidth, the average bandwidth of the domain name "163.com" is 40M; when the index is the sum of the bandwidth, the total bandwidth of the domain name "163.com" is 120M. Therefore, when the item to be detected is the bandwidth, the real-time result of each index of the bandwidth is obtained after the computing platform performs index calculation on the real-time data information.
In step 202 above, the first period historical data information of the item to be detected is obtained from the historical data. This step may be performed by the routing gateway 102 in fig. 1.
Taking the bandwidth as the item to be detected, for the domain name "163.com", the real-time data information of the domain name is obtained from the real-time data stream, and the real-time result of each index of the bandwidth is determined through the processing of the computing platform. To judge whether the real-time data information of the bandwidth is abnormal, the prediction result of each index of the bandwidth must also be determined, so that the real-time result of each index can be compared with its prediction result. To determine the prediction results, the data from which they are derived, namely the first period historical data information, is acquired first and then input into a preset model, which yields the prediction result of each index of the bandwidth.
In step 203, the first period historical data information is input into a preset model, and a prediction result of each index of the item to be detected is obtained. This step may be performed by the prediction service 103 of fig. 1.
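The index calculation for this example can be checked with a few lines of Python. The dictionary of calculation factors is an illustrative stand-in for whatever the computing platform stores, and the use of the population variance is an assumption, since the embodiment does not say which variance formula is meant.

```python
from statistics import mean, pvariance

# One calculation factor (formula) per index. Names are illustrative.
CALCULATION_FACTORS = {
    "mean": mean,
    "sum": sum,
    "range": lambda xs: max(xs) - min(xs),
    "variance": pvariance,  # assumption: population variance
}


def compute_indexes(values, factors=CALCULATION_FACTORS):
    """Apply every calculation factor to the real-time data information."""
    return {name: factor(values) for name, factor in factors.items()}


# Bandwidth of "163.com" in Beijing, Shanghai and Fujian, in M.
real_time_result = compute_indexes([70, 30, 20])
# real_time_result["mean"] is 40 and real_time_result["sum"] is 120,
# matching the figures given in the text.
```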
Taking the item to be detected as the bandwidth as an example, after the real-time results of each index of the bandwidth are obtained, whether the real-time results are abnormal needs to be further determined, so as to finally determine whether the real-time data information of the bandwidth is abnormal. Therefore, as an example, whether the real-time result of each index of the bandwidth is abnormal or not can be determined by a model prediction method.
Optionally, the preset model of the item to be detected is determined by the following method: acquiring second time period historical data information of the item to be detected from the historical data; determining a data characteristic of the second period historical data information; and determining a preset model of the item to be detected from a plurality of preset models according to the data characteristics of the second time period historical data information.
Taking the bandwidth as an example, for the domain name "163.com", the historical bandwidth data of the domain name over a past period of time, that is, the second period historical data information, is acquired from the data storage platform. For example, the historical bandwidth data of the domain name in the half hour closest to the current time point may be obtained, with the sampling interval set to 1 minute. In this way, 30 bandwidth sample data for the "163.com" domain name in the half hour closest to the current time point are obtained. The half hour closest to the current time point is the second time period, and the 30 bandwidth sample data are the second period historical data information.
After acquiring the second period historical data information, the data characteristics thereof can be determined by:
Optionally, determining the data characteristic of the second period historical data information includes: clustering the second period historical data information; and determining the data characteristic of the second period historical data information according to the clustering result.
Taking the bandwidth as the item to be detected, after the 30 bandwidth sample data of the domain name "163.com" within the half hour closest to the current time point are acquired, assume further that the average value of the bandwidth is the index used as the sample point. The average value is calculated for each of the 30 bandwidth sample data, giving 30 sample points. By clustering the 30 sample points, the data characteristic of the bandwidth data of the domain name "163.com" within the half hour closest to the current time point is obtained from the clustering result.
After the 30 sample points are clustered, if the clustering result shows that the values of the 30 sample points fluctuate periodically, the data characteristic of the bandwidth data of the "163.com" domain name in the half hour closest to the current time point is of the smooth curve type; if the clustering result shows that the values of the 30 sample points are basically stable at a fixed value, the data characteristic of the bandwidth data of the "163.com" domain name in the half hour closest to the current time point is of the stable linear type.
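The embodiment does not name the clustering algorithm or the criterion applied to its result. Purely as a sketch under stated assumptions, the following Python code clusters the sample points into two groups with a one-dimensional 2-means procedure and calls the series "stable linear" when the two cluster centres nearly coincide, and "smooth curve" otherwise. The function names, the choice of 2-means and the 5% tolerance are all assumptions, and the sketch does not test for periodicity as such; it only separates a flat series from a fluctuating one.

```python
import math


def two_means_1d(points, iterations=20):
    """Cluster 1-D points into a low and a high group; return both centres."""
    lo, hi = min(points), max(points)
    for _ in range(iterations):
        low = [p for p in points if abs(p - lo) <= abs(p - hi)]
        high = [p for p in points if abs(p - lo) > abs(p - hi)]
        lo = sum(low) / len(low)  # the minimum point always lands in `low`
        if high:
            hi = sum(high) / len(high)
    return lo, hi


def classify_feature(points, tolerance=0.05):
    """Return 'stable_linear' or 'smooth_curve' (assumed criterion)."""
    lo, hi = two_means_1d(points)
    if hi == lo:
        return "stable_linear"
    center = sum(points) / len(points)
    spread = (hi - lo) / abs(center) if center else math.inf
    return "stable_linear" if spread <= tolerance else "smooth_curve"


# 30 sample points hovering around 40M versus 30 points following a sine wave.
flat = [40.0 + 0.1 * (i % 2) for i in range(30)]
wave = [40.0 + 20.0 * math.sin(2 * math.pi * i / 10) for i in range(30)]
```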
It can be understood that, when the second period historical data information of the item to be detected is clustered, the data characteristic determined from the clustering result may differ for the same item to be detected depending on which index is selected. For example, for the 30 bandwidth sample data of the "163.com" domain name in the half hour closest to the current time point: when the variance of the bandwidth is used as the sample point, the clustering result may indicate that the data characteristic of the second period historical data information is of the stable linear type; when the average value of the bandwidth is used as the sample point, the clustering result may indicate that it is of the smooth curve type. In this situation, each index can be clustered separately, and the data characteristic of the second period historical data information is finally determined from the clustering results of all indexes on the principle that the minority yields to the majority.
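The majority principle can be sketched as a simple vote over the per-index clustering results. The function name is illustrative, and returning `None` on a tie is an assumption, since the embodiment does not say how ties are resolved.

```python
from collections import Counter


def majority_feature(per_index_features):
    """Pick the data characteristic reported by most indexes.

    `per_index_features` maps an index name to the characteristic
    determined from that index's clustering result.
    Returns None on a tie (assumed behaviour).
    """
    if not per_index_features:
        raise ValueError("at least one index is required")
    ranked = Counter(per_index_features.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


votes = {
    "mean": "smooth_curve",
    "sum": "smooth_curve",
    "range": "smooth_curve",
    "variance": "stable_linear",
}
winner = majority_feature(votes)
```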
Alternatively, as a simpler processing method, the index to be clustered can be fixed in advance. For example, for the bandwidth as the item to be detected, it is preset that the clustering result of the average value determines the data characteristic of the second period historical data information, and the clustering results of the other indexes, such as the sum, the range and the variance of the bandwidth, need not be considered; this is the approach adopted in the embodiment of the invention. Likewise, for the status code as the item to be detected, it is preset that the clustering result of the variance determines the data characteristic of the second period historical data information, and the clustering results of the other indexes, such as the average value, the sum and the range of the status code, need not be considered. Other items to be detected can be handled in the same way as the bandwidth and the status code, and are not described again here. The specific index by which any item to be detected is clustered can be preset according to the working experience of business personnel.
After the data characteristic of the bandwidth data of the domain name "163.com" in the half hour closest to the current time point is obtained, that is, after the data characteristic of the second period historical data information is obtained, the preset model of the item to be detected can be determined from the plurality of preset models in the following manner:
Optionally, determining the preset model of the item to be detected from a plurality of preset models according to the data characteristic of the second period historical data information includes: if the data characteristic of the second period historical data information conforms to the smooth curve type, determining that the preset model of the item to be detected is a numerical model; and if the data characteristic of the second period historical data information conforms to the stable linear type, determining that the preset model of the item to be detected is a proportional model.
Taking the bandwidth as an example, for the 30 sample points of the domain name "163.com" in the half hour closest to the current time point: if the data characteristic determined by clustering the 30 sample points conforms to the smooth curve type, the preset model of the bandwidth in the current real-time data stream is determined to be a numerical model, that is, the numerical model is called when predicting the bandwidth in the current real-time data stream; if the data characteristic determined by clustering the 30 sample points conforms to the stable linear type, the preset model of the bandwidth in the current real-time data stream is determined to be a proportional model, that is, the proportional model is called when predicting the bandwidth in the current real-time data stream.
In addition, for what kind of preset model is selected to predict the item to be detected, in addition to the above-mentioned exemplary method, the embodiment of the present invention may also use a pre-specified method: according to the experience of service personnel, a preset model of the item to be detected is preset. For example, for the item to be detected of the bandwidth, it can be set to use a numerical model for prediction; the items to be detected, such as the status code, the stuck rate, the error rate and the retransmission ratio, can be set to be predicted by using a proportional model.
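Both ways of choosing the preset model can be written as two lookup tables. The table contents follow the text above; the identifiers, and the decision to let a determined data characteristic take precedence over the pre-specified default, are assumptions made for this sketch.

```python
# Data characteristic -> preset model, as described for the clustering approach.
MODEL_BY_FEATURE = {
    "smooth_curve": "numerical",
    "stable_linear": "proportional",
}

# Pre-specified model per item to be detected, set from business experience.
MODEL_BY_ITEM = {
    "bandwidth": "numerical",
    "status_code": "proportional",
    "stuck_rate": "proportional",
    "error_rate": "proportional",
    "retransmission_ratio": "proportional",
}


def choose_model(item, feature=None):
    """Return the preset model type for `item`.

    If a data characteristic was determined it is used; otherwise the
    pre-specified default applies (the precedence is an assumption).
    """
    if feature is not None:
        return MODEL_BY_FEATURE[feature]
    return MODEL_BY_ITEM[item]
```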
Taking the bandwidth as the item to be detected, for the domain name "163.com", the real-time data information of the domain name is obtained from the real-time data stream; the real-time result of each index of the bandwidth is determined through the processing of the computing platform; and the preset model used for predicting each index of the bandwidth in the current real-time data stream is determined according to the data characteristic of the second period historical data information. Next, the data that is input into the preset model to determine the prediction result of each index of the bandwidth, namely the first period historical data information, may be acquired as follows:
Optionally, obtaining the first period historical data information of the item to be detected from the historical data includes: acquiring the first period historical data information of the item to be detected from the historical data of a first time period, the first time period being the time period closest to the current moment. Acquiring the second period historical data information of the item to be detected from the historical data includes: acquiring the second period historical data information of the item to be detected from the historical data of a second time period, wherein the second time period is shorter than the first time period.
Taking the bandwidth as the item to be detected, for the domain name "163.com", before determining whether the bandwidth of the domain name at the current time point is abnormal, the historical bandwidth data of the domain name "163.com" over a historical time period (here, the first period historical data information) is input into the preset model, and the prediction result of each index of the domain name is obtained through the calculation of the preset model.
For example, if the current time point is 03:00:00 on March 8, 2020, the starting time node for sampling the historical bandwidth data of the "163.com" domain name may be 03:00:00 on March 1, 2020, and the ending time node may be 02:00:00 on March 8, 2020, for a total sampling duration of 7 days; the sampling interval may be set to one sample per minute, so that 10080 pieces of historical bandwidth data are collected in total. The prediction result of each index is obtained by inputting the historical bandwidth data of the domain name "163.com" from 03:00:00 on March 1, 2020 to 02:00:00 on March 8, 2020 into a preset model. The period from 03:00:00 on March 1, 2020 to 02:00:00 on March 8, 2020 is the first time period, the 10080 pieces of historical bandwidth data are the first period historical data information, and the historical bandwidth data are stored in the data storage platform.
Similarly, the historical bandwidth data of the half hour closest to the current time point, that is, from 02:30:00 on March 8, 2020 to 02:59:00 on March 8, 2020, can be acquired from the data storage platform; if the sampling interval is again set to one sample per minute, 30 pieces of historical bandwidth data are acquired. The period from 02:30:00 on March 8, 2020 to 02:59:00 on March 8, 2020 is the second time period, and the 30 pieces of historical bandwidth data are the second period historical data information.
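The sample counts quoted for the two periods follow from the one-minute sampling interval and can be reproduced with the standard `datetime` module. The helper name is illustrative, and treating both end points of the second time period as sampled is an assumption consistent with the count of 30.

```python
from datetime import datetime, timedelta

ONE_MINUTE = timedelta(minutes=1)


def sample_times(start, end, step=ONE_MINUTE):
    """List the sampling instants from `start` to `end`, both inclusive."""
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += step
    return times


# Second time period: the half hour closest to 03:00:00 on March 8, 2020.
second_period = sample_times(
    datetime(2020, 3, 8, 2, 30, 0),
    datetime(2020, 3, 8, 2, 59, 0),
)

# Number of one-minute samples in a sampling duration of 7 days.
samples_in_seven_days = timedelta(days=7) // ONE_MINUTE
```

Here `len(second_period)` is 30 and `samples_in_seven_days` is 10080, the two counts given in the example.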
The historical bandwidth data of the domain name "163.com" is input into the preset model to determine the prediction result of each index of the bandwidth at the current time point, and the larger the volume of historical bandwidth data, the more accurate those prediction results are. The first time period is therefore long, such as the 7 days of history closest to the current time point in the embodiment of the present invention. In the process of determining whether the bandwidth of the domain name "163.com" at the current time point is abnormal, it is also necessary to determine which preset model to select according to the data characteristic of the historical bandwidth data of the domain name "163.com". For this purpose, historical bandwidth data from long before the current time point is not needed; only the historical bandwidth data of the short period closest to the current time point is needed, because its data characteristic better reflects the bandwidth at the current time point. The second time period is therefore short, such as the half hour of history closest to the current time point in the embodiment of the present invention.
After the preset model used for predicting each index of the item to be detected in the current real-time data stream is determined according to the data characteristic of the second period historical data information, and the first period historical data information is obtained, the prediction result of each index of the item to be detected is determined by model prediction.
Taking the bandwidth as the item to be detected, for the domain name "163.com", after the first period historical data information of the domain name is acquired, it is input into the preset model, and the prediction result of each index of the item to be detected is obtained. For example, for the average value of the bandwidth, the prediction result of the average value is obtained by processing the first period historical data information through the preset model; in the same way, the prediction results of the sum, the range and the variance of the bandwidth are each obtained by processing the first period historical data information through the preset model.
In step 204, whether the real-time data information is abnormal is determined according to the real-time result and the prediction result. This step may be performed by the prediction service 103 of fig. 1.
After the real-time result of each index of the item to be detected is obtained in step 201 and the prediction result of each index of the item to be detected is obtained in step 203, it can be determined whether the real-time data information in the real-time data stream is abnormal. As an example, whether the real-time data information is abnormal may be determined by:
Optionally, determining whether the real-time data information is abnormal according to the real-time result and the prediction result includes: if the real-time result of each index is not greater than the prediction result of that index, determining that the real-time data information is normal; and if at least one of the real-time results of the indexes is greater than the corresponding prediction result, determining that the real-time data information is abnormal.
Taking the bandwidth as the item to be detected, for the domain name "163.com", assume that 4 indexes are set for the bandwidth: the average value, the sum, the range and the variance of the bandwidth. The real-time result of each of the 4 indexes is compared with its prediction result. If none of the 4 real-time results is greater than the corresponding prediction result, the real-time data information in the current real-time data stream is determined to be normal; if the real-time result of at least one of the 4 indexes is greater than the corresponding prediction result, the real-time data information in the current real-time data stream is determined to be abnormal.
It should be noted that the embodiment of the present invention uses only 4 indexes, namely the average value, the sum, the range and the variance of the item to be detected, as examples for explanation. The present invention does not limit the indexes of the item to be detected, which can be set flexibly according to the actual work requirements of business personnel.
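The comparison rule can be stated compactly in Python. The function names and the prediction figures below are made up for illustration; only the rule itself (abnormal as soon as one real-time result exceeds its prediction result) comes from the text.

```python
def abnormal_indexes(real_time, predicted):
    """Return the indexes whose real-time result exceeds the prediction result."""
    return [name for name, value in real_time.items() if value > predicted[name]]


def is_abnormal(real_time, predicted):
    """Real-time data information is abnormal if any index is abnormal."""
    return bool(abnormal_indexes(real_time, predicted))


# Illustrative values only (not taken from the embodiment).
real_time = {"mean": 40.0, "sum": 120.0, "range": 50.0, "variance": 466.7}
predicted_high = {"mean": 45.0, "sum": 130.0, "range": 60.0, "variance": 500.0}
predicted_low = {"mean": 45.0, "sum": 110.0, "range": 60.0, "variance": 500.0}
```

With `predicted_high` every real-time result stays within its prediction, so the data is normal; with `predicted_low` the sum exceeds its prediction, which alone is enough to mark the data as abnormal.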
Optionally, before determining the preset model of the item to be detected, the method further includes: sending a prediction service calling request to a routing gateway through an HTTP interface; the prediction service calling request comprises the items to be detected; determining a preset model of the item to be detected, including: and the routing gateway determines a service interface corresponding to the preset model of the item to be detected, and sends the prediction service calling request to the preset model of the item to be detected through the service interface.
Referring to fig. 1, the model is first trained offline. The big data storage platform is accessed, and, working with algorithm personnel, an algorithm program is written from a general template and large-scale machine learning training is carried out on the massive historical data; the preset model obtained by training is pushed to the model storage server in the form of a file.
The model storage server then shares the preset model with the prediction service in the form of a file by using NFS (Network File System) technology.
Next, in the prediction service, one possible implementation is to integrate the algorithm into a service by using a container engine cluster (Docker Cluster) and to provide the service externally in the form of an HTTP interface. A user therefore only needs to initiate an HTTP request in the defined interface format, sending a prediction service calling request to the routing gateway through the HTTP interface. Because the prediction service calling request includes the item to be detected, the user obtains an anomaly prediction service that combines the historical data of the item to be detected with a machine learning algorithm, without having to pay attention to the specific algorithm implementation; in this way the algorithm is offered as a service.
For example, using Flink SQL real-time calculation, a user obtains the real-time data information of the item to be detected by writing an SQL statement, and the Flink computing platform calls the calculation factors to process the data and obtain the real-time result of each index of the item to be detected. At the same time, the prediction service is called through a general prediction UDF (User-Defined Function), and a prediction service calling request, which includes the item to be detected, is sent to the routing gateway through an HTTP interface.
After the real-time results of the indexes of the item to be detected are obtained and before the prediction service is called, the preset model for predicting the item to be detected needs to be determined. A plurality of preset models suitable for different kinds of indexes, such as numerical abnormal indexes and proportional abnormal indexes, run in the prediction service, so the type of the item to be detected needs to be determined through the routing gateway so that the item can be routed to the most suitable preset model. The routing gateway determines the service interface corresponding to the preset model of the item to be detected, and sends the prediction service calling request to the preset model of the item to be detected through that service interface.
By calling the prediction service, the prediction result of each index of the item to be detected is determined from the first period historical data information. The prediction service also compares the prediction result of each index of the item to be detected with the real-time result. If the real-time result of every index is not greater than its prediction result, the real-time data information is determined to be normal; the prediction service feeds this conclusion back to the computing platform through the routing gateway, and because the real-time data information is normal, the computing platform does not need to send anything to the alarm platform. If the real-time result of at least one index is greater than its prediction result, the real-time data information is determined to be abnormal; the prediction service feeds this conclusion back to the computing platform through the routing gateway, and alarm information containing the real-time result and the prediction result of the abnormal index is issued on the alarm platform connected to the computing platform, prompting service personnel to check the abnormality in time and respond.
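A minimal sketch of this routing decision is given below. The request layout, the item-to-model table and the service interface URLs are all hypothetical (the embodiment does not disclose the interface definition), and the sketch only resolves the target interface without sending any HTTP traffic.

```python
# Assumed mapping from the type of the item to be detected to a preset model.
MODEL_BY_ITEM = {
    "bandwidth": "numerical",
    "status_code": "proportional",
    "error_rate": "proportional",
}

# Hypothetical service interfaces exposed by the prediction service.
SERVICE_INTERFACE = {
    "numerical": "http://prediction.example.invalid/models/numerical",
    "proportional": "http://prediction.example.invalid/models/proportional",
}


def route(request):
    """Resolve which service interface should receive the calling request.

    `request` is the prediction service calling request; it must carry
    the item to be detected under the (assumed) key "item".
    """
    item = request["item"]
    if item not in MODEL_BY_ITEM:
        raise KeyError(f"no preset model registered for item {item!r}")
    model = MODEL_BY_ITEM[item]
    return {"model": model, "url": SERVICE_INTERFACE[model], "payload": request}


forward = route({"item": "bandwidth", "domain": "163.com"})
```

Keeping the table inside the gateway means callers only name the item to be detected and never need to know which model serves it.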
Meanwhile, no matter whether the real-time data information is abnormal or not, the prediction service sends the real-time result and the prediction result of each index of the item to be detected to the visualization platform, for example, the real-time result and the prediction result are visually displayed on Grafana, so that a user can conveniently adjust and optimize the sensitivity of the preset model on the visualization platform, and the preset model can better meet the actual service sensitivity requirement. Of course, the prediction result of each index of the item to be detected may also be sent to the computing platform, and the computing platform may compare the prediction result of each index of the item to be detected with the real-time result.
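The two outputs described above, alarm information only for abnormal indexes and chart data for every index, can be sketched as one function. The function name, the message wording and the returned structure are assumptions for illustration; no real alarm platform or Grafana API is called.

```python
def dispatch(real_time, predicted):
    """Split a comparison into alarm messages and visualization rows.

    Alarm messages are produced only for indexes whose real-time result
    exceeds the prediction result; visualization rows are produced for
    every index, whether or not the data is abnormal.
    """
    abnormal = [name for name in real_time if real_time[name] > predicted[name]]
    alarms = [
        f"{name}: real-time {real_time[name]} exceeds predicted {predicted[name]}"
        for name in abnormal
    ]
    chart_rows = [(name, real_time[name], predicted[name]) for name in real_time]
    return {"abnormal": bool(abnormal), "alarms": alarms, "chart_rows": chart_rows}


# Illustrative values only.
outcome = dispatch({"mean": 40.0, "sum": 120.0}, {"mean": 45.0, "sum": 110.0})
```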
Based on the same concept, an embodiment of the present invention provides an apparatus for performing anomaly detection on a real-time data stream, as shown in fig. 3, the apparatus includes:
a real-time result determining unit 302, configured to perform index calculation on real-time data information of an item to be detected to obtain a real-time result of each index in the item to be detected;
a first period historical data information obtaining unit 304, configured to obtain first period historical data information of the item to be detected from historical data;
a prediction result determining unit 305, configured to input the first period historical data information into a preset model, so as to obtain a prediction result of each index in the item to be detected;
an anomaly determination unit 306, configured to determine whether the real-time data information is abnormal according to the real-time result and the prediction result.
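The cooperation of units 302, 304, 305 and 306 can be illustrated with a short sketch in which each unit is passed in as a plain function. The function name `detect`, the type alias `Metrics`, and the parameter names are assumptions made for illustration; the embodiment does not prescribe any particular programming interface.

```python
from typing import Callable, Dict, List

Metrics = Dict[str, float]  # index name -> value (hypothetical representation)


def detect(
    records: List[dict],
    history: List[Metrics],
    compute_indexes: Callable[[List[dict]], Metrics],                # role of unit 302
    select_first_period: Callable[[List[Metrics]], List[Metrics]],   # role of unit 304
    predict: Callable[[List[Metrics]], Metrics],                     # role of unit 305
) -> bool:
    """Return True when the real-time data information is judged abnormal."""
    realtime = compute_indexes(records)
    window = select_first_period(history)
    predicted = predict(window)
    # Role of unit 306: abnormal if any real-time result exceeds its prediction.
    return any(realtime[name] > predicted[name] for name in realtime)
```

Passing the units as callables keeps the sketch independent of how index calculation, history selection, and prediction are actually deployed.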
Further, the apparatus also includes a real-time data information obtaining unit 301, configured to: receive a search statement input by a user, the search statement being used to set the item to be detected; and acquire the real-time data information from a real-time data stream according to the search statement.
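The source does not specify the syntax of the search statement. Purely as an illustration, the sketch below assumes a hypothetical statement made of space-separated `key=value` conditions and uses it to select matching records from a stream; the function names and the record layout are likewise assumptions.

```python
def parse_search(statement):
    """Parse a hypothetical 'key=value key=value' statement into a dict of conditions."""
    conditions = {}
    for token in statement.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed condition: {token!r}")
        conditions[key] = value
    return conditions


def filter_stream(stream, statement):
    """Yield the records of the stream that satisfy every condition in the statement."""
    conditions = parse_search(statement)
    for record in stream:
        # Values are compared as strings because the statement itself is text.
        if all(str(record.get(key)) == value for key, value in conditions.items()):
            yield record
```

Because `filter_stream` is a generator, it can be applied to an unbounded stream and consumed record by record.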
Further, in the apparatus, the real-time result determining unit 302 is specifically configured to: determine a calculation factor of each index of the item to be detected; and call the calculation factors of the computing platform to perform index calculation on the real-time data information, obtaining a real-time result of each index.
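One way to read "calculation factor" is as a named aggregation function offered by the computing platform. Under that assumption, which is not confirmed by the source, index calculation could be sketched as below; the `FACTORS` registry, the factor names, and the `(factor, field)` index definition are all hypothetical.

```python
from statistics import mean

# Hypothetical registry of calculation factors exposed by the computing platform.
FACTORS = {
    "count": lambda values: float(len(values)),
    "sum": lambda values: float(sum(values)),
    "avg": lambda values: float(mean(values)) if values else 0.0,
    "max": lambda values: float(max(values)) if values else 0.0,
}


def compute_indexes(records, index_defs):
    """Compute each index by applying its calculation factor to one field of the records.

    index_defs maps an index name to a (factor name, field name) pair.
    """
    results = {}
    for name, (factor, field_name) in index_defs.items():
        values = [record[field_name] for record in records if field_name in record]
        results[name] = FACTORS[factor](values)
    return results
```

Keeping the factors in a registry means a new index only needs a new entry in `index_defs`, not new calculation code.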
Further, the apparatus also includes a preset model determining unit 303, configured to: acquire second period historical data information of the item to be detected from historical data; determine a data characteristic of the second period historical data information; and determine the preset model of the item to be detected from a plurality of preset models according to the data characteristic of the second period historical data information.
Further, in the apparatus, the preset model determining unit 303 is specifically configured to: cluster the second period historical data information; and determine the data characteristic of the second period historical data information according to the clustering result.
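The source does not state which clustering algorithm is used or how the clustering result maps to a data characteristic. As one possible illustration only, the sketch below runs a plain one-dimensional k-means with two clusters and labels the series by how far apart the two cluster centres are. The labels `stable_linear` and `smooth_curve`, the `tolerance` parameter, and the decision rule itself are assumptions.

```python
def two_means(values, iterations=20):
    """Plain 1-D k-means with k=2, initialised at the min and max of a non-empty series."""
    low, high = float(min(values)), float(max(values))
    for _ in range(iterations):
        # Assign each value to the nearer centre (ties go to the low centre).
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        new_low = sum(near_low) / len(near_low)
        new_high = sum(near_high) / len(near_high) if near_high else new_low
        if (new_low, new_high) == (low, high):
            break  # assignments no longer change
        low, high = new_low, new_high
    return low, high


def data_characteristic(values, tolerance=0.1):
    """Label the series by the relative gap between the two cluster centres (assumed rule)."""
    low, high = two_means(values)
    centre = (low + high) / 2
    spread = (high - low) / abs(centre) if centre else 0.0
    return "stable_linear" if spread <= tolerance else "smooth_curve"
```

With this rule a series whose values stay within about ten percent of each other is labelled `stable_linear`, while a series that rises and falls over a wide range is labelled `smooth_curve`.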
Further, in the apparatus, the preset model determining unit 303 is specifically configured to: if the data characteristic of the second period historical data information conforms to a smooth curve type, determine that the preset model of the item to be detected is a numerical model; and if the data characteristic of the second period historical data information conforms to a stable linear type, determine that the preset model of the item to be detected is a proportional model.
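The selection rule above amounts to a lookup from data characteristic to preset model. The sketch below shows that lookup only; the bodies of `numerical_model` and `proportional_model` are placeholder formulas invented for illustration and should not be read as the models of the embodiment, and the `sensitivity` parameter is likewise an assumption.

```python
from statistics import mean


def numerical_model(history, sensitivity=1.2):
    """Placeholder: predict an upper bound from the largest historical value."""
    return max(history) * sensitivity


def proportional_model(history, sensitivity=1.2):
    """Placeholder: predict an upper bound as a proportion of the historical mean."""
    return mean(history) * sensitivity


# Mapping taken from the text: smooth curve -> numerical, stable linear -> proportional.
PRESET_MODELS = {
    "smooth_curve": numerical_model,
    "stable_linear": proportional_model,
}


def select_model(characteristic):
    """Return the preset model registered for a data characteristic."""
    try:
        return PRESET_MODELS[characteristic]
    except KeyError:
        raise ValueError(f"no preset model for data characteristic {characteristic!r}") from None
```

Only the mapping in `PRESET_MODELS` reflects the text; everything else in the sketch is scaffolding around it.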
Further, in the apparatus, the first period historical data information obtaining unit 304 is specifically configured to acquire the first period historical data information of the item to be detected from the historical data of a first period, the first period being the period closest to the current moment. The apparatus further includes a second period historical data information obtaining unit, specifically configured to acquire the second period historical data information of the item to be detected from the historical data of a second period, wherein the second period is shorter than the first period.
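Selecting the historical data of a period that ends at the current moment can be sketched as a simple time-window filter. The representation of history as `(timestamp, value)` pairs and the function name `latest_window` are assumptions; the sketch also assumes, for simplicity, that the second period ends at the current moment as well, which the source does not state.

```python
from datetime import datetime, timedelta


def latest_window(history, now, length):
    """Return the (timestamp, value) pairs that fall in [now - length, now)."""
    start = now - length
    return [(ts, value) for ts, value in history if start <= ts < now]
```

Calling `latest_window` twice with different lengths yields the first period and the shorter second period from the same historical data.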
Further, in the apparatus, the anomaly determination unit 306 is specifically configured to: if the real-time result of each index is not greater than the prediction result of that index, determine that the real-time data information is normal; and if at least one of the real-time results of the indexes is greater than the corresponding prediction result, determine that the real-time data information is abnormal.
Further, in the apparatus, the preset model determining unit 303 is further configured to send a prediction service calling request to a routing gateway through an HTTP interface, the prediction service calling request including the item to be detected. The preset model determining unit 303 is specifically configured to determine the service interface corresponding to the preset model of the item to be detected, and to send the prediction service calling request to the preset model of the item to be detected through that service interface.
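The routing step can be illustrated without any network code: the gateway looks up the preset model bound to the item to be detected and forwards the request to that model's service interface. The `RoutingGateway` class, its method names, and the dictionary-shaped request are hypothetical; in the embodiment the request arrives over an HTTP interface, which this sketch deliberately omits.

```python
class RoutingGateway:
    """Hypothetical in-process stand-in for the routing gateway."""

    def __init__(self):
        self._interfaces = {}     # model name -> service interface (a callable here)
        self._item_to_model = {}  # item to be detected -> model name

    def register_model(self, model_name, handler):
        """Register the service interface of a preset model."""
        self._interfaces[model_name] = handler

    def bind_item(self, item, model_name):
        """Record which preset model serves a given item to be detected."""
        if model_name not in self._interfaces:
            raise KeyError(f"unknown preset model: {model_name!r}")
        self._item_to_model[item] = model_name

    def handle(self, request):
        """Forward a prediction service calling request to the matching interface."""
        model_name = self._item_to_model[request["item"]]
        return self._interfaces[model_name](request)
```

The caller only needs to know the gateway and the item name; which model answers is decided inside `handle`.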
Embodiments of the present invention also provide a computing device, which may specifically be a desktop computer, a portable computer, a smartphone, a tablet computer, a personal digital assistant (PDA), or the like. The computing device may include a central processing unit (CPU), a memory, input/output devices, and the like. The input device may include a keyboard, a mouse, a touch screen, and the like, and the output device may include a display device such as a liquid crystal display (LCD) or a cathode ray tube (CRT).
The memory, which may include read-only memory (ROM) and random access memory (RAM), provides the processor with the program instructions and data stored in it. In embodiments of the present invention, the memory may be used to store program instructions for performing the method for anomaly detection of a real-time data stream.
The processor is configured to call the program instructions stored in the memory and to execute the method for anomaly detection of a real-time data stream according to the obtained program.
An embodiment of the present invention further provides a computer-readable storage medium, where computer-executable instructions are stored, and the computer-executable instructions are used to enable a computer to execute a method for performing anomaly detection on a real-time data stream.
It should be apparent to those skilled in the art that embodiments of the present invention may be provided as a method or as a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (12)

1. A method for anomaly detection for a real-time data stream, comprising:
performing index calculation on real-time data information of an item to be detected to obtain a real-time result of each index in the item to be detected;
acquiring first time period historical data information of the item to be detected from historical data;
inputting the historical data information of the first time period into a preset model to obtain a prediction result of each index in the item to be detected;
and determining whether the real-time data information is abnormal or not according to the real-time result and the prediction result.
2. The method of claim 1,
the acquiring of the real-time data information of the item to be detected comprises the following steps:
receiving a search statement input by a user, wherein the search statement is used for setting the item to be detected;
and acquiring the real-time data information of the item to be detected from a real-time data stream according to the search statement.
3. The method of claim 1,
performing index calculation on the real-time data information of the item to be detected to obtain the real-time result of each index in the item to be detected comprises:
determining a calculation factor of each index of the item to be detected;
and calling the calculation factors of the calculation platform, and performing index calculation on the real-time data information to obtain a real-time result of each index.
4. The method of claim 1,
before inputting the first period historical data information into a preset model, determining the preset model of the item to be detected; wherein the preset model of the item to be detected is determined by:
acquiring second time period historical data information of the item to be detected from the historical data;
determining a data characteristic of the second period historical data information;
and determining a preset model of the item to be detected from a plurality of preset models according to the data characteristics of the second time period historical data information.
5. The method of claim 4,
determining data characteristics of the second period historical data information, including:
clustering the second period historical data information;
and determining the data characteristics of the second time period historical data information according to the clustering result.
6. The method of claim 4,
determining the preset model of the item to be detected from a plurality of preset models according to the data characteristic of the second period historical data information comprises:
if the data characteristic of the second period historical data information conforms to a smooth curve type, determining that the preset model of the item to be detected is a numerical model;
and if the data characteristic of the second period historical data information conforms to a stable linear type, determining that the preset model of the item to be detected is a proportional model.
7. The method of claim 4,
acquiring the first period historical data information of the item to be detected from historical data comprises:
acquiring the first period historical data information of the item to be detected from the historical data of a first period, the first period being the period closest to the current moment;
acquiring the second period historical data information of the item to be detected from the historical data comprises:
acquiring the second period historical data information of the item to be detected from the historical data of a second period, wherein the second period is shorter than the first period.
8. The method of any of claims 1-7, wherein determining whether the real-time data information is anomalous based on the real-time outcome and the predicted outcome comprises:
if the real-time result of each index is not greater than the prediction result of each index, determining that the real-time data information is normal;
and if at least one real-time result in the real-time results of the indexes is larger than the corresponding prediction result, determining that the real-time data information is abnormal.
9. The method of claim 1 or 4,
before determining the preset model of the item to be detected, the method further comprises the following steps:
sending a prediction service calling request to a routing gateway through an HTTP interface; the prediction service calling request comprises the items to be detected;
determining a preset model of the item to be detected, including:
and the routing gateway determines a service interface corresponding to the preset model of the item to be detected, and sends the prediction service calling request to the preset model of the item to be detected through the service interface.
10. An apparatus for anomaly detection for real-time data streams, comprising:
the real-time result determining unit is used for carrying out index calculation on the real-time data information of the item to be detected to obtain the real-time result of each index in the item to be detected;
the first time period historical data information acquisition unit is used for acquiring the first time period historical data information of the item to be detected from historical data;
the prediction result determining unit is used for inputting the first period historical data information into a preset model to obtain the prediction result of each index in the item to be detected;
and the abnormity determining unit is used for determining whether the real-time data information is abnormal according to the real-time result and the prediction result.
11. A computing device, comprising:
a memory for storing program instructions;
a processor for calling program instructions stored in said memory to execute the method of any one of claims 1 to 9 in accordance with the obtained program.
12. A computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of any one of claims 1-9.
CN202010206736.XA 2020-03-23 2020-03-23 Method and device for anomaly detection of real-time data stream Expired - Fee Related CN111447193B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010206736.XA CN111447193B (en) 2020-03-23 2020-03-23 Method and device for anomaly detection of real-time data stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010206736.XA CN111447193B (en) 2020-03-23 2020-03-23 Method and device for anomaly detection of real-time data stream

Publications (2)

Publication Number Publication Date
CN111447193A (en) 2020-07-24
CN111447193B (en) 2022-11-04

Family

ID=71653378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010206736.XA Expired - Fee Related CN111447193B (en) 2020-03-23 2020-03-23 Method and device for anomaly detection of real-time data stream

Country Status (1)

Country Link
CN (1) CN111447193B (en)

Also Published As

Publication number Publication date
CN111447193B (en) 2022-11-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20221104