CN116781377A - Flow data processing method and device, electronic equipment and storage medium - Google Patents

Flow data processing method and device, electronic equipment and storage medium

Info

Publication number
CN116781377A
Authority
CN
China
Prior art keywords
flow
time sequence
target
reference data
detection result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310839185.4A
Other languages
Chinese (zh)
Inventor
李任鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310839185.4A
Publication of CN116781377A
Legal status: Pending

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The disclosure provides a flow data processing method, a flow data processing device, electronic equipment and a storage medium, and relates to the technical field of computers, in particular to the technical fields of the Internet, big data and the like. The specific implementation scheme is as follows: extracting the flow to be detected when the identifier corresponding to the flow to be detected belongs to a target identifier dimension, wherein the target identifier dimension is obtained based on a target historical flow whose detection result was misjudged; detecting time sequence information of the flow to be detected to obtain a first detection result; and generating reporting information of the flow to be detected based on the first detection result, and sending the reporting information to a rule engine, wherein the rule engine is used for detecting the flow to be detected based on the reporting information. The technical scheme provided by the disclosure can improve the accuracy of detecting the flow to be detected.

Description

Flow data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to the technical fields of the internet, big data, and the like.
Background
In the prior art, service traffic is generally detected by a rule engine in order to identify abnormal traffic such as crawler traffic. However, when a rule engine is used to detect traffic, the obtained detection result may be a misjudgment; for example, abnormal traffic may be detected as normal traffic. Therefore, how to improve the accuracy of traffic detection is a problem to be solved.
Disclosure of Invention
The disclosure provides a flow data processing method, a flow data processing device, electronic equipment and a storage medium.
According to a first aspect of the present disclosure, there is provided a method for processing traffic data, including:
extracting the flow to be detected under the condition that an identifier corresponding to the flow to be detected belongs to a target identifier dimension, wherein the target identifier dimension is obtained based on a target historical flow which is misjudged by a detection result;
detecting the time sequence information of the flow to be detected to obtain a first detection result, wherein the first detection result is used for indicating whether the time sequence information of the flow to be detected is normal or not;
and generating reporting information of the flow to be detected based on the first detection result, and sending the reporting information to a rule engine, wherein the rule engine is used for detecting the flow to be detected based on the reporting information.
According to a second aspect of the present disclosure, there is provided a processing apparatus for traffic data, including:
a to-be-detected flow obtaining module, configured to extract the flow to be detected under the condition that an identifier corresponding to the flow to be detected belongs to a target identifier dimension, wherein the target identifier dimension is obtained based on a target historical flow which is misjudged by a detection result;
a first time sequence detection module, configured to detect the time sequence information of the flow to be detected to obtain a first detection result, wherein the first detection result is used for indicating whether the time sequence information of the flow to be detected is normal or not;
and a report information generation module, configured to generate reporting information of the flow to be detected based on the first detection result and send the reporting information to a rule engine, wherein the rule engine is used for detecting the flow to be detected based on the reporting information.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of processing traffic data of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method of processing flow data of the aforementioned first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of processing traffic data of the aforementioned first aspect.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
The technical scheme provided by this embodiment can extract, based on the target identifier dimension, the flow to be detected that is prone to misjudgment. For such flow, time sequence detection is performed before the rule engine is applied, and reporting information is generated from the first detection result obtained by the time sequence detection. The rule engine then detects the flow to be detected based on the reporting information, which reduces the probability of misjudgment by the rule engine and improves the accuracy of detecting the flow to be detected.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a method for processing flow data according to an embodiment of the present disclosure;
FIG. 2 is a normal timing sequence trend graph provided in accordance with an embodiment of the present disclosure;
FIG. 3 is an anomaly timing sequence trend graph provided in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a flow process for a flow to be measured according to an embodiment of the present disclosure;
FIG. 5 is a schematic block diagram of an apparatus for processing traffic data provided in accordance with an embodiment of the present disclosure;
FIG. 6 is yet another schematic block diagram of a processing device for flow data provided in accordance with an embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An embodiment of a first aspect of the present disclosure provides a method for processing traffic data, as shown in fig. 1, including:
S101, extracting the flow to be detected under the condition that an identifier corresponding to the flow to be detected belongs to a target identifier dimension, wherein the target identifier dimension is obtained based on a target historical flow which is misjudged by a detection result;
S102, detecting the time sequence information of the flow to be detected to obtain a first detection result, wherein the first detection result is used for indicating whether the time sequence information of the flow to be detected is normal or not;
S103, generating reporting information of the flow to be detected based on the first detection result, and sending the reporting information to a rule engine, wherein the rule engine is used for detecting the flow to be detected based on the reporting information.
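Steps S101 to S103 can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions rather than the disclosed implementation: the `Traffic` record, the dimension labels, the contents of the report dictionary, and the two callables `detect_timing` and `send_to_rule_engine` are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Set


@dataclass
class Traffic:
    identifier: str          # e.g. an IP address or a UA string
    dimension: str           # hypothetical label such as "ip" or "ua"
    series: Sequence[float]  # per-sub-period request counts


def process(traffic: Traffic,
            target_dimensions: Set[str],
            detect_timing: Callable[[Sequence[float]], bool],
            send_to_rule_engine: Callable[[dict], None]) -> Optional[dict]:
    # S101: only flow whose identifier falls in a target dimension is extracted.
    if traffic.dimension not in target_dimensions:
        return None
    # S102: first detection result; True means the timing looks normal.
    timing_normal = detect_timing(traffic.series)
    # S103: build the reporting information (assumed fields) and send it on.
    report = {
        "identifier": traffic.identifier,
        "dimension": traffic.dimension,
        "timing_normal": timing_normal,
    }
    send_to_rule_engine(report)
    return report
```

Passing the detector and the rule-engine sender as callables keeps the sketch independent of any particular time sequence model or rule engine.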
The processing method of the flow data can be realized by the electronic equipment. By way of example, the electronic device may be a terminal or server or the like having computing and/or processing capabilities.
By adopting the scheme, the flow to be detected that is prone to misjudgment can be extracted based on the target identifier dimension. For such flow, time sequence detection is performed before the rule engine is applied, and reporting information is generated from the first detection result obtained by the time sequence detection. The rule engine then detects the flow to be detected based on the reporting information, which reduces the probability of misjudgment by the rule engine and improves the accuracy of detecting the flow to be detected.
In some possible embodiments, before determining that the identifier corresponding to the flow to be detected belongs to the target identifier dimension, the method further includes: detecting the time sequence information of a candidate historical flow to obtain a time sequence detection result of the candidate historical flow, wherein the detection result of the candidate historical flow is abnormal; and, in the case where it is determined, based on the time sequence detection result of the candidate historical flow, that the detection result of the candidate historical flow is a misjudgment, taking the candidate historical flow as the target historical flow, and obtaining the target identifier dimension based on the identifier dimension to which the identifier corresponding to the target historical flow belongs.
Before the time sequence information of the candidate historical flow is detected to obtain the time sequence detection result of the candidate historical flow, the method may include: acquiring detection results of the rule engine on a plurality of historical flows and, when the detection result of any one of the historical flows is abnormal, taking that historical flow as a candidate historical flow. Each of the one or more historical flows with abnormal detection results can be used as a candidate historical flow and is analyzed and processed in the same way, which is not described in detail herein.
Before obtaining the detection results of the rule engine on the plurality of historical flows, the method may include: generating the plurality of historical flows and sending them to the rule engine. Generating the plurality of historical flows may include: normalizing a first part of the logs of a target service line to obtain first normalized logs of the target service line, and aggregating the first normalized logs by identifier to obtain the plurality of historical flows. Each historical flow generated by aggregation corresponds to one identifier, and different historical flows correspond to different identifiers. The normalization may include at least one of data cleansing, field extraction, dropping, and the like, which is not exhaustively listed or limited here.
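The normalization and aggregation described above might look like the following sketch. The log schema (`ip`, `ua`, `ts`) and the cleansing rule (discard records with a missing field, strip whitespace) are assumptions made for illustration; the source does not specify either.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

REQUIRED_FIELDS = ("ip", "ua", "ts")  # hypothetical log schema


def normalize(raw: dict) -> Optional[dict]:
    """Data cleansing plus field extraction; returns None for unusable records."""
    if any(not raw.get(field) for field in REQUIRED_FIELDS):
        return None
    return {field: str(raw[field]).strip() for field in REQUIRED_FIELDS}


def aggregate(raw_logs: Iterable[dict], key: str) -> Dict[str, List[dict]]:
    """Group normalized records so that each identifier yields one historical flow."""
    flows = defaultdict(list)
    for raw in raw_logs:
        record = normalize(raw)
        if record is not None:
            flows[record[key]].append(record)
    return dict(flows)
```

Calling `aggregate(logs, "ip")` groups by IP address and `aggregate(logs, "ua")` groups by user agent, so the same logs can be aggregated once per identifier category.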
The identifier corresponding to any one of the plurality of historical flows may be any one of the following: a first IP (Internet Protocol) address, a second IP address, a third IP address, a first IPC address, a second IPC address, a third IPC address, a first UA (user agent), a second UA, a third UA, a first JA3 fingerprint, a second JA3 fingerprint, a third JA3 fingerprint, and the like. The IPC address refers to the first three segments of the IP address; if the IP address is 1.2.3.4, its first three segments are 1.2.3. JA3 fingerprinting is a method for fingerprinting Transport Layer Security (TLS) client applications, and the JA3 fingerprint can uniquely identify a corresponding browser. It should be understood that the identifier corresponding to any one of the historical flows may further include other identifiers known in the art besides the above identifiers, which is not limited herein. Correspondingly, the identifier dimension to which the identifier of any one historical flow belongs may be any one of the following: an IP-related dimension, a UA dimension, or a fingerprinting dimension. It should be understood that the identifier dimension may further include other identifier dimensions known in the art besides the foregoing dimensions, which is not limited herein. The IP-related dimension can be an IP address dimension or an IPC address dimension. The fingerprinting dimension may be, but is not limited to, the JA3 fingerprinting dimension.
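As a small illustration of the IPC address defined above, the sketch below derives the first three segments of an IPv4 address. The function name `ipc_of` is hypothetical; the validation uses Python's standard `ipaddress` module.

```python
import ipaddress


def ipc_of(ip: str) -> str:
    """Return the first three dotted segments of an IPv4 address."""
    # IPv4Address raises a ValueError subclass on malformed input.
    address = ipaddress.IPv4Address(ip)
    return ".".join(str(address).split(".")[:3])
```

Two addresses such as 10.0.0.1 and 10.0.0.254 share the IPC address 10.0.0, so requests from both would be aggregated into the same flow in the IPC address dimension.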
It should be noted that each identifier dimension corresponds to a category of identifiers, and any one identifier dimension may contain one or more identifiers.
Detecting the time sequence information of the candidate historical flow to obtain the time sequence detection result may specifically include: obtaining the time sequence information of the candidate historical flow; and detecting that time sequence information to obtain the time sequence detection result of the candidate historical flow.
The time sequence information of the candidate historical flow can be a time sequence, namely the time sequence, within a target period, of the identifier corresponding to the candidate historical flow. The target period may include a plurality of sub-periods. For example, in the case where the target period is 1 day and each sub-period is 1 minute, the time sequence information of the candidate historical flow is a feature of 24×60=1440 dimensions, and each value in the feature represents the number of requests made to the target service line within the corresponding minute by the identifier corresponding to the candidate historical flow. Correspondingly, obtaining the time sequence information of the candidate historical flow may be: counting, for every minute of one day, the number of requests made to the target service line by the identifier corresponding to the candidate historical flow, to obtain 1440 request counts. These 1440 request counts are the time sequence information of the candidate historical flow. The counting may be performed using a big data computing engine, which may be a Spark platform.
The time window corresponding to the target period and the sub-period may be set according to a specific scene and a specific requirement, for example, the target period may be 1 day, 2 days, 3 days, or the like, and the sub-period may be 1 minute, 5 minutes, 10 minutes, 30 minutes, 60 minutes, or the like, which is not limited herein.
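To make the construction of the time sequence concrete, the following sketch counts requests per sub-period in plain Python. It stands in for the big data computing engine mentioned above; the function name and parameters are hypothetical, and the defaults correspond to the 1-day period with 1-minute sub-periods used in the example.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def request_counts(timestamps: Iterable[datetime],
                   period_start: datetime,
                   period: timedelta = timedelta(days=1),
                   sub_period: timedelta = timedelta(minutes=1)) -> List[int]:
    """Count the requests of one identifier in each sub-period of the target period."""
    slots = period // sub_period          # 1440 for the default windows
    counts = [0] * slots
    for ts in timestamps:
        offset = ts - period_start
        if timedelta(0) <= offset < period:
            counts[offset // sub_period] += 1
    return counts
```

With a 2-day period and 5-minute sub-periods the same function would return a feature of 576 dimensions.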
The method for detecting the time sequence information of the candidate historical flow to obtain the time sequence detection result is not limited here. In a preferred example, it may specifically include: calculating a plurality of similarities between the time sequence information of the candidate historical flow and a plurality of first reference data, wherein the plurality of first reference data comprises a plurality of first normal time sequence reference data and one or more first abnormal time sequence reference data, the number of first normal time sequence reference data is larger than the number of first abnormal time sequence reference data, and different similarities correspond to different first reference data; selecting, from the plurality of similarities, a first number of the largest similarities as target similarities, and determining the target reference data corresponding to each of the target similarities, wherein the first number is smaller than the number of the plurality of similarities; in the case where, among the target reference data, the first normal time sequence reference data outnumber the first abnormal time sequence reference data, obtaining a time sequence detection result indicating that the time sequence information of the candidate historical flow is normal; and in the case where, among the target reference data, the first abnormal time sequence reference data outnumber the first normal time sequence reference data, obtaining a time sequence detection result indicating that the time sequence information of the candidate historical flow is abnormal.
Wherein the number of the plurality of similarities is equal to a sum of the number of the plurality of first normal timing reference data plus the number of the one or more first abnormal timing reference data. The first number is an integer, and the value of the first number can be flexibly set according to the requirement, which is not limited herein. In a preferred embodiment, the first number is an odd number.
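The similarity-and-vote procedure can be sketched as below. The source does not name the similarity measure in this passage, so cosine similarity is used purely as an assumption; the function names are likewise hypothetical. The parameter `k` plays the role of the first number.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (an assumed choice of similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def timing_is_normal(series: Sequence[float],
                     normal_refs: List[Sequence[float]],
                     abnormal_refs: List[Sequence[float]],
                     k: int) -> bool:
    """Vote among the k reference series most similar to `series`."""
    scored = [(cosine(series, ref), True) for ref in normal_refs]
    scored += [(cosine(series, ref), False) for ref in abnormal_refs]
    if not 0 < k < len(scored):
        raise ValueError("k must be positive and smaller than the number of similarities")
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    normal_votes = sum(1 for _, is_normal in top if is_normal)
    # With an odd k the two vote counts can never be equal.
    return normal_votes > k - normal_votes
```

Choosing an odd `k` avoids ties, which matches the preference for an odd first number stated above.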
Each of the plurality of first normal time sequence reference data may be relatively typical normal time sequence data in a service scenario, and each of the one or more first abnormal time sequence reference data may be relatively typical abnormal time sequence data in a service scenario. In some examples, the trend of any one of the plurality of first normal time sequence reference data may be similar to any one of the normal time sequence trends shown in FIG. 2. In some examples, the trend of any one of the one or more first abnormal time sequence reference data may be similar to any one of the abnormal time sequence trends shown in FIG. 3. It should be appreciated that any one of the first normal time sequence reference data may also be similar to a normal time sequence trend other than those in FIG. 2, and any one of the first abnormal time sequence reference data may also be similar to an abnormal time sequence trend other than those in FIG. 3, without limitation. The abscissa in FIG. 2 and FIG. 3 represents time, and the unit may be minutes; the ordinate indicates the normalized number of requests.
In the above technical solution, the number of first normal time sequence reference data is greater than the number of first abnormal time sequence reference data. This imbalance among the first reference data raises the probability that the time sequence information of a candidate historical flow is detected as normal, which helps to find target historical flows whose detection results were misjudged and, in turn, to obtain the target identifier dimension accurately.
The above detection of the time sequence information of the candidate historical flow may be completed by a first preset time sequence detection model, and may specifically include: inputting the time sequence information of the candidate historical flow into the first preset time sequence detection model to obtain the time sequence detection result of the candidate historical flow output by the model. The time sequence detection result is either abnormal or normal. The one or more algorithms adopted by the first preset time sequence detection model may at least include a K-Nearest Neighbor (KNN) algorithm.
The first preset time sequence detection model is determined by the following steps:
The first step: obtaining detection results of the rule engine on a plurality of flow samples of the target service line over one or more days to obtain a normal flow set and an abnormal flow set, wherein the detection results of the flow samples contained in the normal flow set are normal, and the detection results of the flow samples contained in the abnormal flow set are abnormal.
The second step: obtaining a plurality of candidate positive samples based on the normal flow set, which may specifically include: (1) acquiring the time sequence information of each of the flow samples in the normal flow set to obtain a plurality of normal time sequence samples. The obtaining manner may be the same as the foregoing manner of obtaining the time sequence information of the candidate historical flow, which is not described herein. (2) Counting the number of requests and the number of users of each of the normal time sequence samples; determining that a normal time sequence sample is a candidate positive sample in the case where its number of requests is larger than a first threshold and its number of users is larger than a second threshold; and obtaining, by this processing, a plurality of candidate positive samples from the plurality of normal time sequence samples. The first threshold and the second threshold may be flexibly set according to the actual scene, which is not limited herein. For example, the first threshold may be 5000 and the second threshold may be 30; any normal time sequence sample whose number of requests is greater than 5000 and whose number of users is greater than 30 is then taken as a candidate positive sample. In a specific embodiment, screening the plurality of normal time sequence samples with the conditions that the number of requests is greater than 5000 and the number of users is greater than 30 yields 182 candidate positive samples.
Similarly, obtaining a plurality of candidate negative samples based on the abnormal flow set may specifically include: (1) acquiring the time sequence information of each of the flow samples in the abnormal flow set to obtain a plurality of abnormal time sequence samples. The obtaining manner may be the same as the foregoing manner of obtaining the time sequence information of the candidate historical flow, which is not described herein. (2) Counting the number of requests and the number of users of each of the abnormal time sequence samples; determining that an abnormal time sequence sample is a candidate negative sample in the case where its number of requests is larger than a third threshold and its number of users is smaller than a fourth threshold; and obtaining, by this processing, a plurality of candidate negative samples from the plurality of abnormal time sequence samples. The third threshold and the fourth threshold may be flexibly set according to the actual scene, and are not limited herein. For example, the third threshold may be 5000 and the fourth threshold may be 10; any abnormal time sequence sample whose number of requests is greater than 5000 and whose number of users is less than 10 is then taken as a candidate negative sample. In a specific embodiment, screening the plurality of abnormal time sequence samples with the conditions that the number of requests is greater than 5000 and the number of users is less than 10 yields 103 candidate negative samples.
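The two screening rules can be written compactly as in the sketch below. The `TimingSample` record and the function names are hypothetical, and how the number of users is counted is not specified in the source, so it is simply treated as a given field. The default thresholds are the example values 5000, 30 and 10.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimingSample:
    request_count: int  # total requests within the target period
    user_count: int     # number of users (assumed to be precomputed)


def candidate_positives(normal_samples: List[TimingSample],
                        first_threshold: int = 5000,
                        second_threshold: int = 30) -> List[TimingSample]:
    """Normal samples with many requests spread over many users."""
    return [s for s in normal_samples
            if s.request_count > first_threshold and s.user_count > second_threshold]


def candidate_negatives(abnormal_samples: List[TimingSample],
                        third_threshold: int = 5000,
                        fourth_threshold: int = 10) -> List[TimingSample]:
    """Abnormal samples with many requests concentrated in few users."""
    return [s for s in abnormal_samples
            if s.request_count > third_threshold and s.user_count < fourth_threshold]
```

Both filters require a high request count; they differ only in the direction of the user-count condition.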
The third step: obtaining a first verification set based on the plurality of candidate positive samples and the plurality of candidate negative samples, which may specifically include: randomly selecting a plurality of first target candidate positive samples from the plurality of candidate positive samples; randomly selecting a plurality of first target candidate negative samples from the plurality of candidate negative samples; and constructing the first verification set from the plurality of first target candidate positive samples and the plurality of first target candidate negative samples. The number of first target candidate positive samples may be greater than the number of first target candidate negative samples. For example, 40 first target candidate positive samples are randomly picked from the 182 candidate positive samples, 20 first target candidate negative samples are randomly picked from the 103 candidate negative samples, and the 40 first target candidate positive samples and the 20 first target candidate negative samples constitute the first verification set.
The fourth step: selecting a first preset number of first reference negative samples from the plurality of candidate negative samples. The selection may be performed as follows: sorting the plurality of candidate negative samples by number of requests in descending order, and selecting the first preset number of samples with the largest numbers of requests as the first reference negative samples. The value of the first preset number can be determined according to user experience and the like, and is not limited herein. For example, when the first preset number is 15, the 103 candidate negative samples are sorted by number of requests in descending order, and the 15 samples with the largest numbers of requests are selected as the first reference negative samples.
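The third and fourth steps might be sketched as follows. Each sample is represented here simply as its list of per-sub-period request counts, so its number of requests is the sum of the list; that representation, the function names, and the optional `seed` parameter are assumptions for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

Series = Sequence[int]


def build_validation_set(positives: List[Series], negatives: List[Series],
                         n_pos: int = 40, n_neg: int = 20,
                         seed: Optional[int] = None) -> List[Tuple[Series, bool]]:
    """Third step: random picks, labelled True (normal) or False (abnormal)."""
    rng = random.Random(seed)
    picked = [(s, True) for s in rng.sample(positives, n_pos)]
    picked += [(s, False) for s in rng.sample(negatives, n_neg)]
    return picked


def top_by_requests(samples: List[Series], count: int) -> List[Series]:
    """Fourth step: the `count` samples with the largest total number of requests."""
    return sorted(samples, key=sum, reverse=True)[:count]
```

The same `top_by_requests` helper can also rank candidate positive samples when reference positive samples are selected by number of requests.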
The fifth step: determining, based on the first preset number of first reference negative samples, a first target number of first reference positive samples and the value of the first number, which may specifically include: (1) constructing a plurality of first intermediate models. When calculating similarities, each first intermediate model uses the first preset number of first reference negative samples and X first reference positive samples, and selects K target similarities, wherein X is larger than the first preset number and smaller than or equal to the sum of the first preset number and a second preset number, the second preset number is larger than 1, and K is an odd number. Different first intermediate models differ in the value of X and/or the value of K. It should be noted that, for any first intermediate model that uses X first reference positive samples, those samples can be obtained by sorting the candidate positive samples by number of requests in descending order and selecting the X samples with the largest numbers of requests. (2) Inputting the constructed first verification set into each first intermediate model for detection, and obtaining the first detection accuracy of each first intermediate model on the first verification set, so as to obtain a plurality of first detection accuracies corresponding respectively to the plurality of first intermediate models. (3) Taking the first target intermediate model, which corresponds to the highest of the plurality of first detection accuracies, as the first preset time sequence detection model.
The first preset number of first reference negative samples and the X first reference positive samples used in the first target intermediate model are the plurality of first reference data of the first preset time sequence detection model; the value of K used in the first target intermediate model is the value of the first number.
For example, in one implementation of the fifth step, the first preset number is 15 and the second preset number is 10, so 15 first reference negative samples are used; X takes each value from 16 to 25 in turn, and K takes the values 1, 3, 5, 7 and 9. This yields one first intermediate model for X = 16 and K = 1, another for X = 17 and K = 1, another for X = 19 and K = 3, and so on (not enumerated here), for a total of 10 × 5 = 50 first intermediate models. The first verification set is detected with each of the 50 first intermediate models to obtain 50 first detection accuracies, and the first target intermediate model corresponding to the highest of the 50 first detection accuracies is taken as the first preset time sequence detection model. Assuming that X is 19 and K is 3 for the first target intermediate model, the 15 first reference negative samples and the 19 first reference positive samples used by that model are the plurality of first reference data of the first preset time sequence detection model, and 3 is the value of the first number.
With the above determination method, traversing the values of X and K makes it possible to obtain a first preset time sequence detection model with high accuracy.
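The traversal over X and K can be sketched as a small grid search. This is a sketch under assumptions, not the disclosed implementation: cosine similarity is assumed as the similarity measure, ties in accuracy are resolved by keeping the first combination encountered, and all function and parameter names are hypothetical. `ranked_positives` is expected to be sorted by number of requests in descending order, and `extra` stands for the second preset number.

```python
import math
from itertools import product
from typing import List, Optional, Sequence, Tuple

Series = Sequence[float]


def cosine(a: Series, b: Series) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_is_normal(series: Series, refs: List[Tuple[Series, bool]], k: int) -> bool:
    top = sorted(refs, key=lambda ref: cosine(series, ref[0]), reverse=True)[:k]
    votes = sum(1 for _, is_normal in top if is_normal)
    return votes > k - votes


def select_model(ranked_positives: List[Series], ref_negatives: List[Series],
                 validation: List[Tuple[Series, bool]], extra: int = 10,
                 ks: Sequence[int] = (1, 3, 5, 7, 9)) -> Optional[Tuple[float, int, int]]:
    """Return (accuracy, X, K) of the best intermediate model, or None."""
    n_neg = len(ref_negatives)
    best = None
    for x, k in product(range(n_neg + 1, n_neg + extra + 1), ks):
        if x > len(ranked_positives):
            continue  # not enough candidate positive samples for this X
        refs = [(s, True) for s in ranked_positives[:x]]
        refs += [(s, False) for s in ref_negatives]
        correct = sum(knn_is_normal(s, refs, k) == label for s, label in validation)
        accuracy = correct / len(validation)
        if best is None or accuracy > best[0]:
            best = (accuracy, x, k)
    return best
```

With 15 reference negative samples, `extra=10` and the default `ks`, the loop visits X from 16 to 25 and five values of K, which gives the 50 intermediate models of the example.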
Determining, based on the time sequence detection result of the candidate historical flow, that the detection result of the candidate historical flow is a misjudgment may specifically include: determining that the detection result of the candidate historical flow is a misjudgment when the time sequence detection result indicates that the time sequence information of the candidate historical flow is normal; or, when the time sequence detection result indicates that the time sequence information of the candidate historical flow is normal, judging the candidate historical flow again, manually or by other means, and determining that the detection result of the candidate historical flow is a misjudgment when the candidate historical flow is again judged to be normal.
It should be noted that there may be one or more candidate historical flows (i.e., target historical flows) with erroneous judgment. When there are a plurality of target historical flows, if the identification dimension of each target historical flow in the plurality of target historical flows is the same, a target identification dimension can be obtained; if the identification dimensions to which each target historical flow belongs in the plurality of target historical flows are different, a plurality of target identification dimensions can be obtained.
By adopting the scheme, whether the detection result of the candidate historical flow is misjudged or not can be determined based on the time sequence detection result of the candidate historical flow, the candidate historical flow is used as the target historical flow under the condition of misjudgment, then the target identification dimension is obtained based on the identification dimension of the identification corresponding to the target historical flow, and the flow to be detected can be extracted based on the target identification dimension. Therefore, through the technical scheme, the target identification dimension of the target historical flow which is easy to misjudge can be provided, and the follow-up accurate extraction of the flow to be detected which is easy to misjudge is ensured.
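The way target identification dimensions are derived from misjudged candidate historical flows can be sketched as below. This is an illustrative sketch only: the dictionary keys `dimension` and `series`, the function name, and the two callables are assumptions introduced here, and the optional `recheck_is_normal` stands in for the manual or other second judgment.

```python
def target_identifier_dimensions(candidates, timing_is_normal, recheck_is_normal=None):
    """Collect the identifier dimensions of candidate flows that were misjudged.

    candidates: dicts with hypothetical keys "dimension" and "series"; every
    candidate is assumed to have an earlier detection result of "abnormal".
    """
    dimensions = set()
    for flow in candidates:
        if not timing_is_normal(flow["series"]):
            continue  # timing agrees with the abnormal verdict, so no misjudgment
        if recheck_is_normal is not None and not recheck_is_normal(flow):
            continue  # the optional second judgment did not confirm "normal"
        dimensions.add(flow["dimension"])  # misjudged: this is a target historical flow
    return dimensions
```

Because the result is a set, several target historical flows that share one identification dimension yield a single target identification dimension, and flows from different dimensions yield several.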
In some possible embodiments, in a case where the identifier corresponding to the flow to be measured belongs to the target identifier dimension, extracting the flow to be measured may specifically include: and for any one of the flows, taking the flow as the flow to be measured under the condition that the identifier corresponding to the flow belongs to the dimension of the target identifier.
The generating manner of the plurality of flow rates is similar to that of the plurality of historical flow rates, and the generating manner comprises the following steps: and normalizing the second part of logs of the target service line to obtain second part of normalized logs of the target service line, and aggregating the second part of normalized logs based on the identification to obtain the plurality of flows. Each flow in the plurality of flows generated by aggregation corresponds to one identifier, and different flows correspond to different identifiers. The second partially standardized log is different from the first partially standardized log described above, and is generated after the first partially standardized log.
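As a rough illustration of the aggregation and extraction just described, the following sketch groups normalized log entries by identifier and keeps only the flows whose identifier belongs to a target identification dimension. The log layout (one dictionary per entry, keyed by dimension names such as `ip` and `ja3`) and both function names are assumptions made for the example; a production system would more likely run this on a big data engine.

```python
from collections import defaultdict


def aggregate_by_identifier(normalized_logs, dimension):
    """Group log entries so that each identifier value maps to one flow."""
    flows = defaultdict(list)
    for entry in normalized_logs:
        flows[entry[dimension]].append(entry)
    return dict(flows)


def extract_flows_to_measure(normalized_logs, dimensions, target_dimensions):
    """Keep only flows whose identifier belongs to a target identification dimension."""
    selected = {}
    for dimension in dimensions:
        if dimension not in target_dimensions:
            continue
        grouped = aggregate_by_identifier(normalized_logs, dimension)
        for identifier, entries in grouped.items():
            selected[(dimension, identifier)] = entries
    return selected
```

For instance, with target dimensions `{"ja3"}`, flows aggregated by IP address are skipped, while every distinct JA3 fingerprint becomes one flow to be measured.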
It can be seen that there may be one or more flows to be measured and, as described above, one or more target identification dimensions and one or more target historical flows. Because every flow to be measured is processed in the same way, every target identification dimension is processed in the same way, and every target historical flow is processed in the same way, the description here is given, for clarity and brevity, from the perspective of a single flow to be measured, a single target identification dimension, and a single target historical flow.
In some possible embodiments, detecting the time sequence information of the flow to be detected to obtain a first detection result includes: calculating a plurality of first similarities between the time sequence information of the flow to be detected and a plurality of first reference data, where the plurality of first reference data includes a plurality of first normal time sequence reference data and one or more first abnormal time sequence reference data, the number of first normal time sequence reference data is greater than the number of first abnormal time sequence reference data, and different first similarities correspond to different first reference data; selecting, from the plurality of first similarities, a first number of first target similarities with the largest values, and determining the first target reference data corresponding to each of them, where the first number is smaller than the number of first similarities; and obtaining a first detection result indicating that the time sequence information of the flow to be detected is normal when, among the first target reference data corresponding to the first number of first target similarities, the number of first normal time sequence target reference data is greater than the number of first abnormal time sequence target reference data.
Wherein the number of the plurality of first similarities is equal to a sum of the number of the plurality of first normal timing reference data plus the number of the one or more first abnormal timing reference data. The first number is an integer, and the value of the first number can be flexibly set according to the requirement, which is not limited herein. In a preferred embodiment, the first number is an odd number.
As can be seen from the foregoing description, the time sequence information of the flow to be measured is represented in the same manner as the time sequence information of the candidate historical flow; the process of detecting the time sequence information of the flow to be detected to obtain the first detection result is the same as the process of detecting the time sequence information of the candidate historical flow to obtain the time sequence detection result of the candidate historical flow, so that the description is omitted here. It should be noted that, the detection of the time sequence information of the flow to be detected may also use the first preset time sequence detection model.
In this technical scheme, the number of first normal time sequence reference data is greater than the number of first abnormal time sequence reference data. This imbalance in the first reference data raises the probability that the time sequence information of the flow to be detected is detected as normal, so that a first detection result indicating that the flow to be detected is normal is generated. The rule engine can then detect more accurately, based on the reporting information, whether the flow to be detected is abnormal, which reduces the probability of misjudgment by the rule engine and thus improves the accuracy of detecting the flow to be detected.
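The similarity ranking and vote described above amount to a nearest-neighbour style decision, which might be sketched as follows. Cosine similarity is an assumption (the text does not name the similarity measure), as are the function names and the choice to raise an error when the first number is not smaller than the number of similarities.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equally long series (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_timing(series, normal_refs, abnormal_refs, k):
    """Return "normal" if normal references win the vote among the top k."""
    total = len(normal_refs) + len(abnormal_refs)
    if not 0 < k < total:
        raise ValueError("k must be positive and smaller than the number of references")
    scored = [(cosine_similarity(series, ref), "normal") for ref in normal_refs]
    scored += [(cosine_similarity(series, ref), "abnormal") for ref in abnormal_refs]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    normal_votes = sum(1 for _, label in top if label == "normal")
    return "normal" if normal_votes > k - normal_votes else "abnormal"
```

Choosing an odd k, as the preferred embodiment suggests, means the vote can never be tied; with an even k this sketch falls back to "abnormal" on a tie, which is a choice made here rather than something the text specifies.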
In some possible embodiments, the generating the report information of the flow to be measured based on the first detection result, and sending the report information to a rule engine, includes: and under the condition that the first detection result indicates that the time sequence information of the flow to be detected is normal, filtering attribute features of the identifiers in the various features corresponding to the flow to be detected to obtain one or more features to be reported of the flow to be detected, adding the one or more features to be reported into reporting information, and sending the reporting information to the rule engine.
The attribute feature of the identifier may be described as abnormal or normal, or as black or white. The various features corresponding to the flow to be detected include the attribute feature of the identifier. The one or more features to be reported do not include the attribute feature of the identifier, and may include other features of the flow to be detected. The other features may be statistical features and/or behavioral features. A statistical feature may be, for example, the number of requests within 1 minute. A behavioral feature may be, for example, a click interval, a mouse track, or a touch screen track.
When the rule engine is used to detect traffic, the following has been found: the rule engine misjudges some traffic because the attribute features of the identifiers of that traffic hit a target policy under which the rule engine determines traffic to be abnormal. For example, the target policy determines that the flow to be measured is abnormal when the target identification dimension to which its identifier belongs is black and other features satisfy a certain condition. For example, the identifiers corresponding to a plurality of misjudged flows belong to the JA3 dimension, and when the online engine detected those flows it used a policy that includes the JA3 fingerprint dimension being black. Therefore, for a flow to be detected whose identifier belongs to the JA3 dimension, the probability of misjudgment by the online engine can be reduced if its detection is not based on the attribute feature of the identifier.
In this technical scheme, when the first detection result indicates that the time sequence information of the flow to be detected is normal, the attribute feature of the identifier is not added to the reporting information. As a result, when the rule engine detects whether the flow to be detected is abnormal based on the reporting information, it cannot rely on the attribute feature of the identifier, which reduces the probability of misjudgment by the online engine and improves the accuracy of detecting the flow to be detected.
In other possible embodiments, the generating the report information of the flow to be measured based on the first detection result, and sending the report information to a rule engine, includes: and adding various characteristics corresponding to the flow to be detected into reporting information and sending the reporting information to the rule engine under the condition that the first detection result indicates that the time sequence information of the flow to be detected is abnormal.
Because a misjudgment by the rule engine consists of judging normal flow to be abnormal flow, when the first detection result indicates that the time sequence information of the flow to be detected is abnormal, the various features corresponding to the flow to be detected can be added directly to the reporting information without filtering, so that the rule engine detects whether the flow to be detected is abnormal based on the reporting information. The various features may include the attribute feature of the identifier corresponding to the flow to be detected.
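The two reporting branches (timing normal, timing abnormal) can be illustrated with a short sketch. The feature names, the key `identifier_attribute`, and the function name `build_report` are hypothetical; the disclosure does not specify a data format for the reporting information.

```python
def build_report(features, first_result, attribute_key="identifier_attribute"):
    """Build the reporting information sent to the rule engine.

    features: mapping of feature name to value for one flow (hypothetical layout).
    first_result: "normal" or "abnormal", the first detection result.
    """
    if first_result == "normal":
        # Timing looks normal: drop the identifier's attribute feature so the
        # rule engine cannot base its decision on it.
        return {name: value for name, value in features.items() if name != attribute_key}
    # Timing looks abnormal: report every feature unfiltered.
    return dict(features)


example_features = {
    "identifier_attribute": "black",
    "requests_per_minute": 42,
    "click_interval_ms": 350,
}
normal_report = build_report(example_features, "normal")
abnormal_report = build_report(example_features, "abnormal")
```

In the first case the report keeps only the statistical and behavioral features, while in the second it is a full copy of the input.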
In some possible embodiments, after the sending the report information to a rules engine, the method further includes: obtaining a second detection result of the flow to be detected from the rule engine, wherein the second detection result is obtained by detecting the flow to be detected based on the reported information; under the condition that the flow to be detected is determined to be normal based on the second detection result, detecting time sequence information of the flow to be detected to obtain a third detection result; and determining that the flow to be detected is abnormal flow under the condition that the third detection result indicates that the time sequence information of the flow to be detected is abnormal.
When the third detection result indicates that the time sequence information of the flow to be detected is abnormal, it is determined that the second detection result is wrong and that the flow to be detected is abnormal flow missed by the rule engine.
In some possible embodiments, detecting the time sequence information of the flow to be detected to obtain a third detection result includes: calculating a plurality of second similarities between the time sequence information of the flow to be detected and a plurality of second reference data, where the plurality of second reference data includes one or more second normal time sequence reference data and a plurality of second abnormal time sequence reference data, the number of second normal time sequence reference data is smaller than the number of second abnormal time sequence reference data, and different second similarities correspond to different second reference data; selecting, from the plurality of second similarities, a second number of second target similarities with the largest values, and determining the second target reference data corresponding to each of them, where the second number is smaller than the number of second similarities; and obtaining a third detection result indicating that the flow to be detected is abnormal flow when, among the second target reference data corresponding to the second number of second target similarities, the number of second normal time sequence target reference data is smaller than the number of second abnormal time sequence target reference data.
Wherein the number of the plurality of second similarities is equal to a sum of the number of the one or more second normal timing reference data plus the number of the plurality of second abnormal timing reference data. The second number is an integer, and the value of the second number can be flexibly set according to the requirement, which is not limited herein. In a preferred example, the second number is an odd number.
It can be seen that the process of detecting the time sequence information of the flow to be detected to obtain the third detection result is similar to the process of detecting the time sequence information of the candidate historical flow to obtain the time sequence detection result of the candidate historical flow, so that the description thereof is omitted herein.
The process of detecting the time sequence information of the flow to be detected to obtain the third detection result may also be performed by a time sequence detection model, for example a second preset time sequence detection model. The second preset time sequence detection model may be determined in a manner similar to the first preset time sequence detection model described above, except that the third, fourth, and fifth steps differ. Therefore, only the third, fourth, and fifth steps of determining the second preset time sequence detection model are described below:
The third step, based on the plurality of candidate positive samples and the plurality of candidate negative samples, obtains a second verification set, which may specifically include: randomly selecting a plurality of second target candidate positive samples from the plurality of candidate positive samples; randomly selecting a plurality of second target candidate negative samples from the plurality of candidate negative samples; a second validation set is constructed using the plurality of second target candidate positive samples and the plurality of second target candidate negative samples. Wherein the number of the plurality of second target candidate positive samples may be smaller than the number of the plurality of second target candidate negative samples. For example, randomly picking 20 second target candidate positive samples from the 182 candidate positive samples; randomly selecting 40 second target candidate negative samples from the 103 candidate negative samples; the 20 second target candidate positive samples and the 40 second target candidate negative samples constitute a second validation set.
Fourth, selecting a third preset number of second reference positive samples from the plurality of candidate positive samples. The selection mode can be as follows: and sequencing the candidate positive samples according to the number of requests from large to small, and selecting a third preset number of second reference positive samples with the largest number of requests. Wherein, the value of the third preset number can be determined according to user experience and the like, and is not limited herein. For example, when the third preset number has a value of 15, the above process may be: and sequencing the 182 candidate positive samples from large to small according to the request times, and selecting 15 second reference positive samples with the largest request times.
Fifth, determining the second target number of second reference negative samples and the value of the second number, based on the third preset number of second reference positive samples, may specifically include: (1) Constructing a plurality of second intermediate models. When calculating similarities, each second intermediate model uses the third preset number of second reference positive samples and x second reference negative samples, and selects k second target similarities, where x is greater than the third preset number and less than or equal to the sum of the third preset number and a fourth preset number, the fourth preset number is greater than 1, and k is an odd number. Different second intermediate models differ in the value of x and/or the value of k. For any second intermediate model, the x second reference negative samples it uses may be obtained by sorting the candidate negative samples by number of requests in descending order and selecting the x samples with the most requests. (2) Inputting the constructed second verification set into each second intermediate model for detection, to obtain the second detection accuracy of each second intermediate model on the second verification set, and thereby a plurality of second detection accuracies corresponding respectively to the plurality of second intermediate models. (3) Taking the second target intermediate model, which corresponds to the highest of the plurality of second detection accuracies, as the second preset time sequence detection model.
The third preset number of second reference positive samples and the x second reference negative samples used by the second target intermediate model are the plurality of second reference data involved in the second preset time sequence detection model; the value of k used by the second target intermediate model is the value of the second number.
For example, when implementing the fifth step, let the third preset number be 15 and the fourth preset number be 10, so that 15 second reference positive samples are used. x takes each value from 16 to 25 in turn, and k takes each of 1, 3, 5, 7 and 9 in turn. This yields a second intermediate model for x = 16 and k = 1, one for x = 17 and k = 1, one for x = 19 and k = 3, and so on (not all enumerated here), giving 10 × 5 = 50 second intermediate models in total. The second verification set is detected with each of the 50 second intermediate models to obtain 50 second detection accuracies, and the second target intermediate model corresponding to the highest of these 50 second detection accuracies is taken as the second preset time sequence detection model. Assuming that the second target intermediate model corresponds to x = 19 and k = 3, the 15 second reference positive samples and the 19 second reference negative samples it uses are the plurality of second reference data involved in the second preset time sequence detection model, and 3 is the value of the second number.
According to the training method, the traversing modes of x and k are adopted, so that a second preset time sequence detection model with high accuracy can be obtained.
According to the technical scheme, under the condition that the flow to be detected is determined to be normal based on the second detection result, the time sequence information of the flow to be detected is detected again to obtain the third detection result, and under the condition that the time sequence information of the flow to be detected is abnormal as indicated by the third detection result, the flow to be detected is determined to be abnormal, and the condition that the flow to be detected is abnormal and is not detected by the rule engine is avoided, namely the rule engine is prevented from detecting the flow to be detected in a missing mode, so that the detection accuracy of the flow to be detected is improved.
In addition, the number of second normal time sequence reference data is smaller than the number of second abnormal time sequence reference data. This imbalance in the second reference data raises the probability that the time sequence information of the flow to be detected is detected as abnormal, which helps to find abnormal flows missed by the rule engine and improves the accuracy of detecting the flow to be detected.
As shown in fig. 4, illustratively, the processing of the flow to be measured by adopting the method includes:
S401, aggregating the second part of standardized logs based on the identification to obtain a plurality of flows. The identification may be an IP address, IPC address, JA3 fingerprint, UA, etc. The aggregation may be performed with a big data computing engine, which may be the Spark platform.
S402, regarding any one flow in a plurality of flows, when the identifier corresponding to the flow belongs to the dimension of the target identifier, the flow is taken as the flow to be measured.
S403, acquiring time sequence information of the flow to be detected. This may be done by counting, based on the identification of the flow to be detected, the time series of that identification over each sub-period of the target period. When the target period is 1 day and each sub-period is 1 minute, the time sequence information of the flow to be measured is a feature of 24 × 60 = 1440 dimensions, and each value in it may represent the number of requests made to the target service line, within the corresponding 1 minute, by the identification corresponding to the flow to be measured. The target period may be 1 day, 2 days, etc., and the sub-period may be 1 minute, 5 minutes, 10 minutes, etc. The counting may be performed with a big data computing engine, which may be the Spark platform.
S404, inputting the time sequence information of the flow to be detected into a first preset time sequence detection model to obtain a first detection result output by the first preset time sequence detection model.
And S405, generating reporting information of the flow to be detected based on the first detection result, and sending the reporting information to a rule engine.
S406, obtaining a second detection result of the flow to be detected from the rule engine, wherein the second detection result is obtained by detecting the flow to be detected based on the reported information.
S407, when the flow to be measured is determined to be abnormal based on the second detection result, determining that the flow to be measured is abnormal.
And S408, detecting time sequence information of the flow to be detected under the condition that the flow to be detected is determined to be normal based on the second detection result, so as to obtain a third detection result.
S409, determining that the flow to be detected is abnormal flow when the third detection result indicates that the time sequence information of the flow to be detected is abnormal.
S410, determining that the flow to be detected is normal flow under the condition that the third detection result indicates that the time sequence information of the flow to be detected is normal.
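Steps S403 to S410 can be summarized in a small sketch. It is a simplified, single-machine illustration under assumptions: the two timing models and the rule engine are passed in as plain callables returning "normal" or "abnormal", the feature key `identifier_attribute` is hypothetical, and the per-minute counting is done in memory rather than on a big data engine.

```python
from datetime import datetime, timedelta


def timing_vector(request_times, period_start,
                  period=timedelta(days=1), sub_period=timedelta(minutes=1)):
    """S403: count requests per sub-period (1440 values for 1 day / 1 minute)."""
    counts = [0] * (period // sub_period)
    for moment in request_times:
        offset = moment - period_start
        if timedelta(0) <= offset < period:
            counts[offset // sub_period] += 1
    return counts


def process_flow(series, features, first_model, rule_engine, second_model,
                 attribute_key="identifier_attribute"):
    """S404 to S410 for one flow; returns "normal" or "abnormal"."""
    first_result = first_model(series)                          # S404
    if first_result == "normal":                                # S405
        report = {k: v for k, v in features.items() if k != attribute_key}
    else:
        report = dict(features)
    if rule_engine(report) == "abnormal":                       # S406, S407
        return "abnormal"
    third_result = second_model(series)                         # S408
    return "abnormal" if third_result == "abnormal" else "normal"  # S409, S410


start = datetime(2023, 7, 10)
vector = timing_vector(
    [start + timedelta(seconds=5), start + timedelta(seconds=59),
     start + timedelta(minutes=1), start + timedelta(days=1)],
    start,
)
```

In the usage lines at the end, the request that falls exactly one day after `start` lies outside the target period and is therefore not counted.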
An embodiment of a second aspect of the present disclosure provides a processing apparatus for traffic data, as shown in fig. 5, including:
The to-be-measured flow acquisition module 501 is configured to extract a flow to be tested when an identifier corresponding to the flow to be tested belongs to a target identifier dimension, where the target identifier dimension is obtained based on a target historical flow whose detection result is a misjudgment;
the first timing detection module 502 is configured to detect timing information of the flow to be detected, and obtain a first detection result, where the first detection result is used to indicate whether the timing information of the flow to be detected is normal;
and the report information generating module 503 is configured to generate report information of the flow to be detected based on the first detection result, and send the report information to a rule engine, where the rule engine is configured to detect the flow to be detected based on the report information.
In some possible embodiments, the report information generating module 503 is configured to, when the first detection result indicates that the time sequence information of the flow to be measured is normal, filter the identified attribute features in the multiple features corresponding to the flow to be measured to obtain one or more features to be reported of the flow to be measured, add the one or more features to be reported to report information, and send the report information to the rule engine.
As shown in fig. 6, in some possible embodiments, the apparatus further comprises: the target identification dimension determining module 504, configured to detect timing information of a candidate historical traffic whose detection result is abnormal, to obtain a timing detection result of the candidate historical traffic; and, when it is determined based on the timing detection result that the detection result of the candidate historical traffic is a misjudgment, to take the candidate historical traffic as a target historical traffic and obtain the target identification dimension based on the identification dimension to which the identification corresponding to the target historical traffic belongs.
In some possible embodiments, the first timing detection module 502 in fig. 5 or fig. 6 is configured to: calculate a plurality of first similarities between the timing information of the flow to be measured and a plurality of first reference data, where the plurality of first reference data includes a plurality of first normal timing reference data and one or more first abnormal timing reference data, the number of first normal timing reference data is greater than the number of first abnormal timing reference data, and different first similarities correspond to different first reference data; select, from the plurality of first similarities, a first number of first target similarities with the largest values, and determine the first target reference data corresponding to each of them, where the first number is smaller than the number of first similarities; and obtain a first detection result indicating that the timing information of the flow to be measured is normal when, among the first target reference data corresponding to the first number of first target similarities, the number of first normal timing target reference data is greater than the number of first abnormal timing target reference data.
Referring again to fig. 6, in some possible embodiments, the apparatus further comprises:
the detection result obtaining module 505 is configured to obtain a second detection result of the flow to be detected from the rule engine, where the second detection result is obtained by detecting the flow to be detected based on the reported information;
a second timing detection module 506, configured to detect timing information of the flow to be detected, to obtain a third detection result, where the flow to be detected is determined to be normal based on the second detection result; and determining that the flow to be detected is abnormal flow under the condition that the third detection result indicates that the time sequence information of the flow to be detected is abnormal.
In some possible embodiments, the second timing detection module 506 is configured to: calculate a plurality of second similarities between the timing information of the flow to be measured and a plurality of second reference data, where the plurality of second reference data includes one or more second normal timing reference data and a plurality of second abnormal timing reference data, the number of second normal timing reference data is smaller than the number of second abnormal timing reference data, and different second similarities correspond to different second reference data; select, from the plurality of second similarities, a second number of second target similarities with the largest values, and determine the second target reference data corresponding to each of them, where the second number is smaller than the number of second similarities; and obtain a third detection result indicating that the flow to be measured is abnormal flow when, among the second target reference data corresponding to the second number of second target similarities, the number of second normal timing target reference data is smaller than the number of second abnormal timing target reference data.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above. For example, in some embodiments, the various methods described above may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into RAM 703 and executed by computing unit 701, one or more steps of the various methods described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the methods described above in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (15)

1. A method of processing traffic data, comprising:
extracting the flow to be detected under the condition that an identifier corresponding to the flow to be detected belongs to a target identifier dimension, wherein the target identifier dimension is obtained based on a target historical flow which is misjudged by a detection result;
detecting the time sequence information of the flow to be detected to obtain a first detection result, wherein the first detection result is used for indicating whether the time sequence information of the flow to be detected is normal or not;
and generating reporting information of the flow to be detected based on the first detection result, and sending the reporting information to a rule engine, wherein the rule engine is used for detecting the flow to be detected based on the reporting information.
2. The method of claim 1, wherein the generating the report information of the traffic under test based on the first detection result, and sending the report information to a rules engine, comprises:
and under the condition that the first detection result indicates that the time sequence information of the flow to be detected is normal, filtering attribute features of the identifiers in the various features corresponding to the flow to be detected to obtain one or more features to be reported of the flow to be detected, adding the one or more features to be reported into reporting information, and sending the reporting information to the rule engine.
3. The method of claim 1, the method further comprising:
detecting the time sequence information of the candidate historical flow to obtain a time sequence detection result of the candidate historical flow, wherein the detection result of the candidate historical flow is abnormal;
and in a case where it is determined, based on the time sequence detection result of the candidate historical flow, that the detection result of the candidate historical flow is a misjudgment, taking the candidate historical flow as a target historical flow, and obtaining the target identification dimension based on the identification dimension to which the identification corresponding to the target historical flow belongs.
4. The method of claim 1, wherein the detecting the timing information of the flow to be detected to obtain a first detection result includes:
calculating a plurality of first similarities between the time sequence information of the flow to be detected and a plurality of first reference data, wherein the plurality of first reference data comprises a plurality of first normal time sequence reference data and one or more first abnormal time sequence reference data, the number of the plurality of first normal time sequence reference data is larger than that of the one or more first abnormal time sequence reference data, and different first similarities in the plurality of first similarities correspond to different first reference data;
selecting, from the plurality of first similarities, a first number of first target similarities having the largest values, and determining first target reference data respectively corresponding to the first number of first target similarities, wherein the first number is smaller than the number of the plurality of first similarities;
and obtaining a first detection result when the number of the first normal time sequence target reference data is larger than the number of the first abnormal time sequence target reference data in the first target reference data corresponding to the first number of the first target similarities, wherein the first detection result is used for indicating that the time sequence information of the flow to be detected is normal.
5. The method of claim 1, wherein after the sending the report information to a rules engine, the method further comprises:
obtaining a second detection result of the flow to be detected from the rule engine, wherein the second detection result is obtained by detecting the flow to be detected based on the reported information;
under the condition that the flow to be detected is determined to be normal based on the second detection result, detecting time sequence information of the flow to be detected to obtain a third detection result;
and determining that the flow to be detected is abnormal flow under the condition that the third detection result indicates that the time sequence information of the flow to be detected is abnormal.
6. The method of claim 5, wherein detecting the timing information of the flow to be detected to obtain a third detection result comprises:
calculating a plurality of second similarities of the time sequence information of the flow to be detected and a plurality of second reference data, wherein the plurality of second reference data comprises one or a plurality of second normal time sequence reference data and a plurality of second abnormal time sequence reference data, the number of the one or the plurality of second normal time sequence reference data is smaller than that of the plurality of second abnormal time sequence reference data, and different second similarities in the plurality of second similarities correspond to different second reference data;
selecting, from the plurality of second similarities, a second number of second target similarities having the largest values, and determining second target reference data respectively corresponding to the second number of second target similarities, wherein the second number is smaller than the number of the plurality of second similarities;
and obtaining a third detection result when the number of second normal time sequence target reference data is smaller than the number of second abnormal time sequence target reference data in second target reference data corresponding to the second number of second target similarities, wherein the third detection result is used for indicating that the flow to be detected is abnormal flow.
7. A processing apparatus for traffic data, comprising:
a to-be-detected flow acquisition module, configured to extract the flow to be detected under the condition that the identifier corresponding to the flow to be detected belongs to a target identifier dimension, wherein the target identifier dimension is obtained based on a target historical flow which is misjudged by the detection result;
a first time sequence detection module, configured to detect the time sequence information of the flow to be detected to obtain a first detection result, wherein the first detection result is used for indicating whether the time sequence information of the flow to be detected is normal or not;
and a report information generation module, configured to generate report information of the flow to be detected based on the first detection result and send the report information to a rule engine, wherein the rule engine is used for detecting the flow to be detected based on the report information.
8. The apparatus of claim 7, wherein the report information generating module is configured to, when the first detection result indicates that the time sequence information of the flow to be measured is normal, filter attribute features of the identifier in the multiple features corresponding to the flow to be measured to obtain one or more features to be reported of the flow to be measured, add the one or more features to be reported to report information, and send the report information to the rule engine.
9. The apparatus of claim 7, the apparatus further comprising: a target identification dimension determining module, configured to detect the time sequence information of the candidate historical flow to obtain a time sequence detection result of the candidate historical flow, wherein the detection result of the candidate historical flow is abnormal; and, in a case where it is determined, based on the time sequence detection result of the candidate historical flow, that the detection result of the candidate historical flow is a misjudgment, take the candidate historical flow as a target historical flow, and obtain the target identification dimension based on the identification dimension to which the identification corresponding to the target historical flow belongs.
10. The apparatus of claim 7, wherein the first timing detection module is configured to calculate a plurality of first similarities between the timing information of the flow to be measured and a plurality of first reference data, where the plurality of first reference data includes a plurality of first normal timing reference data and one or more first abnormal timing reference data, the number of the plurality of first normal timing reference data is greater than the number of the one or more first abnormal timing reference data, and different first similarities in the plurality of first similarities correspond to different first reference data; select, from the plurality of first similarities, a first number of first target similarities having the largest values, and determine first target reference data respectively corresponding to the first number of first target similarities, wherein the first number is smaller than the number of the plurality of first similarities; and obtain a first detection result when the number of the first normal time sequence target reference data is larger than the number of the first abnormal time sequence target reference data in the first target reference data corresponding to the first number of the first target similarities, wherein the first detection result is used for indicating that the time sequence information of the flow to be detected is normal.
11. The apparatus of claim 7, the apparatus further comprising:
the detection result acquisition module is used for acquiring a second detection result of the flow to be detected from the rule engine, wherein the second detection result is obtained by detecting the flow to be detected based on the reported information;
the second time sequence detection module is used for detecting time sequence information of the flow to be detected under the condition that the flow to be detected is determined to be normal based on the second detection result, so as to obtain a third detection result; and determining that the flow to be detected is abnormal flow under the condition that the third detection result indicates that the time sequence information of the flow to be detected is abnormal.
12. The apparatus of claim 11, wherein the second timing detection module is configured to calculate a plurality of second similarities of the timing information of the flow to be measured and a plurality of second reference data, wherein the plurality of second reference data includes one or more second normal timing reference data and a plurality of second abnormal timing reference data, the number of the one or more second normal timing reference data is smaller than the number of the plurality of second abnormal timing reference data, and different second similarities in the plurality of second similarities correspond to different second reference data; select, from the plurality of second similarities, a second number of second target similarities having the largest values, and determine second target reference data respectively corresponding to the second number of second target similarities, wherein the second number is smaller than the number of the plurality of second similarities; and obtain a third detection result when the number of second normal time sequence target reference data is smaller than the number of second abnormal time sequence target reference data in second target reference data corresponding to the second number of second target similarities, wherein the third detection result is used for indicating that the flow to be detected is abnormal flow.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-6.
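Read together, claims 1, 2 and 5 describe a small pipeline: extract the flow if its identifier belongs to a target identifier dimension, run a first time sequence check, build the reporting information, query the rule engine, and re-check the time sequence information if the rule engine reports the flow as normal. The following is a minimal sketch of that control flow, not the claimed implementation. All names (`Flow`, `build_report`, `process_flow`, the callable parameters) are hypothetical, and the detectors and the rule engine are passed in as callables because their internals are not specified here. Two behaviors are assumptions and are marked as such in the comments: what is reported when the first check is abnormal, and the final result when the rule engine itself reports abnormal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

Report = Dict[str, object]


@dataclass
class Flow:
    identifier_dimension: str            # hypothetical values, e.g. "ip" or "account"
    timing: List[float]                  # time sequence information of the flow
    features: Dict[str, object]          # all features associated with the flow
    identifier_attribute_keys: Set[str]  # keys that are attribute features of the identifier


def build_report(flow: Flow, timing_is_normal: bool) -> Report:
    """Build the reporting information sent to the rule engine."""
    if timing_is_normal:
        # Claim 2: filter out the attribute features of the identifier.
        return {key: value for key, value in flow.features.items()
                if key not in flow.identifier_attribute_keys}
    # Assumption: when the first check is abnormal, nothing is filtered.
    return dict(flow.features)


def process_flow(flow: Flow,
                 target_dimensions: Set[str],
                 first_timing_is_normal: Callable[[List[float]], bool],
                 rule_engine_says_normal: Callable[[Report], bool],
                 second_timing_is_abnormal: Callable[[List[float]], bool],
                 ) -> Optional[str]:
    """Return "normal", "abnormal", or None if the flow is not extracted."""
    # Claim 1: only flows in a target identifier dimension are extracted.
    if flow.identifier_dimension not in target_dimensions:
        return None
    first_result = first_timing_is_normal(flow.timing)
    report = build_report(flow, first_result)
    # Claim 5: the second detection result comes from the rule engine.
    if not rule_engine_says_normal(report):
        return "abnormal"  # assumption: the rule engine verdict is final here
    # Claim 5: the third detection result can override a "normal" verdict.
    if second_timing_is_abnormal(flow.timing):
        return "abnormal"
    return "normal"
```

A caller would supply its own detectors, for example a top-k similarity vote as in claims 4 and 6, for the two timing callables.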
CN202310839185.4A 2023-07-10 2023-07-10 Flow data processing method and device, electronic equipment and storage medium Pending CN116781377A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310839185.4A CN116781377A (en) 2023-07-10 2023-07-10 Flow data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310839185.4A CN116781377A (en) 2023-07-10 2023-07-10 Flow data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116781377A (en) 2023-09-19

Family

ID=88011463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310839185.4A Pending CN116781377A (en) 2023-07-10 2023-07-10 Flow data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116781377A (en)

Similar Documents

Publication Publication Date Title
CN108989150B (en) Login abnormity detection method and device
CN108667856B (en) Network anomaly detection method, device, equipment and storage medium
CN111143102B (en) Abnormal data detection method and device, storage medium and electronic equipment
CN108228722B (en) Method for detecting geographic space distribution uniformity of sampling points in crushing area
CN107276851B (en) Node abnormity detection method and device, network node and console
CN110598959A (en) Asset risk assessment method and device, electronic equipment and storage medium
CN114595765A (en) Data processing method and device, electronic equipment and storage medium
CN116743474A (en) Decision tree generation method and device, electronic equipment and storage medium
CN116226644A (en) Method and device for determining equipment fault type, electronic equipment and storage medium
CN116781377A (en) Flow data processing method and device, electronic equipment and storage medium
CN113535458B (en) Abnormal false alarm processing method and device, storage medium and terminal
CN115333783A (en) API call abnormity detection method, device, equipment and storage medium
CN115102728B (en) Scanner identification method, device, equipment and medium for information security
CN115829160B (en) Time sequence abnormality prediction method, device, equipment and storage medium
CN117395071B (en) Abnormality detection method, abnormality detection device, abnormality detection equipment and storage medium
CN115378746B (en) Network intrusion detection rule generation method, device, equipment and storage medium
CN112836212B (en) Mail data analysis method, phishing mail detection method and device
CN115426143A (en) Method, device, equipment and storage medium for identifying abnormal identity
CN117061216A (en) Automatic blocking method, device, equipment and storage medium for network attack
CN117076988A (en) Abnormal behavior detection method, device, equipment and medium
CN117768193A (en) Safety monitoring method, device, equipment and medium for industrial control network
CN117608896A (en) Transaction data processing method and device, electronic equipment and storage medium
CN115643182A (en) Flow detection method and device and electronic equipment
CN115603947A (en) Abnormal access detection method and device
CN117609723A (en) Object identification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination