CN113691525A - Traffic data processing method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113691525A
Authority
CN
China
Prior art keywords
threat
data
target
intelligence data
probability
Legal status
Withdrawn
Application number
CN202110967208.0A
Other languages
Chinese (zh)
Inventor
杭家囡
范渊
黄进
Current Assignee
DBAPPSecurity Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Application filed by DBAPPSecurity Co Ltd
Priority to CN202110967208.0A
Publication of CN113691525A
Legal status: Withdrawn (current)

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416: Event detection, e.g. attack signature detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155: Bayesian classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a traffic data processing method, apparatus, device, and storage medium. In the scheme, a threat probability table is generated by training; it records the conditional probability value of each threat feature in non-threat intelligence data and in threat intelligence data. When target traffic data to be analyzed is processed, target threat features are identified from it, and the probability that the target traffic data is threat traffic data is calculated from the threat probability table using a Bayesian classification algorithm. If the probability value is greater than a first threat threshold, the target traffic data is judged to be threat traffic data. By combining the target threat features in the traffic data with the Bayesian classification algorithm and the threat probability table, threat traffic data can thus be identified quickly and accurately, improving the security of network traffic.

Description

Traffic data processing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of network security technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing traffic data.
Background
With the continuous progress of information technology in China, the number of crimes related to computer information has increased, and so has their impact on the country and on individuals. Identifying threat traffic and locating attack sources quickly and in real time has therefore become a key concern. However, the sheer volume of threat intelligence data makes it difficult to find the source of an attack in a short time. Conventional schemes rely on manual, experience-based screening, for example screening network-wide access and local operations according to past experience; this consumes a great deal of manpower and material resources and cannot cope with complex and varied attack methods.
Disclosure of Invention
The invention aims to provide a traffic data processing method, apparatus, device, and storage medium so that threat traffic data can be identified quickly and accurately.
In order to achieve the above object, the present invention provides a traffic data processing method, including:
acquiring target traffic data to be analyzed;
identifying target threat features from the target traffic data;
calculating, according to the target threat features, a threat probability table, and a Bayesian classification algorithm, the probability that the target traffic data is threat traffic data, where the threat probability table records the conditional probability value of each threat feature in non-threat intelligence data and in threat intelligence data, respectively;
and if the probability value is greater than a first threat threshold, judging that the target traffic data is threat traffic data.
The traffic data processing method further comprises the following steps:
collecting non-threat intelligence data and threat intelligence data from the router and the gateway;
extracting the threat features of the non-threat intelligence data and the threat intelligence data;
and training and generating the threat probability table by using the occurrence frequency of each threat characteristic in the non-threat intelligence data and the threat intelligence data respectively.
Wherein training to generate the threat probability table using the occurrence frequency of each threat feature in the non-threat intelligence data and in the threat intelligence data, respectively, comprises:
generating a non-threat intelligence data hash table and a threat intelligence data hash table by using the occurrence frequency of each threat characteristic in the non-threat intelligence data and the threat intelligence data respectively;
calculating the ratio of the occurrence frequency of each threat characteristic in the non-threat intelligence data to the total occurrence frequency of all threat characteristics in the non-threat intelligence data according to the non-threat intelligence data hash table, taking the ratio as the conditional probability value of each threat characteristic in the non-threat intelligence data, and generating a threat probability table of the non-threat intelligence data according to the conditional probability value of each threat characteristic in the non-threat intelligence data;
and calculating the ratio of the occurrence frequency of each threat characteristic in the threat intelligence data to the total occurrence frequency of all the threat characteristics in the threat intelligence data according to the threat intelligence data hash table, wherein the ratio is used as the conditional probability value of each threat characteristic in the threat intelligence data, and a threat probability table of the threat intelligence data is generated according to the conditional probability value of each threat characteristic in the threat intelligence data.
Wherein extracting the threat features of the non-threat intelligence data and the threat intelligence data comprises:
extracting, from the non-threat intelligence data and the threat intelligence data, at least one of access interface information, information on host assets accessed, port information, and access operation information as a threat feature.
The traffic data processing method further comprises the following steps:
if the probability value is not greater than the first threat threshold, judging whether the probability value is greater than a second threat threshold, the second threat threshold being less than the first threat threshold;
if yes, judging that the target traffic data is suspicious traffic data, and storing the suspicious traffic data so that whether it is threat traffic data can be identified manually at regular intervals;
if not, judging that the target traffic data is safe traffic data.
After the target traffic data is judged to be threat traffic data, the method further comprises the following steps:
searching the threat characteristic with the maximum threat probability value from the threat probability values of all target threat characteristics as a final probability characteristic; wherein the threat probability value of each target threat characteristic is: the ratio of the conditional probability value of each target threat characteristic in the threat intelligence data to the sum of the conditional probability values of all target threat characteristics in the threat intelligence data;
and determining an attack mode and an attack source of the target traffic data according to the final probability characteristics.
After determining the attack mode and the attack source of the target traffic data according to the final probability characteristics, the method further includes:
generating alarm information for the target traffic data, where the alarm information includes the attack mode and attack source of the target traffic data.
To achieve the above object, the present invention provides a traffic data processing apparatus, including:
the acquisition module is used for acquiring target traffic data to be analyzed;
an identification module for identifying target threat features from the target traffic data;
the calculation module is used for calculating, according to the target threat features, the threat probability table, and a Bayesian classification algorithm, the probability that the target traffic data is threat traffic data, where the threat probability table records the conditional probability value of each threat feature in non-threat intelligence data and in threat intelligence data, respectively;
and the first judging module is used for judging that the target traffic data is threat traffic data when the probability value is greater than a first threat threshold.
To achieve the above object, the present invention provides an electronic device comprising:
a memory for storing a computer program;
and the processor is used for implementing the steps of the traffic data processing method when executing the computer program.
To achieve the above object, the present invention provides a computer-readable storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the steps of the above traffic data processing method.
In summary, the embodiments of the invention provide a traffic data processing method, apparatus, device, and storage medium. In the scheme, a threat probability table is generated by training; it records the conditional probability value of each threat feature in non-threat intelligence data and in threat intelligence data. When target traffic data to be analyzed is processed, target threat features are identified from it, and the probability that it is threat traffic data is calculated by combining the threat probability table with a Bayesian classification algorithm; if the probability value is greater than the first threat threshold, the target traffic data is judged to be threat traffic data. By combining the target threat features in the traffic data with the Bayesian classification algorithm and the threat probability table, threat traffic data can thus be identified quickly and accurately, improving the security of network traffic.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a traffic data processing method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a traffic data processing apparatus according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiments of the invention disclose a traffic data processing method, apparatus, device, and storage medium that use advanced computer information security technology to identify threat traffic data quickly, accurately, and in real time.
Referring to Fig. 1, which is a schematic flow chart of a traffic data processing method provided in an embodiment of the present invention, the method includes:
S101, acquiring target traffic data to be analyzed;
Specifically, this embodiment can process newly acquired traffic data in real time, which speeds up the identification of threat traffic data.
S102, identifying target threat features from the target traffic data;
In this embodiment, the target threat features may be at least one of access interface information, information on host assets accessed, port information, access operation information, and the like; they are not limited here, as long as the obtained features can be used to analyze threat traffic.
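As a minimal sketch of S102, assuming each traffic record has already been parsed into a Python dictionary, the feature categories named above could be extracted as (value, type) pairs. The field names (method, dst_host, dst_port, content_type, path, payload) and the token list SQL_TOKENS are hypothetical; the source does not specify a record format or a token list.

```python
from typing import Dict, List, Tuple

# Hypothetical token list: payload substrings treated as access operation features.
SQL_TOKENS = ("mysql_select_db", "union select", "sleep(")

Feature = Tuple[str, str]  # (feature value, feature type)


def extract_features(record: Dict[str, object]) -> List[Feature]:
    """Extract threat features from one parsed traffic record.

    The field names used here are assumptions for illustration only.
    """
    features: List[Feature] = []
    # Header-like fields map one-to-one onto a feature type.
    field_types = (
        ("method", "access type"),
        ("dst_host", "host asset"),
        ("dst_port", "access port"),
        ("content_type", "traffic data type"),
        ("path", "access interface"),
    )
    for field, feature_type in field_types:
        if field in record:
            features.append((str(record[field]), feature_type))
    # Known tokens found in the (lower-cased) payload become operation features.
    payload = str(record.get("payload", "")).lower()
    for token in SQL_TOKENS:
        if token in payload:
            features.append((token, "SQL attack statement"))
    return features
```

For example, a record with method POST, port 80, and a payload containing mysql_select_db yields ("POST", "access type"), ("80", "access port"), and ("mysql_select_db", "SQL attack statement") among its features.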
S103, calculating, according to the target threat features, the threat probability table, and a Bayesian classification algorithm, the probability that the target traffic data is threat traffic data, where the threat probability table records the conditional probability value of each threat feature in the non-threat intelligence data and in the threat intelligence data, respectively;
In this embodiment, the threat probability table is generated through training and records the conditional probability value of each threat feature in the non-threat intelligence data and in the threat intelligence data, respectively. The process of generating the threat probability table specifically includes: collecting non-threat intelligence data and threat intelligence data from routers and gateways; extracting the threat features of the non-threat intelligence data and the threat intelligence data; and training to generate the threat probability table using the occurrence frequency of each threat feature in the non-threat intelligence data and in the threat intelligence data, respectively.
Specifically, the non-threat intelligence data and threat intelligence data are collected through hardware probes and software probes deployed on routers and gateways. The probes acquire real-time traffic information such as interface changes, host asset changes, port access, and access information, and the collected data are then screened to produce the non-threat intelligence data and the threat intelligence data. Next, the threat features in the non-threat intelligence data and the threat intelligence data are extracted and counted, taking for example at least one of access interface information, information on host assets accessed, port information, and access operation information as a threat feature, so that a non-threat intelligence data hash table and a threat intelligence data hash table can be generated from the occurrence frequency of each threat feature in the non-threat intelligence data and in the threat intelligence data, respectively.
Further, according to the non-threat intelligence data hash table, the ratio of the occurrence frequency of each threat feature in the non-threat intelligence data to the total occurrence frequency of all threat features in the non-threat intelligence data is calculated and used as the conditional probability value of that threat feature in the non-threat intelligence data, and the threat probability table of the non-threat intelligence data is generated from these conditional probability values. Likewise, according to the threat intelligence data hash table, the ratio of the occurrence frequency of each threat feature in the threat intelligence data to the total occurrence frequency of all threat features in the threat intelligence data is calculated and used as the conditional probability value of that threat feature in the threat intelligence data, and the threat probability table of the threat intelligence data is generated from these conditional probability values.
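The training step can be sketched as follows, assuming each sample has already been reduced to a list of threat-feature strings. A Python Counter plays the role of the hash table of occurrence frequencies, and each conditional probability is that feature's count divided by the total count of all features in the same class of data, as described above. The function names are illustrative, not taken from the source.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_hash_table(samples: Iterable[List[str]]) -> Counter:
    """Count how often each threat feature occurs across all samples."""
    table: Counter = Counter()
    for features in samples:
        table.update(features)
    return table


def build_probability_table(hash_table: Counter) -> Dict[str, float]:
    """Conditional probability = feature count / total count of all features."""
    total = sum(hash_table.values())
    if total == 0:
        return {}
    return {feature: count / total for feature, count in hash_table.items()}


def train(non_threat_samples: Iterable[List[str]],
          threat_samples: Iterable[List[str]]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (non-threat probability table, threat probability table)."""
    return (build_probability_table(build_hash_table(non_threat_samples)),
            build_probability_table(build_hash_table(threat_samples)))
```

Keeping the hash tables (raw counts) separate from the probability tables makes it cheap to add new samples later and recompute the ratios.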
In this embodiment, for a threat feature A, its conditional probability value in the threat probability table of the non-threat intelligence data is N(A), and its conditional probability value in the threat probability table of the threat intelligence data is G(A); N(A) and G(A) are the probabilities of threat feature A appearing in the overall non-threat data and threat data, respectively. Table 1 shows the threat probability table for non-threat intelligence data, and Table 2 shows the threat probability table for threat intelligence data:
TABLE 1
(Table 1 is provided as an image in the original publication and is not reproduced here.)
TABLE 2
Threat feature | Feature type | Conditional probability
Post | Access type | 0.71
80 | Access port | 0.43
text/html;charset=utf-8 | Traffic data type | 0.67
mysql_select_db | SQL attack statement | 0.94
… | … | …
It should be noted that the Bayesian classification formula is a classification model derived from the naive Bayes algorithm, which works as follows. Each data sample is described by an n-dimensional feature vector X = {x1, x2, …, xn} holding the values of n attributes, and there are m classes, denoted C1, C2, …, Cm. Given an unknown data sample X (i.e., one without a class label), naive Bayes classification assigns X to class Ci when P(Ci | X) > P(Cj | X) for all 1 ≤ j ≤ m, j ≠ i. By Bayes' theorem, P(Ci | X) = P(X | Ci)P(Ci) / P(X); since P(X) is the same for all classes, maximizing the posterior probability P(Ci | X) is equivalent to maximizing P(X | Ci)P(Ci), the product of the class-conditional probability (likelihood) and the prior probability. If the training dataset has many attributes and tuples, computing P(X | Ci) directly can be very expensive, so the attribute values are usually assumed to be conditionally independent given the class. Then P(X | Ci) = P(x1 | Ci)P(x2 | Ci)…P(xn | Ci), and each conditional probability P(xk | Ci) can be estimated from the training dataset. In this way, for a sample X of unknown class, P(X | Ci)P(Ci) can be calculated for each class Ci, and the class with the highest value is selected. The naive Bayes algorithm rests on the premise that the attributes are independent: when the dataset satisfies this assumption, classification accuracy is high; otherwise it may be lower. In addition, the algorithm does not output explicit classification rules.
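To make the decision rule concrete, here is a minimal categorical naive Bayes classifier. It scores each class Ci by log P(Ci) + Σ log P(xk | Ci), which ranks classes the same way as P(X | Ci)P(Ci) while avoiding floating-point underflow. The add-one (Laplace) smoothing is an assumption added for the sketch; the text above does not mention smoothing.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


class NaiveBayes:
    """Minimal categorical naive Bayes classifier over n-attribute samples.

    Add-one (Laplace) smoothing is an assumption: it keeps an attribute value
    never seen with a class from forcing that class's score to zero.
    """

    def fit(self, samples: List[Sequence[Hashable]], labels: List[Hashable]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[c][k][v]: how often attribute k took value v in class c.
        self.value_counts: Dict = defaultdict(lambda: defaultdict(Counter))
        # values_seen[k]: distinct values of attribute k across all classes.
        self.values_seen: Dict = defaultdict(set)
        for x, c in zip(samples, labels):
            for k, v in enumerate(x):
                self.value_counts[c][k][v] += 1
                self.values_seen[k].add(v)
        return self

    def log_score(self, x: Sequence[Hashable], c: Hashable) -> float:
        """log P(c) + sum over k of log P(x_k | c); P(X) is omitted (same for all classes)."""
        score = math.log(self.class_counts[c] / self.total)
        for k, v in enumerate(x):
            count = self.value_counts[c][k][v]
            distinct = len(self.values_seen[k])
            score += math.log((count + 1) / (self.class_counts[c] + distinct))
        return score

    def predict(self, x: Sequence[Hashable]) -> Hashable:
        """Return the class c that maximizes P(X | c) P(c)."""
        return max(self.class_counts, key=lambda c: self.log_score(x, c))
```

Usage: `NaiveBayes().fit(samples, labels).predict(("POST", "3306"))`, where each sample is a tuple of attribute values and each label is a class name.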
In this embodiment, the Bayesian classification algorithm used to calculate the probability value is:
$$p(T \mid A_1, A_2, \ldots, A_n) = \frac{p(A_1, A_2, \ldots, A_n \mid T)\,p(T)}{p(A_1, A_2, \ldots, A_n)}$$
Here A1, A2, A3, …, An are the target threat features and n is the total number of target threat features; p(T | A1, A2, A3, …, An) is the probability that the target traffic data is threat traffic data given that it contains A1, A2, A3, …, An; p(A1, A2, A3, …, An | T) is the probability that threat intelligence data contains features A1 to An; p(T) is the proportion of threat intelligence data in the overall data; and p(A1, A2, A3, …, An) is the probability of containing threat features A1, A2, A3, …, An. The probability value is obtained from the Bayesian classification algorithm, the target threat features, and the threat probability table as follows:
(Formula shown as an image in the original publication; not reproduced.)
where
(Formula shown as an image in the original publication; not reproduced.)
N(A1), N(A2), …, N(An) are the conditional probability values of the target threat features A1, A2, A3, …, An in the threat probability table of the non-threat intelligence data, and G(A1), G(A2), …, G(An) are their conditional probability values in the threat probability table of the threat intelligence data.
p(A1 | T), p(A2 | T), …, p(An | T) can be written generically as p(A | T), the threat probability value of a target threat feature A; for example, the threat probability value of target threat feature A1 is p(A1 | T), that of A2 is p(A2 | T), and so on.
In this embodiment, p(A | T) is calculated as:
$$p(A_i \mid T) = \frac{G(A_i)}{\sum_{j=1}^{n} G(A_j)}, \quad i = 1, 2, \ldots, n$$
that is, the threat probability value for each targeted threat feature is: a ratio of the conditional probability value in the threat intelligence data for each targeted threat characteristic to a sum of the conditional probability values in the threat intelligence data for all targeted threat characteristics, such as:
$$p(A_1 \mid T) = \frac{G(A_1)}{G(A_1) + G(A_2) + \cdots + G(A_n)}$$
Here p(A1), p(A2), …, p(An) can be written generically as p(A), the probability of containing threat feature A, which is calculated as follows:
(Formula shown as an image in the original publication; not reproduced.)
The probability of containing threat feature A is:
(Formula shown as an image in the original publication; not reproduced.)
Therefore, in this embodiment, once the target threat features are determined, the relevant conditional probability values can be looked up in the threat probability table, and the probability that the target traffic data is threat traffic data can then be calculated with the Bayesian classification algorithm.
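Because the formula images for the denominator are not reproduced, the sketch below swaps in the standard two-class naive Bayes form: p(A1, …, An) is expanded by the law of total probability over threat (T) and non-threat data, with the two threat probability tables supplying p(Ai | T) and p(Ai | non-threat). This is an assumption, not necessarily the exact expression used in this embodiment, and the floor value for features absent from a table is hypothetical.

```python
import math
from typing import Dict, Iterable


def threat_posterior(features: Iterable[str],
                     threat_table: Dict[str, float],
                     non_threat_table: Dict[str, float],
                     prior_threat: float,
                     floor: float = 1e-6) -> float:
    """Return p(T | A1..An) under the naive independence assumption.

    Assumption: p(A1..An) is expanded as
    p(T) * prod p(Ai|T) + (1 - p(T)) * prod p(Ai|not T).
    `floor` is a hypothetical stand-in for zero or missing table entries.
    """
    if not 0.0 < prior_threat < 1.0:
        raise ValueError("prior_threat must lie strictly between 0 and 1")
    log_t = math.log(prior_threat)
    log_n = math.log(1.0 - prior_threat)
    for a in features:
        log_t += math.log(max(threat_table.get(a, 0.0), floor))
        log_n += math.log(max(non_threat_table.get(a, 0.0), floor))
    # Normalize in log space so long feature lists do not underflow to 0.
    m = max(log_t, log_n)
    t, n = math.exp(log_t - m), math.exp(log_n - m)
    return t / (t + n)
```

With the Table 2 values as the threat table, hypothetical non-threat values {"Post": 0.30, "80": 0.60, "mysql_select_db": 0.01}, and p(T) = 0.5, the features Post, 80, and mysql_select_db give a posterior of about 0.994, while the feature 80 alone gives about 0.42.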
S104, judging whether the probability value is greater than the first threat threshold;
if yes, executing S105; if not, executing S106;
S105, judging that the target traffic data is threat traffic data;
S106, judging whether the probability value is greater than a second threat threshold, where the second threat threshold is less than the first threat threshold; if yes, executing S107; if not, executing S108;
S107, judging that the target traffic data is suspicious traffic data, and storing the suspicious traffic data so that whether it is threat traffic data can be identified manually at regular intervals;
S108, judging that the target traffic data is safe traffic data.
In this embodiment, the probability that the current target traffic data is threat traffic data is obtained by calculating p(T | A1, A2, A3, …, An). The first threat threshold and the second threat threshold can be set by the user according to actual conditions; in this embodiment, the first threat threshold is set to 83% and the second threat threshold to 52%. Thus, if the calculated probability value is greater than 83%, the target traffic data is judged to be threat traffic data and must be intercepted immediately. If the calculated probability value is greater than 52% but not greater than 83%, the target traffic data is judged to be suspicious traffic data: it need not be intercepted but must be stored, so that whether the stored suspicious traffic data is threat traffic data can be identified manually at regular intervals, after which it is used as new non-threat/threat intelligence data to retrain and update the threat probability table. If the calculated probability value is not greater than 52%, the target traffic data is directly judged to be safe traffic data and does not need to be intercepted.
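The two-threshold decision of S104 to S108 maps directly onto a small function. The defaults 0.83 and 0.52 are the example values of this embodiment; the Verdict enum, the route helper, and the in-memory review queue are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import List


class Verdict(Enum):
    THREAT = "threat"          # intercept immediately
    SUSPICIOUS = "suspicious"  # store for periodic manual review
    SAFE = "safe"              # no interception needed


def classify(probability: float,
             first_threshold: float = 0.83,
             second_threshold: float = 0.52) -> Verdict:
    """Apply S104 and S106: strictly greater than each threshold, in order."""
    if not second_threshold < first_threshold:
        raise ValueError("the second threshold must be below the first")
    if probability > first_threshold:
        return Verdict.THREAT
    if probability > second_threshold:
        return Verdict.SUSPICIOUS
    return Verdict.SAFE


def route(probability: float, record: dict, review_queue: List[dict]) -> Verdict:
    """Classify one record and keep suspicious ones for later manual labeling."""
    verdict = classify(probability)
    if verdict is Verdict.SUSPICIOUS:
        review_queue.append(record)
    return verdict
```

Note that a probability of exactly 0.83 falls into the suspicious band and exactly 0.52 is treated as safe, matching the "greater than" tests in S104 and S106.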
In this embodiment, the model can be retrained by continuously feeding in new non-threat intelligence data and threat intelligence data: once new data are acquired, they are put into model training to regenerate the threat probability tables of the non-threat intelligence data and the threat intelligence data. In addition, after the target traffic data is judged to be threat traffic data, the threat feature with the maximum threat probability value can be found among the threat probability values of the target threat features and taken as the final probability feature, from which the attack mode and attack source of the target traffic data are determined. The threat probability value p(A | T) of each target threat feature is the ratio of its conditional probability value in the threat intelligence data to the sum of the conditional probability values of all target threat features in the threat intelligence data; for example, the threat probability value of target threat feature A1 is p(A1 | T), that of A2 is p(A2 | T), and so on. Once the final probability feature has been determined, the attack mode of the target traffic data can be determined from the feature type of the final probability feature recorded in the threat probability table, and the attack source can be identified from the Internet Protocol (IP) address in the target traffic data.
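A sketch of the final-probability-feature step, assuming the threat intelligence probability table is a dict from feature to conditional probability and that a separate dict records each feature's type (the Feature type column of Table 2). Per the definition above, p(Ai | T) normalizes the threat-table values over the target features only; the attack source is simply the IP address passed in from the traffic record. The names ThreatFinding and locate_attack are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class ThreatFinding(NamedTuple):
    final_feature: str
    threat_probability: float
    attack_mode: str
    attack_source: str


def threat_probability_values(target_features: List[str],
                              threat_table: Dict[str, float]) -> Dict[str, float]:
    """p(Ai | T): a feature's threat-table value divided by the sum over all target features."""
    weights = {a: threat_table.get(a, 0.0) for a in target_features}
    total = sum(weights.values())
    if total == 0:
        return {a: 0.0 for a in weights}
    return {a: w / total for a, w in weights.items()}


def locate_attack(target_features: List[str],
                  threat_table: Dict[str, float],
                  feature_types: Dict[str, str],
                  source_ip: str) -> Optional[ThreatFinding]:
    """Pick the final probability feature and derive attack mode and source."""
    probs = threat_probability_values(target_features, threat_table)
    if not probs:
        return None
    final = max(probs, key=probs.get)
    return ThreatFinding(
        final_feature=final,
        threat_probability=probs[final],
        attack_mode=feature_types.get(final, "unknown"),  # feature type from the table
        attack_source=source_ip,                          # IP address in the traffic data
    )
```

With the Table 2 values and target features Post and mysql_select_db, p(mysql_select_db | T) = 0.94 / 1.65 ≈ 0.57, so mysql_select_db is the final probability feature and SQL attack statement is reported as the attack mode.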
Further, in this embodiment, after the attack mode and attack source of the target traffic data are determined from the final probability feature, alarm information for the target traffic data can be generated, including the attack mode and attack source of the target traffic data. In this way, administrators learn of threat traffic data promptly, together with its attack mode and attack source.
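The alarm record could be as simple as a serializable dataclass. Only the attack mode and attack source come from the text above; the extra fields (final feature, probability, UTC timestamp) and the JSON output format are assumptions chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ThreatAlarm:
    attack_mode: str      # from the final probability feature's type
    attack_source: str    # IP address found in the target traffic data
    final_feature: str    # assumption: extra context for the administrator
    probability: float    # assumption: p(T | A1..An) for this traffic
    detected_at: str      # assumption: ISO-8601 UTC timestamp


def make_alarm(attack_mode: str, attack_source: str,
               final_feature: str, probability: float) -> ThreatAlarm:
    """Build an alarm, rounding the probability to four decimals for display."""
    return ThreatAlarm(attack_mode, attack_source, final_feature,
                       round(probability, 4),
                       datetime.now(timezone.utc).isoformat())


def alarm_to_json(alarm: ThreatAlarm) -> str:
    """Serialize the alarm, keeping non-ASCII text readable."""
    return json.dumps(asdict(alarm), ensure_ascii=False)
```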
In summary, this scheme provides a threat intelligence sensing method based on a Bayesian classification algorithm: non-threat intelligence data and threat intelligence data are obtained through hardware and software probing, a threat probability table is generated from existing threat features (access port behavior, access paths, attack frequency, and the like), and when traffic data arrives, a Bayesian classification formula and the threat probability table are used to judge whether it is threat traffic. In addition, new threat features can be added to the threat probability table in real time and the table retrained to obtain a new threat probability table, and the attack mode, and even the attack source, of threat intelligence can be analyzed from the probabilities in the table. Through the Bayesian classification algorithm, the scheme can improve detection capability in complex threat situations involving mixed attack methods.
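Retraining with new features needs no schema change if the raw occurrence counts are kept rather than only the derived ratios: newly labeled samples, such as manually reviewed suspicious traffic, are added to the counts and the probability tables are regenerated. The ThreatModel class below is a hypothetical sketch of that idea, not the source's implementation.

```python
from collections import Counter
from typing import Dict, Iterable, List


class ThreatModel:
    """Keeps raw feature counts per class so tables can be regenerated at any time."""

    LABELS = ("threat", "non_threat")

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = {label: Counter() for label in self.LABELS}

    def add_samples(self, label: str, samples: Iterable[List[str]]) -> None:
        """Fold newly labeled samples in; unseen features simply become new keys."""
        if label not in self.counts:
            raise ValueError(f"unknown label: {label}")
        for features in samples:
            self.counts[label].update(features)

    def probability_table(self, label: str) -> Dict[str, float]:
        """Regenerate count / total for every feature of one class."""
        total = sum(self.counts[label].values())
        if total == 0:
            return {}
        return {f: c / total for f, c in self.counts[label].items()}
```

Regenerating the whole table on each update keeps every ratio consistent with the current totals, at the cost of a pass over all features.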
The traffic data processing apparatus, device, and storage medium provided by the embodiments of the present invention are described below; they correspond to the traffic data processing method described above, and the two descriptions may be cross-referenced.
Referring to fig. 2, a schematic structural diagram of a traffic data processing apparatus provided in an embodiment of the present invention includes:
the acquisition module 11 is configured to acquire target traffic data to be analyzed;
an identification module 12 for identifying a target threat characteristic from the target traffic data;
the calculation module 13 is configured to calculate a probability value of the target traffic data as the threat traffic data according to the target threat characteristic, the threat probability table, and a bayesian classification algorithm; the threat probability table records the conditional probability values of each threat characteristic in non-threat intelligence data and threat intelligence data respectively;
a first determining module 14, configured to determine that the target traffic data is threat traffic data when the probability value is greater than a first threat threshold.
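The module decomposition of Fig. 2 can be pictured as a small class whose modules are injected as plain functions. This is a hypothetical sketch: the class name, field names, and the use of callables are assumptions, and only the four module roles and their order come from the text; 0.83 is the example first threat threshold of the method embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TrafficDataProcessingApparatus:
    acquire: Callable[[], Optional[dict]]     # acquisition module 11
    identify: Callable[[dict], List[str]]     # identification module 12
    calculate: Callable[[List[str]], float]   # calculation module 13
    first_threshold: float = 0.83             # used by first determining module 14

    def process_next(self) -> Optional[bool]:
        """True: threat traffic; False: not judged a threat by module 14; None: no data."""
        record = self.acquire()
        if record is None:
            return None
        probability = self.calculate(self.identify(record))
        return probability > self.first_threshold  # first determining module 14
```

Injecting the modules as functions lets each one (for example, the calculation module's Bayesian formula) be replaced or tested in isolation.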
Wherein the apparatus further comprises:
the collection module is used for collecting non-threat intelligence data and threat intelligence data from the router and the gateway;
the extraction module is used for extracting the threat features of the non-threat intelligence data and the threat intelligence data;
and the training module is used for training and generating the threat probability table by utilizing the occurrence frequency of each threat characteristic in the non-threat intelligence data and the threat intelligence data respectively.
Wherein the training module comprises:
the first generation unit, configured to generate a non-threat intelligence data hash table and a threat intelligence data hash table from the number of occurrences of each threat characteristic in the non-threat intelligence data and in the threat intelligence data, respectively;
the second generation unit, configured to calculate, from the non-threat intelligence data hash table, the ratio of the number of occurrences of each threat characteristic in the non-threat intelligence data to the total number of occurrences of all threat characteristics in the non-threat intelligence data, take this ratio as the conditional probability value of that threat characteristic in the non-threat intelligence data, and generate the threat probability table of the non-threat intelligence data from these conditional probability values;
and the third generation unit, configured to calculate, from the threat intelligence data hash table, the ratio of the number of occurrences of each threat characteristic in the threat intelligence data to the total number of occurrences of all threat characteristics in the threat intelligence data, take this ratio as the conditional probability value of that threat characteristic in the threat intelligence data, and generate the threat probability table of the threat intelligence data from these conditional probability values.
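The three generation units amount to counting and normalising. The sketch below assumes each intelligence record has already been reduced to a list of threat-characteristic strings; `collections.Counter` plays the role of the hash table, and the function names are illustrative rather than taken from the source.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_features(records: Iterable[List[str]]) -> Counter:
    """Build the hash table: threat characteristic -> number of occurrences."""
    counts: Counter = Counter()
    for features in records:
        counts.update(features)
    return counts


def probability_table(counts: Counter) -> Dict[str, float]:
    """Conditional probability = occurrences of one characteristic / occurrences of all."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {feature: n / total for feature, n in counts.items()}


def train(
    non_threat_records: Iterable[List[str]],
    threat_records: Iterable[List[str]],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (non-threat probability table, threat probability table)."""
    return (
        probability_table(count_features(non_threat_records)),
        probability_table(count_features(threat_records)),
    )


if __name__ == "__main__":
    non_threat, threat = train(
        [["iface:/index"], ["iface:/index", "port:443"]],
        [["port:3389", "op:login_fail"], ["port:3389", "iface:/admin"]],
    )
    print(threat)  # {'port:3389': 0.5, 'op:login_fail': 0.25, 'iface:/admin': 0.25}
```

As described, each table assigns no probability to a characteristic that never appeared in that class, so a scoring step needs some substitute value (or smoothing) for unseen characteristics; the text does not specify one.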
Wherein the extraction module is specifically configured to extract, from the non-threat intelligence data and the threat intelligence data, at least one of access interface information, information on the accessed host asset, port information, and access operation information as a threat characteristic.
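A possible shape for the extraction step, assuming each log record is a dictionary. The field names (`interface`, `asset`, `port`, `operation`) and the string prefixes are hypothetical; the text names only the four categories of information.

```python
from typing import Any, Dict, List

# Hypothetical record fields mapped to characteristic prefixes. The text names the
# categories (access interface, accessed host asset, port, access operation) but no schema.
FEATURE_FIELDS = {
    "interface": "iface",
    "asset": "asset",
    "port": "port",
    "operation": "op",
}


def extract_features(record: Dict[str, Any]) -> List[str]:
    """Turn one traffic or intelligence record into a list of threat characteristics."""
    features = []
    for field, prefix in FEATURE_FIELDS.items():
        value = record.get(field)
        if value is None or value == "":
            continue  # category absent from this record
        features.append(f"{prefix}:{value}")
    return features


if __name__ == "__main__":
    record = {"interface": "/api/login", "asset": "10.0.0.5", "port": 3389,
              "operation": "POST", "bytes": 512}
    print(extract_features(record))
```

Prefixing each value with its category keeps, for example, port `80` distinct from an asset or interface whose value happens to be `80`; fields outside the four categories (here `bytes`) are ignored.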
Wherein the apparatus further comprises:
the judging module, configured to determine, when the probability value is not greater than the first threat threshold, whether the probability value is greater than a second threat threshold, the second threat threshold being less than the first threat threshold;
the second judging module, configured to determine that the target traffic data is suspicious traffic data when the probability value is greater than the second threat threshold, and to store the suspicious traffic data so that the stored suspicious traffic data can be periodically reviewed manually to identify whether it is threat traffic data;
and the third judging module, configured to determine that the target traffic data is safe traffic data when the probability value is not greater than the second threat threshold.
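The two-threshold decision can be sketched as a three-way classification. The threshold values below are placeholders, and the in-memory list standing in for suspicious-traffic storage is an assumption; a deployment would persist it for the periodic manual review.

```python
from enum import Enum
from typing import List

FIRST_THREAT_THRESHOLD = 0.9   # hypothetical values; the text only requires
SECOND_THREAT_THRESHOLD = 0.6  # that the second threshold be less than the first


class Verdict(Enum):
    THREAT = "threat"
    SUSPICIOUS = "suspicious"
    SAFE = "safe"


def classify(
    probability: float,
    first_threshold: float = FIRST_THREAT_THRESHOLD,
    second_threshold: float = SECOND_THREAT_THRESHOLD,
) -> Verdict:
    if not second_threshold < first_threshold:
        raise ValueError("second threat threshold must be less than the first")
    if probability > first_threshold:
        return Verdict.THREAT
    if probability > second_threshold:
        return Verdict.SUSPICIOUS
    return Verdict.SAFE


def triage(traffic_id: str, probability: float, review_queue: List[str]) -> Verdict:
    """Classify one flow and queue suspicious ones for periodic manual review."""
    verdict = classify(probability)
    if verdict is Verdict.SUSPICIOUS:
        review_queue.append(traffic_id)
    return verdict


if __name__ == "__main__":
    queue: List[str] = []
    for tid, p in [("a", 0.95), ("b", 0.75), ("c", 0.30)]:
        print(tid, triage(tid, p, queue).value)
    print(queue)  # ['b']
```

Both comparisons are strict, matching the text: a probability exactly equal to a threshold falls into the lower band.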
Wherein the apparatus further comprises:
the searching module, configured to search the threat probability values of all target threat characteristics for the threat characteristic with the largest threat probability value and take it as the final probability characteristic, wherein the threat probability value of each target threat characteristic is the ratio of its conditional probability value in the threat intelligence data to the sum of the conditional probability values of all target threat characteristics in the threat intelligence data;
and the determining module, configured to determine the attack mode and the attack source of the target traffic data according to the final probability characteristic.
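The threat probability value defined above normalises each target characteristic's conditional probability in the threat intelligence data by the sum over all target characteristics. A minimal sketch follows; treating characteristics absent from the threat table as probability 0 is an assumption, and how the final characteristic is mapped to an attack mode and attack source is not specified in the text, so it is left out.

```python
from typing import Dict, Iterable, Tuple


def final_probability_feature(
    target_features: Iterable[str],
    threat_table: Dict[str, float],
) -> Tuple[str, float]:
    """Return the target characteristic with the largest threat probability value.

    threat probability value of f = P(f | threat) / sum of P(g | threat) over all targets g
    """
    # Duplicate characteristics are counted once; missing ones get 0 (an assumption).
    conditional = {f: threat_table.get(f, 0.0) for f in target_features}
    total = sum(conditional.values())
    if total == 0:
        raise ValueError("no target threat characteristic appears in the threat table")
    shares = {f: p / total for f, p in conditional.items()}
    best = max(shares, key=shares.__getitem__)
    return best, shares[best]


if __name__ == "__main__":
    threat_table = {"port:3389": 0.5, "op:login_fail": 0.25, "iface:/admin": 0.25}
    print(final_probability_feature(["port:3389", "op:login_fail"], threat_table))
    # ('port:3389', 0.6666666666666666)
```

Because every share is divided by the same sum, the characteristic selected is simply the one with the highest conditional probability; the normalised value is useful mainly for reporting.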
Wherein the apparatus further comprises:
the generating module, configured to generate alarm information for the target traffic data, wherein the alarm information includes the attack mode and the attack source of the target traffic data.
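The alarm information could be represented as a small record serialised to JSON. The class name, the `traffic_id` and `probability` fields, and the JSON encoding are assumptions; the text requires only that the alarm include the attack mode and the attack source.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ThreatAlarm:
    traffic_id: str     # hypothetical identifier of the target traffic data
    attack_mode: str    # attack mode determined from the final probability characteristic
    attack_source: str  # attack source, e.g. an originating IP address
    probability: float  # probability value from the Bayesian calculation

    def to_json(self) -> str:
        """Serialise the alarm with a stable key order for logging or forwarding."""
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    alarm = ThreatAlarm("flow-42", "rdp-brute-force", "203.0.113.7", 0.99)
    print(alarm.to_json())
```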
Referring to FIG. 3, which is a schematic structural diagram, an embodiment of the present invention further discloses an electronic device, including:
a memory 21 for storing a computer program;
a processor 22, configured to implement the steps of the traffic data processing method according to the above method embodiment when executing the computer program.
In this embodiment, the device may be a PC (personal computer), or a terminal device such as a smartphone, a tablet computer, a palmtop computer, or a portable computer.
The device may include a memory 21, a processor 22, and a bus 23.
The memory 21 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the memory 21 may be an internal storage unit of the device, for example a hard disk of the device. In other embodiments, it may be an external storage device of the device, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, or flash card provided on the device. The memory 21 may also include both an internal storage unit and an external storage device of the device. The memory 21 can be used not only to store application software installed in the device and various types of data, such as program code for executing the traffic data processing method, but also to temporarily store data that has been or will be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, used to run program code stored in the memory 21 or to process data, for example to execute the program code of the traffic data processing method.
The bus 23 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, it is represented by a single thick line in FIG. 3, which does not mean that there is only one bus or one type of bus.
Further, the device may include a network interface 24, which may optionally include a wired interface and/or a wireless interface (e.g., a Wi-Fi interface or a Bluetooth interface) and is generally used to establish a communication connection between the device and other electronic devices.
Optionally, the device may further include a user interface 25, which may include a display and an input unit such as a keyboard; the user interface 25 may optionally also include a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be called a display screen or display unit, is used to display information processed in the device and to present a visual user interface.
FIG. 3 shows only a device with components 21 to 25. Those skilled in the art will understand that the structure shown in FIG. 3 does not limit the device, which may include fewer or more components than shown, combine certain components, or use a different arrangement of components.
An embodiment of the invention also discloses a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the traffic data processing method of the above method embodiment.
The storage medium may include any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In summary, building on a device platform that collects traffic data, this scheme extracts keywords from alarm data, such as access-port behavior, access paths, and attack frequency, and then trains a threat-intelligence perception data model using the probability statistics of a Bayesian classification algorithm. When new traffic data is received, the data model can detect whether the traffic is threat traffic and locate the attack source.
The embodiments in this description are described progressively; each embodiment focuses on its differences from the others, and for parts that are the same or similar, the embodiments may refer to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for processing traffic data, comprising:
acquiring target traffic data to be analyzed;
identifying a target threat characteristic from the target traffic data;
calculating, according to the target threat characteristics, a threat probability table, and a Bayesian classification algorithm, a probability value that the target traffic data is threat traffic data, wherein the threat probability table records the conditional probability value of each threat characteristic in non-threat intelligence data and in threat intelligence data, respectively;
and if the probability value is greater than a first threat threshold, determining that the target traffic data is threat traffic data.
2. The traffic data processing method according to claim 1, characterized by further comprising:
collecting non-threat intelligence data and threat intelligence data from a router and a gateway;
extracting threat characteristics from the non-threat intelligence data and the threat intelligence data;
and generating the threat probability table by training on the number of occurrences of each threat characteristic in the non-threat intelligence data and in the threat intelligence data, respectively.
3. The traffic data processing method according to claim 2, wherein generating the threat probability table by training on the number of occurrences of each threat characteristic in the non-threat intelligence data and in the threat intelligence data, respectively, comprises:
generating a non-threat intelligence data hash table and a threat intelligence data hash table from the number of occurrences of each threat characteristic in the non-threat intelligence data and in the threat intelligence data, respectively;
calculating, from the non-threat intelligence data hash table, the ratio of the number of occurrences of each threat characteristic in the non-threat intelligence data to the total number of occurrences of all threat characteristics in the non-threat intelligence data, taking the ratio as the conditional probability value of that threat characteristic in the non-threat intelligence data, and generating a threat probability table of the non-threat intelligence data from these conditional probability values;
and calculating, from the threat intelligence data hash table, the ratio of the number of occurrences of each threat characteristic in the threat intelligence data to the total number of occurrences of all threat characteristics in the threat intelligence data, taking the ratio as the conditional probability value of that threat characteristic in the threat intelligence data, and generating a threat probability table of the threat intelligence data from these conditional probability values.
4. The traffic data processing method according to claim 2, wherein extracting threat characteristics from the non-threat intelligence data and the threat intelligence data comprises:
extracting, from the non-threat intelligence data and the threat intelligence data, at least one of access interface information, information on the accessed host asset, port information, and access operation information as a threat characteristic.
5. The traffic data processing method according to claim 1, characterized by further comprising:
if the probability value is not greater than the first threat threshold, determining whether the probability value is greater than a second threat threshold, the second threat threshold being less than the first threat threshold;
if so, determining that the target traffic data is suspicious traffic data, and storing the suspicious traffic data so that the stored suspicious traffic data can be periodically reviewed manually to identify whether it is threat traffic data;
if not, determining that the target traffic data is safe traffic data.
6. The traffic data processing method according to any one of claims 1 to 5, further comprising, after determining that the target traffic data is threat traffic data:
searching the threat characteristic with the maximum threat probability value from the threat probability values of all target threat characteristics as a final probability characteristic; wherein the threat probability value of each target threat characteristic is: the ratio of the conditional probability value of each target threat characteristic in the threat intelligence data to the sum of the conditional probability values of all target threat characteristics in the threat intelligence data;
and determining an attack mode and an attack source of the target traffic data according to the final probability characteristics.
7. The traffic data processing method according to claim 6, wherein after determining the attack mode and the attack source of the target traffic data according to the final probability characteristic, the method further comprises:
generating alarm information for the target traffic data, wherein the alarm information includes the attack mode and the attack source of the target traffic data.
8. A traffic data processing apparatus, comprising:
an acquisition module, configured to acquire target traffic data to be analyzed;
an identification module, configured to identify a target threat characteristic from the target traffic data;
a calculation module, configured to calculate, according to the target threat characteristics, a threat probability table, and a Bayesian classification algorithm, a probability value that the target traffic data is threat traffic data, wherein the threat probability table records the conditional probability value of each threat characteristic in non-threat intelligence data and in threat intelligence data, respectively;
and a first judging module, configured to determine that the target traffic data is threat traffic data when the probability value is greater than a first threat threshold.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the traffic data processing method according to any one of claims 1 to 7 when executing said computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, and when the computer program is executed by a processor, the steps of the traffic data processing method according to any one of claims 1 to 7 are implemented.
CN202110967208.0A 2021-08-23 2021-08-23 Traffic data processing method, device, equipment and storage medium Withdrawn CN113691525A (en)

Publications (1)

Publication Number Publication Date
CN113691525A 2021-11-23

Family

ID=78581437

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102957579A (en) * 2012-09-29 2013-03-06 北京邮电大学 Network anomaly traffic monitoring method and device
CN111125694A (en) * 2019-12-20 2020-05-08 杭州安恒信息技术股份有限公司 Threat information analysis method and system based on ant colony algorithm
WO2021017614A1 (en) * 2019-07-31 2021-02-04 平安科技(深圳)有限公司 Threat intelligence data collection and processing method and system, apparatus, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Fan: "Classification of Abnormal Data Traffic Based on Machine Learning", China Master's Theses Full-text Database (Electronic Journal) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114218992A (en) * 2021-12-29 2022-03-22 重庆紫光华山智安科技有限公司 Abnormal object detection method and related device
CN114218992B (en) * 2021-12-29 2023-09-08 重庆紫光华山智安科技有限公司 Abnormal object detection method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20211123