CN117081784A - Traffic identification model training method, identification method, device, equipment and medium - Google Patents

Publication number
CN117081784A
Authority
CN
China
Prior art keywords
data
flow
flow data
training
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310884340.4A
Other languages
Chinese (zh)
Inventor
程筱彪
徐雷
张曼君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN202310884340.4A
Publication of CN117081784A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/026 Capturing of monitoring data using flow identification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a traffic identification model training method, an identification method, a device, equipment and a medium. The training method comprises: acquiring sample traffic data for model training, the sample traffic data being traffic data packets transmitted based on an Internet security protocol; performing feature extraction on the sample traffic data to obtain data features of the sample traffic data; and performing at least one round of clustering on the sample traffic data based on the data features until a preset training stop condition is met, to obtain a trained traffic identification model. Because traffic is identified with an identification algorithm obtained by clustering on multiple features, the identification accuracy for abnormal traffic is improved.

Description

Traffic identification model training method, identification method, device, equipment and medium
Technical Field
The present application relates to the field of traffic identification technologies, and in particular to a traffic identification model training method, an identification method, a device, equipment and a medium.
Background
In recent years, the number of network attacks has increased, and attack methods have become more covert. Attackers usually exchange information over encrypted communication channels, so encrypted attack traffic is mixed into normal service data and is difficult to discover.
In the prior art, encrypted traffic identification methods generate a feature library from existing network protocols and compare it with the traffic features of the captured traffic, so as to determine from the comparison result whether the captured network traffic is encrypted traffic.
During implementation, the following problems were found: when identification relies solely on comparison against a feature library, some novel encrypted traffic may go unidentified because the network protocol feature library is not updated in time; and when judgment is made only along the network protocol dimension, a large amount of abnormal traffic may be hard to identify, which reduces the identification accuracy for abnormal traffic.
Disclosure of Invention
The application provides a traffic identification model training method, an identification method, a device, equipment and a medium, which address the low accuracy that results in the prior art from identifying traffic only by feature comparison. Identifying traffic with an identification algorithm obtained by clustering on multiple features improves the identification accuracy for abnormal traffic.
In a first aspect, the present application provides a traffic recognition model training method, including:
acquiring sample traffic data for model training; the sample traffic data is traffic data packets transmitted based on an Internet security protocol;
performing feature extraction on the sample traffic data to obtain data features of the sample traffic data;
and performing at least one round of clustering on the sample traffic data based on the data features until a preset training stop condition is met, to obtain a trained traffic identification model.
Optionally, the acquiring sample flow data for model training includes:
performing data packet screening on a plurality of captured network traffic data packets to obtain traffic data packets with complete sessions;
performing field parsing on each traffic data packet to obtain the data fields corresponding to each traffic data packet;
and screening the traffic data packets based on the data fields to obtain sample traffic data that meets the training requirements.
Optionally, the data features include attribute features and status features; the attribute features comprise flag bit data and transmission speed data of the sample traffic data; the status features include idle state time data and active state time data of the traffic data.
Optionally, performing at least one round of clustering on the sample traffic data based on the data features until a preset training stop condition is met, to obtain a trained traffic identification model, includes:
in the training process of any round, clustering the data features based on the initial cluster center points of the current round to obtain the cluster center points after clustering;
and stopping training when each cluster center point meets the preset training stop condition, to obtain the trained traffic identification model.
Optionally, the method further comprises: before stopping training, judging whether each clustering center point meets a preset training stopping condition; the judging method comprises the following steps:
for any clustering center point, determining at least one data feature belonging to the current clustering center point, and acquiring a feature tag of each data feature;
and determining whether the current clustering center point meets a preset training stop condition or not based on each feature label.
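The label-based judgment above can be illustrated with a minimal Python sketch. The text does not say how the feature labels are turned into a stop decision, so the rule used here (one label must dominate the cluster) and the names `cluster_meets_stop_condition`, `all_clusters_meet_stop_condition` and `min_purity` are assumptions made only for illustration.

```python
from collections import Counter


def cluster_meets_stop_condition(feature_labels, min_purity=0.9):
    """Judge one cluster center from the labels of its data features.

    feature_labels: labels (for example "normal" or "malicious") of the
    data features assigned to the current cluster center.
    min_purity: assumed threshold; the source does not give a value.
    """
    if not feature_labels:
        # Assumption: an empty cluster never satisfies the condition.
        return False
    counts = Counter(feature_labels)
    dominant_count = counts.most_common(1)[0][1]
    return dominant_count / len(feature_labels) >= min_purity


def all_clusters_meet_stop_condition(labels_per_cluster, min_purity=0.9):
    """Training may stop only when every cluster center passes the check."""
    return all(
        cluster_meets_stop_condition(labels, min_purity)
        for labels in labels_per_cluster
    )
```

Under this assumed rule, a cluster that mixes normal and malicious samples roughly evenly keeps training going, while clusters dominated by one label let it stop.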
In a second aspect, the present application further provides a traffic identification method, including:
acquiring flow data, and performing feature extraction on the flow data to obtain data features of the flow data; the flow data is a flow data packet transmitted based on an internet security protocol;
performing flow identification on the data characteristics based on a pre-trained flow identification model to obtain a target identification result of the flow data; the flow identification model is obtained by training based on the flow identification model training method described in the first aspect.
In a third aspect, the present application further provides a traffic recognition model training device, including:
the sample flow data acquisition module is used for acquiring sample flow data for model training; the sample flow data is a flow data packet transmitted based on an internet security protocol;
the data characteristic obtaining module is used for carrying out characteristic extraction on the sample flow data to obtain the data characteristics of the sample flow data;
and the model training module is used for carrying out clustering processing on each sample flow data at least one round based on the data characteristics until a preset training stop condition is met, so as to obtain a trained flow identification model.
In a fourth aspect, the present application also provides a flow identification device, including:
the data characteristic obtaining module is used for obtaining flow data, and carrying out characteristic extraction on the flow data to obtain data characteristics of the flow data; the flow data is a flow data packet transmitted based on an internet security protocol;
the recognition result obtaining module is used for carrying out flow recognition on the data characteristics based on a pre-trained flow recognition model to obtain a target recognition result of the flow data; the flow identification model is obtained by training based on the flow identification model training method described in the first aspect.
In a fifth aspect, the present application provides a terminal device, including: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method as described in the first aspect.
In a sixth aspect, the present application provides a computer readable storage medium having stored therein computer executable instructions for carrying out the method according to the first aspect when executed by a processor.
In a seventh aspect, the application provides a computer program product comprising a computer program which, when executed by a processor, implements the method of the first aspect.
In the technical scheme provided by the application, the transmission protocol adopted by each network flow data is judged, and the network flow data transmitted by adopting the Internet security protocol is used as sample flow data to carry out subsequent recognition model training; extracting data features of the sample flow data to obtain various data features corresponding to the sample flow data; clustering training is further carried out on the basis of the data features, and a flow identification algorithm after training is completed is obtained; and the flow identification is performed by adopting an identification algorithm based on multiple feature clusters, so that the identification accuracy of abnormal flow is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is an application scenario diagram of a traffic recognition model training method according to an embodiment of the present application;
FIG. 2 is a flow chart of a flow recognition model training method according to an embodiment of the present application;
FIG. 3 is a flowchart of another flow identification model training method according to an embodiment of the present application;
FIG. 4 is a flow chart of a flow identification method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a flow recognition model training device according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a flow rate identification device according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
FIG. 8 is a block diagram of a terminal device according to an embodiment of the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
In practical applications, traffic data communicated in the network needs to be identified in order to ensure the security of network communication. Existing identification methods generally extract the traffic features of the traffic data, compare them with the features stored in a traffic feature library constructed in advance from existing network protocols, and obtain the identification result from the comparison. However, because the traffic feature library is not updated in time, some novel encrypted traffic cannot be identified correctly, so the accuracy of the identification result is low. In addition, identifying traffic data by feature comparison along a single dimension may misidentify some encrypted traffic, which further reduces accuracy.
To solve the above technical problems in the prior art, the application provides a traffic identification model training method. Specifically, traffic data transmitted using the Internet security protocol is obtained first, which implements a first layer of traffic screening and identification. Multiple data features of the screened traffic data are then extracted, and a second layer of traffic identification is performed with a pre-trained clustering algorithm to obtain the traffic identification result, improving the detection accuracy and coverage for traffic data.
FIG. 1 is an application scenario diagram of a traffic identification model training method according to an embodiment of the present application. The application scenario to which the embodiment applies is described below with reference to FIG. 1. Referring to FIG. 1, an acquisition module and an identification module are disposed in the identification device. Specifically, the acquisition module acquires multiple groups of network traffic data transmitted in the equipment where the identification device is located and passes them to the identification module. The identification module determines the transmission protocol used by each piece of network traffic data and takes the network traffic data transmitted using the Internet security protocol as sample traffic data for subsequent identification model training; it extracts data features from the sample traffic data to obtain multiple data features corresponding to the sample traffic data, and then performs clustering training based on these data features to obtain a trained traffic identification model. In the subsequent identification process, when the identification module again receives network traffic data from the acquisition module, if the network traffic data was transmitted using the preset transmission protocol, the data features of the network traffic data are obtained and input into the trained traffic identification model, yielding the traffic identification result output by the model.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a flow chart of a flow recognition model training method according to an embodiment of the present application. The method may be performed by a traffic recognition model training device, which may be a server or a terminal device, and the method in this embodiment may be implemented by software, hardware or a combination of software and hardware, as shown in fig. 2, where the method includes the following steps:
S210, acquiring sample traffic data for model training; the sample traffic data is traffic data packets transmitted based on an Internet security protocol.
In an embodiment of the present application, traffic data transmitted based on the Internet security protocol can be understood as IPSec (Internet Protocol Security) traffic data.
In practical applications, a preset acquisition device is used to acquire the network traffic data communicated by the equipment. It should be noted that, to improve the training effect of the subsequent traffic identification model, the sample traffic data used for model training in the present application consists of data packets from complete sessions. Specifically, traffic data with complete sessions can be obtained by preprocessing the acquired network traffic data.
Further, a first round of data screening is performed on the collected data: the transmission protocol used by the network traffic data is obtained, and the IPSec traffic data transmitted using the Internet security protocol is selected as sample data for subsequent training. It should be noted that this screening allows data transmitted using other protocols to be determined as abnormal data directly, which reduces the workload of subsequent data identification and improves the efficiency of traffic identification.
Optionally, in order to train the traffic recognition model based on the sample traffic data, the sample traffic data needs to be labeled. Specifically, the marked sample traffic data may include m pieces of positive sample traffic data, i.e., normally encrypted traffic data, and n pieces of negative sample data, i.e., maliciously encrypted traffic data.
S220, extracting features of the sample flow data to obtain data features of the sample flow data.
In the embodiment of the application, in order to facilitate subsequent flow identification, the obtained flow sample data can be subjected to feature extraction to obtain the data features of the sample flow data. Optionally, feature extraction calculation can be performed through a preset feature extraction expression to obtain data features; and the data characteristics can be obtained by carrying out characteristic processing through a characteristic extraction model which is trained in advance. The embodiment of the application does not limit the extraction mode of the feature extraction.
In the present application, the data features include attribute features and status features; the attribute features comprise flag bit data and transmission speed data of the sample traffic data; the status features include idle state time data and active state time data of the traffic data. Specifically, the flag bit data of the sample traffic data may include the number of times the URG (urgent) flag bit is set and the number of times the PSH (push) flag bit is set. The transmission speed data includes the number of bytes transmitted per second and the number of data packets transmitted per second. The idle state time of the traffic data includes the average, minimum and maximum idle time of the data. The active state time of the traffic data includes the average, maximum and minimum active time of the data. It should be noted that the above data features are only exemplary and should not be taken as limiting the technical solution of the present application, which may also include other common data features.
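As a rough illustration of the attribute and status features listed above, the following Python sketch computes them for one flow. The `Packet` record, the `extract_features` function and the `idle_threshold` parameter (the gap, in seconds, above which a flow is treated as idle) are hypothetical names introduced for this example; the source does not define how idle and active periods are delimited.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    timestamp: float  # capture time in seconds
    size: int         # packet size in bytes
    urg: bool = False  # URG flag set
    psh: bool = False  # PSH flag set


def _stats(values):
    """Minimum, mean and maximum of a list, or zeros when it is empty."""
    if not values:
        return {"min": 0.0, "mean": 0.0, "max": 0.0}
    return {"min": min(values), "mean": mean(values), "max": max(values)}


def extract_features(packets, idle_threshold=1.0):
    """Compute flag, speed, idle-time and active-time features of one flow."""
    if not packets:
        raise ValueError("a flow needs at least one packet")
    packets = sorted(packets, key=lambda p: p.timestamp)
    duration = packets[-1].timestamp - packets[0].timestamp

    active, idle = [], []
    active_start = packets[0].timestamp
    for prev, cur in zip(packets, packets[1:]):
        gap = cur.timestamp - prev.timestamp
        if gap > idle_threshold:
            # A long gap closes the current active period and counts as idle.
            active.append(prev.timestamp - active_start)
            idle.append(gap)
            active_start = cur.timestamp
    active.append(packets[-1].timestamp - active_start)

    total_bytes = sum(p.size for p in packets)
    return {
        "urg_count": sum(p.urg for p in packets),
        "psh_count": sum(p.psh for p in packets),
        "bytes_per_second": total_bytes / duration if duration > 0 else 0.0,
        "packets_per_second": len(packets) / duration if duration > 0 else 0.0,
        "idle": _stats(idle),
        "active": _stats(active),
    }
```

The resulting dictionary can be flattened into a numeric vector before clustering; the flattening order is a free design choice.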
S230, clustering processing is carried out on each sample flow data at least one round based on the data characteristics until a preset training stop condition is met, and a trained flow recognition model is obtained.
In this technical scheme, once the sample traffic data and its data features have been obtained, the data features are fed into a clustering model as training data for clustering training. The data features may be used one after another for clustering training to obtain the trained traffic identification model. Alternatively, all the data features may be used for clustering training at the same time; the order in which the data features are input is not limited.
Specifically, preset initial cluster center points are obtained, and the data features are input into the traffic identification model to be trained, such as a k-means model, to obtain the current clustering result. It is then judged whether the current clustering result meets the preset training stop condition. If it does, training stops and the trained traffic identification model is obtained. Otherwise, the current cluster center points are taken as the initial cluster center points of the next training iteration, and the next round of clustering is performed to form new cluster center points; this repeats until the resulting cluster center points meet the training stop condition, at which point the trained traffic identification model is obtained.
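The iterative procedure described above can be sketched as a plain k-means loop in Python. This is only an illustration under stated assumptions: the function names, the `max_rounds` cap and the default stop rule (center movement below `tol`) are not taken from the source, which leaves the training stop condition as a preset; a caller can pass its own `stop_condition` callable instead.

```python
import math


def assign(points, centers):
    """Return, for each point, the index of its nearest center (Euclidean)."""
    return [
        min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points
    ]


def update_centers(points, assignments, centers):
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, a in zip(points, assignments) if a == k]
        if not members:
            new_centers.append(old)  # an empty cluster keeps its center
            continue
        new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centers


def train(points, initial_centers, stop_condition=None, max_rounds=100, tol=1e-6):
    """Run clustering rounds until the stop condition holds.

    stop_condition(centers, assignments) -> bool is a hypothetical hook for
    the preset training stop condition; by default the loop stops when no
    center moved by more than tol in the last round.
    """
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_rounds):
        assignments = assign(points, centers)
        new_centers = update_centers(points, assignments, centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        # The centers of this round become the initial centers of the next.
        centers = new_centers
        if stop_condition is not None:
            done = stop_condition(centers, assignments)
        else:
            done = shift <= tol
        if done:
            break
    return centers, assign(points, centers)
```

The returned centers play the role of the trained model: a new feature vector can be identified by finding its nearest center with `assign`.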
In the technical scheme, the transmission protocol adopted by each network flow data is judged, and the network flow data transmitted by adopting the Internet security protocol is used as sample flow data to carry out subsequent recognition model training; extracting data features of the sample flow data to obtain various data features corresponding to the sample flow data; clustering training is further carried out on the basis of the data features, and a flow identification algorithm after training is completed is obtained; and the flow identification is performed by adopting an identification algorithm based on multiple feature clusters, so that the identification accuracy of abnormal flow is improved.
FIG. 3 is a flowchart of another traffic identification model training method according to an embodiment of the present application. This embodiment can be understood as a refinement of specific steps of the above embodiment. Referring to FIG. 3, the method may specifically include:
S310, acquiring sample traffic data for model training; the sample traffic data is traffic data packets transmitted based on an Internet security protocol.
Specifically, for understanding and examples of the technical means, technical effects, and technical terms in step S310, reference may be made to the explanation of step S210 in the above embodiments.
On the basis of the foregoing embodiment, in this embodiment, step S310 may specifically include:
S311, performing data packet screening on the plurality of captured network traffic data packets to obtain traffic data packets with complete sessions.
In the embodiment of the application, a preset acquisition device is used to acquire network traffic data; for example, a front-end processor is used to capture the network traffic data. The sample traffic data for training can then be obtained by preprocessing the acquired network traffic data. Illustratively, data preprocessing includes, but is not limited to, detecting traffic data packets of incomplete sessions with a deep packet inspection apparatus and deleting the detected packets. Illustratively, traffic data of incomplete sessions includes TCP flows without a complete handshake, sessions with incomplete signaling, and so on.
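One of the incomplete-session cases named above, a TCP flow without a complete handshake, can be illustrated with a short Python sketch. It assumes that packets have already been reduced to `(flow_id, tcp_flags)` pairs in capture order; that representation and the function names are hypothetical, and a real deep packet inspection device would check considerably more than the three handshake segments.

```python
from collections import defaultdict

SYN, ACK = 0x02, 0x10  # TCP flag bits


def has_complete_handshake(flag_sequence):
    """True if SYN, SYN+ACK and ACK segments appear in that order."""
    expected = [
        lambda f: f & SYN and not f & ACK,  # client SYN
        lambda f: f & SYN and f & ACK,      # server SYN+ACK
        lambda f: f & ACK and not f & SYN,  # client ACK
    ]
    step = 0
    for flags in flag_sequence:
        if step < 3 and expected[step](flags):
            step += 1
    return step == 3


def keep_complete_sessions(packets):
    """Drop every packet that belongs to a flow without a full handshake.

    packets: list of (flow_id, tcp_flags) tuples in capture order.
    """
    flows = defaultdict(list)
    for flow_id, flags in packets:
        flows[flow_id].append(flags)
    complete = {fid for fid, seq in flows.items() if has_complete_handshake(seq)}
    return [(fid, flags) for fid, flags in packets if fid in complete]
```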
S312, respectively carrying out field analysis processing on each flow data packet to obtain a data field corresponding to each flow data packet.
In the embodiment of the application, a preset field analysis tool is acquired, the field analysis processing is carried out on the flow data packet with complete session based on the field analysis tool, a plurality of analyzed data fields are obtained, and then the screening processing of the flow data packet is carried out based on the data value corresponding to the data field.
S313, screening the flow data packets based on the data fields to obtain sample flow data meeting the training requirements.
In the embodiment of the application, multi-level screening processing can be performed on the streaming data packet based on a plurality of bytes so as to improve screening accuracy.
Specifically, for any parsed flow data packet, the 13 th byte and the 14 th byte of the flow data packet are obtained, and whether the flow data packet is the flow data packet to be discarded or not is judged based on the data values corresponding to the 13 th byte and the 14 th byte, namely whether the flow data packet is IPSec flow data or not. Optionally, if it is determined that the traffic data is not an IP data packet based on bytes 13 and 14, determining that the traffic data packet is a traffic data packet to be discarded; otherwise, the screening needs to be continued.
Further, the 10th byte of the traffic data packet is obtained, and whether the packet should be discarded is judged based on the value of that byte. Optionally, if the 10th byte indicates that the packet is transmitted using the TCP protocol, the packet is determined to be a packet to be discarded; if the 10th byte indicates that the packet is transmitted using ESP (Encapsulating Security Payload) or AH (Authentication Header), the packet is determined to be IPSec traffic data, which can be used for subsequent model training; if the 10th byte indicates that the packet is transmitted using the UDP protocol, screening continues.
Further, the 11th to 14th bytes of the traffic data packet are obtained, and whether the packet should be discarded is judged based on the value of those bytes, namely the port number of the packet. Optionally, if the 11th to 14th bytes indicate that the port number is 443, the packet is transmitted using the SSL (Secure Sockets Layer) protocol, so the packet can be determined to be IPSec traffic data, which can be used for subsequent model training; if the 11th to 14th bytes indicate that the port number is 4500 or 500, screening needs to continue.
Further, the 13th to 20th bytes of the traffic data packet are obtained, and it is judged whether the value of those bytes is a continuous random number. Optionally, if so, the packet is determined to be a packet to be discarded; if not, the packet is IPSec traffic data and can be used for subsequent model training.
Of course, the above screening process for sample traffic data is only an optional implementation of the technical solution of the present application and does not limit it; other screening methods may also be used to screen samples, which is not specifically limited here.
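The screening cascade described above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the implementation of the application: it assumes the protocol number, the port number, and the bytes examined in the last step have already been parsed from the packet; the names `screen_packet`, `Decision`, and the `looks_random` predicate are hypothetical; and discarding protocols or ports that the text does not mention is an assumption. The protocol numbers are the standard IANA values.

```python
from enum import Enum
from typing import Callable, Optional

# IANA-assigned IP protocol numbers.
TCP, UDP, ESP, AH = 6, 17, 50, 51


class Decision(Enum):
    DISCARD = "discard"
    KEEP = "keep"  # retained as IPSec sample traffic for model training


def screen_packet(
    protocol: int,
    port: Optional[int],
    tail: bytes,
    looks_random: Callable[[bytes], bool],
) -> Decision:
    """Apply the three screening steps in order.

    `tail` stands for the bytes checked in the last step, and
    `looks_random` is a caller-supplied (hypothetical) randomness test.
    """
    # Step 1: protocol field.
    if protocol == TCP:
        return Decision.DISCARD
    if protocol in (ESP, AH):
        return Decision.KEEP
    if protocol != UDP:
        return Decision.DISCARD  # assumption: the text does not cover this case

    # Step 2: port number (UDP only).
    if port == 443:
        return Decision.KEEP
    if port not in (500, 4500):
        return Decision.DISCARD  # assumption: the text does not cover this case

    # Step 3: ports 500/4500 need the random-number check.
    return Decision.DISCARD if looks_random(tail) else Decision.KEEP
```

Passing the randomness test in as a parameter keeps the sketch neutral about how "continuous random number" is decided, since the text does not define that check.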
S320, extracting features of the sample flow data to obtain data features of the sample flow data.
Specifically, for understanding and examples of the technical means, technical effects, and technical terms in step S320, reference may be made to the explanation of step S220 in the above embodiments.
S330, clustering processing is carried out on each sample flow data at least one round based on the data characteristics until a preset training stop condition is met, and a trained flow recognition model is obtained.
Specifically, for understanding and examples of the technical means, technical effects, and technical terms in step S330, reference may be made to the explanation of step S230 in the above embodiments.
On the basis of the foregoing embodiment, in this embodiment, the step of step S330 may specifically include:
S331, in the training process of any round, clustering is performed on the data features based on the initial cluster center points of the current round, and the cluster center points after clustering are obtained.
In the embodiment of the application, a preset initial clustering center point is obtained, the data characteristics are input into a flow identification model to be trained for clustering, and each clustered clustering center point is obtained after clustering.
It should be noted that the initial cluster center points may be obtained as follows: a plurality of candidate cluster center points are obtained by performing an initial calculation on the traffic recognition model to be trained based on different loss functions, and the candidate cluster center points are then sorted and screened to obtain the initial cluster center points. Of course, the initial cluster center points may also be determined in other ways, which is not specifically limited.
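One round of clustering as described in S331 can be sketched as follows. The text does not name a specific clustering algorithm, so this sketch assumes a k-means-style update (nearest-center assignment by Euclidean distance, then mean recomputation); the function names `nearest` and `cluster_round` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(point: Vector, centers: Sequence[Vector]) -> int:
    """Return the index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def cluster_round(
    features: Sequence[Vector], centers: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Assign each feature to its nearest center, then recompute every center."""
    assignment = [nearest(f, centers) for f in features]
    new_centers: List[List[float]] = []
    for k, old in enumerate(centers):
        members = [f for f, a in zip(features, assignment) if a == k]
        if not members:
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(list(old))
            continue
        dim = len(old)
        new_centers.append(
            [sum(m[d] for m in members) / len(members) for d in range(dim)]
        )
    return assignment, new_centers
```

The centers returned by one call would serve as the initial cluster center points of the next round.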
And S332, stopping training when each cluster center point meets a preset training stopping condition, and obtaining a flow identification model after training is completed.
In the embodiment of the application, iterative clustering training is performed on the flow identification model based on the implementation mode. Optionally, after each round of clustering is completed, judging whether the current clustering center point meets a preset training stop condition. In the present application, the training stop condition may be: sample labels corresponding to the data features gathered by any cluster center point are labels of the same category. On this basis, the judging method of the training stop condition may include: for any clustering center point, determining at least one data feature belonging to the current clustering center point, and acquiring a feature tag of each data feature; and determining whether the current clustering center point meets a preset training stop condition or not based on each feature label.
Specifically, any cluster center point k is obtained, the data features gathered within the range of cluster center point k are obtained, and the sample label of the sample traffic data to which each of these data features belongs is determined. Optionally, if all of the sample labels are positive samples, or all of them are negative samples, cluster center point k meets the preset training stop condition.
Optionally, based on the above-mentioned judging method, when it is determined that all cluster center points meet the above-mentioned training stop conditions, it is determined that the flow identification model may stop training, and a trained flow identification model is obtained.
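The stop condition above (every cluster gathers samples of a single label) can be checked with a few lines of code. This is a minimal sketch; the names `cluster_is_pure` and `should_stop` are hypothetical, and treating an empty cluster as satisfying the condition is an assumption.

```python
from typing import Hashable, Sequence


def cluster_is_pure(labels: Sequence[Hashable]) -> bool:
    """True if all labels in one cluster are the same (empty counts as pure)."""
    return len(set(labels)) <= 1


def should_stop(
    assignment: Sequence[int], labels: Sequence[Hashable], n_clusters: int
) -> bool:
    """True if every cluster center point meets the stop condition.

    `assignment[i]` is the cluster index of sample i and `labels[i]` is
    its sample label (for example, positive or negative).
    """
    for k in range(n_clusters):
        members = [lab for a, lab in zip(assignment, labels) if a == k]
        if not cluster_is_pure(members):
            return False
    return True
```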
In the technical scheme, the transmission protocol adopted by each network flow data is judged, and the network flow data transmitted by adopting the Internet security protocol is used as sample flow data to carry out subsequent recognition model training; extracting data features of the sample flow data to obtain various data features corresponding to the sample flow data; clustering training is further carried out on the basis of the data features, and a flow identification algorithm after training is completed is obtained; and the flow identification is performed by adopting an identification algorithm based on multiple feature clusters, so that the identification accuracy of abnormal flow is improved.
Fig. 4 is a flow chart of a flow identification method according to an embodiment of the present application. The method may be performed by a traffic identifying device, which may be a server or a terminal device, and the method in this embodiment may be implemented by software, hardware or a combination of software and hardware, as shown in fig. 4, where the method includes the following steps:
S410, acquiring flow data, and performing feature extraction on the flow data to obtain data features of the flow data; the traffic data is traffic data packets transmitted based on internet security protocol.
S420, carrying out flow identification on the data characteristics based on the pre-trained flow identification model to obtain a target identification result of the flow data.
In this scheme, traffic data transmitted using the Internet security protocol is first acquired, which implements a first layer of traffic screening and identification; a plurality of data features are then extracted from the screened traffic data, and the pre-trained clustering algorithm performs a second layer of traffic identification to obtain the traffic identification result, which improves the detection accuracy and coverage of the traffic data.
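Step S420 can be illustrated with a nearest-center lookup. This sketch assumes that the trained model is represented as a list of cluster center points, each paired with the single label shared by the samples it gathered during training; that representation, the function name `identify`, and the example labels are assumptions rather than details stated in the application.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def identify(feature: Vector, labeled_centers: Sequence[Tuple[Vector, str]]) -> str:
    """Return the label of the cluster center closest to `feature`."""
    _center, label = min(
        labeled_centers, key=lambda pair: math.dist(feature, pair[0])
    )
    return label


# Hypothetical trained model with two labeled cluster center points.
model = [((0.0, 0.0), "normal"), ((10.0, 10.0), "abnormal")]
result = identify((1.0, 2.0), model)
```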
Fig. 5 is a schematic structural diagram of a flow recognition model training device according to an embodiment of the present application. Referring to fig. 5, the apparatus includes: a sample flow data acquisition module 510, a data feature acquisition module 520, and a model training module 530; wherein,
a sample flow data obtaining module 510, configured to obtain sample flow data for performing model training; the sample flow data is a flow data packet transmitted based on an internet security protocol;
The data feature obtaining module 520 is configured to perform feature extraction on the sample flow data to obtain data features of the sample flow data;
the model training module 530 is configured to perform at least one round of clustering on the sample flow data based on the data features until a preset training stop condition is met, to obtain a trained flow identification model.
Optionally, the sample traffic data acquisition module 510 includes:
the flow data packet obtaining sub-module is used for carrying out data packet screening processing on a plurality of captured network flow data packets to obtain flow data packets with complete sessions;
the data field acquisition sub-module is used for respectively carrying out field analysis processing on each flow data packet to obtain a data field corresponding to each flow data packet;
and the sample flow data obtaining sub-module is used for screening each flow data packet based on the data field to obtain sample flow data meeting the training requirement.
Optionally, the data features include attribute features and status features; the attribute features comprise flag bit data and transmission speed data of the sample flow data; the status features include idle status time data and active status time data of the traffic data.
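The attribute and status features listed above can be gathered into one feature vector per flow. The sketch below is illustrative only: the application does not specify how these values are computed, so combining flag bits with a bitwise OR, measuring speed in bytes per second, and splitting inter-packet gaps into active and idle time with an `idle_threshold` are all assumptions, and `FlowFeatures` and `extract_features` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FlowFeatures:
    flag_bits: int           # attribute feature: flag bit data
    bytes_per_second: float  # attribute feature: transmission speed
    active_time: float       # status feature: time spent active
    idle_time: float         # status feature: time spent idle

    def as_vector(self) -> List[float]:
        return [
            float(self.flag_bits),
            self.bytes_per_second,
            self.active_time,
            self.idle_time,
        ]


def extract_features(
    packets: Sequence[Tuple[float, int, int]], idle_threshold: float = 1.0
) -> FlowFeatures:
    """Build features from (timestamp_seconds, size_bytes, flags) tuples in time order."""
    if not packets:
        raise ValueError("flow has no packets")
    flag_bits = 0
    active = idle = 0.0
    for i, (ts, _size, flags) in enumerate(packets):
        flag_bits |= flags
        if i > 0:
            gap = ts - packets[i - 1][0]
            # Gaps longer than the threshold count as idle time.
            if gap > idle_threshold:
                idle += gap
            else:
                active += gap
    duration = packets[-1][0] - packets[0][0]
    total_bytes = sum(size for _ts, size, _flags in packets)
    speed = total_bytes / duration if duration > 0 else 0.0
    return FlowFeatures(flag_bits, speed, active, idle)
```

The vector returned by `as_vector` is the kind of input a clustering step would consume; in practice the components would likely need scaling, since they are measured in different units.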
Optionally, the model training module 530 includes:
the clustering center point obtaining sub-module is used for carrying out clustering processing on the data characteristics based on the initial clustering center point of the current round in the training process of any round to obtain clustered clustering center points;
and the flow identification model obtaining submodule is used for stopping training when each clustering center point meets a preset training stopping condition to obtain a trained flow identification model.
Optionally, the apparatus further comprises:
the judging module is used for judging whether each clustering center point meets the preset training stopping condition before training is stopped; the judging method comprises the following steps:
for any clustering center point, determining at least one data feature belonging to the current clustering center point, and acquiring a feature tag of each data feature;
and determining whether the current clustering center point meets a preset training stop condition or not based on each feature label.
Fig. 6 is a schematic structural diagram of a flow identification device according to an embodiment of the present application. Referring to fig. 6, the apparatus includes: a data feature obtaining module 610 and a recognition result obtaining module 620; wherein,
the data feature obtaining module 610 is configured to obtain flow data, perform feature extraction on the flow data, and obtain data features of the flow data; the flow data is a flow data packet transmitted based on an internet security protocol;
The recognition result obtaining module 620 is configured to perform flow recognition on the data features based on the pre-trained flow recognition model, so as to obtain a target recognition result of the flow data.
Fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 7, the terminal device of the present embodiment may include:
at least one processor 710; and
a memory 720 communicatively coupled to the at least one processor;
wherein the memory 720 stores instructions executable by the at least one processor 710, the instructions being executable by the at least one processor 710 to cause the terminal device to perform a method as in any of the embodiments described above.
Alternatively, memory 720 may be separate or integrated with processor 710.
The implementation principle and technical effects of the terminal device provided in this embodiment may be referred to the foregoing embodiments, and will not be described herein again.
The embodiment of the application also provides a computer readable storage medium, wherein computer executable instructions are stored in the computer readable storage medium, and when the processor executes the computer executable instructions, the method of any of the previous embodiments is realized.
Embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the method of any of the preceding embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, e.g., the division of modules is merely a logical function division, and there may be additional divisions of actual implementation, e.g., multiple modules may be combined or integrated into another system, or some features may be omitted or not performed.
The integrated modules, which are implemented in the form of software functional modules, may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or processor to perform some of the steps of the methods of the various embodiments of the application.
It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU for short), other general purpose processors, digital signal processor (Digital Signal Processor, DSP for short), application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in a processor for execution. The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile memory NVM, such as at least one magnetic disk memory, and may also be a U-disk, a removable hard disk, a read-only memory, a magnetic disk or optical disk, etc.
The storage medium may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short). It is also possible that the processor and the storage medium reside as discrete components in a server or master device.
Fig. 8 is a block diagram of a terminal device, which may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, etc., according to an embodiment of the present application.
Device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen between the device 800 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or sliding action, but also the duration and pressure associated with the touch or sliding operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessment of various aspects of the device 800. For example, the sensor assembly 814 may detect an on/off state of the device 800, a relative positioning of the assemblies, such as a display and keypad of the device 800, the sensor assembly 814 may also detect a change in position of the device 800 or one of the assemblies of the device 800, the presence or absence of user contact with the device 800, an orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communication between the device 800 and other devices, either wired or wireless. The device 800 may access a wireless network based on a communication standard, such as WiFi,2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including instructions executable by processor 820 of device 800 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
A non-transitory computer readable storage medium stores instructions which, when executed by a processor of a terminal device, cause the terminal device to perform the above-described method.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for training a traffic recognition model, the method comprising:
acquiring sample flow data for model training; the sample flow data is a flow data packet transmitted based on an internet security protocol;
Extracting features of the sample flow data to obtain data features of the sample flow data;
and carrying out clustering processing on each sample flow data at least one round based on the data characteristics until a preset training stop condition is met, and obtaining a trained flow identification model.
2. The method of claim 1, wherein the obtaining sample flow data for model training comprises:
carrying out data packet screening treatment on the plurality of the captured network flow data packets to obtain flow data packets with complete session;
respectively carrying out field analysis processing on each flow data packet to obtain a data field corresponding to each flow data packet;
and screening the flow data packets based on the data fields to obtain sample flow data meeting the training requirements.
3. The method of claim 1, wherein the data features include attribute features and status features; the attribute features comprise flag bit data and transmission speed data of the sample flow data; the status features include idle status time data and active status time data of the traffic data.
4. The method according to claim 1, wherein the clustering processing of each sample flow data for at least one round based on the data features until a preset training stop condition is met, to obtain a trained flow recognition model, includes:
in the training process of any round, carrying out clustering processing on the data features based on the initial clustering center points of the current round to obtain clustering center points after the clustering processing;
and stopping training when each cluster center point meets a preset training stopping condition, and obtaining a flow identification model after training is completed.
5. The method according to claim 4, wherein the method further comprises: before stopping training, judging whether each clustering center point meets a preset training stopping condition; the judging method comprises the following steps:
for any clustering center point, determining at least one data feature belonging to the current clustering center point, and acquiring a feature tag of each data feature;
and determining whether the current clustering center point meets a preset training stop condition or not based on each feature label.
6. A method of traffic identification, the method comprising:
Acquiring flow data, and performing feature extraction on the flow data to obtain data features of the flow data; the flow data is a flow data packet transmitted based on an internet security protocol;
performing flow identification on the data characteristics based on a pre-trained flow identification model to obtain a target identification result of the flow data; the flow identification model is trained based on the flow identification model training method according to any one of claims 1-5.
7. A traffic recognition model training device, the device comprising:
the sample flow data acquisition module is used for acquiring sample flow data for model training; the sample flow data is a flow data packet transmitted based on an internet security protocol;
the data characteristic obtaining module is used for carrying out characteristic extraction on the sample flow data to obtain the data characteristics of the sample flow data;
and the model training module is used for carrying out clustering processing on each sample flow data at least one round based on the data characteristics until a preset training stop condition is met, so as to obtain a trained flow identification model.
8. A flow identification device, the device comprising:
The data characteristic obtaining module is used for obtaining flow data, and carrying out characteristic extraction on the flow data to obtain data characteristics of the flow data; the flow data is a flow data packet transmitted based on an internet security protocol;
the recognition result obtaining module is used for carrying out flow recognition on the data characteristics based on a pre-trained flow recognition model to obtain a target recognition result of the flow data; the flow identification model is trained based on the flow identification model training method according to any one of claims 1-5.
9. A terminal device, comprising: a processor and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor, when executing the computer-executable instructions, is configured to implement the traffic recognition model training method of any one of claims 1 to 5, and/or the traffic recognition method of claim 6.
10. A computer readable storage medium, wherein computer executable instructions are stored in the computer readable storage medium, which when executed by a processor are configured to implement the flow identification model training method according to any one of claims 1 to 5 and/or the flow identification method according to claim 6.
CN202310884340.4A 2023-07-18 2023-07-18 Traffic identification model training method, identification method, device, equipment and medium Pending CN117081784A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310884340.4A CN117081784A (en) 2023-07-18 2023-07-18 Traffic identification model training method, identification method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310884340.4A CN117081784A (en) 2023-07-18 2023-07-18 Traffic identification model training method, identification method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN117081784A true CN117081784A (en) 2023-11-17

Family

ID=88708770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310884340.4A Pending CN117081784A (en) 2023-07-18 2023-07-18 Traffic identification model training method, identification method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117081784A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination