CN114765634A - Network protocol identification method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN114765634A
Authority
CN
China
Prior art keywords
network
sample
initial
identification
protocol
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110042481.2A
Other languages
Chinese (zh)
Other versions
CN114765634B (en)
Inventor
范宇河
杨勇
甘祥
郑兴
许艾斯
彭婧
华珊珊
郭晶
刘羽
唐文韬
何澍
申军利
常优
王悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110042481.2A priority Critical patent/CN114765634B/en
Publication of CN114765634A publication Critical patent/CN114765634A/en
Application granted granted Critical
Publication of CN114765634B publication Critical patent/CN114765634B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22Parsing or analysis of headers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/24Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection

Abstract

The application relates to the technical field of artificial intelligence and computer technology, and discloses a network protocol identification method and apparatus, an electronic device, and a readable storage medium. The network protocol identification method comprises the following steps: acquiring network traffic data to be identified; extracting network characteristics from the network traffic data; and identifying the network characteristics based on a trained identification network to determine the network protocol corresponding to the network traffic data to be identified. The identification network is obtained by training on sample background traffic data and sample network traffic data corresponding to different sample network protocols. The network protocol identification method provided by the application can effectively improve the accuracy of network protocol identification and thereby support network security protection.

Description

Network protocol identification method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of network protocol identification technologies, and in particular, to a network protocol identification method, an apparatus, an electronic device, and a readable storage medium.
Background
With the rapid development of computer network communication technology, the internet has spread into fields all over the world: network bandwidth has increased greatly, the number of internet users keeps growing, and a large number of new network protocols and applications have emerged with new characteristics, so that network security protection faces challenges in an increasingly diverse network. In the field of cyberspace security, efficiently and accurately identifying the network protocol of network traffic data allows the cyberspace security situation to be perceived better, and therefore has strong practical significance and value.
At present, the network protocol of network traffic data is generally identified based on rules; that is, features of the network traffic data are extracted, and the network protocol is identified according to the attributes of the extracted features. However, different protocols may exhibit similar behavior characteristics in communication, which lowers the accuracy of network protocol identification.
Disclosure of Invention
The purpose of the present application is to solve at least one of the above technical drawbacks, and to provide the following solutions:
in a first aspect, a network protocol identification method is provided, including:
acquiring network traffic data to be identified;
extracting network characteristics from the network traffic data;
identifying the network characteristics based on the trained identification network, and determining a network protocol corresponding to the network traffic data to be identified;
the identification network is obtained by training based on sample background traffic data and sample network traffic data corresponding to different sample network protocols.
In an alternative embodiment of the first aspect, the recognition network is trained by:
acquiring sample network traffic data corresponding to different sample network protocols, and acquiring sample background traffic data;
mixing the sample network traffic data corresponding to different network protocols with the sample background traffic data according to an initial proportion to generate initial sample data;
determining an initial identification network corresponding to the initial sample data;
and if the initial identification network meets a preset condition, setting the initial identification network as the identification network.
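As a non-limiting illustration of the mixing step above, the following Python sketch draws labelled flows from per-protocol pools and a background pool according to a given proportion. The function name `mix_samples`, the `"background"` label, sampling with replacement, and the fixed random seed are all assumptions made for this example and are not specified by the application.

```python
import random

def mix_samples(protocol_flows, background_flows, proportions, total, seed=0):
    """Build a labelled sample set of roughly `total` flows.

    protocol_flows: dict mapping protocol label -> list of flows
    background_flows: list of background flows
    proportions: dict mapping label -> fraction (including "background"),
                 assumed to sum to 1
    """
    rng = random.Random(seed)
    pools = dict(protocol_flows)
    pools["background"] = background_flows  # hypothetical label for background traffic
    mixed = []
    for label, fraction in proportions.items():
        count = round(fraction * total)
        # Sample with replacement so a small pool can still fill its share.
        mixed.extend((rng.choice(pools[label]), label) for _ in range(count))
    rng.shuffle(mixed)  # interleave protocols and background traffic
    return mixed
```

Each returned item is a `(flow, label)` pair, so the label can later serve as the sample network protocol when a loss value is computed.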
In an optional embodiment of the first aspect, obtaining sample network traffic data corresponding to different sample network protocols includes:
acquiring at least one application program, the at least one application program being respectively used for generating sample network traffic data corresponding to different sample network protocols;
respectively acquiring a triggering mode of the at least one application program;
and respectively triggering the corresponding application programs based on the acquired triggering modes, so that the at least one application program respectively generates the sample network traffic data corresponding to different sample network protocols.
In an optional embodiment of the first aspect, determining an initial identification network corresponding to the initial sample data comprises:
extracting sample network characteristics of initial sample data;
determining a state transition matrix based on the sample network characteristics;
an initial identification network is determined based on the state transition matrix.
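The application does not state here how the state transition matrix is built from the sample network characteristics. One possible reading, offered only as a sketch, is a first-order Markov model over discretized packet lengths: each packet length is mapped to a state, transitions between consecutive packets are counted, and each row is normalized into probabilities. The bin size, the number of states, and the function names below are assumptions.

```python
def length_to_state(length, bin_size=150, n_states=10):
    """Map a packet length in bytes to a state index (assumed binning)."""
    return min(length // bin_size, n_states - 1)

def transition_matrix(packet_lengths, bin_size=150, n_states=10):
    """Row-normalized matrix of transitions between consecutive packet states."""
    states = [length_to_state(n, bin_size, n_states) for n in packet_lengths]
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for states that were never left stay all zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

Under this reading, one matrix per sample network protocol could serve as the parameters of the initial identification network.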
In an optional embodiment of the first aspect, before the initial identification network is set as the identification network upon meeting the preset condition, the method further includes:
inputting initial sample data into an initial identification network to obtain a current identification protocol;
determining a loss value of the initial identification network based on the current identification protocol and the sample network protocol;
and if the loss value is smaller than the first preset threshold value, judging that the initial identification network meets the preset condition.
In an optional embodiment of the first aspect, if the loss value is smaller than the first preset threshold, determining that the initial identification network meets the preset condition includes:
if the loss value is smaller than a first preset threshold value, performing cross validation on the initial identification network to obtain a validation error;
and if the cross validation error is smaller than a second preset threshold value, judging that the initial identification network meets the preset condition.
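The two-stage check above can be sketched as follows. This is a minimal illustration, assuming k-fold cross validation with a misclassification rate as the validation error; `train` and `predict` are hypothetical callables standing in for whatever builds and queries the initial identification network, and the threshold names are likewise illustrative.

```python
def k_fold_error(samples, labels, train, predict, k=5):
    """Fraction of samples misclassified when each fold is held out in turn."""
    n = len(samples)
    errors = 0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th sample forms one fold
        train_x = [samples[i] for i in range(n) if i not in held_out]
        train_y = [labels[i] for i in range(n) if i not in held_out]
        model = train(train_x, train_y)
        errors += sum(predict(model, samples[i]) != labels[i] for i in held_out)
    return errors / n

def meets_preset_condition(loss, cv_error, loss_threshold, cv_threshold):
    """Loss must pass the first threshold before the cross-validation error is considered."""
    if loss >= loss_threshold:
        return False
    return cv_error < cv_threshold
```

Checking the loss value first avoids the cost of cross validation for a network that would be rejected anyway.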
In an optional embodiment of the first aspect, further comprising:
if the loss value is greater than or equal to the first preset threshold, adjusting the initial proportion to obtain an updated initial proportion, generating updated sample data and an updated initial identification network based on the updated initial proportion, and determining the loss value corresponding to the updated initial identification network, repeating until the loss value corresponding to the updated initial identification network is smaller than the first preset threshold.
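The adjustment loop can be sketched as below. The callables `build_network`, `compute_loss` and `adjust` are hypothetical placeholders for steps the application describes only in general terms, and the `max_rounds` guard is an assumption added so that the example always terminates.

```python
def train_until_threshold(initial_ratio, build_network, compute_loss, adjust,
                          loss_threshold, max_rounds=20):
    """Re-mix and re-train until the loss falls below the first preset threshold."""
    ratio = initial_ratio
    for _ in range(max_rounds):
        network = build_network(ratio)   # mix samples at `ratio`, then fit a network
        loss = compute_loss(network)     # compare identified protocols with sample labels
        if loss < loss_threshold:
            return network, ratio, loss
        ratio = adjust(ratio, loss)      # updated initial proportion
    raise RuntimeError("loss did not fall below the threshold within max_rounds")
```

How the proportion is adjusted in each round is left to the `adjust` callable, since the application does not fix a particular update rule at this point.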
In an optional embodiment of the first aspect, before mixing the sample network traffic data corresponding to different network protocols with the sample background traffic data according to the initial ratio, the method further includes:
allocating the sample network traffic data corresponding to different network protocols and the sample background traffic data by equal-ratio segmentation to obtain the initial proportion;
wherein the proportion of the sample network traffic data corresponding to each network protocol and the proportion of the sample background traffic data form a geometric sequence, and the common ratio of the geometric sequence follows a normal distribution.
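One way to read the equal-ratio segmentation is sketched below: the shares of the background traffic and of each protocol are successive terms of a geometric sequence, normalized to sum to 1, with the common ratio drawn from a normal distribution. The ordering of the labels, the mean and standard deviation, and the rejection of non-positive draws are assumptions made for this example.

```python
import random

def geometric_proportions(labels, common_ratio):
    """Assign shares 1, q, q^2, ... to `labels` in order, normalized to sum to 1."""
    weights = [common_ratio ** i for i in range(len(labels))]
    total = sum(weights)
    return {label: w / total for label, w in zip(labels, weights)}

def sample_common_ratio(mean=1.0, std=0.2, rng=None):
    """Draw a positive common ratio from a normal distribution (assumed parameters)."""
    rng = rng or random.Random()
    while True:
        q = rng.gauss(mean, std)
        if q > 0:  # a geometric sequence of proportions needs a positive ratio
            return q
```

For example, `geometric_proportions(["background", "tls", "dns"], 0.5)` gives the three labels shares in the ratio 4 : 2 : 1.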
In an optional embodiment of the first aspect, the network characteristic comprises at least one of handshake protocol information, byte distribution information, packet length information, time sequence information, protocol header information, stream header characteristics, and communication behavior characteristics.
In a second aspect, there is provided a network protocol identification apparatus, including:
the acquisition module is used for acquiring network traffic data to be identified;
the extraction module is used for extracting network characteristics from the network traffic data;
the identification module is used for identifying the network characteristics based on the trained identification network and determining a network protocol corresponding to the network traffic data to be identified;
the identification network is obtained by training based on sample background traffic data and sample network traffic data corresponding to different sample network protocols.
In an optional embodiment of the second aspect, further comprising a training module for:
acquiring sample network traffic data corresponding to different sample network protocols, and acquiring sample background traffic data;
mixing the sample network traffic data corresponding to different network protocols with the sample background traffic data according to an initial proportion to generate initial sample data;
determining an initial identification network corresponding to initial sample data;
and if the initial identification network meets the preset conditions, setting the initial identification network as the identification network.
In an optional embodiment of the second aspect, when obtaining sample network traffic data corresponding to different sample network protocols, the training module is specifically configured to:
acquiring at least one application program, the at least one application program being respectively used for generating sample network traffic data corresponding to different sample network protocols;
respectively acquiring a triggering mode of the at least one application program;
and respectively triggering the corresponding application programs based on the acquired triggering modes, so that the at least one application program respectively generates the sample network traffic data corresponding to different sample network protocols.
In an optional embodiment of the second aspect, when determining the initial identification network corresponding to the initial sample data, the training module is specifically configured to:
extracting sample network characteristics of initial sample data;
determining a state transition matrix based on the sample network characteristics;
an initial identification network is determined based on the state transition matrix.
In an optional embodiment of the second aspect, further comprising a determining module configured to:
inputting initial sample data into an initial identification network to obtain a current identification protocol;
determining a loss value of the initial identification network based on the current identification protocol and the sample network protocol;
and if the loss value is smaller than the first preset threshold value, judging that the initial identification network meets the preset condition.
In an optional embodiment of the second aspect, when the determining module determines that the initial identification network meets the preset condition if the loss value is smaller than the first preset threshold, the determining module is specifically configured to:
if the loss value is smaller than a first preset threshold value, performing cross validation on the initial identification network to obtain a validation error;
and if the cross validation error is smaller than a second preset threshold value, judging that the initial identification network meets the preset condition.
In an optional embodiment of the second aspect, further comprising an update module configured to:
if the loss value is greater than or equal to the first preset threshold, adjusting the initial proportion to obtain an updated initial proportion, generating updated sample data and an updated initial identification network based on the updated initial proportion, and determining the loss value corresponding to the updated initial identification network, repeating until the loss value corresponding to the updated initial identification network is smaller than the first preset threshold.
In an optional embodiment of the second aspect, further comprising an allocation module for:
allocating the sample network traffic data corresponding to different network protocols and the sample background traffic data by equal-ratio segmentation to obtain the initial proportion;
wherein the proportion of the sample network traffic data corresponding to each network protocol and the proportion of the sample background traffic data form a geometric sequence, and the common ratio of the geometric sequence follows a normal distribution.
In an optional embodiment of the second aspect, the network characteristic comprises at least one of handshake protocol information, byte distribution information, packet length information, time sequence information, protocol header information, stream header characteristics and communication behavior characteristics.
In a third aspect, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the network protocol identification method shown in the first aspect of the present application is implemented.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the network protocol identification method of the first aspect of the present application.
The technical solutions provided by the present application bring the following beneficial effects:
sample background traffic data and sample network traffic data corresponding to different sample network protocols are mixed in different proportions and used to train an identification network, so that the identification network can identify network traffic data corresponding to a variety of network protocols; network traffic data to be identified is acquired, network characteristics are extracted from it, and the network characteristics are identified using the trained identification network, so that the accuracy of network protocol identification can be effectively improved.
Further, the sample network traffic data corresponding to different network protocols and the sample background traffic data are mixed at an initial proportion obtained by equal-ratio segmentation, and the initial proportion is adjusted until the finally determined identification network meets the preset condition, so that the trained identification network can effectively identify network data to be identified that corresponds to different network protocols, improving the accuracy of network protocol identification.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is an application scenario diagram of a network protocol identification method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a network protocol identification method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a process for training a recognition network according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an embodiment of a method for obtaining an identification network according to an example of the present application;
FIG. 5 is a schematic diagram of an embodiment of a method for obtaining an identification network according to an example of the present application;
fig. 6 is a schematic diagram of a scheme for performing equal-ratio segmentation on traffic data of different network protocols according to an example provided in the present application;
fig. 7 is a schematic diagram of a scheme for acquiring an identification network in an example provided by an embodiment of the present application;
fig. 8 is a flowchart illustrating a network protocol identification method according to an example provided in an embodiment of the present application;
fig. 9 is a schematic structural diagram of a network protocol identification apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device for network protocol identification according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary only for explaining the present application and are not construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.
Cloud technology refers to a hosting technology for unifying series of resources such as hardware, software, and network in a wide area network or a local area network to realize calculation, storage, processing, and sharing of data.
Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like that are applied in the cloud computing business model; it can form a resource pool that is used on demand and is flexible and convenient. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites and portal websites, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each item may have its own identification mark that needs to be transmitted to a background system for logical processing; data of different levels are processed separately, and all kinds of industry data need strong back-end system support, which can only be realized through cloud computing.
Cloud Security (Cloud Security) refers to a generic term for Security software, hardware, users, organizations, secure Cloud platforms for Cloud-based business model applications. The cloud security integrates emerging technologies and concepts such as parallel processing, grid computing and unknown virus behavior judgment, abnormal monitoring of software behaviors in the network is achieved through a large number of meshed clients, the latest information of trojans and malicious programs in the internet is obtained and sent to the server for automatic analysis and processing, and then the virus and trojan solution is distributed to each client.
The main research directions of cloud security include: 1. the cloud computing security mainly researches how to guarantee the security of the cloud and various applications on the cloud, including the security of a cloud computer system, the security storage and isolation of user data, user access authentication, information transmission security, network attack protection, compliance audit and the like; 2. the cloud of the security infrastructure mainly researches how to adopt cloud computing to newly build and integrate security infrastructure resources and optimize a security protection mechanism, and comprises the steps of constructing a super-large-scale security event and an information acquisition and processing platform through a cloud computing technology, realizing the acquisition and correlation analysis of mass information, and improving the handling control capability and the risk control capability of the security event of the whole network; 3. the cloud security service mainly researches various security services such as anti-virus services and the like provided for users based on a cloud computing platform.
The network protocol identification method provided by the application can effectively identify network protocols, especially encrypted network protocols, so that network attacks can be defended against.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the implementation method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision technology, voice processing technology, natural language processing technology, machine learning/deep learning and the like.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how a computer can simulate or implement human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
The scheme provided by the embodiment of the application relates to an artificial intelligence network protocol identification technology, and is specifically explained by the following embodiment.
Multiple network protocols are often mixed in an enterprise network, and identifying the types of these network protocols is a very important part of network management. With the development of network information security technology, encryption is applied to network traffic more and more widely; according to statistics, more than about 70% of network traffic is encrypted. Once network traffic is encrypted, its content is often difficult to inspect and analyze, which makes protocol identification difficult and leaves a gap in which the network cannot be managed.
Network protocol identification can currently be performed in two ways: 1. a rule-based detection; 2. detection based on machine learning.
Rule-based detection extracts features of the network traffic data and identifies the network protocol according to the attributes of the extracted features. For example, different network protocols differ in characteristics such as packet length and time sequence, and extracting and analyzing these characteristics prepares for subsequent identification. However, different protocols may exhibit similar behavior characteristics in communication, which lowers the accuracy of network protocol identification.
Among machine-learning-based approaches, unsupervised learning trains directly on network traffic without any prior knowledge of which network protocols make up the traffic. If the traffic of some network protocols is absent from the training traffic for a long time, training is insufficient, and new network protocols cannot be identified in real use.
With the network protocol identification method of the present application, the protocol identification effect in an encrypted-traffic environment is greatly improved, along with accuracy and practicability; with the intelligent allocation of training samples and background traffic, the number of protocol types that can be identified is increased; in addition, compared with unsupervised learning, training efficiency is improved and the training process can be completed faster.
The application provides a network protocol identification method, a network protocol identification device, an electronic device and a computer-readable storage medium, which aim to solve the above technical problems in the prior art.
The following describes the technical solution of the present application and how to solve the above technical problems in detail by specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
As shown in fig. 1, the network protocol identification method of the present application may be applied to the scenario shown in fig. 1, specifically, a plurality of application programs 101 are triggered on a terminal 110 to generate sample traffic data corresponding to different sample network protocols; acquiring sample background traffic data, mixing the sample background traffic data and sample network traffic data corresponding to different sample network protocols, and sending the mixture to the server 120, wherein the server 120 trains and generates an identification network based on the mixed sample background traffic data and the sample network traffic data corresponding to different sample network protocols; the server 120 receives the network traffic data to be identified sent by the terminal 130, identifies the network features in the network traffic data to be identified based on the identification network, and determines the network protocol corresponding to the network traffic data to be identified.
In the scenario shown in fig. 1, the network protocol identification method may be performed in the server, or in another scenario, may be performed in the terminal.
Those skilled in the art will understand that the "terminal" used herein may be a Mobile phone, a tablet computer, a PDA (Personal Digital Assistant), an MID (Mobile Internet Device), etc.; a "server" may be implemented as a stand-alone server or as a server cluster comprised of multiple servers.
A possible implementation manner is provided in the embodiment of the present application, and as shown in fig. 2, a network protocol identification method is provided, which may be applied to a terminal or a server, and may include the following steps:
step S201, network traffic data to be identified is obtained.
Specifically, the network traffic data to be identified may be obtained from data interaction between at least one terminal and a server or other terminals, and the network traffic data to be identified may be formed by network traffic data corresponding to at least one network protocol.
Step S202, network characteristics are extracted from the network flow data.
Wherein the network characteristics include at least one of handshake protocol information, byte distribution information, packet length information, time series information, protocol header information, stream header characteristics, and communication behavior characteristics.
Specifically, the handshake protocol information may be TLS (Transport Layer Security) handshake information, and may include information such as cipher suites, supported extensions, and unencrypted metadata such as the public key length; the byte distribution information may represent the probability that a particular byte value appears in the payload of a data packet in the stream, and a counter array may be used to calculate the byte distribution of the stream; the packet length information may indicate the length (number of bytes) of the application payload of each of the first several packets of the stream; the time series information may be the inter-arrival time of each of the first several packets of the stream; the protocol header information may be byte information of the protocol header, e.g., the first 4 bytes of the protocol header; the stream header characteristics may include an IP (Internet Protocol) address, a port, and a stream byte count; the communication behavior characteristics may include distribution information of the number of destination/source IPs with which each host communicates.
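As an illustration of several of the features listed above, the following is a minimal sketch, not the implementation of the present application. It assumes each packet of a stream is available as a pair of arrival time and payload bytes; the function names, the packet representation, and the default of five leading packets are hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

# Assumed representation for this sketch: (arrival_time_seconds, payload_bytes).
Packet = Tuple[float, bytes]


def byte_distribution(packets: Sequence[Packet]) -> List[float]:
    """Return the empirical probability of each byte value 0..255 over all payloads."""
    counts = [0] * 256  # the counter array mentioned in the text
    for _, payload in packets:
        for value in payload:
            counts[value] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 256


def packet_lengths(packets: Sequence[Packet], first_n: int = 5) -> List[int]:
    """Payload length in bytes of each of the first `first_n` packets."""
    return [len(payload) for _, payload in packets[:first_n]]


def inter_arrival_times(packets: Sequence[Packet], first_n: int = 5) -> List[float]:
    """Time gaps between consecutive packets among the first `first_n` packets."""
    times = [t for t, _ in packets[:first_n]]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def header_prefix(packets: Sequence[Packet], n_bytes: int = 4) -> bytes:
    """First `n_bytes` of the first payload, used here as a stand-in for header bytes."""
    return packets[0][1][:n_bytes] if packets else b""
```

The handshake, stream header, and communication behavior features would need protocol parsing and per-host statistics and are left out of this sketch.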
And step S203, identifying the network characteristics based on the trained identification network, and determining a network protocol corresponding to the network traffic data to be identified.
The identification network is obtained by training based on sample background traffic data and sample network traffic data corresponding to different sample network protocols.
Specifically, the sample background traffic data and the sample network traffic data corresponding to different sample network protocols may be mixed according to an initial proportion, and an initial identification network corresponding to the initial proportion may be determined; and then adjusting the mixing proportion between the sample background traffic data and the sample network traffic data corresponding to different sample network protocols, and re-determining the updated initial identification network until the updated initial identification network meets the preset conditions, so as to obtain the trained identification network, wherein the process of specifically determining the identification network is explained in detail below.
In this embodiment, the sample background traffic data and the sample network traffic data corresponding to different sample network protocols are mixed in different proportions and used to train the identification network, so that the identification network can identify network traffic data corresponding to a variety of network protocols. The network traffic data to be identified is acquired, network features are extracted from it, and the trained identification network identifies those features, which can effectively improve the accuracy of network protocol identification.
The process of obtaining the identification network is described below in conjunction with specific embodiments.
A possible implementation manner is provided in the embodiment of the present application, as shown in fig. 3, the identification network is obtained by training in the following manner:
step S301, sample network traffic data corresponding to different sample network protocols are obtained, and sample background traffic data is obtained.
Specifically, the sample network traffic data corresponding to different sample network protocols may be obtained by triggering different application programs.
Different sample network protocols may include websocket, QUIC, http2, https, sftp, ssh, and the like, among others.
WebSocket is a protocol for full-duplex communication over a single TCP connection. WebSocket simplifies data exchange between the client and the server, and allows the server to actively push data to the client. With the WebSocket API, the browser and the server only need to complete one handshake, after which a persistent connection is established between them and data can be transmitted in both directions. QUIC (Quick UDP Internet Connections) is a UDP-based, low-latency Internet transport layer protocol developed by Google; QUIC addresses various requirements faced by the current transport layer and application layer, including handling more connections, security, and low latency. http (Hypertext Transfer Protocol) is a simple request-response protocol that typically runs on top of TCP. It specifies what messages the client may send to the server and what responses it may receive. http2 is Hypertext Transfer Protocol 2.0, the next-generation http protocol. https (Hyper Text Transfer Protocol over Secure Socket Layer) is an http channel designed for security: on the basis of http, it secures the transmission process through transport encryption and identity authentication. https uses a default port different from that of http and adds an encryption/authentication layer between http and TCP. It provides authentication and encrypted communication, and is widely used for security-sensitive communications on the World Wide Web, such as transaction payments. sftp (Secure File Transfer Protocol) is a network transfer protocol that provides file access, transfer, and management functions over a data stream connection. ssh (Secure Shell) is a security protocol built on the application layer. ssh is a relatively reliable protocol dedicated to providing security for remote login sessions and other network services. Using the ssh protocol can effectively prevent information leakage during remote management.
In a specific implementation process, multiple application programs for generating sample network traffic data corresponding to different sample network protocols can be pre-installed, and the application programs are triggered respectively to obtain the sample network traffic data corresponding to the different sample network protocols.
Specifically, the sample background traffic data may be desensitized sampled data, for example, if the network traffic data to be identified is enterprise data, sample data of an enterprise may be extracted, and the sample data is desensitized to obtain the sample background traffic data.
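The text does not say how the desensitization is performed. As one possible illustration only, the sketch below replaces IPv4 addresses with stable pseudonyms derived from a keyed hash, so that flows from the same host remain grouped while the real address is hidden. The function name `pseudonymize_ip` and the choice of HMAC-SHA256 are assumptions made for this example.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize_ip(ip: str, key: bytes) -> str:
    """Map an IPv4 address to a stable pseudonymous IPv4 address using a keyed hash."""
    packed = ipaddress.IPv4Address(ip).packed          # 4 raw address bytes
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    # The first 4 digest bytes are reinterpreted as an IPv4 address.
    return str(ipaddress.IPv4Address(digest[:4]))
```

The same key must be used for the whole data set so that one original address always maps to the same pseudonym; payload contents would need separate treatment.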
Step S302, mixing the sample network traffic data corresponding to different network protocols with the sample background traffic data according to an initial proportion to generate initial sample data.
Specifically, the sample network traffic data and the sample background traffic data corresponding to different network protocols may be segmented according to an equal-ratio (geometric) segmentation method to generate an initial proportion, and then the sample network traffic data and the sample background traffic data corresponding to different network protocols are mixed according to the initial proportion.
In the equal-ratio (geometric) segmentation method, the proportions of the different kinds of traffic data in the mixed initial sample data form a geometric sequence; the specific manner of determining the initial proportion will be described in detail below.
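The mixing step can be sketched as follows. This is a hedged illustration under stated assumptions: flows are already grouped by label (protocol name or background), sampling is done with replacement so that a small pool can still fill its share, and the names `mix_samples`, `flows_by_label`, and `ratios` are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def mix_samples(
    flows_by_label: Dict[str, List[str]],
    ratios: Dict[str, float],
    total: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Draw about `total` labelled flows so that each label's share follows `ratios`."""
    rng = random.Random(seed)
    norm = sum(ratios.values())
    mixed: List[Tuple[str, str]] = []
    for label, share in ratios.items():
        count = round(total * share / norm)
        pool = flows_by_label[label]
        # Sampling with replacement: a flow may be drawn more than once.
        mixed.extend((rng.choice(pool), label) for _ in range(count))
    rng.shuffle(mixed)
    return mixed
```

Each element of the result keeps its label, which is what later allows a loss value to be computed against the known sample network protocol.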
Step S303, an initial identification network corresponding to the initial sample data is determined.
Specifically, a TCP (Transmission Control Protocol) session process of a network protocol is a Markov process, that is, the initial identification network may be modeled as a Markov chain.
Specifically, a sample network feature corresponding to the initial sample data may be extracted first, and a state transition matrix is determined according to the sample network feature, so as to determine the initial identification network, where a specific process of determining the initial identification network will be described in detail below.
In step S304, if the initial identification network meets the preset condition, the initial identification network is set as the identification network.
Specifically, the preset condition may be that the loss value of the initial identification network is smaller than a first preset threshold, or that the cross validation error of the initial identification network is smaller than a second preset threshold while the loss value of the initial identification network is smaller than the first preset threshold.
Specifically, the initial sample data may be input into the initial identification network to obtain a current identification protocol, a loss value of the initial identification network is determined based on the current identification protocol and the sample network protocol, and a specific process of determining whether the initial identification network meets a preset condition will be described in detail below.
A possible implementation manner is provided in the embodiment of the present application, and the obtaining of the sample network traffic data corresponding to different sample network protocols in step S301 may include:
(1) acquiring at least one application program; the at least one application program is respectively used for generating sample flow data corresponding to different sample network protocols;
(2) respectively acquiring a triggering mode of at least one application program;
(3) and respectively triggering the corresponding application programs based on the acquired triggering modes, so that at least one application program respectively generates sample flow data corresponding to different sample network protocols.
Specifically, the application programs and trigger modes corresponding to the different sample network protocols (that is, the encryption protocols) may be configured first. For example, the sample network protocols may include websocket, QUIC, http2, https, sftp, ssh, etc.; the trigger modes may include bash scripts, Python scripts, and the like that perform data interaction.
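A configuration of applications and trigger modes might look like the sketch below. The trigger table and the function `trigger_all` are hypothetical; the placeholder commands only print a message, whereas real entries would launch the bash or Python scripts that make each application exchange data over its protocol.

```python
import subprocess
import sys
from typing import Dict, List

# Hypothetical trigger table: protocol name -> command line that starts the
# traffic-generating application. The commands here are harmless placeholders.
TRIGGERS: Dict[str, List[str]] = {
    "https": [sys.executable, "-c", "print('trigger https client')"],
    "ssh": [sys.executable, "-c", "print('trigger ssh client')"],
}


def trigger_all(triggers: Dict[str, List[str]]) -> Dict[str, int]:
    """Run each trigger command in turn and return its exit code per protocol."""
    results: Dict[str, int] = {}
    for protocol, command in triggers.items():
        completed = subprocess.run(command, capture_output=True, text=True, timeout=60)
        results[protocol] = completed.returncode
    return results
```

Capturing the traffic produced while each trigger runs is outside the scope of this sketch.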
A possible implementation manner is provided in the embodiment of the present application, and the determining the initial identification network corresponding to the initial sample data in step S303 may include:
(1) extracting sample network characteristics of initial sample data;
(2) determining a state transition matrix based on the sample network characteristics;
(3) An initial identification network is determined based on the state transition matrix.
The initial identification network may be a Markov chain.
A Markov chain is a set of discrete random variables having the Markov property. Specifically, consider a set of random variables $X = \{X_n : n > 0\}$ in the probability space $(\Omega, \mathcal{F}, P)$, indexed by a one-dimensional countable index set. If the values of the random variables all lie in a countable set, $X_n = s_i,\ s_i \in s$, and the conditional probability of the random variables satisfies the following relation:
$$p(X_{t+1} \mid X_t, \ldots, X_1) = p(X_{t+1} \mid X_t) \qquad (1)$$
then X is called a Markov chain, the countable set $s \subseteq \mathbb{Z}$ is called the state space, and the values of the Markov chain in the state space are called states. A Markov chain as defined here is a discrete-time Markov chain (DTMC); the case of a continuous index set, although referred to as a continuous-time Markov chain (CTMC), is essentially a Markov process.
Formula (1) defines the Markov property, also called "memorylessness", at the same time as it defines the Markov chain: once the random variable of step t is given, the random variable of step t+1 is conditionally independent of the remaining earlier random variables. On this basis, the Markov chain also has the strong Markov property, that is, for an arbitrary stopping time, the states of the Markov chain before and after the stopping time are conditionally independent given the state at the stopping time.
The TCP session process of the network protocol is a Markov process and can therefore be described by a finite state machine; the initial probability vector and the state transition probability matrix are calculated from the sample network features described above. Based on formula (1), the transition probability between any two states, i.e. the state transition matrix, can be obtained, and once the state transition matrix P is determined, the initial identification network is obtained.
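One common way to obtain an initial probability vector and a state transition matrix is to count transitions in observed state sequences and normalize each row. The sketch below assumes that every flow has already been encoded as a sequence of discrete states derived from its features; that encoding, the example state names, and the function `estimate_chain` are assumptions for illustration and are not specified in the text.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Transition = Dict[str, Dict[str, float]]


def estimate_chain(sequences: Sequence[List[str]]) -> Tuple[Dict[str, float], Transition]:
    """Estimate initial probabilities and a row-normalised transition matrix by counting."""
    init_counts: Dict[str, int] = defaultdict(int)
    trans_counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        if not seq:
            continue
        init_counts[seq[0]] += 1
        for current, following in zip(seq, seq[1:]):
            trans_counts[current][following] += 1
    n_sequences = sum(init_counts.values())
    initial = {state: c / n_sequences for state, c in init_counts.items()}
    transition: Transition = {}
    for current, row in trans_counts.items():
        row_total = sum(row.values())
        transition[current] = {nxt: c / row_total for nxt, c in row.items()}
    return initial, transition
```

For example, `estimate_chain([["SYN", "ACK", "DATA"], ["SYN", "ACK", "ACK"]])` yields a chain in which `SYN` is always followed by `ACK`, and `ACK` is followed by `DATA` or `ACK` with equal probability.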
The process of determining whether the initial identification network meets the preset condition will be further described below with reference to the embodiments.
In this embodiment, a possible implementation manner is provided, where if the initial identification network in step S304 meets the preset condition, before setting the initial identification network as the identification network, the method may further include:
(1) Inputting the initial sample data into the initial identification network to obtain the current identification protocol.
The current identification protocol is the network protocol that the initial identification network outputs, in real time, for the initial sample data.
(2) Determining a loss value of the initial identification network based on the current identification protocol and the sample network protocol.
Specifically, the sample network protocol is the known network protocol of the initial sample data, and the loss value of the initial identification network can be calculated with a loss function from the current identification protocol, obtained by identifying the initial sample data in real time, and the known sample network protocol.
(3) If the loss value is smaller than the first preset threshold value, judging that the initial identification network meets the preset condition.
In one embodiment, if the loss value is smaller than a first preset threshold, the initial recognition network may be determined as the trained recognition network.
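The text does not name a particular loss function. As a simple stand-in, the sketch below uses the misclassification rate, that is, the fraction of samples whose current identification protocol differs from the known sample network protocol; the function name `misclassification_loss` is hypothetical.

```python
from typing import Sequence


def misclassification_loss(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of samples whose predicted protocol differs from the known protocol."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("predicted and expected must be non-empty and of equal length")
    wrong = sum(p != e for p, e in zip(predicted, expected))
    return wrong / len(expected)
```

With this choice, a first preset threshold of 0.05 would accept a network that mislabels fewer than 5% of the initial sample data; the value 0.05 is only an example.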
As shown in fig. 4, inputting initial sample data into the initial identification network to obtain a current identification protocol; determining a loss value of the initial identification network based on the current identification protocol and the sample network protocol; judging whether the loss value is smaller than a first preset threshold value, if so, setting the initial identification network as an identification network; and if not, updating the initial proportion, updating the initial sample data and the initial identification network, and repeatedly acquiring the loss value until the loss value is smaller than the first preset threshold value.
In a specific implementation process, if the loss value is smaller than the first preset threshold, determining that the initial identification network meets the preset condition may include:
a. if the loss value is smaller than a first preset threshold value, performing cross validation on the initial identification network to obtain a validation error;
b. and if the cross validation error is smaller than a second preset threshold value, judging that the initial identification network meets the preset condition.
In another embodiment, if the loss value is smaller than the first preset threshold, the initial identification network needs to be further cross-verified, and if the cross-verification error is smaller than the second preset threshold, it is determined that the initial identification network meets the preset condition.
As shown in fig. 5, inputting initial sample data into the initial identification network to obtain a current identification protocol; determining a loss value of the initial identification network based on the current identification protocol and the sample network protocol; judging whether the loss value is smaller than a first preset threshold value, if so, performing cross validation on the initial identification network, and acquiring a cross validation error; judging whether the cross validation error is smaller than a second preset threshold value or not; if so, setting the initial identification network as an identification network; and if not, updating the initial proportion, updating the initial sample data and the initial identification network, and repeatedly acquiring the loss value until the loss value is smaller than a first preset threshold value and the cross validation error is smaller than a second preset threshold value.
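The flow of fig. 5 can be sketched as a loop over candidate proportions. This is an illustration under assumptions: the four callables are supplied by the caller and are hypothetical, and the `max_rounds` limit is added only so that the sketch always terminates, whereas the text repeats until both conditions hold.

```python
from typing import Callable, Optional, Tuple, TypeVar

Net = TypeVar("Net")
Ratio = Tuple[float, ...]


def train_until_accepted(
    initial_ratio: Ratio,
    build_network: Callable[[Ratio], Net],   # mixes samples at this ratio and fits a network
    loss_of: Callable[[Net], float],
    cv_error_of: Callable[[Net], float],
    next_ratio: Callable[[Ratio], Ratio],    # rule for adjusting the proportion
    loss_threshold: float,
    cv_threshold: float,
    max_rounds: int = 50,                    # assumption: safety limit, not in the text
) -> Optional[Tuple[Net, Ratio]]:
    """Return the first network whose loss and cross-validation error are both below threshold."""
    ratio = initial_ratio
    for _ in range(max_rounds):
        network = build_network(ratio)
        # Cross validation is only evaluated once the loss check has passed.
        if loss_of(network) < loss_threshold and cv_error_of(network) < cv_threshold:
            return network, ratio
        ratio = next_ratio(ratio)
    return None
```

Passing a `cv_error_of` that always returns 0 with any positive `cv_threshold` reduces the loop to the loss-only variant of fig. 4.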
In machine learning modeling, it is common practice to divide the data into a training set and a test set. The test set is independent of training: it takes no part in training and is used only to evaluate the final model. Overfitting often occurs during training, that is, the model matches the training data well but predicts data outside the training set poorly. If the test data were used to adjust the model parameters at this point, information from the test data would leak into training and compromise the accuracy of the final evaluation result. It is therefore common practice to set aside part of the training data as validation data to evaluate the training effect of the model.
The validation data is taken from the training data but does not participate in training, so it allows a relatively objective evaluation of how well the model fits data outside the training set. Evaluating the model on validation data in this way is often referred to as cross validation, also known as rotation validation. The method divides the original data into K groups (K-fold); each subset is used once as the validation set while the remaining K-1 subsets are used as the training set, which yields K models. Each of the K models is evaluated on its own validation set, and the resulting errors, for example MSE (Mean Squared Error), are averaged to obtain the cross-validation error. Cross validation makes effective use of limited data, its evaluation result is as close as possible to the performance of the model on the test set, and it can be used as an index for model optimization.
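The K-fold procedure described above can be sketched in plain Python as follows. The sketch assumes numeric labels so that a mean squared error can be computed, splits the data into contiguous folds without shuffling, and takes the `fit` and `predict` callables from the caller; all names are hypothetical.

```python
from typing import Any, Callable, List, Sequence


def k_fold_indices(n_samples: int, k: int) -> List[List[int]]:
    """Split indices 0..n_samples-1 into k contiguous folds whose sizes differ by at most one."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validation_error(
    samples: Sequence[Any],
    labels: Sequence[float],
    k: int,
    fit: Callable[[List[Any], List[float]], Any],
    predict: Callable[[Any, Any], float],
) -> float:
    """Average of the per-fold mean squared errors over k train/validation splits."""
    fold_errors = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train_x = [samples[i] for i in range(len(samples)) if i not in held]
        train_y = [labels[i] for i in range(len(samples)) if i not in held]
        model = fit(train_x, train_y)  # one model per fold, K models in total
        squared = [(predict(model, samples[i]) - labels[i]) ** 2 for i in held_out]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / len(fold_errors)
```

In practice the data would usually be shuffled before splitting so that each fold contains a mix of protocols.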
The embodiment of the application provides a possible implementation manner, if the loss value is greater than or equal to a first preset threshold, the initial proportion is adjusted to obtain an updated initial proportion, updated sample network data and an updated initial identification network are generated based on the updated initial proportion, and the loss value corresponding to the updated initial identification network is determined until the loss value corresponding to the updated initial identification network is less than the first preset threshold.
Specifically, if the loss value is greater than or equal to a first preset threshold, adjusting the initial proportion to obtain an updated initial proportion, mixing sample network traffic data and sample background traffic data corresponding to different sample network protocols based on the updated initial proportion to obtain updated initial sample data, determining an updated initial identification network based on the updated initial sample data, and determining a loss value corresponding to the updated initial identification network, and if the loss value corresponding to the updated initial identification network is less than the first preset threshold, setting the updated initial identification network as the identification network; if the loss value corresponding to the updated initial identification network is greater than or equal to the first preset threshold value, the initial proportion is repeatedly adjusted to obtain the loss value of the updated initial identification network until the loss value corresponding to the updated initial identification network is smaller than the first preset threshold value.
In this embodiment, the sample network traffic data and the sample background traffic data corresponding to different network protocols are mixed at an initial proportion obtained by the equal-ratio (geometric) segmentation method, and the initial proportion is adjusted until the finally determined identification network meets the preset condition, so that the identification network obtained through training can effectively identify the network data to be identified corresponding to different network protocols, and the accuracy of network protocol identification is improved.
The above embodiment describes a process of determining an identification network based on sample network traffic data and sample background traffic data corresponding to different sample network protocols, and the process of determining an initial proportion is further described below with reference to the embodiment.
A possible implementation manner is provided in this embodiment of the present application, before mixing the sample network traffic data corresponding to different network protocols with the sample background traffic data according to the initial proportion in step S302, the method may further include:
distributing the sample network traffic data and the sample background traffic data corresponding to different network protocols according to equal-ratio (geometric) segmentation to obtain the initial proportion.
The proportions of the sample network traffic data corresponding to each network protocol and the proportion of the sample background traffic data form a geometric sequence, and the common ratio of the geometric sequence follows a normal distribution.
Taking fig. 6 as an example, let P denote the proportions of the different kinds of traffic data, where P1 is websocket, P2 is QUIC, P3 is http2, P4 is https, P5 is sftp, P6 is ssh, and P7 is background traffic; the proportions P1 to P7 form a geometric sequence.
Specifically, the equal-ratio (geometric) segmentation method conforms to the following formulas:
$$\sum_{n=1}^{m} P_n \approx 1 \qquad (2)$$
where $P_n$ represents the proportion of the n-th protocol or of the background traffic, n is the sequence number of the protocol whose proportion is given, and m is the maximum value of n. For example, $P_1$ is websocket, $P_2$ is QUIC, $P_3$ is http2, $P_4$ is https, $P_5$ is sftp, $P_6$ is ssh, and $P_7$ is background traffic, so m = 7. Formula (2) states that the proportions of the traffic of all protocols sum to approximately 1.
$$P_n = x P_{n+1} \qquad (3)$$
where $P_n$ is defined as above and x is a fixed value. Formula (3) states that the proportions of successive protocols are in a constant ratio to each other.
$$x \sim \mathcal{N}(\mu, \sigma^2) \qquad (4)$$
where x follows a standard normal distribution with μ = 0 and σ = 1.
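Given a ratio x and the number of categories m, the two conditions stated above (each proportion equals x times the next one, and all proportions sum to about 1) determine the proportions. The sketch below computes them; it additionally assumes x > 0, because a non-positive draw of x would not yield valid proportions and the text does not say how such a draw is handled. The function name `geometric_shares` is hypothetical.

```python
from typing import List


def geometric_shares(x: float, m: int) -> List[float]:
    """Return P_1..P_m with P_n = x * P_(n+1) and sum(P) == 1 (up to rounding)."""
    if x <= 0 or m < 1:
        raise ValueError("this sketch requires x > 0 and m >= 1")
    # Unnormalised weights 1, 1/x, 1/x^2, ... so that each weight is x times the next.
    weights = [x ** -k for k in range(m)]
    total = sum(weights)
    return [w / total for w in weights]
```

For x = 2 and m = 3 the proportions are 4/7, 2/7 and 1/7; for x = 1 all m categories receive the same share.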
In order to better understand the above network protocol identification method, an example of obtaining the identification network of the present invention is set forth in detail below with reference to fig. 7:
in one example, the process of obtaining the identification network provided by the present application may include the following steps:
1) acquiring various application software; the multiple kinds of application software are respectively used for generating different encrypted flow data, namely for generating sample network flow data corresponding to different sample network protocols;
2) acquiring mixed background flow; namely obtaining background flow data of a sample;
3) generating corresponding flow data packets according to the proportion, and segmenting the flows of different protocols by using an equal proportion segmentation method; distributing sample network traffic data and sample background traffic data corresponding to different network protocols according to equal-proportion segmentation to obtain an initial proportion; mixing sample network traffic data corresponding to different network protocols with sample background traffic data according to an initial proportion to generate initial sample data;
4) Training, learning and verifying the samples so as to adjust the proportion of different protocol flows; determining an initial identification network corresponding to the initial sample data, and determining a loss value corresponding to the initial identification network; if the loss value is greater than or equal to a first preset threshold value, adjusting the initial proportion to obtain an updated initial proportion, mixing sample network traffic data and sample background traffic data corresponding to different sample network protocols based on the updated initial proportion to obtain updated initial sample data, determining an updated initial identification network based on the updated initial sample data, and determining a loss value corresponding to the updated initial identification network until the loss value corresponding to the updated initial identification network is less than the first preset threshold value; verifying whether the cross verification error of the updated initial identification network is smaller than a second preset threshold value;
5) generating a model; and if the loss value corresponding to the updated initial identification network is smaller than a first preset threshold value and the cross validation error of the updated initial identification network is smaller than a second preset threshold value, setting the updated initial identification network as the trained identification network.
In order to better understand the above network protocol identification method, as shown in fig. 8, an example of the network protocol identification method of the present invention is set forth in detail as follows:
in one example, the network protocol identification method provided by the present application may include the following steps:
1) acquiring sample network flow data corresponding to a plurality of different sample network protocols; sample network protocols may include websocket, QUIC, https, sftp, and the like;
2) obtaining mixed background flow; namely obtaining background flow data of a sample;
3) mixing sample network traffic data and sample background traffic data corresponding to different sample network protocols to generate initial sample data;
4) extracting sample network characteristics of initial sample data;
5) determining an initial identification network (i.e. a Markov chain) based on the characteristics of the sample network, adjusting the mixing proportion of sample network traffic data and sample background traffic data corresponding to different sample network protocols, and updating the initial identification network until the loss value of the updated initial identification network is less than a first preset threshold value;
6) performing cross validation on the updated initial identification network, wherein the cross validation can be K-fold cross validation, and if the error of the cross validation is smaller than a second preset threshold, setting the updated initial identification network as the identification network;
7) Acquiring encrypted traffic to be identified, namely network traffic data to be identified;
8) and classifying the network flow data to be identified based on the identification network, and determining a corresponding network protocol.
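The text does not spell out how a Markov chain assigns a protocol to a flow. One plausible reading, shown here purely as an assumption, is to keep one chain per protocol and choose the protocol whose chain gives the observed state sequence the highest log-likelihood. The functions `log_likelihood` and `classify`, the state names, and the probability floor for unseen transitions are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Chain = Tuple[Dict[str, float], Dict[str, Dict[str, float]]]  # (initial, transition)


def log_likelihood(seq: List[str], initial: Dict[str, float],
                   transition: Dict[str, Dict[str, float]], floor: float = 1e-6) -> float:
    """Log-probability of a state sequence under one chain; unseen events get `floor`."""
    if not seq:
        return 0.0
    score = math.log(initial.get(seq[0], floor))
    for current, following in zip(seq, seq[1:]):
        score += math.log(transition.get(current, {}).get(following, floor))
    return score


def classify(seq: List[str], models: Dict[str, Chain]) -> str:
    """Return the protocol whose chain explains the sequence best."""
    return max(models, key=lambda name: log_likelihood(seq, *models[name]))
```

A flow whose best score is still very low could be reported as background traffic; that rule is not part of this sketch.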
According to the network protocol identification method, the sample background traffic data and the sample network traffic data corresponding to different sample network protocols are mixed in different proportions and used to train the identification network, so that the identification network can identify network traffic data corresponding to a variety of network protocols. The network traffic data to be identified is acquired, network features are extracted from it, and the trained identification network identifies those features, which can effectively improve the accuracy of network protocol identification.
Furthermore, the sample network traffic data and the sample background traffic data corresponding to different network protocols are mixed at an initial proportion obtained by the equal-ratio (geometric) segmentation method, and the initial proportion is adjusted until the finally determined identification network meets the preset condition, so that the identification network obtained through training can effectively identify the network data to be identified corresponding to different network protocols, and the accuracy of network protocol identification is improved.
A possible implementation manner is provided in the embodiment of the present application, as shown in fig. 9, a network protocol identification apparatus 90 is provided, where the network protocol identification apparatus 90 may include: an obtaining module 901, an extracting module 902 and an identification module 903, wherein,
An obtaining module 901, configured to obtain network traffic data to be identified;
an extracting module 902, configured to extract network features from the network traffic data;
the identification module 903 is configured to identify network features based on the trained identification network, and determine a network protocol corresponding to network traffic data to be identified;
the identification network is obtained by training based on sample background traffic data and sample network traffic data corresponding to different sample network protocols.
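The three modules can be pictured as three pluggable stages run in order. The sketch below is only a structural illustration: the class name `ProtocolIdentifier` and its fields are hypothetical, and each stage is passed in as a plain callable rather than being a concrete implementation of the apparatus.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ProtocolIdentifier:
    acquire: Callable[[], Any]        # role of the obtaining module 901
    extract: Callable[[Any], Any]     # role of the extracting module 902
    identify: Callable[[Any], str]    # role of the identification module 903

    def run(self) -> str:
        """Acquire traffic, extract its features, and return the identified protocol."""
        traffic = self.acquire()
        features = self.extract(traffic)
        return self.identify(features)
```

Keeping the stages separate makes it possible to replace the trained identification network without touching traffic acquisition or feature extraction.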
The embodiment of the present application provides a possible implementation manner, further including a training module, configured to:
acquiring sample network flow data corresponding to different sample network protocols, and acquiring sample background flow data;
mixing sample network traffic data corresponding to different network protocols with sample background traffic data according to an initial proportion to generate initial sample data;
determining an initial identification network corresponding to initial sample data;
and if the initial identification network meets the preset conditions, setting the initial identification network as the identification network.
In the embodiment of the present application, a possible implementation manner is provided, and when the training module obtains sample network traffic data corresponding to different sample network protocols, the training module is specifically configured to:
Acquiring at least one application program; the at least one application program is respectively used for generating sample flow data corresponding to different sample network protocols;
respectively acquiring a triggering mode of at least one application program;
and respectively triggering the corresponding application programs based on the acquired triggering modes, so that at least one application program respectively generates sample flow data corresponding to different sample network protocols.
In the embodiment of the present application, a possible implementation manner is provided, and when determining an initial identification network corresponding to initial sample data, the training module is specifically configured to:
extracting sample network characteristics of initial sample data;
determining a state transition matrix based on the sample network characteristics;
an initial identification network is determined based on the state transition matrix.
The embodiment of the present application provides a possible implementation manner, further including a determining module, configured to:
inputting initial sample data into an initial identification network to obtain a current identification protocol;
determining a loss value of the initial identification network based on the current identification protocol and the sample network protocol;
and if the loss value is smaller than the first preset threshold value, judging that the initial identification network meets the preset condition.
The embodiment of the present application provides a possible implementation manner, and when the determining module determines that the initial identification network meets the preset condition if the loss value is smaller than the first preset threshold, the determining module is specifically configured to:
if the loss value is smaller than the first preset threshold value, performing cross validation on the initial identification network to obtain a cross validation error;
and if the cross validation error is smaller than a second preset threshold value, judging that the initial identification network meets the preset condition.
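The two-stage check described above (loss value first, cross validation second) might look like the following sketch. The application does not define the loss function, so a plain misclassification rate is assumed here; `loss_value`, `meets_preset_condition`, and the threshold values are hypothetical.

```python
from typing import Callable, Sequence


def loss_value(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Assumed loss: fraction of samples whose identified protocol is wrong."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(p != e for p, e in zip(predicted, expected))
    return wrong / len(expected)


def meets_preset_condition(
    loss: float,
    cross_validate: Callable[[], float],
    first_threshold: float,
    second_threshold: float,
) -> bool:
    """Run cross validation only when the loss already passes its threshold."""
    if loss >= first_threshold:
        return False
    return cross_validate() < second_threshold


loss = loss_value(["http", "dns", "tls", "dns"], ["http", "dns", "tls", "http"])
# The lambda stands in for a real cross-validation run returning its error.
ok = meets_preset_condition(loss, lambda: 0.05, first_threshold=0.3, second_threshold=0.1)
```

Passing cross validation as a callable keeps the more expensive step from running when the loss check has already failed.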
The embodiment of the present application provides a possible implementation manner, further including an update module, configured to:
if the loss value is greater than or equal to the first preset threshold value, the initial proportion is adjusted to obtain an updated initial proportion, updated sample network data and an updated initial identification network are generated based on the updated initial proportion, and the loss value corresponding to the updated initial identification network is determined until the loss value corresponding to the updated initial identification network is smaller than the first preset threshold value.
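The update loop above can be sketched as below. The callables `train_and_score` (regenerate the mixed sample data at a given proportion, retrain, and return the loss value) and `adjust` (produce the updated proportion) are placeholders, since the application does not specify how the proportion is adjusted; the `max_rounds` guard is an addition for safety and is not part of the described method.

```python
from typing import Callable, List, Tuple

Proportion = List[float]


def tune_proportion(
    initial: Proportion,
    train_and_score: Callable[[Proportion], float],
    adjust: Callable[[Proportion], Proportion],
    first_threshold: float,
    max_rounds: int = 50,
) -> Tuple[Proportion, float]:
    """Adjust the proportion and retrain until the loss drops below the threshold."""
    proportion = list(initial)
    for _ in range(max_rounds):
        loss = train_and_score(proportion)
        if loss < first_threshold:
            return proportion, loss
        proportion = adjust(proportion)
    raise RuntimeError("loss did not fall below the threshold within max_rounds")


# Toy stand-ins: the loss is smallest when protocol traffic makes up half the mix.
best, final_loss = tune_proportion(
    initial=[0.2, 0.8],
    train_and_score=lambda p: abs(p[0] - 0.5),
    adjust=lambda p: [p[0] + 0.1, p[1] - 0.1],
    first_threshold=0.05,
)
```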
The embodiment of the present application provides a possible implementation manner, further including an allocation module, configured to:
allocating the sample network traffic data corresponding to different network protocols and the sample background traffic data by equal-ratio segmentation to obtain the initial proportion;
wherein the proportion of the sample network traffic data corresponding to each network protocol and the proportion of the sample background traffic data form a geometric (equal-ratio) sequence, and the common ratio of the sequence conforms to a normal distribution.
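One way to read the allocation above is that the shares of the mixed data set are proportional to 1, r, r^2, ..., with the common ratio r drawn from a normal distribution. The sketch below follows that reading; the mean and standard deviation of the distribution, the redrawing of non-positive ratios, and the helper names are assumptions, not details given in the application.

```python
import random
from typing import List


def geometric_shares(n_parts: int, ratio: float) -> List[float]:
    """Shares proportional to 1, r, r^2, ..., normalized so they sum to 1."""
    if n_parts < 1 or ratio <= 0:
        raise ValueError("n_parts must be >= 1 and ratio must be positive")
    weights = [ratio ** k for k in range(n_parts)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_ratio(rng: random.Random, mean: float = 1.0, std: float = 0.2) -> float:
    """Draw a common ratio from a normal distribution, redrawing non-positive values."""
    while True:
        r = rng.gauss(mean, std)
        if r > 0:
            return r


# Three sample protocols plus the background traffic give four parts.
shares = geometric_shares(4, 2.0)
random_shares = geometric_shares(4, sample_ratio(random.Random(0)))
```

With a common ratio of 2 the four shares are 1/15, 2/15, 4/15, and 8/15 of the mixed data.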
A possible implementation manner is provided in the embodiment of the present application, where the network characteristic includes at least one of handshake protocol information, byte distribution information, packet length information, time sequence information, protocol header information, stream header characteristic, and communication behavior characteristic.
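Two of the listed characteristics, byte distribution information and packet length information, are simple enough to sketch directly. The feature definitions below (relative byte frequencies, and minimum, maximum, and mean payload length) are common choices assumed for illustration; the application does not fix the exact form of each feature.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Sequence


def byte_distribution(payloads: Sequence[bytes]) -> List[float]:
    """Relative frequency of each byte value 0..255 across all payloads."""
    counts: Counter = Counter()
    for payload in payloads:
        counts.update(payload)  # iterating bytes yields integers 0..255
    total = sum(counts.values())
    return [counts[b] / total if total else 0.0 for b in range(256)]


def packet_length_info(payloads: Sequence[bytes]) -> Dict[str, float]:
    """Minimum, maximum, and mean payload length."""
    lengths = [len(p) for p in payloads]
    if not lengths:
        raise ValueError("at least one payload is required")
    return {"min": min(lengths), "max": max(lengths), "mean": mean(lengths)}


payloads = [b"GET ", b"GG"]
dist = byte_distribution(payloads)
info = packet_length_info(payloads)
```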
The network protocol identification device trains and generates the identification network by mixing the sample background traffic data with the sample network traffic data corresponding to different sample network protocols in different proportions, so that the identification network can identify network traffic data corresponding to a variety of network protocols. The device acquires the network traffic data to be identified, extracts network features from it, and identifies the network features using the trained identification network, which can effectively improve the accuracy of network protocol identification.
Further, the sample network traffic data corresponding to different network protocols and the sample background traffic data are mixed at an initial proportion obtained by equal-ratio segmentation, and the initial proportion is adjusted until the finally determined identification network meets the preset condition. The identification network obtained through training can therefore effectively identify the to-be-identified network data corresponding to different network protocols, which improves the accuracy of network protocol identification.
The network protocol identification device according to the embodiments of the present disclosure may execute the network protocol identification method according to the embodiments of the present disclosure, and the implementation principles are similar. The actions performed by each module of the device correspond to the steps of the method; for a detailed functional description of each module, reference may be made to the description of the corresponding method above, and details are not repeated here.
Based on the same principle as the method shown in the embodiments of the present disclosure, embodiments of the present disclosure also provide an electronic device, which may include but is not limited to a processor and a memory. The memory is configured to store computer operation instructions, and the processor is configured to execute the network protocol identification method shown in the embodiments by invoking the computer operation instructions. Compared with the prior art, the network protocol identification method can effectively improve the identification accuracy of the network protocol.
In an alternative embodiment, an electronic device is provided, as shown in fig. 10, the electronic device 4000 shown in fig. 10 comprising: a processor 4001 and a memory 4003. Processor 4001 is coupled to memory 4003, such as via bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004. In addition, the transceiver 4004 is not limited to one in practical applications, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 4001 may also be a combination that performs a computing function, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 10, but this is not intended to represent only one bus or type of bus.
The memory 4003 may be a ROM (Read Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
The memory 4003 is used for storing application codes for executing the scheme of the present application, and the execution is controlled by the processor 4001. Processor 4001 is configured to execute application code stored in memory 4003 to implement what is shown in the foregoing method embodiments.
Wherein, the electronic device includes but is not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
The present application provides a computer-readable storage medium, on which a computer program is stored, which, when running on a computer, enables the computer to execute the corresponding content in the foregoing method embodiments. Compared with the prior art, the network protocol identification method can effectively improve the identification accuracy of the network protocol.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
It should be noted that the computer readable medium of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may be separate and not incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the method shown in the above embodiments.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the following steps:
acquiring network flow data to be identified;
extracting network features from the network traffic data;
identifying the network characteristics based on the trained identification network, and determining a network protocol corresponding to the network traffic data to be identified;
the identification network is obtained by training based on sample background traffic data and sample network traffic data corresponding to different sample network protocols.
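The steps above can be sketched as a small pipeline: acquire traffic, extract features, and hand the features to a trained identification network. The nearest-centroid classifier used here is only a stand-in chosen to keep the example self-contained; it is not the model described in the application, and the feature vector, centroid values, and function names are assumptions.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Features = List[float]


def length_features(traffic: Sequence[bytes]) -> Features:
    """Assumed feature vector: mean and maximum packet length (traffic must be non-empty)."""
    lengths = [len(p) for p in traffic]
    return [float(mean(lengths)), float(max(lengths))]


def nearest_centroid(centroids: Dict[str, Features]) -> Callable[[Features], str]:
    """Stand-in for a trained identification network (not the patent's model)."""
    def predict(features: Features) -> str:
        def dist(name: str) -> float:
            return sum((a - b) ** 2 for a, b in zip(features, centroids[name]))
        return min(centroids, key=dist)
    return predict


def identify_protocol(
    traffic: Sequence[bytes],
    extract: Callable[[Sequence[bytes]], Features],
    network: Callable[[Features], str],
) -> str:
    """Extract features from the traffic and let the network label them."""
    return network(extract(traffic))


# Hypothetical centroids that a training run might have produced.
network = nearest_centroid({"dns": [60.0, 80.0], "http": [700.0, 1500.0]})
label = identify_protocol([b"x" * 50, b"y" * 70], length_features, network)
```

Keeping the extractor and the network as separate callables mirrors the split between the extraction and identification steps, so either can be replaced without touching the other.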
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or any combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module does not, in some cases, constitute a limitation on the module itself; for example, an identification module may also be described as a "module for identifying a network protocol".
The foregoing description is only an illustration of the preferred embodiments of the disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with features having similar functions disclosed in (but not limited to) this disclosure.

Claims (12)

1. A network protocol identification method, comprising:
acquiring network flow data to be identified;
extracting network features from the network traffic data;
identifying the network characteristics based on the trained identification network, and determining a network protocol corresponding to the network traffic data to be identified;
wherein the identification network is obtained by training based on sample background traffic data and sample network traffic data corresponding to different sample network protocols.
2. The network protocol identification method of claim 1, wherein the identification network is trained by:
acquiring sample network flow data corresponding to different sample network protocols, and acquiring sample background flow data;
mixing sample network traffic data corresponding to different network protocols with the sample background traffic data according to an initial proportion to generate initial sample data;
determining an initial identification network corresponding to the initial sample data;
and if the initial identification network meets the preset condition, setting the initial identification network as the identification network.
3. The method according to claim 2, wherein the obtaining of the sample network traffic data corresponding to different sample network protocols comprises:
acquiring at least one application program; the at least one application program is respectively used for generating sample flow data corresponding to different sample network protocols;
respectively acquiring a triggering mode of at least one application program;
and respectively triggering the corresponding application programs based on the acquired triggering modes, so that the at least one application program respectively generates sample flow data corresponding to different sample network protocols.
4. The method according to claim 2, wherein said determining an initial identification network corresponding to the initial sample data comprises:
extracting sample network characteristics of the initial sample data;
determining a state transition matrix based on the sample network characteristics;
determining the initial identification network based on the state transition matrix.
5. The method of claim 2, wherein before setting the initial identification network as the identification network if the initial identification network meets a preset condition, the method further comprises:
inputting the initial sample data into the initial identification network to obtain a current identification protocol;
determining a loss value for the initial identification network based on the current identification protocol and the sample network protocol;
and if the loss value is smaller than a first preset threshold value, judging that the initial identification network meets the preset condition.
6. The method according to claim 5, wherein the determining that the initial identification network meets the preset condition if the loss value is smaller than a first preset threshold value comprises:
if the loss value is smaller than the first preset threshold value, performing cross validation on the initial identification network to obtain a cross validation error;
and if the cross validation error is smaller than a second preset threshold value, judging that the initial identification network meets the preset condition.
7. The network protocol identification method of claim 6, further comprising:
if the loss value is greater than or equal to the first preset threshold value, the initial proportion is adjusted to obtain an updated initial proportion, updated sample network data and an updated initial identification network are generated based on the updated initial proportion, and the loss value corresponding to the updated initial identification network is determined until the loss value corresponding to the updated initial identification network is smaller than the first preset threshold value.
8. The method according to claim 2, wherein before mixing the sample network traffic data corresponding to different network protocols with the sample background traffic data according to the initial ratio, the method further comprises:
allocating the sample network traffic data corresponding to different network protocols and the sample background traffic data by equal-ratio segmentation to obtain the initial proportion;
wherein the proportion of the sample network traffic data corresponding to each network protocol and the proportion of the sample background traffic data form a geometric (equal-ratio) sequence, and the common ratio of the sequence conforms to a normal distribution.
9. The network protocol identification method of claim 1, wherein the network characteristics comprise at least one of handshake protocol information, byte distribution information, packet length information, time series information, protocol header information, stream header characteristics, and communication behavior characteristics.
10. A network protocol identification device, comprising:
the acquisition module is used for acquiring network flow data to be identified;
the extraction module is used for extracting network characteristics from the network flow data;
the identification module is used for identifying the network characteristics based on the trained identification network and determining a network protocol corresponding to the network traffic data to be identified;
the identification network is obtained by training based on sample background traffic data and sample network traffic data corresponding to different sample network protocols.
11. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the network protocol identification method of any one of claims 1-9 when executing the program.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the network protocol identification method of any one of claims 1 to 9.
CN202110042481.2A 2021-01-13 2021-01-13 Network protocol identification method, device, electronic equipment and readable storage medium Active CN114765634B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110042481.2A CN114765634B (en) 2021-01-13 2021-01-13 Network protocol identification method, device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110042481.2A CN114765634B (en) 2021-01-13 2021-01-13 Network protocol identification method, device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN114765634A true CN114765634A (en) 2022-07-19
CN114765634B CN114765634B (en) 2023-12-12

Family

ID=82363196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110042481.2A Active CN114765634B (en) 2021-01-13 2021-01-13 Network protocol identification method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114765634B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140310517A1 (en) * 2013-04-15 2014-10-16 International Business Machines Corporation Identification and classification of web traffic inside encrypted network tunnels
CN104270392A (en) * 2014-10-24 2015-01-07 中国科学院信息工程研究所 Method and system for network protocol recognition based on tri-classifier cooperative training learning
US20170104725A1 (en) * 2015-10-07 2017-04-13 International Business Machines Corporation Anonymization of traffic patterns over communication networks
CN107682216A (en) * 2017-09-01 2018-02-09 南京南瑞集团公司 A kind of network traffics protocol recognition method based on deep learning
CN107809343A (en) * 2016-09-09 2018-03-16 中国人民解放军信息工程大学 A kind of network protocol identification method and device
CN108510194A (en) * 2018-03-30 2018-09-07 平安科技(深圳)有限公司 Air control model training method, Risk Identification Method, device, equipment and medium
WO2018178028A1 (en) * 2017-03-28 2018-10-04 British Telecommunications Public Limited Company Initialisation vector identification for encrypted malware traffic detection
CN110365639A (en) * 2019-05-29 2019-10-22 中国科学院信息工程研究所 A kind of malicious traffic stream detection method and system based on depth residual error network
WO2019223553A1 (en) * 2018-05-22 2019-11-28 华为技术有限公司 Network traffic identification method and related device
CN110555526A (en) * 2019-08-20 2019-12-10 北京迈格威科技有限公司 Neural network model training method, image recognition method and device
WO2020119662A1 (en) * 2018-12-14 2020-06-18 深圳先进技术研究院 Network traffic classification method
WO2020119481A1 (en) * 2018-12-11 2020-06-18 深圳先进技术研究院 Network traffic classification method and system based on deep learning, and electronic device
US20200219005A1 (en) * 2019-01-09 2020-07-09 International Business Machines Corporation Device discovery and classification from encrypted network traffic
CN111582378A (en) * 2020-05-09 2020-08-25 上海钧正网络科技有限公司 Training generation method, position detection method and device of positioning recognition model
CN111726264A (en) * 2020-06-18 2020-09-29 中国电子科技集团公司第三十六研究所 Network protocol variation detection method, device, electronic equipment and storage medium
CN112003870A (en) * 2020-08-28 2020-11-27 国家计算机网络与信息安全管理中心 Network encryption traffic identification method and device based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
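Feng Wenbo: "Application-layer protocol identification method based on convolutional neural networks", Journal of Computer Applications (《计算机应用》), pages 1 - 7 *
Liu Jiwei et al.: "Application of an incremental GHSOM algorithm in DDoS attack detection", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition) (《南京邮电大学学报(自然科学版)》), pages 1 - 6 *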

Also Published As

Publication number Publication date
CN114765634B (en) 2023-12-12

Similar Documents

Publication Publication Date Title
Xu et al. Am I eclipsed? A smart detector of eclipse attacks for Ethereum
CN113315742B (en) Attack behavior detection method and device and attack detection equipment
CN112242984B (en) Method, electronic device and computer program product for detecting abnormal network request
CN112351031B (en) Method and device for generating attack behavior portraits, electronic equipment and storage medium
CN110011932B (en) Network traffic classification method capable of identifying unknown traffic and terminal equipment
WO2019199769A1 (en) Cyber chaff using spatial voting
US10015192B1 (en) Sample selection for data analysis for use in malware detection
CN111371778B (en) Attack group identification method, device, computing equipment and medium
CN114338064A (en) Method, device, equipment and storage medium for identifying network traffic type
Kaur et al. A novel blockchain model for securing IoT based data transmission
Shen et al. An experiment study on federated learning testbed
Dhasade et al. TEE-based decentralized recommender systems: The raw data sharing redemption
US11557005B2 (en) Addressing propagation of inaccurate information in a social networking environment
CN106411923B (en) Network risk assessment method based on ontology modeling
CN112688897A (en) Traffic identification method and device, storage medium and electronic equipment
US20170279777A1 (en) File signature system and method
CN114765634B (en) Network protocol identification method, device, electronic equipment and readable storage medium
CN114866310A (en) Malicious encrypted flow detection method, terminal equipment and storage medium
CN113783920A (en) Method and apparatus for identifying web access portal
CN117424764B (en) System resource access request information processing method and device, electronic equipment and medium
Devi et al. Development of Advanced IoT Devices using ECC-LSTM for an Enhanced Device Security
CN111786937B (en) Method, apparatus, electronic device and readable medium for identifying malicious request
EP4145768A1 (en) Inline detection of encrypted malicious network sessions
US20230229786A1 (en) Systems and methods for federated model validation and data verification
CN113055334B (en) Method and device for supervising network behavior of terminal user

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant