CN113938436B - Method and device for identifying service type of data - Google Patents

Info

Publication number
CN113938436B
Authority
CN
China
Prior art keywords
tcp
intersection
data
identified
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111131055.2A
Other languages
Chinese (zh)
Other versions
CN113938436A (en
Inventor
李京辉
郭省力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN202111131055.2A priority Critical patent/CN113938436B/en
Publication of CN113938436A publication Critical patent/CN113938436A/en
Application granted granted Critical
Publication of CN113938436B publication Critical patent/CN113938436B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/24Traffic characterised by specific attributes, e.g. priority or QoS
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/19Flow control; Congestion control at layers above the network layer
    • H04L47/193Flow control; Congestion control at layers above the network layer at the transport layer, e.g. TCP related
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/02Arrangements for optimising operational condition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a method and a device for identifying a service type of data, relates to the technical field of communication, and solves the problem that the service type of encrypted data cannot be identified. The method comprises the following steps: obtaining m sample data messages; determining the length characteristics of m sample data messages; the length feature includes at least one first intersection; acquiring a data message to be identified, wherein the data message to be identified comprises at least one second TCP stream; determining at least one second intersection; the second intersection corresponds to a second TCP stream in the data message to be identified, and the second intersection is an intersection with the largest length sequence number in the corresponding second TCP stream and each first intersection respectively; determining effective TCP streams in a data message to be identified; if the number of the effective TCP streams in the data message to be identified is larger than or equal to the second ratio, determining that the service type of the data message to be identified is the service type of m sample data messages. The method and the device are used in the service data identification process.

Description

Method and device for identifying service type of data
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for identifying a service type of data.
Background
The service type of the service data transmitted in the operator network can provide basis for the evaluation optimization and the scene marketing analysis of the operator network capability. Therefore, the operator needs to analyze the transmitted service data to determine the service type of the service data.
Currently, when transmitting data in an operator network, in order to ensure the security of the data, encryption transmission is generally performed on the data. For example, the traffic data is encrypted by a secure transport layer protocol (transport layer security, TLS). In the prior art, when identifying the service type of the TLS encrypted data, the service type of the data is determined mainly by identifying the Server name field of the service data. However, the Server name field of the service data can only determine the website corresponding to the data, and it is difficult to further determine the service type of the service data.
Disclosure of Invention
The application provides a method and a device for identifying the service type of data, which can identify the service type of encrypted data.
In order to achieve the above purpose, the present application adopts the following technical scheme:
in a first aspect, the present application provides a method for identifying a service type of data, the method comprising:
Obtaining m sample data messages, wherein the m sample data messages are encrypted data messages of the same service type data; each of the m sample data messages includes at least one first transmission control protocol, TCP, flow, each first TCP flow including at least one length sequence; the first TCP stream comprises at least one encrypted data block, and the length sequence is the length value of the encrypted data block; m is a positive integer; determining the length characteristics of m sample data messages; the length features comprise at least one first intersection, wherein the first intersection is a set meeting a first preset condition in a set of length sequences shared among m first TCP streams; the m first TCP flows include one first TCP flow in each of the m sample data messages; acquiring a data message to be identified, wherein the data message to be identified comprises at least one second TCP stream; determining at least one second intersection; the second intersection corresponds to a second TCP stream in the data message to be identified, and the second intersection is an intersection with the largest length sequence number in the corresponding second TCP stream and each first intersection respectively; determining effective TCP streams in a data message to be identified; the ratio of the number of the length sequences in the second intersection corresponding to the effective TCP stream to the number of the length sequences in the effective TCP stream is larger than or equal to the first ratio; if the number of the effective TCP streams in the data message to be identified is larger than or equal to the second ratio, determining that the service type of the data message to be identified is the service type of m sample data messages.
Based on the technical scheme, the device for identifying the service type of the data can analyze the commonality relation between the sample data message and the data message to be identified through the length characteristics by acquiring the sample data message and the data message to be identified and extracting the length characteristic information in the data message, so that the service type of the data message to be identified is determined, the network capacity is evaluated and optimized through the identification of the service type, and the effect of better identifying the encrypted data service type is achieved.
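The decision step of the first aspect can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each TCP stream is represented by the distinct length values of its encrypted data blocks, "intersection" is taken to mean ordinary set intersection, and the second ratio is compared against the fraction of effective streams among all streams in the message. The function names and threshold values are hypothetical.

```python
def best_overlap(stream_lengths, features):
    """Largest intersection between one stream and any first intersection."""
    stream = set(stream_lengths)
    return max((stream & f for f in features), key=len, default=set())


def is_valid_stream(stream_lengths, features, first_ratio):
    """A stream is 'effective' if enough of its length values match a feature."""
    stream = set(stream_lengths)
    if not stream:
        return False
    return len(best_overlap(stream, features)) / len(stream) >= first_ratio


def matches_service_type(message, features, first_ratio, second_ratio):
    """message: list of streams; each stream: list of encrypted-block lengths.

    Assumption: the second ratio is a threshold on the share of effective
    streams in the message.
    """
    if not message:
        return False
    valid = sum(is_valid_stream(s, features, first_ratio) for s in message)
    return valid / len(message) >= second_ratio
```

With a single feature set `{100, 200, 300}`, a message whose two streams carry lengths `[100, 200, 300, 7]` and `[1, 2, 3, 4]` has one effective stream out of two when the first ratio is 0.7.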
With reference to the first aspect, in one possible implementation manner, the method further includes: step 1, selecting one first TCP stream from each data message, and determining m first TCP streams; step 2, determining intersections among m first TCP streams; the intersection among m first TCP streams comprises n length sequences, wherein n is an integer; traversing each first TCP stream in m sample data messages according to the step 1 and the step 2, and determining k intersections; k is a positive integer; determining that the intersection of the k intersections, in which the ratio of the value of the number of the length sequences to the minimum value of the number of the length sequences in the m first TCP streams is greater than a third ratio, is at least one first intersection; at least one first intersection is determined as a length characteristic of m sample data messages.
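Steps 1 and 2 above can be sketched as an exhaustive traversal over every way of picking one first TCP stream per sample message. The sketch assumes that streams are given as lists of encrypted-block lengths and that the intersection is a plain set intersection; `first_intersections` and `third_ratio` are illustrative names. The comparison is strict, following the wording of this paragraph.

```python
from itertools import product


def first_intersections(samples, third_ratio):
    """samples: list of m messages; each message is a list of streams;
    each stream is a list of encrypted-block lengths."""
    features = []
    for combo in product(*samples):            # step 1: one stream per message
        sets = [set(stream) for stream in combo]
        common = set.intersection(*sets)       # step 2: shared length values
        smallest = min(len(s) for s in sets)
        # Keep the intersection if it covers enough of the smallest stream.
        if smallest and len(common) / smallest > third_ratio:
            if common not in features:
                features.append(common)
    return features
```

The traversal visits the product of the stream counts of all m messages, so a practical implementation would likely need pruning for large samples; that is outside what the text describes.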
With reference to the first aspect, in one possible implementation manner, the method further includes: acquiring any second TCP stream in a data message to be identified; determining intersections between any one of the second TCP streams and each of the first intersections, respectively; determining the intersection with the largest value of the length sequence number in the intersections between any second TCP stream and each first intersection as a second intersection; and traversing each second TCP stream in the data message to be identified, and determining at least one second intersection.
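A minimal sketch of this selection, assuming set semantics for the intersection and using the hypothetical name `second_intersection`:

```python
def second_intersection(stream_lengths, first_intersections):
    """Intersect one second TCP stream with every first intersection and
    return the result that contains the most length values."""
    stream = set(stream_lengths)
    candidates = [stream & feature for feature in first_intersections]
    return max(candidates, key=len, default=set())


def second_intersections(message, first_intersections):
    """Traverse every second TCP stream in the message to be identified."""
    return [second_intersection(s, first_intersections) for s in message]
```

When two candidates tie in size, `max` returns the first one encountered; the text does not say how ties should be broken.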
With reference to the first aspect, in one possible implementation manner, the method further includes: generating a first instruction; the first instruction is used for indicating the target equipment to send the data message; respectively sending m first instructions to target equipment; m sample data messages are received from a target device.
With reference to the first aspect, in one possible implementation manner, the length sequence is a length sequence in a field used for characterizing the data content in the first TCP stream or the second TCP stream; the first TCP stream and the second TCP stream are TCP streams used for representing the picture data in the data message.
With reference to the first aspect, in one possible implementation manner, a ratio of a value of the number of length sequences of the intersection meeting the first preset condition to a minimum value of the number of length sequences in the m first TCP flows is greater than or equal to the third ratio.
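Written as a formula, the first preset condition reads as follows, where the symbols are introduced here for illustration only: I is a candidate intersection, T_i is the set of length values of the i-th selected first TCP stream, and r_3 is the third ratio.

```latex
\[
  \frac{|I|}{\min_{1 \le i \le m} |T_i|} \;\ge\; r_3
\]
```

As a worked example with hypothetical numbers: for m = 3 streams containing 10, 8, and 12 length values and an intersection of 6 values, the ratio is 6/8 = 0.75, so the intersection qualifies when r_3 = 0.7 and is rejected when r_3 = 0.8.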
In a second aspect, the present application provides an apparatus for identifying a service type of data, the apparatus comprising: a communication unit and a processing unit; the communication unit is used for acquiring m sample data messages, wherein the m sample data messages are encrypted data messages of the same service type data; each of the m sample data messages includes at least one first transmission control protocol, TCP, flow, each first TCP flow including at least one length sequence; the first TCP stream comprises at least one encrypted data block, and the length sequence is the length value of the encrypted data block; m is a positive integer; the processing unit is used for determining the length characteristics of the m sample data messages; the length features comprise at least one first intersection, wherein the first intersection is a set meeting a first preset condition in a set of length sequences shared among m first TCP streams; the m first TCP flows include one first TCP flow in each of the m sample data messages; the communication unit is further used for acquiring a data message to be identified, wherein the data message to be identified comprises at least one second TCP stream; a processing unit for determining at least one second intersection; the second intersection corresponds to a second TCP stream in the data message to be identified, and the second intersection is an intersection with the largest length sequence number in the corresponding second TCP stream and each first intersection respectively; the processing unit is also used for determining the effective TCP stream in the data message to be identified; the ratio of the number of the length sequences in the second intersection corresponding to the effective TCP stream to the number of the length sequences in the effective TCP stream is larger than or equal to the first ratio; if the number of the effective TCP flows in the data packet to be identified is greater than or equal to the second ratio, the processing unit is further configured to determine that the service type of the data packet to be identified is the service type of m sample data packets.
With reference to the second aspect, in one possible implementation manner, the processing unit is specifically configured to: step 1, selecting one first TCP stream from each data message, and determining m first TCP streams; step 2, determining intersections among m first TCP streams; the intersection among m first TCP streams comprises n length sequences, wherein n is an integer; traversing each first TCP stream in m sample data messages according to the step 1 and the step 2, and determining k intersections; k is a positive integer; determining that the intersection of the k intersections, in which the ratio of the value of the number of the length sequences to the minimum value of the number of the length sequences in the m first TCP streams is greater than a third ratio, is at least one first intersection; at least one first intersection is determined as a length characteristic of m sample data messages.
With reference to the second aspect, in one possible implementation manner, the processing unit is specifically configured to: acquiring any second TCP stream in a data message to be identified; determining intersections between any one of the second TCP streams and each of the first intersections, respectively; determining the intersection with the largest value of the length sequence number in the intersections between any second TCP stream and each first intersection as a second intersection; and traversing each second TCP stream in the data message to be identified, and determining at least one second intersection.
With reference to the second aspect, in one possible implementation manner, the communication unit is specifically configured to: generating a first instruction; the first instruction is used for indicating the target equipment to send the data message; respectively sending m first instructions to target equipment; m sample data messages are received from a target device.
With reference to the second aspect, in one possible implementation manner, the length sequence is a length sequence in a field used for characterizing data content in the first TCP stream or the second TCP stream; the first TCP stream and the second TCP stream are TCP streams used for representing the picture data in the data message.
With reference to the second aspect, in one possible implementation manner, a ratio of a value of the number of length sequences of the intersection that meets the first preset condition to a minimum value of the number of length sequences in the m first TCP flows is greater than or equal to the third ratio.
In a third aspect, the present application provides an apparatus for identifying a service type of data, the apparatus comprising: a processor and a communication interface; the communication interface is coupled to a processor for running a computer program or instructions to implement a method of identifying a traffic type of data as described in any one of the possible implementations of the first aspect and the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium having instructions stored therein which, when run on a terminal, cause the terminal to perform a method of identifying a traffic type of data as described in any one of the possible implementations of the first aspect and the first aspect.
In a fifth aspect, the present application provides a computer program product comprising instructions which, when run on an apparatus for identifying a traffic type of data, cause the apparatus for identifying a traffic type of data to perform a method of identifying a traffic type of data as described in any one of the possible implementations of the first aspect and the first aspect.
In a sixth aspect, the present application provides a chip comprising a processor and a communication interface, the communication interface and the processor being coupled, the processor being for running a computer program or instructions to implement a method of identifying a traffic type of data as described in any one of the possible implementations of the first aspect and the first aspect.
In particular, the chip provided in the present application further includes a memory for storing a computer program or instructions.
It should be noted that the above-mentioned computer instructions may be stored in whole or in part on the first computer readable storage medium. The first computer readable storage medium may be packaged together with the processor of the apparatus or may be packaged separately from the processor of the apparatus, which is not limited in this application.
In a seventh aspect, the present application provides a system for identifying a service type of data, including: a target device and means for identifying a traffic type of data, wherein the means for identifying a traffic type of data is for performing a method of identifying a traffic type of data as described in any one of the possible implementations of the second aspect and the second aspect.
The description of the second to seventh aspects of the present invention may refer to the detailed description of the first aspect; also, the advantageous effects described in the second aspect to the seventh aspect may refer to the advantageous effect analysis of the first aspect, and are not described herein.
In the present application, the names of the above-mentioned devices for identifying the service types of data do not constitute limitations on the devices or function modules themselves, and in actual implementation, these devices or function modules may appear under other names. Insofar as the function of each device or function module is similar to that of the present invention, it falls within the scope of the claims of the present invention and the equivalents thereof.
These and other aspects of the invention will be more readily apparent from the following description.
Drawings
Fig. 1 is a schematic structural diagram of a communication system according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a data packet according to an embodiment of the present application;
Fig. 3 is a flowchart of a method for identifying a service type of data according to an embodiment of the present application;
Fig. 4 is a flow chart of a method for obtaining a data packet according to an embodiment of the present application;
Fig. 5 is a flow chart of a method for determining length characteristics of a data packet according to an embodiment of the present application;
Fig. 6 is a flowchart of a second intersection determining method according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an apparatus for identifying a service type of data according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of another apparatus for identifying a service type of data according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The following describes in detail a method and an apparatus for identifying a service type of data according to embodiments of the present application with reference to the accompanying drawings.
The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may represent: A exists alone, both A and B exist, or B exists alone.
The terms "first" and "second" and the like in the description and in the drawings are used for distinguishing between different objects or for distinguishing between different processes of the same object and not for describing a particular sequential order of objects.
Furthermore, references to the terms "comprising" and "having" and any variations thereof in the description of the present application are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed but may optionally include other steps or elements not listed or inherent to such process, method, article, or apparatus.
It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the description of the present application, unless otherwise indicated, the meaning of "a plurality" means two or more.
The following explains the terms related to the embodiments of the present application, so as to facilitate the understanding of the reader.
(1) Transmission control protocol (transmission control protocol, TCP): the transmission control protocol is a connection-oriented, reliable, byte stream based transport layer communication protocol.
The TCP protocol is intended to fit into a layered protocol hierarchy that supports multiple network applications. It provides reliable communication services between pairs of processes in host computers attached to different but interconnected computer communication networks.
(2) Secure transport layer protocol (transport layer security, TLS): the secure transport layer protocol is used to provide confidentiality and data integrity between two communicating applications. In the TCP/IP protocol model, the TLS protocol sits between the transport layer and the application layer.
The TLS protocol includes the TLS record protocol. The TLS record protocol is a layered protocol. The information in each layer contains fields such as length, description, and content. The TLS record protocol takes the information to be transmitted, segments the data into manageable blocks, optionally compresses the data, applies a message authentication code (MAC), encrypts the result, and transmits it. The receiving device decrypts, verifies, decompresses, and reassembles the received data before delivering it to the higher-level client.
(3) Hypertext transfer protocol (hyper text transfer protocol, HTTP): the hypertext transfer protocol is an application-layer object-oriented protocol for enabling communication between a client browser and a Web server.
(4) Five-tuple: the five-tuple refers to a set formed by a source IP address, a source port, a destination IP address, a destination port and a protocol identifier. The five-tuple can distinguish between different sessions and the corresponding session is unique.
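As an illustration of how a five-tuple can serve as a session key, the sketch below groups packet payloads by their five-tuple. The `FiveTuple` type and `group_by_five_tuple` function are hypothetical names, and the input is assumed to be already-parsed (key, payload) pairs. Note that the two directions of one session carry swapped source and destination fields, so this sketch keeps them as separate entries.

```python
from collections import defaultdict
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 for TCP


def group_by_five_tuple(packets):
    """packets: iterable of (FiveTuple, payload_bytes) pairs.
    Returns a dict mapping each five-tuple to its payloads in arrival order."""
    flows = defaultdict(list)
    for key, payload in packets:
        flows[key].append(payload)
    return dict(flows)
```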
Fig. 1 is a schematic structural diagram of a communication system according to an embodiment of the present application. The communication system includes one or more apparatuses 11 for identifying the service type of data and one or more target devices 12.
The apparatus 11 for identifying the service type of data and the target device 12 may communicate via a communication link of a communication network. Multiple apparatuses 11 for identifying the service type of data may also communicate with one another over a communication link of the communication network, as may multiple target devices 12.
The communication network referred to in the present application may be a 4G network, a 5G network, or another type of communication network, which is not limited in this application. In this application, a communication network is mainly described as a 5G network.
In the case where the communication network is a 5G network, the apparatus 11 for identifying the service type of data acquires the service data transmitted by the target device 12 through the 5G network. The apparatus 11 then analyzes the acquired service data to determine its service characteristics, which in turn supports evaluation and optimization of network capability and scene marketing analysis.
The service data sent by the target device 12 may be unencrypted or encrypted. Illustratively, the encryption may be TLS encryption.
When the target device 12 transmits service data encrypted by TLS, the apparatus 11 for identifying the service type of data cannot identify the data service by the conventional HTTP service identification method, so evaluation and optimization of network capability and scene marketing analysis cannot be performed.
It should be noted that the embodiments of the present application may refer to one another. For example, the same or similar steps, the method embodiment, the communication system embodiment, and the device embodiment may all be referred to by one another, which is not limited.
The one or more apparatuses 11 for identifying the service type of data may be a server as shown in Fig. 1. The one or more target devices 12 may be a server 121 or a terminal 122 as shown in Fig. 1.
Hereinafter, the server will be described in detail.
The server comprises a processor, a transceiver, and a memory.
The processor may be a general-purpose central processing unit (central processing unit, CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present application.
The transceiver may be any transceiver-type device for communicating with other devices or communication networks, such as Ethernet, a radio access network (radio access network, RAN), or a wireless local area network (wireless local area networks, WLAN).
The memory may be, but is not limited to, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (random access memory, RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (electrically erasable programmable read-only memory, EEPROM), a compact disc read-only memory (compact disc read-only memory, CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be stand-alone and coupled to the processor via a communication line. The memory may also be integrated with the processor.
The terminal 122 will be described in detail below.
Terminal 122 is a device with wireless communication capabilities. It may be deployed on land, including indoors or outdoors, hand-held or vehicle-mounted; on the water surface (for example, on a ship); or in the air (for example, on an aircraft, a balloon, or a satellite). Terminals, also called user equipment (UE), mobile stations (MS), mobile terminals (MT), or terminal equipment, are devices that provide voice and/or data connectivity to a user. For example, the terminal includes a handheld device, an in-vehicle device, and the like having a wireless connection function. Currently, the terminal may be: a mobile phone, a tablet, a laptop, a palmtop, a mobile internet device (mobile internet device, MID), a wearable device (e.g., a smartwatch, a smartband, a pedometer, etc.), a vehicle-mounted device (e.g., in an automobile, a bicycle, an electric car, an airplane, a ship, a train, a high-speed rail, etc.), a virtual reality (VR) device, an augmented reality (augmented reality, AR) device, a wireless terminal in industrial control (industrial control), a smart home device (e.g., a refrigerator, a television, an air conditioner, an electric meter, etc.), a smart robot, a workshop device, a wireless terminal in self-driving (self driving), a wireless terminal in remote medical surgery (remote medical surgery), a wireless terminal in a smart grid (smart grid), a wireless terminal in transportation safety (transportation safety), a wireless terminal in a smart city (smart city), a wireless terminal in a smart home (smart home), or a flying device (e.g., a smart robot, a hot-air balloon, an airplane, etc.). In one possible application scenario, the terminal device is one that often operates on the ground, for example a vehicle-mounted device.
In this application, for convenience of description, a Chip disposed in the above device, such as a System-On-a-Chip (SOC), a baseband Chip, etc., or other chips having a communication function may also be referred to as a terminal.
The terminal can be a vehicle with corresponding communication function, or a vehicle-mounted communication device, or other embedded communication devices, or can be a handheld communication device of a user, including a mobile phone, a tablet personal computer and the like.
As an example, in the present embodiment, the terminal 122 may also be a wearable device. A wearable device, also called a wearable smart device, is a generic term for everyday wearables, such as glasses, gloves, watches, clothes, and shoes, that are intelligently designed and developed by applying wearable technology. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothing or accessories. A wearable device is not only a hardware device; it also provides powerful functions through software support, data interaction, and cloud interaction. In a broad sense, wearable smart devices include full-featured, large-sized devices that can implement complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, and devices that focus on only one type of application function and need to be used together with other devices such as smartphones, for example various smart bracelets and smart jewelry for physical sign monitoring.
In the above, the application scenario of the embodiment of the present application is described. The following describes a message structure of a data message according to an embodiment of the present application:
Fig. 2 is a schematic structural diagram of a data packet according to an embodiment of the present application. Each data message includes a plurality of stream files. In the embodiment of the present application, taking the TCP protocol as an example, each data packet includes a plurality of TCP flows. Each TCP flow includes multiple items of field information, for example: five-tuple information, protocol information, etc.
Illustratively, embodiments of the present application classify TCP streams by five-tuple and stream index, in the TCP protocol. The quintuple information can determine a sending end and a receiving end of the TCP stream. The flow index is the internal mapping of the TCP protocol to the source IP address, source port to destination IP address, destination port. The values of the source IP address, source port, destination IP address, and destination port corresponding to TCP flows having the same flow index are also the same. Thus, by classifying the above information, one or more TCP flows corresponding to the same data can be obtained.
For ease of understanding, taking the TLS record protocol in the TLS protocol as an example, the process of encrypting the data in a TCP stream is explained in detail:
Step 1, segmenting the application data.
For example, the application data "djqwnuhdsafgre" is segmented into three data blocks "djqw", "nuhds", and "afgre".
Step 2, appending a message authentication code (MAC) to each data block.
Step 3, encrypting each data block.
Step 4, encapsulating each encrypted data block and attaching field information such as the protocol record header.
Optionally, the segmented data blocks may be compressed before step 2, which is not described in detail in the embodiments of the present application.
As such, as shown in fig. 2, the encrypted data in each TCP stream may include one or more encrypted data blocks.
Based on the communication system shown in fig. 1 and the data message structure shown in fig. 2, when a computing device analyzes the service data transmitted in an operator network and the transmitted service data is an encrypted data message, a prior-art computing device can only obtain the plaintext information in the data message. For example, the website corresponding to the data message is determined by identifying the server name field in the data message. However, the application data content of the data message is mainly carried in the encrypted data field, which prior-art schemes cannot analyze, so it is difficult to further determine the service type of the service data.
In order to solve the problem that the service type cannot be determined according to encrypted service data in the prior art, the application provides a method for identifying the service type of the data.
As shown in fig. 3, a flowchart of a method for identifying a service type of data according to an embodiment of the present application is provided, where the method includes the following steps:
S201, a device for identifying the service type of data acquires m sample data messages.
The m sample data messages are data messages of the same service type data. Each of the m sample data messages includes at least one first TCP stream, each first TCP stream includes at least one encrypted data block and at least one length sequence, each length sequence is the length value of an encrypted data block, and m is a positive integer.
Further, data transmission under the TCP protocol includes a plurality of transmission stages. Each data message includes a plurality of TCP streams, which carry the instruction information of the data message for these stages of the transmission process. The content type field in a TCP stream characterizes the type of the TCP stream. For example, the content type may be handshake, application data, and the like. A TCP stream whose content type is application data is generally used for carrying application data such as pictures and videos.
Optionally, the embodiments of the present application select a TCP stream whose content type is application data as the first TCP stream. In this way, the TCP stream that carries, for example, the picture data in a data message may be used as a first TCP stream of that data message.
It should be noted that, as shown in fig. 2, the encrypted data in each TCP stream includes one or more encrypted data blocks. In the embodiments of the present application, the length value of each encrypted data block is used as the length sequence corresponding to that encrypted data block. That is, each first TCP stream includes at least one length sequence, and each length sequence corresponds to one encrypted data block in the encrypted data of the TCP stream.
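As a rough illustration of how length sequences might be read from a stream, the sketch below walks the 5-byte TLS record headers (content type, version, length) of an already reassembled byte stream and keeps the length of every record whose content type is application data (23). The helper names `length_sequence` and `record` and the demo bytes are assumptions for illustration; the embodiments do not prescribe a parsing method.

```python
import struct

APPLICATION_DATA = 23  # TLS record content type for application data
HEADER_SIZE = 5        # content type (1) + version (2) + length (2)


def length_sequence(stream: bytes) -> list:
    """Return the lengths of the application-data records in `stream`."""
    lengths = []
    offset = 0
    while offset + HEADER_SIZE <= len(stream):
        content_type, _version, length = struct.unpack_from("!BHH", stream, offset)
        if content_type == APPLICATION_DATA:
            lengths.append(length)
        offset += HEADER_SIZE + length  # skip header and record body
    return lengths


def record(content_type: int, body: bytes) -> bytes:
    """Build a record with a TLS-style header (demo helper only)."""
    return struct.pack("!BHH", content_type, 0x0303, len(body)) + body


# One handshake record (type 22) followed by two application-data records.
stream = record(22, b"\x00" * 40) + record(23, b"\x01" * 120) + record(23, b"\x02" * 64)
print(length_sequence(stream))  # [120, 64]
```

The sketch assumes that TCP segments have been reassembled in order and that the stream starts on a record boundary.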
For example, the embodiments of the present application may acquire the data messages by means of dial testing.
By way of example, the embodiments of the present application take the TLS protocol over TCP as an example to describe in detail the method for identifying a service type of data provided by the embodiments. The present application does not limit the transport layer protocol, encryption protocol, etc. of the data.
S202, the device for identifying the service type of the data determines the length characteristics of m sample data messages.
The length characteristics include at least one first intersection, where a first intersection is a set of length sequences shared among m first TCP streams that meets a first preset condition; the m first TCP streams consist of one first TCP stream from each of the m sample data messages.
It should be noted that the first TCP streams in the m sample data messages acquired in step S201 may differ from one another. The device for identifying the service type of data therefore selects one first TCP stream from each acquired sample data message and calculates the intersection of the selected first TCP streams, which yields the length sequences common to them. By calculating the intersections of all possible combinations, the device obtains the length sequences common to the m sample data messages and can thereby determine their length characteristics.
Specifically, the device for identifying the service type of data in the embodiments of the present application may screen the intersections obtained in this way according to a first preset condition.
The first preset condition is: the ratio of the number of length sequences in the intersection to the minimum number of length sequences among the m first TCP streams is greater than or equal to a third ratio.
It will be readily appreciated that the number of elements in the intersection of the m first TCP streams, that is, its number of length sequences, is less than or equal to the number of length sequences in any one of the m first TCP streams. Therefore, by calculating the ratio of the number of length sequences in the intersection to the minimum number of length sequences among the m first TCP streams, the device for identifying the service type of data obtains the proportion of length sequences shared by the selected m first TCP streams. The higher the proportion, the more length features the selected m first TCP streams have in common, and vice versa.
By taking only the intersections whose ratio reaches the third ratio as first intersections, the embodiments of the present application screen out intersections with a lower proportion, so that the length characteristics of the m sample data messages reflected by the first intersections are more representative.
Illustratively, the third ratio in the embodiments of the present application may be any value between 0 and 1. For example, the third ratio is taken as 60% in the examples of the present application.
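The first preset condition can be written as a small predicate. The sketch below is only one possible reading: it assumes that each first TCP stream is treated as a set of distinct length values, and the function name `meets_first_condition` and the sample values are hypothetical.

```python
def meets_first_condition(flows, third_ratio=0.6):
    """Check the first preset condition for one combination of first TCP streams.

    `flows` holds one collection of length values per sample data message.
    """
    flow_sets = [set(flow) for flow in flows]
    shared = set.intersection(*flow_sets)      # lengths common to all m streams
    smallest = min(len(s) for s in flow_sets)  # fewest lengths in any stream
    return len(shared) / smallest >= third_ratio


# Hypothetical length values, not taken from the embodiments.
flows = [[100, 200, 300], [100, 200, 400, 500]]
print(meets_first_condition(flows, 0.6))  # True  (2 / 3 >= 0.6)
print(meets_first_condition(flows, 0.7))  # False (2 / 3 <  0.7)
```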
S203, the device for identifying the service type of the data acquires the data message to be identified.
The data message to be identified includes at least one second TCP stream. In the embodiments of the present application, the data message to be identified has the same structural characteristics as a sample data message, and a second TCP stream in the data message to be identified has the same structural characteristics as a first TCP stream in a sample data message; for details, refer to the related content of step S201 and the structural schematic diagram of the data message shown in fig. 2.
It should be noted that, in the embodiments of the present application, the data message to be identified is typically a single data message.
Optionally, the embodiments of the present application may first perform step S201 to acquire the m sample data messages and step S202 to determine their length characteristics, and then perform step S203 to acquire the data message to be identified; alternatively, step S203 may be performed first to acquire the data message to be identified, followed by step S201 to acquire the m sample data messages and step S202 to determine their length characteristics.
In one possible implementation, the embodiments of the present application may perform steps S201 to S202 multiple times to obtain sample data messages for a plurality of different service types, determine the length characteristics of each of these service types, and store the obtained information in a sample database in advance. In this way, when the related information of the sample data messages is needed, the device for identifying the service type of data can acquire it directly from the sample database.
S204, the device for identifying the service type of the data determines at least one second intersection.
Each second intersection corresponds to one second TCP stream in the data message to be identified and is the intersection that meets a second preset condition among the intersections of that second TCP stream with each first intersection.
Specifically, the second preset condition is: being the intersection with the largest number of length sequences among those intersections.
It should be noted that the data message to be identified includes at least one second TCP stream, and the at least one second TCP stream characterizes the service type of the data message to be identified. The first intersections characterize the length features of the service type corresponding to the m sample data messages. However, the correspondence between a second TCP stream and the first intersections is not known in advance. Therefore, the device for identifying the service type of data in the embodiments of the present application calculates the intersection between the second TCP stream and each first intersection, that is, the length sequences shared by the second TCP stream and that first intersection. Each such intersection characterizes the commonality between the second TCP stream and one first intersection. The intersection with the largest number of length sequences is selected as the second intersection corresponding to the second TCP stream, so that the second intersection represents the commonality between that second TCP stream and the sample data messages, and the at least one second intersection as a whole represents the commonality between the data message to be identified and the sample data messages.
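The selection of a second intersection can be sketched as a maximum over candidate intersections. The function name `second_intersection` and the sample values below are illustrative assumptions; when several candidates have the same size, this sketch simply keeps the first one, which the embodiments do not specify.

```python
def second_intersection(second_flow, first_intersections):
    """Return the largest intersection of `second_flow` with any first intersection."""
    candidates = [set(second_flow) & set(first) for first in first_intersections]
    return max(candidates, key=len)


# Hypothetical second TCP stream and first intersections.
second_flow = {10246, 11256, 9784, 500}
first_intersections = [{10246, 11256}, {10246, 11256, 9784}, {12406}]

best = second_intersection(second_flow, first_intersections)
print(sorted(best))  # [9784, 10246, 11256]
```

The sketch assumes that at least one first intersection exists; with an empty list, `max` raises `ValueError`.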
S205, the device for identifying the service type of the data determines the effective TCP stream in the data message to be identified.
For a valid TCP stream, the ratio of the number of length sequences in its corresponding second intersection to the number of length sequences in the stream itself is greater than or equal to a first ratio.
Specifically, the data message to be identified includes at least one second TCP stream, and each second TCP stream corresponds to a second intersection. If the ratio of the number of length sequences in the corresponding second intersection to the number of length sequences in a second TCP stream is greater than or equal to the first ratio, that second TCP stream is considered a valid TCP stream in the data message to be identified.
It should be noted that, as described in step S204, the second intersection represents the commonality between the corresponding second TCP stream and the sample data messages. Therefore, the ratio of the number of length sequences in the second intersection to the number of length sequences in the second TCP stream expresses the degree of commonality between that second TCP stream and the sample data messages.
It is easy to understand that the larger the calculated ratio, the stronger the commonality between the second TCP stream corresponding to the second intersection and the sample data messages; conversely, the smaller the calculated ratio, the weaker the commonality.
It should be noted that the second intersection is obtained by calculating the intersections of the corresponding second TCP stream with each first intersection. Therefore, the number of length sequences in the second intersection is less than or equal to the number of length sequences in the corresponding second TCP stream. Thus, the first ratio may be any real number between 0 and 1.
S206, the device for identifying the service type of data judges whether the proportion of valid TCP streams in the data message to be identified is greater than or equal to a second ratio.
Specifically, the proportion of valid TCP streams in the data message to be identified is the ratio of the number of valid TCP streams in the data message to be identified to the number of second TCP streams in the data message to be identified.
It is easy to understand that, as known from step S205, a valid TCP stream is a second TCP stream in the data message to be identified that has a strong commonality with the sample data messages. Therefore, the higher the proportion of valid TCP streams in the data message to be identified, the stronger the correlation between the data message to be identified and the sample data messages; conversely, the lower the proportion, the weaker the correlation.
It should be noted that, as shown in step S205, the number of valid TCP streams in the data message to be identified is less than or equal to the number of second TCP streams in the data message to be identified. Thus, the second ratio may be any real number between 0 and 1.
If the proportion of valid TCP streams in the data message to be identified is greater than or equal to the second ratio, step S207 is performed.
S207, the device for identifying the service type of the data determines that the service type of the data message to be identified is the service type of m sample data messages.
As can be seen from step S206, when the proportion of valid TCP streams in the data message to be identified reaches a certain level, the data message to be identified and the m sample data messages can be considered to be data of the same service type.
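Steps S205 to S207 can be combined into one short decision routine. The sketch below assumes that each second TCP stream and its second intersection are available as sets of length values; the function names and the sample data are hypothetical.

```python
def is_valid_flow(second_flow, intersection, first_ratio):
    """S205: a stream is valid if enough of its lengths fall in its second intersection."""
    return len(intersection) / len(second_flow) >= first_ratio


def matches_service_type(second_flows, intersections, first_ratio, second_ratio):
    """S206/S207: compare the proportion of valid streams with the second ratio."""
    valid = sum(
        is_valid_flow(flow, inter, first_ratio)
        for flow, inter in zip(second_flows, intersections)
    )
    return valid / len(second_flows) >= second_ratio


# Hypothetical message with two second TCP streams.
second_flows = [{1, 2, 3, 4}, {5, 6}]
intersections = [{1, 2, 3}, set()]  # second intersection of each stream

print(matches_service_type(second_flows, intersections, 0.65, 0.5))  # True
print(matches_service_type(second_flows, intersections, 0.65, 0.7))  # False
```

Here only the first stream is valid (3 / 4 = 75%), so the proportion of valid streams is 1 / 2.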
In the embodiments of the present application, the device for identifying the service type of data acquires the sample data messages and the data message to be identified and extracts the length characteristic information from them. The device can then analyze the commonality between the sample data messages and the data message to be identified through the length characteristics and thereby determine the service type of the data message to be identified. Identifying the service type facilitates the evaluation and optimization of network capability and scenario-based marketing analysis, and achieves better identification of the service type of encrypted data.
In conjunction with step S201, the process by which the device for identifying the service type of data acquires the m sample data messages is described in detail below. With reference to fig. 3, fig. 4 is a flow chart of a data message acquisition method provided in an embodiment of the present application. Step S201 may be specifically implemented by the following steps S2011 to S2013:
S2011, the device for identifying the service type of data generates a first instruction.
The first instruction is used for instructing a target device to send a data message.
S2012, the device for identifying the service type of data sends m first instructions to the target device.
S2013, the device for identifying the service type of data receives m sample data messages from the target device.
By this method, the device for identifying the service type of data can repeatedly dial-test the same target device, acquire multiple groups of data messages corresponding to the same source IP address and source port, and determine the service type corresponding to these groups of data messages.
It is easy to understand that the m sample data messages in the embodiments of the present application are encrypted data messages of the same service type data. For various reasons, such as packet loss, the first TCP streams in the acquired sample data messages may not be exactly the same. Determining the m sample data messages through repeated dial testing therefore captures the information of the service type data more comprehensively.
The embodiments of the present application may also acquire a single sample data message, i.e. m is equal to 1. The embodiments of the present application are not limited in this regard.
Illustratively, a first TCP stream may be denoted STCP[i][j], meaning the j-th first TCP stream in the i-th sample data message, where i is a positive integer less than or equal to m, and j is a positive integer.
STCP[i][j] may include one or more length sequences. For example, STCP[i][j] = {10246, 11256, 9686} means that the first TCP stream STCP[i][j] includes three length sequences, the first having a length value of 10246, the second a length value of 11256, and the third a length value of 9686.
Taking m equal to 3 as an example, the 3 sample data messages are as follows:
STCP[1][1], STCP[1][2], STCP[1][3]
STCP[2][1], STCP[2][2], STCP[2][3], STCP[2][4]
STCP[3][1], STCP[3][2], STCP[3][3], STCP[3][4], STCP[3][5]
Here, the first sample data message includes 3 first TCP streams, the second sample data message includes 4 first TCP streams, and the third sample data message includes 5 first TCP streams.
In conjunction with step S202, the process by which the device for identifying the service type of data determines the length characteristics of the m sample data messages is described in detail below. With reference to fig. 3, fig. 5 is a flow chart of a method for determining the length characteristics of data messages according to an embodiment of the present application. Step S202 may be specifically implemented by the following steps S2021 to S2025:
S2021, the device for identifying the service type of data selects any one first TCP stream from each sample data message, thereby determining m first TCP streams.
In combination with the above example, there are 3 sample data messages. One first TCP stream is selected from each data message, for example STCP[1][1], STCP[2][1], and STCP[3][1], for a total of 3 first TCP streams.
S2022, the means for identifying a traffic type of the data determines an intersection between the m first TCP flows.
The intersection among m first TCP streams comprises n length sequences, wherein n is an integer.
In connection with the above example, the intersection between the m first TCP flows is:
STCP[1][1]∩STCP[2][1]∩STCP[3][1]
S2023, by repeating steps S2021 and S2022, the device for identifying the service type of data traverses every first TCP stream in the m sample data messages and determines k intersections, where k is a positive integer.
In connection with the above example, the traversal yields intersections such as the following (not all combinations are listed):
STCP[1][1]∩STCP[2][1]∩STCP[3][1]
STCP[1][1]∩STCP[2][1]∩STCP[3][2]
STCP[1][1]∩STCP[2][1]∩STCP[3][5]
STCP[1][1]∩STCP[2][2]∩STCP[3][1]
STCP[1][1]∩STCP[2][2]∩STCP[3][2]
STCP[1][1]∩STCP[2][4]∩STCP[3][5]
STCP[1][2]∩STCP[2][1]∩STCP[3][1]
STCP[1][3]∩STCP[2][4]∩STCP[3][1]
STCP[1][3]∩STCP[2][4]∩STCP[3][5]
In the above example provided by the present application, there are 3 sample data messages: the first includes 3 first TCP streams, the second includes 4 first TCP streams, and the third includes 5 first TCP streams. Thus, traversing through the above steps yields the intersections of a total of 3 × 4 × 5 = 60 combinations, i.e., k is 60 in the example.
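The count of 60 can be checked by enumerating the index combinations directly. The sketch below uses `itertools.product` from the Python standard library; the variable names are illustrative, and each tuple (a, b, c) stands for the combination STCP[1][a], STCP[2][b], STCP[3][c].

```python
from itertools import product

# Number of first TCP streams in each of the 3 sample data messages.
flows_per_message = [3, 4, 5]

# One 1-based stream index per message, over all possible selections.
combinations = list(product(*(range(1, n + 1) for n in flows_per_message)))

print(len(combinations))  # 60
print(combinations[0])    # (1, 1, 1)
print(combinations[-1])   # (3, 4, 5)
```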
S2024, the device for identifying the service type of data determines, as the at least one first intersection, each of the k intersections for which the ratio of its number of length sequences to the minimum number of length sequences among the corresponding m first TCP streams is greater than the third ratio.
In the present embodiment, the intersection STCP[1][1]∩STCP[2][1]∩STCP[3][1] and a third ratio of 60% are taken as an example:
STCP[1][1]={10246,11256,9686}
STCP[2][1]={10246,11256,9784,10002}
STCP[3][1]={10246,11256,9784,12406,11644}
STCP[1][1]∩STCP[2][1]∩STCP[3][1]={10246,11256}
As can be seen, the intersection of STCP[1][1], STCP[2][1], and STCP[3][1] contains 2 length sequences. Among STCP[1][1], STCP[2][1], and STCP[3][1], the stream with the fewest length sequences is STCP[1][1], which contains 3. Therefore, the ratio of the number of length sequences in the intersection to the number of length sequences in STCP[1][1] is 2/3, approximately 66.7%, which is greater than the third ratio of 60%.
The device for identifying the service type of data therefore determines this intersection to be a first intersection of the 3 sample data messages.
In the same way, the device for identifying the service type of data calculates the intersections of all 60 combinations and determines all the first intersections.
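The traversal and screening of steps S2021 to S2024 can be sketched end to end. In the data below, the first stream of each message reuses the values given for STCP[1][1], STCP[2][1], and STCP[3][1]; the second stream of each message is made up so that the example stays small (8 combinations instead of 60). The function name `first_intersections` is hypothetical, and the comparison uses "greater than or equal to", as in the first preset condition.

```python
from itertools import product


def first_intersections(samples, third_ratio):
    """Map each passing combination (1-based stream indices) to its intersection."""
    result = {}
    index_ranges = [range(len(message)) for message in samples]
    for combo in product(*index_ranges):
        flows = [samples[i][j] for i, j in enumerate(combo)]
        shared = set.intersection(*flows)
        smallest = min(len(flow) for flow in flows)
        if len(shared) / smallest >= third_ratio:
            result[tuple(j + 1 for j in combo)] = shared
    return result


samples = [
    [{10246, 11256, 9686}, {500, 600}],                # message 1
    [{10246, 11256, 9784, 10002}, {500, 600, 700}],    # message 2
    [{10246, 11256, 9784, 12406, 11644}, {500, 800}],  # message 3
]

features = first_intersections(samples, 0.6)
print(list(features))               # [(1, 1, 1)]
print(sorted(features[(1, 1, 1)]))  # [10246, 11256]
```

The combination of the made-up second streams shares only the value 500, giving 1 / 2 = 50%, so it is screened out.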
S2025, the means for identifying the traffic type of the data determines that at least one first intersection is a length characteristic of the m sample data messages.
For example, the means for identifying the traffic type of the data obtains 27 first intersections in step S2024, and these 27 first intersections are the length characteristics of the 3 sample data packets.
For ease of description, the 27 first intersections are denoted as A[1], A[2], ……, A[27].
In one possible implementation of step S203, in combination with the above example, the process by which the device for identifying the service type of data acquires the data message to be identified is as follows:
The device for identifying the service type of data acquires 1 data message to be identified, which includes 4 second TCP streams.
Illustratively, a second TCP stream may be denoted TTCP[g], meaning the g-th second TCP stream in the data message to be identified, where g is a positive integer.
The acquired data message to be identified can be expressed as follows:
TTCP[1], TTCP[2], TTCP[3], TTCP[4]
In conjunction with step S204, the process by which the device for identifying the service type of data determines the at least one second intersection is described in detail below. With reference to fig. 3, fig. 6 is a flow chart of a second intersection determining method provided in an embodiment of the present application. Step S204 may be specifically implemented by the following steps S2041 to S2044:
S2041, the device for identifying the service type of data acquires any second TCP stream in the data message to be identified.
In combination with the above example, TTCP[2] is taken as the second TCP stream acquired from the data message to be identified.
S2042, the device for identifying the service type of data determines the intersection between the second TCP stream and each of the first intersections.
In combination with the above example, the first intersections are A[1], A[2], ……, A[27].
The device for identifying the service type of data calculates TTCP[2]∩A[1], TTCP[2]∩A[2], ……, TTCP[2]∩A[27] respectively.
S2043, the device for identifying the service type of data determines, among the intersections between the second TCP stream and each first intersection, the intersection with the largest number of length sequences to be the second intersection.
Illustratively, the second intersection may be denoted B[g], meaning the second intersection corresponding to the second TCP stream TTCP[g].
In combination with the above example, the device for identifying the service type of data selects, among TTCP[2]∩A[1], TTCP[2]∩A[2], TTCP[2]∩A[3], ……, TTCP[2]∩A[27], the intersection with the largest number of length sequences as the second intersection.
For example, TTCP[2]∩A[19] is the intersection with the largest number of length sequences.
That is, B[2] = TTCP[2]∩A[19], where B[2] is the second intersection corresponding to TTCP[2].
S2044, the device for identifying the service type of data traverses each second TCP stream in the data message to be identified and determines at least one second intersection.
In combination with the above example, through steps S2041 to S2043, the device for identifying the service type of data calculates the second intersections corresponding to TTCP[1], TTCP[2], TTCP[3], and TTCP[4] respectively, denoted B[1], B[2], B[3], and B[4].
In one possible implementation of step S205, in combination with the above example, the process by which the device for identifying the service type of data determines the valid TCP streams in the data message to be identified is as follows:
In connection with the above example, the data message to be identified includes the second TCP streams TTCP[1], TTCP[2], TTCP[3], and TTCP[4]. The corresponding second intersections are B[1], B[2], B[3], and B[4].
Illustratively, the number of length sequences in TTCP[1] is 5, in TTCP[2] is 6, in TTCP[3] is 7, and in TTCP[4] is 8.
The number of length sequences in B[1] is 3, in B[2] is 4, in B[3] is 5, and in B[4] is 6.
Taking a first ratio of 65% as an example, the ratio of the number of length sequences in B[1] to the number of length sequences in TTCP[1] is 3/5 = 60%, which is smaller than the first ratio, so TTCP[1] is not a valid TCP stream in the data message to be identified.
The ratio of the number of length sequences in B[2] to the number of length sequences in TTCP[2] is 4/6, approximately 66.7%, which is greater than the first ratio, so TTCP[2] is a valid TCP stream in the data message to be identified.
The ratio of the number of length sequences in B[3] to the number of length sequences in TTCP[3] is 5/7, approximately 71.4%, which is greater than the first ratio, so TTCP[3] is a valid TCP stream in the data message to be identified.
The ratio of the number of length sequences in B[4] to the number of length sequences in TTCP[4] is 6/8 = 75%, which is greater than the first ratio, so TTCP[4] is a valid TCP stream in the data message to be identified.
Thus, the valid TCP streams in the data message to be identified are TTCP[2], TTCP[3], and TTCP[4].
In one possible implementation of steps S206 to S207, in combination with the above example, steps S206 to S207 are specifically as follows:
Take as an example the case where the valid TCP streams in the data message to be identified are TTCP[2], TTCP[3], and TTCP[4], and the second ratio is 70%.
In this case, 3 of the 4 second TCP streams are valid, so the proportion of valid TCP streams in the data message to be identified is 75%, which is greater than the second ratio. The device for identifying the service type of data therefore determines that the service type of the data message to be identified is the service type of the 3 sample data messages.
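The arithmetic of this worked example can be reproduced from the counts alone. The sketch below uses exact fractions to avoid rounding at the thresholds; the dictionary names are illustrative, while the counts and ratios are those given in the example.

```python
from fractions import Fraction

flow_sizes = {1: 5, 2: 6, 3: 7, 4: 8}          # length sequences in TTCP[g]
intersection_sizes = {1: 3, 2: 4, 3: 5, 4: 6}  # length sequences in B[g]
first_ratio = Fraction(65, 100)
second_ratio = Fraction(70, 100)

# S205: keep the streams whose share of matched lengths reaches the first ratio.
valid = [
    g for g in flow_sizes
    if Fraction(intersection_sizes[g], flow_sizes[g]) >= first_ratio
]

# S206: proportion of valid streams among all second TCP streams.
proportion = Fraction(len(valid), len(flow_sizes))

print(valid)                       # [2, 3, 4]
print(float(proportion))           # 0.75
print(proportion >= second_ratio)  # True
```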
In the embodiments of the present application, the device for identifying the service type of data acquires the sample data messages and the data message to be identified and extracts the length characteristic information from them. The device can then analyze the commonality between the sample data messages and the data message to be identified through the length characteristics and thereby determine the service type of the data message to be identified. Identifying the service type facilitates the evaluation and optimization of network capability and scenario-based marketing analysis, and achieves better identification of the service type of encrypted data.
The embodiment of the application may divide the functional modules or functional units of the device for identifying the service type of the data according to the above method example, for example, each functional module or functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated modules may be implemented in hardware, or in software functional modules or functional units. The division of the modules or units in the embodiments of the present application is merely a logic function division, and other division manners may be implemented in practice.
As shown in fig. 7, fig. 7 is a schematic structural diagram of an apparatus for identifying a service type of data according to an embodiment of the present application, where the apparatus includes:
a communication unit 302, configured to obtain m sample data packets.
Wherein, the m sample data messages are encrypted data messages of the same service type data; each of the m sample data messages includes at least one first transmission control protocol, TCP, flow, each first TCP flow including at least one length sequence; the first TCP stream comprises at least one encrypted data block, and the length sequence is the length value of the encrypted data block; m is a positive integer;
the processing unit 301 is configured to determine a length characteristic of the m sample data packets.
The length features comprise at least one first intersection, wherein the first intersection is a set meeting a first preset condition in a set of length sequences shared among m first TCP streams; the m first TCP flows include one first TCP flow in each of the m sample data messages;
the communication unit 302 is further configured to obtain a data packet to be identified, where the data packet to be identified includes at least one second TCP flow;
the processing unit 301 is further configured to determine at least one second intersection.
The second intersection corresponds to a second TCP stream in the data message to be identified, and the second intersection is an intersection with the largest length sequence number in the corresponding second TCP stream and each first intersection respectively;
the processing unit 301 is further configured to determine a valid TCP flow in the data packet to be identified.
The ratio of the number of the length sequences in the second intersection corresponding to the effective TCP stream to the number of the length sequences in the effective TCP stream is larger than or equal to the first ratio;
if the proportion of valid TCP flows in the data packet to be identified is greater than or equal to the second ratio, the processing unit 301 is further configured to determine that the service type of the data packet to be identified is the service type of the m sample data packets.
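The two threshold tests applied by the processing unit 301 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: it assumes each TCP flow is modelled as a set of integer length values, that the second intersection of each flow has already been computed, and that the second-ratio test compares the share of valid flows among all second TCP flows in the message. The function names `is_valid_flow` and `matches_sample_type` are hypothetical.

```python
def is_valid_flow(flow, second_intersection, first_ratio):
    """A second TCP flow is valid when the share of its length values that
    fall in its second intersection is at least first_ratio."""
    if not flow:
        return False  # assumption: a flow with no length values is never valid
    return len(second_intersection) / len(flow) >= first_ratio


def matches_sample_type(flows, second_intersections, first_ratio, second_ratio):
    """Return True when the share of valid flows in the message to be
    identified is at least second_ratio.

    flows: list of sets of length values, one per second TCP flow.
    second_intersections: list of sets, aligned with flows.
    """
    if not flows:
        return False
    valid = sum(
        is_valid_flow(flow, second, first_ratio)
        for flow, second in zip(flows, second_intersections)
    )
    return valid / len(flows) >= second_ratio
```

For example, with two flows of which one shares two of its three length values with the sample feature, a first ratio of 0.6 marks that flow as valid, and a second ratio of 0.5 is then met because half of the flows are valid.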
Optionally, the processing unit 301 is specifically configured to: step 1, select one first TCP stream from each sample data message, thereby determining m first TCP streams; step 2, determine the intersection among the m first TCP streams, where the intersection comprises n length sequences and n is an integer; repeat step 1 and step 2 over every first TCP stream in the m sample data messages to determine k intersections, where k is a positive integer; determine, as the at least one first intersection, each of the k intersections for which the ratio of its number of length sequences to the minimum number of length sequences among the m first TCP streams is greater than a third ratio; and determine the at least one first intersection as the length characteristic of the m sample data messages.
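Steps 1 and 2 and the third-ratio filter can be sketched as below. This is an illustrative sketch under stated assumptions: each first TCP stream is modelled as a set of integer length values, each sample data message as a list of such sets, and the traversal is read as enumerating every combination of one stream per message. The name `length_features` is hypothetical, and the strict "greater than" comparison follows the wording of this paragraph.

```python
from itertools import product


def length_features(samples, third_ratio):
    """Return the first intersections for m sample messages.

    samples: list of m messages; each message is a list of flows;
             each flow is a set of integer length values.
    """
    if not samples:
        return []
    features = []
    # Step 1: pick one first TCP stream from each sample message.
    for combo in product(*samples):
        # Step 2: length values shared by all m chosen streams.
        common = set.intersection(*(set(flow) for flow in combo))
        smallest = min(len(flow) for flow in combo)
        # Keep the intersection only if it covers enough of the smallest stream.
        if smallest and len(common) / smallest > third_ratio:
            candidate = frozenset(common)
            if candidate not in features:  # drop repeated intersections
                features.append(candidate)
    return features
```

Enumerating every combination grows with the product of the per-message stream counts, so this form is only practical for small sample sets.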
Optionally, the processing unit 301 is specifically configured to: acquire any second TCP stream in the data message to be identified; determine the intersection between that second TCP stream and each of the first intersections; determine, as the second intersection, the one of these intersections that contains the largest number of length sequences; and traverse each second TCP stream in the data message to be identified to determine the at least one second intersection.
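The selection of the second intersection can be sketched as follows, again assuming that a TCP stream and a first intersection are both modelled as sets of integer length values. The function names are hypothetical, and returning an empty set when there are no first intersections is an assumption made only so the sketch is total.

```python
def second_intersection(flow, first_intersections):
    """Intersect one second TCP stream with every first intersection and
    return the largest of those intersections."""
    candidates = [set(flow) & set(feature) for feature in first_intersections]
    # max() keeps the first candidate on ties; default covers an empty feature list.
    return max(candidates, key=len, default=set())


def second_intersections(message, first_intersections):
    """Traverse every second TCP stream in the message to be identified."""
    return [second_intersection(flow, first_intersections) for flow in message]
```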
Optionally, the communication unit 302 is specifically configured to: generate a first instruction, where the first instruction instructs the target device to send a data message; send m first instructions to the target device; and receive the m sample data messages from the target device.
Optionally, the length sequence is a length sequence in a field used for characterizing the data content in the first TCP stream or the second TCP stream; the first TCP stream and the second TCP stream are TCP streams used for representing the picture data in the data message.
Optionally, the ratio of the value of the number of length sequences of the intersection meeting the first preset condition to the minimum value of the number of length sequences in the m first TCP flows is greater than or equal to the third ratio.
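The three ratio conditions used above can be written compactly as follows. The notation is an assumption introduced here for illustration and does not appear in the original text: each TCP flow is treated as a finite set of length values, $F_1,\dots,F_m$ are the first TCP flows chosen one per sample message, $I_1,\dots,I_p$ are the first intersections, $G$ is a second TCP flow, and $r_1$, $r_2$, $r_3$ are the first, second and third ratios. The denominator of the last condition (all second TCP flows in the message) is likewise an assumed reading.

```latex
% Assumed notation: every TCP flow is a finite set of length values.
\begin{align*}
I &= F_1 \cap F_2 \cap \dots \cap F_m
  && \text{intersection of one first TCP flow per sample message} \\
\frac{|I|}{\min_{1 \le i \le m} |F_i|} &\ge r_3
  && \text{$I$ satisfies the first preset condition} \\
S(G) &= \operatorname*{arg\,max}_{J \in \{G \cap I_1, \dots, G \cap I_p\}} |J|
  && \text{second intersection of $G$} \\
\frac{|S(G)|}{|G|} &\ge r_1
  && \text{$G$ is a valid TCP flow} \\
\frac{\#\{\text{valid second TCP flows}\}}{\#\{\text{second TCP flows}\}} &\ge r_2
  && \text{message is assigned the sample service type}
\end{align*}
```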
When implemented in hardware, the communication unit 302 in the embodiments of the present application may be integrated on a communication interface, and the processing unit 301 may be integrated on a processor. A specific implementation is shown in fig. 8.
Fig. 8 shows a further possible structural schematic diagram of the device for identifying the traffic type of data involved in the above embodiment. The device for identifying the service type of the data comprises: a processor 402 and a communication interface 403. The processor 402 is configured to control and manage actions of the device that identifies the traffic type of the data, e.g., to perform the steps performed by the processing unit 301 described above, and/or to perform other processes of the techniques described herein. The communication interface 403 is used for supporting communication of the device identifying the traffic type of the data with other network entities, e.g. performing the steps performed by the communication unit 302 described above. The means for identifying the traffic type of the data may further comprise a memory 401 and a bus 404, the memory 401 being adapted to store program codes and data of the means for identifying the traffic type of the data.
Wherein the memory 401 may be a memory or the like in a device that identifies a traffic type of data, which may include a volatile memory, such as a random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, hard disk or solid state disk; the memory may also comprise a combination of the above types of memories.
The processor 402 described above may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor may be a central processing unit, a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor may also be a combination that implements computing functions, e.g., a combination comprising one or more microprocessors, a combination of a digital signal processor (DSP) and a microprocessor, etc.
Bus 404 may be an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, or the like. The bus 404 may be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in fig. 8, but this does not mean that there is only one bus or only one type of bus.
Fig. 9 is a schematic structural diagram of a chip 50 according to an embodiment of the present application. The chip 50 includes one or more processors 501 and a communication interface 503.
Optionally, the chip 50 also includes a memory 504, the memory 504 may include read only memory and random access memory, and provide operating instructions and data to the processor 501. A portion of the memory 504 may also include non-volatile random access memory (NVRAM).
In some implementations, the memory 504 stores the following elements: executable modules or data structures, or a subset thereof, or an extended set thereof.
In the present embodiment, the corresponding operation is performed by calling an operation instruction stored in the memory 504 (the operation instruction may be stored in the operating system).
The processor 501 may implement or execute the various exemplary logic blocks, units and circuits described in connection with the present disclosure. The processor may be a central processing unit, a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor may also be a combination that implements computing functions, e.g., a combination comprising one or more microprocessors, a combination of a digital signal processor (DSP) and a microprocessor, etc.
Memory 504 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, hard disk or solid state disk; the memory may also comprise a combination of the above types of memories.
Bus 502 may be an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus or the like. The bus 502 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one line is shown in fig. 9, but this does not mean that there is only one bus or only one type of bus.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above. The specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which are not described herein.
The present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of identifying a traffic type of data in the method embodiments described above.
The embodiment of the application also provides a computer readable storage medium, in which instructions are stored, which when executed on a computer, cause the computer to execute the method for identifying the service type of the data in the method flow shown in the method embodiment.
The computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (Random Access Memory, RAM), a read-only memory (Read-Only Memory, ROM), an erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM), a register, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing, or any other form of computer readable storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application specific integrated circuit (Application Specific Integrated Circuit, ASIC). In the context of the present application, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Embodiments of the present invention provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform a method of identifying a traffic type of data as described in fig. 3 to 6.
Since the apparatus, the computer readable storage medium, and the computer program product for identifying the service type of the data in the embodiments of the present invention can be applied to the above-mentioned method, the technical effects that can be obtained by the method can also refer to the above-mentioned method embodiments, and the embodiments of the present invention are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units is merely a logical function division, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A method of identifying a traffic type of data, comprising:
obtaining m sample data messages, wherein the m sample data messages are encrypted data messages of the same service type data; each data message of the m sample data messages includes at least one first transmission control protocol TCP flow, each of the first TCP flows including at least one length sequence; the first TCP stream comprises at least one encrypted data block, and the length sequence is the length value of the encrypted data block; m is a positive integer;
Determining the length characteristics of the m sample data messages; the length features comprise at least one first intersection, and the first intersection is a set meeting a first preset condition in a set of length sequences shared among m first TCP streams; the m first TCP flows include one first TCP flow in each of the m sample data packets;
acquiring a data message to be identified, wherein the data message to be identified comprises at least one second TCP stream;
determining at least one second intersection; the second intersection corresponds to a second TCP stream in the data message to be identified, and the second intersection is an intersection with the largest length sequence number in the intersection of the corresponding second TCP stream and each first intersection;
determining effective TCP streams in the data message to be identified; the ratio of the number of the length sequences in the second intersection corresponding to the effective TCP stream to the number of the length sequences in the effective TCP stream is larger than or equal to a first ratio;
if the number ratio of the effective TCP streams in the data message to be identified is larger than or equal to a second ratio, determining that the service type of the data message to be identified is the service type of the m sample data messages;
wherein the determining the length characteristics of the m sample data messages includes:
step 1, selecting one first TCP stream from each data message, and determining m first TCP streams;
step 2, determining intersections among the m first TCP streams; the intersection set between the m first TCP streams comprises n length sequences, wherein n is an integer;
traversing each first TCP stream in the m sample data messages according to the step 1 and the step 2, and determining k intersections; k is a positive integer;
determining that the intersection of the k intersections, in which the ratio of the value of the number of length sequences to the minimum value of the number of length sequences in the m first TCP flows is greater than a third ratio, is the at least one first intersection;
determining the at least one first intersection as a length characteristic of the m sample data messages.
2. The method of claim 1, wherein the determining at least one second intersection comprises:
any second TCP stream in the data message to be identified is acquired;
determining intersections between the any one of the second TCP streams and each of the first intersections, respectively;
determining that the intersection with the largest value of the length sequence number in the intersections between any one of the second TCP streams and each of the first intersections is the second intersection;
traversing each second TCP stream in the data message to be identified, and determining the at least one second intersection.
3. The method according to claim 1 or 2, wherein the obtaining m sample data messages comprises:
generating a first instruction; the first instruction is used for indicating the target equipment to send a data message;
respectively sending m first instructions to the target equipment;
and receiving m sample data messages from the target equipment.
4. The method according to claim 1 or 2, wherein the length sequence is a length sequence in a field used for characterizing data content in the first TCP stream or the second TCP stream;
the first TCP flow and the second TCP flow are TCP flows used for representing the picture data in the data message.
5. The method according to claim 1 or 2, wherein the ratio of the value of the number of length sequences of the intersection satisfying the first preset condition to the minimum value of the number of length sequences in the m first TCP streams is greater than or equal to a third ratio.
6. An apparatus for identifying a traffic type of data, comprising: a communication unit and a processing unit;
the communication unit is used for acquiring m sample data messages, wherein the m sample data messages are encrypted data messages of the same service type data; each data message of the m sample data messages includes at least one first transmission control protocol TCP flow, each of the first TCP flows including at least one length sequence; the first TCP stream comprises at least one encrypted data block, and the length sequence is the length value of the encrypted data block; m is a positive integer;
The processing unit is used for determining the length characteristics of the m sample data messages; the length features comprise at least one first intersection, and the first intersection is a set meeting a first preset condition in a set of length sequences shared among m first TCP streams; the m first TCP flows include one first TCP flow in each of the m sample data packets;
the communication unit is further configured to obtain a data packet to be identified, where the data packet to be identified includes at least one second TCP flow;
the processing unit is further configured to determine at least one second intersection; the second intersection corresponds to a second TCP stream in the data message to be identified, and the second intersection is an intersection with the largest length sequence number in the intersection of the corresponding second TCP stream and each first intersection;
the processing unit is further configured to determine an effective TCP flow in the data packet to be identified; the ratio of the number of the length sequences in the second intersection corresponding to the effective TCP stream to the number of the length sequences in the effective TCP stream is larger than or equal to a first ratio;
the processing unit is further configured to determine that the service type of the data packet to be identified is the service type of the m sample data packets if the number of effective TCP flows in the data packet to be identified is greater than or equal to a second ratio;
The processing unit is specifically configured to: step 1, selecting one first TCP stream from each data message, and determining m first TCP streams;
step 2, determining intersections among the m first TCP streams; the intersection set between the m first TCP streams comprises n length sequences, wherein n is an integer;
traversing each first TCP stream in the m sample data messages according to the step 1 and the step 2, and determining k intersections; k is a positive integer;
determining that the intersection of the k intersections, in which the ratio of the value of the number of length sequences to the minimum value of the number of length sequences in the m first TCP flows is greater than a third ratio, is the at least one first intersection;
determining the at least one first intersection as a length characteristic of the m sample data messages.
7. The apparatus according to claim 6, wherein the processing unit is specifically configured to:
any second TCP stream in the data message to be identified is acquired;
determining intersections between the any one of the second TCP streams and each of the first intersections, respectively;
determining that the intersection with the largest value of the length sequence number in the intersections between any one of the second TCP streams and each of the first intersections is the second intersection;
traversing each second TCP stream in the data message to be identified, and determining the at least one second intersection.
8. An apparatus for identifying a traffic type of data, comprising: a processor and a communication interface; the communication interface being coupled to the processor for running a computer program or instructions to implement a method of identifying a traffic type of data as claimed in any one of claims 1 to 5.
9. A computer readable storage medium having instructions stored therein, wherein when the instructions are executed by a computer, the computer performs the method of identifying a traffic type of data as claimed in any one of claims 1-5.
CN202111131055.2A 2021-09-26 2021-09-26 Method and device for identifying service type of data Active CN113938436B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111131055.2A CN113938436B (en) 2021-09-26 2021-09-26 Method and device for identifying service type of data

Publications (2)

Publication Number Publication Date
CN113938436A CN113938436A (en) 2022-01-14
CN113938436B true CN113938436B (en) 2023-05-26

Family

ID=79276812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111131055.2A Active CN113938436B (en) 2021-09-26 2021-09-26 Method and device for identifying service type of data

Country Status (1)

Country Link
CN (1) CN113938436B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114724069B (en) * 2022-04-09 2023-04-07 北京天防安全科技有限公司 Video equipment model confirming method, device, equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101442541A (en) * 2008-12-30 2009-05-27 北京畅讯信通科技有限公司 Method for recognizing P2P application encipher flux
CN102164049A (en) * 2011-04-28 2011-08-24 中国人民解放军信息工程大学 Universal identification method for encrypted flow
CN103873320A (en) * 2013-12-27 2014-06-18 北京天融信科技有限公司 Encrypted flow rate recognizing method and device
CN109802924A (en) * 2017-11-17 2019-05-24 华为技术有限公司 A kind of method and device identifying encrypting traffic
CN109951347A (en) * 2017-12-21 2019-06-28 华为技术有限公司 Business recognition method, device and the network equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210045125A1 (en) * 2019-10-11 2021-02-11 Intel Corporation Multiplexing transmission types in multiple-panel user equipments

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于DNS和流量特征的业务识别系统设计;阳洋;王明森;沈为;;工业控制计算机(第07期);全文 *

Also Published As

Publication number Publication date
CN113938436A (en) 2022-01-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant