WO2021052379A1 - Data stream type identification method and related device - Google Patents
- Publication number: WO2021052379A1 (application PCT/CN2020/115693)
- Authority: WO, WIPO (PCT)
- Prior art keywords
- data stream
- stream type
- type
- recognition model
- information
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/20—Traffic policing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2475—Traffic characterised by specific attributes, e.g. priority or QoS for supporting traffic characterised by the type of applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2483—Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/40—Support for services or applications
- H04L65/403—Arrangements for multi-party communication, e.g. for conferences
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/762—Media network packet handling at the source
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/80—Responding to QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/16—Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/22—Parsing or analysis of headers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- The present invention relates to the fields of computer technology and communications, further relates to the application of artificial intelligence (AI) in these fields, and in particular relates to a data stream type identification method and related device.
- AI Artificial Intelligence
- The first approach relies on manual identification: rules are configured by hand to match keywords in network traffic (including header data and payload data) in order to identify the application (App) type.
- Manual identification is time-consuming and labor-intensive. Moreover, as enterprise office applications move to the cloud, their deployment becomes dynamic, and their traffic is usually encrypted for security; this makes it very difficult to set identification rules by hand, and therefore to identify the application type.
- The second approach uses offline learning: sample data are collected in advance and labeled manually or with third-party tools, and a model is then trained offline on the labeled samples using machine learning or neural network algorithms.
- The offline-trained model is then used to infer the application types of live network traffic.
- However, once an application is modified or updated, the trained model may no longer be usable. In addition, private applications follow no common implementation specification, so the same type of application at different enterprises requires different models. In other words, when the trained model is used to identify a modified or updated application, or an application of another enterprise, its recognition accuracy may be low.
- The embodiments of the present invention disclose a data stream type identification method and related device that can identify the type of a data stream more accurately.
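- An embodiment of the present application provides a data stream type identification method, which includes:
- obtaining, according to message characteristics of a current data stream and a behavior recognition model, at least one first confidence level of the current data stream corresponding to at least one data stream type, where the behavior recognition model is obtained from the message characteristics and data stream types of multiple data stream samples, and the message characteristics include one or more of message length, message transmission speed, message interval time, and message direction;
- obtaining, according to characteristic information of the current data stream and a content recognition model, at least one second confidence level of the current data stream corresponding to the at least one data stream type, where the characteristic information includes a destination address and a protocol type, the content recognition model is obtained from the characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained by the behavior recognition model; and
- determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.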
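The four message characteristics used by the behavior recognition model can be computed from a flow's packets roughly as below. This is a minimal sketch, not the patented feature extractor: the `Packet` and `message_features` names, the choice of mean values as summaries, and the "uplink" direction convention are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    timestamp: float  # capture time in seconds
    length: int       # message length in bytes
    uplink: bool      # direction: True if client -> server (assumed convention)


def message_features(packets: list[Packet]) -> dict[str, float]:
    """Summarize one flow into the four message characteristics named in the text."""
    if len(packets) < 2:
        raise ValueError("need at least two packets to measure intervals")
    pkts = sorted(packets, key=lambda p: p.timestamp)
    duration = pkts[-1].timestamp - pkts[0].timestamp
    total_bytes = sum(p.length for p in pkts)
    gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
    return {
        "mean_length": total_bytes / len(pkts),                            # message length
        "speed_bps": 8 * total_bytes / duration if duration > 0 else 0.0,  # transmission speed
        "mean_interval": mean(gaps),                                       # interval time
        "uplink_ratio": sum(p.uplink for p in pkts) / len(pkts),           # message direction
    }


# Hypothetical flow: two large downlink packets and one small uplink packet.
flow = [Packet(0.0, 1200, False), Packet(0.5, 1300, False), Packet(1.0, 100, True)]
features = message_features(flow)
```

A real behavior recognition model would likely use richer statistics (distributions, per-direction values); the means here only illustrate where each characteristic comes from.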
- The above method involves a behavior recognition model and a content recognition model. The behavior recognition model is pre-trained on the message characteristics and data stream types of multiple data stream samples, whereas the content recognition model is trained on the characteristic information of data streams together with the data stream types identified by the behavior recognition model; the content recognition model is therefore an online learning model.
- The behavior recognition model can identify the types of some basic (or typical) data streams, but certain characteristics of a given type of data stream (such as message length and message transmission speed) may change as transmission proceeds. The online learning capability of the content recognition model can therefore be used to identify the data stream type from other aspects (such as destination address and protocol type), compensating for the behavior recognition model when it recognizes data stream types during transmission.
- Therefore, by combining the behavior recognition model with the content recognition model, this application improves both the accuracy and the generalization of data stream type identification, and can be applied in scenarios such as cloud deployment of applications, encrypted application traffic, and private applications.
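The online-learning relationship between the two models can be sketched as follows. This is a minimal illustration, not the patented implementation: the `ContentModel` class, its label-counting rule, and the example addresses and type names are all hypothetical, and the labels passed to `learn` stand in for outputs of the behavior recognition model.

```python
from collections import Counter, defaultdict


class ContentModel:
    """Toy online content recognition model keyed on (destination address, protocol).

    Hypothetical stand-in: it only counts the labels it has been taught per key."""

    def __init__(self) -> None:
        self.counts = defaultdict(Counter)  # (dst, proto) -> Counter of stream types

    def learn(self, dst: str, proto: str, stream_type: str) -> None:
        """Online update with one label produced by the behavior recognition model."""
        self.counts[(dst, proto)][stream_type] += 1

    def confidences(self, dst: str, proto: str) -> dict[str, float]:
        """Second confidence levels: share of each type seen for this key."""
        seen = self.counts.get((dst, proto))
        if not seen:
            return {}
        total = sum(seen.values())
        return {t: n / total for t, n in seen.items()}


# Labels as they might arrive from the behavior recognition model over time.
model = ContentModel()
for label in ["video_conference", "video_conference", "desktop_cloud", "video_conference"]:
    model.learn("10.0.0.5", "UDP", label)
```

Because the content model keys on destination address and protocol rather than on timing or size, it can keep recognizing a stream after those behavioral characteristics drift.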
- In a possible implementation, determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level includes:
- calculating a comprehensive confidence level of the current data stream corresponding to a first data stream type according to the first confidence level corresponding to the first data stream type, the weight of the first confidence level, the second confidence level corresponding to the first data stream type, and the weight of the second confidence level, where the first data stream type is any one of the at least one data stream type; and
- if the comprehensive confidence level corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
- A confidence weight is configured for each model: the weight of the first confidence level belongs to the behavior recognition model, and the weight of the second confidence level belongs to the content recognition model. The comprehensive confidence level calculated from both weighted confidence levels therefore better reflects the actual type of the current data stream.
- Introducing the first preset threshold to judge whether the corresponding data stream type should be accepted improves the efficiency and accuracy of determining the data stream type.
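The weighted fusion and first-threshold test can be sketched as below. The text says only that the comprehensive confidence is calculated from the two confidence levels and their weights; the weighted sum used here, the weights 0.6/0.4, the threshold 0.7, and the rule of picking the highest-scoring type are assumptions for illustration.

```python
def comprehensive_confidence(first: dict[str, float], second: dict[str, float],
                             w1: float, w2: float) -> dict[str, float]:
    """Per-type weighted sum of the two models' confidence levels (assumed form)."""
    types = set(first) | set(second)
    return {t: w1 * first.get(t, 0.0) + w2 * second.get(t, 0.0) for t in types}


def determine_type(first, second, w1=0.6, w2=0.4, t1=0.7):
    """Return (stream type or None, its comprehensive confidence); w1, w2, t1 are hypothetical."""
    combined = comprehensive_confidence(first, second, w1, w2)
    if not combined:
        return None, 0.0
    best = max(combined, key=combined.get)  # highest-scoring candidate type
    score = combined[best]
    return (best if score > t1 else None), score


behavior = {"video_conference": 0.9, "desktop_cloud": 0.1}  # first confidence levels
content = {"video_conference": 0.8, "desktop_cloud": 0.2}   # second confidence levels
stream_type, score = determine_type(behavior, content)
```

Here `video_conference` scores about 0.86 and clears the threshold; a type whose combined score stays at or below the first preset threshold is left undetermined.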
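- In a possible implementation, the method further includes:
- if the comprehensive confidence level corresponding to the first data stream type is less than a second preset threshold, sending the characteristic information of the current data stream and the information of the first data stream type to a device, where the second preset threshold is greater than the first preset threshold;
- receiving first information sent by the device, where the first information is obtained by the device according to the characteristic information of the current data stream and the information of the first data stream type; and
- updating the content recognition model according to the first information to obtain a new content recognition model.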
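- In another possible implementation, the method further includes:
- if the comprehensive confidence level corresponding to the first data stream type is less than a second preset threshold, updating the content recognition model according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.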
- The inventors use the identification result for the current data stream to correct the content recognition model. Specifically, a second preset threshold is introduced: when the comprehensive confidence level corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is sent to the device for training to obtain a new content recognition model, which makes subsequent determinations more accurate.
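The interplay of the two preset thresholds can be sketched as follows. The text does not spell out how a score below the first threshold is handled; this sketch assumes feedback is sent only for types that were accepted (score above the first threshold) but fell short of the second, so retraining targets streams the models agree on only weakly. The function name and threshold values are hypothetical.

```python
def decide_with_feedback(combined: dict[str, float], t1: float, t2: float):
    """Apply both preset thresholds to the comprehensive confidence levels.

    Returns (decided type or None, whether to feed this stream back for retraining).
    Assumption: feedback is sent only for accepted types whose score is below t2."""
    if not t2 > t1:
        raise ValueError("the second preset threshold must exceed the first")
    if not combined:
        return None, False
    best = max(combined, key=combined.get)
    score = combined[best]
    if score <= t1:
        return None, False      # type not determined
    return best, score < t2     # accepted; retrain if only weakly confident
```

For example, with `t1=0.6` and `t2=0.9`, a score of 0.75 yields an accepted type plus a retraining request, while 0.95 yields an accepted type with no feedback.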
- In a possible implementation, updating the content recognition model according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model includes:
- if a first record among multiple records contains the characteristic information of the current data stream, updating the data stream type in the first record to the first data stream type to obtain a second record, where each of the multiple records includes characteristic information and a data stream type; and
- training on the multiple records, including the second record, to obtain a new content recognition model.
- Updating the data stream type in the record to the first data stream type of the current data stream mainly serves to accommodate the elastic deployment of cloud resources. For example, the same cloud resource may be used for video conferencing in one period and for desktop cloud in the next; with the above method, the data stream type is updated to desktop cloud in the later period, so the type of the current data stream can still be identified accurately under elastic deployment of cloud resources.
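The record update and retraining steps, applied to the elastic-deployment example, might look like this. It is a sketch under stated assumptions: `Record`, `update_records`, and `train` are hypothetical names, appending a record when none matches is an assumption (the text covers only the matching case), and the "training" is a toy per-key label share rather than a real learner.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    dst: str          # destination address (characteristic information)
    proto: str        # protocol type (characteristic information)
    stream_type: str  # data stream type label


def update_records(records: list[Record], dst: str, proto: str, new_type: str) -> None:
    """Relabel the record matching the current stream, or append one (assumed)."""
    for rec in records:
        if (rec.dst, rec.proto) == (dst, proto):
            rec.stream_type = new_type  # the "first record" becomes the "second record"
            return
    records.append(Record(dst, proto, new_type))


def train(records: list[Record]) -> dict[tuple[str, str], dict[str, float]]:
    """Retrain a toy content model: per-key share of each stream type."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[(rec.dst, rec.proto)][rec.stream_type] += 1
    return {key: {t: n / sum(c.values()) for t, n in c.items()} for key, c in counts.items()}


# Same cloud resource: video conferencing earlier, desktop cloud now.
records = [Record("10.0.0.5", "UDP", "video_conference"), Record("10.0.0.9", "TCP", "web")]
update_records(records, "10.0.0.5", "UDP", "desktop_cloud")
model = train(records)
```

Overwriting the label instead of accumulating it lets the retrained model follow the resource's new role immediately.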
- In a possible implementation, after the data stream type of the current data stream is determined according to the at least one first confidence level and the at least one second confidence level, the method further includes:
- sending information of the data stream type of the current data stream to the operations support system (OSS), where the information is used by the OSS to generate a flow control policy for the current data stream.
- Notifying the OSS of the data stream type of the current data stream allows the OSS to generate a flow control policy for the current data stream based on that type. For example, when the first data stream type of the current data stream is the video stream of a video conference, the corresponding flow control policy is defined as a priority transmission policy: when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.
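A priority transmission policy of the kind described can be sketched with a priority queue. The policy table, its priority values, and the `transmission_order` function are hypothetical; the text does not say how the OSS encodes or enforces its policies.

```python
import heapq
from itertools import count

# Hypothetical policy table an OSS might derive from identified stream types;
# lower value = transmitted earlier.
PRIORITY = {"video_conference": 0, "desktop_cloud": 1}
DEFAULT_PRIORITY = 9


def transmission_order(streams: list[tuple[str, str]]) -> list[str]:
    """Order waiting streams (stream_id, stream_type) by policy priority, FIFO within a level."""
    seq = count()  # tie-breaker keeps arrival order for equal priorities
    heap = [(PRIORITY.get(stype, DEFAULT_PRIORITY), next(seq), sid) for sid, stype in streams]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

With streams `s1` (web), `s2` (video conference), and `s3` (desktop cloud) waiting, the video conference stream `s2` is sent first.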
- In a possible implementation, the message length includes one or more of the Ethernet frame length, the IP length, the transport protocol length, and the header length of a message, and the transport protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).
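The length fields listed above can be read from a raw frame roughly as follows. This sketch assumes an untagged Ethernet II frame carrying IPv4, captured without the frame check sequence; interpreting "transport protocol length" as the IP total length minus the IP header, and "header length" as the sum of the Ethernet, IP, and TCP/UDP headers, are also assumptions.

```python
import struct


def length_fields(frame: bytes) -> dict[str, int]:
    """Length-type message characteristics of an Ethernet II / IPv4 frame.

    Assumes an untagged frame (no VLAN) captured without the FCS."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    ip = 14                                        # IPv4 header follows the 14-byte Ethernet header
    ihl = (frame[ip] & 0x0F) * 4                   # IP header length in bytes
    (ip_total,) = struct.unpack_from("!H", frame, ip + 2)  # IP total length field
    proto = frame[ip + 9]
    l4 = ip + ihl
    if proto == 6:                                 # TCP: data offset is the high nibble of byte 12
        l4_header = (frame[l4 + 12] >> 4) * 4
    elif proto == 17:                              # UDP: fixed 8-byte header
        l4_header = 8
    else:
        raise ValueError("transport protocol is neither TCP nor UDP")
    return {
        "ethernet_frame_length": len(frame),
        "ip_length": ip_total,
        "transport_length": ip_total - ihl,        # TCP/UDP header + payload
        "header_length": 14 + ihl + l4_header,     # Ethernet + IP + TCP/UDP headers
    }
```

VLAN tags, IPv6, and IP options beyond what the IHL field reports would need extra handling in practice.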
- An embodiment of the present application provides a data stream type identification method, the method including:
- receiving characteristic information of a current data stream and information of a data stream type that are sent by a device;
- obtaining first information according to the characteristic information of the current data stream and the information of the data stream type; and
- sending the first information to the device, where the first information is used to update a content recognition model, and the content recognition model is used to obtain at least one second confidence level corresponding to at least one data stream type. The content recognition model is obtained from the characteristic information and data stream types of one or more historical data streams, the data stream types of the historical data streams are obtained by a behavior recognition model, and the behavior recognition model is obtained from the message characteristics and data stream types of multiple data stream samples, where the message characteristics include one or more of message length, message transmission speed, message interval time, and message direction.
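On the receiving side, the exchange might be handled as below. The text does not define the content of the "first information"; this sketch assumes it is a set of updated per-type confidences for the reported destination address and protocol, and the `FeedbackServer` class and JSON message layout are hypothetical.

```python
import json
from collections import Counter, defaultdict


class FeedbackServer:
    """Hypothetical "other device" that collects reported streams and answers
    with first information. The JSON message layout is an assumption."""

    def __init__(self) -> None:
        self.samples = defaultdict(Counter)  # (dst, proto) -> Counter of reported types

    def handle(self, message: str) -> str:
        req = json.loads(message)            # characteristic info + data stream type
        key = (req["dst"], req["proto"])
        self.samples[key][req["stream_type"]] += 1
        seen = self.samples[key]
        total = sum(seen.values())
        # First information: updated per-type confidences for this feature key,
        # which the requesting device can merge into its content recognition model.
        return json.dumps({"dst": key[0], "proto": key[1],
                           "confidences": {t: n / total for t, n in seen.items()}})
```

Centralizing the reports this way would let one server aggregate feedback from many identifying devices, though whether the patent intends that is not stated.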
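- An embodiment of the present application provides a data stream type identification device, including a memory and a processor, where the memory is used to store a computer program, and the processor invokes the computer program to perform the following operations:
- obtaining, according to message characteristics of a current data stream and a behavior recognition model, at least one first confidence level of the current data stream corresponding to at least one data stream type, where the behavior recognition model is obtained from the message characteristics and data stream types of multiple data stream samples, and the message characteristics include one or more of message length, message transmission speed, message interval time, and message direction;
- obtaining, according to characteristic information of the current data stream and a content recognition model, at least one second confidence level of the current data stream corresponding to the at least one data stream type, where the characteristic information includes a destination address and a protocol type, the content recognition model is obtained from the characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained by the behavior recognition model; and
- determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.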
- In a possible implementation, determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level is specifically:
- calculating a comprehensive confidence level of the current data stream corresponding to a first data stream type according to the first confidence level corresponding to the first data stream type, the weight of the first confidence level, the second confidence level corresponding to the first data stream type, and the weight of the second confidence level, where the first data stream type is any one of the at least one data stream type; and
- if the comprehensive confidence level corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
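- In a possible implementation, the device further includes a transceiver, and the processor is further configured to:
- if the comprehensive confidence level corresponding to the first data stream type is less than a second preset threshold, send the characteristic information of the current data stream and the information of the first data stream type to another device through the transceiver, where the second preset threshold is greater than the first preset threshold;
- receive, through the transceiver, first information sent by the other device, where the first information is obtained by the other device according to the characteristic information of the current data stream and the information of the first data stream type; and
- update the content recognition model according to the first information to obtain a new content recognition model.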
- In another possible implementation, the processor is further configured to:
- if the comprehensive confidence level corresponding to the first data stream type is less than a second preset threshold, update the content recognition model according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
- In a possible implementation, updating the content recognition model according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model is specifically:
- if a first record among multiple records contains the characteristic information of the current data stream, updating the data stream type in the first record to the first data stream type to obtain a second record, where each of the multiple records includes characteristic information and a data stream type; and
- training on the multiple records, including the second record, to obtain a new content recognition model.
- In a possible implementation, the device further includes a transceiver, and the processor is further configured to: after determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level, send information of the data stream type of the current data stream to the operations support system (OSS) through the transceiver, where the information is used by the OSS to generate a flow control policy for the current data stream.
- In a possible implementation, the message length includes one or more of the Ethernet frame length, the IP length, the transport protocol length, and the header length of a message, and the transport protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).
- An embodiment of the present application provides a data stream type identification device, including a memory, a processor, and a transceiver, where the memory is used to store a computer program, and the processor invokes the computer program to perform the following operations:
- receiving, through the transceiver, characteristic information of a current data stream and information of a data stream type that are sent by another device;
- obtaining first information according to the characteristic information of the current data stream and the information of the data stream type; and
- sending the first information to the other device through the transceiver, where the first information is used to update a content recognition model, and the content recognition model is used to obtain at least one second confidence level corresponding to at least one data stream type. The content recognition model is obtained from the characteristic information and data stream types of one or more historical data streams, the data stream types of the historical data streams are obtained by a behavior recognition model, and the behavior recognition model is obtained from the message characteristics and data stream types of multiple data stream samples, where the message characteristics include one or more of message length, message transmission speed, message interval time, and message direction.
- An embodiment of the present application provides a data stream type identification device, which includes:
- a first recognition unit, configured to obtain, according to message characteristics of a current data stream and a behavior recognition model, at least one first confidence level of the current data stream corresponding to at least one data stream type, where the behavior recognition model is obtained from the message characteristics and data stream types of multiple data stream samples, and the message characteristics include one or more of message length, message transmission speed, message interval time, and message direction;
- a second recognition unit, configured to obtain, according to characteristic information of the current data stream and a content recognition model, at least one second confidence level of the current data stream corresponding to the at least one data stream type, where the characteristic information includes a destination address and a protocol type, the content recognition model is obtained from the characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained by the behavior recognition model; and
- a determining unit, configured to determine the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.
- In a possible implementation, in determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level, the determining unit is specifically configured to:
- calculate a comprehensive confidence level of the current data stream corresponding to a first data stream type according to the first confidence level corresponding to the first data stream type, the weight of the first confidence level, the second confidence level corresponding to the first data stream type, and the weight of the second confidence level, where the first data stream type is any one of the at least one data stream type; and
- if the comprehensive confidence level corresponding to the first data stream type is greater than a first preset threshold, determine that the data stream type of the current data stream is the first data stream type.
- In a possible implementation, the device further includes:
- a first sending unit, configured to send the characteristic information of the current data stream and the information of the first data stream type to another device when the comprehensive confidence level corresponding to the first data stream type is less than a second preset threshold, where the second preset threshold is greater than the first preset threshold;
- a receiving unit, configured to receive first information sent by the other device, where the first information is obtained by the other device according to the characteristic information of the current data stream and the information of the first data stream type; and
- an updating unit, configured to update the content recognition model according to the first information to obtain a new content recognition model.
- the third possible implementation manner of the fifth aspect further includes:
- an update unit configured to, when the comprehensive confidence level corresponding to the first data stream type is less than a second preset threshold, update the content recognition model according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
- the inventors of the present application use the identification result of the data stream type of the current data stream to correct the content recognition model. Specifically, a second preset threshold is introduced: when the comprehensive confidence level corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is sent to the device responsible for training to obtain a new content recognition model, making subsequent determination results more accurate.
- updating the content recognition model according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model is specifically:
- updating the data stream type in a first record of a plurality of records to the first data stream type to obtain a second record, where each of the plurality of records includes characteristic information and a data stream type;
- training on the plurality of records, including the second record, to obtain the new content recognition model.
- updating the data stream type in the record to the first data stream type of the current data stream is mainly intended to adapt to elastic deployment of cloud resources. For example, the same cloud resource may be used for video conferencing in one period and for desktop cloud in the next; with the above method, the data stream type can be updated to desktop cloud in the next period, so that the data stream type of the current data stream can still be accurately identified under elastic deployment of cloud resources.
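The record update and retraining described above can be sketched as follows. This is a minimal illustration under stated assumptions: records are matched on the tuple (destination IP, destination port, protocol number), the `Record` class and helper names are hypothetical, and "training" is replaced by an exact-match lookup table rather than the classification model the text describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One training record: characteristic information plus a data stream type."""
    dst_ip: str
    dst_port: int
    protocol: int
    stream_type: str


def update_records(records, features, new_type):
    """Return a new list in which every record whose characteristic information
    equals `features` is relabelled with `new_type` (the "second record").
    Assumption: if no record matches, a new record is appended."""
    updated, matched = [], False
    for r in records:
        if (r.dst_ip, r.dst_port, r.protocol) == features:
            updated.append(Record(*features, new_type))
            matched = True
        else:
            updated.append(r)
    if not matched:
        updated.append(Record(*features, new_type))
    return updated


def train_lookup_model(records):
    """Hypothetical stand-in for training: an exact-match lookup table."""
    return {(r.dst_ip, r.dst_port, r.protocol): r.stream_type for r in records}
```

For the elastic-deployment case, a resource first labelled `"video_conference"` would be relabelled `"desktop_cloud"` by `update_records(records, ("10.129.74.5", 8443, 6), "desktop_cloud")` before retraining; the original list is left unchanged.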
- the device further includes:
- the second sending unit is configured to, after the determining unit determines the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level, send information of the data stream type of the current data stream to the operations support system (OSS), where the information is used by the OSS to generate a flow control policy for the current data stream.
- notifying the OSS of the data stream type of the current data stream enables the OSS to generate a flow control policy for the current data stream based on that type. For example, when the data stream type of the current data stream is the video stream of a video conference, the corresponding flow control policy is defined as a priority transmission policy; that is, when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.
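A priority transmission policy of this kind can be sketched with a priority queue. The type names, priority numbers, and the rule that unknown types are sent last are hypothetical; the sketch only shows how an OSS-generated priority could order pending data streams.

```python
import heapq
from itertools import count

# Hypothetical flow control policy: a lower number means a higher priority.
POLICY = {"video_conference": 0, "voice_conference": 1, "desktop_cloud": 2}
DEFAULT_PRIORITY = 9  # assumption: streams of unknown type are transmitted last


def transmission_order(streams, policy=POLICY):
    """Return stream ids in the order they would be transmitted.

    `streams` is a list of (stream_id, stream_type) pairs; streams with equal
    priority keep their arrival order thanks to the monotonic sequence number.
    """
    heap, seq = [], count()
    for stream_id, stream_type in streams:
        prio = policy.get(stream_type, DEFAULT_PRIORITY)
        heapq.heappush(heap, (prio, next(seq), stream_id))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```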
- the message length includes one or more of the Ethernet frame length, IP length, transport protocol length, and header length of the message, and the transport protocol includes the transmission control protocol (TCP) and/or the user datagram protocol (UDP).
- an embodiment of the present application provides a data stream type identification device, which includes:
- the receiving unit is used to receive the characteristic information of the current data stream and the information of the data stream type sent by other devices;
- a generating unit is configured to generate first information according to the characteristic information of the current data stream and the information of the data stream type;
- the sending unit is configured to send the first information to the other device for updating a content recognition model, where the content recognition model is used to obtain at least one second confidence level corresponding to at least one data stream type, and
- the content recognition model is a model obtained based on the characteristic information and data stream types of one or more historical data streams, the data stream types of the historical data streams are obtained according to the behavior recognition model, and the behavior recognition model is a model obtained based on the message characteristics and data stream types of multiple data stream samples, where the message characteristics include one or more of message length, message transmission speed, message interval time, and message direction.
- the above device involves a behavior recognition model and a content recognition model.
- the behavior recognition model is pre-trained on the message characteristics and data stream types of multiple data stream samples, and the content recognition model is trained on the characteristic information of data streams and the data stream types identified by the behavior recognition model, so the content recognition model is an online learning model. The behavior recognition model can identify the data stream types of some basic (or typical) data streams, but certain characteristics of data streams of the same type (such as message length and message transmission speed) may change during subsequent transmission. Therefore, the online learning capability of the content recognition model can be used to identify the data stream type from other aspects (such as destination address and protocol type).
- this can hedge against the errors that the behavior recognition model makes in recognizing the data stream type of a data stream during transmission. Therefore, this application combines the behavior recognition model with the content recognition model, which improves the accuracy of data stream type recognition as well as its generalization, and can be applied to scenarios such as cloud deployment of applications, encrypted application transmission, and private applications.
- an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when run on a processor, implements the method described in the first aspect or any possible implementation of the first aspect.
- an embodiment of the present application provides a computer program product stored in a memory which, when run on a processor, implements the method described in the first aspect or any possible implementation of the first aspect.
- an embodiment of the present application provides a data stream type identification system.
- the system includes a first device and a second device, where the second device is the data stream type identification device described in the third aspect or any possible implementation of the third aspect, or in the fifth aspect or any possible implementation of the fifth aspect; and the first device is the data stream type identification device described in the fourth aspect or any possible implementation of the fourth aspect, or in the sixth aspect or any possible implementation of the sixth aspect.
- the behavior recognition model and the content recognition model are involved.
- the behavior recognition model is obtained by pre-training on the message characteristics and data stream types of multiple data stream samples, and the content recognition model is trained on the characteristic information of data streams and the data stream types recognized by the behavior recognition model, so the content recognition model is an online learning model. The behavior recognition model can identify the types of some basic (or typical) data streams, but certain characteristics of data streams of the same type (such as message length and message transmission speed) may change during subsequent transmission. Using the online learning capability of the content recognition model to identify the data stream type from other aspects (such as destination address and protocol type) therefore hedges against the errors that the behavior recognition model makes in recognizing data stream types during transmission. By combining the behavior recognition model with the content recognition model, this application improves both the accuracy and the generalization of data stream type recognition, and can be applied to scenarios such as cloud deployment of applications, encrypted application transmission, and private applications.
- FIG. 1 is a schematic structural diagram of a data stream type identification system provided by an embodiment of the present invention.
- FIG. 2A is a schematic diagram of a scenario involving the content recognition model and the behavior recognition model provided by an embodiment of the present invention.
- FIG. 2B is a schematic structural diagram of a classification model provided by an embodiment of the present invention.
- FIG. 2C is a schematic structural diagram of a classification model provided by an embodiment of the present invention.
- FIG. 2D is a schematic structural diagram of a classification model provided by an embodiment of the present invention.
- FIG. 3 is a schematic flowchart of a data stream type identification method provided by an embodiment of the present invention.
- FIG. 4 is a schematic diagram of scenes of data stream a and data stream b provided by an embodiment of the present invention.
- FIG. 5 is a schematic diagram of an example of data stream recording provided by an embodiment of the present invention.
- FIG. 6 is a schematic diagram of an example of data stream recording provided by an embodiment of the present invention.
- FIG. 7 is a schematic structural diagram of a second device provided by an embodiment of the present invention.
- FIG. 8 is a schematic structural diagram of a first device provided by an embodiment of the present invention.
- FIG. 9 is a schematic structural diagram of yet another second device provided by an embodiment of the present invention.
- FIG. 10 is a schematic structural diagram of still another first device according to an embodiment of the present invention.
- Figure 1 is a schematic structural diagram of a data stream type identification system provided by an embodiment of the present invention.
- the system includes an operations support system (OSS) 101, a server 102, a forwarding device 103, and a terminal 104.
- the terminals 104 are used to run various applications, such as video conferencing applications, voice conferencing applications, desktop cloud applications, etc.
- the data stream types (also called application types) of the data streams generated by different applications are often different. In the embodiment of the present application, the data stream generated by the terminal 104 needs to be sent to the destination device through the forwarding device 103 first.
- the forwarding device 103 may include routers, switches, and the like, and there may be one or more forwarding devices 103: for example, one router and three switches, only one switch, or three switches.
- the aforementioned server 102 may be one server or a server cluster composed of multiple servers.
- how a data stream generated by the terminal 104 is sent by the terminal 104 and forwarded by the forwarding device 103 can be determined according to the flow control policy generated by the OSS 101.
- for example, if the flow control policy specifies that data streams generated by a video conference application have the highest priority, the data streams generated by the video conference application are transmitted first.
- It should be noted that the flow control policy is generated by the OSS 101 based on the data stream type of the current data stream.
- the data flow type of the current data flow used by the OSS 101 to generate the flow control policy is determined by the second device.
- the second device needs to use a behavior recognition model and a content recognition model when determining the data stream type of the current data stream.
- the parameters of the model may include, but are not limited to, the confidence weight vector (w1, w2) and the first preset threshold and second preset threshold of each data stream type. The first preset threshold may also be called a classification threshold and is used to judge whether a data stream is classified into a given type; the second preset threshold is also called a model update threshold and is used to judge when to update the content recognition model.
- when recognizing the data stream type of the current data stream, the input of the content recognition model can include characteristic information (such as destination IP, destination port, and protocol type), and the input of the behavior recognition model can include message information (such as message length, message transmission speed, message interval time, and message direction).
- the second device combines the confidence obtained from the content recognition model and the confidence obtained from the behavior recognition model according to the confidence weight vector (w1, w2) to obtain the final data stream type; in this process, if the second preset threshold indicates that the content recognition model needs to be updated, the parameters needed to update the content recognition model are obtained.
- the parameters required for the update may be obtained by the second device by training on information related to some data streams, or may be obtained by the first device by training on such information and then sent to the second device.
- the second device may be the aforementioned OSS 101, server 102, or forwarding device 103; likewise, the first device may be the aforementioned OSS 101, server 102, or forwarding device 103.
- the first device and the second device may be the same device or different devices.
- in some cases, the server 102 does not exist in the architecture shown in FIG. 1.
- the above content recognition model is essentially a classification model.
- the classification model can be a tree model, as shown in Figure 2C, and the classification model can also be a neural network model, as shown in Figure 2D.
- the classification model can also be a support vector machine (SVM) model, and the classification model can also be other forms of models.
- the content recognition model performs classification by extracting features of the input vector (such as the destination Internet protocol (IP) address, destination port number, protocol type, and other characteristic information).
- for example, the content recognition model can identify data streams with the same destination IP address, the same protocol type, and the same port number as the same data stream type, and can identify data streams in the same network segment with the same protocol type and similar port numbers as the same data stream type; the destination port number is 20 (a well-known port number of the file transfer protocol (FTP)).
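The matching rules above can be illustrated with a small heuristic. The /24 network segment and the 10-port tolerance for "similar" port numbers are hypothetical values chosen for illustration; the text does not specify them, and a trained content recognition model would learn such relationships rather than hard-code them.

```python
import ipaddress


def same_type_candidate(a, b, prefix_len=24, port_gap=10):
    """Heuristically decide whether two flows may share a data stream type.

    `a` and `b` are (dst_ip, dst_port, protocol) tuples. prefix_len and
    port_gap are hypothetical parameters, not values from the application.
    """
    ip_a, port_a, proto_a = a
    ip_b, port_b, proto_b = b
    if proto_a != proto_b:
        return False
    # Rule 1: same destination IP, protocol, and port.
    if ip_a == ip_b and port_a == port_b:
        return True
    # Rule 2: same network segment, same protocol, similar port number.
    segment = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    same_segment = ipaddress.ip_address(ip_b) in segment
    return same_segment and abs(port_a - port_b) <= port_gap
```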
- some information about the current data stream that the second device needs to use when identifying the data stream type of the current data stream can be sent to the second device by the terminal 104 or other devices, or collected by the second device itself.
- Figure 3 shows a data stream type identification method provided by an embodiment of the present invention.
- the method can be implemented based on the architecture shown in Figure 1.
- the method includes but is not limited to the following steps:
- Step S301: The second device obtains, according to the message characteristics of the current data stream and the behavior recognition model, at least one first confidence level of the current data stream corresponding to at least one data stream type.
- the behavior recognition model is a model obtained based on the message characteristics and data stream types of multiple data stream samples; optionally, the multiple data stream samples may be offline samples, that is, the behavior recognition model may be a model trained offline.
- the multiple data stream samples may also be pre-selected typical (or representative) samples.
- for example, the messages of a video conference application data stream usually have relatively long message lengths, although occasionally some messages are relatively short. By comparison, longer message lengths better reflect that the current data stream belongs to a video conference application. Therefore, when selecting data streams of the video conference application as samples, representative data streams with longer message lengths are preferred.
- the data stream types of the multiple data stream samples can be regarded as known, that is, they are labeled manually.
- because the behavior recognition model is obtained based on the message characteristics and data stream types of multiple data stream samples, it reflects some relationships between the message characteristics of a data stream and its data stream type. Therefore, when the message characteristics of the current data stream are input into the behavior recognition model, it can predict, to a certain extent, the tendency (or probability) that the current data stream belongs to one or more data stream types; the parameter reflecting this tendency (or probability) is also called confidence.
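To make the idea of a confidence concrete, the sketch below trains a deliberately simple stand-in for the behavior recognition model: a nearest-centroid classifier over manually labelled samples, with a softmax over negative distances producing per-type confidences that sum to 1. This is a swapped-in technique for illustration, not the model used in the application; feature vectors are assumed to be pre-scaled to comparable ranges, and all names are hypothetical.

```python
import math
from statistics import mean


def train_centroids(samples):
    """samples: list of (feature_vector, stream_type) pairs with manual labels.
    Returns one centroid (per-feature mean vector) for each stream type."""
    by_type = {}
    for vec, label in samples:
        by_type.setdefault(label, []).append(vec)
    return {label: [mean(col) for col in zip(*vecs)] for label, vecs in by_type.items()}


def first_confidences(centroids, vec, temperature=1.0):
    """Softmax over negative Euclidean distances: a closer centroid gives a
    higher confidence. Subtracting the max score keeps exp() numerically stable."""
    scores = {label: -math.dist(vec, c) / temperature for label, c in centroids.items()}
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}
```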
- the message characteristics may include one or more of message length, message transmission speed, message interval time, and message direction.
- the message length includes one or more of the Ethernet frame length, IP length, transport protocol length, and header length of the message, and the transport protocol includes the transmission control protocol (TCP) and/or the user datagram protocol (UDP).
- the message characteristics may also include other features, such as the maximum, minimum, mean, variance, and quantiles of the message length, message transmission speed, message interval time, and message direction.
- the message feature can be input into the behavior recognition model in the form of a vector, for example, it can be in the form of (message length, message transmission speed, message interval time).
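The message characteristics listed above could be assembled into a feature vector roughly as follows. The function name, the particular statistics, and the use of the median as the example quantile are assumptions; message lengths follow the FIG. 4 sign convention (positive for uplink, negative for downlink), and transmission speed is taken here as bytes per second.

```python
from statistics import mean, median, pvariance


def message_features(lengths, timestamps):
    """Build a behavior-feature vector from one data stream's messages.

    lengths: signed message lengths (> 0 uplink, < 0 downlink).
    timestamps: arrival times in seconds, in the same order.
    """
    if len(lengths) != len(timestamps) or len(lengths) < 2:
        raise ValueError("need at least two messages with matching timestamps")
    sizes = [abs(x) for x in lengths]
    gaps = [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]
    duration = timestamps[-1] - timestamps[0]
    return [
        max(sizes),
        min(sizes),
        mean(sizes),
        pvariance(sizes),
        median(sizes),  # one example quantile (the 50th percentile)
        sum(sizes) / duration if duration > 0 else 0.0,  # bytes per second
        mean(gaps),  # mean message interval in seconds
        sum(1 for x in lengths if x > 0) / len(lengths),  # share of uplink messages
    ]
```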
- the data stream type in the embodiment of the present application may also be referred to as an application type.
- there may be N data stream types, where N is greater than or equal to 1. This embodiment of the application can estimate (or predict) the confidence that the current data stream belongs to each of the N data stream types, that is, obtain N first confidence levels of the current data stream corresponding to the N data stream types.
- for example, if the N data stream types are the video conference data stream type, the voice conference data stream type, and the desktop cloud data stream type, the behavior recognition model is used to estimate the first confidence that the current data stream belongs to each of these three types. If the N data stream types refer only to the video conference data stream type, the behavior recognition model is used to estimate only the first confidence that the current data stream belongs to the video conference data stream type.
- alternatively, the embodiment of this application focuses on one of multiple data stream types, and therefore estimates (or predicts) only the confidence that the current data stream belongs to that focused type, that is, obtains the confidence of the current data stream corresponding to one data stream type. For example, if the multiple data stream types are the video conference, voice conference, and desktop cloud data stream types but the embodiment focuses only on the video conference type, it is only necessary to estimate, through the behavior recognition model, the first confidence that the current data stream belongs to the video conference data stream type.
- Step S302: The second device obtains, according to the characteristic information of the current data stream and the content recognition model, at least one second confidence level of the current data stream corresponding to the at least one data stream type.
- the content recognition model is a model obtained based on the characteristic information and data stream types of one or more historical data streams.
- the one or more historical data streams may be online data streams, that is, one or more data streams continuously generated during a preceding period of time, and the data stream types of the historical data streams are recognized by the above behavior recognition model; that is, the content recognition model may be a model obtained through online training.
- because the content recognition model is obtained based on the characteristic information and data stream types of one or more historical data streams, it reflects some relationships between the characteristic information of a data stream and its data stream type. Therefore, when the characteristic information of the current data stream is input into the content recognition model, it can predict, to a certain extent, the tendency (or probability) that the current data stream belongs to one or more data stream types; the parameter reflecting this tendency (or probability) is also called confidence.
- the feature information may include one or more of the destination address, protocol type, and port number.
- the destination address may be an IP address, a destination MAC address, or an address in another form;
- the feature information can include other features in addition to the features exemplified here.
- the feature information here may be destination-side information, for example, the destination IP and destination port.
- the feature information can be input into the content recognition model as a vector of the form (ip, port, protocol), for example (10.29.74.5, 8443, 6), or of the form (mac, port, protocol), for example (05FA1525EEFF, 8443, 6). Other forms are also possible and are not enumerated here.
- the embodiment of this application can estimate (or predict) the confidence that the current data stream belongs to each of the N data stream types, that is, obtain N second confidence levels of the current data stream corresponding to the N data stream types.
- for example, if the N data stream types are the video conference, voice conference, and desktop cloud data stream types, the content recognition model is used to estimate the second confidence that the current data stream belongs to each of these three types. If the N data stream types refer only to the video conference data stream type, the content recognition model is used to estimate only the second confidence that the current data stream belongs to the video conference data stream type.
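The (ip, port, protocol) form can be turned into numbers before being fed to a model. This sketch, with hypothetical function names, converts the IP address to an integer with Python's standard `ipaddress` module, maps protocol names to their IANA protocol numbers (TCP is 6 and UDP is 17, which is why the example vectors end in 6), and parses a hexadecimal MAC string for the (mac, port, protocol) variant.

```python
import ipaddress

PROTOCOL_NUMBERS = {"TCP": 6, "UDP": 17}  # IANA assigned protocol numbers


def content_feature_vector(dst_ip, dst_port, protocol):
    """Encode characteristic information as an (ip, port, protocol) tuple of ints.
    `protocol` may be a name such as "TCP" or an already-numeric value."""
    ip_int = int(ipaddress.ip_address(dst_ip))
    proto = PROTOCOL_NUMBERS[protocol] if isinstance(protocol, str) else int(protocol)
    return (ip_int, int(dst_port), proto)


def mac_to_int(mac):
    """Variant for the (mac, port, protocol) form, e.g. "05FA1525EEFF"."""
    return int(mac, 16)
```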
- alternatively, the embodiment of this application focuses on one of multiple data stream types, and therefore estimates (or predicts) only the confidence that the current data stream belongs to that focused type, that is, obtains the confidence of the current data stream corresponding to one data stream type. For example, if the multiple data stream types are the video conference, voice conference, and desktop cloud data stream types but the embodiment focuses only on the video conference type, it is only necessary to estimate, through the content recognition model, the second confidence that the current data stream belongs to the video conference data stream type.
- Step S303: The second device determines the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.
- the at least one first confidence level characterizes, to a certain extent, the data stream type tendency of the current data stream, and so does the at least one second confidence level. Considering both together yields a more accurate and credible data stream type tendency, from which the data stream type of the current data stream is obtained.
- for example, the data stream type with the largest comprehensive confidence level may be determined as the data stream type of the current data stream. Suppose the comprehensive confidence level of the video conference data stream type, determined from its first and second confidence levels, is 0.7; that of the voice conference data stream type, determined from its first and second confidence levels, is 0.2; and that of the desktop cloud data stream type, determined from its first and second confidence levels, is 0.1. Since the comprehensive confidence level of the video conference data stream type is the largest, the data stream type of the current data stream is determined to be the video conference data stream type.
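The selection rule in this example amounts to an argmax over the comprehensive confidence levels. The sketch below, with hypothetical type names, returns the winning type together with the full ranking.

```python
def pick_type(comprehensive):
    """Return the type with the largest comprehensive confidence level and the
    full ranking (highest first) as a list of (type, confidence) pairs."""
    ranking = sorted(comprehensive.items(), key=lambda kv: kv[1], reverse=True)
    return ranking[0][0], ranking


# Values from the example above.
best, ranking = pick_type({"video_conference": 0.7, "voice_conference": 0.2, "desktop_cloud": 0.1})
```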
- determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level may specifically be: calculating the comprehensive confidence level corresponding to the first data stream type according to the first confidence level corresponding to the first data stream type, the weight of the first confidence level, the second confidence level corresponding to the first data stream type, and the weight of the second confidence level, where the first data stream type is any one of the at least one data stream type, that is, every data stream type in the at least one data stream type can play the role of the first data stream type here.
- if the comprehensive confidence level corresponding to the first data stream type is greater than the first preset threshold, the data stream type of the current data stream is determined to be the first data stream type. For example, if the comprehensive confidence level corresponding to the video conference data stream type is greater than the first preset threshold, the data stream type of the current data stream is determined to be the video conference data stream type; if the comprehensive confidence level corresponding to the desktop cloud data stream type is greater than the first preset threshold, the data stream type of the current data stream is determined to be the desktop cloud data stream type.
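The calculation in Step S303 is not written out as a formula here; a weighted sum is one natural reading and reproduces the 0.3 and 0.54 values quoted for data streams a and b when the behavior recognition model is weighted 0.6 and the content recognition model 0.4. The sketch below assumes that weighted sum (function names are hypothetical) and returns every type whose comprehensive confidence exceeds the first preset threshold, highest first.

```python
def comprehensive_confidence(first, second, w_first, w_second):
    """Assumed weighted sum per type: w_first * first + w_second * second.

    first: behavior-model (first) confidences, second: content-model (second)
    confidences, both dicts keyed by data stream type; a missing type counts as 0.
    """
    types = set(first) | set(second)
    return {t: w_first * first.get(t, 0.0) + w_second * second.get(t, 0.0) for t in types}


def classify(first, second, w_first, w_second, threshold1):
    """Types whose comprehensive confidence exceeds the first preset threshold."""
    comp = comprehensive_confidence(first, second, w_first, w_second)
    return sorted((t for t, c in comp.items() if c > threshold1), key=comp.get, reverse=True)
```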
- for example, assume that the confidence weight vector (w1, w2) is (0.4, 0.6), where 0.6 is the first confidence weight (which can also be regarded as the weight of the behavior recognition model) and 0.4 is the second confidence weight (which can also be regarded as the weight of the content recognition model), and that the first preset threshold of the data stream type is 0.5.
- in FIG. 4, the horizontal axis represents the message sequence number within the data stream and the vertical axis represents the message length; a length greater than 0 denotes an uplink message and a length less than 0 denotes a downlink message.
- Both data stream a and data stream b are data stream types of the desktop cloud.
- data stream b has uplink messages, which are relatively more representative of the desktop cloud scenario; therefore, the message behavior of data stream b is considered typical behavior.
- data stream a has no uplink messages for a long period, so it cannot clearly be identified as a desktop cloud scenario, and its message behavior is considered atypical. The behavior recognition model is usually built from data streams with typical behavior.
- the behavior recognition model can identify the data stream type of data stream b, but cannot identify the data stream type of data stream a.
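The distinction between typical and atypical behavior can be illustrated with a simple check for uplink messages. The window size and the rule that a window with no uplink message is "atypical" are hypothetical, chosen only to mirror the description of data stream a; message lengths use the FIG. 4 sign convention.

```python
def has_uplink(lengths, start, end):
    """True if any message in lengths[start:end] is uplink (positive length)."""
    return any(x > 0 for x in lengths[start:end])


def atypical_windows(lengths, window):
    """Start indices of non-overlapping windows that contain no uplink message."""
    return [i for i in range(0, len(lengths), window) if not has_uplink(lengths, i, i + window)]
```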
- the characteristic information of data stream a and data stream b is as follows.
- the protocol type of data stream a is TCP, the destination IP address is 10.129.74.5, and the destination port number is 8443.
- the protocol type of data stream b is TCP, the destination IP address is 10.129.56.39, and the destination port number is 443.
- For data stream a: the second confidence levels that the content recognition model assigns to the desktop cloud, voice conference, and video conference data stream types are all 0, and, based on the message characteristics, the behavior recognition model assigns a first confidence of 0.5 to the desktop cloud type, 0 to the voice conference type, and 0 to the video conference type. With the behavior recognition model weighted 0.6 and the content recognition model weighted 0.4, the comprehensive confidence levels of the three data stream types are 0.6 × 0.5 + 0.4 × 0 = 0.3 for desktop cloud, 0 for voice conference, and 0 for video conference.
- For data stream b: the second confidence levels that the content recognition model assigns to the desktop cloud, voice conference, and video conference data stream types are all 0, and, based on the message characteristics, the behavior recognition model assigns a first confidence of 0.9 to the desktop cloud type, 0 to the voice conference type, and 0 to the video conference type. With the same weights, the comprehensive confidence levels of the three data stream types are 0.6 × 0.9 + 0.4 × 0 = 0.54 for desktop cloud, 0 for voice conference, and 0 for video conference.
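The comprehensive confidence levels for data streams a and b can be reproduced numerically. The sketch assumes a weighted sum with the behavior recognition model weighted 0.6 and the content recognition model weighted 0.4, which yields the 0.3 and 0.54 quoted in Solution 1; the type names are hypothetical labels.

```python
# Assumed weights: behavior model (first confidence) 0.6, content model (second confidence) 0.4.
W_BEHAVIOR, W_CONTENT = 0.6, 0.4
THRESHOLD_1 = 0.5  # first preset threshold from the example

TYPES = ("desktop_cloud", "voice_conference", "video_conference")


def comprehensive(first, second):
    """Weighted sum per type, rounded to hide floating-point noise."""
    return {t: round(W_BEHAVIOR * first[t] + W_CONTENT * second[t], 10) for t in TYPES}


content_conf = dict.fromkeys(TYPES, 0.0)  # content model outputs 0 for every type
stream_a = comprehensive({"desktop_cloud": 0.5, "voice_conference": 0.0, "video_conference": 0.0}, content_conf)
stream_b = comprehensive({"desktop_cloud": 0.9, "voice_conference": 0.0, "video_conference": 0.0}, content_conf)

classified_a = stream_a["desktop_cloud"] > THRESHOLD_1  # False: 0.3 does not exceed 0.5
classified_b = stream_b["desktop_cloud"] > THRESHOLD_1  # True: 0.54 exceeds 0.5
```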
- the content recognition model can also be updated. Two different update solutions are provided below.
- Solution 1: If the comprehensive confidence corresponding to the first data stream type is greater than a first preset threshold θ1 and less than a second preset threshold θ2, the second device sends the characteristic information of the current data stream and the information of the first data stream type to the first device, where the second preset threshold is greater than the first preset threshold. For example, for data stream a, the comprehensive confidence of 0.3 for the desktop cloud data stream type is not within the interval (θ1, θ2), so the characteristic information of the current data stream and the information of the desktop cloud data stream type do not need to be sent to the first device.
- For data stream b, by contrast, the comprehensive confidence of 0.54 for the desktop cloud data stream type is within the interval (θ1, θ2), so the characteristic information of the current data stream (such as destination IP address 10.129.56.39, destination port number 443, and protocol type TCP) and the information of the desktop cloud data stream type (such as its name and identifier) need to be sent to the first device.
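Solution 1 reduces to an interval test on the comprehensive confidence. A minimal Python sketch follows; θ1 = 0.5 and θ2 = 0.8 are hypothetical values chosen only so that 0.3 lies outside and 0.54 inside (θ1, θ2), since this passage does not give the actual thresholds.

```python
# Sketch of the Solution 1 reporting rule: the second device reports the
# current data stream to the first device only when the comprehensive
# confidence of the identified type lies strictly inside (theta1, theta2).
# ASSUMPTION: theta1 = 0.5 and theta2 = 0.8 are hypothetical values.
THETA1 = 0.5
THETA2 = 0.8


def should_report(comprehensive: float,
                  theta1: float = THETA1, theta2: float = THETA2) -> bool:
    """Return True if the stream's features and type should be sent for retraining."""
    if not theta1 < theta2:
        raise ValueError("the second preset threshold must exceed the first")
    return theta1 < comprehensive < theta2


print(should_report(0.3))   # data stream a: False
print(should_report(0.54))  # data stream b: True
```

Streams at or above θ2 are treated as confidently identified, and streams at or below θ1 are not identified as that type at all, so neither case produces a training sample.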
- After the first device receives the characteristic information of the current data stream and the information of the first data stream type sent by the second device, it holds one more data stream record; as shown in Figure 5, a record for data stream b is added. The first device then obtains the first information according to the characteristic information of the current data stream and the information of the first data stream type; the first information is, for example, a set of parameters that affect how the confidence is calculated.
- The first information is part of a model file, so it can be transmitted as a model file.
- For example, an AI model file of the common open-source Keras library is an .h5 or .json file;
- an AI model file of the open-source scikit-learn (sklearn) library is a .pkl or .m file.
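Because the first information can travel as a model file, the exchange between the two devices amounts to serialize, transmit, and deserialize. The sketch below uses Python's standard `pickle` module (the format behind the .pkl files mentioned above). The `model_params` lookup table is a hypothetical stand-in for the parameters of a real Keras or scikit-learn content recognition model.

```python
import pickle

# Hypothetical "first information": parameters of a content recognition
# model, reduced here to a lookup table from (destination IP, destination
# port, protocol) to data stream type.
records = [(("10.129.56.39", 443, "TCP"), "desktop_cloud")]  # record for data stream b
model_params = {"table": dict(records)}

# First device: serialize the parameters into model-file bytes
# (the same bytes could be written to a .pkl file and transmitted).
first_information = pickle.dumps(model_params)

# Second device: deserialize the received bytes and adopt them as its model.
updated_params = pickle.loads(first_information)
print(updated_params["table"][("10.129.56.39", 443, "TCP")])  # desktop_cloud
```

Only unpickle model files from a trusted peer: `pickle.loads` can execute arbitrary code embedded in untrusted data.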
- the second device receives the first information sent by the first device, and then updates the content recognition model according to the first information to obtain a new content recognition model.
- When the updated content recognition model re-evaluates the input data of data stream b (destination IP address 10.129.56.39, destination port number 443, protocol type TCP), the estimated second confidence for the desktop cloud data stream type is 1.
- The characteristic information of data stream a is similar to that of data stream b: the destination IP addresses are in the same network segment, the port numbers are similar, and the protocol type is the same. Therefore, when the updated content recognition model evaluates data stream a, its estimate will be closer to the estimate for data stream b.
- For data stream a, the estimated second confidence for the desktop cloud data stream type may then be 0.6.
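One way a retrained content recognition model could generalize from data stream b to the similar data stream a is a nearest-neighbour style similarity over the characteristic information. The Python sketch below is purely illustrative: the `similarity` function (protocol must match; shared leading IPv4 octets and port equality weighted 0.5 each) is an assumption, and its score of 0.25 for data stream a is not meant to reproduce the 0.6 quoted above.

```python
import ipaddress

# Hypothetical nearest-neighbour content model: the second confidence for a
# type is the highest similarity between the current stream's features and
# any training record labelled with that type.


def similarity(x: tuple, r: tuple) -> float:
    """Similarity in [0, 1] between two (IPv4, port, protocol) feature tuples."""
    ip_x, port_x, proto_x = x
    ip_r, port_r, proto_r = r
    if proto_x != proto_r:
        return 0.0
    a = ipaddress.IPv4Address(ip_x).packed
    b = ipaddress.IPv4Address(ip_r).packed
    shared = 0  # number of identical leading octets (0-4)
    for oa, ob in zip(a, b):
        if oa != ob:
            break
        shared += 1
    ip_score = shared / 4
    port_score = 1.0 if port_x == port_r else 0.0
    return 0.5 * ip_score + 0.5 * port_score


def second_confidence(x: tuple, records: list) -> dict:
    """Per-type maximum similarity to the labelled training records."""
    scores: dict = {}
    for features, stream_type in records:
        scores[stream_type] = max(scores.get(stream_type, 0.0), similarity(x, features))
    return scores


records = [(("10.129.56.39", 443, "TCP"), "desktop_cloud")]  # data stream b
print(second_confidence(("10.129.56.39", 443, "TCP"), records))  # {'desktop_cloud': 1.0}
print(second_confidence(("10.129.74.5", 8443, "TCP"), records))  # {'desktop_cloud': 0.25}
```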
- Assume that the confidence weight vector (w1, w2), the first preset threshold θ1, and the second preset threshold θ2 remain unchanged.
- For data stream a, the content recognition model now outputs a second confidence of 0.6 for the desktop cloud data stream type and 0 for both the voice conference and video conference data stream types.
- Based on message characteristics, the behavior recognition model outputs a first confidence of 0.5 for the desktop cloud data stream type and 0 for both the voice conference and video conference data stream types. The comprehensive confidences for these three data stream types are therefore 0.54 (desktop cloud), 0 (voice conference), and 0 (video conference).
- Because the comprehensive confidence of 0.54 for the desktop cloud data stream type is within the interval (θ1, θ2), the characteristic information of the current data stream and the information of the desktop cloud data stream type need to be sent to the first device so that the content recognition model can be updated later (the update principle has been described above and is not repeated here).
- For data stream b, the content recognition model outputs a second confidence of 1 for the desktop cloud data stream type and 0 for both the voice conference and video conference data stream types.
- Based on message characteristics, the behavior recognition model outputs a first confidence of 0.9 for the desktop cloud data stream type and 0 for both the voice conference and video conference data stream types. The comprehensive confidences for these three data stream types are then computed in the same way.
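With weights w1 = 0.6 and w2 = 0.4 (inferred from the worked figures in this example, since they are not stated in this passage), and writing c1 and c2 for the first and second confidences, the desktop cloud comprehensive confidences after the update work out as:

$$
\begin{aligned}
C_a &= w_1 c_{1,a} + w_2 c_{2,a} = 0.6 \times 0.5 + 0.4 \times 0.6 = 0.54,\\
C_b &= w_1 c_{1,b} + w_2 c_{2,b} = 0.6 \times 0.9 + 0.4 \times 1 = 0.94.
\end{aligned}
$$

C_a matches the value of 0.54 used for data stream a. Whether data stream b is reported again depends on whether 0.94 lies below θ2, which this passage does not specify.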
- Now suppose that the cloud resource with destination IP address 10.129.56.39, destination port number 443, and protocol type TCP is changed from providing services for the desktop cloud to providing services for video conferences.
- The first information obtained by the first device according to the characteristic information of the current data stream and the information of the first data stream type may be produced as follows: if the characteristic information of a first record among multiple records is the same as the characteristic information of the current data stream but the data stream type of the first record differs from the first data stream type, the data stream type of the first record is updated to the first data stream type to obtain a second record, where each of the multiple records includes characteristic information and a data stream type; the multiple records including the second record are then trained on to obtain the first information.
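The relabel-then-retrain step can be sketched in Python as below. Records are hypothetical `(features, data_stream_type)` pairs, and `train` is a placeholder that just builds a lookup table where a real first device would retrain its model. The newly received record is appended, and any existing record with identical characteristic information but a different type is relabelled.

```python
# Sketch of how the first device could derive the first information when a
# cloud resource is repurposed (elastic deployment).


def add_and_relabel(records: list, features: tuple, new_type: str) -> list:
    """Relabel records whose features equal `features`, then append the new record."""
    # A "first record" with matching features but another type becomes a "second record".
    updated = [(f, new_type if f == features else t) for f, t in records]
    updated.append((features, new_type))  # the record just received for the current stream
    return updated


def train(records: list) -> dict:
    """Hypothetical training step: map each feature tuple to its data stream type."""
    return {f: t for f, t in records}


records = [(("10.129.56.39", 443, "TCP"), "desktop_cloud")]  # record for data stream b
# Data stream c: same features, now identified as a video conference stream.
records = add_and_relabel(records, ("10.129.56.39", 443, "TCP"), "video_conference")
first_information = train(records)
print(first_information)
```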
- Data stream c has protocol type TCP, destination IP address 10.129.56.39, and destination port number 443.
- For data stream c, the content recognition model outputs a second confidence of 1 for the desktop cloud data stream type and 0 for both the voice conference and video conference data stream types.
- Based on message characteristics, the behavior recognition model outputs a first confidence of 0 for the desktop cloud data stream type, 0 for the voice conference data stream type, and 0.9 for the video conference data stream type. The comprehensive confidences for these three data stream types are then computed in the same way.
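For data stream c, assuming the same weights w1 = 0.6 and w2 = 0.4 (inferred from the worked figures in this example rather than stated in the source), the comprehensive confidences are:

$$
\begin{aligned}
C_{\text{desk}} &= 0.6 \times 0 + 0.4 \times 1 = 0.4,\\
C_{\text{video}} &= 0.6 \times 0.9 + 0.4 \times 0 = 0.54,\\
C_{\text{voice}} &= 0.6 \times 0 + 0.4 \times 0 = 0.
\end{aligned}
$$

Video conference therefore has the highest comprehensive confidence even though the content recognition model still favours desktop cloud, which is consistent with the record for data stream c being sent to the first device for retraining.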
- After the first device receives the characteristic information of the current data stream and the information of the first data stream type sent by the second device, it holds one more data stream record; as shown in Figure 6, a record for data stream c is added.
- Solution 2: If the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, the second device does not need to send the characteristic information of the current data stream and the information of the first data stream type to the first device. Instead, it updates the content recognition model itself based on the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
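- Specifically, if the characteristic information of a first record among multiple records is the same as the characteristic information of the current data stream but the data stream type of the first record differs from the first data stream type, the second device updates the data stream type of the first record to the first data stream type to obtain a second record, where each record in the multiple records includes characteristic information and a data stream type.
- The second device then trains on the multiple records including the second record to obtain a new content recognition model.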
- For the specific principle, refer to Solution 1 above; in Solution 2 the second device performs the operations that the first device performs in Solution 1.
- Step S304: The second device sends the data stream type information of the current data stream to the operation and maintenance support system (OSS).
- The second device may send the data stream type information of the current data stream to the OSS each time it determines that type. For example, it sends the data stream type information of data stream a to the OSS when the type of data stream a is determined for the first time, and the data stream type information of data stream b when the type of data stream b is determined for the first time; it sends the information for data stream a again when that type is determined for the second time, and likewise for data stream b; and it sends the data stream type information of data stream c when the type of data stream c is determined. If the second device is itself the OSS, step S304 does not need to be performed.
- Step S305: The OSS generates a flow control policy for the current data stream according to the information of its data stream type. For example, if this information indicates that the current data stream is of the video desktop cloud data stream type or the video conference data stream type, the current data stream is assigned high-priority QoS.
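Step S305 can be sketched as a mapping from data stream type to a flow control policy. The `FlowControlPolicy` fields, the type names, and the two-level priority scheme below are hypothetical; only the rule that video desktop cloud and video conference streams get high-priority QoS comes from the text.

```python
from dataclasses import dataclass

# Hypothetical OSS-side sketch: map the identified data stream type of a
# flow to a flow control policy.
HIGH_PRIORITY_TYPES = {"video_desktop_cloud", "video_conference"}


@dataclass(frozen=True)
class FlowControlPolicy:
    flow_id: str
    priority: str  # "high" or "normal"


def generate_policy(flow_id: str, stream_type: str) -> FlowControlPolicy:
    """Assign high-priority QoS to the stream types the OSS treats as high priority."""
    priority = "high" if stream_type in HIGH_PRIORITY_TYPES else "normal"
    return FlowControlPolicy(flow_id, priority)


print(generate_policy("c", "video_conference"))
print(generate_policy("x", "voice_conference"))
```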
- Step S306: The OSS sends the flow control policy to the forwarding device or terminal.
- The forwarding device 103 may be, for example, a router or a switch, and the terminal 104 is the device that outputs the current data stream. If the forwarding device or terminal learns from the flow control policy that the current data stream has high-priority QoS, then whenever multiple data streams are waiting to be sent, it sends the current data stream, configured as high priority, first.
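On the forwarding device or terminal, sending high-priority streams first is a priority queue. A Python sketch using `heapq`, with an arrival counter so that streams of equal priority keep first-in-first-out order (that tie-break, and the stream names, are assumptions not stated in the source):

```python
import heapq
import itertools

# Sketch of a forwarding device or terminal honouring the flow control
# policy: queued data streams configured as high priority are sent first.
PRIORITY_RANK = {"high": 0, "normal": 1}  # lower rank is sent earlier


class PriorityScheduler:
    def __init__(self) -> None:
        self._heap: list = []
        self._arrival = itertools.count()  # tie-breaker preserving FIFO order

    def enqueue(self, stream: str, priority: str) -> None:
        heapq.heappush(self._heap, (PRIORITY_RANK[priority], next(self._arrival), stream))

    def send_order(self) -> list:
        """Drain the queue and return streams in the order they would be sent."""
        order = []
        while self._heap:
            _, _, stream = heapq.heappop(self._heap)
            order.append(stream)
        return order


scheduler = PriorityScheduler()
scheduler.enqueue("bulk backup", "normal")
scheduler.enqueue("video conference (stream c)", "high")
scheduler.enqueue("web browsing", "normal")
order = scheduler.send_order()
print(order)  # ['video conference (stream c)', 'bulk backup', 'web browsing']
```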
- the method described in Figure 3 involves a behavior recognition model and a content recognition model.
- The behavior recognition model is pre-trained on the message characteristics and data stream types of multiple data stream samples, whereas the content recognition model is trained on the characteristic information of data streams together with the data stream types identified by the behavior recognition model, which makes the content recognition model an online learning model.
- The behavior recognition model can identify some basic (or typical) data stream types, but certain characteristics of data streams of the same type, such as message length and message transmission speed, may change during subsequent transmission. Using the online learning capability of the content recognition model to identify the data stream type from other aspects (such as destination address and protocol type) therefore hedges against recognition errors made by the behavior recognition model during data stream transmission.
- By combining the behavior recognition model and the content recognition model, this application improves the accuracy of data stream type recognition as well as its generalization, so that it can be applied to various scenarios such as cloud deployment of applications, encrypted application transmission, and private applications.
- FIG. 7 is a schematic structural diagram of a data stream type identification device 70 according to an embodiment of the present invention.
- The device 70 may be the second device in the method embodiment shown in FIG. 3 and may include a first recognition unit 701, a second recognition unit 702, and a determining unit 703, each of which is described in detail as follows.
- The first recognition unit 701 is configured to obtain, according to the message characteristics of the current data stream and a behavior recognition model, at least one first confidence that the current data stream corresponds to at least one data stream type, where the behavior recognition model is a model obtained from the message characteristics and data stream types of multiple data stream samples, and the message characteristics include one or more of message length, message transmission speed, message interval time, and message direction;
- The second recognition unit 702 is configured to obtain, according to the characteristic information of the current data stream and a content recognition model, at least one second confidence that the current data stream corresponds to the at least one data stream type, where the characteristic information includes the destination address and the protocol type, the content recognition model is a model obtained from the characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained according to the behavior recognition model;
- the determining unit 703 is configured to determine the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.
- the above-mentioned equipment involves a behavior recognition model and a content recognition model.
- The behavior recognition model is pre-trained on the message characteristics and data stream types of multiple data stream samples, and the content recognition model is trained on the characteristic information of data streams and the data stream types identified by the behavior recognition model, so the content recognition model is an online learning model. The behavior recognition model can identify the data stream types of some basic (or typical) data streams, but certain characteristics of data streams of the same type (such as message length and message transmission speed) may change during subsequent transmission. The online learning capability of the content recognition model can therefore be used to identify the data stream type from other aspects (such as destination address and protocol type), hedging against recognition errors made by the behavior recognition model during data stream transmission.
- By combining the behavior recognition model and the content recognition model, this application improves the accuracy of data stream type recognition as well as its generalization, and can be applied to various scenarios such as cloud deployment of applications, encrypted application transmission, and private applications.
- the determining unit 703 is configured to determine the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level, specifically:
- calculating a comprehensive confidence corresponding to a first data stream type from the first confidence corresponding to the first data stream type, the weight of the first confidence, the second confidence corresponding to the first data stream type, and the weight of the second confidence, where the first data stream type is any one of the at least one data stream type; and
- if the comprehensive confidence corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
- Confidence weights are configured for the two models (the weight configured for the behavior recognition model is the weight of the first confidence, and the weight configured for the content recognition model is the weight of the second confidence), so the comprehensive confidence calculated from the two weighted confidences better reflects the actual type of the current data stream.
- Introducing the first preset threshold to judge whether the corresponding data stream type should be accepted improves the efficiency and accuracy of determining the data stream type.
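The determining unit's rule can be sketched as follows. Several details are assumptions for illustration only: picking the highest-scoring type when more than one exceeds θ1, returning `None` when none does, the weights (0.6, 0.4) inferred from the worked example, and θ1 = 0.5.

```python
from typing import Optional

# Sketch of the determining unit: compute each type's comprehensive
# confidence and accept the best type only if it exceeds theta1.


def determine_type(first: dict, second: dict, w1: float = 0.6, w2: float = 0.4,
                   theta1: float = 0.5) -> Optional[str]:
    """Return the identified data stream type, or None if no type exceeds theta1."""
    types = set(first) | set(second)
    scores = {t: w1 * first.get(t, 0.0) + w2 * second.get(t, 0.0) for t in types}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > theta1 else None


# Data stream c after the cloud resource switched to video conferencing.
first_c = {"desktop_cloud": 0.0, "voice_conference": 0.0, "video_conference": 0.9}
second_c = {"desktop_cloud": 1.0, "voice_conference": 0.0, "video_conference": 0.0}
print(determine_type(first_c, second_c))  # video_conference
```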
- the device 70 further includes:
- a first sending unit, configured to send the characteristic information of the current data stream and the information of the first data stream type to another device (the first device in the method embodiment shown in FIG. 3) when the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, where the second preset threshold is greater than the first preset threshold;
- a receiving unit, configured to receive first information sent by the other device (the first device in the method embodiment shown in FIG. 3), where the first information is obtained by the other device according to the characteristic information of the current data stream and the identification information of the first data stream type;
- the update unit is used to update the content recognition model according to the first information to obtain a new content recognition model.
- the device 70 further includes:
- an update unit, configured to, when the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, update the content recognition model according to the characteristic information of the current data stream and the first data stream type information to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
- The inventor of the present application uses the recognition result of the data stream type of the current data stream to correct the content recognition model. Specifically, a second preset threshold is introduced: when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is sent to a device for training to obtain a new content recognition model, making the next determination more accurate.
- the updating the content recognition model according to the characteristic information of the current data stream and the first data stream type information to obtain a new content recognition model is specifically:
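- if the characteristic information of a first record among multiple records is the same as the characteristic information of the current data stream but the data stream type of the first record differs from the first data stream type, updating the data stream type of the first record to the first data stream type to obtain a second record, where each record in the multiple records includes characteristic information and a data stream type; and
- training on the multiple records including the second record to obtain a new content recognition model.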
- Updating the data stream type in a record to the first data stream type of the current data stream mainly serves to adapt to elastic deployment of cloud resources. For example, the same cloud resource may be used for video conferencing in one period and for the desktop cloud in the next; the above method updates the data stream type to desktop cloud in the later period, so that the data stream type of the current data stream can still be identified accurately under elastic deployment of cloud resources.
- the device 70 further includes:
- a second sending unit, configured to send the information of the data stream type of the current data stream to the operation and maintenance support system (OSS) after the determining unit determines the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence, where this information is used by the OSS to generate a flow control policy for the current data stream.
- Notifying the OSS of the data stream type of the current data stream allows the OSS to generate a flow control policy for the current data stream based on that type. For example, when the first data stream type of the current data stream is the video stream of a video conference, the corresponding flow control policy is defined as a priority transmission policy: when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.
- The message length includes one or more of the Ethernet frame length, IP length, transport protocol length, and header length of the message, where the transport protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).
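As an illustration of the message characteristics listed above, the sketch below summarizes a flow's messages into a mean length, a transmission speed, a mean interval time, and an uplink ratio as a direction feature. The `Packet` fields and the choice of summary statistics are assumptions; the text does not specify how these characteristics are aggregated.

```python
from dataclasses import dataclass
from statistics import mean

# Sketch: derive message characteristics (length, transmission speed,
# interval time, direction) from the observed messages of one data stream.


@dataclass
class Packet:
    timestamp: float  # seconds
    ip_length: int    # bytes, IP total length (one of the length variants)
    uplink: bool      # True if sent from the terminal towards the server


def message_features(packets: list) -> dict:
    """Summarize at least one packet into per-stream message characteristics."""
    if not packets:
        raise ValueError("need at least one packet")
    packets = sorted(packets, key=lambda p: p.timestamp)
    duration = packets[-1].timestamp - packets[0].timestamp
    total_bytes = sum(p.ip_length for p in packets)
    intervals = [b.timestamp - a.timestamp for a, b in zip(packets, packets[1:])]
    return {
        "mean_length": mean(p.ip_length for p in packets),
        "bytes_per_second": total_bytes / duration if duration > 0 else 0.0,
        "mean_interval": mean(intervals) if intervals else 0.0,
        "uplink_ratio": sum(p.uplink for p in packets) / len(packets),
    }


pkts = [Packet(0.0, 1200, False), Packet(0.5, 1200, False), Packet(1.0, 100, True)]
features = message_features(pkts)
print(features)
```

A stream with only downlink messages, like data stream a during its long idle periods, would have an uplink ratio of 0 under this sketch.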
- For the implementation of each unit, refer also to the corresponding description of the method embodiment shown in FIG. 3.
- FIG. 8 is a schematic structural diagram of a data stream type identification device 80 provided by an embodiment of the present invention.
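- The device 80 may be the first device in the method embodiment shown in FIG. 3, and includes a receiving unit 801, a generating unit 802, and a sending unit 803, described in detail as follows.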
- the receiving unit 801 is configured to receive the characteristic information of the current data stream and the information of the data stream type sent by other devices (the second device in the method embodiment shown in FIG. 3);
- the generating unit 802 is configured to generate first information according to the characteristic information of the current data stream and the information of the data stream type;
- The sending unit 803 is configured to send the first information to the other device (the second device in the method embodiment shown in FIG. 3) to update a content recognition model, where the content recognition model is used to obtain at least one second confidence corresponding to at least one data stream type; the content recognition model is a model obtained from the characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained according to a behavior recognition model; the behavior recognition model is a model obtained from the message characteristics and data stream types of multiple data stream samples, and the message characteristics include one or more of message length, message transmission speed, message interval time, and message direction.
- a behavior recognition model and a content recognition model are involved.
- The behavior recognition model is pre-trained on the message characteristics and data stream types of multiple data stream samples, and the content recognition model is trained on the characteristic information of data streams and the data stream types identified by the behavior recognition model, so the content recognition model is an online learning model. The behavior recognition model can identify the data stream types of some basic (or typical) data streams, but certain characteristics of data streams of the same type (such as message length and message transmission speed) may change during subsequent transmission. The online learning capability of the content recognition model is therefore used to identify the data stream type from other aspects (such as destination address and protocol type).
- this application adopts the recognition method that combines the behavior recognition model and the content recognition model to improve the accuracy of the recognition of the data stream type of the data stream. It also improves the generalization of identification, and can be applied to various scenarios such as application cloud deployment, application transmission encryption, and private applications.
- For the implementation of each unit, refer also to the corresponding description of the method embodiment shown in FIG. 3.
- FIG. 9 shows a device 90 provided by an embodiment of the present invention.
- the device 90 may be the second device in the method embodiment shown in FIG. 3.
- The device 90 includes a processor 901, a memory 902, and a transceiver 903.
- the processor 901, the memory 902, and the transceiver 903 are connected to each other through a bus.
- The memory 902 includes but is not limited to random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and is used to store related computer programs and data.
- the transceiver 903 is used to receive and send data.
- The processor 901 may be one or more central processing units (CPUs). When the processor 901 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
- the processor 901 reads the computer program code stored in the memory 902, and is used to perform the following operations:
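- obtaining, according to the message characteristics of the current data stream and a behavior recognition model, at least one first confidence that the current data stream corresponds to at least one data stream type, where the behavior recognition model is a model obtained from the message characteristics and data stream types of multiple data stream samples, and the message characteristics include one or more of message length, message transmission speed, message interval time, and message direction;
- obtaining, according to the characteristic information of the current data stream and a content recognition model, at least one second confidence that the current data stream corresponds to the at least one data stream type, where the content recognition model is a model obtained from the characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained according to the behavior recognition model; and
- determining the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence.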
- the behavior recognition model and the content recognition model are involved.
- The behavior recognition model is pre-trained on the message characteristics and data stream types of multiple data stream samples, and the content recognition model is trained on the characteristic information of data streams and the data stream types identified by the behavior recognition model, so the content recognition model is an online learning model. The behavior recognition model can identify the data stream types of some basic (or typical) data streams, but certain characteristics of data streams of the same type (such as message length and message transmission speed) may change during subsequent transmission. The online learning capability of the content recognition model can therefore be used to identify the data stream type from other aspects (such as destination address and protocol type), hedging against recognition errors made by the behavior recognition model during data stream transmission.
- By combining the behavior recognition model and the content recognition model, this application improves the accuracy of data stream type recognition as well as its generalization, and can be applied to various scenarios such as cloud deployment of applications, encrypted application transmission, and private applications.
- the determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level is specifically:
- calculating a comprehensive confidence corresponding to a first data stream type from the first confidence corresponding to the first data stream type, the weight of the first confidence, the second confidence corresponding to the first data stream type, and the weight of the second confidence, where the first data stream type is any one of the at least one data stream type; and
- if the comprehensive confidence corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
- Confidence weights are configured for the two models (the weight configured for the behavior recognition model is the weight of the first confidence, and the weight configured for the content recognition model is the weight of the second confidence), so the comprehensive confidence calculated from the two weighted confidences better reflects the actual type of the current data stream.
- Introducing the first preset threshold to judge whether the corresponding data stream type should be accepted improves the efficiency and accuracy of determining the data stream type.
- the processor is further configured to:
- when the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, send, through the transceiver, the characteristic information of the current data stream and the information of the first data stream type to another device (the first device in the method embodiment shown in FIG. 3), where the second preset threshold is greater than the first preset threshold;
- receive, through the transceiver, first information sent by the other device (the first device in the method embodiment shown in FIG. 3), where the first information is obtained by the other device according to the characteristic information of the current data stream and the identification information of the first data stream type;
- the content recognition model is updated according to the first information to obtain a new content recognition model.
- the processor is further configured to:
- when the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, update the content recognition model according to the characteristic information of the current data stream and the first data stream type information to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
- The inventor of the present application uses the recognition result of the data stream type of the current data stream to correct the content recognition model. Specifically, a second preset threshold is introduced: when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is sent to a device for training to obtain a new content recognition model, making the next determination more accurate.
- the updating the content recognition model according to the characteristic information of the current data stream and the first data stream type information to obtain a new content recognition model is specifically:
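- if the characteristic information of a first record among multiple records is the same as the characteristic information of the current data stream but the data stream type of the first record differs from the first data stream type, updating the data stream type of the first record to the first data stream type to obtain a second record, where each record in the multiple records includes characteristic information and a data stream type; and
- training on the multiple records including the second record to obtain a new content recognition model.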
- Updating the data stream type in a record to the first data stream type of the current data stream mainly serves to adapt to elastic deployment of cloud resources. For example, the same cloud resource may be used for video conferencing in one period and for the desktop cloud in the next; the above method updates the data stream type to desktop cloud in the later period, so that the data stream type of the current data stream can still be identified accurately under elastic deployment of cloud resources.
- The processor is further configured to: after determining the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence, send the information of the data stream type of the current data stream to the operation and maintenance support system (OSS) through the transceiver, where this information is used by the OSS to generate a flow control policy for the current data stream.
- the relevant information of the current data flow type is notified to the OSS system, so that the OSS system can generate the current data based on the data flow type of the current data flow.
- the flow control strategy of the stream for example, when the first data stream type of the current data stream is a video stream of a video conference, the corresponding flow control strategy is defined as a priority transmission strategy, that is, when there are multiple data streams to be transmitted, priority is given Transmit the current data stream.
- The message length includes one or more of the Ethernet frame length, IP length, transport protocol length, and header length of the message.
- The transport protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).
- For the implementation of each operation, refer also to the corresponding description of the method embodiment shown in FIG. 3.
- FIG. 10 shows a device 100 according to an embodiment of the present invention.
- The device 100 may be the first device in the method embodiment shown in FIG. 3.
- The device 100 includes a processor 1001, a memory 1002, and a transceiver 1003.
- The processor 1001, the memory 1002, and the transceiver 1003 are connected to each other through a bus.
- The memory 1002 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or portable read-only memory (compact disc read-only memory, CD-ROM), and is used to store related computer programs and data.
- The transceiver 1003 is used to receive and send data.
- The processor 1001 may be one or more central processing units (CPUs). When the processor 1001 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
- The processor 1001 reads the computer program code stored in the memory 1002 to perform the following operations:
- Send the first information to the other device (the second device in the method embodiment shown in FIG. 3) through the transceiver, for updating the content recognition model. The content recognition model is used to obtain at least one second confidence level corresponding to at least one data stream type, and is a model obtained from the characteristic information and data stream types of one or more historical data streams, where the data stream types of the historical data streams are obtained from the behavior recognition model. The behavior recognition model is a model obtained from the message characteristics and data stream types of multiple data stream samples, and the message characteristics include one or more of message length, message transmission speed, message interval time, and message direction.
- This application involves a behavior recognition model and a content recognition model.
- The behavior recognition model is pre-trained on the message characteristics and data stream types of multiple data stream samples, while the content recognition model is trained on the characteristic information of data streams together with the data stream types identified by the behavior recognition model, which makes the content recognition model an online learning model. The behavior recognition model can identify the data stream types of some basic (or typical) data streams, but certain characteristics of data streams of the same type (such as message length and message transmission speed) may change during subsequent transmission. The online learning capability of the content recognition model is therefore used to identify the data stream type from other aspects (such as destination address and protocol type), which hedges against errors the behavior recognition model makes in recognizing the data stream type during transmission.
- Therefore, this application adopts a recognition method that combines the behavior recognition model and the content recognition model. This improves both the accuracy and the generalization of data stream type recognition, and is applicable to various scenarios such as cloud deployment of applications, encrypted application transmission, and private applications.
- For the implementation of each operation, refer also to the corresponding description of the method embodiment shown in FIG. 3.
- The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- A connection relationship between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.
- An embodiment of the present invention further provides a chip system. The chip system includes at least one processor, a memory, and an interface circuit. The memory, the transceiver, and the at least one processor are interconnected by wires, and a computer program is stored in the at least one memory. When the computer program is executed by the processor, the method flow shown in FIG. 3 is implemented.
- An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When the computer program runs on a processor, the method flow shown in FIG. 3 is implemented.
- An embodiment of the present invention further provides a computer program product. When the computer program product runs on a processor, the method flow shown in FIG. 3 is implemented.
- This application relates to a behavior recognition model and a content recognition model. The behavior recognition model is pre-trained on the message characteristics and data stream types of multiple data stream samples, and the content recognition model is trained on the characteristic information of data streams and the data stream types identified by the behavior recognition model, so the content recognition model is an online learning model. The behavior recognition model can identify some basic (or typical) data stream types, but some characteristics of data streams of the same type (such as message length and message transmission speed) may change during subsequent transmission. Combining the online learning capability of the content recognition model with other aspects (such as destination address and protocol type) to determine the data stream type therefore hedges against recognition errors made by the behavior recognition model during data stream transmission.
- By combining the behavior recognition model and the content recognition model, this application improves both the accuracy and the generalization of data stream type recognition, and can be applied to various scenarios such as cloud deployment of applications, encrypted application transmission, and private applications.
- The computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments.
- The foregoing storage media include ROM, random access memory (RAM), magnetic disks, optical discs, and other media that can store computer program code.
- The "first" in the first device, the first confidence level, the first data stream type, the first preset threshold, the first information, and the first record mentioned in the embodiments of the present invention is used only as a name and does not denote first in order. The same applies to "second", "third", and "fourth". However, the "first" in the first identifier mentioned in the embodiments of the present invention does denote first in order, and the same applies to "Nth".
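The behavior recognition model described above works on message characteristics: message length, message transmission speed, message interval time, and message direction. The application does not specify how per-packet observations are aggregated into these flow-level characteristics, so the following Python sketch assumes simple averages, a byte rate over the observed duration, and an uplink ratio. The `Packet` type and `extract_message_features` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Packet:
    timestamp: float  # arrival time in seconds
    length: int       # message length in bytes (e.g. Ethernet frame length)
    uplink: bool      # message direction; True = client-to-server (assumption)


def extract_message_features(packets: List[Packet]) -> Dict[str, float]:
    """Aggregate per-packet observations into flow-level message features.

    The aggregation choices (mean length, bytes per second, mean gap,
    uplink ratio) are illustrative assumptions, not the patent's definition.
    """
    if len(packets) < 2:
        raise ValueError("need at least two packets to measure speed and intervals")
    ordered = sorted(packets, key=lambda p: p.timestamp)
    duration = ordered[-1].timestamp - ordered[0].timestamp
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    total_bytes = sum(p.length for p in ordered)
    return {
        "mean_length": mean(p.length for p in ordered),
        "bytes_per_second": total_bytes / duration if duration > 0 else float("inf"),
        "mean_interval": mean(gaps),
        "uplink_ratio": sum(p.uplink for p in ordered) / len(ordered),
    }
```

For three packets of 1200, 1200, and 200 bytes arriving at 0.0 s, 0.5 s, and 1.0 s, the sketch reports 2600 bytes per second and a mean interval of 0.5 s.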
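The decision rule described above (comprehensive confidence compared against a first preset threshold, and a second, larger threshold that triggers a content recognition model update) can be sketched as follows. This is a minimal illustration under stated assumptions: the comprehensive confidence is taken to be a weighted sum of the first (behavior) and second (content) confidence levels, and the highest-scoring type is chosen. The application only says that the weights are used and that the score must exceed the first preset threshold. `comprehensive_confidence`, `decide`, and `Decision` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def comprehensive_confidence(first: Dict[str, float], second: Dict[str, float],
                             w1: float = 0.5, w2: float = 0.5) -> Dict[str, float]:
    """Weighted sum of behavior (first) and content (second) confidence per type.

    A type missing from one model is treated as confidence 0 there (assumption).
    """
    types = set(first) | set(second)
    return {t: w1 * first.get(t, 0.0) + w2 * second.get(t, 0.0) for t in types}


@dataclass
class Decision:
    stream_type: Optional[str]  # None if no type clears the first threshold
    confidence: float           # comprehensive confidence of the best type
    needs_model_update: bool    # accepted, but below the second threshold


def decide(first: Dict[str, float], second: Dict[str, float],
           t1: float, t2: float, w1: float = 0.5, w2: float = 0.5) -> Decision:
    if t2 <= t1:
        raise ValueError("the second preset threshold must exceed the first")
    scores = comprehensive_confidence(first, second, w1, w2)
    if not scores:
        return Decision(None, 0.0, False)
    best_type, best = max(scores.items(), key=lambda kv: kv[1])
    if best <= t1:
        # Not identified, so nothing is sent for retraining in this sketch.
        return Decision(None, best, False)
    return Decision(best_type, best, best < t2)
```

With equal weights of 0.5, a stream scored 0.9 for video conference by the behavior model and 0.6 by the content model gets a comprehensive confidence of 0.75. With thresholds of 0.6 and 0.8, it is classified as a video conference stream but still flagged so its characteristic information can be used to update the content recognition model.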
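The record update used to handle elastic cloud deployment, described above, is sketched below under stated assumptions. The characteristic information is reduced to a destination address and a protocol type, every matching record is relabeled (the text speaks of a single first record), and "training" is replaced by a per-feature frequency table whose fractions serve as second confidence levels. A real content recognition model would use an actual learning algorithm. `Features`, `Record`, `update_records`, and `train_content_model` are hypothetical names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Features:
    dst_address: str  # destination address
    protocol: str     # protocol type, e.g. "TCP" or "UDP"


@dataclass
class Record:
    features: Features
    stream_type: str


def update_records(records: List[Record], current: Features, first_type: str) -> int:
    """Relabel records whose features match the current stream but whose type differs.

    Returns the number of changed records. Relabeling every match is an assumption.
    """
    changed = 0
    for record in records:
        if record.features == current and record.stream_type != first_type:
            record.stream_type = first_type  # the first record becomes the second record
            changed += 1
    return changed


def train_content_model(records: List[Record]) -> Dict[Features, Dict[str, float]]:
    """Toy stand-in for training: per-feature type frequencies as second confidences."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record.features][record.stream_type] += 1
    model = {}
    for features, type_counts in counts.items():
        total = sum(type_counts.values())
        model[features] = {t: n / total for t, n in type_counts.items()}
    return model
```

In the video conference to desktop cloud example, records for the reused cloud resource that were labeled `video_conference` are relabeled `desktop_cloud`. After retraining, the toy model gives `desktop_cloud` a second confidence of 1.0 for that destination address and protocol.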
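On the OSS side, the priority transmission policy for video conference streams described above might be applied as in the following sketch. The policy table, the `Policy` enum, and the `policy_for` and `transmission_order` functions are hypothetical. The text only gives the video conference example and states that such a stream is transmitted first when several data streams are waiting.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Policy(IntEnum):
    PRIORITY = 0     # transmit first
    BEST_EFFORT = 1  # transmit after prioritized streams


# Hypothetical OSS rule table; only the video conference entry comes from the text.
POLICY_TABLE: Dict[str, Policy] = {"video_conference": Policy.PRIORITY}


def policy_for(stream_type: str) -> Policy:
    return POLICY_TABLE.get(stream_type, Policy.BEST_EFFORT)


def transmission_order(pending: List[Tuple[str, str]]) -> List[str]:
    """Order pending (stream_id, stream_type) pairs so PRIORITY streams go first.

    sorted() is stable, so streams with the same policy keep their arrival order.
    """
    ranked = sorted(pending, key=lambda item: policy_for(item[1]))
    return [stream_id for stream_id, _ in ranked]
```

Relying on the stability of `sorted()` keeps the sketch simple: prioritized streams move ahead, and all other streams keep the order in which they arrived.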
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Information Transfer Between Computers (AREA)
Claims (23)
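- A data stream type identification method, comprising: obtaining, according to message characteristics of a current data stream and a behavior recognition model, at least one first confidence level of the current data stream corresponding to at least one data stream type, wherein the behavior recognition model is a model obtained according to message characteristics and data stream types of a plurality of data stream samples, and the message characteristics comprise one or more of message length, message transmission speed, message interval time, and message direction; obtaining, according to characteristic information of the current data stream and a content recognition model, at least one second confidence level of the current data stream corresponding to the at least one data stream type, wherein the characteristic information comprises a destination address and a protocol type, the content recognition model is a model obtained according to characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained according to the behavior recognition model; and determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.
- The method according to claim 1, wherein the determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level comprises: calculating a comprehensive confidence level corresponding to a first data stream type according to the first confidence level corresponding to the first data stream type and the second confidence level corresponding to the first data stream type, wherein the first data stream type is any one of the at least one data stream type; and if the comprehensive confidence level corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
- The method according to claim 1, wherein the determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level comprises: calculating a comprehensive confidence level corresponding to a first data stream type according to the first confidence level corresponding to the first data stream type, a weight value of the first confidence level, the second confidence level corresponding to the first data stream type, and a weight value of the second confidence level, wherein the first data stream type is any one of the at least one data stream type; and if the comprehensive confidence level corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.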
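- The method according to claim 2 or 3, further comprising: if the comprehensive confidence level corresponding to the first data stream type is less than a second preset threshold, sending the characteristic information of the current data stream and information about the first data stream type to a device, wherein the second preset threshold is greater than the first preset threshold; receiving first information sent by the device, wherein the first information is obtained by the device according to the characteristic information of the current data stream and identification information of the first data stream type; and updating the content recognition model according to the first information to obtain a new content recognition model.
- The method according to claim 2 or 3, further comprising: if the comprehensive confidence level corresponding to the first data stream type is less than a second preset threshold, updating the content recognition model according to the characteristic information of the current data stream and the information about the first data stream type to obtain a new content recognition model, wherein the second preset threshold is greater than the first preset threshold.
- The method according to claim 5, wherein the updating the content recognition model according to the characteristic information of the current data stream and the information about the first data stream type to obtain a new content recognition model comprises: if characteristic information of a first record in a plurality of records is the same as the characteristic information of the current data stream but a data stream type of the first record is different from the first data stream type, updating the data stream type of the first record to the first data stream type to obtain a second record, wherein each of the plurality of records comprises characteristic information and a data stream type; and performing training on the plurality of records comprising the second record to obtain the new content recognition model.
- The method according to any one of claims 1 to 6, wherein after the determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level, the method further comprises: sending information about the data stream type of the current data stream to an operation and maintenance support system (OSS), wherein the information about the data stream type of the current data stream is used by the OSS to generate a flow control policy for the current data stream.
- The method according to any one of claims 1 to 7, wherein the message length comprises one or more of an Ethernet frame length, an IP length, a transport protocol length, and a header length in a message, and the transport protocol comprises the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).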
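- A data stream type identification method, comprising: receiving characteristic information of a current data stream and information about a data stream type that are sent by a device; generating first information according to the characteristic information of the current data stream and the information about the data stream type; and sending the first information to the device for updating a content recognition model, wherein the content recognition model is used to obtain at least one second confidence level corresponding to at least one data stream type, the content recognition model is a model obtained according to characteristic information and data stream types of one or more historical data streams, the data stream types of the historical data streams are obtained according to a behavior recognition model, the behavior recognition model is a model obtained according to message characteristics and data stream types of a plurality of data stream samples, and the message characteristics comprise one or more of message length, message transmission speed, message interval time, and message direction.
- The method according to claim 9, wherein the generating first information according to the characteristic information of the current data stream and the information about the data stream type comprises: if characteristic information of a first record in a plurality of records is the same as the characteristic information of the current data stream but a data stream type of the first record is different from the data stream type of the current data stream, updating the data stream type of the first record to the data stream type of the current data stream to obtain a second record, wherein each of the plurality of records comprises characteristic information and a data stream type; and performing training on the plurality of records comprising the second record to obtain a new content recognition model.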
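- A data stream type identification device, wherein the device is a first device, the first device comprises a memory and a processor, the memory is configured to store a computer program, and the processor invokes the computer program to perform the following operations: obtaining, according to message characteristics of a current data stream and a behavior recognition model, at least one first confidence level of the current data stream corresponding to at least one data stream type, wherein the behavior recognition model is a model obtained according to message characteristics and data stream types of a plurality of data stream samples, and the message characteristics comprise one or more of message length, message transmission speed, message interval time, and message direction; obtaining, according to characteristic information of the current data stream and a content recognition model, at least one second confidence level of the current data stream corresponding to the at least one data stream type, wherein the characteristic information comprises a destination address and a protocol type, the content recognition model is a model obtained according to characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained according to the behavior recognition model; and determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.
- The device according to claim 11, wherein the determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level is specifically: calculating a comprehensive confidence level corresponding to a first data stream type according to the first confidence level corresponding to the first data stream type and the second confidence level corresponding to the first data stream type, wherein the first data stream type is any one of the at least one data stream type; and if the comprehensive confidence level corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
- The device according to claim 11, wherein the determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level is specifically: calculating a comprehensive confidence level corresponding to a first data stream type according to the first confidence level corresponding to the first data stream type, a weight value of the first confidence level, the second confidence level corresponding to the first data stream type, and a weight value of the second confidence level, wherein the first data stream type is any one of the at least one data stream type; and if the comprehensive confidence level corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.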
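- The device according to claim 12 or 13, wherein the first device further comprises a transceiver, and the processor is further configured to: if the comprehensive confidence level corresponding to the first data stream type is less than a second preset threshold, send, through the transceiver, the characteristic information of the current data stream and information about the first data stream type to a second device, wherein the second preset threshold is greater than the first preset threshold; receive, through the transceiver, first information sent by the second device, wherein the first information is obtained by the second device according to the characteristic information of the current data stream and identification information of the first data stream type; and update the content recognition model according to the first information to obtain a new content recognition model.
- The device according to claim 12 or 13, wherein the processor is further configured to: if the comprehensive confidence level corresponding to the first data stream type is less than a second preset threshold, update the content recognition model according to the characteristic information of the current data stream and the information about the first data stream type to obtain a new content recognition model, wherein the second preset threshold is greater than the first preset threshold.
- The device according to claim 15, wherein the updating the content recognition model according to the characteristic information of the current data stream and the information about the first data stream type to obtain a new content recognition model is specifically: if characteristic information of a first record in a plurality of records is the same as the characteristic information of the current data stream but a data stream type of the first record is different from the first data stream type, updating the data stream type of the first record to the first data stream type to obtain a second record, wherein each of the plurality of records comprises characteristic information and a data stream type; and performing training on the plurality of records comprising the second record to obtain the new content recognition model.
- The device according to any one of claims 11 to 16, wherein the first device further comprises a transceiver, and the processor is further configured to: after the determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level, send, through the transceiver, information about the data stream type of the current data stream to an operation and maintenance support system (OSS), wherein the information about the data stream type of the current data stream is used by the OSS to generate a flow control policy for the current data stream.
- The device according to any one of claims 11 to 17, wherein the message length comprises one or more of an Ethernet frame length, an IP length, a transport protocol length, and a header length in a message, and the transport protocol comprises the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).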
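- A data stream type identification device, wherein the device is a first device, the first device comprises a memory, a processor, and a transceiver, the memory is configured to store a computer program, and the processor invokes the computer program to perform the following operations: receiving, through the transceiver, characteristic information of a current data stream and information about a data stream type that are sent by a second device; generating first information according to the characteristic information of the current data stream and the information about the data stream type; and sending, through the transceiver, the first information to the second device for updating a content recognition model, wherein the content recognition model is used to obtain at least one second confidence level corresponding to at least one data stream type, the content recognition model is a model obtained according to characteristic information and data stream types of one or more historical data streams, the data stream types of the historical data streams are obtained according to a behavior recognition model, the behavior recognition model is a model obtained according to message characteristics and data stream types of a plurality of data stream samples, and the message characteristics comprise one or more of message length, message transmission speed, message interval time, and message direction.
- The device according to claim 19, wherein the generating first information according to the characteristic information of the current data stream and the information about the data stream type is specifically: if characteristic information of a first record in a plurality of records is the same as the characteristic information of the current data stream but a data stream type of the first record is different from the data stream type of the current data stream, updating the data stream type of the first record to the data stream type of the current data stream to obtain a second record, wherein each of the plurality of records comprises characteristic information and a data stream type; and performing training on the plurality of records comprising the second record to obtain a new content recognition model.
- A data stream type identification device, comprising units configured to perform the method according to any one of claims 1 to 10.
- A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program runs on a processor, the method according to any one of claims 1 to 10 is implemented.
- A computer program product, wherein the computer program product is stored in a memory, and when the computer program product runs on a processor, the method according to any one of claims 1 to 10 is implemented.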
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20865499.6A EP4024791B1 (en) | 2019-09-16 | 2020-09-16 | Data stream type identification method and related devices |
KR1020227010740A KR20220053658A (ko) | 2019-09-16 | 2020-09-16 | 데이터 스트림 분류 방법 및 관련 장치 |
BR112022004814A BR112022004814A2 (pt) | 2019-09-16 | 2020-09-16 | Método de classificação de fluxo de dados e dispositivo relacionado |
JP2022516688A JP7413515B2 (ja) | 2019-09-16 | 2020-09-16 | データストリーム分類方法および関連デバイス |
US17/695,491 US11838215B2 (en) | 2019-09-16 | 2022-03-15 | Data stream classification method and related device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910872990.0A CN112511457B (zh) | 2019-09-16 | 2019-09-16 | 一种数据流类型识别方法及相关设备 |
CN201910872990.0 | 2019-09-16 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/695,491 Continuation US11838215B2 (en) | 2019-09-16 | 2022-03-15 | Data stream classification method and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021052379A1 (zh) | 2021-03-25 |
Family ID: 74883367
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/115693 WO2021052379A1 (zh) | 2019-09-16 | 2020-09-16 | 一种数据流类型识别方法及相关设备 |
Country Status (7)
Country | Link |
---|---|
US (1) | US11838215B2 (zh) |
EP (1) | EP4024791B1 (zh) |
JP (1) | JP7413515B2 (zh) |
KR (1) | KR20220053658A (zh) |
CN (2) | CN112511457B (zh) |
BR (1) | BR112022004814A2 (zh) |
WO (1) | WO2021052379A1 (zh) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220200869A1 (en) * | 2017-11-27 | 2022-06-23 | Lacework, Inc. | Configuring cloud deployments based on learnings obtained by monitoring other cloud deployments |
US20220247769A1 (en) * | 2017-11-27 | 2022-08-04 | Lacework, Inc. | Learning from similar cloud deployments |
WO2023005278A1 (zh) * | 2021-07-27 | 2023-02-02 | 华为技术有限公司 | 一种流控处理的方法及通信装置 |
US11818156B1 (en) | 2017-11-27 | 2023-11-14 | Lacework, Inc. | Data lake-enabled security platform |
US12058160B1 (en) | 2017-11-22 | 2024-08-06 | Lacework, Inc. | Generating computer code for remediating detected events |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11742901B2 (en) * | 2020-07-27 | 2023-08-29 | Electronics And Telecommunications Research Institute | Deep learning based beamforming method and apparatus |
CN113434433B (zh) * | 2021-07-22 | 2024-08-02 | 中国工商银行股份有限公司 | 软件测试的智能辅助方法及装置 |
CN113935431B (zh) * | 2021-10-28 | 2022-04-08 | 北京永信至诚科技股份有限公司 | 一种多流关联分析识别私有加密数据的方法及系统 |
CN115037698B (zh) * | 2022-05-30 | 2024-01-02 | 天翼云科技有限公司 | 一种数据识别方法、装置及电子设备 |
WO2024195131A1 (ja) * | 2023-03-23 | 2024-09-26 | 日本電信電話株式会社 | 会議タイプ推定装置、会議タイプ推定方法、及びプログラム |
CN118430197A (zh) * | 2024-07-05 | 2024-08-02 | 海纳云物联科技有限公司 | 排水井的污水排放报警方法、装置、设备和介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011114060A2 (fr) * | 2010-03-17 | 2011-09-22 | Thales | Procédé d'identification d'un protocole à l'origine d'un flux de données |
CN107360032A (zh) * | 2017-07-20 | 2017-11-17 | 中国南方电网有限责任公司 | 一种网络流识别方法及电子设备 |
CN108667747A (zh) * | 2018-04-28 | 2018-10-16 | 深圳信息职业技术学院 | 网络流应用类型识别的方法、装置及计算机可读存储介质 |
CN108900432A (zh) * | 2018-07-05 | 2018-11-27 | 中山大学 | 一种基于网络流行为的内容感知方法 |
CN110048962A (zh) * | 2019-04-24 | 2019-07-23 | 广东工业大学 | 一种网络流量分类的方法、系统及设备 |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007228489A (ja) * | 2006-02-27 | 2007-09-06 | Nec Corp | アプリケーション識別システム、アプリケーション識別方法及びアプリケーション識別用プログラム |
US8634399B2 (en) | 2006-04-12 | 2014-01-21 | Qualcomm Incorporated | Uplink and bi-directional traffic classification for wireless communication |
US8682812B1 (en) * | 2010-12-23 | 2014-03-25 | Narus, Inc. | Machine learning based botnet detection using real-time extracted traffic features |
US9094288B1 (en) * | 2011-10-26 | 2015-07-28 | Narus, Inc. | Automated discovery, attribution, analysis, and risk assessment of security threats |
JP5812282B2 (ja) * | 2011-12-16 | 2015-11-11 | 公立大学法人大阪市立大学 | トラヒック監視装置 |
EP2806602A4 (en) * | 2013-02-04 | 2015-03-04 | Huawei Tech Co Ltd | CHARACTER EXTRACTION DEVICE, NETWORK TRAFFIC IDENTIFICATION PROCESS, DEVICE AND SYSTEM |
US20140321290A1 (en) * | 2013-04-30 | 2014-10-30 | Hewlett-Packard Development Company, L.P. | Management of classification frameworks to identify applications |
RU2589852C2 (ru) * | 2013-06-28 | 2016-07-10 | Закрытое акционерное общество "Лаборатория Касперского" | Система и способ автоматической регулировки правил контроля приложений |
US9721212B2 (en) * | 2014-06-04 | 2017-08-01 | Qualcomm Incorporated | Efficient on-device binary analysis for auto-generated behavioral models |
CN105007282B (zh) * | 2015-08-10 | 2018-08-10 | 济南大学 | 面向网络服务提供商的恶意软件网络行为检测方法及系统 |
JP2017139580A (ja) * | 2016-02-02 | 2017-08-10 | 沖電気工業株式会社 | 通信解析装置及び通信解析プログラム |
US20170364794A1 (en) * | 2016-06-20 | 2017-12-21 | Telefonaktiebolaget Lm Ericsson (Publ) | Method for classifying the payload of encrypted traffic flows |
CN107360159B (zh) * | 2017-07-11 | 2019-12-03 | 中国科学院信息工程研究所 | 一种识别异常加密流量的方法及装置 |
CN108173708A (zh) * | 2017-12-18 | 2018-06-15 | 北京天融信网络安全技术有限公司 | 基于增量学习的异常流量检测方法、装置及存储介质 |
CN108200032A (zh) * | 2017-12-27 | 2018-06-22 | 北京奇艺世纪科技有限公司 | 一种数据检测方法、装置及电子设备 |
CN108650195B (zh) * | 2018-04-17 | 2021-08-24 | 南京烽火星空通信发展有限公司 | 一种app流量自动识别模型构建方法 |
US10555040B2 (en) * | 2018-06-22 | 2020-02-04 | Samsung Electronics Co., Ltd. | Machine learning based packet service classification methods for experience-centric cellular scheduling |
CN109067612A (zh) * | 2018-07-13 | 2018-12-21 | 哈尔滨工程大学 | 一种基于增量聚类算法的在线流量识别方法 |
CN109818976B (zh) * | 2019-03-15 | 2021-09-21 | 杭州迪普科技股份有限公司 | 一种异常流量检测方法及装置 |
CN110138681B (zh) * | 2019-04-19 | 2021-01-22 | 上海交通大学 | 一种基于tcp报文特征的网络流量识别方法及装置 |
2019
- 2019-09-16 CN CN201910872990.0A patent/CN112511457B/zh active Active
- 2019-09-16 CN CN202111645523.8A patent/CN114465962B/zh active Active
2020
- 2020-09-16 BR BR112022004814A patent/BR112022004814A2/pt unknown
- 2020-09-16 WO PCT/CN2020/115693 patent/WO2021052379A1/zh unknown
- 2020-09-16 KR KR1020227010740A patent/KR20220053658A/ko not_active Application Discontinuation
- 2020-09-16 JP JP2022516688A patent/JP7413515B2/ja active Active
- 2020-09-16 EP EP20865499.6A patent/EP4024791B1/en active Active
2022
- 2022-03-15 US US17/695,491 patent/US11838215B2/en active Active
Non-Patent Citations (2)
Title |
---|
See also references of EP4024791A4 |
ZHOU, DINGDING ET AL.: "Traffic Classification Based on Bayesian Updating Method", JOURNAL OF SYSTEM SIMULATION, vol. 25, no. 11, 30 November 2013 (2013-11-30), pages 2597 - 2603, XP009527021, ISSN: 1004-731X * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12058160B1 (en) | 2017-11-22 | 2024-08-06 | Lacework, Inc. | Generating computer code for remediating detected events |
US20220200869A1 (en) * | 2017-11-27 | 2022-06-23 | Lacework, Inc. | Configuring cloud deployments based on learnings obtained by monitoring other cloud deployments |
US20220247769A1 (en) * | 2017-11-27 | 2022-08-04 | Lacework, Inc. | Learning from similar cloud deployments |
US11785104B2 (en) * | 2017-11-27 | 2023-10-10 | Lacework, Inc. | Learning from similar cloud deployments |
US11818156B1 (en) | 2017-11-27 | 2023-11-14 | Lacework, Inc. | Data lake-enabled security platform |
US11894984B2 (en) * | 2017-11-27 | 2024-02-06 | Lacework, Inc. | Configuring cloud deployments based on learnings obtained by monitoring other cloud deployments |
US12126695B1 (en) | 2017-11-27 | 2024-10-22 | Fortinet, Inc. | Enhancing security of a cloud deployment based on learnings from other cloud deployments |
WO2023005278A1 (zh) * | 2021-07-27 | 2023-02-02 | 华为技术有限公司 | 一种流控处理的方法及通信装置 |
Also Published As
Publication number | Publication date |
---|---|
CN114465962B (zh) | 2024-01-05 |
KR20220053658A (ko) | 2022-04-29 |
JP2022548136A (ja) | 2022-11-16 |
BR112022004814A2 (pt) | 2022-06-21 |
US11838215B2 (en) | 2023-12-05 |
EP4024791B1 (en) | 2024-06-19 |
EP4024791A1 (en) | 2022-07-06 |
CN112511457A (zh) | 2021-03-16 |
EP4024791A4 (en) | 2022-09-28 |
US20220210082A1 (en) | 2022-06-30 |
CN112511457B (zh) | 2021-12-28 |
JP7413515B2 (ja) | 2024-01-15 |
CN114465962A (zh) | 2022-05-10 |
Similar Documents
Publication | Title |
---|---|
WO2021052379A1 (zh) | 一种数据流类型识别方法及相关设备 | |
WO2021169308A1 (zh) | 一种数据流类型识别模型更新方法及相关设备 | |
US11669751B2 (en) | Prediction of network events via rule set representations of machine learning models | |
JP6162337B2 (ja) | アプリケーションアウェアネットワーク管理 | |
CN108881028B (zh) | 基于深度学习实现应用感知的sdn网络资源调度方法 | |
WO2022028456A1 (zh) | 拥塞控制方法和装置、网络节点设备,及计算机可读存储介质 | |
WO2021164261A1 (zh) | 云网络设备的测试方法、存储介质和计算机设备 | |
CN116545936B (zh) | 拥塞控制方法、系统、装置、通信设备及存储介质 | |
US11956118B2 (en) | Fault root cause identification method, apparatus, and device | |
US10243816B2 (en) | Automatically optimizing network traffic | |
Mazhar Rathore et al. | Exploiting encrypted and tunneled multimedia calls in high-speed big data environment | |
CN107070851B (zh) | 基于网络流的连接指纹生成和垫脚石追溯的系统和方法 | |
KR20220029142A (ko) | Sdn 컨트롤러 서버 및 이의 sdn 기반 네트워크 트래픽 사용량 분석 방법 | |
Preamthaisong et al. | Enhanced DDoS detection using hybrid genetic algorithm and decision tree for SDN | |
Obasi | Encrypted network traffic classification using ensemble learning techniques | |
Al-Saadi et al. | Unsupervised machine learning-based elephant and mice flow identification | |
Xie et al. | A Decision Tree‐Based Online Traffic Classification Method for QoS Routing in Data Center Networks | |
An et al. | Evaluating SIP-based VoIP communication quality and network security | |
Fluechter et al. | Autonomous integration of TSN-unaware applications with QoS requirements in TSN networks | |
US12047261B1 (en) | Determining content perception | |
US20230239247A1 (en) | Method and system for dynamic load balancing | |
Tadesse | Statistical Modeling of Internet Traffic Flow Length and Flow Size | |
Song et al. | FlowBot: A Learning-Based Co-bottleneck Flow Detector for Video Servers |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20865499; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2022516688; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112022004814; Country of ref document: BR |
ENP | Entry into the national phase | Ref document number: 20227010740; Country of ref document: KR; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 2020865499; Country of ref document: EP; Effective date: 20220330 |
ENP | Entry into the national phase | Ref document number: 112022004814; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20220315 |