WO2021052379A1 - Data stream type identification method and related device - Google Patents

Data stream type identification method and related device

Info

Publication number
WO2021052379A1
Authority
WO
WIPO (PCT)
Prior art keywords
data stream
stream type
type
recognition model
information
Application number
PCT/CN2020/115693
Other languages
English (en)
French (fr)
Inventor
吴俊�
胡新宇
王雅莉
司晓云
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Priority to KR1020227010740A (KR20220053658A)
Priority to BR112022004814A (BR112022004814A2)
Priority to EP20865499.6A (EP4024791A4)
Priority to JP2022516688A (JP7413515B2)
Publication of WO2021052379A1
Priority to US17/695,491 (US11838215B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/20 Traffic policing
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • H04L47/2475 Traffic characterised by specific attributes, e.g. priority or QoS for supporting traffic characterised by the type of applications
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H04L65/403 Arrangements for multi-party communication, e.g. for conferences
    • H04L65/60 Network streaming of media packets
    • H04L65/75 Media network packet handling
    • H04L65/762 Media network packet handling at the source
    • H04L65/80 Responding to QoS
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/22 Parsing or analysis of headers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N20/00 Machine learning

Definitions

  • The present invention relates to the fields of computer technology and communications, further relates to the application of artificial intelligence (AI) in these fields, and in particular relates to a data stream type identification method and related equipment.
  • AI: artificial intelligence.
  • The first existing method is based on manual identification: rules are configured manually to match keywords in network traffic (including header data and payload data) in order to identify application (App) types.
  • Manual identification is time-consuming and labor-intensive. Moreover, enterprise office applications are increasingly deployed in the cloud, which makes their traffic dynamic, and their transmissions are usually encrypted for security. Both factors make it very difficult to set identification rules manually, so the application type is hard to identify.
  • The second method uses offline learning: sample data are collected in advance and labeled manually or with third-party tools, and a model is then trained offline on the labeled samples using machine learning or neural network algorithms.
  • The offline-trained model is then used to infer the application types of live network traffic.
  • However, once an application is modified or updated, the trained model may no longer be usable. In addition, private applications follow no common implementation specification, so applications of the same type at different enterprises require different models. That is, when the trained model is used to identify the type of a modified or updated application, or of another enterprise's application, its recognition accuracy may be low.
  • Embodiments of the present invention disclose a data stream type identification method and related equipment that can identify the data stream type more accurately.
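  • An embodiment of the present application provides a data stream type identification method, including:
  • obtaining, according to packet characteristics of a current data stream and a behavior recognition model, at least one first confidence level of the current data stream corresponding to at least one data stream type, where the behavior recognition model is a model obtained from the packet characteristics and data stream types of multiple data stream samples, and the packet characteristics include one or more of packet length, packet transmission rate, packet interval time, and packet direction;
  • obtaining, according to characteristic information of the current data stream and a content recognition model, at least one second confidence level of the current data stream corresponding to the at least one data stream type, where the characteristic information includes the destination address and the protocol type, the content recognition model is a model obtained from the characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained by the behavior recognition model; and
  • determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.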
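A minimal sketch of how the packet characteristics named above (packet length, transmission rate, interval time, and direction) might be summarized for one flow before being passed to the behavior recognition model. The `Packet` record, the mean-based summaries, and the use of an uplink flag to stand in for direction are illustrative assumptions, not the patent's feature definition.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    timestamp: float  # arrival time in seconds
    length: int       # packet length in bytes
    uplink: bool      # assumed direction convention: True = client toward server


def packet_characteristics(packets: list[Packet]) -> dict[str, float]:
    """Summarize one flow into the four kinds of packet characteristics (sketch)."""
    if len(packets) < 2:
        raise ValueError("need at least two packets to compute intervals and rate")
    packets = sorted(packets, key=lambda p: p.timestamp)
    lengths = [p.length for p in packets]
    # Gaps between consecutive packets.
    intervals = [b.timestamp - a.timestamp for a, b in zip(packets, packets[1:])]
    duration = packets[-1].timestamp - packets[0].timestamp
    return {
        "mean_length": mean(lengths),
        # Transmission rate in bits per second over the observed window.
        "rate_bps": 8 * sum(lengths) / duration if duration > 0 else 0.0,
        "mean_interval": mean(intervals),
        # Fraction of packets travelling in the assumed uplink direction.
        "uplink_ratio": sum(p.uplink for p in packets) / len(packets),
    }
```

Summaries such as these would form the input vector of the behavior recognition model; a real deployment might use distributions or sequences rather than means.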
  • The above method involves a behavior recognition model and a content recognition model.
  • The behavior recognition model is pre-trained on the packet characteristics and data stream types of multiple data stream samples, whereas the content recognition model is trained on the characteristic information of data streams together with the data stream types identified by the behavior recognition model, so the content recognition model is an online learning model. The behavior recognition model can identify the types of some basic (or typical) data streams, but certain characteristics of data streams of the same type (such as packet length and packet transmission rate) may change during subsequent transmission. The online learning capability of the content recognition model therefore allows the data stream type to be identified from other aspects (such as destination address and protocol type), hedging against the limitations of the behavior recognition model when it identifies the data stream type during transmission.
  • By combining the behavior recognition model with the content recognition model, this application improves the accuracy with which the data stream type is identified and also improves the generalization of identification, so it can be applied in scenarios such as cloud deployment of applications, encrypted application transmission, and private applications.
  • Determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level includes:
  • calculating a comprehensive confidence level corresponding to a first data stream type according to the first confidence level corresponding to the first data stream type, the weight of the first confidence level, the second confidence level corresponding to the first data stream type, and the weight of the second confidence level, where the first data stream type is any one of the at least one data stream type; and
  • if the comprehensive confidence level corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
  • Confidence weights are configured for the two models (the weight configured for the behavior recognition model is the weight of the first confidence level, and the weight configured for the content recognition model is the weight of the second confidence level), so the comprehensive confidence level calculated from the two weighted confidence levels better reflects the actual type of the current data stream.
  • Introducing the first preset threshold to judge whether the corresponding data stream type should be accepted improves the efficiency and accuracy of determining the data stream type.
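The weighted combination and the first-threshold check can be sketched as follows. The source says only that the comprehensive confidence level is calculated from the two confidence levels and their weights; the plain weighted sum, the treatment of a type missing from one model's output as 0.0, and picking the highest-scoring type when several are scored are all assumptions.

```python
from typing import Optional


def comprehensive_confidence(c1: float, w1: float, c2: float, w2: float) -> float:
    # Assumed combination rule: a plain weighted sum of the two confidences.
    return w1 * c1 + w2 * c2


def determine_type(first: dict[str, float], second: dict[str, float],
                   w1: float, w2: float, t1: float) -> tuple[Optional[str], float]:
    """Return (best type or None, its comprehensive confidence)."""
    best_type, best_score = None, float("-inf")
    # A type absent from one model's output counts as confidence 0.0 (assumption).
    for stream_type in sorted(first.keys() | second.keys()):
        score = comprehensive_confidence(first.get(stream_type, 0.0), w1,
                                         second.get(stream_type, 0.0), w2)
        if score > best_score:
            best_type, best_score = stream_type, score
    # Accept the candidate only if it exceeds the first preset threshold.
    return (best_type if best_score > t1 else None), best_score
```

For example, with equal weights of 0.5, a behavior confidence of 0.9 and a content confidence of 0.6 for a video-conference type combine to 0.75, which passes a first threshold of 0.7 but not one of 0.8.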
  • The method further includes:
  • if the comprehensive confidence level corresponding to the first data stream type is less than a second preset threshold, sending the characteristic information of the current data stream and the information of the first data stream type to another device, where the second preset threshold is greater than the first preset threshold;
  • receiving first information sent by the other device, where the first information is obtained by the other device according to the characteristic information of the current data stream and the information of the first data stream type; and
  • updating the content recognition model according to the first information to obtain a new content recognition model.
  • In another possible implementation, the method further includes:
  • if the comprehensive confidence level corresponding to the first data stream type is less than a second preset threshold, updating the content recognition model according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
  • The inventors of the present application use the identification result for the current data stream to correct the content recognition model. Specifically, a second preset threshold is introduced: when the comprehensive confidence level corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is sent to a device for training to obtain a new content recognition model, making subsequent determinations more accurate.
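The roles of the two thresholds can be sketched as a small decision function. Following the text literally, an update is requested whenever the comprehensive confidence level is below the second threshold; the source does not say whether a score that also falls below the first threshold should trigger an update, so that behavior here is an assumption. `Decision` and `decide` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    stream_type: Optional[str]  # determined type, or None if not above the first threshold
    request_update: bool        # True when the content recognition model should be updated


def decide(candidate: str, score: float, t1: float, t2: float) -> Decision:
    """Apply the first and second preset thresholds to one candidate type (sketch)."""
    if not t2 > t1:
        raise ValueError("the second preset threshold must exceed the first")
    return Decision(
        stream_type=candidate if score > t1 else None,
        request_update=score < t2,
    )
```

Because the second threshold is greater than the first, a score between the two both fixes the type and feeds the content recognition model, while a score above the second threshold is treated as confident enough to skip retraining.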
  • Updating the content recognition model according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model includes:
  • updating the data stream type of a first record among a plurality of records to the first data stream type to obtain a second record, where each of the plurality of records includes characteristic information and a data stream type; and
  • training on the plurality of records, including the second record, to obtain a new content recognition model.
  • Updating the data stream type in the record to the first data stream type of the current data stream mainly serves to adapt to the elastic deployment of cloud resources. For example, the same cloud resource may be used for video conferencing in one period and for desktop cloud in the next; with the above method, the data stream type is updated to desktop cloud in the next period, so that the data stream type of the current data stream can still be identified accurately when cloud resources are deployed elastically.
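The record update and retraining steps might look like the following sketch. It assumes that the "first record" is the record whose characteristic information (destination address and protocol type) equals that of the current data stream, and it substitutes a per-feature frequency table for whatever model is actually trained. Both choices, and appending a new record when none matches, are assumptions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    dst_address: str
    protocol: str  # e.g. "TCP" or "UDP"


@dataclass
class Record:
    features: Features
    stream_type: str


def update_records(records: list[Record], current: Features, new_type: str) -> list[Record]:
    """Relabel the record matching the current flow (the assumed 'first record')."""
    for i, rec in enumerate(records):
        if rec.features == current:
            records[i] = Record(current, new_type)  # the resulting 'second record'
            return records
    records.append(Record(current, new_type))       # no match: add one (assumption)
    return records


def train_content_model(records: list[Record]) -> dict[Features, dict[str, float]]:
    """Stand-in 'training': per-feature type frequencies used as second confidence levels."""
    counts: dict[Features, Counter] = defaultdict(Counter)
    for rec in records:
        counts[rec.features][rec.stream_type] += 1
    return {f: {t: n / sum(c.values()) for t, n in c.items()} for f, c in counts.items()}
```

In the cloud-resource example, a record for one destination address and protocol that was labeled as video conferencing is relabeled as desktop cloud, and the retrained table then reports desktop cloud for that destination.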
  • After the data stream type of the current data stream is determined according to the at least one first confidence level and the at least one second confidence level, the method further includes:
  • sending information about the data stream type of the current data stream to an operation and maintenance support system (OSS), where the OSS uses this information to generate a flow control policy for the current data stream.
  • Notifying the OSS of the data stream type of the current data stream allows the OSS to generate a flow control policy for the current data stream based on that type. For example, when the current data stream is the video stream of a video conference, the corresponding flow control policy may be defined as a priority transmission policy: when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.
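As an illustration of the priority transmission policy described above, the sketch below orders pending flows so that video-conference flows are sent first. The policy table, the type names, and the use of a heap are hypothetical; the source does not specify how the OSS represents or enforces a flow control policy.

```python
import heapq
from itertools import count

# Hypothetical mapping from identified data stream type to a transmission priority;
# the source only gives the example "video conference -> priority transmission".
POLICY_PRIORITY = {"video_conference": 0}  # lower value = transmitted earlier
DEFAULT_PRIORITY = 1


def transmission_order(pending: list[tuple[str, str]]) -> list[str]:
    """pending holds (flow_id, stream_type) pairs; return flow ids in transmit order."""
    heap: list[tuple[int, int, str]] = []
    seq = count()  # tie-breaker that keeps arrival order among equal priorities
    for flow_id, stream_type in pending:
        prio = POLICY_PRIORITY.get(stream_type, DEFAULT_PRIORITY)
        heapq.heappush(heap, (prio, next(seq), flow_id))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```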
  • The packet length includes one or more of the Ethernet frame length, the IP length, the transport protocol length, and the header length of a packet, and the transport protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).
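The length fields listed above can be read from a raw frame roughly as follows. This sketch handles only untagged Ethernet II frames carrying IPv4; it treats "header length" as the IP header length plus the transport header length and, for TCP, derives the transport length from the IP total length. These interpretations are assumptions, since the source does not define the fields precisely.

```python
import struct

ETH_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800
PROTO_TCP, PROTO_UDP = 6, 17


def packet_lengths(frame: bytes) -> dict[str, int]:
    """Extract the assumed length fields from an Ethernet II / IPv4 / TCP-or-UDP frame."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("only untagged Ethernet II frames carrying IPv4 are handled")
    ip = ETH_HEADER_LEN
    ver_ihl, _tos, ip_total_len = struct.unpack_from("!BBH", frame, ip)
    ip_header_len = (ver_ihl & 0x0F) * 4      # IHL counts 32-bit words
    proto = frame[ip + 9]                     # IPv4 Protocol field
    l4 = ip + ip_header_len
    if proto == PROTO_TCP:
        l4_header_len = (frame[l4 + 12] >> 4) * 4  # TCP data offset, in 32-bit words
        l4_len = ip_total_len - ip_header_len      # TCP has no length field of its own
    elif proto == PROTO_UDP:
        l4_header_len = 8
        (l4_len,) = struct.unpack_from("!H", frame, l4 + 4)  # UDP Length field
    else:
        raise ValueError("transport protocol is neither TCP nor UDP")
    return {
        "ethernet_frame_length": len(frame),  # as captured, without FCS
        "ip_length": ip_total_len,
        "transport_length": l4_len,
        "header_length": ip_header_len + l4_header_len,
    }
```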
  • An embodiment of the present application further provides a data stream type identification method, including:
  • receiving the characteristic information of a current data stream and the information of its data stream type sent by a device;
  • obtaining first information according to the characteristic information of the current data stream and the information of the data stream type; and
  • sending the first information to the device, where the first information is used to update a content recognition model, and the content recognition model is used to obtain at least one second confidence level corresponding to at least one data stream type; the content recognition model is a model obtained from the characteristic information and data stream types of one or more historical data streams, the data stream types of the historical data streams are obtained by a behavior recognition model, and the behavior recognition model is a model obtained from the packet characteristics and data stream types of multiple data stream samples, where the packet characteristics include one or more of packet length, packet transmission rate, packet interval time, and packet direction.
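The exchange in this aspect can be sketched with JSON messages. The `FlowReport` fields, the `TrainingPeer` class, and the content of the "first information" (here, a refreshed confidence entry for one destination address and protocol) are all hypothetical; the source does not specify the message format or what the first information contains.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FlowReport:  # what the identifying device sends (assumed fields)
    dst_address: str
    protocol: str
    stream_type: str


def encode_report(report: FlowReport) -> str:
    return json.dumps(asdict(report))


class TrainingPeer:
    """Hypothetical 'other device' that accumulates reports and returns first information."""

    def __init__(self) -> None:
        self.labels: dict[tuple[str, str], str] = {}

    def handle(self, message: str) -> str:
        report = FlowReport(**json.loads(message))
        key = (report.dst_address, report.protocol)
        self.labels[key] = report.stream_type  # latest label wins (assumption)
        # First information: a refreshed confidence entry for this key (assumption).
        first_info = {"dst_address": key[0], "protocol": key[1],
                      "confidences": {report.stream_type: 1.0}}
        return json.dumps(first_info)


def apply_first_information(model: dict, message: str) -> dict:
    """Merge the received first information into the local content model table."""
    info = json.loads(message)
    model[(info["dst_address"], info["protocol"])] = info["confidences"]
    return model
```

In this sketch the identifying device sends `encode_report(...)`, the peer answers via `handle`, and the identifying device merges the reply with `apply_first_information`.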
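  • An embodiment of the present application provides a data stream type identification device, including a memory and a processor, where the memory is configured to store a computer program and the processor invokes the computer program to perform the following operations:
  • obtaining, according to packet characteristics of a current data stream and a behavior recognition model, at least one first confidence level of the current data stream corresponding to at least one data stream type, where the behavior recognition model is a model obtained from the packet characteristics and data stream types of multiple data stream samples, and the packet characteristics include one or more of packet length, packet transmission rate, packet interval time, and packet direction;
  • obtaining, according to characteristic information of the current data stream and a content recognition model, at least one second confidence level of the current data stream corresponding to the at least one data stream type, where the characteristic information includes the destination address and the protocol type, the content recognition model is a model obtained from the characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained by the behavior recognition model; and
  • determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.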
  • When determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level, the processor is specifically configured to:
  • calculate a comprehensive confidence level corresponding to a first data stream type according to the first confidence level corresponding to the first data stream type, the weight of the first confidence level, the second confidence level corresponding to the first data stream type, and the weight of the second confidence level, where the first data stream type is any one of the at least one data stream type; and
  • if the comprehensive confidence level corresponding to the first data stream type is greater than a first preset threshold, determine that the data stream type of the current data stream is the first data stream type.
  • The device further includes a transceiver, and the processor is further configured to:
  • if the comprehensive confidence level corresponding to the first data stream type is less than a second preset threshold, send the characteristic information of the current data stream and the information of the first data stream type to another device through the transceiver, where the second preset threshold is greater than the first preset threshold;
  • receive, through the transceiver, first information sent by the other device, where the first information is obtained by the other device according to the characteristic information of the current data stream and the information of the first data stream type; and
  • update the content recognition model according to the first information to obtain a new content recognition model.
  • In another possible implementation, the processor is further configured to:
  • if the comprehensive confidence level corresponding to the first data stream type is less than a second preset threshold, update the content recognition model according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
  • When updating the content recognition model according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model, the processor is specifically configured to:
  • update the data stream type of a first record among a plurality of records to the first data stream type to obtain a second record, where each of the plurality of records includes characteristic information and a data stream type; and
  • train on the plurality of records, including the second record, to obtain a new content recognition model.
  • The device further includes a transceiver, and the processor is further configured to: after the data stream type of the current data stream is determined according to the at least one first confidence level and the at least one second confidence level, send information about the data stream type of the current data stream to the operation and maintenance support system (OSS) through the transceiver, where the OSS uses this information to generate a flow control policy for the current data stream.
  • The packet length includes one or more of the Ethernet frame length, the IP length, the transport protocol length, and the header length of a packet, and the transport protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).
  • An embodiment of the present application further provides a data stream type identification device, including a memory, a processor, and a transceiver, where the memory is configured to store a computer program and the processor invokes the computer program to perform the following operations:
  • receiving, through the transceiver, the characteristic information of a current data stream and the information of its data stream type sent by another device;
  • obtaining first information according to the characteristic information of the current data stream and the information of the data stream type; and
  • sending the first information to the other device through the transceiver, where the first information is used to update a content recognition model, and the content recognition model is used to obtain at least one second confidence level corresponding to at least one data stream type; the content recognition model is a model obtained from the characteristic information and data stream types of one or more historical data streams, the data stream types of the historical data streams are obtained by a behavior recognition model, and the behavior recognition model is a model obtained from the packet characteristics and data stream types of multiple data stream samples, where the packet characteristics include one or more of packet length, packet transmission rate, packet interval time, and packet direction.
  • An embodiment of the present application further provides a data stream type identification device, including:
  • a first identification unit, configured to obtain, according to the packet characteristics of a current data stream and a behavior recognition model, at least one first confidence level of the current data stream corresponding to at least one data stream type, where the behavior recognition model is a model obtained from the packet characteristics and data stream types of multiple data stream samples, and the packet characteristics include one or more of packet length, packet transmission rate, packet interval time, and packet direction;
  • a second recognition unit, configured to obtain, according to the characteristic information of the current data stream and a content recognition model, at least one second confidence level of the current data stream corresponding to the at least one data stream type, where the characteristic information includes the destination address and the protocol type, the content recognition model is a model obtained from the characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained by the behavior recognition model; and
  • a determining unit, configured to determine the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.
  • When determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level, the determining unit is specifically configured to:
  • calculate a comprehensive confidence level corresponding to a first data stream type according to the first confidence level corresponding to the first data stream type, the weight of the first confidence level, the second confidence level corresponding to the first data stream type, and the weight of the second confidence level, where the first data stream type is any one of the at least one data stream type; and
  • if the comprehensive confidence level corresponding to the first data stream type is greater than a first preset threshold, determine that the data stream type of the current data stream is the first data stream type.
  • Confidence weights are configured for the two models (the weight configured for the behavior recognition model is the weight of the first confidence level, and the weight configured for the content recognition model is the weight of the second confidence level), so the comprehensive confidence calculated from the two weighted confidence levels better reflects the actual type of the current data stream.
  • Introducing the first preset threshold to judge whether the corresponding data stream type should be accepted improves the efficiency and accuracy of determining the data stream type.
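  • The decision rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the comprehensive confidence is a weighted sum of the two confidence levels (the text states only that it is calculated from the two confidences and their weights), and it assumes that, if several types exceed the threshold, the highest-scoring one wins. The function names `comprehensive_confidence` and `classify` are hypothetical.

```python
from typing import Dict, Optional


def comprehensive_confidence(c1: float, c2: float, w1: float, w2: float) -> float:
    """Combine the behavior-model confidence c1 and the content-model
    confidence c2 with their weights (assumed form: a weighted sum)."""
    return w1 * c1 + w2 * c2


def classify(first: Dict[str, float], second: Dict[str, float],
             w1: float, w2: float, theta1: float) -> Optional[str]:
    """Return a data stream type whose comprehensive confidence exceeds the
    first preset threshold theta1, or None if no type qualifies.

    A type missing from one model's output is treated as confidence 0.
    If several types pass, the highest score is chosen (an assumption)."""
    scores = {t: comprehensive_confidence(first.get(t, 0.0), second.get(t, 0.0), w1, w2)
              for t in set(first) | set(second)}
    passing = {t: s for t, s in scores.items() if s > theta1}
    if not passing:
        return None
    return max(passing, key=passing.get)
```

  • For example, `classify({"video_conference": 0.8}, {"video_conference": 0.6}, w1=0.4, w2=0.6, theta1=0.5)` computes 0.4 × 0.8 + 0.6 × 0.6 = 0.68, which exceeds 0.5, and returns `"video_conference"`.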
  • the device further includes:
  • a first sending unit, configured to send the characteristic information of the current data stream and information of the first data stream type to another device when the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold;
  • a receiving unit, configured to receive first information sent by the other device, where the first information is obtained by the other device according to the characteristic information of the current data stream and the identification information of the first data stream type; and
  • an update unit, configured to update the content recognition model according to the first information to obtain a new content recognition model.
  • In a third possible implementation of the fifth aspect, the device further includes:
  • an update unit, configured to: when the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, update the content recognition model according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
  • The inventors of this application use the identification result of the data stream type of the current data stream to correct the content recognition model. Specifically, a second preset threshold is introduced: when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is sent to the device for training to obtain a new content recognition model, which makes subsequent determination results more accurate.
  • In terms of updating the content recognition model according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model, the update unit is specifically configured to:
  • update the data stream type of a first record among a plurality of records to the first data stream type to obtain a second record, where each of the plurality of records includes characteristic information and a data stream type; and
  • perform training on the plurality of records including the second record to obtain the new content recognition model.
  • Updating the data stream type in the record to the first data stream type of the current data stream mainly serves to adapt to elastic deployment of cloud resources. For example, the same cloud resource may be used for video conferencing in one period and for desktop cloud in the next period. With the above method, the data stream type is updated to desktop cloud in the later period, so the data stream type of the current data stream can still be accurately identified when cloud resources are deployed elastically.
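  • The record update and retraining described above can be sketched as follows, under loudly stated assumptions: each record is a (characteristic information, data stream type) pair keyed by a hypothetical (destination IP, destination port, protocol number) tuple; the "first record" is taken to be the record whose characteristic information matches the current data stream; and an exact-match lookup table stands in for the real classifier (the text allows tree, neural network, or SVM models). `Record`, `update_records`, and `train_lookup_model` are illustrative names only.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

Features = Tuple[str, int, int]  # (destination IP, destination port, protocol number)


@dataclass(frozen=True)
class Record:
    features: Features
    stream_type: str


def update_records(records: List[Record], features: Features,
                   new_type: str) -> List[Record]:
    """Return a new record list in which the record matching `features`
    (the 'first record') carries `new_type` (the 'second record').
    Appending a record when nothing matches is an assumption of this sketch."""
    updated: List[Record] = []
    found = False
    for rec in records:
        if rec.features == features:
            updated.append(replace(rec, stream_type=new_type))
            found = True
        else:
            updated.append(rec)
    if not found:
        updated.append(Record(features, new_type))
    return updated


def train_lookup_model(records: List[Record]) -> Dict[Features, str]:
    """Stand-in for retraining: map each feature tuple to its labelled type."""
    return {rec.features: rec.stream_type for rec in records}
```

  • In the elastic-deployment scenario, a record previously labelled `"video_conference"` for a given destination is relabelled `"desktop_cloud"` by `update_records`, and the rebuilt lookup then returns the new type for that destination. Returning a new list instead of mutating in place keeps the previous model's training data intact.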
  • the device further includes:
  • a second sending unit, configured to: after the determining unit determines the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level, send information of the data stream type of the current data stream to an operations support system (OSS), where the information is used by the OSS to generate a flow control policy for the current data stream.
  • Notifying the OSS of the data stream type of the current data stream enables the OSS to generate a flow control policy for the current data stream based on that type. For example, when the first data stream type of the current data stream is the video stream of a video conference, the corresponding flow control policy is defined as a priority transmission policy: when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.
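  • A priority transmission policy of this kind can be sketched as a priority queue. This is a minimal illustration assuming the OSS expresses its policy as a mapping from data stream type to a numeric priority (lower value sent first); the `POLICY` table, its values, and `transmit_order` are hypothetical and not defined by the source.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

# Hypothetical policy generated by the OSS: lower number = higher priority.
POLICY: Dict[str, int] = {"video_conference": 0, "voice_conference": 1, "desktop_cloud": 2}
DEFAULT_PRIORITY = 9  # assumed priority for types the policy does not mention


def transmit_order(pending: List[Tuple[str, str]]) -> List[str]:
    """Given (stream_id, stream_type) pairs waiting to be sent, return the
    stream ids in transmission order under POLICY (FIFO within a priority)."""
    counter = itertools.count()  # tie-breaker that preserves arrival order
    heap: List[Tuple[int, int, str]] = []
    for stream_id, stream_type in pending:
        prio = POLICY.get(stream_type, DEFAULT_PRIORITY)
        heapq.heappush(heap, (prio, next(counter), stream_id))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

  • With a desktop cloud stream, a video conference stream, and an unclassified stream waiting, the video conference stream is sent first, followed by the desktop cloud stream and then the unclassified one.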
  • The message length includes one or more of the Ethernet frame length, the IP length, the transport protocol length, and the header length of the message, and the transport protocol includes the transmission control protocol TCP and/or the user datagram protocol UDP.
  • an embodiment of the present application provides a data stream type identification device, which includes:
  • a receiving unit, configured to receive the characteristic information of the current data stream and the information of the data stream type sent by another device;
  • a generating unit, configured to generate first information according to the characteristic information of the current data stream and the information of the data stream type; and
  • a sending unit, configured to send the first information to the other device for updating a content recognition model, where the content recognition model is used to obtain at least one second confidence level corresponding to at least one data stream type;
  • the content recognition model is a model obtained based on the characteristic information and data stream types of one or more historical data streams, the data stream types of the historical data streams are obtained by the behavior recognition model, and the behavior recognition model is a model obtained based on the message characteristics and data stream types of multiple data stream samples, where the message characteristics include one or more of message length, message transmission speed, message interval time, and message direction.
  • The above device involves a behavior recognition model and a content recognition model.
  • The behavior recognition model is pre-trained on the message characteristics and data stream types of multiple data stream samples. The content recognition model is trained on the characteristic information of data streams and the data stream types identified by the behavior recognition model, so the content recognition model is an online learning model. The behavior recognition model can identify the data stream types of some basic (or typical) data streams, but certain characteristics of data streams of the same type (such as message length and message transmission speed) may change during subsequent transmission. The online learning capability of the content recognition model is therefore used to identify the data stream type from other aspects (such as destination address and protocol type), which can hedge against recognition errors made by the behavior recognition model during data stream transmission.
  • This application therefore combines the behavior recognition model and the content recognition model. This improves the accuracy of identifying the data stream type of a data stream and the generalization of identification, and is applicable to various scenarios such as cloud deployment of applications, encrypted application transmission, and private applications.
  • An embodiment of this application provides a computer-readable storage medium storing a computer program; when the computer program runs on a processor, the method described in the first aspect or any possible implementation of the first aspect is implemented.
  • An embodiment of this application provides a computer program product stored in a memory; when the computer program product runs on a processor, the method described in the first aspect or any possible implementation of the first aspect is implemented.
  • An embodiment of this application provides a data stream type identification system.
  • The system includes a first device and a second device. The second device is the data stream type identification device described in the third aspect or any possible implementation of the third aspect, or in the fifth aspect or any possible implementation of the fifth aspect; the first device is the data stream type identification device described in the fourth aspect or any possible implementation of the fourth aspect, or in the sixth aspect or any possible implementation of the sixth aspect.
  • The above system involves the behavior recognition model and the content recognition model.
  • The behavior recognition model is pre-trained on the message characteristics and data stream types of multiple data stream samples. The content recognition model is trained on the characteristic information of data streams and the data stream types identified by the behavior recognition model, so the content recognition model is an online learning model. The behavior recognition model can identify the data stream types of some basic (or typical) data streams, but certain characteristics of data streams of the same type (such as message length and message transmission speed) may change during subsequent transmission. The online learning capability of the content recognition model is therefore used to identify the data stream type from other aspects (such as destination address and protocol type), which can hedge against recognition errors made by the behavior recognition model during data stream transmission. This application therefore combines the behavior recognition model and the content recognition model, which improves the accuracy and generalization of identifying the data stream type of a data stream and is applicable to various scenarios such as cloud deployment of applications, encrypted application transmission, and private applications.
  • FIG. 1 is a schematic structural diagram of a data stream type identification system provided by an embodiment of the present invention.
  • FIG. 2A is a schematic diagram of a scene of a content recognition model and a behavior recognition model provided by an embodiment of the present invention.
  • FIG. 2B is a schematic structural diagram of a classification model provided by an embodiment of the present invention.
  • FIG. 2C is a schematic structural diagram of a classification model provided by an embodiment of the present invention.
  • FIG. 2D is a schematic structural diagram of a classification model provided by an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a data stream type identification method provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of scenes of data stream a and data stream b provided by an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of an example of data stream recording provided by an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of an example of data stream recording provided by an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a second device provided by an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a first device provided by an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of yet another second device provided by an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of still another first device according to an embodiment of the present invention.
  • Figure 1 is a schematic structural diagram of a data stream type identification system provided by an embodiment of the present invention.
  • The system includes an operations support system (OSS) 101, a server 102, a forwarding device 103, and a terminal 104.
  • the terminals 104 are used to run various applications, such as video conferencing applications, voice conferencing applications, desktop cloud applications, etc.
  • the data stream types (also called application types) of the data streams generated by different applications are often different. In the embodiment of the present application, the data stream generated by the terminal 104 needs to be sent to the destination device through the forwarding device 103 first.
  • The forwarding device 103 may include routers, switches, and the like, and there may be one or more forwarding devices 103: for example, one router and three switches; only one switch; or three switches.
  • the aforementioned server 102 may be one server or a server cluster composed of multiple servers.
  • How the terminal 104 sends the data stream it generates, and how the forwarding device 103 forwards that data stream, may follow the flow control policy generated by the OSS 101.
  • For example, if the flow control policy specifies that the data streams generated by the video conference application have the highest priority, the data streams generated by the video conference application are transmitted first.
  • It should be noted that the flow control policy is generated by the OSS 101 based on the data stream type of the current data stream.
  • The data stream type of the current data stream that the OSS 101 uses to generate the flow control policy is determined by the second device.
  • the second device needs to use a behavior recognition model and a content recognition model when determining the data stream type of the current data stream.
  • The parameters of the models may include, but are not limited to, a confidence weight vector (w1, w2), and a first preset threshold θ1 and a second preset threshold θ2 of the data stream type. The first preset threshold may also be called the classification threshold and is used to judge whether a data stream is classified into a given data stream type; the second preset threshold is also called the model update threshold and is used to judge when to update the content recognition model.
  • When identifying the data stream type of the current data stream, the input of the content recognition model may include characteristic information (such as destination IP, destination port, and protocol type), and the input of the behavior recognition model may include message information (such as message length, message transmission speed, message interval time, and message direction).
  • The second device combines the confidence obtained from the content recognition model and the confidence obtained from the behavior recognition model according to the confidence weight vector (w1, w2) to obtain the final data stream type. In this process, if the second preset threshold θ2 indicates that the content recognition model needs to be updated, the parameters needed to update the content recognition model are obtained.
  • These parameters may be obtained by the second device itself through training on data-stream-related information, or obtained by the first device through training on data-stream-related information and then sent to the second device.
  • The second device may be the aforementioned OSS 101, server 102, or forwarding device 103; likewise, the first device may be the aforementioned OSS 101, server 102, or forwarding device 103.
  • the first device and the second device may be the same device or different devices.
  • In some implementations, the server 102 may be absent from the architecture shown in FIG. 1.
  • the above content recognition model is essentially a classification model.
  • the classification model can be a tree model, as shown in Figure 2C, and the classification model can also be a neural network model, as shown in Figure 2D.
  • the classification model can also be a support vector machine (SVM) model, and the classification model can also be other forms of models.
  • The content recognition model performs classification by extracting features from the input vector (such as the destination Internet protocol (IP) address, destination port number, protocol type, and other characteristic information).
  • For example, the content recognition model can identify data streams with the same destination IP address, the same protocol type, and the same port number as the same data stream type, and can identify data streams in the same network segment with the same protocol type and similar port numbers as the same data stream type. The destination port number may also be, for example, 20 (a well-known port number of the file transfer protocol (FTP)), with the transmission control protocol (TCP) as the protocol type.
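  • The grouping behavior described above can be illustrated with a fixed heuristic, although the actual content recognition model learns such groupings rather than applying hand-written rules. This sketch assumes that "same network segment" means the same /24 IPv4 prefix and that "similar port number" means a difference of at most a hypothetical tolerance of 10; `FlowKey` and `same_type_hint` are illustrative names.

```python
import ipaddress
from typing import NamedTuple


class FlowKey(NamedTuple):
    dst_ip: str
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 = TCP, 17 = UDP


def same_type_hint(a: FlowKey, b: FlowKey, prefix_len: int = 24,
                   port_tolerance: int = 10) -> bool:
    """Heuristic: True if two flows would plausibly be grouped as one type."""
    if a.protocol != b.protocol:
        return False
    if a.dst_ip == b.dst_ip and a.dst_port == b.dst_port:
        return True  # same destination IP, protocol, and port
    # Same network segment (assumed /24) and a similar port number.
    segment = ipaddress.ip_network(f"{a.dst_ip}/{prefix_len}", strict=False)
    same_segment = ipaddress.ip_address(b.dst_ip) in segment
    return same_segment and abs(a.dst_port - b.dst_port) <= port_tolerance
```

  • Under these assumptions, flows to 10.129.74.5:8443 and 10.129.74.9:8445 over TCP are grouped together, whereas a flow to 10.129.56.39:443 falls in a different /24 segment and is not.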
  • some information about the current data stream that the second device needs to use when identifying the data stream type of the current data stream can be sent to the second device by the terminal 104 or other devices, or collected by the second device itself.
  • FIG. 3 shows a data stream type identification method provided by an embodiment of the present invention.
  • the method can be implemented based on the architecture shown in Figure 1.
  • the method includes but is not limited to the following steps:
  • Step S301: The second device obtains, according to the message characteristics of the current data stream and a behavior recognition model, at least one first confidence level of the current data stream corresponding to at least one data stream type.
  • the behavior recognition model is a model obtained based on the message characteristics and data stream types of multiple data stream samples; optionally, the multiple data stream samples may be offline samples, that is, the behavior recognition model may be offline The trained model.
  • the multiple data stream samples may also be pre-selected typical (or representative) samples.
  • For example, messages in a data stream of a video conference application are usually relatively long, although occasionally some messages are relatively short. Relatively long messages better reflect that a data stream belongs to a video conference application, so when data streams of the video conference application are selected as samples, representative data streams with relatively long messages are preferred.
  • The data stream types of the multiple data stream samples may be regarded as known, that is, determined by manual labeling.
  • Because the behavior recognition model is obtained based on the message characteristics and data stream types of multiple data stream samples, it reflects certain relationships between the message characteristics of a data stream and its data stream type. Therefore, when the message characteristics of the current data stream are input to the behavior recognition model, the model can predict, to a certain extent, the tendency (or probability) that the current data stream belongs to one or more data stream types; the parameter reflecting this tendency (or probability) can also be called confidence.
  • the message characteristics may include one or more of message length, message transmission speed, message interval time, and message direction.
  • The message length includes one or more of the Ethernet frame length, the IP length, the transport protocol length, and the header length of the message, and the transport protocol includes the transmission control protocol TCP and/or the user datagram protocol UDP.
  • The message characteristics may also include other characteristics, such as the maximum, minimum, mean, variance, and quantiles of the message length, message transmission speed, message interval time, and message direction.
  • the message feature can be input into the behavior recognition model in the form of a vector, for example, it can be in the form of (message length, message transmission speed, message interval time).
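  • A feature vector of the kind described above could be assembled as follows. This is a sketch under assumptions not stated in the source: each message is given as (timestamp in seconds, length in bytes, direction), direction is encoded as +1 for uplink and -1 for downlink, the median stands in for "quantiles", and transmission speed is taken as average bytes per second. `describe` and `message_features` are hypothetical names.

```python
import statistics
from typing import List, Tuple

# (timestamp in seconds, length in bytes, direction: +1 uplink / -1 downlink)
Message = Tuple[float, int, int]


def describe(values: List[float]) -> List[float]:
    """Maximum, minimum, mean, population variance, and median of a series."""
    return [max(values), min(values), statistics.fmean(values),
            statistics.pvariance(values), statistics.median(values)]


def message_features(messages: List[Message]) -> List[float]:
    """Flatten length, interval-time, and direction statistics plus an
    average transmission speed (bytes per second) into one 16-element vector."""
    if len(messages) < 2:
        raise ValueError("need at least two messages to compute intervals")
    times = [m[0] for m in messages]
    lengths = [float(m[1]) for m in messages]
    directions = [float(m[2]) for m in messages]
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    duration = times[-1] - times[0]
    speed = sum(lengths) / duration if duration > 0 else 0.0
    return describe(lengths) + describe(intervals) + describe(directions) + [speed]
```

  • For three messages of 100, 300, and 200 bytes sent one second apart, the vector starts with a maximum length of 300.0 and a minimum of 100.0 and ends with an average speed of 300.0 bytes per second.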
  • the data stream type in the embodiment of the present application may also be referred to as an application type.
  • There may be N data stream types, where N is greater than or equal to 1.
  • In this embodiment of the application, the confidence that the current data stream belongs to each of the N data stream types can be estimated (or predicted), that is, N first confidence levels of the current data stream corresponding to the N data stream types are obtained.
  • For example, if the N data stream types are the video conference data stream type, the voice conference data stream type, and the desktop cloud data stream type, the behavior recognition model is used to estimate the first confidence that the current data stream belongs to the video conference data stream type, the first confidence that it belongs to the voice conference data stream type, and the first confidence that it belongs to the desktop cloud data stream type. If the N data stream types consist only of the video conference data stream type, the behavior recognition model is used to estimate only the first confidence that the current data stream belongs to the video conference data stream type.
  • Alternatively, this embodiment may focus on only one of several data stream types, in which case only the confidence that the current data stream belongs to that type is estimated (or predicted), that is, the confidence of the current data stream corresponding to one data stream type is obtained. For example, if the multiple data stream types are the video conference, voice conference, and desktop cloud data stream types but this embodiment focuses only on the video conference data stream type, only the first confidence that the current data stream belongs to the video conference data stream type needs to be estimated through the behavior recognition model.
  • Step S302: The second device obtains, according to the characteristic information of the current data stream and a content recognition model, at least one second confidence level of the current data stream corresponding to the at least one data stream type.
  • the content recognition model is a model obtained based on the characteristic information and data stream types of one or more historical data streams.
  • The one or more historical data streams may be online data streams, that is, one or more data streams continuously generated during a preceding period, and the data stream types of the historical data streams are identified by the above behavior recognition model; that is, the content recognition model may be a model obtained through online training.
  • Because the content recognition model is obtained based on the characteristic information and data stream types of one or more historical data streams, it reflects certain relationships between the characteristic information of a data stream and its data stream type. Therefore, when the characteristic information of the current data stream is input to the content recognition model, the model can predict, to a certain extent, the tendency (or probability) that the current data stream belongs to one or more data stream types; the parameter reflecting this tendency (or probability) can also be called confidence.
  • the feature information may include one or more of the destination address, protocol type, and port number.
  • The destination address may be a destination IP address, a destination MAC address, or an address in another form;
  • the feature information can include other features in addition to the features exemplified here.
  • The characteristic information here may be destination-specific information, for example, the destination IP and the destination port.
  • The characteristic information can be input into the content recognition model as a vector, for example in the form (ip, port, protocol), such as (10.29.74.5, 8443, 6), or in the form (mac, port, protocol), such as (05FA1525EEFF, 8443, 6). Other forms are also possible and are not enumerated here.
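  • One way to turn such tuples into numbers a classifier can consume is sketched below. It assumes IPv4 addresses are mapped to their 32-bit integer value and MAC addresses to their 48-bit integer value; in the examples above, 6 is the IANA protocol number for TCP. The function names are hypothetical, and real models may encode addresses differently (for example, per octet or per network segment).

```python
import ipaddress
from typing import Tuple


def _check(port: int, protocol: int) -> None:
    """Validate the port and protocol number ranges."""
    if not 0 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    if not 0 <= protocol <= 255:
        raise ValueError(f"invalid protocol number: {protocol}")


def encode_feature_info(ip: str, port: int, protocol: int) -> Tuple[int, int, int]:
    """Encode (destination IP, destination port, protocol number) as integers."""
    _check(port, protocol)
    return int(ipaddress.ip_address(ip)), port, protocol


def encode_mac_feature_info(mac: str, port: int, protocol: int) -> Tuple[int, int, int]:
    """Encode (destination MAC, destination port, protocol number) as integers.
    Accepts bare hex such as '05FA1525EEFF' or colon/hyphen-separated forms."""
    _check(port, protocol)
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"invalid MAC address: {mac}")
    return int(digits, 16), port, protocol
```
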
  • the embodiment of this application can estimate (or predict) the confidence that the current data stream belongs to each of the N data stream types, that is, obtain the current The N second confidence levels of the data stream corresponding to the N data stream types.
  • For example, if the N data stream types are the video conference, voice conference, and desktop cloud data stream types, the content recognition model is used to estimate the second confidence that the current data stream belongs to the video conference data stream type, the second confidence that it belongs to the voice conference data stream type, and the second confidence that it belongs to the desktop cloud data stream type. If the N data stream types consist only of the video conference data stream type, the content recognition model is used to estimate only the second confidence that the current data stream belongs to the video conference data stream type.
  • Alternatively, this embodiment may focus on only one of several data stream types, in which case only the confidence that the current data stream belongs to that type is estimated (or predicted), that is, the confidence of the current data stream corresponding to one data stream type is obtained. For example, if the multiple data stream types are the video conference, voice conference, and desktop cloud data stream types but this embodiment focuses only on the video conference data stream type, only the second confidence that the current data stream belongs to the video conference data stream type needs to be estimated through the content recognition model.
  • Step S303: The second device determines the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.
  • The at least one first confidence level characterizes, to a certain extent, the data stream type tendency of the current data stream, and so does the at least one second confidence level. Considering both together yields a more accurate and credible data stream type tendency, from which the data stream type of the current data stream is obtained.
  • For example, the data stream type with the largest comprehensive confidence may be determined as the data stream type of the current data stream. Suppose the comprehensive confidence of the video conference data stream type, determined from its first and second confidences, is 0.7; that of the voice conference data stream type, determined from its first and second confidences, is 0.2; and that of the desktop cloud data stream type, determined from its first and second confidences, is 0.1. Since the video conference data stream type has the largest comprehensive confidence, the data stream type of the current data stream is determined to be the video conference data stream type.
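  • The selection in this example amounts to taking the argmax over the comprehensive confidences. A minimal sketch using the figures above (the type names and `most_likely_type` are illustrative):

```python
from typing import Dict


def most_likely_type(scores: Dict[str, float]) -> str:
    """Return the data stream type with the largest comprehensive confidence."""
    if not scores:
        raise ValueError("no candidate data stream types")
    return max(scores, key=scores.get)


scores = {"video_conference": 0.7, "voice_conference": 0.2, "desktop_cloud": 0.1}
print(most_likely_type(scores))  # video_conference
```
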
  • Determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level may specifically be: calculating the comprehensive confidence corresponding to a first data stream type according to the first confidence level corresponding to the first data stream type, the weight of the first confidence level, the second confidence level corresponding to the first data stream type, and the weight of the second confidence level, where the first data stream type is any one of the at least one data stream type; that is, every data stream type in the at least one data stream type has the characteristics described here for the first data stream type.
  • If the comprehensive confidence corresponding to the first data stream type is greater than a first preset threshold, the data stream type of the current data stream is determined to be the first data stream type. For example, if the comprehensive confidence corresponding to the video conference data stream type is greater than the first preset threshold, the data stream type of the current data stream is determined to be the video conference data stream type; if the comprehensive confidence corresponding to the desktop cloud data stream type is greater than the first preset threshold, the data stream type of the current data stream is determined to be the desktop cloud data stream type.
  • For example, assume that the confidence weight vector (w1, w2) is (0.4, 0.6), where w1 is the first confidence weight (which can also be regarded as the weight of the behavior recognition model) and w2 is the second confidence weight (which can also be regarded as the weight of the content recognition model), and that the first preset threshold θ1 of the data stream type is 0.5.
  • In FIG. 4, the horizontal axis represents the message sequence number within the data stream, and the vertical axis represents the message length; a message length greater than 0 denotes an uplink message, and a message length less than 0 denotes a downlink message.
  • Both data stream a and data stream b are of the desktop cloud data stream type.
  • Data stream b contains uplink messages, which are relatively more representative of the desktop cloud scenario, so the message behavior of data stream b is considered typical behavior.
  • Data stream a has no uplink messages for a long period, so it does not clearly indicate a desktop cloud scenario.
  • The message behavior of data stream a is therefore considered atypical behavior; the behavior recognition model is usually trained on data streams with typical behavior.
  • the behavior recognition model can identify the data stream type of data stream b, but cannot identify the data stream type of data stream a.
  • the characteristic information of data stream a and data stream b is as follows.
  • the protocol type of data stream a is TCP, the destination IP address is 10.129.74.5, and the destination port number is 8443.
  • the protocol type of data stream b is TCP, the destination IP address is 10.129.56.39, and the destination port number is 443.
  • for data stream a, the second confidences given by the content recognition model for the data stream type of the desktop cloud, the data stream type of the voice conference, and the data stream type of the video conference are all 0.
  • based on message characteristics, the behavior recognition model gives a first confidence of 0.5 for the data stream type of the desktop cloud, 0 for the data stream type of the voice conference, and 0 for the data stream type of the video conference. Therefore, the comprehensive confidence is 0.3 for the desktop cloud data stream type and 0 for the other two types; since 0.3 is below θ1, data stream a is not identified.
  • for data stream b, the second confidences given by the content recognition model for the data stream type of the desktop cloud, the data stream type of the voice conference, and the data stream type of the video conference are all 0.
  • based on message characteristics, the behavior recognition model gives a first confidence of 0.9 for the data stream type of the desktop cloud, 0 for the data stream type of the voice conference, and 0 for the data stream type of the video conference. Therefore, the comprehensive confidence is 0.54 for the desktop cloud data stream type and 0 for the other two types; since 0.54 exceeds θ1, data stream b is identified as desktop cloud.
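The comprehensive confidence is a weighted sum of the two models' confidences, which is then compared with θ1 = 0.5. The Python sketch below reproduces the worked example for data streams a and b. One assumption needs flagging: the text lists (w1, w2) = (0.4, 0.6), but the results quoted for this example (0.3 for data stream a, 0.54 for data stream b) come out only if the behavior recognition model's first confidence is weighted 0.6 and the content recognition model's second confidence 0.4, so the sketch uses that assignment. The function names and type labels are hypothetical.

```python
TYPES = ("desktop_cloud", "voice_conference", "video_conference")

# Assumed weights, inferred from the worked numbers in the text (see lead-in).
W_BEHAVIOR = 0.6   # weight of the first confidence (behavior recognition model)
W_CONTENT = 0.4    # weight of the second confidence (content recognition model)
THETA1 = 0.5       # first preset threshold, from the text

def comprehensive(first, second):
    """Weighted sum of first (behavior) and second (content) confidences per type."""
    return {t: W_BEHAVIOR * first.get(t, 0.0) + W_CONTENT * second.get(t, 0.0)
            for t in TYPES}

def decide(scores):
    """Return the type whose comprehensive confidence exceeds THETA1, else None."""
    best = max(scores, key=scores.get)
    return best if scores[best] > THETA1 else None

# Before the content model has learned anything, all second confidences are 0.
scores_a = comprehensive({"desktop_cloud": 0.5}, {})
scores_b = comprehensive({"desktop_cloud": 0.9}, {})
print(round(scores_a["desktop_cloud"], 2), decide(scores_a))  # 0.3 None
print(round(scores_b["desktop_cloud"], 2), decide(scores_b))  # 0.54 desktop_cloud
```

Taking the maximum is a simplification: when each model's confidences over the types sum to at most 1 and the weights sum to 1, the comprehensive confidences sum to at most 1, so at most one type can exceed 0.5.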
  • the content recognition model can also be updated. Two different update solutions are provided below.
  • Solution 1: If the comprehensive confidence corresponding to the first data stream type is greater than the first preset threshold θ1 and less than a second preset threshold θ2, where θ2 is greater than θ1, the second device sends the characteristic information of the current data stream and the information of the first data stream type to the first device. For example, for data stream a, the comprehensive confidence of 0.3 corresponding to the data stream type of the desktop cloud is not within the interval (θ1, θ2), so there is no need to send the characteristic information of the current data stream and the information of the desktop cloud data stream type to the first device.
  • For data stream b, the comprehensive confidence of 0.54 corresponding to the data stream type of the desktop cloud is within the interval (θ1, θ2), so the characteristic information of the current data stream (for example, destination IP address 10.129.56.39, destination port number 443, protocol type TCP) and the information of the desktop cloud data stream type (for example, its name or identifier) need to be sent to the first device.
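Solution 1's reporting rule is an open-interval test on the comprehensive confidence. In this sketch θ1 = 0.5 comes from the text, while θ2 = 0.8 is a hypothetical value: the text only requires θ2 to be greater than θ1 (and, for data stream b to be reported, greater than 0.54). The function name is illustrative.

```python
THETA1 = 0.5   # first preset threshold, from the text
THETA2 = 0.8   # second preset threshold: hypothetical value, must exceed THETA1

def should_report(conf, theta1=THETA1, theta2=THETA2):
    """Solution 1: send features + type to the first device only if θ1 < conf < θ2."""
    return theta1 < conf < theta2

print(should_report(0.3))    # data stream a: below θ1, nothing is sent
print(should_report(0.54))   # data stream b: inside (θ1, θ2), record is sent
```

Under this rule, only streams that are identified (above θ1) but not yet with high confidence (below θ2) are sent back as training records.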
  • the first device receives the characteristic information of the current data stream and the information of the first data stream type sent by the second device, so one more data stream record exists on the first device; as shown in Figure 5, a record for data stream b is added. The first device then obtains the first information, for example some parameters that affect how the confidence is calculated, according to the characteristic information of the current data stream and the information of the first data stream type.
  • the first information belongs to a model file, so it can be transmitted as a model file.
  • for example, an AI model file of the common open-source Keras library is an h5/json file;
  • an AI model file of the open-source sklearn library is a pkl/m file.
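Because the first information travels as a model file, the transfer can be pictured with Python's standard `pickle` module, the serialization format behind `.pkl` files. This sketch is an assumption throughout: the "model" is a hypothetical lookup table rather than a trained sklearn or Keras model, and the file name is illustrative. Only unpickle files from a trusted sender, because loading a pickle can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the "first information": a lookup table from
# (destination IP, destination port, protocol) to data stream type.
first_information = {("10.129.56.39", 443, "TCP"): "desktop_cloud"}

with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "content_model.pkl"
    # First device: serialize the model parameters into a .pkl file for sending.
    model_file.write_bytes(pickle.dumps(first_information))
    # Second device: load the received file before updating its content model.
    received = pickle.loads(model_file.read_bytes())

print(received[("10.129.56.39", 443, "TCP")])
```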
  • the second device receives the first information sent by the first device, and then updates the content recognition model according to the first information to obtain a new content recognition model.
  • the destination IP address is 10.129.56.39
  • the destination port number is 443
  • the protocol type is TCP
  • the updated content recognition model re-estimates this input data:
  • the estimated second confidence level of the data flow type corresponding to the desktop cloud is 1.
  • the characteristic information of data stream a is similar to that of data stream b: the destination IP addresses are in the same network segment, the port numbers are similar, and the protocol type is the same. Therefore, when the updated content recognition model estimates input data stream a, its result will be close to the estimation result for data stream b.
  • the estimated second confidence level of the data stream type corresponding to the desktop cloud may be 0.6.
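One concrete reading of "the destination IP addresses are in the same network segment" is a prefix-membership test. The sketch below uses Python's standard `ipaddress` module; the `/16` prefix is an assumption, since the text does not give the mask, and the helper name is hypothetical.

```python
import ipaddress

def same_segment(ip_a: str, ip_b: str, prefix: int) -> bool:
    """True if ip_b falls in the /prefix network that contains ip_a."""
    network = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in network

# Destination addresses of data stream a and data stream b from the text.
print(same_segment("10.129.74.5", "10.129.56.39", 16))  # True: both in 10.129.0.0/16
print(same_segment("10.129.74.5", "10.129.56.39", 24))  # False: 10.129.74.0/24 differs
```

How much such similarity raises the second confidence (to 0.6 in the text's example) depends on the content recognition model actually used, which this sketch does not model.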
  • the confidence weight vector (w1, w2), the first preset threshold θ1, and the second preset threshold θ2 remain unchanged.
  • for data stream a after the update, the content recognition model gives a second confidence of 0.6 for the data stream type of the desktop cloud and 0 for the data stream types of the voice conference and the video conference.
  • the behavior recognition model, based on message characteristics, still gives a first confidence of 0.5 for the desktop cloud data stream type, 0 for the voice conference data stream type, and 0 for the video conference data stream type. Therefore, the comprehensive confidence is 0.54 for the desktop cloud data stream type and 0 for the other two types.
  • the comprehensive confidence of 0.54 for the desktop cloud data stream type now exceeds θ1, so data stream a is identified as desktop cloud. Because 0.54 is within the interval (θ1, θ2), the characteristic information of the current data stream and the information of the desktop cloud data stream type also need to be sent to the first device for a later update of the content recognition model (the update principle is as described above).
  • for data stream b after the update, the content recognition model gives a second confidence of 1 for the data stream type of the desktop cloud and 0 for the data stream types of the voice conference and the video conference.
  • the behavior recognition model, based on message characteristics, gives a first confidence of 0.9 for the desktop cloud data stream type, 0 for the voice conference data stream type, and 0 for the video conference data stream type. The comprehensive confidences for these three data stream types are then calculated in the same way.
  • suppose now that the cloud resource with destination IP address 10.129.56.39, destination port number 443, and protocol type TCP is changed from providing services for the desktop cloud to providing services for video conferences.
  • in this case, the obtaining, by the first device, of the first information according to the characteristic information of the current data stream and the information of the first data stream type may include: if the characteristic information of a first record among multiple records is the same as the characteristic information of the current data stream but the data stream type of the first record is different from the first data stream type, updating the data stream type of the first record to the first data stream type to obtain a second record, where each of the multiple records includes characteristic information and a data stream type; and then training on the multiple records, including the second record, to obtain the first information.
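The record update that yields the "second record" can be sketched as follows. The `Record` type, the `relabel` and `train` helpers, and the use of a plain lookup table as the "trained" result are all hypothetical simplifications; a real content recognition model would be produced by an actual learning algorithm.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Record:
    dst_ip: str
    dst_port: int
    protocol: str
    stream_type: str

    def features(self):
        return (self.dst_ip, self.dst_port, self.protocol)

def relabel(records, features, new_type):
    """Turn each 'first record' (same features, different type) into a 'second record'."""
    return [replace(r, stream_type=new_type)
            if r.features() == features and r.stream_type != new_type else r
            for r in records]

def train(records):
    """Toy stand-in for training: map each feature tuple to its stream type."""
    return {r.features(): r.stream_type for r in records}

# The cloud resource behind data stream b now serves video conferences (data stream c).
records = [Record("10.129.56.39", 443, "TCP", "desktop_cloud")]
current = ("10.129.56.39", 443, "TCP")
records = relabel(records, current, "video_conference")
records.append(Record(*current, "video_conference"))  # new record for data stream c
model = train(records)
print(model[current])  # video_conference
```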
  • the protocol type of data stream c is TCP
  • the destination IP address is 10.129.56.39
  • the destination port number is 443.
  • for data stream c, the content recognition model gives a second confidence of 1 for the desktop cloud data stream type and 0 for the voice conference and video conference data stream types.
  • the behavior recognition model, based on message characteristics, gives a first confidence of 0 for the desktop cloud data stream type, 0 for the voice conference data stream type, and 0.9 for the video conference data stream type. The comprehensive confidences for these three data stream types are then calculated in the same way.
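Data stream c shows why the stale record must be relabeled. This sketch applies the weighted-sum rule assuming the behavior confidence carries weight 0.6 and the content confidence 0.4 (the assignment that reproduces the 0.3 and 0.54 figures quoted for data streams a and b), takes θ1 = 0.5 from the text and a hypothetical θ2 = 0.8; the variable names are illustrative.

```python
W_BEHAVIOR, W_CONTENT = 0.6, 0.4   # assumed weights (see lead-in)
THETA1, THETA2 = 0.5, 0.8          # θ1 from the text; θ2 hypothetical

first = {"desktop_cloud": 0.0, "voice_conference": 0.0, "video_conference": 0.9}
second = {"desktop_cloud": 1.0, "voice_conference": 0.0, "video_conference": 0.0}

# Comprehensive confidence per type: weighted sum of the two models' outputs.
scores = {t: W_BEHAVIOR * first[t] + W_CONTENT * second[t] for t in first}
best = max(scores, key=scores.get)
identified = best if scores[best] > THETA1 else None
report = identified is not None and scores[best] < THETA2

print({t: round(v, 2) for t, v in scores.items()})
print(identified, report)  # video_conference True
```

Desktop cloud scores 0.4 and video conference 0.54, so the stream is identified as a video conference and its record is reported, which is what allows the record for 10.129.56.39:443/TCP to be relabeled. With the weights swapped, desktop cloud would score 0.6 and be chosen instead.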
  • the first device receives the characteristic information of the current data stream and the information of the first data stream type sent by the second device, so one more data stream record exists on the first device; as shown in Figure 6, a record for data stream c is added.
  • Solution 2: If the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, the second device does not need to send the characteristic information of the current data stream and the information of the first data stream type to the first device; instead, it updates the content recognition model itself according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
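  • specifically, if the characteristic information of a first record among multiple records is the same as that of the current data stream but the data stream type of the first record is different from the first data stream type, the second device updates the data stream type of the first record to the first data stream type to obtain a second record, where each of the multiple records includes characteristic information and a data stream type;
  • the second device then trains on the multiple records, including the second record, to obtain a new content recognition model.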
  • for the specific principle, refer to Solution 1 above; the difference is that the second device performs the operations that the first device performs in Solution 1.
  • Step S304 The second device sends the data stream type information of the current data stream to the operation and maintenance support system OSS.
  • the second device may send the information of the data stream type of the current data stream to the OSS every time it determines the data stream type of a data stream. For example, when the data stream type of data stream a is generated for the first time, it sends the data stream type information of data stream a to the OSS;
  • when the data stream type of data stream b is generated for the first time, it sends the data stream type information of data stream b to the OSS; when the data stream type of data stream a is generated for the second time, it sends the data stream type information of data stream a to the OSS; when the data stream type of data stream b is generated for the second time, it sends the data stream type information of data stream b to the OSS; and when the data stream type of data stream c is generated, it sends the data stream type information of data stream c to the OSS. If the second device is itself the OSS, step S304 does not need to be executed.
  • Step S305 The OSS generates a flow control policy for the current data stream according to the information of its data stream type. For example, if this information indicates that the current data stream is of the video desktop cloud data stream type or the video conference data stream type, the current data stream is assigned high-priority QoS.
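Step S305 amounts to mapping a data stream type to a QoS level. A minimal sketch follows; the type labels, the dictionary layout of the policy, and the "normal" level for other types are hypothetical, since the text only says that video desktop cloud and video conference streams get high-priority QoS.

```python
from typing import Dict

# Types the text names as high-priority; the string labels are illustrative.
HIGH_PRIORITY_TYPES = {"video_desktop_cloud", "video_conference"}

def flow_control_policy(stream_id: str, stream_type: str) -> Dict[str, str]:
    """Build a hypothetical policy record that the OSS could push in step S306."""
    priority = "high" if stream_type in HIGH_PRIORITY_TYPES else "normal"
    return {"stream": stream_id, "type": stream_type, "qos_priority": priority}

print(flow_control_policy("c", "video_conference"))
```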
  • Step S306 The OSS sends the flow control strategy to the forwarding device or terminal.
  • the forwarding device 103 may be a device such as a router or a switch
  • the terminal 104 is a device that outputs the current data stream. If the forwarding device or terminal learns from the flow control policy that the current data stream has high-priority QoS, then when there are multiple data streams to be sent, it sends the current data stream, which is configured as high priority, first.
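On the forwarding device or terminal, "send the high-priority stream first" can be modeled as a priority queue. This sketch uses Python's `heapq` with a sequence counter so that streams of equal priority keep their arrival order; the stream names and the two-level priority scheme are hypothetical.

```python
import heapq
from itertools import count

RANK = {"high": 0, "normal": 1}   # lower rank is sent first

def send_order(streams):
    """streams: (stream_id, priority) pairs in arrival order; returns send order."""
    arrival = count()   # tie-breaker: equal priorities keep arrival order
    heap = [(RANK[priority], next(arrival), stream_id) for stream_id, priority in streams]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

print(send_order([("backup", "normal"), ("conference", "high"), ("sync", "normal")]))
# ['conference', 'backup', 'sync']
```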
  • the method described in Figure 3 involves a behavior recognition model and a content recognition model.
  • the behavior recognition model is pre-trained from the message characteristics and data stream types of multiple data stream samples, and the content recognition model is trained from the characteristic information of data streams and the data stream types identified by the behavior recognition model, so the content recognition model is an online learning model.
  • the behavior recognition model can identify the data stream types of some basic (or typical) data streams, but some characteristics of data streams of the same type, such as message length and message transmission speed, may change during subsequent transmission. The online learning capability of the content recognition model allows the data stream type to be identified from other aspects, such as destination address and protocol type, which hedges against recognition errors made by the behavior recognition model during data stream transmission.
  • therefore, this application combines the behavior recognition model and the content recognition model, which improves the accuracy of data stream type recognition and also the generalization of recognition, so that the method can be applied to scenarios such as application cloud deployment, application transmission encryption, and private applications.
  • FIG. 7 is a schematic structural diagram of a data stream type identification device 70 according to an embodiment of the present invention.
  • the device 70 may be the second device in the method embodiment shown in FIG. 3 and may include a first recognition unit 701, a second recognition unit 702, and a determining unit 703, which are described in detail as follows.
  • the first recognition unit 701 is configured to obtain, according to the message characteristics of the current data stream and a behavior recognition model, at least one first confidence that the current data stream corresponds to at least one data stream type, where the behavior recognition model is a model obtained according to the message characteristics and data stream types of multiple data stream samples, and the message characteristics include one or more of message length, message transmission speed, message interval time, and message direction;
  • the second recognition unit 702 is configured to obtain, according to the characteristic information of the current data stream and a content recognition model, at least one second confidence that the current data stream corresponds to the at least one data stream type, where the characteristic information includes a destination address and a protocol type, the content recognition model is a model obtained according to the characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained according to the behavior recognition model;
  • the determining unit 703 is configured to determine the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.
  • the above-mentioned equipment involves a behavior recognition model and a content recognition model.
  • the behavior recognition model is pre-trained from the message characteristics and data stream types of multiple data stream samples, and the content recognition model is trained from the characteristic information of data streams and the data stream types identified by the behavior recognition model, so the content recognition model is an online learning model. The behavior recognition model can identify the data stream types of some basic (or typical) data streams, but some characteristics of data streams of the same type, such as message length and message transmission speed, may change during subsequent transmission. The online learning capability of the content recognition model allows the data stream type to be identified from other aspects, such as destination address and protocol type.
  • this hedges against recognition errors made by the behavior recognition model during data stream transmission. Therefore, this application combines the behavior recognition model and the content recognition model, which improves the accuracy of data stream type recognition and also the generalization of recognition, so that it can be applied to scenarios such as application cloud deployment, application transmission encryption, and private applications.
  • the determining unit 703 is configured to determine the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level, specifically:
  • calculating a comprehensive confidence corresponding to a first data stream type according to the first confidence corresponding to the first data stream type, the weight value of the first confidence, the second confidence corresponding to the first data stream type, and the weight value of the second confidence, where the first data stream type is any one of the at least one data stream type;
  • if the comprehensive confidence corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
  • a confidence weight is configured for each of the two models (the weight configured for the behavior recognition model is the weight of the first confidence, and the weight configured for the content recognition model is the weight of the second confidence), so the comprehensive confidence calculated from the two weighted confidences better reflects the actual type of the current data stream.
  • introducing the first preset threshold to judge whether the corresponding data stream type should be accepted improves the efficiency and accuracy of determining the data stream type.
  • the device 70 further includes:
  • the first sending unit is configured to: when the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, send the characteristic information of the current data stream and the information of the first data stream type to another device (the first device in the method embodiment shown in FIG. 3), where the second preset threshold is greater than the first preset threshold;
  • the receiving unit is configured to receive first information sent by the other device (the first device in the method embodiment shown in FIG. 3), where the first information is obtained by the other device according to the characteristic information of the current data stream and the identification information of the first data stream type;
  • the update unit is used to update the content recognition model according to the first information to obtain a new content recognition model.
  • the device 70 further includes:
  • an update unit, configured to: when the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, update the content recognition model according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
  • the inventors of the present application use the identification result of the data stream type of the current data stream to correct the content recognition model. Specifically, a second preset threshold is introduced: when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is sent to the device that performs training to obtain a new content recognition model, so that subsequent determination results are more accurate.
  • the updating the content recognition model according to the characteristic information of the current data stream and the first data stream type information to obtain a new content recognition model is specifically:
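  • if the characteristic information of a first record among multiple records is the same as the characteristic information of the current data stream but the data stream type of the first record is different from the first data stream type, the data stream type of the first record is updated to the first data stream type to obtain a second record; each record in the multiple records includes characteristic information and a data stream type;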
  • Training is performed on a plurality of records including the second record to obtain a new content recognition model.
  • updating the data stream type in the record to the first data stream type of the current data stream mainly serves to adapt to the elastic deployment of cloud resources. For example, if the same cloud resource was used for video conferencing in an earlier period and for the desktop cloud in a later period, the above method updates the data stream type to desktop cloud in the later period, so that the data stream type of the current data stream can still be identified accurately when cloud resources are deployed elastically.
  • the device 70 further includes:
  • the second sending unit is configured to: after the determining unit determines the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level, send the information of the data stream type of the current data stream to the operation and maintenance support system OSS, where this information is used by the OSS to generate a flow control policy for the current data stream.
  • the relevant information of the current data stream type is reported to the OSS, so that the OSS can generate a flow control policy for the current data stream based on its data stream type. For example, when the first data stream type of the current data stream is the video stream of a video conference, the corresponding flow control policy is defined as a priority transmission policy; that is, when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.
  • the message length includes one or more of the Ethernet frame length, IP length, transmission protocol length, and header length in the message
  • the transmission protocol includes the transmission control protocol TCP and/or the user datagram protocol UDP.
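For IP traffic, several of the length fields listed above can be read directly from the IPv4 header: the header length comes from the IHL field (counted in 4-byte words), the IP total length occupies bytes 2 and 3, and the protocol number at byte 9 distinguishes TCP (6) from UDP (17). The sketch below parses those fields with Python's `struct` module; the function name and the synthetic header are illustrative, and the Ethernet frame length or transport-layer header lengths would need separate parsing.

```python
import struct

def ipv4_lengths(packet: bytes):
    """Return (ip_header_length, ip_total_length, transport_length, protocol)."""
    version_ihl, _tos, total_length = struct.unpack("!BBH", packet[:4])
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_length = (version_ihl & 0x0F) * 4   # IHL counts 32-bit words
    protocol = {6: "TCP", 17: "UDP"}.get(packet[9], str(packet[9]))
    # Bytes carried after the IP header (transport header + payload).
    return header_length, total_length, total_length - header_length, protocol

# Synthetic 20-byte IPv4 header: version 4, IHL 5, total length 60, protocol TCP.
header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 60, 0, 0, 64, 6, 0,
                     bytes(4), bytes(4))
print(ipv4_lengths(header))  # (20, 60, 40, 'TCP')
```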
  • each unit may also correspond to the corresponding description of the method embodiment shown in FIG. 3.
  • FIG. 8 is a schematic structural diagram of a data stream type identification device 80 provided by an embodiment of the present invention.
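  • the device 80 may be the first device in the method embodiment shown in FIG. 3 and may include a receiving unit 801, a generating unit 802, and a sending unit 803, which are described as follows.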
  • the receiving unit 801 is configured to receive the characteristic information of the current data stream and the information of the data stream type sent by other devices (the second device in the method embodiment shown in FIG. 3);
  • the generating unit 802 is configured to generate first information according to the characteristic information of the current data stream and the information of the data stream type;
  • the sending unit 803 is configured to send the first information to the other device (the second device in the method embodiment shown in FIG. 3) so that the other device updates the content recognition model, where the content recognition model is used to obtain at least one second confidence that the current data stream corresponds to at least one data stream type; the content recognition model is a model obtained according to the characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained according to a behavior recognition model; the behavior recognition model is a model obtained according to the message characteristics and data stream types of multiple data stream samples, and the message characteristics include one or more of message length, message transmission speed, message interval time, and message direction.
  • a behavior recognition model and a content recognition model are involved.
  • the behavior recognition model is pre-trained from the message characteristics and data stream types of multiple data stream samples, and the content recognition model is trained from the characteristic information of data streams and the data stream types identified by the behavior recognition model, so the content recognition model is an online learning model. The behavior recognition model can identify the data stream types of some basic (or typical) data streams, but some characteristics of data streams of the same type, such as message length and message transmission speed, may change during subsequent transmission. The online learning capability of the content recognition model is therefore used to identify the data stream type from other aspects, such as destination address and protocol type.
  • therefore, this application combines the behavior recognition model and the content recognition model, which improves the accuracy of data stream type recognition and also the generalization of recognition, so that it can be applied to scenarios such as application cloud deployment, application transmission encryption, and private applications.
  • each unit may also correspond to the corresponding description of the method embodiment shown in FIG. 3.
  • FIG. 9 is a device 90 provided by an embodiment of the present invention.
  • the device 90 may be the second device in the method embodiment shown in FIG. 3.
  • the device 90 includes a processor 901, a memory 902, and a transceiver 903.
  • the processor 901, the memory 902, and the transceiver 903 are connected to each other through a bus.
  • the memory 902 includes but is not limited to random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and the memory 902 is used to store related computer programs and data.
  • the transceiver 903 is used to receive and send data.
  • the processor 901 may be one or more central processing units (CPUs).
  • when the processor 901 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
  • the processor 901 reads the computer program code stored in the memory 902, and is used to perform the following operations:
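  • obtaining, according to the message characteristics of the current data stream and a behavior recognition model, at least one first confidence that the current data stream corresponds to at least one data stream type, where the behavior recognition model is a model obtained according to the message characteristics and data stream types of multiple data stream samples, and the message characteristics include one or more of message length, message transmission speed, message interval time, and message direction;
  • obtaining, according to the characteristic information of the current data stream and a content recognition model, at least one second confidence that the current data stream corresponds to the at least one data stream type, where the content recognition model is a model obtained according to the characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained according to the behavior recognition model;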
  • the data stream type of the current data stream is determined according to the at least one first confidence level and the at least one second confidence level.
  • the behavior recognition model and the content recognition model are involved.
  • the behavior recognition model is pre-trained from the message characteristics and data stream types of multiple data stream samples, and the content recognition model is trained from the characteristic information of data streams and the data stream types identified by the behavior recognition model, so the content recognition model is an online learning model. The behavior recognition model can identify the data stream types of some basic (or typical) data streams, but some characteristics of data streams of the same type, such as message length and message transmission speed, may change during subsequent transmission. The online learning capability of the content recognition model allows the data stream type to be identified from other aspects, such as destination address and protocol type.
  • this hedges against recognition errors made by the behavior recognition model during data stream transmission.
  • therefore, this application combines the behavior recognition model and the content recognition model, which improves the accuracy of data stream type recognition and also the generalization of recognition, so that it can be applied to scenarios such as application cloud deployment, application transmission encryption, and private applications.
  • the determining the data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level is specifically:
  • calculating a comprehensive confidence corresponding to a first data stream type according to the first confidence corresponding to the first data stream type, the weight value of the first confidence, the second confidence corresponding to the first data stream type, and the weight value of the second confidence, where the first data stream type is any one of the at least one data stream type;
  • if the comprehensive confidence corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
  • a confidence weight is configured for each of the two models (the weight configured for the behavior recognition model is the weight of the first confidence, and the weight configured for the content recognition model is the weight of the second confidence), so the comprehensive confidence calculated from the two weighted confidences better reflects the actual type of the current data stream.
  • introducing the first preset threshold to judge whether the corresponding data stream type should be accepted improves the efficiency and accuracy of determining the data stream type.
  • the processor is further configured to:
  • when the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, send, through the transceiver, the characteristic information of the current data stream and the information of the first data stream type to another device (the first device in the method embodiment shown in FIG. 3), where the second preset threshold is greater than the first preset threshold;
  • receive, through the transceiver, first information sent by the other device (the first device in the method embodiment shown in FIG. 3), where the first information is obtained by the other device according to the characteristic information of the current data stream and the identification information of the first data stream type;
  • the content recognition model is updated according to the first information to obtain a new content recognition model.
  • the processor is further configured to:
  • when the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, update the content recognition model according to the characteristic information of the current data stream and the information of the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
  • the inventors of the present application use the identification result of the data stream type of the current data stream to correct the content recognition model. Specifically, a second preset threshold is introduced: when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is sent to the device that performs training to obtain a new content recognition model, so that subsequent determination results are more accurate.
  • updating the content recognition model according to the feature information of the current data stream and the information of the first data stream type to obtain a new content recognition model is specifically:
  • if the feature information of a first record among a plurality of records is the same as the feature information of the current data stream but the data stream type of the first record differs from the first data stream type, updating the data stream type of the first record to the first data stream type to obtain a second record, where each of the plurality of records includes feature information and a data stream type;
  • training on the plurality of records including the second record to obtain a new content recognition model.
  • that is, if among the records already identified there is a historical data stream whose feature information matches that of the current data stream but whose data stream type differs, the data stream type in the record is updated to the first data stream type of the current data stream. This mainly accommodates the elastic deployment of cloud resources. For example, the same cloud resource is used for video conferencing during one period and for desktop cloud during the next. The above method promptly updates the data stream type to desktop cloud in the next period, so the data stream type of the current data stream can still be identified accurately when cloud resources are deployed elastically.
  • the processor is further configured to: after determining the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence, send information about the data stream type of the current data stream to the operations support system (OSS) through the transceiver, where the OSS uses this information to generate a flow control policy for the current data stream.
  • the relevant information about the type of the current data stream is reported to the OSS, so that the OSS can generate a flow control policy for the current data stream based on its data stream type. For example, when the first data stream type of the current data stream is the video stream of a video conference, the corresponding flow control policy is defined as a priority transmission policy; that is, when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.
  • the packet length includes one or more of the Ethernet frame length, IP length, transport protocol length, and header length of the packet, and the transport protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).
  • for each operation, refer also to the corresponding description of the method embodiment shown in FIG. 3.
  • FIG. 10 shows a device 100 according to an embodiment of the present invention.
  • the device 100 may be the first device in the method embodiment shown in FIG. 3.
  • the device 100 includes a processor 1001, a memory 1002, and a transceiver 1003.
  • the processor 1001, the memory 1002, and the transceiver 1003 are connected to each other through a bus.
  • the memory 1002 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and is used to store related computer programs and data.
  • the transceiver 1003 is used to receive and send data.
  • the processor 1001 may be one or more central processing units (CPUs).
  • when the processor 1001 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
  • the processor 1001 reads the computer program code stored in the memory 1002, and is used to perform the following operations:
  • receive, through the transceiver, the feature information and the data stream type information of the current data stream sent by another device (the second device in the method embodiment shown in FIG. 3);
  • generate first information according to the feature information and the data stream type information of the current data stream;
  • send the first information to the other device through the transceiver for updating the content recognition model. The content recognition model is used to obtain at least one second confidence corresponding to at least one data stream type. It is obtained from the feature information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained by the behavior recognition model. The behavior recognition model is obtained from the packet features and data stream types of multiple data stream samples, and the packet features include one or more of packet length, packet transmission speed, packet interval time, and packet direction.
  • the behavior recognition model and the content recognition model are involved.
  • the behavior recognition model is pre-trained on the packet features and data stream types of multiple data stream samples, whereas the content recognition model is trained on the feature information of data streams and the data stream types identified by the behavior recognition model, so the content recognition model is an online learning model. The behavior recognition model can identify the types of some basic (or typical) data streams, but certain features of data streams of the same type (such as packet length and packet transmission speed) may change later in the transmission. The online learning of the content recognition model is therefore used to identify the type of the data stream from other aspects (such as destination address and protocol type), which hedges against the errors the behavior recognition model makes in recognizing data stream types during transmission.
  • this application therefore combines the behavior recognition model and the content recognition model, which improves the accuracy of data stream type recognition as well as its generalization, and is applicable to scenarios such as cloud-deployed applications, encrypted application transmission, and private applications.
  • for each operation, refer also to the corresponding description of the method embodiment shown in FIG. 3.
  • the device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units.
  • that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that there is a communication connection between them, which can be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement it without creative work.
  • an embodiment of the present invention also provides a chip system, which includes at least one processor, a memory, and an interface circuit.
  • the memory, the interface circuit, and the at least one processor are interconnected through lines, and a computer program is stored in the memory;
  • when the computer program is executed by the processor, the method flow shown in FIG. 3 is implemented.
  • the embodiment of the present invention also provides a computer-readable storage medium in which a computer program is stored, and when it runs on a processor, the method flow shown in FIG. 3 is implemented.
  • an embodiment of the present invention also provides a computer program product.
  • when the computer program product runs on a processor, the method flow shown in FIG. 3 is implemented.
  • this application relates to a behavior recognition model and a content recognition model.
  • the behavior recognition model is pre-trained on the packet features and data stream types of multiple data stream samples, whereas the content recognition model is trained on the feature information of data streams and the data stream types identified by the behavior recognition model, so the content recognition model is an online learning model. The behavior recognition model can identify the types of some basic (or typical) data streams, but some features of data streams of the same type (such as packet length and packet transmission speed) may change later in the transmission. The online learning of the content recognition model is therefore used to determine the type of the data stream from other aspects (such as destination address and protocol type).
  • this recognition hedges against the errors the behavior recognition model makes in recognizing data stream types during transmission. By combining the behavior recognition model and the content recognition model, this application improves the accuracy of data stream type recognition
  • as well as its generalization, and is applicable to scenarios such as cloud-deployed applications, encrypted application transmission, and private applications.
  • the computer program can be stored in a computer-readable storage medium.
  • when executed, the computer program may perform the processes of the foregoing method embodiments.
  • the foregoing storage media include ROM, random access memory (RAM), magnetic disks, optical discs, and other media that can store computer program code.
  • the "first" in the first device, the first confidence, the first data stream type, the first preset threshold, the first information, and the first record mentioned in the embodiments of the present invention is used only to distinguish names and does not indicate order. This rule also applies to "second", "third", and "fourth". However, the "first" in the first identifier mentioned in the embodiments of the present invention does indicate the first in order, and this rule also applies to the "Nth".

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

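A data stream type identification method and a related device are provided. The method obtains at least one first confidence of a current data stream, corresponding to at least one data stream type, from the packet features of the current data stream and a behavior recognition model. The behavior recognition model is obtained from multiple data stream samples. The method also obtains at least one second confidence of the current data stream, corresponding to the at least one data stream type, from the feature information of the current data stream and a content recognition model. The content recognition model is obtained from one or more historical data streams. The method then determines the data stream type of the current data stream from the at least one first confidence and the at least one second confidence. This method identifies data stream types more accurately.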

Description

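Data Stream Type Identification Method and Related Device
This application claims priority to Chinese Patent Application No. 201910872990.0, filed on September 16, 2019 and entitled "Data Stream Type Identification Method and Related Device", which is incorporated herein by reference in its entirety.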
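Technical Field
The present invention relates to the fields of computer technology and communications, further relates to applications of artificial intelligence (AI) in these fields, and in particular to a data stream type identification method and a related device.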
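Background
With the rapid development of computer technology, more and more enterprises use private office applications, such as desktop cloud, voice conferencing, and video conferencing. QoS priorities, real-time routing, and similar settings usually need to be configured properly to schedule the traffic of each service sensibly and improve service reliability. Both QoS prioritization and real-time routing require knowing what type of application the current office application is (that is, which data stream type it produces).
At present, the type of an office application is mainly learned in one of two ways. The first is manual identification: rules are configured by hand to match keywords in network traffic (including header data and payload data) to identify the application type. Manual identification is time-consuming and labor-intensive. In addition, enterprise office applications are moving to the cloud, and the resulting dynamism means transmissions are usually encrypted for security. This makes it hard to define identification rules manually and therefore hard to identify the application type. The second is offline learning: sample data is collected in advance and labeled manually or with third-party tools. A machine learning or neural network algorithm is then used to train a model offline on the labeled samples, and the trained model is used to infer the application types of live network traffic. However, every change or update to an enterprise application may make the trained model unusable. Private applications also follow no implementation standard, so similar applications of different enterprises require different models. As a result, a model trained in this way may have low recognition accuracy when used on a changed or updated application, or on another enterprise's application.
How to identify the application type (also called the data stream type) of an office application quickly and accurately is a technical problem being studied by those skilled in the art.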
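Summary
Embodiments of the present invention disclose a data stream type identification method and a related device, which can identify data stream types more accurately.
According to a first aspect, an embodiment of this application provides a data stream type identification method, including:
obtaining, according to packet features of a current data stream and a behavior recognition model, at least one first confidence of the current data stream corresponding to at least one data stream type, where the behavior recognition model is a model obtained from the packet features and data stream types of multiple data stream samples, and the packet features include one or more of packet length, packet transmission speed, packet interval time, and packet direction;
obtaining, according to feature information of the current data stream and a content recognition model, at least one second confidence of the current data stream corresponding to the at least one data stream type, where the feature information includes a destination address and a protocol type, the content recognition model is a model obtained from the feature information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained by the behavior recognition model; and
determining the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence.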
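The three steps of the first aspect can be sketched as a small orchestration function. This is a minimal illustration under stated assumptions, not the patent's implementation. The two models are stand-in callables that return a mapping from data stream type to confidence, and the names `identify_stream_type`, `sum_and_pick`, the stub models, and the type labels are all hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical aliases: each model returns {data_stream_type: confidence}.
Confidences = Dict[str, float]
FeatureInfo = Tuple[str, int, int]  # (destination address, destination port, protocol number)


def identify_stream_type(
    packet_features: Sequence[float],
    feature_info: FeatureInfo,
    behavior_model: Callable[[Sequence[float]], Confidences],
    content_model: Callable[[FeatureInfo], Confidences],
    combine: Callable[[Confidences, Confidences], str],
) -> str:
    """Apply the three steps of the first aspect to one current data stream."""
    # Step 1: first confidences from packet features (length, speed, interval, direction).
    first = behavior_model(packet_features)
    # Step 2: second confidences from feature information (destination address, protocol type).
    second = content_model(feature_info)
    # Step 3: determine the data stream type from both sets of confidences.
    return combine(first, second)


# Stub models with assumed outputs (not taken from a trained model).
def stub_behavior(features: Sequence[float]) -> Confidences:
    return {"desktop_cloud": 0.9, "video_conference": 0.0}


def stub_content(info: FeatureInfo) -> Confidences:
    return {"desktop_cloud": 0.0, "video_conference": 0.0}


def sum_and_pick(first: Confidences, second: Confidences) -> str:
    """Hypothetical combiner: unweighted sum, highest type wins."""
    types = set(first) | set(second)
    return max(types, key=lambda t: first.get(t, 0.0) + second.get(t, 0.0))


result = identify_stream_type(
    [1200.0, 3.5e6, 0.002], ("10.129.56.39", 443, 6), stub_behavior, stub_content, sum_and_pick
)
print(result)  # desktop_cloud
```

The combiner is passed in as a parameter because the first aspect leaves step 3 open. Its implementations include a weighted sum with a threshold and picking the largest comprehensive confidence.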
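The foregoing method involves a behavior recognition model and a content recognition model. The behavior recognition model is pre-trained on the packet features and data stream types of multiple data stream samples. The content recognition model is trained on the feature information of data streams and the data stream types identified by the behavior recognition model, so it is an online learning model. The behavior recognition model can identify the types of some basic (that is, typical) data streams, but certain features of data streams of the same type (for example, packet length and packet transmission speed) may change later in the transmission. Using the online learning of the content recognition model to identify the type of a data stream from other aspects (such as destination address and protocol type) therefore hedges against the errors the behavior recognition model makes in recognizing data stream types during transmission. By combining the two models, this application improves both the accuracy and the generalization of data stream type recognition. It is applicable to scenarios such as cloud-deployed applications, encrypted application transmission, and private applications.
With reference to the first aspect, in a first possible implementation of the first aspect, the determining the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence includes:
calculating a comprehensive confidence corresponding to a first data stream type according to the first confidence corresponding to the first data stream type, a weight of the first confidence, the second confidence corresponding to the first data stream type, and a weight of the second confidence, where the first data stream type is any one of the at least one data stream type; and
if the comprehensive confidence corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
In the foregoing method, confidence weights are configured in advance for the two models according to how strongly each of the content recognition model and the behavior recognition model influences the final recognition result. The weight configured for the behavior recognition model is the weight of the first confidence, and the weight configured for the content recognition model is the weight of the second confidence. The comprehensive confidence calculated from these two weights therefore better reflects the actual type of the current data stream. In addition, introducing the first preset threshold to judge whether the corresponding data stream type should be accepted improves the efficiency and accuracy of determining the data stream type.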
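As a formula, the first implementation computes a weighted sum of the two confidences for each candidate type k and accepts k when the sum exceeds the first preset threshold θ1. The notation p_k^{(1)} (first confidence, behavior model) and p_k^{(2)} (second confidence, content model) is ours. The assignment of w1 to the content model and w2 to the behavior model follows the worked example in the Detailed Description, where (w1, w2) = (0.4, 0.6) and the behavior weight is 0.6.

```latex
S_k = w_1\, p^{(2)}_k + w_2\, p^{(1)}_k, \qquad k = 1, \dots, N
\\[4pt]
\text{type(current data stream)} = k \quad \text{if } S_k > \theta_1
\\[4pt]
\text{e.g. } S_{\text{desktop cloud}} = 0.4 \cdot 0 + 0.6 \cdot 0.9 = 0.54 > \theta_1 = 0.5
```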
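With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a second possible implementation of the first aspect, the method further includes:
if the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, sending the feature information of the current data stream and information of the first data stream type to a device, where the second preset threshold is greater than the first preset threshold;
receiving first information sent by the device, where the first information is obtained by the device according to the feature information of the current data stream and identification information of the first data stream type; and
updating the content recognition model according to the first information to obtain a new content recognition model.
With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a third possible implementation of the first aspect, the method further includes:
if the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, updating the content recognition model according to the feature information of the current data stream and the information of the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
In the foregoing method, the inventors use the result of determining the type of the current data stream to correct the content recognition model. Specifically, a second preset threshold is introduced: when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is sent to a device for training to obtain a new content recognition model, so that the next determination is more accurate.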
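With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a fourth possible implementation of the first aspect, the updating the content recognition model according to the feature information of the current data stream and the information of the first data stream type to obtain a new content recognition model includes:
if the feature information of a first record among a plurality of records is the same as the feature information of the current data stream but the data stream type of the first record differs from the first data stream type, updating the data stream type of the first record to the first data stream type to obtain a second record, where each of the plurality of records includes feature information and a data stream type; and
training on the plurality of records including the second record to obtain a new content recognition model.
That is, among the records already identified, there may be a historical data stream whose feature information is the same as that of the current data stream but whose data stream type differs. In that case, the data stream type in the record is updated to the first data stream type of the current data stream. This mainly accommodates the elastic deployment of cloud resources. For example, the same cloud resource is used for video conferencing during one period and for desktop cloud during the next. The foregoing method promptly updates the data stream type to desktop cloud in the next period, so the data stream type of the current data stream can still be identified accurately when cloud resources are deployed elastically.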
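With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a fifth possible implementation of the first aspect, after the determining the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence, the method further includes:
sending information about the data stream type of the current data stream to an operations support system (OSS), where the OSS uses the information to generate a flow control policy for the current data stream.
That is, after the data stream type of the current data stream is determined, the relevant information is reported to the OSS, so that the OSS can generate a flow control policy for the current data stream based on its data stream type. For example, when the first data stream type of the current data stream is the video stream of a video conference, the corresponding flow control policy is defined as a priority transmission policy: when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.
With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a sixth possible implementation of the first aspect, the packet length includes one or more of the Ethernet frame length, IP length, transport protocol length, and header length of a packet, and the transport protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).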
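According to a second aspect, an embodiment of this application provides a data stream type identification method, including:
receiving feature information and data stream type information of a current data stream sent by a device;
generating first information according to the feature information and the data stream type information of the current data stream; and
sending the first information to the device for updating a content recognition model. The content recognition model is used to obtain at least one second confidence corresponding to at least one data stream type, and is a model obtained from the feature information and data stream types of one or more historical data streams. The data stream types of the historical data streams are obtained by a behavior recognition model, which is a model obtained from the packet features and data stream types of multiple data stream samples. The packet features include one or more of packet length, packet transmission speed, packet interval time, and packet direction.
The foregoing method involves a behavior recognition model and a content recognition model. The behavior recognition model is pre-trained on the packet features and data stream types of multiple data stream samples. The content recognition model is trained on the feature information of data streams and the data stream types identified by the behavior recognition model, so it is an online learning model. The behavior recognition model can identify the types of some basic (that is, typical) data streams, but certain features of data streams of the same type (for example, packet length and packet transmission speed) may change later in the transmission. Using the online learning of the content recognition model to identify the type of a data stream from other aspects (such as destination address and protocol type) therefore hedges against the errors the behavior recognition model makes in recognizing data stream types during transmission. By combining the two models, this application improves both the accuracy and the generalization of data stream type recognition. It is applicable to scenarios such as cloud-deployed applications, encrypted application transmission, and private applications.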
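According to a third aspect, an embodiment of this application provides a data stream type identification device, including a memory and a processor. The memory is configured to store a computer program, and the processor invokes the computer program to perform the following operations:
obtaining, according to packet features of a current data stream and a behavior recognition model, at least one first confidence of the current data stream corresponding to at least one data stream type, where the behavior recognition model is a model obtained from the packet features and data stream types of multiple data stream samples, and the packet features include one or more of packet length, packet transmission speed, packet interval time, and packet direction;
obtaining, according to feature information of the current data stream and a content recognition model, at least one second confidence of the current data stream corresponding to the at least one data stream type, where the feature information includes a destination address and a protocol type, the content recognition model is a model obtained from the feature information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained by the behavior recognition model; and
determining the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence.
The foregoing method involves a behavior recognition model and a content recognition model. The behavior recognition model is pre-trained on the packet features and data stream types of multiple data stream samples. The content recognition model is trained on the feature information of data streams and the data stream types identified by the behavior recognition model, so it is an online learning model. The behavior recognition model can identify the types of some basic (that is, typical) data streams, but certain features of data streams of the same type (for example, packet length and packet transmission speed) may change later in the transmission. Using the online learning of the content recognition model to identify the type of a data stream from other aspects (such as destination address and protocol type) therefore hedges against the errors the behavior recognition model makes in recognizing data stream types during transmission. By combining the two models, this application improves both the accuracy and the generalization of data stream type recognition. It is applicable to scenarios such as cloud-deployed applications, encrypted application transmission, and private applications.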
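With reference to the third aspect, in a first possible implementation of the third aspect, the determining the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence is specifically:
calculating a comprehensive confidence corresponding to a first data stream type according to the first confidence corresponding to the first data stream type, a weight of the first confidence, the second confidence corresponding to the first data stream type, and a weight of the second confidence, where the first data stream type is any one of the at least one data stream type; and
if the comprehensive confidence corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
In the foregoing operations, confidence weights are configured in advance for the two models according to how strongly each of the content recognition model and the behavior recognition model influences the final recognition result. The weight configured for the behavior recognition model is the weight of the first confidence, and the weight configured for the content recognition model is the weight of the second confidence. The comprehensive confidence calculated from these two weights therefore better reflects the actual type of the current data stream. In addition, introducing the first preset threshold to judge whether the corresponding data stream type should be accepted improves the efficiency and accuracy of determining the data stream type.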
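With reference to the third aspect or any one of the foregoing possible implementations of the third aspect, in a second possible implementation of the third aspect, the device further includes a transceiver, and the processor is further configured to:
if the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, send, through the transceiver, the feature information of the current data stream and the information of the first data stream type to another device, where the second preset threshold is greater than the first preset threshold;
receive, through the transceiver, first information sent by the other device, where the first information is obtained by the other device according to the feature information of the current data stream and identification information of the first data stream type; and
update the content recognition model according to the first information to obtain a new content recognition model.
With reference to the third aspect or any one of the foregoing possible implementations of the third aspect, in a third possible implementation of the third aspect, the processor is further configured to:
if the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, update the content recognition model according to the feature information of the current data stream and the information of the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
In the foregoing method, the inventors use the result of determining the type of the current data stream to correct the content recognition model. Specifically, a second preset threshold is introduced: when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is sent to a device for training to obtain a new content recognition model, so that the next determination is more accurate.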
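With reference to the third aspect or any one of the foregoing possible implementations of the third aspect, in a fourth possible implementation of the third aspect, the updating the content recognition model according to the feature information of the current data stream and the information of the first data stream type to obtain a new content recognition model is specifically:
if the feature information of a first record among a plurality of records is the same as the feature information of the current data stream but the data stream type of the first record differs from the first data stream type, updating the data stream type of the first record to the first data stream type to obtain a second record, where each of the plurality of records includes feature information and a data stream type; and
training on the plurality of records including the second record to obtain a new content recognition model.
That is, among the records already identified, there may be a historical data stream whose feature information is the same as that of the current data stream but whose data stream type differs. In that case, the data stream type in the record is updated to the first data stream type of the current data stream. This mainly accommodates the elastic deployment of cloud resources. For example, the same cloud resource is used for video conferencing during one period and for desktop cloud during the next. The foregoing method promptly updates the data stream type to desktop cloud in the next period, so the data stream type of the current data stream can still be identified accurately when cloud resources are deployed elastically.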
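With reference to the third aspect or any one of the foregoing possible implementations of the third aspect, in a fifth possible implementation of the third aspect, the device further includes a transceiver, and the processor is further configured to: after determining the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence, send information about the data stream type of the current data stream to an operations support system (OSS) through the transceiver, where the OSS uses the information to generate a flow control policy for the current data stream.
That is, after the data stream type of the current data stream is determined, the relevant information is reported to the OSS, so that the OSS can generate a flow control policy for the current data stream based on its data stream type. For example, when the first data stream type of the current data stream is the video stream of a video conference, the corresponding flow control policy is defined as a priority transmission policy: when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.
With reference to the third aspect or any one of the foregoing possible implementations of the third aspect, in a sixth possible implementation of the third aspect, the packet length includes one or more of the Ethernet frame length, IP length, transport protocol length, and header length of a packet, and the transport protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).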
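According to a fourth aspect, an embodiment of this application provides a data stream type identification device, including a memory, a processor, and a transceiver. The memory is configured to store a computer program, and the processor invokes the computer program to perform the following operations:
receiving, through the transceiver, feature information and data stream type information of a current data stream sent by another device;
generating first information according to the feature information and the data stream type information of the current data stream; and
sending the first information to the other device through the transceiver for updating a content recognition model. The content recognition model is used to obtain at least one second confidence corresponding to at least one data stream type, and is a model obtained from the feature information and data stream types of one or more historical data streams. The data stream types of the historical data streams are obtained by a behavior recognition model, which is a model obtained from the packet features and data stream types of multiple data stream samples. The packet features include one or more of packet length, packet transmission speed, packet interval time, and packet direction.
The foregoing operations involve a behavior recognition model and a content recognition model. The behavior recognition model is pre-trained on the packet features and data stream types of multiple data stream samples. The content recognition model is trained on the feature information of data streams and the data stream types identified by the behavior recognition model, so it is an online learning model. The behavior recognition model can identify the types of some basic (that is, typical) data streams, but certain features of data streams of the same type (for example, packet length and packet transmission speed) may change later in the transmission. Using the online learning of the content recognition model to identify the type of a data stream from other aspects (such as destination address and protocol type) therefore hedges against the errors the behavior recognition model makes in recognizing data stream types during transmission. By combining the two models, this application improves both the accuracy and the generalization of data stream type recognition. It is applicable to scenarios such as cloud-deployed applications, encrypted application transmission, and private applications.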
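According to a fifth aspect, an embodiment of this application provides a data stream type identification device, including:
a first recognition unit, configured to obtain, according to packet features of a current data stream and a behavior recognition model, at least one first confidence of the current data stream corresponding to at least one data stream type, where the behavior recognition model is a model obtained from the packet features and data stream types of multiple data stream samples, and the packet features include one or more of packet length, packet transmission speed, packet interval time, and packet direction;
a second recognition unit, configured to obtain, according to feature information of the current data stream and a content recognition model, at least one second confidence of the current data stream corresponding to the at least one data stream type, where the feature information includes a destination address and a protocol type, the content recognition model is a model obtained from the feature information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained by the behavior recognition model; and
a determining unit, configured to determine the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence.
The foregoing device involves a behavior recognition model and a content recognition model. The behavior recognition model is pre-trained on the packet features and data stream types of multiple data stream samples. The content recognition model is trained on the feature information of data streams and the data stream types identified by the behavior recognition model, so it is an online learning model. The behavior recognition model can identify the types of some basic (that is, typical) data streams, but certain features of data streams of the same type (for example, packet length and packet transmission speed) may change later in the transmission. Using the online learning of the content recognition model to identify the type of a data stream from other aspects (such as destination address and protocol type) therefore hedges against the errors the behavior recognition model makes in recognizing data stream types during transmission. By combining the two models, this application improves both the accuracy and the generalization of data stream type recognition. It is applicable to scenarios such as cloud-deployed applications, encrypted application transmission, and private applications.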
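With reference to the fifth aspect, in a first possible implementation of the fifth aspect, the determining unit being configured to determine the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence is specifically:
the determining unit is configured to calculate a comprehensive confidence corresponding to a first data stream type according to the first confidence corresponding to the first data stream type, a weight of the first confidence, the second confidence corresponding to the first data stream type, and a weight of the second confidence, where the first data stream type is any one of the at least one data stream type; and
if the comprehensive confidence corresponding to the first data stream type is greater than a first preset threshold, determine that the data stream type of the current data stream is the first data stream type.
In the foregoing device, confidence weights are configured in advance for the two models according to how strongly each of the content recognition model and the behavior recognition model influences the final recognition result. The weight configured for the behavior recognition model is the weight of the first confidence, and the weight configured for the content recognition model is the weight of the second confidence. The comprehensive confidence calculated from these two weights therefore better reflects the actual type of the current data stream. In addition, introducing the first preset threshold to judge whether the corresponding data stream type should be accepted improves the efficiency and accuracy of determining the data stream type.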
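With reference to the fifth aspect or any one of the foregoing possible implementations of the fifth aspect, in a second possible implementation of the fifth aspect, the device further includes:
a first sending unit, configured to: when the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, send the feature information of the current data stream and the information of the first data stream type to another device, where the second preset threshold is greater than the first preset threshold;
a receiving unit, configured to receive first information sent by the other device, where the first information is obtained by the other device according to the feature information of the current data stream and identification information of the first data stream type; and
an updating unit, configured to update the content recognition model according to the first information to obtain a new content recognition model.
With reference to the fifth aspect or any one of the foregoing possible implementations of the fifth aspect, in a third possible implementation of the fifth aspect, the device further includes:
an updating unit, configured to: when the comprehensive confidence corresponding to the first data stream type is less than a second preset threshold, update the content recognition model according to the feature information of the current data stream and the information of the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
In the foregoing method, the inventors use the result of determining the type of the current data stream to correct the content recognition model. Specifically, a second preset threshold is introduced: when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is sent to a device for training to obtain a new content recognition model, so that the next determination is more accurate.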
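With reference to the fifth aspect or any one of the foregoing possible implementations of the fifth aspect, in a fourth possible implementation of the fifth aspect, the updating the content recognition model according to the feature information of the current data stream and the information of the first data stream type to obtain a new content recognition model is specifically:
if the feature information of a first record among a plurality of records is the same as the feature information of the current data stream but the data stream type of the first record differs from the first data stream type, updating the data stream type of the first record to the first data stream type to obtain a second record, where each of the plurality of records includes feature information and a data stream type; and
training on the plurality of records including the second record to obtain a new content recognition model.
That is, among the records already identified, there may be a historical data stream whose feature information is the same as that of the current data stream but whose data stream type differs. In that case, the data stream type in the record is updated to the first data stream type of the current data stream. This mainly accommodates the elastic deployment of cloud resources. For example, the same cloud resource is used for video conferencing during one period and for desktop cloud during the next. The foregoing method promptly updates the data stream type to desktop cloud in the next period, so the data stream type of the current data stream can still be identified accurately when cloud resources are deployed elastically.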
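With reference to the fifth aspect or any one of the foregoing possible implementations of the fifth aspect, in a fifth possible implementation of the fifth aspect, the device further includes:
a second sending unit, configured to: after the determining unit determines the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence, send information about the data stream type of the current data stream to an operations support system (OSS), where the OSS uses the information to generate a flow control policy for the current data stream.
That is, after the data stream type of the current data stream is determined, the relevant information is reported to the OSS, so that the OSS can generate a flow control policy for the current data stream based on its data stream type. For example, when the first data stream type of the current data stream is the video stream of a video conference, the corresponding flow control policy is defined as a priority transmission policy: when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.
With reference to the fifth aspect or any one of the foregoing possible implementations of the fifth aspect, in a sixth possible implementation of the fifth aspect, the packet length includes one or more of the Ethernet frame length, IP length, transport protocol length, and header length of a packet, and the transport protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).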
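According to a sixth aspect, an embodiment of this application provides a data stream type identification device, including:
a receiving unit, configured to receive feature information and data stream type information of a current data stream sent by another device;
a generating unit, configured to generate first information according to the feature information and the data stream type information of the current data stream; and
a sending unit, configured to send the first information to the other device for updating a content recognition model. The content recognition model is used to obtain at least one second confidence corresponding to at least one data stream type, and is a model obtained from the feature information and data stream types of one or more historical data streams. The data stream types of the historical data streams are obtained by a behavior recognition model, which is a model obtained from the packet features and data stream types of multiple data stream samples. The packet features include one or more of packet length, packet transmission speed, packet interval time, and packet direction.
The foregoing device involves a behavior recognition model and a content recognition model. The behavior recognition model is pre-trained on the packet features and data stream types of multiple data stream samples. The content recognition model is trained on the feature information of data streams and the data stream types identified by the behavior recognition model, so it is an online learning model. The behavior recognition model can identify the types of some basic (that is, typical) data streams, but certain features of data streams of the same type (for example, packet length and packet transmission speed) may change later in the transmission. Using the online learning of the content recognition model to identify the type of a data stream from other aspects (such as destination address and protocol type) therefore hedges against the errors the behavior recognition model makes in recognizing data stream types during transmission. By combining the two models, this application improves both the accuracy and the generalization of data stream type recognition. It is applicable to scenarios such as cloud-deployed applications, encrypted application transmission, and private applications.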
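According to a seventh aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program. When the computer program runs on a processor, the method described in the first aspect or any possible implementation of the first aspect is implemented.
According to an eighth aspect, an embodiment of this application provides a computer program product stored in a memory. When the computer program product runs on a processor, the method described in the first aspect or any possible implementation of the first aspect is implemented.
According to a ninth aspect, an embodiment of this application provides a data stream type identification system, including a first device and a second device. The second device is the data stream type identification device described in the third aspect or any possible implementation of the third aspect, or in the fifth aspect or any possible implementation of the fifth aspect. The first device is the data stream type identification device described in the fourth aspect or any possible implementation of the fourth aspect, or in the sixth aspect or any possible implementation of the sixth aspect.
The embodiments of this application involve a behavior recognition model and a content recognition model. The behavior recognition model is pre-trained on the packet features and data stream types of multiple data stream samples. The content recognition model is trained on the feature information of data streams and the data stream types identified by the behavior recognition model, so it is an online learning model. The behavior recognition model can identify the types of some basic (that is, typical) data streams, but certain features of data streams of the same type (for example, packet length and packet transmission speed) may change later in the transmission. Using the online learning of the content recognition model to identify the type of a data stream from other aspects (such as destination address and protocol type) therefore hedges against the errors the behavior recognition model makes in recognizing data stream types during transmission. By combining the two models, this application improves both the accuracy and the generalization of data stream type recognition. It is applicable to scenarios such as cloud-deployed applications, encrypted application transmission, and private applications.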
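Brief Description of Drawings
The following describes the accompanying drawings used in the embodiments of the present invention.
FIG. 1 is a schematic architectural diagram of a data stream type identification system according to an embodiment of the present invention;
FIG. 2A is a schematic scenario diagram of a content recognition model and a behavior recognition model according to an embodiment of the present invention;
FIG. 2B is a schematic structural diagram of a classification model according to an embodiment of the present invention;
FIG. 2C is a schematic structural diagram of a classification model according to an embodiment of the present invention;
FIG. 2D is a schematic structural diagram of a classification model according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a data stream type identification method according to an embodiment of the present invention;
FIG. 4 is a schematic scenario diagram of data stream a and data stream b according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a sample data stream record according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a sample data stream record according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a second device according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a first device according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of another second device according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of another first device according to an embodiment of the present invention.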
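Detailed Description
The following describes the embodiments of the present invention with reference to the accompanying drawings.
Refer to FIG. 1, which is a schematic architectural diagram of a data stream type identification system according to an embodiment of the present invention. The system includes an operations support system (OSS) 101, a server 102, a forwarding device 103, and terminals 104. There may be multiple terminals 104; six are shown in the figure as an example. The terminals 104 run various applications, such as video conferencing, voice conferencing, and desktop cloud applications, and data streams generated by different applications usually have different data stream types (also called application types). In this embodiment, data streams generated by a terminal 104 are first sent to the destination device through the forwarding device 103. The forwarding device may include a router, a switch, and the like. There may be one or more forwarding devices 103: for example, one router and three switches, a single switch, or three switches. The server 102 may be a single server or a server cluster consisting of multiple servers.
How a data stream generated by a terminal 104 is sent by that terminal, and how it is forwarded by the forwarding device 103, may both follow a flow control policy generated by the OSS 101. For example, suppose the flow control policy specifies that data streams generated by the video conferencing application have the highest priority. When the terminal 104 or the forwarding device 103 has multiple data streams to send, including one generated by the video conferencing application, the video conferencing data stream is transmitted first. It should be noted that the OSS 101 generates the flow control policy based on the data stream type of the current data stream. In this embodiment, that data stream type is determined by a second device.
As shown in FIG. 2A, the second device uses a behavior recognition model and a content recognition model to determine the data stream type of the current data stream. The model parameters may include, but are not limited to, a confidence weight vector (w1, w2), a first preset threshold θ1 for the data stream type, and a second preset threshold θ2. The first preset threshold, also called the classification threshold, is used to judge whether to assign a data stream to a given type. The second preset threshold, also called the model update threshold, is used to judge when to update the content recognition model. These parameters are explained in more detail in the subsequent method flow. When identifying the type of the current data stream, the content recognition model may take feature information as input (such as destination IP, destination port, and protocol type). The behavior recognition model may take packet information as input (such as packet length, packet transmission speed, packet interval time, and packet direction). The second device combines the confidences from the content recognition model and the behavior recognition model using the confidence weight vector (w1, w2) to obtain the final data stream type. During this process, if the second preset threshold θ2 indicates that the content recognition model needs to be updated, the parameters required for the update are obtained.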
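The model parameters listed for FIG. 2A can be grouped into one small configuration object. This is a sketch under assumptions: the class name `FusionParams` is hypothetical, and the mapping of w1 to the second confidence (content model) and w2 to the first confidence (behavior model) is taken from the worked example later in this description. The "strictly between θ1 and θ2" update rule follows Scheme 1 there. The value θ2 = 0.8 used below is an assumption, because the text never states θ2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionParams:
    """Hypothetical container for the FIG. 2A parameters (w1, w2), theta1, theta2."""
    w1: float      # weight of the second confidence (content recognition model)
    w2: float      # weight of the first confidence (behavior recognition model)
    theta1: float  # first preset threshold (classification threshold)
    theta2: float  # second preset threshold (model update threshold)

    def __post_init__(self) -> None:
        if not (0.0 <= self.w1 <= 1.0 and 0.0 <= self.w2 <= 1.0):
            raise ValueError("weights are expected to lie in [0, 1]")
        # The text requires the second preset threshold to exceed the first.
        if self.theta2 <= self.theta1:
            raise ValueError("theta2 must be greater than theta1")

    def composite(self, first: float, second: float) -> float:
        """Comprehensive confidence for one data stream type."""
        return self.w1 * second + self.w2 * first

    def is_type(self, score: float) -> bool:
        """Classification decision against theta1."""
        return score > self.theta1

    def needs_update(self, score: float) -> bool:
        """Update trigger: accepted but below theta2 (Scheme 1 interval)."""
        return self.theta1 < score < self.theta2


params = FusionParams(w1=0.4, w2=0.6, theta1=0.5, theta2=0.8)  # theta2 is assumed
score = params.composite(first=0.9, second=0.0)
print(round(score, 2), params.is_type(score), params.needs_update(score))
```

Validating θ2 > θ1 at construction time catches the only ordering constraint the text states explicitly.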
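Optionally, the parameters required for the update may be obtained by the second device by training on information about some data streams. Alternatively, the first device may obtain them by such training and then send them to the second device.
In this embodiment, the second device may be the OSS 101, the server 102, or the forwarding device 103. Likewise, the first device may be the OSS 101, the server 102, or the forwarding device 103. Accordingly, the first device and the second device may be the same device or different devices. When neither the first device nor the second device is the server 102, the architecture shown in FIG. 1 may be considered not to include the server 102.
It should be noted that the content recognition model is essentially a classification model. The classification model may be a tree model, as shown in FIG. 2B, a neural network model, as shown in FIG. 2C, or a support vector machine (SVM) model, as shown in FIG. 2D. Of course, the classification model may also take other forms. Optionally, the content recognition model classifies by extracting features from the input vector, that is, from feature information such as the destination Internet Protocol (IP) address, destination port number, and protocol type. It can therefore identify data streams with the same destination IP address, the same protocol type, and the same port number as the same data stream type. It can also identify data streams in the same network segment, with the same protocol type and similar port numbers, as the same data stream type. Finally, it can identify Transmission Control Protocol (TCP) traffic whose destination port number is 20 (the well-known port of the File Transfer Protocol (FTP)) as the download data stream type.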
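The three generalization behaviors described above can be illustrated with a hand-written lookup over labeled records. This is explicitly not the tree, neural network, or SVM model of FIGS. 2B to 2D; it only mimics their described outputs. The function name, the /24 prefix used as the "network segment", the ±10 port window for "similar port numbers", and the 0.6 partial score are all assumptions. The text does not define how wide a segment is; its streams a and b differ already in the third octet.

```python
import ipaddress
from typing import Dict, List, Tuple

# (destination IP, destination port, IANA protocol number, data stream type)
Record = Tuple[str, int, int, str]
TCP = 6  # IANA protocol number for TCP


def content_confidences(dst_ip: str, dst_port: int, protocol: int,
                        records: List[Record], prefix_len: int = 24,
                        port_window: int = 10) -> Dict[str, float]:
    """Toy stand-in for a trained content recognition model (all scores assumed)."""
    # Well-known-port rule from the text: TCP to port 20 (FTP data) -> download type.
    if protocol == TCP and dst_port == 20:
        return {"download": 1.0}
    ip = ipaddress.ip_address(dst_ip)
    scores: Dict[str, float] = {}
    for r_ip, r_port, r_proto, r_type in records:
        if r_proto != protocol:
            continue  # every rule in the text requires the same protocol type
        segment = ipaddress.ip_network(f"{r_ip}/{prefix_len}", strict=False)
        if ipaddress.ip_address(r_ip) == ip and r_port == dst_port:
            score = 1.0  # same destination IP, port and protocol
        elif ip in segment and abs(r_port - dst_port) <= port_window:
            score = 0.6  # same network segment and a nearby port
        else:
            continue
        scores[r_type] = max(scores.get(r_type, 0.0), score)
    return scores


history = [("10.129.56.39", 443, TCP, "desktop_cloud")]
print(content_confidences("10.129.56.39", 443, TCP, history))  # {'desktop_cloud': 1.0}
print(content_confidences("10.129.56.77", 445, TCP, history))  # {'desktop_cloud': 0.6}
print(content_confidences("192.0.2.1", 20, TCP, history))      # {'download': 1.0}
```

A trained classifier would learn the segment width and port similarity from the records instead of taking them as fixed parameters.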
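In addition, information about the current data stream that the second device needs to identify its data stream type may be sent to the second device by the terminal 104 or another device, or collected by the second device itself.
Refer to FIG. 3, which shows a data stream type identification method according to an embodiment of the present invention. The method may be implemented based on the architecture shown in FIG. 1 and includes but is not limited to the following steps:
Step S301: The second device obtains, according to the packet features of the current data stream and the behavior recognition model, at least one first confidence of the current data stream corresponding to at least one data stream type.
Specifically, the behavior recognition model is a model obtained from the packet features and data stream types of multiple data stream samples. Optionally, the data stream samples may be offline samples; that is, the behavior recognition model may be trained offline. The samples may also be pre-selected, relatively typical (that is, representative) samples. For example, packets in a video conferencing data stream are long most of the time but occasionally short. Long packets better indicate that the current data stream belongs to a video conferencing application, so representative data streams with long packets are preferred as video conferencing samples. Optionally, the data stream types of the samples may be determined manually, that is, by manual labeling. Because the behavior recognition model is obtained from the packet features and data stream types of multiple data stream samples, it reflects certain relationships between the packet features of a data stream and its data stream type. When the packet features of the current data stream are input to the behavior recognition model, the model can therefore predict, to some extent, the tendency (or probability) of the current data stream to belong to one or more data stream types. The parameter that expresses this tendency (or probability) may also be called a confidence.
In this embodiment, the packet features may include one or more of packet length, packet transmission speed, packet interval time, and packet direction. Optionally, the packet length includes one or more of the Ethernet frame length, IP length, transport protocol length, and header length of a packet, and the transport protocol includes TCP and/or UDP. Besides the features listed here, the packet features may include others, such as the maximum, minimum, mean, variance, and quantiles of packet length, packet transmission speed, packet interval time, and packet direction. The packet features may be input to the behavior recognition model as a vector, for example in the form (packet length, packet transmission speed, packet interval time). In addition, the data stream type in this embodiment may also be called the application type.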
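The statistical packet features mentioned above (maximum, minimum, mean, variance, and quantiles) can be assembled into a fixed-length vector with the Python standard library. This is a sketch under assumptions. The function name and vector layout are hypothetical. Transmission speed is approximated as bytes per second over the observed window, and direction is summarized as the share of uplink packets. The sign convention (positive length = uplink, negative = downlink) is borrowed from the description of FIG. 4.

```python
import statistics
from typing import List, Tuple

# (timestamp in seconds, signed length in bytes): positive = uplink, negative = downlink.
Packet = Tuple[float, int]


def packet_feature_vector(packets: List[Packet]) -> List[float]:
    """Summarize one data stream into a fixed-length vector (layout is hypothetical)."""
    if len(packets) < 2:
        raise ValueError("need at least two packets to derive intervals and quantiles")
    lengths = [abs(length) for _, length in packets]
    times = [t for t, _ in packets]
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    duration = times[-1] - times[0]
    # Transmission speed approximated as bytes per second over the observed window.
    speed = sum(lengths) / duration if duration > 0 else 0.0
    # Direction summarized as the share of uplink packets.
    uplink_share = sum(1 for _, length in packets if length > 0) / len(packets)
    q1, median, q3 = statistics.quantiles(lengths, n=4)
    return [
        max(lengths), min(lengths), statistics.mean(lengths), statistics.pvariance(lengths),
        q1, median, q3,
        statistics.mean(intervals), speed, uplink_share,
    ]


stream = [(0.00, 1200), (0.01, -1400), (0.02, 1000), (0.04, -1500)]
vector = packet_feature_vector(stream)
print(vector)
```

Summary statistics give every stream the same vector length regardless of how many packets were observed, which is what a fixed-input classifier needs.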
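In a first possible case, there may be N data stream types, where N is greater than or equal to 1. This embodiment may estimate (or predict) the confidence that the current data stream belongs to each of the N types, obtaining N first confidences of the current data stream corresponding to the N data stream types. For example, if the N types are video conferencing, voice conferencing, and desktop cloud, the behavior recognition model estimates three first confidences: one for each of these types. If the N types consist only of video conferencing, the behavior recognition model estimates only the first confidence that the current data stream belongs to the video conferencing type.
In a second possible case, there may be multiple data stream types, but this embodiment focuses on only one of them. It therefore estimates (or predicts) only the confidence that the current data stream belongs to that type, obtaining one first confidence corresponding to one data stream type. For example, the multiple types may be video conferencing, voice conferencing, and desktop cloud, with only video conferencing of interest. In that case, the behavior recognition model only needs to estimate the first confidence that the current data stream belongs to the video conferencing type.
Step S302: The second device obtains, according to the feature information of the current data stream and the content recognition model, at least one second confidence of the current data stream corresponding to the at least one data stream type.
Specifically, the content recognition model is a model obtained from the feature information and data stream types of one or more historical data streams. Optionally, the historical data streams may be online data streams, that is, data streams continuously generated during a preceding period, whose data stream types are identified by the behavior recognition model. In other words, the content recognition model may be trained online. Because the content recognition model is obtained from the feature information and data stream types of one or more historical data streams, it reflects certain relationships between the feature information of a data stream and its data stream type. When the feature information of the current data stream is input to the content recognition model, the model can therefore predict, to some extent, the tendency (or probability) of the current data stream to belong to one or more data stream types. The parameter that expresses this tendency (or probability) may also be called a confidence.
In this embodiment, the feature information may include one or more of a destination address, a protocol type, and a port number. The destination address may be an IP address, a destination MAC address, or an address in another form. Besides the features listed here, the feature information may include others. The feature information here may be information about the destination, for example the destination IP and destination port. It may be input to the content recognition model as a vector of the form (ip, port, protocol), for example (10.29.74.5, 8443, 6), or of the form (mac, port, protocol), for example (05FA1525EEFF, 8443, 6). Other forms are also possible and are not enumerated here.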
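One way to turn the (ip, port, protocol) and (mac, port, protocol) vectors above into purely numeric model input is sketched below, using the text's own example values. The function names and the byte-wise split are assumptions, not the patent's encoding. Splitting the address into bytes keeps network prefixes visible to a classifier such as a tree model. Protocol number 6 is the IANA number for TCP.

```python
import ipaddress
from typing import List


def encode_ip_features(dst_ip: str, dst_port: int, protocol: int) -> List[int]:
    """(ip, port, protocol) -> integers; the IP is split into bytes so prefixes stay visible."""
    octets = list(ipaddress.ip_address(dst_ip).packed)  # 4 bytes for IPv4, 16 for IPv6
    return octets + [dst_port, protocol]


def encode_mac_features(dst_mac: str, dst_port: int, protocol: int) -> List[int]:
    """(mac, port, protocol) -> integers; the MAC is given as 12 hex digits, as in the text."""
    raw = bytes.fromhex(dst_mac)
    if len(raw) != 6:
        raise ValueError("expected a 48-bit MAC address")
    return list(raw) + [dst_port, protocol]


print(encode_ip_features("10.29.74.5", 8443, 6))     # [10, 29, 74, 5, 8443, 6]
print(encode_mac_features("05FA1525EEFF", 8443, 6))  # [5, 250, 21, 37, 238, 255, 8443, 6]
```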
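In a first possible case, there may be N data stream types. This embodiment may estimate (or predict) the confidence that the current data stream belongs to each of the N types, obtaining N second confidences of the current data stream corresponding to the N data stream types. For example, if the N types are video conferencing, voice conferencing, and desktop cloud, the content recognition model estimates three second confidences: one for each of these types. If the N types consist only of video conferencing, the content recognition model estimates only the second confidence that the current data stream belongs to the video conferencing type.
In a second possible case, there may be multiple data stream types, but this embodiment focuses on only one of them. It therefore estimates (or predicts) only the confidence that the current data stream belongs to that type, obtaining one second confidence corresponding to one data stream type. For example, the multiple types may be video conferencing, voice conferencing, and desktop cloud, with only video conferencing of interest. In that case, the content recognition model only needs to estimate the second confidence that the current data stream belongs to the video conferencing type.
Step S303: The second device determines the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence.
Specifically, the at least one first confidence indicates, to some extent, which data stream type the current data stream tends toward, and so does the at least one second confidence. Considering both together yields a more accurate and reliable tendency, from which the data stream type of the current data stream is determined.
In one optional solution, the data stream type with the largest comprehensive confidence is determined as the data stream type of the current data stream. For example, suppose the first and second confidences give a comprehensive confidence of 0.7 for video conferencing, 0.2 for voice conferencing, and 0.1 for desktop cloud. Because video conferencing has the largest comprehensive confidence, the current data stream is determined to be of the video conferencing type.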
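The "largest comprehensive confidence wins" option reduces to an argmax over the per-type scores, shown here with the example values above. The function name and type labels are hypothetical. The tie rule is an assumption, because the text does not say what happens when two types score equally.

```python
from typing import Dict


def pick_highest(comprehensive: Dict[str, float]) -> str:
    """Return the data stream type with the largest comprehensive confidence."""
    if not comprehensive:
        raise ValueError("no candidate data stream types")
    # max() keeps the first key encountered on ties; a real system may need an explicit tie rule.
    return max(comprehensive, key=comprehensive.get)


scores = {"video_conference": 0.7, "voice_conference": 0.2, "desktop_cloud": 0.1}
print(pick_highest(scores))  # video_conference
```

Unlike the threshold-based option that follows in the text, this rule always returns some type, even when every score is low.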
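In another optional solution, the determining the data stream type of the current data stream according to the at least one first confidence and the at least one second confidence may specifically be as follows. A comprehensive confidence corresponding to a first data stream type is calculated from four values: the first confidence corresponding to the first data stream type, the weight of the first confidence, the second confidence corresponding to the first data stream type, and the weight of the second confidence. The first data stream type is any one of the at least one data stream type; that is, every type among them can play the role of the first data stream type here. If the comprehensive confidence corresponding to the first data stream type is greater than the first preset threshold, the data stream type of the current data stream is determined to be the first data stream type. For example, if the comprehensive confidence for the video conferencing type is greater than the first preset threshold, the current data stream is of the video conferencing type. Likewise, if the comprehensive confidence for the desktop cloud type is greater than the first preset threshold, the current data stream is of the desktop cloud type.
For example, assume the confidence weight vector (w1, w2) is (0.4, 0.6). The weight of the first confidence (which can also be regarded as the weight of the behavior recognition model) is w2 = 0.6, and the weight of the second confidence (the weight of the content recognition model) is w1 = 0.4. The first preset threshold θ1 for the data stream type is 0.5. When the second device starts identifying data stream types, the content recognition model has not yet been sufficiently trained. The behavior recognition model can identify the types of input data streams, but the content recognition model cannot, so initially its confidence for every data stream type is 0.
Assume there are two desktop cloud data streams, as shown in FIG. 4. The horizontal axis represents the sequence number in the data stream and the vertical axis represents the packet length; a positive length denotes an uplink packet and a negative length a downlink packet. Data streams a and b are both of the desktop cloud type. Data stream b has uplink packets, which better represent the characteristics of a desktop cloud scenario, so its packet behavior is considered typical. Data stream a has no uplink packets for a long period and does not clearly indicate a desktop cloud scenario, so its packet behavior is considered atypical. The behavior recognition model is usually trained on data streams with typical behavior, so it can identify the type of data stream b but not that of data stream a. The feature information of data streams a and b is as follows.
Data stream a: protocol type TCP, destination IP address 10.129.74.5, destination port 8443.
Data stream b: protocol type TCP, destination IP address 10.129.56.39, destination port 443.
For data stream a, the second confidences output by the content recognition model for the desktop cloud, voice conferencing, and video conferencing types are all 0. From the packet features, the behavior recognition model outputs a first confidence of 0.5 for the desktop cloud type, 0 for the voice conferencing type, and 0 for the video conferencing type. The comprehensive confidences for the three types are therefore as follows.
Desktop cloud: 0*0.4 + 0.5*0.6 = 0.3, less than θ1, so the current data stream is not of the desktop cloud type.
Voice conferencing: 0*0.4 + 0*0.6 = 0, less than θ1, so the current data stream is not of the voice conferencing type.
Video conferencing: 0*0.4 + 0*0.6 = 0, less than θ1, so the current data stream is not of the video conferencing type.
For data stream b, the second confidences output by the content recognition model for the desktop cloud, voice conferencing, and video conferencing types are all 0. From the packet features, the behavior recognition model outputs a first confidence of 0.9 for the desktop cloud type, 0 for the voice conferencing type, and 0 for the video conferencing type. The comprehensive confidences for the three types are therefore as follows.
Desktop cloud: 0*0.4 + 0.9*0.6 = 0.54, greater than θ1, so the current data stream is of the desktop cloud type.
Voice conferencing: 0*0.4 + 0*0.6 = 0, less than θ1, so the current data stream is not of the voice conferencing type.
Video conferencing: 0*0.4 + 0*0.6 = 0, less than θ1, so the current data stream is not of the video conferencing type.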
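The arithmetic of this first round can be reproduced in a few lines, which also makes the role of each weight explicit. This is a minimal sketch: the constants come from the example, while the function names and type labels are our own. An empty dictionary models the untrained content recognition model, whose second confidences are all 0.

```python
from typing import Dict, List, Tuple

W1_CONTENT = 0.4   # w1: weight of the second confidence (content recognition model)
W2_BEHAVIOR = 0.6  # w2: weight of the first confidence (behavior recognition model)
THETA1 = 0.5       # first preset threshold
TYPES = ("desktop_cloud", "voice_conference", "video_conference")


def comprehensive(first: Dict[str, float], second: Dict[str, float]) -> Dict[str, float]:
    """Weighted sum per type; a type missing from either model counts as 0."""
    return {t: W1_CONTENT * second.get(t, 0.0) + W2_BEHAVIOR * first.get(t, 0.0) for t in TYPES}


def accepted_types(first: Dict[str, float], second: Dict[str, float]) -> List[Tuple[str, float]]:
    """Types whose comprehensive confidence exceeds theta1, with scores rounded for display."""
    return [(t, round(s, 2)) for t, s in comprehensive(first, second).items() if s > THETA1]


untrained_content: Dict[str, float] = {}  # all second confidences are 0 at first
flow_a_behavior = {"desktop_cloud": 0.5}  # atypical desktop cloud stream (stream a, FIG. 4)
flow_b_behavior = {"desktop_cloud": 0.9}  # typical desktop cloud stream (stream b, FIG. 4)

print(accepted_types(flow_a_behavior, untrained_content))  # [] because 0.3 < 0.5
print(accepted_types(flow_b_behavior, untrained_content))  # [('desktop_cloud', 0.54)]
```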
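In this embodiment, the content recognition model may also be updated. Two different update schemes are provided below.
Scheme 1: If the comprehensive confidence corresponding to the first data stream type is greater than the first preset threshold and less than the second preset threshold θ2, the second device sends the feature information of the current data stream and the information of the first data stream type to the first device. The second preset threshold is greater than the first preset threshold. For data stream a, the comprehensive confidence of 0.3 for the desktop cloud type is not in the interval (θ1, θ2). The feature information of the current data stream and the desktop cloud type information therefore do not need to be sent to the first device. For data stream b, the comprehensive confidence of 0.54 for the desktop cloud type is in the interval (θ1, θ2). The second device therefore sends the first device the feature information of the current data stream (destination IP address 10.129.56.39, destination port 443, protocol type TCP) and the desktop cloud type information (such as its name and identifier).
Correspondingly, the first device receives the feature information of the current data stream and the information of the first data stream type from the second device, so it now holds one more data stream record. As shown in FIG. 5, a record for data stream b is added. The first device then obtains first information from the feature information of the current data stream and the information of the first data stream type, for example, parameters that affect the confidence calculation. Optionally, the first information is a model file and can be transmitted as such. AI model files of the common open-source Keras library are h5/json files, and those of the open-source scikit-learn library are pkl/m files. These files store the structure and the weights of the model.
Next, the second device receives the first information sent by the first device and updates the content recognition model with it to obtain a new content recognition model. The record for data stream b maps destination IP address 10.129.56.39, destination port 443, and protocol type TCP together to the desktop cloud type. When the updated content recognition model evaluates data stream b again, its second confidence for the desktop cloud type is therefore 1. Optionally, the feature information of data stream a resembles that of data stream b: the destination IP addresses are in the same network segment, the port numbers are similar, and the protocol types are the same. When the updated content recognition model evaluates data stream a, the result is therefore closer to that for data stream b; for example, the second confidence for the desktop cloud type may be 0.6.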
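Shipping the first information as a serialized model file can be sketched with the Python standard library. `pickle` stands in for the scikit-learn pkl files mentioned above. An exact-match lookup table stands in for the trained content recognition model, and all function and class names are hypothetical. Only the plain table is pickled, so the receiving side rebuilds the model object locally. As the Python documentation warns, `pickle.loads` can execute arbitrary code, so it should only be applied to data from a trusted peer.

```python
import pickle
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Key = Tuple[str, int, int]  # (destination IP, destination port, protocol number)


@dataclass
class LookupContentModel:
    """Hypothetical minimal content model: exact-match table from feature info to type."""
    table: Dict[Key, str] = field(default_factory=dict)

    def confidences(self, key: Key) -> Dict[str, float]:
        stream_type = self.table.get(key)
        return {stream_type: 1.0} if stream_type is not None else {}


def first_device_build_first_information(records: List[Tuple[str, int, int, str]]) -> bytes:
    """First device: 'train' on the records and serialize the result as a model file."""
    table = {(ip, port, proto): stream_type for ip, port, proto, stream_type in records}
    return pickle.dumps(table)


def second_device_apply(first_information: bytes) -> LookupContentModel:
    """Second device: load the received file (only unpickle data from a trusted peer)."""
    return LookupContentModel(pickle.loads(first_information))


blob = first_device_build_first_information([("10.129.56.39", 443, 6, "desktop_cloud")])
updated = second_device_apply(blob)
print(updated.confidences(("10.129.56.39", 443, 6)))  # {'desktop_cloud': 1.0}
```

This exact-match stand-in returns nothing for stream a. The 0.6 estimate in the text relies on a real classifier generalizing across similar feature information.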
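When data streams a and b later appear in the network again and their types need to be estimated, the process is as follows.
The confidence weight vector (w1, w2), the first preset threshold θ1, and the second preset threshold θ2 remain unchanged.
For data stream a, the content recognition model outputs a second confidence of 0.6 for the desktop cloud type and 0 for both the voice conferencing and video conferencing types. From the packet features, the behavior recognition model outputs a first confidence of 0.5 for the desktop cloud type, 0 for the voice conferencing type, and 0 for the video conferencing type. The comprehensive confidences for the three types are therefore as follows.
Desktop cloud: 0.6*0.4 + 0.5*0.6 = 0.54, greater than θ1, so the current data stream is of the desktop cloud type.
Voice conferencing: 0*0.4 + 0*0.6 = 0, less than θ1, so the current data stream is not of the voice conferencing type.
Video conferencing: 0*0.4 + 0*0.6 = 0, less than θ1, so the current data stream is not of the video conferencing type.
The comprehensive confidence of 0.54 for the desktop cloud type is in the interval (θ1, θ2). The feature information of the current data stream and the desktop cloud type information therefore need to be sent to the first device for a subsequent update of the content recognition model (the update principle is described above and is not repeated here).
For data stream b, the content recognition model outputs a second confidence of 1 for the desktop cloud type and 0 for both the voice conferencing and video conferencing types. From the packet features, the behavior recognition model outputs a first confidence of 0.9 for the desktop cloud type, 0 for the voice conferencing type, and 0 for the video conferencing type. The comprehensive confidences for the three types are therefore as follows.
Desktop cloud: 1*0.4 + 0.9*0.6 = 0.94, greater than θ1, so the current data stream is of the desktop cloud type.
Voice conferencing: 0*0.4 + 0*0.6 = 0, less than θ1, so the current data stream is not of the voice conferencing type.
Video conferencing: 0*0.4 + 0*0.6 = 0, less than θ1, so the current data stream is not of the video conferencing type.
The comprehensive confidence of 0.94 for the desktop cloud type is not in the interval (θ1, θ2). The feature information of the current data stream and the desktop cloud type information therefore do not need to be sent to the first device.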
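The two decisions made for each stream, accepting the type and sending it for a model update, can be checked together. This is a sketch under one assumption: the text never states θ2, only that 0.54 lies inside (θ1, θ2) and 0.94 does not. The value 0.8 below is therefore hypothetical, and the function name is our own.

```python
from typing import Tuple

W1_CONTENT, W2_BEHAVIOR = 0.4, 0.6  # (w1, w2) from the example
THETA1 = 0.5                         # first preset threshold
THETA2 = 0.8  # hypothetical: the example only implies 0.54 < theta2 <= 0.94


def evaluate(first_conf: float, second_conf: float) -> Tuple[float, bool, bool]:
    """Return (comprehensive confidence, accepted as this type?, send for model update?)."""
    score = W1_CONTENT * second_conf + W2_BEHAVIOR * first_conf
    accepted = score > THETA1
    # Scheme 1: only scores strictly between theta1 and theta2 trigger an update.
    send_for_update = THETA1 < score < THETA2
    return round(score, 2), accepted, send_for_update


print(evaluate(first_conf=0.5, second_conf=0.6))  # stream a: (0.54, True, True)
print(evaluate(first_conf=0.9, second_conf=1.0))  # stream b: (0.94, True, False)
```

Scores above θ2 are treated as already well learned, so confidently recognized streams such as b stop generating update traffic.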
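Networks may include elastic cloud deployment scenarios; for example, the cloud resource with destination IP address 10.129.56.39, destination port number 443, and protocol type TCP changes from serving desktop cloud to serving video conference. For this scenario, the obtaining, by the first device, of the first information based on the feature information of the current data stream and the information about the first data stream type may include the following. If the feature information of a first record among multiple records is the same as the feature information of the current data stream but the data stream type of the first record differs from the first data stream type, the data stream type of the first record is updated to the first data stream type to obtain a second record, where each of the multiple records includes feature information and a data stream type. Training is then performed on the multiple records including the second record to obtain the first information.
For example, data stream c has protocol type TCP, destination IP address 10.129.56.39, and destination port number 443.
For data stream c, the second confidence with which the content recognition model identifies the desktop cloud data stream type is 1, and the second confidences for the voice conference and video conference data stream types are both 0. Based on packet features, the first confidences with which the behavior recognition model identifies the desktop cloud and voice conference data stream types are both 0, and the first confidence for the video conference data stream type is 0.9. The composite confidences for the three data stream types are therefore as follows.
Desktop cloud: 1*0.4+0*0.6=0.4, which is less than θ1, so the current data stream is not of the desktop cloud data stream type.
Voice conference: 0*0.4+0*0.6=0, which is less than θ1, so the current data stream is not of the voice conference data stream type.
Video conference: 0*0.4+0.9*0.6=0.54, which is greater than θ1, so the current data stream is of the video conference data stream type.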
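Because the composite confidence 0.54 for the video conference data stream type is in the interval (θ1, θ2), the feature information of the current data stream and the information about the video conference data stream type need to be sent to the first device. Correspondingly, the first device receives the feature information of the current data stream and the information about the first data stream type sent by the second device, so that a data stream record for data stream c is added on the first device, as shown in FIG. 6.
Comparing FIG. 6 with FIG. 5 shows that the record for data stream c and the record for data stream b have the same feature information (destination IP address, destination port number, and protocol type) but different data stream types (that is, application categories). The existing record is therefore modified so that protocol type TCP, destination IP address 10.129.56.39, and destination port number 443 together correspond to the video conference data stream type instead of the desktop cloud data stream type. The record before modification is the first record, and the record after modification is the second record.
It can be understood that, after the first record is updated to the second record and the content recognition model is updated with the first information obtained by training on the second record, when data stream c is input to the updated content recognition model again, the second confidence for the desktop cloud data stream type is 0 and the second confidence for the video conference data stream type is 1.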
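The record handling above can be sketched with a record list and a stand-in for training. In this sketch, `apply_record` overwrites the type of any record with identical features, turning the first record into the second record. The `train` function is a hypothetical placeholder that uses per-feature-key type frequencies as second confidences; it is not the patent's training procedure.

```python
from collections import Counter

# A record is ((dst_ip, dst_port, protocol), data_stream_type).


def apply_record(records: list, key: tuple, new_type: str) -> list:
    """Overwrite the type of any record whose features equal `key` (first record ->
    second record); append a new record if no such record exists."""
    updated = [(k, new_type if k == key else t) for k, t in records]
    if all(k != key for k, _ in records):
        updated.append((key, new_type))
    return updated


def train(records: list) -> dict:
    """Hypothetical stand-in for training: per-feature-key type frequencies,
    used as the second confidence for each type."""
    counts: dict = {}
    for k, t in records:
        counts.setdefault(k, Counter())[t] += 1
    return {k: {t: n / sum(c.values()) for t, n in c.items()} for k, c in counts.items()}


if __name__ == "__main__":
    key_c = ("10.129.56.39", 443, "TCP")
    records = [(key_c, "desktop_cloud")]                         # record for data stream b (FIG. 5)
    records = apply_record(records, key_c, "video_conference")   # data stream c (FIG. 6)
    model = train(records)
    print(model[key_c].get("desktop_cloud", 0.0), model[key_c].get("video_conference", 0.0))
```

Overwriting rather than appending is what lets the model follow an elastically redeployed resource. If both labels were kept, the old desktop cloud label would keep diluting the new one.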
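Scheme 2: If the composite confidence corresponding to the first data stream type is less than the second preset threshold, the second device does not need to send the feature information of the current data stream and the information about the first data stream type to the first device. Instead, it updates the content recognition model itself based on the feature information of the current data stream and the information about the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold. Optionally, if the feature information of a first record among multiple records is the same as the feature information of the current data stream but the data stream type of the first record differs from the first data stream type, the second device updates the data stream type of the first record to the first data stream type to obtain a second record, where each of the multiple records includes feature information and a data stream type; it then performs training on the multiple records including the second record to obtain a new content recognition model. For the specific principle, refer to Scheme 1, with the second device performing the operations that the first device performs in Scheme 1.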
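Scheme 2 keeps the whole loop on the second device. The sketch below uses an exact-match lookup as a hypothetical stand-in for the content recognition model. It also assumes θ1 = 0.5 and θ2 = 0.9, and it picks the highest-scoring type when several clear θ1; the text itself only says that a type whose composite confidence exceeds θ1 is accepted.

```python
from typing import Optional

# Hypothetical thresholds; the example implies 0.4 < θ1 < 0.54 < θ2 <= 0.94.
THETA1, THETA2 = 0.5, 0.9
W_SECOND, W_FIRST = 0.4, 0.6


class LocalUpdatingClassifier:
    """Scheme 2 sketch: the second device keeps and updates its own records."""

    def __init__(self) -> None:
        self.records: dict = {}  # (dst_ip, dst_port, protocol) -> data stream type

    def second_confidence(self, key: tuple, stream_type: str) -> float:
        # Exact-match lookup stands in for the content recognition model.
        return 1.0 if self.records.get(key) == stream_type else 0.0

    def classify(self, key: tuple, first_conf: dict) -> Optional[str]:
        """first_conf must be non-empty and list every candidate type with its
        behavior-model (first) confidence."""
        scores = {t: W_SECOND * self.second_confidence(key, t) + W_FIRST * f
                  for t, f in first_conf.items()}
        best = max(scores, key=scores.get)
        if scores[best] <= THETA1:
            return None  # no type is accepted
        if scores[best] < THETA2:
            self.records[key] = best  # overwrite the stale record locally instead of reporting
        return best
```

Replaying data stream c against a desktop cloud record yields the video conference type and rewrites the record. A second pass then scores 0.94, which is at or above θ2, so no further update is made.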
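Step S304: The second device sends information about the data stream type of the current data stream to the operations support system OSS.
Specifically, the second device may send information about the data stream type of the current data stream to the OSS each time it determines that type. For example, it sends information about the type of data stream a to the OSS when that type is generated for the first time, and information about the type of data stream b when that type is generated for the first time. It does so again for data stream a and data stream b when their types are generated the second time, and sends information about the type of data stream c when that type is generated. It can be understood that if the second device is the OSS, step S304 does not need to be performed.
Step S305: The OSS generates a flow control policy for the current data stream based on the information about its data stream type. For example, if the information indicates that the current data stream is of the video desktop cloud data stream type or the video conference data stream type, the current data stream is assigned high-priority QoS.
Step S306: The OSS sends the flow control policy to a forwarding device or a terminal.
Specifically, the forwarding device 103 may be a router, a switch, or a similar device, and the terminal 104 is the device that outputs the current data stream. If the forwarding device or terminal learns from the flow control policy that the current data stream has high-priority QoS, then when multiple data streams are waiting to be sent, it sends the current data stream, which is configured with high priority, first.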
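Steps S305 and S306 amount to mapping a type to a priority and then draining a priority queue. The sketch below assumes a two-level priority scheme in which desktop cloud and video conference are high priority, matching the example above. The type names, priority numbers, and `PriorityForwarder` class are hypothetical. `heapq` gives strict priority order, and a sequence counter keeps FIFO order within a priority level.

```python
import heapq
import itertools

# Hypothetical two-level scheme (lower number = sent first).
HIGH, NORMAL = 0, 1
HIGH_PRIORITY_TYPES = {"desktop_cloud", "video_conference"}


def flow_control_policy(stream_type: str) -> int:
    """OSS side: derive a priority for the stream from its identified type."""
    return HIGH if stream_type in HIGH_PRIORITY_TYPES else NORMAL


class PriorityForwarder:
    """Forwarding device / terminal side: send high-priority streams first,
    first-in first-out within the same priority."""

    def __init__(self) -> None:
        self._queue: list = []
        self._seq = itertools.count()  # tie-breaker preserving arrival order

    def enqueue(self, stream_id: str, priority: int) -> None:
        heapq.heappush(self._queue, (priority, next(self._seq), stream_id))

    def send_all(self) -> list:
        order = []
        while self._queue:
            _, _, stream_id = heapq.heappop(self._queue)
            order.append(stream_id)
        return order


if __name__ == "__main__":
    fwd = PriorityForwarder()
    fwd.enqueue("bulk", flow_control_policy("file_transfer"))  # hypothetical low-priority type
    fwd.enqueue("stream_c", flow_control_policy("video_conference"))
    print(fwd.send_all())
```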
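The method described in FIG. 3 involves a behavior recognition model and a content recognition model. The behavior recognition model is pre-trained on the packet features and data stream types of multiple data stream samples. The content recognition model is trained on the feature information of data streams and the data stream types identified by the behavior recognition model, which makes it an online learning model. The behavior recognition model can identify the data stream types of some basic (or typical) data streams, but some features of data streams of the same type (such as packet length and packet transmission speed) may change during subsequent transmission. The content recognition model learns online and identifies the type of a data stream from other aspects (such as destination address and protocol type), so combining it with the behavior recognition model offsets the errors the behavior recognition model makes in identifying data stream types during transmission. By combining the two models, this application improves both the accuracy and the generalization of data stream type identification and is applicable to multiple scenarios such as cloud-based application deployment, encrypted application transmission, and private applications.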
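The methods in the embodiments of the present invention have been described in detail above. The apparatuses in the embodiments of the present invention are provided below.
Refer to FIG. 7. FIG. 7 is a schematic structural diagram of a data stream type identification device 70 according to an embodiment of the present invention. The device 70 may be the second device in the method embodiment shown in FIG. 3 and may include a first identification unit 701, a second identification unit 702, and a determining unit 703. The units are described in detail below.
The first identification unit 701 is configured to obtain, based on packet features of a current data stream and a behavior recognition model, at least one first confidence of the current data stream corresponding to at least one data stream type, where the behavior recognition model is a model obtained based on packet features and data stream types of multiple data stream samples, and the packet features include one or more of packet length, packet transmission speed, packet interval time, and packet direction;
The second identification unit 702 is configured to obtain, based on feature information of the current data stream and a content recognition model, at least one second confidence of the current data stream corresponding to the at least one data stream type, where the feature information includes a destination address and a protocol type, the content recognition model is a model obtained based on feature information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained based on the behavior recognition model;
The determining unit 703 is configured to determine the data stream type of the current data stream based on the at least one first confidence and the at least one second confidence.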
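The first identification unit consumes per-flow packet features. The text names the four features but not how they are aggregated. The sketch below assumes a few simple aggregates: mean and maximum length, bytes per second over the flow's duration as "transmission speed", mean inter-arrival time, and the share of upstream packets. These aggregates and the direction encoding are illustrative choices, not the patent's feature definition.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Packet:
    timestamp: float  # arrival time in seconds
    length: int       # packet length in bytes
    direction: int    # hypothetical encoding: +1 client->server, -1 server->client


def packet_features(packets: list) -> dict:
    """Compute simple flow-level versions of the four packet features."""
    if len(packets) < 2:
        raise ValueError("need at least two packets to measure intervals")
    pkts = sorted(packets, key=lambda p: p.timestamp)
    lengths = [p.length for p in pkts]
    gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
    duration = pkts[-1].timestamp - pkts[0].timestamp
    return {
        "mean_length": mean(lengths),                 # packet length
        "max_length": max(lengths),
        # packet transmission speed, as bytes per second over the flow duration
        "bytes_per_second": sum(lengths) / duration if duration > 0 else float("inf"),
        "mean_interval": mean(gaps),                  # packet interval time
        "upstream_ratio": sum(p.direction > 0 for p in pkts) / len(pkts),  # packet direction
    }
```

A vector like this would be the input to whatever classifier serves as the behavior recognition model.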
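The foregoing device likewise combines the behavior recognition model, pre-trained on the packet features and data stream types of multiple data stream samples, with the content recognition model, an online learning model trained on data stream feature information and the types identified by the behavior recognition model. Features such as packet length and packet transmission speed may change for data streams of the same type during transmission. The content recognition model identifies the type from other aspects (such as destination address and protocol type), which offsets the behavior recognition model's identification errors. This improves the accuracy and generalization of data stream type identification and suits scenarios such as cloud-based application deployment, encrypted application transmission, and private applications.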
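In a possible implementation, the determining unit 703 determines the data stream type of the current data stream based on the at least one first confidence and the at least one second confidence, specifically by being configured to:
calculate a composite confidence corresponding to a first data stream type based on the first confidence corresponding to the first data stream type, a weight of the first confidence, the second confidence corresponding to the first data stream type, and a weight of the second confidence, where the first data stream type is any one of the at least one data stream type; and
if the composite confidence corresponding to the first data stream type is greater than a first preset threshold, determine that the data stream type of the current data stream is the first data stream type.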
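In the foregoing device, confidence weights are configured in advance for the two models according to how strongly each of the content recognition model and the behavior recognition model influences the final identification result (the weight configured for the behavior recognition model is the weight of the first confidence, and the weight configured for the content recognition model is the weight of the second confidence). The composite confidence calculated with these two weights therefore better reflects the actual type of the current data stream. In addition, introducing the first preset threshold to judge whether a candidate data stream type is acceptable improves the efficiency and accuracy of determining the data stream type.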
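In another possible implementation, the device 70 further includes:
a first sending unit, configured to: when the composite confidence corresponding to the first data stream type is less than a second preset threshold, send the feature information of the current data stream and the information about the first data stream type to another device (for example, the first device in the method embodiment shown in FIG. 3), where the second preset threshold is greater than the first preset threshold;
a receiving unit, configured to receive first information sent by the other device (for example, the first device in the method embodiment shown in FIG. 3), where the first information is obtained by the other device based on the feature information of the current data stream and identification information of the first data stream type; and
an updating unit, configured to update the content recognition model based on the first information to obtain a new content recognition model.
In another possible implementation, the device 70 further includes:
an updating unit, configured to: when the composite confidence corresponding to the first data stream type is less than a second preset threshold, update the content recognition model based on the feature information of the current data stream and the information about the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.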
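In the foregoing implementations, the inventors of this application use the determined data stream type of the current data stream to correct the content recognition model. Specifically, a second preset threshold is introduced: when the composite confidence corresponding to the first data stream type is less than the second preset threshold, information about the current data stream is provided to a device for training to obtain a new content recognition model, so that the next determination is more accurate.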
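In another possible implementation, the updating of the content recognition model based on the feature information of the current data stream and the information about the first data stream type to obtain a new content recognition model is specifically:
if the feature information of a first record among multiple records is the same as the feature information of the current data stream but the data stream type of the first record differs from the first data stream type, updating the data stream type of the first record to the first data stream type to obtain a second record, where each of the multiple records includes feature information and a data stream type; and
performing training on the multiple records including the second record to obtain a new content recognition model.
In other words, if the records already identified include a historical data stream whose feature information is the same as that of the current data stream but whose data stream type differs, the data stream type in that record is updated to the first data stream type of the current data stream. This mainly accommodates elastic deployment of cloud resources. For example, the same cloud resource may be used for video conference during one period and for desktop cloud during the next. With the foregoing method, the data stream type is promptly updated to desktop cloud in the next period, so that the data stream type of the current data stream can still be identified accurately when cloud resources are deployed elastically.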
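In another possible implementation, the device 70 further includes:
a second sending unit, configured to: after the determining unit determines the data stream type of the current data stream based on the at least one first confidence and the at least one second confidence, send information about the data stream type of the current data stream to the operations support system OSS, where the information is used by the OSS to generate a flow control policy for the current data stream.
In other words, after the data stream type of the current data stream is determined, the related information is reported to the OSS, so that the OSS can generate a flow control policy for the current data stream based on its type. For example, when the first data stream type of the current data stream is the video stream of a video conference, the corresponding flow control policy is defined as a priority-transmission policy; that is, when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.
In another possible implementation, the packet length includes one or more of the Ethernet frame length, IP length, transport protocol length, and header length of a packet, where the transport protocol includes the Transmission Control Protocol TCP and/or the User Datagram Protocol UDP.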
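The four length fields can be read from standard headers. The sketch below assumes an untagged Ethernet II frame with no FCS, carrying IPv4 and either TCP or UDP. It also interprets "header length" as the combined Ethernet, IP, and transport header size, which is one plausible reading rather than the patent's definition. The field offsets follow the IPv4, TCP, and UDP header layouts.

```python
import struct

ETH_HEADER_LEN = 14  # untagged Ethernet II header


def packet_lengths(frame: bytes) -> dict:
    """Extract length fields from an untagged Ethernet II frame carrying IPv4 + TCP/UDP.
    Assumptions: no VLAN tag, no FCS in the captured bytes, IPv4 only."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    ip = frame[ETH_HEADER_LEN:]
    ihl = (ip[0] & 0x0F) * 4                         # IPv4 header length in bytes
    (total_length,) = struct.unpack("!H", ip[2:4])   # IPv4 Total Length field
    proto = ip[9]
    l4 = ip[ihl:total_length]
    if proto == 6:      # TCP: data offset is the upper 4 bits of byte 12
        l4_header = (l4[12] >> 4) * 4
        l4_length = total_length - ihl               # TCP has no length field of its own
    elif proto == 17:   # UDP: explicit Length field at offset 4
        l4_header = 8
        (l4_length,) = struct.unpack("!H", l4[4:6])
    else:
        raise ValueError("transport protocol is neither TCP nor UDP")
    return {
        "ethernet_frame_length": len(frame),
        "ip_length": total_length,
        "transport_length": l4_length,
        "header_length": ETH_HEADER_LEN + ihl + l4_header,  # all headers combined
    }
```

TCP has no length field of its own, so its segment length is derived from the IPv4 Total Length minus the IP header. UDP carries its length explicitly.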
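It should be noted that for the implementation of each unit, reference may also be made to the corresponding description of the method embodiment shown in FIG. 3.
Refer to FIG. 8. FIG. 8 is a schematic structural diagram of a data stream type identification device 80 according to an embodiment of the present invention. The device 80 may be the first device in the method embodiment shown in FIG. 3 and may include a receiving unit 801, a generation unit 802, and a sending unit 803. The units are described in detail below.
The receiving unit 801 is configured to receive feature information of a current data stream and information about its data stream type sent by another device (for example, the second device in the method embodiment shown in FIG. 3);
The generation unit 802 is configured to generate first information based on the feature information of the current data stream and the information about the data stream type;
The sending unit 803 is configured to send the first information to the other device (for example, the second device in the method embodiment shown in FIG. 3) for updating a content recognition model. The content recognition model is used to obtain at least one second confidence corresponding to at least one data stream type and is a model obtained based on feature information and data stream types of one or more historical data streams. The data stream types of the historical data streams are obtained based on a behavior recognition model, which is a model obtained based on packet features and data stream types of multiple data stream samples. The packet features include one or more of packet length, packet transmission speed, packet interval time, and packet direction.
The foregoing device 80 likewise supports the combination of the behavior recognition model, pre-trained on the packet features and data stream types of multiple data stream samples, with the content recognition model, an online learning model trained on data stream feature information and the types identified by the behavior recognition model. Identifying the type from aspects such as destination address and protocol type offsets the behavior recognition model's errors when packet length, transmission speed, and similar features change during transmission. This improves the accuracy and generalization of data stream type identification and suits scenarios such as cloud-based application deployment, encrypted application transmission, and private applications.
It should be noted that for the implementation of each unit, reference may also be made to the corresponding description of the method embodiment shown in FIG. 3.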
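Refer to FIG. 9. FIG. 9 shows a device 90 according to an embodiment of the present invention. The device 90 may be the second device in the method embodiment shown in FIG. 3 and includes a processor 901, a memory 902, and a transceiver 903, which are connected to each other through a bus.
The memory 902 includes but is not limited to a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and is configured to store related computer programs and data. The transceiver 903 is configured to receive and send data.
The processor 901 may be one or more central processing units (CPU). When the processor 901 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The processor 901 reads computer program code stored in the memory 902 to perform the following operations:
obtaining, based on packet features of a current data stream and a behavior recognition model, at least one first confidence of the current data stream corresponding to at least one data stream type, where the behavior recognition model is a model obtained based on packet features and data stream types of multiple data stream samples, and the packet features include one or more of packet length, packet transmission speed, packet interval time, and packet direction;
obtaining, based on feature information of the current data stream and a content recognition model, at least one second confidence of the current data stream corresponding to the at least one data stream type, where the feature information includes a destination address and a protocol type, the content recognition model is a model obtained based on feature information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained based on the behavior recognition model;
determining the data stream type of the current data stream based on the at least one first confidence and the at least one second confidence.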
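These operations likewise combine the pre-trained behavior recognition model with the content recognition model, an online learning model trained on data stream feature information and the types identified by the behavior recognition model. Because packet length, transmission speed, and similar features may change for data streams of the same type during transmission, identifying the type from destination address, protocol type, and similar aspects offsets the behavior recognition model's errors. This improves the accuracy and generalization of data stream type identification and suits scenarios such as cloud-based application deployment, encrypted application transmission, and private applications.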
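In a possible implementation, the determining of the data stream type of the current data stream based on the at least one first confidence and the at least one second confidence is specifically:
calculating a composite confidence corresponding to a first data stream type based on the first confidence corresponding to the first data stream type, a weight of the first confidence, the second confidence corresponding to the first data stream type, and a weight of the second confidence, where the first data stream type is any one of the at least one data stream type; and
if the composite confidence corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
In the foregoing operations, confidence weights are configured in advance for the two models according to how strongly each of the content recognition model and the behavior recognition model influences the final identification result (the weight configured for the behavior recognition model is the weight of the first confidence, and the weight configured for the content recognition model is the weight of the second confidence). The composite confidence calculated with these two weights therefore better reflects the actual type of the current data stream. In addition, introducing the first preset threshold to judge whether a candidate data stream type is acceptable improves the efficiency and accuracy of determining the data stream type.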
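In a possible implementation, the processor is further configured to:
if the composite confidence corresponding to the first data stream type is less than a second preset threshold, send, through the transceiver, the feature information of the current data stream and the information about the first data stream type to another device (for example, the first device in the method embodiment shown in FIG. 3), where the second preset threshold is greater than the first preset threshold;
receive, through the transceiver, first information sent by the other device (for example, the first device in the method embodiment shown in FIG. 3), where the first information is obtained by the other device based on the feature information of the current data stream and identification information of the first data stream type; and
update the content recognition model based on the first information to obtain a new content recognition model.
In another possible implementation, the processor is further configured to:
if the composite confidence corresponding to the first data stream type is less than a second preset threshold, update the content recognition model based on the feature information of the current data stream and the information about the first data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
In the foregoing operations, the inventors of this application use the determined data stream type of the current data stream to correct the content recognition model. Specifically, a second preset threshold is introduced: when the composite confidence corresponding to the first data stream type is less than the second preset threshold, information about the current data stream is provided to a device for training to obtain a new content recognition model, so that the next determination is more accurate.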
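In another possible implementation, the updating of the content recognition model based on the feature information of the current data stream and the information about the first data stream type to obtain a new content recognition model is specifically:
if the feature information of a first record among multiple records is the same as the feature information of the current data stream but the data stream type of the first record differs from the first data stream type, updating the data stream type of the first record to the first data stream type to obtain a second record, where each of the multiple records includes feature information and a data stream type; and
performing training on the multiple records including the second record to obtain a new content recognition model.
In other words, if the records already identified include a historical data stream with the same feature information as the current data stream but a different data stream type, the type in that record is updated to the first data stream type of the current data stream. This accommodates elastic deployment of cloud resources: when the same cloud resource moves from video conference in one period to desktop cloud in the next, the stored type is promptly updated to desktop cloud, so the data stream type of the current data stream is still identified accurately.
In another possible implementation, the processor is further configured to: after determining the data stream type of the current data stream based on the at least one first confidence and the at least one second confidence, send, through the transceiver, information about the data stream type of the current data stream to the operations support system OSS, where the information is used by the OSS to generate a flow control policy for the current data stream.
In other words, once the data stream type is determined, it is reported to the OSS, which can then generate a flow control policy for the current data stream based on that type. For example, when the first data stream type of the current data stream is the video stream of a video conference, the policy is defined as priority transmission, so that the current data stream is transmitted first when multiple data streams are waiting.
In another possible implementation, the packet length includes one or more of the Ethernet frame length, IP length, transport protocol length, and header length of a packet, where the transport protocol includes the Transmission Control Protocol TCP and/or the User Datagram Protocol UDP.
It should be noted that for the implementation of each operation, reference may also be made to the corresponding description of the method embodiment shown in FIG. 3.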
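Refer to FIG. 10. FIG. 10 shows a device 100 according to an embodiment of the present invention. The device 100 may be the first device in the method embodiment shown in FIG. 3 and includes a processor 1001, a memory 1002, and a transceiver 1003, which are connected to each other through a bus.
The memory 1002 includes but is not limited to a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and is configured to store related computer programs and data. The transceiver 1003 is configured to receive and send data.
The processor 1001 may be one or more central processing units (CPU). When the processor 1001 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The processor 1001 reads computer program code stored in the memory 1002 to perform the following operations:
receiving, through the transceiver, feature information of a current data stream and information about its data stream type sent by another device (for example, the second device in the method embodiment shown in FIG. 3);
generating first information based on the feature information of the current data stream and the information about the data stream type;
sending, through the transceiver, the first information to the other device (for example, the second device in the method embodiment shown in FIG. 3) for updating a content recognition model. The content recognition model is used to obtain at least one second confidence corresponding to at least one data stream type and is a model obtained based on feature information and data stream types of one or more historical data streams. The data stream types of the historical data streams are obtained based on a behavior recognition model, which is a model obtained based on packet features and data stream types of multiple data stream samples. The packet features include one or more of packet length, packet transmission speed, packet interval time, and packet direction.
These operations likewise support combining the pre-trained behavior recognition model with the online-learning content recognition model. Identifying the type from destination address, protocol type, and similar aspects offsets the behavior recognition model's errors when packet features of the same type of data stream change during transmission. This improves the accuracy and generalization of data stream type identification and suits scenarios such as cloud-based application deployment, encrypted application transmission, and private applications.
It should be noted that for the implementation of each operation, reference may also be made to the corresponding description of the method embodiment shown in FIG. 3.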
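It should be noted that the apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the accompanying drawings of the network device or host embodiments provided in the present invention, a connection between modules indicates a communication connection between them, which may be implemented as one or more communication buses or signal lines. A person of ordinary skill in the art can understand and implement the embodiments without creative effort.
An embodiment of the present invention further provides a chip system. The chip system includes at least one processor, a memory, and an interface circuit; the memory, the interface circuit, and the at least one processor are interconnected through lines, and the memory stores a computer program. When the computer program is executed by the processor, the method procedure shown in FIG. 3 is implemented.
An embodiment of the present invention further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program runs on a processor, the method procedure shown in FIG. 3 is implemented.
An embodiment of the present invention further provides a computer program product. When the computer program product runs on a processor, the method procedure shown in FIG. 3 is implemented.
In summary, this application involves a behavior recognition model and a content recognition model. The behavior recognition model is pre-trained on the packet features and data stream types of multiple data stream samples. The content recognition model is trained on the feature information of data streams and the data stream types identified by the behavior recognition model, which makes it an online learning model. The behavior recognition model can identify the data stream types of some basic (or typical) data streams, but some features of data streams of the same type (such as packet length and packet transmission speed) may change during subsequent transmission. The content recognition model learns online and identifies the type of a data stream from other aspects (such as destination address and protocol type), which offsets the errors the behavior recognition model makes in identifying data stream types during transmission. By combining the two models, this application improves both the accuracy and the generalization of data stream type identification and is applicable to multiple scenarios such as cloud-based application deployment, encrypted application transmission, and private applications.
A person of ordinary skill in the art can understand that all or some of the procedures of the methods in the foregoing embodiments may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium, and when executed, may include the procedures of the foregoing method embodiments. The foregoing storage medium includes any medium that can store computer program code, such as a ROM, a random access memory RAM, a magnetic disk, or an optical disc.
The word "first" in the first device, first confidence, first data stream type, first preset threshold, first information, and first record mentioned in the embodiments of the present invention is used only as a name identifier and does not indicate first in order. The same rule applies to "second", "third", "fourth", and so on. However, "the first" in "the first identifier" mentioned in the embodiments of the present invention does indicate first in order; the same rule applies to "the Nth".
The foregoing specific implementations further describe the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing descriptions are merely specific implementations of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement, improvement, or the like made based on the technical solutions of the present invention shall fall within the protection scope of the present invention.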

Claims (23)

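  1. A data stream type identification method, comprising:
    obtaining, based on packet features of a current data stream and a behavior recognition model, at least one first confidence of the current data stream corresponding to at least one data stream type, wherein the behavior recognition model is a model obtained based on packet features and data stream types of multiple data stream samples, and the packet features comprise one or more of packet length, packet transmission speed, packet interval time, and packet direction;
    obtaining, based on feature information of the current data stream and a content recognition model, at least one second confidence of the current data stream corresponding to the at least one data stream type, wherein the feature information comprises a destination address and a protocol type, the content recognition model is a model obtained based on feature information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained based on the behavior recognition model; and
    determining a data stream type of the current data stream based on the at least one first confidence and the at least one second confidence.
  2. The method according to claim 1, wherein the determining a data stream type of the current data stream based on the at least one first confidence and the at least one second confidence comprises:
    calculating a composite confidence corresponding to a first data stream type based on the first confidence corresponding to the first data stream type and the second confidence corresponding to the first data stream type, wherein the first data stream type is any one of the at least one data stream type; and
    if the composite confidence corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
  3. The method according to claim 1, wherein the determining a data stream type of the current data stream based on the at least one first confidence and the at least one second confidence comprises:
    calculating a composite confidence corresponding to a first data stream type based on the first confidence corresponding to the first data stream type, a weight of the first confidence, the second confidence corresponding to the first data stream type, and a weight of the second confidence, wherein the first data stream type is any one of the at least one data stream type; and
    if the composite confidence corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
  4. The method according to claim 2 or 3, further comprising:
    if the composite confidence corresponding to the first data stream type is less than a second preset threshold, sending the feature information of the current data stream and information about the first data stream type to a device, wherein the second preset threshold is greater than the first preset threshold;
    receiving first information sent by the device, wherein the first information is obtained by the device based on the feature information of the current data stream and identification information of the first data stream type; and
    updating the content recognition model based on the first information to obtain a new content recognition model.
  5. The method according to claim 2 or 3, further comprising:
    if the composite confidence corresponding to the first data stream type is less than a second preset threshold, updating the content recognition model based on the feature information of the current data stream and information about the first data stream type to obtain a new content recognition model, wherein the second preset threshold is greater than the first preset threshold.
  6. The method according to claim 5, wherein the updating the content recognition model based on the feature information of the current data stream and information about the first data stream type to obtain a new content recognition model comprises:
    if the feature information of a first record among multiple records is the same as the feature information of the current data stream but the data stream type of the first record is different from the first data stream type, updating the data stream type of the first record to the first data stream type to obtain a second record, wherein each of the multiple records comprises feature information and a data stream type; and
    performing training on the multiple records comprising the second record to obtain the new content recognition model.
  7. The method according to any one of claims 1 to 6, wherein after the determining a data stream type of the current data stream based on the at least one first confidence and the at least one second confidence, the method further comprises:
    sending information about the data stream type of the current data stream to an operations support system OSS, wherein the information about the data stream type of the current data stream is used by the OSS to generate a flow control policy for the current data stream.
  8. The method according to any one of claims 1 to 7, wherein the packet length comprises one or more of an Ethernet frame length, an IP length, a transport protocol length, and a header length of a packet, and the transport protocol comprises the Transmission Control Protocol TCP and/or the User Datagram Protocol UDP.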
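  9. A data stream type identification method, comprising:
    receiving feature information of a current data stream and information about a data stream type sent by a device;
    generating first information based on the feature information of the current data stream and the information about the data stream type; and
    sending the first information to the device for updating a content recognition model, wherein the content recognition model is used to obtain at least one second confidence corresponding to at least one data stream type, the content recognition model is a model obtained based on feature information and data stream types of one or more historical data streams, the data stream types of the historical data streams are obtained based on a behavior recognition model, the behavior recognition model is a model obtained based on packet features and data stream types of multiple data stream samples, and the packet features comprise one or more of packet length, packet transmission speed, packet interval time, and packet direction.
  10. The method according to claim 9, wherein the generating first information based on the feature information of the current data stream and the information about the data stream type comprises:
    if the feature information of a first record among multiple records is the same as the feature information of the current data stream but the data stream type of the first record is different from the data stream type of the current data stream, updating the data stream type of the first record to the data stream type of the current data stream to obtain a second record, wherein each of the multiple records comprises feature information and a data stream type; and
    performing training on the multiple records comprising the second record to obtain a new content recognition model.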
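  11. A data stream type identification device, wherein the device is a first device, the first device comprises a memory and a processor, the memory is configured to store a computer program, and the processor invokes the computer program to perform the following operations:
    obtaining, based on packet features of a current data stream and a behavior recognition model, at least one first confidence of the current data stream corresponding to at least one data stream type, wherein the behavior recognition model is a model obtained based on packet features and data stream types of multiple data stream samples, and the packet features comprise one or more of packet length, packet transmission speed, packet interval time, and packet direction;
    obtaining, based on feature information of the current data stream and a content recognition model, at least one second confidence of the current data stream corresponding to the at least one data stream type, wherein the feature information comprises a destination address and a protocol type, the content recognition model is a model obtained based on feature information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained based on the behavior recognition model; and
    determining a data stream type of the current data stream based on the at least one first confidence and the at least one second confidence.
  12. The device according to claim 11, wherein the determining a data stream type of the current data stream based on the at least one first confidence and the at least one second confidence is specifically:
    calculating a composite confidence corresponding to a first data stream type based on the first confidence corresponding to the first data stream type and the second confidence corresponding to the first data stream type, wherein the first data stream type is any one of the at least one data stream type; and
    if the composite confidence corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
  13. The device according to claim 11, wherein the determining a data stream type of the current data stream based on the at least one first confidence and the at least one second confidence is specifically:
    calculating a composite confidence corresponding to a first data stream type based on the first confidence corresponding to the first data stream type, a weight of the first confidence, the second confidence corresponding to the first data stream type, and a weight of the second confidence, wherein the first data stream type is any one of the at least one data stream type; and
    if the composite confidence corresponding to the first data stream type is greater than a first preset threshold, determining that the data stream type of the current data stream is the first data stream type.
  14. The device according to claim 12 or 13, wherein the first device further comprises a transceiver, and the processor is further configured to:
    if the composite confidence corresponding to the first data stream type is less than a second preset threshold, send, through the transceiver, the feature information of the current data stream and information about the first data stream type to a second device, wherein the second preset threshold is greater than the first preset threshold;
    receive, through the transceiver, first information sent by the second device, wherein the first information is obtained by the second device based on the feature information of the current data stream and identification information of the first data stream type; and
    update the content recognition model based on the first information to obtain a new content recognition model.
  15. The device according to claim 12 or 13, wherein the processor is further configured to:
    if the composite confidence corresponding to the first data stream type is less than a second preset threshold, update the content recognition model based on the feature information of the current data stream and information about the first data stream type to obtain a new content recognition model, wherein the second preset threshold is greater than the first preset threshold.
  16. The device according to claim 15, wherein the updating the content recognition model based on the feature information of the current data stream and information about the first data stream type to obtain a new content recognition model is specifically:
    if the feature information of a first record among multiple records is the same as the feature information of the current data stream but the data stream type of the first record is different from the first data stream type, updating the data stream type of the first record to the first data stream type to obtain a second record, wherein each of the multiple records comprises feature information and a data stream type; and
    performing training on the multiple records comprising the second record to obtain the new content recognition model.
  17. The device according to any one of claims 11 to 16, wherein the first device further comprises a transceiver, and the processor is further configured to: after the determining a data stream type of the current data stream based on the at least one first confidence and the at least one second confidence, send, through the transceiver, information about the data stream type of the current data stream to an operations support system OSS, wherein the information about the data stream type of the current data stream is used by the OSS to generate a flow control policy for the current data stream.
  18. The device according to any one of claims 11 to 17, wherein the packet length comprises one or more of an Ethernet frame length, an IP length, a transport protocol length, and a header length of a packet, and the transport protocol comprises the Transmission Control Protocol TCP and/or the User Datagram Protocol UDP.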
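  19. A data stream type identification device, wherein the device is a first device, the first device comprises a memory, a processor, and a transceiver, the memory is configured to store a computer program, and the processor invokes the computer program to perform the following operations:
    receiving, through the transceiver, feature information of a current data stream and information about a data stream type sent by a second device;
    generating first information based on the feature information of the current data stream and the information about the data stream type; and
    sending, through the transceiver, the first information to the second device for updating a content recognition model, wherein the content recognition model is used to obtain at least one second confidence corresponding to at least one data stream type, the content recognition model is a model obtained based on feature information and data stream types of one or more historical data streams, the data stream types of the historical data streams are obtained based on a behavior recognition model, the behavior recognition model is a model obtained based on packet features and data stream types of multiple data stream samples, and the packet features comprise one or more of packet length, packet transmission speed, packet interval time, and packet direction.
  20. The device according to claim 19, wherein the generating first information based on the feature information of the current data stream and the information about the data stream type is specifically:
    if the feature information of a first record among multiple records is the same as the feature information of the current data stream but the data stream type of the first record is different from the data stream type of the current data stream, updating the data stream type of the first record to the data stream type of the current data stream to obtain a second record, wherein each of the multiple records comprises feature information and a data stream type; and
    performing training on the multiple records comprising the second record to obtain a new content recognition model.
  21. A data stream type identification device, comprising units configured to perform the method according to any one of claims 1 to 10.
  22. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program runs on a processor, the method according to any one of claims 1 to 10 is implemented.
  23. A computer program product, wherein the computer program product is stored in a memory, and when the computer program product runs on a processor, the method according to any one of claims 1 to 10 is implemented.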
PCT/CN2020/115693 2019-09-16 2020-09-16 Data stream type identification method and related device WO2021052379A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
KR1020227010740A KR20220053658A (ko) 2019-09-16 2020-09-16 Data stream classification method and related device
BR112022004814A BR112022004814A2 (pt) 2019-09-16 2020-09-16 Data stream classification method and related device
EP20865499.6A EP4024791A4 (en) 2019-09-16 2020-09-16 DATA FLOW TYPE IDENTIFICATION METHOD AND ASSOCIATED DEVICES
JP2022516688A JP7413515B2 (ja) 2019-09-16 2020-09-16 Data stream classification method and related device
US17/695,491 US11838215B2 (en) 2019-09-16 2022-03-15 Data stream classification method and related device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910872990.0A CN112511457B (zh) 2019-09-16 2019-09-16 Data stream type identification method and related device
CN201910872990.0 2019-09-16

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/695,491 Continuation US11838215B2 (en) 2019-09-16 2022-03-15 Data stream classification method and related device

Publications (1)

Publication Number Publication Date
WO2021052379A1 true WO2021052379A1 (zh) 2021-03-25

Family

ID=74883367

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/115693 WO2021052379A1 (zh) 2019-09-16 2020-09-16 Data stream type identification method and related device

Country Status (7)

Country Link
US (1) US11838215B2 (zh)
EP (1) EP4024791A4 (zh)
JP (1) JP7413515B2 (zh)
KR (1) KR20220053658A (zh)
CN (2) CN112511457B (zh)
BR (1) BR112022004814A2 (zh)
WO (1) WO2021052379A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220200869A1 (en) * 2017-11-27 2022-06-23 Lacework, Inc. Configuring cloud deployments based on learnings obtained by monitoring other cloud deployments
US20220247769A1 (en) * 2017-11-27 2022-08-04 Lacework, Inc. Learning from similar cloud deployments
WO2023005278A1 (zh) * 2021-07-27 2023-02-02 华为技术有限公司 一种流控处理的方法及通信装置
US11818156B1 (en) 2017-11-27 2023-11-14 Lacework, Inc. Data lake-enabled security platform

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11742901B2 (en) * 2020-07-27 2023-08-29 Electronics And Telecommunications Research Institute Deep learning based beamforming method and apparatus
CN113935431B (zh) * 2021-10-28 2022-04-08 北京永信至诚科技股份有限公司 一种多流关联分析识别私有加密数据的方法及系统
CN115037698B (zh) * 2022-05-30 2024-01-02 天翼云科技有限公司 一种数据识别方法、装置及电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011114060A2 (fr) * 2010-03-17 2011-09-22 Thales Method for identifying a protocol at the origin of a data stream
CN107360032A * 2017-07-20 2017-11-17 中国南方电网有限责任公司 Network flow identification method and electronic device
CN108667747A * 2018-04-28 2018-10-16 深圳信息职业技术学院 Method and apparatus for identifying network flow application types, and computer-readable storage medium
CN108900432A * 2018-07-05 2018-11-27 中山大学 Content awareness method based on network flow behavior
CN110048962A * 2019-04-24 2019-07-23 广东工业大学 Network traffic classification method, system, and device

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007228489A (ja) 2006-02-27 2007-09-06 Nec Corp Application identification system, application identification method, and application identification program
US8634399B2 (en) 2006-04-12 2014-01-21 Qualcomm Incorporated Uplink and bi-directional traffic classification for wireless communication
US8682812B1 (en) * 2010-12-23 2014-03-25 Narus, Inc. Machine learning based botnet detection using real-time extracted traffic features
US9094288B1 (en) * 2011-10-26 2015-07-28 Narus, Inc. Automated discovery, attribution, analysis, and risk assessment of security threats
JP5812282B2 (ja) 2011-12-16 2015-11-11 公立大学法人大阪市立大学 Traffic monitoring device
WO2014117406A1 (zh) 2013-02-04 2014-08-07 华为技术有限公司 Feature extraction apparatus, and network traffic identification method, apparatus, and system
US20140321290A1 (en) * 2013-04-30 2014-10-30 Hewlett-Packard Development Company, L.P. Management of classification frameworks to identify applications
RU2589852C2 * 2013-06-28 2016-07-10 Закрытое акционерное общество "Лаборатория Касперского" System and method for automatic adjustment of application control rules
US9721212B2 (en) * 2014-06-04 2017-08-01 Qualcomm Incorporated Efficient on-device binary analysis for auto-generated behavioral models
CN105007282B * 2015-08-10 2018-08-10 济南大学 Malware network behavior detection method and system for network service providers
JP2017139580A (ja) 2016-02-02 2017-08-10 沖電気工業株式会社 Communication analysis device and communication analysis program
US20170364794A1 (en) * 2016-06-20 2017-12-21 Telefonaktiebolaget Lm Ericsson (Publ) Method for classifying the payload of encrypted traffic flows
CN107360159B * 2017-07-11 2019-12-03 中国科学院信息工程研究所 Method and apparatus for identifying abnormal encrypted traffic
CN108173708A * 2017-12-18 2018-06-15 北京天融信网络安全技术有限公司 Incremental-learning-based abnormal traffic detection method, apparatus, and storage medium
CN108200032A * 2017-12-27 2018-06-22 北京奇艺世纪科技有限公司 Data detection method and apparatus, and electronic device
CN108650195B * 2018-04-17 2021-08-24 南京烽火星空通信发展有限公司 Method for constructing an automatic app traffic identification model
US10555040B2 (en) * 2018-06-22 2020-02-04 Samsung Electronics Co., Ltd. Machine learning based packet service classification methods for experience-centric cellular scheduling
CN109067612A * 2018-07-13 2018-12-21 哈尔滨工程大学 Online traffic identification method based on an incremental clustering algorithm
CN109818976B * 2019-03-15 2021-09-21 杭州迪普科技股份有限公司 Abnormal traffic detection method and apparatus
CN110138681B * 2019-04-19 2021-01-22 上海交通大学 Network traffic identification method and apparatus based on TCP packet features

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP4024791A4
ZHOU, DINGDING ET AL.: "Traffic Classification Based on Bayesian Updating Method", JOURNAL OF SYSTEM SIMULATION, vol. 25, no. 11, 30 November 2013 (2013-11-30), pages 2597 - 2603, XP009527021, ISSN: 1004-731X *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220200869A1 (en) * 2017-11-27 2022-06-23 Lacework, Inc. Configuring cloud deployments based on learnings obtained by monitoring other cloud deployments
US20220247769A1 (en) * 2017-11-27 2022-08-04 Lacework, Inc. Learning from similar cloud deployments
US11785104B2 (en) * 2017-11-27 2023-10-10 Lacework, Inc. Learning from similar cloud deployments
US11818156B1 (en) 2017-11-27 2023-11-14 Lacework, Inc. Data lake-enabled security platform
US11894984B2 (en) * 2017-11-27 2024-02-06 Lacework, Inc. Configuring cloud deployments based on learnings obtained by monitoring other cloud deployments
WO2023005278A1 (zh) * 2021-07-27 2023-02-02 华为技术有限公司 一种流控处理的方法及通信装置

Also Published As

Publication number Publication date
EP4024791A4 (en) 2022-09-28
CN112511457B (zh) 2021-12-28
EP4024791A1 (en) 2022-07-06
CN114465962B (zh) 2024-01-05
US11838215B2 (en) 2023-12-05
JP7413515B2 (ja) 2024-01-15
KR20220053658A (ko) 2022-04-29
CN114465962A (zh) 2022-05-10
BR112022004814A2 (pt) 2022-06-21
JP2022548136A (ja) 2022-11-16
CN112511457A (zh) 2021-03-16
US20220210082A1 (en) 2022-06-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20865499

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022516688

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112022004814

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 20227010740

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2020865499

Country of ref document: EP

Effective date: 20220330

ENP Entry into the national phase

Ref document number: 112022004814

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20220315