WO2021169308A1 - Method for updating a data stream type recognition model and related device - Google Patents

Method for updating a data stream type recognition model and related device

Info

Publication number
WO2021169308A1
Authority
WO
WIPO (PCT)
Prior art keywords
data stream
recognition model
data
stream type
behavior recognition
Prior art date
Application number
PCT/CN2020/119665
Other languages
English (en)
French (fr)
Inventor
吴俊�
胡新宇
张亮
徐慧颖
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP20922372.6A (published as EP4087202A4)
Publication of WO2021169308A1
Priority to US17/896,943 (published as US20220407809A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/028 Capturing of monitoring data by filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/74 Address processing for routing

Definitions

  • the present invention relates to the field of computer technology and communication, and further relates to the application of artificial intelligence (AI) in the field of computer technology and communication, and in particular to a method for updating a data stream type recognition model and related equipment.
  • AI artificial intelligence
  • the following method is mainly used to identify the type of an office application: sample data is collected in advance and labeled manually or with third-party tools; a model is then trained offline on the labeled sample data using machine learning or neural network algorithms; and the model obtained by offline training is used to predict the application type of the current network traffic.
  • obtaining training samples by manual labeling is inefficient.
  • the embodiment of the present invention discloses a data stream type recognition model updating method and related equipment, which can obtain training samples for updating the behavior recognition model more efficiently.
  • an embodiment of the present application provides a method for updating a data stream type identification model, and the method includes:
  • the message information includes message length, message transmission speed, message interval time, and message direction.
  • the behavior recognition model is a model obtained by training the message information and data stream types of multiple data stream samples;
  • the correction data corresponding to the current data stream is acquired, wherein the correction data corresponding to the current data stream includes the message information of the current data stream and the second data stream type corresponding to the current data stream, and the correction data is used as a training sample to update the behavior recognition model.
  • the correction data is obtained automatically by the device, without manual labeling, when the first data stream type corresponding to the current data stream differs from the second data stream type corresponding to the current data stream, so sample data for training the behavior recognition model is obtained more efficiently.
  • the correction data consists of the message information and the accurate data stream type generated when the recognition result of the behavior recognition model is inaccurate. Therefore, subsequently updating the behavior recognition model based on the correction data yields a behavior recognition model with a more accurate recognition effect.
  • determining the second data stream type corresponding to the current data stream according to the target correspondence relationship and the general characteristics of the current data stream includes: if the general feature of the current data stream is the same as the first general feature in the target correspondence relationship, the data stream type corresponding to the first general feature is taken as the second data stream type corresponding to the current data stream.
  • the general feature is a well-known port number or a well-known domain name system (DNS).
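The lookup and comparison described in the bullets above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the table contents, the `Flow` fields, the type labels, and the function names are all hypothetical, and the rule of checking the domain before the port is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical target correspondence: general feature -> data stream type.
# The labels and the example domain are illustrative only.
TARGET_CORRESPONDENCE = {
    ("port", 53): "dns",
    ("port", 443): "web",
    ("domain", "meeting.example.com"): "video_conference",
}

@dataclass
class Flow:
    packet_info: dict              # message length, speed, interval, direction
    dst_port: int
    domain: Optional[str] = None

def second_type(flow: Flow) -> Optional[str]:
    """Return the type mapped to the flow's general feature, or None."""
    if flow.domain is not None:
        hit = TARGET_CORRESPONDENCE.get(("domain", flow.domain))
        if hit is not None:
            return hit
    return TARGET_CORRESPONDENCE.get(("port", flow.dst_port))

def correction_sample(flow: Flow, first_type: str):
    """Emit (packet_info, second_type) only when the two types disagree."""
    st = second_type(flow)
    if st is None or st == first_type:
        return None
    return (flow.packet_info, st)
```

A flow whose model-predicted type already matches the table entry produces no sample, so only disagreements accumulate as training data.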
  • the method further includes:
  • correction data corresponding to the current data stream includes message information of the current data stream and a second data stream type corresponding to the current data stream;
  • Type a new behavior recognition model obtained by training the behavior recognition model.
  • the operation of training a new behavior recognition model is implemented by the designated first device with strong computing capability, and the third device only needs to update its own behavior recognition model according to the new model parameters sent by the first device, so that the third device can use its main computing resources for message forwarding, which effectively guarantees the message forwarding performance of the third device.
  • the method further includes: the behavior recognition model is updated according to the correction data to obtain a new behavior recognition model.
  • the operation of training the behavior recognition model is implemented by the third device, which is equivalent to using and training the behavior recognition model on the same device.
  • updating the behavior recognition model according to the correction data to obtain a new behavior recognition model includes:
  • the behavior recognition model is trained according to the message information of the M data streams and the second data stream type corresponding to each of the M data streams to obtain a new behavior recognition model, wherein the M data streams are the amount accumulated from the time the behavior recognition model took effect to the present, or the amount accumulated within a preset time period, or the ratio of the M data streams to the total number of data streams transmitted after the behavior recognition model took effect exceeds a preset threshold; the M data streams include the current data stream.
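The three alternative conditions for starting retraining can be sketched as a single predicate. This is a hedged illustration: the mode names, parameter names, and the reading of M as a count threshold are assumptions made for the example.

```python
def should_retrain(mode, corrected_total, corrected_in_window,
                   total_flows, m, ratio_threshold):
    """Decide whether enough correction data has accumulated to retrain.

    mode selects one of the three alternatives (names are hypothetical):
      "since_effective": corrected flows since the model took effect reach m
      "time_window":     corrected flows within a preset period reach m
      "ratio":           corrected / total flows since the model took effect
                         exceeds ratio_threshold
    """
    if mode == "since_effective":
        return corrected_total >= m
    if mode == "time_window":
        return corrected_in_window >= m
    if mode == "ratio":
        return total_flows > 0 and corrected_total / total_flows > ratio_threshold
    raise ValueError(f"unknown mode: {mode}")
```

Keeping the alternatives behind one function lets a deployment switch between a count-based and a ratio-based trigger without touching the training code.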
  • training the behavior recognition model according to the message information of the M data streams and the second data stream type corresponding to each of the M data streams to obtain a new behavior recognition model includes:
  • training the behavior recognition model according to the message information of the M data streams, the second data stream types respectively corresponding to the M data streams, the message information of the Y data streams, and the second data stream types respectively corresponding to the Y data streams, to obtain a new behavior recognition model;
  • the Y data streams and the M data streams come from the same network, or,
  • the Y data streams and the M data streams come from at least two different networks, where the at least two different networks include two different local area networks, or the at least two different networks include two different forms of networks, or the at least two different networks include networks of two different regions.
  • if the Y data streams and the M data streams come from at least two different networks, training the behavior recognition model according to the message information of the M data streams, the second data stream types respectively corresponding to the M data streams, the message information of the Y data streams, and the second data stream types respectively corresponding to the Y data streams to obtain a new behavior recognition model includes:
  • the behavior recognition model is trained according to the modified message information of the M data streams, the modified message information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.
  • the message information of data streams from different networks is normalized to make the message information of data streams from different networks more comparable.
  • the behavior recognition model has better generalization and higher prediction accuracy.
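One way to make message information from different networks comparable is to standardize each feature within its own network before training. The text does not say which normalization is used, so the per-network z-score below is an assumption, as are the function name and the sample layout.

```python
from statistics import mean, pstdev

def normalize_per_network(samples):
    """Z-score each feature column separately for every network.

    samples: list of (network_id, feature_vector) pairs, where the feature
    vector holds numeric message information (length, interval, ...).
    Returns the pairs in the same order with normalized vectors.
    """
    by_net = {}
    for net, vec in samples:
        by_net.setdefault(net, []).append(vec)

    stats = {}
    for net, vecs in by_net.items():
        columns = list(zip(*vecs))
        # A zero standard deviation is replaced by 1.0 to avoid dividing by 0.
        stats[net] = [(mean(c), pstdev(c) or 1.0) for c in columns]

    return [
        (net, [(x - mu) / sd for x, (mu, sd) in zip(vec, stats[net])])
        for net, vec in samples
    ]
```

After this step a packet length that is typical for a slow LAN and one that is typical for a fast WAN both map to values near zero, so a single model can be trained on the merged samples.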
  • determining the first data stream type corresponding to the current data stream includes:
  • the content recognition model is a model obtained from the characteristic information and data stream type of one or more historical data streams, and the data stream type of the historical data stream is obtained according to the behavior recognition model.
  • the first data stream type corresponding to the current data stream is obtained according to the content recognition model and the behavior recognition model, and then the first data stream type is corrected to obtain the final data stream type of the current data stream.
  • the behavior recognition model is pre-trained from the message information and data stream type of multiple data stream samples, and the content recognition model is trained based on the feature information of the data stream and the data stream type recognized by the behavior recognition model; therefore, analyzing the feature information, message information, and so on through the content recognition model and the behavior recognition model predicts the first data stream type corresponding to the current data stream more accurately.
  • because the data stream type in the data stream sample used when training the content recognition model is recognized by the behavior recognition model, there is no need to collect a large amount of data required for training, which solves the problem of insufficient data integrity.
  • determining the first data stream type corresponding to the current data stream according to the message information, feature information, behavior recognition model, and content recognition model of the current data stream includes:
  • the first data stream type of the current data stream is determined according to the at least one first confidence level and the at least one second confidence level.
  • determining the first data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level includes:
  • calculating a comprehensive confidence corresponding to the target data stream type according to the first confidence level corresponding to the target data stream type, the weight value of the first confidence level, the second confidence level corresponding to the target data stream type, and the weight value of the second confidence level, where the target data stream type is any one of the at least one data stream type;
  • if the comprehensive confidence corresponding to the target data stream type is greater than a first preset threshold, it is determined that the target data stream type is the first data stream type corresponding to the current data stream.
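The combination of the two confidence levels can be sketched as a weighted sum per data stream type. The text names the inputs (two confidences and their weight values) but not the formula, so the weighted sum, the function names, and the rule of picking the highest-scoring type are assumptions.

```python
def comprehensive_confidence(first_conf, second_conf, w1, w2):
    """Combine per-type confidences from two models into one score per type.

    first_conf / second_conf: dict mapping data stream type -> confidence.
    A type missing from one model is treated as confidence 0.0.
    """
    types = set(first_conf) | set(second_conf)
    return {
        t: w1 * first_conf.get(t, 0.0) + w2 * second_conf.get(t, 0.0)
        for t in types
    }

def pick_first_type(combined, threshold_1):
    """Return the best-scoring type if it exceeds threshold_1, else None."""
    if not combined:
        return None
    best = max(combined, key=combined.get)
    return best if combined[best] > threshold_1 else None
```

With weights of 0.5 each, confidences of 0.9 and 0.6 for the same type combine to 0.75, which passes a first threshold of 0.7 but not one of 0.8.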
  • the method further includes:
  • if the comprehensive confidence corresponding to the target data stream type is less than a second preset threshold, the characteristic information of the current data stream and the second data stream type are sent to the second device, where the second preset threshold is greater than the first preset threshold;
  • the inventor of the present application updates the content recognition model by using the identification result of the data stream type of the current data stream. Specifically, a second preset threshold is introduced, and when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is sent to the second device for training to obtain a new content recognition model, making the next determination result more accurate.
  • the method further includes:
  • the content recognition model is updated according to the characteristic information of the current data stream and the second data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
  • the inventor of the present application updates the content recognition model by using the identification result of the data stream type of the current data stream. Specifically, a second preset threshold is introduced, and when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is used for training to obtain a new content recognition model, making the next determination result more accurate.
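The interplay of the two preset thresholds can be sketched as a small decision function. The function and return-value names are hypothetical; the only facts taken from the text are that a result is accepted above the first threshold, that a content-model update is triggered below the second threshold, and that the second threshold is greater than the first.

```python
def classify_confidence(comprehensive, threshold_1, threshold_2):
    """Return (accepted, needs_content_update) for one comprehensive confidence."""
    if threshold_2 <= threshold_1:
        raise ValueError("threshold_2 must be greater than threshold_1")
    accepted = comprehensive > threshold_1
    needs_content_update = comprehensive < threshold_2
    return accepted, needs_content_update
```

Under this reading, a confidence between the two thresholds is good enough to accept the type yet still low enough to feed the flow's characteristic information back as training data for the content recognition model.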
  • after determining the second data stream type corresponding to the current data stream according to the target correspondence relationship and the general characteristics of the current data stream, the method further includes:
  • the relevant information of the data stream type of the current data stream is notified to the OSS system, so that the OSS system can generate a flow control strategy for the current data stream based on the data stream type of the current data stream. For example, when the first data stream type of the current data stream is a video stream of a video conference, the corresponding flow control strategy is defined as a priority transmission strategy, that is, when multiple data streams are to be transmitted, the current data stream is transmitted first.
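A priority transmission strategy of the kind given in the example can be sketched with a priority queue. The policy table, the default priority, and the class name are hypothetical; how the OSS actually encodes or enforces a strategy is not described in the text.

```python
import heapq
from itertools import count

# Hypothetical policy table: a lower number is transmitted earlier.
POLICY = {"video_conference": 0}
DEFAULT_PRIORITY = 10

class Scheduler:
    """Orders pending flows by the priority assigned to their stream type."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order within a priority

    def enqueue(self, flow_id, stream_type):
        priority = POLICY.get(stream_type, DEFAULT_PRIORITY)
        heapq.heappush(self._heap, (priority, next(self._seq), flow_id))

    def next_flow(self):
        """Pop the flow to transmit next, or None when nothing is pending."""
        return heapq.heappop(self._heap)[2] if self._heap else None
```

If a file download, a video conference stream, and a web flow are enqueued in that order, the video conference stream is returned first and the other two follow in arrival order.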
  • the packet length includes the Ethernet frame length in the packet and the IP packet
  • the transmission protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).
  • an embodiment of the present application provides a method for updating a data stream type identification model, and the method includes:
  • the second data stream type corresponding to the current data stream is determined by the third device according to the target correspondence relationship and the general characteristics of the current data stream, and the target correspondence relationship is a combination of multiple general characteristics and multiple data stream types.
  • the behavior recognition model is trained according to the correction data corresponding to the M data streams to obtain a new behavior recognition model;
  • the M data streams are the amount accumulated from the time the behavior recognition model took effect to the present, or the amount accumulated within a preset time period, or the ratio of the M data streams to the total number of data streams transmitted after the behavior recognition model took effect exceeds a preset threshold; the M data streams include the current data stream;
  • the behavior recognition model is used to determine the data stream type of the data stream to be predicted according to the input message information of the data stream to be predicted; the message information includes one or more of the message length, the message transmission speed, the message interval time, and the message direction.
  • when the first device accumulates a certain amount of correction data from the third device, it trains the behavior recognition model based on that correction data to obtain a new behavior recognition model, and then sends information describing the new behavior recognition model to the third device, so that the third device can update the behavior recognition model on the third device.
  • the above method does not require the third device to perform model training, and only needs to directly obtain a new behavior recognition model based on the model training result of the first device, which is beneficial for the third device to make full use of computing resources to identify the data stream type.
  • the general feature is a well-known port number or a well-known domain name system (DNS).
  • the correction data corresponding to the current data stream is obtained when the first data stream type corresponding to the current data stream and the second data stream type corresponding to the current data stream are different data stream types; the first data stream type corresponding to the current data stream is determined by the third device according to the message information, characteristic information, behavior recognition model, and content recognition model of the current data stream; the characteristic information includes one or more of the destination address and the protocol type, and the content recognition model is obtained based on the characteristic information and data stream types of one or more historical data streams, and the data stream type of the historical data stream is obtained according to the behavior recognition model.
  • training the behavior recognition model according to the correction data corresponding to the M data streams to obtain a new behavior recognition model includes:
  • the Y data streams and the M data streams are from the same network, or,
  • the Y data streams and the M data streams come from at least two different networks, where the at least two different networks include two different local area networks, or the at least two different networks include two different forms of networks, or the at least two different networks include networks of two different regions.
  • training the behavior recognition model according to the correction data corresponding to the M data streams and the correction data corresponding to the Y data streams to obtain a new behavior recognition model includes:
  • the behavior recognition model is trained according to the modified message information of the M data streams, the modified message information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.
  • the message information of data streams from different networks is normalized to make the message information of data streams from different networks more comparable.
  • the behavior recognition model has better generalization and higher prediction accuracy.
  • the method further includes:
  • the second model data is used to describe the new content recognition model
  • the content recognition model is a model obtained based on the characteristic information and data stream types of one or more historical data streams; the content recognition model is used to estimate the data stream type of the data stream to be predicted according to the input characteristic information of the data stream to be predicted, wherein the data stream type of the historical data stream is obtained based on the behavior recognition model
  • the behavior recognition model is a model obtained based on the message information and data flow types of multiple data flow samples
  • the message information includes message length, message transmission speed, message interval time, and message direction
  • the characteristic information includes one or more of the destination address and the protocol type.
  • in the process of the third device identifying the data stream type through the trained content recognition model, if the accuracy of the model is found to be low, the first device is triggered to retrain the content recognition model in combination with related data and, after training a new content recognition model, to update the content recognition model on the third device.
  • this iteratively updated content recognition model can meet the differentiated needs of different users, different networks, and different scenarios, and has better generalization and stronger versatility.
  • an embodiment of the present application provides a device for updating a data stream type recognition model.
  • the device is a third device.
  • the third device includes a memory and a processor.
  • the memory is used to store a computer program.
  • the processor calls the computer program to perform the following operations:
  • the message information includes message length, message transmission speed, message interval time, and message direction.
  • the behavior recognition model is a model obtained by training the message information and data stream types of multiple data stream samples;
  • the correction data corresponding to the current data stream is acquired, where the correction data corresponding to the current data stream includes message information of the current data stream and a second data stream type corresponding to the current data stream, and the correction data is used as a training sample to update the behavior recognition model.
  • the correction data is obtained automatically by the device, without manual labeling, when the first data stream type corresponding to the current data stream differs from the second data stream type corresponding to the current data stream, so sample data for training the behavior recognition model is obtained more efficiently.
  • the correction data consists of the message information and the accurate data stream type generated when the recognition result of the behavior recognition model is inaccurate. Therefore, subsequently updating the behavior recognition model based on the correction data yields a behavior recognition model with a more accurate recognition effect.
  • in determining the second data stream type corresponding to the current data stream according to the target correspondence relationship and the general characteristics of the current data stream, the processor is specifically configured to:
  • the data stream type corresponding to the first general feature is taken as the second data stream type corresponding to the current data stream.
  • the general feature is a well-known port number or a well-known domain name system (DNS).
  • the device further includes a communication interface, and after the obtaining the correction data corresponding to the current data stream, the processor is further configured to:
  • the correction data corresponding to the current data stream is sent to the first device through the communication interface, where the correction data corresponding to the current data stream includes the message information of the current data stream and the second data stream type corresponding to the current data stream.
  • the first model data is obtained by training the behavior recognition model according to the message information of the current data stream and the second data stream type corresponding to the current data stream, where the first model data is used to describe a new behavior recognition model obtained by training the behavior recognition model.
  • the operation of training a new behavior recognition model is implemented by the designated first device with strong computing capability, and the third device only needs to update its own behavior recognition model according to the new model parameters sent by the first device, so that the third device can use its main computing resources for message forwarding, which effectively guarantees the message forwarding performance of the third device.
  • the processor is specifically configured to update the behavior recognition model according to the correction data to obtain a new behavior recognition model.
  • the operation of training the behavior recognition model is implemented by the third device, which is equivalent to using and training the behavior recognition model on the same device.
  • in updating the behavior recognition model according to the correction data to obtain a new behavior recognition model, the processor is specifically configured to:
  • the behavior recognition model is trained according to the message information of the M data streams and the second data stream type corresponding to each of the M data streams to obtain a new behavior recognition model, where the M data streams are the amount accumulated from the time the behavior recognition model took effect to the present, or the amount accumulated within a preset time period, or the ratio of the M data streams to the total number of data streams transmitted after the behavior recognition model took effect exceeds a preset threshold; the M data streams include the current data stream, and M is a preset reference threshold.
  • in training the behavior recognition model according to the message information of the M data streams and the second data stream type corresponding to each of the M data streams to obtain a new behavior recognition model, the processor is specifically configured to:
  • train the behavior recognition model according to the message information of the M data streams, the second data stream types respectively corresponding to the M data streams, the message information of the Y data streams, and the second data stream types respectively corresponding to the Y data streams, to obtain a new behavior recognition model;
  • the Y data streams and the M data streams come from the same network, or,
  • the Y data streams and the M data streams come from at least two different networks, where the at least two different networks include two different local area networks, or the at least two different networks include two different forms of networks, or the at least two different networks include two different regional networks.
  • in training the behavior recognition model to obtain a new behavior recognition model, the processor is specifically configured to:
  • train the behavior recognition model according to the modified message information of the M data streams, the modified message information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.
  • the message information of data streams from different networks is normalized to make the message information of data streams from different networks more comparable.
  • the behavior recognition model has better generalization and higher prediction accuracy.
  • the processor is specifically configured to:
  • the content recognition model is a model obtained from the characteristic information and data stream type of one or more historical data streams, and the data stream type of the historical data stream is obtained according to the behavior recognition model.
  • the first data stream type corresponding to the current data stream is obtained according to the content recognition model and the behavior recognition model, and then the first data stream type is corrected to obtain the final data stream type of the current data stream.
  • the behavior recognition model is pre-trained from the message information and data stream type of multiple data stream samples, and the content recognition model is trained based on the feature information of the data stream and the data stream type recognized by the behavior recognition model; therefore, analyzing the feature information, message information, and so on through the content recognition model and the behavior recognition model predicts the first data stream type corresponding to the current data stream more accurately.
  • because the data stream type in the data stream sample used when training the content recognition model is recognized by the behavior recognition model, there is no need to collect a large amount of data required for training, which solves the problem of insufficient data integrity.
  • the processor is specifically configured to:
  • the first data stream type of the current data stream is determined according to the at least one first confidence level and the at least one second confidence level.
  • the processor is specifically configured to:
  • calculate a comprehensive confidence corresponding to the target data stream type according to the first confidence level corresponding to the target data stream type, the weight value of the first confidence level, the second confidence level corresponding to the target data stream type, and the weight value of the second confidence level, where the target data stream type is any one of the at least one data stream type;
  • if the comprehensive confidence corresponding to the target data stream type is greater than a first preset threshold, determine that the target data stream type is the first data stream type corresponding to the current data stream.
  • the device further includes a communication interface
  • the processor is further configured to:
  • the second preset threshold is greater than the first preset threshold
  • in this application, the content recognition model is updated by using the identification result of the data stream type of the current data stream. Specifically, a second preset threshold is introduced, and when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, information about the current data stream is sent to the second device for training, so as to obtain a new content recognition model and make the next determination result more accurate.
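The second-threshold check described above might look like the sketch below. The class name, function name, and default threshold values are hypothetical; only the rule that the second preset threshold is greater than the first, and that a low comprehensive confidence triggers sending the stream's information for training, comes from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ContentTrainingSample:
    feature_info: Tuple[str, str]  # assumed layout: (destination address, protocol type)
    data_stream_type: str          # the second data stream type


def sample_for_content_update(
    comprehensive_conf: float,
    feature_info: Tuple[str, str],
    second_type: str,
    first_threshold: float = 0.5,   # hypothetical value
    second_threshold: float = 0.8,  # hypothetical value
) -> Optional[ContentTrainingSample]:
    """Build a training sample when the confidence is below the second threshold."""
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be greater than first threshold")
    if comprehensive_conf < second_threshold:
        return ContentTrainingSample(feature_info, second_type)
    return None
```

A stream whose confidence lies between the two thresholds is still classified, but it also produces a sample for retraining the content recognition model.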
  • the processor is further configured to:
  • the content recognition model is updated according to the characteristic information of the current data stream and the second data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
  • in this application, the content recognition model is updated by using the identification result of the data stream type of the current data stream. Specifically, a second preset threshold is introduced, and when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the content recognition model is trained on the relevant information of the current data stream to obtain a new content recognition model and make the next determination result more accurate.
  • in the thirteenth possible implementation manner of the third aspect, after determining the second data stream type corresponding to the current data stream according to the target correspondence relationship and the general characteristics of the current data stream, the processor is further configured to:
  • the information of the second data stream type corresponding to the current data stream is sent to the operation and maintenance support system (OSS) through the communication interface, and the information of the second data stream type of the current data stream is used by the OSS to generate a flow control policy for the current data stream.
  • the relevant information of the data stream type of the current data stream is notified to the OSS, so that the OSS can generate a flow control policy for the current data stream based on its data stream type. For example, when the first data stream type of the current data stream is a video stream of a video conference, the corresponding flow control policy is defined as a priority transmission policy, that is, when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.
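The video conference example can be sketched as a lookup from data stream type to policy. Everything named here (the policy table, the type labels, the numeric priorities, the default policy) is hypothetical and only illustrates the idea of transmitting prioritized streams first.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FlowControlPolicy:
    name: str
    priority: int  # lower value is transmitted earlier (assumption)


# Hypothetical policy table an OSS might derive from data stream types.
POLICIES: Dict[str, FlowControlPolicy] = {
    "video_conference": FlowControlPolicy("priority_transmission", 0),
}
DEFAULT_POLICY = FlowControlPolicy("best_effort", 10)


def policy_for(data_stream_type: str) -> FlowControlPolicy:
    """Look up the policy for a type, falling back to the default."""
    return POLICIES.get(data_stream_type, DEFAULT_POLICY)


def transmission_order(streams: List[Tuple[str, str]]) -> List[str]:
    """Order (stream_id, data_stream_type) pairs by policy priority."""
    ranked = sorted(streams, key=lambda s: policy_for(s[1]).priority)
    return [stream_id for stream_id, _ in ranked]
```

Because `sorted` is stable, streams with the same priority keep their arrival order.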
  • the packet length includes one or more of the length of the Ethernet frame in the packet, the length of the IP packet, the length of the transport protocol packet, and the length of the header, and the transport protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).
  • an embodiment of the present application provides a device for updating a data stream type recognition model.
  • the device is a first device.
  • the first device includes a memory, a processor, and a communication interface.
  • the memory is used to store a computer program, and the processor calls the computer program to perform the following operations:
  • the correction data corresponding to the current data stream sent by the third device is received through the communication interface, where the correction data corresponding to the current data stream includes the message information of the current data stream and the second data stream type corresponding to the current data stream; the second data stream type corresponding to the current data stream is determined by the third device according to the target correspondence relationship and the general characteristics of the current data stream, and the target correspondence relationship is a correspondence between multiple general characteristics and multiple data stream types;
  • the behavior recognition model is trained according to the correction data corresponding to the M data streams to obtain a new behavior recognition model;
  • the M data streams are the cumulative amount from the time the behavior recognition model took effect to the present, or the cumulative amount within a preset time period, or the ratio of the M data streams to the total number of data streams transmitted after the behavior recognition model took effect exceeds a preset threshold; the M data streams include the current data stream;
  • the first model data is used to describe the new behavior recognition model
  • the behavior recognition model is a model obtained from the message information and data stream types of multiple data stream samples
  • the behavior recognition model is used to determine the data stream type of the data stream to be predicted according to the input message information of the data stream to be predicted; the message information includes one or more of the message length, message transmission speed, message interval time, and message direction.
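The message information listed above could be derived from a stream's packets roughly as in this sketch. The `Packet` fields, the aggregation choices (mean length, bytes per second, mean gap, share of upstream packets), and the dictionary keys are assumptions; the text names the four quantities but not how they are computed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Packet:
    timestamp: float  # seconds
    length: int       # bytes
    upstream: bool    # True if sent from client to server (assumed direction encoding)


def message_info(packets: List[Packet]) -> Dict[str, float]:
    """Summarize one data stream as length, speed, interval, and direction features."""
    if not packets:
        raise ValueError("at least one packet is required")
    ordered = sorted(packets, key=lambda p: p.timestamp)
    total_bytes = sum(p.length for p in ordered)
    duration = ordered[-1].timestamp - ordered[0].timestamp
    intervals = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    return {
        "mean_length": total_bytes / len(ordered),
        "transmission_speed": total_bytes / duration if duration > 0 else 0.0,
        "mean_interval": sum(intervals) / len(intervals) if intervals else 0.0,
        "upstream_ratio": sum(p.upstream for p in ordered) / len(ordered),
    }
```

The resulting dictionary is the kind of input a behavior recognition model would consume for each data stream.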
  • when the first device has accumulated a certain amount of correction data from the third device, it trains the behavior recognition model based on that correction data to obtain a new behavior recognition model, and then sends information describing the new behavior recognition model to the third device, so that the third device can update the behavior recognition model on the third device.
  • the above method does not require the third device to perform model training, and only needs to directly obtain a new behavior recognition model based on the model training result of the first device, which is beneficial for the third device to make full use of computing resources to identify the data stream type.
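The three alternative conditions on the M data streams can be sketched as a single trigger function. This is one reading of the text under stated assumptions: the function name, the parameter names, and the idea that M must reach a configured count (`min_count`) are hypothetical.

```python
from typing import List, Optional


def should_retrain(
    correction_times: List[float],
    total_streams: int,
    now: float,
    min_count: int,
    window_seconds: Optional[float] = None,
    min_ratio: Optional[float] = None,
) -> bool:
    """Decide whether enough correction data has accumulated to retrain.

    correction_times holds the arrival time of each correction sample
    received since the current behavior recognition model took effect.
    """
    if min_ratio is not None:
        # Ratio of corrected streams to all streams since the model took effect.
        return total_streams > 0 and len(correction_times) / total_streams > min_ratio
    if window_seconds is not None:
        # Count only corrections inside the preset time period.
        recent = [t for t in correction_times if now - t <= window_seconds]
        return len(recent) >= min_count
    # Cumulative count since the model took effect.
    return len(correction_times) >= min_count
```

In this sketch the ratio condition takes precedence when both optional parameters are given; a deployment would normally configure just one of them.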
  • the general feature is a well-known port number or a well-known domain name system DNS.
  • the correction data corresponding to the current data stream is sent by the third device when the first data stream type corresponding to the current data stream and the second data stream type corresponding to the current data stream are different data stream types, and the first data stream type corresponding to the current data stream is determined by the third device according to the message information, characteristic information, behavior recognition model, and content recognition model of the current data stream; the characteristic information includes one or more of the destination address and the protocol type, and the content recognition model is a model obtained based on the characteristic information and data stream types of one or more historical data streams, where the data stream type of a historical data stream is obtained according to the behavior recognition model.
  • the behavior is adjusted according to the correction data corresponding to the M data streams
  • the processor is specifically configured to:
  • the Y data streams and the M data streams are from the same network, or,
  • the Y data streams and the M data streams come from at least two different networks, where the at least two different networks include two different local area networks, or the at least two different networks include two Different forms of networks, or the at least two different networks include networks of two different regions.
  • the processor is specifically configured to:
  • train the behavior recognition model according to the corrected message information of the M data streams, the corrected message information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.
  • the message information of data streams from different networks is normalized to make it more comparable across networks, so that the behavior recognition model trained on it has better generalization and higher prediction accuracy.
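The text does not say which normalization is applied. As one possible reading, the sketch below standardizes a single message information feature separately for each network (z-score), so that streams from networks with very different scales become comparable. The function name and the choice of z-score are assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List


def normalize_per_network(samples: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Z-score one feature (e.g. message length) within each network."""
    normalized: Dict[str, List[float]] = {}
    for network, values in samples.items():
        if not values:
            normalized[network] = []
            continue
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation
        normalized[network] = [
            (v - mu) / sigma if sigma > 0 else 0.0 for v in values
        ]
    return normalized
```

With this choice, a network whose packets are ten times larger than another's yields the same normalized values, so the scale difference between networks no longer dominates training.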
  • the processor is further configured to:
  • the second model data is used to describe the new content recognition model
  • the content recognition model is a model obtained based on the characteristic information and data stream types of one or more historical data streams
  • the content recognition model is used to estimate the data stream type of the data stream to be predicted according to the input characteristic information of the data stream to be predicted, wherein the data stream type of the historical data stream is obtained according to a behavior recognition model, which is a model obtained according to the message information and data stream types of multiple data stream samples
  • the message information includes one or more of message length, message transmission speed, message interval time, and message direction
  • the characteristic information includes one or more of destination address and protocol type.
  • in the process in which the third device identifies the data stream type through the trained content recognition model, if the accuracy of the model is found to be low, the first device is triggered to retrain the content recognition model in combination with related data and, after training a new content recognition model, to update the content recognition model on the third device.
  • this iteratively updated content recognition model can meet the differentiated needs of different users, different networks, and different scenarios, and has better generalization and stronger versatility.
  • an embodiment of the present application provides an apparatus for updating a data stream type identification model.
  • the apparatus is a third device or a module or device in the third device, and includes:
  • the first determining unit is configured to determine the first data stream type corresponding to the current data stream according to the message information of the current data stream and the behavior recognition model, where the message information includes one or more of the message length, the message transmission speed, the message interval time, and the message direction, and the behavior recognition model is a model obtained by training on the message information and data stream types of multiple data stream samples;
  • the second determining unit is configured to determine the second data stream type corresponding to the current data stream according to the target correspondence relationship and the general characteristics of the current data stream, wherein the target correspondence relationship is a correspondence between multiple general characteristics and multiple data stream types;
  • the acquiring unit is configured to acquire the correction data corresponding to the current data stream when the first data stream type corresponding to the current data stream and the second data stream type corresponding to the current data stream are different, wherein:
  • the correction data corresponding to the current data stream includes message information of the current data stream and a second data stream type corresponding to the current data stream, and the correction data is used as a training sample to update the behavior recognition model.
  • the correction data is obtained automatically by the device, without manual labeling, when the first data stream type corresponding to the current data stream differs from the second data stream type corresponding to the current data stream, so sample data for training the behavior recognition model is obtained more efficiently.
  • the correction data consists of the message information and the accurate data stream type produced when the recognition result of the behavior recognition model is inaccurate. Therefore, subsequently updating the behavior recognition model based on the correction data yields a behavior recognition model with a more accurate recognition effect.
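The automatic generation of correction data can be sketched as follows. The class and function names are hypothetical, and treating a missing second data stream type (no general feature matched) as "no correction" is an assumption not stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CorrectionData:
    message_info: Dict[str, float]  # message information of the current data stream
    data_stream_type: str           # the second data stream type, used as the label


def make_correction(
    message_info: Dict[str, float],
    first_type: str,
    second_type: Optional[str],
) -> Optional[CorrectionData]:
    """Emit a training sample only when the two determinations disagree."""
    if second_type is None or first_type == second_type:
        return None
    return CorrectionData(message_info, second_type)
```

The label always comes from the second data stream type, because it is derived from the general characteristics rather than from the model being corrected.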
  • the second determining unit is specifically configured to: if the general feature of the current data stream is the same as a first general feature in the correspondence relationship, use the data stream type corresponding to the first general feature as the second data stream type corresponding to the current data stream.
  • the general feature is a well-known port number or a well-known domain name system DNS.
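The lookup of the second data stream type from a general feature can be sketched as a dictionary match. The table entries below are illustrative only: ports 21 and 25 are the well-known FTP and SMTP ports, but the data stream type labels and the example domain are hypothetical.

```python
from typing import Dict, Optional, Tuple, Union

GeneralFeature = Tuple[str, Union[int, str]]

# Hypothetical target correspondence: general feature -> data stream type.
TARGET_CORRESPONDENCE: Dict[GeneralFeature, str] = {
    ("port", 21): "file_transfer",
    ("port", 25): "email",
    ("domain", "meeting.example.com"): "video_conference",
}


def second_data_stream_type(
    dst_port: Optional[int] = None,
    domain: Optional[str] = None,
) -> Optional[str]:
    """Return the type whose general feature equals the stream's, if any."""
    if dst_port is not None and ("port", dst_port) in TARGET_CORRESPONDENCE:
        return TARGET_CORRESPONDENCE[("port", dst_port)]
    if domain is not None and ("domain", domain) in TARGET_CORRESPONDENCE:
        return TARGET_CORRESPONDENCE[("domain", domain)]
    return None
```

Checking the port before the domain is an arbitrary ordering chosen for this sketch.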
  • in the third possible implementation manner of the fifth aspect, the apparatus further includes:
  • the first sending unit is configured to send the correction data corresponding to the current data stream to the first device after the acquiring unit acquires the correction data corresponding to the current data stream, wherein the correction data corresponding to the current data stream Including the message information of the current data flow and the second data flow type corresponding to the current data flow;
  • the first receiving unit is configured to receive first model data sent by the first device, where the first model data is used to describe a new behavior recognition model obtained by the first device by training the behavior recognition model according to the message information of the current data stream and the second data stream type corresponding to the current data stream.
  • the operation of training a new behavior recognition model is implemented by the designated first device with strong computing capability, and the third device only needs to update its own behavior recognition model according to the new model parameters sent by the first device, so that the third device can use its main computing resources for message forwarding, which effectively guarantees the message forwarding performance of the third device.
  • the fourth possible implementation manner of the fifth aspect further includes:
  • the first update unit is configured to update the behavior recognition model according to the correction data after the acquisition unit acquires the correction data corresponding to the current data stream, so as to obtain a new behavior recognition model.
  • the operation of training the behavior recognition model is implemented by the third device, that is, the behavior recognition model is both used and trained on the same device.
  • the behavior recognition model is updated according to the correction data to obtain a new
  • the first update unit is specifically configured to:
  • train the behavior recognition model according to the message information of the M data streams and the second data stream types respectively corresponding to the M data streams to obtain a new behavior recognition model, wherein the M data streams are the cumulative amount from the time the behavior recognition model took effect to the present, or the cumulative amount within a preset time period, or the ratio of the M data streams to the total number of data streams transmitted after the behavior recognition model took effect exceeds a preset threshold; the M data streams include the current data stream.
  • in the sixth possible implementation manner of the fifth aspect, in terms of training the behavior recognition model according to the message information of the M data streams and the second data stream types respectively corresponding to the M data streams to obtain a new behavior recognition model, the first update unit is specifically configured to:
  • train the behavior recognition model according to the message information of the M data streams, the second data stream types respectively corresponding to the M data streams, the message information of the Y data streams, and the second data stream types respectively corresponding to the Y data streams, to obtain a new behavior recognition model;
  • the Y data streams and the M data streams come from the same network, or,
  • the Y data streams and the M data streams come from at least two different networks, where the at least two different networks include two different local area networks, or the at least two different networks include two Different forms of networks, or the at least two different networks include networks of two different regions.
  • the first update unit is specifically configured to:
  • train the behavior recognition model according to the corrected message information of the M data streams, the corrected message information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.
  • the message information of data streams from different networks is normalized to make it more comparable across networks, so that the behavior recognition model trained on it has better generalization and higher prediction accuracy.
  • the current Regarding the first data stream type corresponding to the data stream is specifically configured to:
  • the content recognition model is a model obtained from the characteristic information and data stream type of one or more historical data streams, and the data stream type of the historical data stream is obtained according to the behavior recognition model.
  • the first data stream type corresponding to the current data stream is obtained according to the content recognition model and the behavior recognition model, and then the first data stream type is corrected to obtain the final data stream type of the current data stream.
  • the behavior recognition model is pre-trained from the message information and data stream types of multiple data stream samples, and the content recognition model is trained based on the feature information of the data stream and the data stream type recognized by the behavior recognition model; therefore, by analyzing feature information, message information, etc. through the content recognition model and the behavior recognition model, the first data stream type corresponding to the current data stream can be predicted more accurately.
  • because the data stream type in the data stream samples used when training the content recognition model is recognized by the behavior recognition model, there is no need to collect a large amount of data required for training, which solves the problem of insufficient data integrity.
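The idea that the content recognition model is trained on labels produced by the behavior recognition model can be sketched as follows. The frequency-table "model", the function name, and the toy behavior model used in the usage note are stand-ins chosen for illustration; the text does not specify the model family.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Feature = Tuple[str, str]        # assumed layout: (destination address, protocol type)
MessageInfo = Dict[str, float]


def train_content_model(
    history: List[Tuple[MessageInfo, Feature]],
    behavior_model: Callable[[MessageInfo], str],
) -> Dict[Feature, Dict[str, float]]:
    """Label each historical stream with the behavior model, then count per feature."""
    counts: Dict[Feature, Counter] = defaultdict(Counter)
    for message_info, feature in history:
        label = behavior_model(message_info)  # label comes from the behavior model
        counts[feature][label] += 1
    return {
        feature: {t: n / sum(c.values()) for t, n in c.items()}
        for feature, c in counts.items()
    }
```

For example, with a toy behavior model such as `lambda m: "video" if m["mean_length"] > 1000 else "web"`, each feature ends up with a confidence per data stream type without any manually labeled samples.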
  • the first determining unit in the message information, feature information, and behavior identification of the current data flow, is specifically configured to:
  • the first data stream type of the current data stream is determined according to the at least one first confidence level and the at least one second confidence level.
  • the first determining unit is specifically configured to:
  • a comprehensive confidence corresponding to the target data stream type is calculated according to the first confidence level corresponding to the target data stream type, the weight value of the first confidence level, the second confidence level corresponding to the target data stream type, and the weight value of the second confidence level, where the target data stream type is any one of the at least one data stream type;
  • if the comprehensive confidence corresponding to the target data stream type is greater than a first preset threshold, it is determined that the target data stream type is the first data stream type corresponding to the current data stream.
  • the method further includes:
  • the second sending unit is configured to send the characteristic information of the current data stream and information of the second data stream type to the second device when the comprehensive confidence corresponding to the target data stream type is less than a second preset threshold, where the second preset threshold is greater than the first preset threshold;
  • the second receiving unit is configured to receive second model data sent by the second device, where the second model data is used to describe a new content recognition model obtained by the second device according to the characteristic information of the current data stream and the information of the second data stream type.
  • in this application, the content recognition model is updated by using the identification result of the data stream type of the current data stream. Specifically, a second preset threshold is introduced, and when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, information about the current data stream is sent to the second device for training, so as to obtain a new content recognition model and make the next determination result more accurate.
  • the twelfth possible implementation manner of the fifth aspect further includes:
  • the second update unit is configured to update the content recognition model according to the characteristic information of the current data stream and the second data stream type when the comprehensive confidence corresponding to the target data stream type is less than a second preset threshold, so as to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
  • in this application, the content recognition model is updated by using the identification result of the data stream type of the current data stream. Specifically, a second preset threshold is introduced, and when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the content recognition model is trained on the relevant information of the current data stream to obtain a new content recognition model and make the next determination result more accurate.
  • the thirteenth possible implementation manner of the fifth aspect further includes:
  • the third sending unit is configured to send, after the second determining unit determines the second data stream type corresponding to the current data stream according to the target correspondence relationship and the general characteristics of the current data stream, information of the second data stream type corresponding to the current data stream to the operation and maintenance support system (OSS), where the information of the second data stream type of the current data stream is used by the OSS to generate a flow control policy for the current data stream.
  • the relevant information of the data stream type of the current data stream is notified to the OSS, so that the OSS can generate a flow control policy for the current data stream based on its data stream type. For example, when the first data stream type of the current data stream is a video stream of a video conference, the corresponding flow control policy is defined as a priority transmission policy, that is, when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.
  • the packet length includes one or more of the length of the Ethernet frame in the packet, the length of the IP packet, the length of the transport protocol packet, and the length of the header, and the transport protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).
  • an embodiment of the present application provides an apparatus for updating a data stream type identification model.
  • the apparatus is a first device or a module or device in the first device, and includes:
  • the first receiving unit is configured to receive correction data corresponding to the current data stream sent by the third device, wherein the correction data corresponding to the current data stream includes the message information of the current data stream and the second data stream type corresponding to the current data stream; the second data stream type corresponding to the current data stream is determined by the third device according to the target correspondence relationship and the general characteristics of the current data stream, and the target correspondence relationship is a correspondence between multiple general characteristics and multiple data stream types;
  • the acquiring unit is configured to, when correction data corresponding to M data streams has been cumulatively received from the third device, train the behavior recognition model according to the correction data corresponding to the M data streams to obtain a new behavior recognition model;
  • the M data streams are the cumulative amount from the time the behavior recognition model took effect to the present, or the cumulative amount within a preset time period, or the ratio of the M data streams to the total number of data streams transmitted after the behavior recognition model took effect exceeds a preset threshold; the M data streams include the current data stream;
  • the first sending unit is configured to send first model data to the third device, where the first model data is used to describe the new behavior recognition model, the behavior recognition model is a model obtained from the message information and data stream types of multiple data stream samples, and the behavior recognition model is used to determine the data stream type of the data stream to be predicted according to the input message information of the data stream to be predicted; the message information includes one or more of the message length, message transmission speed, message interval time, and message direction.
  • when the first device has accumulated a certain amount of correction data from the third device, it trains the behavior recognition model based on that correction data to obtain a new behavior recognition model, and then sends information describing the new behavior recognition model to the third device, so that the third device can update the behavior recognition model on the third device.
  • the above method does not require the third device to perform model training, and only needs to directly obtain a new behavior recognition model based on the model training result of the first device, which is beneficial for the third device to make full use of computing resources to identify the data stream type.
  • the general feature is a well-known port number or a well-known domain name system DNS.
  • the correction data corresponding to the current data stream is sent by the third device when the first data stream type corresponding to the current data stream and the second data stream type corresponding to the current data stream are different data stream types, and the first data stream type corresponding to the current data stream is determined by the third device according to the message information, characteristic information, behavior recognition model, and content recognition model of the current data stream; the characteristic information includes one or more of the destination address and the protocol type, and the content recognition model is a model obtained based on the characteristic information and data stream types of one or more historical data streams, where the data stream type of a historical data stream is obtained according to the behavior recognition model.
  • the behavior is identified based on the correction data corresponding to the M data streams.
  • the acquiring unit is specifically used for:
  • the Y data streams and the M data streams are from the same network, or,
  • the Y data streams and the M data streams come from at least two different networks, where the at least two different networks include two different local area networks, or the at least two different networks include two Different forms of networks, or the at least two different networks include networks of two different regions.
  • the acquiring unit is specifically configured to:
  • train the behavior recognition model according to the corrected message information of the M data streams, the corrected message information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.
  • the message information of data streams from different networks is normalized to make it more comparable across networks, so that the behavior recognition model trained on it has better generalization and higher prediction accuracy.
  • the fifth possible implementation manner of the sixth aspect further includes:
  • the second receiving unit is configured to receive the characteristic information of the current data stream and the information of the second data stream type sent by the third device;
  • a generating unit configured to train the content recognition model according to the feature information of the current data stream and the second data stream type to obtain a new content recognition model
  • the second sending unit is configured to send second model data to the third device, where the second model data is used to describe the new content recognition model, and the content recognition model is a model obtained based on the characteristic information and data stream types of one or more historical data streams
  • the content recognition model is used to estimate the data stream type of the data stream to be predicted according to the input characteristic information of the data stream to be predicted, wherein the data stream type of the historical data stream is obtained according to a behavior recognition model, which is a model obtained according to the message information and data stream types of multiple data stream samples; the message information includes one or more of message length, message transmission speed, message interval time, and message direction, and the characteristic information includes one or more of destination address and protocol type.
  • in the process in which the third device identifies the data stream type through the trained content recognition model, if the accuracy of the model is found to be low, the first device is triggered to retrain the content recognition model in combination with related data and, after training a new content recognition model, to update the content recognition model on the third device.
  • this iteratively updated content recognition model can meet the differentiated needs of different users, different networks, and different scenarios, and has better generalization and stronger versatility.
  • an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored, and when the computer program runs on a processor, the method described in the first aspect or any possible implementation manner of the first aspect is implemented.
  • an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored, and when the computer program runs on a processor, the method described in the second aspect or any possible implementation manner of the second aspect is implemented.
  • an embodiment of the present application provides a computer program product, the computer program product is stored in a memory, and when the computer program product runs on a processor, the method described in the first aspect or any possible implementation manner of the first aspect is implemented.
  • an embodiment of the present application provides a computer program product, the computer program product is stored in a memory, and when the computer program product runs on a processor, the method described in the second aspect or any possible implementation manner of the second aspect is implemented.
  • an embodiment of the present application provides a data stream type identification system, including a third device and a first device:
  • the third device is the data stream type recognition model update device described in the third aspect, or the data stream type recognition model update device described in any possible implementation manner of the third aspect, or the data stream type recognition model update device described in the fifth aspect, or the data stream type recognition model update device described in any possible implementation manner of the fifth aspect;
  • the first device is the data stream type recognition model update device described in the fourth aspect, or the data stream type recognition model update device described in any possible implementation manner of the fourth aspect, or the data stream type recognition model update device described in the sixth aspect, or the data stream type recognition model update device described in any possible implementation manner of the sixth aspect.
  • When the first data stream type and the second data stream type are different types, correction data for updating the behavior recognition model, that is, training samples, is generated;
  • the correction data is obtained automatically, without manual labeling, whenever the first data stream type corresponding to the current data stream of the device differs from the second data stream type corresponding to the current data stream, so the sample data for training the behavior recognition model is obtained more efficiently.
  • The correction data consists of the message information and the accurate data stream type, generated when the recognition result of the behavior recognition model is inaccurate. Therefore, a behavior recognition model subsequently updated based on the correction data can achieve a more accurate recognition effect.
  • Behavior recognition model is the first data stream type corresponding to the current data stream of the device and the second data stream type corresponding to the current data stream.
  • the first data stream type corresponding to the current data stream can be obtained according to the content recognition model and the behavior recognition model, and then the first data stream type can be corrected to obtain the final data stream type of the current data stream.
  • The behavior recognition model is pre-trained from the message information and data stream types of multiple data stream samples, and the content recognition model is trained based on the feature information of data streams and the data stream types recognized by the behavior recognition model; therefore, by analyzing the feature information, message information, and so on with the content recognition model and the behavior recognition model, the first data stream type corresponding to the current data stream can be predicted more accurately.
  • Because the data stream types in the data stream samples used when training the content recognition model are recognized by the behavior recognition model, there is no need to collect a large amount of data required for training, which solves the problem of insufficient data integrity.
  • the first device or the third device retrains the behavior recognition model with the relevant data that caused the deviation, and updates the behavior recognition model on the third device after training the new behavior recognition model.
  • This iteratively updated behavior recognition model can meet the differentiated needs of different users, different networks, and different scenarios, with better generalization and greater versatility.
  • The comprehensive confidence is compared with the preset update threshold θ2 to determine whether an update is needed; the model is updated if the need to update occurs multiple times.
  • The second device or the third device retrains the content recognition model with the relevant data whose comprehensive confidence is lower than θ2, and updates the content recognition model on the third device after training the new content recognition model. This iteratively updated content recognition model can meet the differentiated needs of different users, different networks, and different scenarios, with better generalization and greater versatility.
  • FIG. 1 is a schematic structural diagram of a data stream type identification system provided by an embodiment of the present invention
  • FIG. 2A is a schematic diagram of a scene of a content recognition model and a behavior recognition model provided by an embodiment of the present invention;
  • FIG. 2B is a schematic structural diagram of a classification model provided by an embodiment of the present invention.
  • FIG. 2C is a schematic structural diagram of a classification model provided by an embodiment of the present invention.
  • FIG. 2D is a schematic structural diagram of a classification model provided by an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a data stream type identification method provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of scenes of data stream a and data stream b provided by an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of an example of data stream recording provided by an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of an example of data stream recording provided by an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of an apparatus for updating a data stream type identification model provided by an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of another device for updating a data stream type identification model provided by an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a data stream type identification model updating device provided by an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of another device for updating a data stream type identification model provided by an embodiment of the present invention.
  • Figure 1 is a schematic structural diagram of a data stream type identification system provided by an embodiment of the present invention.
  • The system includes a first device 101, a second device 102, a third device 103, and a terminal 104. These devices may be connected in a wired or wireless manner, where:
  • the terminal 104 is used to run various applications, such as video conferencing applications, voice conferencing applications, desktop cloud applications, etc.
  • the data stream types (also called application types) of data streams generated by different applications are often different.
  • the data stream generated by the terminal 104 needs to be sent to the destination device through the third device 103 first.
  • The third device 103 may include forwarding devices such as routers and switches, and there may be one or more third devices 103: for example, one router and three switches; or only one switch; or three switches; and so on.
  • How the data stream generated by the terminal 104 should be sent on the terminal 104 and how it should be forwarded on the third device 103 can be determined in accordance with the flow control strategy generated by the operations support system (OSS).
  • The flow control strategy is generated by the OSS 101 based on the data flow type of the current data flow.
  • the data flow type of the current data flow used by the OSS 101 to generate the flow control policy may be determined by the third device 103.
  • the third device 103 needs to use a behavior recognition model, a content recognition model, and an address correction model when determining the data stream type of the current data stream.
  • The parameters of the model may include, but are not limited to, a confidence weight vector (w1, w2), the first preset threshold θ1 and the second preset threshold θ2 of the data stream type, and so on. The first preset threshold θ1 can also be called the classification threshold and is used to decide whether a data stream is assigned to a certain category; the second preset threshold θ2 is also called the model update threshold and is used to decide when to update the content recognition model.
  • When recognizing the data stream type of the current data stream, the input of the content recognition model can include characteristic information (such as destination IP, destination port, protocol type, etc.), and the input of the behavior recognition model can include message information (such as message length, message transmission speed, message interval time, message direction, etc.).
  • The third device 103 combines the confidence of the content recognition model and the confidence of the behavior recognition model, based on the confidence weight vector (w1, w2), to obtain the first data stream type.
  • The third device 103 also uses the address correction model to predict the second data stream type of the current data stream.
  • If the second data stream type of the current data stream can be predicted, the second data stream type is used as the final data stream type of the current data stream; if the second data stream type of the current data stream cannot be predicted, the first data stream type is used as the final data stream type of the current data stream.
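  • The selection rule above can be sketched as follows. This is only an illustrative sketch: the function name `final_stream_type` and the use of `None` to represent "the address correction model produced no prediction" are assumptions, not part of the application.

```python
from typing import Optional


def final_stream_type(first_type: str, second_type: Optional[str]) -> str:
    """Pick the final data stream type of the current data stream.

    first_type  -- type derived from the content and behavior recognition models
    second_type -- type predicted by the address correction model, or None
                   (assumed convention) when no prediction is available
    """
    # The second data stream type, when available, takes precedence.
    if second_type is not None:
        return second_type
    return first_type
```

  • For instance, `final_stream_type("video_conference", None)` falls back to the first data stream type, while passing a non-empty second type returns that second type.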
  • the above behavior recognition model and content recognition model will be updated at a suitable time.
  • The third device 103 can train on the corresponding data samples (or training data) by itself to update the behavior recognition model.
  • The first device 101 can also train on the corresponding data samples to obtain new model parameters, and then send them to the third device 103 for the third device 103 to update the behavior recognition model. In addition, the condition for updating the behavior recognition model may be that the first data stream type of a data stream is inconsistent with the second data stream type of the data stream one or more times.
  • The third device 103 can train on the corresponding data samples to update the content recognition model, or the second device 102 can train on the corresponding data samples to obtain new model parameters, and then send them to the third device 103 for the third device 103 to update the content recognition model. In addition, the condition for updating the content recognition model may be that the confidence that the current data stream is the first data stream type is higher than the first preset threshold θ1 but lower than the second preset threshold θ2.
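  • The two update conditions can be sketched as follows. The class name `UpdateMonitor`, the field `mismatch_limit` (how many inconsistencies are tolerated before an update), and the numeric values used in the usage note are hypothetical; the text only states "one or more times" and the two thresholds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpdateMonitor:
    theta1: float            # first preset threshold (classification threshold)
    theta2: float            # second preset threshold (model update threshold)
    mismatch_limit: int = 1  # hypothetical: mismatches needed to trigger an update
    mismatches: int = 0      # running count of first/second type mismatches

    def behavior_model_needs_update(self, first_type: str,
                                    second_type: Optional[str]) -> bool:
        """Count cases where the first and second data stream types disagree."""
        if second_type is not None and first_type != second_type:
            self.mismatches += 1
        return self.mismatches >= self.mismatch_limit

    def content_model_needs_update(self, confidence: float) -> bool:
        """True when the confidence is above theta1 but below theta2."""
        return self.theta1 < confidence < self.theta2
```

  • With `UpdateMonitor(theta1=0.5, theta2=0.8, mismatch_limit=2)`, a behavior model update is signalled on the second mismatch, and a confidence of 0.6 signals a content model update while 0.9 does not.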
  • the above content recognition model is essentially a classification model.
  • the classification model can be a tree model, as shown in Figure 2C, and the classification model can also be a neural network model, as shown in Figure 2D.
  • the classification model may also be a support vector machine (support vector machine, SVM) model, and the classification model may also be other forms of models.
  • The content recognition model performs classification by extracting features of the input vector (for example, characteristic information such as the destination Internet Protocol (IP) address, destination port number, and protocol type).
  • The content recognition model can identify data streams with the same destination IP address, the same protocol type, and the same port number as the same data stream type; data streams in the same network segment, with the same protocol type and similar port numbers, can also be identified as the same data stream type; the destination port number is 20 (a well-known port number of the File Transfer Protocol (FTP)).
  • the Transmission Control Protocol (TCP)
  • the first device 101 in the architecture shown in FIG. 1 may be a server or a server cluster, and may be deployed locally, remotely, or in the cloud.
  • the above-mentioned second device 102 may be one server or a server cluster composed of multiple servers. Multiple second devices 102 are connected to the first device 101 through a network, a second device 102 and one or more third devices 103 connected to the one second device 102 belong to a network, and another second device 102 is connected to One or more third devices 103 connected to the other second device 102 belong to another network.
  • the networks can be divided according to geographic locations.
  • two networks in different geographic locations belong to two different networks; the networks can be divided according to network forms, such as cellular networks, Wi-Fi networks, and wired networks. They belong to three different networks; for example, two different LANs belong to two different networks; of course, there can be other division methods.
  • When the first device 101 trains to update the parameters of the model for any third device 103, it can use the training data provided by third devices 103 in other networks.
  • the architecture shown in Figure 1 can be regarded as a three-tier architecture, in which:
  • The first device 101 belongs to the highest layer of the three layers. In comparison, it has the largest storage capacity and the strongest computing power. Therefore, the huge amount of data required for training the behavior recognition model (such as data stream types and message information) is stored on the first device 101, for example, in a behavior knowledge base stored therein; and the computation over this data is also completed by the first device 101. In addition, because the data stream types and message information submitted by the third device 103 to the first device 101 are basically desensitized, processing by the first device 101 will not cause security problems.
  • The second device 102 belongs to the middle layer of the three layers. In comparison, its storage capacity and computing capacity are moderate, and it can store a certain amount of characteristic information (such as IP address, TCP protocol, etc.) and data stream types, for example, in an address knowledge base stored therein. Since the second device 102 is in the local area network to which the third device 103 belongs, information such as the IP address does not need to leave the local area network, and storing it on the second device 102 poses no security or privacy risk. In addition, since the second device 102 is relatively close to the third device 103, the update demand of the content recognition model can be fed back to the second device 102 in time, which facilitates frequent updates of the content recognition model.
  • The third device 103 belongs to the lowest layer of the three layers, and its main function is usually to forward data messages. Therefore, the training of the content recognition model and the behavior recognition model, as well as the storage of the sample data used for training, need not be performed on the third device 103.
  • the joint deployment of multiple networks may not be used.
  • the number of the second device 102 connected to the first device 101 may be one, that is, the first device 101
  • The architecture can be further transformed. For example, the first device 101 is removed, and the operations performed by the first device 101 and the corresponding functions it provides in this context are integrated into the second device 102, that is, the operations and functions originally performed by the first device 101 are implemented locally.
  • The above-mentioned operation of predicting the data stream type of the current data stream according to the corresponding model and the operation of updating the model can be performed by the third device 103, or by other devices, such as the second device 102, third device 103, OSS, etc. When these devices are used, the subject performing these operations in the subsequent method embodiments is replaced with the second device 102, third device 103, OSS, etc.
  • For example, if the update as originally performed by the third device 103 involves receiving first information sent by the first device 101, then when the update operation is instead deployed on and performed by the first device 101, the first device 101 can use the first information directly, rather than receiving the first information from another device.
  • Figure 3 shows a data stream type identification method provided by an embodiment of the present invention.
  • the method can be implemented based on the architecture shown in Figure 1.
  • the method includes but is not limited to the following steps:
  • Step S301 The third device determines the first data stream type corresponding to the current data stream according to the message information of the current data stream and the behavior recognition model.
  • The third device may determine the first data stream type corresponding to the current data stream using only the message information of the current data stream and the behavior recognition model; the specific implementation is as follows.
  • The behavior recognition model is a model obtained based on the message information and data stream types of multiple data stream samples; optionally, the multiple data stream samples may be offline samples, that is, the behavior recognition model may be a model obtained by offline training.
  • The multiple data stream samples may also be pre-selected typical (or representative) samples.
  • For example, the messages of a video conference application's data stream usually have a relatively long message length, but occasionally a message may be relatively short. In comparison, a relatively long message length better reflects that the current data stream is the data stream of a video conference application. Therefore, when selecting data stream samples for the video conference application, data streams with relatively long message lengths should be preferred as representative samples.
  • the data stream type of the multiple data stream samples may be manually determined, that is, manual labeling.
  • Because the behavior recognition model is a model obtained based on the message information and data stream types of multiple data stream samples, it can reflect some relationships between the message information in a data stream and the data stream type. Therefore, when the message information of the current data stream is input to the behavior recognition model, the model can predict to a certain extent the tendency (or probability) that the current data stream belongs to one or more data stream types; the parameter reflecting this tendency (or probability) can also be called confidence.
  • the message information may include one or more of message length, message transmission speed, message interval time, and message direction.
  • the message length includes the length of the Ethernet frame in the message.
  • The message information can also include other features, such as the maximum, minimum, average, variance, and quantiles of the message length, message transmission speed, message interval time, and message direction.
  • the message information can be input into the behavior recognition model in the form of a vector, for example, in the form of (message length, message transmission speed, message interval time).
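  • As an illustration of how such a vector might be assembled from raw observations, the following sketch computes a few of the statistics mentioned above. The function name `message_info_vector`, the choice and order of statistics, and the definition of transmission speed as total bytes divided by total elapsed time are all assumptions made for this example.

```python
from statistics import mean, pvariance
from typing import Sequence, Tuple


def message_info_vector(lengths: Sequence[int],
                        intervals: Sequence[float]) -> Tuple[float, ...]:
    """Build a message-information vector for one data stream.

    lengths   -- message lengths in bytes, in arrival order
    intervals -- time gaps in seconds between consecutive messages
    """
    total_time = sum(intervals)
    # Assumed definition of transmission speed: bytes per second over the stream.
    speed = sum(lengths) / total_time if total_time > 0 else 0.0
    return (
        mean(lengths),       # average message length
        max(lengths),        # maximum message length
        min(lengths),        # minimum message length
        pvariance(lengths),  # population variance of the message length
        speed,               # message transmission speed
        mean(intervals),     # average message interval time
    )
```

  • For example, `message_info_vector([100, 200, 300], [0.5, 0.5])` yields an average length of 200 bytes and a speed of 600 bytes per second under the assumed definition.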
  • the data stream type in the embodiment of the present application may also be referred to as an application type.
  • There may be N data stream types, where N is greater than or equal to 1.
  • This embodiment of the application can estimate (or predict) the confidence that the current data stream belongs to each of the N data stream types, that is, obtain the N first confidence levels of the current data stream corresponding to the N data stream types.
  • For example, if the N data stream types are the data stream type of a video conference, the data stream type of a voice conference, and the data stream type of a desktop cloud, the behavior recognition model needs to be used to estimate the first confidence that the current data stream belongs to the data stream type of the video conference, the first confidence that it belongs to the data stream type of the voice conference, and the first confidence that it belongs to the data stream type of the desktop cloud.
  • If the N data stream types refer only to the data stream type of the video conference, the behavior recognition model needs to be used to estimate the first confidence that the current data stream belongs to the data stream type of the video conference.
  • A data stream type can be selected as the first data stream type corresponding to the current data stream based on the first confidence levels corresponding to these data stream types; for example, the data stream type with the highest first confidence is taken as the first data stream type corresponding to the current data stream.
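  • A minimal sketch of this highest-confidence selection follows. The function name `pick_first_type` and the confidence values in the usage note are illustrative assumptions; the behavior recognition model that would produce the confidences is not modelled here.

```python
from typing import Dict


def pick_first_type(first_confidences: Dict[str, float]) -> str:
    """Return the data stream type whose first confidence is highest."""
    # max() over the dict keys, ranked by their confidence values.
    return max(first_confidences, key=first_confidences.get)
```

  • For example, given the made-up confidences `{"video_conference": 0.7, "voice_conference": 0.2, "desktop_cloud": 0.1}`, the function selects `"video_conference"`.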
  • Alternatively, the third device determines the first data stream type corresponding to the current data stream using not only the message information of the current data stream and the behavior recognition model, but also the feature information and the content recognition model, which specifically includes the following parts:
  • the first part the third device obtains at least one first confidence level of the current data flow corresponding to at least one data flow type according to the message information and the behavior recognition model of the current data flow.
  • The behavior recognition model is a model obtained based on the message information and data stream types of multiple data stream samples; optionally, the multiple data stream samples may be offline samples, that is, the behavior recognition model may be a model obtained by offline training.
  • The multiple data stream samples may also be pre-selected typical (or representative) samples.
  • For example, the messages of a video conference application's data stream usually have a relatively long message length, but occasionally a message may be relatively short. In comparison, a relatively long message length better reflects that the current data stream is the data stream of a video conference application. Therefore, when selecting data stream samples for the video conference application, data streams with relatively long message lengths should be preferred as representative samples.
  • the data stream type of the multiple data stream samples may be manually determined, that is, manual labeling.
  • Because the behavior recognition model is a model obtained based on the message information and data stream types of multiple data stream samples, it can reflect some relationships between the message information in a data stream and the data stream type. Therefore, when the message information of the current data stream is input to the behavior recognition model, the model can predict to a certain extent the tendency (or probability) that the current data stream belongs to one or more data stream types; the parameter reflecting this tendency (or probability) can also be called confidence.
  • the message information may include one or more of message length, message transmission speed, message interval time, and message direction.
  • the message length includes the length of the Ethernet frame in the message.
  • The message information can also include other features, such as the maximum, minimum, average, variance, and quantiles of the message length, message transmission speed, message interval time, and message direction.
  • the message information can be input into the behavior recognition model in the form of a vector, for example, in the form of (message length, message transmission speed, message interval time).
  • the data stream type in the embodiment of the present application may also be referred to as an application type.
  • There may be N data stream types, where N is greater than or equal to 1.
  • This embodiment of the application can estimate (or predict) the confidence that the current data stream belongs to each of the N data stream types, that is, obtain the N first confidence levels of the current data stream corresponding to the N data stream types.
  • For example, if the N data stream types are the data stream type of a video conference, the data stream type of a voice conference, and the data stream type of a desktop cloud, the behavior recognition model needs to be used to estimate the first confidence that the current data stream belongs to the data stream type of the video conference, the first confidence that it belongs to the data stream type of the voice conference, and the first confidence that it belongs to the data stream type of the desktop cloud.
  • If the N data stream types refer only to the data stream type of the video conference, the behavior recognition model needs to be used to estimate the first confidence that the current data stream belongs to the data stream type of the video conference.
  • There may also be multiple data stream types of which the embodiment of this application focuses on only one. In that case, the embodiment of this application only estimates (or predicts) the confidence that the current data stream belongs to the data stream type of interest, that is, obtains one first confidence of the current data stream corresponding to one data stream type. For example, if the multiple data stream types are the data stream type of a video conference, the data stream type of a voice conference, and the data stream type of a desktop cloud, but the embodiment of this application focuses only on the data stream type of the video conference, it is only necessary to use the behavior recognition model to estimate the first confidence that the current data stream belongs to the data stream type of the video conference.
  • the second part the third device obtains at least one second confidence level of the current data stream corresponding to the at least one data stream type according to the characteristic information of the current data stream and the content recognition model.
  • the content recognition model is a model obtained based on the characteristic information and data stream types of one or more historical data streams.
  • The one or more historical data streams may be online data streams, that is, one or more data streams continuously generated during a preceding period of time, and the data stream types of the historical data streams are recognized by the above behavior recognition model; that is, the content recognition model can be a model obtained through online training.
  • Because the content recognition model is a model obtained based on the characteristic information and data stream types of one or more historical data streams, it can reflect some relationships between the characteristic information in a data stream and the data stream type. Therefore, when the characteristic information of the current data stream is input to the content recognition model, the model can predict to a certain extent the tendency (or probability) that the current data stream belongs to one or more data stream types; the parameter reflecting this tendency (or probability) can also be called confidence.
  • the feature information may include one or more of the destination address, protocol type, and port number.
  • The destination address may be an IP address, a destination MAC address, or an address in another form;
  • the feature information can include other features in addition to the features exemplified here.
  • the feature information here may be target-specific information, for example, target IP, target port, and so on.
  • The feature information can be input into the content recognition model in the form of a vector, for example in the form of (ip, port, protocol), such as (10.29.74.5, 8443, 6), or in the form of (mac, port, protocol), such as (05FA1525EEFF, 8443, 6). Other forms are also possible and are not enumerated here.
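  • One possible way to turn such a tuple into a purely numeric vector is sketched below. Encoding the address as an integer is an assumption made for illustration, as is the helper name `encode_features`; the application does not specify an encoding. The protocol value 6 in the example above is the IP protocol number assigned to TCP.

```python
import ipaddress
from typing import Tuple


def encode_features(address: str, port: int, protocol: int) -> Tuple[int, int, int]:
    """Encode (address, port, protocol) as a tuple of integers.

    address  -- an IP address string, or a MAC address written as bare hex digits
    protocol -- IP protocol number (for example, 6 for TCP)
    """
    try:
        # IPv4/IPv6 addresses convert directly to their integer value.
        addr_num = int(ipaddress.ip_address(address))
    except ValueError:
        # Assumed fallback: treat the string as a hexadecimal MAC address.
        addr_num = int(address, 16)
    return (addr_num, port, protocol)
```

  • For example, `encode_features("10.29.74.5", 8443, 6)` and `encode_features("05FA1525EEFF", 8443, 6)` both produce three-element integer vectors suitable as numeric model input.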
  • The embodiment of this application can estimate (or predict) the confidence that the current data stream belongs to each of the N data stream types, that is, obtain the N second confidence levels of the current data stream corresponding to the N data stream types.
  • For example, if the N data stream types are the data stream type of a video conference, the data stream type of a voice conference, and the data stream type of a desktop cloud, the content recognition model needs to be used to estimate the second confidence that the current data stream belongs to the data stream type of the video conference, the second confidence that it belongs to the data stream type of the voice conference, and the second confidence that it belongs to the data stream type of the desktop cloud. If the N data stream types refer only to the data stream type of the video conference, the content recognition model needs to be used to estimate the second confidence that the current data stream belongs to the data stream type of the video conference.
  • There may also be multiple data stream types of which the embodiment of this application focuses on only one. In that case, the embodiment of this application only estimates (or predicts) the confidence that the current data stream belongs to the data stream type of interest, that is, obtains one second confidence of the current data stream corresponding to one data stream type.
  • For example, if the multiple data stream types are the data stream type of a video conference, the data stream type of a voice conference, and the data stream type of a desktop cloud, but the embodiment of this application focuses only on the data stream type of the video conference, it is only necessary to use the content recognition model to estimate the second confidence that the current data stream belongs to the data stream type of the video conference.
  • the third part the third device determines the first data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.
  • The at least one first confidence can characterize the data stream type tendency of the current data stream to a certain extent, and the at least one second confidence can also characterize the data stream type tendency of the current data stream to a certain extent, so considering the two together yields a more accurate and credible data stream type tendency, and thus the data stream type of the current data stream.
  • The data stream type determined in this way is called the first data stream type to facilitate subsequent description.
  • The data stream type with the largest comprehensive confidence is determined as the data stream type of the current data stream, that is, it is determined as the first data stream type corresponding to the current data stream.
  • For example, according to the first confidence and the second confidence of the data stream type of the video conference, the comprehensive confidence of the data stream type of the video conference is determined to be 0.7; according to the first confidence and the second confidence of the data stream type of the voice conference, the comprehensive confidence of the data stream type of the voice conference is determined to be 0.2; according to the first confidence and the second confidence of the data stream type of the desktop cloud, the comprehensive confidence of the data stream type of the desktop cloud is determined to be 0.1. Since the comprehensive confidence of the data stream type of the video conference is the largest, the data stream type of the current data stream is estimated (also expressed as "predicted") to be the data stream type of the video conference, that is, the data stream type of the video conference is the first data stream type corresponding to the current data stream.
  • Determining the first data stream type corresponding to the current data stream according to the at least one first confidence and the at least one second confidence may specifically be: calculating the comprehensive confidence corresponding to a target data stream type according to the first confidence corresponding to the target data stream type, the weight value of the first confidence, the second confidence corresponding to the target data stream type, and the weight value of the second confidence.
  • The target data stream type is any one of the at least one data stream type; that is, each data stream type in the at least one data stream type satisfies the characteristics of the target data stream type described here.
  • If the comprehensive confidence corresponding to the target data stream type is greater than a first preset threshold, it is determined that the data stream type of the current data stream is the target data stream type, and the target data stream type can then be used as the first data stream type corresponding to the current data stream. For example, if the comprehensive confidence corresponding to the data stream type of the video conference is greater than the first preset threshold, the data stream type of the video conference is determined to be the first data stream type corresponding to the current data stream; if the comprehensive confidence corresponding to the data stream type of the desktop cloud is greater than the first preset threshold, the data stream type of the desktop cloud is determined to be the first data stream type corresponding to the current data stream.
  • the confidence weight vector (w1, w2) is (0.4, 0.6)
  • the first confidence weight, which can also be regarded as the weight of the behavior recognition model
  • the second confidence weight, which can also be regarded as the weight of the content recognition model
  • the first preset threshold θ1 of the data stream type is equal to 0.5.
  • the horizontal axis represents the sequence number of the data stream, and the vertical axis represents the message length.
  • the message length is greater than 0 for an uplink message, and the message length is less than 0 for a downlink message;
  • Both data stream a and data stream b are data stream types of the desktop cloud.
  • data stream b has uplink messages, which is relatively more representative of the characteristics of the desktop cloud scenario. Therefore, the message behavior of data stream b is considered to be typical behavior.
  • Data stream a has no uplink messages for a long period of time, so it cannot be clearly identified as a desktop cloud scenario.
  • the message behavior of data stream a is considered to be atypical behavior; the behavior recognition model is usually trained on data streams with typical behaviors.
  • the behavior recognition model can identify the data stream type of data stream b, but cannot identify the data stream type of data stream a.
  • the characteristic information of data stream a and data stream b is as follows.
  • the protocol type of data stream a is TCP, the destination IP address is 10.129.74.5, and the destination port number is 8443.
  • the protocol type of data stream b is TCP, the destination IP address is 10.129.56.39, and the destination port number is 443.
  • For data stream a, the content recognition model recognizes that the second confidences of the data stream type of the desktop cloud, the data stream type of the voice conference, and the data stream type of the video conference are all 0.
  • Based on the message information, the behavior recognition model recognizes that the first confidence of the data stream type of the desktop cloud is 0.5, the first confidence of the data stream type of the voice conference is 0, and the first confidence of the data stream type of the video conference is 0.
  • Therefore, the comprehensive confidence levels corresponding to these three data stream types are as follows.
  • For data stream b, the content recognition model recognizes that the second confidences of the data stream type of the desktop cloud, the data stream type of the voice conference, and the data stream type of the video conference are all 0.
  • Based on the message information, the behavior recognition model recognizes that the first confidence of the data stream type of the desktop cloud is 0.9, the first confidence of the data stream type of the voice conference is 0, and the first confidence of the data stream type of the video conference is 0.
  • Therefore, the comprehensive confidence levels corresponding to these three data stream types are as follows.
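  • The weighted combination for data streams a and b can be sketched as follows. This is a minimal illustration, not the embodiment's implementation. Two points are assumptions: the text gives the weight vector as (0.4, 0.6) without preserving which weight belongs to which model, and the values 0.3 and 0.54 quoted later in this description are only reproduced when the behavior recognition model is weighted 0.6 and the content recognition model 0.4; and selecting the highest-scoring type before applying the threshold is one possible reading of the threshold rule. All function and variable names are hypothetical.

```python
# Assumed weights (see the note above): behavior model 0.6, content model 0.4.
W_BEHAVIOR = 0.6
W_CONTENT = 0.4
THETA_1 = 0.5  # first preset threshold given in the text

def comprehensive_confidence(first_conf, second_conf):
    """Weighted sum of first (behavior) and second (content) confidence per type."""
    return {
        t: W_BEHAVIOR * first_conf[t] + W_CONTENT * second_conf.get(t, 0.0)
        for t in first_conf
    }

def first_data_stream_type(first_conf, second_conf):
    """Return the best-scoring type if it exceeds THETA_1, otherwise None."""
    combined = comprehensive_confidence(first_conf, second_conf)
    best = max(combined, key=combined.get)
    return best if combined[best] > THETA_1 else None

TYPES = ("desktop_cloud", "voice_conference", "video_conference")
content = dict.fromkeys(TYPES, 0.0)  # second confidences are all 0 for a and b
behavior_a = {"desktop_cloud": 0.5, "voice_conference": 0.0, "video_conference": 0.0}
behavior_b = {"desktop_cloud": 0.9, "voice_conference": 0.0, "video_conference": 0.0}

result_a = first_data_stream_type(behavior_a, content)  # score 0.3, below threshold
result_b = first_data_stream_type(behavior_b, content)  # score about 0.54, above threshold
```

  • Under these assumptions data stream b is recognized as the desktop cloud type while data stream a is left unrecognized, which matches the statement that the behavior recognition model can identify data stream b but not data stream a.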
  • Step S302: The third device determines the second data stream type corresponding to the current data stream according to a target correspondence and the general features of the current data stream, where the target correspondence is a correspondence between multiple general features and multiple data stream types.
  • the certain characteristics are the general features here.
  • the general feature may be a well-known port number, a well-known DNS, and so on. Take the well-known port number as an example.
  • Port 20 is the FTP data port.
  • The FTP data port is usually used for download, so the corresponding data stream type is the data stream type of data download.
  • Therefore, a correspondence between port 20 and the data stream type of data download can be established. Take the well-known DNS as an example: the DNS address of the domain name www.163.com is 183.131.119.86, the domain name www.163.com is a well-known web site, and its data stream type is the data stream type of the web page, so a correspondence between the address 183.131.119.86 and the data stream type of the web page can be established. According to the two examples cited here, the target correspondence can be as shown in Table 1:
  • the third device can determine the corresponding "data stream type for data download” according to the target correspondence and the general feature "port 20" in the current data stream. Therefore, the "data stream type for data download” is the determined second data stream type corresponding to the current data stream.
  • An address correction library (which may also be given another name) may exist in the third device.
  • The information in the above target correspondence can be stored in the address correction library for the third device to use when determining the second data stream type corresponding to the current data stream.
  • the content in the above-mentioned target correspondence relationship may be obtained by the machine's own identification, or it may be manually added; the content in the target correspondence relationship may also be updated at an appropriate time as needed.
  • Each general feature in the above target correspondence corresponds to a unique data stream type. Therefore, multiple general features may correspond to the same data stream type (i.e., many-to-one), or multiple general features may each correspond to a different data stream type (i.e., one-to-one).
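  • A lookup against the target correspondence can be sketched as a simple table. The two entries below are the examples given in the text; the key layout, the type labels, and the function name are hypothetical, and checking the port before the address is an arbitrary choice made for this sketch.

```python
from typing import Optional

# Hypothetical in-memory form of the target correspondence (the address
# correction library). Each general feature maps to exactly one type.
TARGET_CORRESPONDENCE = {
    ("port", 20): "data_download",
    ("ip", "183.131.119.86"): "web_page",
}

def second_data_stream_type(dst_ip: str, dst_port: int) -> Optional[str]:
    """Return the type matched by a general feature, or None if nothing matches."""
    for feature in (("port", dst_port), ("ip", dst_ip)):
        if feature in TARGET_CORRESPONDENCE:
            return TARGET_CORRESPONDENCE[feature]
    return None

download = second_data_stream_type("10.0.0.1", 20)
web = second_data_stream_type("183.131.119.86", 80)
unknown = second_data_stream_type("10.129.74.5", 8443)
```

  • Because the table is keyed by feature, adding a many-to-one mapping only requires inserting another key with an existing type label.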
  • Step S303: The third device sends the final data stream type of the current data stream to the operation and maintenance support system (OSS).
  • the second data stream type corresponding to the current data stream is used as the final data stream type of the current data stream.
  • the first data stream type corresponding to the current data stream is used as the final data stream type of the current data stream.
  • the first data stream type corresponding to the current data stream can also be directly used as the final data stream type of the current data stream.
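  • The choice between the two candidates can be sketched in one function. The condition under which each alternative applies is not preserved in the text above; this sketch assumes that the second data stream type is preferred whenever the target correspondence produced one, and that the first data stream type is the fallback. The function name is hypothetical.

```python
from typing import Optional

def final_data_stream_type(first_type: Optional[str],
                           second_type: Optional[str]) -> Optional[str]:
    """Prefer the correspondence-based second type; otherwise fall back to the first."""
    return second_type if second_type is not None else first_type

with_second = final_data_stream_type("desktop_cloud", "voice_conference")
without_second = final_data_stream_type("desktop_cloud", None)
```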
  • the third device may send the final data stream type of the current data stream to the OSS each time it determines the final data stream type of a current data stream. For example, when the final data stream type of data stream a is generated for the first time, the final data stream type of data stream a is sent to the OSS.
  • When the final data stream type of data stream b is generated for the first time, the final data stream type of data stream b is sent to the OSS.
  • Step S304: The OSS generates a flow control policy for the current data stream according to the final data stream type of the current data stream. For example, if the final data stream type of the current data stream indicates that the current data stream is of the data stream type of a video desktop cloud or the data stream type of a video conference, the current data stream is defined as high-priority QoS.
  • Step S305: The OSS sends the flow control policy to the third device or terminal.
  • When the third device or terminal learns from the flow control policy that the current data stream belongs to high-priority QoS and finds that there are multiple data streams to be sent, it preferentially sends the current data stream, which is configured as high priority.
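  • Steps S304 and S305 can be illustrated with a small priority queue. This is a sketch under assumptions: the text only states that video desktop cloud and video conference streams are treated as high-priority QoS, so the two-level priority scheme, the type labels, and the class and function names are all hypothetical.

```python
import heapq
from itertools import count

# Only these two high-priority examples come from the text.
HIGH_PRIORITY_TYPES = {"video_desktop_cloud", "video_conference"}

def flow_control_policy(final_type: str) -> int:
    """Return a QoS priority: 0 is high priority, 1 is default (lower is sent first)."""
    return 0 if final_type in HIGH_PRIORITY_TYPES else 1

class SendQueue:
    """Sends pending data streams by priority, first-in first-out within a priority."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def enqueue(self, stream_id: str, final_type: str) -> None:
        entry = (flow_control_policy(final_type), next(self._seq), stream_id)
        heapq.heappush(self._heap, entry)

    def next_stream(self) -> str:
        return heapq.heappop(self._heap)[2]

queue = SendQueue()
queue.enqueue("stream-1", "web_page")
queue.enqueue("stream-2", "video_conference")
queue.enqueue("stream-3", "data_download")
order = [queue.next_stream() for _ in range(3)]
```

  • The video conference stream is dequeued first even though it arrived second; the remaining streams keep their arrival order.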
  • the aforementioned behavior recognition model and content recognition model can be updated, and the updated behavior recognition model and the updated content recognition model are used by the third device or other devices to identify the data stream type of a new data stream.
  • For the process of updating the behavior recognition model, refer to step S306:
  • Step S306: The third device updates the behavior recognition model according to the first data stream type corresponding to the current data stream and the second data stream type corresponding to the current data stream. Two different update schemes are exemplified below.
  • the third device sends the correction data corresponding to the current data stream to the first device, where:
  • the correction data corresponding to the current data stream includes the message information of the current data stream and the second data stream type corresponding to the current data stream.
  • the first device receives the correction data corresponding to the current data stream sent by the third device, and determines whether it has cumulatively received correction data corresponding to M data streams from the third device; if it has cumulatively received correction data corresponding to M data streams,
  • the behavior recognition model is trained according to the correction data corresponding to the M data streams to obtain a new behavior recognition model.
  • the M data streams are the amount accumulated from the time the behavior recognition model takes effect to the current time (if the model is updated, the accumulation restarts). Alternatively, the M data streams are accumulated within a preset time period (for example, the past 24 hours). Alternatively, the ratio of the M data streams to the total number of data streams transmitted since the behavior recognition model took effect exceeds a preset threshold.
  • the preset threshold is pre-configured to 10%.
  • the behavior recognition model takes effect at 00:00:00 on March 1, 2019, the total number of existing training data records from that time to the current time is 10,000, and the floating rate (i.e., the preset threshold) is 10%; then, counting from the effective time, once correction data corresponding to 1000 data streams has been added cumulatively, a new behavior recognition model must be trained.
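  • The ratio-based trigger in this example can be sketched as follows. The function and parameter names are hypothetical; integer arithmetic is used so that the boundary case in the example (1000 of 10,000) is evaluated exactly, and reaching the ratio is treated as sufficient, following the worked example.

```python
def should_retrain(correction_count: int, total_training_count: int,
                   threshold_percent: int = 10) -> bool:
    """True once accumulated correction records reach the preset share of training data."""
    if total_training_count <= 0:
        return False
    return correction_count * 100 >= threshold_percent * total_training_count

at_threshold = should_retrain(1000, 10_000)   # 1000 corrections against 10,000 records
below_threshold = should_retrain(999, 10_000)
```

  • The counter passed as `correction_count` would be reset whenever a new model takes effect, since the text states that accumulation restarts after an update.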
  • the M data streams include the current data stream.
  • the corresponding correction data can be expressed as {second data stream type corresponding to data stream A, s <message direction, message length, message time> triples}, where s is a positive integer. For example, if the second data stream type corresponding to data stream A is the data stream type of the cloud desktop and data stream A corresponds to three <message direction, message length, message time> triples, the correction data can be expressed as {data stream type of the cloud desktop, 3 <message direction, message length, message time> triples}; refer to Table 2.
  • Table 2 shows a possible situation of the correction data in more detail.
  • “source” refers to the area from which the behavior data of a data stream comes. It can be filled in with the name of the corresponding network, or with the name of the third device from which the data comes; in the latter case, the name of the third device is mapped to the corresponding network according to the network topology.
  • the "message direction" can be represented by a value, with 0 for downlink and 1 for uplink, or directly by "uplink" and "downlink".
  • the "message time" can be a timestamp; it can also be a relative time, that is, the first record of each data stream is 0 and each subsequent record of that stream is the time elapsed since the first record.
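  • One possible in-memory layout for a correction data record, including the conversion from timestamps to relative times, is sketched below. The class name, field names, and the sample values (network name, lengths, timestamps) are hypothetical; only the fields themselves (source, second data stream type, and the direction/length/time triples) and the 0/1 direction encoding come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

UPLINK, DOWNLINK = 1, 0  # numeric direction encoding described in the text

@dataclass
class CorrectionRecord:
    source: str                   # network name or third-device name
    second_data_stream_type: str  # label used for training
    # Each triple is (message direction, message length, message time).
    messages: List[Tuple[int, int, float]] = field(default_factory=list)

    def with_relative_time(self) -> "CorrectionRecord":
        """Return a copy whose message times are offsets from the first message."""
        if not self.messages:
            return CorrectionRecord(self.source, self.second_data_stream_type, [])
        t0 = self.messages[0][2]
        relative = [(d, n, t - t0) for d, n, t in self.messages]
        return CorrectionRecord(self.source, self.second_data_stream_type, relative)

record = CorrectionRecord(
    source="campus-network-1",  # hypothetical network name
    second_data_stream_type="cloud_desktop",
    messages=[(UPLINK, 120, 1000.0), (DOWNLINK, 1500, 1000.5), (DOWNLINK, 1500, 1002.0)],
)
relative = record.with_relative_time()
```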
  • the behavior recognition model used by the third device also exists in the first device, so the first device uses the correction data corresponding to the M data streams as training data to train the behavior recognition model to obtain a new behavior recognition model. It is also possible that the first device uses the correction data corresponding to the M data streams as training data, and combines the batch training data stored in the history for training to obtain a new behavior recognition model.
  • the training process can minimize a loss function (loss): the message information <message direction, length, time> in the correction data of a data stream is input into the behavior recognition model, so that the data stream type output by the behavior recognition model is as close as possible to the data stream type in the correction data.
  • After training has been performed on the correction data corresponding to the M data streams, a new trained behavior recognition model is obtained.
  • After the first device obtains the new behavior recognition model through training, it sends first model data to the third device.
  • The first model data is the model data of the new behavior recognition model; it describes the new behavior recognition model that the first device obtains by training the behavior recognition model according to the message information of the current data stream and the second data stream type corresponding to the current data stream.
  • the behavior recognition model includes the model structure (such as the functional form of the model) and model parameters.
  • the first model data has at least the following cases. Case 1: The first model data is the parameter values of the model parameters of the new behavior recognition model.
  • Case 2: The first model data is the difference data between the new behavior recognition model and the behavior recognition model before training, usually the differences between the parameter values of the model parameters of the new behavior recognition model and the parameter values of the model parameters of the behavior recognition model before training.
  • In case 1, the first model data may specifically be a matrix composed of the parameter values of multiple parameters;
  • in case 2, the first model data (that is, the difference data) may specifically be a matrix composed of the differences of multiple parameter values.
  • For example, if the matrix composed of the parameter values of the model parameters of the behavior recognition model before training is [a1, b1, c1, d1]
  • and the matrix composed of the parameter values of the model parameters of the new behavior recognition model is [a2, b2, c2, d2],
  • the matrix composed of the differences of these 4 model parameters is [a2-a1, b2-b1, c2-c1, d2-d1].
  • In case 1, the first model data is [a2, b2, c2, d2];
  • in case 2, the first model data (that is, the difference data) is [a2-a1, b2-b1, c2-c1, d2-d1].
  • Case 3: The first model data includes the model structure of the new behavior recognition model and the parameter values of the model parameters, that is, the complete data of the new behavior recognition model.
  • the first model data may specifically be a model file.
  • For example, the AI model files of the common open-source Keras library are h5 files and json files, and the AI model files of the open-source sklearn library are pkl/m files. These files (binary, except for the json file, which is text) are used to save the model structure and/or the parameter values of the model parameters.
  • the h5 file is used to describe the parameter values of the model parameters
  • the json file is used to describe the model structure.
  • the first model data of case 1 and case 2 may both be h5 files
  • the first model data of case 3 may include h5 files and json files.
  • the third device receives the first model data sent by the first device, and updates the behavior recognition model according to the received first model data. If the first model data is the complete data of the new behavior recognition model, the third device can directly load the first model data to generate the new behavior recognition model, replace the current behavior recognition model with it, and subsequently use the new behavior recognition model for data stream type recognition.
  • If the first model data is the parameter values of the model parameters of the new behavior recognition model, the parameter values are substituted into the current behavior recognition model to replace the old parameter values, thereby obtaining the new behavior recognition model. If the first model data is the aforementioned difference data, new parameter values are obtained according to the difference data and the parameter values of the model parameters of the current behavior recognition model, thereby obtaining the new behavior recognition model.
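  • The parameter-level update paths (case 1 and case 2) can be sketched with plain lists standing in for the parameter matrices. The function name, the `kind` labels, and the numeric values are hypothetical. Case 3, loading a complete model file, depends on the framework in use and is left out of this sketch.

```python
from typing import List, Sequence

def apply_model_data(current: Sequence[float], model_data: Sequence[float],
                     kind: str) -> List[float]:
    """Return the new parameter values.

    kind == "params": model_data holds the new parameter values (case 1).
    kind == "delta":  model_data holds new-minus-old differences (case 2).
    """
    if kind == "params":
        return list(model_data)
    if kind == "delta":
        if len(current) != len(model_data):
            raise ValueError("parameter count mismatch")
        return [old + diff for old, diff in zip(current, model_data)]
    raise ValueError(f"unknown model data kind: {kind}")

old_params = [1.0, 2.0, 3.0, 4.0]   # stands in for [a1, b1, c1, d1]
new_params = [1.5, 2.0, 2.5, 5.0]   # stands in for [a2, b2, c2, d2]
delta = [n - o for n, o in zip(new_params, old_params)]

from_params = apply_model_data(old_params, new_params, "params")
from_delta = apply_model_data(old_params, delta, "delta")
```

  • Both paths end at the same parameter values; sending differences only pays off when the receiver is known to hold exactly the pre-training parameters.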
  • training the behavior recognition model according to the correction data corresponding to the M data streams to obtain a new behavior recognition model may specifically be: training the behavior recognition model only according to the correction data corresponding to the M data streams to obtain the new behavior recognition model.
  • It may also specifically be: training the behavior recognition model according to the correction data corresponding to the M data streams and the correction data corresponding to Y data streams in other networks to obtain the new behavior recognition model. That is to say,
  • the correction data used to train the new behavior recognition model comes from at least two different networks, where the at least two different networks include two different local area networks, two networks of different forms, or networks of two different regions.
  • the network to which the M data streams belong can be called the first network, and the Y data streams are said to come from a second network other than the first network; the first device training the behavior recognition model according to the correction data corresponding to the M data streams to obtain a new behavior recognition model may specifically be:
  • the first device corrects the message information of the Y data streams according to the difference between the network configuration of the second network and that of the first network, to obtain the corrected message information of the Y data streams.
  • Two data streams whose characteristic information or message information is originally the same may end up showing different characteristic information or message information because the network configurations of the networks they come from are different.
  • After the correction, the characteristic information or message information of the data streams of the second network is consistent with the measurement standard of the characteristic information or message information of the data streams from the first network (that is, unified into the same feature space), which makes them more comparable and ultimately helps improve the accuracy of the trained behavior recognition model.
  • the packet length of the data stream in the second network is rewritten according to the MTU of the data stream in the first network.
  • the MTU of a data stream sent by a device in the first network is 1500
  • the MTU of a data stream sent by a device in the second network is 1452
  • the information of a data stream sent by a device in the second network is shown in Table 3.
  • the packet time of the data stream in the second network is rewritten.
  • the average packet time of the data stream sent by the device in the network is 10% larger.
  • the information of a data stream sent by the device in the second network is shown in Table 5. With reference to the MTU of the data stream sent by the device in the first network, data mapping processing is performed on the information in Table 5, and the result is shown in Table 6.
  • the behavior recognition model is trained according to the corrected message information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.
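  • The correction of second-network message information can be sketched as a mapping function. The exact mapping is not recoverable from the text (the tables are not reproduced here), so this sketch makes three loud assumptions: lengths are scaled by the ratio of the two MTUs (1500 / 1452) and capped at the first network's MTU; the 10% larger packet time belongs to the second network; and times are therefore divided by 1.10. The function name and sample values are hypothetical.

```python
from typing import List, Tuple

Message = Tuple[int, int, float]  # (direction, length in bytes, time in seconds)

def normalize_messages(messages: List[Message], src_mtu: int = 1452,
                       dst_mtu: int = 1500, time_ratio: float = 1.10) -> List[Message]:
    """Map second-network messages into the first network's feature space.

    Assumed mapping: proportional length scaling capped at dst_mtu, and
    division of message times by time_ratio.
    """
    normalized = []
    for direction, length, t in messages:
        new_length = min(dst_mtu, round(length * dst_mtu / src_mtu))
        normalized.append((direction, new_length, t / time_ratio))
    return normalized

second_network = [(0, 1452, 1.1), (1, 726, 0.0)]
corrected = normalize_messages(second_network)
```

  • With this mapping a full-size 1452-byte packet from the second network becomes a full-size 1500-byte packet, so "full MTU" carries the same meaning in both networks' training data.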
  • the protocol type of the network data stream d is UDP
  • the IP address is 1.2.3.4
  • the port number is 10050
  • the port number 10050 is a general feature
  • the data stream transmitted in this field is basically the data type of the voice conference, and the correspondence between the general feature of 10050 and the voice conference data stream type has been added to the above target correspondence, That is, it is stored in the address correction library.
  • the confidence weight vector (w1, w2), the first preset threshold θ1, and the second preset threshold θ2 remain unchanged.
  • the content recognition model recognizes the second confidence of the data stream type of the desktop cloud as 1, and the second confidences of the data stream type of the voice conference and the data stream type of the video conference as 0.
  • Based on the message information, the behavior recognition model recognizes that the first confidence of the data stream type of the desktop cloud is 0.9, the first confidence of the data stream type of the voice conference is 0, and the first confidence of the data stream type of the video conference is 0.
  • Therefore, the comprehensive confidence levels corresponding to these three data stream types are as follows.
  • the second data stream type corresponding to the current data stream is determined to be the data stream type of the voice conference according to the above target correspondence and the general features in the current data stream d, so the third device needs to send the correction data of the current data stream d to the first device.
  • After the first device receives the correction data of the current data stream d sent by the third device, if it has by then accumulated exactly 10 newly added correction data records from the third device, the first device trains the behavior recognition model with these 10 newly added correction data records to obtain a new behavior recognition model.
  • Assume that the updated behavior recognition model recognizes the current data stream d as the data stream type of the voice conference with a first confidence of 0.9. Then, when the current data stream d appears again, the recognition process of the current data stream d is as follows:
  • the content recognition model recognizes that the second confidence of the data stream type of the desktop cloud is 1, and the second confidences of the data stream types of the voice conference and the video conference are 0.
  • Based on the message information, the behavior recognition model recognizes that the first confidence of the data stream type of the desktop cloud is 0.1, the first confidence of the data stream type of the voice conference is 0.9, and the first confidence of the data stream type of the video conference is 0.
  • Therefore, the comprehensive confidence levels corresponding to the three data stream types are as follows.
  • the current data stream d is the data stream type of the voice conference, that is, the first data stream type corresponding to the current data stream d is the data stream type of the voice conference.
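  • The comprehensive confidences for data stream d, before and after the behavior recognition model is updated, can be written out as a worked calculation. This is a sketch under an assumption: the text gives the weight vector as (0.4, 0.6) without preserving which weight belongs to which model, and the outcome described for data stream d (voice conference wins after the update) is only reproduced when the first confidence $p_1$ of the behavior recognition model is weighted 0.6 and the second confidence $p_2$ of the content recognition model is weighted 0.4. The symbol $C$ for the comprehensive confidence is introduced here for illustration.

```latex
\begin{aligned}
C(t) &= 0.6\,p_1(t) + 0.4\,p_2(t) \\[4pt]
\text{Before the update:}\quad
C(\text{desktop cloud}) &= 0.6 \times 0.9 + 0.4 \times 1 = 0.94 \\
C(\text{voice conference}) &= 0.6 \times 0 + 0.4 \times 0 = 0 \\
C(\text{video conference}) &= 0.6 \times 0 + 0.4 \times 0 = 0 \\[4pt]
\text{After the update:}\quad
C(\text{desktop cloud}) &= 0.6 \times 0.1 + 0.4 \times 1 = 0.46 \\
C(\text{voice conference}) &= 0.6 \times 0.9 + 0.4 \times 0 = 0.54 > \theta_1 = 0.5 \\
C(\text{video conference}) &= 0.6 \times 0 + 0.4 \times 0 = 0
\end{aligned}
```

  • Under the opposite weight assignment the desktop cloud type would score 0.64 after the update and would still win, which contradicts the stated result; this is why the assignment above is assumed.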
  • the second data stream type corresponding to the current data stream d is determined to be the data stream type of the voice conference according to the above target correspondence and the general features in the current data stream d,
  • which is the same as the determined first data stream type corresponding to the current data stream d (i.e., the data stream type of the voice conference), so the third device does not need to send information for retraining the behavior recognition model to the first device.
  • the behavior recognition model is trained according to the corresponding second data stream type to obtain a new behavior recognition model.
  • Specifically, if the second data stream type of the current data stream and the first data stream type of the current data stream are different data stream types, and the first data stream types corresponding to M currently accumulated data streams are different from the second data stream types corresponding to those M data streams, the behavior recognition model is trained according to the message information of the M data streams and the second data stream types corresponding to the M data streams to obtain a new behavior recognition model, where the M data streams include the current data stream and M is a preset reference threshold. How to train the behavior recognition model according to the message information of the M data streams and the second data stream types corresponding to the M data streams has been introduced in scheme 1 above and is not repeated here.
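  • The accumulation of mismatched data streams in this scheme can be sketched as a small buffer that signals when M samples are available. The class and method names are hypothetical, M is set to 2 only to keep the example short, and the training call itself is left out.

```python
class MismatchBuffer:
    """Collects streams whose first and second types differ; signals retraining at M."""

    def __init__(self, m: int):
        self.m = m
        self.samples = []  # (message information, second data stream type) pairs

    def observe(self, message_info, first_type, second_type) -> bool:
        """Record a mismatch and return True once M mismatches have accumulated."""
        if second_type is None or first_type == second_type:
            return False  # nothing to correct for this stream
        self.samples.append((message_info, second_type))
        return len(self.samples) >= self.m

    def drain(self):
        """Hand over the accumulated training samples and restart accumulation."""
        batch, self.samples = self.samples, []
        return batch

buffer = MismatchBuffer(m=2)
r1 = buffer.observe([(1, 120, 0.0)], "desktop_cloud", "voice_conference")
r2 = buffer.observe([(0, 900, 0.0)], "voice_conference", "voice_conference")
r3 = buffer.observe([(1, 200, 0.0)], "desktop_cloud", "voice_conference")
batch = buffer.drain() if r3 else []
```

  • Draining the buffer after training mirrors the statement that accumulation restarts once the model has been updated.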
  • For the process of updating the content recognition model, refer to step S307:
  • Step S307: The third device updates the content recognition model according to the comprehensive confidence of the first data stream type corresponding to the current data stream and the second data stream type corresponding to the current data stream. Two different update schemes are exemplified below.
  • the third device sends the characteristic information and the second data stream type of the current data stream to the second device, where the target data stream type is one of the at least one data stream type. If the comprehensive confidence of the target data stream type is greater than the first preset threshold θ1, the target data stream type is determined to be the first data stream type corresponding to the current data stream; the second preset threshold is greater than the first preset threshold.
  • the comprehensive confidence of 0.3 corresponding to the data stream type of the desktop cloud is not within the interval (θ1, θ2), so there is no need to send information such as the characteristic information of the current data stream to the second device;
  • the comprehensive confidence of 0.54 corresponding to the data stream type of the desktop cloud is within the interval (θ1, θ2), so the characteristic information of the current data stream (such as the destination IP address) needs to be sent to the second device.
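  • The interval test that decides whether the characteristic information is sent to the second device can be sketched as follows. The value 0.5 for the first preset threshold comes from the text; the value 0.8 for the second preset threshold is hypothetical, since the text only requires it to be greater than the first. The function name is also hypothetical.

```python
THETA_1 = 0.5  # first preset threshold, from the text
THETA_2 = 0.8  # hypothetical value; the text only requires THETA_2 > THETA_1

def needs_content_model_feedback(comprehensive_confidence: float) -> bool:
    """True when the confidence lies strictly inside the open interval (THETA_1, THETA_2)."""
    return THETA_1 < comprehensive_confidence < THETA_2

send_for_0_30 = needs_content_model_feedback(0.3)
send_for_0_54 = needs_content_model_feedback(0.54)
send_for_0_94 = needs_content_model_feedback(0.94)
```

  • The interval selects streams that were recognized but only with moderate confidence: a stream below the first threshold was not recognized at all, and one above the second threshold is already recognized confidently.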
  • the second device receives the characteristic information of the current data stream and the second data stream type (or the first data stream type) sent by the third device;
  • when the second data stream type is not identified, the first data stream type is sent.
  • That is, there is one more data stream record on the second device; as shown in Figure 5, there is one more record, for data stream b.
  • the second device trains the content recognition model according to the characteristic information of the current data stream and the second data stream type to obtain a new content recognition model.
  • After the second device obtains the new content recognition model through training, it sends second model data to the third device.
  • The second model data is the model data of the new content recognition model; it describes the new content recognition model that the second device obtains by training the content recognition model according to the characteristic information of the current data stream and the second data stream type corresponding to the current data stream.
  • the content recognition model includes the model structure (such as the functional form of the model) and model parameters.
  • the second model data has at least the following cases. Case 1: The second model data is the parameter values of the model parameters of the new content recognition model.
  • Case 2: The second model data is the difference data between the new content recognition model and the content recognition model before training, usually the differences between the parameter values of the model parameters of the new content recognition model and the parameter values of the model parameters of the content recognition model before training.
  • In case 1, the second model data may specifically be a matrix composed of the parameter values of multiple parameters;
  • in case 2, the second model data (that is, the difference data) may specifically be a matrix composed of the differences of multiple parameter values.
  • For example, if the matrix composed of the parameter values of the model parameters of the content recognition model before training is [e1, f1, g1, h1]
  • and the matrix composed of the parameter values of the model parameters of the new content recognition model is [e2, f2, g2, h2],
  • the matrix composed of the differences of these 4 model parameters is [e2-e1, f2-f1, g2-g1, h2-h1].
  • In case 1, the second model data is [e2, f2, g2, h2];
  • in case 2, the second model data (that is, the difference data) is [e2-e1, f2-f1, g2-g1, h2-h1].
  • Case 3: The second model data includes the model structure of the new content recognition model and the parameter values of the model parameters, that is, the complete data of the new content recognition model.
  • the second model data may specifically be a model file.
  • For example, the AI model files of the common open-source Keras library are h5 files and json files, and the AI model files of the open-source sklearn library are pkl/m files. These files (binary, except for the json file, which is text) are used to save the model structure and/or the parameter values of the model parameters.
  • the h5 file is used to describe the parameter values of the model parameters
  • the json file is used to describe the model structure.
  • the second model data of case 1 and case 2 may both be h5 files
  • the second model data of case 3 may include h5 files and json files.
  • the third device receives the second model data sent by the second device, and updates the content recognition model according to the received second model data. If the second model data is the complete data of the new content recognition model, the third device can directly load the second model data to generate the new content recognition model, replace the current content recognition model with it, and subsequently use the new content recognition model to recognize the data stream type.
  • If the second model data is the parameter values of the model parameters of the new content recognition model, the parameter values are substituted into the current content recognition model to replace the old parameter values, thereby obtaining the new content recognition model.
  • If the second model data is the differences between the parameter values of the model parameters in the new content recognition model and the parameter values of the model parameters in the content recognition model before the update, new parameter values are obtained according to the differences and the parameter values of the model parameters of the content recognition model before the update, thereby obtaining the new content recognition model.
  • the updated content recognition model reconfirms the input data
  • the estimated second confidence level of the data stream type corresponding to the desktop cloud is 1.
  • The characteristic information of data stream a is similar to the characteristic information of data stream b; for example, the destination IP addresses are in the same network segment, the port numbers are similar, and the protocol types are similar. Therefore, when the updated content recognition model estimates the input data stream a, the estimation result will be closer to the estimation result for data stream b.
  • the estimated second confidence level of the data stream type corresponding to the desktop cloud may be 0.6.
  • The confidence weight vector (w1, w2), the first preset threshold θ1, and the second preset threshold θ2 remain unchanged.
  • The content recognition model identifies the data stream type of the desktop cloud with a second confidence of 0.6, and identifies the data stream type of the voice conference and the data stream type of the video conference each with a second confidence of 0.
  • Based on the message information, the behavior recognition model identifies the data stream type of the desktop cloud with a first confidence of 0.5, the data stream type of the voice conference with a first confidence of 0, and the data stream type of the video conference with a first confidence of 0. Therefore, the comprehensive confidence levels corresponding to these three data stream types are as follows.
  • Because the comprehensive confidence of 0.54 for the data stream type of the desktop cloud is within the interval (θ1, θ2), the characteristic information of the current data stream and the second data stream type (or, when the second data stream type is not identified, the information of the first data stream type) need to be sent to the second device for the subsequent update of the content recognition model (the update principle has been introduced above and is not repeated here).
  • The content recognition model identifies the data stream type of the desktop cloud with a second confidence of 1, and identifies the data stream type of the voice conference and the data stream type of the video conference each with a second confidence of 0.
  • Based on the message information, the behavior recognition model identifies the data stream type of the desktop cloud with a first confidence of 0.9, the data stream type of the voice conference with a first confidence of 0, and the data stream type of the video conference with a first confidence of 0. Therefore, the comprehensive confidence levels corresponding to these three data stream types are as follows.
  • Because the comprehensive confidence of 0.94 for the data stream type of the desktop cloud is not within the interval (θ1, θ2), there is no need to send the characteristic information of the current data stream and the second data stream type (or, when the second data stream type is not identified, the information of the first data stream type) to the second device.
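  • The two worked examples above can be reproduced with a short sketch. It assumes the comprehensive confidence is a weighted sum of the first and second confidences. The weights (0.6, 0.4) are not stated in this passage; they are chosen here only because they reproduce the quoted results of 0.54 and 0.94. The threshold values 0.5 and 0.8 are likewise assumptions for illustration.

```python
from typing import Dict


def comprehensive_confidence(first: Dict[str, float],
                             second: Dict[str, float],
                             w1: float, w2: float) -> Dict[str, float]:
    """Weighted sum of behavior-model (first) and content-model (second) confidences."""
    types = set(first) | set(second)
    return {t: w1 * first.get(t, 0.0) + w2 * second.get(t, 0.0) for t in types}


def in_update_interval(confidence: float, low: float, high: float) -> bool:
    """True when the confidence lies strictly inside the open interval (low, high)."""
    return low < confidence < high


W1, W2 = 0.6, 0.4          # assumed weights, consistent with 0.54 and 0.94
THETA1, THETA2 = 0.5, 0.8  # assumed first and second preset thresholds

# First example: first confidence 0.5, second confidence 0.6 for the desktop cloud.
example_a = comprehensive_confidence(
    {"desktop_cloud": 0.5, "voice_conference": 0.0, "video_conference": 0.0},
    {"desktop_cloud": 0.6, "voice_conference": 0.0, "video_conference": 0.0},
    W1, W2)

# Second example: first confidence 0.9, second confidence 1 for the desktop cloud.
example_b = comprehensive_confidence(
    {"desktop_cloud": 0.9, "voice_conference": 0.0, "video_conference": 0.0},
    {"desktop_cloud": 1.0, "voice_conference": 0.0, "video_conference": 0.0},
    W1, W2)

for label, result in (("a", example_a), ("b", example_b)):
    value = result["desktop_cloud"]
    print(label, round(value, 2), in_update_interval(value, THETA1, THETA2))
```

  • Under these assumptions only the first example falls inside the interval, so only it would trigger sending data for a content recognition model update.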
  • If the corresponding second data stream type is not determined through the above correspondence, but the corresponding first data stream type is finally determined through the behavior recognition model and the content recognition model, and there is a first record among the multiple records whose feature information is the same as the feature information of this data stream but whose data stream type is different from the first data stream type corresponding to this data stream, then the data stream type of the first record is updated to the first data stream type corresponding to this data stream to obtain the second record. Each of the multiple records includes characteristic information and a data stream type. The content recognition model is then trained on the multiple records including the second record to obtain a new content recognition model.
  • one of the multiple records includes the feature information of the current data stream and the corresponding second data stream type.
  • The reason for updating some of the above multiple records is that cloud-based flexible deployment scenarios exist in the network.
  • For example, a resource whose destination IP address is 10.129.56.39, destination port number is 443, and protocol type is TCP may have changed from providing services for desktop clouds to providing services for video conferencing.
  • With the records updated in this way, the content recognition model can be trained on more accurate information, which helps improve the recognition accuracy of the content recognition model.
  • The protocol type of data stream c is TCP, the destination IP address is 10.129.56.40, and the destination port number is 444.
  • The content recognition model identifies the data stream type of the desktop cloud with a second confidence of 1, and identifies the data stream type of the voice conference and the data stream type of the video conference each with a second confidence of 0.
  • Based on the message information, the behavior recognition model identifies the data stream type of the desktop cloud with a first confidence of 0, the data stream type of the voice conference with a first confidence of 0, and the data stream type of the video conference with a first confidence of 0.9. Therefore, the comprehensive confidence levels corresponding to these three data stream types are as follows.
  • The second device receives the characteristic information of the current data stream and the first data stream type sent by the third device; that is, a record for data stream c is added to the data stream records on the second device, as shown in FIG. 6.
  • When the record for data stream c is compared with the existing record for data stream f, the data stream types (that is, the application categories) are different but the characteristic information (such as the destination IP address, destination port number, and protocol type) is the same. The existing record for data stream f is therefore modified so that the data stream type corresponding to protocol type TCP, destination IP address 10.129.56.40, and destination port number 444 is the data stream type of the video conference instead of the data stream type of the desktop cloud.
  • The record before modification is the first record, and the record after modification is the second record.
  • The content recognition model is trained according to multiple records including the second record to obtain a new content recognition model. When data stream c is re-input to the updated (that is, new) content recognition model, the second confidence of identifying it as the data stream type of the desktop cloud is 0, and the second confidence of identifying it as the data stream type of the video conference is 1.
  • Optionally, the multiple records here include the feature information corresponding to the current data stream and the second data stream type.
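  • The record correction described above (the first record for data stream f becoming the second record after data stream c arrives) can be sketched as a keyed update. The record layout, with a tuple of destination IP address, destination port number, and protocol type as the key, is an assumption for illustration; the function name `upsert_record` is hypothetical.

```python
from typing import Dict, Tuple

# Assumed key: (destination IP address, destination port number, protocol type).
FeatureKey = Tuple[str, int, str]


def upsert_record(records: Dict[FeatureKey, str],
                  features: FeatureKey,
                  stream_type: str) -> bool:
    """Store the record and report whether an existing record was relabeled."""
    previous = records.get(features)
    records[features] = stream_type
    return previous is not None and previous != stream_type


# Existing record for data stream f (the "first record").
records: Dict[FeatureKey, str] = {("10.129.56.40", 444, "TCP"): "desktop_cloud"}

# Data stream c has the same characteristic information but a different type,
# so the stored record is rewritten (the "second record").
relabeled = upsert_record(records, ("10.129.56.40", 444, "TCP"), "video_conference")
print(relabeled, records)
```

  • The returned flag could be used to decide whether retraining the content recognition model is worthwhile, since only a relabeled record changes the training set's labels.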
  • Solution 2: If the comprehensive confidence corresponding to the target data stream type is less than a second preset threshold, the third device does not need to send the characteristic information of the current data stream and the second data stream type to the second device; instead, the third device itself updates the content recognition model according to the characteristic information of the current data stream and the second data stream type to obtain a new content recognition model. The target data stream type is one of the at least one data stream type. If the comprehensive confidence of the target data stream type is greater than the first preset threshold θ1, the target data stream type is determined to be the first data stream type corresponding to the current data stream. The second preset threshold is greater than the first preset threshold.
  • For the principle of Solution 2, refer to Solution 1; the operations performed by the second device in Solution 1 are performed by the third device instead.
  • When the first data stream type and the second data stream type are different data stream types, correction data for updating the behavior recognition model, that is, training samples, is generated.
  • The correction data is obtained automatically by the device when the first data stream type corresponding to the current data stream and the second data stream type corresponding to the current data stream are different, without manual labeling, so sample data for training the behavior recognition model is obtained more efficiently.
  • The correction data is the message information and the accurate data stream type generated when the recognition result of the behavior recognition model is inaccurate. Therefore, subsequently updating the behavior recognition model based on the correction data yields a behavior recognition model with a more accurate recognition effect.
  • the first data stream type corresponding to the current data stream can be obtained according to the content recognition model and the behavior recognition model, and then the first data stream type can be corrected to obtain the final data stream type of the current data stream.
  • The behavior recognition model is pre-trained from the message information and data stream types of multiple data stream samples, and the content recognition model is trained based on the feature information of the data stream and the data stream type recognized by the behavior recognition model. Therefore, by analyzing the feature information, message information, and so on through the content recognition model and the behavior recognition model, the first data stream type corresponding to the current data stream can be predicted more accurately.
  • Because the data stream type in the data stream samples used when training the content recognition model is recognized by the behavior recognition model, there is no need to collect the large amount of data otherwise required for training, which solves the problem of insufficient data integrity.
  • the first device or the third device retrains the behavior recognition model with the relevant data that caused the deviation, and updates the behavior recognition model on the third device after training the new behavior recognition model.
  • This iteratively updated behavior recognition model can meet the differentiated needs of different users, different networks, and different scenarios, with better generalization and greater versatility.
  • When the comparison of the comprehensive confidence with the preset update threshold θ2 shows that an update is needed, the update is performed once the need to update has occurred multiple times.
  • The second device or the third device retrains the content recognition model with the relevant data whose comprehensive confidence is lower than θ2, and updates the content recognition model on the third device after the new content recognition model is trained. This iteratively updated content recognition model can meet the differentiated needs of different users, different networks, and different scenarios, with better generalization and greater versatility.
  • FIG. 7 is an apparatus 70 for updating a data stream type identification model according to an embodiment of the present invention.
  • The apparatus 70 may be the third device in the method embodiment shown in FIG. 3, or a component or module in the third device.
  • the device 70 may include a first determining unit 701, a second determining unit 702, and an acquiring unit 703, wherein the detailed description of each unit is as follows.
  • the first determining unit 701 is configured to determine the first data stream type corresponding to the current data stream according to the message information of the current data stream and the behavior recognition model.
  • The message information includes one or more of the message length, the message transmission speed, the message interval time, and the message direction, and the behavior recognition model is a model obtained by training on the message information and data stream types of multiple data stream samples;
  • The second determining unit 702 is configured to determine the second data stream type corresponding to the current data stream according to the target correspondence relationship and the general characteristics of the current data stream, where the target correspondence relationship is a correspondence between multiple general characteristics and multiple data stream types;
  • The obtaining unit 703 is configured to obtain correction data corresponding to the current data stream when the first data stream type corresponding to the current data stream and the second data stream type corresponding to the current data stream are different, where the correction data corresponding to the current data stream includes the message information of the current data stream and the second data stream type corresponding to the current data stream, and the correction data is used as a training sample to update the behavior recognition model.
  • The correction data is obtained automatically by the device when the first data stream type corresponding to the current data stream and the second data stream type corresponding to the current data stream are different, without manual labeling, so sample data for training the behavior recognition model is obtained more efficiently.
  • The correction data is the message information and the accurate data stream type generated when the recognition result of the behavior recognition model is inaccurate. Therefore, subsequently updating the behavior recognition model based on the correction data yields a behavior recognition model with a more accurate recognition effect.
  • the second determining unit is specifically configured to: If the general feature of the current data stream is the same as the first general feature in the corresponding relationship, the data stream type corresponding to the first general feature is taken as the second data stream type corresponding to the current data stream.
  • the general feature is a well-known port number or a well-known domain name system DNS.
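  • How the second determining unit and the obtaining unit could cooperate is sketched below. This is an illustration under stated assumptions: the general feature is taken to be a destination port number, the entries of `TARGET_CORRESPONDENCE` are hypothetical placeholders rather than values from the application, and the `CorrectionData` layout is assumed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CorrectionData:
    """Training sample: message information plus the second data stream type."""
    message_info: Dict[str, float]
    second_type: str


# Hypothetical target correspondence: general feature (port) -> data stream type.
TARGET_CORRESPONDENCE: Dict[int, str] = {
    5060: "voice_conference",
    3389: "desktop_cloud",
}


def second_type_for(port: int) -> Optional[str]:
    """Return the data stream type whose general feature matches, if any."""
    return TARGET_CORRESPONDENCE.get(port)


def make_correction(first_type: str, port: int,
                    message_info: Dict[str, float]) -> Optional[CorrectionData]:
    """Produce correction data only when both types are known and differ."""
    second_type = second_type_for(port)
    if second_type is None or second_type == first_type:
        return None
    return CorrectionData(message_info=message_info, second_type=second_type)


info = {"message_length": 1200.0, "interval_ms": 20.0}
sample = make_correction("video_conference", 3389, info)
print(sample)
```

  • No sample is produced when the lookup fails or when the two types agree, which matches the idea that correction data captures only the cases where the behavior recognition model was wrong.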
  • the first sending unit is configured to send the correction data corresponding to the current data stream to the first device after the acquiring unit acquires the correction data corresponding to the current data stream, wherein the correction data corresponding to the current data stream Including the message information of the current data flow and the second data flow type corresponding to the current data flow;
  • The first receiving unit is configured to receive first model data sent by the first device, where the first model data is used to describe a new behavior recognition model obtained by the first device by training the behavior recognition model according to the message information of the current data stream and the second data stream type corresponding to the current data stream.
  • The operation of training a new behavior recognition model is implemented by the designated first device with strong computing capability, and the third device only needs to update its own behavior recognition model according to the new model parameters sent by the first device, so that the third device can use its main computing resources for message forwarding, which effectively guarantees the message forwarding performance of the third device.
  • the first update unit is configured to update the behavior recognition model according to the correction data after the acquisition unit acquires the correction data corresponding to the current data stream, so as to obtain a new behavior recognition model.
  • the operation of training the behavior recognition model is implemented by the third device, which is equivalent to using the behavior recognition model and the training behavior recognition model on the same device.
  • the first updating unit is specifically configured to:
  • when correction data corresponding to M data streams has been acquired in total, train the behavior recognition model according to the message information of the M data streams and the second data stream types respectively corresponding to the M data streams to obtain a new behavior recognition model, wherein the M data streams are the cumulative amount from when the behavior recognition model took effect to the current date, or the cumulative amount within a preset time period, or the ratio of the M data streams to the total number of data streams that have been transmitted after the behavior recognition model took effect exceeds a preset threshold; the M data streams include the current data stream.
  • The behavior recognition model is trained according to the message information of the M data streams and the second data stream types corresponding to the M data streams to obtain a new behavior recognition model.
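  • One possible reading of the accumulation conditions above is sketched here: retraining is triggered either when the number of accumulated corrected data streams reaches a count, or when their share of all transmitted data streams exceeds a ratio. The function name and the parameters `min_count` and `min_ratio` are hypothetical, and the preset time period variant is omitted.

```python
def should_retrain(corrected: int, total: int,
                   min_count: int, min_ratio: float) -> bool:
    """Decide whether enough correction data has accumulated to retrain.

    corrected: data streams with correction data since the model took effect
    total:     all data streams transmitted since the model took effect
    """
    if corrected >= min_count:
        return True
    return total > 0 and corrected / total > min_ratio


# Few corrections in absolute terms, but a large share of a small total.
print(should_retrain(corrected=30, total=100, min_count=500, min_ratio=0.2))
# Many corrections in absolute terms, although the share is small.
print(should_retrain(corrected=600, total=100000, min_count=500, min_ratio=0.2))
# Neither condition is met.
print(should_retrain(corrected=10, total=1000, min_count=500, min_ratio=0.2))
```

  • Batching corrections this way avoids retraining on every single mismatch, which would be costly on a forwarding device.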
  • the first update unit is specifically configured to:
  • train the behavior recognition model according to the message information of the M data streams, the second data stream types respectively corresponding to the M data streams, the message information of the Y data streams, and the second data stream types respectively corresponding to the Y data streams, to obtain a new behavior recognition model;
  • the Y data streams and the M data streams come from the same network, or,
  • the Y data streams and the M data streams come from at least two different networks, where the at least two different networks include two different local area networks, or the at least two different networks include two Different forms of networks, or the at least two different networks include networks of two different regions.
  • When the Y data streams and the M data streams come from at least two different networks, the behavior recognition model is trained according to the message information of the M data streams, the second data stream types respectively corresponding to the M data streams, the message information of the Y data streams, and the second data stream types respectively corresponding to the Y data streams, to obtain a new behavior recognition model.
  • the first update unit is specifically configured to:
  • train the behavior recognition model according to the modified message information of the M data streams, the modified message information of the Y data streams, the second data stream types respectively corresponding to the M data streams, and the second data stream types respectively corresponding to the Y data streams, to obtain a new behavior recognition model.
  • The message information of data streams from different networks is normalized to make the message information of data streams from different networks more comparable, so that the behavior recognition model has better generalization and higher prediction accuracy.
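  • The normalization step is not specified in detail here. As one assumed possibility, the sketch below applies min-max scaling separately within each network, so that a message-information feature such as message length lands on the same 0 to 1 scale regardless of the network it came from. The network names and values are hypothetical.

```python
from typing import Dict, List


def normalize_per_network(samples: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Min-max scale one feature independently within each network."""
    normalized: Dict[str, List[float]] = {}
    for network, values in samples.items():
        if not values:
            normalized[network] = []
            continue
        low, high = min(values), max(values)
        span = high - low
        # A constant feature carries no information; map it to 0.0.
        normalized[network] = [0.0 if span == 0 else (v - low) / span for v in values]
    return normalized


# Hypothetical message lengths observed in two different local area networks.
message_lengths = {
    "lan_a": [100.0, 200.0, 300.0],
    "lan_b": [1000.0, 1500.0, 2000.0],
}
scaled = normalize_per_network(message_lengths)
print(scaled)
```

  • After scaling, both networks yield the same relative profile even though their raw magnitudes differ by an order of magnitude, which is the comparability the passage refers to.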
  • the first determining unit is specifically configured to:
  • the content recognition model is a model obtained from the characteristic information and data stream type of one or more historical data streams, and the data stream type of the historical data stream is obtained according to the behavior recognition model.
  • the first data stream type corresponding to the current data stream is obtained according to the content recognition model and the behavior recognition model, and then the first data stream type is corrected to obtain the final data stream type of the current data stream.
  • The behavior recognition model is pre-trained from the message information and data stream types of multiple data stream samples, and the content recognition model is trained based on the feature information of the data stream and the data stream type recognized by the behavior recognition model. Therefore, by analyzing the feature information, message information, and so on through the content recognition model and the behavior recognition model, the first data stream type corresponding to the current data stream can be predicted more accurately.
  • Because the data stream type in the data stream samples used when training the content recognition model is recognized by the behavior recognition model, there is no need to collect the large amount of data otherwise required for training, which solves the problem of insufficient data integrity.
  • The first determining unit is specifically configured to:
  • determine the first data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.
  • The first determining unit is specifically configured to:
  • calculate a comprehensive confidence corresponding to the target data stream type according to the first confidence level corresponding to the target data stream type, the weight value of the first confidence level, the second confidence level corresponding to the target data stream type, and the weight value of the second confidence level, where the target data stream type is any one of the at least one data stream type;
  • if the comprehensive confidence corresponding to the target data stream type is greater than a first preset threshold, determine that the target data stream type is the first data stream type corresponding to the current data stream.
  • The second sending unit is configured to send the characteristic information of the current data stream and the second data stream type to the second device when the comprehensive confidence corresponding to the target data stream type is less than a second preset threshold, where the second preset threshold is greater than the first preset threshold;
  • The second receiving unit is configured to receive second model data sent by the second device, where the second model data is used to describe a new content recognition model obtained by the second device by training the content recognition model according to the feature information of the current data stream and the second data stream type.
  • The inventor of the present application updates the content recognition model by using the identification result of the data stream type of the current data stream. Specifically, a second preset threshold is introduced, and when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, information about the current data stream is sent to the second device for training to obtain a new content recognition model, so that the next determination result is more accurate.
  • The second update unit is configured to update the content recognition model according to the characteristic information of the current data stream and the second data stream type when the comprehensive confidence corresponding to the target data stream type is less than a second preset threshold, to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
  • The inventor of the present application updates the content recognition model by using the identification result of the data stream type of the current data stream. Specifically, a second preset threshold is introduced, and when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is used for training to obtain a new content recognition model, so that the next determination result is more accurate.
  • The third sending unit is configured to send information of the second data stream type corresponding to the current data stream to the operation and maintenance support system (OSS) after the second determining unit determines the second data stream type corresponding to the current data stream according to the target correspondence relationship and the general characteristics of the current data stream, where the information of the second data stream type of the current data stream is used by the OSS to generate a flow control policy for the current data stream.
  • After the data stream type of the current data stream is determined, the relevant information is notified to the OSS, so that the OSS can generate a flow control policy for the current data stream based on the data stream type of the current data stream. For example, when the first data stream type of the current data stream is a video stream of a video conference, the corresponding flow control policy is defined as a priority transmission policy; that is, when there are multiple data streams to be transmitted, the current data stream is transmitted first.
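  • The priority transmission example can be sketched as a simple ordering rule. Only the mapping from the video conference data stream type to a priority transmission policy comes from the text; the policy names, the default policy, and the stream identifiers are hypothetical, and a real OSS would apply far richer policies.

```python
from typing import Dict, List, Tuple

# Only the video conference entry is taken from the description above.
POLICY_BY_TYPE: Dict[str, str] = {"video_conference": "priority"}


def policy_for(stream_type: str) -> str:
    """Return the flow control policy for a data stream type (default assumed)."""
    return POLICY_BY_TYPE.get(stream_type, "default")


def transmission_order(streams: List[Tuple[str, str]]) -> List[str]:
    """Order stream names so that priority streams are transmitted first.

    sorted() is stable, so streams with the same policy keep their arrival order.
    """
    ordered = sorted(streams, key=lambda s: policy_for(s[1]) != "priority")
    return [name for name, _ in ordered]


pending = [
    ("stream_a", "desktop_cloud"),
    ("stream_b", "video_conference"),
    ("stream_c", "voice_conference"),
]
order = transmission_order(pending)
print(order)
```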
  • The message length includes one or more of the Ethernet frame length, the IP message length, the transmission protocol message length, and the header length in the message, and the transmission protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).
  • It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiment shown in FIG. 3.
  • FIG. 8 is an apparatus 80 for updating a data stream type identification model provided by an embodiment of the present invention.
  • The apparatus 80 may be the first device in the method embodiment shown in FIG. 3, or a component or module in the first device.
  • the device 80 may include a first receiving unit 801, an acquiring unit 802, and a first sending unit 803, wherein the detailed description of each unit is as follows.
  • The first receiving unit 801 is configured to receive correction data corresponding to the current data stream sent by the third device, where the correction data corresponding to the current data stream includes the message information of the current data stream and the second data stream type corresponding to the current data stream; the second data stream type corresponding to the current data stream is determined by the third device according to the target correspondence relationship and the general characteristics of the current data stream, and the target correspondence relationship is a correspondence between multiple general characteristics and multiple data stream types;
  • The acquiring unit 802 is configured to, when correction data corresponding to M data streams has been cumulatively received from the third device, train the behavior recognition model according to the correction data corresponding to the M data streams to obtain a new behavior recognition model;
  • The M data streams are the cumulative amount from when the behavior recognition model took effect to the current date, or the cumulative amount within a preset time period, or the ratio of the M data streams to the total number of data streams that have been transmitted after the behavior recognition model took effect exceeds a preset threshold; the M data streams include the current data stream;
  • The first sending unit 803 is configured to send first model data to the third device, where the first model data is used to describe the new behavior recognition model, the behavior recognition model is a model obtained based on the message information and data stream types of multiple data stream samples, and the behavior recognition model is used to determine the data stream type of a data stream to be predicted according to the input message information of the data stream to be predicted; the message information includes one or more of message length, message transmission speed, message interval time, and message direction.
  • When the first device has accumulated a certain amount of correction data from the third device, it trains the behavior recognition model based on that correction data to obtain a new behavior recognition model, and then sends to the third device information describing the new behavior recognition model, so that the third device can update the behavior recognition model on the third device.
  • the above method does not require the third device to perform model training, and only needs to directly obtain a new behavior recognition model based on the model training result of the first device, which is beneficial for the third device to make full use of computing resources to identify the data stream type.
  • the general feature is a well-known port number or a well-known domain name system DNS.
  • The correction data corresponding to the current data stream is obtained by the third device when the first data stream type corresponding to the current data stream and the second data stream type corresponding to the current data stream are different.
  • The first data stream type corresponding to the current data stream is determined by the third device according to the message information of the current data stream, the characteristic information of the current data stream, the behavior recognition model, and the content recognition model; the feature information includes one or more of the destination address and the protocol type, the content recognition model is a model obtained based on the feature information and data stream types of one or more historical data streams, and the data stream type of the historical data stream is obtained according to the behavior recognition model.
  • the acquiring unit is specifically configured to:
  • the Y data streams and the M data streams are from the same network, or,
  • the Y data streams and the M data streams come from at least two different networks, where the at least two different networks include two different local area networks, or the at least two different networks include two Different forms of networks, or the at least two different networks include networks of two different regions.
  • the acquiring unit is specifically configured to:
  • train the behavior recognition model according to the modified message information of the M data streams, the modified message information of the Y data streams, the second data stream types respectively corresponding to the M data streams, and the second data stream types respectively corresponding to the Y data streams, to obtain a new behavior recognition model.
  • The message information of data streams from different networks is normalized to make the message information of data streams from different networks more comparable, so that the behavior recognition model has better generalization and higher prediction accuracy.
  • the second receiving unit is configured to receive the characteristic information of the current data stream and the information of the second data stream type sent by the third device;
  • a generating unit configured to train the content recognition model according to the feature information of the current data stream and the second data stream type to obtain a new content recognition model
  • The second sending unit is configured to send second model data to the third device, where the second model data is used to describe the new content recognition model, the content recognition model is a model obtained based on the feature information and data stream types of one or more historical data streams, and the content recognition model is used to estimate the data stream type of a data stream to be predicted according to the input characteristic information of the data stream to be predicted. The data stream type of the historical data stream is obtained according to a behavior recognition model, which is a model obtained according to the message information and data stream types of multiple data stream samples; the message information includes one or more of message length, message transmission speed, message interval time, and message direction, and the characteristic information includes one or more of destination address and protocol type.
  • In the process of the third device identifying the data stream type through the trained content recognition model, if the accuracy of the model is found to be low, the first device is triggered to retrain the content recognition model in combination with the related data and, after a new content recognition model is trained, to update the content recognition model on the third device.
  • This iteratively updated content recognition model can meet the differentiated needs of different users, different networks, and different scenarios, with better generalization and greater versatility.
  • each unit may also correspond to the corresponding description of the method embodiment shown in FIG. 3.
  • FIG. 9 is a device 90 provided by an embodiment of the present invention.
  • the device 90 may be the third device in the method embodiment shown in FIG. 3.
  • the device 90 includes a processor 901, a memory 902, and a communication interface 903.
  • the processor 901, the memory 902, and the communication interface 903 are connected to each other through a bus.
  • the memory 902 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM); the memory 902 is used to store related computer programs and data.
  • the communication interface 903 is used to receive and send data.
  • the processor 901 may be one or more central processing units (CPU).
  • the CPU may be a single-core CPU or a multi-core CPU.
  • the processor 901 reads the computer program code stored in the memory 902, and is used to perform the following operations:
  • the message information includes one or more of message length, message transmission speed, message interval time, and message direction.
  • the behavior recognition model is a model obtained by training the message information and data stream types of multiple data stream samples;
  • the correction data corresponding to the current data stream is acquired, where the correction data includes the message information of the current data stream and the second data stream type corresponding to the current data stream, and the correction data is used as a training sample to update the behavior recognition model.
  • the correction data is obtained automatically by the device when the first data stream type corresponding to the current data stream differs from the second data stream type corresponding to the current data stream, without manual labeling, so sample data for training the behavior recognition model is obtained more efficiently.
  • the correction data consists of the message information and the accurate data stream type produced when the recognition result of the behavior recognition model is inaccurate, so a subsequent update of the behavior recognition model based on the correction data yields a behavior recognition model with a more accurate recognition effect.
  • the processor is specifically configured to:
  • the data stream type corresponding to the first general feature is taken as the second data stream type corresponding to the current data stream.
  • the general feature is a well-known port number or a well-known domain name system DNS.
  • the processor is further configured to:
  • the correction data corresponding to the current data stream is sent to the first device through the communication interface, where the correction data includes the message information of the current data stream and the second data stream type corresponding to the current data stream.
  • first model data sent by the first device is received, where the first model data is used to describe a new behavior recognition model that the first device obtains by training the behavior recognition model according to the message information of the current data stream and the second data stream type corresponding to the current data stream.
  • the operation of training a new behavior recognition model is performed by the designated first device, which has strong computing capability, and the third device only needs to update its own behavior recognition model according to the new model parameters sent by the first device, so the third device can devote its main computing resources to message forwarding, which effectively guarantees its message forwarding performance.
  • the processor is specifically configured to: update the behavior recognition model according to the correction data to obtain a new behavior recognition model.
  • the operation of training the behavior recognition model is performed by the third device, which means that the behavior recognition model is used and trained on the same device.
  • the processor is specifically configured to:
  • if there are currently M accumulated data streams whose first data stream types differ from their second data stream types, the behavior recognition model is trained according to the message information of the M data streams and the second data stream types respectively corresponding to the M data streams to obtain a new behavior recognition model, where the M data streams are the cumulative amount from the time the behavior recognition model took effect to the current time, or the cumulative amount within a preset time period, or the ratio of the M data streams to the total number of data streams transmitted since the behavior recognition model took effect exceeds a preset threshold; the M data streams include the current data stream, and M is a preset reference threshold.
  • the behavior recognition model is trained according to the message information of the M data streams and the second data stream types respectively corresponding to the M data streams to obtain a new behavior recognition model.
  • the processor is specifically configured to:
  • the behavior recognition model is trained according to the message information of the M data streams, the second data stream types respectively corresponding to the M data streams, the message information of Y data streams, and the second data stream types respectively corresponding to the Y data streams, to obtain a new behavior recognition model;
  • the Y data streams and the M data streams come from the same network, or,
  • the Y data streams and the M data streams come from at least two different networks, where the at least two different networks include two different local area networks, or two networks of different forms, or networks of two different regions.
  • if the Y data streams and the M data streams come from at least two different networks, in terms of training the behavior recognition model according to the message information of the M data streams, the second data stream types respectively corresponding to the M data streams, the message information of the Y data streams, and the second data stream types respectively corresponding to the Y data streams to obtain a new behavior recognition model,
  • the processor is specifically used for:
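  • the message information of the Y data streams is corrected according to the difference between the network configuration of the second network, to which the Y data streams belong, and that of the first network, to which the M data streams belong, to obtain corrected message information of the Y data streams;
  • the behavior recognition model is trained according to the message information of the M data streams, the corrected message information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.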
  • the message information of data streams from different networks is normalized, which makes it more comparable across networks.
  • the behavior recognition model trained on the normalized message information generalizes better and predicts more accurately.
  • the processor is specifically configured to:
  • the content recognition model is a model obtained from the characteristic information and data stream type of one or more historical data streams, and the data stream type of the historical data stream is obtained according to the behavior recognition model.
  • the first data stream type corresponding to the current data stream is obtained according to the content recognition model and the behavior recognition model, and then the first data stream type is corrected to obtain the final data stream type of the current data stream.
  • the behavior recognition model is pre-trained from the message information and data stream types of multiple data stream samples, and the content recognition model is trained based on the characteristic information of data streams and the data stream types recognized by the behavior recognition model; therefore, analyzing the characteristic information, message information, and so on through the content recognition model and the behavior recognition model predicts the first data stream type corresponding to the current data stream more accurately.
  • because the data stream types in the data stream samples used to train the content recognition model are recognized by the behavior recognition model, there is no need to collect a large amount of training data, which solves the problem of insufficient data integrity.
  • the processor is specifically configured to:
  • the first data stream type of the current data stream is determined according to the at least one first confidence level and the at least one second confidence level.
  • the processor is specifically configured to:
  • a comprehensive confidence corresponding to the target data stream type is calculated according to the first confidence level corresponding to the target data stream type, the weight value of the first confidence level, the second confidence level corresponding to the target data stream type, and the weight value of the second confidence level, where the target data stream type is any one of the at least one data stream type;
  • if the comprehensive confidence corresponding to the target data stream type is greater than a first preset threshold, it is determined that the target data stream type is the first data stream type corresponding to the current data stream.
  • the processor is further configured to:
  • if the comprehensive confidence corresponding to the target data stream type is less than a second preset threshold, the characteristic information of the current data stream and the second data stream type are sent to the second device, where the second preset threshold is greater than the first preset threshold.
  • the inventor of the present application uses the identification result of the data stream type of the current data stream to update the content recognition model. Specifically, a second preset threshold is introduced, and when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, information about the current data stream is sent to the second device for training to obtain a new content recognition model, so that the next determination result is more accurate.
  • the processor is further configured to:
  • if the comprehensive confidence corresponding to the target data stream type is less than a second preset threshold, the content recognition model is updated according to the characteristic information of the current data stream and the second data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
  • the inventor of the present application uses the identification result of the data stream type of the current data stream to update the content recognition model. Specifically, a second preset threshold is introduced, and when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the relevant information of the current data stream is used for training to obtain a new content recognition model, so that the next determination result is more accurate.
  • the processor is further configured to:
  • the second data stream type corresponding to the current data stream is sent to the operation and maintenance support system (OSS) through the communication interface, and the OSS uses the information of the second data stream type of the current data stream to generate a flow control strategy for the current data stream.
  • after the data stream type of the current data stream is determined, the relevant information is notified to the OSS, so that the OSS can generate a flow control strategy for the current data stream based on its data stream type; for example, when the first data stream type of the current data stream is the video stream of a video conference, the corresponding flow control strategy is defined as a priority transmission strategy, that is, when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.
  • the message length includes one or more of the Ethernet frame length, the IP message length, the transmission protocol message length, and the header length in the message.
  • the transmission protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).
  • each operation may also correspond to the corresponding description of the method embodiment shown in FIG. 3.
  • FIG. 10 is a device 100 provided by an embodiment of the present invention.
  • the device 100 may be the first device in the method embodiment shown in FIG. 3.
  • the device 100 includes a processor 1001, a memory 1002, and a communication interface 1003.
  • the processor 1001, the memory 1002, and the communication interface 1003 are connected to each other through a bus.
  • the memory 1002 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM); the memory 1002 is used to store related computer programs and data.
  • the communication interface 1003 is used to receive and send data.
  • the processor 1001 may be one or more central processing units (CPUs).
  • when the processor 1001 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
  • the processor 1001 reads the computer program code stored in the memory 1002, and is used to perform the following operations:
  • the correction data corresponding to the current data stream sent by the third device is received through the communication interface, where the correction data corresponding to the current data stream includes the message information of the current data stream and the second data stream type corresponding to the current data stream; the second data stream type corresponding to the current data stream is determined by the third device according to the target correspondence and the general feature of the current data stream, and the target correspondence is a correspondence between multiple general features and multiple data stream types;
  • if correction data corresponding to M data streams from the third device has been received cumulatively, the behavior recognition model is trained according to the correction data corresponding to the M data streams to obtain a new behavior recognition model; the M data streams are the cumulative amount from the time the behavior recognition model took effect to the current time, or the cumulative amount within a preset time period, or the ratio of the M data streams to the total number of data streams transmitted since the behavior recognition model took effect exceeds a preset threshold; the M data streams include the current data stream;
  • the first model data is used to describe the new behavior recognition model;
  • the behavior recognition model is a model obtained according to the message information and data stream types of multiple data stream samples, and is used to determine the data stream type of a data stream to be predicted according to the input message information of that data stream; the message information includes one or more of message length, message transmission speed, message interval time, and message direction.
  • when the first device has accumulated a certain amount of correction data from the third device, it trains the behavior recognition model based on that correction data to obtain a new behavior recognition model and, after training the new behavior recognition model, sends information describing it to the third device, so that the third device can update the behavior recognition model on the third device.
  • the above method does not require the third device to perform model training; the third device only needs to obtain the new behavior recognition model directly from the model training result of the first device, which helps the third device make full use of its computing resources to identify data stream types.
  • the general feature is a well-known port number or a well-known domain name system DNS.
  • the correction data corresponding to the current data stream is sent by the third device when the first data stream type corresponding to the current data stream and the second data stream type corresponding to the current data stream are different data stream types; the first data stream type corresponding to the current data stream is determined by the third device according to the message information and characteristic information of the current data stream, the behavior recognition model, and the content recognition model; the characteristic information includes one or more of destination address and protocol type, the content recognition model is obtained based on the characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained according to the behavior recognition model.
  • in terms of training the behavior recognition model according to the correction data corresponding to the M data streams to obtain a new behavior recognition model, the processor is specifically configured to:
  • the Y data streams and the M data streams are from the same network, or,
  • the Y data streams and the M data streams come from at least two different networks, where the at least two different networks include two different local area networks, or two networks of different forms, or networks of two different regions.
  • the processor is specifically configured to:
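  • the message information of the Y data streams is corrected according to the difference between the network configuration of the second network, to which the Y data streams belong, and that of the first network, to which the M data streams belong, to obtain corrected message information of the Y data streams;
  • the behavior recognition model is trained according to the message information of the M data streams, the corrected message information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.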
  • the message information of data streams from different networks is normalized, which makes it more comparable across networks.
  • the behavior recognition model trained on the normalized message information generalizes better and predicts more accurately.
  • the processor is further configured to:
  • the second model data is used to describe the new content recognition model;
  • the content recognition model is a model obtained according to the characteristic information and data stream types of one or more historical data streams, and is used to estimate the data stream type of a data stream to be predicted according to the input characteristic information of that data stream, where the data stream types of the historical data streams are obtained according to a behavior recognition model, which is a model obtained according to the message information and data stream types of multiple data stream samples;
  • the message information includes one or more of message length, message transmission speed, message interval time, and message direction, and the characteristic information includes one or more of destination address and protocol type.
  • while the third device identifies data stream types through the trained content recognition model, if the accuracy of the model is found to be low, the first device is triggered to retrain the content recognition model with the related data and, after a new content recognition model is trained, the content recognition model on the third device is updated.
  • This iteratively updated content recognition model can meet the differentiated needs of different users, different networks, and different scenarios, and has better generalization and stronger versatility.
  • each operation may also correspond to the corresponding description of the method embodiment shown in FIG. 3.
  • the device embodiments described above are only illustrative: the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that there is a communication connection between them, which can be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement it without creative work.
  • An embodiment of the present invention also provides a chip system, the chip system includes at least one processor, a memory, and an interface circuit.
  • the memory, the transceiver, and the at least one processor are interconnected by wires, and a computer program is stored in the at least one memory; when the computer program is executed by the processor, the method flow shown in FIG. 3 is realized.
  • the embodiment of the present invention also provides a computer-readable storage medium in which a computer program is stored, and when it runs on a processor, the method flow shown in FIG. 3 is implemented.
  • an embodiment of the present invention also provides a computer program product; when the computer program product runs on a processor, the method flow shown in FIG. 3 is realized.
  • after the first data stream type is identified according to the behavior recognition model and the second data stream type is identified according to the preset correspondence about the general features, correction data for updating the behavior recognition model, that is, a training sample, is generated if the first data stream type and the second data stream type differ.
  • the correction data is obtained automatically by the device when the first data stream type corresponding to the current data stream differs from the second data stream type corresponding to the current data stream, without manual labeling, so sample data for training the behavior recognition model is obtained more efficiently.
  • the correction data consists of the message information and the accurate data stream type produced when the recognition result of the behavior recognition model is inaccurate, so a subsequent update of the behavior recognition model based on the correction data yields a behavior recognition model with a more accurate recognition effect.
  • the first data stream type corresponding to the current data stream can be obtained according to the content recognition model and the behavior recognition model, and then the first data stream type can be corrected to obtain the final data stream type of the current data stream.
  • the behavior recognition model is obtained by pre-training on the message information and data stream types of multiple data stream samples, and the content recognition model is obtained by training based on the characteristic information of data streams and the data stream types recognized by the behavior recognition model; therefore, analyzing the characteristic information, message information, and so on through the content recognition model and the behavior recognition model predicts the first data stream type corresponding to the current data stream more accurately.
  • because the data stream types in the data stream samples used to train the content recognition model are recognized by the behavior recognition model, there is no need to collect a large amount of training data, which solves the problem of insufficient data integrity.
  • the first device or the third device retrains the behavior recognition model with the relevant data that caused the deviation, and updates the behavior recognition model on the third device after the new behavior recognition model is trained.
  • This iteratively updated behavior recognition model can meet the differentiated needs of different users, different networks, and different scenarios, with better generalization and greater versatility.
  • after the comprehensive confidence is compared with the preset update threshold θ2 and an update is found to be needed, the update is performed only if the need to update has occurred multiple times.
  • the second device or the third device retrains the content recognition model with the relevant data whose comprehensive confidence is lower than θ2, and updates the content recognition model on the third device after the new content recognition model is trained.
  • This iteratively updated content recognition model can meet the differentiated needs of different users, different networks, and different scenarios, with better generalization and greater versatility.
  • the computer program can be stored in a computer readable storage medium.
  • when executed, the computer program may include the procedures of the foregoing method embodiments.
  • the aforementioned storage media include media that can store computer program code, such as ROM, random access memory (RAM), magnetic disks, or optical disks.
  • the first device, the first confidence level, the first data stream type, the first preset threshold, the second information, and the "first" in the first record mentioned in the embodiments of the present invention are only used for name identification and do not represent the first in order. This rule also applies to "second", "third", and "fourth". However, the "first" in the first identifier mentioned in the embodiments of the present invention does represent the first in order. This rule also applies to the "Nth".

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

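Embodiments of this application provide a data stream type identification model updating method and a related device. The method includes: determining, according to message information of a current data stream and a behavior recognition model, a first data stream type corresponding to the current data stream; determining, according to a target correspondence and a general feature of the current data stream, a second data stream type corresponding to the current data stream, where the target correspondence is a correspondence between multiple general features and multiple data stream types; and if the first data stream type corresponding to the current data stream differs from the second data stream type corresponding to the current data stream, obtaining correction data corresponding to the current data stream, where the correction data includes the message information of the current data stream and the second data stream type corresponding to the current data stream, and the correction data is used as a training sample to update the behavior recognition model. This way of obtaining training samples for updating the behavior recognition model is more efficient.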

Description

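Data stream type identification model updating method and related device
This application claims priority to Chinese Patent Application No. 202010130637.8, filed with the China Patent Office on February 28, 2020 and entitled "Data stream type identification model updating method and related device", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the fields of computer technology and communications, further relates to the application of artificial intelligence (AI) in those fields, and in particular relates to a data stream type identification model updating method and a related device.
Background
With the rapid development of computer technology, more and more enterprises use private office applications for work; desktop cloud, voice conferencing, and video conferencing are all private office applications. To schedule the traffic of each service reasonably and improve service reliability, QoS priorities, real-time route selection, and the like usually need to be configured properly. The precondition for configuring QoS priorities, real-time route selection, and the like is knowing which type of application (that is, which data stream type) the current office application belongs to.
Currently, the type of an office application is mainly learned as follows: sample data is collected in advance and then labeled manually or with third-party tools; a model is then trained offline on the labeled sample data using machine learning or neural network algorithms; and the model obtained through offline training is used to predict the application type of traffic on the live network. However, obtaining training samples through manual labeling is inefficient.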
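Summary
Embodiments of the present invention disclose a data stream type identification model updating method and a related device, which can obtain training samples for updating a behavior recognition model more efficiently.
According to a first aspect, an embodiment of this application provides a data stream type identification model updating method, and the method includes:
determining, according to message information of a current data stream and a behavior recognition model, a first data stream type corresponding to the current data stream, where the message information includes one or more of message length, message transmission speed, message interval time, and message direction, and the behavior recognition model is a model obtained by training on the message information and data stream types of multiple data stream samples;
determining, according to a target correspondence and a general feature of the current data stream, a second data stream type corresponding to the current data stream, where the target correspondence is a correspondence between multiple general features and multiple data stream types; and
if the first data stream type corresponding to the current data stream differs from the second data stream type corresponding to the current data stream, obtaining correction data corresponding to the current data stream, where the correction data corresponding to the current data stream includes the message information of the current data stream and the second data stream type corresponding to the current data stream, and the correction data is used as a training sample to update the behavior recognition model.
In the foregoing method, after the first data stream type is identified according to the behavior recognition model and the second data stream type is identified according to the preset correspondence about general features, correction data for updating the behavior recognition model, that is, a training sample, is generated if the first data stream type and the second data stream type differ. On the one hand, the correction data is obtained automatically by the device when the first data stream type corresponding to the current data stream differs from the second data stream type corresponding to the current data stream, without manual labeling, so sample data for training the behavior recognition model is obtained more efficiently. On the other hand, the correction data consists of the message information and the accurate data stream type produced when the recognition result of the behavior recognition model is inaccurate, so subsequently updating the behavior recognition model based on the correction data yields a behavior recognition model with a more accurate recognition effect.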
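The comparison step of the first aspect can be sketched as follows. This is only a minimal illustration: the names `CorrectionSample` and `make_correction_sample`, and the representation of message information as a dictionary, are assumptions made for the example and are not defined by this application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CorrectionSample:
    """One training sample: message information plus the trusted label (hypothetical type)."""
    message_info: dict
    data_stream_type: str


def make_correction_sample(message_info: dict,
                           first_type: str,
                           second_type: Optional[str]) -> Optional[CorrectionSample]:
    """Return correction data only when the two identification results differ."""
    if second_type is None:
        # Assumption: no general feature matched, so there is no label to correct with.
        return None
    if first_type == second_type:
        # The behavior recognition model agreed with the correspondence; nothing to correct.
        return None
    # The second data stream type is used as the label of the new training sample.
    return CorrectionSample(message_info=message_info, data_stream_type=second_type)
```

In this sketch no sample is produced when the two results agree, which mirrors the condition that correction data is obtained only when the first and second data stream types are different.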
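With reference to the first aspect, in a first possible implementation of the first aspect, the determining, according to a target correspondence and a general feature of the current data stream, a second data stream type corresponding to the current data stream includes: if the general feature of the current data stream is the same as a first general feature in the correspondence, using the data stream type corresponding to the first general feature as the second data stream type corresponding to the current data stream.
With reference to the first aspect, in a second possible implementation of the first aspect, the general feature is a well-known port number or a well-known domain name system (DNS).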
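As an illustration of such a lookup, the target correspondence can be pictured as a table from general features to data stream types. The table contents below (the ports, the domain name `meeting.example.com`, and the type labels) are hypothetical values chosen for the example, as is the function name `second_data_stream_type`.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical target correspondence: general feature -> data stream type.
TARGET_CORRESPONDENCE = {
    ("port", 5060): "voice_conference",
    ("port", 3389): "desktop_cloud",
    ("dns", "meeting.example.com"): "video_conference",
}


def second_data_stream_type(general_features: Iterable[Tuple[str, object]]) -> Optional[str]:
    """Return the type mapped to the first general feature found in the correspondence."""
    for feature in general_features:
        if feature in TARGET_CORRESPONDENCE:
            return TARGET_CORRESPONDENCE[feature]
    # Assumption: None signals that no general feature of the flow is in the table.
    return None
```

A dictionary keeps the match exact, which corresponds to the condition that the general feature of the current data stream is "the same as" a general feature in the correspondence.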
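With reference to the first aspect, in a third possible implementation of the first aspect,
after the obtaining correction data corresponding to the current data stream, the method further includes:
sending the correction data corresponding to the current data stream to a first device, where the correction data corresponding to the current data stream includes the message information of the current data stream and the second data stream type corresponding to the current data stream; and
receiving first model data sent by the first device, where the first model data is used to describe a new behavior recognition model obtained by the first device by training the behavior recognition model according to the message information of the current data stream and the second data stream type corresponding to the current data stream.
In this method, the operation of training a new behavior recognition model is performed by a designated first device with strong computing capability, and the third device only needs to update its own behavior recognition model according to the new model parameters sent by the first device. In this way, the third device can devote its main computing resources to message forwarding, which effectively guarantees the message forwarding performance of the third device.
With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a fourth possible implementation of the first aspect, after the obtaining correction data corresponding to the current data stream, the method further includes: updating the behavior recognition model according to the correction data to obtain a new behavior recognition model.
In this method, the operation of training the behavior recognition model is performed by the third device, which means that the behavior recognition model is used and trained on the same device.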
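With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a fifth possible implementation of the first aspect, the updating the behavior recognition model according to the correction data to obtain a new behavior recognition model includes:
if there are currently M accumulated data streams whose corresponding first data stream types differ from their corresponding second data stream types, training the behavior recognition model according to the message information of the M data streams and the second data stream types respectively corresponding to the M data streams to obtain a new behavior recognition model, where the M data streams are the cumulative amount from the time the behavior recognition model took effect to the current time, or the cumulative amount within a preset time period, or the ratio of the M data streams to the total number of data streams transmitted since the behavior recognition model took effect exceeds a preset threshold; and the M data streams include the current data stream.
In this method, updating the behavior recognition model is subject to a trigger condition, namely whether there are currently M accumulated data streams whose first data stream types differ from their second data stream types. Configuring M properly avoids the unnecessary computing overhead of updating the behavior recognition model too frequently, and also avoids inaccurate prediction by the behavior recognition model caused by updating it too rarely.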
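The three alternative trigger conditions can be sketched as a small counter. This is a hedged illustration only: the class name `UpdateTrigger`, its parameters, and the choice to select one condition through optional arguments are assumptions of the example, not the structure used by this application.

```python
import time
from collections import deque


class UpdateTrigger:
    """Tracks mismatched data streams and decides when to retrain (hypothetical helper)."""

    def __init__(self, m, window_seconds=None, ratio_threshold=None):
        self.m = m                              # preset reference threshold M
        self.window_seconds = window_seconds    # optional preset time period
        self.ratio_threshold = ratio_threshold  # optional preset ratio threshold
        self.mismatch_times = deque()           # arrival times of mismatched streams
        self.total_flows = 0                    # streams seen since the model took effect

    def record(self, mismatched, now=None):
        """Register one data stream and whether its two identified types differed."""
        now = time.time() if now is None else now
        self.total_flows += 1
        if mismatched:
            self.mismatch_times.append(now)

    def should_update(self, now=None):
        """Evaluate whichever of the three conditions this trigger was configured with."""
        now = time.time() if now is None else now
        if self.ratio_threshold is not None:
            # Mismatched share of all streams transmitted since the model took effect.
            if self.total_flows == 0:
                return False
            return len(self.mismatch_times) / self.total_flows > self.ratio_threshold
        if self.window_seconds is not None:
            # Cumulative amount within the preset time period.
            recent = sum(1 for t in self.mismatch_times if now - t <= self.window_seconds)
            return recent >= self.m
        # Cumulative amount since the model took effect.
        return len(self.mismatch_times) >= self.m
```

A larger M (or ratio threshold) makes retraining rarer and cheaper, while a smaller one reacts faster to drift, which is the trade-off described above.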
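With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a sixth possible implementation of the first aspect, the training the behavior recognition model according to the message information of the M data streams and the second data stream types respectively corresponding to the M data streams to obtain a new behavior recognition model includes:
training the behavior recognition model according to the message information of the M data streams, the second data stream types respectively corresponding to the M data streams, message information of Y data streams, and second data stream types respectively corresponding to the Y data streams, to obtain a new behavior recognition model;
where the Y data streams and the M data streams come from the same network, or
the Y data streams and the M data streams come from at least two different networks, where the at least two different networks include two different local area networks, or two networks of different forms, or networks of two different regions.
It can be understood that training on the related information of data streams from different networks improves the generalization of the behavior recognition model and gives a better prediction effect.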
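With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a seventh possible implementation of the first aspect,
if the Y data streams and the M data streams come from at least two different networks, the training the behavior recognition model according to the message information of the M data streams, the second data stream types respectively corresponding to the M data streams, the message information of the Y data streams, and the second data stream types respectively corresponding to the Y data streams to obtain a new behavior recognition model includes:
correcting the message information of the Y data streams according to the difference between the network configuration of a second network, to which the Y data streams belong, and that of a first network, to which the M data streams belong, to obtain corrected message information of the Y data streams; and
training the behavior recognition model according to the message information of the M data streams, the corrected message information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.
In this method, the message information of data streams from different networks is normalized so that it is more comparable across networks, and the behavior recognition model trained on the normalized message information generalizes better and predicts more accurately.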
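This application does not specify which network configuration differences are corrected or how. Purely as an illustration, the sketch below assumes that the two networks differ in per-frame link-layer overhead (for example, 18 bytes of Ethernet header and frame check sequence without a VLAN tag versus 22 bytes with an IEEE 802.1Q tag) and shifts the recorded message lengths accordingly. The function name `correct_message_info` and the dictionary keys are hypothetical.

```python
def correct_message_info(message_info, source_net, target_net):
    """Express message lengths measured in source_net as if measured in target_net.

    Assumption of this sketch: the only configuration difference is a constant
    per-frame overhead, stored under the hypothetical key "overhead_bytes".
    """
    delta = target_net["overhead_bytes"] - source_net["overhead_bytes"]
    corrected = dict(message_info)  # shallow copy; the input is left untouched
    corrected["message_lengths"] = [
        length + delta for length in message_info["message_lengths"]
    ]
    return corrected
```

After such a correction, the Y data streams from the second network and the M data streams from the first network describe message length on the same scale before they are mixed into one training set.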
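With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in an eighth possible implementation of the first aspect, the determining, according to message information of a current data stream and a behavior recognition model, a first data stream type corresponding to the current data stream includes:
determining, according to the message information and characteristic information of the current data stream, the behavior recognition model, and a content recognition model, the first data stream type corresponding to the current data stream, where the characteristic information includes one or more of destination address and protocol type; the content recognition model is a model obtained from the characteristic information and data stream types of one or more historical data streams, and the data stream types of the historical data streams are obtained according to the behavior recognition model.
In the foregoing method, the first data stream type corresponding to the current data stream is obtained according to the content recognition model and the behavior recognition model, and the first data stream type is then corrected to obtain the final data stream type of the current data stream. The behavior recognition model is pre-trained from the message information and data stream types of multiple data stream samples, and the content recognition model is trained based on the characteristic information of data streams and the data stream types recognized by the behavior recognition model; therefore, analyzing the characteristic information, message information, and so on through the content recognition model and the behavior recognition model predicts the first data stream type corresponding to the current data stream more accurately. In addition, because the data stream types in the data stream samples used to train the content recognition model are recognized by the behavior recognition model, there is no need to collect a large amount of training data, which solves the problem of insufficient data integrity.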
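To illustrate how a content recognition model can be trained from labels produced by the behavior recognition model rather than from manually collected data, the sketch below uses a simple frequency table keyed by destination address and protocol type. The model form is an assumption of this example (this application does not state which model is used), and the class name `ContentRecognitionModel` is hypothetical.

```python
from collections import Counter, defaultdict


class ContentRecognitionModel:
    """Hypothetical frequency-table model keyed by (destination address, protocol type)."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def train(self, characteristic_info, data_stream_type):
        """Add one historical data stream labeled by the behavior recognition model."""
        self._counts[characteristic_info][data_stream_type] += 1

    def confidences(self, characteristic_info):
        """Return a confidence per data stream type, or an empty dict if the key is unseen."""
        counts = self._counts.get(characteristic_info)
        if not counts:
            return {}
        total = sum(counts.values())
        return {stream_type: n / total for stream_type, n in counts.items()}
```

For example, after three historical streams to `("10.0.0.8", "UDP")` were labeled as a video conference and one as desktop cloud, the model would report confidences of 0.75 and 0.25 for that destination and protocol.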
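With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a ninth possible implementation of the first aspect, the determining, according to the message information and characteristic information of the current data stream, the behavior recognition model, and the content recognition model, the first data stream type corresponding to the current data stream includes:
obtaining, according to the message information of the current data stream and the behavior recognition model, at least one first confidence level of the current data stream corresponding to at least one data stream type;
obtaining, according to the characteristic information of the current data stream and the content recognition model, at least one second confidence level of the current data stream corresponding to the at least one data stream type; and
determining the first data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level.
With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a tenth possible implementation of the first aspect, the determining the first data stream type of the current data stream according to the at least one first confidence level and the at least one second confidence level includes:
calculating a comprehensive confidence corresponding to a target data stream type according to the first confidence level corresponding to the target data stream type, the weight value of the first confidence level, the second confidence level corresponding to the target data stream type, and the weight value of the second confidence level, where the target data stream type is any one of the at least one data stream type; and
if the comprehensive confidence corresponding to the target data stream type is greater than a first preset threshold, determining that the target data stream type is the first data stream type corresponding to the current data stream.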
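The combination of the two confidence levels can be sketched as below. The text above names the inputs (two confidence levels and two weight values) but not the formula, so the weighted sum used here is an assumption, as are the function names, the default weights, and the default threshold `theta1`.

```python
def comprehensive_confidences(first_conf, second_conf, w1, w2):
    """Combine per-type confidences; a weighted sum is assumed for illustration."""
    types = set(first_conf) | set(second_conf)
    return {
        t: w1 * first_conf.get(t, 0.0) + w2 * second_conf.get(t, 0.0)
        for t in types
    }


def first_data_stream_type(first_conf, second_conf, w1=0.5, w2=0.5, theta1=0.5):
    """Return (type, comprehensive confidence); type is None if the threshold is not exceeded."""
    combined = comprehensive_confidences(first_conf, second_conf, w1, w2)
    if not combined:
        return None, 0.0
    best_type = max(combined, key=combined.get)
    if combined[best_type] > theta1:
        return best_type, combined[best_type]
    # No type exceeds the first preset threshold.
    return None, combined[best_type]
```

With first confidence levels of 0.8 and 0.2 and second confidence levels of 0.6 and 0.4 for two candidate types, equal weights give comprehensive confidences of about 0.7 and 0.3, so the first candidate is selected under the assumed threshold of 0.5.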
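With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in an eleventh possible implementation of the first aspect, the method further includes:
if the comprehensive confidence corresponding to the target data stream type is less than a second preset threshold, sending the characteristic information of the current data stream and the second data stream type to a second device, where the second preset threshold is greater than the first preset threshold; and
receiving second model data sent by the second device, where the second model data is used to describe a new content recognition model obtained by the second device by training the content recognition model according to the characteristic information of the current data stream and the second data stream type.
In the foregoing method, the inventor of this application uses the identification result of the data stream type of the current data stream to update the content recognition model. Specifically, a second preset threshold is introduced, and when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the related information of the current data stream is sent to the second device for training to obtain a new content recognition model, so that the next determination result is more accurate.
With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a twelfth possible implementation of the first aspect, the method further includes:
if the comprehensive confidence corresponding to the target data stream type is less than a second preset threshold, updating the content recognition model according to the characteristic information of the current data stream and the second data stream type to obtain a new content recognition model, where the second preset threshold is greater than the first preset threshold.
In the foregoing method, the inventor of this application uses the identification result of the data stream type of the current data stream to update the content recognition model. Specifically, a second preset threshold is introduced, and when the comprehensive confidence corresponding to the first data stream type is less than the second preset threshold, the related information of the current data stream is used for training to obtain a new content recognition model, so that the next determination result is more accurate.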
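The role of the second preset threshold can be illustrated with a short sketch. The function names, the dictionary layout of the returned sample, and the default threshold values 0.5 and 0.8 are assumptions of the example; only the relationship "second threshold greater than first threshold" comes from the text above.

```python
def needs_content_model_update(comprehensive_confidence, theta2):
    """True when the comprehensive confidence falls below the second preset threshold."""
    return comprehensive_confidence < theta2


def content_training_sample(characteristic_info, second_type, comprehensive_confidence,
                            theta1=0.5, theta2=0.8):
    """Return the sample used to retrain the content recognition model, or None."""
    if theta2 <= theta1:
        raise ValueError("the second preset threshold must be greater than the first")
    if needs_content_model_update(comprehensive_confidence, theta2):
        # Either sent to the second device or used locally, depending on the implementation.
        return {"characteristic_info": characteristic_info,
                "data_stream_type": second_type}
    return None
```

Because the second threshold is higher than the first, a data stream whose comprehensive confidence lies between the two thresholds is still identified, yet it also contributes a training sample, so the content recognition model keeps improving on results it was not sure about.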
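With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a thirteenth possible implementation of the first aspect, after the determining, according to a target correspondence and a general feature of the current data stream, a second data stream type corresponding to the current data stream, the method further includes:
sending the second data stream type corresponding to the current data stream to an operation and maintenance support system (OSS), where the information about the second data stream type of the current data stream is used by the OSS to generate a flow control policy for the current data stream.
In other words, after the data stream type of the current data stream is determined, the related information about the data stream type is notified to the OSS, so that the OSS can generate a flow control policy for the current data stream based on its data stream type. For example, when the first data stream type of the current data stream is the video stream of a video conference, the corresponding flow control policy is defined as a priority transmission policy, that is, when multiple data streams are waiting to be transmitted, the current data stream is transmitted first.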
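The priority transmission example can be sketched as a priority queue over pending data streams. The policy table, the numeric priorities, and the function name `transmission_order` are hypothetical; how an OSS actually represents or enforces its flow control policy is not described here.

```python
import heapq

# Hypothetical policy table held by the OSS: a lower number is transmitted earlier.
POLICY_PRIORITY = {"video_conference": 0}
DEFAULT_PRIORITY = 10


def transmission_order(flows):
    """flows: iterable of (flow_id, data_stream_type); returns flow ids in send order."""
    heap = []
    for seq, (flow_id, stream_type) in enumerate(flows):
        priority = POLICY_PRIORITY.get(stream_type, DEFAULT_PRIORITY)
        # seq keeps arrival order among data streams with the same priority.
        heapq.heappush(heap, (priority, seq, flow_id))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

With three pending streams of which only the second is a video conference stream, the sketch sends the second stream first and the others in arrival order.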
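With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a fourteenth possible implementation of the first aspect, the message length includes one or more of the Ethernet frame length, the IP message length, the transmission protocol message length, and the header length in the message, and the transmission protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).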
第二方面,本申请实施例提供一种数据流类型识别模型更新方法,该方法包括:
接收第三设备发送的当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型;所述当前数据流对应的第二数据流类型为所述第三设备根据目标对应关系和所述当前数据流的通用特征确定的,所述目标对应关系为多个通用特征与多个数据流类型的对应关系;
若累计接收到了来自所述第三设备的M条数据流对应的校正数据,则根据所述M条数据流对应的校正数据对所述行为识别模型进行训练得到新的行为识别模型;所述M条数据流为从所述行为识别模型生效到截止当前的累计量,或者为在预设时间段内的累计量,或者所述M条数据流占所述行为识别模型生效后已传输的数据流的总量的比值超过预设阈值;M条数据流包括所述当前数据流;
向所述第三设备发送第一模型数据,所述第一模型数据用于描述所述新的行为识别模型,所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型,所述行为识别模型用于根据输入的待预测数据流的报文信息确定所述待预测数据流的数据流类型;所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项。
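In the above method, once the first device has accumulated a certain amount of correction data from the third device, it trains the behavior recognition model on that correction data to obtain a new behavior recognition model. After the new model has been trained, the first device sends information describing it to the third device, so that the third device can update its local behavior recognition model.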
以上方法无需第三设备进行模型训练,只需基于第一设备的模型训练结果直接得到新的行为识别模型即可,有利于第三设备充分利用计算资源来进行数据流类型的识别。
结合第二方面,在第二方面的第一种可能的实现方式中,所述通用特征为知名端口号、或者知名域名系统DNS。
结合第二方面,或者第二方面的上述任一种可能的实现方式,在第二方面的第二种可能的实现方式中,所述当前数据流对应的校正数据是所述第三设备在所述当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型为不同的数据流类型的情况下发送的,所述当前数据流对应的第一数据流类型是所述第三设备根据当前数据流的报文信息、特征信息、所述行为识别模型、内容识别模型确定的;所述特征信息包括目的地址和协议类型中的一项或者多项,所述内容识别模型为根据一条或多条历史数据流的特征信息和数据流类型得到的,所述历史数据流的数据流类型是根据所述行为识别模型得到的。
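With reference to the second aspect, or any of the foregoing possible implementations of the second aspect, in a third possible implementation of the second aspect, the training of the behavior recognition model according to the correction data corresponding to the M data streams to obtain a new behavior recognition model includes: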
根据所述M条数据流对应的校正数据和Y条数据流对应的校正数据对所述行为识别模型进行训练得到新的行为识别模型,其中:
所述Y条数据流与所述M条数据流来自同一个网络,或者,
所述Y条数据流与所述M条数据流来自至少两个不同的网络,其中,所述至少两个不同的网络包括两个不同的局域网,或者所述至少两个不同的网络包括两个不同形态的网络,或者所述至少两个不同的网络包括两个不同区域的网络。
可以理解,通过对来自不同网络的数据流的相关信息进行训练,能够提高行为识别模型的泛化性,预测效果更好。
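With reference to the second aspect, or any of the foregoing possible implementations of the second aspect, in a fourth possible implementation of the second aspect, if the Y data streams and the M data streams come from at least two different networks, the training of the behavior recognition model according to the correction data corresponding to the M data streams and the correction data corresponding to the Y data streams to obtain a new behavior recognition model includes:
correcting the packet information of the Y data streams according to the difference between the network configuration of the second network, to which the Y data streams belong, and that of the first network, to which the M data streams belong, to obtain corrected packet information of the Y data streams;
training the behavior recognition model according to the packet information of the M data streams, the corrected packet information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.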
该方法中,将来自不同网络的数据流的报文信息进行归一化处理,使得来自不同网络的数据流的报文信息之间更具可比性,基于归一化后的报文信息训练出的行为识别模型的泛化性更好,预测准确度更高。
结合第二方面,或者第二方面的上述任一种可能的实现方式,在第二方面的第五种可 能的实现方式中,所述方法还包括:
接收所述第三设备发送的当前数据流的特征信息和第二数据流类型的信息;
根据所述当前数据流的特征信息和第二数据流类型对所述内容识别模型进行训练得到新的内容识别模型;
向所述第三设备发送第二模型数据,所述第二模型数据用于描述所述新的内容识别模型,所述内容识别模型为根据一条或多条历史数据流的特征信息和数据流类型得到的模型,所述内容识别模型用于根据输入的待预测的数据流的特征信息估计所述待预测数据流的数据流类型,其中,所述历史数据流的数据流类型是根据行为识别模型得到的,所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型,所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项,所述特征信息包括目的地址和协议类型中的一项或多项。
在上述方法中,在第三设备通过已训练出的内容识别模型识别数据流类型的过程中,如果发现模型准确度不高,则触发第一设备结合相关数据重新训练该内容识别模型,并在训练出新的内容识别模型后对第三设备上的内容识别模型进行更新,这种迭代更新的内容识别模型的方式能够满足不同用户、不同网络、不同场景的差异化需求,泛化性更好、通用性更强。
第三方面,本申请实施例提供一种数据流类型识别模型更新设备,所述设备是第三设备,所述第三设备包括存储器和处理器,其中,所述存储器用于存储计算机程序,所述处理器调用所述计算机程序,用于执行如下操作:
根据当前数据流的报文信息、行为识别模型确定所述当前数据流对应的第一数据流类型,所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项;所述行为识别模型为对多个数据流样本的报文信息和数据流类型进行训练得到的模型;
根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型,其中,所述目标对应关系为多个通用特征与多个数据流类型的对应关系;
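if the first data stream type corresponding to the current data stream differs from the second data stream type corresponding to the current data stream, obtaining correction data corresponding to the current data stream, where the correction data includes the packet information of the current data stream and the second data stream type corresponding to the current data stream, and the correction data is used as a training sample for updating the behavior recognition model.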
在上述方法中,根据行为识别模型识别出第一数据流类型,以及根据预设的关于通用特征的对应关系识别出第二数据流类型之后,如果第一数据流类型和第二数据流类型不同则生成用于更新行为识别模型的校正数据,也就是训练样本;一方面,校正数据是设备在当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型不同的情况下自动获取的,无需人为打标签,因此得到的用于训练行为识别模型的样本数据的效率更高。另一方面,该校正数据是在行为识别模型识别结果不准确的情况下生成的报文信息和准确的数据流类型,因此后续基于该校正数据进行行为识别模型的更新,可以获得识别效果更准确的行为识别模型。
结合第三方面,在第三方面的第一种可能的实现方式中,在根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型,所述处理器具体用于:
若所述当前数据流的通用特征与所述对应关系中的第一通用特征相同,则将所述第一通用特征对应的数据流类型作为所述当前数据流对应的第二数据流类型。
结合第三方面,或者第三方面的上述任一种可能的实现方式,在第三方面的第二种可能的实现方式中,所述通用特征为知名端口号、或者知名域名系统DNS。
结合第三方面,或者第三方面的上述任一种可能的实现方式,在第三方面的第三种可能的实现方式中,
所述设备还包括通信接口,所述获取所述当前数据流对应的校正数据之后,所述处理器还用于:
通过所述通信接口向第一设备发送所述当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型;
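receiving, through the communication interface, first model data sent by the first device, where the first model data describes a new behavior recognition model that the first device obtains by training the behavior recognition model according to the packet information of the current data stream and the second data stream type corresponding to the current data stream.
In this method, training the new behavior recognition model is performed by a designated first device with relatively strong computing capability; the third device only needs to update its own behavior recognition model according to the new model parameters sent by the first device. The third device can therefore devote its main computing resources to packet forwarding, which effectively safeguards its packet forwarding performance.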
结合第三方面,或者第三方面的上述任一种可能的实现方式,在第三方面的第四种可能的实现方式中,所述获取所述当前数据流对应的校正数据之后,所述处理器具体用于:根据所述校正数据更新所述行为识别模型,以得到新的行为识别模型。
该方法里面,训练行为识别模型的操作由第三设备来实现,相当于使用行为识别模型和训练行为识别模型都在同一个设备上。
结合第三方面,或者第三方面的上述任一种可能的实现方式,在第三方面的第五种可能的实现方式中,在所述根据所述校正数据更新所述行为识别模型,以得到新的行为识别模型方面,所述处理器具体用于:
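if, cumulatively, there are currently M data streams whose first data stream types differ from their second data stream types, training the behavior recognition model according to the packet information of the M data streams and the second data stream types respectively corresponding to the M data streams, to obtain a new behavior recognition model, where the M data streams are the amount accumulated from the time the behavior recognition model took effect up to now, or the amount accumulated within a preset time period, or the ratio of the M data streams to the total number of data streams transmitted since the behavior recognition model took effect exceeds a preset threshold; the M data streams include the current data stream, and M is a preset reference threshold.
In this method, updating the behavior recognition model is subject to a trigger condition: whether M data streams have accumulated whose first data stream types differ from their second data stream types. Configuring M appropriately avoids the unnecessary computing overhead of updating the model too frequently, and also avoids inaccurate predictions by the behavior recognition model caused by updating it too rarely.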
结合第三方面,或者第三方面的上述任一种可能的实现方式,在第三方面的第六种可能的实现方式中:所述根据所述M条数据流的报文信息和所述M条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型方面,所述处理器具体用于:
根据所述M条数据流的报文信息、所述M条数据流分别对应的第二数据流类型、Y条数据流的报文信息和所述Y条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型;
其中,所述Y条数据流与所述M条数据流来自同一个网络,或者,
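the Y data streams and the M data streams come from at least two different networks, where the at least two different networks include two different local area networks, or two networks of different forms, or two networks in different regions.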
可以理解,通过对来自不同网络的数据流的相关信息进行训练,能够提高行为识别模型的泛化性,预测效果更好。
结合第三方面,或者第三方面的上述任一种可能的实现方式,在第三方面的第七种可能的实现方式中,
若所述Y条数据流与所述M条数据流来自至少两个不同的网络,根据所述M条数据流的报文信息、所述M条数据流分别对应的第二数据流类型、Y条数据流的报文信息和所述Y条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型方面,所述处理器具体用于:
根据所述Y条数据流所属的第二网络与所述M条数据流所属的第一网络的网络配置的差异对所述Y条数据流的报文信息进行修正,得到所述Y条数据流的修正后的报文信息;
根据所述M条数据流的报文信息、所述Y条数据流的修正后的报文信息、所述M条数据流对应的第二数据流类型、所述Y条数据流对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型。
该方法中,将来自不同网络的数据流的报文信息进行归一化处理,使得来自不同网络的数据流的报文信息之间更具可比性,基于归一化后的报文信息训练出的行为识别模型的泛化性更好,预测准确度更高。
结合第三方面,或者第三方面的上述任一种可能的实现方式,在第三方面的第八种可能的实现方式中,在根据当前数据流的报文信息、行为识别模型确定所述当前数据流对应的第一数据流类型,所述处理器具体用于:
根据当前数据流的报文信息、特征信息、行为识别模型、内容识别模型确定所述当前数据流对应的第一数据流类型,所述特征信息包括目的地址和协议类型中的一项或者多项;所述内容识别模型为对一条或多条历史数据流的特征信息和数据流类型得到的模型,所述历史数据流的数据流类型是根据所述行为识别模型得到。
在上述方法中,具体根据内容识别模型和行为识别模型得到当前数据流对应的第一数据流类型,然后对第一数据流类型进行校正得到当前数据流最终的数据流类型。其中,行为识别模型为由多个数据流样本的报文信息和数据流类型预先训练得到,而内容识别模型 是基于数据流的特征信息和由行为模型识别出的数据流类型训练得到;因此通过内容识别模型和行为识别模型对特征信息、报文信息等进行分析可以更准确地预测出当前数据流对应的第一数据流类型。并且由于训练内容识别模型时用到的数据流样本中的数据流类型是通过行为识别模型识别得到的,无需大量地去采集训练所需要的数据,解决了数据完整性不足的问题。
结合第三方面,或者第三方面的上述任一种可能的实现方式,在第三方面的第九种可能的实现方式中,在根据当前数据流的报文信息、特征信息、行为识别模型、内容识别模型确定所述当前数据流对应的第一数据流类型方面,所述处理器具体用于:
根据当前数据流的报文信息和行为识别模型得到所述当前数据流的对应于至少一个数据流类型的至少一个第一置信度;
根据所述当前数据流的特征信息和内容识别模型得到所述当前数据流的对应于所述至少一个数据流类型的至少一个第二置信度;
根据所述至少一个第一置信度和所述至少一个第二置信度确定所述当前数据流的第一数据流类型。
结合第三方面,或者第三方面的上述任一种可能的实现方式,在第三方面的第十种可能的实现方式中,在根据所述至少一个第一置信度和所述至少一个第二置信度确定所述当前数据流的第一数据流类型方面,所述处理器具体用于:
根据对应于目标数据流类型的所述第一置信度、所述第一置信度的权重值、对应于所述目标数据流类型的所述第二置信度和所述第二置信度的权重值计算对应于所述目标数据流类型的综合置信度,所述目标数据流类型为所述至少一个数据流类型中的任意一个;
若对应于所述目标数据流类型的所述综合置信度大于第一预设阈值,则确定所述目标数据流类型为所述当前数据流对应的第一数据流类型。
结合第三方面,或者第三方面的上述任一种可能的实现方式,在第三方面的第十一种可能的实现方式中,所述设备还包括通信接口,所述处理器还用于:
若对应于所述目标数据流类型的所述综合置信度小于第二预设阈值,则通过所述通信接口向第二设备发送所述当前数据流的特征信息和所述第二数据流类型,所述第二预设阈值大于所述第一预设阈值;
通过所述通信接口接收所述第二设备发送的第二模型数据,所述第二模型数据用于描述由所述第二设备根据所述当前数据流的特征信息和所述第二数据流类型训练所述内容识别模型得到的新的内容识别模型。
上述方法中,本申请发明人利用当前数据流的数据流类型的认定结果对内容识别模型进行更新。具体的,引入了第二预设阈值,在对应于所述第一数据流类型的所述综合置信度小于第二预设阈值时,将当前数据流的相关信息发送给第二设备进行训练,以用于获得新的内容识别模型,以使得下一次的认定结果更加准确。
结合第三方面,或者第三方面的上述任一种可能的实现方式,在第三方面的第十二种可能的实现方式中,所述处理器还用于:
若对应于所述目标数据流类型的所述综合置信度小于第二预设阈值,则根据所述当前数据流的特征信息和所述第二数据流类型更新所述内容识别模型,以得到新的内容识别模 型,所述第二预设阈值大于所述第一预设阈值。
上述方法中,本申请发明人利用当前数据流的数据流类型的认定结果对内容识别模型进行更新。具体的,引入了第二预设阈值,在对应于所述第一数据流类型的所述综合置信度小于第二预设阈值时,对当前数据流的相关信息进行训练,以获得新的内容识别模型,以使得下一次的认定结果更加准确。
结合第三方面,或者第三方面的上述任一种可能的实现方式,在第三方面的第十三种可能的实现方式中,在根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型之后,所述处理器还用于:
通过所述通信接口向运维支持系统OSS发送所述当前数据流对应的第二数据流类型,所述当前数据流的第二数据流类型的信息用于所述OSS生成针对所述当前数据流的流量控制策略。
也即是说,在确定出当前数据流的数据流类型之后,将当前数据流类型的相关信息通知给OSS系统,这样OSS系统就可以基于当前数据流的数据流类型来生成针对所述当前数据流的流量控制策略,例如当前数据流的第一数据流类型为视频会议的视频流时,将其对应的流量控制策略定义为优先传输的策略,即当有多个数据流待传输时,优先传输该当前数据流。
结合第三方面,或者第三方面的上述任一种可能的实现方式,在第三方面的第十四种可能的实现方式中,所述报文长度包括报文中以太帧长度、IP报文长度、传输协议报文长度和报头长度中的一项或者多项,所述传输协议包括传输控制协议TCP和/或用户数据报协议UDP。
第四方面,本申请实施例提供一种数据流类型识别模型更新设备,所述设备是第一设备,所述第一设备包括存储器、处理器和通信接口,其中,所述存储器用于存储计算机程序,所述处理器调用所述计算机程序,用于执行如下操作:
通过所述通信接口接收第三设备发送的当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型;所述当前数据流对应的第二数据流类型为所述第三设备根据目标对应关系和所述当前数据流的通用特征确定的,所述目标对应关系为多个通用特征与多个数据流类型的对应关系;
若累计接收到了来自所述第三设备的M条数据流对应的校正数据,则根据所述M条数据流对应的校正数据对所述行为识别模型进行训练得到新的行为识别模型;所述M条数据流为从所述行为识别模型生效到截止当前的累计量,或者为在预设时间段内的累计量,或者所述M条数据流占所述行为识别模型生效后已传输的数据流的总量的比值超过预设阈值;M条数据流包括所述当前数据流;
通过所述通信接口向所述第三设备发送第一模型数据,所述第一模型数据用于描述所述新的行为识别模型,所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型,所述行为识别模型用于根据输入的待预测数据流的报文信息确定所述待预测数据流的数据流类型;所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文 方向中的一项或者多项。
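In the above method, once the first device has accumulated a certain amount of correction data from the third device, it trains the behavior recognition model on that correction data to obtain a new behavior recognition model. After the new model has been trained, the first device sends information describing it to the third device, so that the third device can update its local behavior recognition model.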
以上方法无需第三设备进行模型训练,只需基于第一设备的模型训练结果直接得到新的行为识别模型即可,有利于第三设备充分利用计算资源来进行数据流类型的识别。
结合第四方面,在第四方面的第一种可能的实现方式中,所述通用特征为知名端口号、或者知名域名系统DNS。
结合第四方面,或者第四方面的上述任一种可能的实现方式,在第四方面的第二种可能的实现方式中,所述当前数据流对应的校正数据是所述第三设备在所述当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型为不同的数据流类型的情况下发送的,所述当前数据流对应的第一数据流类型是所述第三设备根据当前数据流的报文信息、特征信息、所述行为识别模型、内容识别模型确定的;所述特征信息包括目的地址和协议类型中的一项或者多项,所述内容识别模型为根据一条或多条历史数据流的特征信息和数据流类型得到的,所述历史数据流的数据流类型是根据所述行为识别模型得到的。
结合第四方面,或者第四方面的上述任一种可能的实现方式,在第四方面的第三种可能的实现方式中:所述根据所述M条数据流对应的校正数据对所述行为识别模型进行训练得到新的行为识别模型方面,所述处理器具体用于:
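training the behavior recognition model according to the correction data corresponding to the M data streams and the correction data corresponding to Y data streams, to obtain a new behavior recognition model, where: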
所述Y条数据流与所述M条数据流来自同一个网络,或者,
所述Y条数据流与所述M条数据流来自至少两个不同的网络,其中,所述至少两个不同的网络包括两个不同的局域网,或者所述至少两个不同的网络包括两个不同形态的网络,或者所述至少两个不同的网络包括两个不同区域的网络。
可以理解,通过对来自不同网络的数据流的相关信息进行训练,能够提高行为识别模型的泛化性,预测效果更好。
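With reference to the fourth aspect, or any of the foregoing possible implementations of the fourth aspect, in a fourth possible implementation of the fourth aspect, if the Y data streams and the M data streams come from at least two different networks, then, in training the behavior recognition model according to the correction data corresponding to the M data streams and the correction data corresponding to the Y data streams to obtain a new behavior recognition model, the processor is specifically configured to:
correct the packet information of the Y data streams according to the difference between the network configuration of the second network, to which the Y data streams belong, and that of the first network, to which the M data streams belong, to obtain corrected packet information of the Y data streams;
train the behavior recognition model according to the packet information of the M data streams, the corrected packet information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.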
该方法中,将来自不同网络的数据流的报文信息进行归一化处理,使得来自不同网络的数据流的报文信息之间更具可比性,基于归一化后的报文信息训练出的行为识别模型的 泛化性更好,预测准确度更高。
结合第四方面,或者第四方面的上述任一种可能的实现方式,在第四方面的第五种可能的实现方式中,所述处理器还用于:
通过所述通信接口接收所述第三设备发送的当前数据流的特征信息和第二数据流类型;
根据所述当前数据流的特征信息和第二数据流类型对所述内容识别模型进行训练得到新的内容识别模型;
通过所述通信接口向所述第三设备发送第二模型数据,所述第二模型数据用于描述所述新的内容识别模型,所述内容识别模型为根据一条或多条历史数据流的特征信息和数据流类型得到的模型,所述内容识别模型用于根据输入的待预测的数据流的特征信息估计所述待预测数据流的数据流类型,其中,所述历史数据流的数据流类型是根据行为识别模型得到的,所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型,所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项,所述特征信息包括目的地址和协议类型中的一项或多项。
在上述方法中,在第三设备通过已训练出的内容识别模型识别数据流类型的过程中,如果发现模型准确度不高,则触发第一设备结合相关数据重新训练该内容识别模型,并在训练出新的内容识别模型后对第三设备上的内容识别模型进行更新,这种迭代更新的内容识别模型的方式能够满足不同用户、不同网络、不同场景的差异化需求,泛化性更好、通用性更强。
第五方面,本申请实施例提供一种数据流类型识别模型更新装置,该装置为第三设备或者第三设备里面的模块或者器件,其包括:
第一确定单元,用于根据当前数据流的报文信息、行为识别模型确定所述当前数据流对应的第一数据流类型,所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项,所述行为识别模型为对多个数据流样本的报文信息和数据流类型进行训练得到的模型;
第二确定单元,用于根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型,其中,所述目标对应关系为多个通用特征与多个数据流类型的对应关系;
获取单元,用于在所述当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型不同的情况下,获取所述当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型,所述校正数据用于作为训练样本更新所述行为识别模型。
上述方法中,根据行为识别模型识别出第一数据流类型,以及根据预设的关于通用特征的对应关系识别出第二数据流类型之后,如果第一数据流类型和第二数据流类型不同则生成用于更新行为识别模型的校正数据,也就是训练样本;一方面,校正数据是设备在当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型不同的情况下自动获取的,无需人为打标签,因此得到的用于训练行为识别模型的样本数据的效率更高。另一方面,该校正数据是在行为识别模型识别结果不准确的情况下生成的报文信息和准确 的数据流类型,因此后续基于该校正数据进行行为识别模型的更新,可以获得识别效果更准确的行为识别模型。
结合第五方面,在第五方面的第一种可能的实现方式中,在根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型方面,所述第二确定单元具体用于:若所述当前数据流的通用特征与所述对应关系中的第一通用特征相同,则将所述第一通用特征对应的数据流类型作为所述当前数据流对应的第二数据流类型。
结合第五方面,在第五方面的第二种可能的实现方式中,所述通用特征为知名端口号、或者知名域名系统DNS。
结合第五方面,在第五方面的第三种可能的实现方式中,还包括:
第一发送单元,用于在所述获取单元获取所述当前数据流对应的校正数据之后,向第一设备发送所述当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型;
第一接收单元,用于接收所述第一设备发送的第一模型数据,所述第一模型数据用于描述由所述第一设备根据所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型对所述行为识别模型进行训练得到的新的行为识别模型。
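In this method, training the new behavior recognition model is performed by a designated first device with relatively strong computing capability; the third device only needs to update its own behavior recognition model according to the new model parameters sent by the first device. The third device can therefore devote its main computing resources to packet forwarding, which effectively safeguards its packet forwarding performance.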
结合第五方面,或者第五方面的上述任一种可能的实现方式,在第五方面的第四种可能的实现方式中,还包括:
第一更新单元,用于在所述获取单元获取所述当前数据流对应的校正数据之后,根据所述校正数据更新所述行为识别模型,以得到新的行为识别模型。
该方法里面,训练行为识别模型的操作由第三设备来实现,相当于使用行为识别模型和训练行为识别模型都在同一个设备上。
结合第五方面,或者第五方面的上述任一种可能的实现方式,在第五方面的第五种可能的实现方式中,在根据所述校正数据更新所述行为识别模型,以得到新的行为识别模型方面,所述第一更新单元具体用于:
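if, cumulatively, there are currently M data streams whose first data stream types differ from their second data stream types, training the behavior recognition model according to the packet information of the M data streams and the second data stream types respectively corresponding to the M data streams, to obtain a new behavior recognition model, where the M data streams are the amount accumulated from the time the behavior recognition model took effect up to now, or the amount accumulated within a preset time period, or the ratio of the M data streams to the total number of data streams transmitted since the behavior recognition model took effect exceeds a preset threshold; the M data streams include the current data stream.
In this method, updating the behavior recognition model is subject to a trigger condition: whether M data streams have accumulated whose first data stream types differ from their second data stream types. Configuring M appropriately avoids the unnecessary computing overhead of updating the model too frequently, and also avoids inaccurate predictions by the behavior recognition model caused by updating it too rarely.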
结合第五方面,或者第五方面的上述任一种可能的实现方式,在第五方面的第六种可能的实现方式中:在根据所述M条数据流的报文信息和所述M条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型方面,所述第一更新单元具体用于:
根据所述M条数据流的报文信息、所述M条数据流分别对应的第二数据流类型、Y条数据流的报文信息和所述Y条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型;
其中,所述Y条数据流与所述M条数据流来自同一个网络,或者,
所述Y条数据流与所述M条数据流来自至少两个不同的网络,其中,所述至少两个不同的网络包括两个不同的局域网,或者所述至少两个不同的网络包括两个不同形态的网络,或者所述至少两个不同的网络包括两个不同区域的网络。
可以理解,通过对来自不同网络的数据流的相关信息进行训练,能够提高行为识别模型的泛化性,预测效果更好。
结合第五方面,或者第五方面的上述任一种可能的实现方式,在第五方面的第七种可能的实现方式中,在所述若所述Y条数据流与所述M条数据流来自至少两个不同的网络,根据所述M条数据流的报文信息、所述M条数据流分别对应的第二数据流类型、Y条数据流的报文信息和所述Y条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型方面,所述第一更新单元具体用于:
根据所述Y条数据流所属的第二网络与所述M条数据流所属的第一网络的网络配置的差异对所述Y条数据流的报文信息进行修正,得到所述Y条数据流的修正后的报文信息;
根据所述M条数据流的报文信息、所述Y条数据流的修正后的报文信息、所述M条数据流对应的第二数据流类型、所述Y条数据流对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型。
该方法中,将来自不同网络的数据流的报文信息进行归一化处理,使得来自不同网络的数据流的报文信息之间更具可比性,基于归一化后的报文信息训练出的行为识别模型的泛化性更好,预测准确度更高。
结合第五方面,或者第五方面的上述任一种可能的实现方式,在第五方面的第八种可能的实现方式中,在根据当前数据流的报文信息、行为识别模型确定所述当前数据流对应的第一数据流类型方面,所述第一确定单元具体用于:
根据当前数据流的报文信息、特征信息、行为识别模型、内容识别模型确定所述当前数据流对应的第一数据流类型,所述特征信息包括目的地址和协议类型中的一项或者多项;所述内容识别模型为对一条或多条历史数据流的特征信息和数据流类型得到的模型,所述历史数据流的数据流类型是根据所述行为识别模型得到。
在上述方法中,根据内容识别模型和行为识别模型得到当前数据流对应的第一数据流类型,然后对第一数据流类型进行校正得到当前数据流最终的数据流类型。其中,行为识别模型为由多个数据流样本的报文信息和数据流类型预先训练得到,而内容识别模型是基于数据流的特征信息和由行为模型识别出的数据流类型训练得到;因此通过内容识别模型和行为识别模型对特征信息、报文信息等进行分析可以更准确地预测出当前数据流对应的 第一数据流类型。并且由于训练内容识别模型时用到的数据流样本中的数据流类型是通过行为识别模型识别得到的,无需大量地去采集训练所需要的数据,解决了数据完整性不足的问题。
结合第五方面,或者第五方面的上述任一种可能的实现方式,在第五方面的第九种可能的实现方式中,在所述根据当前数据流的报文信息、特征信息、行为识别模型、内容识别模型确定所述当前数据流对应的第一数据流类型方面,所述第一确定单元具体用于:
根据当前数据流的报文信息和行为识别模型得到所述当前数据流的对应于至少一个数据流类型的至少一个第一置信度;
根据所述当前数据流的特征信息和内容识别模型得到所述当前数据流的对应于所述至少一个数据流类型的至少一个第二置信度;
根据所述至少一个第一置信度和所述至少一个第二置信度确定所述当前数据流的第一数据流类型。
结合第五方面,或者第五方面的上述任一种可能的实现方式,在第五方面的第十种可能的实现方式中,在根据所述至少一个第一置信度和所述至少一个第二置信度确定所述当前数据流的第一数据流类型方面,所述第一确定单元具体用于:
根据对应于目标数据流类型的所述第一置信度、所述第一置信度的权重值、对应于所述目标数据流类型的所述第二置信度和所述第二置信度的权重值计算对应于所述目标数据流类型的综合置信度,所述目标数据流类型为所述至少一个数据流类型中的任意一个;
若对应于所述目标数据流类型的所述综合置信度大于第一预设阈值,则确定所述目标数据流类型为所述当前数据流对应的第一数据流类型。
结合第五方面,或者第五方面的上述任一种可能的实现方式,在第五方面的第十一种可能的实现方式中,还包括:
第二发送单元,用于在对应于所述目标数据流类型的所述综合置信度小于第二预设阈值的情况下,向第二设备发送所述当前数据流的特征信息和所述第二数据流类型,所述第二预设阈值大于所述第一预设阈值;
第二接收单元,用于接收所述第二设备发送的第二模型数据,所述第二模型数据用于描述由所述第二设备根据所述当前数据流的特征信息和所述第二数据流类型的标识信息得到的新的内容识别模型。
上述方法中,本申请发明人利用当前数据流的数据流类型的认定结果对内容识别模型进行更新。具体的,引入了第二预设阈值,在对应于所述第一数据流类型的所述综合置信度小于第二预设阈值时,将当前数据流的相关信息发送给第二设备进行训练,以用于获得新的内容识别模型,以使得下一次的认定结果更加准确。
结合第五方面,或者第五方面的上述任一种可能的实现方式,在第五方面的第十二种可能的实现方式中,还包括:
第二更新单元,用于在对应于所述目标数据流类型的所述综合置信度小于第二预设阈值的情况下,根据所述当前数据流的特征信息和所述第二数据流类型更新所述内容识别模型,以得到新的内容识别模型,所述第二预设阈值大于所述第一预设阈值。
上述方法中,本申请发明人利用当前数据流的数据流类型的认定结果对内容识别模型 进行更新。具体的,引入了第二预设阈值,在对应于所述第一数据流类型的所述综合置信度小于第二预设阈值时,对当前数据流的相关信息进行训练,以获得新的内容识别模型,以使得下一次的认定结果更加准确。
结合第五方面,或者第五方面的上述任一种可能的实现方式,在第五方面的第十三种可能的实现方式中,还包括:
第三发送单元,用于在所述第二确定单元根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型之后,向运维支持系统OSS发送所述当前数据流对应的第二数据流类型,所述当前数据流的第二数据流类型的信息用于所述OSS生成针对所述当前数据流的流量控制策略。
也即是说,在确定出当前数据流的数据流类型之后,将当前数据流类型的相关信息通知给OSS系统,这样OSS系统就可以基于当前数据流的数据流类型来生成针对所述当前数据流的流量控制策略,例如当前数据流的第一数据流类型为视频会议的视频流时,将其对应的流量控制策略定义为优先传输的策略,即当有多个数据流待传输时,优先传输该当前数据流。
结合第五方面,或者第五方面的上述任一种可能的实现方式,在第五方面的第十四种可能的实现方式中,所述报文长度包括报文中以太帧长度、IP报文长度、传输协议报文长度和报头长度中的一项或者多项,所述传输协议包括传输控制协议TCP和/或用户数据报协议UDP。
第六方面,本申请实施例提供一种数据流类型识别模型更新装置,该装置为第一设备或者第一设备里面的模块或者器件,其包括:
第一接收单元,用于接收第三设备发送的当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型;所述当前数据流对应的第二数据流类型为所述第三设备根据目标对应关系和所述当前数据流的通用特征确定的,所述目标对应关系为多个通用特征与多个数据流类型的对应关系;
获取单元,用于在累计接收到了来自所述第三设备的M条数据流对应的校正数据的情况下,根据所述M条数据流对应的校正数据对所述行为识别模型进行训练得到新的行为识别模型;所述M条数据流为从所述行为识别模型生效到截止当前的累计量,或者为在预设时间段内的累计量,或者所述M条数据流占所述行为识别模型生效后已传输的数据流的总量的比值超过预设阈值;M条数据流包括所述当前数据流;
第一发送单元,用于向所述第三设备发送第一模型数据,所述第一模型数据用于描述所述新的行为识别模型,所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型,所述行为识别模型用于根据输入的待预测数据流的报文信息确定所述待预测数据流的数据流类型;所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项。
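In the above method, once the first device has accumulated a certain amount of correction data from the third device, it trains the behavior recognition model on that correction data to obtain a new behavior recognition model. After the new model has been trained, the first device sends information describing it to the third device, so that the third device can update its local behavior recognition model. The third device does not need to perform model training; it obtains the new behavior recognition model directly from the training result of the first device, which helps the third device make full use of its computing resources for identifying data stream types.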
结合第六方面,在第六方面的第一种可能的实现方式中,所述通用特征为知名端口号、或者知名域名系统DNS。
结合第六方面,或者第六方面的上述任一种可能的实现方式,在第六方面的第二种可能的实现方式中,所述当前数据流对应的校正数据是所述第三设备在所述当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型为不同的数据流类型的情况下发送的,所述当前数据流对应的第一数据流类型是所述第三设备根据当前数据流的报文信息、特征信息、所述行为识别模型、内容识别模型确定的;所述特征信息包括目的地址和协议类型中的一项或者多项,所述内容识别模型为根据一条或多条历史数据流的特征信息和数据流类型得到的,所述历史数据流的数据流类型是根据所述行为识别模型得到的。
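With reference to the sixth aspect, or any of the foregoing possible implementations of the sixth aspect, in a third possible implementation of the sixth aspect, in training the behavior recognition model according to the correction data corresponding to the M data streams to obtain a new behavior recognition model, the obtaining unit is specifically configured to:
train the behavior recognition model according to the correction data corresponding to the M data streams and the correction data corresponding to Y data streams, to obtain a new behavior recognition model, where: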
所述Y条数据流与所述M条数据流来自同一个网络,或者,
所述Y条数据流与所述M条数据流来自至少两个不同的网络,其中,所述至少两个不同的网络包括两个不同的局域网,或者所述至少两个不同的网络包括两个不同形态的网络,或者所述至少两个不同的网络包括两个不同区域的网络。
可以理解,通过对来自不同网络的数据流的相关信息进行训练,能够提高行为识别模型的泛化性,预测效果更好。
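With reference to the sixth aspect, or any of the foregoing possible implementations of the sixth aspect, in a fourth possible implementation of the sixth aspect, if the Y data streams and the M data streams come from at least two different networks, then, in training the behavior recognition model according to the correction data corresponding to the M data streams and the correction data corresponding to the Y data streams to obtain a new behavior recognition model, the obtaining unit is specifically configured to:
correct the packet information of the Y data streams according to the difference between the network configuration of the second network, to which the Y data streams belong, and that of the first network, to which the M data streams belong, to obtain corrected packet information of the Y data streams;
train the behavior recognition model according to the packet information of the M data streams, the corrected packet information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.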
该方法中,将来自不同网络的数据流的报文信息进行归一化处理,使得来自不同网络的数据流的报文信息之间更具可比性,基于归一化后的报文信息训练出的行为识别模型的泛化性更好,预测准确度更高。
结合第六方面,或者第六方面的上述任一种可能的实现方式,在第六方面的第五种可能的实现方式中,还包括:
第二接收单元,用于接收所述第三设备发送的当前数据流的特征信息和第二数据流类型的信息;
生成单元,用于根据所述当前数据流的特征信息和第二数据流类型对所述内容识别模型进行训练得到新的内容识别模型;
第二发送单元,用于向所述第三设备发送第二模型数据,所述第二模型数据用于描述所述新的内容识别模型,所述内容识别模型为根据一条或多条历史数据流的特征信息和数据流类型得到的模型,所述内容识别模型用于根据输入的待预测的数据流的特征信息估计所述待预测数据流的数据流类型,其中,所述历史数据流的数据流类型是根据行为识别模型得到的,所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型,所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项,所述特征信息包括目的地址和协议类型中的一项或多项。
在上述方法中,在第三设备通过已训练出的内容识别模型识别数据流类型的过程中,如果发现模型准确度不高,则触发第一设备结合相关数据重新训练该内容识别模型,并在训练出新的内容识别模型后对第三设备上的内容识别模型进行更新,这种迭代更新的内容识别模型的方式能够满足不同用户、不同网络、不同场景的差异化需求,泛化性更好、通用性更强。
第七方面,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当其在处理器上运行时,实现第一方面或者第一方面的任意可能的实现方式所描述的方法。
第八方面,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当其在处理器上运行时,实现第二方面或者第二方面的任意可能的实现方式所描述的方法。
第九方面,本申请实施例提供一种计算机程序产品,所述计算机程序产品存储在存储器上,当所述计算机程序产品在处理器上运行时,实现第一方面或者第一方面的任意可能的实现方式所描述的方法。
第十方面,本申请实施例提供一种计算机程序产品,所述计算机程序产品存储在存储器上,当所述计算机程序产品在处理器上运行时,实现第二方面或者第二方面的任意可能的实现方式所描述的方法。
第十一方面,本申请实施例提供了一种数据流类型识别系统,包括第三设备和第一设备:
所述第三设备为上述第三方面所描述的数据流类型识别模型更新设备,或者第三方面的任一可能的实现方式所描述的数据流类型识别模型更新设备,或者第五方面所描述的数据流类型识别模型更新装置,或者第五方面的任一可能的实现方式所描述的数据流类型识别模型更新装置;
所述第一设备为上述第四方面所描述的数据流类型识别模型更新设备,或者第四方面的任一可能的实现方式所描述的数据流类型识别模型更新设备,或者第六方面所描述的数据流类型识别模型更新装置,或者第六方面的任一可能的实现方式所描述的数据流类型识 别模型更新装置。
在本申请实施例中,根据行为识别模型识别出第一数据流类型,以及根据预设的关于通用特征的对应关系识别出第二数据流类型之后,如果第一数据流类型和第二数据流类型不同则生成用于更新行为识别模型的校正数据,也就是训练样本;一方面,校正数据是设备在当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型不同的情况下自动获取的,无需人为打标签,因此得到的用于训练行为识别模型的样本数据的效率更高。另一方面,该校正数据是在行为识别模型识别结果不准确的情况下生成的报文信息和准确的数据流类型,因此后续基于该校正数据进行行为识别模型的更新,可以获得识别效果更准确的行为识别模型。
除此之外,可以具体根据内容识别模型和行为识别模型得到当前数据流对应的第一数据流类型,然后对第一数据流类型进行校正得到当前数据流最终的数据流类型。其中,行为识别模型为由多个数据流样本的报文信息和数据流类型预先训练得到,而内容识别模型是基于数据流的特征信息和由行为模型识别出的数据流类型训练得到;因此通过内容识别模型和行为识别模型对特征信息、报文信息等进行分析可以更准确地预测出当前数据流对应的第一数据流类型。并且由于训练内容识别模型时用到的数据流样本中的数据流类型是通过行为识别模型识别得到的,无需大量地去采集训练所需要的数据,解决了数据完整性不足的问题。
另外,在第三设备通过已训练出的行为识别模型识别数据流类型的过程中,如果通过地址校正模型(包含基于通用特征的对应关系)发现识别结果出现偏差,则在累计多次出现偏差的情况下第一设备或者第三设备结合导致偏差的相关数据重新训练该行为识别模型,并在训练出新的行为识别模型后对第三设备上的行为识别模型进行更新,这种迭代更新的行为识别模型能够满足不同用户、不同网络、不同场景的差异化需求,泛化性更好、通用性更强。
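In addition, while the third device identifies data stream types using the trained content recognition model, if comparing the comprehensive confidence with the preset update threshold θ2 shows that an update is needed, then, once this has happened a cumulative number of times, the second device or the third device retrains the content recognition model using the data that caused the comprehensive confidence to fall below θ2 and, after the new content recognition model has been trained, updates the content recognition model on the third device. A content recognition model that is iteratively updated in this way can meet the differentiated needs of different users, networks, and scenarios; it generalizes better and applies more widely.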
附图说明
以下对本发明实施例用到的附图进行介绍。
图1是本发明实施例提供的一种数据流类型的识别系统的架构示意图;
图2A是本发明实施例提供的内容识别模型和行为识别模型的场景示意图;
图2B是本发明实施例提供的一种分类模型的结构示意图;
图2C是本发明实施例提供的一种分类模型的结构示意图;
图2D是本发明实施例提供的一种分类模型的结构示意图;
图3是本发明实施例提供的一种数据流类型识别方法的流程示意图;
图4是本发明实施例提供的数据流a和数据流b的场景示意图;
图5是本发明实施例提供的数据流记录的样例示意图;
图6是本发明实施例提供的数据流记录的样例示意图;
图7是本发明实施例提供的一种数据流类型识别模型更新装置的结构示意图;
图8是本发明实施例提供的又一种数据流类型识别模型更新装置的结构示意图;
图9是本发明实施例提供的一种数据流类型识别模型更新设备的结构示意图;
图10是本发明实施例提供的又一种数据流类型识别模型更新设备的结构示意图。
具体实施方式
下面结合本发明实施例中的附图对本发明实施例进行描述。
请参见图1,图1是本发明实施例提供的一种数据流类型的识别系统的架构示意图,该系统包括第一设备101、第二设备102、第三设备103和终端104,这些设备可以通过有线或者无线的方式进行连接,其中:
终端104上用于运行各种应用,例如,视频会议应用、语音会议应用、桌面云应用等等,不同的应用所产生的数据流的数据流类型(也称应用类型)往往不同。本申请实施例中,终端104所产生的数据流需要先通过第三设备103发往目的设备,其中,第三设备103可以包括路由器、交换机等转发设备,第三设备103的数量可以是一个也可以是多个,例如,存在一个路由器和三个交换机;再如,仅存在一个交换机;再如,存在三个交换机,等等。
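How a data stream generated by the terminal 104 is sent on the terminal 104, and how it is forwarded on the third device 103, may both be governed by a traffic control policy generated by an operations support system (OSS). For example, when the traffic control policy specifies that data streams generated by a video conferencing application have the highest priority, and the terminal 104 or the third device 103 has several kinds of data streams to send, including one generated by the video conferencing application, the video conferencing data stream is transmitted first. Note that the traffic control policy is generated by the OSS based on the data stream type of the current data stream. In the embodiments of this application, the data stream type that the OSS uses to generate the traffic control policy may be determined by the third device 103.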
如图2A所示,第三设备103在确定当前数据流的数据流类型的时候需要用到行为识别模型、内容识别模型和地址校正模型,其中,模型的参数可以包括但不限于置信度权重向量(w1,w2)、数据流类型的第一预设阈值θ1、第二预设阈值θ2等,其中,第一预设阈值θ1也可以称为分类阈值,用于衡量是否将数据流类型划分到某一类;第二预设阈值θ2也称为模型更新阈值,用于衡量何时对内容识别模型进行更新。这些模型参数将在后续的方法流程中做更具体的阐述。其中,内容识别模型在对当前数据流进行数据流类型识别时的输入可以包括特征信息(如目的IP、目的端口、协议类型等),行为识别模型在对当前数据流进行数据流类型识别时的输入可以包括报文信息(如报文长度、报文传输速度、报文间隔时间、报文方向等)。第三设备103对内容识别模型得到的置信度和行为识别模型得到的置信度基于置信度权重向量(w1,w2)得到第一数据流类型,另外,第三设备103还会通过地址校正模型来预测当前数据流的第二数据流类型,如果能够预测得到当前数据流的第二数据流类型,则将该第二数据流类型作为该当前数据流最终的数据流类型;如果没有预 测到当前数据流的第二数据流类型,则将该第一数据流类型作为该当前数据流最终的数据流类型。
本申请实施例中,会在合适的时机对上述行为识别模型和内容识别模型进行更新,其中,针对行为识别模型,可以由第三设备103自己对相应的数据样本(或者训练数据)进行训练来更新该行为识别模型,也可以由第一设备101对相应的数据样本进行训练得到新的模型参数,然后发送给第三设备103以供第三设备103更新该行为识别模型;另外,更新行为识别模型的条件可以是一次或者多次出现数据流的第一数据流类型与数据流的第二数据流类型不一致。针对该内容识别模型,可以由第三设备103自己对相应的数据样本进行训练来更新内容识别模型,也可以由第二设备102对相应的数据样本进行训练得到新的模型参数,然后发送给第三设备103以供第三设备103更新该内容识别模型;另外,更新内容识别模型的条件可以是确定出当前数据流为第一数据流类型的置信度高于第一预设阈值θ1但是低于第二预设阈值θ2。
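The two update conditions described above can be sketched as simple predicates. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the value 0.8 used for θ2 in the usage lines is an assumption, since the text does not give a value for θ2.

```python
def needs_behavior_correction(first_type, second_type):
    """Behavior model: a correction sample is produced when the address
    correction model yields a second type that differs from the first type."""
    return second_type is not None and first_type != second_type


def needs_content_model_update(combined_conf, theta1, theta2):
    """Content model: update when the combined confidence is above the
    classification threshold theta1 but below the update threshold theta2."""
    return theta1 < combined_conf < theta2


# Usage with theta1 = 0.5 from the text and an assumed theta2 = 0.8.
update_needed = needs_content_model_update(0.54, theta1=0.5, theta2=0.8)
```

Keeping the two conditions separate mirrors the text: the behavior recognition model is corrected by disagreement with the common-feature lookup, while the content recognition model is refreshed when a classification is accepted with only moderate confidence.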
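Note that the content recognition model is essentially a classification model. As shown in FIG. 2B, the classification model may be a tree model; as shown in FIG. 2C, it may be a neural network model; as shown in FIG. 2D, it may be a support vector machine (SVM) model. Of course, the classification model may also take other forms. Optionally, because the content recognition model classifies by extracting features from the input vector (feature information such as the destination Internet Protocol (IP) address, destination port number, and protocol type), it can identify data streams with the same destination IP address, the same protocol type, and the same port number as the same data stream type; identify data streams in the same network segment, with the same protocol type and similar port numbers, as the same data stream type; and identify Transmission Control Protocol (TCP) traffic whose destination port number is 20 (a well-known port number of the File Transfer Protocol (FTP)) as the download data stream type.
Note that the first device 101 in the architecture shown in FIG. 1 may be a server or a server cluster, and may be deployed locally, remotely, or in the cloud. The second device 102 may be a server or a server cluster composed of multiple servers. Multiple second devices 102 are connected to the first device 101 through a network; one second device 102 and the one or more third devices 103 connected to it belong to one network, and another second device 102 and the one or more third devices 103 connected to it belong to another network. In the embodiments of this application, networks may be divided by geographic location (for example, two networks at different geographic locations are two different networks), by network form (for example, a cellular network, a Wi-Fi network, and a wired network are three different networks), or in other ways (for example, two different local area networks are two different networks). With this deployment, when the first device 101 trains the parameters used to update the model for any third device 103, it can use training data provided by third devices 103 in other networks.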
可选的,图1所示的架构可以看做是一个三层架构,其中:
第一设备101属于这三层中的最高一层,相比而言其具有最大的存储量、最强的计算能力,因此训练行为识别模型需要用到的庞大数据(如数据流类型和报文信息)存放在该第一设备101上面,例如存储在其中的行为知识库中;并且对该庞大数据的计算也由该第一设备101来完成。除此之外,因为第三设备103提交给第一设备101的数据流类型、报文信息等基本都是脱敏的,由第一设备101来处理也不会导致安全问题。
第二设备102属于这三层中的中间层,相比而言存储能力和计算能力适中,能够存储 一定量的特征信息(如IP地址、TCP协议等)和数据流类型,如存放在其中的地址知识库中。由于第二设备102在第三设备103所属的局域网内,所以IP地址等信息不用出局域网,其存放在第二设备102上没有安全和隐私的风险。另外,由于第二设备102与第三设备103比较近,内容识别模型的更新需求可以及时反馈到第二设备102,为内容识别模型的频繁更新提供了便利。
第三设备103属于这三层中的最低一层,其主要功能通常是进行数据报文的转发,因此内容识别模型、行为识别模型训练的操作,以及训练所要用到的样本数据的存储,可以都不在该第三设备103上进行。
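In one possible implementation, joint deployment across multiple networks may be omitted. In this case, the first device 101 may be connected to a single second device 102; that is, when the first device 101 trains the parameters used to update the model for any third device 103, it uses only training data provided by third devices 103 in the network to which that third device 103 belongs, and does not use training data provided by devices in other networks. For this case, the architecture can be transformed further: for example, the first device 101 is removed, and the operations and corresponding functions of the first device 101 mentioned elsewhere in this document are integrated into the second device 102, so that they are all implemented locally.
In yet another possible implementation, the operations mentioned above, namely predicting the data stream type of the current data stream according to the corresponding model and updating the model, may be performed by the third device 103 or by another device, such as the first device 101, the second device 102, or the OSS. When one of these devices performs them, the subject performing these operations in the subsequent method embodiments is replaced with the first device 101, the second device 102, or the OSS. In addition, some of the technical descriptions may need simple logical adaptation. For example, if the update performed by the third device 103 involves receiving first information sent by the first device 101, then when the update is instead deployed on the first device 101, the first device 101 uses the first information directly, and no other device needs to send it to the first device 101.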
请参见图3,图3是本发明实施例提供的一种数据流类型识别方法,该方法可以基于图1所示的架构来实现,该方法包括但不限于如下步骤:
步骤S301:第三设备根据当前数据流的报文信息、行为识别模型确定所述当前数据流对应的第一数据流类型,下面例举几种可选的方案。
可选方案一:第三设备确定所述当前数据流对应的第一数据流类型仅用到当前数据流的报文信息和行为识别模型,具体实现如下。
所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型;可选的,该多个数据流样本可以为离线样本,即该行为识别模型可以为离线训练得到的模型。该多个数据流样本还可以为预先挑选的比较典型(或说有代表性)的样本,例如,视频会议应用的数据流的报文大多时候的报文长度比较长,但是也会偶尔出现报文长度比较短的时候,相比而言,报文长度比较长更能反映当前数据流为视频会议应用的数据流,因此,在选择关于视频会议应用的数据流时,尽量选择报文长度比较长的这种有代表性的作为数据流样本。可选的,该多个数据流样本的数据流类型可以是人为确定的,即人工打标签。由于所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型,因此该行为识别模型能够反映一个数据流中的报文信息与数据流类型之间的一些关系,因此, 当向该行为识别模型输入当前数据流的报文信息时,其能够一定程度上预测该当前数据流属于某个或者某些数据流类型的倾向(或者说概率),体现倾向(或者说概率)的参数也可以称为置信度。
本申请实施例中,报文信息可以包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项,可选的,报文长度包括报文中以太帧长度、IP报文长度、传输协议报文长度和报头长度中的一项或者多项,所述传输协议包括传输控制协议TCP和/或用户数据报协议UDP。当然,报文信息除了包括这里举例的特征外,还可以包括其他特征,例如,报文长度、报文传输速度、报文间隔时间和报文方向中的最大值、最小值、均值、方差、分位数等。该报文信息可以是以向量的形式输入到行为识别模型中,例如可以是(报文长度,报文传输速度,报文间隔时间)这种形式。另外,本申请实施例中的数据流类型还可以称之为应用类型。
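As a rough illustration of the packet information described above, the sketch below summarizes per-packet lengths and inter-packet intervals into a flat vector of maximum, minimum, mean, variance, and median. The function name, the choice of statistics, and the sample values are assumptions made for illustration; the text only says that such statistics may be included.

```python
import statistics


def packet_info_vector(lengths, intervals):
    """Summarize packet lengths and inter-packet intervals of one data stream."""
    def summary(values):
        return [
            max(values),
            min(values),
            statistics.mean(values),
            statistics.pvariance(values),  # population variance
            statistics.median(values),     # the 0.5 quantile
        ]
    # First five entries describe lengths, last five describe intervals.
    return summary(lengths) + summary(intervals)


# Hypothetical stream: four packet lengths (bytes), three intervals (seconds).
vec = packet_info_vector([1500, 1500, 60, 1200], [0.01, 0.02, 0.01])
```

A vector of this kind could be passed to the behavior recognition model as input; packet direction could be added in the same way, for example as the fraction of uplink packets.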
第一种可能情况中,数据流类型可能有N个,N大于或者等于1,本申请实施例可以估计(或者说预测)出当前数据流属于该N个数据流类型中每个类型的置信度,即得到所述当前数据流的对应于N个数据流类型的N个第一置信度,举例来说,假若该N个数据流类型指的是视频会议的数据流类型、语音会议的数据流类型、桌面云的数据流类型,那么,需要通过行为识别模型估计当前数据流属于视频会议的数据流类型的第一置信度、属于语音会议的数据流类型的第一置信度、属于桌面云的数据流类型的第一置信度。假若N个数据流类型指的是视频会议的数据流类型,则需要通过行为识别模型估计当前数据流属于视频会议的数据流类型的第一置信度。
当确定出当前数据流对应于一些数据流类型的第一置信度后,可以基于对应于这些数据流类型的第一置信度选择一个数据流类型作为当前数据流对应的第一数据流类型,例如,对应于哪个数据流类型的置信度最高,则将该数据流类型作为当前数据流对应的第一数据流类型。当然,还可以综合其他因素来做出选择。
可选方案二,第三设备确定所述当前数据流对应的第一数据流类型不仅要用到当前数据流的报文信息和行为识别模型,还要用到特征信息和内容识别模型,具体包括如下几个部分:
第一部分:第三设备根据当前数据流的报文信息和行为识别模型得到所述当前数据流的对应于至少一个数据流类型的至少一个第一置信度。
其中,所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型;可选的,该多个数据流样本可以为离线样本,即该行为识别模型可以为离线训练得到的模型。该多个数据流样本还可以为预先挑选的比较典型(或说有代表性)的样本,例如,视频会议应用的数据流的报文大多时候的报文长度比较长,但是也会偶尔出现报文长度比较短的时候,相比而言,报文长度比较长更能反映当前数据流为视频会议应用的数据流,因此,在选择关于视频会议应用的数据流时,尽量选择报文长度比较长的这种有代表性的作为数据流样本。可选的,该多个数据流样本的数据流类型可以是人为确定的,即人工打标签。由于所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型,因此该行为识别模型能够反映一个数据流中的报文信息与数据流类型之间的一些关系,因此,当向该行为识别模型输入当前数据流的报文信息时,其能够一定程度上预测该当前数 据流属于某个或者某些数据流类型的倾向(或者说概率),体现倾向(或者说概率)的参数也可以称为置信度。
本申请实施例中,报文信息可以包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项,可选的,报文长度包括报文中以太帧长度、IP报文长度、传输协议报文长度和报头长度中的一项或者多项,所述传输协议包括传输控制协议TCP和/或用户数据报协议UDP。当然,报文信息除了包括这里举例的特征外,还可以包括其他特征,例如,报文长度、报文传输速度、报文间隔时间和报文方向中的最大值、最小值、均值、方差、分位数等。该报文信息可以是以向量的形式输入到行为识别模型中,例如可以是(报文长度,报文传输速度,报文间隔时间)这种形式。另外,本申请实施例中的数据流类型还可以称之为应用类型。
第一种可能情况中,数据流类型可能有N个,N大于或者等于1,本申请实施例可以估计(或者说预测)出当前数据流属于该N个数据流类型中每个类型的置信度,即得到所述当前数据流的对应于N个数据流类型的N个第一置信度,举例来说,假若该N个数据流类型指的是视频会议的数据流类型、语音会议的数据流类型、桌面云的数据流类型,那么,需要通过行为识别模型估计当前数据流属于视频会议的数据流类型的第一置信度、属于语音会议的数据流类型的第一置信度、属于桌面云的数据流类型的第一置信度。假若N个数据流类型指的是视频会议的数据流类型,则需要通过行为识别模型估计当前数据流属于视频会议的数据流类型的第一置信度。
第二种可能情况中,数据流类型可能有多个,但本申请实施例重点关注其中一个数据流类型,因此本申请实施例只估计(或者说预测)当前数据流属于该重点关注的数据流类型的置信度,即得到所述当前数据流的对应于一个数据流类型的一个第一置信度,举例来说,假若该多个数据流类型指的是视频会议的数据流类型、语音会议的数据流类型、桌面云的数据流类型,但本申请实施例只关注视频会议的数据流类型,因此只需通过行为识别模型估计当前数据流属于视频会议的数据流类型的第一置信度即可。
第二部分:第三设备根据所述当前数据流的特征信息和内容识别模型得到所述当前数据流的对应于所述至少一个数据流类型的至少一个第二置信度。
具体地,所述内容识别模型为根据一条或多条历史数据流的特征信息和数据流类型得到的模型。可选的,该一条或者多条历史数据流可以为在线数据流,即在此之前的一段时间内持续产生的一条或者多条数据流,所述历史数据流的数据流类型是由上述行为识别模型识别得到的,即该内容识别模型可以为在线训练得到的模型。由于所述内容识别模型为根据一条或者多条历史数据流的特征信息和数据流类型得到的模型,因此该内容识别模型能够反映一个数据流中的特征信息与数据流类型之间的一些关系,因此,当向该内容识别模型输入当前数据流的特征信息时,其能够一定程度上预测该当前数据流属于某个或者某些数据流类型的倾向(或者说概率),体现倾向(或者说概率)的参数也可以称为置信度。
本申请实施例中,所述特征信息可以包括目的地址、协议类型、端口号中的一项或者多项,其中,目的地址可以为IP地址,也可以为目的MAC地址,还可以为其他形式的地址;当然,特征信息除了包括这里举例的特征外,还可以包括其他特征。进一步地,这里的特征信息可以是针对目标的信息,例如,目标IP,目标端口等。该特征信息可以是以向 量的形式输入到内容识别模型中,可以是(ip,port,protocol)这种形式,例如可以为(10.29.74.5,8443,6)。也可以是(mac,port,protocol)这种形式,例如可以为(05FA1525EEFF,8443,6)。当然还可以是其他形式,此处不再一一举例。
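The (ip, port, protocol) form mentioned above can be turned into a numeric vector in many ways. The sketch below shows one simple option, assuming an IPv4 or IPv6 destination address is converted to an integer; the function name is hypothetical, and the protocol value 6 is the IANA protocol number for TCP.

```python
import ipaddress


def encode_feature_info(dst_ip, dst_port, protocol):
    """Encode (ip, port, protocol) as a list of integers."""
    return [int(ipaddress.ip_address(dst_ip)), dst_port, protocol]


# The example tuple from the text: (10.29.74.5, 8443, 6).
features = encode_feature_info("10.29.74.5", 8443, 6)
```

Tree models can consume such integers directly, whereas a neural network or SVM would usually need the address split into octets or otherwise scaled; that choice is left open here.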
第一种可能情况中,数据流类型可能有N个,本申请实施例可以估计(或者说预测)出当前数据流属于该N个数据流类型中每个类型的置信度,即得到所述当前数据流的对应于N个数据流类型的N个第二置信度,举例来说,假若该N个数据流类型指的是视频会议的数据流类型、语音会议的数据流类型、桌面云的数据流类型,那么,需要通过内容识别模型估计当前数据流属于视频会议的数据流类型的第二置信度、属于语音会议的数据流类型的第二置信度、属于桌面云的数据流类型的第二置信度。假若N个数据流类型指的是视频会议的数据流类型,则需要通过内容识别模型估计当前数据流属于视频会议的数据流类型的第二置信度。
第二种可能情况中,数据流类型可能有多个,但本申请实施例重点关注其中一个数据流类型,因此本申请实施例只估计(或者说预测)当前数据流属于该重点关注的数据流类型的置信度,即得到所述当前数据流的对应于一个数据流类型的一个第二置信度,举例来说,假若该多个数据流类型指的是视频会议的数据流类型、语音会议的数据流类型、桌面云的数据流类型,但本申请实施例只关注视频会议的数据流类型,因此只需通过内容识别模型估计当前数据流属于视频会议的数据流类型的第二置信度即可。
第三部分:第三设备根据所述至少一个第一置信度和所述至少一个第二置信度确定所述当前数据流的第一数据流类型。
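Specifically, the at least one first confidence indicates, to some extent, which data stream type the current data stream tends toward, and so does the at least one second confidence. Considering the two together therefore yields a more accurate and reliable indication, from which the data stream type of the current data stream is derived. For convenience in the following description, the data stream type determined in this way is called the first data stream type.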
一种可选的方案中,得到的哪种数据流类型的综合置信度最大,则将该数据流类型确定为当前数据流的数据流类型,即确定为当前数据流对应的第一数据流类型,例如,根据视频会议的数据流类型的第一置信度和视频会议的数据流类型的第二置信度确定出的视频会议的数据流类型的综合置信度为0.7;根据语音会议的数据流类型的第一置信度和语音会议的数据流类型的第二置信度确定出的语音会议的数据流类型的综合置信度为0.2;根据桌面云的数据流类型的第一置信度和桌面云的数据流类型的第二置信度确定出的桌面云的数据流类型的综合置信度为0.1;由于视频会议的数据流类型的综合置信度最大,因此预估(也表述为“预测”)的当前数据流的数据流类型为视频会议的数据流类型,即视频会议的数据流类型为当前数据流对应的第一数据流类型。
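The selection rule in this option, choosing the type with the largest comprehensive confidence, amounts to an argmax. A minimal sketch using the three values from the example (the dictionary keys are illustrative labels, not identifiers from the source):

```python
# Comprehensive confidence per data stream type, as in the example above.
combined = {
    "video_conference": 0.7,
    "voice_conference": 0.2,
    "desktop_cloud": 0.1,
}

# The first data stream type is the key with the largest value.
first_type = max(combined, key=combined.get)
```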
又一种可选的方案中,所述根据至少一个第一置信度和至少一个第二置信度确定所述当前数据流对应的第一数据流类型,可以具体为:根据对应于目标数据流类型的所述第一置信度、所述第一置信度的权重值、对应于所述目标数据流类型的所述第二置信度和所述第二置信度的权重值计算对应于所述目标数据流类型的综合置信度,所述目标数据流类型为所述至少一个数据流类型中的任意一个,也即是说,该至少一个数据流类型中的每个数据流类型均满足这里的目标数据流类型的特征。若对应于所述目标数据流类型的所述综合 置信度大于第一预设阈值,则确定所述当前数据流的数据流类型为所述目标数据流类型,此时目标数据流类型就可以作为当前数据流对应的第一数据流类型,例如,假若对应于视频会议的数据流类型的所述综合置信度大于第一预设阈值,则确定视频会议的数据流类型为所述当前数据流对应的第一数据流类型;假若对应于桌面云的数据流类型的所述综合置信度大于第一预设阈值,则确定桌面云的数据流类型为所述当前数据流对应的第一数据流类型。
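For example, suppose the confidence weight vector (w1, w2) is (0.4, 0.6), that is, the weight of the second confidence (which can be regarded as the weight of the content recognition model) is 0.4 and the weight of the first confidence (which can be regarded as the weight of the behavior recognition model) is 0.6, and the first preset threshold θ1 for the data stream type is 0.5. When the third device first starts identifying data stream types, the content recognition model has not yet been sufficiently trained; the behavior recognition model can identify the data stream type of an input data stream, whereas the content recognition model cannot. At first, therefore, the confidence output by the content recognition model for any data stream type is 0.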
假若存在两条桌面云的数据流,如图4所示,横轴代表数据流序号,纵轴代表报文长度,报文长度大于0为上行报文,报文长度小于0为下行报文;数据流a和数据流b均为桌面云的数据流类型,其中,数据流b有上行的报文,相对来说更能代表桌面云场景的特性,因此认为数据流b的报文行为是典型的行为。数据流a在很长一段时间内没有上行报文,无法明显表明其为桌面云场景,因此认为数据流a的报文行为是不典型的行为;行为识别模型通常是通过对典型行为的数据流进行训练得到,因此行为识别模型能够识别出数据流b的数据流类型,但无法识别出数据流a的数据流类型。数据流a和数据流b的特征信息如下。
数据流a的协议类型是TCP,目的IP地址是10.129.74.5,目的端口号是8443。
数据流b的协议类型是TCP,目的IP地址是10.129.56.39,目的端口号是443。
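For data stream a, the content recognition model outputs a second confidence of 0 for the desktop cloud, voice conference, and video conference data stream types. Based on the packet information, the behavior recognition model outputs a first confidence of 0.5 for the desktop cloud data stream type, 0 for the voice conference data stream type, and 0 for the video conference data stream type. The comprehensive confidences for the three data stream types are therefore as follows.
Desktop cloud: 0*0.4+0.5*0.6=0.3, which is less than θ1, so the current data stream is not identified as the desktop cloud data stream type; that is, the first data stream type computed for data stream a at this point is not the desktop cloud data stream type.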
语音会议:0*0.4+0*0.6=0,小于θ1,因此当前数据流不是语音会议的数据流类型。
视频会议:0*0.4+0*0.6=0,小于θ1,因此当前数据流不是视频会议的数据流类型。
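For data stream b, the content recognition model outputs a second confidence of 0 for the desktop cloud, voice conference, and video conference data stream types. Based on the packet information, the behavior recognition model outputs a first confidence of 0.9 for the desktop cloud data stream type, 0 for the voice conference data stream type, and 0 for the video conference data stream type. The comprehensive confidences for the three data stream types are therefore as follows.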
桌面云:0*0.4+0.9*0.6=0.54,大于θ1,因此当前数据流是桌面云的数据流类型,即此时计算得到的当前数据流b对应的第一数据流类型为桌面云的数据流类型。
语音会议:0*0.4+0*0.6=0,小于θ1,因此当前数据流不是语音会议的数据流类型。
视频会议:0*0.4+0*0.6=0,小于θ1,因此当前数据流不是视频会议的数据流类型。
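The worked example for data streams a and b can be reproduced with a short sketch. The weights (0.4 for the content recognition model, 0.6 for the behavior recognition model) and θ1 = 0.5 come from the text; the function name, the type labels, and the tie-breaking choice (return the highest-scoring type among those above θ1) are assumptions.

```python
W_CONTENT, W_BEHAVIOR = 0.4, 0.6   # confidence weight vector (w1, w2)
THETA1 = 0.5                       # first preset (classification) threshold


def first_stream_type(behavior_conf, content_conf):
    """Return the type whose combined confidence exceeds THETA1, else None."""
    best_type, best_score = None, THETA1
    for stream_type, c1 in behavior_conf.items():
        c2 = content_conf.get(stream_type, 0.0)   # untrained model outputs 0
        score = W_CONTENT * c2 + W_BEHAVIOR * c1
        if score > best_score:
            best_type, best_score = stream_type, score
    return best_type


no_content = {}  # the content recognition model has not been trained yet
type_a = first_stream_type(
    {"desktop_cloud": 0.5, "voice_conference": 0.0, "video_conference": 0.0},
    no_content)
type_b = first_stream_type(
    {"desktop_cloud": 0.9, "voice_conference": 0.0, "video_conference": 0.0},
    no_content)
```

Here `type_a` is `None`, because 0.3 does not exceed θ1, and `type_b` is the desktop cloud type, because 0.54 does.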
步骤S302:第三设备根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型,所述目标对应关系为多个通用特征与多个数据流类型的对应关系。
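Specifically, if a person skilled in the art can, upon finding certain features in a data stream, determine the data stream type of that data stream essentially accurately, those features are the common features referred to here. For example, a common feature may be a well-known port number or a well-known DNS. Taking a well-known port number as an example, port 20 is an FTP port, and FTP is typically used for downloading, so the corresponding data stream type is the data download data stream type; in this example, a correspondence can be established between port 20 and the data download data stream type. Taking a well-known DNS as an example, the DNS address of the domain name www.163.com is 183.131.119.86, and www.163.com is a publicly known web portal whose data stream type is the web page data stream type, so a correspondence can be established between the address 183.131.119.86 and the web page data stream type. Based on these two examples, the target correspondence may be as shown in Table 1: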
表1
Figure PCTCN2020119665-appb-000001
参见表1,如果当前数据流中包含“端口20”,那么该第三设备根据目标对应关系和当前数据流中的通用特征“端口20”就可以确定出对应的“数据下载的数据流类型”,因此“数据下载的数据流类型”即为确定出的该当前数据流对应的第二数据流类型。
可选的,该第三设备中可以存在一个地址校正库(当然也可以命名为其他名称),上述目标对应关系中的信息就可以存放在该地址校正库中,以供该第三设备在确定当前数据流对应的第二数据流类型时使用。另外,上述目标对应关系中的内容可以是通过机器自己识别得到,也可以是人工添加;该目标对应关系中的内容还可以根据需要在适当的时候进行更新。
需要说明的是,上述目标对应关系中每个通用特征都会对应唯一一个数据流类型,因此,可能出现多个通用特征对应的数据流类型相同(即多对一)的情况,也可能出现多个通用特征分别对应不同的数据流类型(即一对一)的情况。
需要说明的是,并非每个数据流中都包含上述通用特征,因此并非针对所有的数据流都可以获得其第二数据流类型。
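The target correspondence of step S302 behaves like a lookup table. A minimal sketch follows, using only the two entries named in the text (port 20 and the address 183.131.119.86); the key layout, the type labels, and the order in which port and address are checked are assumptions.

```python
# Common feature -> data stream type (many-to-one is allowed).
TARGET_CORRESPONDENCE = {
    ("port", 20): "data_download",
    ("ip", "183.131.119.86"): "web_page",
}


def second_stream_type(dst_ip, dst_port):
    """Return the second data stream type, or None if no common feature matches."""
    for key in (("port", dst_port), ("ip", dst_ip)):
        if key in TARGET_CORRESPONDENCE:
            return TARGET_CORRESPONDENCE[key]
    return None  # not every data stream carries a common feature
```

Returning `None` models the case noted above in which a data stream contains no common feature, so no second data stream type is available.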
步骤S303:第三设备向运维支持系统OSS发送当前数据流的最终数据流类型。
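Specifically, when the second data stream type corresponding to the current data stream has been identified, it serves as the final data stream type of the current data stream; when no second data stream type has been identified, the first data stream type corresponding to the current data stream serves as the final data stream type. Optionally, the first data stream type may also be used directly as the final data stream type of the current data stream.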
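The rule for the final data stream type is a simple fallback, sketched below with a hypothetical function name and `None` standing for "no second data stream type was identified".

```python
def final_stream_type(first_type, second_type):
    """Prefer the second type (common-feature lookup); fall back to the first."""
    return second_type if second_type is not None else first_type


final_with_lookup = final_stream_type("desktop_cloud", "voice_conference")
final_without_lookup = final_stream_type("desktop_cloud", None)
```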
另外,第三设备可以在每次确定出当前数据流的最终数据流类型时,向OSS发送当前 数据流的最终数据流类型,例如,第一次生成数据流a的最终数据流类型时,向OSS发送数据流a的最终数据流类型,第一次生成数据流b的最终数据流类型时,向OSS发送数据流b的最终数据流类型,第二次生成数据流a的最终数据流类型时,向OSS发送数据流a的最终数据流类型,第二次生成数据流b的最终数据流类型时,向OSS发送数据流b的最终数据流类型;生成数据流c的最终数据流类型时,向OSS发送数据流c的最终数据流类型。可以理解的是,假若第三设备为该OSS,则无需执行向OSS发送当前数据流的最终数据流类型的操作。
步骤S304:OSS根据当前数据流的最终数据流类型生成针对当前数据流的流量控制策略。例如,如果当前数据流的最终数据流类型表明该当前数据流为视频桌面云的数据流类型,或者视频会议的数据流类型,则将当前数据流定义为高优先级的QoS。
步骤S305:OSS向第三设备或者终端发送所述流量控制策略。
具体地,假若第三设备或者终端根据流量控制策略获知当前数据流属于高优先级的QoS,则在发现有多种数据流待发送时,优先发送被配置为高优先级的当前数据流。
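Steps S304 and S305 can be pictured as a policy table consulted when several data streams are waiting. The sketch below is an assumption-laden illustration: the priority numbers, the default priority, and the function name are hypothetical, and only the video conference entry is taken from the text.

```python
# Hypothetical traffic control policy: lower number means sent earlier.
POLICY = {"video_conference": 0}
DEFAULT_PRIORITY = 9


def send_order(pending):
    """pending: list of (stream_id, final_stream_type) awaiting transmission.

    sorted() is stable, so streams with equal priority keep arrival order.
    """
    ranked = sorted(pending, key=lambda p: POLICY.get(p[1], DEFAULT_PRIORITY))
    return [stream_id for stream_id, _ in ranked]


order = send_order([
    ("s1", "data_download"),
    ("s2", "video_conference"),
    ("s3", "web_page"),
])
```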
本申请实施例中,可以对上述行为识别模型和内容识别模型进行更新,更新后的行为模型和更新后的内容识别模型用于第三设备或者其他设备对新的数据流进行数据流类型的识别。
更新行为识别模型的过程请参见步骤S306:
步骤S306:第三设备根据所述当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型更新所述行为识别模型,下面例举两种不同的更新方案。
方案一:
首先,若当前数据流对应的第一数据流类型和当前数据流对应的第二数据流类型为不同的数据流类型,则第三设备向第一设备发送当前数据流对应的校正数据,其中,当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型。
相应的,第一设备接收第三设备发送的当前数据流对应的校正数据,该第一设备判断是否已累计接收到了来自第三设备的M条数据流对应的校正数据,若累计接收到了M条数据流对应的校正数据,则根据所述M条数据流对应的校正数据对行为识别模型进行训练得到新的行为识别模型。
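In the embodiments of this application, the M data streams are the amount accumulated from the time the behavior recognition model took effect up to now (if the model has been updated, the count starts over). Alternatively, the M data streams are the amount accumulated within a preset time period (for example, the last 24 hours). Alternatively, the ratio of the M data streams to the total number of data streams transmitted since the behavior recognition model took effect exceeds a preset threshold. For example, with the preset threshold configured as 10%, if the number of data streams transmitted since the model took effect is denoted S, then M=S*10%: if S=10000, then M=1000; if S=89500, then M=8950. As another example, suppose the behavior recognition model took effect at 00:00:00 on March 1, 2019, there are 10000 training data records in total from that time up to now, and the increase ratio (that is, the preset threshold) is 10%; then, counting from the effective time, once correction data for 1000 new data streams has accumulated, a new behavior recognition model is trained. In the embodiments of this application, the M data streams include the current data stream.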
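The trigger condition can be sketched as a counter that reports when M correction records have accumulated and restarts after each retraining. The class and function names are hypothetical; integer arithmetic is used for the ratio so that 10% of 10000 gives exactly 1000 and 10% of 89500 gives exactly 8950.

```python
def threshold_from_ratio(total_streams, percent):
    """M = S * percent / 100, computed with integers."""
    return total_streams * percent // 100


class CorrectionCounter:
    """Counts correction records since the current model took effect."""

    def __init__(self, m):
        self.m = m
        self.count = 0

    def add(self):
        """Record one correction; return True when retraining should start."""
        self.count += 1
        if self.count >= self.m:
            self.count = 0  # counting restarts once the model is updated
            return True
        return False


counter = CorrectionCounter(m=3)
decisions = [counter.add() for _ in range(4)]
```

A time-window variant would additionally discard records older than the preset period (for example, 24 hours) before comparing the count with M.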
其中,这M条数据流对应的校正数据的表达形式有很多,以该M条数据流中的数据流A为例进行说明,其对应的校正数据可以表示为{数据流A对应的第二数据流类型,s条<报文方向,报文长度,报文时间>三元组},其中,s为正整数;例如,数据流A对应的第二数据流类型为云桌面的数据流类型,且对应3个<报文方向,报文长度,报文时间>三元组,那么可以表示为{云桌面的数据流类型,3条<报文方向,报文长度,报文时间>三元组},请参见表2,表2更详细地示意了该校正数据的一种可能情况。
表2
Figure PCTCN2020119665-appb-000002
可选的,在表2中,“来源”指一条数据流的行为数据来自哪个区域,可以填写相应网络的名称;或者填写其来源的第三设备的名称,接收到校正数据的第一设备再根据网络拓扑,将第三设备的名称映射到相应网络。“报文方向”可以用值表示,0表示下行,1表示上行;或者直接用“上行”“下行”来表示。“报文时间”可以是时间戳;也可以是相对时间,即每条数据流的第一个记录为0,这条流的后续记录是距离第一个记录的时间。
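One way to picture a correction-data record of the form {second data stream type, s triples of packet direction, packet length, packet time} is the sketch below. The class and field names are hypothetical, direction is encoded as 0 for downlink and 1 for uplink as described above, and the helper converts absolute timestamps to times relative to the first record of the stream.

```python
from dataclasses import dataclass, field


@dataclass
class CorrectionRecord:
    stream_type: str                 # second data stream type (the label)
    source: str                      # network or third-device name
    packets: list = field(default_factory=list)  # (direction, length, time)


def to_relative_time(packets):
    """Rewrite times so the first record is 0 and later ones are offsets."""
    if not packets:
        return []
    t0 = packets[0][2]
    return [(d, length, t - t0) for d, length, t in packets]


record = CorrectionRecord(
    stream_type="desktop_cloud",
    source="network-1",              # hypothetical source name
    packets=to_relative_time([(1, 100, 5.0), (0, 1400, 5.5), (0, 1400, 6.0)]),
)
```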
本申请实施例中,第三设备使用的行为识别模型在该第一设备中也存在,因此该第一设备将这M条数据流对应的校正数据作为训练数据对该行为识别模型进行训练得到新的行为识别模型;也可能,该第一设备将这M条数数据流对应的校正数据作为训练数据、结合历史中存储的批量训练数据来进行训练,得到新的行为识别模型。可选的,训练过程中可以采用最小损失函数(loss),将数据流的校正数据中的报文信息<报文方向、长度、时间>输入到行为识别模型中,使得行为识别模型输出的数据流类型尽可能地趋近于该校正数据中的数据流类型。针对M条数据流对应的校正函数均执行了该训练之后,就可以得到训练好的新的行为识别模型了。第一设备训练得到新的行为识别模型之后,向该第三设备发送第一模型数据,所述第一模型数据是新的行为识别模型的模型数据,用于描述由所述第一设备根据所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型对所述行为识别模型进行训练得到的新的行为识别模型。行为识别模型包括模型结构(如模型的函数形式)和模型参数。该第一模型数据至少存在如下几种情况。情况1,第一模型数据为该新的行为识别模型的模型参数的参数值。情况2,第一模型数据为该新的行为识别模型相对于训练前的行为识别模型的差异数据,通常为该新的行为识别模型的模型参数的参数值与训练前的行为识别模型的模型参数的参数值的差异值。可选的,对于情况1,该第一模型数据具体可以是由多个参数的参数值组成的矩阵,对于情况2,该第一模型数据(即该差异数据)具体可以是由多个参数值的差值组成的矩阵。如,训练前的行为识别模型的模型参数的参数值组成的矩阵为[a1、b1、c1、d1],新的行为识别模型的模型参数的参数值组成的矩阵为[a2、b2、c2、d2],则这4个模型参数的差值组成的矩阵为[a2-a1、b2-b1、c2-c1、d2-d1]。相应地,对于情况1,该第一模型数据为[a1、b1、c1、d1],对于情况2,该第一模型数据(即该差异数据)为[a2-a1、b2-b1、c2-c1、d2-d1]。情况3,第一模型数据,包括 该新的行为识别模型的模型结构和模型参数的参数值,即该新的行为识别模型的完整数据。
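The first model data may specifically be a model file. AI model files of the common open-source Keras library are h5 and json files; AI model files of the open-source sklearn library are pkl/m files. These files store the model structure and/or the parameter values of the model parameters. The h5 file describes the parameter values and is binary (as are the pkl files), while the json file describes the model structure and is text. In a specific implementation, the first model data in Case 1 and Case 2 may each be an h5 file, and the first model data in Case 3 may include an h5 file and a json file.
Next, the third device receives the first model data sent by the first device and updates the behavior recognition model accordingly. If the first model data is the complete data of the new behavior recognition model, the third device can load it directly to generate the new model, replace the current behavior recognition model, and use the new model for subsequent data stream type recognition. If the first model data is the parameter values of the new model, those values are substituted into the current behavior recognition model in place of the old values, yielding the new model. If the first model data is the difference data described above, the new parameter values are computed from the difference data and the parameter values of the current behavior recognition model, yielding the new model.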
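The three ways of applying first model data on the third device can be sketched as below. The `Model` class, the `kind` strings ("params", "delta", "full"), and the flat parameter list are illustrative assumptions; a real deployment would load framework-specific model files instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Model:
    structure: str        # stand-in for the model's functional form
    params: List[float]   # flattened parameter values


def apply_model_data(current: Model, kind: str, values: List[float],
                     structure: Optional[str] = None) -> Model:
    """Build the new model from received model data.

    kind == "params": Case 1, values replace the old parameter values.
    kind == "delta":  Case 2, values are (new - old) differences.
    kind == "full":   Case 3, structure and parameter values both arrive.
    """
    if kind == "full":
        if structure is None:
            raise ValueError("full model data must carry a structure")
        return Model(structure, list(values))
    if len(values) != len(current.params):
        raise ValueError("parameter count mismatch")
    if kind == "params":
        return Model(current.structure, list(values))
    if kind == "delta":
        return Model(current.structure,
                     [old + d for old, d in zip(current.params, values)])
    raise ValueError("unknown model data kind: " + kind)


if __name__ == "__main__":
    old = Model("linear", [1.0, 2.0, 3.0, 4.0])
    print(apply_model_data(old, "delta", [0.5, -1.0, 0.0, 2.0]).params)
```

Sending only differences (Case 2) keeps the structure on the third device untouched, so it only works when both devices hold the same pre-training model, as the text assumes.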
可选的,根据所述M条数据流对应的校正数据对行为识别模型进行训练得到新的行为识别模型可以具体为:仅根据所述M条数据流对应的校正数据对行为识别模型进行训练得到新的行为识别模型,也可以具体为根据该M条数据流对应的校正数据和其他网络中的Y条数据流对应的校正数据对行为识别模型进行训练得到新的行为识别模型,也即是说,训练得新的行为识别模型用到的校正数据来自至少两个不同的网络,其中,所述至少两个不同的网络包括两个不同的局域网,或者所述至少两个不同的网络包括两个不同形态的网络,或者所述至少两个不同的网络包括两个不同区域的网络。
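Optionally, if the correction data used to train the new behavior recognition model comes from at least two different networks, the network to which the M data streams belong may be called the first network, and the Y data streams come from a second network other than the first network. In this case, the first device trains the behavior recognition model on the correction data for the M data streams to obtain a new behavior recognition model as follows.
First, the first device corrects the packet information of the Y data streams according to the differences in network configuration between the second network and the first network, obtaining corrected packet information for the Y data streams. This is needed because two data streams whose feature information or packet information is inherently the same can end up showing different feature information or packet information when the networks they come from are configured differently. The correction brings the feature information or packet information of data streams from the second network onto the same measurement standard as that of data streams from the first network (in other words, it unifies them into the same feature space), which makes them more comparable and ultimately helps improve the accuracy of the trained behavior recognition model.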
例如,采用数据映射的方式,根据第一网络中的数据流的MTU,改写第二网络中的数据流的报文长度,比如第一网络中的设备发送的数据流的MTU是1500,第二网络中的设备发送的数据流的MTU是1452,第二网络中的设备发送的一条数据流的信息如表3所示,参照第一网络中的设备发送的数据流的MTU对表3中的信息做数据映射处理后如表4所示。
表3
Figure PCTCN2020119665-appb-000003
表4
Figure PCTCN2020119665-appb-000004
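As another example, data mapping may be used to rewrite the packet times of data streams in the second network according to the packet time distribution of data streams in the first network. Suppose the mean packet time of data streams sent by devices in the second network is 10% larger than that of data streams sent by devices in the first network, and the information of one data stream sent by a device in the second network is as shown in Table 5. After data mapping is applied to the information in Table 5 with reference to the packet time distribution of data streams sent by devices in the first network, the result is as shown in Table 6.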
表5
Figure PCTCN2020119665-appb-000005
表6
Figure PCTCN2020119665-appb-000006
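Tables 3 to 6 are only available as images, so the exact mapping used there cannot be reproduced here. The sketch below assumes the simplest possible mapping, a proportional rescaling of packet lengths by the MTU ratio and of packet times by the ratio of mean packet times; the function name `normalize_flow` and its parameters are hypothetical.

```python
from typing import List, Tuple

Packet = Tuple[int, int, float]  # (direction, length in bytes, time)


def normalize_flow(packets: List[Packet], src_mtu: int, dst_mtu: int,
                   time_ratio: float) -> List[Packet]:
    """Map a second-network flow into the first network's feature space.

    Assumed mapping (not taken from the tables): lengths scale by
    dst_mtu / src_mtu and are capped at dst_mtu; times are divided by
    time_ratio, the second network's mean packet time over the first's.
    """
    scale = dst_mtu / src_mtu
    mapped = []
    for direction, length, t in packets:
        new_length = min(dst_mtu, round(length * scale))
        mapped.append((direction, new_length, t / time_ratio))
    return mapped


if __name__ == "__main__":
    flow = [(1, 1452, 1.1), (0, 726, 2.2)]
    # MTU 1452 -> 1500, packet times 10% larger in the second network
    print(normalize_flow(flow, src_mtu=1452, dst_mtu=1500, time_ratio=1.1))
```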
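Then, after the packet information has been corrected, the behavior recognition model is trained on the packet information of the M data streams, the corrected packet information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.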
举例来说,假若M=10(累计新增的校正数据数量达到10时需要启动对行为识别模型的更新),网络数据流d的协议类型为UDP、IP地址是1.2.3.4、端口号是10050,端口号10050属于通用特征,其在本领域传输的数据流基本都是语音会议的数据类型,且10050这个通用特征与语音会议数据流类型之间的对应关系已添加到了上述目标对应关系中,即保存在了地址校正库里面。置信度权重向量(w1,w2)、第一预设阈值θ1、第二预设阈值θ2依旧不变。
那么针对数据流d,内容识别模型识别是桌面云的数据流类型的第二置信度为1,识别是语音会议的数据流类型、视频会议的数据流类型的第二置信度均是0。行为识别模型基于报文信息识别是桌面云的数据流类型的第一置信度是0.9、识别是语音会议的数据流类型的第一置信度均是0、识别是视频会议的数据流类型的第一置信度是0。因此,对应于这三种数据流类型的综合置信度如下。
桌面云:1*0.4+0.9*0.6=0.94,大于θ1,因此当前数据流d是桌面云的数据流类型,即当前数据流d对应的第一数据流类型为桌面云的数据流类型。
语音会议:0*0.4+0*0.6=0,小于θ1,因此当前数据流d不是语音会议的数据流类型。
视频会议:0*0.4+0*0.6=0,小于θ1,因此当前数据流d不是视频会议的数据流类型。
由于识别是桌面云的数据流类型的综合置信度0.94不在区间(θ1,θ2)内,因此不需要对内容识别模型进行更新(更新原理后面有介绍,此处不展开说明)。
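Because the second data stream type determined for the current data stream from the above target correspondence and the common feature in the current data stream d is the voice conference data stream type, which differs from the first data stream type corresponding to the current data stream d (i.e., the desktop cloud data stream type), the third device needs to send the correction data for the current data stream d to the first device.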
第一设备接收到第三设备发送的当前数据流d的校正数据后,假若刚好累计了来自第三设备的10条新增的校正数据,那么该第一设备结合这10条新增的校正数据对行为识别模型进行训练,得到新的行为识别模型。
假若更新后的行为识别模型识别当前数据流d为语音会议的数据流类型的第一置信度为0.9,那么当再次出现当前数据流d时,对当前数据流d的识别过程如下:
内容识别模型识别是桌面云的数据流类型的第二置信度为1,识别是语音会议的数据流类型、视频会议的数据流类型的第二置信度均是0。行为识别模型基于报文信息识别是桌面云的数据流类型的第一置信度是0.1、识别是语音会议的数据流类型的第一置信度均是0.9、识别是视频会议的数据流类型的第一置信度是0。对应于三种数据流类型的综合置信度如下。
桌面云:1*0.4+0.1*0.6=0.46,小于θ1,因此当前数据流d不是桌面云的数据流类型。
语音会议:0*0.4+0.9*0.6=0.54,大于θ1,因此当前数据流d是语音会议的数据流类型,即当前数据流d对应的第一数据流类型为语音会议的数据流类型。
视频会议:0*0.4+0*0.6=0,小于θ1,因此当前数据流d不是视频会议的数据流类型。
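Because the combined confidence 0.54 for the voice conference data stream type is within the interval (θ1, θ2), the content recognition model needs to be updated (the update principle is described later and is not expanded on here).
Because the second data stream type determined for the current data stream from the above target correspondence and the common feature in the current data stream d is the voice conference data stream type, which is the same as the first data stream type now determined for the current data stream d (i.e., the voice conference data stream type), the third device does not need to send the first device any information for retraining the behavior recognition model.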
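The worked example for data stream d can be reproduced with the short sketch below. The weights 0.4 (content model) and 0.6 (behavior model) are taken from the calculations above. The threshold values θ1 = 0.5 and θ2 = 0.8 are assumptions chosen only to be consistent with those calculations, and picking the highest-scoring type when several exceed θ1 is also an assumption.

```python
from typing import Dict, Optional, Tuple

CONTENT_WEIGHT = 0.4   # weight of the second confidence (content model)
BEHAVIOR_WEIGHT = 0.6  # weight of the first confidence (behavior model)
THETA1 = 0.5           # assumed value of the first preset threshold
THETA2 = 0.8           # assumed value of the second preset threshold


def combined_scores(content: Dict[str, float],
                    behavior: Dict[str, float]) -> Dict[str, float]:
    """Weighted sum of the two confidences for every data stream type."""
    return {t: content.get(t, 0.0) * CONTENT_WEIGHT + b * BEHAVIOR_WEIGHT
            for t, b in behavior.items()}


def first_type(scores: Dict[str, float]) -> Optional[str]:
    """Highest-scoring type whose combined confidence exceeds THETA1."""
    best = max(scores, key=scores.get)
    return best if scores[best] > THETA1 else None


def classify(content: Dict[str, float],
             behavior: Dict[str, float]) -> Tuple[Optional[str], Dict[str, float]]:
    scores = combined_scores(content, behavior)
    return first_type(scores), scores


if __name__ == "__main__":
    content = {"desktop_cloud": 1.0, "voice_conference": 0.0, "video_conference": 0.0}
    before = {"desktop_cloud": 0.9, "voice_conference": 0.0, "video_conference": 0.0}
    after = {"desktop_cloud": 0.1, "voice_conference": 0.9, "video_conference": 0.0}
    print(classify(content, before)[0])  # type before the behavior model update
    print(classify(content, after)[0])   # type after the behavior model update
```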
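Scheme 2:
If the first data stream type and the second data stream type corresponding to the current data stream are different, the behavior recognition model is trained on the packet information of the current data stream and the second data stream type corresponding to the current data stream, to obtain a new behavior recognition model. Optionally, this may specifically be: if the second data stream type of the current data stream differs from its first data stream type, and there are currently M accumulated data streams whose first data stream types differ from their second data stream types, the behavior recognition model is trained on the packet information of the M data streams and the second data stream types respectively corresponding to the M data streams, to obtain a new behavior recognition model. Here, the M data streams include the current data stream, and M is a preset reference threshold. How to train the behavior recognition model on the packet information of the M data streams and their respective second data stream types has been described in Scheme 1 above and is not repeated here.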
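For the process of updating the content recognition model, see step S307.
Step S307: The third device updates the content recognition model according to the combined confidence of the first data stream type corresponding to the current data stream and the second data stream type corresponding to the current data stream. Two update schemes are described below as examples.
Scheme 1:
If the combined confidence for a target data stream type is greater than the first preset threshold θ1 and less than the second preset threshold θ2, the third device sends the feature information of the current data stream and the second data stream type to the second device. The target data stream type is one of the at least one data stream type; if the combined confidence of a target data stream type is greater than the first preset threshold θ1, that target data stream type is taken as the first data stream type corresponding to the current data stream. The second preset threshold is greater than the first preset threshold. For example, for data stream a, the combined confidence 0.3 for the desktop cloud data stream type is not within the interval (θ1, θ2), so the feature information of the current data stream does not need to be sent to the second device. For data stream b, the combined confidence 0.54 for the desktop cloud data stream type is within the interval (θ1, θ2), so the feature information of the current data stream (such as destination IP address 10.129.56.39, destination port number 443, and protocol type TCP) and information (such as a name or identifier) about the desktop cloud data stream type need to be sent to the second device. When no second data stream type has been identified, the type sent is the first data stream type, which here happens to be the desktop cloud data stream type; when a second data stream type has been identified, the type sent is the second data stream type.
Correspondingly, the second device receives the feature information of the current data stream and the second data stream type sent by the third device (or the first data stream type, which is what is sent when no second data stream type has been identified). The second device thus holds one more data stream record; as shown in FIG. 5, a record for data stream b is added. The second device then trains the content recognition model on the feature information of the current data stream and the second data stream type to obtain a new content recognition model.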
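After obtaining the new content recognition model through training, the second device sends second model data to the third device. The second model data is the model data of the new content recognition model; it describes the new content recognition model obtained by the second device by training the content recognition model on the feature information of the current data stream and the second data stream type corresponding to the current data stream. A content recognition model comprises a model structure (for example, the functional form of the model) and model parameters. The second model data may take at least the following forms. Case 1: the second model data is the parameter values of the model parameters of the new content recognition model. Case 2: the second model data is difference data of the new content recognition model relative to the content recognition model before training, typically the differences between the parameter values of the new model and those of the model before training. Optionally, in Case 1 the second model data may be a matrix composed of the parameter values of multiple parameters, and in Case 2 the second model data (i.e., the difference data) may be a matrix composed of the differences of multiple parameter values. For example, if the parameter values of the model before training form the matrix [e1, f1, g1, h1] and those of the new model form the matrix [e2, f2, g2, h2], the differences of these 4 model parameters form the matrix [e2-e1, f2-f1, g2-g1, h2-h1]. Accordingly, in Case 1 the second model data is [e2, f2, g2, h2], and in Case 2 the second model data (i.e., the difference data) is [e2-e1, f2-f1, g2-g1, h2-h1]. Case 3: the second model data includes both the model structure and the parameter values of the new content recognition model, i.e., the complete data of the new model.
The second model data may specifically be a model file. AI model files of the common open-source Keras library are h5 and json files; AI model files of the open-source sklearn library are pkl/m files. These files store the model structure and/or the parameter values of the model parameters. The h5 file describes the parameter values and is binary (as are the pkl files), while the json file describes the model structure and is text. In a specific implementation, the second model data in Case 1 and Case 2 may each be an h5 file, and the second model data in Case 3 may include an h5 file and a json file.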
接着,第三设备接收第二设备发送的第二模型数据,并根据接收的第二模型数据更新内容识别模型。如果该第二模型数据为新的内容识别模型的完整数据,则该第三设备就可以直接加载该第二模型数据生成该新的内容识别模型,以替换当前的内容识别模型,后续就使用该新的内容识别模型进行数据流类型的识别。如果该第二模型数据为新的内容识别模型的模型参数的参数值,则将该参数值代入到当前的内容识别模型中以代替旧的参数值,从而得到新的内容识别模型。如果该第二模型数据为所述新的内容识别模型中模型参数的参数值与更新前的内容识别模型中模型参数的参数值的差值,则根据该差值和更新前的内容识别模型的模型参数的参数值得到新的参数值,从而得到新的内容识别模型。
由于数据流b的记录中,目的IP地址为10.129.56.39、目的端口号为443、协议类型为TCP共同对应的是桌面云的数据流类型,因此,更新后的内容识别模型再对输入的数据流b进行估计时,估计出的对应于桌面云的数据流类型的第二置信度为1。可选的,由于数据流a的特征信息与数据流b的特征信息有相似之处,例如,目的IP地址在同一网段,端口号相似,协议类型相似,因此,更新后的内容识别模型再对输入的数据流a进行估计时,估计的结果会更接近对数据流b的估计结果,例如,估计出的对应于桌面云的数据流类型的第二置信度可能为0.6。
后续网络中再出现数据流a和数据流b需要估计数据流类型时,执行流程如下。
置信度权重向量(w1,w2)、第一预设阈值θ1、第二预设阈值θ2依旧不变。
那么针对数据流a,内容识别模型识别是桌面云的数据流类型的第二置信度为0.6,识别是语音会议的数据流类型、视频会议的数据流类型的第二置信度均是0。行为识别模型基于报文信息识别是桌面云的数据流类型的第一置信度是0.5、识别是语音会议的数据流类 型的第一置信度均是0、识别是视频会议的数据流类型的第一置信度是0。因此,对应于这三种数据流类型的综合置信度如下。
桌面云:0.6*0.4+0.5*0.6=0.54,大于θ1,因此当前数据流是桌面云的数据流类型,即更新内容识别模型之后得到的当前数据流a对应的第一数据流类型为桌面云的数据流类型。
语音会议:0*0.4+0*0.6=0,小于θ1,因此当前数据流不是语音会议的数据流类型。
视频会议:0*0.4+0*0.6=0,小于θ1,因此当前数据流不是视频会议的数据流类型。
由于识别是桌面云的数据流类型的综合置信度0.54在区间(θ1,θ2)内,因此需要向第二设备发送所述当前数据流的特征信息和第二数据流类型(或者第一数据流类型,当没有识别出第二数据流类型时发送的就是第一数据流类型)的信息,以用于后续对内容识别模型进行更新(更新原理前面已有介绍,此处不再赘述)。
那么针对数据流b,内容识别模型识别是桌面云的数据流类型的第二置信度是1、识别是语音会议的数据流类型、视频会议的数据流类型的第二置信度均是0。行为识别模型基于报文信息识别是桌面云的数据流类型的第一置信度是0.9、识别是语音会议的数据流类型的第一置信度均是0、识别是视频会议的数据流类型的第一置信度是0。因此,对应于这三种数据流类型的综合置信度如下。
桌面云:1*0.4+0.9*0.6=0.94,大于θ1,因此当前数据流是桌面云的数据流类型,即更新内容识别模型之后得到的当前数据流b对应的第一数据流类型为桌面云的数据流类型。
语音会议:0*0.4+0*0.6=0,小于θ1,因此当前数据流不是语音会议的数据流类型。
视频会议:0*0.4+0*0.6=0,小于θ1,因此当前数据流不是视频会议的数据流类型。
由于识别是桌面云的数据流类型的综合置信度0.94不在区间(θ1,θ2)内,因此不需要向第二设备发送所述当前数据流的特征信息和第二数据流类型(或者第一数据流类型,当没有识别出第二数据流类型时发送的就是第一数据流类型)的信息。
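The decision of whether to report a data stream to the second device, and with which label, can be sketched as follows. The threshold values and the name `content_update_payload` are assumptions for illustration; the interval test and the rule "send the second type if one was identified, otherwise the first type" follow the text.

```python
from typing import Dict, Optional

THETA1 = 0.5  # assumed value of the first preset threshold
THETA2 = 0.8  # assumed value of the second preset threshold


def content_update_payload(score: float, features: Dict[str, object],
                           first_type: str,
                           second_type: Optional[str] = None) -> Optional[dict]:
    """Return the message for the second device, or None if nothing is sent.

    A message is sent only when the combined confidence lies strictly
    inside (THETA1, THETA2).
    """
    if not (THETA1 < score < THETA2):
        return None
    label = second_type if second_type is not None else first_type
    return {"features": features, "flow_type": label}


if __name__ == "__main__":
    flow_a = {"dst_ip": "10.129.56.39", "dst_port": 443, "protocol": "TCP"}
    print(content_update_payload(0.54, flow_a, "desktop_cloud"))
    print(content_update_payload(0.94, flow_a, "desktop_cloud"))
```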
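Optionally, suppose a data stream arrives for which no second data stream type is determined from the above correspondence, but a first data stream type is ultimately determined through the behavior recognition model and the content recognition model. If, among multiple records, there is a first record whose feature information is identical to that of this data stream but whose data stream type differs from the first data stream type of this data stream, the data stream type of the first record is updated to the first data stream type of this data stream, yielding a second record. Each of the multiple records includes feature information and a data stream type. The content recognition model is then trained on the multiple records, including the second record, to obtain a new content recognition model. Optionally, one of these records includes the feature information of the current data stream and its corresponding second data stream type. Some of the records need to be updated because networks have cloud-based elastic deployment scenarios: for example, the cloud resource with destination IP address 10.129.56.39, destination port number 443, and protocol type TCP may change from serving desktop cloud to serving video conferencing. After the records are updated in this way, the content recognition model can be trained on more accurate information, which helps improve its recognition accuracy.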
举例来说,数据流c的协议类型是TCP,目的IP地址是10.129.56.40,目的端口号是444。
那么针对数据流c,内容识别模型识别是桌面云的数据流类型的第二置信度是1、识别是语音会议的数据流类型、视频会议的数据流类型的第二置信度均是0。行为识别模型基 于报文信息识别是桌面云的数据流类型的第一置信度是0、识别是语音会议的数据流类型的第一置信度是0、识别是视频会议的数据流类型的第一置信度是0.9。因此,对应于这三种数据流类型的综合置信度如下。
桌面云:1*0.4+0*0.6=0.4,小于θ1,因此当前数据流不是桌面云的数据流类型。
语音会议:0*0.4+0*0.6=0,小于θ1,因此当前数据流不是语音会议的数据流类型。
视频会议:0*0.4+0.9*0.6=0.54,大于θ1,因此当前数据流是视频会议的数据流类型,即更新内容识别模型之后得到的当前数据流c对应的第一数据流类型为视频会议的数据流类型。
由于识别是视频会议的数据流类型的综合置信度0.54在区间(θ1,θ2)内,因此需要向第二设备发送所述当前数据流的特征信息和第一数据流类型。相应的,第二设备接收第三设备发送的所述当前数据流的特征信息和所述第一数据流类型,即该第二设备上就有了一条数据流记录,如图6,多了一条针对数据流c的记录。
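Suppose that the record for data stream c, compared with an existing record for data stream f, has a different data stream type (i.e., application category) but the same feature information (destination IP address, destination port number, and protocol type). The existing record for data stream f is then modified so that protocol type TCP, destination IP address 10.129.56.40, and destination port number 444 jointly correspond to the video conference data stream type rather than the desktop cloud data stream type. The record before modification is the first record, and the record after modification is the second record.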
可以理解,将第一记录更新为第二记录之后,根据包括该第二记录的多个记录对内容识别模型进行训练得到新的内容识别模型后,上述数据流c再次输入到更新后的(即新的)内容识别模型时,识别其为桌面云的数据流类型的第二置信度为0,识别其为视频会议的数据流类型的第二置信度为1,可选的,这里的多个记录包括上述当前数据流对应的特征信息和第二数据流类型。
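The record update for elastic deployments can be sketched as an upsert keyed on the feature information. The key layout (protocol, destination IP, destination port) mirrors the example above; the function name `record_flow` and the use of a plain dictionary as the record store are assumptions.

```python
from typing import Dict, Tuple

FeatureKey = Tuple[str, str, int]  # (protocol, destination IP, destination port)


def record_flow(records: Dict[FeatureKey, str], key: FeatureKey,
                flow_type: str) -> bool:
    """Insert or overwrite the record for this feature information.

    Returns True when an existing record carried a different data stream
    type and was replaced (the "first record" becoming the "second record").
    """
    old = records.get(key)
    records[key] = flow_type
    return old is not None and old != flow_type


if __name__ == "__main__":
    records = {("TCP", "10.129.56.40", 444): "desktop_cloud"}  # record for stream f
    changed = record_flow(records, ("TCP", "10.129.56.40", 444), "video_conference")
    print(changed, records)
```

Retraining the content recognition model on the updated record set then removes the stale association between this address and desktop cloud.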
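Scheme 2: If the combined confidence for the target data stream type is less than the second preset threshold, the third device does not send the feature information of the current data stream and the second data stream type to the second device. Instead, it updates the content recognition model itself according to the feature information of the current data stream and the second data stream type, to obtain a new content recognition model. The target data stream type is one of the at least one data stream type; if the combined confidence of a target data stream type is greater than the first preset threshold θ1, that target data stream type is taken as the first data stream type corresponding to the current data stream. The second preset threshold is greater than the first preset threshold. For the principle of Scheme 2, refer to Scheme 1 above, with the third device performing the operations performed by the second device in Scheme 1.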
在图3所描述的方法中,根据行为识别模型识别出第一数据流类型,以及根据预设的关于通用特征的对应关系识别出第二数据流类型之后,如果第一数据流类型和第二数据流类型不同则生成用于更新行为识别模型的校正数据,也就是训练样本;一方面,校正数据是设备在当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型不同的情况下自动获取的,无需人为打标签,因此得到的用于训练行为识别模型的样本数据的效率更高。另一方面,该校正数据是在行为识别模型识别结果不准确的情况下生成的报文信息和准确的数据流类型,因此后续基于该校正数据进行行为识别模型的更新,可以获得识别效果更准确的行为识别模型。
除此之外,可以具体根据内容识别模型和行为识别模型得到当前数据流对应的第一数 据流类型,然后对第一数据流类型进行校正得到当前数据流最终的数据流类型。其中,行为识别模型为由多个数据流样本的报文信息和数据流类型预先训练得到,而内容识别模型是基于数据流的特征信息和由行为模型识别出的数据流类型训练得到;因此通过内容识别模型和行为识别模型对特征信息、报文信息等进行分析可以更准确地预测出当前数据流对应的第一数据流类型。并且由于训练内容识别模型时用到的数据流样本中的数据流类型是通过行为识别模型识别得到的,无需大量地去采集训练所需要的数据,解决了数据完整性不足的问题。
另外,在第三设备通过已训练出的行为识别模型识别数据流类型的过程中,如果通过地址校正模型(包含基于通用特征的对应关系)发现识别结果出现偏差,则在累计多次出现偏差的情况下第一设备或者第三设备结合导致偏差的相关数据重新训练该行为识别模型,并在训练出新的行为识别模型后对第三设备上的行为识别模型进行更新,这种迭代更新的行为识别模型能够满足不同用户、不同网络、不同场景的差异化需求,泛化性更好、通用性更强。
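In addition, while the third device recognizes data stream types with the trained content recognition model, if comparing the combined confidence with the preset update threshold θ2 shows that an update is needed, then once the need for an update has accumulated multiple times, the second device or the third device retrains the content recognition model using the data that caused the combined confidence to fall below θ2, and the content recognition model on the third device is updated once the new model has been trained. A content recognition model that is iteratively updated in this way can meet the differing needs of different users, networks, and scenarios, and has better generalization and broader applicability.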
上述详细阐述了本发明实施例的方法,下面提供了本发明实施例的装置。
请参见图7,图7是本发明实施例提供的一种数据流类型识别模型更新装置70,该装置70可以为图3所示方法实施例中的第三设备或者该第三设备中的器件或模块。该装置70可以包括第一确定单元701、第二确定单元702和获取单元703,其中,各个单元的详细描述如下。
第一确定单元701,用于根据当前数据流的报文信息、行为识别模型确定所述当前数据流对应的第一数据流类型,所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项,所述行为识别模型为对多个数据流样本的报文信息和数据流类型进行训练得到的模型;
第二确定单元702,用于根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型,其中,所述目标对应关系为多个通用特征与多个数据流类型的对应关系;
获取单元703,用于在所述当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型不同的情况下,获取所述当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型,所述校正数据用于作为训练样本更新所述行为识别模型。
在上述方法中,根据行为识别模型识别出第一数据流类型,以及根据预设的关于通用特征的对应关系识别出第二数据流类型之后,如果第一数据流类型和第二数据流类型不同 则生成用于更新行为识别模型的校正数据,也就是训练样本;一方面,校正数据是设备在当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型不同的情况下自动获取的,无需人为打标签,因此得到的用于训练行为识别模型的样本数据的效率更高。另一方面,该校正数据是在行为识别模型识别结果不准确的情况下生成的报文信息和准确的数据流类型,因此后续基于该校正数据进行行为识别模型的更新,可以获得识别效果更准确的行为识别模型。
在一种可能的实现方式中,在根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型方面,所述第二确定单元具体用于:若所述当前数据流的通用特征与所述对应关系中的第一通用特征相同,则将所述第一通用特征对应的数据流类型作为所述当前数据流对应的第二数据流类型。
在又一种可能的实现方式中,所述通用特征为知名端口号、或者知名域名系统DNS。
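Determining the second data stream type from a common feature, and generating correction data on a mismatch, can be sketched as a table lookup. The table content reuses the port 10050 example from the method description; the names `TARGET_CORRESPONDENCE`, `lookup_second_type`, and `build_correction`, and the placeholder DNS name, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Packet = Tuple[int, int, float]   # (direction, length, time)
CommonFeature = Tuple[str, object]  # e.g. ("port", 10050) or ("dns", "example.org")

# Assumed contents of the target correspondence (address correction library).
TARGET_CORRESPONDENCE: Dict[CommonFeature, str] = {
    ("port", 10050): "voice_conference",
}


def lookup_second_type(feature: CommonFeature) -> Optional[str]:
    """Second data stream type if the common feature matches an entry."""
    return TARGET_CORRESPONDENCE.get(feature)


def build_correction(packets: List[Packet], first_type: str,
                     second_type: Optional[str]) -> Optional[dict]:
    """Correction data is produced only when both types exist and differ."""
    if second_type is None or second_type == first_type:
        return None
    return {"flow_type": second_type, "packets": packets}


if __name__ == "__main__":
    second = lookup_second_type(("port", 10050))
    print(build_correction([(1, 200, 0.0)], "desktop_cloud", second))
```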
在又一种可能的实现方式中,还包括:
第一发送单元,用于在所述获取单元获取所述当前数据流对应的校正数据之后,向第一设备发送所述当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型;
第一接收单元,用于接收所述第一设备发送的第一模型数据,所述第一模型数据用于描述由所述第一设备根据所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型对所述行为识别模型进行训练得到的新的行为识别模型。
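In this method, the operation of training the new behavior recognition model is performed by a designated first device with relatively strong computing capability. The third device only needs to update its own behavior recognition model according to the new model parameters sent by the first device, so it can devote its main computing resources to packet forwarding, which effectively safeguards its packet forwarding performance.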
在又一种可能的实现方式中,还包括:
第一更新单元,用于在所述获取单元获取所述当前数据流对应的校正数据之后,根据所述校正数据更新所述行为识别模型,以得到新的行为识别模型。
该方法里面,训练行为识别模型的操作由第三设备来实现,相当于使用行为识别模型和训练行为识别模型都在同一个设备上。
在又一种可能的实现方式中,在根据所述校正数据更新所述行为识别模型,以得到新的行为识别模型方面,所述第一更新单元具体用于:
若当前已累计存在M条数据流对应的第一数据流类型与所述M条数据流类型对应的第二数据流类型不同,则根据所述M条数据流的报文信息和所述M条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型,其中,所述M条数据流为从所述行为识别模型生效到截止当前的累计量,或者为在预设时间段内的累计量,或者所述M条数据流占所述行为识别模型生效后已传输的数据流的总量的比值超过预设阈值;M条数据流包括所述当前数据流。
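In this method, updating the behavior recognition model is subject to a trigger condition: whether there are currently M accumulated data streams whose first data stream types differ from their second data stream types. Configuring M appropriately avoids the unnecessary computing overhead of updating the behavior recognition model too frequently, and also avoids inaccurate predictions by the behavior recognition model caused by updating it too rarely.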
在又一种可能的实现方式中:在根据所述M条数据流的报文信息和所述M条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型方面,所述第一更新单元具体用于:
根据所述M条数据流的报文信息、所述M条数据流分别对应的第二数据流类型、Y条数据流的报文信息和所述Y条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型;
其中,所述Y条数据流与所述M条数据流来自同一个网络,或者,
所述Y条数据流与所述M条数据流来自至少两个不同的网络,其中,所述至少两个不同的网络包括两个不同的局域网,或者所述至少两个不同的网络包括两个不同形态的网络,或者所述至少两个不同的网络包括两个不同区域的网络。
可以理解,通过对来自不同网络的数据流的相关信息进行训练,能够提高行为识别模型的泛化性,预测效果更好。
在又一种可能的实现方式中,在所述若所述Y条数据流与所述M条数据流来自至少两个不同的网络,根据所述M条数据流的报文信息、所述M条数据流分别对应的第二数据流类型、Y条数据流的报文信息和所述Y条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型方面,所述第一更新单元具体用于:
根据所述Y条数据流所属的第二网络与所述M条数据流所属的第一网络的网络配置的差异对所述Y条数据流的报文信息进行修正,得到所述Y条数据流的修正后的报文信息;
根据所述M条数据流的报文信息、所述Y条数据流的修正后的报文信息、所述M条数据流对应的第二数据流类型、所述Y条数据流对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型。
该方法中,将来自不同网络的数据流的报文信息进行归一化处理,使得来自不同网络的数据流的报文信息之间更具可比性,基于归一化后的报文信息训练出的行为识别模型的泛化性更好,预测准确度更高。
在又一种可能的实现方式中,在根据当前数据流的报文信息、行为识别模型确定所述当前数据流对应的第一数据流类型方面,所述第一确定单元具体用于:
根据当前数据流的报文信息、特征信息、行为识别模型、内容识别模型确定所述当前数据流对应的第一数据流类型,所述特征信息包括目的地址和协议类型中的一项或者多项;所述内容识别模型为对一条或多条历史数据流的特征信息和数据流类型得到的模型,所述历史数据流的数据流类型是根据所述行为识别模型得到。
在上述方法中,根据内容识别模型和行为识别模型得到当前数据流对应的第一数据流类型,然后对第一数据流类型进行校正得到当前数据流最终的数据流类型。其中,行为识别模型为由多个数据流样本的报文信息和数据流类型预先训练得到,而内容识别模型是基于数据流的特征信息和由行为模型识别出的数据流类型训练得到;因此通过内容识别模型和行为识别模型对特征信息、报文信息等进行分析可以更准确地预测出当前数据流对应的第一数据流类型。并且由于训练内容识别模型时用到的数据流样本中的数据流类型是通过行为识别模型识别得到的,无需大量地去采集训练所需要的数据,解决了数据完整性不足 的问题。
在又一种可能的实现方式中,在所述根据当前数据流的报文信息、特征信息、行为识别模型、内容识别模型确定所述当前数据流对应的第一数据流类型方面,所述第一确定单元具体用于:
根据当前数据流的报文信息和行为识别模型得到所述当前数据流的对应于至少一个数据流类型的至少一个第一置信度;
根据所述当前数据流的特征信息和内容识别模型得到所述当前数据流的对应于所述至少一个数据流类型的至少一个第二置信度;
根据所述至少一个第一置信度和所述至少一个第二置信度确定所述当前数据流的第一数据流类型。
在又一种可能的实现方式中,在根据所述至少一个第一置信度和所述至少一个第二置信度确定所述当前数据流的第一数据流类型方面,所述第一确定单元具体用于:
根据对应于目标数据流类型的所述第一置信度、所述第一置信度的权重值、对应于所述目标数据流类型的所述第二置信度和所述第二置信度的权重值计算对应于所述目标数据流类型的综合置信度,所述目标数据流类型为所述至少一个数据流类型中的任意一个;
若对应于所述目标数据流类型的所述综合置信度大于第一预设阈值,则确定所述目标数据流类型为所述当前数据流对应的第一数据流类型。
在又一种可能的实现方式中,还包括:
第二发送单元,用于在对应于所述目标数据流类型的所述综合置信度小于第二预设阈值的情况下,向第二设备发送所述当前数据流的特征信息和所述第二数据流类型,所述第二预设阈值大于所述第一预设阈值;
第二接收单元,用于接收所述第二设备发送的第二模型数据,所述第二模型数据用于描述由所述第二设备根据所述当前数据流的特征信息和所述第二数据流类型训练所述内容识别模型得到的新的内容识别模型。
上述方法中,本申请发明人利用当前数据流的数据流类型的认定结果对内容识别模型进行更新。具体的,引入了第二预设阈值,在对应于所述第一数据流类型的所述综合置信度小于第二预设阈值时,将当前数据流的相关信息发送给第二设备进行训练,以用于获得新的内容识别模型,以使得下一次的认定结果更加准确。
在又一种可能的实现方式中,还包括:
第二更新单元,用于在对应于所述目标数据流类型的所述综合置信度小于第二预设阈值的情况下,根据所述当前数据流的特征信息和所述第二数据流类型更新所述内容识别模型,以得到新的内容识别模型,所述第二预设阈值大于所述第一预设阈值。
上述方法中,本申请发明人利用当前数据流的数据流类型的认定结果对内容识别模型进行更新。具体的,引入了第二预设阈值,在对应于所述第一数据流类型的所述综合置信度小于第二预设阈值时,对当前数据流的相关信息进行训练,以获得新的内容识别模型,以使得下一次的认定结果更加准确。
在又一种可能的实现方式中,还包括:
第三发送单元,用于在所述第二确定单元根据目标对应关系和所述当前数据流的通用 特征确定所述当前数据流对应的第二数据流类型之后,向运维支持系统OSS发送所述当前数据流对应的第二数据流类型,所述当前数据流的第二数据流类型的信息用于所述OSS生成针对所述当前数据流的流量控制策略。
也即是说,在确定出当前数据流的数据流类型之后,将当前数据流类型的相关信息通知给OSS系统,这样OSS系统就可以基于当前数据流的数据流类型来生成针对所述当前数据流的流量控制策略,例如当前数据流的第一数据流类型为视频会议的视频流时,将其对应的流量控制策略定义为优先传输的策略,即当有多个数据流待传输时,优先传输该当前数据流。
在又一种可能的实现方式中,所述报文长度包括报文中以太帧长度、IP报文长度、传输协议报文长度和报头长度中的一项或者多项,所述传输协议包括传输控制协议TCP和/或用户数据报协议UDP。需要说明的是,各个单元的实现还可以对应参照图3所示的方法实施例的相应描述。
请参见图8,图8是本发明实施例提供的一种数据流类型识别模型更新装置80,该装置80可以为图3所示方法实施例中的第一设备或者该第一设备中的器件或模块。该装置80可以包括第一接收单元801、获取单元802和第一发送单元803,其中,各个单元的详细描述如下。
第一接收单元801,用于接收第三设备发送的当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型;所述当前数据流对应的第二数据流类型为所述第三设备根据目标对应关系和所述当前数据流的通用特征确定的,所述目标对应关系为多个通用特征与多个数据流类型的对应关系;
获取单元802,用于在累计接收到了来自所述第三设备的M条数据流对应的校正数据的情况下,根据所述M条数据流对应的校正数据对所述行为识别模型进行训练得到新的行为识别模型;所述M条数据流为从所述行为识别模型生效到截止当前的累计量,或者为在预设时间段内的累计量,或者所述M条数据流占所述行为识别模型生效后已传输的数据流的总量的比值超过预设阈值;M条数据流包括所述当前数据流;
第一发送单元803,用于向所述第三设备发送第一模型数据,所述第一模型数据用于描述所述新的行为识别模型,所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型,所述行为识别模型用于根据输入的待预测数据流的报文信息确定所述待预测数据流的数据流类型;所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项。
在上述方法中,第一设备在累计一定数量的来自第三设备的校正数据的情况下,根据该一定数量的校正数据对该行为识别模型进行训练得到新的行为识别模型,并在训练出新的行为识别模型向第三设备发送用于描述该新的行为识别模型的信息,以用于该第三设备对第三设备上的行为识别模型进行更新。以上方法无需第三设备进行模型训练,只需基于第一设备的模型训练结果直接得到新的行为识别模型即可,有利于第三设备充分利用计算资源来进行数据流类型的识别。
在一种可能的实现方式中,所述通用特征为知名端口号、或者知名域名系统DNS。
在又一种可能的实现方式中,所述当前数据流对应的校正数据是所述第三设备在所述当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型为不同的数据流类型的情况下发送的,所述当前数据流对应的第一数据流类型是所述第三设备根据当前数据流的报文信息、特征信息、所述行为识别模型、内容识别模型确定的;所述特征信息包括目的地址和协议类型中的一项或者多项,所述内容识别模型为根据一条或多条历史数据流的特征信息和数据流类型得到的,所述历史数据流的数据流类型是根据所述行为识别模型得到的。
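In yet another possible implementation, in terms of training the behavior recognition model on the correction data for the M data streams to obtain a new behavior recognition model, the obtaining unit is specifically configured to: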
根据所述M条数据流对应的校正数据和Y条数据流对应的校正数据对所述行为识别模型进行训练得到新的行为识别模型,其中:
所述Y条数据流与所述M条数据流来自同一个网络,或者,
所述Y条数据流与所述M条数据流来自至少两个不同的网络,其中,所述至少两个不同的网络包括两个不同的局域网,或者所述至少两个不同的网络包括两个不同形态的网络,或者所述至少两个不同的网络包括两个不同区域的网络。
可以理解,通过对来自不同网络的数据流的相关信息进行训练,能够提高行为识别模型的泛化性,预测效果更好。
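In yet another possible implementation, if the Y data streams and the M data streams come from at least two different networks, then in terms of training the behavior recognition model on the correction data for the M data streams and the correction data for the Y data streams to obtain a new behavior recognition model, the obtaining unit is specifically configured to: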
根据所述Y条数据流所属的第二网络与所述M条数据流所属的第一网络的网络配置的差异对所述Y条数据流的报文信息进行修正,得到所述Y条数据流的修正后的报文信息;
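train the behavior recognition model on the packet information of the M data streams, the corrected packet information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.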
该方法中,将来自不同网络的数据流的报文信息进行归一化处理,使得来自不同网络的数据流的报文信息之间更具可比性,基于归一化后的报文信息训练出的行为识别模型的泛化性更好,预测准确度更高。
在又一种可能的实现方式中,还包括:
第二接收单元,用于接收所述第三设备发送的当前数据流的特征信息和第二数据流类型的信息;
生成单元,用于根据所述当前数据流的特征信息和第二数据流类型对所述内容识别模型进行训练得到新的内容识别模型;
第二发送单元,用于向所述第三设备发送第二模型数据,所述第二模型数据用于描述所述新的内容识别模型,所述内容识别模型为根据一条或多条历史数据流的特征信息和数据流类型得到的模型,所述内容识别模型用于根据输入的待预测的数据流的特征信息估计所述待预测数据流的数据流类型,其中,所述历史数据流的数据流类型是根据行为识别模 型得到的,所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型,所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项,所述特征信息包括目的地址和协议类型中的一项或多项。
在上述方法中,在第三设备通过已训练出的内容识别模型识别数据流类型的过程中,如果发现模型准确度不高,则触发第一设备结合相关数据重新训练该内容识别模型,并在训练出新的内容识别模型后对第三设备上的内容识别模型进行更新,这种迭代更新的内容识别模型的方式能够满足不同用户、不同网络、不同场景的差异化需求,泛化性更好、通用性更强。
需要说明的是,各个单元的实现还可以对应参照图3所示的方法实施例的相应描述。
请参见图9,图9是本发明实施例提供的一种设备90,该设备90可以为图3所示方法实施例中的第三设备,该设备90包括处理器901、存储器902和通信接口903,所述处理器901、存储器902和通信接口903通过总线相互连接。
The memory 902 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM). The memory 902 is used to store related computer programs and data. The communication interface 903 is used to receive and send data.
处理器901可以是一个或多个中央处理器(central processing unit,CPU),在处理器901是一个CPU的情况下,该CPU可以是单核CPU,也可以是多核CPU。
处理器901读取所述存储器902中存储的计算机程序代码,用于执行以下操作:
根据当前数据流的报文信息、行为识别模型确定所述当前数据流对应的第一数据流类型,所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项;所述行为识别模型为对多个数据流样本的报文信息和数据流类型进行训练得到的模型;
根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型,其中,所述目标对应关系为多个通用特征与多个数据流类型的对应关系;
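if the first data stream type corresponding to the current data stream differs from the second data stream type corresponding to the current data stream, obtaining correction data for the current data stream, where the correction data includes the packet information of the current data stream and the second data stream type corresponding to the current data stream, and the correction data is used as a training sample to update the behavior recognition model.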
在上述方法中,根据行为识别模型识别出第一数据流类型,以及根据预设的关于通用特征的对应关系识别出第二数据流类型之后,如果第一数据流类型和第二数据流类型不同则生成用于更新行为识别模型的校正数据,也就是训练样本;一方面,校正数据是设备在当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型不同的情况下自动获取的,无需人为打标签,因此得到的用于训练行为识别模型的样本数据的效率更高。另一方面,该校正数据是在行为识别模型识别结果不准确的情况下生成的报文信息和准确的数据流类型,因此后续基于该校正数据进行行为识别模型的更新,可以获得识别效果更准确的行为识别模型。
在一种可能的实现方式中,在根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型,所述处理器具体用于:
若所述当前数据流的通用特征与所述对应关系中的第一通用特征相同,则将所述第一通用特征对应的数据流类型作为所述当前数据流对应的第二数据流类型。
在又一种可能的实现方式中,所述通用特征为知名端口号、或者知名域名系统DNS。
在又一种可能的实现方式中,所述获取所述当前数据流对应的校正数据之后,所述处理器还用于:
通过所述通信接口向第一设备发送所述当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型;
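receive, through the communication interface, first model data sent by the first device, where the first model data describes a new behavior recognition model obtained by the first device by training the behavior recognition model on the packet information of the current data stream and the second data stream type corresponding to the current data stream.
In this method, the operation of training the new behavior recognition model is performed by a designated first device with relatively strong computing capability. The third device only needs to update its own behavior recognition model according to the new model parameters sent by the first device, so it can devote its main computing resources to packet forwarding, which effectively safeguards its packet forwarding performance.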
在又一种可能的实现方式中,所述获取所述当前数据流对应的校正数据之后,所述处理器具体用于:根据所述校正数据更新所述行为识别模型,以得到新的行为识别模型。
该方法里面,训练行为识别模型的操作由第三设备来实现,相当于使用行为识别模型和训练行为识别模型都在同一个设备上。
在又一种可能的实现方式中,在所述根据所述校正数据更新所述行为识别模型,以得到新的行为识别模型方面,所述处理器具体用于:
当前已累计存在M条数据流对应的第一数据流类型与所述M条数据流类型对应的第二数据流类型不同,则根据所述M条数据流的报文信息和所述M条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型,其中,所述M条数据流为从所述行为识别模型生效到截止当前的累计量,或者为在预设时间段内的累计量,或者所述M条数据流占所述行为识别模型生效后已传输的数据流的总量的比值超过预设阈值;所述M条数据流包括所述当前数据流,M为预设的参考阈值。
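In this method, updating the behavior recognition model is subject to a trigger condition: whether there are currently M accumulated data streams whose first data stream types differ from their second data stream types. Configuring M appropriately avoids the unnecessary computing overhead of updating the behavior recognition model too frequently, and also avoids inaccurate predictions by the behavior recognition model caused by updating it too rarely.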
在又一种可能的实现方式中:所述根据所述M条数据流的报文信息和所述M条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型方面,所述处理器具体用于:
根据所述M条数据流的报文信息、所述M条数据流分别对应的第二数据流类型、Y 条数据流的报文信息和所述Y条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型;
其中,所述Y条数据流与所述M条数据流来自同一个网络,或者,
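the Y data streams and the M data streams come from at least two different networks, where the at least two different networks include two different local area networks, or two networks of different forms, or two networks in different regions.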
可以理解,通过对来自不同网络的数据流的相关信息进行训练,能够提高行为识别模型的泛化性,预测效果更好。
在又一种可能的实现方式中,若所述Y条数据流与所述M条数据流来自至少两个不同的网络,根据所述M条数据流的报文信息、所述M条数据流分别对应的第二数据流类型、Y条数据流的报文信息和所述Y条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型方面,所述处理器具体用于:
根据所述Y条数据流所属的第二网络与所述M条数据流所属的第一网络的网络配置的差异对所述Y条数据流的报文信息进行修正,得到所述Y条数据流的修正后的报文信息;
根据所述M条数据流的报文信息、所述Y条数据流的修正后的报文信息、所述M条数据流对应的第二数据流类型、所述Y条数据流对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型。
该方法中,将来自不同网络的数据流的报文信息进行归一化处理,使得来自不同网络的数据流的报文信息之间更具可比性,基于归一化后的报文信息训练出的行为识别模型的泛化性更好,预测准确度更高。
在又一种可能的实现方式中,在根据当前数据流的报文信息、行为识别模型确定所述当前数据流对应的第一数据流类型,所述处理器具体用于:
根据当前数据流的报文信息、特征信息、行为识别模型、内容识别模型确定所述当前数据流对应的第一数据流类型,所述特征信息包括目的地址和协议类型中的一项或者多项;所述内容识别模型为对一条或多条历史数据流的特征信息和数据流类型得到的模型,所述历史数据流的数据流类型是根据所述行为识别模型得到。
在上述方法中,具体根据内容识别模型和行为识别模型得到当前数据流对应的第一数据流类型,然后对第一数据流类型进行校正得到当前数据流最终的数据流类型。其中,行为识别模型为由多个数据流样本的报文信息和数据流类型预先训练得到,而内容识别模型是基于数据流的特征信息和由行为模型识别出的数据流类型训练得到;因此通过内容识别模型和行为识别模型对特征信息、报文信息等进行分析可以更准确地预测出当前数据流对应的第一数据流类型。并且由于训练内容识别模型时用到的数据流样本中的数据流类型是通过行为识别模型识别得到的,无需大量地去采集训练所需要的数据,解决了数据完整性不足的问题。
在又一种可能的实现方式中,在根据当前数据流的报文信息、特征信息、行为识别模型、内容识别模型确定所述当前数据流对应的第一数据流类型方面,所述处理器具体用于:
根据当前数据流的报文信息和行为识别模型得到所述当前数据流的对应于至少一个数据流类型的至少一个第一置信度;
根据所述当前数据流的特征信息和内容识别模型得到所述当前数据流的对应于所述至少一个数据流类型的至少一个第二置信度;
根据所述至少一个第一置信度和所述至少一个第二置信度确定所述当前数据流的第一数据流类型。
在又一种可能的实现方式中,在根据所述至少一个第一置信度和所述至少一个第二置信度确定所述当前数据流的第一数据流类型方面,所述处理器具体用于:
根据对应于目标数据流类型的所述第一置信度、所述第一置信度的权重值、对应于所述目标数据流类型的所述第二置信度和所述第二置信度的权重值计算对应于所述目标数据流类型的综合置信度,所述目标数据流类型为所述至少一个数据流类型中的任意一个;
若对应于所述目标数据流类型的所述综合置信度大于第一预设阈值,则确定所述目标数据流类型为所述当前数据流对应的第一数据流类型。
在又一种可能的实现方式中,所述处理器还用于:
若对应于所述目标数据流类型的所述综合置信度小于第二预设阈值,则通过所述通信接口向第二设备发送所述当前数据流的特征信息和所述第二数据流类型,所述第二预设阈值大于所述第一预设阈值;
通过所述通信接口接收所述第二设备发送的第二模型数据,所述第二模型数据用于描述由所述第二设备根据所述当前数据流的特征信息和所述第二数据流类型训练所述内容识别模型得到的新的内容识别模型。
上述方法中,本申请发明人利用当前数据流的数据流类型的认定结果对内容识别模型进行更新。具体的,引入了第二预设阈值,在对应于所述第一数据流类型的所述综合置信度小于第二预设阈值时,将当前数据流的相关信息发送给第二设备进行训练,以用于获得新的内容识别模型,以使得下一次的认定结果更加准确。
在又一种可能的实现方式中,所述处理器还用于:
若对应于所述目标数据流类型的所述综合置信度小于第二预设阈值,则根据所述当前数据流的特征信息和所述第二数据流类型更新所述内容识别模型,以得到新的内容识别模型,所述第二预设阈值大于所述第一预设阈值。
上述方法中,本申请发明人利用当前数据流的数据流类型的认定结果对内容识别模型进行更新。具体的,引入了第二预设阈值,在对应于所述第一数据流类型的所述综合置信度小于第二预设阈值时,对当前数据流的相关信息进行训练,以获得新的内容识别模型,以使得下一次的认定结果更加准确。
在又一种可能的实现方式中,在根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型之后,所述处理器还用于:
通过所述通信接口向运维支持系统OSS发送所述当前数据流对应的第二数据流类型,所述当前数据流的第二数据流类型的信息用于所述OSS生成针对所述当前数据流的流量控制策略。
也即是说,在确定出当前数据流的数据流类型之后,将当前数据流类型的相关信息通知给OSS系统,这样OSS系统就可以基于当前数据流的数据流类型来生成针对所述当前数据流的流量控制策略,例如当前数据流的第一数据流类型为视频会议的视频流时,将其对 应的流量控制策略定义为优先传输的策略,即当有多个数据流待传输时,优先传输该当前数据流。
在又一种可能的实现方式中,所述报文长度包括报文中以太帧长度、IP报文长度、传输协议报文长度和报头长度中的一项或者多项,所述传输协议包括传输控制协议TCP和/或用户数据报协议UDP。
需要说明的是,各个操作的实现还可以对应参照图3所示的方法实施例的相应描述。
请参见图10,图10是本发明实施例提供的一种设备100,该设备100可以为图3所示方法实施例中的第一设备,该设备100包括处理器1001、存储器1002和通信接口1003,所述处理器1001、存储器1002和通信接口1003通过总线相互连接。
The memory 1002 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM). The memory 1002 is used to store related computer programs and data. The communication interface 1003 is used to receive and send data.
处理器1001可以是一个或多个中央处理器(central processing unit,CPU),在处理器1001是一个CPU的情况下,该CPU可以是单核CPU,也可以是多核CPU。
处理器1001读取所述存储器1002中存储的计算机程序代码,用于执行以下操作:
通过所述通信接口接收第三设备发送的当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型;所述当前数据流对应的第二数据流类型为所述第三设备根据目标对应关系和所述当前数据流的通用特征确定的,所述目标对应关系为多个通用特征与多个数据流类型的对应关系;
若累计接收到了来自所述第三设备的M条数据流对应的校正数据,则根据所述M条数据流对应的校正数据对所述行为识别模型进行训练得到新的行为识别模型;所述M条数据流为从所述行为识别模型生效到截止当前的累计量,或者为在预设时间段内的累计量,或者所述M条数据流占所述行为识别模型生效后已传输的数据流的总量的比值超过预设阈值;M条数据流包括所述当前数据流;
通过所述通信接口向所述第三设备发送第一模型数据,所述第一模型数据用于描述所述新的行为识别模型,所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型,所述行为识别模型用于根据输入的待预测数据流的报文信息确定所述待预测数据流的数据流类型;所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项。
在上述方法中,第一设备在累计一定数量的来自第三设备的校正数据的情况下,根据该一定数量的校正数据对该行为识别模型进行训练得到新的行为识别模型,并在训练出新的行为识别模型向第三设备发送用于描述该新的行为识别模型的信息,以用于该第三设备对第三设备上的行为识别模型进行更新。以上方法无需第三设备进行模型训练,只需基于第一设备的模型训练结果直接得到新的行为识别模型即可,有利于第三设备充分利用计算资源来进行数据流类型的识别。
在一种可能的实现方式中,所述通用特征为知名端口号、或者知名域名系统DNS。
在又一种可能的实现方式中,所述当前数据流对应的校正数据是所述第三设备在所述当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型为不同的数据流类型的情况下发送的,所述当前数据流对应的第一数据流类型是所述第三设备根据当前数据流的报文信息、特征信息、所述行为识别模型、内容识别模型确定的;所述特征信息包括目的地址和协议类型中的一项或者多项,所述内容识别模型为根据一条或多条历史数据流的特征信息和数据流类型得到的,所述历史数据流的数据流类型是根据所述行为识别模型得到的。
在又一种可能的实现方式中:所述根据所述M条数据流对应的校正数据对所述行为识别模型进行训练得到新的行为识别模型方面,所述处理器具体用于:
根据所述M条数据流对应的校正数据和Y条数据流对应的矫正数据对所述行为识别模型进行训练得到新的行为识别模型,其中:
所述Y条数据流与所述M条数据流来自同一个网络,或者,
所述Y条数据流与所述M条数据流来自至少两个不同的网络,其中,所述至少两个不同的网络包括两个不同的局域网,或者所述至少两个不同的网络包括两个不同形态的网络,或者所述至少两个不同的网络包括两个不同区域的网络。
可以理解,通过对来自不同网络的数据流的相关信息进行训练,能够提高行为识别模型的泛化性,预测效果更好。
在又一种可能的实现方式中,若所所述Y条数据流与所述M条数据流来自至少两个不同的网络;所述根据所述M条数据流对应的校正数据和Y条数据流对应的矫正数据对所述行为识别模型进行训练得到新的行为识别模型方面,所述处理器具体用于:
根据所述Y条数据流所属的第二网络与所述M条数据流所属的第一网络的网络配置的差异对所述Y条数据流的报文信息进行修正,得到所述Y条数据流的修正后的报文信息;
根据所述M条数据流的报文信息、所述Y条数据流的修正后的报文信息、所述M条数据流对应的第二数据流类型、所述Y条数据流对应的第二数据流对所述行为识别模型进行训练得到新的行为识别模型。
该方法中,将来自不同网络的数据流的报文信息进行归一化处理,使得来自不同网络的数据流的报文信息之间更具可比性,基于归一化后的报文信息训练出的行为识别模型的泛化性更好,预测准确度更高。
在又一种可能的实现方式中,所述处理器还用于:
通过所述通信接口接收所述第三设备发送的当前数据流的特征信息和第二数据流类型的信息;
根据所述当前数据流的特征信息和第二数据流类型对所述内容识别模型进行训练得到新的内容识别模型;
通过所述通信接口向所述第三设备发送第二模型数据,所述第二模型数据用于描述所述新的内容识别模型,所述内容识别模型为根据一条或多条历史数据流的特征信息和数据流类型得到的模型,所述内容识别模型用于根据输入的待预测的数据流的特征信息估计所述待预测数据流的数据流类型,其中,所述历史数据流的数据流类型是根据行为识别模型 得到的,所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型,所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项,所述特征信息包括目的地址和协议类型中的一项或多项。
在上述方法中,在第三设备通过已训练出的内容识别模型识别数据流类型的过程中,如果发现模型准确度不高,则触发第一设备结合相关数据重新训练该内容识别模型,并在训练出新的内容识别模型后对第三设备上的内容识别模型进行更新,这种迭代更新的内容识别模型的方式能够满足不同用户、不同网络、不同场景的差异化需求,泛化性更好、通用性更强。
需要说明的是,各个操作的实现还可以对应参照图3所示的方法实施例的相应描述。
需说明的是,以上描述的任意装置实施例都仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本发明提供的网络设备或主机的实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。
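An embodiment of the present invention further provides a chip system. The chip system includes at least one processor, a memory, and an interface circuit; the memory, the interface circuit, and the at least one processor are interconnected by lines, and the memory stores a computer program. When the computer program is executed by the processor, the method flow shown in FIG. 3 is implemented.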
本发明实施例还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当其在处理器上运行时,实现图3所示的方法流程。
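An embodiment of the present invention further provides a computer program product. When the computer program product runs on a processor, the method flow shown in FIG. 3 is implemented.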
综上所述,在本申请实施例中,根据行为识别模型识别出第一数据流类型,以及根据预设的关于通用特征的对应关系识别出第二数据流类型之后,如果第一数据流类型和第二数据流类型不同则生成用于更新行为识别模型的校正数据,也就是训练样本;一方面,校正数据是设备在当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型不同的情况下自动获取的,无需人为打标签,因此得到的用于训练行为识别模型的样本数据的效率更高。另一方面,该校正数据是在行为识别模型识别结果不准确的情况下生成的报文信息和准确的数据流类型,因此后续基于该校正数据进行行为识别模型的更新,可以获得识别效果更准确的行为识别模型。
除此之外,可以具体根据内容识别模型和行为识别模型得到当前数据流对应的第一数据流类型,然后对第一数据流类型进行校正得到当前数据流最终的数据流类型。其中,行为识别模型为由多个数据流样本的报文信息和数据流类型预先训练得到,而内容识别模型是基于数据流的特征信息和由行为模型识别出的数据流类型训练得到;因此通过内容识别模型和行为识别模型对特征信息、报文信息等进行分析可以更准确地预测出当前数据流对 应的第一数据流类型。并且由于训练内容识别模型时用到的数据流样本中的数据流类型是通过行为识别模型识别得到的,无需大量地去采集训练所需要的数据,解决了数据完整性不足的问题。
另外,在第三设备通过已训练出的行为识别模型识别数据流类型的过程中,如果通过地址校正模型(包含基于通用特征的对应关系)发现识别结果出现偏差,则在累计多次出现偏差的情况下第一设备或者第三设备结合导致偏差的相关数据重新训练该行为识别模型,并在训练出新的行为识别模型后对第三设备上的行为识别模型进行更新,这种迭代更新的行为识别模型能够满足不同用户、不同网络、不同场景的差异化需求,泛化性更好、通用性更强。
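In addition, while the third device recognizes data stream types with the trained content recognition model, if comparing the combined confidence with the preset update threshold θ2 shows that an update is needed, then once the need for an update has accumulated multiple times, the second device or the third device retrains the content recognition model using the data that caused the combined confidence to fall below θ2, and the content recognition model on the third device is updated once the new model has been trained. A content recognition model that is iteratively updated in this way can meet the differing needs of different users, networks, and scenarios, and has better generalization and broader applicability.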
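A person of ordinary skill in the art will understand that all or part of the flows in the above method embodiments may be completed by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The aforementioned storage medium includes any medium that can store computer program code, such as a ROM, a random access memory (RAM), a magnetic disk, or an optical disc.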
本发明实施例中提到的第一设备、第一置信度、第一数据流类型、第一预设阈值第二信息以及第一记录中的“第一”只是用来做名字标识,并不代表顺序上的第一。该规则同样适用于“第二”、“第三”和“第四”等。然而,本发明实施例中提到的第一个标识中的“第一个”代表顺序上的第一。该规则同样适用于“第N个”。
以上所述的具体实施方式,对本发明的目的、技术方案和有益效果进行了进一步详细说明,所应理解的是,以上所述仅为本发明的具体实施方式而已,并不用于限定本发明的保护范围,凡在本发明的技术方案的基础之上,所做的任何修改、替换、改进等,均应包括在本发明的保护范围之内。

Claims (41)

  1. 一种数据流类型识别模型更新方法,其特征在于,包括:
    根据当前数据流的报文信息、行为识别模型确定所述当前数据流对应的第一数据流类型,所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项,所述行为识别模型为对多个数据流样本的报文信息和数据流类型进行训练得到的模型;
    根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型,其中,所述目标对应关系为多个通用特征与多个数据流类型的对应关系;
    若所述当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型不同,则获取所述当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型,所述校正数据用于作为训练样本更新所述行为识别模型。
  2. 根据权利要求1所述的方法,其特征在于,所述根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型,包括:
    若所述当前数据流的通用特征与所述对应关系中的第一通用特征相同,则将所述第一通用特征对应的数据流类型作为所述当前数据流对应的第二数据流类型。
  3. 根据权利要求1或2所述的方法,其特征在于,所述通用特征为知名端口号、或者知名域名系统DNS。
  4. 根据权利要求1-3任一项所述的方法,其特征在于,所述获取所述当前数据流对应的校正数据之后,还包括:
    向第一设备发送所述当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型;
    接收所述第一设备发送的第一模型数据,所述第一模型数据用于描述由所述第一设备根据所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型对所述行为识别模型进行训练得到的新的行为识别模型。
  5. 根据权利要求1-3任一项所述的方法,其特征在于,所述获取所述当前数据流对应的校正数据之后,还包括:
    根据所述校正数据更新所述行为识别模型,以得到新的行为识别模型。
  6. 根据权利要求5所述的方法,其特征在于,所述根据所述校正数据更新所述行为识别模型,以得到新的行为识别模型,包括:
    若当前已累计存在M条数据流对应的第一数据流类型与所述M条数据流类型对应的第二数据流类型不同,则根据所述M条数据流的报文信息和所述M条数据流分别对应的 第二数据流类型训练所述行为识别模型,以得到新的行为识别模型,其中,所述M条数据流为从所述行为识别模型生效到截止当前的累计量,或者为在预设时间段内的累计量,或者所述M条数据流占所述行为识别模型生效后已传输的数据流的总量的比值超过预设阈值;M条数据流包括所述当前数据流。
  7. 根据权利要求1-6任一项所述的方法,其特征在于,所述根据当前数据流的报文信息、行为识别模型确定所述当前数据流对应的第一数据流类型,包括:
    根据当前数据流的报文信息、特征信息、行为识别模型、内容识别模型确定所述当前数据流对应的第一数据流类型,所述特征信息包括目的地址和协议类型中的一项或者多项;所述内容识别模型为对一条或多条历史数据流的特征信息和数据流类型得到的模型,所述历史数据流的数据流类型是根据所述行为识别模型得到。
  8. 根据权利要求7所述的方法,其特征在于,所述根据当前数据流的报文信息、特征信息、行为识别模型、内容识别模型确定所述当前数据流对应的第一数据流类型,包括:
    根据当前数据流的报文信息和行为识别模型得到所述当前数据流的对应于至少一个数据流类型的至少一个第一置信度;
    根据所述当前数据流的特征信息和内容识别模型得到所述当前数据流的对应于所述至少一个数据流类型的至少一个第二置信度;
    根据所述至少一个第一置信度和所述至少一个第二置信度确定所述当前数据流的第一数据流类型。
  9. 根据权利要求8所述的方法,其特征在于,所述根据所述至少一个第一置信度和所述至少一个第二置信度确定所述当前数据流的第一数据流类型,包括:
    根据对应于目标数据流类型的所述第一置信度、所述第一置信度的权重值、对应于所述目标数据流类型的所述第二置信度和所述第二置信度的权重值计算对应于所述目标数据流类型的综合置信度,所述目标数据流类型为所述至少一个数据流类型中的任意一个;
    若对应于所述目标数据流类型的所述综合置信度大于第一预设阈值,则确定所述目标数据流类型为所述当前数据流对应的第一数据流类型。
  10. 根据权利要求9所述的方法,其特征在于,所述方法还包括:
    若对应于所述目标数据流类型的所述综合置信度小于第二预设阈值,则向第二设备发送所述当前数据流的特征信息和所述第二数据流类型,所述第二预设阈值大于所述第一预设阈值;
    接收所述第二设备发送的第二模型数据,所述第二模型数据用于描述由所述第二设备根据所述当前数据流的特征信息和所述第二数据流类型训练所述内容识别模型得到的新的内容识别模型。
  11. 根据权利要求10所述的方法,其特征在于,所述方法还包括:
    若对应于所述目标数据流类型的所述综合置信度小于第二预设阈值,则根据所述当前数据流的特征信息和所述第二数据流类型更新所述内容识别模型,以得到新的内容识别模型,所述第二预设阈值大于所述第一预设阈值。
  12. 根据权利要求1-11任一项所述的方法,其特征在于,所述根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型之后,还包括:
    向运维支持系统OSS发送所述当前数据流对应的第二数据流类型,所述当前数据流的第二数据流类型的信息用于所述OSS生成针对所述当前数据流的流量控制策略。
  13. 根据权利要求1-12任一项所述的方法,其特征在于,所述报文长度包括报文中以太帧长度、IP报文长度、传输协议报文长度和报头长度中的一项或者多项,所述传输协议包括传输控制协议TCP和/或用户数据报协议UDP。
  14. 一种数据流类型识别模型更新方法,其特征在于,包括:
    接收第三设备发送的当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型;所述当前数据流对应的第二数据流类型为所述第三设备根据目标对应关系和所述当前数据流的通用特征确定的,所述目标对应关系为多个通用特征与多个数据流类型的对应关系;
    若累计接收到了M条数据流对应的校正数据,则根据所述M条数据流对应的校正数据对所述行为识别模型进行训练得到新的行为识别模型;所述M条数据流为从所述行为识别模型生效到截止当前的累计量,或者为在预设时间段内的累计量,或者所述M条数据流占所述行为识别模型生效后已传输的数据流的总量的比值超过预设阈值;M条数据流包括所述当前数据流;
    向所述第三设备发送第一模型数据,所述第一模型数据用于描述所述新的行为识别模型,所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型,所述行为识别模型用于根据输入的待预测数据流的报文信息确定所述待预测数据流的数据流类型;所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项。
  15. 根据权利要求14所述的方法,其特征在于,所述通用特征为知名端口号、或者知名域名系统DNS。
  16. 根据权利要求14或15所述的方法,其特征在于,所述当前数据流对应的校正数据是所述第三设备在所述当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型为不同的数据流类型的情况下发送的,所述当前数据流对应的第一数据流类型是所述第三设备根据当前数据流的报文信息、特征信息、所述行为识别模型、内容识别模型确定的;所述特征信息包括目的地址和协议类型中的一项或者多项,所述内容识别模型为根据一条或多条历史数据流的特征信息和数据流类型得到的,所述历史数据流的数据流 类型是根据所述行为识别模型得到的。
  17. 根据权利要求14-16任一项所述的方法,其特征在于,所述根据所述M条数据流对应的校正数据对所述行为识别模型进行训练得到新的行为识别模型,包括:
    根据所述M条数据流对应的校正数据和Y条数据流对应的矫正数据对所述行为识别模型进行训练得到新的行为识别模型,其中:
    所述Y条数据流与所述M条数据流来自同一个网络,或者,
    所述Y条数据流与所述M条数据流来自至少两个不同的网络,其中,所述至少两个不同的网络包括两个不同的局域网,或者所述至少两个不同的网络包括两个不同形态的网络,或者所述至少两个不同的网络包括两个不同区域的网络。
  18. 根据权利要求17所述的方法,其特征在于,若所所述Y条数据流与所述M条数据流来自至少两个不同的网络;所述根据所述M条数据流对应的校正数据和Y条数据流对应的矫正数据对所述行为识别模型进行训练得到新的行为识别模型,包括:
    根据所述Y条数据流所属的第二网络与所述M条数据流所属的第一网络的网络配置的差异对所述Y条数据流的报文信息进行修正,得到所述Y条数据流的修正后的报文信息;
    根据所述M条数据流的报文信息、所述Y条数据流的修正后的报文信息、所述M条数据流对应的第二数据流类型、所述Y条数据流对应的第二数据流对所述行为识别模型进行训练得到新的行为识别模型。
  19. 根据权利要求14-18任一项所述的方法,其特征在于,还包括:
    接收所述第三设备发送的当前数据流的特征信息和第二数据流类型的信息;
    根据所述当前数据流的特征信息和第二数据流类型对所述内容识别模型进行训练得到新的内容识别模型;
    向所述第三设备发送第二模型数据,所述第二模型数据用于描述所述新的内容识别模型,所述内容识别模型为根据一条或多条历史数据流的特征信息和数据流类型得到的模型,所述内容识别模型用于根据输入的待预测的数据流的特征信息估计所述待预测数据流的数据流类型,其中,所述历史数据流的数据流类型是根据行为识别模型得到的,所述行为识别模型为根据多个数据流样本的报文信息和数据流类型得到的模型,所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项,所述特征信息包括目的地址和协议类型中的一项或多项。
  20. 一种数据流类型识别模型更新装置,其特征在于,包括:
    第一确定单元,用于根据当前数据流的报文信息、行为识别模型确定所述当前数据流对应的第一数据流类型,所述报文信息包括报文长度、报文传输速度、报文间隔时间和报文方向中的一项或者多项,所述行为识别模型为对多个数据流样本的报文信息和数据流类型进行训练得到的模型;
    第二确定单元,用于根据目标对应关系和所述当前数据流的通用特征确定所述当前数 据流对应的第二数据流类型,其中,所述目标对应关系为多个通用特征与多个数据流类型的对应关系;
    获取单元,用于在所述当前数据流对应的第一数据流类型和所述当前数据流对应的第二数据流类型不同的情况下,获取所述当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型,所述校正数据用于作为训练样本更新所述行为识别模型。
  21. 根据权利要求20所述的装置,其特征在于,在根据目标对应关系和所述当前数据流的通用特征确定所述当前数据流对应的第二数据流类型方面,所述第二确定单元具体用于:
    若所述当前数据流的通用特征与所述对应关系中的第一通用特征相同,则将所述第一通用特征对应的数据流类型作为所述当前数据流对应的第二数据流类型。
  22. 根据权利要求20或21所述的装置,其特征在于,所述通用特征为知名端口号、或者知名域名系统DNS。
  23. 根据权利要求20-22任一项所述的装置,其特征在于,所述装置还包括:
    第一发送单元,用于在所述获取单元获取所述当前数据流对应的校正数据之后,向第一设备发送所述当前数据流对应的校正数据,其中,所述当前数据流对应的校正数据包括所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型;
    第一接收单元,用于接收所述第一设备发送的第一模型数据,所述第一模型数据用于描述由所述第一设备根据所述当前数据流的报文信息和所述当前数据流对应的第二数据流类型对所述行为识别模型进行训练得到的新的行为识别模型。
  24. 根据权利要求20-22任一项所述的装置,其特征在于,所述装置还包括:
    第一更新单元,用于在所述获取单元获取所述当前数据流对应的校正数据之后,根据所述校正数据更新所述行为识别模型,以得到新的行为识别模型。
  25. 根据权利要求24所述的装置,其特征在于,在根据所述校正数据更新所述行为识别模型,以得到新的行为识别模型方面,所述第一更新单元具体用于:
    若当前已累计存在M条数据流对应的第一数据流类型与所述M条数据流类型对应的第二数据流类型不同,则根据所述M条数据流的报文信息和所述M条数据流分别对应的第二数据流类型训练所述行为识别模型,以得到新的行为识别模型,其中,所述M条数据流为从所述行为识别模型生效到截止当前的累计量,或者为在预设时间段内的累计量,或者所述M条数据流占所述行为识别模型生效后已传输的数据流的总量的比值超过预设阈值;M条数据流包括所述当前数据流。
  26. 根据权利要求20-25任一项所述的装置,其特征在于,在根据当前数据流的报文 信息、行为识别模型确定所述当前数据流对应的第一数据流类型方面,所述第一确定单元具体用于:
    根据当前数据流的报文信息、特征信息、行为识别模型、内容识别模型确定所述当前数据流对应的第一数据流类型,所述特征信息包括目的地址和协议类型中的一项或者多项;所述内容识别模型为对一条或多条历史数据流的特征信息和数据流类型得到的模型,所述历史数据流的数据流类型是根据所述行为识别模型得到。
  27. 根据权利要求26所述的装置,其特征在于,在根据当前数据流的报文信息、特征信息、行为识别模型、内容识别模型确定所述当前数据流对应的第一数据流类型方面,所述第一确定单元具体用于:
    根据当前数据流的报文信息和行为识别模型得到所述当前数据流的对应于至少一个数据流类型的至少一个第一置信度;
    根据所述当前数据流的特征信息和内容识别模型得到所述当前数据流的对应于所述至少一个数据流类型的至少一个第二置信度;
    根据所述至少一个第一置信度和所述至少一个第二置信度确定所述当前数据流的第一数据流类型。
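  28. The apparatus according to claim 27, wherein, in determining the first data stream type of the current data stream based on the at least one first confidence and the at least one second confidence, the first determining unit is specifically configured to:
    calculate a comprehensive confidence corresponding to a target data stream type based on the first confidence corresponding to the target data stream type, a weight value of the first confidence, the second confidence corresponding to the target data stream type, and a weight value of the second confidence, wherein the target data stream type is any one of the at least one data stream type; and
    if the comprehensive confidence corresponding to the target data stream type is greater than a first preset threshold, determine that the target data stream type is the first data stream type corresponding to the current data stream.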
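The confidence fusion in claims 27 and 28 can be sketched as below. The claim only says the comprehensive confidence is computed from the two confidences and their weight values; the weighted sum used here is an assumption, as is picking the highest-scoring type when several exceed the first preset threshold. The names `combined_confidence`, `classify`, `w1`, `w2`, and `theta1` are hypothetical.

```python
from typing import Dict, Optional


def combined_confidence(c1: float, w1: float, c2: float, w2: float) -> float:
    # Assumed form: weighted sum of the behavior-model and content-model confidences.
    return w1 * c1 + w2 * c2


def classify(
    first_conf: Dict[str, float],   # per-type confidence from the behavior recognition model
    second_conf: Dict[str, float],  # per-type confidence from the content recognition model
    w1: float,
    w2: float,
    theta1: float,                  # first preset threshold
) -> Optional[str]:
    """Return the type whose comprehensive confidence exceeds theta1, if any."""
    best_type, best_score = None, theta1
    for stream_type, c1 in first_conf.items():
        score = combined_confidence(c1, w1, second_conf.get(stream_type, 0.0), w2)
        if score > best_score:  # strictly greater than the threshold, per the claim
            best_type, best_score = stream_type, score
    return best_type


first = {"video": 0.9, "web": 0.1}
second = {"video": 0.5, "web": 0.5}
print(classify(first, second, w1=0.6, w2=0.4, theta1=0.5))  # video scores about 0.74
print(classify(first, second, w1=0.6, w2=0.4, theta1=0.8))  # nothing exceeds 0.8
```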
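  29. The apparatus according to claim 28, further comprising:
    a second sending unit, configured to send the feature information of the current data stream and the second data stream type to a second device when the comprehensive confidence corresponding to the target data stream type is less than a second preset threshold, wherein the second preset threshold is greater than the first preset threshold; and
    a second receiving unit, configured to receive second model data sent by the second device, wherein the second model data describes a new content recognition model obtained by the second device by training the content recognition model based on the feature information of the current data stream and the second data stream type.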
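  30. The apparatus according to claim 29, further comprising:
    a second updating unit, configured to update the content recognition model based on the feature information of the current data stream and the second data stream type when the comprehensive confidence corresponding to the target data stream type is less than the second preset threshold, to obtain a new content recognition model, wherein the second preset threshold is greater than the first preset threshold.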
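The second-threshold rule in claims 29 and 30 can be sketched as follows. The claims do not say how the content recognition model is represented; the per-feature counting table used here is purely an assumption for illustration, and `ContentModel`, `maybe_update_content_model`, `theta1`, and `theta2` are hypothetical names. The address `203.0.113.7` comes from a range reserved for documentation.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple

Feature = Tuple[str, str]  # (destination address, protocol type)


class ContentModel:
    """Assumed stand-in: counts how often each feature was labeled with each type."""

    def __init__(self) -> None:
        self._counts: Dict[Feature, Counter] = defaultdict(Counter)

    def update(self, feature: Feature, stream_type: str) -> None:
        self._counts[feature][stream_type] += 1

    def confidence(self, feature: Feature, stream_type: str) -> float:
        counts = self._counts.get(feature)
        if not counts:
            return 0.0
        return counts[stream_type] / sum(counts.values())


def maybe_update_content_model(
    model: ContentModel,
    feature: Feature,
    second_type: str,
    comprehensive_confidence: float,
    theta1: float,
    theta2: float,
) -> bool:
    """Update the content model only when the confidence falls below theta2."""
    if not theta2 > theta1:
        raise ValueError("the second preset threshold must exceed the first")
    if comprehensive_confidence < theta2:
        model.update(feature, second_type)
        return True
    return False


model = ContentModel()
feature = ("203.0.113.7", "UDP")
print(maybe_update_content_model(model, feature, "video", 0.6, theta1=0.5, theta2=0.8))
print(maybe_update_content_model(model, feature, "video", 0.9, theta1=0.5, theta2=0.8))
print(model.confidence(feature, "video"))
```

Because the second threshold is higher than the first, a data stream can be classified (confidence above the first threshold) and still feed the content model (confidence below the second), which is the band the example's first call falls into.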
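  31. The apparatus according to any one of claims 20 to 30, further comprising:
    a third sending unit, configured to send the second data stream type corresponding to the current data stream to an operations support system (OSS) after the second determining unit determines the second data stream type based on the target correspondence and the common feature of the current data stream, wherein the information about the second data stream type of the current data stream is used by the OSS to generate a traffic control policy for the current data stream.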
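  32. The apparatus according to any one of claims 20 to 31, wherein the packet length includes one or more of Ethernet frame length, IP packet length, transport protocol packet length, and header length of a packet, and the transport protocol includes the Transmission Control Protocol (TCP) and/or the User Datagram Protocol (UDP).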
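To make the length variants in claim 32 concrete, the sketch below derives each one from a payload size. It assumes untagged Ethernet II framing, IPv4 without options, TCP without options, and no minimum-frame padding; counting the 4-byte frame check sequence in the Ethernet frame length and summing all three headers as the header length are also assumptions, since the claim does not define them. The name `length_features` is hypothetical.

```python
from typing import Dict

ETHERNET_HEADER = 14  # destination MAC + source MAC + EtherType (untagged)
ETHERNET_FCS = 4      # frame check sequence
IPV4_HEADER = 20      # minimum IPv4 header, no options
TRANSPORT_HEADER = {"TCP": 20, "UDP": 8}  # minimum TCP header, fixed UDP header


def length_features(payload_len: int, transport: str) -> Dict[str, int]:
    """Derive the per-layer lengths of one packet from its payload size."""
    th = TRANSPORT_HEADER[transport]
    transport_len = th + payload_len
    ip_len = IPV4_HEADER + transport_len
    return {
        "ethernet_frame": ETHERNET_HEADER + ip_len + ETHERNET_FCS,
        "ip_packet": ip_len,
        "transport_packet": transport_len,
        "headers": ETHERNET_HEADER + IPV4_HEADER + th,
    }


print(length_features(100, "TCP"))
print(length_features(100, "UDP"))
```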
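  33. An apparatus for updating a data stream type recognition model, comprising:
    a first receiving unit, configured to receive correction data corresponding to a current data stream sent by a third device, wherein the correction data includes packet information of the current data stream and a second data stream type corresponding to the current data stream; the second data stream type is determined by the third device based on a target correspondence and a common feature of the current data stream; and the target correspondence is a correspondence between a plurality of common features and a plurality of data stream types;
    an obtaining unit, configured to, when correction data corresponding to M data streams has been received cumulatively, train the behavior recognition model based on the correction data corresponding to the M data streams to obtain a new behavior recognition model, wherein the M data streams are the quantity accumulated from the time the behavior recognition model took effect until now, or the quantity accumulated within a preset time period, or the ratio of the M data streams to the total number of data streams transmitted since the behavior recognition model took effect exceeds a preset threshold; and the M data streams include the current data stream; and
    a first sending unit, configured to send first model data to the third device, wherein the first model data describes the new behavior recognition model; the behavior recognition model is a model obtained from the packet information and data stream types of a plurality of data stream samples, and is used to determine, from the input packet information of a data stream to be predicted, the data stream type of that data stream; and the packet information includes one or more of packet length, packet transmission speed, packet interval time, and packet direction.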
  34. The apparatus according to claim 33, wherein the common feature is a well-known port number or a well-known domain name system (DNS).
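  35. The apparatus according to claim 33 or 34, wherein the correction data corresponding to the current data stream is sent by the third device when a first data stream type corresponding to the current data stream and the second data stream type corresponding to the current data stream are different data stream types; the first data stream type is determined by the third device based on the packet information and feature information of the current data stream, the behavior recognition model, and a content recognition model; the feature information includes one or more of destination address and protocol type; the content recognition model is obtained from the feature information and data stream types of one or more historical data streams; and the data stream types of the historical data streams are obtained by using the behavior recognition model.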
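  36. The apparatus according to any one of claims 33 to 35, wherein, in training the behavior recognition model based on the correction data corresponding to the M data streams to obtain a new behavior recognition model, the obtaining unit is specifically configured to:
    train the behavior recognition model based on the correction data corresponding to the M data streams and correction data corresponding to Y data streams to obtain a new behavior recognition model, wherein:
    the Y data streams and the M data streams come from the same network; or
    the Y data streams and the M data streams come from at least two different networks, wherein the at least two different networks include two different local area networks, or two networks of different forms, or two networks in different regions.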
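  37. The apparatus according to claim 36, wherein, if the Y data streams and the M data streams come from at least two different networks, in training the behavior recognition model based on the correction data corresponding to the M data streams and the correction data corresponding to the Y data streams to obtain a new behavior recognition model, the obtaining unit is specifically configured to:
    correct the packet information of the Y data streams based on the difference in network configuration between a second network to which the Y data streams belong and a first network to which the M data streams belong, to obtain corrected packet information of the Y data streams; and
    train the behavior recognition model based on the packet information of the M data streams, the corrected packet information of the Y data streams, the second data stream types corresponding to the M data streams, and the second data stream types corresponding to the Y data streams, to obtain a new behavior recognition model.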
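The correction step in claim 37 could look like the sketch below. The claim does not say which network configuration differences are corrected or how; this example assumes, purely for illustration, that the two networks differ only in per-packet encapsulation overhead (for instance, an IEEE 802.1Q VLAN tag adds 4 bytes to each frame) and that packet lengths are shifted by that difference before the two sample sets are merged. The names `align_packet_lengths` and `build_training_set` are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[int], str]  # (packet lengths in bytes, second data stream type)


def align_packet_lengths(
    lengths: Sequence[int], overhead_second: int, overhead_first: int
) -> List[int]:
    """Shift lengths measured in the second network onto the first network's scale."""
    delta = overhead_first - overhead_second
    return [max(0, length + delta) for length in lengths]


def build_training_set(
    m_samples: Sequence[Sample],
    y_samples: Sequence[Sample],
    overhead_first: int,
    overhead_second: int,
) -> List[Sample]:
    """Merge the M samples with the corrected Y samples; labels are left unchanged."""
    corrected = [
        (align_packet_lengths(lengths, overhead_second, overhead_first), label)
        for lengths, label in y_samples
    ]
    return list(m_samples) + corrected


m_samples = [([114, 1514], "video")]  # first network: untagged frames (14-byte header)
y_samples = [([118, 1518], "video")]  # second network: VLAN-tagged frames (18-byte header)
print(build_training_set(m_samples, y_samples, overhead_first=14, overhead_second=18))
```

Only the packet information is corrected; the second data stream types of the Y data streams pass through untouched, matching the claim's wording.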
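  38. The apparatus according to any one of claims 33 to 37, further comprising:
    a second receiving unit, configured to receive feature information of the current data stream and information about the second data stream type sent by the third device;
    a generating unit, configured to train the content recognition model based on the feature information of the current data stream and the information about the second data stream type to obtain a new content recognition model; and
    a second sending unit, configured to send second model data to the third device, wherein the second model data describes the new content recognition model; the content recognition model is a model obtained from the feature information and data stream types of one or more historical data streams, and is used to estimate, from the input feature information of a data stream to be predicted, the data stream type of that data stream; the data stream types of the historical data streams are obtained by using the behavior recognition model; the behavior recognition model is a model obtained from the packet information and data stream types of a plurality of data stream samples; the packet information includes one or more of packet length, packet transmission speed, packet interval time, and packet direction; and the feature information includes one or more of destination address and protocol type.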
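  39. A device for updating a data stream type recognition model, wherein the device is a third device, the third device includes a memory and a processor, the memory is configured to store a computer program, and the processor invokes the computer program to perform the following operations:
    determining, based on packet information of a current data stream and a behavior recognition model, a first data stream type corresponding to the current data stream, wherein the packet information includes one or more of packet length, packet transmission speed, packet interval time, and packet direction, and the behavior recognition model is a model obtained by training on the packet information and data stream types of a plurality of data stream samples;
    determining, based on a target correspondence and a common feature of the current data stream, a second data stream type corresponding to the current data stream, wherein the target correspondence is a correspondence between a plurality of common features and a plurality of data stream types; and
    if the first data stream type corresponding to the current data stream differs from the second data stream type corresponding to the current data stream, obtaining correction data corresponding to the current data stream, wherein the correction data includes the packet information of the current data stream and the second data stream type corresponding to the current data stream, and the correction data is used as a training sample to update the behavior recognition model.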
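  40. A device for updating a data stream type recognition model, wherein the device is a first device, the first device includes a memory, a processor, and a communication interface, the memory is configured to store a computer program, and the processor invokes the computer program to perform the following operations:
    receiving, through the communication interface, correction data corresponding to a current data stream sent by a third device, wherein the correction data includes packet information of the current data stream and a second data stream type corresponding to the current data stream; the second data stream type is determined by the third device based on a target correspondence and a common feature of the current data stream; and the target correspondence is a correspondence between a plurality of common features and a plurality of data stream types;
    if correction data corresponding to M data streams from the third device has been received cumulatively, training the behavior recognition model based on the correction data corresponding to the M data streams to obtain a new behavior recognition model, wherein the M data streams are the quantity accumulated from the time the behavior recognition model took effect until now, or the quantity accumulated within a preset time period, or the ratio of the M data streams to the total number of data streams transmitted since the behavior recognition model took effect exceeds a preset threshold; and the M data streams include the current data stream; and
    sending first model data to the third device through the communication interface, wherein the first model data describes the new behavior recognition model; the behavior recognition model is a model obtained from the packet information and data stream types of a plurality of data stream samples, and is used to determine, from the input packet information of a data stream to be predicted, the data stream type of that data stream; and the packet information includes one or more of packet length, packet transmission speed, packet interval time, and packet direction.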
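  41. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program runs on a processor, the method according to any one of claims 1 to 19 is implemented.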
PCT/CN2020/119665 2020-02-28 2020-09-30 Data stream type recognition model update method and related device WO2021169308A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20922372.6A EP4087202A4 (en) 2020-02-28 2020-09-30 DATA FLOW TYPE IDENTIFICATION MODEL UPDATE METHOD AND RELATED DEVICE
US17/896,943 US20220407809A1 (en) 2020-02-28 2022-08-26 Data Stream Classification Model Updating Method and Related Device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010130637.8A CN111404833B (zh) 2020-02-28 2020-02-28 Data stream type recognition model update method and related device
CN202010130637.8 2020-02-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/896,943 Continuation US20220407809A1 (en) 2020-02-28 2022-08-26 Data Stream Classification Model Updating Method and Related Device

Publications (1)

Publication Number Publication Date
WO2021169308A1 (zh) 2021-09-02

Family

ID=71413859

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119665 WO2021169308A1 (zh) 2020-02-28 2020-09-30 Data stream type recognition model update method and related device

Country Status (4)

Country Link
US (1) US20220407809A1 (zh)
EP (1) EP4087202A4 (zh)
CN (1) CN111404833B (zh)
WO (1) WO2021169308A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
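CN114666169A (zh) * 2022-05-24 2022-06-24 杭州安恒信息技术股份有限公司 Method, apparatus, device, and medium for identifying scan probe types
CN115988574A (zh) * 2023-03-15 2023-04-18 阿里巴巴(中国)有限公司 Flow-table-based data processing method, system, device, and storage medium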

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
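CN111404833B (zh) * 2020-02-28 2022-04-12 华为技术有限公司 Data stream type recognition model update method and related device
CN112367215B (zh) * 2020-09-21 2022-04-26 杭州安恒信息安全技术有限公司 Machine-learning-based network traffic protocol identification method and apparatus
CN112398875B (zh) * 2021-01-18 2021-04-09 北京电信易通信息技术股份有限公司 Machine-learning-based method for detecting streaming data security vulnerabilities in video conference scenarios
CN114039995A (zh) * 2021-09-17 2022-02-11 北京新网医讯技术有限公司 Method and system for managing AI module functions distinguished by DICOM IP port
US20230164029A1 (en) * 2021-11-22 2023-05-25 Cisco Technology, Inc. Recommending configuration changes in software-defined networks using machine learning
CN114338262B (zh) * 2021-11-26 2024-02-20 国网信息通信产业集团有限公司 Energy cabin communication method, system, and electronic device
CN116094924B (zh) * 2022-07-08 2023-11-21 荣耀终端有限公司 Method for model update and related apparatus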

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110040706A1 (en) * 2009-08-11 2011-02-17 At&T Intellectual Property I, Lp Scalable traffic classifier and classifier training system
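CN104468273A (zh) * 2014-12-12 2015-03-25 北京百度网讯科技有限公司 Method and system for identifying the application type of traffic data
CN108667747A (zh) * 2018-04-28 2018-10-16 深圳信息职业技术学院 Method and apparatus for identifying network flow application type, and computer-readable storage medium
CN110233769A (zh) * 2018-03-06 2019-09-13 华为技术有限公司 Traffic detection method and traffic detection device
CN110781950A (zh) * 2019-10-23 2020-02-11 新华三信息安全技术有限公司 Packet processing method and apparatus
CN111404833A (zh) * 2020-02-28 2020-07-10 华为技术有限公司 Data stream type recognition model update method and related device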

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
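CN101695035B (zh) * 2009-10-21 2012-07-04 成都市华为赛门铁克科技有限公司 Traffic identification method and apparatus
CN102315974B (zh) * 2011-10-17 2014-08-27 北京邮电大学 Method and apparatus for online identification of TCP and UDP traffic based on hierarchical feature analysis
US9621431B1 (en) * 2014-12-23 2017-04-11 EMC IP Holding Company LLC Classification techniques to identify network entity types and determine network topologies
CN107545889B (zh) * 2016-06-23 2020-10-23 华为终端有限公司 Optimization method and apparatus for a model applicable to pattern recognition, and terminal device
CN107360032B (zh) * 2017-07-20 2020-12-01 中国南方电网有限责任公司 Network flow identification method and electronic device
CN109961080B (zh) * 2017-12-26 2022-09-23 腾讯科技(深圳)有限公司 Terminal identification method and apparatus
CN109525587A (zh) * 2018-11-30 2019-03-26 新华三信息安全技术有限公司 Data packet identification method and apparatus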

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110040706A1 (en) * 2009-08-11 2011-02-17 At&T Intellectual Property I, Lp Scalable traffic classifier and classifier training system
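CN104468273A (zh) * 2014-12-12 2015-03-25 北京百度网讯科技有限公司 Method and system for identifying the application type of traffic data
CN110233769A (zh) * 2018-03-06 2019-09-13 华为技术有限公司 Traffic detection method and traffic detection device
CN108667747A (zh) * 2018-04-28 2018-10-16 深圳信息职业技术学院 Method and apparatus for identifying network flow application type, and computer-readable storage medium
CN110781950A (zh) * 2019-10-23 2020-02-11 新华三信息安全技术有限公司 Packet processing method and apparatus
CN111404833A (zh) * 2020-02-28 2020-07-10 华为技术有限公司 Data stream type recognition model update method and related device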

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4087202A4

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
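CN114666169A (zh) * 2022-05-24 2022-06-24 杭州安恒信息技术股份有限公司 Method, apparatus, device, and medium for identifying scan probe types
CN114666169B (zh) * 2022-05-24 2022-08-12 杭州安恒信息技术股份有限公司 Method, apparatus, device, and medium for identifying scan probe types
CN115988574A (zh) * 2023-03-15 2023-04-18 阿里巴巴(中国)有限公司 Flow-table-based data processing method, system, device, and storage medium
CN115988574B (zh) * 2023-03-15 2023-08-04 阿里巴巴(中国)有限公司 Flow-table-based data processing method, system, device, and storage medium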

Also Published As

Publication number Publication date
US20220407809A1 (en) 2022-12-22
EP4087202A4 (en) 2023-07-05
EP4087202A1 (en) 2022-11-09
CN111404833B (zh) 2022-04-12
CN111404833A (zh) 2020-07-10

Similar Documents

Publication Publication Date Title
WO2021169308A1 (zh) Data stream type recognition model update method and related device
WO2021052379A1 (zh) Data stream type identification method and related device
US11736364B2 (en) Cascade-based classification of network devices using multi-scale bags of network words
US20210281492A1 (en) Determining context and actions for machine learning-detected network issues
US11669751B2 (en) Prediction of network events via rule set representations of machine learning models
EP3349395B1 (en) Predicting a user experience metric for an online conference using network analytics
US11528231B2 (en) Active labeling of unknown devices in a network
US11200488B2 (en) Network endpoint profiling using a topical model and semantic analysis
US11153347B2 (en) Preserving privacy in exporting device classification rules from on-premise systems
US11451456B2 (en) Learning stable representations of devices for clustering-based device classification systems
US11018943B1 (en) Learning packet capture policies to enrich context for device classification systems
US11100364B2 (en) Active learning for interactive labeling of new device types based on limited feedback
US10999146B1 (en) Learning when to reuse existing rules in active labeling for device classification
US11956118B2 (en) Fault root cause identification method, apparatus, and device
US12034629B2 (en) Overlay network modification
US10454776B2 (en) Dynamic computer network classification using machine learning
US20230385708A1 (en) Reconciling computing infrastructure and data in federated learning
US20230254254A1 (en) PACKET FLOW IDENTIFICATION AND QoE-AWARE PROCESSING USING A LOCAL DEVICE AGENT
WO2024119513A1 (en) Device and method for agent for dynamically adapting explicit congestion notification configuration in network system
WO2024193304A1 (zh) Network slice management system and method, electronic device, and storage medium
CN116938723A (zh) Slice bandwidth planning method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20922372

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020922372

Country of ref document: EP

Effective date: 20220804

NENP Non-entry into the national phase

Ref country code: DE