CN112949739A - Information transmission scheduling method and system based on intelligent traffic classification

Info

Publication number: CN112949739A
Application number: CN202110284637.8A
Authority: CN
Inventors: 王洪鹏; 刘湘德; 刘刚; 于翔; 张瑞; 黄旭岑; 林睿; 李馥丹; 罗俊; 薛滔; 余康
Original assignee: CETC 29 Research Institute
Current assignee: CETC 29 Research Institute
Priority date: 2021-03-17
Filing date: 2021-03-17
Publication date: 2021-06-11

Abstract

The invention discloses an information transmission scheduling method and system based on intelligent traffic classification. The method includes the following steps: S1, preparing training data; S2, constructing a neural network; S3, training the neural network; S4, classifying traffic categories; S5, carrying out traffic transmission scheduling. The invention solves problems in the prior art such as local load imbalance, inability to identify high-priority Internet of Things data types, inadequate guarantees for network resource allocation, poor stability, and difficulty in meeting users' different quality-of-service requirements.

Description

Information transmission scheduling method and system based on intelligent traffic classification

Technical Field

The invention relates to the technical field of artificial intelligence and the Internet of Things, and in particular to an information transmission scheduling method and system based on intelligent traffic classification.

Background

According to the way they classify, network traffic classification methods can be divided into methods based on port matching, methods based on deep packet inspection, and methods based on machine learning.

The port-matching method is the earliest and simplest network traffic classification method. It extracts port numbers from the captured traffic data and maps them to the corresponding applications. The method is efficient, has low time and space complexity, and is easy to extend, but it depends entirely on ports: as the number of application types grows, more and more applications use random or non-standard ports, which makes port-based classification increasingly difficult. Port-based classification has therefore been gradually phased out and is now used only as an auxiliary method.

The classification method based on Deep Packet Inspection (DPI) inspects the application-layer payload to identify special character strings that match a known application or protocol; it is divided into string matching and regular-expression matching. Compared with port matching, DPI no longer depends on a specific port number, so it avoids the influence of random and non-standard ports and improves classification accuracy. However, DPI must inspect the payload of every traffic packet, which raises user privacy concerns, and it performs poorly on encrypted traffic, so it cannot be applied well to many network traffic classification problems.

Machine-learning-based classification methods have become popular in recent years. They use traffic features and machine learning algorithms to establish the relationship between traffic features and traffic sample classes, and thereby identify different traffic classes. Commonly used algorithms include naive Bayes, decision trees, K-nearest neighbors and support vector machines. These methods first build a classification model and then use it to classify new samples; among them, the decision tree algorithm achieves relatively high accuracy for network traffic classification.

Deep learning is currently the most rapidly developing branch of machine learning and the technology with the broadest application prospects in the field of artificial intelligence. The convolutional neural network, a representative deep learning architecture, can effectively learn the mapping from feature data to label categories. Using deep learning to classify network traffic is therefore a feasible and efficient approach.

At present, sensor nodes and users are unevenly distributed and link bandwidth states differ, so local load imbalance easily occurs in the Internet of Things; high-priority Internet of Things data types cannot be identified, network resource allocation is not adequately guaranteed, and users with different Quality of Service (QoS) requirements cannot be satisfied.

Disclosure of Invention

To overcome the defects of the prior art, the invention provides an information transmission scheduling method and system based on intelligent traffic classification. It addresses the following problems in the prior art: local load imbalance occurs easily, high-priority Internet of Things data types cannot be identified, network resource allocation is not adequately guaranteed, stability is poor, and users' different quality-of-service requirements are difficult to meet.

The technical scheme adopted by the invention for solving the problems is as follows:

An information transmission scheduling method based on intelligent traffic classification comprises the following steps:

S1, preparing training data: making a network traffic data set into a traffic feature training set to serve as the input of a neural network;

S2, constructing a neural network: using a classification network that establishes short connections between each layer and all preceding layers to construct the neural network;

S3, training the neural network: inputting the traffic feature training set from step S1 into the neural network for training to obtain a traffic classification model;

S4, classifying traffic categories: classifying the traffic data to be processed according to the traffic classification model;

S5, carrying out traffic transmission scheduling: assigning the classified traffic data to specific priority levels according to a predefined traffic class-priority mapping table, and then scheduling the traffic according to the priority levels.
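The lookup in step S5 can be illustrated with a minimal Python sketch. The class names, the contents of the mapping table, and the helper names `assign_priority` and `order_for_scheduling` are assumptions made for illustration; the patent does not list the actual table. The level names reuse the EF/AF/BE levels that the detailed description introduces later.

```python
# Hypothetical traffic-class -> priority table; the class names and the
# levels assigned to them are illustrative assumptions, not patent values.
CLASS_PRIORITY = {
    "voice": "EF",
    "video": "AF4",
    "file_transfer": "AF1",
    "bulk": "BE",
}

# Lower rank means the level is scheduled earlier.
PRIORITY_RANK = {"EF": 0, "AF4": 1, "AF3": 2, "AF2": 3, "AF1": 4, "BE": 5}


def assign_priority(traffic_class: str) -> str:
    """Look up the priority level; unknown classes fall back to best effort."""
    return CLASS_PRIORITY.get(traffic_class, "BE")


def order_for_scheduling(flows):
    """Sort (flow_id, traffic_class) pairs by priority level.

    sorted() is stable, so flows of the same level keep their arrival order.
    """
    return sorted(flows, key=lambda f: PRIORITY_RANK[assign_priority(f[1])])


flows = [("f1", "bulk"), ("f2", "voice"), ("f3", "video"), ("f4", "voice")]
ordered = order_for_scheduling(flows)
print([flow_id for flow_id, _ in ordered])
```

Falling back to best effort for an unrecognized class is one possible design choice; a deployment could equally reject or log such traffic.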

The invention uses a deep learning algorithm to intelligently classify the collected common traffic categories and, on that basis, uses a rule-based information scheduling algorithm to transmit services with different Quality of Service (QoS) requirements over different paths in the network. By combining the feature learning and label mapping of deep learning with a common load-balancing scheduling algorithm, the invention provides an information transmission scheduling method based on intelligent traffic classification. Specifically, a common traffic data set is first prepared for training; a neural network is then constructed and trained on the data set to learn and classify network traffic features; finally, according to the neural network's classification of the traffic data, a rule-based scheduling algorithm transmits and schedules the traffic.

The invention intelligently classifies the traffic information in the network and performs priority scheduling based on the classification. The classification is produced as follows: an initial traffic data set is prepared and preprocessed (redundancy removal, normalization and similar operations) to generate a candidate traffic data set; feature dimension reduction is applied to the candidate traffic data set to generate a traffic sample data set; and the deep learning model is trained on the traffic sample data set to generate a classification model. The classification model is then applied in an actual physical network environment: network traffic is collected, its feature data is extracted, and the classification model processes the feature data to produce the classification result for the traffic.

The invention ensures that the overall accuracy of network traffic classification is high. In addition, the traffic classification network of the method is trained by various different traffic data, and the practicability and the stability are more excellent compared with those of the traditional method.

The Internet of things traffic data is identified, classified and transmitted and scheduled according to the data type priority, and network resource allocation is guaranteed, so that the defect that local load unbalance is easy to occur in the prior art is overcome, the stability is good, and different service quality requirements of users can be met conveniently.

As a preferred technical solution, the step S1 includes the following steps:

S11, preprocessing: extracting the source IP address, destination IP address, transport layer protocol, source port and destination port corresponding to each piece of data in the initial traffic data set to obtain a candidate traffic data set;

S12, feature dimension reduction: performing feature dimension reduction on the candidate traffic data set to remove redundant features and obtain a traffic sample data set, and processing the traffic sample data set for output as the traffic feature training set.
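The five-tuple extraction in step S11 can be sketched as follows. This is a minimal illustration that assumes packets are already parsed into dictionaries; the record layout and the names `FiveTuple` and `group_into_flows` are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int


def group_into_flows(packets):
    """Group packet records (dicts) into flows keyed by their five-tuple."""
    flows = defaultdict(list)
    for pkt in packets:
        key = FiveTuple(pkt["src_ip"], pkt["dst_ip"], pkt["protocol"],
                        pkt["src_port"], pkt["dst_port"])
        flows[key].append(pkt["payload"])
    return dict(flows)


# Hypothetical packet records; a real capture would come from a pcap parser.
packets = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "protocol": "TCP",
     "src_port": 5000, "dst_port": 80, "payload": b"GET /"},
    {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.2", "protocol": "UDP",
     "src_port": 6000, "dst_port": 53, "payload": b"query"},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "protocol": "TCP",
     "src_port": 5000, "dst_port": 80, "payload": b"HTTP/1.1"},
]
flows = group_into_flows(packets)
print(len(flows))
```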

Preprocessing the traffic data set and removing low-value, redundant traffic data through feature dimension reduction ensures the validity of the data used as the traffic feature training set and greatly improves processing efficiency and the accuracy of network traffic classification and other processing.

As a preferred technical solution, the step S12 includes the following steps:

S121, byte truncation and compression: truncating the traffic sample data set to a fixed number of bytes, compressing the truncated bytes into a uniform format, and outputting the result as the traffic feature training set.

This improves the format uniformity and sample validity of the neural network data, so that neural network training and recognition are more efficient and accurate and the level of intelligence is raised.

As a preferred technical solution, in step S1, a Moore network traffic data set is used as the network traffic data set.

The Moore network flow data set is a classic data set and is widely applied to the field of network flow identification and classification, so that the Moore network flow data set has the advantages of large data range, wide application range and strong universality.

As a preferred technical solution, in step S1, the Moore data set is divided into a training set and a test set, and label files corresponding to the training set and the test set are generated.

This facilitates the ability to improve neural network training, making the training effect better.

As a preferred technical solution, in step S2, DenseNet is used as the classification network.

DenseNet establishes dense connections between each layer and all preceding layers and reuses features along the channel dimension. It thereby strengthens feature reuse, promotes feature propagation, mitigates vanishing gradients, and reduces the overall number of parameters and the computation of the network.

As a preferred technical solution, in step S3, a cross entropy loss function is used as a loss function of the neural network.

The cross entropy loss function reduces the loss during network training, prevents training from slowing down in its later stages, improves training speed, and provides good stability.

As a preferred technical solution, the step S4 includes the following steps:

S41, carrying out traffic detection at a set observation point to obtain the corresponding network traffic; extracting features and reducing their dimensionality through data preprocessing to obtain traffic features suitable for neural network input; inputting the traffic features into the traffic classification model trained in step S3 to obtain the traffic categories.

This step specifically completes the functions of traffic acquisition, feature extraction, and traffic classification.

As a preferred technical solution, the step S5 includes the following steps:

S51, performing policy matching and traffic forwarding control on the classified traffic;

S52, performing two-stage scheduling on the traffic data, wherein the first stage uses weighted scheduling to schedule traffic packets of different priorities in priority order, and the second stage uses round-robin scheduling to schedule traffic data of the same level in turn on a first-come, first-served basis.

These steps effectively control and schedule different traffic flows and ensure, as far as possible, load balance and adaptive transmission of data flows in the network.

A system for the information transmission scheduling method based on intelligent traffic classification comprises the following modules:

a traffic classification module, used for performing traffic acquisition, feature extraction and traffic classification on the traffic data to be processed, and for exchanging information with the network flow control module;

a network flow control module, used for performing policy matching and traffic forwarding control on the classified traffic, and for exchanging information with the traffic classification module and the data collection and distribution module;

a data collection and distribution module, used for performing two-stage scheduling on the traffic data, wherein the first stage uses weighted scheduling to schedule traffic packets of different priorities in priority order and the second stage uses round-robin scheduling to schedule traffic data of the same level in turn on a first-come, first-served basis, and for exchanging information with the network flow control module.

The invention ensures that the overall accuracy of network traffic classification is high. In addition, the traffic classification network of the invention is trained by various different traffic data, and the practicability and stability are more excellent compared with the traditional method.

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention uses a deep learning algorithm to intelligently classify the collected common traffic categories and, on that basis, uses a rule-based information scheduling algorithm to transmit services with different quality-of-service requirements over different paths in the network. Internet of Things traffic data is identified, classified, and scheduled for transmission according to data type priority; classification accuracy is high and network resource allocation is guaranteed, which overcomes the tendency toward local load imbalance in the prior art, provides good stability, and makes it easier to meet users' different quality-of-service requirements;

(2) The traffic data set is preprocessed and low-value, redundant traffic data is removed through feature dimension reduction, which ensures the validity of the data used as the traffic feature training set and greatly improves processing efficiency and the accuracy of network traffic classification and other processing;

(3) The byte truncation and compression step improves the format uniformity and sample validity of the neural network data, so that neural network training and recognition are more efficient and accurate and the level of intelligence is raised;

(4) The Moore network traffic data set is used as the network traffic data set, so the data range is large, the scope of application is wide, and generality is high;

(5) The Moore data set is divided into a training set and a test set, and the corresponding label files are generated, which improves neural network training and yields a better training result;

(6) DenseNet is used as the classification network, which strengthens feature reuse, promotes feature propagation, mitigates vanishing gradients, and reduces the number of parameters and the computation of the whole network;

(7) The cross entropy loss function is used as the loss function of the neural network, which reduces the loss during network training, prevents training from slowing down in its later stages, improves training speed, and provides good stability;

(8) The invention completes the functions of traffic acquisition, feature extraction and traffic classification, effectively controls and schedules different traffic flows, and ensures, as far as possible, load balance and adaptive transmission of data flows in the network.

Drawings

FIG. 1 is a block diagram of the system of the present invention;

FIG. 2 is a schematic diagram of the overall structure of the neural network according to the present invention;

FIG. 3 is a schematic diagram of the classification results of the flow data generated after processing by the classification model according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited to these examples.

Example 1

As shown in fig. 1 to 3, an information transmission scheduling method based on intelligent traffic classification includes the following steps:

Experiments prove that the overall accuracy of network flow classification is up to more than 97%. In addition, the traffic classification network of the method is trained by various different traffic data, and the practicability and the stability are more excellent compared with those of the traditional method.

As a preferred technical solution, the step S1 includes the following steps:

As a preferred technical solution, the step S12 includes the following steps:

As a preferred technical solution, the step S4 includes the following steps:

As a preferred technical solution, the step S5 includes the following steps:

Example 2

As shown in fig. 1 to fig. 3, on the basis of the embodiment, the embodiment provides a system of an information transmission scheduling method based on intelligent traffic classification, which includes the following modules:

Example 3

As shown in fig. 1 to 3, as a further optimization of embodiments 1 and 2, the present embodiment includes all technical features of embodiments 1 and 2 and, in addition, is further refined and supplemented as follows.

The data set used by the invention is the classic Moore network traffic data set, which consists of traffic data collected at the University of Cambridge by Moore et al. and is used in the field of network traffic identification and classification. Each flow in the data set has 12 feature vectors and 1 manually labeled class label. According to the Internet communication standard documents, a network flow is defined as the set of all packets that share the same five-tuple (source IP address, destination IP address, transport layer protocol, source port and destination port) and pass an observation point in the network within a specific time interval. Before the traffic data can be used with a deep neural network, the data set must be preprocessed. The Xplico tool is used to extract the five-tuple corresponding to each piece of data in the Moore data set; because many features have no value for traffic classification and are redundant, feature dimension reduction is also applied to remove them. Finally, the first 784 bytes are kept and compressed into the MNIST data format so that all samples share one format and can be used directly as neural network input. When the training data set is built, the Moore data set is divided into a training set and a test set, and the corresponding label files are generated.
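The 784-byte truncation can be sketched as below, since 784 = 28 × 28 is the size of one MNIST image. The zero-padding of short flows and the scaling of byte values to the range 0 to 1 are assumptions of this sketch; the patent states only that the first 784 bytes are kept and converted to the MNIST format. The function name `flow_to_image` is hypothetical.

```python
IMAGE_SIDE = 28
FLOW_BYTES = IMAGE_SIDE * IMAGE_SIDE  # 784 bytes, one MNIST-sized image


def flow_to_image(raw: bytes):
    """Truncate or zero-pad a flow to 784 bytes and reshape it to 28 x 28.

    Zero-padding and the division by 255 are assumptions of this sketch.
    """
    clipped = raw[:FLOW_BYTES].ljust(FLOW_BYTES, b"\x00")
    pixels = [b / 255.0 for b in clipped]
    return [pixels[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE]
            for r in range(IMAGE_SIDE)]


long_flow = flow_to_image(bytes(range(256)) * 4)  # 1024 bytes, truncated
short_flow = flow_to_image(b"\xff\x00")           # 2 bytes, zero-padded
print(len(long_flow), len(long_flow[0]))
```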

The invention uses DenseNet as the classification network. DenseNet is a densely connected convolutional neural network that improves on the classic ResNet. Like ResNet, its basic idea is to establish skip connections (short connections) between earlier and later layers. The difference is that DenseNet establishes dense connections between each layer and all preceding layers and reuses features along the channel dimension; in this way it strengthens feature reuse, promotes feature propagation, mitigates vanishing gradients, and reduces the number of parameters and the computation of the whole network.

The DenseNet used here mainly comprises 3 dense blocks. One convolution layer first performs feature extraction, and the extracted features are then fed into the dense blocks. Each of the first two dense blocks is followed by a transition layer comprising one 1 × 1 convolution layer and average pooling: the 1 × 1 convolution reduces the feature dimensionality and improves computational efficiency, while the average pooling further reduces dimensionality and extracts features. The last dense block is followed by a classification module that uses global average pooling, linear activation and a fully connected layer, and outputs the classification scores produced by a softmax function, from which the classification result is obtained. The overall structure is shown in fig. 2.
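To make the effect of dense connections and transition layers concrete, the following sketch traces how the channel count and spatial size change through three dense blocks. It is shape arithmetic only, not a trainable network. All hyperparameters (24 initial channels, growth rate 12, 6 layers per block, compression 0.5, 2 × 2 pooling) are assumptions chosen for illustration; the patent does not state them.

```python
def densenet_shapes(in_channels=24, growth_rate=12, layers_per_block=(6, 6, 6),
                    compression=0.5, side=28):
    """Trace (name, channels, spatial side) through dense blocks and transitions.

    All default values are illustrative assumptions, not patent parameters.
    """
    shapes = []
    channels = in_channels
    for i, n_layers in enumerate(layers_per_block):
        # Each layer concatenates growth_rate new feature maps onto its input,
        # so a block adds n_layers * growth_rate channels in total.
        channels += n_layers * growth_rate
        shapes.append((f"dense_block_{i + 1}", channels, side))
        if i < len(layers_per_block) - 1:
            # Transition layer: the 1x1 convolution shrinks the channel count,
            # and 2x2 average pooling halves the spatial side.
            channels = int(channels * compression)
            side //= 2
            shapes.append((f"transition_{i + 1}", channels, side))
    return shapes


shapes = densenet_shapes()
for name, channels, side in shapes:
    print(name, channels, side)
```

With these assumed settings, a 28 × 28 input leaves the last dense block as 7 × 7 feature maps, which the global average pooling in the classification module would reduce to one value per channel.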

The traffic feature training set produced in step S1 is input to the DenseNet neural network for training. The loss function is the standard cross entropy loss and the network is optimized with the Adam optimizer. The aim is to reduce the loss during training: the network weight parameters are updated through back propagation, and training iterates until the classification accuracy reaches a satisfactory level, at which point training stops and the network model is saved as the final classification model. The cross entropy loss function used by the network is as follows:

$$L = -\sum_{c=1}^{M} y_c \log(p_c)$$

where M is the number of traffic classes; c is an index over the traffic classes, ranging from 1 to M; y_c is an indicator variable (0 or 1) that is 1 if class c is the true class of the sample and 0 otherwise; and p_c is the predicted probability that the sample belongs to class c, with p_c in the range 0 to 1.
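The standard cross entropy loss, the negative sum over classes of y_c times log p_c, can be checked numerically with a short sketch. The function names and the example scores are illustrative assumptions; the small `eps` guard against log(0) is an implementation detail of this sketch, not part of the patent.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot, probs, eps=1e-12):
    """Compute -sum_c y_c * log(p_c) for one sample."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))


# Hypothetical scores for M = 3 traffic classes; the true class is the first.
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy([1, 0, 0], probs)
print(loss)
```

Because y_c is 1 only for the true class, the loss reduces to the negative log of the probability assigned to that class, so it approaches 0 as the prediction becomes confident and correct.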

According to the Internet communication standard documents, traffic packets are prioritized according to the first 3 bits of the type-of-service field; the levels, from highest to lowest, are CS7, CS6, EF, AF4, AF3, AF2, AF1 and BE. The CS levels are used by protocols within the network, EF is Expedited Forwarding, AF is Assured Forwarding, and BE is Best Effort forwarding. Among service traffic, therefore, EF has the highest priority and BE the lowest.
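Reading the level from the first 3 bits of the 8-bit field can be sketched as below. The numeric assignment of values 7 down to 0 to the eight levels is an assumption inferred from the top-to-bottom order given in the text, and the function name `service_level` is hypothetical.

```python
# Value of the top 3 bits -> service level, highest first. The numeric
# assignment 7..0 is an assumption based on the order listed in the text.
PRECEDENCE_LEVELS = {7: "CS7", 6: "CS6", 5: "EF", 4: "AF4",
                     3: "AF3", 2: "AF2", 1: "AF1", 0: "BE"}


def service_level(tos_byte: int) -> str:
    """Return the level encoded in the first 3 bits of an 8-bit field value."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("the field value must fit in one byte")
    return PRECEDENCE_LEVELS[tos_byte >> 5]


print(service_level(0xB8))  # bit pattern 1011 1000, top 3 bits are 101
```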

After the traffic category is obtained, the classified traffic data is assigned to a specific priority level according to the predefined traffic class-priority mapping table, and traffic is then scheduled according to priority level, which keeps network traffic orderly and stable. Following the idea of software-defined networking, corresponding modules are deployed at each level of the overall architecture to realize the interaction of classification, transmission and scheduling. Specifically, a traffic classification module is deployed on application-layer devices to detect traffic, extract features and classify the traffic; a network flow control module is deployed at the control layer to perform policy matching according to the traffic category and to control and forward traffic based on the policy; and a data collection and distribution module is deployed at the data layer to complete the control and forwarding of network data traffic.

More specifically, the following steps are employed:

a: The traffic classification module of the application layer mainly performs traffic acquisition, feature extraction and traffic classification. Traffic detection is carried out at the set observation point to obtain the corresponding network traffic. Data preprocessing then extracts features and reduces their dimensionality to obtain traffic features suitable for neural network input. Finally, the traffic features are input into the classification model trained in step S3 to obtain the traffic category.

b: The network flow control module of the control layer is responsible for policy matching and traffic forwarding control for the classified traffic. Specifically, the control module obtains the state of the network devices in advance through the OpenFlow protocol and generates a data flow table and a forwarding policy. After the application layer finishes traffic classification, the control module obtains the traffic category and traffic data and matches them against the flow table and forwarding policy. The flow table contains flow header information (such as source and destination addresses) and the priority (forward or discard). Matching yields the next-hop destination device and the operation instruction, and the corresponding forwarding operation is then carried out.
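The matching step can be pictured with a simplified flow table. This is a toy model written for illustration and does not use the OpenFlow wire protocol or any controller library; the `FlowEntry` fields, the wildcard convention and the first-match rule are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowEntry:
    src: Optional[str]            # None acts as a wildcard
    dst: Optional[str]
    traffic_class: Optional[str]
    action: str                   # "forward" or "drop"
    out_port: Optional[int] = None


def match(table, src, dst, traffic_class):
    """Return the first entry whose non-wildcard fields all match, else None."""
    for entry in table:
        if (entry.src in (None, src)
                and entry.dst in (None, dst)
                and entry.traffic_class in (None, traffic_class)):
            return entry
    return None


# Hypothetical table: drop one source, send voice to port 1, the rest to port 2.
table = [
    FlowEntry("10.0.0.9", None, None, "drop"),
    FlowEntry(None, None, "voice", "forward", 1),
    FlowEntry(None, None, None, "forward", 2),
]
print(match(table, "10.0.0.1", "10.0.0.2", "voice"))
```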

c: The data collection and distribution module of the data layer performs the final traffic scheduling. It is divided into a data collection module and a data distribution module. The data collection module obtains from the control layer the data and the flow table instructions matched after classification and identification, and processes each traffic packet accordingly: deciding whether to discard it, assigning it to a scheduling queue according to its priority, selecting the port on which to forward it, and so on. The data distribution module is responsible for the actual forwarding operations and uses different scheduling algorithms to achieve load balance in different application scenarios. Specifically, the traffic data undergoes two-stage scheduling. The first stage uses weighted scheduling, in which traffic packets of different priorities are scheduled strictly in priority order; for example, if 3 flows A, B and C have priorities EF, AF and BE respectively, they are scheduled in the order A, B, C. The second stage uses round-robin scheduling, in which traffic data of the same level is scheduled in turn on a first-come, first-served basis. This two-stage strategy effectively controls and schedules different traffic flows and ensures, as far as possible, load balance and adaptive transmission of data flows in the network.
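A minimal sketch of the two-stage strategy follows. It implements the first stage as strict priority order, matching the A, B, C example in the text, and the second stage as round-robin among flows of the same level in arrival order. The class name `TwoStageScheduler`, its methods and the sample flows are assumptions for illustration; weights, queue limits and port selection are omitted.

```python
from collections import deque

PRIORITY_ORDER = ["EF", "AF", "BE"]  # highest first, as in the A, B, C example


class TwoStageScheduler:
    def __init__(self):
        # One queue of flows per level; each flow is (flow_id, deque of packets).
        self.levels = {p: deque() for p in PRIORITY_ORDER}

    def add_flow(self, priority, flow_id, packets):
        """Register a flow at the back of its priority level (arrival order)."""
        if packets:
            self.levels[priority].append((flow_id, deque(packets)))

    def next_packet(self):
        """Stage 1: pick the highest non-empty level. Stage 2: round-robin in it."""
        for priority in PRIORITY_ORDER:
            flows = self.levels[priority]
            if flows:
                flow_id, packets = flows.popleft()
                packet = packets.popleft()
                if packets:
                    # Rotate the flow to the back so its peers get a turn.
                    flows.append((flow_id, packets))
                return flow_id, packet
        return None


scheduler = TwoStageScheduler()
scheduler.add_flow("BE", "C", ["c1"])
scheduler.add_flow("EF", "A1", ["a1", "a2"])
scheduler.add_flow("EF", "A2", ["b1"])
scheduler.add_flow("AF", "B", ["x1"])

sent = []
while (item := scheduler.next_packet()) is not None:
    sent.append(item)
print(sent)
```

Strict priority can starve low-priority flows under sustained high-priority load; a weighted variant would bound that, at the cost of occasionally delaying EF traffic.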

The traffic classification network proposed by the invention was evaluated on the Moore data set. Experiments show that the overall accuracy of network traffic classification exceeds 97%. In addition, the traffic classification network is trained on a variety of traffic data, so its practicality and stability are better than those of traditional methods.

Example 4

As shown in fig. 1 to 3, based on the technical solutions of example 1, example 2, and example 3, this example provides a specific implementation to verify the technical effects of the technical solutions of the present invention.

To verify the feasibility of the method, Wireshark was used to capture various types of network data with multidimensional features, including chat data from instant messaging software, emails, files, P2P data, streaming media data and videos. The DenseNet neural network was trained and used for classification on the captured data to identify deep macroscopic features of the data flows. The recall for each data type, the prediction accuracy for individual data types, and the overall accuracy were computed. As shown in table 1, the method maintains a high recognition accuracy for every type of data traffic in the network, and the overall accuracy exceeds 97%, which lays a good foundation for subsequent transmission scheduling.

TABLE 1 network traffic classification identification accuracy
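Per-class recall, per-class precision and overall accuracy of the kind reported for table 1 are conventionally derived from a confusion matrix, as in the sketch below. The counts are made up for illustration and are not the patent's experimental results; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(confusion):
    """Return (recall per class, precision per class, overall accuracy).

    confusion[i][j] is the number of samples of true class i predicted as j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    recall, precision = [], []
    for i in range(n):
        actual = sum(confusion[i])                          # row sum
        predicted = sum(confusion[r][i] for r in range(n))  # column sum
        recall.append(confusion[i][i] / actual if actual else 0.0)
        precision.append(confusion[i][i] / predicted if predicted else 0.0)
    return recall, precision, correct / total


# Made-up counts for three classes; these are not the patent's results.
confusion = [[48, 2, 0],
             [1, 47, 2],
             [0, 3, 47]]
recall, precision, accuracy = classification_metrics(confusion)
print(recall, precision, accuracy)
```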

The method was also tested in an encrypted network environment, and the results show that it is also applicable to encrypted traffic data sets.

Based on the accurate identification results, traffic is matched and forwarded according to a preset distribution policy; network bandwidth utilization improves by about 20% and network carrying capacity increases, so the transmission requirements of different users are better met.

As described above, the present invention can be preferably realized.

The foregoing is only a preferred embodiment of the present invention, and the present invention is not limited thereto in any way, and any simple modification, equivalent replacement and improvement made to the above embodiment within the spirit and principle of the present invention still fall within the protection scope of the present invention.

Claims

1. An information transmission scheduling method based on intelligent flow classification is characterized by comprising the following steps:

2. The information transmission scheduling method based on intelligent traffic classification as claimed in claim 1, wherein the step S1 includes the following steps:

3. The information transmission scheduling method based on intelligent traffic classification as claimed in claim 2, wherein the step S12 includes the following steps:

4. The information transmission scheduling method based on intelligent traffic classification as claimed in claim 1, wherein in step S1, a Moore network traffic data set is used as the network traffic data set.

5. The information transmission scheduling method based on intelligent traffic classification as claimed in claim 4, wherein in step S1, the Moore data set is divided into a training set and a test set, and label files corresponding to the training set and the test set are generated.

6. The information transmission scheduling method based on intelligent traffic classification as claimed in claim 1, wherein in step S2, DenseNet is used as the classification network.

7. The information transmission scheduling method based on intelligent traffic classification as claimed in claim 6, wherein in step S3, a cross entropy loss function is used as the loss function of the neural network.

8. The information transmission scheduling method based on intelligent traffic classification as claimed in claim 1, wherein the step S4 includes the following steps:

9. The information transmission scheduling method based on intelligent traffic classification as claimed in claim 8, wherein the step S5 includes the following steps:

10. The system of the information transmission scheduling method based on intelligent traffic classification as claimed in any one of claims 1 to 9, characterized by comprising the following modules: