CN105959175B - Network traffic classification method based on a GPU-accelerated kNN algorithm - Google Patents
Network traffic classification method based on a GPU-accelerated kNN algorithm
- Publication number
- CN105959175B CN105959175B CN201610258008.7A CN201610258008A CN105959175B CN 105959175 B CN105959175 B CN 105959175B CN 201610258008 A CN201610258008 A CN 201610258008A CN 105959175 B CN105959175 B CN 105959175B
- Authority
- CN
- China
- Prior art keywords
- value
- vector
- thread
- similarity
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/026—Capturing of monitoring data using flow identification
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention proposes a network traffic classification method based on a GPU-accelerated kNN algorithm. The method applies GPU acceleration to the similarity-calculation and sorting steps of the kNN algorithm, significantly improving classification performance, and selects a group of efficient flow features to build a traffic classifier. The invention also provides a process-based network traffic capture method, which serves as the basic data set of the experiments to guarantee the validity of the experimental data. The experimental results show that the peak GPU computing speed is up to 187 times that of the CPU, that classification precision generally reaches 80% or more, and that certain existing applications such as FTP and WEB reach an accuracy of 95% or more, fully demonstrating the effectiveness of the invention.
Description
Technical field
The present invention relates to the fields of high-performance computing, network traffic classification and network security, and in particular to network traffic monitoring, traffic classification and network attack detection.
Background technique
With the rapid development of Internet applications, more and more emerging Internet services are coming into being. At the same time, network operators hope that the existing IP network can carry multiple services without modification, in order to reduce the construction and operating costs of the network infrastructure. Therefore, in addition to ordinary Internet traffic and multimedia services such as video and voice, the IP network must also carry emerging services such as 3G and NGN (next-generation network). Different services place different functional and performance demands on the network; for example, video conferencing and traditional Internet services differ greatly in requirements such as data throughput and network delay. This requires network managers to classify network traffic in detail and to provide corresponding QoS (quality of service) guarantees.
Traffic classification is an important technology for improving service quality. It plays an important role in fields such as QoS, IDS, traffic billing and firewall technology, and has become one of the key technologies for enhancing network controllability. In addition, traffic classification can help network administrators understand the distribution and patterns of network traffic, and help network designers improve network planning. In recent years, with the continuous expansion of the Internet, traffic classification technology has attracted attention from both industry and academia, and has gradually formed an independent research field.
With the continuous increase of Internet link speeds, existing traffic classification methods are gradually becoming unable to cope. Since NVIDIA released the first CUDA platform in 2007, CUDA technology has developed continuously: drivers are constantly updated, more and more features are supported, and the difficulty of GPU programming keeps decreasing, a major technological innovation over OpenGL-based approaches. Among classical machine-learning methods, kNN is efficient and achieves good classification results. Although the kNN algorithm requires no modeling time, its computational complexity lies in finding the k nearest neighbors among a large number of d-dimensional vectors, so kNN classification is time-consuming. Real-time network traffic classification places very high demands on classification speed, and the classification speed achievable on a CPU can no longer satisfy real-time requirements.
Summary of the invention
The present invention aims to solve the problem that the performance of current network traffic classification methods cannot keep up with ever-increasing link speeds, and provides a CUDA-based kNN algorithm applied to the field of traffic classification. It differs from other research approaches in that the method adapts the kNN algorithm to run in a GPU environment and optimizes both the similarity-calculation and sorting steps with CUDA. Experiments show that the peak GPU computing speed is 187 times the CPU computing speed, significantly improving the speed, throughput and performance of traffic classification.

The present invention achieves high traffic classification recall and precision by building an efficient classifier; for example, FTP and WEB reach an accuracy of 95% or more. The invention also designs a process-based traffic capture method and uses the captured traffic as the experimental data set, improving the credibility of the experiments.
Technical solution of the present invention:
A network traffic classification method based on a GPU-accelerated kNN algorithm. The method uses the powerful computing capability and memory bandwidth of the GPU to accelerate the kNN algorithm, significantly improving the speed, throughput and performance of traffic classification. At the same time, a process-based traffic capture method guarantees that the data set is pure and valid, so that an efficient network traffic classifier with high precision can be built.
The method specifically includes the following steps:
Step 1: Acquire a mixed traffic data set containing various applications. The data set is either a public traffic data set from the Internet or a traffic data set captured with a self-written program. Because the captured traffic may be too large or contain noise, the traffic is filtered to obtain a data set that is within 1 GB and pure. The data set is then segmented into network flows, each flow being the set of packets sharing the same five-tuple; 90% of the network flows are randomly selected as the training set, and the remaining 10% serve as the test set to evaluate classification performance.
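As a minimal illustration of the segmentation and splitting in step 1 (the packet records and field names here are hypothetical placeholders, not the patent's capture format), packets sharing a five-tuple can be grouped into flows and the flows split 90/10:

```python
import random

def segment_into_flows(packets):
    """Group packet records into network flows keyed by their five-tuple."""
    flows = {}
    for pkt in packets:
        # The five-tuple that defines a flow in step 1.
        key = (pkt["src_ip"], pkt["dst_ip"],
               pkt["src_port"], pkt["dst_port"], pkt["proto"])
        flows.setdefault(key, []).append(pkt)
    return flows

def split_flows(flows, train_ratio=0.9, seed=42):
    """Randomly select 90% of flows as training set, 10% as test set."""
    keys = sorted(flows)  # deterministic order before shuffling
    random.Random(seed).shuffle(keys)
    cut = int(len(keys) * train_ratio)
    train = {k: flows[k] for k in keys[:cut]}
    test = {k: flows[k] for k in keys[cut:]}
    return train, test

# Synthetic capture: 100 packets spread over 10 distinct five-tuples.
packets = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 1000 + i % 10,
     "dst_port": 80, "proto": "TCP", "len": 100 + i}
    for i in range(100)
]
flows = segment_into_flows(packets)
train, test = split_flows(flows)
```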
Step 2: Generate the optimal feature set using a feature-selection algorithm. After the data set has been segmented into network flows in step 1, the chosen feature values of each network flow are computed. To avoid similarity-calculation bias caused by feature values of different magnitudes, the feature values must be standardized into the same interval.

Standardizing the feature values into the same interval means scaling every feature value into (-1, 1) according to formula (1):

v' = (avg(M_i) - v) / (max(M_i) - min(M_i))   (1)

where M_i is the i-th feature vector (the values of the i-th feature), v is a value of that feature, avg(M_i) is the average of the i-th feature, max(M_i) is its maximum and min(M_i) is its minimum.
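A sketch of the standardization of formula (1), under the reading given above (numerator is the average minus the value, denominator the max-min range, as the description of the formula later in the document states), so every scaled value lands in (-1, 1):

```python
def standardize(feature_column):
    """Scale one feature's values into (-1, 1) per formula (1):
    v' = (avg - v) / (max - min)."""
    avg = sum(feature_column) / len(feature_column)
    span = max(feature_column) - min(feature_column)
    return [(avg - v) / span for v in feature_column]

values = [10.0, 20.0, 30.0, 60.0]   # avg = 30, span = 50
scaled = standardize(values)         # [0.4, 0.2, 0.0, -0.6]
```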
Step 3: Build the kNN traffic classifier using the training data set. CUDA is used to compute the similarity of each test network flow to all training flows; CUDA is then used to sort the similarities of each test flow to all training flows, and the k nearest neighbors with the highest similarity are selected. Among these k neighbors, a voting mechanism selects the type with the highest proportion, which is the final result.
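The voting among the k nearest neighbours can be sketched as a simple majority count (the class labels here are illustrative):

```python
from collections import Counter

def knn_vote(neighbor_labels):
    """Return the class with the highest proportion among the k neighbors,
    as in the voting mechanism of step 3."""
    counts = Counter(neighbor_labels)
    return counts.most_common(1)[0][0]

# k = 5 nearest neighbours; WEB holds the highest proportion.
result = knn_vote(["WEB", "FTP", "WEB", "WEB", "QQ"])
```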
The method for computing and sorting the similarities of each test network flow to all training flows with CUDA is as follows:

Let the number of training flow records be m, m ≤ 10^4, and the number of test flow records be n, n = m/9; the parameters m and n keep this meaning below. Let A = {a_1, a_2, ..., a_m} be the m training flow records and B = {b_1, b_2, ..., b_n} be the n test flow records. Each flow record in set A is represented by a feature vector u_i = {a_i1, a_i2, ..., a_id}^T with -1 < a_i1, a_i2, ..., a_id < 1; each flow record in set B is represented by a feature vector u_j = {b_j1, b_j2, ..., b_jd}^T with -1 < b_j1, b_j2, ..., b_jd < 1; after standardization the elements of the vectors u_i and u_j all lie in (-1, 1). To accelerate the similarity-calculation process, the present invention proposes a load-balanced CUDA thread algorithm, shown in formulas (2) and (3):

From = (m × n) / (k_b × k_t) × T_id   (2)
To = (m × n) / (k_b × k_t) × (T_id + 1)   (3)

where (m × n) is the total number of tasks, k_b is the total number of thread blocks in the kernel, k_t is the number of threads per block, so (k_b × k_t) is the total number of threads in the kernel, and T_id is the id identifying a thread. The result From is the starting position each thread must compute, and To is the final position each thread must compute.

Although most of the similarities between the test set and the training set have now been calculated, (m × n) mod (k_b × k_t) similarities remain uncalculated, so each thread computes one more similarity, at the index given by formula (4):

(m × n) - (m × n) % (k_b × k_t) + T_id   (4)

where (m × n) - (m × n) % (k_b × k_t) is the number of similarities already computed; adding T_id guarantees that all remaining similarities are calculated.
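The partition of formulas (2)–(4) can be sketched on the host side; the kernel geometry (k_b, k_t) and the tiny task count below are illustrative, not the patent's configuration:

```python
def thread_ranges(m, n, kb, kt):
    """For each thread id, the [From, To) range of formulas (2)-(3),
    plus the extra index of formula (4) for the leftover tasks
    (None when that index falls past the end of the task list)."""
    total = m * n
    threads = kb * kt
    chunk = total // threads
    done = total - total % threads  # tasks covered by formulas (2)-(3)
    ranges = []
    for tid in range(threads):
        frm = chunk * tid           # formula (2)
        to = chunk * (tid + 1)      # formula (3)
        extra = done + tid          # formula (4)
        ranges.append((frm, to, extra if extra < total else None))
    return ranges

# 9 x 3 = 27 similarity tasks spread over 2 blocks x 2 threads.
ranges = thread_ranges(m=9, n=3, kb=2, kt=2)
```

Every task index is covered exactly once: threads 0–3 take chunks of 6, and the 3 leftover tasks (24, 25, 26) go to threads 0–2 via formula (4).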
To accelerate the sorting step in the CUDA environment, a sorting-network algorithm is chosen. For an array {d_0, d_1, ..., d_{l-1}} with l elements, the k smallest elements must be selected, k ≤ l. Let d_{l/2} be the dividing point of the array; then the larger subsequence produced by the comparators is shown in formula (5):

s1 = {max{d_0, d_{l/2}}, max{d_1, d_{1+l/2}}, ..., max{d_{l/2-1}, d_{l-1}}}   (5)

and the smaller subsequence produced by the comparators is shown in formula (6):

s2 = {min{d_0, d_{l/2}}, min{d_1, d_{1+l/2}}, ..., min{d_{l/2-1}, d_{l-1}}}   (6)

The smallest element is necessarily in the second subsequence, which contains ⌈l/2⌉ numbers. Each comparator applied to s1 or s2 is independent, so CUDA threads can execute them simultaneously; thus the ⌈l/2⌉ comparators can run in parallel. This process is then repeated on s2 until s2 contains only one element, which is the smallest element of the current sequence. After one pass, the sequence becomes {c_0, c_1, ..., c_{l-1}}, where c_{l-1} is the first, smallest element of the sequence. The sequence {c_0, c_1, ..., c_{l-2}} is then treated as the initial array, and the above minimum-selection process is repeated, so that after k passes the k smallest elements have been selected.

Finally, in task allocation, each thread block and thread processes the feature-value matrix in a loop, and the computing tasks are distributed equally to the thread blocks and threads to achieve load balancing; the task allocation of each thread block and each thread is calculated with formulas (2), (3) and (4).
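A serial sketch of the minimum-selection rounds described above: each round halves the candidate sequence with pairwise-min comparators (formula (6)), which the invention executes in parallel with CUDA threads, until only the minimum remains; repeating this k times yields the k smallest elements. This illustrates the selection logic only, not the CUDA kernel, and the pairing of an odd tail element is an assumption.

```python
def select_k_smallest(arr, k):
    """Extract the minimum k times by repeated pairwise-min rounds,
    mirroring the sorting-network selection of formulas (5)-(6)."""
    remaining = list(arr)
    smallest = []
    for _ in range(k):
        s2 = list(remaining)
        while len(s2) > 1:
            half = (len(s2) + 1) // 2  # ceil(l/2) comparators per round
            s2 = [min(s2[i], s2[i + half]) if i + half < len(s2) else s2[i]
                  for i in range(half)]
        smallest.append(s2[0])
        remaining.remove(s2[0])        # drop the minimum just found
    return smallest

# The document's example: the 2 smallest of {4, 2, 3, 1, 0, 6, 5}.
result = select_k_smallest([4, 2, 3, 1, 0, 6, 5], k=2)
```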
The similarity of each test network flow to all training flows is computed with CUDA using the Euclidean distance, formula (7):

d(x, y) = sqrt( Σ_{i=1}^{M} (x_i - y_i)^2 )   (7)

where x_i is the i-th feature value of the first feature vector, y_i is the i-th feature value of the second feature vector, and M is the dimension of the feature vectors.
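A serial sketch of the distance computation of formula (7) filling the test-by-training result matrix (the tiny vectors and the nested-list layout are illustrative):

```python
import math

def euclidean(x, y):
    """Formula (7): square root of the summed squared feature differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

train = [[0.1, 0.2], [0.5, -0.5], [-0.3, 0.4]]  # m = 3 training vectors
test = [[0.1, 0.2]]                              # n = 1 test vector

# Distances of each test flow to all training flows (the matrix C).
C = [[euclidean(t, a) for a in train] for t in test]
```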
Step 4: After the traffic classifier of step 3 has been built, traffic classification can be performed in the GPU environment with the established classification model, and the classification results and performance are analyzed.
The advantages of the present invention:
The invention proposes a network traffic classification method based on a GPU-accelerated kNN algorithm. The method applies GPU acceleration to the similarity-calculation and sorting steps of the kNN algorithm, significantly improving classification performance, and selects a group of efficient flow features to build the traffic classifier.
The invention also provides a process-based network traffic capture method, which serves as the basic data set of the experiments to guarantee the validity of the experimental data.
The experimental results of the invention show that the peak GPU computing speed is 187 times that of the CPU, that classification precision generally reaches 80% or more, and that certain existing applications such as FTP and WEB reach an accuracy of 95% or more, fully demonstrating the effectiveness of the invention.
Brief description of the drawings
Fig. 1 is the Ground Truth architecture diagram.
Fig. 2 is the matrix representation of the training set and test set.
Fig. 3 is a comparator in a sorting network.
Fig. 4 is an example of a sorting network (l = 7).
Fig. 5 is the algorithm performance histogram.
Fig. 6 is the algorithm performance curve graph.
Specific embodiment
Step 1: Acquire a mixed traffic data set containing various applications with the process-based traffic capture method.
The invention proposes a GPU-accelerated kNN network traffic classification method and obtains the experimental data with a process-based traffic capture method; Fig. 1 shows the Ground Truth (GT) architecture. Each host installs a daemon client, which calls the Windows API once per second to obtain host log information, mainly including the timestamps, port numbers and process names of the sockets created. These log entries are then stored into a database according to a heartbeat protocol.
For TCP, taking the timestamp of the packet capture as the reference, the log is searched within an offset of 100 seconds before and after for a log record whose five-tuple (IP address pair, port number pair, protocol type) matches the packet. If such a record is found, the packet is labeled with that process. Because TCP performs a three-way handshake when establishing a connection, the first captured packet of a connection must be a SYN packet. Only this SYN needs to be matched and labeled with its process number; then, within the validity period of the connection, subsequent packets carrying the same or the reversed five-tuple belong to the same process number, and the packets are written into the final PCAP file. If no matching record is found, the packet is marked invalid and discarded.
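The TCP matching rule above can be sketched as follows; the log-record fields and the packet structure are hypothetical placeholders for whatever the daemon actually stores:

```python
WINDOW = 100  # seconds of offset before/after the capture timestamp (TCP)

def match_process(packet, log_records):
    """Return the process whose logged five-tuple matches the packet
    (same or reversed direction) within +/- WINDOW seconds, else None
    (the packet is then marked invalid and discarded)."""
    five_tuple = (packet["src_ip"], packet["dst_ip"],
                  packet["src_port"], packet["dst_port"], packet["proto"])
    reversed_tuple = (packet["dst_ip"], packet["src_ip"],
                      packet["dst_port"], packet["src_port"], packet["proto"])
    for rec in log_records:
        if abs(rec["ts"] - packet["ts"]) <= WINDOW and \
           rec["five_tuple"] in (five_tuple, reversed_tuple):
            return rec["process"]
    return None

log = [{"ts": 50, "five_tuple": ("1.1.1.1", "2.2.2.2", 1234, 80, "TCP"),
        "process": "web.exe"}]
pkt = {"ts": 120, "src_ip": "2.2.2.2", "dst_ip": "1.1.1.1",
       "src_port": 80, "dst_port": 1234, "proto": "TCP"}
proc = match_process(pkt, log)                 # reversed direction, in window
stale = match_process(dict(pkt, ts=300), log)  # outside the 100 s window
```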
For UDP, the Windows API that obtains UDP information cannot provide the destination port number and destination IP address, and UDP has no three-way handshake to establish a connection. Therefore, taking the timestamp of the packet capture as the reference, the log can only be searched within an offset of 10 seconds before and after for a log record whose source IP address and source port number match the packet. If such a record is found, the packet is labeled with that process and written into the final PCAP file; if not, the packet is marked invalid and discarded.
The data set obtained with the process-based GT framework is shown in Table 1. Smtp and Pop3 represent email applications, Ftp represents file-transfer applications, QQ represents instant-messaging applications, BitTorrent and Thunder represent P2P applications, YouKu represents video applications, and Web represents browser applications such as Taobao and Sina. The numbers in the table are the numbers of flows; for example, 900 Smtp training flows and 100 test flows were used in the experiment. The ratio of training flows to test flows is 9:1, that is, 90% of the data set is used as the training set and 10% as the test set. Moreover, only one group of data was selected for each application type, without cross-validation, because cross-validation also randomly selects multiple groups of data and is not essentially different from experimenting with one randomly selected group.
Table 1 Data set used in the experiment
Step 2: Generate the optimal feature set with a feature-selection algorithm, compute the feature values and scale them into the same interval; the selected feature set is shown in Table 2.

First, the protocol, i.e. whether the transport-layer protocol is TCP or UDP; the present invention currently studies only packets of TCP or UDP connections. Second, the size of the packet payloads and packets in a network flow: the packet count and packet size differ between different types of connections. For example, the packets of an Ftp connection are larger than other types of packets, because file transfer requires higher link utilization. Third, the minimum, maximum and average packet length: different types of applications clearly differ in packet length, and choosing the minimum, maximum and average reflects this difference more comprehensively. Fourth, the minimum, maximum, average and variance of the packet inter-arrival time: setting aside the influence of factors such as network delay, packet arrival time is an important feature. It can be expected that the packet inter-arrival times of instant-messaging or real-time video applications are shorter, because they require better real-time behavior.

Because network flows are bidirectional, the feature vectors of the two opposite-direction streams are combined into one feature vector representing the bidirectional flow exchanged between client and server.
Table 2 Feature vectors used in the experiment
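The Table 2 statistics (packet-length min/max/avg and inter-arrival-time min/max/avg/variance) can be sketched for one flow; the inputs and field names below are illustrative:

```python
def flow_features(pkt_lengths, timestamps):
    """Compute the per-flow statistics of step 2: packet-length
    min/max/avg and inter-arrival-time min/max/avg/variance."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean_gap = sum(gaps) / len(gaps)
    var_gap = sum((g - mean_gap) ** 2 for g in gaps) / len(gaps)
    return {
        "len_min": min(pkt_lengths),
        "len_max": max(pkt_lengths),
        "len_avg": sum(pkt_lengths) / len(pkt_lengths),
        "iat_min": min(gaps),
        "iat_max": max(gaps),
        "iat_avg": mean_gap,
        "iat_var": var_gap,
    }

# Three packets with lengths 60/1500/720 bytes at t = 0.0, 0.1, 0.3 s.
feats = flow_features([60, 1500, 720], [0.0, 0.1, 0.3])
```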
The present invention standardizes the feature values with formula (1). The denominator of the formula is the difference between the maximum and minimum of the feature, i.e. the maximum range of the feature values; the numerator is the difference between the average and the current feature value. The numerator is always smaller in magnitude than the denominator, so every feature value is scaled proportionally into (-1, 1). Experiments show that the results after standardization improve by about 10% over the results before standardization, so the importance of standardization goes without saying.
Step 3: Build the kNN traffic classifier with the training data set, and use CUDA to compute and sort the similarities of each test network flow to all training flows.

Let the number of training flow records be m, m ≤ 10^4, and the number of test flow records be n, n = m/9; the parameters m and n keep this meaning below. Let A = {a_1, a_2, ..., a_m} be the m training flow records and B = {b_1, b_2, ..., b_n} be the n test flow records. Each flow record in set A is represented by a feature vector u_i = {a_i1, a_i2, ..., a_id}^T, -1 < a_i1, a_i2, ..., a_id < 1; each flow record in set B is represented by a feature vector u_j = {b_j1, b_j2, ..., b_jd}^T, -1 < b_j1, b_j2, ..., b_jd < 1; after standardization the elements of u_i and u_j lie in (-1, 1). The matrix representation is shown in Fig. 2.

For each test flow record u_j = {b_j1, b_j2, ..., b_jd}^T, its distance to all training flow records must be computed, so the time complexity of this step is O(nmd), which is enormous. Meanwhile, a matrix C of height m and width n must be used to store the results, so the space complexity of the algorithm is O(nm); the distance between the j-th vector of the test set and the i-th vector of the training set is stored at position (jn + i) in matrix C.
In theory, the m·n similarity-calculation tasks could be computed by m·n threads, one distance per thread, maximizing the calculation speed. However, the total number of threads p that the GPU can launch is limited by the hardware, and p is often much smaller than the task count m·n. Therefore, to balance the load on each thread, each thread computes (nm/p) distances repeatedly, and formulas (2) and (3) distribute the similarity-calculation tasks:

From = (m × n) / (k_b × k_t) × T_id   (2)
To = (m × n) / (k_b × k_t) × (T_id + 1)   (3)

Although most of the distances between the test set and the training set have now been computed, (m × n) mod (k_b × k_t) distances remain uncalculated, so each thread computes one more distance, at the index given by formula (4):

(m × n) - (m × n) % (k_b × k_t) + T_id   (4)

where (m × n) - (m × n) % (k_b × k_t) is the number of distances already computed; adding T_id guarantees that all remaining distances are calculated.
Fig. 3 shows a comparator of a sorting network. The input of the comparator is two data values x and y; after the comparator processes them, the larger value is at the top and the smaller value at the bottom. The time required to run one comparator can be regarded as one unit of time.
For an array {d_0, d_1, ..., d_{l-1}} with l elements, the k smallest elements must be selected, k ≤ l. Let d_{l/2} be the dividing point of the array; then the larger subsequence produced by the comparators is shown in formula (5):

s1 = {max{d_0, d_{l/2}}, max{d_1, d_{1+l/2}}, ..., max{d_{l/2-1}, d_{l-1}}}   (5)

and the smaller subsequence produced by the comparators is shown in formula (6):

s2 = {min{d_0, d_{l/2}}, min{d_1, d_{1+l/2}}, ..., min{d_{l/2-1}, d_{l-1}}}   (6)

The smallest element is necessarily in the second subsequence, which contains ⌈l/2⌉ numbers. Each comparator applied to s1 or s2 is independent, so CUDA threads can execute them simultaneously; thus the ⌈l/2⌉ comparators can run in parallel. This process is then repeated on s2 until s2 contains only one element, which is the smallest element of the current sequence. After one pass, the sequence becomes {c_0, c_1, ..., c_{l-1}}, where c_{l-1} is the first, smallest element of the sequence. The sequence {c_0, c_1, ..., c_{l-2}} is then treated as the initial array, and the above minimum-selection process is repeated, so that after k passes the k smallest elements have been selected.
For example, suppose the 2 smallest elements must be selected from the sequence {4, 2, 3, 1, 0, 6, 5}. The process of selecting the first smallest element is shown in Fig. 4. After the first comparison round, s2 becomes {1, 0, 3, 5}; after the second round, s2 becomes {1, 0}; after the third round, only the element 0 remains in s2, so 0 is the first smallest element. At this point the array has become {4, 2, 6, 3, 5, 1, 0}. The above process can then be continued on the remaining elements until the second smallest element, 1, is selected.
In this process, note that each comparison round uses the results of the previous round, so all threads must finish the previous round before this round begins; no thread may start the next round of comparisons before the current round has ended. The synchronization function of the GPU thread block is used for this, with one thread block sorting one column of matrix C. Although the GPU's arithmetic cores and memory capacity are powerful, the hardware resources are still limited, so the total thread count p and the number of thread blocks b in the GPU have upper limits. Ideally, n thread blocks would be used, each thread of a block performing (m/2p) comparisons, to execute the algorithm above. However, so many thread blocks and threads often cannot be created, so the task-allocation method of the distance calculation is borrowed: each thread block processes (n/b) columns of matrix C, and each thread performs (m/2p) comparisons. That is, each thread block and thread processes the array in a loop, and the computing tasks are distributed equally among the thread blocks and threads to achieve load balancing; the tasks of each thread block and each thread are calculated with formulas (2), (3) and (4).
The similarity of each test network flow to all training flows is computed with CUDA using the Euclidean distance, formula (7):

d(x, y) = sqrt( Σ_{i=1}^{M} (x_i - y_i)^2 )   (7)

where x_i is the i-th feature value of the first feature vector, y_i is the i-th feature value of the second feature vector, and M is the dimension of the feature vectors.
Step 4: After the traffic classifier of step 3 has been built, traffic classification can be performed in the GPU environment with the established classification model, and the classification results and performance are analyzed.
The present invention compares the running speed of the CPU and the GPU. The CPU is an Intel(R) Xeon(R) E5-2620 with a frequency of 2.0 GHz, 8 physical cores and 32 GB of memory. The GPU is an NVIDIA Tesla with 2880 CUDA cores, a memory bandwidth of 288 GB/s and a peak double-precision floating-point performance of 1.87 Tflops. The CPU and GPU run on the same server, whose operating system is Red Hat Enterprise Linux Server Release 6.3.
When analyzing the classification effect, two indicators measure whether the experimental results are valid: precision and recall.

Precision is defined as:

precision = TP / (TP + FP)

Recall (also called the feedback rate) is defined as:

recall = TP / (TP + FN)

The speedup ratio is defined as:

speedup = T_CPU / T_GPU

where, for a given class, TP is the number of flows correctly classified into the class, FP is the number of flows wrongly classified into the class, FN is the number of the class's flows wrongly classified elsewhere, and T_CPU and T_GPU are the classification times on the CPU and the GPU.

Under the above environment, the experiment selected 6943 training flow records and 777 test flow records, and k was chosen among the odd numbers not greater than 10. The experimental results are shown in Table 3:
Table 3 CPU and GPU experimental results
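Given the standard definitions of the two indicators and the speedup ratio, they can be computed directly (the counts below are illustrative, not Table 3's values):

```python
def precision(tp, fp):
    """Fraction of flows assigned to a class that truly belong to it."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of a class's flows that were correctly retrieved."""
    return tp / (tp + fn)

def speedup(t_cpu, t_gpu):
    """Ratio of CPU classification time to GPU classification time."""
    return t_cpu / t_gpu

p = precision(tp=90, fp=10)          # 90 of 100 assigned flows are correct
r = recall(tp=90, fn=30)             # 90 of the class's 120 flows retrieved
s = speedup(t_cpu=187.0, t_gpu=1.0)  # e.g. the patent's peak 187x figure
```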
To compare the speed of the GPU and the CPU more intuitively, the experimental results are drawn as a curve graph and a histogram, shown in Figs. 5 and 6. As can be seen from the figures, the classification time required by the GPU is only a tiny fraction of the CPU classification time; the GPU significantly improves the running speed of the kNN algorithm by using a large number of computing cores, high-speed bandwidth and powerful computing capability.

The experiments selected different applications, quantities and k values, and several groups of experiments were run; the recall and precision results are shown in Tables 4, 5 and 6.

Table 4 Flow classification results (3 classes, k=3)
Table 5 Flow classification results (5 classes, k=5)
Table 6 Flow classification results (8 classes, k=5)

From the classification point of view, the traffic classification precision and recall of the three applications WEB, FTP and BitTorrent reach 90% or more. Because a large number of protocol suites run inside QQ and Youku, their feature values are easily confused with those of other applications, so their classification accuracy is not high. Nevertheless, the overall classification effect reaches 80% or more, which basically meets the experimental expectations.
Claims (2)
1. A GPU-accelerated kNN network traffic classification method, which uses the parallel computing capability of the GPU to accelerate the kNN algorithm and uses a flow-based acquisition method to ensure the data set is pure and valid, the method comprising the following steps:
Step 1: acquire a mixed traffic data set containing multiple applications; the traffic data set is either a publicly available data set on the Internet or a data set captured by a self-written program; because the captured traffic may be too large or contain noise flows, the traffic is filtered to obtain a pure data set of at most 1 GB; the data set is then divided into network flows sharing the same five-tuple, 90% of the flows are randomly selected as the training set, and the remaining 10% serve as the test set for evaluating classification performance;
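As an illustration of the Step 1 split (not part of the claim), the following Python sketch groups packets into flows by five-tuple and draws a random 90/10 split; the packet field names (`src_ip`, `dst_ip`, `src_port`, `dst_port`, `proto`) are assumptions made for the example:

```python
import random
from collections import defaultdict

def split_flows(packets, train_ratio=0.9, seed=42):
    """Group packets into flows by five-tuple, then split the flows 90/10."""
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"],
               pkt["dst_port"], pkt["proto"])
        flows[key].append(pkt)
    keys = sorted(flows)                      # deterministic order before shuffling
    random.Random(seed).shuffle(keys)         # random selection of training flows
    cut = int(len(keys) * train_ratio)
    train = {k: flows[k] for k in keys[:cut]}
    test = {k: flows[k] for k in keys[cut:]}
    return train, test
```

With 10 distinct five-tuples this yields 9 training flows and 1 test flow, matching the 90/10 split of the claim.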
Step 2: generate an optimal feature set with a feature selection algorithm; after the data set is divided into network flows in Step 1, the specified feature values are computed for each network flow; to avoid similarity-calculation bias caused by feature values of different magnitudes, the feature values are standardized into the same interval;
Step 3: build the kNN traffic classifier with the training data set; use CUDA to compute the similarity between each test network flow and all training flows, using the Euclidean distance of formula (7):
d(x, y) = sqrt( (x_1 − y_1)^2 + (x_2 − y_2)^2 + ... + (x_M − y_M)^2 )   (7)
where x_i is the i-th feature value of the first feature vector, y_i is the i-th feature value of the second feature vector, and M is the dimension of the feature vectors; each feature vector consists of the feature values of one group of network flow features, which include: the transport layer protocol; the sizes of the packet payload and the packet; the minimum, maximum, and average packet length; and the minimum, maximum, average, and variance of the packet inter-arrival time;
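The Euclidean distance of formula (7) can be sketched in Python as follows (an illustration only; the claim computes it in a CUDA kernel):

```python
import math

def euclidean_distance(x, y):
    """Formula (7): Euclidean distance between two M-dimensional feature vectors."""
    assert len(x) == len(y), "both vectors must have the same dimension M"
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
```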
Use CUDA to sort the similarities between each test flow and all training flows, and select the k neighbours with the highest similarity; among these k neighbours, the class holding the largest share under majority voting is taken as the final result;
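The majority vote over the k nearest neighbours can be sketched as follows (a plain Python illustration, not the claimed GPU implementation; here "highest similarity" corresponds to smallest Euclidean distance):

```python
from collections import Counter

def knn_vote(distances, labels, k):
    """Pick the k training flows with the smallest distance and return the
    label with the largest share among them (majority vote)."""
    order = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    votes = Counter(labels[i] for i in order)
    return votes.most_common(1)[0][0]
```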
The method for computing and sorting, with CUDA, the similarities between each test network flow and all training flows is as follows:
Let the number of training flow records be m, m ≤ 10^4, and the number of test flow records be n, n = m/9; the parameters m and n keep this meaning below. Let A = {a_1, a_2, ..., a_m} be the m training flow records and B = {b_1, b_2, ..., b_n} be the n test flow records. Each flow record in set A is represented by a feature vector u_i = {a_i1, a_i2, ..., a_id}^T with −1 < a_i1, a_i2, ..., a_id < 1, where d = 4; each flow record in set B is represented by a feature vector u_j = {b_j1, b_j2, ..., b_jd}^T with −1 < b_j1, b_j2, ..., b_jd < 1, where d = 4; after standardization the elements of the vectors u_i and u_j lie in (−1, 1). To accelerate the similarity computation, a load-balanced CUDA thread algorithm is proposed, as shown in formulas (2) and (3):
From = (m × n) / (k_b × k_t) × T_id   (2)
To = (m × n) / (k_b × k_t) × (T_id + 1)   (3)
where (m × n) is the total number of tasks, k_b is the total number of thread blocks in the kernel, k_t is the number of threads per block, so (k_b × k_t) is the total number of threads in the kernel, and T_id is the id identifying a thread. Clearly, From is the starting position from which each thread must compute, and To is the end position up to which each thread must compute;
Although most of the similarities between the test set and the training set have now been computed, (m × n) mod (k_b × k_t) similarities remain uncomputed, so each thread computes one more similarity, at the index given by formula (4):
(m × n) − (m × n) % (k_b × k_t) + T_id   (4)
where (m × n) − (m × n) % (k_b × k_t) is the number of similarities already computed; adding T_id ensures that all remaining similarities are computed;
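The per-thread task split of formulas (2)-(4) can be checked with a small Python model (a sketch only; on the GPU these expressions use integer division over thread and block indices):

```python
def thread_range(m, n, kb, kt, tid):
    """Formulas (2)-(3): the [From, To) slice of the m*n similarity tasks
    assigned to thread `tid` out of kb*kt threads."""
    total, threads = m * n, kb * kt
    per_thread = total // threads
    return per_thread * tid, per_thread * (tid + 1)

def leftover_task(m, n, kb, kt, tid):
    """Formula (4): index of the one extra task thread `tid` handles to cover
    the (m*n) mod (kb*kt) remainder, or None once the remainder is exhausted."""
    total, threads = m * n, kb * kt
    idx = total - total % threads + tid
    return idx if idx < total else None
```

Iterating over all thread ids and collecting both the main slices and the leftover indices covers every one of the m × n tasks exactly once, which is the load-balancing property the claim asserts.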
To accelerate sorting in the CUDA environment, a sorting-network algorithm is chosen. For an array {d_0, d_1, ..., d_(l−1)} of l elements, the k smallest elements must be selected, k ≤ l. Let d_(l/2) be the split point of the array, so the larger subsequence produced by the comparators is given by formula (5):
s1 = {max{d_0, d_(l/2)}, max{d_1, d_(1+l/2)}, ..., max{d_(l/2−1), d_(l−1)}}   (5)
and the smaller subsequence produced by the comparators is given by formula (6):
s2 = {min{d_0, d_(l/2)}, min{d_1, d_(1+l/2)}, ..., min{d_(l/2−1), d_(l−1)}}   (6)
The smallest element is certainly in the second subsequence, which contains l/2 elements. Each comparator producing s1 or s2 is independent, so CUDA threads can run them simultaneously, and the l/2 comparisons can be performed in parallel. This process is then repeated on s2 until s2 contains only one element, which is the smallest element of the current sequence. After one such operation the sequence becomes {c_0, c_1, ..., c_(l−1)}, where c_(l−1) is the smallest element of the sequence; the sequence {c_0, c_1, ..., c_(l−2)} is then treated as the initial array and the above selection of the smallest element is repeated, so after k operations the k smallest elements have been selected;
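The comparator-network selection described above can be modelled sequentially in Python (a sketch: each pass of the inner loop corresponds to one parallel round of independent comparators; rounding the half up for odd lengths is an assumption beyond the power-of-two case implied by the claim):

```python
def select_k_smallest(arr, k):
    """Repeat k times: halve the active prefix with pairwise min-comparators
    until its minimum reaches index 0, then move it to the end of the prefix."""
    a = list(arr)
    end = len(a)                 # active prefix is a[0:end]
    result = []
    for _ in range(k):
        length = end
        while length > 1:
            half = (length + 1) // 2
            # One parallel round: smaller values go to the lower half (s2),
            # larger values to the upper half (s1).
            for i in range(length - half):
                if a[i] > a[i + half]:
                    a[i], a[i + half] = a[i + half], a[i]
            length = half        # recurse into the smaller subsequence s2
        # Minimum of the active prefix is now at a[0]; park it at the end,
        # mirroring c_(l-1) in the claim, and shrink the prefix.
        a[0], a[end - 1] = a[end - 1], a[0]
        result.append(a[end - 1])
        end -= 1
    return result
```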
Finally, in task distribution, each thread block and each thread process the feature value matrix in a loop, and the computing tasks are distributed evenly across the thread blocks and threads to achieve load balancing; the task distribution for each thread block and each thread is computed with formulas (2), (3), and (4);
Step 4: after the traffic classifier of Step 3 is built, traffic classification can be performed in the GPU environment with the resulting classification model, and the classification results and performance are analysed.
2. The method according to claim 1, wherein generating the optimal feature set with the feature selection algorithm described in Step 2 means choosing the transport layer protocol, the sizes of the packet payload and the packet, the minimum, maximum, and average packet length, and the minimum, maximum, average, and variance of the packet inter-arrival time as the feature set, and constructing the feature vectors from them;
standardizing the feature values into the same interval means mapping the feature values into (−1, 1) according to formula (1), where M_i is the i-th vector of the feature vectors, avg(M_i) is the average of the feature values of the i-th vector, max(M_i) is the maximum of the i-th vector, and min(M_i) is the minimum of the i-th vector; each feature vector consists of the feature values of one group of features; the average of the i-th vector is the average of the feature values in that vector, its maximum is the largest feature value in that vector, and its minimum is the smallest feature value in that vector.
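The image of formula (1) is not reproduced in this text. A standardization consistent with the avg/max/min definitions above and with the stated (−1, 1) range is (v − avg(M_i)) / (max(M_i) − min(M_i)); the following Python sketch uses that form as an assumption:

```python
def normalize_feature(values):
    """Assumed form of formula (1): map each feature value into (-1, 1)
    by subtracting the mean and dividing by the value range."""
    avg = sum(values) / len(values)
    span = max(values) - min(values)
    # |v - avg| < span for every v, so every result lies in (-1, 1)
    return [(v - avg) / span for v in values]
```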
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610258008.7A CN105959175B (en) | 2016-04-21 | 2016-04-21 | Net flow assorted method based on the GPU kNN algorithm accelerated |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105959175A CN105959175A (en) | 2016-09-21 |
CN105959175B true CN105959175B (en) | 2019-10-22 |
Family
ID=56915406
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610258008.7A Expired - Fee Related CN105959175B (en) | 2016-04-21 | 2016-04-21 | Net flow assorted method based on the GPU kNN algorithm accelerated |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105959175B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106789349B (en) * | 2017-01-20 | 2020-04-07 | 南京邮电大学 | Quality of experience modeling analysis and conversation flow classification based method |
CN108600246B (en) * | 2018-05-04 | 2020-08-21 | 浙江工业大学 | Network intrusion detection parallelization acceleration method based on KNN algorithm |
CN109861862A (en) * | 2019-02-03 | 2019-06-07 | 江苏深度空间信息科技有限公司 | A kind of network flow search method, device, electronic equipment and storage medium |
CN109815075B (en) * | 2019-02-28 | 2020-07-03 | 苏州浪潮智能科技有限公司 | Method and device for detecting GPGPU (general purpose graphics processing unit) link speed |
CN111698178B (en) * | 2020-04-14 | 2022-08-30 | 新华三技术有限公司 | Flow analysis method and device |
CN112380003B (en) * | 2020-09-18 | 2021-09-17 | 北京大学 | High-performance parallel implementation device for K-NN on GPU processor |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8928658B2 (en) * | 2008-09-30 | 2015-01-06 | Microsoft Corporation | Photon mapping on graphics hardware using kd-trees |
CN103021017B (en) * | 2012-12-04 | 2015-05-20 | 上海交通大学 | Three-dimensional scene rebuilding method based on GPU acceleration |
WO2015077958A1 (en) * | 2013-11-28 | 2015-06-04 | 华为技术有限公司 | Method, apparatus and system for controlling service traffic |
CN103714185B (en) * | 2014-01-17 | 2017-02-01 | 武汉大学 | Subject event updating method base and urban multi-source time-space information parallel updating method |
CN104020983A (en) * | 2014-06-16 | 2014-09-03 | 上海大学 | KNN-GPU acceleration method based on OpenCL |
- 2016-04-21 CN CN201610258008.7A patent/CN105959175B/en not_active Expired - Fee Related
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20191022 ||