CN105516027A - Application identification model establishing method, and flow data identification method and device - Google Patents

Info

Publication number
CN105516027A
Authority
CN
China
Prior art keywords
data
flows
host
relevance
application identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610018242.2A
Other languages
Chinese (zh)
Other versions
CN105516027B (en)
Inventor
王占一
刘博�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qax Technology Group Inc
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Beijing Qianxin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Beijing Qianxin Technology Co Ltd
Priority to CN201610018242.2A
Publication of CN105516027A
Application granted
Publication of CN105516027B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00: Traffic control in data switching networks
    • H04L47/70: Admission control; Resource allocation
    • H04L47/80: Actions related to the user profile or the type of traffic
    • H04L47/803: Application aware

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an application identification model establishing method, a flow data identification method and a flow data identification device. The application identification model establishing method is applied to the condition of data transmission between a host and a network side node; at least one host process with data processing capacity is set on the host; the application identification model establishing method comprises the following steps: acquiring multi-piece host data which are transmitted by the host; acquiring multi-piece flow data which are received by the network side node; comparing each piece of host datum with each piece of flow datum to find out at least one pair of the host datum and the flow datum with relevance; processing parameters of the pairs of the host data and the flow data with the relevance to acquire the correspondence between a host process name corresponding to each pair of the host datum and the flow datum with the relevance and a data packet load; establishing an application identification model by utilizing the correspondence between each host process name and the data packet load. By utilizing the application identification model establishing method, the accuracy, the convenience and the efficiency of application identification can be improved.

Description

Application identification model establishing method, and flow data identification method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a deep-learning-based method and device for establishing an application identification model, and to a method and device for identifying flow data.
Background
In an intranet environment, different applications often have different priorities. For example, most enterprises classify administrative applications as important and P2P applications as restricted. The same application may also have different priorities in different enterprises: a video application has a different status in a video company than in an e-commerce company. In addition, understanding how each application is used helps to optimize and configure the applications and the network within an enterprise, ensuring fast transmission of information and efficient work. Identifying applications in a local area network is therefore very important.
One existing method is port matching: the source port or destination port used by a data flow is compared against the system's existing application-port database to determine the application that the data flow corresponds to. Data flows produced by some applications are transmitted over specific ports, so the method is feasible for those applications. Its advantage is that it requires neither storage and analysis of large amounts of data nor complex algorithms, so the system burden is very small. In practice, however, some ports may correspond to multiple applications, and some applications do not use fixed ports; these ambiguities make identification accuracy low.
Another method, currently the most widely used, is pattern matching. Pattern matching falls into two classes. The first works on the host side and matches the file features of an application against a known feature library, such as product version, product name, company, file version, and source file name. This requires identification software to be installed on every host, which can affect user experience and host performance. The second works on the network side and identifies applications by matching data flows against known feature rules. This requires features to be analyzed and defined manually; because new applications appear every day, the manual analysis workload is too large and falls far behind the rate at which applications are added.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a deep-learning-based method and device for establishing an application identification model, and a method and device for identifying flow data, that overcome the above problems or at least partially solve them.
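The limitation of port matching can be illustrated with a minimal sketch. The port database below is hypothetical and its entries are illustrative only; the function name `match_by_port` is likewise an assumption, not something defined in the patent.

```python
# Hypothetical application-port database; the entries are illustrative only.
PORT_DB = {
    80: ["web browser", "software updater"],
    443: ["web browser"],
    25: ["mail client"],
}


def match_by_port(src_port, dst_port, port_db=PORT_DB):
    """Return the candidate applications for a data flow, looked up by port."""
    candidates = []
    for port in (dst_port, src_port):
        for app in port_db.get(port, []):
            if app not in candidates:
                candidates.append(app)
    return candidates


unique = match_by_port(50123, 25)     # exactly one candidate
ambiguous = match_by_port(50123, 80)  # several candidates share the port
unknown = match_by_port(50123, 6881)  # port is not in the database
```

A lookup that returns several candidates, or none, is the case in which the port alone cannot identify the application.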
According to one aspect of the present invention, an embodiment provides a deep-learning-based method for establishing an application identification model, applied to an environment in which a host and a network-side node transmit data, the host being provided with at least one host process having data processing capability, the method comprising:
Acquiring multiple pieces of host data transmitted by the host, wherein each piece of host data carries the name of the host process that processes that host data on the host;
Acquiring multiple pieces of flow data received by the network-side node, wherein each piece of flow data carries the packet payload observed when the network-side node received that flow data;
Comparing each piece of host data with each piece of flow data to find at least one pair of host data and flow data that are associated;
Processing the parameters of each associated pair of host data and flow data to obtain, for each pair, the correspondence between the host process name and the packet payload;
Establishing the application identification model using the correspondence between each host process name and packet payload.
Optionally, comparing each piece of host data with each piece of flow data to find at least one pair of host data and flow data that are associated comprises:
Comparing the parameters carried by each piece of host data with the parameters carried by each piece of flow data;
Finding the associated pairs of host data and flow data according to a comparison rule that multiple groups of parameters are identical, or that the proportion of identical parameters exceeds a proportion threshold.
Optionally, the parameters carried by host data include at least: the transmission time of the host data, source IP address, source port number, destination IP address, destination port number, and the name of the process that processes the host data;
The parameters carried by flow data include at least: the reception time of the flow data, source IP address, source port number, destination IP address, destination port number, and the packet payload of the flow data.
Optionally, after finding the associated pairs of host data and flow data according to the comparison rule that multiple groups of parameters are identical, or that the proportion of identical parameters exceeds a proportion threshold, the method further comprises:
Screening the associated pairs of host data and flow data according to a screening rule to identify the pairs that have a pseudo-association;
Deleting the pairs of host data and flow data that have a pseudo-association.
Optionally, screening the associated pairs of host data and flow data according to a screening rule to identify the pairs that have a pseudo-association comprises at least one of the following:
If one piece of host data is associated with two or more pieces of flow data, determining the association to be a pseudo-association;
If one piece of host data is associated with one piece of flow data but the time difference between them exceeds a time difference threshold, determining the association to be a pseudo-association.
Optionally, establishing the application identification model using the correspondence between each host process name and packet payload comprises:
Converting the host process name and the packet payload separately into machine-recognizable data;
Establishing the correspondence between the converted host process name and the converted packet payload, and using this correspondence to establish the application identification model.
Optionally, converting the host process name into machine-recognizable data comprises:
Mapping the host process names to an ordered list that starts from 0 and increases by one, so that each host process name is converted into a corresponding natural number.
Optionally, converting the packet payload into machine-recognizable data comprises:
Converting the packet payload, a hexadecimal string, into the corresponding decimal numbers;
Dividing each converted decimal number by 255 to obtain L floating-point numbers in the range [0, 1], where L is the length of the packet payload.
Optionally, the application identification model is used as follows:
Obtaining the input data of the application identification model and processing it through convolutional layers and pooling layers to generate deep features of the input data;
Passing the deep features to a fully connected layer, which is the same as in a neural network, to interpret the deep features;
Transmitting the fully connected layer's result for the deep features to the output layer for output.
Optionally, the convolutional layers and pooling layers are stacked in multiple layers; the more layers are stacked, the deeper the deep features.
Optionally, the convolutional layers and pooling layers are used in pairs.
Optionally, the window dimension of the convolutional layers and pooling layers is 1×n.
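The layer sequence described above (paired convolution and pooling with 1×n windows, then a fully connected layer, then an output layer) can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions: the single channel, the ReLU activation, max pooling, the softmax output, and all weight values are choices made here and are not specified by the text.

```python
import math


def conv1d(signal, kernel, bias=0.0):
    """Slide a 1*n window over the signal (no padding), then apply ReLU (assumed)."""
    n = len(kernel)
    out = []
    for i in range(len(signal) - n + 1):
        s = sum(signal[i + j] * kernel[j] for j in range(n)) + bias
        out.append(max(0.0, s))
    return out


def max_pool1d(signal, n):
    """Non-overlapping max pooling with a 1*n window (max pooling is assumed)."""
    return [max(signal[i:i + n]) for i in range(0, len(signal) - n + 1, n)]


def dense(features, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn scores into probabilities that sum to 1 (assumed output layer)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(payload, kernels, fc_weights, fc_biases, pool=2):
    x = payload
    for kernel in kernels:  # each iteration is one convolution + pooling pair
        x = max_pool1d(conv1d(x, kernel), pool)
    return softmax(dense(x, fc_weights, fc_biases))


# Hypothetical input and weights, for illustration only.
payload = [i / 10 for i in range(10)]            # 10 floats in [0, 1]
kernels = [[0.5, -0.5, 0.25], [1.0, 1.0]]        # two stacked conv/pool pairs
fc_weights = [[1.0], [0.5], [-1.0]]              # 3 candidate processes
fc_biases = [0.0, 0.0, 0.0]
probs = forward(payload, kernels, fc_weights, fc_biases)
```

Adding more entries to `kernels` stacks more convolution and pooling pairs; in a real model the weights would be learned from the paired process names and payloads rather than fixed by hand.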
According to another aspect of the present invention, an embodiment provides a method for identifying flow data, comprising:
Receiving flow data, wherein the flow data carries the packet payload observed when the network-side node received the flow data;
Converting the flow data into data recognizable by an application identification model;
Inputting the recognizable data into the application identification model to obtain the probabilities that the data being identified belongs to different host processes;
Identifying the host process corresponding to the flow data according to the obtained probabilities.
Optionally, converting the flow data into data recognizable by the application identification model comprises:
Converting the packet payload of the flow data into data recognizable by the application identification model.
Optionally, converting the packet payload of the flow data into data recognizable by the application identification model comprises:
Converting the packet payload, a hexadecimal string, into the corresponding decimal numbers;
Dividing each converted decimal number by 255 to obtain L floating-point numbers in the range [0, 1], where L is the length of the packet payload.
Optionally, identifying the host process corresponding to the flow data according to the obtained probabilities comprises:
Selecting the result with the maximum probability value as the determination for the flow data, thereby determining the host process name corresponding to the flow data.
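Selecting the maximum probability amounts to an argmax over the model's output. A minimal sketch follows, assuming the output is a list of probabilities ordered by process index; the process names and probability values are hypothetical.

```python
def identify_process(probabilities, process_names):
    """Return the process name with the highest probability, and that probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return process_names[best], probabilities[best]


# Hypothetical model output for one piece of flow data.
names = ["browser.exe", "mail.exe", "chat.exe"]
result = identify_process([0.1, 0.7, 0.2], names)
```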
According to a further aspect of the present invention, an embodiment provides a deep-learning-based device for establishing an application identification model, applied to an environment in which a host and a network-side node transmit data, the host being provided with at least one host process having data processing capability, the device comprising:
A first acquisition module, adapted to acquire multiple pieces of host data transmitted by the host, wherein each piece of host data carries the name of the host process that processes that host data on the host;
A second acquisition module, adapted to acquire multiple pieces of flow data received by the network-side node, wherein each piece of flow data carries the packet payload observed when the network-side node received that flow data;
A comparison module, adapted to compare each piece of host data with each piece of flow data to find at least one pair of host data and flow data that are associated;
A third acquisition module, adapted to process the parameters of each associated pair of host data and flow data to obtain, for each pair, the correspondence between the host process name and the packet payload;
An establishing module, adapted to establish the application identification model using the correspondence between each host process name and packet payload.
Optionally, the comparison module is further adapted to:
Compare the parameters carried by each piece of host data with the parameters carried by each piece of flow data;
Find the associated pairs of host data and flow data according to a comparison rule that multiple groups of parameters are identical, or that the proportion of identical parameters exceeds a proportion threshold.
Optionally, the parameters carried by host data include at least: the transmission time of the host data, source IP address, source port number, destination IP address, destination port number, and the name of the process that processes the host data;
The parameters carried by flow data include at least: the reception time of the flow data, source IP address, source port number, destination IP address, destination port number, and the packet payload of the flow data.
Optionally, the comparison module is further adapted to:
After finding the associated pairs of host data and flow data according to the comparison rule that multiple groups of parameters are identical, or that the proportion of identical parameters exceeds a proportion threshold, screen the associated pairs according to a screening rule to identify the pairs that have a pseudo-association;
Delete the pairs of host data and flow data that have a pseudo-association.
Optionally, the comparison module is further adapted to:
If one piece of host data is associated with two or more pieces of flow data, determine the association to be a pseudo-association; or
If one piece of host data is associated with one piece of flow data but the time difference between them exceeds a time difference threshold, determine the association to be a pseudo-association.
Optionally, the establishing module is further adapted to:
Convert the host process name and the packet payload separately into machine-recognizable data;
Establish the correspondence between the converted host process name and the converted packet payload, and use this correspondence to establish the application identification model.
Optionally, the establishing module is further adapted to:
Map the host process names to an ordered list that starts from 0 and increases by one, so that each host process name is converted into a corresponding natural number.
Optionally, the establishing module is further adapted to:
Convert the packet payload, a hexadecimal string, into the corresponding decimal numbers;
Divide each converted decimal number by 255 to obtain L floating-point numbers in the range [0, 1], where L is the length of the packet payload.
According to a further aspect of the present invention, an embodiment provides a device for identifying flow data, comprising:
A receiving module, adapted to receive flow data, wherein the flow data carries the packet payload observed when the network-side node received the flow data;
A conversion module, adapted to convert the flow data into data recognizable by an application identification model;
An input module, adapted to input the recognizable data into the application identification model to obtain the probabilities that the data being identified belongs to different host processes;
An identification module, adapted to identify the host process corresponding to the flow data according to the probabilities obtained by the input module.
Optionally, the conversion module is further adapted to:
Convert the packet payload of the flow data into data recognizable by the application identification model.
Optionally, the conversion module is further adapted to:
Convert the packet payload, a hexadecimal string, into the corresponding decimal numbers;
Divide each converted decimal number by 255 to obtain L floating-point numbers in the range [0, 1], where L is the length of the packet payload.
Optionally, the identification module is further adapted to:
Select the result with the maximum probability value as the determination for the flow data, thereby determining the process name corresponding to the flow data.
In the embodiments of the present invention, host data and flow data are compared to find the points of association between them, and the associated host data and flow data are then selected according to that association. On the host side, host data is sent by a specific process; on the network side, the corresponding packet payload can be determined when the flow data is acquired. Using the association between host data and flow data, the embodiments further analyze and determine the actual correspondence between host process names and packet payloads, and generate the application identification model from this correspondence. Later, when the model is applied to flow data, the name of the process that sent the host data can be found from the packet payload of the flow data, and the application that sent the host data can thus be determined. Identifying the application in turn determines the priority of the application and hence the processing priority of the flow data, which helps to optimize and configure the applications and the network within an enterprise and ensures fast transmission of information and efficient work. In other words, the embodiments establish an application identification model that identifies the application that initiated a piece of flow data: simply inputting the flow data into the model quickly reveals which application on the host sent it, without manual involvement and without adding identification software on the host side, which greatly improves the accuracy, convenience, and efficiency of application identification.
The above description is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
From the following detailed description of specific embodiments of the invention, read in conjunction with the accompanying drawings, those skilled in the art will better understand the above and other objects, advantages, and features of the invention.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
Fig. 1 shows a processing flowchart of the deep-learning-based method for establishing an application identification model according to an embodiment of the invention;
Fig. 2 shows a schematic flow diagram of the use of the application identification model according to an embodiment of the invention;
Fig. 3 shows a method for identifying flow data according to an embodiment of the invention;
Fig. 4 shows a schematic structural diagram of the deep-learning-based device for establishing an application identification model according to an embodiment of the invention;
Fig. 5 shows a schematic structural diagram of the device for identifying flow data according to an embodiment of the invention; and
Fig. 6 shows a system schematic of the process of establishing the application identification model and the subsequent flow data identification process according to an embodiment of the invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
To solve the above technical problems, an embodiment of the invention proposes establishing a new application identification model generated on the basis of deep learning. Deep learning is a branch of machine learning; in essence it builds a deep neural network that analyzes and learns automatically, imitating the mechanisms of the human brain to learn from data, and in recent years it has achieved notable results in the image, speech, and text fields. The application identification model established by the embodiment also has the ability of self-learning: it can determine deep learning features and learn model parameters automatically, without relying on manual analysis or on a file feature library for data analysis. Furthermore, with the application identification model established, identification requires neither installing identification software on the host nor storing and computing data on the host side, so identification is imperceptible to the user and the user experience is improved.
Based on this inventive concept, an embodiment of the invention proposes a deep-learning-based method for establishing an application identification model. The method is applied to an environment in which a host and a network-side node transmit data; data obtained from the host side is hereinafter called host data, and data obtained from the network-side node is hereinafter called flow data. The host is provided with at least one host process having data processing capability. Fig. 1 shows a processing flowchart of the deep-learning-based method for establishing an application identification model according to an embodiment of the invention. Referring to Fig. 1, the method comprises at least:
Step S102: acquire multiple pieces of host data transmitted by the host, wherein each piece of host data carries the name of the host process (process) that processes that host data on the host;
Step S104: acquire multiple pieces of flow data received by the network-side node, wherein each piece of flow data carries the packet payload (payload) observed when the network-side node received that flow data;
Step S106: compare each piece of host data with each piece of flow data to find at least one pair of host data and flow data that are associated;
Step S108: process the parameters of each associated pair of host data and flow data to obtain, for each pair, the correspondence between the host process name and the packet payload;
Step S110: establish the application identification model using the correspondence between each host process name and packet payload.
In the embodiments of the present invention, host data and flow data are compared to find the points of association between them, and the associated host data and flow data are then selected according to that association. On the host side, host data is sent by a specific process; on the network side, the corresponding packet payload can be determined when the flow data is acquired. Using the association between host data and flow data, the embodiments further analyze and determine the actual correspondence between host process names and packet payloads, and generate the application identification model from this correspondence. Later, when the model is applied to flow data, the name of the process that sent the host data can be found from the packet payload of the flow data, and the application that sent the host data can thus be determined. Identifying the application in turn determines the priority of the application and hence the processing priority of the flow data, which helps to optimize and configure the applications and the network within an enterprise and ensures fast transmission of information and efficient work. In other words, the embodiments establish an application identification model that identifies the application that initiated a piece of flow data: simply inputting the flow data into the model quickly reveals which application on the host sent it, without manual involvement and without adding identification software on the host side, which greatly improves the accuracy, convenience, and efficiency of application identification.
In a preferred embodiment, step S106 compares each piece of host data with each piece of flow data to find at least one pair of host data and flow data that are associated. Both host data and flow data carry multiple groups of parameters, and the parameters of a piece of data usually contain its core content or its identifying content, so the parameters carried by each piece of host data can be compared directly with those carried by each piece of flow data. If a piece of host data and a piece of flow data satisfy the comparison rule that multiple groups of parameters are identical (for example, 3 groups or more), or that the proportion of identical parameters exceeds a proportion threshold (for example, more than 60%), the two are determined to be associated. The comparison is repeated in turn to find all associated pairs of host data and flow data.
In a specific embodiment, the parameters carried by host data include at least: the transmission time of the host data, source IP address, source port number, destination IP address, destination port number, and the name of the process that processes the host data. The parameters carried by flow data include at least: the reception time of the flow data, source IP address, source port number, destination IP address, destination port number, and the packet payload of the flow data. In this case, if the source IP address, source port number, and destination port number of a piece of host data are identical to those of a piece of flow data, the host data and the flow data can be considered associated.
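The comparison rule can be sketched as follows. This is a minimal illustration, assuming that only the four address fields are compared for equality (the times are handled separately), and using the example thresholds from the text (3 groups, 60%). The record classes, field names, and sample addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostRecord:
    send_time: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    process: str   # name of the host process that handled the data


@dataclass(frozen=True)
class FlowRecord:
    recv_time: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: str   # packet payload as a hexadecimal string


# Fields compared for equality; an assumption made for this sketch.
SHARED = ("src_ip", "src_port", "dst_ip", "dst_port")


def is_associated(host, flow, min_groups=3, ratio_threshold=0.6):
    """True if enough parameters match, by count or by proportion."""
    same = sum(getattr(host, f) == getattr(flow, f) for f in SHARED)
    return same >= min_groups or same / len(SHARED) > ratio_threshold


def find_pairs(hosts, flows):
    """Compare every host record with every flow record."""
    return [(h, f) for h in hosts for f in flows if is_associated(h, f)]


h = HostRecord(0.00, "192.0.2.10", 50123, "198.51.100.7", 80, "browser.exe")
f1 = FlowRecord(0.02, "192.0.2.10", 50123, "198.51.100.7", 80, "474554202f")
f2 = FlowRecord(0.03, "192.0.2.99", 40000, "198.51.100.7", 80, "160302009c")
pairs = find_pairs([h], [f1, f2])
```

The pairwise loop is quadratic in the number of records; for large captures, indexing the flow records by their address fields would likely be preferable, though the text does not discuss this.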
Further, most of associated data can adopt above-mentioned alignments to determine, but still some special circumstances, and the embodiment of the present invention is referred to as the host data and the data on flows pair that possess spurious correlation.If there is these feature situations, then to need according to screening rule the host data possessing relevance determined and data on flows screening, filter out the host data and data on flows pair that wherein possess spurious correlation further, and delete the host data and the data on flows pair that possess spurious correlation.
If meet following any one, meet the special circumstances of embodiment of the present invention indication, determine that this relevance is spurious correlation:
If first one host data and more than two datas on flows possess relevance, then determine that this relevance is spurious correlation;
After a host data transmission, what network side node received also must be data, if there are the datas on flows of more than two associations, prove that data transmission procedure may obscure other data, the relevance now determined is not well-determined.If these data are applied to set up application identification model, then likely there is the situation of the identified two methods of same data on flows, affect the accuracy of the result of application identification model.
If second one host data and a data on flows possess relevance, but both time difference overtime difference limen values, then determine that this relevance is spurious correlation.
Because of the data high-speed path of cybertimes, data transmission procedure is usually comparatively quick, time is very short, if both time phase differences are too large, then likely host data is lost, what network side node received is not the data on flows that this host data is corresponding, for ensureing to set up the accuracy of the data of application identification model, this situation not with employing.
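The association and screening rules above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the record classes, the function name `associate`, the four-field matching key (SIP, SPort, DIP, DPort), and the default threshold of 600 seconds are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HostRecord:          # hypothetical record layout
    time: float            # sending time, in seconds
    sip: str
    sport: int
    dip: str
    dport: int
    process: str           # host process name


@dataclass(frozen=True)
class FlowRecord:          # hypothetical record layout
    time: float            # receiving time, in seconds
    sip: str
    sport: int
    dip: str
    dport: int
    payload: str           # packet payload as a hexadecimal string


def associate(hosts, flows, max_time_diff=600.0):
    """Return (payload, process) pairs, skipping spurious associations."""
    # Index flow records by the address/port fields used for exact matching.
    flows_by_key = defaultdict(list)
    for f in flows:
        flows_by_key[(f.sip, f.sport, f.dip, f.dport)].append(f)

    pairs = []
    for h in hosts:
        candidates = flows_by_key.get((h.sip, h.sport, h.dip, h.dport), [])
        # No match, or rule 1: two or more matching flow records is ambiguous.
        if len(candidates) != 1:
            continue
        flow = candidates[0]
        # Rule 2: the time difference exceeds the threshold.
        if abs(flow.time - h.time) > max_time_diff:
            continue
        pairs.append((flow.payload, h.process))
    return pairs
```

Only host records that match exactly one flow record within the time threshold contribute a (payload, process name) pair; everything else is left out of the training data.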
The application identification model is a machine model, whereas host process names and packet payloads are not in a form the model can consume directly. In implementation, therefore, the host process names and packet payloads can each be converted in advance into machine-recognizable data; a correspondence is then established between the converted host process names and packet payloads, and the application identification model is built from this correspondence.
Specifically, host process names can be mapped to an ordered list that starts at 0 and increases by one, so that each host process name is converted to a corresponding natural number. In addition, the packet payload, a hexadecimal string, can be converted to the corresponding decimal numbers; each decimal number is then divided by 255, yielding L floating-point numbers in [0, 1], where L is the length of the packet payload.
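A minimal Python sketch of the two conversions follows. The function names are hypothetical, and the sketch assumes that each pair of hexadecimal digits is one byte and that indices are assigned in order of first appearance; the description does not state how the order of the list is chosen.

```python
def payload_to_floats(payload_hex):
    """Convert a hexadecimal payload string to floats in [0, 1], one per byte."""
    raw = bytes.fromhex(payload_hex)      # each byte is a decimal number 0-255
    return [b / 255 for b in raw]


def build_process_index(process_names):
    """Map each distinct process name to 0, 1, 2, ... in first-seen order."""
    index = {}
    for name in process_names:
        if name not in index:
            index[name] = len(index)
    return index
```

For a payload of L bytes, `payload_to_floats` returns a list of L values, which is the form in which a sample is fed to the model.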
The embodiment of the present invention provides a specific example of converting host process names and packet payloads. Table one shows the correspondence list of host process names and packet payloads obtained according to the embodiment of the present invention.
Table one
Payload Process
474554202f… S.exe
474554201e… S.exe
70a100347b… A.exe
803d010301… B.exe
160302009c… C.exe
803d010302… B.exe
504f535420… D.exe
First, the packet payload is converted. For Payload: the hexadecimal string is converted to the corresponding decimal numbers in the range 0-255, and each number is then divided by 255. For each sample, the result is L floating-point numbers in [0, 1], where L is the length of the payload. See table two for the conversion results.
Table two
Second, the host process names are converted. Specifically, the names are mapped to an ordered table that starts at 0 and increases by one; see table three.
Table three
Index Process
0 S.exe
1 A.exe
2 B.exe
3 C.exe
4 D.exe
…… ……
To simplify retrieval, each process in table three is further replaced by its numerical index, which yields table four:
Table four
After the training data has been converted, the embodiment of the present invention generates and uses the application identification model based on the labeled data. Fig. 2 shows a schematic flowchart of using the application identification model according to an embodiment of the present invention. The application identification model provided by the embodiment of the present invention is based on a convolutional neural network (CNN), a deep learning model commonly used in image recognition tasks such as handwritten digit recognition, face recognition, and picture classification. The main difference between this application identification model and a traditional CNN model is that the window dimension of convolution and pooling is not the n*n suited to two-dimensional images but 1*n (where n is the size of the basic convolution or pooling unit).
Specifically, the flow of using the application identification model includes:
First, the input data of the application identification model is obtained and processed by convolutional layers and pooling layers to generate the deep features of the input data. The input data first passes through several convolutional layers and pooling layers. Convolutional and pooling layers are usually used in pairs, or, beyond a certain depth, only convolutional layers are used without pooling layers. The more layers are stacked, the deeper the resulting network. At least two convolutional and pooling layers must be used to generate deep features.
Second, the deep features are passed to fully connected layers, which are the same as those of a traditional neural network, and the deep features are parsed there. The number of fully connected layers should not be too large, typically 1 to 3. The result is finally passed to the output layer.
Finally, the fully connected layers pass the parsing result of the deep features to the output layer, which outputs it.
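The flow above, with 1*n convolution and pooling windows, can be illustrated by a small forward pass in plain Python. This is a deliberately simplified sketch under several assumptions that are not in the description: one channel per layer, ReLU as the activation, non-overlapping max pooling, a single fully connected layer, a softmax output, and weights supplied by the caller. A practical model would use a deep learning framework and trained parameters.

```python
import math


def conv1d(x, kernel, bias=0.0):
    """Valid convolution with a 1*n window, followed by ReLU (assumed)."""
    n = len(kernel)
    return [
        max(0.0, sum(x[i + j] * kernel[j] for j in range(n)) + bias)
        for i in range(len(x) - n + 1)
    ]


def max_pool1d(x, n):
    """Non-overlapping max pooling with a 1*n window."""
    return [max(x[i:i + n]) for i in range(0, len(x) - n + 1, n)]


def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    """Turn output-layer scores into probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, kernels, pool, fc_weights, fc_biases):
    """Paired convolution and pooling layers, then one dense layer and softmax."""
    h = x
    for kernel in kernels:
        h = max_pool1d(conv1d(h, kernel), pool)
    return softmax(dense(h, fc_weights, fc_biases))
```

With an input of length 16, two kernels of size 3, and a pooling window of 2, the feature length shrinks from 16 to 14, 7, 5, and finally 2 before the fully connected layer.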
Of course, the application identification model needs to be updated, and the update cycle depends on the circumstances. If high-performance GPU computing is used, the model can be retrained and updated daily or weekly, depending on actual resources and business needs; if a CPU cluster is used, the model can be retrained and updated weekly or monthly, depending on actual resources and business needs.
After the application identification model has been built, the embodiment of the present invention can use it to identify flow data. Fig. 3 shows a flow data identification method according to an embodiment of the present invention. Referring to Fig. 3, the method includes at least:
Step S302: receive flow data, where the flow data carries the packet payload present when the network-side node received it;
Step S304: convert the flow data into data recognizable by the application identification model;
Step S306: input the recognizable data into the application identification model to obtain the probabilities that the data being identified belongs to different host processes;
Step S308: identify the host process corresponding to the flow data according to the obtained probabilities.
Specifically, the host process with the maximum probability value can be chosen as the determination result for the flow data, which determines the host process name corresponding to the flow data.
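Steps S304 to S308 can be sketched as a single Python function. The function name, the `model` callable (anything that maps a list of floats to a list of probabilities), and the list of process names indexed by class number are assumptions made for illustration.

```python
def identify_flow(payload_hex, model, process_names):
    """Return (process name, probability) for one flow data payload."""
    # S304: convert the hexadecimal payload to floats in [0, 1].
    x = [b / 255 for b in bytes.fromhex(payload_hex)]
    # S306: obtain one probability per host process from the model.
    probs = model(x)
    # S308: choose the process with the maximum probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return process_names[best], probs[best]
```

For example, with a stand-in model that always returns `[0.05, 0.1, 0.15, 0.6, 0.1]` (hypothetical values) and the names `["S.exe", "A.exe", "B.exe", "C.exe", "D.exe"]`, the function returns `("C.exe", 0.6)`.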
As mentioned above for building the application identification model, the flow data must be converted into machine-recognizable data for ease of reading and use. Likewise, when the application identification model is used to identify flow data, step S304 must be performed to convert the flow data into data recognizable by the model. Specifically, the packet payload of the flow data is converted into data the application identification model can recognize. First, the packet payload, a hexadecimal string, is converted to the corresponding decimal numbers. Second, each decimal number is divided by 255, yielding L floating-point numbers in [0, 1], where L is the length of the packet payload.
The embodiment of the present invention provides a specific example of flow data identification. After conversion, the data to be identified is identified with the application identification model, and the output layer outputs the probability that each record to be identified belongs to each application; see table five:
Table five
Finally, the application name corresponding to the maximum probability value is taken as the determination result for the record; in the example, C.exe is taken as the identification result.
To set out the process of building the application identification model and the subsequent flow data identification process more clearly, the embodiment of the present invention provides a complete example, described below.
1. Data acquisition
In the training stage, the data is divided into two parts: host data and flow data.
Host data is obtained from the host side and includes Time (time), SIP (source IP address), SPort (source port number), DIP (destination IP address), DPort (destination port number), and Process (the process name of the running application, such as "svchost.exe"), forming a six-tuple. See table six:
Table six
Flow data is obtained from the network side and includes Time (time), SIP (source IP address), SPort (source port number), DIP (destination IP address), DPort (destination port number), and Payload (the payload, that is, the spliced uplink and downlink packets of the TCP network flow, such as "705ba387fe ..."), forming a six-tuple. See table seven:
Table seven
In the identification stage, only flow data is used as input, and its format is the same as in the training stage.
2. Construction of training data (association of host data with flow data)
Among the fields of the two tables above, both kinds of data share Time, SIP, SPort, DIP, and DPort. Exact matching is first performed on SIP, SPort, DIP, and DPort. Because the data sources differ, differences in system logging or upload time introduce some delay into Time, so Time is used for a further association step.
The two example tables above illustrate several cases that require special handling when associating data:
(1) Through exact matching on the four-tuple, 1 host record is preliminarily associated with 2 flow records, but their times are close, so the time cannot determine which Payload corresponds to the application whose Process is A.exe. This case is therefore not associated and is not added to the training data.
(2) The four-tuples in the two kinds of data correspond one to one and the time interval is small, so the association is considered correct, and "474554202f ..." with "S.exe" is added to the training data.
(3) Although the four-tuples correspond one to one, the times differ by 31 minutes. The interval is too large, so the records are considered unrelated and are not associated. The time interval threshold is determined from the actual data in each local area network and is usually no more than 10 minutes.
The final associated training data is shown in table one:
Table one
3. Data transformation
For Payload: the hexadecimal string is converted to the corresponding decimal numbers in the range 0-255, and each number is then divided by 255. For each sample, the result is L floating-point numbers in [0, 1], where L is the length of the payload.
An example is shown in table two:
Table two
For data to be identified, this step alone is sufficient; training data additionally requires the following processing:
For the application name in Process: the names are mapped to an ordered table that starts at 0 and increases by one. An example is shown in table three:
Table three
Index Process
0 S.exe
1 A.exe
2 B.exe
3 C.exe
4 D.exe
…… ……
To simplify retrieval, each process in table three is further replaced by its numerical index, which yields table four:
Table four
Finally, the training data is transformed into a series of associated pairs Xi and Yi.
An example is shown in table eight:
Table eight
5. Identification
Input: the converted data to be identified, transformed by the same method as the training data.
Using the trained CNN model parameters, forward operations such as convolution, pooling, and activation are performed, and the output layer finally outputs the probability that each record to be identified belongs to each application, for example as in table five:
Table five
Finally, the application name corresponding to the maximum probability value is taken as the determination result for the record; in the example, C.exe is taken as the identification result.
6. Post-processing of identification results
After an application has been identified, it can be compared against a known application library to determine whether it is on a restricted list, so that corresponding measures can be taken. Statistics can also be compiled on the applications used by each host in the local area network, to understand how applications are distributed across the network. For a given application, a corresponding handling policy is adopted depending on the probability value.
The comparison outcomes and the corresponding handling methods are shown in table nine:
Table nine
The threshold that separates a high from a low probability of being identified as a given application can be set manually from experience. For example, a probability greater than 0.3 is considered high and the identification result reliable; a probability less than 0.3 is considered low and the result unreliable.
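A hedged Python sketch of this post-processing step follows. The function names and the three outcome labels ("unreliable", "restricted", "allowed") are hypothetical placeholders, since the concrete handling methods are not reproduced here. The description does not say how a probability of exactly 0.3 is treated; the sketch treats it as low.

```python
from collections import Counter


def handle_result(process, probability, restricted, threshold=0.3):
    """Classify one identification result (outcome labels are placeholders)."""
    if probability <= threshold:
        return "unreliable"        # low probability: result not trusted
    if process in restricted:
        return "restricted"        # reliable result for a restricted application
    return "allowed"


def usage_statistics(results, threshold=0.3):
    """Count reliable identifications per application name."""
    return Counter(name for name, prob in results if prob > threshold)
```

A caller could apply `handle_result` to each identified record and feed all (name, probability) pairs for a host into `usage_statistics` to see which applications that host uses.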
Based on the same inventive concept, the embodiment of the present invention also provides a deep-learning-based apparatus for building an application identification model, applied to an environment in which a host and a network-side node transmit data, where at least one host process capable of processing data runs on the host. Fig. 4 shows a schematic structural diagram of the deep-learning-based apparatus for building an application identification model according to an embodiment of the present invention. Referring to Fig. 4, the apparatus includes at least:
a first acquisition module 410, adapted to obtain multiple host data records transmitted by the host, where each host data record carries the name of the host process that handles it on the host;
a second acquisition module 420, adapted to obtain multiple flow data records received by the network-side node, where each flow data record carries the packet payload present when the network-side node received it;
a comparison module 430, coupled to the first acquisition module 410 and the second acquisition module 420, adapted to compare each host data record with each flow data record to find at least one pair of associated host data and flow data;
a third acquisition module 440, coupled to the comparison module 430, adapted to process the parameters of each associated pair of host data and flow data to obtain, for each pair, the correspondence between the host process name and the packet payload;
a building module 450, coupled to the third acquisition module 440, adapted to build the application identification model from the correspondences between host process names and packet payloads.
In a preferred embodiment, the comparison module 430 is further adapted to:
compare the parameters carried by each host data record and each flow data record;
find the associated host data and flow data pairs according to a comparison rule under which multiple groups of parameters are identical or the proportion of identical parameters exceeds a proportion threshold.
In a preferred embodiment,
the parameters carried by the host data include at least: the sending time of the host data, the source IP address, the source port number, the destination IP address, the destination port number, and the name of the process that handles the host data;
the parameters carried by the flow data include at least: the receiving time of the flow data, the source IP address, the source port number, the destination IP address, the destination port number, and the packet payload of the flow data.
In a preferred embodiment, the comparison module 430 is further adapted to:
after finding the associated host data and flow data pairs according to the comparison rule under which multiple groups of parameters are identical or the proportion of identical parameters exceeds a proportion threshold, screen the determined associated pairs according to a screening rule to identify the host data and flow data pairs with a spurious association;
delete the host data and flow data pairs with a spurious association.
In a preferred embodiment, the comparison module 430 is further adapted to:
determine that an association is spurious if one host data record is associated with two or more flow data records; or
determine that an association is spurious if one host data record is associated with one flow data record but the time difference between them exceeds a time difference threshold.
In a preferred embodiment, the building module 450 is further adapted to:
convert the host process names and the packet payloads each into machine-recognizable data;
establish a correspondence between the converted host process names and packet payloads, and build the application identification model from this correspondence.
In a preferred embodiment, the building module 450 is further adapted to:
map the host process names to an ordered list that starts at 0 and increases by one, so that each host process name is converted to a corresponding natural number.
In a preferred embodiment, the building module 450 is further adapted to:
convert the packet payload, a hexadecimal string, to the corresponding decimal numbers;
divide each decimal number by 255, yielding L floating-point numbers in [0, 1], where L is the length of the packet payload.
Based on the same inventive concept, the embodiment of the present invention also provides a flow data identification apparatus. Fig. 5 shows a schematic structural diagram of the flow data identification apparatus according to an embodiment of the present invention. Referring to Fig. 5, the apparatus includes at least:
a receiving module 510, adapted to receive flow data, where the flow data carries the packet payload present when the network-side node received it;
a conversion module 520, coupled to the receiving module 510, adapted to convert the flow data into data recognizable by the application identification model;
an input module 530, coupled to the conversion module 520, adapted to input the recognizable data into the application identification model to obtain the probabilities that the data being identified belongs to different host processes;
an identification module 540, coupled to the input module 530, adapted to identify the host process corresponding to the flow data according to the probabilities obtained by the input module 530.
In a preferred embodiment, the conversion module 520 is further adapted to:
convert the packet payload of the flow data into data the application identification model can recognize.
In a preferred embodiment, the conversion module 520 is further adapted to:
convert the packet payload, a hexadecimal string, to the corresponding decimal numbers;
divide each decimal number by 255, yielding L floating-point numbers in [0, 1], where L is the length of the packet payload.
In a preferred embodiment, the identification module 540 is further adapted to:
choose the maximum probability value as the determination result for the flow data, and determine the process name corresponding to the flow data.
In summary, the embodiment of the present invention provides a system diagram of the process of building the application identification model and of the subsequent flow data identification process; see Fig. 6. Host data and flow data are input to the training data association module, then passed to the training data transformation module, and the deep learning model is trained to build the application identification module. When new flow data is input, it is converted by the identification data transformation module and then input to the application identification module to obtain an identification result. The identification result is then compared with the application library and handled accordingly.
The process of building the application identification model and the subsequent flow data identification process provided by the embodiment of the present invention can achieve the following beneficial effects:
In the embodiment of the present invention, host data and flow data are compared to find the points at which they are associated, and the associated host data and flow data are then selected accordingly. On the host side, host data is sent by a specific process; on the network side, the corresponding packet payload can be determined when the flow data is obtained. Through the association between host data and flow data, the embodiment of the present invention further determines the actual correspondence between host process names and packet payloads, and generates the application identification model from this correspondence. When the application identification model is later applied to flow data, the name of the process that sent the corresponding host data can be found from the packet payload of the flow data, which in turn determines the application that sent it. Identifying the application makes it possible to determine the priority of the application and hence the processing priority of the flow data, which helps to optimize and configure the applications and network in an enterprise, ensuring rapid transmission of information and efficient work. That is, the embodiment of the present invention builds an application identification model that identifies the application that initiated the flow data. Simply by inputting flow data into this model, one can quickly learn which application on the host sent the flow data, without manual intervention and without installing identification software on the host side, which considerably improves the accuracy, convenience, and efficiency of application identification.
Numerous specific details are described in the specification provided herein. It will be understood, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments can be combined into one module, unit, or component, and can furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed can be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) can be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention can be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) can be used in practice to implement some or all of the functions of some or all of the components of the deep-learning-based apparatus for building an application identification model and of the flow data identification apparatus according to embodiments of the invention. The invention can also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the invention can be stored on a computer-readable medium or can take the form of one or more signals. Such signals can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words can be interpreted as names.
Thus far, those skilled in the art will recognize that although multiple exemplary embodiments of the invention have been shown and described in detail herein, many other variations or modifications consistent with the principles of the invention can still be directly determined or derived from the disclosure without departing from the spirit and scope of the invention. The scope of the invention should therefore be understood and deemed to cover all such other variations or modifications.
According to one aspect of the present invention, the following is also disclosed: A1. A deep-learning-based method for building an application identification model, applied to an environment in which a host and a network-side node transmit data, where at least one host process capable of processing data runs on the host, the method comprising:
obtaining multiple host data records transmitted by the host, where each host data record carries the name of the host process that handles it on the host;
obtaining multiple flow data records received by the network-side node, where each flow data record carries the packet payload present when the network-side node received it;
comparing each host data record with each flow data record to find at least one pair of associated host data and flow data;
processing the parameters of each associated pair of host data and flow data to obtain, for each pair, the correspondence between the host process name and the packet payload;
building the application identification model from the correspondences between host process names and packet payloads.
A2, method according to A1, wherein, each host data and each data on flows are compared, to find out at least one pair of host data and data on flows of wherein possessing relevance, comprising:
Each parameter that each host data and each data on flows carry is compared;
Identical according to many group parameters, or identical parameters ratio exceedes the comparison rules of proportion threshold value, to find out the host data and data on flows pair that possess relevance.
A3, method according to A2, wherein,
The parameter that host data carries at least comprises: the process title of transmission time of host data, source IP address, source port number, target ip address, destination port number, processing host data;
The parameter that data on flows is carried at least comprises: data pack load during time of reception, source IP address, source port number, target ip address, destination port number, the data on flows of data on flows.
A4, method according to A2, wherein, identical according to many group parameters, or identical parameters ratio exceedes the comparison rules of proportion threshold value, with find out possess relevance host data and data on flows to afterwards, also comprise:
According to screening rule to the host data possessing relevance determined and data on flows to screening, filter out the host data and data on flows pair that wherein possess spurious correlation further;
Host data and the data on flows pair of spurious correlation is possessed described in deletion.
A5, method according to A4, wherein, according to screening rule to the host data possessing relevance determined and data on flows to screening, filter out the host data and data on flows pair that wherein possess spurious correlation further, comprise following one of at least:
If one host data and more than two datas on flows possess relevance, then determine that this relevance is spurious correlation;
If one host data and a data on flows possess relevance, but both time difference overtime difference limen values, then determine that this relevance is spurious correlation.
A6, method according to any one of A1 to A5, wherein, utilize the corresponding relation of each pair of host processes title and data pack load to set up described application identification model, comprising:
Respectively machine language conversion is carried out to host processes title and data pack load, be converted into machine recognizable machine data;
Set up corresponding relation between host processes title in post-conversion and data pack load further, and utilize this corresponding relation to set up described application identification model.
A7, method according to A6, wherein, machine language conversion is carried out to host processes title, be converted into machine recognizable machine data, comprise:
By host processes title with from 0 and the ordered list increased progressively one by one map, each host processes title is converted into corresponding natural number.
A8, method according to A6 or A7, wherein, machine language conversion is carried out to data pack load, be converted into machine recognizable machine data, comprise:
The data pack load of hexadecimal string is converted into corresponding decimal number;
To the decimal number after conversion divided by 255, obtain the floating number of L [0,1], wherein, L is the length of data pack load.
A9. The method according to any one of A1 to A8, wherein the application identification model is used as follows:
Obtaining the input data of the application identification model and processing it through a convolutional layer and a pooling layer to generate the deep features of the input data;
Passing the deep features to a fully connected layer, which is the same as that of a conventional neural network, where the deep features are parsed;
Transferring, by the fully connected layer, the parsing result of the deep features to an output layer for output.
A10. The method according to A9, wherein the convolutional layer and the pooling layer are stacked in multiple layers, and the more layers are stacked, the deeper the deep features.
A11. The method according to A9 or A10, wherein the convolutional layer and the pooling layer are used in pairs.
A12. The method according to any one of A9 to A11, wherein the window dimension of the convolutional layer and of the pooling layer is 1*n.
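The layer sequence of A9 to A12 can be illustrated with a small forward pass in plain Python. This is only a sketch under several assumptions that the text does not state: a single feature channel, a ReLU activation after each convolution, max pooling, and a softmax output layer. All function names are hypothetical, and the weights in the usage lines are hand-picked rather than trained.

```python
import math


def conv1d(signal, kernel):
    """Slide a 1*n window over the signal and apply ReLU (assumed activation)."""
    n = len(kernel)
    return [max(0.0, sum(signal[i + j] * kernel[j] for j in range(n)))
            for i in range(len(signal) - n + 1)]


def max_pool1d(signal, n):
    """Keep the maximum of each non-overlapping 1*n window."""
    return [max(signal[i:i + n]) for i in range(0, len(signal) - n + 1, n)]


def fully_connected(features, weights, biases):
    """Dense layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (assumed output layer)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(payload, kernels, fc_weights, fc_biases, pool=2):
    """Paired convolution/pooling stages, then the fully connected and output layers."""
    features = payload
    for kernel in kernels:  # one convolutional layer paired with one pooling layer
        features = max_pool1d(conv1d(features, kernel), pool)
    return softmax(fully_connected(features, fc_weights, fc_biases))


payload = [i / 15 for i in range(16)]            # 16 normalized payload values
kernels = [[0.5, -0.25, 0.5], [1.0, 0.0, -1.0]]  # two stacked 1*3 windows
fc_weights = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
fc_biases = [0.0, 0.1, -0.1]
probabilities = forward(payload, kernels, fc_weights, fc_biases)
```

With 16 inputs, two conv/pool pairs shrink the signal to 14, 7, 5, and then 2 values, so the fully connected layer here takes 2 inputs and produces one score per candidate host process.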
According to another aspect of the present invention, there is also disclosed: B13. A flow data identification method, comprising:
Receiving flow data, wherein the flow data carries the packet payload present when the network-side node received the flow data;
Converting the flow data into data recognizable by the application identification model;
Inputting the recognizable data into the application identification model to obtain the probabilities that the identified data belongs to different host processes;
Identifying the host process corresponding to the flow data according to the obtained probabilities.
B14. The method according to B13, wherein converting the flow data into data recognizable by the application identification model comprises:
Performing machine-language conversion on the packet payload of the flow data, converting it into data recognizable by the application identification model.
B15. The method according to B14, wherein performing machine-language conversion on the packet payload of the flow data to convert it into data recognizable by the application identification model comprises:
Converting the packet payload, which is a hexadecimal string, into the corresponding decimal numbers;
Dividing each converted decimal number by 255 to obtain L floating-point numbers in the range [0, 1], where L is the length of the packet payload.
B16. The method according to any one of B13 to B15, wherein identifying the host process corresponding to the flow data according to the obtained probabilities comprises:
Selecting the maximum probability value as the determination result for the flow data, and determining the host process name corresponding to the flow data.
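The identification steps of B13 to B16 might look like the sketch below. The trained application identification model is represented by an arbitrary callable that returns one probability per host process; `toy_model`, `identify_process`, and the process names are hypothetical stand-ins, and the byte-wise payload conversion is an assumption about how L is counted.

```python
def identify_process(hex_payload, model, process_names):
    """Return the most probable host process name and its probability."""
    # B15: hexadecimal string -> decimal bytes -> floats in [0, 1]
    features = [byte / 255 for byte in bytes.fromhex(hex_payload)]
    # B13: the model yields one probability per candidate host process
    probabilities = model(features)
    # B16: the maximum probability decides the result
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return process_names[best], probabilities[best]


def toy_model(features):
    """Stand-in for a trained model: two classes driven by the mean byte value."""
    mean = sum(features) / len(features)
    return [1.0 - mean, mean]


name, probability = identify_process("ffff", toy_model, ["a.exe", "b.exe"])
```

The index of the winning probability is the natural number assigned to the process name during training, so the same ordered list of names must be used at identification time.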
According to another aspect of the present invention, there is also disclosed: C17. A deep-learning-based application identification model establishing device, applied to an environment in which a host and a network-side node transmit data, the host being provided with at least one host process capable of processing data, the device comprising:
A first acquisition module, adapted to obtain multiple host data records transmitted by the host, wherein each host data record carries the name of the host process that processes that host data on the host;
A second acquisition module, adapted to obtain multiple flow data records received by the network-side node, wherein each flow data record carries the packet payload present when the network-side node received that flow data;
A comparison module, adapted to compare each host data record with each flow data record to find at least one associated pair of host data and flow data;
A third acquisition module, adapted to process the parameters of each associated pair of host data and flow data to obtain, for each associated pair, the correspondence between the host process name and the packet payload;
An establishing module, adapted to establish the application identification model using the correspondence between each pair of host process name and packet payload.
C18. The device according to C17, wherein the comparison module is further adapted to:
Compare the parameters carried by each host data record with those carried by each flow data record;
Find the associated pairs of host data and flow data according to a comparison rule under which multiple groups of parameters are identical, or the proportion of identical parameters exceeds a proportion threshold.
C19. The device according to C18, wherein:
The parameters carried by the host data include at least: the transmission time of the host data, the source IP address, the source port number, the destination IP address, the destination port number, and the name of the process that processes the host data;
The parameters carried by the flow data include at least: the reception time of the flow data, the source IP address, the source port number, the destination IP address, the destination port number, and the packet payload of the flow data.
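The comparison rule of C18, applied to the parameters listed in C19, can be sketched as below. The record layout (dictionaries with keys such as `src_ip`) and the function names are assumptions made for illustration. The sketch treats a pair as associated when the proportion of identical address and port parameters reaches the threshold, so the default of 1.0 means all four parameters must be identical.

```python
MATCH_KEYS = ("src_ip", "src_port", "dst_ip", "dst_port")  # hypothetical field names


def match_ratio(host_record, flow_record, keys=MATCH_KEYS):
    """Proportion of compared parameters that are identical in both records."""
    same = sum(1 for key in keys if host_record[key] == flow_record[key])
    return same / len(keys)


def find_associated_pairs(host_records, flow_records, ratio_threshold=1.0):
    """Pair every host record with every flow record that satisfies the rule."""
    pairs = []
    for host_record in host_records:
        for flow_record in flow_records:
            if match_ratio(host_record, flow_record) >= ratio_threshold:
                pairs.append((host_record, flow_record))
    return pairs


hosts = [{"src_ip": "192.0.2.10", "src_port": 5000,
          "dst_ip": "198.51.100.7", "dst_port": 80, "process": "a.exe"}]
flows = [{"src_ip": "192.0.2.10", "src_port": 5000,
          "dst_ip": "198.51.100.7", "dst_port": 80, "payload": "00ff"},
         {"src_ip": "192.0.2.10", "src_port": 5001,
          "dst_ip": "198.51.100.7", "dst_port": 80, "payload": "abcd"}]
pairs = find_associated_pairs(hosts, flows)
```

The nested loop compares every host record with every flow record, which is quadratic; indexing flow records by their address and port tuple would be a natural refinement for larger captures.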
C20. The device according to C18, wherein the comparison module is further adapted to:
After finding the associated pairs of host data and flow data according to the comparison rule under which multiple groups of parameters are identical, or the proportion of identical parameters exceeds the proportion threshold, screen the determined associated pairs according to a screening rule to pick out the pairs whose association is false;
Delete the pairs of host data and flow data whose association is false.
C21. The device according to C20, wherein the comparison module is further adapted to:
If one host data record is associated with more than two flow data records, determine that the association is a false association; or
If one host data record is associated with one flow data record, but the time difference between the two exceeds a time-difference threshold, determine that the association is a false association.
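A possible reading of the two screening rules in C21 is sketched below. The `time` field, the default thresholds, and the function name are assumptions; the limit on flow records per host record follows the wording above and is exposed as a parameter so that a stricter one-to-one rule can be chosen.

```python
from collections import Counter


def drop_false_associations(pairs, max_flows_per_host=2, max_time_diff=1.0):
    """Remove pairs that are treated as false associations.

    Host records are counted by object identity, so the same host record
    object must be reused across the pairs it takes part in.
    """
    flows_per_host = Counter(id(host) for host, _ in pairs)
    kept = []
    for host, flow in pairs:
        if flows_per_host[id(host)] > max_flows_per_host:
            continue  # rule 1: one host record matched too many flow records
        if abs(host["time"] - flow["time"]) > max_time_diff:
            continue  # rule 2: transmission and reception times are too far apart
        kept.append((host, flow))
    return kept


h1, h2, h3 = {"time": 10.0}, {"time": 20.0}, {"time": 30.0}
pairs = [(h1, {"time": 10.2}),
         (h2, {"time": 25.0}),
         (h3, {"time": 30.0}), (h3, {"time": 30.1}), (h3, {"time": 30.2})]
clean = drop_false_associations(pairs)
```

Here only the pair involving `h1` survives: `h2` fails the time-difference rule and `h3` is matched with three flow records.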
C22. The device according to any one of C17 to C21, wherein the establishing module is further adapted to:
Perform machine-language conversion on the host process name and on the packet payload respectively, converting them into machine-recognizable data;
Establish the correspondence between the converted host process name and the converted packet payload, and use this correspondence to establish the application identification model.
C23. The device according to C22, wherein the establishing module is further adapted to:
Map the host process names onto an ordered list that starts at 0 and increases by one at each step, so that each host process name is converted into a corresponding natural number.
C24. The device according to C22 or C23, wherein the establishing module is further adapted to:
Convert the packet payload, which is a hexadecimal string, into the corresponding decimal numbers;
Divide each converted decimal number by 255 to obtain L floating-point numbers in the range [0, 1], where L is the length of the packet payload.
According to another aspect of the present invention, there is also disclosed: D25. A flow data identification device, comprising:
A receiving module, adapted to receive flow data, wherein the flow data carries the packet payload present when the network-side node received the flow data;
A conversion module, adapted to convert the flow data into data recognizable by the application identification model;
An input module, adapted to input the recognizable data into the application identification model to obtain the probabilities that the identified data belongs to different host processes;
An identification module, adapted to identify the host process corresponding to the flow data according to the probabilities obtained by the input module.
D26. The device according to D25, wherein the conversion module is further adapted to:
Perform machine-language conversion on the packet payload of the flow data, converting it into data recognizable by the application identification model.
D27. The device according to D26, wherein the conversion module is further adapted to:
Convert the packet payload, which is a hexadecimal string, into the corresponding decimal numbers;
Divide each converted decimal number by 255 to obtain L floating-point numbers in the range [0, 1], where L is the length of the packet payload.
D28. The device according to any one of D25 to D27, wherein the identification module is further adapted to:
Select the maximum probability value as the determination result for the flow data, and determine the process name corresponding to the flow data.

Claims (10)

1. A deep-learning-based application identification model establishing method, applied to an environment in which a host and a network-side node transmit data, the host being provided with at least one host process capable of processing data, the method comprising:
Obtaining multiple host data records transmitted by the host, wherein each host data record carries the name of the host process that processes that host data on the host;
Obtaining multiple flow data records received by the network-side node, wherein each flow data record carries the packet payload present when the network-side node received that flow data;
Comparing each host data record with each flow data record to find at least one associated pair of host data and flow data;
Processing the parameters of each associated pair of host data and flow data to obtain, for each associated pair, the correspondence between the host process name and the packet payload;
Establishing the application identification model using the correspondence between each pair of host process name and packet payload.
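The step of claim 1 that turns associated pairs into name-to-payload correspondences can be sketched as follows. The record fields `process` and `payload` and both function names are hypothetical; the sketch only shows how labeled samples might be collected before any model is trained.

```python
def build_correspondence(associated_pairs):
    """Return one (host process name, packet payload) tuple per associated pair."""
    return [(host["process"], flow["payload"]) for host, flow in associated_pairs]


def group_payloads_by_process(correspondence):
    """Collect all payloads observed for each host process name."""
    grouped = {}
    for name, payload in correspondence:
        grouped.setdefault(name, []).append(payload)
    return grouped


associated = [({"process": "a.exe"}, {"payload": "00ff"}),
              ({"process": "b.exe"}, {"payload": "abcd"}),
              ({"process": "a.exe"}, {"payload": "1234"})]
samples = build_correspondence(associated)
by_process = group_payloads_by_process(samples)
```

Each tuple is one labeled training sample: the process name supplies the label and the payload supplies the input.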
2. The method according to claim 1, wherein comparing each host data record with each flow data record to find at least one associated pair of host data and flow data comprises:
Comparing the parameters carried by each host data record with those carried by each flow data record;
Finding the associated pairs of host data and flow data according to a comparison rule under which multiple groups of parameters are identical, or the proportion of identical parameters exceeds a proportion threshold.
3. The method according to claim 2, wherein:
The parameters carried by the host data include at least: the transmission time of the host data, the source IP address, the source port number, the destination IP address, the destination port number, and the name of the process that processes the host data;
The parameters carried by the flow data include at least: the reception time of the flow data, the source IP address, the source port number, the destination IP address, the destination port number, and the packet payload of the flow data.
4. The method according to claim 2, wherein, after finding the associated pairs of host data and flow data according to the comparison rule under which multiple groups of parameters are identical, or the proportion of identical parameters exceeds the proportion threshold, the method further comprises:
Screening the determined associated pairs of host data and flow data according to a screening rule to pick out the pairs whose association is false;
Deleting the pairs of host data and flow data whose association is false.
5. The method according to claim 4, wherein screening the determined associated pairs of host data and flow data according to the screening rule to pick out the pairs whose association is false comprises at least one of the following:
If one host data record is associated with more than two flow data records, determining that the association is a false association;
If one host data record is associated with one flow data record, but the time difference between the two exceeds a time-difference threshold, determining that the association is a false association.
6. The method according to any one of claims 1 to 5, wherein establishing the application identification model using the correspondence between each pair of host process name and packet payload comprises:
Performing machine-language conversion on the host process name and on the packet payload respectively, converting them into machine-recognizable data;
Establishing the correspondence between the converted host process name and the converted packet payload, and using this correspondence to establish the application identification model.
7. The method according to claim 6, wherein performing machine-language conversion on the host process name to convert it into machine-recognizable data comprises:
Mapping the host process names onto an ordered list that starts at 0 and increases by one at each step, so that each host process name is converted into a corresponding natural number.
8. A flow data identification method, comprising:
Receiving flow data, wherein the flow data carries the packet payload present when the network-side node received the flow data;
Converting the flow data into data recognizable by an application identification model;
Inputting the recognizable data into the application identification model to obtain the probabilities that the identified data belongs to different host processes;
Identifying the host process corresponding to the flow data according to the obtained probabilities.
9. A deep-learning-based application identification model establishing device, applied to an environment in which a host and a network-side node transmit data, the host being provided with at least one host process capable of processing data, the device comprising:
A first acquisition module, adapted to obtain multiple host data records transmitted by the host, wherein each host data record carries the name of the host process that processes that host data on the host;
A second acquisition module, adapted to obtain multiple flow data records received by the network-side node, wherein each flow data record carries the packet payload present when the network-side node received that flow data;
A comparison module, adapted to compare each host data record with each flow data record to find at least one associated pair of host data and flow data;
A third acquisition module, adapted to process the parameters of each associated pair of host data and flow data to obtain, for each associated pair, the correspondence between the host process name and the packet payload;
An establishing module, adapted to establish the application identification model using the correspondence between each pair of host process name and packet payload.
10. A flow data identification device, comprising:
A receiving module, adapted to receive flow data, wherein the flow data carries the packet payload present when the network-side node received the flow data;
A conversion module, adapted to convert the flow data into data recognizable by an application identification model;
An input module, adapted to input the recognizable data into the application identification model to obtain the probabilities that the identified data belongs to different host processes;
An identification module, adapted to identify the host process corresponding to the flow data according to the probabilities obtained by the input module.
CN201610018242.2A 2016-01-12 2016-01-12 Application identification model establishing method, and flow data identification method and device Active CN105516027B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610018242.2A CN105516027B (en) 2016-01-12 2016-01-12 Application identification model establishing method, and flow data identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610018242.2A CN105516027B (en) 2016-01-12 2016-01-12 Application identification model establishing method, and flow data identification method and device

Publications (2)

Publication Number Publication Date
CN105516027A (en) 2016-04-20
CN105516027B (en) 2019-03-12

Family

ID=55723677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610018242.2A Active CN105516027B (en) 2016-01-12 2016-01-12 Application identification model establishing method, and flow data identification method and device

Country Status (1)

Country Link
CN (1) CN105516027B (en)

Also Published As

Publication number Publication date
CN105516027B (en) 2019-03-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Co-patentee after: QAX Technology Group Inc.

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Co-patentee before: BEIJING QIANXIN TECHNOLOGY Co.,Ltd.

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.
