WO2021164340A1 - Data processing method and device therefor - Google Patents

Data processing method and device therefor

Info

Publication number
WO2021164340A1
Authority
WO
WIPO (PCT)
Prior art keywords
characteristic
data
application
network device
application category
Prior art date
Application number
PCT/CN2020/129007
Other languages
French (fr)
Chinese (zh)
Inventor
武维
郭建伟
李璠
李建平
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2021164340A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2475 Traffic characterised by specific attributes, e.g. priority or QoS for supporting traffic characterised by the type of applications

Definitions

  • the embodiments of the present application relate to the field of network communication technology, and specifically relate to a data processing method and equipment.
  • DPI: deep packet inspection.
  • the DPI-based technology performs in-depth data analysis on the data stream, adds application layer data analysis, and finds the domain name information of the server in the parsed application layer data to identify the application category corresponding to the traffic in the network.
  • DPI-based technology uses plaintext parsing to parse the data packets in the pipeline, and plaintext parsing will affect the security of user data.
  • the embodiments of the present application provide a data processing method in which, during application identification in the network, the first network device identifies the application category corresponding to a message in the pipeline code stream according to the obtained application correlation information, without performing plaintext parsing on the message. This improves the security of user data.
  • the first aspect of this application provides a data processing method.
  • when it is necessary to identify the application category corresponding to a data stream in the network pipeline, the first network device obtains the data to be detected from the pipeline data; that is, the data to be detected includes byte data in the pipeline data.
  • after the first network device obtains the to-be-detected data, it processes the to-be-detected data to obtain one or more first characteristic regions, where each first characteristic region includes at least one byte of the to-be-detected data.
  • the first network device obtains application relevance information stored in the system, where the application relevance information indicates the relevance between the one or more first characteristic regions and the application categories in the application relevance information.
  • after the first network device obtains the application relevance information, it determines the application category corresponding to the one or more first characteristic regions according to those regions and the application relevance information, and then determines the application category corresponding to the data to be detected.
  • when the first network device performs application identification, it processes the to-be-detected data obtained from the pipeline code stream to obtain the first characteristic regions, and determines the application category of the data to be detected according to the obtained application correlation information and the first characteristic regions. The application category can thus be determined without plaintext parsing, which improves the security of user data.
  • the first network device determines, according to the application correlation information and the w first characteristic regions, the application category corresponding to each first characteristic region and the regional correlation between that region and its corresponding application category. The first network device then sums, for each application category, the regional correlations of the first characteristic regions corresponding to that category.
  • the first network device determines that the to-be-detected data corresponds to the first application category when the sum of the regional correlations of the first characteristic regions corresponding to the first application category is the maximum among all application categories.
  • the first network device determines the area relevance corresponding to the first characteristic area according to the application relevance information, and determines the application category corresponding to the data to be detected according to the area relevance, which improves the feasibility of the solution.
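  • the application correlation information further includes the correlation information of p third characteristic regions, where the correlation information of a third characteristic region includes the third characteristic region, the application category corresponding to the third characteristic region, and the regional correlation between the third characteristic region and the corresponding application category; the p third characteristic regions include at least one characteristic region among the w first characteristic regions.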
  • the feasibility of the solution is improved.
  • when the sum of the regional correlations of the first characteristic regions corresponding to the first application category is the maximum value, and that maximum value is greater than a preset threshold, the first network device determines that the data to be detected corresponds to the first application category.
  • the first network device requires the sum of the regional correlations of the first characteristic regions to exceed the preset threshold before it determines that the first application category is the application category corresponding to the data to be detected. If the sum is below the preset threshold, the data to be detected contains no information that is strongly related to any application category in the application correlation information, so its application category may not be present in the application correlation information at all. Requiring the sum to exceed the preset threshold therefore improves the accuracy with which the solution determines the application category of the data to be detected.
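The scoring rule described above (sum the regional correlations per application category, take the maximum, and accept it only above a preset threshold) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `classify`, the dictionary layout of the relevance table, and the sample byte values and scores are all assumptions made for the example.

```python
from collections import defaultdict

def classify(regions, relevance, threshold=0.0):
    """Pick the category whose summed regional correlation is largest.

    regions   : list of byte strings (the first characteristic regions)
    relevance : hypothetical table mapping region -> (category, correlation)
    threshold : preset threshold the winning sum must exceed
    Returns the category name, or None if nothing exceeds the threshold.
    """
    totals = defaultdict(float)
    for region in regions:
        entry = relevance.get(region)
        if entry is None:
            continue  # region is not present in the relevance table
        category, score = entry
        totals[category] += score
    if not totals:
        return None
    best = max(totals, key=totals.get)
    return best if totals[best] > threshold else None

# Made-up relevance table and regions, for illustration only.
table = {
    b"\x16\x03": ("video", 0.6),
    b"\x03\x01": ("video", 0.3),
    b"\xaa\xbb": ("chat", 0.5),
}
observed = [b"\x16\x03", b"\x03\x01", b"\xaa\xbb", b"\x00\x00"]
result = classify(observed, table, threshold=0.8)
```

With a higher threshold the same input is left unclassified, which corresponds to the case where the category may not be present in the relevance information.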
  • the acquired data to be detected includes the first K bytes of at least one message.
  • the first network device performs sliding window processing on the first K bytes of the at least one message to obtain w first feature regions.
  • the first network device processes the to-be-detected data by means of a sliding window to obtain the first characteristic region, which improves the feasibility of the solution.
  • the first characteristic area includes continuous s bytes of data, and the s is a positive integer greater than 1.
  • the feasibility of the solution is improved by limiting the specific data format of the first characteristic area.
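The sliding-window step can be illustrated with a short sketch, assuming a window of s consecutive bytes moved over the first K bytes of a message. The function name `sliding_regions` and the `stride` parameter are hypothetical; the text does not state a step size, so a stride of one byte is assumed here.

```python
from typing import List

def sliding_regions(message: bytes, k: int = 784, s: int = 4, stride: int = 1) -> List[bytes]:
    """Cut the first k bytes of a message into windows of s consecutive bytes.

    k      : number of leading bytes kept (the text mentions 784 or 1024)
    s      : window length in bytes (an integer greater than 1)
    stride : step between windows; 1 is an assumption, not from the source
    """
    head = message[:k]
    # range() is empty when the truncated message is shorter than one window
    return [head[i:i + s] for i in range(0, len(head) - s + 1, stride)]

windows = sliding_regions(b"abcdef", k=5, s=3)
```

Here the message is first truncated to `b"abcde"`, so three overlapping windows are produced.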
  • before the first network device obtains the to-be-detected data from the pipeline data, the first network device generates the application correlation information.
  • when the first network device prepares to generate the application relevance information, it obtains byte data corresponding to the first application category; this byte data is the first data.
  • the first network device inputs the first data into the trained first model.
  • the first model will output the predicted application category.
  • the first model may be trained by the first network device, or trained by another device and then sent to the first network device.
  • the predicted application category information is the first application category.
  • after the first network device obtains the first application category, it obtains n second characteristic regions based on the first application category and the first model, where each second characteristic region includes q adjacent bytes in the first data, and n and q are positive integers.
  • after the first network device obtains the n second feature regions, it determines the regional relevance between each second feature region and the first application category, and generates the application relevance information, where the application relevance information includes the second feature region and the regional correlation between the second feature region and the first application category.
  • the first network device obtains the byte data of the first application category, inputs the data into the trained first model to obtain the predicted application category information, and generates the application relevance information according to the predicted application category information, which improves the feasibility of the solution.
  • the application correlation information further includes second characteristic region correlation information, which includes a second characteristic region, the first application category corresponding to the second characteristic region, and the regional correlation between the second characteristic region and the first application category.
  • the n second feature regions contain at least one first feature region of the w first feature regions; that is, the application category corresponding to the data to be detected is the first application category.
  • the first network device obtains h first characteristic values based on the first application category and the first model. For example, the first network device may process the first application category according to the first model to obtain the h first feature values. The h first feature values indicate the correlation between the first application category and the first feature points in the first data, where a first feature point includes at least one byte of data in the first data, and h is a positive integer.
  • after the first network device obtains the h first characteristic values, it obtains n second characteristic regions according to the h first characteristic values.
  • the first network device obtains h first feature values by processing the first application category, and obtains n second feature regions according to the h first feature values, which improves the feasibility of the solution.
  • the first network device obtains z target feature points in the first data according to the h first feature values, where the feature value corresponding to a target feature point is one of the first z feature values when the h first feature values are sorted from largest to smallest, and z is a positive integer less than or equal to h.
  • after the first network device obtains the z target feature points, it obtains n second feature regions according to the z target feature points; that is, each second feature region includes at least one target feature point.
  • the first network device obtains z target feature points in the first data according to the h first feature values, and obtains n second feature regions according to the z target feature points. The feature value of each target feature point is among the largest of the h first feature values, and a feature value indicates the degree of association between a feature point and the application category: the higher the feature value, the higher the degree of association. The n second feature regions obtained from the target feature points are therefore more strongly associated with the application category, and the application correlation information generated from them yields higher accuracy when it is subsequently used to determine the application category.
  • the midpoint of the second feature region is the target feature point.
  • the feasibility of the solution is improved by explaining the composition method of the second characteristic region.
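The selection of target feature points and the construction of regions around them can be sketched as below. The h feature values are taken as a plain list of per-byte scores; how the first model produces them is not specified here, so they are treated as given input. The helper names `top_z_points` and `centered_regions`, and the choice to skip regions that would run past either end of the data, are assumptions for illustration.

```python
def top_z_points(values, z):
    """Return the indices of the z largest feature values, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:z]

def centered_regions(data: bytes, points, q):
    """Build q-byte regions whose midpoint is each target feature point."""
    half = q // 2
    regions = []
    for p in points:
        start = p - half
        end = start + q
        if start < 0 or end > len(data):
            continue  # assumption: drop regions that would overrun the data
        regions.append(data[start:end])
    return regions

first_data = b"abcdefgh"
feature_values = [0.1, 0.9, 0.2, 0.8, 0.0, 0.3, 0.7, 0.05]  # made-up scores
targets = top_z_points(feature_values, z=3)
second_regions = centered_regions(first_data, targets, q=3)
```

With an odd q the target feature point sits exactly in the middle of its region; for an even q this sketch places it just right of centre.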
  • the first network device obtains the regional correlation of each of the n second characteristic regions based on the number of times that region appears in its corresponding application category and on the number of feature regions, among the n second characteristic regions, that correspond to that application category. The regional correlation of each region represents the degree of correlation between that region and the first application category; that is, the higher the regional correlation, the higher the correlation with the first application category.
  • the first network device generates application relevance information according to the area relevance of each of the n second characteristic areas.
  • the first network device generates application relevance information according to the area relevance corresponding to each characteristic area in the n second characteristic areas, which improves the feasibility of the solution.
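One possible reading of the regional correlation computation is a simple frequency ratio: the number of times a region appears for a category divided by the total number of regions collected for that category. The text names the two quantities involved but does not give the exact formula, so the ratio below is an assumption, as are the function name `region_relevance` and the table layout.

```python
from collections import Counter

def region_relevance(regions_by_category):
    """Build a relevance table keyed by (region, category).

    regions_by_category : dict mapping category -> list of second feature
                          regions extracted from that category's samples
    The score is count / total, which is an assumed formula for illustration.
    """
    table = {}
    for category, regions in regions_by_category.items():
        counts = Counter(regions)
        total = len(regions)
        for region, count in counts.items():
            table[(region, category)] = count / total
    return table

relevance_table = region_relevance({"video": [b"ab", b"ab", b"cd", b"ab"]})
```

A region that recurs across many samples of one category receives a higher score under this reading, which matches the statement that a higher regional correlation means a stronger link to the category.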
  • the n second feature regions are obtained by taking, for each of the z target feature points, q consecutive feature points with that target feature point as the midpoint, where m is a positive integer less than n.
  • the second network device obtains the second feature area by taking the target feature point as the midpoint, which improves the feasibility of the solution.
  • when two feature regions among the n second feature regions are highly similar, the first network device deletes, of these two feature regions, the one that appears less frequently.
  • when the similarity of two different feature regions in the n second feature regions is high, the first network device deletes the feature region that appears less frequently in the n second feature regions. This avoids repeatedly calculating the regional correlation for highly similar feature regions, which improves the accuracy of the calculated regional correlation.
  • this characteristic area is the fifth characteristic area.
  • the first network device deletes the characteristic regions in the n second characteristic regions that correspond to at least two application categories. A characteristic region that corresponds to at least two application categories represents different categories and therefore cannot serve as a strongly related feature of one specific application category. Deleting such regions improves the accuracy with which the solution determines the application category of the data to be detected.
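The pruning of regions that map to more than one application category can be sketched as a filter over a relevance table. The table layout (keys of the form `(region, category)`) and the function name `drop_ambiguous` are hypothetical and chosen only for this example.

```python
from collections import defaultdict

def drop_ambiguous(table):
    """Remove every region that is associated with two or more categories.

    table : dict mapping (region, category) -> regional correlation
    """
    categories = defaultdict(set)
    for region, category in table:
        categories[region].add(category)
    return {
        key: score
        for key, score in table.items()
        if len(categories[key[0]]) == 1  # keep regions tied to one category
    }

raw_table = {
    (b"ab", "video"): 0.75,
    (b"ab", "chat"): 0.40,   # b"ab" is shared by two categories
    (b"cd", "video"): 0.25,
}
pruned = drop_ambiguous(raw_table)
```

Only `b"cd"` survives in this made-up table, since `b"ab"` cannot distinguish between the two categories.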
  • the application correlation information may be displayed in the form of a heat map, and the larger the feature value corresponding to the feature point in the heat map, the more vivid the color of the feature point.
  • the application relevance information is displayed in the form of a heat map, so that the result of the application relevance information can be seen more intuitively.
  • the first K bytes of information of the first data may be intercepted, and the value of K includes 784 or 1024.
  • the first K bytes of information of the data to be detected may be intercepted, and the value of K includes 784 or 1024.
  • the second aspect of the application provides a data processing method.
  • the second network device obtains the byte data corresponding to the first application category, that is, the byte data corresponding to the first application category is the first data.
  • the second network device inputs the first data into the trained first model.
  • the first model will output the predicted application category.
  • the first model may be trained by the second network device, or trained by another device and then sent to the second network device.
  • the predicted application category information is the first application category.
  • after the second network device obtains the first application category, it obtains n second feature regions based on the first application category and the first model, where each of the n second feature regions includes q adjacent bytes in the first data, and n and q are positive integers.
  • after the second network device obtains the n second feature regions, it determines the regional relevance between each second feature region and the first application category, and generates the application relevance information, where the application relevance information includes the second feature region and the regional correlation between the second feature region and the first application category.
  • the second network device obtains the byte data of the first application category, inputs the data into the trained first model to obtain the predicted application category information, and generates the application relevance information according to the predicted application category information, which improves the feasibility of the solution.
  • the application correlation information further includes second characteristic region correlation information, which includes a second characteristic region, the first application category corresponding to the second characteristic region, and the regional correlation between the second characteristic region and the first application category.
  • the n second feature regions contain at least one first feature region of the w first feature regions; that is, the application category corresponding to the data to be detected is the first application category.
  • the second network device obtains h first characteristic values based on the first application category and the first model. For example, the second network device may process the first application category according to the first model to obtain the h first feature values. The h first feature values indicate the correlation between the first application category and the first feature points in the first data, where a first feature point includes at least one byte of data in the first data, and h is a positive integer.
  • after the second network device obtains the h first characteristic values, it obtains n second characteristic regions according to the h first characteristic values.
  • the second network device obtains h first feature values by processing the first application category, and obtains n second feature regions according to the h first feature values, which improves the feasibility of the solution.
  • the second network device obtains z target feature points in the first data according to the h first feature values, where the feature value corresponding to a target feature point is one of the first z feature values when the h first feature values are sorted from largest to smallest, and z is a positive integer less than or equal to h.
  • after the second network device obtains the z target feature points, it obtains n second feature regions according to the z target feature points; that is, each second feature region includes at least one target feature point.
  • the second network device obtains z target feature points in the first data according to the h first feature values, and obtains n second feature regions according to the z target feature points. The feature value of each target feature point is among the largest of the h first feature values, and a feature value indicates the degree of association between a feature point and the application category: the higher the feature value, the higher the degree of association. The n second feature regions obtained from the target feature points are therefore more strongly associated with the application category, and the application correlation information generated from them yields higher accuracy when it is subsequently used to determine the application category.
  • the midpoint of the second feature region is the target feature point.
  • the feasibility of the solution is improved by limiting the composition of the second characteristic area.
  • the second network device obtains the regional correlation of each of the n second characteristic regions based on the number of times that region appears in its corresponding application category and on the number of feature regions, among the n second characteristic regions, that correspond to that application category. The regional correlation of each region represents the degree of correlation between that region and the first application category; that is, the higher the regional correlation, the higher the correlation with the first application category.
  • the second network device generates application relevance information according to the area relevance of each of the n second characteristic areas.
  • the second network device generates application relevance information according to the area relevance corresponding to each characteristic area in the n second characteristic areas, which improves the feasibility of the solution.
  • the n second feature regions are obtained by taking, for each of the z target feature points, q consecutive feature points with that target feature point as the midpoint, where m is a positive integer less than n.
  • the second network device obtains the second feature area by taking the target feature point as the midpoint, which improves the feasibility of the solution.
  • when two feature regions among the n second feature regions are highly similar, the one of these two feature regions that appears less frequently is deleted.
  • when the similarity of two different feature regions in the n second feature regions is very high, the second network device deletes the feature region that appears less frequently in the n second feature regions. This avoids repeatedly calculating the regional correlation for highly similar feature regions, which improves the accuracy of the calculated regional correlation.
  • this characteristic area is the fifth characteristic area.
  • the second network device deletes the characteristic regions in the n second characteristic regions that correspond to at least two application categories. A characteristic region that corresponds to at least two application categories represents different categories and therefore cannot serve as a strongly related feature of one specific application category. Deleting such regions improves the accuracy with which the solution determines the application category of the data to be detected.
  • the application correlation information may be displayed in the form of a heat map, and the larger the feature value corresponding to the feature point in the heat map, the brighter the color of the feature point.
  • the application relevance information is displayed in the form of a heat map, so that the result of the application relevance information can be seen more intuitively.
  • the first K bytes of information of the first data may be intercepted, and the value of K includes 784 or 1024.
  • after the second network device obtains the application relevance information, it sends the application relevance information to the first network device.
  • the application relevance information is sent to the first network device, which improves the feasibility of the solution.
  • the third aspect of this application provides a network device.
  • the obtaining unit is used to obtain the data to be detected;
  • a processing unit configured to obtain w first characteristic regions according to the data to be detected, the first characteristic regions include at least one byte of data in the data to be detected, and w is a positive integer;
  • the determining unit is configured to determine the application category corresponding to the data to be detected according to the w first feature regions and the application correlation information, and the application correlation information indicates the correlation between the first feature region and the application category.
  • the determining unit is specifically configured to determine the application category corresponding to the first feature area and the regional correlation between the first feature area and the corresponding application category according to the w first feature areas and application correlation information;
  • a statistical unit configured to count the sum of the regional correlations of the first feature region corresponding to each application category based on the application category;
  • the determining unit is further configured to determine that the to-be-detected data corresponds to the first application category based on that the sum of the regional correlations of the first feature area corresponding to the first application category is the maximum value.
  • the application correlation information includes the correlation information of p third characteristic regions, where the correlation information of a third characteristic region includes the third characteristic region, the application category corresponding to the third characteristic region, and the regional correlation between the third characteristic region and the corresponding application category; the p third characteristic regions include at least one characteristic region among the w first characteristic regions.
  • the data to be detected includes the first K bytes of at least one message;
  • the processing unit is specifically configured to perform sliding window processing on the first K bytes of at least one message to obtain w first characteristic regions.
  • the first characteristic area includes s consecutive bytes, and s is an integer greater than 1.
  • the acquiring unit is further configured to acquire first data, where the first data includes byte data corresponding to the first application category;
  • the network device further includes:
  • the input unit is used to input the first data into the first model, where the output of the first model is the first application category;
  • the processing unit is further configured to obtain n second characteristic regions based on the first application category and the first model, where the second characteristic regions include q adjacent bytes in the first data, where n is a positive integer, and q is a positive integer;
  • the determining unit is further configured to determine the regional correlation between the second characteristic area and the first application category;
  • the network device further includes:
  • the generating unit is configured to generate application relevance information based on the regional correlation between the second characteristic region and the first application category.
  • the application relevance information includes second feature region relevance information, which includes a second feature region, the first application category corresponding to the second feature region, and the regional relevance between the second feature region and the first application category;
  • the n second characteristic regions include at least one first characteristic region among the w first characteristic regions, and the application category corresponding to the data to be detected is the first application category.
  • the fourth aspect of the present application provides a network device.
  • the acquiring unit is configured to acquire first data, where the first data includes byte data corresponding to the first application category;
  • the input unit is used to input the first data into the first model, where the output of the first model is the first application category;
  • the processing unit is configured to obtain n second characteristic regions based on the first application category and the first model, where the second characteristic regions include q adjacent bytes in the first data, where n is a positive integer, and q is a positive integer;
  • the determining unit is used to determine the regional correlation between the second characteristic region and the first application category;
  • the generating unit is configured to generate application relevance information based on the regional correlation between the second characteristic region and the first application category.
  • the application relevance information includes second feature region relevance information, which includes a second feature region, the first application category corresponding to the second feature region, and the regional relevance between the second feature region and the first application category.
  • the processing unit is specifically configured to obtain h first feature values based on the first application category and the first model, where the first feature value indicates the correlation between the first application category and the first feature point in the first data, and the first The characteristic point includes one byte of data in the first data, and h is a positive integer;
  • the processing unit is specifically configured to obtain n second characteristic regions according to the h first characteristic values.
  • the obtaining unit is further configured to obtain z target feature points according to the h first feature values, where the feature value of a target feature point is one of the first z feature values when the h first feature values are arranged in descending order, z is a positive integer, and z is less than or equal to h;
  • the processing unit is specifically configured to obtain n second feature regions according to z target feature points, and each second feature region includes at least one target feature point.
  • the midpoint of the second feature region is the target feature point.
  • The n second feature regions include a sixth feature region and a fourth feature region. If the proportion of feature points repeated between the sixth feature region and the fourth feature region is greater than the first preset threshold, and the sixth feature region appears more often than the fourth feature region among the feature regions corresponding to the first application category, then the network device further includes:
  • the processing unit is used to delete the information of the fourth characteristic area.
  • the n second characteristic regions include a fifth characteristic region, and if the fifth characteristic region corresponds to at least two application categories in the application relevance information, the processing unit is further configured to delete the information of the fifth characteristic region.
  • the fifth aspect of the present application provides a network device.
  • The network device includes at least one processor and a memory. The memory stores program code, and the processor calls the program code to execute the method described in the implementation manner of the first aspect of the present application.
  • the sixth aspect of the present application provides a network device.
  • The network device includes at least one processor and a memory. The memory stores program code, and the processor calls the program code to execute the method described in the implementation manner of the second aspect of the present application.
  • the seventh aspect of the present application provides an application identification system, including a first network device and a second network device.
  • the first network device is used to execute the method described in the implementation manner of the first aspect of the present application.
  • the second network device is used to execute the method described in the implementation manner of the second aspect of the present application.
  • the second network device is used to send application relevance information to the first network device.
  • the eighth aspect of the present application provides a computer storage medium.
  • the computer storage medium stores instructions.
  • When the instructions are executed on a computer, the computer executes the method described in the implementation manner of the first aspect and/or the second aspect of the present application.
  • the ninth aspect of the present application provides a computer program product.
  • When the computer program product is executed on a computer, the computer executes the method described in the implementation manner of the first aspect and/or the second aspect of the present application.
  • The first network device obtains the data to be detected, processes it to obtain the first characteristic area, and determines the application category of the data to be detected according to the obtained application correlation information and the first characteristic area, without the need to parse the data in plaintext, which improves the security of user data.
  • Figure 1 is a schematic diagram of a network architecture in an embodiment of the application
  • Figure 2 is a schematic flowchart of a data processing method in an embodiment of the application
  • FIG. 3 is a schematic flowchart of another data processing method in an embodiment of the application.
  • FIG. 4 is a schematic diagram of the structure of a network device in an embodiment of the application.
  • FIG. 5 is a schematic structural diagram of another network device in an embodiment of this application.
  • FIG. 6 is a schematic structural diagram of another network device in an embodiment of this application.
  • FIG. 7 is a schematic structural diagram of another network device in an embodiment of this application.
  • Fig. 8 is a schematic structural diagram of another network device in an embodiment of this application.
  • The embodiments of the application provide a data processing method and device for the application identification of pipeline data. The data to be detected is obtained from the pipeline code stream and processed to obtain the first characteristic area, and the application category of the data to be detected is determined according to the application relevance information and the first characteristic area, without the need to parse the data in plain text, which improves the security of user data.
  • Figure 1 is a schematic diagram of the network architecture provided for this application.
  • the embodiment of the present application provides an exemplary network architecture.
  • the network architecture includes at least the first network device 101.
  • the first network device 101 can be connected to a network pipe, which is used to transmit data.
  • the network pipe can be a network pipe in a local area network, a network pipe in a wide area network, or a network pipe in other scenarios. There is no limitation here.
  • The first network device 101 can be installed between the router and the core network, connected by wire or wirelessly; it can also be installed between the core network and the firewall, or in the local area network. It is sufficient that the first network device 101 is connected to the network pipe, for example at a convergence node of the network traffic or at a node through which the network traffic flows.
  • the specifics are not limited here.
  • the first network device 101 is configured to generate application relevance information, and then obtain the data to be detected online in real time, and determine the application category of the data to be detected through the application relevance information.
  • the first network device 101 is configured to identify the application category corresponding to the message in the data transmission pipeline according to the application correlation information, and distinguish the data packet traffic belonging to different application types for data analysis.
  • The first network device 101 can be a server with a separate function, such as a dedicated application identification server, or can be integrated into an existing server, such as a network management server, a network monitoring server, or a traffic management server. The specific server form is not limited here.
  • The base station receives the data sent by the terminal and transmits the data to the router through the network channel. After the router forwards the data, the data is transmitted to the core network, and the core network then transmits the data toward its destination, passing through the firewall, until it finally arrives at the data receiver.
  • the first network device 101 is connected to the network pipeline, such as the convergence node of the network traffic, the node that the network traffic flows through, etc.
  • The first network device 101 mirrors part of the data at a point where data flows in the communication network, and uses it for application identification analysis.
  • One first network device 101 may exist alone, or multiple first network devices 101 may exist at the same time; the details are not limited here.
  • the network architecture may further include a terminal device 103.
  • the first network device 101 may send the data of the application category to the terminal device 103, so that the terminal device 103 may receive the data of the application category, and then process the data of the application category, for example, display the data of the application category.
  • the specific processing method is not limited here.
  • the terminal device 103 may be a computer device, or other devices, such as a network management device, which is not specifically limited here.
  • the network architecture may further include a second network device 102, and the second network device 102 may work offline and independently, or may be connected to the first network device.
  • The second network device 102 can be used on the offline side, that is, to obtain the first data used to train the model, train the first model with the first data, and obtain application relevance information according to the trained first model and the first data. The second network device 102 is also used to send the application relevance information to the first network device, which is not specifically limited here.
  • the first network device 101 may also be used on the offline side to obtain application correlation information, then the network architecture does not include the second network device 102.
  • Network pipe A collective name for equipment used to carry network data packets.
  • Application identification Identify which application category the traffic in the pipeline belongs to, for example, the pipeline traffic belongs to APP1, APP2, etc.
  • Code stream the data packet stream in the network.
  • Heat map A visual way to express the importance of data through color changes. For example, in the heat map, the brighter the data, the greater the impact on the result of application recognition.
  • Active area the location area where the data in the heat map has a greater influence, indicating the brighter location area in the heat map.
  • Pull test similar to a network data crawler, intercepting data packet information from the network.
  • In the embodiments of the present application, the first network device and the second network device are used as examples of the network device for description.
  • The first network device can obtain application relevance information by training the first model and then determine, through that application relevance information, the application category corresponding to a message in the pipeline data. It can also receive application relevance information sent by other network devices and use it to determine the application category corresponding to a message in the pipeline data. Therefore, there are several specific implementation manners of the embodiments of the present application, which are described below.
  • the first network device generates application correlation information.
  • FIG. 2 is a schematic flowchart of an embodiment of the data processing method provided by this application.
  • this embodiment can be divided into an online side and an offline side.
  • the online side is the online real-time identification of the application category corresponding to the online data stream
  • the offline side is the training model to obtain application relevance information.
  • the application correlation information can be used on the online side to identify the application category corresponding to the online data stream.
  • the offline side will be described.
  • step 201 the second network device obtains the first data.
  • the second network device obtains the data stream corresponding to the first application category, and the data stream includes byte data corresponding to the first application category.
  • The second network device may obtain the data stream in the pipeline data by means of plug-in testing. Multiple data streams may also be collected by other devices and then sent together to the second network device, which is not specifically limited here.
  • the data stream obtained by the second network device may be as follows:
  • the second network device may also obtain data streams corresponding to multiple application categories, as shown below:
  • the second network device acquires and processes the data of the first application category as an example for description. It should be understood that the acquisition and processing of data of multiple application categories is similar, and this application does not constitute a limitation.
  • The data stream can be displayed in binary or converted to hexadecimal, which is not specifically limited here.
  • this application uses hexadecimal as an example for description.
  • this application is described in units of bytes, and this application may also be described in units of bits, etc., which is not specifically limited here.
  • the second network device may not intercept the first K bytes of data of the data stream, but use the data stream for subsequent processing, which is not specifically limited here.
  • the first data includes the first K bytes of data of the data stream.
  • When the second network device does not intercept the data stream, the first data is the data stream.
  • the first data includes byte data of the multiple data streams.
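  • As a rough illustration of how the first data might be taken from a data stream, the sketch below keeps the first K bytes. The function name `first_k_bytes`, the default K of 1024, and the zero padding of short streams are assumptions made for this example; the text only states that the first K bytes are intercepted.

```python
def first_k_bytes(stream: bytes, k: int = 1024, pad: int = 0) -> bytes:
    """Return exactly k bytes: truncate long streams, right-pad short ones.

    Padding short streams with `pad` is an assumption of this sketch.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    head = stream[:k]
    return head + bytes([pad]) * (k - len(head))


# Hexadecimal display, matching the hexadecimal convention used in the text.
sample = bytes.fromhex("820a2a2e67767432")
first_data = first_k_bytes(sample, k=4)
print(first_data.hex(" "))  # 82 0a 2a 2e
```

  A fixed length keeps the input compatible with a model whose input layer has K nodes.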
  • step 202 the second network device trains the first model.
  • the second network device builds a multi-layer convolutional neural network.
  • The multi-layer convolutional neural network can have three or five layers, and it can be a VGG-type or a ResNet-type neural network; the type of neural network is not limited here.
  • the structure of the five-layer convolutional neural network is an input layer, a first hidden layer, a second hidden layer, a third hidden layer, and an output layer in order.
  • The number of nodes in the input layer is equal to K, the same K as in the first K bytes of information that the second network device intercepts from the message.
  • the number of nodes in the output layer of the neural network is the number of application categories.
  • When there is one application category, the output layer has one node.
  • When there are multiple application categories, the output layer has the corresponding number of nodes.
  • A convolution operation is applied to the input layer data, followed by the rectified linear unit (ReLU) activation function, to generate the first hidden layer. A convolution operation followed by the ReLU activation function is applied to the first hidden layer to generate the second hidden layer. A global average pooling (GAP) operation is applied to the second hidden layer to generate the third hidden layer. A fully connected operation is applied to the third hidden layer, with the normalized exponential function (softmax) as the activation function, to generate the output layer data.
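  • The five-layer structure described above can be sketched as a plain-Python forward pass. This is only an illustration under stated assumptions: the kernel width, the channel count, the random parameters, the zero padding, and the scaling of each byte by 255 are all choices made for this example and do not come from the text.

```python
import math
import random


def conv1d_relu(x, kernels):
    """x: [channels][length]; kernels: [out][in][width], odd width.
    Stride 1 with zero padding keeps the length, then ReLU is applied."""
    length = len(x[0])
    out = []
    for kernel in kernels:
        half = len(kernel[0]) // 2
        channel = []
        for i in range(length):
            s = 0.0
            for c, taps in enumerate(kernel):
                for t, w in enumerate(taps):
                    j = i + t - half
                    if 0 <= j < length:
                        s += w * x[c][j]
            channel.append(max(0.0, s))
        out.append(channel)
    return out


def global_average_pool(x):
    return [sum(channel) / len(channel) for channel in x]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def forward(byte_values, params):
    x = [[b / 255.0 for b in byte_values]]   # input layer, K nodes
    h1 = conv1d_relu(x, params["conv1"])     # first hidden layer
    h2 = conv1d_relu(h1, params["conv2"])    # second hidden layer
    h3 = global_average_pool(h2)             # third hidden layer (GAP)
    logits = [sum(w * v for w, v in zip(row, h3)) for row in params["fc"]]
    return softmax(logits), h2               # output layer, plus pre-GAP maps


def random_params(rng, channels=4, width=3, classes=2):
    """Hypothetical untrained parameters, for shape checking only."""
    def vec(n):
        return [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return {
        "conv1": [[vec(width)] for _ in range(channels)],
        "conv2": [[vec(width) for _ in range(channels)] for _ in range(channels)],
        "fc": [vec(channels) for _ in range(classes)],
    }


probs, maps = forward(list(range(16)), random_params(random.Random(0)))
print(len(probs), len(maps), len(maps[0]))  # 2 4 16
```

  The pre-GAP feature maps are returned alongside the class probabilities because later steps of the method reuse them.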
  • the first data may be normalized to obtain normalized data for training of the first model.
  • the normalization can be achieved by the following method: One:
  • the corresponding prediction category is obtained through the forward operation of the first model, the cross entropy loss value of the prediction category and the first application category is calculated, the gradient descent method is executed, and the model parameters are updated.
  • the training of the first model is completed. It should be understood that when the training of the first model is completed, the data corresponding to the first application category is input to the first model, and the output of the first model is the first application category.
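  • For reference, the cross-entropy loss and the gradient descent update mentioned above are commonly written as follows. Here y is the one-hot label of the first application category, C is the number of application categories, z is the output layer input, and the learning rate η and the parameter symbol θ are notation assumed for this sketch rather than taken from the text.

```latex
\hat{y} = \operatorname{softmax}(z), \qquad
L = -\sum_{c=1}^{C} y_c \log \hat{y}_c, \qquad
\theta \leftarrow \theta - \eta \, \nabla_{\theta} L
```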
  • step 203 the second network device obtains h first feature values based on the first application category and the first model.
  • The first feature value represents the correlation between the first feature point in the first data and the first application category. The first feature point refers to one byte of data, and the larger the feature value corresponding to the first feature point, the higher its correlation with the first application category.
  • the first feature point may also refer to multiple bytes of data, such as 2 bytes, etc., and this application does not constitute a limitation.
  • The second network device can obtain the h first feature values in a variety of ways. For example, the second network device obtains the connection weight values based on the first application category and the data of the last hidden layer in the architecture of the first model. Each connection weight value is multiplied by the corresponding value of the penultimate hidden layer to obtain weighted feature information. The weighted feature information is summed and up-sampled with respect to the first data to obtain the h first feature values.
  • The h first feature values can also be obtained in other ways. This embodiment is a schematic example, and the method for obtaining the h first feature values is not specifically limited.
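  • The weighted-sum step described above resembles a class activation map. A minimal sketch follows, assuming the feature maps come from the layer before global average pooling and that nearest-neighbour up-sampling is acceptable; the function name and the up-sampling choice are assumptions of this example.

```python
def class_activation_map(feature_maps, class_weights, target_len):
    """feature_maps: [channels][length], taken before global average pooling.
    class_weights: one fully connected weight per channel for the target class.
    Returns target_len values, one per feature point of the first data."""
    length = len(feature_maps[0])
    cam = [
        sum(w * fmap[i] for w, fmap in zip(class_weights, feature_maps))
        for i in range(length)
    ]
    # Nearest-neighbour up-sampling to the input length (an assumed choice).
    return [cam[i * length // target_len] for i in range(target_len)]


values = class_activation_map(
    feature_maps=[[1.0, 0.0], [0.0, 2.0]],
    class_weights=[0.5, 1.0],
    target_len=4,
)
print(values)  # [0.5, 0.5, 2.0, 2.0]
```

  When the convolutions preserve the input length, the up-sampling step leaves the map unchanged.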
  • step 204 the second network device obtains z target feature points according to the h first feature values.
  • After obtaining the h first feature values, the second network device obtains the first z feature values among the h first feature values, where z is a positive integer less than or equal to h, thereby obtaining the z feature points corresponding to these z feature values. These z feature points are taken as the z target feature points.
  • The first z feature values are those ranked first when the h first feature values are sorted in descending order of value.
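  • Selecting the z target feature points can be sketched as a top-z selection over byte positions. The tie-breaking rule (lower offset first) is an assumption of this example; the text does not say how equal feature values are ordered.

```python
def top_z_positions(feature_values, z):
    """Return the offsets of the z largest feature values, in offset order.
    Ties are broken in favour of the lower offset (an assumed rule)."""
    if not 1 <= z <= len(feature_values):
        raise ValueError("z must satisfy 1 <= z <= h")
    order = sorted(range(len(feature_values)),
                   key=lambda i: (-feature_values[i], i))
    return sorted(order[:z])


print(top_z_positions([0.1, 0.9, 0.3, 0.9, 0.2], z=3))  # [1, 2, 3]
```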
  • the eigenvalues of different feature points may be the same, for example:
  • step 205 the second network device obtains n second feature regions according to z target feature points.
  • After the second network device obtains the z target feature points, it intercepts, from the first data, one or more consecutive feature points that include at least one target feature point to obtain a second feature region, and so on, yielding n second feature regions, where n is a positive integer greater than or equal to z.
  • the following will take the second feature region including q consecutive feature points as an example for description, and q is an integer greater than or equal to 1. It should be understood that for different second feature regions, the value of q may be different.
  • the center point of the second feature region is the target feature point.
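  • One possible way to cut second feature regions around target feature points is sketched below. It produces one q-byte window per target point; shifting the window inward near the edges of the data, and the default q of 8, are assumptions of this example.

```python
def second_feature_regions(first_data: bytes, targets, q: int = 8):
    """One q-byte region per target offset, centred on the target where
    possible and shifted inward at the edges so it always holds q bytes."""
    if not 1 <= q <= len(first_data):
        raise ValueError("q must satisfy 1 <= q <= len(first_data)")
    regions = []
    for t in targets:
        start = min(max(t - q // 2, 0), len(first_data) - q)
        regions.append(first_data[start:start + q])
    return regions


data = bytes(range(10))
for region in second_feature_regions(data, targets=[0, 5, 9], q=4):
    print(region.hex(" "))
# 00 01 02 03
# 03 04 05 06
# 06 07 08 09
```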
  • step 203 to step 205 are steps performed on one data stream. When there are multiple data streams, step 203 to step 205 are repeated. For data streams of multiple application categories, the processing of step 203 to step 205 is performed on the data streams corresponding to the multiple application categories, respectively.
  • step 206 the second network device determines the regional correlation between the second characteristic area and the first application category.
  • the second network device After processing the multiple data streams, the second network device further determines the regional correlation between the second characteristic area and the application category corresponding to the second characteristic area.
  • The second network device counts the number of occurrences of each identical second characteristic area among the n second characteristic areas, and from this obtains the probability that the second characteristic area appears among the n second characteristic areas corresponding to the first application category, thereby obtaining the regional correlation between the second characteristic area and the first application category, as shown in Table 1a:
  • Take the second feature area "82, 0a, 2a, 2e, 67, 76, 74, 32" in Table 1a as an example. The number of this second feature area (also called the number of occurrences) is 80, and it corresponds to the first application category app1. The total number of second feature areas of the first application category is 100, so the area correlation between this second feature area and its corresponding first application category is 80/100, which is 0.8.
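  • The count-over-total calculation in this example can be sketched as follows. The 20 filler regions are made-up placeholders used only to bring the total to 100; only the 80 occurrences of the quoted region come from the text.

```python
from collections import Counter


def regional_correlation(regions):
    """Map each distinct region to occurrences / total regions of the category."""
    counts = Counter(regions)
    total = sum(counts.values())
    return {region: count / total for region, count in counts.items()}


hot = bytes.fromhex("820a2a2e67767432")
# 80 occurrences of the quoted region plus 20 hypothetical filler regions.
app1_regions = [hot] * 80 + [bytes([i] * 8) for i in range(20)]
print(regional_correlation(app1_regions)[hot])  # 0.8
```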
  • The second network device separately counts, for each of the multiple application categories, the number of occurrences of each identical second feature area among the second feature areas corresponding to that category, and further obtains the probability that the second feature area appears among all second feature areas of the corresponding application category, thereby obtaining the regional correlation between the second feature area and the corresponding application category, as shown in Table 1b.
  • The second network device may perform statistics and calculations on the second characteristic regions of each application category. The area relevance can also be calculated as (the number of times a characteristic area appears among all characteristic areas of its corresponding application category × the category preference weight value) / the number of all characteristic areas in that application category, which is not specifically limited here.
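  • The per-category calculation, including the optional category preference weight, can be sketched as below. The data layout (pairs of category and region) and the default weight of 1.0 are assumptions of this example, and the sample values are invented for illustration.

```python
from collections import Counter, defaultdict


def per_category_correlation(samples, preference=None):
    """samples: iterable of (category, region) pairs.
    preference: optional {category: weight}; missing categories use 1.0.
    Returns {(region, category): count * weight / total_for_category}."""
    preference = preference or {}
    counts = defaultdict(Counter)
    for category, region in samples:
        counts[category][region] += 1
    table = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        weight = preference.get(category, 1.0)
        for region, count in counter.items():
            table[(region, category)] = count * weight / total
    return table


# Hypothetical sample data, for illustration only.
samples = ([("app1", b"\x82\x0a")] * 3 + [("app1", b"\x67\x76")]
           + [("app2", b"\x82\x0a")] * 2)
table = per_category_correlation(samples, preference={"app2": 0.5})
print(table[(b"\x82\x0a", "app1")], table[(b"\x82\x0a", "app2")])  # 0.75 0.5
```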
  • When the second network device counts, for each characteristic area in the n second characteristic areas, its number of occurrences and the application category to which it belongs, the n second characteristic areas may include a sixth characteristic area and a fourth characteristic area. If the proportion of feature points repeated between the sixth characteristic area and the fourth characteristic area is greater than the first preset threshold, and the sixth characteristic area appears more often than the fourth characteristic area among the characteristic areas corresponding to the first application category, the second network device deletes the information of the fourth characteristic area. If the two characteristic areas appear the same number of times, either one of them may be deleted; the details are not limited here.
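  • A possible reading of this pruning rule is sketched below. The text does not define how the repeated proportion is measured, so this example assumes it is the length of the longest common run of bytes divided by the length of the shorter region; the threshold of 0.5 and the function names are also assumptions.

```python
from difflib import SequenceMatcher


def overlap_ratio(a: bytes, b: bytes) -> float:
    """Longest common contiguous run, relative to the shorter region.
    This definition of 'repeated proportion' is an assumption."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    size = matcher.find_longest_match(0, len(a), 0, len(b)).size
    return size / min(len(a), len(b))


def prune_overlapping(counts, threshold=0.5):
    """counts: {region: occurrences} for one application category.
    Regions are visited from most to least frequent; a region is dropped
    when it overlaps an already kept region by more than the threshold."""
    kept = []
    for region in sorted(counts, key=lambda r: (-counts[r], r)):
        if all(overlap_ratio(region, k) <= threshold for k in kept):
            kept.append(region)
    return {region: counts[region] for region in kept}


counts = {
    bytes.fromhex("820a2a2e"): 80,
    bytes.fromhex("0a2a2e67"): 10,   # shares 3 of 4 bytes with the first
    bytes.fromhex("11223344"): 5,
}
print(sorted(r.hex() for r in prune_overlapping(counts)))
# ['11223344', '820a2a2e']
```

  Visiting regions in descending order of occurrences means the less frequent region of an overlapping pair is the one removed, as the text requires.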
  • When the second network device counts, for each characteristic area in the n second characteristic areas, its number of occurrences and the application category to which it belongs, if the n second characteristic areas further include a fifth characteristic area and the fifth characteristic area corresponds to two or more application categories, the fifth characteristic area is deleted.
  • deleting the feature area can improve the efficiency of determining the data stream online.
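  • Removing regions that correspond to two or more application categories can be sketched as a filter over a relevance table. The table layout, keyed by (region, category), is an assumption of this example, and the sample values are invented.

```python
from collections import defaultdict


def drop_ambiguous(table):
    """table: {(region, category): relevance}. Remove every region that is
    listed under two or more application categories."""
    categories = defaultdict(set)
    for region, category in table:
        categories[region].add(category)
    return {key: value for key, value in table.items()
            if len(categories[key[0]]) == 1}


# Hypothetical relevance table, for illustration only.
table = {
    (b"\x82\x0a", "app1"): 0.75,
    (b"\x82\x0a", "app2"): 1.0,
    (b"\x67\x76", "app1"): 0.25,
}
print(drop_ambiguous(table))  # {(b'gv', 'app1'): 0.25}
```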
  • step 207 the second network device generates application relevance information based on the regional relevance between the second characteristic area and the first application category.
  • The application correlation information does not include the information of a deleted characteristic area.
  • the second network device After the second network device obtains the area relevance of the characteristic areas in the n second characteristic areas, the second network device generates application relevance information according to the area relevance of the characteristic areas in the n second characteristic areas.
  • The application relevance information includes the area relevance information of the second feature area, and the area relevance information of the second feature area includes the second feature area, the first application category corresponding to the second feature area, and the regional correlation between the second feature area and the first application category.
  • the application relevance information can be as shown in Table 2a,
  • the second network device respectively generates application relevance information according to the regional relevance of each of the n second feature areas of different application categories.
  • The application relevance information includes the area relevance information of the second feature area, and the area relevance information of the second feature area includes the second feature area, the application category corresponding to the second feature area, and the regional relevance between the second feature area and the corresponding application category.
  • the application relevance information can be as shown in Table 2b,
  • The application relevance information can also exist in other forms, as long as it can indicate the association between the feature area and the application category and the regional relevance of the feature area under the application category. For example, the application relevance information can be expressed in the form of a heat map. It is understandable that the application relevance information can also be expressed in other ways, such as a one-dimensional vector or a table, which is not specifically limited here.
  • Steps 201 to 207 describe the method on the offline side in this embodiment, and the following steps describe the method on the online side in this embodiment.
  • Figure 3 is a schematic diagram of the process on the online side of this application.
  • step 301 the first network device obtains the data to be detected.
  • the first network device receives the application relevance information sent by the second network device.
  • When the first network device needs to identify and classify the data packets in the pipeline data, the first network device acquires the data to be detected in the pipeline data.
  • the first network device may obtain the data to be detected by dialing and testing by itself, and may also receive the data to be detected sent by other gateway devices, which is not specifically limited here.
  • the data to be detected obtained by the first network device may be a binary data packet or a hexadecimal data packet, and the specifics are not limited here.
  • step 302 the first network device obtains w first characteristic regions according to the data to be detected.
  • After the first network device obtains the data to be detected, it intercepts the first K bytes of information of the message, that is, the same byte information as was used when training the model on the offline side, and then uses the sliding window method to generate w first feature regions from the K bytes of information, where w is a positive integer greater than or equal to 1.
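  • The sliding window method can be sketched as follows. The window length q and the step of one byte are assumptions of this example; the text does not fix either value.

```python
def sliding_windows(data: bytes, q: int = 8, step: int = 1):
    """Return every q-byte window of data, moving `step` bytes at a time."""
    if q <= 0 or step <= 0:
        raise ValueError("q and step must be positive")
    return [data[i:i + q] for i in range(0, len(data) - q + 1, step)]


windows = sliding_windows(bytes.fromhex("820a2a2e67"), q=4)
print([w.hex() for w in windows])  # ['820a2a2e', '0a2a2e67']
```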
  • The w first feature regions can also be obtained in other ways, for example by the Aho–Corasick (AC) automaton algorithm or a prefix tree algorithm, which is not specifically limited here.
  • When the AC automaton algorithm is used, the AC automaton needs to be constructed according to the application correlation information, and the w first feature regions are then obtained through the AC automaton. That is, the w first feature regions that match the feature regions already present in the application correlation information are obtained automatically.
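  • A compact Aho–Corasick automaton over bytes is sketched below to show how known feature regions could be located in the data to be detected in a single pass. The function names and the sample patterns are assumptions of this example, and the patterns are assumed to be distinct and non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables for the given byte patterns."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for byte in pattern:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].append(pattern)
    queue = deque(goto[0].values())          # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_regions(data, automaton):
    """Return (offset, pattern) for every pattern occurrence in data."""
    goto, fail, out = automaton
    state, hits = 0, []
    for pos, byte in enumerate(data):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for pattern in out[state]:
            hits.append((pos - len(pattern) + 1, pattern))
    return hits


known = [bytes.fromhex("0a2a"), bytes.fromhex("2a2e67")]
hits = find_regions(bytes.fromhex("820a2a2e67"), build_automaton(known))
print([(offset, p.hex()) for offset, p in hits])  # [(1, '0a2a'), (2, '2a2e67')]
```

  In this setting the patterns would be the feature regions stored in the application correlation information.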
  • the characteristic area can also be obtained in other ways, as long as a collection of bytes of different sizes is obtained, which is not specifically limited here.
  • the first characteristic area can be obtained by processing the byte information of the data stream.
  • the first network device determines the regional correlation between the first characteristic region and the corresponding application category according to the application correlation information and the first characteristic region.
  • The correlation information of a third characteristic region in the application correlation information includes the third characteristic region, the application category corresponding to the third characteristic region, and the regional correlation between the third characteristic region and the corresponding application category. When the p third characteristic regions include at least one characteristic region in the w first characteristic regions, the first network device looks up, in the application correlation information, the regional correlation information corresponding to each characteristic region in the w first characteristic regions, such as the corresponding application category and the regional correlation with that application category.
  • If a first characteristic region is not found in the application correlation information, the value of the regional relevance corresponding to that characteristic region is 0.
  • The first network device determines, according to the w first characteristic areas and the application correlation information, the application category corresponding to each first characteristic area and the regional correlation between the first characteristic area and the corresponding application category;
  • the first network device counts the sum of the regional correlation degrees of the first characteristic area corresponding to each application category based on the application category;
  • the first network device determines that the data to be detected corresponds to the first application category based on that the sum of the regional correlations of the first characteristic areas corresponding to the first application category is the maximum value.
  • the first network device determines the application category corresponding to the first feature area and the area correlation between the first feature area and the corresponding application category according to the w first feature areas and application correlation information; and makes statistics based on the application category The sum of the regional relevance of the first feature area corresponding to each application category, so as to obtain the total regional relevance corresponding to each application category, for example, as shown in Table 3a below,
  • The feature areas corresponding to app1 are "65, 6a, 77, 8e, 67, 6b, 45, 33" and "33, 11, 96, 5e, 6b, 3e, 45, 33", and the regional correlations corresponding to these two feature areas are 0.4 and 0.15, respectively, so the sum of the regional correlations corresponding to app1 is 0.55.
  • the network device can perform statistics and calculations on the regional correlation degrees corresponding to each application category to obtain the total regional correlation degrees corresponding to each application category.
  • step 304 the first network device determines that the data to be detected corresponds to the first application category based on that the sum of the regional correlations of the first characteristic regions corresponding to the first application category is the maximum value.
  • the first network device After the first network device obtains statistics of the total area relevance of the first feature area corresponding to different application categories, the first network device determines the data to be detected corresponding to the first feature area according to the maximum value of the sum of the area relevance of the first feature area Corresponds to the first application category.
  • the first network device may also determine whether the sum of the area correlations of the first characteristic area is higher than a preset threshold. The sum of the area correlations of the first characteristic area is higher than the preset threshold, and the first network device determines that the application category corresponding to the first characteristic area is the application category corresponding to the data to be detected. If the sum of the regional relevance of the first characteristic area is lower than the preset threshold, the first network device determines that the application category corresponding to the data to be detected is not the application category in the application relevance information.
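  • The final decision, taking the category with the largest sum and then applying the preset threshold, can be sketched as follows. The threshold value of 0.5, the return of `None` for an unknown category, and the treatment of a sum exactly equal to the threshold as unknown are assumptions of this example.

```python
def classify(totals, threshold=0.5):
    """totals: {category: summed regional correlation}.
    Returns the best category, or None when no category is above threshold."""
    if not totals:
        return None
    best = max(totals, key=totals.get)
    return best if totals[best] > threshold else None


print(classify({"app1": 0.55, "app2": 0.2}))                  # app1
print(classify({"app1": 0.55, "app2": 0.2}, threshold=0.6))   # None
```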
  • the first network device may display the application category determined for the data to be detected in the display area of the first network device, or send the result to other devices, such as the terminal device of operation and maintenance personnel.
  • if the first network device determines that the application category corresponding to the data to be detected is not among the application categories in the application relevance information, it can generate application relevance information corresponding to that application category through steps 201 to 207. Furthermore, the application relevance information corresponding to that application category can be integrated with the original application relevance information to form updated application relevance information.
  • steps 201 to 207 can also be executed by the first network device.
  • when steps 201 to 207 are executed by the first network device, then in step 301, when the first network device needs to use the application relevance information, it can obtain the application relevance information directly.
  • the first 1024 bytes of the data stream include binary ciphertext information such as IP information, DNS information, and port information. Because this information reflects certain characteristics of the application category, the application correlation information is generated from the binary data of this information, and the application category of the message data in the pipeline data is then identified according to the application correlation information, which improves the accuracy of identifying application categories.
  • FIG. 4 is a schematic structural diagram of an embodiment of the network device provided by this application.
  • the obtaining unit 401 is configured to obtain the data to be detected
  • the processing unit 402 is configured to obtain w first characteristic regions according to the data to be detected, the first characteristic regions include at least one byte of data in the data to be detected, and w is a positive integer;
  • the determining unit 403 is configured to determine the application category corresponding to the data to be detected according to the w first feature regions and the application correlation information, and the application correlation information indicates the correlation between the first feature region and the application category.
  • each unit of the network device is similar to those described in the foregoing embodiment shown in FIG. 2 and will not be repeated here.
  • FIG. 5 is a schematic structural diagram of another embodiment of the network device provided by this application.
  • the obtaining unit 501 is configured to obtain the data to be detected
  • the processing unit 503 is configured to obtain w first characteristic regions according to the data to be detected, the first characteristic regions include at least one byte of data in the data to be detected, and w is a positive integer;
  • the determining unit 505 is configured to determine the application category corresponding to the data to be detected according to the w first feature regions and application correlation information, and the application correlation information indicates the correlation between the first feature region and the application category.
  • the determining unit 505 is specifically configured to determine the application category corresponding to the first feature area and the regional correlation between the first feature area and the corresponding application category according to the w first feature areas and application correlation information;
  • the statistics unit 504 is configured to count the sum of the regional correlation degrees of the first feature region corresponding to each application category based on the application category;
  • the determining unit 505 is further configured to determine that the to-be-detected data corresponds to the first application category based on that the sum of the regional correlations of the first feature area corresponding to the first application category is the maximum value.
  • the application correlation information includes the correlation information of p third characteristic regions, where the correlation information of a third characteristic region includes the third characteristic region, the application category corresponding to the third characteristic region, and the regional correlation between the third characteristic region and its corresponding application category; the p third characteristic regions include at least one characteristic region among the w first characteristic regions.
  • the data to be detected includes the first K bytes of at least one message
  • the processing unit 503 is specifically configured to perform sliding window processing on the first K bytes of at least one message to obtain w first feature regions.
  • the first characteristic area includes s consecutive bytes, and s is an integer greater than 1.
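The sliding window processing over the first K bytes of a message can be sketched as below. The function name `sliding_regions` and the `stride` parameter are assumptions; the text only states that each first characteristic area contains s consecutive bytes.

```python
def sliding_regions(message: bytes, k: int, s: int, stride: int = 1):
    """Cut the first k bytes of a message into windows of s consecutive bytes."""
    head = message[:k]
    # A window starts at every `stride` bytes and must fit entirely inside `head`;
    # if the head is shorter than s, no window is produced.
    return [head[i:i + s] for i in range(0, len(head) - s + 1, stride)]


# With K = 8 and s = 4, an 8-byte head yields w = 5 overlapping windows.
regions = sliding_regions(bytes(range(10)), k=8, s=4)
```

With a stride of 1, a head of K bytes yields w = K - s + 1 windows whenever K is at least s.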
  • the acquiring unit 501 is further configured to acquire first data, where the first data includes byte data corresponding to the first application category;
  • Network equipment also includes:
  • the input unit 502 is configured to input first data into a first model, where the output of the first model is the first application category;
  • the processing unit 503 is further configured to obtain n second characteristic regions based on the first application category and the first model, where the second characteristic regions include q adjacent bytes in the first data, where n is a positive integer, and q is a positive integer;
  • the determining unit 505 is further configured to determine the regional correlation between the second characteristic region and the first application category;
  • Network equipment also includes:
  • the generating unit 506 is configured to generate application relevance information based on the regional correlation between the second characteristic area and the first application category.
  • the application relevance information includes second feature area relevance information, and the second feature area relevance information includes a second feature area, the first application category corresponding to the second feature area, and the regional correlation between the second feature area and the first application category;
  • the n second characteristic regions include at least one first characteristic region among the w first characteristic regions, and the application category corresponding to the data to be detected is the first application category.
  • each unit of the network device is similar to those described in the foregoing embodiments shown in FIG. 2 and FIG. 3, and will not be repeated here.
  • FIG. 6 is a schematic structural diagram of another embodiment of a network device provided by this application.
  • the acquiring unit 601 is configured to acquire first data, where the first data includes byte data corresponding to the first application category;
  • the input unit 602 is configured to input first data into a first model, where the output of the first model is the first application category;
  • the processing unit 603 is configured to obtain n second characteristic regions based on the first application category and the first model, where the second characteristic regions include q adjacent bytes in the first data, where n is a positive integer, and q is a positive integer;
  • the determining unit 604 is configured to determine the regional correlation between the second characteristic area and the first application category
  • the generating unit 605 is configured to generate application relevance information based on the regional correlation between the second characteristic area and the first application category.
  • each unit of the network device is similar to those described in the foregoing embodiment shown in FIG. 3, and will not be repeated here.
  • FIG. 7 is a schematic structural diagram of another embodiment of the network device provided by this application.
  • the obtaining unit 701 is configured to obtain first data, where the first data includes byte data corresponding to the first application category;
  • the input unit 702 is configured to input first data into a first model, where the output of the first model is the first application category;
  • the processing unit 703 is configured to obtain n second characteristic regions based on the first application category and the first model, where the second characteristic regions include q adjacent bytes in the first data, where n is a positive integer, and q is a positive integer;
  • the determining unit 704 is configured to determine the regional correlation between the second characteristic area and the first application category
  • the generating unit 705 is configured to generate application relevance information based on the regional correlation between the second characteristic area and the first application category.
  • the application correlation information includes second characteristic area correlation information, and the second characteristic area correlation information includes a second characteristic area, the first application category corresponding to the second characteristic area, and the regional correlation between the second characteristic area and the first application category.
  • the processing unit 703 is specifically configured to obtain h first feature values based on the first application category and the first model, where a first feature value indicates the correlation between the first application category and a first feature point in the first data, a first feature point includes one byte of data in the first data, and h is a positive integer;
  • the processing unit 703 is specifically configured to obtain n second feature regions according to the h first feature values.
  • the acquiring unit 701 is further configured to acquire z target feature points according to the h first feature values, where the feature value of each target feature point is one of the first z feature values when the h first feature values are arranged in descending order, z is a positive integer, and z is less than or equal to h;
  • the processing unit 703 is specifically configured to obtain n second feature regions according to z target feature points, and each second feature region includes at least one target feature point.
  • the midpoint of the second feature region is the target feature point.
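Selecting the z target feature points and cutting a region around each one can be sketched as follows. The text does not say how the first feature values are computed from the first model, so they are taken as a plain list here; the function names and the clamping at the data boundaries are assumptions.

```python
def top_z_points(feature_values, z):
    """Indices of the z largest feature values, in descending order of value."""
    order = sorted(range(len(feature_values)),
                   key=lambda i: feature_values[i], reverse=True)
    return order[:z]


def regions_around(data: bytes, points, q):
    """Cut a q-byte region around each target point, clamped to the data bounds."""
    regions = []
    for p in points:
        # Place the target point at (or next to) the middle of the region.
        start = max(0, min(p - q // 2, len(data) - q))
        regions.append(data[start:start + q])
    return regions


data = bytes(range(16))
values = [0.0] * 16
values[5], values[12], values[1] = 0.9, 0.7, 0.3  # hypothetical per-byte feature values
points = top_z_points(values, z=2)
regions = regions_around(data, points, q=4)
```

For an even q the target point cannot sit exactly at the midpoint, so this sketch places it just right of center; an odd q gives an exact midpoint.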
  • the n second feature regions include a sixth feature region and a fourth feature region; if the ratio of feature points in the sixth feature region and feature points in the fourth feature region is greater than the first preset threshold, and the number of times that the sixth characteristic area appears in the characteristic area of the corresponding application category in the first application category is greater than the number of times the fourth characteristic area appears in the characteristic area of the corresponding application category in the first application category, then the network device further includes:
  • the processing unit 703 is configured to delete the information of the fourth characteristic region.
  • the n second characteristic regions include a fifth characteristic region, and if the fifth characteristic region corresponds to at least two application categories in the application relevance information, the processing unit 703 is further configured to delete the information of the fifth characteristic region.
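The rule for the fifth characteristic region (delete a region that corresponds to at least two application categories) can be sketched as a simple filter. The entry layout `(region, category, score)` and the name `drop_ambiguous` are assumptions, and the example byte values and scores are illustrative only.

```python
from collections import defaultdict


def drop_ambiguous(entries):
    """Remove every feature region associated with two or more application categories."""
    categories = defaultdict(set)
    for region, category, _score in entries:
        categories[region].add(category)
    # Keep only entries whose region maps to exactly one category.
    return [entry for entry in entries if len(categories[entry[0]]) == 1]


entries = [
    (b"\x65\x6a", "app1", 0.4),
    (b"\x33\x11", "app1", 0.15),
    (b"\x33\x11", "app2", 0.2),  # same region, different category: ambiguous
]
kept = drop_ambiguous(entries)
```

Removing such regions keeps only feature regions that point to a single application category, so a match cannot vote for two categories at once.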
  • each unit of the network device is similar to those described in the foregoing embodiments shown in FIG. 2 and FIG. 3, and will not be repeated here.
  • another embodiment of the network device in the embodiment of the present application includes:
  • FIG. 8 is a schematic diagram of a computer device provided by an embodiment of this application.
  • the computer device includes at least one processor 801, a communication bus 802 and a memory 803, and may also include at least one communication interface 804 and an I/O interface 805.
  • the processor may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of this application.
  • the communication bus may include a path to transfer information between the above-mentioned components.
  • the communication interface uses any device such as a transceiver to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • the memory can be read-only memory (ROM) or another type of static storage device that can store static information and instructions, random access memory (RAM) or another type of dynamic storage device that can store information and instructions, electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other compact disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory can exist independently and is connected to the processor through a bus.
  • the memory can also be integrated with the processor.
  • the memory is used to store application program code for executing the solution of the present application, and the processor controls the execution.
  • the processor is configured to execute the application program code stored in the memory.
  • the processor may include one or more CPUs, and each CPU may be a single-core processor or a multi-core processor.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • the computer device may further include an input/output (I/O) interface.
  • the output device may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector, etc.
  • the input device can be a mouse, a keyboard, a touch screen device, or a sensor device.
  • the above-mentioned computer equipment may be a general-purpose computer equipment or a special-purpose computer equipment.
  • the computer equipment can be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, an embedded device, or a device with a structure similar to that in Figure 7.
  • the embodiments of this application do not limit the type of computer equipment.
  • the first network device, the second network device or the terminal device in FIG. 1, FIG. 2 or FIG. 3 may be the device shown in FIG. 8, and one or more software modules are stored in the memory.
  • the network device and the terminal device can implement the software module through the processor and the program code in the memory to complete the method executed by the network device or the terminal device in the foregoing embodiment.
  • the processor 801 can execute the operations performed by the first network device or the second network device in the embodiments shown in FIG. 2 and FIG. 3, and details are not described herein again.
  • the embodiments of the present application also provide a system for identifying applications.
  • the system includes a first network device and a second network device.
  • the first network device is used to execute the method executed by the first network device in the embodiment shown in FIG. 3, and details are not described herein again.
  • the second network device is used to execute the method executed by the second network device in the embodiment shown in FIG. 2, and details are not described herein again.
  • the second network device is further configured to send application relevance information to the first network device.
  • the first network device is further configured to send the application category corresponding to the data to be detected to the terminal device.
  • the embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored.
  • the processor mentioned in the network device in the above embodiments of this application may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the number of processors in the network device in the above embodiments of the present application may be one or multiple, and may be adjusted according to actual application scenarios. This is only an exemplary description and is not limited.
  • the number of memories in the embodiments of the present application may be one or multiple, and may be adjusted according to actual application scenarios. This is only an exemplary description and is not limited.
  • the memory or readable storage medium mentioned in the network device in the above embodiments in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache. Examples of RAM include:
  • static random access memory (static RAM, SRAM)
  • dynamic random access memory (dynamic RAM, DRAM)
  • synchronous dynamic random access memory (synchronous DRAM, SDRAM)
  • double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM)
  • enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM)
  • synchronous link dynamic random access memory (synchlink DRAM, SLDRAM)
  • direct rambus random access memory (direct rambus RAM, DR RAM)
  • the network device includes a processor (or processing unit) and a memory; the processor in this application may be integrated with the memory, or the processor and the memory may be connected through an interface, which may be adjusted according to the actual application scenario and is not limited.
  • the embodiments of the present application also provide a computer program or a computer program product including a computer program.
  • when the computer program is executed on a computer, the computer is enabled to implement the method flow related to the network device in any of the above-mentioned method embodiments.
  • the embodiments shown in FIG. 2 to FIG. 3 may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented by software, they can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be stored by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this application essentially or the part that contributes to the existing technology or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium.
  • the storage medium includes: U disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk and other media that can store program codes.
  • the word "if" as used herein can be interpreted as "when", "in response to determining", or "in response to detecting".
  • the phrase "if determined" or "if (a stated condition or event) is detected" can be interpreted as "when determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Disclosed in embodiments of the present application is a data processing method. The method in the embodiments of the present application can be used in network data transmission, and comprises: a first network device obtains data to be inspected; the first network device obtains w first feature regions according to the data to be inspected, the first feature regions comprising data of at least one byte in the data to be inspected, and w being a positive integer; the first network device obtains application correlation degree information, the application correlation degree information indicating correlation degrees between the w first feature regions and application categories; the first network device determines, according to the w first feature regions and the application correlation degree information, an application category corresponding to the data to be inspected. According to the embodiments of the present application, an application category corresponding to data to be inspected can be determined without plaintext analysis, thereby improving the security of user data.

Description

Data processing method and equipment
This application claims priority to Chinese patent application No. 202010097474.8, entitled "A data processing method and its equipment", filed with the State Intellectual Property Office of China on February 17, 2020, the entire content of which is incorporated herein by reference.
Technical field
The embodiments of the present application relate to the field of network communication technology, and specifically relate to a data processing method and equipment.
Background
With the continuous improvement of service technology, application identification of pipeline packet traffic has become increasingly important for meeting operators' need for fine-grained management of that traffic. Network application identification is a core technology of operator service modeling: it distinguishes packet traffic belonging to different application types for data analysis, improving customer satisfaction with network service quality. To identify the applications in pipeline packet traffic, the industry generally adopts identification technology based on deep packet inspection (DPI).
DPI-based technology performs deep parsing of the data stream, adding application-layer data analysis, and looks up the server's domain name information in the parsed application-layer data to identify the application category corresponding to traffic in the network.
During parsing, DPI-based technology parses the data packets in the pipeline in plaintext, and plaintext parsing affects the security of user data.
Summary of the invention
The embodiments of the present application provide a data processing method. When applications are identified in a network, the first network device can, according to the obtained application relevance information, determine the application category corresponding to a message in the pipeline code stream without performing plaintext parsing on the message, which improves the security of user data.
The first aspect of this application provides a data processing method.
When the application category corresponding to a data stream in a network pipeline needs to be identified, the first network device obtains the data to be detected from the pipeline data; that is, the data to be detected includes byte data in the pipeline data.
After obtaining the data to be detected, the first network device processes it to obtain one or more first feature regions, where a first feature region includes at least one byte of data in the data to be detected.
The first network device obtains the application relevance information stored in the system, where the application relevance information indicates the relevance between the one or more first feature regions and the application categories in the application relevance information.
After obtaining the application relevance information, the first network device determines, according to the one or more first feature regions and the application relevance information, the application category corresponding to the one or more first feature regions, and thereby determines the application category corresponding to the data to be detected.
In the embodiments of this application, when performing application identification, the first network device processes the data to be detected obtained from the pipeline code stream to obtain the first feature regions, and determines the application category of the data to be detected according to the obtained application relevance information and the first feature regions. The application category corresponding to the data to be detected can thus be determined without plaintext parsing, which improves the security of user data.
可选地,在一种可能的实现方式中,第一网络设备根据应用相关度信息和w个第一特征区域确定了第一特征区域对应的应用类别,以及该第一特征区域与对应的应用类别之间的区域相关度,该第一网络设备基于应用类别统计与每个应用类别对应的第一特征区域的区域相关度之和。Optionally, in a possible implementation manner, the first network device determines the application category corresponding to the first characteristic area according to the application correlation information and the w first characteristic areas, and the first characteristic area and the corresponding application For the regional correlation between the categories, the first network device counts the sum of the regional correlations of the first feature region corresponding to each application category based on the application category.
第一网络设备基于与第一应用类别对应的第一特征区域的区域相关度之和为最大值,确定待检测数据对应于第一应用类别。The first network device determines that the to-be-detected data corresponds to the first application category based on the maximum value of the sum of the regional correlations of the first feature area corresponding to the first application category.
本申请实施例中,第一网络设备根据应用相关度信息确定第一特征区域对应的区域相关度,并根据该区域相关度确定待检测数据对应的应用类别,提升了方案的可实现性。In the embodiment of the present application, the first network device determines the area relevance corresponding to the first characteristic area according to the application relevance information, and determines the application category corresponding to the data to be detected according to the area relevance, which improves the feasibility of the solution.
Optionally, in a possible implementation, the application correlation information further includes correlation information of p third feature regions. The correlation information of a third feature region includes the third feature region, the application category corresponding to the third feature region, and the region correlation between the third feature region and that application category. The p third feature regions include at least one of the w first feature regions.
In this embodiment of the present application, specifying that the application correlation information further includes the correlation information of the third feature regions improves the implementability of the solution.
Optionally, when the sum of the region correlations of the first feature regions corresponding to the first application category is the maximum, and that maximum is greater than a preset threshold, the first network device determines that the data to be detected corresponds to the first application category.
In this embodiment of the present application, the first network device determines that the first application category is the application category corresponding to the data to be detected only when the sum of the region correlations of the first feature regions is higher than the preset threshold. When that sum is below the preset threshold, the data to be detected contains no information strongly correlated with any application category in the application correlation information, so the application category of the data to be detected may not be present in the application correlation information. Requiring the sum to exceed the preset threshold therefore improves the accuracy with which the solution determines the application category of the data to be detected.
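The decision rule described above (sum the region correlations per application category, take the maximum, and accept it only above a preset threshold) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `classify`, the dictionary layout of the application correlation information, and the sample values are all assumptions.

```python
from collections import defaultdict

def classify(regions, correlation_table, threshold=0.0):
    """Return the category with the largest summed region correlation, or None.

    regions: first feature regions (bytes) extracted from the data to be detected.
    correlation_table: hypothetical layout {region: (category, correlation)}.
    threshold: preset threshold that the winning sum must exceed.
    """
    sums = defaultdict(float)
    for region in regions:
        entry = correlation_table.get(region)
        if entry is None:
            continue  # region is absent from the application correlation information
        category, correlation = entry
        sums[category] += correlation
    if not sums:
        return None
    best = max(sums, key=sums.get)
    # Below the threshold, no category is strongly correlated with the data.
    return best if sums[best] > threshold else None

# Illustrative values only.
table = {
    b"GET ": ("web", 0.9),
    b"HTTP": ("web", 0.8),
    b"\x16\x03\x01\x02": ("tls", 0.7),
}
result = classify([b"GET ", b"HTTP", b"\x16\x03\x01\x02"], table, threshold=1.0)
```

Returning `None` models the case in which the application category of the data is not covered by the application correlation information.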
Optionally, in a possible implementation, the acquired data to be detected includes the first K bytes of at least one packet, and the first network device applies sliding-window processing to those first K bytes to obtain the w first feature regions.
In this embodiment of the present application, the first network device processes the data to be detected with a sliding window to obtain the first feature regions, which improves the implementability of the solution.
Optionally, in a possible implementation, a first feature region includes s consecutive bytes of data, where s is a positive integer greater than 1.
In this embodiment of the present application, specifying the concrete data form of the first feature region improves the implementability of the solution.
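As an illustration of the sliding-window step, the sketch below cuts a byte string into windows of s consecutive bytes. The function name `sliding_windows` and the `step` parameter are assumptions; the text only states that a sliding window is used and that s is greater than 1.

```python
def sliding_windows(data: bytes, s: int, step: int = 1) -> list:
    """Split data into windows of s consecutive bytes (the first feature regions).

    step is an assumed stride; with step=1 the number of windows is
    w = len(data) - s + 1 when len(data) >= s, and 0 otherwise.
    """
    if s < 2:
        raise ValueError("s must be an integer greater than 1")
    if step < 1:
        raise ValueError("step must be a positive integer")
    return [data[i:i + s] for i in range(0, len(data) - s + 1, step)]

windows = sliding_windows(b"abcde", 3)
```

With a stride of 1, adjacent windows overlap by s - 1 bytes, so a distinctive byte sequence is captured regardless of its offset in the packet.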
Optionally, in a possible implementation, the first network device generates the application correlation information before it acquires the data to be detected from the pipeline data.
When preparing to generate the application correlation information, the first network device acquires byte data corresponding to the first application category; this byte data is the first data.
The first network device inputs the first data into a trained first model, and the first model outputs a predicted application category, which is the first application category. The first model may be trained by the first network device, or trained by another device and then sent to the first network device.
After obtaining the first application category, the first network device obtains n second feature regions based on the first application category and the first model. Each second feature region includes q adjacent bytes in the first data, where n and q are positive integers.
After obtaining the n second feature regions, the first network device determines the region correlation between each second feature region and the first application category and generates the application correlation information, which includes the second feature regions and their region correlations.
In this embodiment of the present application, the first network device acquires the byte data of the first application category, inputs it into the trained first model to obtain the predicted application category, and generates the application correlation information according to that prediction, which improves the implementability of the solution.
Optionally, in a possible implementation, the application correlation information further includes second feature region correlation information, which includes the second feature region, the first application category corresponding to the second feature region, and the region correlation between the second feature region and the first application category. The n second feature regions include at least one of the w first feature regions; that is, the application category corresponding to the data to be detected is the first application category.
Optionally, in a possible implementation, the first network device obtains h first feature values based on the first application category and the first model. For example, the first network device may perform a computation on the first application category according to the first model to obtain the h first feature values. The h first feature values indicate the correlation between the first application category and the first feature points in the first data, where a first feature point includes at least one byte of the first data and h is a positive integer.
After obtaining the h first feature values, the first network device obtains the n second feature regions according to the h first feature values.
In this embodiment of the present application, the first network device obtains the h first feature values by processing the first application category and obtains the n second feature regions according to those values, which improves the implementability of the solution.
Optionally, in a possible implementation, the first network device acquires z target feature points in the first data according to the h first feature values. The feature value of each target feature point is one of the z largest of the h first feature values when these are sorted in descending order, where z is a positive integer less than or equal to h.
After acquiring the z target feature points, the first network device obtains the n second feature regions according to the z target feature points; that is, each second feature region contains at least one target feature point.
In this embodiment of the present application, the first network device acquires z target feature points in the first data according to the h first feature values and obtains the n second feature regions according to the z target feature points. The feature value of each target feature point is among the largest of the h first feature values, and a feature value indicates the degree of association between a feature point and the application category: the higher the feature value, the higher the association. The n second feature regions obtained from the target feature points are therefore more strongly associated with the application category, and the application correlation information generated from these n feature regions yields higher accuracy when the application category is subsequently determined.
Optionally, in a possible implementation, the midpoint of a second feature region is a target feature point.
In this embodiment of the present application, describing how the second feature region is composed improves the implementability of the solution.
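The selection of target feature points and the construction of regions around them can be sketched as below. Several details are assumptions made for illustration: each feature point is taken to be a single byte position, the helper names are hypothetical, and windows near the ends of the data are shifted inward so that they keep q bytes (in which case the target point is no longer exactly the midpoint).

```python
def top_z_points(feature_values, z):
    """Return the positions of the z largest feature values, in ascending position order."""
    if not 1 <= z <= len(feature_values):
        raise ValueError("z must satisfy 1 <= z <= h")
    order = sorted(range(len(feature_values)),
                   key=lambda i: feature_values[i], reverse=True)
    return sorted(order[:z])

def centered_region(data: bytes, center: int, q: int) -> bytes:
    """Return q consecutive bytes of data with `center` as (near) midpoint.

    Assumption: the window is clamped to the bounds of data at the edges.
    """
    half = q // 2
    start = max(0, min(center - half, len(data) - q))
    return data[start:start + q]

values = [0.1, 0.9, 0.2, 0.8, 0.05]   # illustrative feature values, h = 5
targets = top_z_points(values, 2)      # positions of the 2 largest values
regions = [centered_region(b"abcde", t, 3) for t in targets]
```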
Optionally, in a possible implementation, the first network device obtains the region correlation of each of the n second feature regions according to two quantities: the number of times that feature region appears in its corresponding application category, and the number of feature regions, among the n second feature regions, that correspond to that application category. The region correlation of each of the n second feature regions represents its degree of association with the first application category; the higher the region correlation, the higher the association with the first application category.
The first network device generates the application correlation information according to the region correlation of each of the n second feature regions.
In this embodiment of the present application, the first network device generates the application correlation information from the region correlation of each of the n second feature regions, which improves the implementability of the solution.
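The text names the two inputs to the region correlation but does not give a formula in this passage. The sketch below assumes one plausible choice, the ratio of the two counts; both the ratio and the function name `region_correlations` are assumptions for illustration only.

```python
from collections import Counter

def region_correlations(observations):
    """Compute an assumed region correlation for each (region, category) pair.

    observations: list of (region_bytes, category) pairs, one per extracted
    second feature region.
    Assumed formula: occurrences of the region in its category divided by the
    number of feature regions observed for that category.
    """
    per_region = Counter(observations)
    per_category = Counter(category for _, category in observations)
    return {
        (region, category): count / per_category[category]
        for (region, category), count in per_region.items()
    }

obs = [(b"ab", "web"), (b"ab", "web"), (b"cd", "web"), (b"xy", "tls")]
corr = region_correlations(obs)
```

Under this assumption a region that dominates its category scores close to 1, and a region seen only occasionally scores close to 0.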
Optionally, in a possible implementation, the first network device takes, in the first data, the q consecutive feature points centered on each of the z target feature points to obtain the n second feature regions, where m is a positive integer less than n.
In this embodiment of the present application, the first network device obtains the second feature regions by taking each target feature point as the midpoint, which improves the implementability of the solution.
Optionally, in a possible implementation, when the n second feature regions contain two different feature regions, a sixth feature region and a fourth feature region, that are highly similar, that is, the proportion of feature points they share is greater than a first preset threshold, and the two feature regions correspond to the same application category in the first application category, the first network device deletes whichever of the two appears fewer times in the first feature region.
In this embodiment of the present application, when two different feature regions among the n second feature regions are highly similar, the first network device deletes the one that appears fewer times among the n second feature regions. This avoids counting highly similar feature regions repeatedly when the region correlation is computed and improves the accuracy of that computation.
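One way to picture this de-duplication step is a greedy pass that visits regions from most to least frequent and drops any region that overlaps too much with an already kept region of the same category. The similarity measure (fraction of positions holding the same byte), the greedy order, and all names below are assumptions; the text only requires that the proportion of repeated feature points exceed a first preset threshold.

```python
def overlap_ratio(a: bytes, b: bytes) -> float:
    """Assumed similarity: share of positions at which both regions hold the same byte."""
    if not a or not b:
        return 0.0
    same = sum(x == y for x, y in zip(a, b))
    return same / max(len(a), len(b))

def drop_near_duplicates(counts, threshold):
    """Keep the more frequent region of each highly similar same-category pair.

    counts: hypothetical layout {(region, category): occurrences}.
    """
    kept = []
    for key in sorted(counts, key=counts.get, reverse=True):
        region, category = key
        if any(cat == category and overlap_ratio(region, other) > threshold
               for other, cat in kept):
            continue  # a similar, more frequent region of this category is already kept
        kept.append(key)
    return {key: counts[key] for key in kept}

counts = {
    (b"GET /", "web"): 10,
    (b"GET .", "web"): 3,                    # 4 of 5 bytes match the entry above
    (b"\x16\x03\x01\x00\x05", "tls"): 2,
}
filtered = drop_near_duplicates(counts, threshold=0.7)
```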
Optionally, in a possible implementation, if a feature region among the n second feature regions corresponds to at least two application categories in the first application category, the first network device deletes the information related to that feature region; such a feature region is a fifth feature region.
In this embodiment of the present application, the first network device deletes, from the n second feature regions, the feature regions that correspond to at least two application categories. A feature region that corresponds to two or more application categories represents a feature shared by different categories, so it cannot serve as a strongly correlated feature of any one specific application category. Deleting such feature regions therefore improves the accuracy with which the solution determines the application category of the data to be detected.
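The removal of feature regions shared by several application categories can be sketched as a simple filter. The tuple layout of the entries and the function name `drop_ambiguous` are assumptions for illustration.

```python
from collections import defaultdict

def drop_ambiguous(entries):
    """Remove every region that is associated with two or more application categories.

    entries: list of (region_bytes, category, correlation) tuples (assumed layout).
    """
    categories = defaultdict(set)
    for region, category, _ in entries:
        categories[region].add(category)
    # Keep only regions that point to exactly one category.
    return [entry for entry in entries if len(categories[entry[0]]) == 1]

entries = [
    (b"HTTP", "web", 0.8),
    (b"HTTP", "video", 0.4),   # same bytes seen in a second category
    (b"\x16\x03", "tls", 0.7),
]
unambiguous = drop_ambiguous(entries)
```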
Optionally, in a possible implementation, the application correlation information may be displayed as a heat map, in which the larger the feature value of a feature point, the more vivid the color of that feature point.
In this embodiment of the present application, displaying the application correlation information as a heat map makes the result easier to read at a glance.
Optionally, in a possible implementation, after the first data is acquired, the first K bytes of the first data may be extracted, where the values of K include 784 and 1024.
In this embodiment of the present application, setting specific values improves the implementability of the solution.
Optionally, in a possible implementation, after the data to be detected is acquired, the first K bytes of the data to be detected may be extracted, where the values of K include 784 and 1024.
In this embodiment of the present application, setting specific values improves the implementability of the solution.
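Extracting the first K bytes is a one-line operation; the sketch below also pads short inputs with zero bytes so that every sample has the same length. The padding and the function name `first_k_bytes` are assumptions, since the text only describes the truncation.

```python
def first_k_bytes(payload: bytes, k: int = 784) -> bytes:
    """Keep the first k bytes of payload; zero-pad shorter inputs (padding is an assumption)."""
    return payload[:k].ljust(k, b"\x00")

sample_784 = first_k_bytes(b"abc")
sample_1024 = first_k_bytes(b"x" * 2000, 1024)
```

A fixed length is convenient when the bytes are fed to a model with a fixed input size; 784 bytes, for instance, can be arranged as a 28 x 28 grid.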
A second aspect of the present application provides a data processing method.
A second network device acquires byte data corresponding to a first application category; this byte data is the first data.
The second network device inputs the first data into a trained first model, and the first model outputs a predicted application category, which is the first application category. The first model may be trained by the second network device, or trained by another device and then sent to the second network device.
After obtaining the first application category, the second network device obtains n second feature regions based on the first application category and the first model. Each of the n second feature regions includes q adjacent bytes in the first data, where n and q are positive integers.
After obtaining the n second feature regions, the second network device determines the region correlation between each second feature region and the first application category and generates the application correlation information, which includes the second feature regions and their region correlations.
In this embodiment of the present application, the second network device acquires the byte data of the first application category, inputs it into the trained first model to obtain the predicted application category, and generates the application correlation information according to that prediction, which improves the implementability of the solution.
Optionally, in a possible implementation, the application correlation information further includes second feature region correlation information, which includes the second feature region, the first application category corresponding to the second feature region, and the region correlation between the second feature region and the first application category. The n second feature regions include at least one of the w first feature regions; that is, the application category corresponding to the data to be detected is the first application category.
Optionally, in a possible implementation, the second network device obtains h first feature values based on the first application category and the first model. For example, the second network device may perform a computation on the first application category according to the first model to obtain the h first feature values. The h first feature values indicate the correlation between the first application category and the first feature points in the first data, where a first feature point includes at least one byte of the first data and h is a positive integer.
After obtaining the h first feature values, the second network device obtains the n second feature regions according to the h first feature values.
In this embodiment of the present application, the second network device obtains the h first feature values by processing the first application category and obtains the n second feature regions according to those values, which improves the implementability of the solution.
Optionally, in a possible implementation, the second network device acquires z target feature points in the first data according to the h first feature values. The feature value of each target feature point is one of the z largest of the h first feature values when these are sorted in descending order, where z is a positive integer less than or equal to h.
After acquiring the z target feature points, the second network device obtains the n second feature regions according to the z target feature points; that is, each second feature region contains at least one target feature point.
In this embodiment of the present application, the second network device acquires z target feature points in the first data according to the h first feature values and obtains the n second feature regions according to the z target feature points. The feature value of each target feature point is among the largest of the h first feature values, and a feature value indicates the degree of association between a feature point and the application category: the higher the feature value, the higher the association. The n second feature regions obtained from the target feature points are therefore more strongly associated with the application category, and the application correlation information generated from these n feature regions yields higher accuracy when the application category is subsequently determined.
Optionally, in a possible implementation, the midpoint of a second feature region is a target feature point.
In this embodiment of the present application, specifying how the second feature region is composed improves the implementability of the solution.
Optionally, in a possible implementation, the second network device obtains the region correlation of each of the n second feature regions according to two quantities: the number of times that feature region appears in its corresponding application category, and the number of feature regions, among the n second feature regions, that correspond to that application category. The region correlation of each of the n second feature regions represents its degree of association with the first application category; the higher the region correlation, the higher the association with the first application category.
The second network device generates the application correlation information according to the region correlation of each of the n second feature regions.
In this embodiment of the present application, the second network device generates the application correlation information from the region correlation of each of the n second feature regions, which improves the implementability of the solution.
Optionally, in a possible implementation, the second network device takes, in the first data, the q consecutive feature points centered on each of the z target feature points to obtain the n second feature regions, where m is a positive integer less than n.
In this embodiment of the present application, the second network device obtains the second feature regions by taking each target feature point as the midpoint, which improves the implementability of the solution.
Optionally, in a possible implementation, when the n second feature regions contain two different feature regions, a sixth feature region and a fourth feature region, that are highly similar, that is, the proportion of feature points they share is greater than a first preset threshold, and the two feature regions correspond to the same application category in the first application category, the second network device deletes whichever of the two appears fewer times in the first feature region.
In this embodiment of the present application, when two different feature regions among the n second feature regions are highly similar, the second network device deletes the one that appears fewer times among the n second feature regions. This avoids counting highly similar feature regions repeatedly when the region correlation is computed and improves the accuracy of that computation.
Optionally, in a possible implementation, if a feature region among the n second feature regions corresponds to at least two application categories in the first application category, the second network device deletes the information related to that feature region; such a feature region is a fifth feature region.
In this embodiment of the present application, the second network device deletes, from the n second feature regions, the feature regions that correspond to at least two application categories. A feature region that corresponds to two or more application categories represents a feature shared by different categories, so it cannot serve as a strongly correlated feature of any one specific application category. Deleting such feature regions therefore improves the accuracy with which the solution determines the application category of the data to be detected.
Optionally, in a possible implementation, the application correlation information may be displayed as a heat map, in which the larger the feature value of a feature point, the more vivid the color of that feature point.
In this embodiment of the present application, displaying the application correlation information as a heat map makes the result easier to read at a glance.
Optionally, in a possible implementation, after the first data is acquired, the first K bytes of the first data may be extracted, where the values of K include 784 and 1024.
In this embodiment of the present application, setting specific values improves the implementability of the solution.
Optionally, in a possible implementation, after obtaining the application correlation information, the second network device sends the application correlation information to the first network device.
In this embodiment of the present application, the second network device sends the application correlation information to the first network device after obtaining it, which improves the implementability of the solution.
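The text does not specify how the application correlation information is encoded for transfer between the two devices. Purely as an illustration, the sketch below serializes it as JSON with hex-encoded regions; the wire format, the field names, and both function names are assumptions.

```python
import json

def encode_correlation_info(entries):
    """Serialize (region_bytes, category, correlation) tuples to a JSON string.

    Regions are hex-encoded because JSON cannot carry raw bytes.
    """
    return json.dumps([
        {"region": region.hex(), "category": category, "correlation": correlation}
        for region, category, correlation in entries
    ])

def decode_correlation_info(text):
    """Inverse of encode_correlation_info."""
    return [
        (bytes.fromhex(item["region"]), item["category"], item["correlation"])
        for item in json.loads(text)
    ]

info = [(b"GET ", "web", 0.9), (b"\x16\x03\x01\x02", "tls", 0.7)]
payload = encode_correlation_info(info)       # what the second device would send
restored = decode_correlation_info(payload)   # what the first device would rebuild
```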
A third aspect of the present application provides a network device, which includes:
an acquiring unit, configured to acquire data to be detected;
a processing unit, configured to obtain w first feature regions according to the data to be detected, where a first feature region includes at least one byte of the data to be detected and w is a positive integer;
a determining unit, configured to determine the application category corresponding to the data to be detected according to the w first feature regions and application correlation information, where the application correlation information indicates the correlation between a first feature region and an application category.
The determining unit is specifically configured to determine, according to the w first feature regions and the application correlation information, the application category corresponding to each first feature region and the region correlation between that first feature region and its corresponding application category;
a statistics unit, configured to compute, for each application category, the sum of the region correlations of the first feature regions corresponding to that category;
the determining unit is further configured to determine that the data to be detected corresponds to the first application category when the sum of the region correlations of the first feature regions corresponding to the first application category is the maximum.
Optionally, the application correlation information includes correlation information of p third feature regions, where the correlation information of a third feature region includes the third feature region, the application category corresponding to the third feature region, and the region correlation between the third feature region and its corresponding application category; the p third feature regions include at least one of the w first feature regions.
可选的,待检测数据包括至少一个报文的前K个字节;Optionally, the data to be detected includes the first K bytes of at least one message;
处理单元具体用于对至少一个报文的前K个字节做滑动窗口处理,以得到w个第一特征区域。The processing unit is specifically configured to perform sliding window processing on the first K bytes of at least one message to obtain w first characteristic regions.
可选的,第一特征区域包括连续的s个字节,s为大于1的整数。Optionally, the first characteristic area includes s consecutive bytes, and s is an integer greater than 1.
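The sliding-window and summation steps described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the stride of one byte, and the dictionary form of the application correlation information (`relevance`, mapping a feature region to a category and a regional correlation) are all assumptions made for the example.

```python
from collections import defaultdict


def sliding_windows(data, s):
    """Return every run of s consecutive bytes (a stride of 1 is an assumption)."""
    return [data[i:i + s] for i in range(len(data) - s + 1)]


def classify(data, s, relevance):
    """Sum the regional correlations per category and return the best category.

    `relevance` is a hypothetical in-memory form of the application
    correlation information: feature region -> (category, regional correlation).
    """
    totals = defaultdict(float)
    for region in sliding_windows(data, s):
        hit = relevance.get(region)
        if hit is not None:
            category, score = hit
            totals[category] += score
    if not totals:
        return None  # no known feature region matched the data
    return max(totals, key=totals.get)


# Illustrative values only; they do not come from the application.
relevance = {
    bytes.fromhex("820a"): ("app1", 0.9),
    bytes.fromhex("2a2e"): ("app1", 0.4),
    bytes.fromhex("5388"): ("app2", 0.8),
}
data = bytes.fromhex("820a2a2e5388")
best = classify(data, 2, relevance)
```

In this sketch the payload is only compared byte-for-byte against known regions; it is never parsed as plaintext.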
Optionally, the acquiring unit is further configured to acquire first data, where the first data includes byte data corresponding to the first application category;
the network device further includes:
an input unit, configured to input the first data into a first model, where the output of the first model is the first application category;
the processing unit is further configured to obtain n second feature regions based on the first application category and the first model, where a second feature region includes q adjacent bytes in the first data, n is a positive integer, and q is a positive integer;
the determining unit is further configured to determine the regional correlation between a second feature region and the first application category;
the network device further includes:
a generating unit, configured to generate the application correlation information based on the regional correlation between the second feature region and the first application category.
Optionally, the application correlation information includes second feature region correlation information, which includes the second feature region, the first application category corresponding to the second feature region, and the regional correlation between the second feature region and the first application category;
the n second feature regions include at least one of the w first feature regions, and the application category corresponding to the data to be detected is the first application category.
A fourth aspect of the present application provides a network device, including:
an acquiring unit, configured to acquire first data, where the first data includes byte data corresponding to a first application category;
an input unit, configured to input the first data into a first model, where the output of the first model is the first application category;
a processing unit, configured to obtain n second feature regions based on the first application category and the first model, where a second feature region includes q adjacent bytes in the first data, n is a positive integer, and q is a positive integer;
a determining unit, configured to determine the regional correlation between a second feature region and the first application category;
a generating unit, configured to generate application correlation information based on the regional correlation between the second feature region and the first application category.
Optionally, the application correlation information includes second feature region correlation information, which includes the second feature region, the first application category corresponding to the second feature region, and the regional correlation between the second feature region and the first application category.
Optionally, the processing unit is specifically configured to obtain h first feature values based on the first application category and the first model, where a first feature value indicates the correlation between the first application category and a first feature point in the first data, a first feature point includes one byte of data in the first data, and h is a positive integer;
the processing unit is specifically configured to obtain the n second feature regions according to the h first feature values.
Optionally, the acquiring unit is further configured to obtain z target feature points according to the h first feature values, where the feature value of a target feature point is one of the top z feature values when the h first feature values are sorted in descending order, and z is a positive integer less than or equal to h;
the processing unit is specifically configured to obtain the n second feature regions according to the z target feature points, where each second feature region contains at least one target feature point.
Optionally, the midpoint of a second feature region is a target feature point.
Optionally, the n second feature regions include a sixth feature region and a fourth feature region. If the proportion of feature points shared by the sixth feature region and the fourth feature region is greater than a first preset threshold, and the sixth feature region occurs more often than the fourth feature region among the feature regions corresponding to the first application category, the network device further includes:
a processing unit, configured to delete the information of the fourth feature region.
Optionally, the n second feature regions include a fifth feature region. If the fifth feature region corresponds to at least two application categories in the application correlation information, the processing unit is further configured to delete the information of the fifth feature region.
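The two optional pruning rules above can be sketched as follows. This is a hedged illustration only: the representation of a region as a `range` of byte offsets, the overlap measure (shared points divided by the size of the smaller region), and all function names are assumptions; the application does not fix how the overlap proportion is computed.

```python
def overlap_ratio(a, b):
    """Share of feature points common to two regions given as ranges of offsets.

    Dividing by the smaller region's size is an assumption of this sketch.
    """
    shared = len(range(max(a.start, b.start), min(a.stop, b.stop)))
    return shared / min(len(a), len(b))


def prune_overlaps(spans, counts, threshold):
    """Drop the less frequent region of any pair overlapping above the threshold."""
    removed = set()
    names = list(spans)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in removed or b in removed:
                continue
            if overlap_ratio(spans[a], spans[b]) > threshold:
                if counts[a] > counts[b]:
                    removed.add(b)
                elif counts[b] > counts[a]:
                    removed.add(a)
                # equal counts: keep both, since the rule requires "greater than"
    return [n for n in names if n not in removed]


def drop_ambiguous(region_to_categories):
    """Keep only regions that map to exactly one application category."""
    return {
        region: next(iter(cats))
        for region, cats in region_to_categories.items()
        if len(cats) == 1
    }


# Illustrative values only.
spans = {"r6": range(10, 15), "r4": range(11, 16)}
counts = {"r6": 7, "r4": 3}
kept = prune_overlaps(spans, counts, 0.5)
unique = drop_ambiguous({b"\x82\x0a": {"app1"}, b"\x01\xbb": {"app1", "app2"}})
```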
A fifth aspect of the present application provides a network device, including:
at least one processor and a memory, where the memory stores program code, and the processor invokes the program code to perform the method described in the implementations of the first aspect of the present application.
A sixth aspect of the present application provides a network device, including:
at least one processor and a memory, where the memory stores program code, and the processor invokes the program code to perform the method described in the implementations of the second aspect of the present application.
A seventh aspect of the present application provides an application identification system, including a first network device and a second network device.
The first network device is configured to perform the method described in the implementations of the first aspect of the present application.
The second network device is configured to perform the method described in the implementations of the second aspect of the present application.
The second network device is configured to send application correlation information to the first network device.
An eighth aspect of the present application provides a computer storage medium storing instructions that, when executed on a computer, cause the computer to perform the method described in the implementations of the first aspect and/or the second aspect of the present application.
A ninth aspect of the present application provides a computer program product that, when executed on a computer, causes the computer to perform the method described in the implementations of the first aspect and/or the second aspect of the present application.
As can be seen from the above technical solutions, the embodiments of the present application have the following advantages:
The first network device obtains the data to be detected, processes it to obtain first feature regions, and determines the application category of the data to be detected according to the obtained application correlation information and the first feature regions. The data does not need to be parsed in plaintext, which improves the security of user data.
Description of the drawings
FIG. 1 is a schematic diagram of a network architecture in an embodiment of this application;
FIG. 2 is a schematic flowchart of a data processing method in an embodiment of this application;
FIG. 3 is a schematic flowchart of another data processing method in an embodiment of this application;
FIG. 4 is a schematic structural diagram of a network device in an embodiment of this application;
FIG. 5 is a schematic structural diagram of another network device in an embodiment of this application;
FIG. 6 is a schematic structural diagram of another network device in an embodiment of this application;
FIG. 7 is a schematic structural diagram of another network device in an embodiment of this application;
FIG. 8 is a schematic structural diagram of another network device in an embodiment of this application.
Detailed description
The embodiments of this application provide a data processing method and a device therefor, for use in application identification of pipe data. Data to be detected is obtained from the code stream in the pipe and processed to obtain first feature regions, and the application category of the data to be detected is determined according to application correlation information and the first feature regions. The data does not need to be parsed in plaintext, which improves the security of user data.
Refer to FIG. 1, which is a schematic diagram of the network architecture provided in this application.
The embodiments of this application provide an exemplary network architecture.
The network architecture includes at least a first network device 101.
The first network device 101 may be connected to a network pipe, which is used to transmit data. The network pipe may be a network pipe in a local area network, a network pipe in a wide area network, or a network pipe in another scenario; this is not limited here.
For example, the first network device 101 may be installed between a router and the core network, connected in a wired or wireless manner; it may also be installed between the core network and a firewall, or in a local area network. It is sufficient that the first network device 101 is connected to the network pipe, for example at an aggregation node of network traffic or at a node through which network traffic flows; this is not limited here.
The first network device 101 is configured to generate application correlation information, obtain data to be detected online in real time, and determine the application category of the data to be detected by using the application correlation information.
Specifically, the first network device 101 is configured to identify, according to the application correlation information, the application category corresponding to packets in the data transmission pipe, and to separate the packet traffic belonging to different application types for data analysis. The first network device 101 may be a standalone server, such as a dedicated application identification server, or may be integrated into an existing server, such as a network management server, a network monitoring server, or a traffic management server; the specific form of the server is not limited here.
For example, a base station receives data sent by a terminal and transmits it to a router through the network pipe. After the router relays and distributes the data, the data is transmitted to the core network, which forwards it toward its destination through a firewall until it finally reaches the receiver. In this process, the first network device 101 is connected to the network pipe, for example at an aggregation node of network traffic or at a node through which network traffic flows, and mirrors a portion of the data at the point where data flows through the communication network for application identification and analysis.
It should be noted that, in the embodiments of this application, a single first network device 101 or multiple first network devices 101 may be present in this data transmission scenario; this is not limited here.
Optionally, the network architecture may further include a terminal device 103. When the first network device 101 has determined the application category of the data to be detected, it may send the data of that application category to the terminal device 103, so that the terminal device 103 can receive and then process the data of that application category, for example by displaying it; the specific processing method is not limited here.
It can be understood that the terminal device 103 may be a computer device or another device, such as a network management device; this is not limited here.
Optionally, the network architecture may further include a second network device 102, which may work offline on its own or may be connected to the first network device 101. The second network device 102 may be used on the offline side: it acquires first data used to train a model, trains on the first data to obtain a first model, and obtains the application correlation information according to the trained first model and the first data. The second network device 102 is also configured to send the application correlation information to the first network device 101; this is not limited here.
Optionally, the first network device 101 may also be used on the offline side to obtain the application correlation information, in which case the network architecture does not include the second network device 102.
For ease of understanding, the embodiments of this application give basic explanations of the following terms:
Network pipe: a collective term for the devices that carry network data packets.
Application identification: identifying which application category the traffic in a pipe belongs to, for example APP1 or APP2.
Code stream: the stream of data packets in the network.
Heat map: a visualization that expresses the importance of data through color variation. For example, the brighter the position of a piece of data in the heat map, the greater its influence on the application identification result.
Activation region: a region of the heat map where the data has a relatively large influence, that is, a relatively bright region of the heat map.
Dial testing (拔测): similar to a web crawler; captures data packet information from the network.
The data processing method in the embodiments of this application is described below with reference to the data transmission framework of FIG. 1.
For ease of description, the embodiments of this application use a first network device and a second network device in place of the generic network device as an example.
In the embodiments of this application, the first network device may obtain the application correlation information by training a first model and then use it to determine the application category corresponding to packets in the pipe data. Alternatively, it may receive application correlation information sent by another network device and use that information to determine the application category corresponding to packets in the pipe data. There are therefore several specific implementations of the embodiments of this application, which are described separately below.
1. The first network device generates the application correlation information.
Refer to FIG. 2, which is a schematic flowchart of an embodiment of the data processing method provided in this application.
It should be noted that this embodiment can be divided into an online side and an offline side. The online side identifies, online and in real time, the application category corresponding to an online data stream. The offline side builds and trains a model to obtain the application correlation information, which the online side can then use to identify the application category corresponding to an online data stream. The offline side is described first.
In step 201, the second network device acquires first data.
The second network device acquires a data stream corresponding to the first application category, where the data stream includes byte data corresponding to the first application category. Optionally, the second network device may acquire the data stream from the pipe data by dial testing, or other devices may collect multiple data streams and then send them together to the second network device; this is not limited here.
When the second network device acquires a data stream corresponding to one application category, the acquired data stream may be as follows:
Data stream 1: 82 0a 2a 2e 67 76 74 32 …… =====> app1
It should be understood that the second network device may also acquire data streams corresponding to multiple application categories, as follows:
Data stream 2: 53 88 01 bb b8 bc 6a 14 …… =====> app2
Data stream 3: 29 6f e5 6d d3 9c 80 10 …… =====> app1
Unless otherwise stated in the embodiments of this application, the description takes the second network device acquiring and processing data of the first application category as an example. It should be understood that acquiring and processing data of multiple application categories is similar, and this does not limit the present application.
A data stream is transmitted in the pipe as binary data. When the data stream is acquired, it may be displayed in binary or converted to hexadecimal; this is not limited here, and this application uses hexadecimal display as an example. The second network device extracts the first K bytes of the data stream; these K bytes include data related to basic information, such as the data corresponding to address information and to domain name information. For example, if K=784, then when the second network device acquires a data stream from the pipe data, it extracts the first 784 bytes of the data stream. In addition, this application uses the byte as the unit of description; other units such as the bit may also be used, and this is not limited here.
It should be noted that, in practice, the second network device may also skip extracting the first K bytes and use the whole data stream for subsequent processing; this is not limited here.
It should be noted that when the second network device extracts the first K bytes of the data stream, the first data includes those first K bytes; when the second network device does not truncate the data stream, the first data is the data stream itself.
It should be noted that when byte data of multiple data streams exists, the first data includes the byte data of the multiple data streams.
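As a small illustration of the extraction step, the sketch below parses a hexadecimal dump like the data streams shown above and keeps its first K bytes. The function name `first_k_bytes` is hypothetical, and the behaviour for streams shorter than K (returned as-is, without padding) is an assumption, since the text does not specify it.

```python
def first_k_bytes(hex_dump, k):
    """Parse a space-separated hex dump and keep at most the first k bytes."""
    raw = bytes.fromhex(hex_dump)  # bytes.fromhex ignores ASCII whitespace
    return raw[:k]  # shorter streams are returned unchanged (assumption)


# The visible prefix of data stream 1; the elided remainder is not reproduced.
stream_1 = "82 0a 2a 2e 67 76 74 32"
first_data = first_k_bytes(stream_1, 4)
```

With K=784 as in the example above, the same call would be `first_k_bytes(hex_dump, 784)`.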
In step 202, the second network device trains the first model.
The second network device builds a multi-layer convolutional neural network. It can be understood that the network may have three layers or five layers, and may be a VGG-type or ResNet-type neural network; this is not limited here.
For example, when the convolutional neural network has five layers, its structure is, in order, an input layer, a first hidden layer, a second hidden layer, a third hidden layer, and an output layer. The number of nodes in the input layer equals K, the same K as the number of leading bytes that the second network device extracts from a packet.
The number of nodes in the output layer of the neural network equals the number of application categories. When the model is trained on data of one application category, the output layer has one node; when the model is trained on data of multiple application categories, the output layer has the corresponding number of nodes.
A convolution operation is applied to the input-layer data, followed by the rectified linear unit (ReLU) activation function, to generate the first hidden layer; a convolution operation followed by the ReLU activation function is applied to the first hidden layer to generate the second hidden layer; a global average pooling (GAP) operation is applied to the second hidden layer to generate the third hidden layer; and a fully connected operation followed by the normalized exponential (softmax) activation function is applied to the third hidden layer to generate the output-layer data.
The acquired first data is input into the constructed neural network, that is, into the first model. Optionally, before the first data is input into the first model, it may be normalized to obtain normalized data for training the first model. For example, normalization can be performed as follows:
Normalized data = first data / 255
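Because each byte takes a value from 0 to 255, dividing by 255 maps every feature point into the interval [0, 1]. A minimal sketch, with the hypothetical function name `normalize`:

```python
def normalize(first_data):
    """Scale each byte value (0..255) into the interval [0, 1]."""
    return [b / 255 for b in first_data]


normalized = normalize(bytes([0, 255, 51]))
```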
After the first data is input to the first model, a forward pass through the first model yields the corresponding predicted category. The cross-entropy loss between the predicted category and the first application category is computed, gradient descent is performed, and the model parameters are updated. Training of the first model is complete when the maximum number of training iterations is reached or when the accuracy of the predicted category reaches a preset threshold. It should be understood that once training is complete, when data corresponding to the first application category is input to the first model, the output of the first model is the first application category.
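For reference, the cross-entropy loss used in this training step can be sketched for a single sample as follows. The function names and the use of raw output scores (logits) as input are assumptions of the sketch; the gradient-descent update itself is omitted.

```python
import math


def softmax(logits):
    """Normalized exponential function, shifted by the maximum for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_index):
    """Negative log-probability that the model assigns to the true category."""
    return -math.log(softmax(logits)[true_index])


# Illustrative scores for two categories, with the true category at index 0.
confident = cross_entropy([4.0, 0.0], 0)
uncertain = cross_entropy([0.0, 0.0], 0)
```

The loss shrinks as the model assigns more probability to the true category, which is what the parameter updates aim for.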
In step 203, the second network device obtains h first feature values based on the first application category and the first model.
The second network device obtains h first feature values based on the first application category and the first model. A first feature value represents the correlation between a first feature point in the first data and the first application category, where a first feature point is one byte of data; the larger the feature value corresponding to a first feature point, the higher its correlation with the first application category.
It should be understood that a first feature point may also refer to multiple bytes of data, for example 2 bytes; this does not limit the present application.
The second network device may obtain the h first feature values in multiple ways. For example, the second network device obtains connection weight values based on the first application category and the data of the last hidden layer in the architecture of the first model, multiplies the connection weight values by the corresponding values of the penultimate hidden layer to obtain weighted feature information, sums the weighted feature information, and upsamples the result to the size of the first data to obtain the h first feature values.
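The weighting, summation, and upsampling just described resemble a class-activation-map style computation. The sketch below illustrates it under stated assumptions: `feature_maps` stands for the channels of the penultimate hidden layer, `class_weights` for the connection weights that link the pooled channels to the output node of the first application category, and nearest-neighbour upsampling is chosen only for simplicity. None of these names or choices is fixed by the text.

```python
def class_activation(feature_maps, class_weights, out_len):
    """Weighted sum of feature maps, upsampled to out_len feature values.

    feature_maps:  one list of activations per channel, all of equal length.
    class_weights: one weight per channel for the category of interest.
    """
    length = len(feature_maps[0])
    cam = [
        sum(w * fm[i] for w, fm in zip(class_weights, feature_maps))
        for i in range(length)
    ]
    # Nearest-neighbour upsampling back to the input length (assumption).
    return [cam[i * length // out_len] for i in range(out_len)]


# Illustrative values: two channels of length 2, upsampled to 4 feature values.
values = class_activation([[1.0, 0.0], [0.0, 2.0]], [0.5, 1.0], 4)
```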
It can be understood that, under different model architectures, the h first feature values may be obtained in different ways. This embodiment is an illustrative example and does not limit how the h first feature values are obtained.
In step 204, the second network device obtains z target feature points according to the h first feature values.
After obtaining the h first feature values, the second network device takes the z largest of them, where z is a positive integer less than or equal to h, and obtains the z feature points corresponding to these z feature values. These z feature points are the z target feature points.
For example, suppose the h (h=10) first feature values are 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 0.95. Taking the z (z=5) largest feature values gives 0.6, 0.7, 0.8, 0.9, and 0.95, whose corresponding feature points in the first data are 5E, 1C, B2, E0, and A6 respectively; that is, the z (z=5) target feature points are 5E, 1C, B2, E0, and A6.
In practice, different feature points may have the same feature value, for example:
Suppose the h (h=10) first feature values are 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.9, and 0.95. Taking the z (z=5) largest feature values gives 0.7, 0.8, 0.9, 0.9, and 0.95, whose corresponding feature points in the first data are 6E, 3C, B5, E2, and A7 respectively; that is, the z (z=5) target feature points are 6E, 3C, B5, E2, and A7.
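Selecting the target feature points amounts to a top-z selection over the feature values. In the sketch below, the function name, the use of list positions as byte offsets, and the tie-breaking rule (the earlier position wins when values are equal) are assumptions; the text does not say how ties at the cut-off are resolved.

```python
def top_z_points(values, z):
    """Return the positions of the z largest feature values, in position order."""
    # sorted() is stable, so equal values keep their original relative order.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    return sorted(order[:z])


# The two sets of feature values from the examples above, listed by position.
first = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
with_tie = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.9, 0.95]
targets = top_z_points(first, 5)
targets_tie = top_z_points(with_tie, 5)
```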
In step 205, the second network device obtains n second feature regions based on the z target feature points.
After obtaining the z target feature points, the second network device extracts from the first data a run of one or more consecutive feature points that includes at least one target feature point, which gives one second feature region. Repeating this yields n second feature regions, where n is a positive integer greater than or equal to z. The description below takes a second feature region that includes q consecutive feature points as an example, where q is an integer greater than or equal to 1. It should be understood that q may differ between second feature regions.
Optionally, the center point of a second feature region is a target feature point.
For example, each of the z target feature points in the first data is taken as a center point, and a feature points are extracted on each side of it, where a ∈ [B1, B2]. In total, z*(B2-B1+1) second feature regions are constructed, and each second feature region contains (2a+1) feature points. It should be understood that n=z*(B2-B1+1) and q=2a+1.
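As a hedged sketch of this construction, the following Python function cuts one window per pair of target feature point and half-width a. The names `second_feature_regions`, `b1`, and `b2` are illustrative, the sample bytes and target positions are made up, and skipping windows that would run past either end of the first data is an assumption the text does not address.

```python
def second_feature_regions(data, target_indices, b1, b2):
    """Cut windows of 2a+1 bytes centered on each target index, for a in [b1, b2]."""
    regions = []
    for center in target_indices:
        for a in range(b1, b2 + 1):
            lo, hi = center - a, center + a
            if lo < 0 or hi >= len(data):
                continue  # assumption: drop windows that do not fit in the data
            regions.append(data[lo:hi + 1])
    return regions


# Hypothetical first data and target positions, for illustration only.
first_data = bytes.fromhex("53 88 01 bb b8 bc 6a 1e 08 0a")
for region in second_feature_regions(first_data, [4, 5], b1=1, b2=2):
    print(region.hex(" "))
# bb b8 bc
# 01 bb b8 bc 6a
# b8 bc 6a
# bb b8 bc 6a 1e
```

Two target points and two half-widths give four regions in this run, each 2a+1 bytes long.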
It should be noted that steps 203 to 205 are performed on a single data stream. When there are multiple data streams, steps 203 to 205 are repeated for each of them. For data streams of multiple application categories, steps 203 to 205 are performed separately on the data streams corresponding to each application category.
In step 206, the second network device determines the region correlation between the second feature region and the first application category.
After processing the multiple data streams, the second network device further determines the region correlation between each second feature region and the application category corresponding to that second feature region.
For the case where the multiple data streams all correspond to the first application category: the second network device counts, among the n second feature regions, how many times each identical second feature region occurs, and from this obtains the probability that the second feature region occurs among the n second feature regions of the first application category. This probability is the region correlation between the second feature region and the first application category, as shown in Table 1a:
Figure PCTCN2020129007-appb-000001
Table 1a
Take the second feature region "82, 0a, 2a, 2e, 67, 76, 74, 32" in Table 1a as an example. This second feature region occurs 80 times, and it corresponds to the first application category app1. The total number of second feature regions of the first application category is 100, so the region correlation between this second feature region and its first application category is 80/100, that is, 0.8.
For the case where the multiple data streams correspond to multiple application categories: for each application category, the second network device counts how many times each identical second feature region occurs among the second feature regions of that category, and from this obtains the probability that the second feature region occurs among all second feature regions of its application category. This probability is the region correlation between the second feature region and its application category, as shown in Table 1b:
Figure PCTCN2020129007-appb-000002
Table 1b
Take the second feature region "53, 99, 55, b9, b4, b8, 3a, 25" in Table 1b as an example. This second feature region occurs 30 times, and it corresponds to the second application category app2. The total number of second feature regions of the second application category app2 is 300, so the region correlation between this second feature region and the second application category app2 is 30/300, that is, 0.1. It should be understood that, in this case, the second network device may count and calculate the second feature regions of each application category separately.
It should be noted that, in practice, a feature region may also have a category preference weight, which is set when the region correlation is calculated. In that case the region correlation may instead be calculated as (number of times the feature region occurs among all feature regions of its application category * category preference weight) / number of all feature regions of that application category. This is not specifically limited here.
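The counting described for Tables 1a and 1b, including the optional category preference weight, can be sketched as follows. This is an illustration under assumptions: the function name `region_correlations`, the dictionary layout, and the filler regions "other-1" and "other-2" (which stand in for the remaining regions of each category) are not from the application.

```python
from collections import Counter


def region_correlations(regions_by_app, weights=None):
    """Map (region, app) to occurrences * weight / total regions of that app.

    regions_by_app: {app: [region, region, ...]} with repeats kept.
    weights: optional {(region, app): category preference weight}; default 1.0.
    """
    weights = weights or {}
    result = {}
    for app, regions in regions_by_app.items():
        total = len(regions)
        for region, count in Counter(regions).items():
            weight = weights.get((region, app), 1.0)
            result[(region, app)] = count * weight / total
    return result


region_a = "82,0a,2a,2e,67,76,74,32"
region_b = "53,99,55,b9,b4,b8,3a,25"
observed = {
    "app1": [region_a] * 80 + ["other-1"] * 20,    # 100 regions in total
    "app2": [region_b] * 30 + ["other-2"] * 270,   # 300 regions in total
}
table = region_correlations(observed)
print(table[(region_a, "app1")])  # 0.8
print(table[(region_b, "app2")])  # 0.1
```

Each category is normalized by its own total, so adding streams for one category does not change the correlations of another.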
Optionally, when the second network device counts the number of occurrences in the first data of each of the n second feature regions and the application category to which each belongs, the n second feature regions may include a sixth feature region and a fourth feature region. If the proportion of feature points shared by the sixth feature region and the fourth feature region is greater than a first preset threshold, and the sixth feature region occurs more times than the fourth feature region among the feature regions of the first application category, the second network device deletes the information of the fourth feature region. It should be noted that if the two feature regions occur the same number of times, either one may be deleted; this is not specifically limited here.
When two feature regions are highly similar, their positions in the first data are determined to be close to each other. Deleting the one that occurs less often therefore helps improve accuracy when the region correlation is calculated.
Optionally, when the second network device counts the number of occurrences in the first data of each of the n second feature regions and the application category to which each belongs, if the n second feature regions further include a fifth feature region that corresponds to two or more application categories, the fifth feature region is deleted.
When a feature region corresponds to two or more application categories, the feature points of that feature region are not a basic feature of any single application category. Deleting the feature region therefore improves the efficiency of classifying data streams online.
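The two optional deletions can be sketched as below. Several choices here are assumptions rather than anything the text fixes: the overlap measure (shared bytes divided by the length of the shorter region), the restriction of the similarity check to regions of the same application category, the order of the two passes, and the names `overlap_ratio` and `prune_regions`.

```python
from collections import Counter


def overlap_ratio(a, b):
    """Assumed measure: bytes shared by both regions over the shorter length."""
    shared = sum((Counter(a) & Counter(b)).values())
    return shared / min(len(a), len(b))


def prune_regions(counts, threshold):
    """counts: {(region_bytes, app): occurrences}. Returns a pruned copy."""
    # Pass 1: drop regions that appear under two or more application categories.
    apps_per_region = {}
    for region, app in counts:
        apps_per_region.setdefault(region, set()).add(app)
    kept = {k: v for k, v in counts.items() if len(apps_per_region[k[0]]) == 1}

    # Pass 2: of two highly overlapping regions in one category, keep the
    # more frequent one. Keys are visited from most to least frequent, so
    # the later key of a pair never has the higher count.
    keys = sorted(kept, key=lambda k: kept[k], reverse=True)
    removed = set()
    for i, ki in enumerate(keys):
        if ki in removed:
            continue
        for kj in keys[i + 1:]:
            if kj in removed or ki[1] != kj[1]:
                continue
            if overlap_ratio(ki[0], kj[0]) > threshold:
                removed.add(kj)
    return {k: v for k, v in kept.items() if k not in removed}


# Hypothetical counts, for illustration only.
counts = {
    (bytes.fromhex("080a2a2e"), "app1"): 80,
    (bytes.fromhex("0a2a2e67"), "app1"): 10,  # shares 3 of 4 bytes with the first
    (bytes.fromhex("5388"), "app1"): 5,       # also seen under app2
    (bytes.fromhex("5388"), "app2"): 7,
}
print(prune_regions(counts, threshold=0.5))
# {(b'\x08\n*.', 'app1'): 80}
```

When two overlapping regions have equal counts, this sketch removes whichever comes later in the sorted order, which matches the statement that either one may be deleted.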
In step 207, the second network device generates application correlation information based on the region correlation between the second feature region and the first application category.
It should be understood that if the second network device deletes the fifth feature region and/or the fourth feature region in step 206, the application correlation information does not contain information about that feature region.
After obtaining the region correlation of each of the n second feature regions, the second network device generates the application correlation information from these region correlations.
For the case where the multiple data streams all correspond to the first application category: the application correlation information includes region correlation information of the second feature regions. The region correlation information of a second feature region includes the second feature region, the first application category corresponding to it, and the region correlation between the second feature region and the first application category. For example, the application correlation information may be as shown in Table 2a:
Feature region | Application category and region correlation
08 0a 2a 2e 67 53 24 | <app1, 0.8>
53 88 01 bb b8 bc 6a 1e | <app1, 0.1>
... | ...
Table 2a
For the case where the multiple data streams correspond to multiple application categories: the second network device generates the application correlation information from the region correlation of each of the n second feature regions of the different application categories. The application correlation information includes region correlation information of the second feature regions. The region correlation information of a second feature region includes the second feature region, the application category corresponding to it, and the region correlation between the second feature region and that application category. For example, the application correlation information may be as shown in Table 2b:
Feature region | Application category and region correlation
08 0a 2a 2e 67 53 24 | <app1, 0.8>
53 88 01 bb b8 bc 6a 1e | <app2, 0.2>
... | ...
Table 2b
It can be understood that the application correlation information may also take other forms, as long as it expresses the association between a feature region, its application category, and the region correlation of the feature region under that application category. For example, it may be presented as a heat map, a one-dimensional vector, or a table. This is not specifically limited here.
Steps 201 to 207 describe the offline-side method of this embodiment. The following steps describe the online-side method of this embodiment.
Refer to FIG. 3, which is a schematic flowchart of the online side of this application.
In step 301, the first network device obtains the data to be detected.
When the application correlation information is generated by the second network device, the first network device receives the application correlation information sent by the second network device. When the first network device needs to identify and classify the application category of data packets in pipeline data, it obtains the data to be detected from the pipeline data.
It should be noted that the first network device may obtain the data to be detected through its own dial testing, or may receive the data to be detected from another gateway device. This is not specifically limited here.
In practice, the data to be detected obtained by the first network device may be a binary data packet or a hexadecimal data packet. This is not specifically limited here.
In step 302, the first network device obtains w first feature regions from the data to be detected.
After obtaining the data to be detected, the first network device takes the first K bytes of the packet, that is, the same bytes used when the model was trained on the offline side, and then generates w first feature regions from these K bytes by means of a sliding window, where w is a positive integer greater than or equal to 1.
For example, for the data to be detected 53 88 01 bb b8 bc 6a 1e 08 0a 2a 2e 67 53 24 ..., when the sliding window size range is [6, 10], the window size may be 6, 7, 8, 9, or 10. With a window size of 6 and a sliding step of 1, the following feature regions are generated:
Figure PCTCN2020129007-appb-000003
With a window size of 7 and a sliding step of 1, the following feature regions are generated:
Figure PCTCN2020129007-appb-000004
Continuing in the same way yields a number of feature regions, and these are the first feature regions.
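The sliding-window generation above can be sketched in Python. The function name `sliding_window_regions` and its parameter names are illustrative, and the sample covers only the 15 bytes that the example spells out before its ellipsis.

```python
def sliding_window_regions(data, min_size, max_size, step=1):
    """Return every window of each size in [min_size, max_size], moved by step."""
    regions = []
    for size in range(min_size, max_size + 1):
        # The last valid start is len(data) - size; shorter data yields nothing.
        for start in range(0, len(data) - size + 1, step):
            regions.append(data[start:start + size])
    return regions


data = bytes.fromhex("53 88 01 bb b8 bc 6a 1e 08 0a 2a 2e 67 53 24")
regions = sliding_window_regions(data, 6, 10)
print(len(regions))          # 40
print(regions[0].hex(" "))   # 53 88 01 bb b8 bc
print(regions[10].hex(" "))  # 53 88 01 bb b8 bc 6a
```

For 15 bytes, sizes 6 to 10 give 10 + 9 + 8 + 7 + 6 = 40 windows, so w = 40 for this truncated sample.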
It can be understood that the w first feature regions may also be obtained in other ways, for example through the Aho–Corasick (AC) automaton algorithm or a prefix tree algorithm. This is not specifically limited here.
When the w first feature regions are obtained through the AC automaton algorithm or the prefix tree algorithm, the AC automaton must first be built from the application correlation information and is then used to obtain the w first feature regions. In other words, the w first feature regions are obtained automatically by matching against the feature regions that already exist in the application correlation information.
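A minimal sketch of the prefix-tree variant follows. It is a plain trie that is restarted at every offset of the data, not a full Aho–Corasick automaton (there are no failure links), and the names `build_trie` and `match_regions` are illustrative. The known regions are the two rows of Table 2a.

```python
def build_trie(known_regions):
    """Build a nested-dict trie keyed by byte value; None marks a complete region."""
    root = {}
    for region in known_regions:
        node = root
        for byte in region:
            node = node.setdefault(byte, {})
        node[None] = region  # None cannot collide with byte values 0..255
    return root


def match_regions(data, trie):
    """Return (offset, region) for every known region that occurs in data."""
    found = []
    for start in range(len(data)):
        node = trie
        for byte in data[start:]:
            node = node.get(byte)
            if node is None:
                break  # no known region continues with this byte
            if None in node:
                found.append((start, node[None]))
    return found


known = [
    bytes.fromhex("08 0a 2a 2e 67 53 24"),
    bytes.fromhex("53 88 01 bb b8 bc 6a 1e"),
]
data = bytes.fromhex("53 88 01 bb b8 bc 6a 1e 08 0a 2a 2e 67 53 24")
for offset, region in match_regions(data, build_trie(known)):
    print(offset, region.hex(" "))
# 0 53 88 01 bb b8 bc 6a 1e
# 8 08 0a 2a 2e 67 53 24
```

Only regions already present in the application correlation information can be returned this way, so every match has a correlation entry to look up.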
It can be understood that the feature regions may also be obtained in other ways, as long as a set of byte sequences of different sizes is obtained. This is not specifically limited here.
It should be noted that when the first K bytes of the data stream are not extracted, the first feature regions may be obtained by processing the byte information of the data stream.
In step 303, the first network device determines, from the application correlation information and the first feature regions, the region correlation between each first feature region and its corresponding application category.
As can be seen from the embodiment shown in FIG. 2, the application correlation information includes region correlation information of p third feature regions, where the region correlation information of a third feature region includes the third feature region, the application category corresponding to it, and the region correlation between the third feature region and that application category. When the p third feature regions include at least one of the w first feature regions, the first network device looks up in the application correlation information the region correlation information for each of the w first feature regions, for example the corresponding application category and the region correlation with that category. If no corresponding application category is found for a first feature region, the region correlation of that feature region is 0.
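The lookup can be sketched with a dictionary whose default expresses the "not found, correlation 0" rule. The dictionary layout and the name `lookup_correlations` are assumptions; the two entries are the rows of Table 2b, and the third queried region is made up to show the default.

```python
def lookup_correlations(first_regions, relevance):
    """Return (region, app, correlation); unknown regions get (None, 0.0)."""
    return [(region, *relevance.get(region, (None, 0.0)))
            for region in first_regions]


relevance = {
    bytes.fromhex("08 0a 2a 2e 67 53 24"): ("app1", 0.8),
    bytes.fromhex("53 88 01 bb b8 bc 6a 1e"): ("app2", 0.2),
}
first_regions = [
    bytes.fromhex("53 88 01 bb b8 bc 6a 1e"),
    bytes.fromhex("08 0a 2a 2e 67 53 24"),
    bytes.fromhex("ff ff ff ff ff ff"),  # hypothetical region with no entry
]
for region, app, correlation in lookup_correlations(first_regions, relevance):
    print(region.hex(" "), app, correlation)
# 53 88 01 bb b8 bc 6a 1e app2 0.2
# 08 0a 2a 2e 67 53 24 app1 0.8
# ff ff ff ff ff ff None 0.0
```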
The first network device determines, according to the w first feature regions and the application correlation information, the application category corresponding to each first feature region and the region correlation between the first feature region and its corresponding application category;
the first network device calculates, per application category, the sum of the region correlations of the first feature regions corresponding to that application category; and
the first network device determines that the data to be detected corresponds to the first application category, based on the sum of the region correlations of the first feature regions corresponding to the first application category being the maximum.
The first network device determines, according to the w first feature regions and the application correlation information, the application category corresponding to each first feature region and the region correlation between the first feature region and its corresponding application category. It then calculates, per application category, the sum of the region correlations of the first feature regions corresponding to that category, which gives the total region correlation of each application category, for example as shown in Table 3a:
Figure PCTCN2020129007-appb-000005
Table 3a
Take the application category "app1" in Table 3a as an example. The feature regions corresponding to app1 are "65, 6a, 77, 8e, 67, 6b, 45, 33" and "33, 11, 96, 5e, 6b, 3e, 45, 33", and their region correlations are 0.4 and 0.15, respectively, so the sum of the region correlations for "app1" is 0.55. It should be understood that, in this case, the first network device may count and calculate the region correlations of each application category separately to obtain the total region correlation of each application category.
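The per-category summation can be sketched as follows. The function name `total_correlation_per_app` and the tuple layout are illustrative; the two rows are the app1 entries quoted from Table 3a.

```python
from collections import defaultdict


def total_correlation_per_app(matches):
    """Sum region correlations per application category.

    matches: iterable of (region, app, correlation); entries whose app is
    None (no category found) contribute nothing.
    """
    totals = defaultdict(float)
    for _region, app, correlation in matches:
        if app is not None:
            totals[app] += correlation
    return dict(totals)


matches = [
    ("65,6a,77,8e,67,6b,45,33", "app1", 0.4),
    ("33,11,96,5e,6b,3e,45,33", "app1", 0.15),
]
totals = total_correlation_per_app(matches)
print(round(totals["app1"], 2))  # 0.55
```

The result is rounded only for display, since floating-point sums of decimal fractions are not always exact.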
For example, the total region correlation of the first feature regions for the different application categories is shown in Table 3b:
Application category | Corresponding total region correlation
app1 | 0.85
app2 | 0.6
... | ...
Table 3b
In step 304, the first network device determines that the data to be detected corresponds to the first application category, based on the sum of the region correlations of the first feature regions corresponding to the first application category being the maximum.
After obtaining the total region correlation of the first feature regions for each application category, the first network device determines, based on the sum for the first application category being the maximum, that the data to be detected from which the first feature regions were taken corresponds to the first application category.
Optionally, after determining which sum of region correlations is the maximum, the first network device may further check whether that sum is higher than a preset threshold. If the sum is higher than the preset threshold, the first network device determines that the application category corresponding to the first feature regions is the application category of the data to be detected. If the sum is lower than the preset threshold, the first network device determines that the application category of the data to be detected is not one of the application categories in the application correlation information.
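Step 304 with the optional threshold check can be sketched as below. The name `classify` and the threshold values are hypothetical, and treating a sum exactly equal to the threshold as "not in the application correlation information" is an assumption, since the text only covers the higher and lower cases. The totals are those of Table 3b.

```python
def classify(totals, threshold=0.0):
    """Return the category with the largest total, or None if it does not
    exceed the threshold (or if there are no totals at all)."""
    if not totals:
        return None
    app, score = max(totals.items(), key=lambda item: item[1])
    return app if score > threshold else None


totals = {"app1": 0.85, "app2": 0.6}
print(classify(totals, threshold=0.5))  # app1
print(classify(totals, threshold=0.9))  # None
```

A `None` result corresponds to the case where the data to be detected belongs to a category that the application correlation information does not yet cover.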
After determining the application category of the data to be detected, the first network device may display the result in its display area, or may send the result to another device, such as a terminal device of operation and maintenance personnel.
It should be noted that when the first network device determines that the application category of the data to be detected is not one of the application categories in the application correlation information, application correlation information for that application category may be generated through steps 201 to 207. This new information may then be merged with the existing application correlation information to form updated application correlation information.
In this embodiment, steps 201 to 207 may also be performed by the first network device. In that case, when the first network device needs the application correlation information in step 301, it obtains the application correlation information directly.
In this embodiment, the first K bytes of a data stream are extracted, for example the first 1024 bytes. The first 1024 bytes of a data stream contain binary ciphertext information such as IP information, DNS information, and port information. Because this information reflects certain characteristics of the application category, the application correlation information is generated from the binary data of this information and is then used to identify the application category of packet data in the pipeline data, which improves the accuracy with which the first network device identifies application categories.
The data processing method in the embodiments of this application is described above. The network device in the embodiments of this application is described below. Refer to FIG. 4, which is a schematic structural diagram of an embodiment of the network device provided by this application.
An obtaining unit 401, configured to obtain data to be detected;
a processing unit 402, configured to obtain w first feature regions from the data to be detected, where a first feature region includes at least one byte of the data to be detected and w is a positive integer; and
a determining unit 403, configured to determine the application category of the data to be detected according to the w first feature regions and application correlation information, where the application correlation information indicates the correlation between the first feature regions and application categories.
In this embodiment, the operations performed by the units of the network device are similar to those described in the embodiment shown in FIG. 2, and are not repeated here.
Refer to FIG. 5, which is a schematic structural diagram of another embodiment of the network device provided by this application.
An obtaining unit 501, configured to obtain data to be detected;
a processing unit 503, configured to obtain w first feature regions from the data to be detected, where a first feature region includes at least one byte of the data to be detected and w is a positive integer; and
a determining unit 505, configured to determine the application category of the data to be detected according to the w first feature regions and application correlation information, where the application correlation information indicates the correlation between the first feature regions and application categories.
The determining unit 505 is specifically configured to determine, according to the w first feature regions and the application correlation information, the application category corresponding to each first feature region and the region correlation between the first feature region and its corresponding application category;
a statistics unit 504 is configured to calculate, per application category, the sum of the region correlations of the first feature regions corresponding to that application category; and
the determining unit 505 is further configured to determine that the data to be detected corresponds to the first application category, based on the sum of the region correlations of the first feature regions corresponding to the first application category being the maximum.
Optionally, the application correlation information includes correlation information of p third feature regions, where the correlation information of a third feature region includes the third feature region, the application category corresponding to it, and the region correlation between the third feature region and that application category; the p third feature regions include at least one of the w first feature regions.
Optionally, the data to be detected includes the first K bytes of at least one packet;
the processing unit 503 is specifically configured to apply sliding-window processing to the first K bytes of the at least one packet to obtain the w first feature regions.
Optionally, a first feature region includes s consecutive bytes, where s is an integer greater than 1.
Optionally, the obtaining unit 501 is further configured to obtain first data, where the first data includes byte data corresponding to the first application category;
the network device further includes:
an input unit 502, configured to input the first data into a first model, where the output of the first model is the first application category;
the processing unit 503 is further configured to obtain n second feature regions based on the first application category and the first model, where a second feature region includes q adjacent bytes of the first data, n is a positive integer, and q is a positive integer;
the determining unit 505 is further configured to determine the region correlation between the second feature region and the first application category;
the network device further includes:
a generating unit 506, configured to generate the application correlation information based on the region correlation between the second feature region and the first application category.
Optionally, the application correlation information includes second feature region correlation information, which includes the second feature region, the first application category corresponding to the second feature region, and the region correlation between the second feature region and the first application category;
the n second feature regions include at least one of the w first feature regions, and the application category of the data to be detected is the first application category.
In this embodiment, the operations performed by the units of the network device are similar to those described in the embodiments shown in FIG. 2 and FIG. 3, and are not repeated here.
Refer to FIG. 6, which is a schematic structural diagram of another embodiment of the network device provided by this application.
An obtaining unit 601, configured to obtain first data, where the first data includes byte data corresponding to the first application category;
an input unit 602, configured to input the first data into a first model, where the output of the first model is the first application category;
a processing unit 603, configured to obtain n second feature regions based on the first application category and the first model, where a second feature region includes q adjacent bytes of the first data, n is a positive integer, and q is a positive integer;
a determining unit 604, configured to determine the region correlation between the second feature region and the first application category; and
a generating unit 605, configured to generate application correlation information based on the region correlation between the second feature region and the first application category.
In this embodiment, the operations performed by the units of the network device are similar to those described in the embodiment shown in FIG. 3, and are not repeated here.
Refer to FIG. 7, which is a schematic structural diagram of another embodiment of the network device provided by this application.
An obtaining unit 701, configured to obtain first data, where the first data includes byte data corresponding to a first application category;
An input unit 702, configured to input the first data into a first model, where the output of the first model is the first application category;
A processing unit 703, configured to obtain n second characteristic regions based on the first application category and the first model, where each second characteristic region includes q adjacent bytes of the first data, and n and q are positive integers;
A determining unit 704, configured to determine the regional correlation between the second characteristic region and the first application category;
A generating unit 705, configured to generate application relevance information based on the regional correlation between the second characteristic region and the first application category.
Optionally, the application relevance information includes second characteristic region relevance information, which includes the second characteristic region, the first application category corresponding to the second characteristic region, and the regional correlation between the second characteristic region and the first application category.
Optionally, the processing unit 703 is specifically configured to obtain h first feature values based on the first application category and the first model, where each first feature value indicates the correlation between the first application category and a first feature point in the first data, each first feature point includes one byte of the first data, and h is a positive integer;
The processing unit 703 is specifically configured to obtain the n second characteristic regions according to the h first feature values.
Optionally, the obtaining unit 701 is further configured to obtain z target feature points according to the h first feature values, where the feature value of each target feature point is one of the first z feature values when the h first feature values are sorted in descending order, and z is a positive integer less than or equal to h;
The processing unit 703 is specifically configured to obtain the n second characteristic regions according to the z target feature points, where each second characteristic region contains at least one target feature point.
Optionally, the midpoint of the second characteristic region is the target feature point.
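As a non-limiting illustration, the selection of target feature points and the construction of second characteristic regions around them might be sketched as follows. The function names, the use of a plain list of per-byte feature values, and the handling of regions near the ends of the data (the window is shifted so that it stays inside the first data) are assumptions made for this sketch; the application does not prescribe them. How the first model produces the h first feature values is outside the sketch.

```python
from typing import List, Tuple


def top_z_points(feature_values: List[float], z: int) -> List[int]:
    """Return the byte offsets of the z largest feature values (1 <= z <= h)."""
    h = len(feature_values)
    if not 1 <= z <= h:
        raise ValueError("z must be a positive integer no greater than h")
    # Sort byte offsets by feature value, largest first, and keep the first z.
    order = sorted(range(h), key=lambda i: feature_values[i], reverse=True)
    return order[:z]


def region_around(data: bytes, center: int, q: int) -> Tuple[int, bytes]:
    """Return (start offset, q adjacent bytes) with `center` at the midpoint.

    Assumption: near the start or end of the data the window is shifted so
    that it still holds q bytes; for even q the centre sits just right of
    the exact middle.
    """
    if not 1 <= q <= len(data):
        raise ValueError("q must be between 1 and len(data)")
    start = center - q // 2
    start = max(0, min(start, len(data) - q))
    return start, data[start:start + q]


def second_regions(data: bytes, feature_values: List[float],
                   z: int, q: int) -> List[Tuple[int, bytes]]:
    """Build one q-byte region per target feature point."""
    return [region_around(data, c, q) for c in top_z_points(feature_values, z)]
```

In this sketch n equals z because one region is built per target feature point; an implementation could equally merge nearby target feature points into a single region, which the text above also permits.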
Optionally, the n second characteristic regions include a sixth characteristic region and a fourth characteristic region. If the proportion of feature points shared by the sixth characteristic region and the fourth characteristic region is greater than a first preset threshold, and the sixth characteristic region appears among the characteristic regions of its corresponding application category in the first application category more times than the fourth characteristic region appears among the characteristic regions of its corresponding application category in the first application category, the network device further includes:
A processing unit 703, configured to delete the information of the fourth characteristic region.
Optionally, the n second characteristic regions include a fifth characteristic region. If the fifth characteristic region corresponds to at least two application categories in the application relevance information, the processing unit 703 is further configured to delete the information of the fifth characteristic region.
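As a non-limiting illustration, the two optional deletion rules above might be sketched as follows. The `Region` record, the definition of the overlap proportion (shared byte offsets divided by the length of the shorter region), and the order in which the two rules are applied are assumptions of this sketch, not features stated in the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Region:
    start: int      # byte offset of the region within the first data
    content: bytes  # the q adjacent bytes of the region
    category: str   # application category the region was derived for
    count: int      # times the region appeared among that category's regions


def overlap_ratio(a: Region, b: Region) -> float:
    """Proportion of shared byte offsets (assumed definition of 'repeated')."""
    a_pos = set(range(a.start, a.start + len(a.content)))
    b_pos = set(range(b.start, b.start + len(b.content)))
    return len(a_pos & b_pos) / min(len(a_pos), len(b_pos))


def prune(regions: List[Region], threshold: float) -> List[Region]:
    # Rule for the "fifth" region: drop content tied to two or more categories.
    cats: Dict[bytes, Set[str]] = {}
    for r in regions:
        cats.setdefault(r.content, set()).add(r.category)
    kept = [r for r in regions if len(cats[r.content]) == 1]

    # Rule for the "sixth"/"fourth" regions: when two regions of the same
    # category overlap by more than the threshold, drop the less frequent one.
    dropped = set()
    for i, a in enumerate(kept):
        for j, b in enumerate(kept):
            if i == j or a.category != b.category:
                continue
            if overlap_ratio(a, b) > threshold and a.count > b.count:
                dropped.add(j)
    return [r for k, r in enumerate(kept) if k not in dropped]
```

Both rules keep the application relevance information small and unambiguous: the first retains only the most frequently observed of several near-duplicate regions, and the second discards regions that cannot distinguish one application category from another.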
In this embodiment, the operations performed by the units of the network device are similar to those described in the foregoing embodiments shown in FIG. 2 and FIG. 3, and are not repeated here.
Referring to FIG. 8, another embodiment of the network device in the embodiments of this application includes:
The network device or the media server in the embodiments of this application may also be implemented as the computer device (or system) in FIG. 8. FIG. 8 is a schematic diagram of a computer device provided by an embodiment of this application. The computer device includes at least one processor 801, a communication bus 802, and a memory 803, and may further include at least one communication interface 804 and an I/O interface 805.
The processor may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solutions of this application.
The communication bus may include a path for transferring information between the foregoing components. The communication interface uses any transceiver-type apparatus to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The memory may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, or may be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other compact disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through the bus. The memory may also be integrated with the processor.
The memory is configured to store the application program code for executing the solutions of this application, and execution is controlled by the processor. The processor is configured to execute the application program code stored in the memory.
In a specific implementation, the processor may include one or more CPUs, and each CPU may be a single-core processor or a multi-core processor. The processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
In a specific implementation, as an embodiment, the computer device may further include an input/output (I/O) interface. For example, the output device may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. The input device may be a mouse, a keyboard, a touch screen device, or a sensor device.
The foregoing computer device may be a general-purpose computer device or a special-purpose computer device. In a specific implementation, the computer device may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, an embedded device, or a device with a structure similar to that in FIG. 7. The embodiments of this application do not limit the type of the computer device.
The first network device, the second network device, or the terminal device in FIG. 1, FIG. 2, or FIG. 3 may be the device shown in FIG. 8, with one or more software modules stored in the memory. The network device and the terminal device may implement the software modules through the processor and the program code in the memory, to complete the methods performed by the network device or the terminal device in the foregoing embodiments.
In this embodiment, the processor 801 may perform the operations performed by the first network device or the second network device in the foregoing embodiments shown in FIG. 2 and FIG. 3, and details are not repeated here.
The embodiments of this application further provide a system for identifying applications. The system includes a first network device and a second network device.
The first network device is configured to perform the method performed by the first network device in the embodiment shown in FIG. 3, and details are not repeated here.
The second network device is configured to perform the method performed by the second network device in the embodiment shown in FIG. 2, and details are not repeated here.
In addition, the second network device is further configured to send the application relevance information to the first network device.
In a possible design, the first network device is further configured to send the application category corresponding to the data to be detected to a terminal device.
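As a non-limiting illustration, the way the first network device might use the application relevance information received from the second network device can be sketched as follows: first characteristic regions are extracted from the first K bytes of a message with a sliding window, the regional correlations of the matching regions are summed per application category, and the category with the largest sum is selected. The function names, the stride of one byte, and the representation of the application relevance information as a dictionary from region bytes to a (category, correlation) pair are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def sliding_regions(message: bytes, k: int, s: int) -> List[bytes]:
    """Return every s-byte window (stride 1) over the first k bytes."""
    head = message[:k]
    # If fewer than s bytes are available the range is empty.
    return [head[i:i + s] for i in range(len(head) - s + 1)]


def classify(regions: List[bytes],
             relevance: Dict[bytes, Tuple[str, float]]) -> Optional[str]:
    """Sum regional correlations per category and return the largest."""
    totals: Dict[str, float] = defaultdict(float)
    for region in regions:
        hit = relevance.get(region)
        if hit is not None:
            category, correlation = hit
            totals[category] += correlation
    if not totals:
        return None  # no first characteristic region matched
    return max(totals, key=totals.get)
```

A short usage example under the same assumptions, with made-up relevance entries:

```python
relevance = {
    b"GET ": ("web", 0.9),
    b"HTTP": ("web", 0.7),
    b"RTSP": ("video", 0.8),
}
regions = sliding_regions(b"GET /a HTTP/1.1\r\n", k=16, s=4)
category = classify(regions, relevance)  # "web" for this input
```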
The embodiments of this application further provide a computer-readable storage medium storing a computer program. When the computer program is executed by a computer, the method procedures related to the network device in any of the foregoing method embodiments are implemented.
It should be understood that the processor mentioned for the network device in the foregoing embodiments of this application, or the processor provided in the foregoing embodiments of this application, may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It should also be understood that the network device in the foregoing embodiments of this application may have one or more processors, and the number may be adjusted according to the actual application scenario; this is merely an example and is not limiting. Likewise, the number of memories in the embodiments of this application may be one or more and may be adjusted according to the actual application scenario; this is merely an example and is not limiting.
It should also be understood that the memory or readable storage medium mentioned for the network device in the foregoing embodiments of this application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
It should also be noted that when the network device includes a processor (or processing unit) and a memory, the processor in this application may be integrated with the memory, or the processor and the memory may be connected through an interface. This may be adjusted according to the actual application scenario and is not limited.
The embodiments of this application further provide a computer program or a computer program product including a computer program. When the computer program is executed on a computer, it causes the computer to implement the method procedures related to the network device in any of the foregoing method embodiments.
The embodiments shown in FIG. 2 and FIG. 3 may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium on which a computer can store data, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or another network device) to perform all or some of the steps of the methods described in the embodiments in FIG. 2 to FIG. 6 of this application. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The terms "first", "second", and the like in the description, claims, and accompanying drawings of this application are used to distinguish similar objects. To describe the technical solutions of the embodiments of this application clearly, words such as "first" and "second" are used to distinguish identical or similar items that have basically the same function and effect. Those skilled in the art can understand that words such as "first" and "second" do not limit quantity or execution order, and do not necessarily indicate that the items are different. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the manner of distinguishing objects with the same attributes when describing the embodiments of this application. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that contains a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to the process, method, product, or device.
The names of messages/frames/information, modules, units, and the like provided in the embodiments of this application are merely examples. Other names may be used as long as the functions of the messages/frames/information, modules, units, and the like are the same.
The terms used in the embodiments of this application are only for the purpose of describing specific embodiments and are not intended to limit the present invention. The singular forms "a", "said", and "the" used in the embodiments of this application are also intended to include plural forms, unless the context clearly indicates otherwise. It should also be understood that, in the description of this application, unless otherwise specified, "/" indicates an "or" relationship between the associated objects; for example, A/B may mean A or B. In this application, "and/or" merely describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may mean that A exists alone, that both A and B exist, or that B exists alone, where A and B may each be singular or plural.
Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
In conclusion, the foregoing embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.

Claims (30)

  1. A data processing method, characterized in that it comprises:
    obtaining, by a first network device, data to be detected;
    obtaining, by the first network device, w first characteristic regions according to the data to be detected, where each first characteristic region includes at least one byte of the data to be detected, and w is a positive integer;
    determining, by the first network device, the application category corresponding to the data to be detected according to the w first characteristic regions and application relevance information, where the application relevance information indicates the correlation between the first characteristic regions and application categories.
  2. The method according to claim 1, characterized in that the determining, by the first network device, the application category corresponding to the data to be detected according to the w first characteristic regions and the application relevance information specifically comprises:
    determining, by the first network device according to the w first characteristic regions and the application relevance information, the application category corresponding to each first characteristic region and the regional correlation between the first characteristic region and the corresponding application category;
    calculating, by the first network device for each application category, the sum of the regional correlations of the first characteristic regions corresponding to that application category;
    determining, by the first network device, that the data to be detected corresponds to a first application category on the basis that the sum of the regional correlations of the first characteristic regions corresponding to the first application category is the maximum.
  3. The method according to claim 1 or 2, characterized in that the application relevance information includes regional relevance information of p third characteristic regions, where the regional relevance information of a third characteristic region includes the third characteristic region, the application category corresponding to the third characteristic region, and the regional correlation between the third characteristic region and the corresponding application category; and the p third characteristic regions include at least one of the w first characteristic regions.
  4. The method according to any one of claims 1 to 3, characterized in that the data to be detected includes the first K bytes of at least one message;
    the obtaining, by the first network device, w first characteristic regions according to the data to be detected specifically comprises:
    performing, by the first network device, sliding window processing on the first K bytes of the at least one message to obtain the w first characteristic regions.
  5. The method according to any one of claims 1 to 4, characterized in that the first characteristic region includes s consecutive bytes, and s is an integer greater than 1.
  6. The method according to any one of claims 1 to 5, wherein before the first network device obtains the data to be detected, the method further comprises:
    the first network device obtaining first data, the first data including byte data corresponding to a first application category;
    the first network device inputting the first data into a first model, wherein the output of the first model is the first application category;
    the first network device obtaining n second characteristic regions based on the first application category and the first model, each second characteristic region including q adjacent bytes of the first data, n being a positive integer and q being a positive integer;
    the first network device determining the regional correlation between the second characteristic region and the first application category;
    the first network device generating the application relevance information based on the regional correlation between the second characteristic region and the first application category.
  7. The method according to claim 6, wherein the application relevance information includes second characteristic region relevance information, the second characteristic region relevance information including the second characteristic region, the first application category corresponding to the second characteristic region, and the regional correlation between the second characteristic region and the first application category;
    the n second characteristic regions include at least one of the w first characteristic regions, and the application category corresponding to the data to be detected is the first application category.
  8. A data processing method, characterized in that it comprises:
    a second network device obtaining first data, the first data including byte data corresponding to a first application category;
    the second network device inputting the first data into a first model, wherein the output of the first model is the first application category;
    the second network device obtaining n second characteristic regions based on the first application category and the first model, each second characteristic region including q adjacent bytes of the first data, n being a positive integer and q being a positive integer;
    the second network device determining the regional correlation between the second characteristic region and the first application category;
    the second network device generating application relevance information based on the regional correlation between the second characteristic region and the first application category.
  9. The method according to claim 8, wherein the application relevance information includes second characteristic region relevance information, the second characteristic region relevance information including the second characteristic region, the first application category corresponding to the second characteristic region, and the regional correlation between the second characteristic region and the first application category.
  10. The method according to claim 8 or 9, wherein the second network device obtaining n second characteristic regions based on the first application category and the first model comprises:
    the second network device obtaining h first feature values based on the first application category and the first model, each first feature value indicating the correlation between the first application category and a first feature point in the first data, the first feature point including one byte of data in the first data, h being a positive integer;
    the second network device obtaining the n second characteristic regions according to the h first feature values.
  11. The method according to claim 10, wherein the second network device obtaining the n second characteristic regions according to the h first feature values comprises:
    the second network device obtaining z target feature points according to the h first feature values, the feature value of each target feature point being one of the first z feature values when the h first feature values are sorted in descending order, z being a positive integer less than or equal to h;
    the second network device obtaining the n second characteristic regions according to the z target feature points, each second characteristic region containing at least one target feature point.
  12. The method according to claim 11, wherein the midpoint of the second characteristic region is the target feature point.
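
Claims 10 to 12 can be read together as "rank the bytes by feature value, keep the top z, and cut a q-byte region around each". The sketch below assumes the h feature values are already available, one per byte; how the first model produces them is not stated in the claims and is not modelled here. The name `second_regions`, the sample bytes, and the sample scores are hypothetical.

```python
def second_regions(data: bytes, feature_values: list[float], z: int, q: int) -> list[tuple[int, bytes]]:
    """Pick the z highest-valued bytes and cut a q-byte region around each one.

    Returns (start offset, region bytes) pairs in order of position.
    """
    if len(feature_values) != len(data):
        raise ValueError("this sketch assumes one feature value per byte")
    # Indices of the z largest feature values: the target feature points.
    targets = sorted(range(len(data)), key=lambda i: feature_values[i], reverse=True)[:z]
    half = q // 2
    regions = []
    for t in sorted(targets):
        # Centre the window on the target point. Near the edges the window is
        # shifted inward (an assumption), so the target is then off-centre.
        start = min(max(t - half, 0), max(len(data) - q, 0))
        regions.append((start, data[start:start + q]))
    return regions


# Hypothetical sample: bytes 1 and 5 carry the two largest feature values.
data = b"\x16\x03\x01\x02\x00\x01\x00\x01"
values = [0.1, 0.9, 0.2, 0.05, 0.0, 0.7, 0.1, 0.0]
print(second_regions(data, values, z=2, q=3))
```

Choosing an odd q keeps the target feature point exactly at the midpoint of its region, which matches claim 12 most directly.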
  13. The method according to any one of claims 8 to 12, wherein the method further comprises:
    the n second characteristic regions include a sixth characteristic region and a fourth characteristic region; if the proportion of feature points shared by the sixth characteristic region and the fourth characteristic region is greater than a first preset threshold, and the sixth characteristic region appears more times than the fourth characteristic region among the characteristic regions of the corresponding application category in the first application category, the second network device deletes the information of the fourth characteristic region.
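
The pruning rule of claim 13 can be sketched as follows. The `Region` record, the definition of the overlap proportion as shared byte positions divided by the shorter region's length, and the sample counts are all assumptions made for illustration; the claim itself fixes only the two conditions (overlap above a threshold, and a higher occurrence count for the region that is kept).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int      # offset of the first byte within the sample
    content: bytes  # the q adjacent bytes
    count: int      # times the region was seen for the application category


def overlap_ratio(a: Region, b: Region) -> float:
    """Share of byte positions the two regions have in common (assumed definition)."""
    shared = min(a.start + len(a.content), b.start + len(b.content)) - max(a.start, b.start)
    return max(shared, 0) / min(len(a.content), len(b.content))


def prune_overlapping(regions: list[Region], threshold: float) -> list[Region]:
    """Drop a region when a heavily overlapping region occurs more often."""
    kept = []
    for r in regions:
        dominated = any(
            o is not r and overlap_ratio(o, r) > threshold and o.count > r.count
            for o in regions
        )
        if not dominated:
            kept.append(r)
    return kept


# Hypothetical regions: a and b share two of three positions, and a is more frequent.
a = Region(0, b"\x16\x03\x01", 40)
b = Region(1, b"\x03\x01\x02", 12)
c = Region(10, b"abc", 5)
print(prune_overlapping([a, b, c], threshold=0.5))
```

Here `a` plays the role of the sixth characteristic region and `b` the fourth, so only `b` is deleted.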
  14. The method according to any one of claims 8 to 13, wherein the method further comprises:
    the n second characteristic regions include a fifth characteristic region; if the fifth characteristic region corresponds to at least two application categories in the application relevance information, the second network device deletes the information of the fifth characteristic region.
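
Claim 14 removes regions that cannot discriminate between application categories. A minimal sketch, assuming the application relevance information is held as a list of (region, category, correlation) tuples; that layout, the function name `drop_ambiguous`, and the category labels are hypothetical.

```python
from collections import defaultdict


def drop_ambiguous(entries: list[tuple[bytes, str, float]]) -> list[tuple[bytes, str, float]]:
    """Remove every entry whose region is mapped to two or more application categories."""
    categories = defaultdict(set)
    for region, category, _ in entries:
        categories[region].add(category)
    # Keep only regions that point at exactly one application category.
    return [e for e in entries if len(categories[e[0]]) == 1]


# Hypothetical table: b"GET " appears under two categories, so both rows are dropped.
table = [
    (b"GET ", "web", 0.4),
    (b"GET ", "video", 0.3),
    (b"\x16\x03\x01", "tls_app", 0.9),
]
print(drop_ambiguous(table))
```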
  15. A network device, characterized in that it comprises:
    an obtaining unit, configured to obtain data to be detected;
    a processing unit, configured to obtain w first characteristic regions according to the data to be detected, each first characteristic region including at least one byte of data in the data to be detected, w being a positive integer;
    a determining unit, configured to determine the application category corresponding to the data to be detected according to the w first characteristic regions and application relevance information, the application relevance information indicating the correlation between the first characteristic region and an application category.
  16. The network device according to claim 15, wherein the determining unit is specifically configured to determine, according to the w first characteristic regions and the application relevance information, the application category corresponding to each first characteristic region and the regional correlation between the first characteristic region and the corresponding application category;
    a statistics unit, configured to compute, for each application category, the sum of the regional correlations of the first characteristic regions corresponding to that application category;
    the determining unit is further configured to determine that the data to be detected corresponds to a first application category on the basis that the sum of the regional correlations of the first characteristic regions corresponding to the first application category is the maximum.
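
The decision rule of claim 16 amounts to a per-category sum followed by an arg-max. The sketch below assumes the application relevance information is a dictionary from region bytes to a (category, regional correlation) pair; that layout, the function name `classify`, and the sample scores are illustrative only.

```python
from collections import defaultdict
from typing import Optional


def classify(regions: list[bytes], relevance: dict[bytes, tuple[str, float]]) -> Optional[str]:
    """Return the category whose matched regions have the largest correlation sum."""
    totals = defaultdict(float)
    for region in regions:
        hit = relevance.get(region)
        if hit is None:
            continue  # region not present in the application relevance information
        category, score = hit
        totals[category] += score
    if not totals:
        return None  # no region matched; the claims do not cover this case
    return max(totals, key=totals.get)


# Hypothetical relevance table and first characteristic regions.
relevance = {
    b"GET ": ("web", 0.6),
    b"HTTP": ("web", 0.3),
    b"\x16\x03\x01\x02": ("video", 0.8),
}
found = [b"GET ", b"ET /", b"HTTP", b"\x16\x03\x01\x02"]
print(classify(found, relevance))
```

Summing rather than taking the single best match lets several weakly indicative regions outweigh one strong but isolated match.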
  17. The network device according to claim 15 or 16, wherein the application relevance information includes relevance information of p third characteristic regions, the relevance information of a third characteristic region including the third characteristic region, the application category corresponding to the third characteristic region, and the regional correlation between the third characteristic region and the corresponding application category; and the p third characteristic regions include at least one of the w first characteristic regions.
  18. The network device according to any one of claims 15 to 17, wherein the data to be detected includes the first K bytes of at least one packet;
    the processing unit is specifically configured to perform sliding window processing on the first K bytes of the at least one packet to obtain the w first characteristic regions.
  19. The network device according to any one of claims 15 to 18, wherein the first characteristic region includes s consecutive bytes, s being an integer greater than 1.
  20. The network device according to any one of claims 15 to 19, wherein the obtaining unit is further configured to obtain first data, the first data including byte data corresponding to a first application category;
    the network device further comprises:
    an input unit, configured to input the first data into a first model, wherein the output of the first model is the first application category;
    the processing unit is further configured to obtain n second characteristic regions based on the first application category and the first model, each second characteristic region including q adjacent bytes of the first data, n being a positive integer and q being a positive integer;
    the determining unit is further configured to determine the regional correlation between the second characteristic region and the first application category;
    the network device further comprises:
    a generating unit, configured to generate the application relevance information based on the regional correlation between the second characteristic region and the first application category.
  21. The network device according to claim 20, wherein the application relevance information includes second characteristic region relevance information, the second characteristic region relevance information including the second characteristic region, the first application category corresponding to the second characteristic region, and the regional correlation between the second characteristic region and the first application category;
    the n second characteristic regions include at least one of the w first characteristic regions, and the application category corresponding to the data to be detected is the first application category.
  22. A network device, characterized in that it comprises:
    an obtaining unit, configured to obtain first data, the first data including byte data corresponding to a first application category;
    an input unit, configured to input the first data into a first model, wherein the output of the first model is the first application category;
    a processing unit, configured to obtain n second characteristic regions based on the first application category and the first model, each second characteristic region including q adjacent bytes of the first data, n being a positive integer and q being a positive integer;
    a determining unit, configured to determine the regional correlation between the second characteristic region and the first application category;
    a generating unit, configured to generate application relevance information based on the regional correlation between the second characteristic region and the first application category.
  23. The network device according to claim 22, wherein the application relevance information includes second characteristic region relevance information, the second characteristic region relevance information including the second characteristic region, the first application category corresponding to the second characteristic region, and the regional correlation between the second characteristic region and the first application category.
  24. The network device according to claim 22 or 23, wherein the processing unit is specifically configured to obtain h first feature values based on the first application category and the first model, each first feature value indicating the correlation between the first application category and a first feature point in the first data, the first feature point including one byte of data in the first data, h being a positive integer;
    the processing unit is specifically configured to obtain the n second characteristic regions according to the h first feature values.
  25. The network device according to claim 24, wherein the obtaining unit is further configured to obtain z target feature points according to the h first feature values, the feature value of each target feature point being one of the first z feature values when the h first feature values are sorted in descending order, z being an integer less than or equal to h;
    the processing unit is specifically configured to obtain the n second characteristic regions according to the z target feature points, each second characteristic region containing at least one target feature point.
  26. The network device according to claim 25, wherein the midpoint of the second characteristic region is the target feature point.
  27. The network device according to any one of claims 22 to 26, wherein the n second characteristic regions include a sixth characteristic region and a fourth characteristic region; if the proportion of feature points shared by the sixth characteristic region and the fourth characteristic region is greater than a first preset threshold, and the sixth characteristic region appears more times than the fourth characteristic region among the characteristic regions of the corresponding application category in the first application category, the processing unit is further configured to delete the information of the fourth characteristic region.
  28. The network device according to any one of claims 22 to 27, wherein the n second characteristic regions include a fifth characteristic region; if the fifth characteristic region corresponds to at least two application categories in the application relevance information, the processing unit is further configured to delete the information of the fifth characteristic region.
  29. A network device, characterized in that it comprises:
    at least one processor and a memory, the memory storing program code, and the processor invoking the program code to execute the method according to any one of claims 1 to 7.
  30. A network device, characterized in that it comprises:
    at least one processor and a memory, the memory storing program code, and the processor invoking the program code to execute the method according to any one of claims 8 to 14.
PCT/CN2020/129007 2020-02-17 2020-11-16 Data processing method and device therefor WO2021164340A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010097474.8 2020-02-17
CN202010097474.8A CN113271263B (en) 2020-02-17 2020-02-17 Data processing method and equipment thereof

Publications (1)

Publication Number Publication Date
WO2021164340A1 true WO2021164340A1 (en) 2021-08-26

Family

ID=77227551

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/129007 WO2021164340A1 (en) 2020-02-17 2020-11-16 Data processing method and device therefor

Country Status (2)

Country Link
CN (1) CN113271263B (en)
WO (1) WO2021164340A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115374130A (en) * 2022-10-26 2022-11-22 中科三清科技有限公司 Atmospheric pollution historical data storage method and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101202652A (en) * 2006-12-15 2008-06-18 北京大学 Device for classifying and recognizing network application flow quantity and method thereof
CN105323117A (en) * 2014-08-04 2016-02-10 中国电信股份有限公司 Application identification method, application identification device, application identification system and application server
CN107181736A (en) * 2017-04-21 2017-09-19 湖北微源卓越科技有限公司 Based on 7 layers of network data packet classification method applied and system
US10333664B1 (en) * 2016-09-19 2019-06-25 Sprint Spectrum L.P. Systems and methods for dynamically selecting wireless devices for uplink (UL) multiple-input-multiple-output (MIMO) pairing
CN110708215A (en) * 2019-10-10 2020-01-17 深圳市网心科技有限公司 Deep packet inspection rule base generation method and device, network equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8738906B1 (en) * 2011-11-30 2014-05-27 Juniper Networks, Inc. Traffic classification and control on a network node
CN103763320B (en) * 2014-01-21 2017-01-25 中国联合网络通信集团有限公司 Method and system for merging flow records
CN104144089B (en) * 2014-08-06 2017-06-16 山东大学 It is a kind of that flow knowledge method for distinguishing is carried out based on BP neural network
CN109951357A (en) * 2019-03-18 2019-06-28 西安电子科技大学 Network application recognition methods based on multilayer neural network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115374130A (en) * 2022-10-26 2022-11-22 中科三清科技有限公司 Atmospheric pollution historical data storage method and medium
CN115374130B (en) * 2022-10-26 2022-12-20 中科三清科技有限公司 Atmospheric pollution historical data storage method and medium

Also Published As

Publication number Publication date
CN113271263B (en) 2023-01-06
CN113271263A (en) 2021-08-17

Similar Documents

Publication Publication Date Title
US11075853B2 (en) Resource prioritization and communication-channel establishment
US10812358B2 (en) Performance-based content delivery
US10027739B1 (en) Performance-based content delivery
WO2019169928A1 (en) Traffic detection method and traffic detection device
CN112640381B (en) Method and system for detecting undesirable behaviors of internet of things equipment
WO2018133573A1 (en) Method and device for analyzing service survivability
WO2021051917A1 (en) Artificial intelligence (ai) model evaluation method and system, and device
WO2021052162A1 (en) Network parameter configuration method and apparatus, computer device, and storage medium
TWI640177B (en) Data delivery method and system in software defined network
WO2020228527A1 (en) Data stream classification method and message forwarding device
US20190313267A1 (en) Visualization of personalized quality of experience regarding mobile network
US11483177B2 (en) Dynamic intelligent analytics VPN instantiation and/or aggregation employing secured access to the cloud network device
WO2021164340A1 (en) Data processing method and device therefor
WO2020258982A1 (en) Method and system for analyzing security log of base station, and computer-readable storage medium
EP3972315B1 (en) Network device identification
WO2019209503A1 (en) Unsupervised anomaly detection for identifying anomalies in data
US20190342180A1 (en) System and method for providing a dynamic comparative network health analysis of a network environment
US20200187023A1 (en) Service type identification systems and methods for optimizing local area networks
CN106789437B (en) Message processing method, forwarding method, related device and packet loss rate measuring method
JPWO2015182629A1 (en) Monitoring system, monitoring device and monitoring program
CN110730191A (en) Intent-oriented OSI seven-layer network protocol model based on data, information and knowledge objects
CN102546548B (en) Method and device for recognizing layer protocol
US20230252980A1 (en) Multi-channel conversation processing
CN105357129A (en) Service awareness system and method based on software defined network
WO2018019018A1 (en) Distribution policy generating method and device, and network optimization system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20919483

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20919483

Country of ref document: EP

Kind code of ref document: A1