CN111355628A - Model training method, business recognition device and electronic device - Google Patents
- Publication number
- CN111355628A (application number CN202010089667.9A)
- Authority
- CN
- China
- Prior art keywords
- model
- application program
- pcap file
- data
- target application
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/028—Capturing of monitoring data by filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/34—Network arrangements or protocols for supporting network services or applications involving the movement of software or configuration parameters
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Stored Programmes (AREA)
Abstract
A model training method, a service identification method, a device and an electronic device are provided. Based on an acquired target application program list, a target application program is downloaded from a software downloading platform and installed. Human operation of the target application program is simulated, and the code stream generated during the operation is captured to obtain a pcap file. Model training data labelled with service types is obtained from the pcap file, and a service identification model is trained on this data so that the model learns the relationship between the model training data and the service types. Because the pcap file is obtained by capturing the code stream generated by simulated human operation, the simulated operation is known, and the pcap file contains information reflecting a person's operation on the application interface, the model specifically learns the relationship between the service type of an operation performed on the application interface and the pcap file generated by that operation. In other words, the model is able to recognize user behavior within an application program.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a model training method, a service identification device, and an electronic device.
Background
Conventional traffic analysis distinguishes services by transport-layer port number, classifying and counting traffic according to the port numbers it identifies. However, as demand for mobile internet content grows rapidly, most of the traffic on mobile data networks now comes from minority services carried over HTTP and P2P, and port-number-based identification cannot effectively identify such traffic.
Deep Packet Inspection (DPI) builds on traditional service identification based on the IP five-tuple (source IP address, source port number, destination IP address, destination port number and bearer protocol) by additionally probing the application-layer data. To identify data-flow services with DPI, a flow feature library must be established, and data is then identified as the corresponding service by matching it against that library.
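As a rough illustration of the two stages described above, the following sketch keys a flow by its IP five-tuple and then matches the application-layer payload against a feature library. The `FiveTuple` type, the contents of `FEATURE_LIBRARY` and the substring-matching rule are all hypothetical and are not taken from this application; a production DPI engine would use far richer signatures.

```python
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    """Flow key used by traditional service identification."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # bearer protocol, e.g. "TCP" or "UDP"


# Hypothetical flow feature library: payload signature -> service name.
FEATURE_LIBRARY = {
    b"Host: video.example.com": "example-video",
    b"Host: shop.example.com": "example-shopping",
}


def identify_service(flow: FiveTuple, payload: bytes) -> Optional[str]:
    """Return the service whose signature appears in the payload, if any.

    The five-tuple only identifies the flow; the application-layer payload
    is what is inspected against the feature library.
    """
    for signature, service in FEATURE_LIBRARY.items():
        if signature in payload:
            return service
    return None
```

A lookup of this kind can name the application or service behind a flow, but nothing in it describes what the user did inside the application.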
At present, DPI-based service identification cannot deeply analyze application content, where deep analysis of application content refers to identifying user behavior within an application. DPI-based service identification can only resolve traffic down to a subclass of an application: it can determine the name and version number of the app a user is using at a given point in time, but it cannot identify what the user operates inside the app or the specific behaviors those operations produce. This is a limitation of DPI analysis.
Disclosure of Invention
The embodiment of the application provides a model training method, a business identification device and an electronic device, and identification of user behaviors in an application can be improved.
A first aspect of an embodiment of the present application provides a model training method, including:
acquiring a target application program list, and automatically downloading a target application program from a software downloading platform according to information in the target application program list;
installing the downloaded target application program;
simulating the operation of a human on the target application program, and capturing a code stream generated in the operation process to obtain a pcap file, wherein the pcap file comprises the name of the target application program;
obtaining model training data with labels based on the pcap file, training a preset business recognition model by using the model training data, and obtaining a trained business recognition model, wherein the labels of the model training data are the business types of the operations corresponding to the pcap file.
A second aspect of the embodiments of the present application provides a service identification method, which is implemented based on a service identification model that is trained and completed in the first aspect of the embodiments of the present application, and the service identification method includes:
if the operation of a user on the application program is detected, capturing a code stream generated by the operation to obtain a to-be-used pcap file of the application program, wherein the to-be-used pcap file comprises the name of the application program;
obtaining data to be identified of the business identification model based on the pcap file to be used, and identifying the data to be identified based on the business identification model;
and determining the service type corresponding to the operation of the application program based on the identification result of the service identification model.
A third aspect of the embodiments of the present application provides a model training apparatus, including:
the automatic acquisition module is used for acquiring a target application program list and automatically downloading the target application program from a software downloading platform according to the information in the target application program list;
the automatic installation module is used for installing the downloaded target application program;
the automatic simulation operation module is used for simulating the operation of a human on the target application program, and capturing a code stream generated in the operation process to obtain a pcap file, wherein the pcap file comprises the name of the target application program;
and the deep learning module is used for obtaining model training data with labels based on the pcap file, training a preset business recognition model by using the model training data, and obtaining a trained business recognition model, wherein the labels of the model training data are the business types of the operation corresponding to the pcap file.
A fourth aspect of the embodiments of the present application provides an electronic apparatus, including: the device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the method of the first aspect of the embodiment of the present application.
A fifth aspect of the embodiments of the present application provides a service identification apparatus, which is implemented based on the service identification model trained in the first aspect of the embodiments of the present application, and which includes:
the packet capturing module is used for capturing a code stream generated by operation to obtain a to-be-used pcap file of the application program if the operation of a user on the application program is detected, wherein the to-be-used pcap file comprises the name of the application program;
the intelligent identification module is used for obtaining data to be identified of the business identification model based on the pcap file to be used and identifying the data to be identified based on the business identification model;
and the determining module is used for determining the service type corresponding to the operation of the application program based on the recognition result of the service recognition model.
A sixth aspect of the embodiments of the present application provides an electronic apparatus, including: the device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the method provided by the second aspect of the embodiment of the present application.
The embodiments of the present application provide a model training method, a service identification method, a device and an electronic device. A target application program list is acquired, and the target application program is automatically downloaded from a software downloading platform according to the information in the list; the downloaded target application program is installed; human operation of the target application program is simulated, and the code stream generated during the operation is captured to obtain a pcap file; model training data labelled with service types is obtained based on the pcap file, and a preset service recognition model is trained with this data to obtain a trained service recognition model. Through this training, the model learns the relationship between the pcap-based model training data and the service types. Because the pcap file is obtained by capturing the code stream generated when a simulated person operates the application program, the simulated operation is known, and the pcap file contains information reflecting a person's operation on the application interface, the trained model specifically learns the relationship between the service type of an operation on the application interface and the pcap file generated by that operation. Compared with the prior art, the service recognition model trained in this way can therefore improve the recognition of user behavior within an application program.
Drawings
FIG. 1 is a schematic diagram of a hardware structure of an electronic device provided in the present application;
FIG. 2 is a schematic flowchart of a model training method according to a first embodiment of the present application;
FIG. 3 is a schematic diagram of a refinement step of step 201 in FIG. 2;
FIG. 4 is a schematic flowchart of a service identification method according to a second embodiment of the present application;
FIG. 5 is a schematic diagram of a service identification system according to a second embodiment of the present application;
FIG. 6 is a schematic diagram of another service identification system in a second embodiment of the present application;
FIG. 7 is a schematic structural diagram of a model training apparatus according to a third embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device according to a third embodiment of the present application;
FIG. 9 is a schematic structural diagram of a service identification apparatus according to a fourth embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present invention more apparent and understandable, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to FIG. 1, FIG. 1 shows a block diagram of an electronic device. The model training method and the service identification method provided by the embodiments of the invention can be applied to the electronic device 10 shown in FIG. 1. The electronic device 10 includes, but is not limited to: mobile terminals such as smart phones, notebooks and wearable smart devices; fixed terminals such as desktop computers and smart televisions; and servers.
As shown in fig. 1, the electronic device 10 includes a memory 101, a memory controller 102, one or more processors 103 (only one of which is shown), a peripheral interface 104, and a touch screen 105. These components communicate with each other via one or more communication buses/signal lines 106.
It is to be understood that the structure shown in fig. 1 is merely an illustration and is not intended to limit the structure of the electronic device. The electronic device 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
The memory 101 may be used to store software programs and modules, such as program instructions/modules corresponding to the model training method and the service identification method and the electronic device in the embodiments of the present invention, and the processor 103 executes various functional applications and data processing, such as implementing the model training method and the service identification method described above, by executing the software programs and modules stored in the memory 101.
The peripheral interface 104 couples various input/output devices to the processor 103 and to the memory 101. The processor 103 executes the software and instructions stored in the memory 101 to perform the various functions of the electronic device 10 and to perform data processing.
In some embodiments, the peripheral interface 104, the processor 103, and the memory controller 102 may be implemented in a single chip. In other embodiments, they may each be implemented on a separate chip.
The touch screen 105 provides both an output and an input interface between the electronic device and the user. In particular, the touch screen 105 displays video output to the user, the content of which may include text, graphics, video, and any combination thereof. Some of the output results of the touch screen 105 correspond to some user interface objects. The touch screen 105 also receives user inputs, such as user clicks, swipes, and other gesture operations, so that the user interface objects respond to these user inputs. The technique of detecting user input may be based on resistive, capacitive, or any other possible touch detection technique. Specific examples of touch screen 105 display units include, but are not limited to, liquid crystal displays or light emitting polymer displays.
The model training method and the service identification method in the embodiment of the invention are described based on the electronic device.
The first embodiment:
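This embodiment provides a model training method. Referring to FIG. 2, the method includes:
Step 201, acquiring a target application program list, and automatically downloading a target application program from a software downloading platform according to information in the target application program list;
Optionally, in this embodiment, the information included in the target application program list includes, but is not limited to: the name, version number, download address and download amount of the target application program.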
The software downloading platform in this embodiment includes, but is not limited to, a platform on a web page, or a platform on an intelligent mobile terminal, such as an application mall.
Further, referring to fig. 3, the step 201 of acquiring the target application list specifically includes the following steps:
Step 301, accessing the software downloading platform at a preset time interval, and collecting the latest relevant information of the application programs on the software downloading platform;
In this embodiment, the preset time interval may be set according to actual needs, for example according to how often the service identification model needs to be updated. The preset time interval may be any time interval, such as 1 month or 15 days, and this embodiment does not limit it.
Optionally, the latest relevant information of the application program may be understood as relevant information of the application program at the current time on the software downloading platform, where the relevant information includes, but is not limited to, information of an application name, a downloading amount, a current version number, a downloading address, and the like of the application program.
Optionally, when collecting in step 301, the latest relevant information of all the applications on the platform may be obtained, and the relevant information of a specific application on the platform may also be obtained, for example, the information of the applications under a specific category.
Taking an application mall as an example, the application programs in the mall are grouped into categories such as shopping, reading, social contact, news, video, travel, tools and music. In step 301, the application mall can be accessed at the preset time interval, the specific categories of application programs to be collected can be obtained, and the latest relevant information of the application programs under those categories can be collected from the application mall.
Optionally, in this embodiment, the collected application program information is compared with the information of the target application programs in the most recently obtained target application program list. If the collected information contains information that is not yet in the target application program list (for example, an application program that has not been collected before), or if a version number in the collected information has changed, the target application program list is updated accordingly: application programs that were not previously collected are added to the list, and the entries of target application programs whose version numbers have changed are updated.
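The comparison and update described above can be sketched as follows. The `AppInfo` record and the `update_target_list` function are hypothetical names introduced only for illustration, and returning the list of changed names (so that a caller could decide what to download again) is an assumption rather than something this embodiment specifies.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AppInfo:
    name: str
    version: str
    download_count: int
    download_url: str


def update_target_list(target: Dict[str, AppInfo],
                       collected: List[AppInfo]) -> List[str]:
    """Merge freshly collected app info into the target list in place.

    An entry is added when the app was not in the list before, and
    replaced when its version number has changed. The names of the
    added or changed apps are returned.
    """
    changed = []
    for app in collected:
        previous = target.get(app.name)
        if previous is None or previous.version != app.version:
            target[app.name] = app
            changed.append(app.name)
    return changed
```

Running the same collection twice in a row leaves the list unchanged the second time, which keeps the periodic crawl idempotent.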
In this embodiment, the contents of the target application list include, but are not limited to, an application name, a version number, a download amount, and a download address. In one specific example, the target application list is as shown in Table 1.
TABLE 1
Optionally, automatically downloading the target application from the software downloading platform according to the information in the target application list may specifically be: automatically downloading the target application program from the software downloading platform according to the download address or the application name in the target application program list.
Optionally, in this embodiment, the crawler system may obtain the target application program list through the software downloading platform at preset time intervals, and go to the software downloading platform, such as an application store, to download the installation package of the target application program according to the target application program list.
Optionally, in this embodiment, the downloaded installation packages may be stored in a specific storage space, and an installation package directory may be generated for them. In one example, the directory format is as follows:
log file stores the app name contained in the current directory.
Step 202, installing the downloaded target application program;
In this embodiment, the target application downloaded in step 201 may be installed by an automatic application installation system. Optionally, in step 202, the downloaded target application may be installed on a mobile phone simulator, and the mobile phone simulator may itself be installed on any device capable of running it, such as a desktop computer or a notebook computer, which is not limited in this embodiment. Of course, in this embodiment, the target application may also be installed directly on an intelligent mobile terminal such as a mobile phone.
Step 203, simulating the operation of a human on the target application program, and capturing the code stream generated in the operation process to obtain a pcap file, wherein the pcap file comprises the name of the target application program;
In step 203, human operation of the target application program is simulated by running code. Because the operations are known in advance, the corresponding service types are also known. The operations include, but are not limited to, various operations on the application interface of the target application and operations that start and close the target application. Optionally, the simulated operation of the target application program is an operation on the application interface of the target application program. Optionally, the operation on the target application includes, but is not limited to, a click operation.
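Because each simulated operation is scripted, its service type can be recorded alongside it and later reused as the training label. The sketch below shows one possible way to represent such a script; the `SimulatedOperation` record, the UI element identifiers and the service type names are all hypothetical and are not defined by this embodiment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SimulatedOperation:
    action: str        # e.g. "launch", "click", "close"
    target: str        # hypothetical UI element identifier
    service_type: str  # known in advance because the script defines it


def labels_for_capture(script: List[SimulatedOperation]) -> Tuple[str, ...]:
    """Return the distinct service types exercised by a script, in order."""
    seen: List[str] = []
    for op in script:
        if op.service_type not in seen:
            seen.append(op.service_type)
    return tuple(seen)
```

If a script exercises only one service type, the returned tuple has a single element, which can serve directly as the label of the capture taken while that script runs.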
Optionally, capturing the code stream generated in the operation process to obtain a pcap file includes: performing, in the background, a packet capture (on the network port) of the code stream generated by the operations on the target application program to obtain a pcap file, and writing the name of the target application program into the file name of the pcap file. In this embodiment, the pcap file obtained by packet capture typically records the interaction between the client and the server.
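One way to carry the application name in the pcap file name is sketched below. The embodiment does not name a capture tool; `tcpdump` is used here only as a familiar example, and the helper names, the interface name and the output directory are assumptions. The function only builds the argument list and does not start a capture.

```python
from pathlib import Path
from typing import List


def pcap_path(output_dir: str, app_name: str) -> Path:
    """Build the pcap path so that the file name carries the app name."""
    return Path(output_dir) / f"{app_name}.pcap"


def build_capture_command(interface: str, output_dir: str,
                          app_name: str) -> List[str]:
    """Return a tcpdump argument list; the caller decides when to run it.

    -i selects the capture interface and -w writes raw packets to a file.
    """
    return ["tcpdump", "-i", interface, "-w",
            str(pcap_path(output_dir, app_name))]
```

A background capture could then be started with `subprocess.Popen(build_capture_command(...))` before the simulated operations begin and terminated once they finish.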
It can be understood that simulating human operation of the target application program amounts to simulating normal use of the application program.
After the packet capture is finished, the obtained pcap file can be saved as a file named "application name.pcap". Optionally, the pcap file of each target application obtained by packet capture may be stored in a specific storage area, and a directory of the pcap files is generated accordingly. In one example, the format of the directory may be as follows:
log file stores the app name contained in the current directory.
Optionally, in this embodiment, after obtaining the pcap file of the target application program, the method further includes: the target application is uninstalled.
Step 204, obtaining model training data with labels based on the pcap file, training a preset business recognition model by using the model training data, and obtaining a trained business recognition model, wherein the labels of the model training data are the business types of the operations corresponding to the pcap file.
In this embodiment, depending on how the code stream packet (pcap file) is processed, automatic service type identification is performed mainly in two ways, namely code stream identification and predefined feature identification, and the models to be trained differ between the two.
In the code stream identification scheme, the pcap data is not unpacked or manually analyzed. Instead, a convolutional neural network model is trained; the code stream is input directly into the trained model, and the model directly outputs the service classification.
Optionally, in an embodiment, obtaining model training data with a label based on the pcap file, training a preset service recognition model with the model training data, and obtaining a trained service recognition model includes: training a preset convolutional neural network model serving as the service recognition model by using the model training data to obtain the trained preset convolutional neural network model.
In this embodiment, the trained preset convolutional neural network model learns the relationship between the code stream generated by the operation and the service type corresponding to the operation, and in the subsequent actual recognition process, the preset convolutional neural network model can directly analyze the input pcap file and output a classification result.
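The embodiment does not specify how the code stream is presented to the convolutional neural network. A common choice, assumed here purely for illustration, is to truncate or zero-pad the raw bytes to a fixed length and scale them to the range [0, 1]. The sketch below shows that preprocessing together with a plain one-dimensional convolution, which is the basic operation a convolutional layer applies; the function names and the default length of 784 are hypothetical.

```python
from typing import List


def bytes_to_input(raw: bytes, length: int = 784) -> List[float]:
    """Truncate or zero-pad raw capture bytes and scale them to [0, 1]."""
    clipped = raw[:length]
    padded = clipped + bytes(length - len(clipped))
    return [b / 255.0 for b in padded]


def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """Valid-mode 1-D cross-correlation, the core operation of a conv layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]
```

In practice the network itself would be built and trained with a deep learning framework; this sketch only shows the shape of the data that such a network could consume.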
For the scheme of predefined feature identification, the pcap file needs to be imported into a DPI program, and the fields in the pcap file that can be used for identifying the service type are parsed by the DPI program. Optionally, the fields used for identifying the service type in this embodiment include, but are not limited to: host, uri, user_agent, refer, xonlinehost, server_name, and ip/ip + port. The invention uses the identified field information to identify the service type mainly in two modes.
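To make the field extraction concrete, the following sketch pulls a few of the fields listed above out of a plaintext HTTP request payload. It is a minimal stand-in for the DPI program, not its actual implementation: the function name is hypothetical, only unencrypted HTTP is handled, and the HTTP `Referer` header is mapped to the field called `refer` above as an assumption about what that field denotes.

```python
from typing import Dict


def extract_http_fields(payload: bytes) -> Dict[str, str]:
    """Extract uri, host, user_agent and refer from a plaintext HTTP request."""
    text = payload.decode("latin-1")
    head = text.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    fields: Dict[str, str] = {}

    # Request line: METHOD SP request-target SP HTTP-version
    request_parts = lines[0].split(" ")
    if len(request_parts) >= 2:
        fields["uri"] = request_parts[1]

    # Header name (lower-cased) -> field name used in this document.
    wanted = {"host": "host", "user-agent": "user_agent", "referer": "refer"}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        key = wanted.get(name.strip().lower())
        if sep and key:
            fields[key] = value.strip()
    return fields
```

Fields such as server_name come from the TLS handshake rather than from HTTP headers and would need a separate parser.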
1) By rule model identification
For the identification scheme, training data with service type labels needs to be prepared in advance, a rule model is obtained based on the corresponding relation between the field information obtained by analyzing the pcap file and the service type, and the service type corresponding to the operation is determined according to the rule model and the field information corresponding to the actual operation in the subsequent actual identification process.
Optionally, obtaining model training data with labels based on the pcap file, training a preset business recognition model with the model training data, and obtaining the trained business recognition model includes: parsing the pcap file to obtain the field information used for identifying the service type; taking the field information as the model training data and the service type of the operation corresponding to the pcap file as the label of the model training data; and training, with the model training data, a model of the correspondence between field information and service type, which serves as the service identification model, to obtain the trained correspondence model.
Optionally, after the model of correspondence between field information and service type is trained, the correspondence between each field information and service type may be obtained, and in the model of correspondence, one service type may correspond to multiple pieces of field information.
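A correspondence model of this kind can be pictured as a lookup table from field information to service type. The sketch below is one simple reading of it; the function names are hypothetical, and both the exact-match rule and the policy of keeping the first label seen for a field are assumptions that the embodiment does not prescribe.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (field name, field value), e.g. ("host", "video.example.com")
Field = Tuple[str, str]


def build_rule_model(
        samples: Iterable[Tuple[List[Field], str]]) -> Dict[Field, str]:
    """Map each field seen in training to the service type it was labelled with."""
    rules: Dict[Field, str] = {}
    for fields, service_type in samples:
        for field in fields:
            rules.setdefault(field, service_type)  # first label wins
    return rules


def fields_by_service(rules: Dict[Field, str]) -> Dict[str, List[Field]]:
    """Invert the rules: one service type may correspond to several fields."""
    grouped: Dict[str, List[Field]] = defaultdict(list)
    for field, service_type in rules.items():
        grouped[service_type].append(field)
    return dict(grouped)


def identify(rules: Dict[Field, str], fields: List[Field]) -> Optional[str]:
    """Return the service type of the first field that matches a rule."""
    for field in fields:
        if field in rules:
            return rules[field]
    return None
```

The inverted view returned by `fields_by_service` shows the one-to-many relationship mentioned above, where a single service type corresponds to multiple pieces of field information.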
2) Machine learning algorithm model identification
In the scheme, training data with service type labels needs to be prepared in advance, and a machine learning algorithm model is trained according to the corresponding relation between field information and service types obtained by analyzing the pcap file. In the subsequent actual identification process, the service type corresponding to the operation can be determined according to the model and the field information corresponding to the actual operation.
Optionally, obtaining model training data with a label based on the pcap file, training a preset service recognition model with the model training data, and obtaining a trained service recognition model includes: parsing the pcap file to obtain the field information used for identifying the service type; taking the field information as the model training data and the service type of the operation corresponding to the pcap file as the label of the model training data; and training a preset machine learning algorithm model serving as the service identification model with the model training data to obtain the trained preset machine learning algorithm model.
In this embodiment, the preset machine learning algorithm model includes, but is not limited to, machine learning algorithm models such as SVM, Bayesian network and gradient boosting tree models.
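As a self-contained illustration of training a classifier on field information, the sketch below implements a small multinomial naive Bayes model with add-one smoothing. Naive Bayes is substituted here only because it fits in a few lines; it is not one of the models named above, and the class name and the `name=value` token format are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class NaiveBayesFieldClassifier:
    """Multinomial naive Bayes over field tokens with add-one smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocabulary: set = set()

    def fit(self, samples: List[Tuple[List[str], str]]) -> None:
        """Count labels and tokens from (field tokens, service type) pairs."""
        for tokens, label in samples:
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)

    def predict(self, tokens: List[str]) -> str:
        """Return the service type with the highest log posterior."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            denom = sum(self.token_counts[label].values()) + vocab_size
            for token in tokens:
                score += math.log(
                    (self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Unlike an exact-match rule table, a statistical model of this kind can still return a service type when only some of the fields of an operation were seen during training.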
This embodiment provides a model training method in which a target application program list is acquired, and the target application program is automatically downloaded from a software downloading platform according to the information in the list; the downloaded target application program is installed; human operation of the target application program is simulated, and the code stream generated during the operation is captured to obtain a pcap file; and model training data labelled with service types is obtained based on the pcap file and used to train a preset service recognition model, yielding a trained service recognition model. Through this training, the model learns the relationship between the pcap-based model training data and the service types. Because the pcap file is obtained by capturing the code stream generated when a simulated person operates the application program, the simulated operation is known, and the pcap file contains information reflecting a person's operation on the application interface, the trained model specifically learns the relationship between the service type of an operation on the application interface and the pcap file generated by that operation. Compared with the prior art, the service recognition model trained in this embodiment can therefore improve the recognition of user behavior within an application program.
Second embodiment:
This embodiment provides a service identification method, which is implemented based on the service identification model trained in the first embodiment. Referring to FIG. 4, the service identification method includes:
Step 401, if the operation of a user on the application program is detected, capturing the code stream generated by the operation to obtain a to-be-used pcap file of the application program, wherein the to-be-used pcap file comprises the name of the application program;
In this embodiment, the application may be any application that has been installed on the electronic device. Before step 401, the electronic device may continuously monitor the user's operation of the applications installed on it to determine whether the user operates a certain application. Optionally, the operation in step 401 may be any operation on an application interface of the application program, or an operation of starting or closing the application program.
Optionally, the detecting that the user operates the application program, capturing a code stream generated by the operation, and obtaining a to-be-used pcap file of the application program includes: and when the operation of the user on the application interface of the application program is detected, capturing the code stream generated by the operation to obtain the to-be-used pcap file of the application program. Optionally, in this embodiment, the code stream in step 401 may also be captured by the crawler system, so as to obtain the pcap file.
Optionally, before capturing the code stream generated by the operation and obtaining the pcap file to be used by the application program, the method further includes: and judging whether the code stream generated by the operation belongs to the data of the https protocol, if so, capturing the code stream generated by the operation to obtain a to-be-used pcap file of the application program. Otherwise, identifying the service type corresponding to the operation according to the DPI identification technology in the prior art.
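The routing decision described above can be sketched as follows. The embodiment does not say how https traffic is recognized, so the two heuristics used here (destination port 443, or a first payload that starts with a TLS record header) are assumptions, and the function names are hypothetical.

```python
TLS_CONTENT_TYPES = (0x14, 0x15, 0x16, 0x17)  # change_cipher_spec, alert, handshake, application_data


def looks_like_tls(payload: bytes) -> bool:
    """Heuristic: a TLS record starts with content type, version 3.x, and a 2-byte length."""
    return (
        len(payload) >= 5
        and payload[0] in TLS_CONTENT_TYPES
        and payload[1] == 0x03
        and payload[2] <= 0x04
    )


def route_flow(dst_port: int, first_payload: bytes) -> str:
    """Return "model" for flows treated as https, otherwise "dpi" (assumed labels)."""
    if dst_port == 443 or looks_like_tls(first_payload):
        return "model"  # capture to pcap and use the service identification model
    return "dpi"        # fall back to conventional DPI identification
```

A production system would likely rely on the DPI device's own protocol detection rather than on these two checks.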
Optionally, in this embodiment, when the code stream is captured to obtain the pcap file, the name of the pcap file may be saved as an "application name.
Step 402, obtaining the data to be identified for the service identification model based on the to-be-used pcap file, and identifying the data to be identified based on the service identification model;
Step 403, determining the service type corresponding to the operation on the application program based on the identification result of the service identification model.
Optionally, in this embodiment, since the first embodiment provides three specific models, step 402 can identify the service type in at least the following three ways:
The first scheme: the service identification model is a preset convolutional neural network model.
In this scheme, obtaining the data to be identified for the service identification model based on the to-be-used pcap file, and identifying the data to be identified based on the service identification model, includes: taking the to-be-used pcap file as the data to be identified of the preset convolutional neural network model, and inputting the data to be identified into the preset convolutional neural network model to identify its service type.
In this scheme, the preset convolutional neural network model is the model trained in the first embodiment. It can be added to an existing DPI identification scheme as a newly added intelligent identification server; optionally, a crawler server can also be set up in the existing DPI identification scheme to capture the code streams generated by operations.
Referring to fig. 5, fig. 5 shows a new service identification system obtained by adding an intelligent identification server with a preset convolutional neural network model to a DPI identification system. The arrows in fig. 5 represent the flow of data. In the system of fig. 5, a crawler system (not shown in the figure) captures the original code stream generated by an operation, from which a pcap file can be obtained. The original code stream (pcap file) is transmitted to the intelligent identification server, which identifies the service type of the code stream through the preset convolutional neural network model and outputs the identification result to the merge device (integration device). The merge device integrates the identification results and transmits the data to the 2file device, which outputs the data to the xdr (external data representation) device as required.
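The embodiment does not specify how a pcap file is presented to the convolutional neural network. One common approach in traffic classification, assumed here purely for illustration, is to truncate or zero-pad the raw bytes to a fixed length and reshape them into a square grid of values in [0, 1]. The grid size of 28 and the function name are hypothetical.

```python
def bytes_to_grid(raw: bytes, side: int = 28):
    """Turn raw capture bytes into a side x side grid of floats in [0, 1].

    Bytes beyond side * side are dropped; shorter inputs are zero-padded.
    """
    size = side * side
    clipped = raw[:size].ljust(size, b"\x00")
    return [
        [clipped[row * side + col] / 255.0 for col in range(side)]
        for row in range(side)
    ]
```

The resulting grid can be passed to any CNN framework as a single-channel image; the choice of framework and network architecture is outside what the source describes.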
The second scheme: the service identification model is a preset machine learning algorithm model. Obtaining the data to be identified for the service identification model based on the to-be-used pcap file, and identifying the data to be identified based on the service identification model, includes: parsing the to-be-used pcap file to obtain the field information used for identifying the service type, taking the field information as the data to be identified, and inputting the data to be identified into the preset machine learning algorithm model to identify its service type. The machine learning algorithm model may be, for example, a model based on an SVM, a Bayesian network, or a gradient boosting tree.
Referring to fig. 6, fig. 6 shows a new service identification system obtained by adding an intelligent identification server with a preset machine learning algorithm model to a DPI identification system. The arrows in fig. 6 represent the flow of data. In the system of fig. 6, a crawler system (not shown in the figure) captures the original code stream generated by an operation, from which a pcap file can be obtained. The original code stream (i.e., the pcap file) is transmitted to the DPI device, which parses the code stream to extract the field information used for identifying the service type and transmits the field information to the intelligent identification server. The intelligent identification server identifies the service type of the field information through the preset machine learning algorithm model and outputs the identification result to the merge device (integration device). The merge device integrates the identification results and transmits the data to the 2file device, which outputs the data to the xdr device as required.
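To make the second scheme concrete, the sketch below trains a small classifier on field information and predicts a service type. It uses a multinomial naive Bayes classifier with add-one smoothing, which is a simpler stand-in for the SVM, Bayesian network, or gradient boosting tree models named above. The field names (`sni`, `port`) and service type labels are hypothetical, since the source does not list which fields are parsed.

```python
import math
from collections import Counter, defaultdict


class FieldNaiveBayes:
    """Naive Bayes over categorical field tokens such as "sni=chat.example.com"."""

    def __init__(self):
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    @staticmethod
    def _tokens(fields):
        # One token per field; keys are sorted so token order is stable.
        return [f"{key}={value}" for key, value in sorted(fields.items())]

    def fit(self, samples, labels):
        for fields, label in zip(samples, labels):
            self.class_counts[label] += 1
            for token in self._tokens(fields):
                self.token_counts[label][token] += 1
                self.vocab.add(token)
        return self

    def predict(self, fields):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # class prior
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for token in self._tokens(fields):
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In practice, the labels would come from the known simulated operations of the first embodiment, and the field dictionaries from the DPI device's parsing step.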
The third scheme: the service identification model is a correspondence model between field information and service types. Obtaining the data to be identified for the service identification model based on the to-be-used pcap file, and identifying the data to be identified based on the service identification model, includes: parsing the to-be-used pcap file to obtain the field information used for identifying the service type, and matching, from the correspondence model between field information and service types, the service type corresponding to the field information of the to-be-used pcap file as the identification result of the service identification model.
Again taking fig. 6 as an example, fig. 6 shows a new service identification scheme obtained by adding an intelligent identification server with a correspondence model between field information and service types to the DPI identification scheme. The arrows in fig. 6 represent the flow of data. In the system of fig. 6, a crawler system (not shown in fig. 6) captures the original code stream generated by an operation, from which a pcap file can be obtained. The original code stream (pcap file) is transmitted to the DPI device, which parses the code stream to extract the field information used for identifying the service type and transmits the field information to the intelligent identification server. The intelligent identification server identifies the service type of the field information through the correspondence model between field information and service types and outputs the identification result to the merge device (integration device). The merge device integrates the identification results and transmits the data to the 2file device, which outputs the data to the xdr device as required.
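The correspondence model of the third scheme can be pictured as a lookup table. The sketch below builds such a table from labeled samples by majority vote and then matches new field information against it. The key shape (a tuple of application name and host) and the function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_correspondence(samples):
    """Build {field_key: service_type} from (field_key, service_type) pairs.

    When the same key was observed with several service types, the most
    frequent one is kept.
    """
    votes = defaultdict(Counter)
    for field_key, service_type in samples:
        votes[field_key][service_type] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


def match_service_type(table, field_key, default=None):
    """Return the service type matched for field_key, or default if unknown."""
    return table.get(field_key, default)
```

Unlike the second scheme, this model cannot generalize to field values it has never seen, which is why a default is returned for unmatched keys.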
Optionally, a sampling test may be carried out in this embodiment for verification.
For example, 100 highly downloaded apps are randomly selected from an application download platform (or from the target application program list) and downloaded to a dial-test mobile phone. Each of the 100 apps is installed, used while its data packets are captured to obtain a pcap file, and then uninstalled. The pcap files are identified by the service type identification method, and the dial-test time is recorded.
Data comparison:
1. Extract information on the applications participating in the dial test from the existing network DPI environment, including but not limited to start time, user, total traffic, and service type;
2. Extract information on the applications participating in the dial test from the intelligent service identification environment (which uses the service identification method of this embodiment), including but not limited to start time, user, total traffic, and service type;
3. Compare the service information extracted from the DPI environment and from the intelligent service identification environment against the dial-test records.
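The three comparison steps above can be sketched as a small function that scores each environment against the dial-test records. The record layout (a dictionary per app with a `service_type` key) and the use of a simple match rate as the comparison metric are assumptions; the source does not define how the comparison is scored.

```python
def compare_environments(dial_test, dpi_records, intelligent_records):
    """Return, per environment, the fraction of dial-tested apps whose
    identified service type matches the dial-test record.

    dial_test: {app_name: expected_service_type}
    *_records: {app_name: {"start_time": ..., "user": ...,
                           "total_traffic": ..., "service_type": ...}}
    """
    def match_rate(records):
        if not dial_test:
            return 0.0
        hits = sum(
            1
            for app, expected in dial_test.items()
            if records.get(app, {}).get("service_type") == expected
        )
        return hits / len(dial_test)

    return {
        "dpi": match_rate(dpi_records),
        "intelligent": match_rate(intelligent_records),
    }
```

Apps missing from an environment's records count as mismatches, which penalizes an environment that fails to identify a flow at all.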
Sampling test:
100 highly downloaded apps are randomly selected, and the test mobile phone is tracked through the DPI device. The 100 apps are then installed, used while their data packets are captured into pcap files, and uninstalled. The pcap files are identified by the service type identification method, and the dial-test time is recorded.
The test uses both random sampling and systematic sampling:
1. Random sampling
According to the dial-test records, 10 app records are randomly selected, and the identification results for these apps are queried from the intelligent service identification system.
2. Systematic sampling
The 100 randomly selected applications are classified according to a certain rule (such as major service category), 2 to 5 applications are selected from each class (depending on the number of classes), and the identification results for these apps are queried from the intelligent service identification system.
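Both sampling modes can be sketched with the standard library. The per-class count is clamped to the 2 to 5 range stated above; how the exact count is chosen within that range is not specified in the source, so it is left as a caller-supplied parameter here. Function names and the fixed seed are illustrative.

```python
import random
from collections import defaultdict


def random_sample(records, n=10, seed=0):
    """Randomly pick n dial-test records (fewer if not enough exist)."""
    rng = random.Random(seed)
    return rng.sample(list(records), min(n, len(records)))


def sample_per_class(apps, per_class=3, seed=0):
    """Group (app_name, category) pairs by category and pick apps from each.

    per_class is clamped to the 2..5 range and to the class size.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for name, category in apps:
        by_class[category].append(name)
    k = max(2, min(5, per_class))
    return {
        category: rng.sample(names, min(k, len(names)))
        for category, names in sorted(by_class.items())
    }
```

Sampling within each class ensures that every major service category is checked, which a purely random draw of 10 apps does not guarantee.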
This embodiment provides a service identification method. When a user operates an application program, the code stream generated by the operation is captured to obtain a to-be-used pcap file of the application program; the data to be identified for the service identification model is obtained based on the to-be-used pcap file and is identified by the service identification model. The identification result of the service identification model is the service type corresponding to the operation on the application program, so this embodiment can identify user behavior within an application program. In the prior art, when user behavior data is needed in production, it can be collected through front-end event tracking, in which tracking points are embedded in the application. However, manual event tracking involves a very large amount of engineering work, is error-prone, is burdensome for many engineers, and has a long development cycle, and many small companies lack the capability to implement it. Collecting user behavior data through event tracking therefore consumes a great deal of manpower and time up front. The service identification method of this embodiment instead identifies user behavior automatically through machine learning, which saves research and development cost and improves service identification efficiency.
Further, by adopting the scheme of the embodiment, the service identification can be performed on the data in the https protocol which cannot be unpacked and analyzed in the existing DPI-based service identification technology, and the practicability of the service identification method in the embodiment is improved.
The third embodiment:
referring to fig. 7, the present embodiment provides a model training apparatus, including:
an automatic obtaining module 701, configured to obtain a target application list, and automatically download a target application from a software downloading platform according to information in the target application list;
an automatic installation module 702, configured to install the downloaded target application;
the automatic simulation operation module 703 is configured to simulate a human operation on a target application program, and capture a code stream generated in the operation process to obtain a pcap file, where the pcap file includes a name of the target application program;
and the deep learning module 704 is configured to obtain model training data with a label based on the pcap file, train a preset service recognition model with the model training data, and obtain a trained service recognition model, where the label of the model training data is a service type of an operation corresponding to the pcap file.
Further, the automatic acquisition module is specifically configured to access the software downloading platform at preset time intervals, and collect latest relevant information of the application program on the software downloading platform; and comparing the collected relevant information of the application program with the information of the target application program in the target application program list acquired last time, and updating the target application program list as the current target application program list according to the comparison result.
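The periodic list update can be sketched as a comparison between the previously acquired list and the newly collected information. The source does not say what information is compared, so representing each list as a mapping from application identifier to version string is an assumption, as are the function name and the shape of the returned change summary.

```python
def update_target_list(previous, collected):
    """Compare the last target list with newly collected platform information.

    previous, collected: {app_id: version}
    Returns (current_list, changes), where changes names the apps that were
    added, removed, or whose version changed since the last acquisition.
    """
    prev_ids, new_ids = set(previous), set(collected)
    changes = {
        "added": sorted(new_ids - prev_ids),
        "removed": sorted(prev_ids - new_ids),
        "changed": sorted(
            app for app in prev_ids & new_ids if previous[app] != collected[app]
        ),
    }
    return dict(collected), changes
```

Under this sketch, only the apps listed under `added` and `changed` would need to be downloaded again at the next interval.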
Further, the automatic simulation operation module 703 is configured to perform a packet capturing operation on a code stream generated by an operation on the target application program in the background to obtain a pcap file, and write the name of the target application program into the file name of the pcap file.
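One way to write the application name into the pcap file name, and to recover it later when building labels, is sketched below. The naming pattern `<app>__<operation>.pcap` is hypothetical: the source only requires that the file name contain the application name, and including the simulated operation in the name is an added assumption.

```python
import re
from pathlib import PurePosixPath


def _safe(text: str) -> str:
    """Replace characters that are awkward in file names with underscores."""
    return re.sub(r"[^\w.-]+", "_", text.strip())


def pcap_file_name(app_name: str, operation: str) -> str:
    """Build a file name of the assumed form <app>__<operation>.pcap."""
    return f"{_safe(app_name)}__{_safe(operation)}.pcap"


def parse_pcap_file_name(path: str):
    """Recover (app_name, operation) from a name built by pcap_file_name."""
    stem = PurePosixPath(path).stem
    app_name, _, operation = stem.partition("__")
    return app_name, operation
```

This sketch assumes application names do not themselves contain a double underscore; a real system might store the mapping in a separate index instead.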
Further, the deep learning module 704 is configured to use the pcap file as model training data, use the service type of the operation corresponding to the pcap file as a label of the model training data, train a preset convolutional neural network model as a service identification model with the model training data, and obtain a trained preset convolutional neural network model; or analyzing the pcap file to obtain field information used for identifying the service type in the pcap file, taking the field information as model training data, taking the service type of operation corresponding to the pcap file as a label of the model training data, training a preset machine learning algorithm model serving as a service identification model by using the model training data, and obtaining the trained preset machine learning algorithm model; or analyzing the pcap file to obtain field information used for identifying the service type in the pcap file, taking the field information as model training data, taking the service type of the operation corresponding to the pcap file as a label of the model training data, training a corresponding relation model of the field information and the service type as a service identification model by using the model training data, and obtaining a corresponding relation model of the field information and the service type after training.
Further, the present embodiment also provides an electronic device, which can be used to implement the model training method in the embodiments shown in fig. 2 and fig. 3.
As shown in fig. 8, the electronic device mainly includes: a memory 801, a processor 802, a bus 803, and computer programs stored on the memory 801 and executable on the processor 802, the memory 801 and the processor 802 being connected by the bus 803. The processor 802, when executing the computer program, implements the model training method in the embodiment shown in fig. 2 and 3. Wherein the number of processors may be one or more.
The memory 801 may be a high-speed random access memory (RAM) or a non-volatile memory, such as a disk memory. The memory 801 is used to store executable program code, and the processor 802 is coupled to the memory 801.
Further, an embodiment of the present application also provides a storage medium, which may be a computer-readable storage medium disposed in the electronic device in this embodiment, and the storage medium may be the memory in the foregoing embodiment shown in fig. 8. The storage medium has stored thereon a computer program which, when executed by a processor, implements the model training method shown in the embodiments of fig. 2 and 3. Further, the storage medium may be various media that can store program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a RAM, a magnetic disk, or an optical disk.
The fourth embodiment:
referring to fig. 9, the present embodiment provides a service recognition apparatus, which implements service recognition based on the trained service recognition model in the first embodiment, and the apparatus includes:
a packet capturing module 901, configured to capture, if an operation of the application program by a user is detected, a code stream generated by the operation to obtain a to-be-used pcap file of the application program, where the to-be-used pcap file includes a name of the application program;
the intelligent identification module 902 is used for obtaining data to be identified of the service identification model based on the pcap file to be used and identifying the data to be identified based on the service identification model;
and the determining module 903 is used for determining the service type corresponding to the operation of the application program by using the identification result based on the service identification model.
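The division into modules 901 to 903 can be pictured as three pluggable callables wired in sequence. This is only a structural sketch: the class name, the callable signatures, and the shape of the returned result are all assumptions rather than the apparatus as claimed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ServiceRecognizer:
    capture: Callable[[str], Optional[bytes]]  # packet capturing module 901
    recognize: Callable[[bytes], str]          # intelligent identification module 902
    determine: Callable[[str, str], dict]      # determining module 903

    def on_user_operation(self, app_name: str) -> Optional[dict]:
        """Run the three modules in order when an operation is detected."""
        pcap = self.capture(app_name)
        if pcap is None:
            return None  # nothing was captured for this operation
        result = self.recognize(pcap)
        return self.determine(app_name, result)
```

Because each module is a plain callable, any of the three model schemes can be substituted for `recognize` without changing the other two modules.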
In an example of this embodiment, the service identification model is a preset convolutional neural network model, and the intelligent identification module 902 is configured to use the pcap file as data to be identified of the preset convolutional neural network model, and input the data to be identified into the preset convolutional neural network model to identify a service type of the data to be identified.
In another example of this embodiment, the service identification model is a preset machine learning algorithm model, and the intelligent identification module 902 is configured to parse the to-be-used pcap file to obtain field information for identifying the service type in the to-be-used pcap file, use the field information as to-be-identified data, and input the to-be-identified data into the preset machine learning algorithm model to identify the service type of the to-be-identified data.
In another example of this embodiment, the service identification model is a model of correspondence between field information and service type, and the intelligent identification module 902 is configured to analyze the pcap file to be used to obtain field information for identifying the service type in the pcap file to be used, and match the service type corresponding to the field information of the pcap file to be used from the model of correspondence between field information and service type as an identification result of the service identification model.
Further, the present embodiment also provides an electronic device, which can be used to implement the service identification method in the embodiment shown in fig. 4.
As shown in fig. 10, the electronic device mainly includes: a memory 1001, a processor 1002, a bus 1003 and a computer program stored on the memory 1001 and executable on the processor 1002, the memory 1001 and the processor 1002 being connected by the bus 1003. The processor 1002, when executing the computer program, implements the service identification method in the embodiment shown in fig. 4. Wherein the number of processors may be one or more.
The memory 1001 may be a high-speed random access memory (RAM) or a non-volatile memory, such as a disk memory. The memory 1001 is used to store executable program code, and the processor 1002 is coupled to the memory 1001.
Further, an embodiment of the present application also provides a storage medium, which may be a computer-readable storage medium disposed in the electronic device in the present embodiment, and the storage medium may be the memory in the foregoing embodiment shown in fig. 10. The storage medium has stored thereon a computer program which, when executed by a processor, implements the service identification method in the embodiment shown in fig. 4. Further, the storage medium may be various media that can store program codes, such as a usb disk, a removable hard disk, a Read-only memory (ROM), a RAM, a magnetic disk, or an optical disk.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of modules is merely a division of logical functions, and an actual implementation may have another division, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a readable storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned readable storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of combined acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules referred to are not necessarily required by the present application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The above describes the model training method, the service identification method, the devices, the electronic devices, and the storage media provided by the present application. For those skilled in the art, the specific implementation and the scope of application may vary according to the ideas of the embodiments of the present application. In summary, the contents of this specification should not be construed as limiting the present application.
Claims (10)
1. A method of model training, comprising:
acquiring a target application program list, and automatically downloading a target application program from a software downloading platform according to information in the target application program list;
installing the downloaded target application program;
simulating the operation of a human on the target application program, and capturing a code stream generated in the operation process to obtain a pcap file, wherein the pcap file comprises the name of the target application program;
obtaining model training data with labels based on the pcap file, training a preset business recognition model by using the model training data, and obtaining a trained business recognition model, wherein the labels of the model training data are the business types of the operations corresponding to the pcap file.
2. The model training method of claim 1, wherein the obtaining a list of target applications comprises:
accessing the software downloading platform according to a preset time interval, and collecting the latest relevant information of the application program on the software downloading platform;
and comparing the collected related information of the application program with the information of the target application program in the target application program list acquired last time, and updating the target application program list as the current target application program list according to the comparison result.
3. The model training method of claim 1, wherein the capturing the codestream generated in the operation process to obtain the pcap file comprises:
and performing packet capturing operation on the code stream generated by the operation aiming at the target application program in the background to obtain a pcap file, and writing the name of the target application program into the file name of the pcap file.
4. The model training method of claim 1, wherein the obtaining labeled model training data based on the pcap file, and training a preset business recognition model with the model training data to obtain a trained business recognition model comprises:
taking the pcap file as model training data, taking the service type of the operation corresponding to the pcap file as a label of the model training data, and training a preset convolutional neural network model serving as a service identification model by using the model training data to obtain a trained preset convolutional neural network model;
or analyzing the pcap file to obtain field information used for identifying the service type in the pcap file, taking the field information as model training data, taking the service type of the operation corresponding to the pcap file as a label of the model training data, and training a preset machine learning algorithm model serving as a service identification model by using the model training data to obtain a trained preset machine learning algorithm model;
or analyzing the pcap file to obtain field information used for identifying the service type in the pcap file, taking the field information as model training data, taking the service type of the operation corresponding to the pcap file as a label of the model training data, and training a corresponding relation model of the field information and the service type serving as a service identification model by using the model training data to obtain a trained corresponding relation model of the field information and the service type.
5. A service identification method implemented based on the service identification model trained in any one of claims 1 to 4, comprising:
if the operation of a user on the application program is detected, capturing a code stream generated by the operation to obtain a to-be-used pcap file of the application program, wherein the to-be-used pcap file comprises the name of the application program;
obtaining data to be identified of the business identification model based on the pcap file to be used, and identifying the data to be identified based on the business identification model;
and determining the service type corresponding to the operation of the application program based on the identification result of the service identification model.
6. The service identification method according to claim 5, wherein the service identification model is a preset convolutional neural network model, the data to be identified of the service identification model is obtained based on the pcap file to be used, and identifying the data to be identified based on the service identification model comprises: taking the pcap file to be used as data to be identified of the preset convolutional neural network model, and inputting the data to be identified into the preset convolutional neural network model to identify the service type of the data to be identified;
or, the service identification model is a preset machine learning algorithm model, the data to be identified of the service identification model is obtained based on the pcap file to be used, and the identification of the data to be identified based on the service identification model comprises: analyzing the to-be-used pcap file to obtain field information for identifying the service type in the to-be-used pcap file, using the field information as to-be-identified data, and inputting the to-be-identified data into the preset machine learning algorithm model to identify the service type of the to-be-identified data;
or, the service identification model is a corresponding relation model of field information and service type, the data to be identified of the service identification model is obtained based on the pcap file to be used, and identifying the data to be identified based on the service identification model includes: analyzing the to-be-used pcap file to obtain field information used for identifying the service type in the to-be-used pcap file, and matching the service type corresponding to the field information of the to-be-used pcap file from the field information and service type corresponding relation model to serve as the identification result of the service identification model.
7. A model training apparatus, comprising:
the automatic acquisition module is used for acquiring a target application program list and automatically downloading the target application program from a software downloading platform according to the information in the target application program list;
the automatic installation module is used for installing the downloaded target application program;
the automatic simulation operation module is used for simulating the operation of a human on the target application program, and capturing a code stream generated in the operation process to obtain a pcap file, wherein the pcap file comprises the name of the target application program;
and the deep learning module is used for obtaining model training data with labels based on the pcap file, training a preset business recognition model by using the model training data, and obtaining a trained business recognition model, wherein the labels of the model training data are the business types of the operation corresponding to the pcap file.
8. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-4 when executing the computer program.
9. A service identification apparatus, wherein the service identification is implemented based on the service identification model trained and completed in any one of claims 1 to 4, and the apparatus comprises:
the packet capturing module is used for capturing a code stream generated by operation to obtain a to-be-used pcap file of the application program if the operation of a user on the application program is detected, wherein the to-be-used pcap file comprises the name of the application program;
the intelligent identification module is used for obtaining data to be identified of the business identification model based on the pcap file to be used and identifying the data to be identified based on the business identification model;
and the determining module is used for determining the service type corresponding to the operation of the application program by using the recognition result based on the service recognition model.
10. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of claims 5 or 6 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010089667.9A CN111355628B (en) | 2020-02-12 | 2020-02-12 | Model training method, service identification method, device and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010089667.9A CN111355628B (en) | 2020-02-12 | 2020-02-12 | Model training method, service identification method, device and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111355628A (en) | 2020-06-30 |
CN111355628B (en) | 2023-05-09 |
Family
ID=71195670
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010089667.9A Active CN111355628B (en) | 2020-02-12 | 2020-02-12 | Model training method, service identification method, device and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111355628B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060239219A1 (en) * | 2005-04-22 | 2006-10-26 | At&T Corporation | Application signature based traffic classification |
CN104935478A (en) * | 2015-06-19 | 2015-09-23 | 上海斐讯数据通信技术有限公司 | Intelligent terminal depth perception method and system thereof |
CN108234345A (en) * | 2016-12-21 | 2018-06-29 | 中国移动通信集团湖北有限公司 | A kind of traffic characteristic recognition methods of terminal network application, device and system |
CN109995601A (en) * | 2017-12-29 | 2019-07-09 | 中国移动通信集团上海有限公司 | A kind of network flow identification method and device |
Non-Patent Citations (1)
Title |
---|
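Wei Songjie et al.: "DroidBet: an event-driven automatic detection system for the network behavior of Android applications", Journal on Communications (《通信学报》) * |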
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113141616A (en) * | 2021-04-20 | 2021-07-20 | 博瑞得科技有限公司 | Method, device and system for selecting energy-saving base station and energy-saving mode through self-adaptive identification of O + B domain data and service scene |
CN113141616B (en) * | 2021-04-20 | 2022-07-29 | 博瑞得科技有限公司 | Method, device and system for selecting energy-saving base station and energy-saving mode through adaptive identification of O + B domain data + service scene |
CN114510305A (en) * | 2022-01-20 | 2022-05-17 | 北京字节跳动网络技术有限公司 | Model training method and device, storage medium and electronic equipment |
CN114510305B (en) * | 2022-01-20 | 2024-01-23 | 北京字节跳动网络技术有限公司 | Model training method and device, storage medium and electronic equipment |
CN114513685A (en) * | 2022-01-28 | 2022-05-17 | 武汉绿色网络信息服务有限责任公司 | Method and device for identifying HTTPS encrypted video stream based on stream characteristics |
Also Published As
Publication number | Publication date |
---|---|
CN111355628B (en) | 2023-05-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108595583B (en) | Dynamic graph page data crawling method, device, terminal and storage medium | |
CN107818344B (en) | Method and system for classifying and predicting user behaviors | |
CN111355628B (en) | Model training method, service identification method, device and electronic device | |
CN108667855B (en) | Network flow abnormity monitoring method and device, electronic equipment and storage medium | |
US11122142B2 (en) | User behavior data processing method and device, and computer-readable storage medium | |
US20180217986A1 (en) | Automated extraction tools and their use in social content tagging systems | |
CN108292257A (en) | System and method for explaining client-server affairs | |
CN111737692A (en) | Application program risk detection method and device, equipment and storage medium | |
US9336316B2 (en) | Image URL-based junk detection | |
WO2016188334A1 (en) | Method and device for processing application access data | |
CN113886204A (en) | User behavior data collection method and device, electronic equipment and readable storage medium | |
CN111949356A (en) | Popup window processing method and device and electronic equipment | |
CN116015842A (en) | Network attack detection method based on user access behaviors | |
EP3722974B1 (en) | Collecting apparatus, collection method, and collection program | |
CN114428705A (en) | Network data monitoring method, device, equipment and storage medium | |
CN105227528B (en) | To the detection method and device of the attack of Web server group | |
US11831417B2 (en) | Threat mapping engine | |
CN111859069B (en) | Network malicious crawler identification method, system, terminal and storage medium | |
CN111428117B (en) | Application program data acquisition method and device | |
TWI557583B (en) | Webpage comment classification method, system and webpage management device | |
CN110069691A (en) | For handling the method and apparatus for clicking behavioral data | |
CN114513355A (en) | Malicious domain name detection method, device, equipment and storage medium | |
CN110262856B (en) | Application program data acquisition method, device, terminal and storage medium | |
CN113890835A (en) | Method and device for processing DPI application test message | |
CN113660663A (en) | Internet of things equipment identification method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 401120 No.2, 7th floor, Fenghuang a building, No.18, Qingfeng North Road, Yubei District, Chongqing; Applicant after: Broid Technology Co.,Ltd.; Address before: No.1, area a, building B1, Shenzhen digital technology park, No.002, Gaoxin South 7th Road, Nanshan District, Shenzhen, Guangdong 518000; Applicant before: SHENZHEN BROADTECH CO.,LTD. |
GR01 | Patent grant | ||