CN112615758A - Application identification method, device, equipment and storage medium - Google Patents

Info

Publication number
CN112615758A
Authority
CN
China
Prior art keywords
application
target
data
feature data
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011490933.5A
Other languages
Chinese (zh)
Other versions
CN112615758B (en
Inventor
田慧萌
万月亮
火一莽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ruian Technology Co Ltd
Original Assignee
Beijing Ruian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ruian Technology Co Ltd filed Critical Beijing Ruian Technology Co Ltd
Priority to CN202011490933.5A priority Critical patent/CN112615758B/en
Publication of CN112615758A publication Critical patent/CN112615758A/en
Priority to PCT/CN2021/115879 priority patent/WO2022127196A1/en
Application granted granted Critical
Publication of CN112615758B publication Critical patent/CN112615758B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876Network utilisation, e.g. volume of load or congestion level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses an application identification method, device, equipment and storage medium. The method comprises the following steps: acquiring target network traffic data sent by a target terminal; parsing the target network traffic data to obtain target feature data; and searching an application identification feature library according to the target feature data to determine the target application corresponding to the target feature data. This technical solution enhances identification across a massive number of applications, improves efficiency, provides comprehensive and accurate identification, is simple to implement, and adapts easily to the rapid turnover of APPs in the network era.

Description

Application identification method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the field of data processing, in particular to a method, a device, equipment and a storage medium for application identification.
Background
With the development of the mobile internet, new applications (APPs) emerge constantly. There are currently about three million APPs in total, and on the order of a hundred thousand APPs are added to or removed from application markets each month. Working backward from the mass of data traffic generated by so many APPs to identify the application behind each flow has become a heavy task.
Most APPs communicate over the HTTP or HTTPS protocol.
The traditional way to enhance application identification is to invest substantial manpower in analyzing a limited number of APPs and then add support for each APP by writing a template. This approach is time-consuming and laborious, struggles to keep up with the rate at which APPs change, and cannot cover enough APPs.
Disclosure of Invention
The embodiment of the invention provides an application identification method, an application identification device, application identification equipment and a storage medium, so that identification enhancement of massive applications can be realized, the efficiency is improved, and the identification is comprehensive and high in accuracy.
In a first aspect, an embodiment of the present invention provides an application identification method, including:
acquiring target network traffic data sent by a target terminal;
parsing the target network traffic data to obtain target feature data;
and searching an application identification feature library according to the target feature data, and determining the target application corresponding to the target feature data.
Further, the target feature data includes: a target HOST and/or a target SNI;
correspondingly, searching an application identification feature library according to the target feature data, and determining the target application corresponding to the target feature data comprises:
and searching an application identification feature library according to the target HOST, and determining a target application ID and a target application name corresponding to the target HOST, or searching an application identification feature library according to the target SNI and determining a target application ID and a target application name corresponding to the target SNI.
Further, before obtaining the target network traffic data sent by the target terminal, the method further includes:
installing at least one application through the simulator;
acquiring network traffic data generated by each application during operation;
and storing the network traffic data generated by each application as a PCAP file, wherein the PCAP file carries the application ID and the application name corresponding to the network traffic data.
Further, after saving the network traffic data generated by each application as a PCAP file, the method includes:
analyzing the PCAP file to obtain candidate feature data, and an application ID and an application name corresponding to the candidate feature data;
determining first feature data corresponding to each application ID according to the candidate feature data and the application ID and the application name corresponding to the candidate feature data;
and storing the first characteristic data, and the application ID and the application name corresponding to the first characteristic data into an application identification characteristic library.
Further, determining the first feature data corresponding to each application ID according to the candidate feature data and the application ID and the application name corresponding to the candidate feature data includes:
acquiring candidate characteristic data corresponding to all applications;
establishing a candidate global hash table according to candidate characteristic data corresponding to all applications;
and selecting first characteristic data corresponding to each application ID through the candidate global hash table.
Further, the first feature data includes: a globally unique HOST and/or a globally unique SNI.
In a second aspect, an embodiment of the present invention further provides an application identification apparatus, where the apparatus includes:
the first acquisition module is used for acquiring target network traffic data sent by a target terminal;
the analysis module is used for analyzing the target network flow data to obtain target characteristic data;
and the identification module is used for searching an application identification feature library according to the target feature data and determining the target application corresponding to the target feature data.
Further, the target feature data includes: a target HOST and/or a target SNI;
correspondingly, the identification module is specifically configured to:
and searching an application identification feature library according to the target HOST, and determining a target application ID and a target application name corresponding to the target HOST, or searching an application identification feature library according to the target SNI and determining a target application ID and a target application name corresponding to the target SNI.
Further, the apparatus also includes:
an installation module, used for installing at least one application through the simulator before the target network traffic data sent by the target terminal is acquired;
the second acquisition module is used for acquiring network flow data generated by each application during the running period;
and the storage module is used for storing the network traffic data generated by each application as a PCAP file, wherein the PCAP file carries the application ID and the application name corresponding to the network traffic data.
Further, the apparatus also includes:
an obtaining module, used for parsing the PCAP file, after the network traffic data generated by each application has been stored as a PCAP file, to obtain candidate feature data and the application ID and application name corresponding to the candidate feature data;
the determining module is used for determining first feature data corresponding to each application ID according to the candidate feature data and the application ID and the application name corresponding to the candidate feature data;
and the storage module is used for storing the first characteristic data, and the application ID and the application name corresponding to the first characteristic data into an application identification characteristic library.
Further, the determining module is specifically configured to:
acquiring candidate characteristic data corresponding to all applications;
establishing a candidate global hash table according to candidate characteristic data corresponding to all applications;
and selecting first characteristic data corresponding to each application ID through the candidate global hash table.
Further, the first feature data includes: a globally unique HOST and/or a globally unique SNI.
In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the application identification method according to any one of the embodiments of the present invention when executing the program.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the application identification method according to any one of the embodiments of the present invention.
The embodiment of the invention parses the target network traffic data sent by the target terminal to obtain the corresponding target feature data, and searches the application identification feature library according to the target feature data to determine the corresponding target application. This solves the problems of the traditional approach to enhancing application identification, which can analyze only a limited number of APPs, relies on time-consuming and laborious template writing, struggles to keep up with the rate at which APPs change, and cannot cover enough APPs. The embodiment thereby enhances identification across a massive number of applications, improves efficiency, provides comprehensive and accurate identification, is simple to implement, and adapts easily to the rapid turnover of APPs in the network era.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flowchart of an application identification method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of an application identification method according to a second embodiment of the present invention;
FIG. 2a is a flowchart of an automatic packet capturing process according to the second embodiment of the present invention;
FIG. 2b is a diagram of the overall topology of application identification in the second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an application identification apparatus according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Embodiment one
Fig. 1 is a flowchart of an application identification method according to the first embodiment of the present invention. This embodiment is applicable to identifying the application corresponding to network traffic data sent by a terminal. The method may be executed by the application identification apparatus according to an embodiment of the present invention, and the apparatus may be implemented in software and/or hardware. As shown in fig. 1, the method specifically includes the following steps:
S110, acquiring target network traffic data sent by the target terminal.
The target terminal can be any terminal device capable of loading applications, such as a mobile phone, a tablet, a computer and the like.
The target network traffic data sent by the target terminal refers to network traffic data generated by the target terminal in the application using process.
Specifically, network traffic data sent by a user in the process of using an application on a target terminal is acquired.
S120, parsing the target network traffic data to obtain target feature data.
The target feature data is feature data in an APP's HTTP (Hypertext Transfer Protocol) message or HTTPS (Hypertext Transfer Protocol Secure) message that can be used to identify the application.
Specifically, the target network traffic data sent by the target terminal is parsed, and the target feature data in it is extracted so that the application corresponding to the target feature data can be identified.
S130, searching an application identification feature library according to the target feature data, and determining the target application corresponding to the target feature data.
The application identification feature library may be a database that is pre-established before the application is identified and contains feature data of all applications. The embodiment of the invention does not limit the data storage mode of the application identification feature library.
Specifically, the application identification feature library is searched according to the target feature data extracted from an HTTP protocol message or an HTTPS protocol message. If the target feature data matches feature data contained in the application identification feature library, the target application corresponding to the target feature data is determined from the application information stored with that feature data in the library.
According to the technical scheme, the target characteristic data corresponding to the target network flow data are obtained by analyzing the target network flow data sent by the target terminal, and the application identification characteristic library is searched according to the target characteristic data, so that the corresponding target application is determined, the identification enhancement of massive applications can be realized, the efficiency is improved, the identification is comprehensive, the accuracy is high, and the realization process is simple and is easily adapted to the change of the APP in the network era.
Embodiment two
Fig. 2 is a flowchart of an application identification method in a second embodiment of the present invention, which is optimized on the basis of the first embodiment. In this embodiment, the target feature data includes a target HOST and/or a target SNI; correspondingly, searching an application identification feature library according to the target feature data and determining the target application corresponding to the target feature data includes: searching the application identification feature library according to the target HOST and determining a target application ID and a target application name corresponding to the target HOST, or searching the application identification feature library according to the target SNI and determining a target application ID and a target application name corresponding to the target SNI.
As shown in fig. 2, the method of this embodiment specifically includes the following steps:
S210, acquiring target network traffic data sent by a target terminal.
S220, parsing the target network traffic data to obtain target feature data, wherein the target feature data includes: a target HOST and/or a target SNI.
Specifically, the target network traffic data is parsed to obtain an HTTP protocol message and/or an HTTPS protocol message; the target HOST is extracted from the HTTP protocol message, and the target SNI (Server Name Indication) is extracted from the HTTPS protocol message.
It should be noted that the network traffic data generated while an APP is in use may include only HTTP protocol messages, only HTTPS protocol messages, or both.
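As a non-limiting illustration of the extraction in S220, the sketch below reads the Host header from a plaintext HTTP request and the server name from a TLS ClientHello. It assumes that the HOST is the HTTP Host header, that the SNI is carried in the server_name extension of the ClientHello, and that the whole request or ClientHello is available in a single reassembled payload. The function names `extract_http_host` and `extract_tls_sni` are hypothetical and are not part of the described embodiment.

```python
import struct
from typing import Optional


def extract_http_host(payload: bytes) -> Optional[str]:
    """Return the Host header of a plaintext HTTP request, lower-cased."""
    head = payload.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", errors="replace").lower()
    return None


def extract_tls_sni(payload: bytes) -> Optional[str]:
    """Return the server name from a TLS ClientHello held in one record."""
    try:
        # 0x16 = handshake record, 0x01 = ClientHello handshake message.
        if payload[0] != 0x16 or payload[5] != 0x01:
            return None
        pos = 5 + 4                       # record header + handshake header
        pos += 2 + 32                     # client version + random
        pos += 1 + payload[pos]           # session ID
        pos += 2 + struct.unpack_from("!H", payload, pos)[0]  # cipher suites
        pos += 1 + payload[pos]           # compression methods
        ext_end = pos + 2 + struct.unpack_from("!H", payload, pos)[0]
        pos += 2
        while pos + 4 <= min(ext_end, len(payload)):
            ext_type, ext_len = struct.unpack_from("!HH", payload, pos)
            pos += 4
            if ext_type == 0:             # server_name extension
                # Skip list length (2 bytes) and name type (1 byte).
                name_len = struct.unpack_from("!H", payload, pos + 3)[0]
                name = payload[pos + 5:pos + 5 + name_len]
                return name.decode("ascii", errors="replace").lower()
            pos += ext_len
    except (IndexError, struct.error):
        return None                       # truncated or malformed payload
    return None
```

Both functions return `None` when the field is absent or the payload is truncated, so a caller can fall through to the other feature type. A Host value that includes a port is returned with the port attached.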
S230, searching an application identification feature library according to the target HOST, and determining a target application ID and a target application name corresponding to the target HOST, or searching an application identification feature library according to the target SNI and determining a target application ID and a target application name corresponding to the target SNI.
The application identification feature library is a database which is pre-established before application identification and used for storing feature data of the application, application IDs corresponding to the feature data and application names. The data storage method of the application identification feature library may be any database table method, which is not limited in the embodiment of the present invention.
Specifically, if the network traffic data corresponding to the APP only includes an HTTP protocol packet, searching an application identification feature library according to the target HOST, and determining a target application ID and a target application name corresponding to the target HOST; if the network traffic data corresponding to the APP only comprises an HTTPS protocol message, searching an application identification feature library according to the target SNI, and determining a target application ID and a target application name corresponding to the target SNI; if the network traffic data corresponding to the APP comprises an HTTP protocol message and an HTTPS protocol message, searching an application identification feature library according to the target HOST, and determining a target application ID and a target application name corresponding to the target HOST, or searching an application identification feature library according to the target SNI and determining the target application ID and the target application name corresponding to the target SNI.
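The lookup in S230 can be sketched as follows, assuming the application identification feature library has been loaded into two in-memory dictionaries keyed by HOST and by SNI. The embodiment does not limit the storage form of the library, so the dictionaries, the `AppInfo` type, and the choice to try the HOST before the SNI when both are present are assumptions made only for illustration.

```python
from typing import Dict, NamedTuple, Optional


class AppInfo(NamedTuple):
    app_id: str
    app_name: str


def identify_application(
    host: Optional[str],
    sni: Optional[str],
    host_library: Dict[str, AppInfo],
    sni_library: Dict[str, AppInfo],
) -> Optional[AppInfo]:
    """Look up the target application by HOST, then by SNI."""
    # HTTP traffic: search the library according to the target HOST.
    if host is not None and host in host_library:
        return host_library[host]
    # HTTPS traffic: search the library according to the target SNI.
    if sni is not None and sni in sni_library:
        return sni_library[sni]
    # No stored feature matches this traffic.
    return None
```

Because each lookup is a hash-table access, the cost per flow does not grow with the number of applications in the library.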
Optionally, before obtaining the target network traffic data sent by the target terminal, the method further includes:
installing at least one application through the simulator;
acquiring network traffic data generated by each application during operation;
and storing the network traffic data generated by each application as a PCAP file, wherein the PCAP file carries the application ID and the application name corresponding to the network traffic data.
Specifically, as shown in fig. 2a, installing an application through the simulator, acquiring the network traffic data generated while the application runs, and storing that data as a PCAP file is also referred to as the automatic packet capturing process. The automatic packet capturing process may specifically include the following steps:
The first step: start the simulator.
The second step: install the APP package on the simulator by running an installation tool, where the installation tool can be any tool capable of installing an APP, such as the Android ADB tool.
The third step: run the packet capturing tool program with the application ID passed in as a parameter, and wait for the APP to start, where the packet capturing tool can be any tool capable of capturing network traffic data packets, such as tcpdump.
The fourth step: perform common operations on the APP, such as starting the application and browsing its pages. To collect the traffic data packets generated while the APP is in use automatically and efficiently, run the AppCrawler tool, which automatically traverses the APP's pages to simulate a user's normal operation of the APP, and wait for AppCrawler to finish.
The fifth step: after the network traffic data generated while each application runs has been obtained, export it and store it as a PCAP file. The network traffic data generated while the APP is in use may be exported with the tcpdump-dump command or with other commands or tools. The PCAP file carries the application ID and application name corresponding to the network traffic data; this may be done by naming the PCAP file according to the application ID, application name, and version information, or by storing the application ID, application name, and version information inside the PCAP file, so that the application information can be determined from the PCAP file.
Repeat the above steps until the network traffic data generated by every application has been stored as a PCAP file.
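The automatic packet capturing steps above can be sketched as a dry-run plan that only builds the commands and the PCAP file name without executing anything. The `adb install` and `tcpdump -i any -w` invocations are ordinary uses of those tools chosen for illustration. The double-underscore naming scheme and the function names are assumptions, and the AppCrawler invocation is left out because the description does not specify it.

```python
import re
from typing import List, Tuple


def pcap_file_name(app_id: str, app_name: str, version: str) -> str:
    """Name the capture after application ID, application name and version."""
    # Replace characters that are awkward in file names (assumed rule).
    safe_name = re.sub(r"[^\w.-]+", "-", app_name.strip())
    return f"{app_id}__{safe_name}__{version}.pcap"


def build_capture_plan(
    apk_path: str, app_id: str, app_name: str, version: str
) -> List[Tuple[str, List[str]]]:
    """Return (description, command) pairs for one application."""
    pcap = pcap_file_name(app_id, app_name, version)
    return [
        # Second step: install the APP package on the running simulator.
        ("install the APP package", ["adb", "install", "-r", apk_path]),
        # Third step: start capturing; output is tagged with the app ID.
        ("capture traffic to a PCAP file", ["tcpdump", "-i", "any", "-w", pcap]),
        # Fourth step (page traversal with AppCrawler) would run here;
        # its command line is not given in the description, so it is omitted.
    ]
```

A driver could pass each command to `subprocess.Popen` or `subprocess.run`, stopping the capture once page traversal has finished, and then repeat the plan for the next installation package.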
It should be noted that, in order to efficiently obtain the correspondence between a massive number of applications and the network traffic they generate, and thereby build the application identification feature library, crawler technology may be used to obtain the installation packages of the applications. The specific steps are as follows: deploy a Python crawler script on a server, start a scheduled task that runs the crawler script, and collect the download addresses of the APPs on the ranking lists of the major application markets; then start a download script that downloads the APPs one by one from the collected addresses. After each download finishes, call the AXMLPrinter2 decompiling tool of the android developer toolkit to parse the APP installation package, extract the application ID, application name and version information, and rename the installation package according to the rule of application name + application ID.
Optionally, after the network traffic data generated by each application is stored as a PCAP file, the method includes:
analyzing the PCAP file to obtain candidate feature data, and an application ID and an application name corresponding to the candidate feature data;
determining first feature data corresponding to each application ID according to the candidate feature data and the application ID and the application name corresponding to the candidate feature data;
and storing the first characteristic data, and the application ID and the application name corresponding to the first characteristic data into an application identification characteristic library.
Specifically, after network traffic data generated by each application is stored as a PCAP file, the PCAP file or the PCAP file name is analyzed to obtain an application ID and an application name; and each PCAP file is analyzed to obtain an HTTP protocol message and an HTTPS protocol message, and the HTTP protocol message and the HTTPS protocol message are further analyzed to obtain candidate characteristic data. Determining first feature data corresponding to each application ID according to the candidate feature data and the application ID and the application name corresponding to the candidate feature data, and storing the first feature data and the application ID and the application name corresponding to the first feature data to an application identification feature library.
Specifically, the first feature data, together with the corresponding application ID and application name, may be stored in the application identification feature library in either of two ways. In the first way, the candidate feature data of each application and the corresponding application ID and application name are stored in a first file; the candidate feature data in the first file of each application is then read, and a candidate global hash table is built from it. In the second way, the candidate global hash table is built directly from the candidate feature data of each application and the corresponding application ID and application name. In both ways, the first feature data corresponding to each application ID is then selected according to the candidate global hash table and stored in the application identification feature library.
Illustratively, the candidate feature data may include a candidate HOST and/or a candidate SNI; correspondingly, the HTTP protocol message is parsed to obtain the candidate HOST, and the HTTPS protocol message is parsed to obtain the candidate SNI. It should be noted that each application may have multiple candidate HOSTs, multiple candidate SNIs, or both.
Correspondingly, all candidate feature data extracted from each PCAP file, together with the corresponding application ID and application name, are saved in a first file in the following format:
[ application ID ];
Appname=name1,name2;
host=host1,host2;
sni=sni1,sni2;
Here, the application ID is a unique identifier of the application, and may be in com. Most applications have only one Appname, but the same application may use different application names in different application markets; multiple Appnames are separated by commas. There may be more than one HOST and more than one SNI, each separated by commas.
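A minimal sketch of writing and reading the first-file format shown above follows. It assumes one record per application, the line order of the example, and that empty lists are written as an empty value. The function names `format_record` and `parse_records` are hypothetical.

```python
from typing import Dict, List


def format_record(
    app_id: str, app_names: List[str], hosts: List[str], snis: List[str]
) -> str:
    """Serialize one application's candidate features in the first-file format."""
    return "\n".join([
        f"[{app_id}];",
        "Appname=" + ",".join(app_names) + ";",
        "host=" + ",".join(hosts) + ";",
        "sni=" + ",".join(snis) + ";",
    ])


def parse_records(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Parse first-file text into {app_id: {"appname": [...], "host": [...], "sni": [...]}}."""
    records: Dict[str, Dict[str, List[str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip().rstrip(";").strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A bracketed line starts the record for a new application ID.
            app_id = line[1:-1].strip()
            current = records.setdefault(
                app_id, {"appname": [], "host": [], "sni": []}
            )
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            items = [v.strip() for v in value.split(",") if v.strip()]
            current.setdefault(key.strip().lower(), []).extend(items)
    return records
```

Keys are lower-cased on reading, so `Appname` in the file becomes `appname` in the parsed record.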
Optionally, determining, according to the candidate feature data, and the application ID and the application name corresponding to the candidate feature data, first feature data corresponding to each application ID includes:
acquiring candidate characteristic data corresponding to all applications;
establishing a candidate global hash table according to candidate characteristic data corresponding to all applications;
and selecting first characteristic data corresponding to each application ID through the candidate global hash table.
The first characteristic data is characteristic data which is unique to each application compared with other applications, and can be used for identifying IDs and application names of different applications.
Specifically, candidate feature data corresponding to all applications are obtained, a candidate global hash table is generated, and first feature data corresponding to each application ID are screened out through the application IDs and the candidate feature data included in the candidate global hash table to serve as features for identifying the applications. And storing the first characteristic data, and the application ID and the application name corresponding to the first characteristic data into an application identification characteristic library.
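One possible reading of the selection step is sketched below: the candidate global hash table maps every candidate feature to the set of application IDs in whose traffic it was observed, and a feature is kept as first feature data only when exactly one application uses it. The description does not spell out the table layout, so this layout and the function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_candidate_table(candidates: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each candidate feature (HOST or SNI) to the app IDs that produced it."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for app_id, features in candidates.items():
        for feature in features:
            table[feature].add(app_id)
    return dict(table)


def select_unique_features(table: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Keep only features seen in exactly one application (globally unique)."""
    unique: Dict[str, List[str]] = defaultdict(list)
    for feature, app_ids in table.items():
        if len(app_ids) == 1:
            (app_id,) = app_ids
            unique[app_id].append(feature)
    # Sort for a deterministic result.
    return {app_id: sorted(features) for app_id, features in unique.items()}
```

Under this reading, a HOST shared by several applications, such as a common CDN or advertising domain, collides in the table and is discarded, which is what lets the remaining features identify a single application.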
Specifically, the manner of storing the first feature data, and the application ID and the application name corresponding to the first feature data in the application identification feature library may be: storing the first feature data and the application ID and application name corresponding to the first feature data into a second file, wherein the second file can be a mapping file, and then storing the first feature data in the second file, the application ID and application name corresponding to the first feature data into an application identification feature library; the first feature data and the application ID and the application name corresponding to the first feature data may also be directly stored in an application identification feature library.
Optionally, the first feature data includes: a globally unique HOST and/or a globally unique SNI.
Illustratively, when the first feature data includes a globally unique HOST and/or a globally unique SNI, storing the first feature data and the corresponding application ID and application name in the application identification feature library may proceed as follows: if the first feature data is a globally unique HOST, the globally unique HOST and its corresponding application ID and application name are stored in the application identification feature library; if the first feature data is a globally unique SNI, the globally unique SNI and its corresponding application ID and application name are stored in the application identification feature library; and if the first feature data includes both, the globally unique HOST with its application ID and application name and the globally unique SNI with its application ID and application name are both stored in the application identification feature library.
Correspondingly, when the first feature data and its corresponding application ID and application name are saved in the second file, the data in the second file is stored in XML format, for example:
<sni servername="sname1" appname="name1" appid="id1"/>
<sni servername="sname2" appname="name1" appid="id2"/>
<sni servername="sname3" appname="name2" appid="id3"/>
<host servername="sname4" appname="name2" appid="id4"/>
<host servername="sname5" appname="name3" appid="id5"/>
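As an illustration of how such a mapping file might be produced and later reloaded, the following Python sketch writes entries with the standard-library `xml.etree.ElementTree` module and reads them back into lookup tables. The root element name `mappings`, the function names, and the in-memory layout are assumptions made for this example; the text above only specifies the `sni` and `host` entries and their three attributes.

```python
import xml.etree.ElementTree as ET


def write_mapping(path, entries):
    """Write (kind, servername, appname, appid) tuples; kind is 'sni' or 'host'."""
    root = ET.Element("mappings")  # hypothetical root element, not named in the text
    for kind, servername, appname, appid in entries:
        ET.SubElement(root, kind, servername=servername,
                      appname=appname, appid=appid)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def load_mapping(path):
    """Return {'sni': {servername: (appid, appname)}, 'host': {...}}."""
    library = {"sni": {}, "host": {}}
    for elem in ET.parse(path).getroot():
        if elem.tag in library:  # ignore any element that is not sni/host
            library[elem.tag][elem.get("servername")] = (
                elem.get("appid"), elem.get("appname"))
    return library
```

Keeping separate tables for `sni` and `host` entries means a HOST value never matches an SNI entry by accident, which seems consistent with the two entry types shown above.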
As shown in fig. 2b, in this embodiment the application identification system requires one server for application analysis, one high-performance server for application identification, one traffic-splitting device operating in the live network, and one database server. The application analysis server must be able to access the Internet. After the application analysis server completes its automatic analysis, it sends the final analysis result, a mapping file in XML format, to the application identification server. The application identification server reloads the mapping file, performs application identification on the incoming traffic, outputs the results as structured data, and transmits the data to the database server. The implementation involves the following steps:
(1) Building the application analysis server: an automatic packet capture and storage program, a PCAP file parsing program, and a collision analysis program need to be deployed on the application analysis server. A script performs one-click installation and deployment.
(2) Building the application identification server: an application identification program needs to be deployed and a data transfer program installed on the application identification server; the File Transfer Protocol (FTP) can be used for the transfer.
(3) Building the database server: the database server is built according to the data scale. If the data scale is small, MySQL or Oracle can be used; if the data volume is large, a distributed storage system such as the Hadoop Distributed File System (HDFS) needs to be built.
(4) Data splitting: the traffic-splitting device mirrors the live network data traffic.
(5) Starting the application analysis program: after the analysis is finished, the analysis result is sent to the application identification server through FTP.
(6) Starting the application identification program: the program performs application identification on the incoming traffic, produces structured output, and transmits it to the database server through FTP for analysis by the service system.
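The structured output of step (6) is not specified further in the text. As one possible reading, the sketch below serializes each identification result as a line of JSON before it is transferred. The field names, the function names, and the choice of JSON are assumptions made for illustration only.

```python
import json

# Hypothetical field layout for one identified flow.
FIELDS = ("timestamp", "src_ip", "feature_type", "feature", "app_id", "app_name")


def format_record(timestamp, src_ip, feature_type, feature, app_id, app_name):
    """Serialize one identification result as a single JSON line."""
    record = dict(zip(FIELDS, (timestamp, src_ip, feature_type,
                               feature, app_id, app_name)))
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


def write_records(path, records):
    """Write an iterable of 6-tuples to a file, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(format_record(*rec) + "\n")
```

A line-oriented file of this kind can be appended to incrementally and sent by any file transfer program, which fits the FTP hand-off described in the steps above.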
According to the technical solution of this embodiment, target feature data corresponding to the target application is obtained by parsing the network traffic data of the target application, and the application identification feature library is queried according to the target feature data to determine the application name and application ID of the target application. This enables identification across a massive number of applications with improved efficiency, comprehensive coverage, and high accuracy, and the implementation is simple and adapts readily to changes in apps in the network era.
Example three
Fig. 3 is a schematic structural diagram of an application identification apparatus according to a third embodiment of the present invention. This embodiment is applicable to identifying the application corresponding to network traffic data sent by a terminal. The apparatus may be implemented in software and/or hardware and may be integrated in any device that provides an application identification function. As shown in fig. 3, the application identification apparatus specifically includes: a first acquisition module 310, an analysis module 320, and an identification module 330.
The first acquisition module 310 is configured to acquire target network traffic data sent by a target terminal;
the analysis module 320 is configured to parse the target network traffic data to obtain target feature data;
the identification module 330 is configured to search an application identification feature library according to the target feature data and determine a target application corresponding to the target feature data.
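The parsing step can be pictured with the following Python sketch, which assumes the target feature data are the HTTP `Host` header and the TLS Server Name Indication (SNI), and that each payload holds a complete HTTP request head or a complete, unfragmented ClientHello record. The function names are hypothetical, and the sketch does not reassemble TCP streams.

```python
import struct


def extract_http_host(payload: bytes):
    """Return the Host header value of a plaintext HTTP request, or None."""
    head = payload.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", "replace")
    return None


def extract_sni(payload: bytes):
    """Return the server name from a TLS ClientHello record, or None."""
    try:
        # 0x16 = handshake record, 0x01 = ClientHello message
        if payload[0] != 0x16 or payload[5] != 0x01:
            return None
        pos = 5 + 4 + 2 + 32            # record hdr, handshake hdr, version, random
        pos += 1 + payload[pos]         # session id
        (suites_len,) = struct.unpack_from("!H", payload, pos)
        pos += 2 + suites_len           # cipher suites
        pos += 1 + payload[pos]         # compression methods
        (ext_total,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", payload, pos)
            pos += 4
            if ext_type == 0:           # server_name extension
                # layout: list length (2), name type (1), name length (2), name
                name_type = payload[pos + 2]
                (name_len,) = struct.unpack_from("!H", payload, pos + 3)
                if name_type != 0:      # 0 = host_name
                    return None
                return payload[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                     # truncated or malformed input
    return None
```

Both functions return `None` rather than raising on input they cannot parse, so a caller can try one and fall back to the other for each payload.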
Optionally, the target feature data includes: a target HOST and/or a target SNI;
correspondingly, the identification module is specifically configured to:
search the application identification feature library according to the target HOST and determine the target application ID and target application name corresponding to the target HOST, or search the application identification feature library according to the target SNI and determine the target application ID and target application name corresponding to the target SNI.
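A minimal sketch of this lookup follows, assuming the feature library has been loaded into two in-memory dictionaries keyed by lowercase HOST and SNI values. The `library` layout and the function name `identify` are assumptions for the example, not structures named in the text.

```python
def identify(library, host=None, sni=None):
    """Return (app_id, app_name) for the target HOST or SNI, or None.

    `library` is assumed to look like
    {"host": {value: (app_id, app_name)}, "sni": {value: (app_id, app_name)}}.
    """
    if host is not None:
        hit = library["host"].get(host.lower())
        if hit is not None:
            return hit
    if sni is not None:
        return library["sni"].get(sni.lower())
    return None
```

For example, `identify(library, sni="sname1")` returns `("id1", "name1")` when the library holds that SNI entry. Host names are compared in lowercase because DNS names are case-insensitive; trying HOST before SNI is an arbitrary ordering chosen for the sketch.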
Optionally, the apparatus further includes:
an installation module, configured to install at least one application through a simulator before the target network traffic data sent by the target terminal is acquired;
a second acquisition module, configured to acquire the network traffic data generated by each application while it is running;
and a saving module, configured to save the network traffic data generated by each application as a PCAP file, wherein the PCAP file carries the application ID and the application name corresponding to the network traffic data.
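The text does not say how a PCAP file carries the application ID and application name. One simple possibility, sketched below, is to encode both in the file name and to write already-captured Ethernet frames in the classic libpcap file format. The naming scheme `<appid>_<appname>.pcap`, the function names, and the assumption that the application ID contains no underscore are all illustrative choices.

```python
import os
import struct

# Classic pcap global header: magic, version 2.4, timezone offset 0,
# timestamp accuracy 0, snap length 65535, link type 1 (Ethernet).
PCAP_GLOBAL_HEADER = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)


def pcap_path(directory, app_id, app_name):
    """Hypothetical naming scheme: <appid>_<appname>.pcap."""
    return os.path.join(directory, f"{app_id}_{app_name}.pcap")


def save_pcap(path, frames):
    """frames: iterable of (timestamp_in_seconds, raw_ethernet_frame_bytes)."""
    with open(path, "wb") as fh:
        fh.write(PCAP_GLOBAL_HEADER)
        for ts, frame in frames:
            sec = int(ts)
            usec = int((ts - sec) * 1_000_000)
            # Per-packet header: seconds, microseconds, captured length, original length.
            fh.write(struct.pack("<IIII", sec, usec, len(frame), len(frame)))
            fh.write(frame)


def app_from_path(path):
    """Recover (app_id, app_name) from a file named by pcap_path()."""
    stem = os.path.splitext(os.path.basename(path))[0]
    app_id, _, app_name = stem.partition("_")
    return app_id, app_name
```

The capture itself (for example from the simulator's network interface) is outside the scope of the sketch; it only shows how captured frames and the application labels could be kept together.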
Optionally, the apparatus further includes:
an obtaining module, configured to parse the PCAP file, after the network traffic data generated by each application is saved as a PCAP file, to obtain candidate feature data and the application ID and application name corresponding to the candidate feature data;
a determining module, configured to determine first feature data corresponding to each application ID according to the candidate feature data and the application ID and application name corresponding to the candidate feature data;
and a storage module, configured to store the first feature data, and the application ID and application name corresponding to the first feature data, in the application identification feature library.
Optionally, the determining module is specifically configured to:
acquire the candidate feature data corresponding to all applications;
establish a candidate global hash table according to the candidate feature data corresponding to all applications;
and select the first feature data corresponding to each application ID through the candidate global hash table.
Optionally, the first feature data includes: a globally unique HOST and/or a globally unique SNI.
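The selection through a candidate global hash table can be sketched as follows. The sketch assumes that "globally unique" means a HOST or SNI value observed in the traffic of exactly one application ID; the function names and tuple layouts are hypothetical.

```python
from collections import defaultdict


def build_candidate_table(candidates):
    """Build the candidate global hash table.

    candidates: iterable of (kind, value, app_id, app_name), kind is 'host' or 'sni'.
    Returns {(kind, value): {app_id: app_name}}.
    """
    table = defaultdict(dict)
    for kind, value, app_id, app_name in candidates:
        table[(kind, value)][app_id] = app_name
    return table


def select_first_features(table):
    """Keep only values seen for exactly one application ID.

    Returns {app_id: [(kind, value, app_name), ...]}.
    """
    result = defaultdict(list)
    for (kind, value), apps in table.items():
        if len(apps) == 1:  # values shared by several apps are dropped
            app_id, app_name = next(iter(apps.items()))
            result[app_id].append((kind, value, app_name))
    return dict(result)
```

A value such as a shared CDN or advertising domain appears under several application IDs in the table and is therefore dropped, while a value seen for only one application is kept as first feature data for that application.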
The apparatus can execute the method provided by any embodiment of the present invention and has the functional modules and beneficial effects corresponding to the executed method.
According to the technical solution of this embodiment, target feature data corresponding to the target application is obtained by parsing the network traffic data of the target application, and the application identification feature library is queried according to the target feature data to determine the application name and application ID of the target application. This enables identification across a massive number of applications with improved efficiency, comprehensive coverage, and high accuracy, and the implementation is simple and adapts readily to changes in apps in the network era.
Example four
Fig. 4 is a schematic structural diagram of a computer device in the fourth embodiment of the present invention. Fig. 4 illustrates a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present invention. The computer device 12 shown in fig. 4 is only one example and should not impose any limitation on the functionality or scope of use of embodiments of the present invention.
As shown in FIG. 4, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. In the computer device 12 of the present embodiment, the display 24 is not provided as a separate body but is embedded in the mirror surface, and when the display surface of the display 24 is not displayed, the display surface of the display 24 and the mirror surface are visually integrated. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the application identification method provided by an embodiment of the present invention: acquiring target network traffic data sent by a target terminal; parsing the target network traffic data to obtain target feature data; and searching an application identification feature library according to the target feature data and determining the target application corresponding to the target feature data.
Example five
Embodiment five of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the application identification method provided by any embodiment of the present application: acquiring target network traffic data sent by a target terminal; parsing the target network traffic data to obtain target feature data; and searching an application identification feature library according to the target feature data and determining the target application corresponding to the target feature data.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. An application identification method, comprising:
acquiring target network traffic data sent by a target terminal;
analyzing the target network traffic data to obtain target characteristic data;
and searching an application identification feature library according to the target feature data, and identifying the target application corresponding to the target feature data.
2. The method of claim 1, wherein the target feature data comprises: a target HOST and/or a target SNI;
correspondingly, searching an application identification feature library according to the target feature data, and determining the target application corresponding to the target feature data comprises:
and searching an application identification feature library according to the target HOST, and determining a target application ID and a target application name corresponding to the target HOST, or searching an application identification feature library according to the target SNI and determining a target application ID and a target application name corresponding to the target SNI.
3. The method of claim 1, before obtaining the target network traffic data sent by the target terminal, further comprising:
installing at least one application through a simulator;
acquiring network traffic data generated by each application during operation;
and storing the network traffic data generated by each application as a PCAP file, wherein the PCAP file carries the application ID and the application name corresponding to the network traffic data.
4. The method of claim 3, further comprising, after saving the network traffic data generated by each application as a PCAP file:
analyzing the PCAP file to obtain candidate feature data, and an application ID and an application name corresponding to the candidate feature data;
determining first feature data corresponding to each application ID according to the candidate feature data and the application ID and the application name corresponding to the candidate feature data;
and storing the first characteristic data, and the application ID and the application name corresponding to the first characteristic data into an application identification characteristic library.
5. The method according to claim 4, wherein determining the first feature data corresponding to each application ID according to the candidate feature data and the application ID and application name corresponding to the candidate feature data comprises: acquiring candidate characteristic data corresponding to all applications;
establishing a candidate global hash table according to candidate characteristic data corresponding to all applications;
and selecting first characteristic data corresponding to each application ID through the candidate global hash table.
6. The method of claim 5, wherein the first feature data comprises: a globally unique HOST and/or a globally unique SNI.
7. An application recognition apparatus, comprising:
the first acquisition module is used for acquiring target network traffic data sent by a target terminal;
the analysis module is used for analyzing the target network traffic data to obtain target feature data;
and the identification module is used for searching an application identification feature library according to the target feature data and identifying the target application corresponding to the target feature data.
8. The apparatus according to claim 7, wherein the identification module is specifically configured to:
and searching an application identification feature library according to the target HOST, and determining a target application ID and a target application name corresponding to the target HOST, or searching an application identification feature library according to the target SNI and determining a target application ID and a target application name corresponding to the target SNI.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-6 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN202011490933.5A 2020-12-16 2020-12-16 Application identification method, device, equipment and storage medium Active CN112615758B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011490933.5A CN112615758B (en) 2020-12-16 2020-12-16 Application identification method, device, equipment and storage medium
PCT/CN2021/115879 WO2022127196A1 (en) 2020-12-16 2021-09-01 Application identification method and apparatus, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011490933.5A CN112615758B (en) 2020-12-16 2020-12-16 Application identification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112615758A true CN112615758A (en) 2021-04-06
CN112615758B CN112615758B (en) 2022-04-29

Family

ID=75240173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011490933.5A Active CN112615758B (en) 2020-12-16 2020-12-16 Application identification method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112615758B (en)
WO (1) WO2022127196A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113347184A (en) * 2021-06-01 2021-09-03 国家计算机网络与信息安全管理中心 Method, device, equipment and medium for testing network flow security detection engine
WO2022127196A1 (en) * 2020-12-16 2022-06-23 北京锐安科技有限公司 Application identification method and apparatus, and device and storage medium
CN115022216A (en) * 2022-05-27 2022-09-06 中国电信股份有限公司 Installed APP detection method and device, and network side equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110011860A (en) * 2019-04-16 2019-07-12 湖南警察学院 Android application and identification method based on network traffic analysis
CN110233769A (en) * 2018-03-06 2019-09-13 华为技术有限公司 A kind of flow rate testing methods and flow detection device
CN110245273A (en) * 2019-06-21 2019-09-17 武汉绿色网络信息服务有限责任公司 A kind of method obtaining APP service feature library and corresponding device
CN110768933A (en) * 2018-07-27 2020-02-07 深信服科技股份有限公司 Network flow application identification method, system and equipment and storage medium
US20200258118A1 (en) * 2019-02-10 2020-08-13 Surya Kumar Kovvali Correlating multi-dimensional data to extract & associate unique identifiers for analytics insights, monetization, QOE & Orchestration
CN111740923A (en) * 2020-06-22 2020-10-02 北京神州泰岳智能数据技术有限公司 Method and device for generating application identification rule, electronic equipment and storage medium
CN111953706A (en) * 2020-08-21 2020-11-17 公安部第三研究所 Method for identifying mobile application based on HTTPS flow information

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7587484B1 (en) * 2001-10-18 2009-09-08 Microsoft Corporation Method and system for tracking client software use
CN107222369A (en) * 2017-07-07 2017-09-29 北京小米移动软件有限公司 Recognition methods, device, switch and the storage medium of application program
CN111565311B (en) * 2020-04-29 2022-02-25 杭州迪普科技股份有限公司 Network traffic characteristic generation method and device
CN112039731B (en) * 2020-11-05 2021-01-01 武汉绿色网络信息服务有限责任公司 DPI (deep packet inspection) identification method and device, computer equipment and storage medium
CN112615758B (en) * 2020-12-16 2022-04-29 北京锐安科技有限公司 Application identification method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112615758B (en) 2022-04-29
WO2022127196A1 (en) 2022-06-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant