CN112671671B - Third party flow identification method, device and equipment based on third party library - Google Patents

Info

Publication number
CN112671671B
Authority
CN
China
Prior art keywords
party
module
traffic
target application
flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110278161.7A
Other languages
Chinese (zh)
Other versions
CN112671671A (en)
Inventor
徐国爱
郭燕慧
魏然
宁华
刘海峰
徐国胜
尹志颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Security Evaluation Center
Beijing University of Posts and Telecommunications
China Academy of Information and Communications Technology CAICT
Original Assignee
Beijing Information Security Evaluation Center
Beijing University of Posts and Telecommunications
China Academy of Information and Communications Technology CAICT
Application filed by Beijing Information Security Evaluation Center, Beijing University of Posts and Telecommunications, and China Academy of Information and Communications Technology (CAICT)
Priority to CN202110278161.7A
Publication of CN112671671A
Application granted
Publication of CN112671671B

Abstract

The invention provides a third-party traffic identification method, apparatus and device based on third-party libraries. Feature information of a target application is obtained, and the third-party modules in the target application are determined from the feature information of the target application and the feature information of third-party libraries obtained in advance. Traffic calling path information of the target application and path information of the third-party modules are then obtained. From these, the traffic of the target application is divided into third-party module traffic and own-module traffic, and the third-party module traffic is taken as the third-party traffic. The method can identify third-party traffic in an application without relying on a whitelist of third-party libraries and without being defeated by application obfuscation.

Description

Third party flow identification method, device and equipment based on third party library
Technical Field
The present disclosure relates to the field of computer communication technologies, and in particular, to a third-party traffic identification method, apparatus, and device based on a third-party library.
Background
The mobile internet industry has developed rapidly, and to make development easier, a large number of third-party libraries are used in developing mobile application software. For example, when developing an app for the Android system on Linux, the first-party libraries can be regarded as the libraries provided by Google, the second-party libraries are the base libraries written by the developer, and the third-party libraries are base libraries provided by companies other than the developer or open-source libraries released by other companies.
The security of third-party libraries is hard to guarantee, and the spread of harmful or unstable third-party libraries can damage the mobile ecosystem. When applications commonly make heavy use of third-party libraries, various security problems can easily arise and hinder the healthy development of the mobile internet industry. It is therefore necessary to identify third-party traffic in an application, but the related art lacks a technique for doing so accurately.
Disclosure of Invention
In view of this, an object of the present disclosure is to provide a third party traffic identification method, device and apparatus based on a third party library.
Based on the above purpose, the present disclosure provides a third party traffic identification method based on a third party library, including:
acquiring characteristic information of a target application;
determining a third party module in the target application according to the characteristic information of the target application and the characteristic information of a third party library acquired in advance;
acquiring traffic calling path information of the target application and path information of the third-party module;
and determining third-party module flow and self-owned module flow in the flow of the target application according to the flow calling path information and the path information of the third-party module, and taking the third-party module flow as the third-party flow.
Based on the same inventive concept, the present disclosure provides a third party traffic identification apparatus based on a third party library, comprising:
the characteristic information acquisition module is used for acquiring the characteristic information of the target application;
the third-party module determining module is used for determining a third-party module in the target application according to the characteristic information of the target application and the characteristic information of a third-party library acquired in advance;
the path information acquisition module is used for acquiring the flow calling path information of the target application and the path information of the third-party module;
and the third party flow determining module is used for determining the third party module flow and the self-owned module flow in the flow of the target application according to the flow calling path information and the path information of the third party module, and taking the third party module flow as the third party flow.
Based on the same inventive concept, the present disclosure provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, which when executing the program implements the method as described above.
As can be seen from the above, in the third-party traffic identification method, apparatus and device based on third-party libraries provided by the present disclosure, feature information of a target application is obtained, and the third-party modules in the target application are determined from the feature information of the target application and the feature information of third-party libraries obtained in advance. Traffic calling path information of the target application and path information of the third-party modules are then obtained. From these, the traffic of the target application is divided into third-party module traffic and own-module traffic, and the third-party module traffic is taken as the third-party traffic. The present disclosure can identify third-party traffic in an application without relying on a whitelist of third-party libraries and without being defeated by application obfuscation.
Drawings
In order to more clearly illustrate the technical solutions in the present disclosure or related technologies, the drawings needed to be used in the description of the embodiments or related technologies are briefly introduced below, and it is obvious that the drawings in the following description are only embodiments of the present disclosure, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of a third party traffic identification method based on a third party library according to an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a third-party traffic identification method based on a third-party library according to an embodiment of the present disclosure;
Fig. 3 is a more specific flowchart of a third-party traffic identification method based on a third-party library according to an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of a third party traffic identification apparatus based on a third party library according to an embodiment of the present disclosure;
Fig. 5 is a more specific hardware structure diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present disclosure should have a general meaning as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in the embodiments of the disclosure is not intended to indicate any order, quantity, or importance, but rather to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
The mobile internet industry has developed rapidly, and to make development easier, a large number of third-party libraries are used in developing mobile application software. For example, when developing an app for the Android system on Linux, the first-party libraries can be regarded as the libraries provided by Google, the second-party libraries are the base libraries written by the developer, and the third-party libraries are base libraries provided by companies other than the developer or open-source libraries released by other companies.
The security of third-party libraries is hard to guarantee, and the spread of harmful or unstable third-party libraries can damage the mobile ecosystem. When applications commonly make heavy use of third-party libraries, various security problems can easily arise and hinder the healthy development of the mobile internet industry. It is therefore necessary to identify third-party traffic in an application.
Most related work focuses on identifying third-party libraries in an application, mainly in two ways. The first is whitelisting based on known libraries. The second extracts libraries directly from applications and requires no prior knowledge of third-party libraries: the source code of a large number of applications is analyzed, and because third-party library code is called by many applications it shows a high repetition rate, which allows the third-party modules in an application to be identified. For distinguishing network access resources, that is, identifying own traffic and third-party traffic in an application, the related art relies on a whitelist of known libraries and treats the traffic of those libraries as third-party traffic.
However, the whitelist-based approach cannot withstand application obfuscation: obfuscation transforms the names of the third-party libraries in the application, so it can no longer be determined which whitelisted third-party libraries the application uses. The approach that obtains third-party libraries by analyzing application source code depends heavily on Java package names and package structure when detecting and classifying third-party libraries. Most package names are affected by obfuscation, and the package structure may differ between versions of the same library, both of which greatly reduce the accuracy of the results.
To identify third-party traffic in an application without relying on a whitelist of third-party libraries and without being affected by application obfuscation, the present disclosure obtains feature information of a target application, determines the third-party modules in the target application from the feature information of the target application and the feature information of third-party libraries obtained in advance, obtains traffic calling path information of the target application and path information of the third-party modules, determines the third-party module traffic and the own-module traffic in the traffic of the target application from the traffic calling path information and the path information of the third-party modules, and takes the third-party module traffic as the third-party traffic.
Fig. 1 is a schematic view of an application scenario of a third-party traffic identification method based on a third-party library according to an embodiment of the present application. The application scenario includes a terminal device 101, a server 102, and a data storage system 103. The terminal device 101, the server 102, and the data storage system 103 may be connected through a wired or wireless communication network. The terminal device 101 includes, but is not limited to, a desktop computer, a mobile phone, a mobile computer, a tablet computer, a media player, a smart wearable device, a Personal Digital Assistant (PDA), or other electronic devices capable of implementing the above functions. The server 102 and the data storage system 103 may be independent physical servers, may also be a server cluster or distributed system formed by a plurality of physical servers, and may also be cloud servers providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and big data and artificial intelligence platforms.
The server 102 provides a third-party traffic identification service to the user of the terminal device 101. A client that communicates with the server 102 is installed in the terminal device 101. The user inputs the application to be detected through the client and clicks a detection button, whereupon the client sends the application to be detected to the server 102. The server 102 inputs the application to be detected into a trained third-party traffic identification model, obtains the third-party traffic identification result output by the model for that application, and sends the result to the client. The client displays the result to the user to help the user identify third-party traffic.
The data storage system 103 stores a large amount of training data, each training data includes an application to be detected and a third-party traffic recognition result corresponding to the application to be detected, the server 102 can train a third-party traffic recognition model based on the large amount of training data, so that the third-party traffic recognition model can perform third-party traffic recognition on the input application to be detected, and the source of the training data includes but is not limited to an existing database, data crawled from the internet, or data uploaded when a user uses a client. When the accuracy of the third-party traffic recognition model output meets a certain requirement, the server 102 may provide a third-party traffic recognition service to the user based on the third-party traffic recognition model, and meanwhile, the server 102 may continuously optimize the third-party traffic recognition model based on newly added training data.
The third-party traffic identification model of the embodiment of the application can be applied to scenes of different types of application software. The third-party traffic recognition models can be trained respectively based on training data of different types of application software to obtain the third-party traffic recognition models applied to the different types of application software.
The following describes a training method of a third-party traffic recognition model and a third-party traffic recognition method according to an exemplary embodiment of the present application with reference to an application scenario of fig. 1. It should be noted that the above application scenarios are only presented to facilitate understanding of the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect. Rather, embodiments of the present application may be applied to any scenario where applicable.
Hereinafter, the technical means of the present disclosure will be described in further detail with reference to specific examples.
Fig. 2 is a schematic flowchart of a third-party traffic identification method based on a third-party library according to an embodiment of the present disclosure; the third party flow identification method based on the third party library comprises the following steps:
S210, acquiring characteristic information of the target application.
In some embodiments, S210 specifically includes:
acquiring an application program package of the target application;
and converting the application program package into Jimple statements by using the Soot tool so as to obtain the characteristic information of the target application.
Soot is a bytecode analysis tool. Jimple is a compact, stackless, typed three-address-code intermediate representation.
Soot accepts several kinds of input, including Java bytecode. Soot provides four intermediate representations, namely Baf, Grimp, Jimple and Shimple, into which the source file can be converted. Soot can create Jimple code directly, or translate it from Java source code or bytecode. Bytecode is first translated into untyped Jimple, and types are then added to the local variables by type inference. An important step in the translation is to linearize expressions so that each statement refers to at most 3 local variables or constants. Compared with the more than 200 instructions of Java bytecode, Jimple has far fewer statement types: the core statements NopStmt, IdentityStmt and AssignStmt; the intra-function control flow statements IfStmt, GotoStmt, TableSwitchStmt and LookupSwitchStmt; the inter-function control flow statements InvokeStmt, ReturnStmt and ReturnVoidStmt; the monitor statements EnterMonitorStmt and ExitMonitorStmt; and finally ThrowStmt for exception handling and the retired RetStmt.
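The linearization step can be illustrated with a minimal sketch. It is not Soot's translation algorithm; it only shows, on Python arithmetic expressions, how a nested expression can be flattened into statements that each refer to at most three variables or constants. The function name `linearize` and the `$t` temporary naming are assumptions made for this illustration.

```python
import ast
import itertools

def linearize(expr):
    """Flatten a nested arithmetic expression into three-address statements.

    Illustrative only: handles names, constants and + - * / on Python syntax.
    """
    counter = itertools.count(1)
    statements = []
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def visit(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp):
            left = visit(node.left)
            right = visit(node.right)
            # Each binary operation gets its own temporary (hypothetical $t naming).
            temp = f"$t{next(counter)}"
            statements.append(f"{temp} = {left} {ops[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    result = visit(ast.parse(expr, mode="eval").body)
    return statements, result

stmts, result = linearize("a + b * (c - 2)")
for s in stmts:
    print(s)
```

Every emitted statement names one target and at most two operands, which is the property the text describes for Jimple.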
As one example, the application package of the target application may be a Java package. The Soot tool reads in the Java package and then converts the Java bytecode in it into Jimple statements.
For each target application, the number of functions in the application program package (such as a Java package) of the target application and the number of Jimple statements of each type in each function are used as the characteristics of the target application, and these characteristics are combined into a list array that serves as the characteristic information of the target application. The types of Jimple statement comprise: the core statements NopStmt, IdentityStmt and AssignStmt; the intra-function control flow statements IfStmt, GotoStmt, TableSwitchStmt and LookupSwitchStmt; the inter-function control flow statements InvokeStmt, ReturnStmt and ReturnVoidStmt; the monitor statements EnterMonitorStmt and ExitMonitorStmt; and finally ThrowStmt for exception handling and the retired RetStmt.
Jimple has a clear structure and far fewer statement types than other representations, generally only 15, which makes it well suited to program analysis and code optimization. The inventors found that, under application obfuscation, because Jimple has few statement types and a clear structure, the number of functions in the application package (such as a Java package) of the target application and the number of Jimple statements of each type in each function are comparatively accurate and stable as characteristics of the target application.
Using these counts as the characteristics of the target application, and combining them into a list array that is analyzed as the characteristic information of the target application, can effectively resist the influence of application obfuscation.
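A minimal sketch of how such a feature list might be assembled is given below. It assumes the Jimple statement types of each function have already been exported (for example from a Soot run) as lists of type names; the input dictionaries, the function names `function_feature` and `package_features`, and the sorting of per-function vectors are assumptions made for illustration, not details stated in the disclosure.

```python
from collections import Counter

# The statement types named in the text, in a fixed order.
JIMPLE_STMT_TYPES = [
    "NopStmt", "IdentityStmt", "AssignStmt",
    "IfStmt", "GotoStmt", "TableSwitchStmt", "LookupSwitchStmt",
    "InvokeStmt", "ReturnStmt", "ReturnVoidStmt",
    "EnterMonitorStmt", "ExitMonitorStmt",
    "ThrowStmt", "RetStmt",
]

def function_feature(stmt_types):
    """Count how often each Jimple statement type occurs in one function."""
    counts = Counter(stmt_types)
    return tuple(counts.get(t, 0) for t in JIMPLE_STMT_TYPES)

def package_features(functions):
    """Build the feature list for a package.

    `functions` maps a (possibly obfuscated) function name to the list of
    statement-type names in its body. Names are deliberately ignored, and
    the per-function vectors are sorted so renaming does not change the result.
    """
    per_function = sorted(function_feature(s) for s in functions.values())
    return {"function_count": len(functions), "function_features": per_function}

# Hypothetical input: the same two functions before and after renaming.
original = {
    "com.lib.Net.send": ["IdentityStmt", "AssignStmt", "InvokeStmt", "ReturnVoidStmt"],
    "com.lib.Net.isOpen": ["IdentityStmt", "AssignStmt", "IfStmt", "ReturnStmt", "ReturnStmt"],
}
obfuscated = {
    "a.a.a": ["IdentityStmt", "AssignStmt", "InvokeStmt", "ReturnVoidStmt"],
    "a.a.b": ["IdentityStmt", "AssignStmt", "IfStmt", "ReturnStmt", "ReturnStmt"],
}
print(package_features(original) == package_features(obfuscated))
```

Because only statement counts enter the feature list, identifier renaming leaves it unchanged in this sketch; obfuscation that rewrites function bodies would not be covered by it.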
S220, determining a third party module in the target application according to the characteristic information of the target application and the characteristic information of the third party library acquired in advance.
The target application comprises a plurality of modules.
In some embodiments, further comprising:
acquiring the third party library;
and converting the third party library into a Jimple statement by using a Soot tool so as to obtain the characteristic information of the third party library.
The feature information of the third-party library is obtained in a manner similar to that of the target application. In some embodiments, the feature information of a plurality of third-party libraries is obtained in advance. In some embodiments, the obtained third-party library is the original third-party library. In the related art, the detection and classification of third-party libraries depend heavily on package names and package structure, but most package names are affected by application obfuscation and the package structure may differ between versions of the same library, which greatly reduces the accuracy of the results. Because the third-party library acquired by this method is the original third-party library, the influence of changes to package names and package structure is avoided at the outset, and identification accuracy is improved.
In some embodiments, S220 specifically includes:
for each of the modules in the target application,
and taking the module as the third-party module in response to the fact that the matching degree of the characteristic information of the target application corresponding to the module and the characteristic information of the third-party library is higher than a preset matching degree threshold value.
In some embodiments, for the feature information A of each third-party library and the feature information B of the target application corresponding to each module, the matching degree is computed as:
L(A,B) = (A∩B) / A
where L is the matching degree between A and B. The closer L is to 1, the more of the third-party library is called in the target application. To determine more reliably whether a third-party library is called, the library is preferably judged to be called in the target application when L is greater than 0.7.
In some embodiments, for each module in the target application, the features of the module are combined into a list array as feature information of the target application corresponding to the module.
In some embodiments, in response to determining that the matching degree of the feature information of the target application corresponding to the module and the feature information of the third party library is not higher than a preset matching degree threshold value, the module is regarded as an own module.
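The matching step can be sketched as follows. The sketch reads the formula as a ratio of counts, L(A, B) = |A ∩ B| / |A|, with A and B treated as multisets of per-function feature vectors; this reading, the function names, and the sample vectors are assumptions for illustration. The default threshold of 0.7 is the preferred value given in the text.

```python
from collections import Counter

def matching_degree(library_features, module_features):
    """L(A, B) = |A ∩ B| / |A|, read here as a multiset intersection."""
    a = Counter(library_features)
    b = Counter(module_features)
    if not a:
        return 0.0
    shared = sum((a & b).values())  # elements of A that also occur in B
    return shared / sum(a.values())

def classify_module(library_features, module_features, threshold=0.7):
    """Label a module third-party only when L is strictly above the threshold."""
    degree = matching_degree(library_features, module_features)
    return "third-party" if degree > threshold else "own"

# Hypothetical per-function feature vectors (shortened to three counts each).
library = [(1, 1, 1), (1, 1, 0), (2, 0, 1), (0, 1, 1)]
module_x = [(1, 1, 1), (1, 1, 0), (2, 0, 1), (5, 5, 5)]  # 3 of 4 library vectors present
module_y = [(1, 1, 1), (9, 9, 9)]                        # 1 of 4 library vectors present

print(matching_degree(library, module_x), classify_module(library, module_x))
print(matching_degree(library, module_y), classify_module(library, module_y))
```

A module whose matching degree equals the threshold exactly is treated as an own module here, in line with the "not higher than" wording above.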
S230, acquiring the traffic calling path information of the target application and the path information of the third-party module.
In some embodiments, S230 specifically includes:
and acquiring the traffic information in the target application by using the Xposed tool, and printing the function call stack to obtain the traffic calling path information.
The Xposed framework is an open-source framework that can change the behavior of programs (and of the system) at run time without modifying the APK. Here Xposed is mainly used to hook the HTTP requests sent by the application and to dynamically intercept the relevant functions to obtain the specific URL information.
Xposed is used to hook the runtime traffic of the target application.
Android applications send traffic in essentially the following four ways:
HttpURLConnection is a general-purpose, lightweight HTTP client suitable for most applications. Its development is currently steady and slow, but its key API makes it easy to improve.
OkHttp is an efficient HTTP client. It supports HTTP/2 and SPDY, automatically selects the best route, reconnects automatically, maintains a socket connection pool to reduce the number of handshakes, has a queue thread pool that makes concurrent requests easy to write, provides Interceptors for processing requests and responses (for example transparent GZIP compression and logging), and applies a cache policy based on Headers.
HttpClient is an efficient, up-to-date and feature-rich client-side programming toolkit that supports the HTTP protocol; it is a subproject of Apache Jakarta Commons.
Volley is a network communication framework introduced at the Google I/O conference in 2013. Besides being simple and easy to use, it performs very well; it is designed for network operations that involve small amounts of data but frequent communication.
For the target application, the functions to be hooked are located and hooked for each of the four ways of sending traffic, namely HttpURLConnection, HttpClient, OkHttp and Volley, and the function call stack information is acquired at the same time. While the target application is running, the hooked traffic and function call stack information are sent over HTTP to a server for storage, to facilitate subsequent analysis.
When a function call occurs, data is stored in the stack space as follows:
the caller function pushes the parameters needed by the called function onto the stack in the reverse of the order in which the called function declares its formal parameters, that is, from right to left;
the caller function calls the called function by using a call instruction and pushes the address of the instruction following the call instruction onto the stack as the return address (the push operation is implicit in the call instruction);
in the called function, the stack-bottom address of the caller function is saved first (push ebp), and the stack-top address of the caller function is then stored as the stack-bottom address of the currently called function (mov ebp, esp);
in the called function, local and temporary variables are stored starting from the ebp position, at successively lower addresses in the order in which they are defined; that is, the variables are laid out in the direction in which the stack grows, with variables defined earlier pushed first and variables defined later pushed afterwards.
By printing the function call stack information, the present disclosure can accurately determine the call path of each piece of traffic, so that the acquired traffic information can be accurately classified.
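As an illustration of how a printed call stack could be turned into call path information on the server side, the sketch below parses text in the standard Java stack trace format (lines of the form `at package.Class.method(File.java:line)`). The disclosure does not specify the log format, so the format, the function name `call_path`, and the sample trace with its package names are all assumptions.

```python
import re

# Matches one Java stack frame line and captures the class and method names.
FRAME = re.compile(r"^\s*at\s+([\w.$<>]+)\.([\w$<>]+)\(")

def call_path(stack_trace):
    """Return the (class, method) pairs of a printed stack, innermost first."""
    frames = []
    for line in stack_trace.splitlines():
        match = FRAME.match(line)
        if match:
            frames.append((match.group(1), match.group(2)))
    return frames

# Hypothetical trace recorded when a hooked send function fired.
trace = (
    "java.lang.Throwable\n"
    "\tat com.squareup.okhttp.Call.execute(Call.java:80)\n"
    "\tat com.adsdk.net.Reporter.upload(Reporter.java:41)\n"
    "\tat com.example.app.MainActivity.onCreate(MainActivity.java:27)\n"
)

for cls, method in call_path(trace):
    print(cls, method)
```

The resulting list of class names is what a later comparison against module paths would operate on.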
S240, determining third-party module flow and self-owned module flow in the flow of the target application according to the flow calling path information and the path information of the third-party module, and taking the third-party module flow as the third-party flow.
In some embodiments, S240 specifically includes:
for each piece of traffic calling path information,
in response to determining that the traffic invoking path information is the same as the path information of the third-party module, taking the traffic corresponding to the traffic invoking path information as the third-party module traffic,
and in response to determining that the traffic calling path information is different from the path information of the third-party module, taking the traffic corresponding to the traffic calling path information as the own module traffic.
If one piece of traffic calling path information is the same as the path information of the third-party module, it is indicated that the traffic corresponding to the piece of traffic calling path information is called from the third-party module, and the traffic is obviously the third-party traffic.
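The classification in S240 can be sketched as below. The disclosure only states that the traffic calling path is compared with the path of the third-party module; the sketch assumes one possible reading, in which a piece of traffic counts as third-party traffic when any class on its call path lies under the package path of an identified third-party module. The record layout, function names, URLs and package names are hypothetical.

```python
def in_module(class_name, module_path):
    """True if the class lies in the module's package or one of its subpackages."""
    return class_name == module_path or class_name.startswith(module_path + ".")

def classify_traffic(records, third_party_paths):
    """Split traffic records into third-party and own-module traffic.

    `records` is a list of (url, call_path_classes) pairs, where
    call_path_classes lists the classes found on the recorded call stack.
    """
    result = {"third_party": [], "own": []}
    for url, classes in records:
        hit = any(in_module(c, p) for c in classes for p in third_party_paths)
        result["third_party" if hit else "own"].append(url)
    return result

# Hypothetical inputs: one module was identified as third-party earlier.
third_party_paths = ["com.adsdk"]
records = [
    ("https://ads.example.net/report",
     ["com.adsdk.net.Reporter", "com.example.app.MainActivity"]),
    ("https://api.example.com/login",
     ["com.example.app.LoginTask", "com.example.app.MainActivity"]),
]

print(classify_traffic(records, third_party_paths))
```

Matching on a package prefix followed by a dot avoids treating a package such as `com.adsdkx` as part of `com.adsdk`.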
As can be seen from the above, in the third-party traffic identification method, apparatus and device based on third-party libraries provided by the present disclosure, feature information of a target application is obtained, and the third-party modules in the target application are determined from the feature information of the target application and the feature information of third-party libraries obtained in advance. Traffic calling path information of the target application and path information of the third-party modules are then obtained. From these, the traffic of the target application is divided into third-party module traffic and own-module traffic, and the third-party module traffic is taken as the third-party traffic. The present disclosure can identify third-party traffic in an application without relying on a whitelist of third-party libraries and without being defeated by application obfuscation.
Fig. 3 is a more specific flowchart of a third-party traffic identification method based on a third-party library according to an embodiment of the present disclosure; the third party flow identification method based on the third party library comprises the following steps:
s310, acquiring characteristic information of the target application.
In some embodiments, S310 specifically includes:
acquiring an application program package of the target application;
and converting the application program package into Jimple statements by using the Soot tool so as to obtain the characteristic information of the target application.
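As an illustration of what such feature information might look like, the following is a minimal sketch. It assumes the Jimple statements have already been produced and are available as plain text per function; the `classify` heuristics, the `STMT_KINDS` list and the `module_features` helper are hypothetical simplifications, not Soot's API. The claims describe the feature information as a list array combining the number of functions with the number of each type of Jimple statement per function, which is the shape the sketch produces.

```python
from collections import Counter
from typing import Dict, List

# Simplified statement categories (an assumption; Jimple defines more kinds).
STMT_KINDS = ("identity", "assign", "invoke", "if", "goto", "return", "throw", "other")


def classify(stmt: str) -> str:
    """Assign a textual Jimple statement to one coarse category."""
    s = stmt.strip().rstrip(";")
    if ":=" in s:
        return "identity"
    if s.startswith("if "):
        return "if"
    if s.startswith("goto "):
        return "goto"
    if s == "return" or s.startswith("return "):
        return "return"
    if s.startswith("throw "):
        return "throw"
    if " = " in s:
        return "assign"
    if "invoke" in s:
        return "invoke"
    return "other"


def module_features(functions: Dict[str, List[str]]) -> list:
    """Return [number of functions, per-function statement-type counts]."""
    per_function = []
    for name in sorted(functions):
        counts = Counter(classify(s) for s in functions[name])
        per_function.append([counts[k] for k in STMT_KINDS])
    return [len(functions), per_function]
```

Because the counts depend only on statement types and not on identifier names, a feature of this kind would be expected to survive renaming-based obfuscation, which is consistent with the stated goal of the method.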
S320, determining whether the matching degree between the characteristic information of the target application corresponding to a module and the characteristic information of the third-party library is higher than a matching degree threshold.
For each of the modules in the target application,
in response to the fact that the matching degree of the characteristic information of the target application corresponding to the module and the characteristic information of the pre-acquired third party library is higher than a preset matching degree threshold value, taking the module as a third party module,
and in response to the fact that the matching degree of the characteristic information of the target application corresponding to the module and the characteristic information of the pre-acquired third-party library is not higher than a preset matching degree threshold value, taking the module as a self-owned module.
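The threshold decision in S320 can be sketched as follows. This passage does not define how the matching degree is computed, so the multiset-overlap ratio used here is an assumption, as are the names `matching_degree` and `classify_modules` and the default threshold of 0.8.

```python
from collections import Counter
from typing import Dict, List, Tuple

Signature = Tuple[int, ...]  # per-function statement-type counts


def matching_degree(module: List[Signature], library: List[Signature]) -> float:
    """Share of the library's function signatures also found in the module."""
    if not library:
        return 0.0
    overlap = Counter(module) & Counter(library)  # multiset intersection
    return sum(overlap.values()) / len(library)


def classify_modules(
    modules: Dict[str, List[Signature]],
    libraries: Dict[str, List[Signature]],
    threshold: float = 0.8,
) -> Dict[str, str]:
    """Label each module 'third_party' or 'own' by its best library match."""
    result = {}
    for name, sigs in modules.items():
        best = max((matching_degree(sigs, lib) for lib in libraries.values()), default=0.0)
        # "higher than" the threshold is a strict comparison
        result[name] = "third_party" if best > threshold else "own"
    return result
```

A module whose best match equals the threshold exactly is treated as a self-owned module, matching the "not higher than" branch of the step.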
S330, acquiring the traffic calling path information of the target application and the path information of the third-party module.
In some embodiments, S330 specifically includes:
and acquiring the traffic information in the target application by using the Xposed tool, and printing the function call stack to obtain the traffic calling path information.
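Once the hooking tool has printed a function call stack for a network request, the calling path can be recovered from the log text. The sketch below assumes the stack is logged in the standard Java stack-trace format (`at package.Class.method(File.java:line)`); the `call_path` helper is hypothetical and does not perform any hooking itself.

```python
import re
from typing import List

# Matches one Java stack-trace frame: "at <class>.<method>(...)"
FRAME = re.compile(r"^\s*at\s+([\w.$]+)\.([\w$<>]+)\(")


def call_path(stack_trace: str) -> List[str]:
    """Extract 'fully.qualified.Class.method' entries, innermost frame first."""
    frames = []
    for line in stack_trace.splitlines():
        m = FRAME.match(line)
        if m:
            frames.append(f"{m.group(1)}.{m.group(2)}")
    return frames
```

Lines that are not frames, such as the exception header, are skipped, so the result contains only the classes and methods that led to the request.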
S340, judging whether the traffic calling path information is the same as the path information of the third-party module.
For each piece of the traffic calling path information,
in response to determining that the traffic invoking path information is the same as the path information of the third-party module, taking the traffic corresponding to the traffic invoking path information as the third-party module traffic,
and in response to determining that the traffic calling path information is different from the path information of the third-party module, taking the traffic corresponding to the traffic calling path information as the own module traffic.
And taking the third party module traffic as third party traffic.
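The comparison in S340 can be sketched as follows. The text only says the two pieces of path information are "the same"; the sketch assumes that the path information of a third-party module is its package name and that a flow belongs to the module when any frame of its call path lies inside that package. The names `in_module` and `split_traffic` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def in_module(frame: str, module_path: str) -> bool:
    """True if a call-path frame lies inside the given package path."""
    return frame == module_path or frame.startswith(module_path + ".")


def split_traffic(
    flows: Dict[str, List[str]], third_party_paths: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split flow ids into (third-party module traffic, own module traffic)."""
    paths = list(third_party_paths)
    third_party, own = [], []
    for flow_id, frames in flows.items():
        hit = any(in_module(f, p) for f in frames for p in paths)
        (third_party if hit else own).append(flow_id)
    return third_party, own
```

Appending the dot before the prefix check keeps a package such as `com.vendor.sdkextras` from being mistaken for `com.vendor.sdk`.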
The third-party traffic identification method based on the third-party library further comprises:
S350, determining whether at least one preset identifier exists in the owned module traffic.
In response to determining that at least one of the preset identifiers exists in the owned module traffic, taking the owned module traffic as third-party traffic.
In some embodiments, the preset identification comprises: appkey identification and token identification.
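A check for the preset identifiers might look like the sketch below. It assumes the identifiers appear as parameter or header names in the request and matches them case-insensitively as substrings, so that a name such as `access_token` also counts; both the matching rule and the `has_preset_identifier` helper are assumptions for illustration.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlsplit

PRESET_IDENTIFIERS = ("appkey", "token")


def has_preset_identifier(url: str, headers: Dict[str, str], body: str = "") -> bool:
    """True if any query, form-body or header name contains a preset identifier."""
    names = [k for k, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)]
    names += [k for k, _ in parse_qsl(body, keep_blank_values=True)]
    names += list(headers)
    lowered = [n.lower() for n in names]
    return any(ident in name for name in lowered for ident in PRESET_IDENTIFIERS)
```

Only names are inspected, not values, which keeps the check independent of the actual credential content.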
In the related art, third-party traffic in an application is assumed to come only from the third-party modules in the application (a third-party module is built on a third-party library). The inventors found, however, that third-party traffic also exists in the self-owned modules. The technical solution of the above embodiment already solves the related-art problem that third-party traffic in an application cannot be accurately identified; to identify third-party traffic more comprehensively, it is additionally necessary to determine whether the self-owned module traffic contains third-party traffic.
The inventors found that, when a third-party library is used, calling a web application programming interface (Web API) requires applying for a call credential on the corresponding third-party platform in order to authenticate the caller's identity, and that Web API responses are mainly returned in JSON or XML format. Whether a piece of traffic is Web API traffic can therefore be judged by checking whether it has these two features; specifically, preset identifiers such as an appkey identifier and a token identifier can be identified in the traffic.
In addition, the number of third-party libraries is already large and will inevitably keep growing with the rapid development of Android applications. Because the method is not based on a whitelist, it is unaffected by whitelist limitations and adapts well to this growth.
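The second feature, the return format, can be tested with the standard library. The sketch below is one possible reading of the two-feature check: a flow is treated as Web API traffic when it carries a call credential and its response body parses as JSON or XML. The helper names `response_format` and `looks_like_web_api` are hypothetical, and only JSON objects and arrays are counted as JSON by assumption.

```python
import json
import xml.etree.ElementTree as ET


def response_format(body: str) -> str:
    """Return 'json', 'xml' or 'other' for a response body."""
    text = body.strip()
    if not text:
        return "other"
    try:
        parsed = json.loads(text)
        if isinstance(parsed, (dict, list)):  # ignore bare numbers or strings
            return "json"
    except ValueError:
        pass
    try:
        ET.fromstring(text)
        return "xml"
    except ET.ParseError:
        return "other"


def looks_like_web_api(has_credential: bool, body: str) -> bool:
    """Both features must hold: a call credential and a JSON/XML response."""
    return has_credential and response_format(body) in ("json", "xml")
```

Well-formed XHTML would also be reported as XML by this check, so a real implementation would likely need a stricter rule; untrusted XML should also be parsed with a hardened parser.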
It should be noted that the method of the embodiments of the present disclosure may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may only perform one or more steps of the method of the embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It should be noted that the above describes some embodiments of the disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, corresponding to the method of any embodiment, the disclosure also provides a third party flow identification device based on a third party library.
Referring to fig. 4, the third party traffic identification apparatus based on the third party library includes:
the feature information obtaining module 410 is configured to obtain feature information of the target application.
In some embodiments, the characteristic information obtaining module 410 is specifically configured to:
acquiring an application program package of the target application;
and converting the application program package into Jimple statements by using the Soot tool so as to obtain the characteristic information of the target application.
And the third-party module determining module 420 is configured to determine a third-party module in the target application according to the feature information of the target application and the feature information of the third-party library acquired in advance.
In some embodiments, the third-party module determining module 420 is specifically configured to:
for each of the modules in the target application,
and taking the module as a third-party module in response to the fact that the matching degree of the characteristic information of the target application corresponding to the module and the characteristic information of the pre-acquired third-party library is higher than a preset matching degree threshold value.
The path information obtaining module 430 is configured to obtain the traffic call path information of the target application and the path information of the third-party module.
In some embodiments, the path information obtaining module 430 is specifically configured to:
and acquiring the traffic information in the target application by using the Xposed tool, and printing the function call stack to obtain the traffic calling path information.
The third-party traffic determining module 440 is configured to determine third-party module traffic and owned module traffic in the traffic of the target application according to the traffic calling path information and the path information of the third-party module, and use the third-party module traffic as the third-party traffic.
In some embodiments, the third party traffic determination module 440 is specifically configured to:
for each piece of the traffic calling path information,
in response to determining that the traffic invoking path information is the same as the path information of the third-party module, taking the traffic corresponding to the traffic invoking path information as the third-party module traffic,
and in response to determining that the traffic calling path information is different from the path information of the third-party module, taking the traffic corresponding to the traffic calling path information as the own module traffic.
For convenience of description, the above device is described as being divided into modules by function. Of course, when the present disclosure is implemented, the functions of the modules may be implemented in the same one or more pieces of software and/or hardware.
The device of the foregoing embodiment is used to implement the third-party traffic identification method based on the third-party library in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to the method of any embodiment described above, the present disclosure further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the program to implement the third-party traffic identification method based on the third-party library according to any embodiment described above.
Fig. 5 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device of the foregoing embodiment is used to implement the third-party traffic identification method based on the third-party library in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above embodiments, the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the third party traffic identification method based on the third party library according to any of the above embodiments.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
The computer instructions stored in the storage medium of the foregoing embodiment are used to enable the computer to execute the third-party traffic identification method based on the third-party library according to any embodiment, and have the beneficial effects of corresponding method embodiments, which are not described herein again.
It should be noted that the embodiments of the present disclosure can be further described in the following ways:
acquiring characteristic information of a target application;
determining a third party module in the target application according to the characteristic information of the target application and the characteristic information of a third party library acquired in advance;
acquiring traffic calling path information of the target application and path information of the third-party module;
and determining third-party module flow and self-owned module flow in the flow of the target application according to the flow calling path information and the path information of the third-party module, and taking the third-party module flow as the third-party flow.
Optionally, the method further includes:
and in response to determining that at least one of the preset identifications exists in the owned module traffic, taking the owned module traffic as third party traffic.
Optionally, the obtaining the feature information of the target application includes:
acquiring an application program package of the target application;
and converting the application program package into Jimple statements by using the Soot tool so as to obtain the characteristic information of the target application.
Optionally, the method further includes:
acquiring the third party library;
and converting the third party library into a Jimple statement by using a Soot tool so as to obtain the characteristic information of the third party library.
Optionally, the target application includes a plurality of modules; the determining a third party module in the target application according to the feature information of the target application and the feature information of a third party library acquired in advance comprises:
for each of the modules in the target application,
and taking the module as the third-party module in response to the fact that the matching degree of the characteristic information of the target application corresponding to the module and the characteristic information of the third-party library is higher than a preset matching degree threshold value.
Optionally, the obtaining of the traffic call path information of the target application includes:
and acquiring the traffic information in the target application by using the Xposed tool, and printing the function call stack to obtain the traffic calling path information.
Optionally, the determining, according to the traffic call path information and the path information of the third-party module, the third-party module traffic and the owned module traffic in the traffic of the target application includes:
for each piece of the traffic calling path information,
in response to determining that the traffic invoking path information is the same as the path information of the third-party module, taking the traffic corresponding to the traffic invoking path information as the third-party module traffic,
and in response to determining that the traffic calling path information is different from the path information of the third-party module, taking the traffic corresponding to the traffic calling path information as the own module traffic.
Optionally, the preset identifier includes: appkey identification and token identification.
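Taken together, the final set of third-party traffic is the third-party module traffic plus any owned module traffic that carries a preset identifier. The sketch below shows only this aggregation on already-collected data; the `Flow` record, its fields and the package-prefix comparison are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set

PRESET_IDENTIFIERS = {"appkey", "token"}


@dataclass
class Flow:
    url: str
    frames: List[str]  # call path, e.g. "com.vendor.sdk.Uploader.send"
    param_names: Set[str] = field(default_factory=set)


def third_party_traffic(flows: List[Flow], third_party_paths: List[str]) -> List[str]:
    """URLs of flows from a third-party module or carrying a preset identifier."""
    result = []
    for flow in flows:
        from_module = any(
            frame.startswith(path + ".")
            for frame in flow.frames
            for path in third_party_paths
        )
        has_identifier = bool(PRESET_IDENTIFIERS & {n.lower() for n in flow.param_names})
        if from_module or has_identifier:
            result.append(flow.url)
    return result
```

Flows that match neither condition remain owned module traffic and are not reported.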
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the present disclosure, technical features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present disclosure as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the disclosure. Furthermore, devices may be shown in block diagram form in order to avoid obscuring embodiments of the present disclosure, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the present disclosure are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that the embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The disclosed embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements, and the like that may be made within the spirit and principles of the embodiments of the disclosure are intended to be included within the scope of the disclosure.

Claims (7)

1. A third party flow identification method based on a third party library comprises the following steps:
acquiring characteristic information of a target application; the method specifically comprises the following steps: acquiring an application program package of the target application; converting the application program package into a Jimple statement by using a Soot tool to obtain the characteristic information of the target application; the feature information of the target application comprises a list array obtained by combining the number of functions in an application program package of the target application and the number of each type of the Jimple statements corresponding to each function;
determining a third party module in the target application according to the characteristic information of the target application and the characteristic information of a third party library acquired in advance;
acquiring traffic calling path information of the target application and path information of the third-party module;
determining third-party module flow and self-owned module flow in the flow of the target application according to the flow calling path information and the path information of the third-party module, and taking the third-party module flow as third-party flow;
the method further comprises the following steps:
in response to determining that at least one of the preset identifiers exists in the owned module traffic, taking the owned module traffic as third party traffic; wherein the preset identification comprises: appkey identification and token identification.
2. The method of claim 1, further comprising:
acquiring the third party library;
and converting the third party library into a Jimple statement by using a Soot tool so as to obtain the characteristic information of the third party library.
3. The method of claim 1, wherein the target application comprises a plurality of modules; the determining a third party module in the target application according to the feature information of the target application and the feature information of a third party library acquired in advance comprises:
for each of the modules in the target application,
and taking the module as the third-party module in response to the fact that the matching degree of the characteristic information of the target application corresponding to the module and the characteristic information of the third-party library is higher than a preset matching degree threshold value.
4. The method of claim 1, wherein the obtaining traffic invocation path information of the target application comprises:
and acquiring the traffic information in the target application by using an Xposed tool, and printing a function call stack to obtain the traffic calling path information.
5. The method of claim 1, wherein the determining third party module traffic and owned module traffic in the traffic of the target application according to the traffic invocation path information and the path information of the third party module comprises:
for each piece of the traffic calling path information,
in response to determining that the traffic invoking path information is the same as the path information of the third-party module, taking the traffic corresponding to the traffic invoking path information as the third-party module traffic,
and in response to determining that the traffic calling path information is different from the path information of the third-party module, taking the traffic corresponding to the traffic calling path information as the own module traffic.
6. A third party traffic identification apparatus based on a third party library, comprising:
the characteristic information acquisition module is used for acquiring the characteristic information of the target application; the method specifically comprises the following steps: acquiring an application program package of the target application; converting the application program package into a Jimple statement by using a Soot tool to obtain the characteristic information of the target application; the feature information of the target application comprises a list array obtained by combining the number of functions in an application program package of the target application and the number of each type of the Jimple statements corresponding to each function;
the third-party module determining module is used for determining a third-party module in the target application according to the characteristic information of the target application and the characteristic information of a third-party library acquired in advance;
the path information acquisition module is used for acquiring the flow calling path information of the target application and the path information of the third-party module;
the third-party flow determining module is used for determining third-party module flow and self-owned module flow in the flow of the target application according to the flow calling path information and the path information of the third-party module, and taking the third-party module flow as third-party flow;
the third party traffic determining module is further configured to take the owned module traffic as third party traffic in response to determining that at least one of the preset identifiers exists in the owned module traffic; wherein the preset identification comprises: appkey identification and token identification.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 5 when executing the program.
CN202110278161.7A 2021-03-16 2021-03-16 Third party flow identification method, device and equipment based on third party library Active CN112671671B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110278161.7A CN112671671B (en) 2021-03-16 2021-03-16 Third party flow identification method, device and equipment based on third party library

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110278161.7A CN112671671B (en) 2021-03-16 2021-03-16 Third party flow identification method, device and equipment based on third party library

Publications (2)

Publication Number Publication Date
CN112671671A CN112671671A (en) 2021-04-16
CN112671671B true CN112671671B (en) 2021-06-29

Family

ID=75399368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110278161.7A Active CN112671671B (en) 2021-03-16 2021-03-16 Third party flow identification method, device and equipment based on third party library

Country Status (1)

Country Link
CN (1) CN112671671B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609412A (en) * 2021-06-28 2021-11-05 北京华云安信息技术有限公司 Method for acquiring URL (Uniform resource locator) through Hook key function and event

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017184862A1 (en) * 2016-04-21 2017-10-26 Servicenow, Inc. Application usage analytics for licensing analysis
CN109104381A (en) * 2018-06-26 2018-12-28 东南大学 A kind of mobile application recognition methods based on third party's flow HTTP message
CN110727952A (en) * 2019-08-30 2020-01-24 国家计算机网络与信息安全管理中心 Privacy collection and identification method for third-party library of mobile application program

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104702577B (en) * 2013-12-09 2018-03-16 华为技术有限公司 Data flow security processing and device
US10902121B2 (en) * 2017-10-19 2021-01-26 International Business Machines Corporation Policy-based detection of anomalous control and data flow paths in an application program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
大规模移动应用第三方库自动检测和分类方法;王浩宇等;《软件学报》;20170220;第28卷(第06期);第1.3节,第2.1节 *

Also Published As

Publication number Publication date
CN112671671A (en) 2021-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant