CN112148305A - Application detection method and device, computer equipment and readable storage medium - Google Patents


Info

Publication number
CN112148305A
Authority
CN
China
Prior art keywords
code block
target
code
hash value
application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011171532.3A
Other languages
Chinese (zh)
Inventor
王葵
蔡哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202011171532.3A
Publication of CN112148305A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/53 Decompilation; Disassembly
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3684 Test management for test design, e.g. generating new test cases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management

Abstract

The embodiments of the invention provide an application detection method, an application detection device, computer equipment, and a readable storage medium. The method comprises: in response to a code detection operation for a target application, dividing a target code block from the program code of the target application; calculating the code similarity between the target code block and at least one reference code block, where each reference code block corresponds to one software development kit (SDK); determining, from the at least one reference code block and according to the code similarity between the target code block and each reference code block, a matching code block that matches the target code block; and adding target mark information to the target code block according to the matching code block, where the target mark information indicates that the target code block belongs to the SDK corresponding to the matching code block. In this way, the SDKs included in an application can be detected quickly, improving application detection efficiency.

Description

Application detection method and device, computer equipment and readable storage medium
Technical Field
The embodiments of the invention relate to the field of computer technology, and in particular to an application detection method, an application detection device, computer equipment, and a readable storage medium.
Background
Currently, to improve development efficiency and reduce cost, application developers commonly use third-party software development kits (SDKs). A production application (APP) released on an app store typically integrates about 20 third-party SDKs. As third-party SDKs have become widespread, the related security problems have become increasingly frequent, for example security vulnerabilities in the SDK itself, covert collection of users' private data, and malicious operations performed inside the APP by some malicious SDKs. Different types of SDKs may carry different security risks, so it is necessary to detect which types of third-party SDKs are integrated in an APP. Existing third-party SDK detection is mainly based on feature matching: certain features are extracted from an SDK's code package to identify that SDK, and if those features are found when an APP is detected, the SDK is considered to be integrated in the APP. Because this approach depends on obtaining each SDK's code package, its detection efficiency is low.
Disclosure of Invention
The embodiments of the invention provide an application detection method, an application detection device, computer equipment, and a readable storage medium, which can quickly detect the software development kits (SDKs) included in an application and improve application detection efficiency.
In one aspect, an embodiment of the present invention provides an application detection method, where the method includes:
in response to a code detection operation for the target application, segmenting a target code block from program code of the target application;
calculating code similarity between the target code block and at least one reference code block, wherein one reference code block corresponds to one software development kit;
determining a matching code block matched with the target code block from the at least one reference code block according to the code similarity between the target code block and each reference code block;
and adding target mark information to the target code block according to the matching code block, wherein the target mark information is used for indicating that the target code block belongs to a software development kit corresponding to the matching code block.
In another aspect, an embodiment of the present application provides an application detection apparatus, where the apparatus includes:
a dividing unit, configured to divide a target code block from the program code of a target application in response to a code detection operation for the target application;
a computing unit, configured to calculate the code similarity between the target code block and at least one reference code block, where each reference code block corresponds to one software development kit;
a determining unit, configured to determine, according to the code similarity between the target code block and each reference code block, a matching code block that matches the target code block from among the at least one reference code block;
and an adding unit, configured to add target mark information to the target code block according to the matching code block, where the target mark information indicates that the target code block belongs to the software development kit corresponding to the matching code block.
In another aspect, an embodiment of the present application provides a computer device, where the computer device includes an input device, an output device, and the computer device further includes:
a processor adapted to implement one or more instructions; and
a computer storage medium storing one or more instructions adapted to be loaded by the processor and to perform the steps of:
in response to a code detection operation for the target application, segmenting a target code block from program code of the target application;
calculating code similarity between the target code block and at least one reference code block, wherein one reference code block corresponds to one software development kit;
determining a matching code block matched with the target code block from the at least one reference code block according to the code similarity between the target code block and each reference code block;
and adding target mark information to the target code block according to the matching code block, wherein the target mark information is used for indicating that the target code block belongs to a software development kit corresponding to the matching code block.
In yet another aspect, an embodiment of the present application provides a computer storage medium storing one or more instructions, where the one or more instructions are adapted to be loaded by a processor to execute the following steps:
in response to a code detection operation for the target application, segmenting a target code block from program code of the target application;
calculating code similarity between the target code block and at least one reference code block, wherein one reference code block corresponds to one software development kit;
determining a matching code block matched with the target code block from the at least one reference code block according to the code similarity between the target code block and each reference code block;
and adding target mark information to the target code block according to the matching code block, wherein the target mark information is used for indicating that the target code block belongs to a software development kit corresponding to the matching code block.
In the embodiments of the invention, in response to a code detection operation for a target application, the computer device divides a target code block from the program code of the target application, calculates the code similarity between the target code block and at least one reference code block, determines a matching code block from the at least one reference code block according to the code similarity between the target code block and each reference code block, and adds target mark information to the target code block according to the matching code block. The computer device does not need to obtain an SDK's source package (source code package) or JAR package (Java archive) in order to extract the SDK's features; it only needs to cut the program code of the target application into code blocks and analyze the similarity between each resulting code block and the reference code blocks. The SDKs integrated in the target application can therefore be detected quickly, which effectively improves application detection efficiency.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of an application detection system according to an embodiment of the present invention;
FIG. 2a is a schematic flowchart of an application detection scheme according to an embodiment of the present invention;
FIG. 2b is a schematic structural diagram of a computer device according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of an application detection method according to an embodiment of the present invention;
FIG. 4a is a schematic diagram of a hierarchical structure of program code according to an embodiment of the present invention;
FIG. 4b is a schematic diagram of a code snippet according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of another application detection method according to an embodiment of the present invention;
FIG. 6a is a schematic diagram of calculating a hash value of an operation instruction according to an embodiment of the present invention;
FIG. 6b is a schematic diagram of calculating a target hash value and a reference hash value according to an embodiment of the present invention;
FIG. 7 is a schematic flowchart of a detection method for a specific application according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an application detection apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To better detect which software development kits are integrated in an application (the software development kits mentioned here may include third-party login and sharing SDKs, payment SDKs, push SDKs, advertising SDKs, data statistics SDKs, and the like), the embodiments of the present application provide an application detection scheme. The execution subject of the scheme may be a computer device, which may be a terminal device (hereinafter referred to as a terminal) or a server. When the computer device is a server, the embodiments of the present application further provide the application detection system shown in FIG. 1, which may comprise at least one terminal 101 and a server (i.e., the computer device) 102. The terminal 101 and the server 102 may be connected directly or indirectly through wired or wireless communication, which is not limited herein. The terminal may be a smartphone, a tablet computer, a notebook computer, a desktop computer, or the like. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), big data, and artificial intelligence platforms.
In practice, when a developer or another user wants to detect a target application, the user may send a code detection operation for the target application to the computer device through an application detection interface. In one embodiment, the application detection interface may include an application identifier input box; in this case, the code detection operation may be an operation of entering the application identifier of the target application in that input box. In another embodiment, the application detection interface may include at least an application icon of the target application; in this case, the code detection operation may be a trigger operation on that icon, such as a click or a press. In response to the code detection operation, the computer device may use the application detection scheme provided in the embodiments of the present application to detect which software development kits are integrated in the target application. Referring to FIG. 2a, the general principle of the application detection scheme is as follows. The computer device first cuts the program code of the target application into at least one code block. For any code block obtained by cutting, the computer device performs a similarity analysis between that code block and at least one reference code block; then, according to the analysis result and the SDKs corresponding to the reference code blocks, it performs SDK code block marking on the code block to mark the SDK to which the code block belongs. Applying this marking principle, the computer device can perform SDK code block marking on every code block obtained by cutting.
Because every code block is cut out of the program code of the target application (that is, every target code block is part of the target application's program code), marking each code block with its SDK allows the computer device to output an SDK detection result for the target application, which indicates which SDKs are integrated in the target application.
In a possible embodiment, to better implement the steps of the application detection scheme, the following modules may be deployed on the computer device: a detection control module, a code block cutting module, and a code block marking module, as shown in FIG. 2b. The detection control module is mainly used to call the code block cutting module and the code block marking module to detect which SDKs are integrated in the target application, and to output an SDK detection result indicating which SDKs are integrated. Specifically, the detection control module may first call the code block cutting module to cut the program code of the target application into code blocks and obtain the feature parameters of each code block. For any code block obtained by cutting, the detection control module may then call the code block marking module to calculate the similarity between the feature parameters of that code block and the feature parameters of at least one reference code block, and to mark the code block with an SDK according to the similarity result and the SDK corresponding to each reference code block. Optionally, before the target application is detected, the detection control module may also call the code block marking module to perform cluster analysis on the at least one reference code block and, using a labeling strategy, perform SDK code block marking on each reference code block according to the clustering result, thereby determining the SDK corresponding to each reference code block.
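The module split of FIG. 2b can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not the patented implementation: the class name `DetectionController`, the toy cutter `toy_cutter` (which reads lines of the form `block_id: feature feature ...` instead of real program code), the Jaccard measure `jaccard`, and the "largest similarity above a threshold" selection rule are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Features = FrozenSet[str]


def toy_cutter(program: str) -> Dict[str, Features]:
    """Hypothetical code block cutting module: one 'block_id: f1 f2 ...' per line."""
    blocks = {}
    for line in program.splitlines():
        if ":" not in line:
            continue
        block_id, features = line.split(":", 1)
        blocks[block_id.strip()] = frozenset(features.split())
    return blocks


def jaccard(a: Features, b: Features) -> float:
    """Stand-in similarity: shared features divided by all features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class DetectionController:
    """Hypothetical detection control module wiring the cutter to the marker."""

    def __init__(self,
                 cutter: Callable[[str], Dict[str, Features]],
                 similarity: Callable[[Features, Features], float],
                 references: Dict[str, Tuple[Features, str]],
                 threshold: float = 0.5) -> None:
        self.cutter = cutter          # code block cutting module
        self.similarity = similarity  # core of the code block marking module
        self.references = references  # reference block id -> (features, SDK)
        self.threshold = threshold

    def mark_block(self, features: Features) -> Optional[str]:
        """Return the SDK of the most similar reference block above the threshold."""
        best_sdk, best_score = None, self.threshold
        for ref_features, sdk in self.references.values():
            score = self.similarity(features, ref_features)
            if score > best_score:
                best_sdk, best_score = sdk, score
        return best_sdk

    def detect(self, program: str) -> Dict[str, str]:
        """Cut the program, mark every block, and report block id -> SDK."""
        result = {}
        for block_id, features in self.cutter(program).items():
            sdk = self.mark_block(features)
            if sdk is not None:
                result[block_id] = sdk
        return result
```

Keeping the cutter and the similarity function as injected callables mirrors the idea that the detection control module only orchestrates; either module can be swapped without touching the controller.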
The application detection scheme provided by the embodiments of the present application therefore has the following beneficial effect: the computer device does not need to obtain an SDK's source package (source code package) or JAR package (Java archive) to extract the SDK's features. It only needs to cut the program code of the target application and analyze the similarity between each resulting code block and the reference code blocks, so the SDKs integrated in the target application can be detected quickly and application detection efficiency is effectively improved.
Referring to fig. 3, fig. 3 is a schematic flow chart of an application detection method according to an embodiment of the present invention. The application detection method may be executed by the computer device, and may include the following steps S301 to S304:
S301. In response to the code detection operation for the target application, segment a target code block from the program code of the target application.
The target application may be a social application, a multimedia playback application, a browser application, or the like. In a specific implementation, when a user wants to detect which software development kits (SDKs) are integrated in a target application, the user may input a code detection operation for the target application. Accordingly, in response to that operation, the computer device may segment a target code block from the program code of the target application.
Specifically, the computer device first obtains a code file of the target application and decompiles it to obtain the hierarchical structure of the program code. Code files may include, but are not limited to, Android application packages (APK), iOS application packages, WP (Windows Phone) application packages, Windows application packages, and the like; for ease of illustration, the code file is described below as an APK. The hierarchical structure may include at least one level, and each level may include at least one code package identifier; for every code package identifier in any level other than the top level, a corresponding parent code package identifier can be found in the level immediately above. The hierarchical structure may take the form of a tree or the form shown in FIG. 4a. After obtaining the hierarchical structure, the computer device may cut the program code of the target application according to the hierarchical structure to obtain at least one code block, and then obtain a target code block from the at least one code block.
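As a rough illustration of the level-by-level structure just described, the sketch below builds the levels from decompiled class paths. It assumes, hypothetically, that the decompiler exposes slash-separated class paths such as `com/facebook/soloader/SoLoader` (package identifiers followed by a class name); the function name `build_hierarchy` is illustrative only. Each node is keyed by its full package path so that identical identifiers under different parents stay distinct.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]


def build_hierarchy(class_paths: List[str]) -> List[Dict[Path, Optional[Path]]]:
    """Group code package identifiers into levels (level 1 first).

    levels[i] maps each package path at depth i + 1 to its parent path,
    which is None for identifiers in the top level.
    """
    levels: List[Dict[Path, Optional[Path]]] = []
    for class_path in class_paths:
        packages = class_path.split("/")[:-1]  # drop the class name
        for depth in range(len(packages)):
            if len(levels) <= depth:
                levels.append({})
            node = tuple(packages[: depth + 1])
            # Every identifier below the top level has a parent one level up.
            levels[depth][node] = node[:-1] if depth > 0 else None
    return levels
```

For `["com/facebook/soloader/SoLoader", "com/alipay/sdk/PayTask"]` this yields three levels: `com` at level 1, `facebook` and `alipay` at level 2, and `soloader` and `sdk` at level 3.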
When cutting the program code of the target application according to the hierarchical structure, the computer device aims to place code belonging to the same SDK in the same code block as far as possible, and may therefore use a cutting strategy derived from practical experience. The cutting strategy specifies that code package identifiers contained in a cutting list are searched for in the hierarchical structure from bottom to top, and that code is cut one level below each identifier found. Accordingly, cutting the program code of the target application according to the hierarchical structure to obtain at least one code block may be implemented as follows:
A preset cutting list is obtained, which contains the code package identifiers at which cutting is required. After the hierarchical structure of the program code is obtained, an initial level to be traversed is determined from the hierarchical structure, and each code package identifier in the initial level is traversed. If the currently traversed target code package identifier is in the cutting list, the code corresponding to each code package identifier in the level below the initial level is cut into separate code blocks. If the target code package identifier is not in the cutting list, it is determined whether the first parent code package identifier corresponding to the target code package identifier in a first level (the level immediately above the initial level) is in the cutting list. If the first parent code package identifier is in the cutting list, the code corresponding to the target code package identifier is cut into one code block. If not, it is determined whether the second parent code package identifier corresponding to the first parent code package identifier in a second level (the level immediately above the first level) is in the cutting list. If the second parent code package identifier is in the cutting list, the code corresponding to each code package identifier in the first level is cut into a separate code block; if not, it is determined whether the third parent code package identifier corresponding to the second parent code package identifier in a third level (the level immediately above the second level) is in the cutting list, and so on.
As a specific example, FIG. 4a shows a three-level hierarchical structure obtained by decompiling the code file of a target application; each level includes at least one code package identifier, namely the English identifiers shown in FIG. 4a. When cutting the program code of the target application according to this hierarchical structure, the computer device may determine that the initial level to be traversed is level 3 and traverse each code package identifier in level 3. Suppose the currently traversed target code package identifier is soloader and soloader is not in the cutting list. The computer device then determines whether the first parent code package identifier facebook, corresponding to soloader in level 2 (the first level, immediately above the initial level), is in the cutting list. If facebook is in the cutting list, the code corresponding to soloader is cut into one code block. If facebook is not in the cutting list, the computer device determines whether the second parent code package identifier com, corresponding to facebook in level 1 (the second level, immediately above the first level), is in the cutting list. If com is in the cutting list, the code corresponding to each code package identifier in level 2 is cut into a separate code block: the code corresponding to facebook forms one code block, the code corresponding to donking forms one code block, the code corresponding to alipay forms one code block, and so on.
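The bottom-up cutting strategy can be sketched as follows. This is a minimal sketch under assumptions that the text does not state: the input is a list of slash-separated class paths, cutting-list entries are matched by identifier name (as in the example with `com` and `facebook`), and when no identifier on a path is in the cutting list the code falls back to grouping under the top-level package. That fallback and the function name `cut_code_blocks` are hypothetical.

```python
from typing import Dict, List, Set


def cut_code_blocks(class_paths: List[str], cut_list: Set[str]) -> Dict[str, List[str]]:
    """Bottom-up cutting: find the deepest package identifier that is in the
    cutting list and cut one level below it (each class gets its own block
    if the deepest package itself is in the list)."""
    blocks: Dict[str, List[str]] = {}
    for class_path in class_paths:
        parts = class_path.split("/")
        packages = parts[:-1]
        key = None
        # Walk from the initial (deepest) level toward the top level.
        for depth in range(len(packages) - 1, -1, -1):
            if packages[depth] in cut_list:
                # Cut one level below the identifier that was found.
                key = "/".join(parts[: depth + 2])
                break
        if key is None:
            # Hypothetical fallback (not specified in the text): top-level package.
            key = parts[0]
        blocks.setdefault(key, []).append(class_path)
    return blocks
```

With the cutting list `{"com"}`, all classes under `com/facebook` land in one block and those under `com/alipay` in another, matching the walkthrough above; adding `facebook` to the list splits `com/facebook/soloader` and `com/facebook/react` into separate blocks.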
S302, calculating code similarity between the target code block and at least one reference code block.
In a specific implementation, the computer device may first obtain at least one reference code block, where each reference code block corresponds to one software development kit; specifically, the computer device may obtain the at least one reference code block from its local storage space or from a software development kit feature library. The computer device then calculates the code similarity between the target code block and the at least one reference code block. In one embodiment, the computer device may compare the program code of the target code block with each reference code block, determine the degree of overlap between the target code block and each reference code block, and calculate the code similarity between them according to that overlap.
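The text leaves the overlap measure open. One possible choice, shown here only as an assumed example, is the Jaccard index over the sets of non-blank, whitespace-trimmed code lines: the number of lines the two blocks share divided by the number of distinct lines in either block. The function name `overlap_similarity` is hypothetical.

```python
def overlap_similarity(target_code: str, reference_code: str) -> float:
    """Jaccard index over the sets of non-blank, whitespace-trimmed lines."""
    target_lines = {line.strip() for line in target_code.splitlines() if line.strip()}
    reference_lines = {line.strip() for line in reference_code.splitlines() if line.strip()}
    union = target_lines | reference_lines
    if not union:
        return 0.0  # two empty blocks: treat as no evidence of similarity
    return len(target_lines & reference_lines) / len(union)
```

For instance, blocks with lines `a, b, c` and `b, c, d` share two of four distinct lines, giving a similarity of 0.5. A line-level comparison like this is sensitive to renaming, which is the weakness the next embodiment addresses.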
In another embodiment, the effect of code obfuscation is taken into account. In the current development ecosystem, third-party SDK providers usually require application developers to obfuscate the SDK's code when integrating it, for code protection, so the code content of the same SDK differs between APPs. Code obfuscation can be applied to program source code and to the intermediate code produced by compiling a program; it refers to transforming a program's code into a form that is functionally equivalent but difficult to read and understand. To overcome the influence of code obfuscation on code block similarity calculation, the computer device may obtain the feature parameters of the target code block and of each reference code block and calculate the similarity between these feature parameters, thereby obtaining the code similarity between the target code block and the at least one reference code block.
A feature parameter is a parameter that does not change under code obfuscation. Research shows that, to reduce readability, code obfuscation replaces meaningful names in the program code, such as class names, function names, and variable names, with meaningless names that are hard to read; however, to keep the code functionally consistent, obfuscation generally does not alter the operation codes or operation instructions in the program code. For example, the code shown in FIG. 4b contains variable names such as v3 and v6, which code obfuscation would typically replace with other names. The computer device can therefore extract the operation instructions in the target code block as its feature parameters, which effectively counters most code obfuscation. In addition, because code obfuscation usually does not affect the hierarchical structure of a code block, the computer device may also use the hierarchical structure of the target code block as a feature parameter.
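To show why operation instructions survive obfuscation, the sketch below keeps only the instruction mnemonic of each line and compares two blocks by the Jaccard index of their opcode n-grams. It assumes, hypothetically, smali-style Dalvik disassembly in which each instruction line begins with its mnemonic and directives, labels, and comments begin with `.`, `:`, and `#`. The figure list refers to hash values of operation instructions, which this section does not describe. The n-gram comparison here is a swapped-in technique, and the names `extract_opcodes` and `opcode_similarity` are illustrative.

```python
from typing import List, Set, Tuple


def extract_opcodes(disassembly: str) -> List[str]:
    """Keep only the instruction mnemonic of each smali-style line."""
    opcodes = []
    for line in disassembly.splitlines():
        line = line.strip()
        # Skip blank lines, directives (.method, .end), labels (:cond_0), comments (#).
        if not line or line[0] in ".:#":
            continue
        opcodes.append(line.split()[0])
    return opcodes


def opcode_similarity(target: str, reference: str, n: int = 2) -> float:
    """Jaccard index over opcode n-grams; renamed identifiers do not affect it."""
    def ngrams(ops: List[str]) -> Set[Tuple[str, ...]]:
        return {tuple(ops[i:i + n]) for i in range(len(ops) - n + 1)}

    a = ngrams(extract_opcodes(target))
    b = ngrams(extract_opcodes(reference))
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

If an obfuscator renames `Lcom/facebook/soloader/SoLoader;->init` to `La/b/c;->a`, the opcode sequence `const/4, invoke-static, move-result-object, return-void` is unchanged, so the similarity stays at 1.0. Using n-grams rather than single opcodes keeps some ordering information, at the cost of needing blocks with at least `n` instructions.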
S303, according to the code similarity between the target code block and each reference code block, determining a matching code block matched with the target code block from at least one reference code block.
In one specific implementation, the computer device selects a reference code block with the largest code similarity with the target code block from at least one reference code block as a matching code block matched with the target code block according to the code similarity between the target code block and each reference code block. That is, in this specific implementation, the matching code block refers to the reference code block having the largest code similarity with the target code block.
In another specific implementation, the computer device may determine whether the code similarity between the target code block and each reference code block is greater than a threshold, where the threshold may be set according to experience and requirements. If any code similarity is greater than the threshold, the target code block is considered similar to the corresponding reference code block, and each reference code block whose code similarity exceeds the threshold may be determined as a matching code block of the target code block. That is, in this specific implementation, a matching code block is a reference code block whose code similarity with the target code block is greater than the threshold.
In yet another specific implementation, the computer device may determine whether a code similarity between the target code block and each reference code block is greater than a threshold; and if the code similarity larger than the threshold exists, determining the maximum code similarity from the existing code similarities, and determining the reference code block corresponding to the maximum code similarity as a matching code block matched with the target code block. That is, in this specific implementation, the matching code block refers to a reference code block whose code similarity with the target code block is greater than a threshold and whose code similarity is the greatest.
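The three matching strategies of S303 can be written side by side. This sketch assumes the similarities are already computed and given as a mapping from a reference block identifier to its code similarity with the target block; the function names are hypothetical.

```python
from typing import Dict, List, Optional


def match_max(similarities: Dict[str, float]) -> Optional[str]:
    """Strategy 1: the reference block with the largest similarity."""
    if not similarities:
        return None
    return max(similarities, key=similarities.get)


def match_above_threshold(similarities: Dict[str, float], threshold: float) -> List[str]:
    """Strategy 2: every reference block whose similarity exceeds the threshold."""
    return [ref for ref, score in similarities.items() if score > threshold]


def match_max_above_threshold(similarities: Dict[str, float], threshold: float) -> Optional[str]:
    """Strategy 3: the most similar reference block, only if it exceeds the threshold."""
    candidates = {ref: score for ref, score in similarities.items() if score > threshold}
    return match_max(candidates)
```

Strategy 1 always returns a match, even when every similarity is low. Strategy 2 can return several matches. Strategy 3 combines both, returning at most one match and none when nothing is similar enough.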
S304: Add target mark information to the target code block according to the matching code block.
Research shows that two similar code blocks usually belong to the same software development kit. Therefore, once the matching code block (i.e., the code block similar to the target code block) has been determined, the software development kit corresponding to the matching code block can be directly taken as the software development kit to which the target code block belongs, and target mark information is added to the target code block accordingly. The target mark information indicates that the target code block belongs to the software development kit corresponding to the matching code block, and may include the type identifier of that software development kit. With this mark in place, when security detection is later performed on the target application, the computer device can directly determine from the target mark information that this software development kit is integrated in the target application and apply the security detection strategy corresponding to it, so that the target application is checked in a targeted manner and detection efficiency and accuracy are improved.
In this embodiment of the invention, in response to a code detection operation for the target application, the computer device can divide a target code block from the program code of the target application, calculate the code similarity between the target code block and at least one reference code block, determine from the at least one reference code block a matching code block for the target code block according to these code similarities, and add target mark information to the target code block according to the matching code block.
Referring to fig. 5, fig. 5 is a flowchart of another application detection method according to an embodiment of the present invention; the method can be executed by the computer device described above. This embodiment mainly takes reference code blocks obtained from a software development kit feature library as an example. Before executing the application detection method proposed by this embodiment, the computer device may build the software development kit feature library in advance, as follows:
First, the program code of at least two reference applications is obtained and divided into a plurality of reference code blocks. Next, the computer device performs cluster analysis on the reference code blocks and groups them according to the analysis result, so that the reference code blocks in one group belong to the same software development kit. Specifically, the computer device may calculate the similarity between every two reference code blocks and assign any two reference code blocks whose similarity is greater than a threshold to the same class, thereby clustering them. Alternatively, the computer device may call a clustering model, obtained in advance by training a neural network model on a large amount of sample data, to cluster and thus group the reference code blocks. After grouping, the computer device obtains the mark information of each reference code block according to its group information; the mark information of a reference code block is generated after feature joint analysis is performed on all reference code blocks in its group to determine the software development kit to which it belongs. Each reference code block and its mark information are then added to the software development kit feature library.
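A minimal sketch of the threshold-based grouping, assuming reference code blocks are represented by fixed-length bit strings and using a hypothetical bit-agreement ratio as the pairwise similarity. It uses a union-find structure, so blocks connected through a chain of above-threshold pairs end up in one group; that transitive behaviour is an assumption, since the text only speaks about pairs. The clustering-model alternative is not shown.

```python
from typing import Callable, Dict, List, Sequence


def bit_agreement(a: str, b: str) -> float:
    """Hypothetical similarity: fraction of positions where two bit strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_reference_blocks(
    blocks: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Put any two blocks whose similarity exceeds the threshold in the same group."""
    parent = list(range(len(blocks)))

    def find(i: int) -> int:
        # Follow parent links to the group root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            if similarity(blocks[i], blocks[j]) > threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups: Dict[int, List[str]] = {}
    for i, block in enumerate(blocks):
        groups.setdefault(find(i), []).append(block)
    return list(groups.values())


blocks = ["11110000", "11110001", "00001111", "00001110"]
print(group_reference_blocks(blocks, bit_agreement, 0.8))
# [['11110000', '11110001'], ['00001111', '00001110']]
```

Each resulting group would then be handed to the feature joint analysis (manual or automatic) that assigns its mark information.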
The mark information of each reference code block can be obtained from its group information in either of two ways. In one implementation, the group information of each reference code block is pushed to an administrator, who performs feature joint analysis on the reference code blocks in each group to determine the software development kit to which they belong and then enters the corresponding mark information for the reference code blocks of each group into the computer device; the computer device thus acquires the mark information of each reference code block entered by the administrator. In this implementation, the feature joint analysis is performed manually by the administrator. In another implementation, the computer device itself performs feature joint analysis on the reference code blocks in each group to determine the software development kit to which they belong, and then obtains the mark information of each reference code block according to its group information. In this implementation, the feature joint analysis is performed automatically by the computer device.
Once the software development kit feature library has been constructed through the above steps, when the computer device detects a code detection operation for the target application, it may perform detection using the application detection method shown in fig. 5, which includes the following steps S501 to S507:
S501: In response to the code detection operation for the target application, acquire a code file of the target application, where the code file contains the program code of the target application.
S502: Decompile the code file to obtain the hierarchical structure of the program code.
S503: Cut the program code of the target application according to the hierarchical structure to obtain at least one code block.
S504: Select a target code block from the at least one code block.
The implementation manners of steps S501 to S504 can refer to the specific implementation manner in step S301 in fig. 3, and are not described herein again.
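The text defers the details of cutting code along the hierarchical structure to step S301 of fig. 3, which is not reproduced here. Purely as a hypothetical illustration, the sketch below treats each distinct package prefix of a fully qualified class name, taken to a chosen depth, as one code block; the function name, the `depth` parameter and the example class names are all assumptions rather than the patent's rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def cut_by_package(class_names: Iterable[str], depth: int = 3) -> Dict[str, List[str]]:
    """Hypothetical cutting rule: one code block per package prefix of `depth` segments."""
    blocks: Dict[str, List[str]] = defaultdict(list)
    for name in class_names:
        packages = name.split(".")[:-1]  # drop the simple class name
        prefix = ".".join(packages[:depth]) or "<default>"  # classes with no package
        blocks[prefix].append(name)
    return dict(blocks)


classes = [
    "com.example.pay.core.Client",
    "com.example.pay.ui.Dialog",
    "com.example.app.MainActivity",
]
print(cut_by_package(classes))
```

A shallower `depth` yields fewer, larger blocks; a deeper one splits an SDK's sub-packages into separate blocks.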
S505: Calculate the code similarity between the target code block and at least one reference code block, where each reference code block corresponds to one software development kit.
In a specific implementation, the code similarity between the target code block and each reference code block is calculated in the same way. Therefore, the following embodiments of the present application describe how to calculate the code similarity between the target code block and any one reference code block. Specifically, step S505 may include the following steps S11 and S12:
S11: Obtain the characteristic parameters of the target code block and the characteristic parameters of the reference code block.
As described above, the characteristic parameter may be an operation instruction or a hierarchical structure. When the characteristic parameter is an operation instruction that includes a plurality of target instruction fragments, the computer device may obtain the characteristic parameter of the target code block as follows: obtain at least one class from the target code block, where each class includes at least one method and at least one variable; perform instruction extraction on each method in each class to obtain a plurality of candidate instruction fragments; and select a plurality of target instruction fragments from the candidate instruction fragments.
Specifically, the computer device may directly use every candidate instruction fragment as a target instruction fragment. Alternatively, the computer device may determine the instruction length of each candidate instruction fragment and select, as target instruction fragments, the candidate instruction fragments whose instruction length is greater than a length threshold. This filters out candidate instruction fragments with short instruction lengths and thereby improves the efficiency of the subsequent similarity calculation.
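The extraction and length filtering can be sketched as below. The data model is assumed: a decompiled class is represented as a mapping from method names to lists of Dalvik-style instruction mnemonics, each method body yields one candidate fragment, and "instruction length" means the number of instructions. `ClassInfo`, `extract_candidate_fragments` and `select_target_fragments` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Fragment = Tuple[str, ...]


@dataclass
class ClassInfo:
    """Hypothetical decompiled class: method name -> instruction mnemonics."""
    name: str
    methods: Dict[str, List[str]] = field(default_factory=dict)
    variables: List[str] = field(default_factory=list)


def extract_candidate_fragments(classes: List[ClassInfo]) -> List[Fragment]:
    """Assumption: each method body yields one candidate instruction fragment."""
    return [tuple(instrs) for cls in classes for instrs in cls.methods.values()]


def select_target_fragments(
    candidates: List[Fragment], length_threshold: Optional[int] = None
) -> List[Fragment]:
    """Keep every candidate, or only those longer than `length_threshold` instructions."""
    if length_threshold is None:
        return list(candidates)
    return [frag for frag in candidates if len(frag) > length_threshold]


client = ClassInfo(
    name="com.example.pay.core.Client",
    methods={
        "<init>": ["invoke-direct", "return-void"],
        "send": ["const-string", "new-instance", "invoke-virtual", "return-object"],
    },
)
candidates = extract_candidate_fragments([client])
print(select_target_fragments(candidates, length_threshold=3))
# [('const-string', 'new-instance', 'invoke-virtual', 'return-object')]
```

Passing no threshold corresponds to the first option in the text (use every candidate); passing one corresponds to the length-filtering option.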
It should be noted that the operation instruction of any reference code block is obtained in the same way as the operation instruction of the target code block; details are not repeated here.
S12: Calculate the feature similarity between the characteristic parameters of the target code block and those of the reference code block to obtain the code similarity between the target code block and the reference code block. Step S12 is implemented differently depending on the characteristic parameter, as described below:
Case 1: The characteristic parameter is an operation instruction. Step S12 is then implemented as follows:
First, a target hash algorithm is used to hash the operation instruction of the target code block to obtain a target hash value. The target hash algorithm may be a locality-sensitive hashing algorithm of the kind used for large-scale text similarity calculation; such an algorithm maps high-dimensional feature vectors to low-dimensional ones, so that whether two code blocks are highly similar can be judged from the distance between their vectors. Since the operation instruction of the target code block includes a plurality of target instruction fragments, the process is as shown in fig. 6a: the computer device first hashes each target instruction fragment to obtain a hash value set containing the hash values of the individual fragments; it then determines the weight value of each hash value according to the number of times that hash value occurs in the set; finally, it computes a weighted sum of the hash values to obtain the target hash value.
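The procedure of fig. 6a corresponds to the well-known SimHash scheme. The following is a minimal sketch, assuming each instruction fragment is hashed to 64 bits by taking the first 8 bytes of its MD5 digest (the text does not name the per-fragment hash) and that each hash value's weight is its occurrence count, the first of the two weighting options described next. At the bit level, a 1 bit adds the weight and a 0 bit subtracts it, and positive totals become 1, as in the worked example further on.

```python
import hashlib
from collections import Counter
from typing import Iterable, Sequence

BITS = 64


def fragment_hash(fragment: Sequence[str]) -> int:
    """Assumed per-fragment hash: first 8 bytes of the MD5 digest, as a 64-bit integer."""
    digest = hashlib.md5(" ".join(fragment).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(fragments: Iterable[Sequence[str]]) -> int:
    """Weighted combination of fragment hashes, reduced back to a 64-bit value."""
    weights = Counter(fragment_hash(f) for f in fragments)  # hash value -> occurrences
    totals = [0] * BITS
    for value, weight in weights.items():
        for i in range(BITS):
            bit = (value >> (BITS - 1 - i)) & 1  # i = 0 is the most significant bit
            totals[i] += weight if bit else -weight
    # Dimension reduction: a positive total becomes 1, anything else becomes 0.
    result = 0
    for total in totals:
        result = (result << 1) | (1 if total > 0 else 0)
    return result


method_a = ("const-string", "invoke-virtual", "return-void")
method_b = ("new-instance", "invoke-direct", "return-object")
print(f"{simhash([method_a, method_b, method_a]):016x}")
```

Because similar fragment multisets shift only a few of the bit totals across zero, similar code blocks tend to receive hash values that differ in few bits.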
One way to determine the weight value of each hash value from its number of occurrences in the hash value set is to use the number of occurrences directly as the weight value. For example, if the hash value "100101" appears 2 times in the hash value set, its weight value is 2; if the hash value "101011" appears 3 times, its weight value is 3. Another way is for the computer device to determine the repetition degree of each hash value in the hash value set from its number of occurrences, and then use the weight value corresponding to that repetition degree as the weight value of the hash value. In a specific implementation, the correspondence between repetition degrees and weight values is preset: for example, a repetition degree of 0 corresponds to a weight value of 1, a repetition degree of 20% corresponds to a weight value of 2, and so on. If the hash value set contains 5 hash values and the hash value "100101" appears once, its repetition degree is 20%, so its weight value is 2.
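Both weighting options can be sketched as follows. The count-based option follows the text directly. For the repetition-degree option, the text gives only the first entries of the preset table (0 maps to 1, 20% maps to 2, "and so on"), so the rule of adding one weight unit per further 20 percentage points is a hypothetical extrapolation, exposed as the `step_percent` parameter.

```python
from collections import Counter
from typing import Dict, List


def count_weights(hashes: List[str]) -> Dict[str, int]:
    """Option 1: the weight is the number of occurrences in the hash value set."""
    return dict(Counter(hashes))


def repetition_weights(hashes: List[str], step_percent: int = 20) -> Dict[str, int]:
    """Option 2: the weight is looked up from the repetition degree.

    Repetition degree = occurrences / size of the hash value set. The mapping
    0% -> 1, 20% -> 2, ... (one unit per `step_percent` points) is hypothetical.
    """
    total = len(hashes)
    weights: Dict[str, int] = {}
    for value, occurrences in Counter(hashes).items():
        percent = occurrences * 100 // total  # integer arithmetic avoids float rounding
        weights[value] = 1 + percent // step_percent
    return weights


print(count_weights(["100101"] * 2 + ["101011"] * 3))  # {'100101': 2, '101011': 3}
hash_set = ["100101", "101011", "101011", "101011", "110000"]
print(repetition_weights(hash_set))  # {'100101': 2, '101011': 4, '110000': 2}
```

Integer percentages are used so that, for example, 3 occurrences out of 5 land exactly on the 60% bucket instead of falling just below it through floating-point rounding.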
Each hash value is composed of first values (e.g., the value "0") and/or second values (e.g., the value "1"). Accordingly, the weighted summation of the hash values using their weight values may be performed as follows. For each hash value, every bit is multiplied by the weight value of that hash value according to a preset multiplication rule, giving the weighting result of the hash value. The preset multiplication rule is: if the current bit is the second value, the weight value is applied positively; if the current bit is the first value, the weight value is applied negatively. For example, let a hash value be "100101" with weight value 2. Its first bit is 1, the second value, so the computer device applies the weight value 2 positively and obtains 2; its second bit is 0, the first value, so the computer device applies the weight value negatively and obtains -2; weighting every bit of "100101" in this way gives the weighting result [2, -2, -2, 2, -2, 2].
Applying this weighting rule yields a weighting result for each hash value. The weighting results of all hash values are summed bit by bit to obtain a candidate hash value, from which the target hash value is determined. In one embodiment, the candidate hash value is used directly as the target hash value; in another embodiment, to simplify subsequent calculation, dimension reduction is applied to the candidate hash value to obtain the target hash value. Dimension reduction means: for each bit of the candidate hash value, if the bit is greater than 0, it is updated to the second value; if it is less than or equal to 0, it is updated to the first value.
For example, suppose there are two hash values: hash value A is "100101" with weight value 2, and hash value B is "101011" with weight value 3. By the weighting rule, the weighting result of A is [2, -2, -2, 2, -2, 2] and that of B is [3, -3, 3, -3, 3, 3]. Summing the corresponding bits of the two weighting results, [2+3, (-2)+(-3), (-2)+3, 2+(-3), (-2)+3, 2+3], gives the candidate hash value [5, -5, 1, -1, 1, 5]. Dimension reduction is then applied: the first bit of the candidate hash value is 5, which is greater than 0, so it is updated to 1; the second bit is -5, which is less than 0, so it is updated to 0; and so on, yielding the target hash value "101011".
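The worked example can be reproduced with the short sketch below, which operates on bit strings for readability; the helper names `weight_bits`, `combine` and `reduce_dimension` are hypothetical.

```python
from typing import Dict, List


def weight_bits(hash_bits: str, weight: int) -> List[int]:
    """Bit "1" (second value) -> +weight, bit "0" (first value) -> -weight."""
    return [weight if bit == "1" else -weight for bit in hash_bits]


def combine(weighted: Dict[str, int]) -> List[int]:
    """Candidate hash value: position-wise sum of every weighting result."""
    rows = [weight_bits(h, w) for h, w in weighted.items()]
    return [sum(column) for column in zip(*rows)]


def reduce_dimension(candidate: List[int]) -> str:
    """Dimension reduction: > 0 becomes "1", <= 0 becomes "0"."""
    return "".join("1" if v > 0 else "0" for v in candidate)


print(weight_bits("100101", 2))     # [2, -2, -2, 2, -2, 2]
candidate = combine({"100101": 2, "101011": 3})
print(candidate)                    # [5, -5, 1, -1, 1, 5]
print(reduce_dimension(candidate))  # 101011
```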
Second, the target hash algorithm is used to hash the operation instruction of the reference code block to obtain a reference hash value. This is done in the same way as the target hash value is obtained from the operation instruction of the target code block; details are not repeated here.
Then, a distance operation is performed on the target hash value and the reference hash value to obtain the code similarity between the target code block and the reference code block. The distance may be computed with the Hamming distance formula, the Euclidean distance formula, or another distance formula. Taking the Hamming distance as an example and referring to fig. 6b, the computer device computes the Hamming distance between the target hash value and the reference hash value and determines from it the code similarity between the target code block and the reference code block, so as to judge whether the two code blocks are similar. Specifically, the computer device XORs the target hash value with the reference hash value: a result bit is 1 only when the two compared bits differ, and 0 when they are the same. The number of 1s in the XOR result is the Hamming distance, which serves as the code similarity between the target code block and the reference code block; the smaller the Hamming distance, the more similar the two code blocks.
For example, XOR-ing the target hash value "101001" with the reference hash value "110101" gives "011100", which contains three 1s; the Hamming distance, and thus the code similarity value between the target code block and the reference code block, is 3.
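A minimal sketch of the XOR-and-count computation on bit strings; `hamming_distance` is a hypothetical helper name.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the 1 bits in the XOR of two equal-length binary strings."""
    if len(a) != len(b):
        raise ValueError("hash values must have the same length")
    return bin(int(a, 2) ^ int(b, 2)).count("1")


print(hamming_distance("101001", "110101"))  # 3
```

Converting to integers first lets a single XOR compare all bits at once; the same `bin(x ^ y).count("1")` idiom works directly on 64-bit integer hash values.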
Case 2: The characteristic parameter is a hierarchical structure. Step S12 is then implemented as follows:
The computer device can compare the hierarchical structure of the target code block with that of the reference code block along multiple dimensions and determine the code similarity between them from the comparison result, so that the reference code block that best matches the target code block can be determined from the at least one reference code block according to the code similarity. These dimensions may include, for example, the number of levels in the hierarchical structure and the code package identifiers it contains.
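The text names the comparison dimensions but not how they are combined. Purely as a hypothetical sketch, the score below averages a level-count agreement term with the Jaccard overlap of the package identifiers; the `Hierarchy` type, the formula and the default equal weighting are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Hierarchy:
    levels: int               # number of levels in the package hierarchy
    packages: FrozenSet[str]  # code package identifiers in the hierarchy


def hierarchy_similarity(a: Hierarchy, b: Hierarchy, level_weight: float = 0.5) -> float:
    """Hypothetical score in [0, 1] built from the two dimensions named in the text."""
    deepest = max(a.levels, b.levels)
    level_score = 1.0 if deepest == 0 else 1 - abs(a.levels - b.levels) / deepest
    union = a.packages | b.packages
    package_score = 1.0 if not union else len(a.packages & b.packages) / len(union)
    return level_weight * level_score + (1 - level_weight) * package_score


target = Hierarchy(4, frozenset({"com.example.pay", "com.example.pay.core"}))
reference = Hierarchy(4, frozenset({"com.example.pay", "com.example.pay.ui"}))
print(round(hierarchy_similarity(target, reference), 3))  # 0.667
```

Here a higher score means more similar, so the best-matching reference block would be the one with the largest score.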
S506: Determine, from the at least one reference code block, a matching code block for the target code block according to the code similarity between the target code block and each reference code block.
S507: Add target mark information to the target code block according to the matching code block, where the target mark information indicates that the target code block belongs to the software development kit corresponding to the matching code block.
For the specific implementation of steps S506 and S507, refer to steps S303 and S304; details are not repeated here. Furthermore, the matching code block comes from the software development kit feature library and was cut from the program code of some reference application, while the target code block was cut from the program code of the target application; since the target code block is similar to the matching reference code block, the embodiments of the present application can also determine that different applications use the same code block.
In this embodiment of the invention, in response to a code detection operation for the target application, the computer device may divide a target code block from the program code of the target application; for each reference code block, obtain the characteristic parameters of the target code block and of that reference code block and calculate their similarity to obtain the code similarity between the two; determine a matching code block for the target code block from the at least one reference code block according to these code similarities; and add target mark information to the target code block according to the matching code block. By determining the SDK to which the target code block belongs, the SDKs integrated in an application can be identified without extracting SDK features, so the types of SDKs included in the application are detected quickly and application detection efficiency is effectively improved.
Based on the application detection method provided above, an embodiment of the present application further provides a more specific application detection method, shown in fig. 7. This embodiment mainly takes an APK as an example of the code file of the target application. The specific process is as follows:
First, the APK (Android application package) of the target application is obtained; dex decompilation is performed on the APK to obtain the hierarchical structure of the target application's program code, and the program code is cut according to that hierarchical structure into n code blocks (code block 1, code block 2, ..., code block n in fig. 7). The computer device then hashes the operation instruction of each of the n code blocks in four steps: (1) extract the instruction fragments of the code block; (2) hash each instruction fragment; (3) weight and combine the hash values of the instruction fragments; and (4) apply dimension reduction to the combined result, yielding the hash value of the code block's operation instruction.
In this way, hash values are obtained for all n code blocks. The computer device may then calculate the similarity between the hash value of each code block and the hash value of at least one reference code block in the software development kit feature library, and output a recognition result for each code block according to the similarity results. Comparing code blocks through the hash values of their operation instructions allows the SDKs integrated in the application to be identified relatively accurately. Before performing this similarity calculation, the computer device may generate the software development kit feature library through cluster analysis.
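The recognition step can be sketched as a nearest-neighbour lookup over the feature library. This assumes that each library entry stores a reference hash value together with an SDK type identifier, that similarity is measured by Hamming distance (smaller is more similar), and that a code block is left unlabelled when no reference lies within a hypothetical `max_distance`; taken together, this follows the "largest similarity above a threshold" matching strategy. All names and example values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical feature-library entry: (reference hash value, SDK type identifier).
LibraryEntry = Tuple[int, str]


def hamming(x: int, y: int) -> int:
    """Hamming distance between two integer hash values."""
    return bin(x ^ y).count("1")


def identify_sdk(
    block_hash: int, library: List[LibraryEntry], max_distance: int
) -> Optional[str]:
    """SDK identifier of the closest reference block, or None if none is close enough."""
    best_sdk: Optional[str] = None
    best_distance = max_distance + 1
    for ref_hash, sdk in library:
        distance = hamming(block_hash, ref_hash)
        if distance < best_distance:  # smaller distance means more similar
            best_sdk, best_distance = sdk, distance
    return best_sdk


def identify_all(
    block_hashes: Dict[str, int], library: List[LibraryEntry], max_distance: int = 3
) -> Dict[str, Optional[str]]:
    """Recognition result for every code block of the application."""
    return {name: identify_sdk(h, library, max_distance) for name, h in block_hashes.items()}


library = [(0b101011, "sdk_pay"), (0b010100, "sdk_ads")]
blocks = {"code block 1": 0b101001, "code block 2": 0b010110, "code block 3": 0b110011}
print(identify_all(blocks, library, max_distance=1))
# {'code block 1': 'sdk_pay', 'code block 2': 'sdk_ads', 'code block 3': None}
```

A linear scan keeps the sketch simple; a large feature library would likely need an index over hash values instead.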
Based on the above embodiments of the application detection method, the present application further discloses an embodiment of an application detection apparatus, which may be a computer program (including program code) running in the computer device mentioned above. The application detection apparatus may perform the method shown in fig. 3 or fig. 5. Referring to fig. 8, the application detection apparatus may run the following units:
a dividing unit 801 for dividing a target code block from a program code of a target application in response to a code detection operation for the target application;
a calculating unit 802, configured to calculate a code similarity between the target code block and at least one reference code block, where one reference code block corresponds to one software development kit;
a determining unit 803, configured to determine, according to a code similarity between the target code block and each reference code block, a matching code block that matches the target code block from the at least one reference code block;
an adding unit 804, configured to add target mark information to the target code block according to the matching code block, where the target mark information is used to indicate that the target code block belongs to a software development kit corresponding to the matching code block.
In yet another implementation, the apparatus further includes an obtaining unit 805, where:
the obtaining unit 805 is configured to obtain, for any reference code block, a feature parameter of the target code block and a feature parameter of that reference code block;
the calculating unit 802 is configured to perform similarity calculation on the feature parameters of the target code block and the feature parameters of any reference code block to obtain a code similarity between the target code block and any reference code block.
In another implementation, the characteristic parameter is an operation instruction, and the operation instruction includes a plurality of target instruction fragments; the obtaining unit 805 is specifically configured to:
obtaining at least one class from the target code block, each class comprising at least one method and at least one variable;
performing instruction extraction processing on each method in each class to obtain a plurality of candidate instruction fragments;
and selecting a plurality of target instruction segments from the candidate instruction segments.
In still another implementation manner, the determining unit 803 is configured to determine the instruction length of each candidate instruction fragment;
the obtaining unit 805 is configured to select, from the plurality of candidate instruction fragments, a candidate instruction fragment whose instruction length is greater than a length threshold as a target instruction fragment.
In another implementation manner, the characteristic parameter is an operation instruction, and the calculating unit 802 is specifically configured to:
performing hash operation on the operation instruction of the target code block by adopting a target hash algorithm to obtain a target hash value; performing hash operation on the operation instruction of any reference code block by adopting the target hash algorithm to obtain a reference hash value;
and performing distance operation on the target hash value and the reference hash value to obtain the code similarity between the target code block and any one of the reference code blocks.
In another implementation, the operation instruction of the target code block includes a plurality of target instruction fragments, and the target hash algorithm is a locality sensitive hash algorithm; the calculating unit 802 is specifically configured to:
performing hash operation on each target instruction fragment in the plurality of target instruction fragments respectively to obtain a hash value set, wherein the hash value set comprises hash values of the target instruction fragments;
determining the weight value of each hash value according to the occurrence frequency of each hash value in the hash value set;
and performing weighted summation on each hash value by adopting the weight value of each hash value to obtain a target hash value.
In another implementation manner, the determining unit 803 is specifically configured to:
taking the number of times each hash value appears in the hash value set as the weight value of that hash value; or,
determining the repetition degree of each hash value in the hash value set according to the occurrence frequency of each hash value in the hash value set; and taking the weight value corresponding to the repetition degree corresponding to each hash value as the weight value of each hash value.
In another implementation manner, the dividing unit 801 is specifically configured to:
in response to a code detection operation for a target application, acquiring a code file of the target application, wherein the code file comprises a program code of the target application;
performing decompiling on the code file to obtain a hierarchical structure of the program code;
performing code cutting on the program code of the target application according to the hierarchical structure to obtain at least one code block;
and selecting a target code block from the at least one code block.
In yet another implementation, the at least one reference code block is stored in a software development kit feature library; the apparatus further comprises an analyzing unit 806, wherein:
the obtaining unit 805 is further configured to: acquiring program codes of at least two reference applications, and dividing the program codes of the at least two reference applications into a plurality of reference code blocks;
the analyzing unit 806 is configured to perform cluster analysis on the multiple reference code blocks and group the multiple reference code blocks according to the analysis result, where the reference code blocks in one group belong to the same software development kit;
the obtaining unit 805 is further configured to obtain flag information of each reference code block according to the group information of each reference code block; the marking information of any reference code block is generated after performing feature joint analysis on each reference code block in the group to which the reference code block belongs to determine a software development kit to which the reference code block belongs;
the adding unit 804 is further configured to add each reference code block and corresponding flag information to a feature library of the software development kit.
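The library-building steps above can be sketched as follows, under several stated assumptions. Each reference code block is assumed to carry a 64-bit locality-sensitive fingerprint; clustering is done by simple single-linkage grouping on Hamming distance (the patent does not name a clustering algorithm); a group is kept only if it spans at least two reference applications; and the "joint feature analysis" is stood in for by a majority vote over package names. `RefBlock`, `cluster`, `build_feature_library`, and the distance threshold of 3 bits are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RefBlock:
    app: str          # reference application the block was cut from
    package: str      # package prefix, used here as a stand-in SDK label
    fingerprint: int  # 64-bit locality-sensitive hash of the block


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def cluster(blocks: List[RefBlock], max_dist: int) -> List[List[RefBlock]]:
    """Single-linkage grouping: blocks within max_dist bits end up in one group."""
    parent = list(range(len(blocks)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            if hamming(blocks[i].fingerprint, blocks[j].fingerprint) <= max_dist:
                parent[find(i)] = find(j)
    groups: Dict[int, List[RefBlock]] = {}
    for i, block in enumerate(blocks):
        groups.setdefault(find(i), []).append(block)
    return list(groups.values())


def build_feature_library(blocks: List[RefBlock], max_dist: int = 3) -> List[Tuple[int, str]]:
    library: List[Tuple[int, str]] = []
    for group in cluster(blocks, max_dist):
        if len({b.app for b in group}) < 2:
            continue  # seen in only one app: more likely app-specific code than an SDK
        # Majority vote over package names as a simple stand-in for joint analysis;
        # it tolerates group members whose names were obfuscated.
        mark = Counter(b.package for b in group).most_common(1)[0][0]
        library.extend((b.fingerprint, mark) for b in group)
    return library
```

Requiring a group to appear in at least two applications reflects the idea that an SDK is code shared across apps, which is why the patent starts from at least two reference applications.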
According to an embodiment of the present application, each step involved in the method shown in fig. 3 or fig. 5 may be performed by a unit in the application detection apparatus shown in fig. 8. For example, step S301 shown in fig. 3 is performed by the dividing unit 801 shown in fig. 8, step S302 by the calculation unit 802, step S303 by the determination unit 803, and step S304 by the adding unit 804. As another example, steps S501 to S504 shown in fig. 5 are performed by the dividing unit 801 shown in fig. 8, step S505 by the calculation unit 802, step S506 by the determination unit 803, and step S507 by the adding unit 804.
According to another embodiment of the present application, the units in the application detection apparatus shown in fig. 8 may be separately or wholly combined into one or several other units, or one or more of them may be further split into multiple functionally smaller units; either arrangement achieves the same operations without affecting the technical effects of the embodiments of the present application. The units are divided by logical function; in practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by one unit. In other embodiments of the present application, the application detection apparatus may also include other units, and in practical applications these functions may be implemented with the assistance of other units or through the cooperation of multiple units.
According to another embodiment of the present application, the application detection apparatus shown in fig. 8 may be constructed, and the application detection method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the corresponding method shown in fig. 3 or fig. 5 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and executed in the computer device via that medium.
In the embodiments of the present invention, in response to a code detection operation for a target application, the computer device segments a target code block from the program code of the target application, calculates the code similarity between the target code block and at least one reference code block, determines from the at least one reference code block a matching code block that matches the target code block according to the code similarity between the target code block and each reference code block, and adds target mark information to the target code block according to the matching code block. The computer device therefore does not need to obtain the source package or JAR package of an SDK in order to extract the SDK's features; it only needs to cut the program code of the target application into code blocks and analyze the similarity between those code blocks and the reference code blocks. This allows the SDKs integrated in the target application to be detected quickly and can effectively improve the efficiency of application detection.
Based on the above embodiments of the application detection method, an embodiment of the present application further discloses a computer device. Referring to fig. 9, the computer device may include at least a processor 901, an input device 902, an output device 903, and a computer storage medium 904, which may be connected by a bus or in other ways.
The computer storage medium 904 is a memory device in the computer device for storing programs and data. It can be understood that the computer storage medium 904 may include a built-in storage medium of the computer device and may also include an extended storage medium supported by the computer device. The computer storage medium 904 provides storage space that stores the operating system of the computer device. This storage space also stores one or more instructions suitable for being loaded and executed by the processor 901; these instructions may be one or more computer programs (including program code). The computer storage medium here may be a high-speed RAM; optionally, it may be at least one computer storage medium located remotely from the processor 901. The processor 901, which may be referred to as a central processing unit (CPU), is the core and control center of the computer device; it is adapted to implement one or more instructions, and specifically to load and execute the one or more instructions to implement the corresponding method flow or function.
In one possible embodiment, one or more first instructions stored in a computer storage medium may be loaded and executed by the processor 901 to implement the corresponding steps of the method in the above-described embodiment of the application detection method; in particular implementations, one or more first instructions in the computer storage medium are loaded by the processor 901 and perform the following:
in response to a code detection operation for the target application, segmenting a target code block from program code of the target application;
calculating code similarity between the target code block and at least one reference code block, wherein one reference code block corresponds to one software development kit;
determining a matching code block matched with the target code block from the at least one reference code block according to the code similarity between the target code block and each reference code block;
and adding target mark information to the target code block according to the matching code block, wherein the target mark information is used for indicating that the target code block belongs to a software development kit corresponding to the matching code block.
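The four steps above end with picking a matching code block and tagging the target block with that block's SDK. A minimal Python sketch of the matching and tagging part follows. It assumes, beyond what the text states, that the matching code block is the most similar reference block and that a match is accepted only when its similarity reaches a threshold; `ReferenceBlock`, `TaggedBlock`, `tag_target_block`, and the default threshold of 0.8 are hypothetical, and the similarity function is passed in because the text leaves it open at this point.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class ReferenceBlock:
    sdk: str       # SDK that this reference code block corresponds to
    features: Any  # characteristic parameters (representation is an assumption)


@dataclass
class TaggedBlock:
    name: str
    sdk_mark: Optional[str] = None  # target mark information; None if unmatched


def tag_target_block(
    name: str,
    features: Any,
    references: Sequence[ReferenceBlock],
    similarity: Callable[[Any, Any], float],
    threshold: float = 0.8,
) -> TaggedBlock:
    """Pick the most similar reference block and tag the target with its SDK."""
    best: Optional[ReferenceBlock] = None
    best_score = float("-inf")
    for ref in references:
        score = similarity(features, ref.features)
        if score > best_score:
            best, best_score = ref, score
    tagged = TaggedBlock(name)
    # Accept the best candidate only if it is similar enough to count as a match.
    if best is not None and best_score >= threshold:
        tagged.sdk_mark = best.sdk
    return tagged
```

Without a threshold every target block would be attributed to some SDK, including the application's own code, so some acceptance criterion seems necessary in practice.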
In another implementation manner, the processor 901 is specifically configured to:
for any reference code block, acquiring the characteristic parameters of the target code block and the characteristic parameters of that reference code block;
and performing similarity calculation on the characteristic parameters of the target code block and the characteristic parameters of that reference code block to obtain the code similarity between the target code block and that reference code block.
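The two steps above leave the similarity calculation open. As one illustrative choice, not the patent's, the characteristic parameters can be treated as sets of instruction fragments and compared with the Jaccard index (shared fragments divided by all distinct fragments). The function name `jaccard_similarity` and the convention that two empty inputs score 0.0 are assumptions.

```python
from typing import Iterable


def jaccard_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """|A ∩ B| / |A ∪ B| over the distinct instruction fragments of two code blocks."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0  # no features on either side: treat as no evidence of a match
    return len(sa & sb) / len(sa | sb)
```

For example, fragment sets {a, b, c} and {b, c, d} share two of four distinct fragments, giving 0.5. The hash-based variant described in the following implementations avoids storing full fragment sets for every reference block.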
In another implementation, the characteristic parameter is an operation instruction, and the operation instruction includes a plurality of target instruction fragments; the processor 901 is specifically configured to:
obtaining at least one class from the target code block, each class comprising at least one method and at least one variable;
performing instruction extraction processing on each method in each class to obtain a plurality of candidate instruction fragments;
and selecting a plurality of target instruction fragments from the plurality of candidate instruction fragments.
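The extraction steps above (classes, then methods, then instruction fragments) can be sketched as below. The sketch assumes smali-style decompiled output in which each method body is a list of instruction lines such as `invoke-direct {p0}, ...`, and it reduces every instruction to its opcode so that renamed identifiers and reallocated registers do not change the fragment. `DecompiledClass`, `opcode`, and `extract_candidate_fragments` are hypothetical names, and "one fragment per method" is an assumed granularity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DecompiledClass:
    name: str
    methods: Dict[str, List[str]]                    # method name -> instruction lines
    fields: List[str] = field(default_factory=list)  # the class's variables


def opcode(instruction: str) -> str:
    """Keep only the mnemonic; registers and identifiers are dropped because
    obfuscation and register allocation can change them between builds."""
    return instruction.split()[0]


def extract_candidate_fragments(classes: List[DecompiledClass]) -> List[str]:
    fragments: List[str] = []
    for cls in classes:
        for body in cls.methods.values():
            ops = [opcode(line) for line in body if line.strip()]
            if ops:  # methods without instructions yield no fragment
                fragments.append(" ".join(ops))
    return fragments
```
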
In another implementation manner, the processor 901 is specifically configured to:
determining the instruction length of each candidate instruction fragment;
and selecting, from the plurality of candidate instruction fragments, each candidate instruction fragment whose instruction length is greater than a length threshold as a target instruction fragment.
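The length filter above is simple but worth pinning down. This sketch assumes a fragment is a space-separated sequence of opcodes and measures its instruction length as the number of opcodes; the default threshold of 3 and the names `instruction_length` and `select_target_fragments` are hypothetical.

```python
from typing import List


def instruction_length(fragment: str) -> int:
    # Assumes a fragment is a space-separated sequence of opcodes.
    return len(fragment.split())


def select_target_fragments(candidates: List[str], length_threshold: int = 3) -> List[str]:
    """Keep fragments strictly longer than the threshold; very short fragments
    (getters, trivial constructors) occur in unrelated code and add noise."""
    return [f for f in candidates if instruction_length(f) > length_threshold]
```

Note that the comparison is strictly "greater than", matching the wording of the step, so a fragment exactly at the threshold is discarded.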
In another implementation manner, the characteristic parameter is an operation instruction, and the processor 901 is specifically configured to:
performing a hash operation on the operation instruction of the target code block using a target hash algorithm to obtain a target hash value; performing a hash operation on the operation instruction of a given reference code block using the same target hash algorithm to obtain a reference hash value;
and performing a distance operation on the target hash value and the reference hash value to obtain the code similarity between the target code block and that reference code block.
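The distance operation above can be made concrete by assuming 64-bit hash values and using the Hamming distance (the number of differing bits), then mapping it to a similarity in [0, 1]. The 64-bit width, the linear mapping, and the names `hamming_distance` and `hash_similarity` are assumptions; the text does not specify the distance measure.

```python
def hamming_distance(a: int, b: int, bits: int = 64) -> int:
    """Number of differing bits between two bits-wide hash values."""
    mask = (1 << bits) - 1
    return bin((a ^ b) & mask).count("1")


def hash_similarity(target_hash: int, reference_hash: int, bits: int = 64) -> float:
    """Map Hamming distance to [0, 1]: 1.0 means identical hashes."""
    return 1.0 - hamming_distance(target_hash, reference_hash, bits) / bits
```

This only measures code similarity if the hash is locality sensitive, which is why the next implementation names a locality-sensitive algorithm. With a cryptographic hash, changing one instruction flips about half of the bits, so near-identical blocks would look unrelated.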
In another implementation, the operation instruction of the target code block includes a plurality of target instruction fragments, and the target hash algorithm is a locality sensitive hash algorithm; the processor 901 is specifically configured to:
performing hash operation on each target instruction fragment in the plurality of target instruction fragments respectively to obtain a hash value set, wherein the hash value set comprises hash values of the target instruction fragments;
determining the weight value of each hash value according to the occurrence frequency of each hash value in the hash value set;
and performing weighted summation on each hash value by adopting the weight value of each hash value to obtain a target hash value.
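The three steps above match the shape of SimHash, a well-known locality-sensitive hash; the sketch below uses SimHash as an illustrative instance and should not be taken as the patent's exact algorithm. Each fragment is hashed to 64 bits (BLAKE2b with an 8-byte digest is an arbitrary choice), identical hash values are counted, and for every bit position the weights are added when the bit is 1 and subtracted when it is 0; a positive total sets that bit of the target hash value. `fragment_hash`, `simhash`, and the default weight (the occurrence count) are assumptions.

```python
import hashlib
from collections import Counter
from typing import Callable, Iterable

BITS = 64


def fragment_hash(fragment: str) -> int:
    """64-bit hash of one instruction fragment."""
    digest = hashlib.blake2b(fragment.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(fragments: Iterable[str],
            weight: Callable[[int], float] = lambda n: float(n)) -> int:
    # Hash value set, with the number of times each hash value occurs.
    counts = Counter(fragment_hash(f) for f in fragments)
    totals = [0.0] * BITS
    for h, n in counts.items():
        w = weight(n)
        for i in range(BITS):
            # Weighted summation: +w where the bit is 1, -w where it is 0.
            totals[i] += w if (h >> i) & 1 else -w
    result = 0
    for i, total in enumerate(totals):
        if total > 0:
            result |= 1 << i
    return result
```

Because each fragment only nudges the per-bit totals, blocks that share most of their fragments end up with hash values that differ in few bits, which is what makes a Hamming-style distance meaningful.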
In another implementation manner, the processor 901 is specifically configured to:
taking the number of times each hash value appears in the hash value set as the weight value of that hash value; or, alternatively,
determining the repetition degree of each hash value according to the number of times it appears in the hash value set, and taking the weight value corresponding to that repetition degree as the weight value of the hash value.
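The two weighting options above can be compared side by side in a short sketch. The first option uses the occurrence count directly. For the second, the bucket boundaries ("low" for 1 occurrence, "medium" for 2 to 3, "high" for more) and the weights assigned to each degree are invented for illustration; the text does not say how degrees are defined or whether more repetition should raise or lower the weight. `repetition_degree` and `hash_weights` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical repetition-degree buckets: (inclusive upper bound on count, degree).
DEGREE_BOUNDS = [(1, "low"), (3, "medium")]
DEGREE_WEIGHTS = {"low": 1.0, "medium": 2.0, "high": 3.0}


def repetition_degree(count: int) -> str:
    for bound, degree in DEGREE_BOUNDS:
        if count <= bound:
            return degree
    return "high"


def hash_weights(hash_values: Iterable[int], use_degree: bool = False) -> Dict[int, float]:
    """Option 1: weight = occurrence count. Option 2: weight looked up by degree."""
    counts = Counter(hash_values)
    return {
        h: DEGREE_WEIGHTS[repetition_degree(n)] if use_degree else float(n)
        for h, n in counts.items()
    }
```

Bucketing by degree caps the influence of a fragment that repeats many times (for example, boilerplate emitted by a code generator), whereas the raw count lets such a fragment dominate the weighted sum.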
In another implementation manner, the processor 901 is specifically configured to:
in response to a code detection operation for a target application, acquiring a code file of the target application, wherein the code file comprises a program code of the target application;
decompiling the code file to obtain a hierarchical structure of the program code;
performing code cutting on the program code of the target application according to the hierarchical structure to obtain at least one code block;
and selecting a target code block from the at least one code block.
In yet another implementation, the at least one reference code block is stored in a software development kit feature library; the processor 901 is further configured to:
acquiring program codes of at least two reference applications, and dividing the program codes of the at least two reference applications into a plurality of reference code blocks;
performing cluster analysis on the plurality of reference code blocks, and performing grouping processing on the plurality of reference code blocks according to an analysis result; the reference code blocks in one group belong to the same software development kit;
acquiring mark information of each reference code block according to the group information of that reference code block, where the mark information of any reference code block is generated by performing joint feature analysis on the reference code blocks in its group to determine the software development kit to which it belongs;
and adding each reference code block and its corresponding mark information to the software development kit feature library.
In the embodiments of the present invention, in response to a code detection operation for a target application, the computer device segments a target code block from the program code of the target application, calculates the code similarity between the target code block and at least one reference code block, determines from the at least one reference code block a matching code block that matches the target code block according to the code similarity between the target code block and each reference code block, and adds target mark information to the target code block according to the matching code block. The computer device therefore does not need to obtain the source package or JAR package of an SDK in order to extract the SDK's features; it only needs to cut the program code of the target application into code blocks and analyze the similarity between those code blocks and the reference code blocks. This allows the SDKs integrated in the target application to be detected quickly and can effectively improve the efficiency of application detection.
It should be noted that the present application also provides a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the steps executed in fig. 3 or fig. 5 of the above-mentioned embodiment of the application detection method.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An application detection method, comprising:
in response to a code detection operation for the target application, segmenting a target code block from program code of the target application;
calculating code similarity between the target code block and at least one reference code block, wherein one reference code block corresponds to one software development kit;
determining a matching code block matched with the target code block from the at least one reference code block according to the code similarity between the target code block and each reference code block;
and adding target mark information to the target code block according to the matching code block, wherein the target mark information is used for indicating that the target code block belongs to a software development kit corresponding to the matching code block.
2. The method of claim 1, wherein the calculating the code similarity between the target code block and at least one reference code block comprises:
for any reference code block, acquiring the characteristic parameters of the target code block and the characteristic parameters of that reference code block;
and performing similarity calculation on the characteristic parameters of the target code block and the characteristic parameters of that reference code block to obtain the code similarity between the target code block and that reference code block.
3. The method of claim 2, wherein the feature parameter is an operation instruction, the operation instruction including a plurality of target instruction fragments; the obtaining of the characteristic parameters of the target code block includes:
obtaining at least one class from the target code block, each class comprising at least one method and at least one variable;
performing instruction extraction processing on each method in each class to obtain a plurality of candidate instruction fragments;
and selecting a plurality of target instruction fragments from the plurality of candidate instruction fragments.
4. The method of claim 3, wherein said selecting a plurality of target instruction fragments from said plurality of candidate instruction fragments comprises:
determining the instruction length of each candidate instruction fragment;
and selecting, from the plurality of candidate instruction fragments, each candidate instruction fragment whose instruction length is greater than a length threshold as a target instruction fragment.
5. The method according to any one of claims 2 to 4, wherein the characteristic parameter is an operation instruction, and the performing a characteristic similarity calculation on the characteristic parameter of the target code block and the characteristic parameter of any one of the reference code blocks to obtain a code similarity between the target code block and any one of the reference code blocks comprises:
performing a hash operation on the operation instruction of the target code block using a target hash algorithm to obtain a target hash value; performing a hash operation on the operation instruction of a given reference code block using the same target hash algorithm to obtain a reference hash value;
and performing a distance operation on the target hash value and the reference hash value to obtain the code similarity between the target code block and that reference code block.
6. The method of claim 5, wherein the operation instruction of the target code block comprises a plurality of target instruction fragments, the target hash algorithm is a locality sensitive hash algorithm; the performing hash operation on the operation instruction of the target code block by using a target hash algorithm to obtain a target hash value includes:
performing hash operation on each target instruction fragment in the plurality of target instruction fragments respectively to obtain a hash value set, wherein the hash value set comprises hash values of the target instruction fragments;
determining the weight value of each hash value according to the occurrence frequency of each hash value in the hash value set;
and performing weighted summation on each hash value by adopting the weight value of each hash value to obtain a target hash value.
7. The method of claim 6, wherein determining the weight value for each hash value based on the number of times that the respective hash value appears in the set of hash values comprises:
taking the number of times each hash value appears in the hash value set as the weight value of that hash value; or, alternatively,
determining the repetition degree of each hash value according to the number of times it appears in the hash value set, and taking the weight value corresponding to that repetition degree as the weight value of the hash value.
8. The method of claim 1, wherein the segmenting a target code block from the program code of the target application in response to the code detection operation for the target application comprises:
in response to a code detection operation for a target application, acquiring a code file of the target application, wherein the code file comprises a program code of the target application;
decompiling the code file to obtain a hierarchical structure of the program code;
performing code cutting on the program code of the target application according to the hierarchical structure to obtain at least one code block;
and selecting a target code block from the at least one code block.
9. The method of claim 1, wherein the at least one reference code block is stored in a software development kit feature library; the method further comprises the following steps:
acquiring program codes of at least two reference applications, and dividing the program codes of the at least two reference applications into a plurality of reference code blocks;
performing cluster analysis on the plurality of reference code blocks, and performing grouping processing on the plurality of reference code blocks according to an analysis result; the reference code blocks in one group belong to the same software development kit;
acquiring mark information of each reference code block according to the group information of that reference code block, where the mark information of any reference code block is generated by performing joint feature analysis on the reference code blocks in its group to determine the software development kit to which it belongs;
and adding each reference code block and its corresponding mark information to the software development kit feature library.
10. An application detection apparatus, comprising:
a dividing unit configured to divide a target code block from a program code of a target application in response to a code detection operation for the target application;
a computing unit configured to compute the code similarity between the target code block and at least one reference code block, wherein one reference code block corresponds to one software development kit;
a determining unit, configured to determine, according to a code similarity between the target code block and each reference code block, a matching code block that matches the target code block from among the at least one reference code block;
and an adding unit configured to add target mark information to the target code block according to the matching code block, wherein the target mark information is used for indicating that the target code block belongs to the software development kit corresponding to the matching code block.
CN202011171532.3A 2020-10-28 2020-10-28 Application detection method and device, computer equipment and readable storage medium Pending CN112148305A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011171532.3A CN112148305A (en) 2020-10-28 2020-10-28 Application detection method and device, computer equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112148305A (en) 2020-12-29

Family

ID=73953484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011171532.3A Pending CN112148305A (en) 2020-10-28 2020-10-28 Application detection method and device, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112148305A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160100887A (en) * 2016-08-12 2016-08-24 충남대학교산학협력단 Method for detecting malware by code block comparison
US9471285B1 (en) * 2015-07-09 2016-10-18 Synopsys, Inc. Identifying software components in a software codebase
CN106803040A (en) * 2017-01-18 2017-06-06 腾讯科技(深圳)有限公司 Virus signature processing method and processing device
US20170242671A1 (en) * 2016-02-18 2017-08-24 Qualcomm Innovation Center, Inc. Semantically sensitive code region hash calculation for programming languages
US10048945B1 (en) * 2017-05-25 2018-08-14 Devfactory Fz-Llc Library suggestion engine
CN109710299A (en) * 2018-12-14 2019-05-03 平安普惠企业管理有限公司 Open-source class library monitoring method, device, equipment and computer storage medium
CN110175045A (en) * 2019-05-20 2019-08-27 北京邮电大学 Android application repackaging data processing method and device
CN111190603A (en) * 2019-12-18 2020-05-22 腾讯科技(深圳)有限公司 Private data detection method and device and computer readable storage medium
CN111338622A (en) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Supply chain code identification method, device, server and readable storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112685080A (en) * 2021-01-08 2021-04-20 深圳开源互联网安全技术有限公司 Open source component duplicate checking method, system, device and readable storage medium
CN112685080B (en) * 2021-01-08 2023-08-11 深圳开源互联网安全技术有限公司 Open source component duplicate checking method, system, device and readable storage medium
CN112732581A (en) * 2021-01-12 2021-04-30 京东数字科技控股股份有限公司 SDK detection method, device, electronic equipment, system and storage medium
CN112732581B (en) * 2021-01-12 2023-03-10 京东科技控股股份有限公司 SDK detection method, device, electronic equipment, system and storage medium
CN113805892A (en) * 2021-09-17 2021-12-17 杭州云深科技有限公司 Abnormal APK (android Package) identification method, electronic equipment and readable storage medium
CN113805892B (en) * 2021-09-17 2024-04-05 杭州云深科技有限公司 Abnormal APK identification method, electronic equipment and readable storage medium
CN114416600A (en) * 2022-03-29 2022-04-29 腾讯科技(深圳)有限公司 Application detection method and device, computer equipment and storage medium
CN114416600B (en) * 2022-03-29 2022-06-28 腾讯科技(深圳)有限公司 Application detection method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination