CN105550540A - Detection method and device for homogenization application - Google Patents

Info

Publication number
CN105550540A
Authority
CN
China
Prior art keywords
similarity
application
source
control flow
intended application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410607315.2A
Other languages
Chinese (zh)
Inventor
李青
潘伟
宋文才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Jiangsu Co Ltd
Original Assignee
China Mobile Group Jiangsu Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Jiangsu Co Ltd filed Critical China Mobile Group Jiangsu Co Ltd
Priority to CN201410607315.2A priority Critical patent/CN105550540A/en
Publication of CN105550540A publication Critical patent/CN105550540A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a method and a device for detecting homogeneous applications. The method comprises: performing code decompilation on a source application and a target application respectively, to obtain source application information and target application information; parsing the source application information and the target application information to obtain a component similarity, a class layout similarity, a code control-flow similarity and a text similarity; and determining the similarity between the source application and the target application according to the component similarity, the class layout similarity, the code control-flow similarity and the text similarity.

Description

A method and device for detecting homogeneous applications
Technical Field
The present invention relates to application detection technology in data services, and in particular to a method and device for detecting homogeneous applications.
Background
With the rapid development of electronic information technology, applications, and in particular applications for the Android system, have emerged rapidly. The Android system is used on smartphones, tablet computers, TV set-top boxes and other portable embedded electronic devices.
The success of the Android system is due in large part to the openness of the system and to a development kit that is easy to learn. However, this same openness and technical transparency have led to a large number of homogeneous applications on the market; here, homogeneous means highly similar. Homogeneous applications generally fall into two types. The first type is the pirated application: a pirate developer reverse-engineers a source application, makes minor modifications to it or replaces its internal resource files, and releases the modified application to the market as its own in order to profit from it. The second type is the clone application: the developer replaces and modifies its own source code and resource files and publishes one application repeatedly as several applications in an application market, thereby evading the market's subsequent supervision and review of any single application.
For application distribution platforms, a large number of homogeneous applications lowers the quality of the products on the platform. In the long run, such behavior seriously disrupts order in the Android application market and affects the healthy development of the whole industry chain.
At present, application distribution platforms can only audit copyright documents on paper and test applications manually. Such testing is highly subjective, time-consuming, laborious and inaccurate. Although some methods for detecting homogeneous applications exist, each addresses a single scenario and cannot effectively detect pirated applications and clone applications at the same time.
Summary of the Invention
In view of this, embodiments of the present invention provide a method and device for detecting homogeneous applications that can accurately determine the similarity between a source application and a target application, and can therefore effectively detect both pirated applications and clone applications.
To achieve the above object, the technical solution of the embodiments of the present invention is implemented as follows:
An embodiment of the present invention provides a method for detecting homogeneous applications, the method comprising:
performing code decompilation on a source application and a target application respectively, to obtain source application information and target application information;
parsing the source application information and the target application information to obtain a component similarity, a class layout similarity, a code control-flow similarity and a text similarity;
determining the similarity between the source application and the target application according to the component similarity, the class layout similarity, the code control-flow similarity and the text similarity.
In the above solution, the source application information comprises a source configuration file and a source program package;
correspondingly, the target application information comprises a target configuration file and a target program package.
In the above solution, parsing the source application information and the target application information to obtain the component similarity, the class layout similarity, the code control-flow similarity and the text similarity comprises:
parsing the source configuration file and the target configuration file to obtain the component similarity;
parsing the source program package and the target program package to obtain the class layout similarity, the code control-flow similarity and the text similarity.
In the above solution, parsing the source configuration file and the target configuration file to obtain the component similarity comprises:
converting the source configuration file and the target configuration file into corresponding two-dimensional arrays respectively;
computing the similarity of the two converted two-dimensional arrays as the component similarity.
In the above solution, parsing the source program package and the target program package to obtain the class layout similarity comprises:
parsing the source program package and the target program package respectively, to obtain the class layout tree sequence of the source application and the class layout tree sequence of the target application;
computing the path information of all nodes in the class layout tree sequence of the source application and in the class layout tree sequence of the target application;
determining the class layout similarity according to the path information.
In the above solution, parsing the source program package and the target program package to obtain the code control-flow similarity comprises:
parsing the source program package and the target program package respectively, to build the control-flow directed graph of the source application and the control-flow directed graph of the target application;
determining the code control-flow similarity according to the control-flow directed graph of the source application and the control-flow directed graph of the target application.
In the above solution, parsing the source program package and the target program package to obtain the text similarity comprises:
parsing the source program package and the target program package respectively, to build the set of all keyword vectors of the source application and the set of all keyword vectors of the target application;
comparing the set of all keyword vectors of the source application with the set of all keyword vectors of the target application, to obtain the text similarity.
In the above solution, determining the similarity between the source application and the target application according to the component similarity, the class layout similarity, the code control-flow similarity and the text similarity comprises:
computing a weighted average of the component similarity, the class layout similarity, the code control-flow similarity and the text similarity, to determine the similarity between the source application and the target application.
An embodiment of the present invention further provides a device for detecting homogeneous applications, the device comprising a decompilation module, a parsing module and a determination module;
the decompilation module is configured to perform code decompilation on a source application and a target application respectively, to obtain source application information and target application information;
the parsing module is configured to parse the source application information and the target application information, to obtain a component similarity, a class layout similarity, a code control-flow similarity and a text similarity;
the determination module is configured to determine the similarity between the source application and the target application according to the component similarity, the class layout similarity, the code control-flow similarity and the text similarity.
In the above solution, the source application information comprises a source configuration file and a source program package;
correspondingly, the target application information comprises a target configuration file and a target program package.
In the above solution, the parsing module comprises a first parsing submodule and a second parsing submodule; wherein
the first parsing submodule is configured to parse the source configuration file and the target configuration file, to obtain the component similarity;
the second parsing submodule is configured to parse the source program package and the target program package, to obtain the class layout similarity, the code control-flow similarity and the text similarity.
In the above solution, the first parsing submodule 621 comprises a conversion unit and a computing unit; wherein
the conversion unit is configured to convert the source configuration file and the target configuration file into corresponding two-dimensional arrays respectively;
the computing unit is configured to compute the similarity of the two converted two-dimensional arrays as the component similarity.
In the above solution, the second parsing submodule comprises a parsing unit and a determining unit; wherein
the parsing unit is configured to parse the source program package and the target program package respectively, to obtain the class layout tree sequence of the source application and the class layout tree sequence of the target application;
the determining unit is configured to compute the path information of all nodes in the class layout tree sequence of the source application and in the class layout tree sequence of the target application, and to determine the class layout similarity according to the path information.
In the above solution, the parsing unit is further configured to parse the source program package and the target program package respectively, to build the control-flow directed graph of the source application and the control-flow directed graph of the target application;
the determining unit is further configured to determine the code control-flow similarity according to the control-flow directed graph of the source application and the control-flow directed graph of the target application.
In the above solution, the parsing unit is further configured to parse the source program package and the target program package respectively, to build the set of all keyword vectors of the source application and the set of all keyword vectors of the target application;
the determining unit is further configured to compare the set of all keyword vectors of the source application with the set of all keyword vectors of the target application, to obtain the text similarity.
In the above solution, the determination module is configured to compute a weighted average of the component similarity, the class layout similarity, the code control-flow similarity and the text similarity, to determine the similarity between the source application and the target application.
The method and device for detecting homogeneous applications provided by the embodiments of the present invention perform code decompilation on a source application and a target application respectively, to obtain source application information and target application information; parse the source application information and the target application information to obtain a component similarity, a class layout similarity, a code control-flow similarity and a text similarity; and determine the similarity between the source application and the target application according to these four similarities. In this way, the similarity between the source application and the target application can be determined accurately, so that both pirated applications and clone applications are detected effectively.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of the method for detecting homogeneous applications according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of parsing the source application information and the target application information according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of parsing the source configuration file and the target configuration file according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of parsing the source program package and the target program package according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of the method for detecting homogeneous applications in an application example of the present invention;
Fig. 6 is a schematic structural diagram of the device for detecting homogeneous applications according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the parsing module according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of the first parsing submodule according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of the second parsing submodule according to an embodiment of the present invention.
Detailed Description
In the related art, after the executable file (apk file) of an Android application is decompiled with a reverse-engineering tool such as APKTools, application information is obtained. The application information comprises a resource package, a configuration file, a program package (i.e., Smali code) and a dynamic link library package. The resource package contains files such as the images, layouts and strings that an Android application needs at run time; the configuration file holds the permission settings and some basic program information of the Android application; the Smali code is essentially the decompiled code, but its readability is poor, and if the original developer used code obfuscation at compile time, the Smali code is essentially unreadable; the dynamic link libraries are external functions that the program calls at run time.
Typically, a pirate developer replaces the images, strings and similar items in the resource package of the above application information. A more capable developer also modifies the configuration file and the Smali code to bypass certain verification functions and injects some code of its own, so that the new program gives a different user experience from the original. A clone developer, having the source code, can modify the application by replacing the resource package, globally searching and replacing function names and class names in the source code, or even inserting some empty functions. For cost reasons, however, neither kind of developer rewrites the architecture of the source application, so the essential characteristics of the source code remain unchanged.
On this basis, in the embodiments of the present invention, code decompilation is performed on a source application and a target application respectively, to obtain source application information and target application information; the source application information and the target application information are parsed to obtain a component similarity, a class layout similarity, a code control-flow similarity and a text similarity; and the similarity between the source application and the target application is determined according to these four similarities.
Here, the target application is the application to be checked for homogeneity.
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a schematic flowchart of the method for detecting homogeneous applications according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S11: perform code decompilation on the source application and the target application respectively, to obtain source application information and target application information;
Specifically, the source application and the target application are each decompiled with a reverse-engineering tool such as APKTools, to obtain the source application information and the target application information.
Here, the source application information comprises a source configuration file and a source program package; correspondingly, the target application information comprises a target configuration file and a target program package.
It should be noted that, after an application (either the source application or the target application) is decompiled in step S11, the application information obtained may include not only the configuration file and the program package but also the resource package and the dynamic link library package. However, given which factors matter for homogeneity detection, the resource package and the dynamic link library package may be ignored; alternatively, they may be kept as an aid for subsequent manual inspection.
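Step S11 can be sketched as follows. This is a minimal illustration, assuming that the tool the text calls APKTools is the open-source Apktool command-line program and that it is installed on the PATH; the helper names `decompile_command` and `decompile` are hypothetical and do not come from the patent.

```python
import subprocess
from pathlib import Path

def decompile_command(apk_path, out_dir):
    """Build the Apktool command line that decodes one APK into out_dir."""
    # "d" decodes the APK, "-o" sets the output directory,
    # "-f" overwrites the output directory if it already exists.
    return ["apktool", "d", str(apk_path), "-o", str(out_dir), "-f"]

def decompile(apk_path, out_dir):
    """Run Apktool and return the two inputs the later steps use."""
    subprocess.run(decompile_command(apk_path, out_dir), check=True)
    out = Path(out_dir)
    # Configuration file and program package (Smali code); the resource
    # and native-library directories are left on disk for manual review.
    return out / "AndroidManifest.xml", out / "smali"
```

Calling `decompile` once for the source application and once for the target application yields the configuration file and program package paths that the later steps consume.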
Step S12: parse the source application information and the target application information, to obtain a component similarity, a class layout similarity, a code control-flow similarity and a text similarity;
Specifically, as shown in Fig. 2, parsing the source application information and the target application information to obtain the component similarity, the class layout similarity, the code control-flow similarity and the text similarity comprises:
Step S121: parse the source configuration file and the target configuration file, to obtain the component similarity;
Step S122: parse the source program package and the target program package, to obtain the class layout similarity, the code control-flow similarity and the text similarity.
As shown in Fig. 3, parsing the source configuration file and the target configuration file in step S121 to obtain the component similarity comprises:
Step S1211: convert the source configuration file and the target configuration file into corresponding two-dimensional arrays respectively;
Step S1212: compute the similarity of the two converted two-dimensional arrays as the component similarity.
Here, the similarity of the two converted two-dimensional arrays may be computed with a cosine similarity algorithm, or by counting statistics.
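The cosine option can be sketched as below. This is only an illustration: the text does not say how a two-dimensional array is reduced to a vector, so the sketch assumes each configuration file has already been summarized as a count vector over the same ordered feature list. The function name and the feature order are assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty vector matches nothing
    return dot / (norm_a * norm_b)

# Hypothetical feature order:
# activities, services, providers, receivers, permissions
source_counts = [12, 3, 1, 2, 8]
target_counts = [12, 3, 1, 2, 9]
score = cosine_similarity(source_counts, target_counts)
```

A score close to 1 indicates that the two configuration files declare components in nearly the same proportions.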
As shown in Fig. 4, parsing the source program package and the target program package in step S122 to obtain the class layout similarity comprises:
Steps S1221 to S1223: parse the source program package and the target program package respectively, to obtain the Java class layout tree sequence of the source application and the Java class layout tree sequence of the target application; compute the path information of all nodes in the Java class layout tree sequence of the source application and in that of the target application; and determine the class layout similarity according to the path information.
Here, the class layout similarity may be determined from the path information with a tree path similarity matrix algorithm, or with another general tree-structure similarity algorithm, such as an edit distance algorithm, that traverses the path information of all nodes.
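The idea of comparing node paths can be illustrated with a deliberately simple stand-in. The sketch below enumerates every root-to-node path of the package/class tree and compares the two path sets with the Jaccard index; this is not the tree path similarity matrix algorithm named above, whose details the text does not give, and the function names are hypothetical.

```python
def node_paths(class_names):
    """Return every root-to-node path in the package/class tree."""
    paths = set()
    for name in class_names:
        parts = name.split(".")
        # Each prefix of a qualified name is one node of the tree.
        for depth in range(1, len(parts) + 1):
            paths.add(tuple(parts[:depth]))
    return paths

def path_similarity(source_classes, target_classes):
    """Jaccard index of the two sets of node paths, in [0, 1]."""
    src = node_paths(source_classes)
    tgt = node_paths(target_classes)
    if not src and not tgt:
        return 1.0
    return len(src & tgt) / len(src | tgt)

source = ["com.demo.ui.Main", "com.demo.net.Api"]
target = ["com.demo.ui.Main", "com.demo.net.Client"]
layout_score = path_similarity(source, target)
```

Because this stand-in matches paths by name, it is sensitive to renamed packages and classes; a structural tree comparison, such as the edit distance option mentioned above, would be less so.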
Parsing the source program package and the target program package in step S122 to obtain the code control-flow similarity comprises:
Steps S1224 to S1225: parse the source program package and the target program package respectively, to build the control-flow directed graph of the source application and the control-flow directed graph of the target application; and determine the code control-flow similarity according to the two control-flow directed graphs.
Parsing the source program package and the target program package in step S122 to obtain the text similarity comprises:
Steps S1226 to S1227: parse the source program package and the target program package respectively, to build the set of all keyword vectors of the source application and the set of all keyword vectors of the target application; and compare the two sets of keyword vectors, to obtain the text similarity.
Here, it should be noted that within step S122 the three groups of steps S1221 to S1223, S1224 to S1225 and S1226 to S1227 may be performed in any order.
Step S13: determine the similarity between the source application and the target application according to the component similarity, the class layout similarity, the code control-flow similarity and the text similarity.
Specifically, a weighted average of the component similarity, the class layout similarity, the code control-flow similarity and the text similarity is computed, to determine the similarity between the source application and the target application.
It should be noted that, in practice, the class layout similarity and the code control-flow similarity are usually given slightly higher weights.
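The weighted average of step S13 can be sketched as follows. The weight values are purely illustrative assumptions (the text only says that class layout and control flow usually weigh slightly more), and the dictionary keys and function name are hypothetical.

```python
def overall_similarity(scores, weights):
    """Weighted average of the four per-dimension scores, each in [0, 1]."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same dimensions")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[k] * weights[k] for k in scores) / total

# Illustrative weights only: class layout and control flow slightly higher.
weights = {"component": 0.2, "class_layout": 0.3, "control_flow": 0.3, "text": 0.2}
scores = {"component": 0.9, "class_layout": 0.8, "control_flow": 0.7, "text": 0.6}
result = overall_similarity(scores, weights)
```

Dividing by the sum of the weights keeps the result in the range 0 to 1 even when the weights are not normalized.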
In this way, the method for detecting homogeneous applications according to the embodiment of the present invention can accurately determine the similarity between the source application and the target application, so that both pirated applications and clone applications are detected effectively.
To aid understanding of the embodiments of the present invention, the component similarity, the class layout similarity, the code control-flow similarity and the text similarity are explained in turn below.
1. Component similarity
In the Android system, the units that an application executes are the components predefined by the system. Component similarity detection parses the configuration file in the installation package to obtain each component declaration of the program together with its associated parameters and resource information, and then compares the components. More precisely, the Android system provides developers with prepackaged classes that serve as the basic components of an application. Specifically, an Android application comprises four basic components: Activity, Service, ContentProvider and BroadcastReceiver. Between components, the Android system provides the Intent as the internal message-passing mechanism. Every Android application is built by inheriting from, calling and extending these four basic components.
In addition, the Android system requires the developer to declare, in a configuration file (AndroidManifest.xml), the components it defines and the Intents used for communication between components, and to declare in the same file the system permissions that the whole application requires.
Therefore, by parsing the decompiled AndroidManifest.xml, one can learn how many custom Activity, Service, ContentProvider and BroadcastReceiver components an application has and which Intents these components respond to, and thus obtain the component architecture of the application.
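Extracting the component architecture from a decoded manifest can be sketched with Python's standard XML parser. The sketch assumes the manifest has already been decoded to plain-text XML by the decompilation step; the function name, the returned structure and the sample manifest are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
# Manifest tags for Activity, Service, ContentProvider, BroadcastReceiver.
COMPONENT_TAGS = ("activity", "service", "provider", "receiver")

def parse_components(manifest_xml):
    """Map each component tag to {component name: [intent action names]}."""
    root = ET.fromstring(manifest_xml)
    result = {tag: {} for tag in COMPONENT_TAGS}
    application = root.find("application")
    if application is None:
        return result
    for tag in COMPONENT_TAGS:
        for node in application.findall(tag):
            name = node.get(ANDROID_NS + "name", "")
            actions = [
                action.get(ANDROID_NS + "name", "")
                for action in node.findall("intent-filter/action")
            ]
            result[tag][name] = actions
    return result

SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.demo">
  <application>
    <activity android:name=".Main">
      <intent-filter>
        <action android:name="android.intent.action.MAIN"/>
      </intent-filter>
    </activity>
    <service android:name=".Sync"/>
  </application>
</manifest>"""
components = parse_components(SAMPLE)
```

The nested mapping corresponds to the idea of one array per component type with the Intents of each component as the second dimension.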
2. Class layout similarity
Android development is based on embedded programming (including Linux and Java). The class layout of an Android program can therefore be abstracted into two tree relations: one represents the relation between classes and packages and is called the path tree; the other represents inheritance between classes and the implementation of interfaces and is called the inheritance tree.
In Java syntax, every class definition must specify the package in which the class resides and each directory level under that package. The path tree represents this relation among package, directory path and class. The root node of the tree represents the root directory of the program (i.e., the top-level package), branch nodes represent subdirectories, and leaf nodes are concrete classes.
From the decompiled Smali code, the overall class layout trees of the software can be derived; the class layout similarity is then obtained by counting and comparing, for every tree among the path trees and inheritance trees, the number of levels and the relations and numbers of intermediate nodes and leaf nodes.
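Deriving the two trees from Smali text can be sketched as below. The sketch relies on the `.class` and `.super` directives that open a Smali file; interface implementation (`.implements`) is left out for brevity, and the function names and data structures are illustrative assumptions.

```python
import re

# ".class public final Lcom/demo/ui/Main;" -> "com/demo/ui/Main"
CLASS_RE = re.compile(r"^\.class\s+(?:[\w-]+\s+)*L([^;]+);", re.MULTILINE)
SUPER_RE = re.compile(r"^\.super\s+L([^;]+);", re.MULTILINE)

def class_and_parent(smali_text):
    """Return (class path, superclass path or None) for one Smali file."""
    cls = CLASS_RE.search(smali_text)
    if cls is None:
        raise ValueError("no .class directive found")
    sup = SUPER_RE.search(smali_text)
    parent = sup.group(1) if sup else None
    return cls.group(1), parent

def build_trees(smali_files):
    """Build the path tree (nested dicts) and the inheritance map."""
    path_tree = {}    # package -> sub-package -> ... -> class
    inheritance = {}  # superclass -> list of direct subclasses
    for text in smali_files:
        cls, parent = class_and_parent(text)
        node = path_tree
        for part in cls.split("/"):
            node = node.setdefault(part, {})
        if parent is not None:
            inheritance.setdefault(parent, []).append(cls)
    return path_tree, inheritance

files = [
    ".class public Lcom/demo/ui/Main;\n.super Landroid/app/Activity;\n",
    ".class Lcom/demo/net/Api;\n.super Ljava/lang/Object;\n",
]
path_tree, inheritance = build_trees(files)
```

Level counts and node counts for the comparison can then be read off the nested dictionaries.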
3. Code control-flow similarity
In the Android system, an application contains anywhere from a few dozen to tens of thousands of custom functions, and these functions have complex call relations. Abstracting the call relations of the functions yields a directed graph, in which each node represents a function and each directed edge indicates that a call relation exists between the two nodes (functions) it connects. For convenience in the following description, such a directed graph is referred to as the code control flow; comparing the directed graphs of two applications gives the code control-flow similarity.
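Building the call graph and comparing two graphs can be sketched as follows. The text does not specify a graph comparison algorithm, so the degree-based comparison here is an assumption chosen because it ignores renamed functions; it is a coarse stand-in rather than the patent's method. The Smali parsing recognizes only `.method`, `.end method` and `invoke-` lines, and all function names are hypothetical.

```python
from collections import Counter

def call_graph(class_name, smali_text):
    """Return the set of (caller, callee) edges found in one Smali class."""
    edges = set()
    current = None
    for raw in smali_text.splitlines():
        line = raw.strip()
        if line.startswith(".method"):
            # The last token is the method name and signature.
            current = class_name + "->" + line.split()[-1]
        elif line.startswith(".end method"):
            current = None
        elif line.startswith("invoke-") and current is not None:
            # The last token is the fully qualified callee.
            edges.add((current, line.split()[-1]))
    return edges

def degree_signature(edges):
    """Multiset of (in-degree, out-degree) pairs, independent of names."""
    out_deg, in_deg = Counter(), Counter()
    for caller, callee in edges:
        out_deg[caller] += 1
        in_deg[callee] += 1
    nodes = set(out_deg) | set(in_deg)
    return Counter((in_deg[n], out_deg[n]) for n in nodes)

def control_flow_similarity(edges_a, edges_b):
    """Share of nodes whose degree pair can be matched, in [0, 1]."""
    sig_a, sig_b = degree_signature(edges_a), degree_signature(edges_b)
    total = max(sum(sig_a.values()), sum(sig_b.values()))
    if total == 0:
        return 1.0
    return sum((sig_a & sig_b).values()) / total
```

Two graphs that differ only in function names receive a score of 1, which matches the observation that renaming does not change the call structure.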
4. Text similarity
In the Android system, the decompiled Smali code is still essentially a text document. Applying text similarity comparison at the code level, with statistics gathered on the functions defined by the developer of the source code, is a quite effective aid in judging the similarity between the source application and the target application.
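One simple way to treat Smali code as text is a bag-of-words comparison, sketched below. The tokenization rule and the use of cosine similarity are assumptions; the text does not define how keyword vectors are built or compared.

```python
import math
import re
from collections import Counter

def keyword_vector(smali_text):
    """Count identifier-like tokens in decompiled Smali text."""
    return Counter(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", smali_text))

def text_similarity(source_text, target_text):
    """Cosine similarity of the two token-count vectors, in [0, 1]."""
    a = keyword_vector(source_text)
    b = keyword_vector(target_text)
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

In practice the token list would probably be restricted to developer-defined function names, as the paragraph above suggests, so that framework identifiers shared by every application do not inflate the score.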
The method for detecting homogeneous applications according to the embodiment of the present invention is described in detail below with reference to the implementation flow of an application example. In this example, suppose that an Android application platform receives a submitted application A, i.e., target application A, and that an application B already exists on the platform, i.e., source application B.
Fig. 5 is a schematic flowchart of the method for detecting homogeneous applications in an application example of the present invention. As shown in Fig. 5, the method in this application example comprises:
Step S21: perform code decompilation on target application A, to obtain target application information;
Specifically, application A is decompiled with the reverse-engineering tool APKTools, to obtain a resource package (Resource_A.zip), a configuration file (AndroidManifest_A.xml), a program package, i.e., the Smali code package (Smali_A.zip), and dynamic link library files (3rdDll_A.zip).
Here, it should be noted that, given which factors matter for homogeneity detection, Resource_A.zip and 3rdDll_A.zip are ignored; they are only stored as an aid for subsequent manual inspection and do not serve as a basis for the analysis in the embodiment of the present invention.
Step S22: perform code decompilation on source application B, to obtain source application information;
Specifically, application B is decompiled with the reverse-engineering tool APKTools, to obtain a resource package (Resource_B.zip), a configuration file (AndroidManifest_B.xml), a program package, i.e., the Smali code package (Smali_B.zip), and dynamic link library files (3rdDll_B.zip); likewise, Resource_B.zip and 3rdDll_B.zip are ignored.
It should be noted that, in the embodiment of the present invention, steps S21 and S22 may be performed in any order.
Step S23: parse the information of source application B and the information of target application A, to obtain the component similarity;
Specifically, step S23 comprises:
Step S23a: read AndroidManifest_A.xml and AndroidManifest_B.xml and, following the hierarchical structure of the two files, parse level by level the details of the four basic components Activity, Service, ContentProvider and BroadcastReceiver;
Step S23b: convert AndroidManifest_A.xml and AndroidManifest_B.xml into corresponding two-dimensional arrays respectively, one dimension of which is the Intent;
Specifically, read the content between the first <activity ...> tag at the third level of AndroidManifest_A.xml and its closing </activity> tag, record it as Activity_A[0], and read the Intent information of this Activity between <intent-filter> and </intent-filter> at the fourth level. An Activity may usually have several Intents, recorded as Activity_A[0]{Intent[M]}, where M is the number of Intents;
Continue reading the other Activities in AndroidManifest_A.xml and the Intents they contain, to obtain the Activity array Activity_A[N1]{Intent[]};
Similarly, read the information of all Service, ContentProvider and BroadcastReceiver components in AndroidManifest_A.xml, to obtain Service_A[N1]{Intent[]}, Content_A[N1]{Intent[]} and Provider_A[N1]{Intent[]} respectively;
Read the permission information contained in all <uses-permission ... /> elements at the second level, recorded as use-permission_A[N1];
If other second-level keywords are present, namely <permission ... />, <permission-tree ... />, <permission-group ... />, <instrumentation ... />, <uses-sdk ... />, <uses-configuration ... />, <uses-feature ... /> and <supports-screens ... />, read and record their details respectively;
In the same way as AndroidManifest_A.xml of application A was processed in the preceding steps, process AndroidManifest_B.xml of application B, to obtain Activity_B[N2]{Intent[]}, Service_B[N2]{Intent[]}, Content_B[N2]{Intent[]}, Provider_B[N2]{Intent[]}, use-permission_B[N2] and the records of the other keywords of application B.
Step S23c: calculate the similarity of the two two-dimensional arrays by quantity statistics, and take it as the assembly similarity.
Specifically, count the number of Activities in Activity_A[N1] whose Intent count is 0 to M1 respectively, where M1 is the maximum possible Intent count, to obtain ActivitySum_A[M1]; likewise, count the number of Activities in Activity_B[N2] whose Intent count is 0 to M2 respectively, to obtain ActivitySum_B[M2]. If M1 ≤ M2, check whether ActivitySum_A[0] equals ActivitySum_B[0], and continue up to whether ActivitySum_A[M1] equals ActivitySum_B[M1]. If every value in ActivitySum_A[0] to ActivitySum_A[M1] equals the corresponding value in ActivitySum_B[0] to ActivitySum_B[M1], compare the names of the Activities in the two two-dimensional arrays. Then calculate the percentage of Activities whose names are identical, set it as the result CompSim_Activity, and denote it CS1;
Similarly, apply the same processing to Service, Content and Provider, to obtain the results CompSim_Service, CompSim_Content and CompSim_Provider, denoted CS2, CS3 and CS4 respectively;
Compare the remaining fields such as permission, calculate the percentage that are identical, and obtain the result ComSim_Others, denoted CS5;
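The quantity-statistics comparison for one component type can be sketched as below. The text does not say what happens when the counts differ or which denominator the percentage uses, so returning 0.0 on a mismatch and dividing by the larger name set are assumptions, as are the function names.

```python
from collections import Counter


def intent_histogram(components):
    """Map 'number of Intents' to 'number of components with that many Intents'.

    components is a list of (name, [intents]); the result plays the role
    of ActivitySum_A[M1].
    """
    return Counter(len(intents) for _name, intents in components)


def component_type_similarity(comps_a, comps_b):
    """Sketch of one of CS1..CS4, returned as a fraction in [0, 1]."""
    if not comps_a and not comps_b:
        return 1.0  # assumption: two empty lists count as identical
    hist_a = intent_histogram(comps_a)
    hist_b = intent_histogram(comps_b)
    # Compare the counts for 0..min(M1, M2), as the text does for M1 <= M2.
    upper = min(max(hist_a, default=0), max(hist_b, default=0))
    if any(hist_a[k] != hist_b[k] for k in range(upper + 1)):
        return 0.0  # assumption: differing counts mean no similarity
    names_a = {name for name, _ in comps_a}
    names_b = {name for name, _ in comps_b}
    # Assumption: percentage = shared names / size of the larger name set.
    return len(names_a & names_b) / max(len(names_a), len(names_b))


# Hypothetical sample data.
acts_a = [(".Main", ["MAIN"]), (".Settings", [])]
acts_b = [(".Main", ["MAIN"]), (".About", [])]
cs1 = component_type_similarity(acts_a, acts_b)
```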
Step S23d: calculate the assembly similarity using componentSim = (CS1 × α + CS2 × β + CS3 × γ + CS4 × δ + CS5 × ε) / 5;
where the parameters α, β, γ, δ and ε are the corresponding weights; usually the weights α and ε, which apply to CS1 and CS5, are set slightly higher.
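As a worked instance of the formula in step S23d, the sketch below evaluates componentSim for given CS1..CS5. The concrete weight values are hypothetical; they are chosen only so that α and ε are slightly higher and the five weights sum to 5, which makes identical inputs score 1.0.

```python
def component_sim(cs, weights):
    """componentSim = (CS1*a + CS2*b + CS3*g + CS4*d + CS5*e) / 5."""
    if len(cs) != 5 or len(weights) != 5:
        raise ValueError("expected five CS values and five weights")
    return sum(c * w for c, w in zip(cs, weights)) / 5


# Hypothetical weights (alpha, beta, gamma, delta, epsilon).
WEIGHTS = (1.2, 0.9, 0.9, 0.8, 1.2)

identical = component_sim([1.0, 1.0, 1.0, 1.0, 1.0], WEIGHTS)
partial = component_sim([0.5, 1.0, 1.0, 1.0, 0.0], WEIGHTS)
```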
Step S24: parse the source application B information and the target application A information to obtain the Java class layout similarity;
Specifically, step S24 comprises:
Step S24a: parse the program package Smali_A.zip of application A and search for the code following the keyword .class, from which the package and directory path can be obtained. From the packages, paths and classes, construct a sequence of multiple Java class layout trees as described above, denoted Tree_A[N1];
Step S24b: parse the program package Smali_B.zip of application B to obtain the Java class layout tree sequence of application B, denoted Tree_B[N2];
Step S24c: calculate the path information of all nodes of the N1+N2 trees in Tree_A[N1] and Tree_B[N2], and save all node path information for the subsequent tree path similarity matrix calculation;
Step S24d: set i=0, j=0 and same=0, and traverse and compare Tree_A[i] and Tree_B[j] using the tree path similarity matrix algorithm; if the two are similar, increment same by 1, remove the Tree_A[i] and Tree_B[j] of this round, and continue the loop comparison. After the loop ends, the resulting value of same should not exceed min{N1, N2}.
Step S24e: calculate the Java class layout similarity using the formula javaClassLayoutSim = (α1 × same) / max{N1, N2}, where α1 is a control factor calculated from the numerical difference between N1 and N2.
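Steps S24a to S24e can be sketched as follows. Several choices here are assumptions made for illustration and are not specified in the text: one tree is built per top-level package, each tree is represented by the set of its node paths, two trees count as similar when the Jaccard overlap of their path sets reaches a threshold (a simple stand-in for the tree path similarity matrix algorithm), and α1 is passed in as a parameter.

```python
def class_descriptors(smali_text):
    """Collect 'com/example/Foo' style paths from .class directives."""
    paths = []
    for line in smali_text.splitlines():
        line = line.strip()
        if line.startswith(".class"):
            descriptor = line.split()[-1]   # e.g. Lcom/example/Foo;
            paths.append(descriptor[1:-1])  # drop leading 'L' and trailing ';'
    return paths


def build_trees(paths):
    """One tree per top-level package; a tree is the set of its node paths."""
    trees = {}
    for path in paths:
        parts = path.split("/")
        nodes = trees.setdefault(parts[0], set())
        for depth in range(1, len(parts) + 1):
            nodes.add("/".join(parts[:depth]))
    return list(trees.values())


def jaccard(a, b):
    return len(a & b) / len(a | b)


def java_class_layout_sim(trees_a, trees_b, alpha1=1.0, threshold=0.8):
    """javaClassLayoutSim = (alpha1 * same) / max{N1, N2}."""
    remaining = list(trees_b)
    same = 0
    for tree_a in trees_a:
        for tree_b in remaining:
            if jaccard(tree_a, tree_b) >= threshold:
                same += 1
                remaining.remove(tree_b)  # a matched tree is not reused
                break
    largest = max(len(trees_a), len(trees_b))
    return alpha1 * same / largest if largest else 0.0


# Hypothetical Smali fragments.
SMALI_A = ".class public Lcom/example/app/Main;\n.super Ljava/lang/Object;\n.class public Lorg/lib/Util;"
SMALI_B = ".class public Lcom/example/app/Main;\n.class public Lnet/other/X;"

trees_a = build_trees(class_descriptors(SMALI_A))
trees_b = build_trees(class_descriptors(SMALI_B))
layout_sim = java_class_layout_sim(trees_a, trees_b)
```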
Step S25: parse the source application B information and the target application A information to obtain the code control flow similarity;
Specifically, step S25 comprises:
Step S25a: parse the program package Smali_A.zip of application A and search all code files using invoke- as the keyword. Because function calls in Smali code all take the form invoke-xxxx {parameter}, methodtocall, locate methodtocall according to this form, then find the called function, and build the control flow directed graph Graph_A[N1] of application A according to the relationship between calling functions and called functions;
Step S25b: parse the program package Smali_B.zip of application B and build the control flow directed graph Graph_B[N2] of application B;
Step S25c: set i=0, j=0 and same=0, and traverse and compare Graph_A[i] and Graph_B[j] using a graph isomorphism algorithm; if the two are similar, increment same by 1, remove the Graph_A[i] and Graph_B[j] of this round, and continue the comparison. After the loop ends, the resulting value of same should not exceed min{N1, N2};
Step S25d: calculate the code control flow similarity using the formula functionStreamSim = (α2 × same) / max{N1, N2}, where α2 is a control factor calculated from the numerical difference between N1 and N2.
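Steps S25a to S25d can be sketched as follows. The sketch assumes one directed graph per Smali file, represented as a set of (caller, callee) edges, and uses a brute-force isomorphism check that is only practical for very small graphs; a real system would need a scalable graph matching algorithm. Function names and the sample Smali text are hypothetical.

```python
import re
from itertools import permutations

# Matches e.g. "invoke-virtual {p0, v0}, Lcom/a/Main;->show(I)V".
INVOKE_RE = re.compile(r"^\s*invoke-[\w/]+\s*\{[^}]*\},\s*(\S+)")


def call_edges(smali_text):
    """Return the set of (caller, callee) edges found in one Smali file."""
    edges, cls, caller = set(), "", None
    for line in smali_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".class "):
            cls = stripped.split()[-1]
        elif stripped.startswith(".method "):
            caller = cls + "->" + stripped.split()[-1]
        elif stripped == ".end method":
            caller = None
        else:
            match = INVOKE_RE.match(line)
            if match and caller:
                edges.add((caller, match.group(1)))
    return edges


def is_isomorphic(edges_a, edges_b):
    """Brute-force directed-graph isomorphism (exponential; tiny graphs only)."""
    nodes_a = sorted({n for edge in edges_a for n in edge})
    nodes_b = sorted({n for edge in edges_b for n in edge})
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return False
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        if {(mapping[u], mapping[v]) for u, v in edges_a} == set(edges_b):
            return True
    return False


def function_stream_sim(graphs_a, graphs_b, alpha2=1.0):
    """functionStreamSim = (alpha2 * same) / max{N1, N2}."""
    remaining = list(graphs_b)
    same = 0
    for graph_a in graphs_a:
        for graph_b in remaining:
            if is_isomorphic(graph_a, graph_b):
                same += 1
                remaining.remove(graph_b)
                break
    largest = max(len(graphs_a), len(graphs_b))
    return alpha2 * same / largest if largest else 0.0


# Hypothetical Smali fragment.
SMALI = """.class public Lcom/a/Main;
.method public onCreate()V
    invoke-static {}, Lcom/a/Util;->init()V
    invoke-virtual {p0, v0}, Lcom/a/Main;->show(I)V
.end method
"""
graph = call_edges(SMALI)
```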
Step S26: parse the source application B information and the target application A information to obtain the text similarity;
Specifically, step S26 comprises:
Step S26a: parse the program package Smali_A.zip of application A, search all code files using invoke- as the keyword, and split each match into keywords by spaces, defined as the operator, the parameters and the called function respectively;
Step S26b: after removing the operators, the parameters, system function calls and common third-party open-source APIs, obtain the keywords of application A;
Step S26c: count the occurrences of all keywords, take each keyword together with its occurrence count as a keyword vector, and build the set of all keyword vectors of application A, Vector_A[N1];
Step S26d: for application B, repeat steps S26a to S26c to build the set of all keyword vectors of application B, Vector_B[N2];
Step S26e: compare Vector_A[N1] and Vector_B[N2] using the cosine similarity method to obtain the text similarity textSim.
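Steps S26a to S26e can be sketched as follows. The list of package prefixes treated as system calls is an assumption for illustration, as are the function names; filtering of common third-party open-source APIs is omitted because the text does not list them.

```python
import math
import re
from collections import Counter

# Captures the called function of each invoke- instruction.
INVOKE_RE = re.compile(r"^\s*invoke-[\w/]+\s*\{[^}]*\},\s*(\S+)", re.MULTILINE)

# Assumption: callees under these packages count as system function calls.
SYSTEM_PREFIXES = ("Landroid/", "Ljava/", "Ljavax/", "Ldalvik/")


def keyword_vector(smali_text):
    """Count each non-system called function (keyword -> occurrences)."""
    callees = INVOKE_RE.findall(smali_text)
    return Counter(c for c in callees if not c.startswith(SYSTEM_PREFIXES))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two keyword-count vectors."""
    dot = sum(count * vec_b[key] for key, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical Smali fragment.
SMALI = (
    "invoke-virtual {v0}, Ljava/lang/String;->length()I\n"
    "invoke-static {}, Lcom/a/Util;->init()V\n"
    "invoke-static {}, Lcom/a/Util;->init()V\n"
)
vector_a = keyword_vector(SMALI)
text_sim = cosine_similarity(vector_a, Counter({"Lcom/a/Util;->init()V": 1}))
```

Because cosine similarity depends only on the direction of the count vectors, an application that repeats the same calls more often still scores close to 1.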
It should be noted that, in the embodiments of the present invention, the execution order of steps S23 to S26 is not limited.
Step S27: determine the similarity between the source application B and the target application A according to the assembly similarity, the Java class layout similarity, the code control flow similarity and the text similarity.
Specifically, a weighted average of the assembly similarity componentSim, the Java class layout similarity javaClassLayoutSim, the code control flow similarity functionStreamSim and the text similarity textSim is calculated to obtain the similarity androidAppSimilarity between the source application B and the target application A. It should be noted here that the weights of the Java class layout similarity and the code control flow similarity are usually set slightly higher.
Further, in one embodiment, the method may also comprise step S28: output a result report for further system judgment and manual review.
Here, the result report includes the similarity androidAppSimilarity between the source application B and the target application A, as well as the similarity comparison results of each dimension: componentSim, javaClassLayoutSim, functionStreamSim and textSim.
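The weighted average of step S27 can be illustrated as below. The weight values are hypothetical; they only reflect the statement that the class layout and control flow dimensions are usually weighted slightly higher.

```python
# Hypothetical weights; layout and control flow are slightly higher.
WEIGHTS = {
    "componentSim": 0.2,
    "javaClassLayoutSim": 0.3,
    "functionStreamSim": 0.3,
    "textSim": 0.2,
}


def android_app_similarity(scores, weights=WEIGHTS):
    """Weighted average of the four per-dimension similarities."""
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


scores = {
    "componentSim": 0.5,
    "javaClassLayoutSim": 1.0,
    "functionStreamSim": 1.0,
    "textSim": 0.0,
}
overall = android_app_similarity(scores)
```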
Fig. 6 is a schematic structural diagram of the detection device for homogeneous applications according to an embodiment of the present invention. As shown in Fig. 6, the detection device comprises a decompilation module 61, a parsing module 62 and a determination module 63; wherein,
the decompilation module 61 is configured to perform code decompilation on the source application and the target application respectively, to obtain source application information and target application information;
the parsing module 62 is configured to parse the source application information and the target application information, to obtain the assembly similarity, class layout similarity, code control flow similarity and text similarity;
the determination module 63 is configured to determine the similarity between the source application and the target application according to the assembly similarity, class layout similarity, code control flow similarity and text similarity.
Specifically, the determination module 63 determines the similarity between the source application and the target application by calculating a weighted average of the assembly similarity, class layout similarity, code control flow similarity and text similarity.
In one embodiment, as shown in Fig. 7, the parsing module 62 comprises a first parsing submodule 621 and a second parsing submodule 622; wherein,
the first parsing submodule 621 is configured to parse the source configuration file and the target configuration file, to obtain the assembly similarity;
the second parsing submodule 622 is configured to parse the source program package and the target program package, to obtain the class layout similarity, code control flow similarity and text similarity.
In one embodiment, as shown in Fig. 8, the first parsing submodule 621 comprises a converting unit 621a and a computing unit 621b; wherein,
the converting unit 621a is configured to convert the source configuration file and the target configuration file into corresponding two-dimensional arrays respectively;
the computing unit 621b is configured to calculate the similarity of the two converted two-dimensional arrays, as the assembly similarity.
In one embodiment, as shown in Fig. 9, the second parsing submodule 622 comprises a parsing unit 622a and a determining unit 622b; wherein,
the parsing unit 622a is configured to parse the source program package and the target program package respectively, to obtain the class layout tree sequence of the source application and the class layout tree sequence of the target application;
the determining unit 622b is configured to calculate and determine the path information of all nodes in the class layout tree sequence of the source application and the class layout tree sequence of the target application, and to determine the class layout similarity according to the path information.
Further, the parsing unit 622a is also configured to parse the source program package and the target program package respectively, to build the control flow directed graph of the source application and the control flow directed graph of the target application;
the determining unit 622b is also configured to determine the code control flow similarity according to the control flow directed graph of the source application and the control flow directed graph of the target application.
Further, the parsing unit 622a is also configured to parse the source program package and the target program package respectively, to build the set of all keyword vectors of the source application and the set of all keyword vectors of the target application;
the determining unit 622b is also configured to compare the set of all keyword vectors of the source application with the set of all keyword vectors of the target application, to obtain the text similarity.
In practical applications, each module provided in the embodiments of the present invention, and the units that each module comprises, may be implemented by a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP) or a field programmable gate array (FPGA) in the detection device for homogeneous applications.
The above are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention.

Claims (16)

1. A detection method for homogeneous applications, characterized in that the method comprises:
performing code decompilation on a source application and a target application respectively, to obtain source application information and target application information;
parsing the source application information and the target application information, to obtain an assembly similarity, a class layout similarity, a code control flow similarity and a text similarity;
determining the similarity between the source application and the target application according to the assembly similarity, class layout similarity, code control flow similarity and text similarity.
2. The method according to claim 1, characterized in that the source application information comprises a source configuration file and a source program package;
correspondingly, the target application information comprises a target configuration file and a target program package.
3. The method according to claim 2, characterized in that parsing the source application information and the target application information to obtain the assembly similarity, class layout similarity, code control flow similarity and text similarity comprises:
parsing the source configuration file and the target configuration file, to obtain the assembly similarity;
parsing the source program package and the target program package, to obtain the class layout similarity, code control flow similarity and text similarity.
4. The method according to claim 3, characterized in that parsing the source configuration file and the target configuration file to obtain the assembly similarity comprises:
converting the source configuration file and the target configuration file into corresponding two-dimensional arrays respectively;
calculating the similarity of the two converted two-dimensional arrays, as the assembly similarity.
5. The method according to claim 3, characterized in that parsing the source program package and the target program package to obtain the class layout similarity comprises:
parsing the source program package and the target program package respectively, to obtain the class layout tree sequence of the source application and the class layout tree sequence of the target application;
calculating and determining the path information of all nodes in the class layout tree sequence of the source application and the class layout tree sequence of the target application;
determining the class layout similarity according to the path information.
6. The method according to claim 3, characterized in that parsing the source program package and the target program package to obtain the code control flow similarity comprises:
parsing the source program package and the target program package respectively, to build the control flow directed graph of the source application and the control flow directed graph of the target application;
determining the code control flow similarity according to the control flow directed graph of the source application and the control flow directed graph of the target application.
7. The method according to claim 3, characterized in that parsing the source program package and the target program package to obtain the text similarity comprises:
parsing the source program package and the target program package respectively, to build the set of all keyword vectors of the source application and the set of all keyword vectors of the target application;
comparing the set of all keyword vectors of the source application with the set of all keyword vectors of the target application, to obtain the text similarity.
8. The method according to any one of claims 1 to 7, characterized in that determining the similarity between the source application and the target application according to the assembly similarity, class layout similarity, code control flow similarity and text similarity comprises:
calculating a weighted average of the assembly similarity, class layout similarity, code control flow similarity and text similarity, to determine the similarity between the source application and the target application.
9. A detection device for homogeneous applications, characterized in that the device comprises a decompilation module, a parsing module and a determination module;
the decompilation module is configured to perform code decompilation on a source application and a target application respectively, to obtain source application information and target application information;
the parsing module is configured to parse the source application information and the target application information, to obtain an assembly similarity, a class layout similarity, a code control flow similarity and a text similarity;
the determination module is configured to determine the similarity between the source application and the target application according to the assembly similarity, class layout similarity, code control flow similarity and text similarity.
10. The device according to claim 9, characterized in that the source application information comprises a source configuration file and a source program package;
correspondingly, the target application information comprises a target configuration file and a target program package.
11. The device according to claim 10, characterized in that the parsing module comprises a first parsing submodule and a second parsing submodule; wherein,
the first parsing submodule is configured to parse the source configuration file and the target configuration file, to obtain the assembly similarity;
the second parsing submodule is configured to parse the source program package and the target program package, to obtain the class layout similarity, code control flow similarity and text similarity.
12. The device according to claim 11, characterized in that the first parsing submodule 621 comprises a converting unit and a computing unit; wherein,
the converting unit is configured to convert the source configuration file and the target configuration file into corresponding two-dimensional arrays respectively;
the computing unit is configured to calculate the similarity of the two converted two-dimensional arrays, as the assembly similarity.
13. The device according to claim 11, characterized in that the second parsing submodule comprises a parsing unit and a determining unit; wherein,
the parsing unit is configured to parse the source program package and the target program package respectively, to obtain the class layout tree sequence of the source application and the class layout tree sequence of the target application;
the determining unit is configured to calculate and determine the path information of all nodes in the class layout tree sequence of the source application and the class layout tree sequence of the target application, and to determine the class layout similarity according to the path information.
14. The device according to claim 11, characterized in that,
the parsing unit is also configured to parse the source program package and the target program package respectively, to build the control flow directed graph of the source application and the control flow directed graph of the target application;
the determining unit is also configured to determine the code control flow similarity according to the control flow directed graph of the source application and the control flow directed graph of the target application.
15. The device according to claim 11, characterized in that,
the parsing unit is also configured to parse the source program package and the target program package respectively, to build the set of all keyword vectors of the source application and the set of all keyword vectors of the target application;
the determining unit is also configured to compare the set of all keyword vectors of the source application with the set of all keyword vectors of the target application, to obtain the text similarity.
16. The device according to any one of claims 9 to 15, characterized in that,
the determination module is configured to calculate a weighted average of the assembly similarity, class layout similarity, code control flow similarity and text similarity, to determine the similarity between the source application and the target application.
CN201410607315.2A 2014-10-31 2014-10-31 Detection method and device for homogenization application Pending CN105550540A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410607315.2A CN105550540A (en) 2014-10-31 2014-10-31 Detection method and device for homogenization application

Publications (1)

Publication Number Publication Date
CN105550540A true CN105550540A (en) 2016-05-04

Family

ID=55829727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410607315.2A Pending CN105550540A (en) 2014-10-31 2014-10-31 Detection method and device for homogenization application

Country Status (1)

Country Link
CN (1) CN105550540A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160504
