CN113918944A - Android counterfeit application detection method based on interface layout - Google Patents

Publication number
CN113918944A
Authority
CN
China
Prior art keywords
activity
interface
android
application
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111158960.7A
Other languages
Chinese (zh)
Inventor
付雄
聂晓晗
邓松
王俊昌
程春玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202111158960.7A
Publication of CN113918944A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to an Android counterfeit application detection method based on interface layout. For each Activity running interface in a genuine Android application and in an Android application to be tested, the method extracts an interface structure feature vector of the interface layout and a type feature vector for each preset type of feature. The Activity groups to be analyzed are screened out according to the interface structure feature vectors and the interface screenshots, and the similarity of each Activity group to be analyzed is calculated from the type feature vectors of the preset types of features. Finally, whether the Android application to be tested is a counterfeit application is judged from the similarity between the genuine Android application and the Android application to be tested. Compared with existing mainstream counterfeit-application detection algorithms, the method is resistant to obfuscation, executes efficiently, and can detect different types of application counterfeiting: not only traditional application counterfeiting, but also more complex and more targeted counterfeiting of application interfaces.

Description

Android counterfeit application detection method based on interface layout
Technical Field
The invention relates to an Android counterfeit application detection method based on interface layout, and belongs to the technical field of mobile terminal security and counterfeit identification.
Background
With the rise of the mobile market in recent years, Android has become the mainstream mobile operating system. Data from the analytics organization StatCounter show that Android's market share has increased steadily year by year since its release, and that as of 2020 Android held 74.3% of the global mobile operating system market. The number of Android applications has grown explosively along with the Android market: in 2017, nearly one million new downloadable applications were published on Google Play, the official Android application store. Although the number of applications on Google Play fell back in 2018 for various reasons, nearly three million applications remain available, and the Android application market is still full of vitality.
With the rapid development of the Android mobile application industry, the mobile black and gray industries (that is, the illicit and semi-illicit industries on mobile platforms; the same below) have also become increasingly active. The black and gray industries make a profit by infringing on the interests of users, original application authors or other third parties, or by other suspicious means. On the one hand, the threshold for developing mobile applications has dropped, and developing a mobile application generally costs less than developing a comparable desktop application. On the other hand, mobile application functions can be implemented flexibly and the complexity of mobile applications keeps increasing, which poses new challenges for analyzing and intercepting black and gray applications. Together, these two factors provide fertile ground for black and gray applications on mobile terminals.
Counterfeit applications are a widespread class of mobile black and gray products. A counterfeit application is software in which the counterfeiter uses application data or metadata identical or similar to those of the original application, such as a similar logo, a similar name, or a similar UI and content, in order to confuse users and induce them to download it. Most counterfeit application developers seek profit by imitating popular applications with large download counts in application stores. Once a user downloads such a counterfeit application, its built-in malicious behavior may manifest itself, for example: spreading illegal content such as violent, terrorist, obscene or pornographic information; or stealing private user information, using payment services without authorization, and pushing malicious advertisements. Such behavior directly damages the material interests of users and threatens the security of their private information.
Most existing research on detecting application counterfeiting and fraud focuses on application repackaging, and detects fraud by extracting and comparing static features of applications. Some researchers have detected counterfeiting and fraud based on the similarity of interface content. However, as fraud detection countermeasures have been upgraded in recent years, some experienced malicious developers deliberately modify interface content and functional code to evade detection. In particular, the following observation holds: a counterfeit application must keep its interface similar to that of the original application in order to deceive users; provided that the dynamically displayed interface is barely affected, the content features of the interface are very easy for a malicious developer to modify, whereas the structural features of the interface remain relatively stable.
Therefore, as the mobile application ecosystem has formed and matured, traditional fraud has migrated to the mobile internet and adopted new means. The endless stream of new fraudulent behaviors not only causes great damage to the ecosystem, but also poses a serious challenge to application markets and regulators. As fraud detection countermeasures keep being upgraded, experienced malicious developers evade existing detection methods by upgrading their fraud techniques, and existing techniques cannot effectively detect these new, more targeted fraudulent behaviors.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an Android counterfeit application detection method based on interface layout, and the accuracy and efficiency of counterfeit application detection can be effectively improved by adopting a brand-new detection comparison design.
The invention adopts the following technical scheme to solve the above technical problem: the invention designs an Android counterfeit application detection method based on interface layout, which detects, against a genuine Android application, an Android application to be tested that corresponds to it, and comprises the following steps:
Step A. For the genuine Android application and the Android application to be tested respectively, acquire screenshots of all Activity running interfaces in the Android application and the layout information of all Activity running interfaces, then go to step B;
Step B. For each Activity running interface in the genuine Android application and in the Android application to be tested, obtain, from the layout information of the Activity running interface, the interface structure feature vector corresponding to the Activity running interface and the type feature vector of each preset type of feature corresponding to the Activity running interface, then go to step C;
Step C. Pair each Activity running interface in the genuine Android application with each Activity running interface in the Android application to be tested to form the Activity groups, screen the Activity groups according to the interface structure feature vector and the screenshot of each Activity running interface, take the remaining Activity groups as the Activity groups to be analyzed, then go to step D;
Step D. According to the type feature vectors of each preset type of feature corresponding to each Activity running interface in the genuine Android application and in the Android application to be tested, obtain, for each Activity group to be analyzed, the feature similarity of each preset type of feature corresponding to that Activity group, then go to step E;
Step E. For each Activity group to be analyzed, consider, for each preset type of feature, the judgment condition that the feature similarity of the Activity group for that type of feature is not smaller than the similarity threshold of that type of feature; if at least one judgment condition is satisfied, define the similarity corresponding to the Activity group to be analyzed as 1; if none of the judgment conditions is satisfied, define the similarity corresponding to the Activity group to be analyzed as 0; the similarity corresponding to each Activity group to be analyzed is thus obtained, then go to step F;
Step F. Obtain the sum G of the similarities corresponding to the Activity groups to be analyzed, and then, according to the following formula:
[Equation image not reproduced: formula for SIMAPP in terms of G, U and V]
obtain the similarity SIMAPP between the genuine Android application and the Android application to be tested, and judge whether SIMAPP is greater than a preset application similarity threshold; if so, the Android application to be tested is judged to be a counterfeit application, otherwise it is judged to be a non-counterfeit application. Here U and V respectively denote the number of Activity running interfaces in the genuine Android application and the number of Activity running interfaces in the Android application to be tested.
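The binary decision of step E and the sum G of step F can be sketched as follows. This is only an illustrative sketch: the feature names and all threshold values are hypothetical, and the final SIMAPP formula is deliberately left out because it is not recoverable from this text.

```python
from typing import Dict, List

def group_similarity(feature_sims: Dict[str, float],
                     thresholds: Dict[str, float]) -> int:
    """Step E: 1 if at least one feature similarity is not smaller
    than the threshold of its feature type, otherwise 0."""
    return int(any(feature_sims[name] >= thresholds[name] for name in thresholds))

def similarity_sum(groups: List[Dict[str, float]],
                   thresholds: Dict[str, float]) -> int:
    """First half of step F: G, the sum of the per-group similarities."""
    return sum(group_similarity(g, thresholds) for g in groups)

# Hypothetical thresholds and per-group feature similarities.
thresholds = {"text": 0.8, "control_type": 0.9, "control_id": 0.8}
groups = [
    {"text": 0.85, "control_type": 0.40, "control_id": 0.10},  # text passes
    {"text": 0.20, "control_type": 0.30, "control_id": 0.10},  # nothing passes
    {"text": 0.10, "control_type": 0.95, "control_id": 0.90},  # two features pass
]
G = similarity_sum(groups, thresholds)
print(G)
```

Because a single passing feature type is enough, a counterfeit that rewrites all visible text but keeps the control types or resource IDs would still contribute 1 to G under this rule.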
As a preferred technical scheme of the invention: in step A, the following steps A1 to A3 are executed for the genuine Android application and the Android application to be tested respectively, to obtain screenshots of each Activity running interface in the Android application and the layout information of each Activity running interface, and then step B is entered;
Step A1. Decompress and decompile the APK (Android Package) of the Android application with ApkTool to obtain the decompiled result corresponding to the Android application, then go to step A2;
Step A2. Filter out the third-party library Activities registered in AndroidManifest.xml in the decompiled result, add an intent-filter node with an action child node and a category child node to each remaining Activity in the decompiled result, repackage the decompiled result to form the APK to be processed, then go to step A3;
Step A3. Install the APK to be processed on an Android emulator, start each Activity in the APK to be processed through Appium, call the getScreenshot() function provided by Appium to obtain a screenshot of each Activity running interface on the Android emulator, and call the getPageSource() function provided by Appium to obtain the layout information of each Activity running interface on the Android emulator.
As a preferred technical scheme of the invention: in the step B, the following steps B1 to B4 are executed respectively for each Activity running interface in the legal Android application and the Android application to be tested, interface structure characteristic vectors corresponding to the Activity running interfaces and preset characteristic vectors of various types are obtained, and then the step C is carried out;
Step B1. According to the layout information of the Activity running interface, traverse the controls in the layout information in sequence, and construct the layers and the controls contained in each layer by extracting the upper and lower bounds of the vertical coordinate from the bounds attribute of each control; the layers together form the layer set corresponding to the Activity running interface; then go to step B2;
Step B2. For the layer set corresponding to the Activity running interface, merge adjacent layers that contain the same control types and the same number of controls into one independent layer, and set the overlapped-layer attribute of each such independent layer to true; take each remaining layer directly as an independent layer, and set its overlapped-layer attribute to false; the independent layers together form the independent layer set corresponding to the Activity running interface; then go to step B3;
Step B3. Combine the number of layers in the layer set corresponding to the Activity running interface, the number of independent layers in the corresponding independent layer set, and the number of independent layers in the independent layer set whose overlapped-layer attribute is true, to form the interface structure feature vector corresponding to the Activity running interface; then go to step B4;
Step B4. Traverse each independent layer in the independent layer set corresponding to the Activity running interface to obtain the type feature vector of each preset type of feature corresponding to the Activity running interface.
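Steps B2 and B3 can be illustrated with a minimal sketch. It assumes each layer is already available as a list of control class names, and it interprets "same control types and same number of controls" as equality of the sorted list of class names; both the data representation and this interpretation are assumptions made for illustration, not details given in the text.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Layer = List[str]  # hypothetical representation: class names of the controls in a layer

def layer_signature(layer: Layer) -> Tuple[str, ...]:
    # Two layers match when they hold the same control types in the same numbers.
    return tuple(sorted(layer))

def merge_layers(layers: List[Layer]) -> List[Dict]:
    """Step B2: merge runs of adjacent matching layers into independent layers."""
    independent = []
    for _, run in groupby(layers, key=layer_signature):
        run = list(run)
        # A run of two or more adjacent matching layers becomes one
        # independent layer with the overlapped-layer attribute set to True.
        independent.append({"layers": run, "overlapped": len(run) > 1})
    return independent

def structure_vector(layers: List[Layer]) -> Tuple[int, int, int]:
    """Step B3: (layers, independent layers, overlapped independent layers)."""
    independent = merge_layers(layers)
    overlapped = sum(1 for item in independent if item["overlapped"])
    return (len(layers), len(independent), overlapped)

layers = [
    ["Toolbar", "TextView"],
    ["ImageView", "TextView"],   # three adjacent list rows with the same shape
    ["ImageView", "TextView"],
    ["TextView", "ImageView"],
    ["Button"],
]
print(structure_vector(layers))
```

In this example the three list rows collapse into one overlapped independent layer, so a list of 3 items and a list of 30 items yield the same number of independent layers.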
As a preferred technical scheme of the invention: the step B1 comprises the following steps B1-1 to B1-4;
Step B1-1. Initialize l = 1 and k = 1, then go to step B1-2;
Step B1-2. According to the layout information of the Activity running interface, visit the l-th control in the layout information, extract the upper and lower bounds of the vertical coordinate from the bounds attribute of the control as the upper and lower bounds corresponding to the l-th control, then go to step B1-3;
Step B1-3. If l is equal to 1, take the upper and lower bounds corresponding to the l-th control as the upper and lower bounds of the k-th layer, add the l-th control to the k-th layer, then go to step B1-4;
if l is greater than 1, judge whether the upper and lower bounds corresponding to the l-th control are contained within the upper and lower bounds of the k-th layer; if so, add the l-th control to the k-th layer and go to step B1-4; otherwise, take the upper and lower bounds corresponding to the l-th control as the upper and lower bounds of the (k + 1)-th layer, add the l-th control to the (k + 1)-th layer, increase the value of k by 1, then go to step B1-4;
Step B1-4. Judge whether l is equal to the number L of controls in the layout information of the Activity running interface; if so, the layers and the controls contained in each layer form the layer set corresponding to the Activity running interface; otherwise, increase the value of l by 1 and return to step B1-2.
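The loop of steps B1-1 to B1-4 can be sketched as a single pass over the controls. The sketch assumes that the bounds attribute uses the "[x1,y1][x2,y2]" string form and that the controls are passed in traversal order without full-screen root containers; the dictionary keys are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumed format of the bounds attribute: "[left,top][right,bottom]".
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")

def vertical_bounds(bounds: str) -> Tuple[int, int]:
    """Return the (top, bottom) vertical bounds of a control."""
    match = BOUNDS_RE.fullmatch(bounds)
    if match is None:
        raise ValueError(f"unexpected bounds value: {bounds!r}")
    _, top, _, bottom = map(int, match.groups())
    return top, bottom

def build_layers(controls: List[Dict[str, str]]) -> List[Dict]:
    """Steps B1-1 to B1-4: assign each control to the current layer if its
    vertical bounds lie within that layer, otherwise open a new layer."""
    layers: List[Dict] = []
    for control in controls:                      # l = 1 .. L
        top, bottom = vertical_bounds(control["bounds"])
        current = layers[-1] if layers else None  # the k-th layer
        if current and current["top"] <= top and bottom <= current["bottom"]:
            current["controls"].append(control)
        else:                                     # start layer k + 1
            layers.append({"top": top, "bottom": bottom, "controls": [control]})
    return layers

controls = [
    {"class": "Toolbar",   "bounds": "[0,0][1080,200]"},
    {"class": "TextView",  "bounds": "[40,50][600,150]"},
    {"class": "ImageView", "bounds": "[0,200][1080,800]"},
    {"class": "Button",    "bounds": "[0,800][1080,950]"},
]
layers = build_layers(controls)
print([(layer["top"], layer["bottom"], len(layer["controls"])) for layer in layers])
```

Only the vertical extent is compared, so controls placed side by side in the same horizontal band fall into the same layer.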
As a preferred technical scheme of the invention: the step C comprises the following steps C1 to C3;
Step C1. Pair each Activity running interface in the genuine Android application with each Activity running interface in the Android application to be tested to form the Activity groups, then go to step C2;
Step C2. For each Activity group, obtain the absolute value a of the difference between the numbers of layers in the layer sets corresponding to the two Activity running interfaces in the group, and the absolute value b of the difference between the numbers of independent layers whose overlapped-layer attribute is true in the independent layer sets corresponding to the two Activity running interfaces; then judge whether a is greater than a preset first threshold or b is greater than a preset second threshold; if so, delete the Activity group, otherwise define the Activity group as a primary Activity group; then go to step C3;
Step C3. For each primary Activity group, apply the LMgist algorithm to obtain the spatial envelope feature vectors of the screenshots of its two Activity running interfaces, calculate the cosine similarity between the two spatial envelope feature vectors, and judge whether the cosine similarity is greater than a preset third threshold; if so, define the primary Activity group as an Activity group to be analyzed, otherwise delete the primary Activity group.
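The two-stage screening of steps C2 and C3 can be sketched as below. The spatial envelope (GIST) descriptors are assumed to be precomputed elsewhere and passed in as plain number sequences; the threshold values and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# (layers, independent layers, overlapped independent layers)
StructureVector = Tuple[int, int, int]

def passes_structure_filter(s1: StructureVector, s2: StructureVector,
                            first_threshold: int, second_threshold: int) -> bool:
    """Step C2: drop the pair if the layer counts or the overlapped-layer
    counts differ by more than the preset thresholds."""
    a = abs(s1[0] - s2[0])
    b = abs(s1[2] - s2[2])
    return not (a > first_threshold or b > second_threshold)

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def keep_group(s1: StructureVector, s2: StructureVector,
               gist1: Sequence[float], gist2: Sequence[float],
               first_threshold: int, second_threshold: int,
               third_threshold: float) -> bool:
    """Steps C2 and C3: cheap structural check first, screenshot check second."""
    if not passes_structure_filter(s1, s2, first_threshold, second_threshold):
        return False
    return cosine_similarity(gist1, gist2) > third_threshold

# Hypothetical structure vectors, descriptors and thresholds.
print(keep_group((5, 3, 1), (6, 3, 1), [1.0, 0.0, 1.0], [1.0, 0.0, 1.0], 2, 1, 0.9))
print(keep_group((5, 3, 1), (12, 3, 1), [1.0, 0.0, 1.0], [1.0, 0.0, 1.0], 2, 1, 0.9))
```

Running the integer comparison first means the screenshot descriptors are only compared for pairs that already look structurally close.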
As a preferred technical scheme of the invention: in step D, according to the type feature vectors of each preset type of feature corresponding to each Activity running interface in the genuine Android application and in the Android application to be tested, the following operation is executed for each Activity group to be analyzed to obtain the feature similarity of each preset type of feature corresponding to that Activity group, and then step E is entered;
the operation is as follows: for each preset type of feature, according to the type feature vectors f_A and f_B of that type of feature corresponding to the two Activity running interfaces in the Activity group to be analyzed, and according to the following formula:
[Equation image not reproduced: formula for SIM(f_A, f_B)]
obtain the feature similarity SIM(f_A, f_B) of the Activity group to be analyzed for that type of feature, where I denotes the number of feature elements in f_A, the type feature vector of one Activity running interface in the Activity group to be analyzed for that type of feature; J denotes the number of feature elements in f_B, the type feature vector of the other Activity running interface in the Activity group to be analyzed for that type of feature; C_{A,i} denotes the i-th feature element of f_A; C_{B,j} denotes the j-th feature element of f_B; and SIM(C_{A,i}, C_{B,j}) is obtained as follows:
[Equation image not reproduced: formula for SIM(C_{A,i}, C_{B,j})]
The feature similarity of the Activity group to be analyzed for each preset type of feature is thereby obtained.
As a preferred technical scheme of the invention: in step B4, each independent layer in the independent layer set corresponding to the Activity running interface is traversed, and the text of the text or content-desc attribute of each control contained in the independent layer is stored in the text feature set corresponding to that independent layer; the text feature sets of all independent layers are thus obtained and combined to form the text feature vector corresponding to the Activity running interface.
As a preferred technical scheme of the invention: in step B4, each independent layer in the independent layer set corresponding to the Activity running interface is traversed, and the text of the class attribute of each control contained in the independent layer is stored in the control type feature set corresponding to that independent layer; the control type feature sets of all independent layers are thus obtained and combined to form the control type feature vector corresponding to the Activity running interface.
As a preferred technical scheme of the invention: in step B4, each independent layer in the independent layer set corresponding to the Activity running interface is traversed, and the text of the resource-id attribute of each control contained in the independent layer is stored in the control ID feature set corresponding to that independent layer; the control ID feature sets of all independent layers are thus obtained and combined to form the control ID feature vector corresponding to the Activity running interface.
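The three feature extractions of step B4 can be sketched together. The sketch assumes each control is a dictionary of its layout attributes, that empty attribute values are skipped, and that content-desc is used only when text is empty; these choices and the dictionary keys of the result are assumptions for illustration.

```python
from typing import Dict, List, Set

Control = Dict[str, str]          # hypothetical: attribute name -> attribute value
IndependentLayer = List[Control]  # hypothetical: the controls of one independent layer

def control_text(control: Control) -> str:
    # Assumption: prefer the text attribute, fall back to content-desc.
    return control.get("text") or control.get("content-desc") or ""

def type_feature_vectors(layers: List[IndependentLayer]) -> Dict[str, List[Set[str]]]:
    """Step B4: one feature set per independent layer, for each feature type."""
    return {
        "text": [{control_text(c) for c in layer if control_text(c)}
                 for layer in layers],
        "control_type": [{c["class"] for c in layer if c.get("class")}
                         for layer in layers],
        "control_id": [{c["resource-id"] for c in layer if c.get("resource-id")}
                       for layer in layers],
    }

independent_layers = [
    [
        {"class": "android.widget.TextView", "text": "Sign in",
         "resource-id": "com.example:id/title"},
        {"class": "android.widget.ImageView", "text": "",
         "content-desc": "logo", "resource-id": ""},
    ],
    [
        {"class": "android.widget.Button", "text": "OK",
         "resource-id": "com.example:id/ok"},
    ],
]
features = type_feature_vectors(independent_layers)
print(features["text"])
```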
Compared with the prior art, the Android counterfeit application detection method based on the interface layout has the following technical effects:
The Android counterfeit application detection method based on interface layout designed by the invention first obtains screenshots of all Activity running interfaces in the genuine Android application and in the Android application to be tested, together with the layout information of all Activity running interfaces. It then preprocesses the layout information of each Activity running interface and extracts the interface structure feature vector, the text feature vector, the control type feature vector and the control ID feature vector of the interface layout. The Activity groups to be analyzed, that is, the pairs of similar interfaces between the genuine application and the application to be tested, are screened out using the interface structure feature vectors and the interface screenshots, and the similarity corresponding to each Activity group to be analyzed is calculated from the text feature vectors, the control type feature vectors and the control ID feature vectors. Finally, the similarity SIMAPP between the genuine Android application and the Android application to be tested is calculated from the similarities between the Activity running interfaces, and whether the Android application to be tested is a counterfeit application is judged from the result. Compared with existing mainstream counterfeit-application detection algorithms, the method is resistant to obfuscation, executes efficiently, and can detect different types of application counterfeiting: not only traditional application counterfeiting, but also more complex and more targeted counterfeiting of application interfaces.
Drawings
FIG. 1 is a schematic flow chart of the Android counterfeit application detection method based on the interface layout.
Detailed Description
The following description will explain embodiments of the present invention in further detail with reference to the accompanying drawings.
The invention designs an Android counterfeit application detection method based on interface layout, which detects, against a genuine Android application, an Android application to be tested that corresponds to it. In practical application, as shown in FIG. 1, the following steps A to F are executed.
Step A. For the genuine Android application and the Android application to be tested respectively, acquire screenshots of all Activity running interfaces in the Android application and the layout information of all Activity running interfaces, then go to step B.
In practical application, in step A, the following steps A1 to A3 are executed for the genuine Android application and the Android application to be tested respectively, to obtain screenshots of each Activity running interface in the Android application and the layout information of each Activity running interface, and then step B is entered.
Step A1. Decompress and decompile the APK of the Android application with ApkTool to obtain the decompiled result corresponding to the Android application, then go to step A2.
Step A2. Filter out the third-party library Activities registered in AndroidManifest.xml in the decompiled result, add an intent-filter node with an action child node and a category child node to each remaining Activity in the decompiled result, repackage the decompiled result to form the APK to be processed, then go to step A3.
Step A3. Install the APK to be processed on an Android emulator, start each Activity in the APK to be processed through Appium, call the getScreenshot() function provided by Appium to obtain a screenshot of each Activity running interface on the Android emulator, and call the getPageSource() function provided by Appium to obtain the layout information of each Activity running interface on the Android emulator.
In application, existing automated testing tools are inefficient and unsuitable for large-scale counterfeit and fraud detection scenarios: completely traversing all interfaces of one application takes several hours on average. Since every Activity component must be declared in AndroidManifest.xml, the registered Activities can be obtained by decompiling the APK and parsing AndroidManifest.xml, and each of them can then be started directly. This saves the analysis time, such as searching for interface entry points, that traditional automation tools spend when traversing an application. The approach requires far less complexity and time, and strikes a balance between UI coverage and automated test performance.
In a specific implementation, because the Activity components of each application need to have an "android.intent.launcher" tag, the original application must be preprocessed: the application is decompiled with the reverse-engineering tool Apktool so that the AndroidManifest.xml file in the application can be modified. To account for the influence of third-party libraries on the detection result, the related Activity components are filtered out according to a public whitelist of third-party libraries. For each remaining declared Activity, an intent-filter node and its action and category child nodes are added; the application is then repackaged with Apktool and re-signed with Signapk to generate a new application that can run.
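The manifest preprocessing described above can be sketched with Python's standard XML library, operating on the decoded AndroidManifest.xml that the decompilation step produces. Several points are assumptions rather than details from the text: the whitelist is modelled as a tuple of package prefixes, whitelisted Activities are simply skipped rather than deleted, and the action and category values used are the standard Android MAIN and LAUNCHER constants.

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence, Tuple

ANDROID_NS = "http://schemas.android.com/apk/res/android"
ET.register_namespace("android", ANDROID_NS)  # keep the android: prefix on output
NAME = f"{{{ANDROID_NS}}}name"

def prepare_manifest(manifest_xml: str,
                     third_party_prefixes: Sequence[str]) -> Tuple[str, List[str]]:
    """Add an intent-filter (action + category) to every Activity that does
    not belong to a whitelisted third-party library."""
    root = ET.fromstring(manifest_xml)
    kept: List[str] = []
    for activity in root.iter("activity"):
        name = activity.get(NAME, "")
        if name.startswith(tuple(third_party_prefixes)):
            continue  # third-party Activity: excluded from later analysis
        intent_filter = ET.SubElement(activity, "intent-filter")
        # Assumed values: the standard Android launcher action and category.
        ET.SubElement(intent_filter, "action", {NAME: "android.intent.action.MAIN"})
        ET.SubElement(intent_filter, "category", {NAME: "android.intent.category.LAUNCHER"})
        kept.append(name)
    return ET.tostring(root, encoding="unicode"), kept

manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example">
  <application>
    <activity android:name="com.example.MainActivity"/>
    <activity android:name="com.google.android.gms.ads.AdActivity"/>
  </application>
</manifest>"""

new_manifest, kept = prepare_manifest(manifest, ("com.google.android.gms.",))
print(kept)
```

The returned list of kept Activity names is what a later automation step would iterate over when starting each Activity; repackaging and re-signing are outside the scope of this sketch.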
And B, respectively aiming at each Activity running interface in the legal Android application and the Android application to be tested, obtaining an interface structure feature vector corresponding to the Activity running interface and a type feature vector of each type of preset feature corresponding to the Activity running interface according to the layout information of the Activity running interface, and then entering the step C.
In practical application, in the step B, the following steps B1 to B4 are executed for each Activity running interface in the genuine Android application and the Android application to be tested, so as to obtain the interface structure feature vector corresponding to the Activity running interface and preset feature vectors of various types, and then the step C is performed.
And B1, traversing each control in the layout information in sequence according to the layout information of the Activity operation interface, constructing each layer and each control contained in each layer by extracting the upper and lower boundaries of a vertical coordinate from the bounds attributes of the controls, forming a layer set corresponding to the Activity operation interface by combining the layers, and entering the step B2.
The step B1 includes the following steps B1-1 to B1-4.
Step B1-1. initialize l ═ 1, k ═ 1, and proceed to step B1-2.
And B1-2, traversing the ith control in the layout information according to the layout information of the Activity operation interface, extracting the upper and lower bounds of the vertical coordinate from the bounds attributes of the controls as the upper and lower bounds corresponding to the ith control, and entering the step B1-3.
And step B1-3, if l is 1, taking the upper and lower boundaries corresponding to the l-th control as the upper and lower boundaries of the k-th layer, adding the l-th control into the k-th layer, and then entering the step B1-4.
If l is greater than 1, judging whether the upper and lower boundaries corresponding to the l control are included in the upper and lower boundaries of the kth layer, if so, adding the l control into the kth layer, and performing the step B1-4; otherwise, taking the upper and lower bounds corresponding to the l-th control as the upper and lower bounds of the (k + 1) -th layer, adding the l-th control into the (k + 1) -th layer, then updating by adding 1 according to the value of k, and then entering the step B1-4.
Step B1-4, judging whether L is equal to the number L of the controls in the layout information of the Activity operation interface, if so, forming a layer set corresponding to the Activity operation interface by each layer and each control contained in each layer; otherwise, updating by adding 1 for the value of l, and returning to the step B1-2.
Step B2. For the layer set corresponding to the Activity running interface, merge adjacent layers that contain the same control types and the same number of controls into one independent layer, and set the overlapped-layer attribute of each independent layer formed this way to true; take each remaining layer directly as an independent layer and set its overlapped-layer attribute to false. The independent layers together form the independent layer set corresponding to the Activity running interface. Then proceed to step B3.
Step B3. Combine the number of layers in the layer set, the number of independent layers in the independent layer set, and the number of independent layers in the independent layer set whose overlapped-layer attribute is true to form the interface structure feature vector corresponding to the Activity running interface, and proceed to step B4.
Step B4. Traverse each independent layer in the independent layer set corresponding to the Activity running interface to obtain the type feature vector of each preset type of feature corresponding to the Activity running interface.
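Steps B2 and B3 can be illustrated with the short Python sketch below. It is only a sketch under stated assumptions: each layer is represented as a list of control class names, and "the same control types and the same number of controls" is read as an identical multiset of class names. The function names and the dictionary keys are hypothetical.

```python
from itertools import groupby


def layer_signature(layer):
    # Assumption: two layers match when they hold the same multiset of
    # control class names (same types, same count).
    return tuple(sorted(layer))


def build_independent_layers(layers):
    """Step B2: merge runs of adjacent layers that share a signature."""
    independent = []
    for _, run in groupby(layers, key=layer_signature):
        run = list(run)
        independent.append({
            "controls": [name for layer in run for name in layer],
            "overlapped": len(run) > 1,  # true only when layers were merged
        })
    return independent


def structure_feature_vector(layers, independent):
    """Step B3: (layer count, independent layer count, overlapped count)."""
    overlapped = sum(1 for layer in independent if layer["overlapped"])
    return (len(layers), len(independent), overlapped)
```

For example, a toolbar layer followed by three list rows of identical composition and a button layer gives five layers, three independent layers, and one overlapped independent layer.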
Comparing Activity running interfaces pairwise is very time consuming: most applications contain no fewer than 10 Activities, and an Activity contains no fewer than 5 layers, so comparing the features of the Activity running interfaces of two applications takes more than 2500 comparisons (10 × 10 Activity pairs, each with 5 × 5 layer pairs). In addition, the Activity running interfaces of many applications look very different, and comparing them is meaningless. Observation shows that two Activity running interfaces with a large visual difference differ in their hierarchical structure features mainly in two respects: (1) the number of layers differs greatly; (2) the number of overlapped layers differs greatly. Therefore, the hierarchical structure features are compared first: if the hierarchical structure features of two Activity running interfaces differ greatly, the two interfaces are determined to be dissimilar and no other features are compared. Accordingly, the following step C is executed.
Step C. Pair each Activity running interface in the genuine Android application with each Activity running interface in the Android application to be tested, each pair forming an Activity group; screen the Activity groups according to the interface structure feature vector corresponding to each Activity running interface and the screenshot of each Activity running interface; take the Activity groups that remain as the Activity groups to be analyzed; and proceed to step D.
In practical applications, step C is performed as the following steps C1 to C3.
Step C1. Pair each Activity running interface in the genuine Android application with each Activity running interface in the Android application to be tested, each pair forming an Activity group, and proceed to step C2.
Step C2. For each Activity group, obtain the absolute value a of the difference between the numbers of layers in the layer sets corresponding to the two Activity running interfaces in the group, and the absolute value b of the difference between the numbers of independent layers whose overlapped-layer attribute is true in the independent layer sets corresponding to the two Activity running interfaces. Then judge whether a is greater than a preset first threshold or b is greater than a preset second threshold. If so, delete the Activity group; otherwise, define the Activity group as a preliminarily selected Activity group. Then proceed to step C3.
Step C3. For each preliminarily selected Activity group, apply the LMgist algorithm to obtain the spatial envelope feature vectors of the screenshots of its two Activity running interfaces, calculate the cosine similarity between the two spatial envelope feature vectors, and judge whether the cosine similarity is greater than a preset third threshold. If so, define the preliminarily selected Activity group as an Activity group to be analyzed; otherwise, delete it.
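The screening of steps C1 to C3 can be sketched as follows. The sketch assumes that the spatial envelope feature vector of each screenshot has already been computed elsewhere and is passed in as a plain list of numbers, and it treats the cosine measure as a similarity, since groups scoring above the third threshold are kept. The dictionary keys `structure` and `gist`, the function names, and the threshold parameters `t1`, `t2`, `t3` are hypothetical.

```python
import math
from itertools import product


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def screen_activity_groups(genuine, tested, t1, t2, t3):
    """Return the (genuine, tested) interface pairs that pass steps C2 and C3.

    Each interface is a dict with:
      "structure": (layer count, independent layer count, overlapped count)
      "gist":      precomputed spatial envelope feature vector (assumed input)
    """
    kept = []
    for g, t in product(genuine, tested):                  # step C1: all pairs
        a = abs(g["structure"][0] - t["structure"][0])      # layer count difference
        b = abs(g["structure"][2] - t["structure"][2])      # overlapped count difference
        if a > t1 or b > t2:                                # step C2: structural filter
            continue
        if cosine_similarity(g["gist"], t["gist"]) > t3:    # step C3: visual filter
            kept.append((g, t))
    return kept
```

Running the cheap structural check before the screenshot comparison reflects the priority ordering described above: most dissimilar pairs are discarded before any image descriptor is compared.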
Step D. According to the type feature vectors of each preset type of feature corresponding to each Activity running interface in the genuine Android application and the Android application to be tested, obtain, for each Activity group to be analyzed, the feature similarity of that group for each preset type of feature, and then proceed to step E.
In practical applications, step D performs the following operation for each Activity group to be analyzed, using those type feature vectors, to obtain the feature similarity of each Activity group to be analyzed for each preset type of feature, and then proceeds to step E.
The operation is as follows: for each preset type of feature, according to the type feature vectors $f_A$ and $f_B$ of that type of feature corresponding to the two Activity running interfaces in the Activity group to be analyzed, apply the following formula:
Figure BDA0003289364540000091
to obtain the feature similarity $SIM(f_A, f_B)$ of the Activity group to be analyzed for that type of feature, where $I$ denotes the number of feature elements in $f_A$, the type feature vector of one Activity running interface in the group for that type of feature; $J$ denotes the number of feature elements in $f_B$, the type feature vector of the other Activity running interface in the group for that type of feature; $C_{A,i}$ denotes the i-th feature element of $f_A$; $C_{B,j}$ denotes the j-th feature element of $f_B$; and $SIM(C_{A,i}, C_{B,j})$ is obtained as follows:
Figure BDA0003289364540000102
The feature similarity of the Activity group to be analyzed is thereby obtained for each preset type of feature.
Specifically, in step D, the preset types of feature vector comprise a text feature vector, a control type feature vector, and a control ID feature vector. For the text feature vector, step B4 traverses each independent layer in the independent layer set corresponding to the Activity running interface, stores the text of the text or content-desc attribute of each control contained in the independent layer in the text feature set corresponding to that independent layer, thereby obtains the text feature set corresponding to each independent layer, and combines these text feature sets to form the text feature vector corresponding to the Activity running interface.
For the control type feature vector, step B4 traverses each independent layer in the independent layer set corresponding to the Activity running interface, stores the text of the class attribute of each control contained in the independent layer in the control type feature set corresponding to that independent layer, thereby obtains the control type feature set corresponding to each independent layer, and combines these control type feature sets to form the control type feature vector corresponding to the Activity running interface.
For the control ID feature vector, step B4 traverses each independent layer in the independent layer set corresponding to the Activity running interface, stores the text of the resource-id attribute of each control contained in the independent layer in the control ID feature set corresponding to that independent layer, thereby obtains the control ID feature set corresponding to each independent layer, and combines these control ID feature sets to form the control ID feature vector corresponding to the Activity running interface.
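The three feature vectors of step B4 can be collected in a single pass over the independent layers, as in the minimal Python sketch below. Several choices are assumptions made for illustration: each control is a dictionary keyed by attribute name, each per-layer feature set is a Python `set` (so repeated values within a layer collapse), empty attribute values are skipped, and content-desc is used only when text is empty. The function name and the keys of the returned dictionary are hypothetical.

```python
def extract_type_features(independent_layers):
    """Build text, control type, and control ID feature vectors.

    independent_layers: list of layers, each a list of control dicts with the
    attributes "text", "content-desc", "class", and "resource-id".
    Each feature vector holds one feature set per independent layer.
    """
    text_vec, class_vec, id_vec = [], [], []
    for layer in independent_layers:
        texts, classes, ids = set(), set(), set()
        for control in layer:
            # Assumption: prefer text, fall back to content-desc.
            label = control.get("text") or control.get("content-desc")
            if label:
                texts.add(label)
            if control.get("class"):
                classes.add(control["class"])
            if control.get("resource-id"):
                ids.add(control["resource-id"])
        text_vec.append(texts)
        class_vec.append(classes)
        id_vec.append(ids)
    return {"text": text_vec, "class": class_vec, "resource-id": id_vec}
```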
Step E. For each Activity group to be analyzed, consider, for each preset type of feature, the judgment condition that the feature similarity of the group for that type of feature is not smaller than the similarity threshold of that type of feature. If at least one judgment condition is satisfied, define the similarity corresponding to the Activity group to be analyzed as 1; if none of the judgment conditions is satisfied, define it as 0. Once the similarity corresponding to each Activity group to be analyzed has been obtained, proceed to step F.
Step F. Obtain the sum G of the similarities corresponding to the Activity groups to be analyzed, and then apply the following formula:
Figure BDA0003289364540000103
to obtain the similarity SIMAPP between the genuine Android application and the Android application to be tested. Judge whether SIMAPP is greater than a preset application similarity threshold: if so, the Android application to be tested is judged to be a counterfeit application; otherwise, it is judged to be a non-counterfeit application. Here U and V respectively denote the number of Activity running interfaces in the genuine Android application and the number of Activity running interfaces in the Android application to be tested.
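The decision logic of steps E and F can be sketched as below. The SIMAPP formula itself appears only as an image and is not reproduced in this text, so the sketch does not guess it: the formula is passed in as a caller-supplied function `sim_app(g, u, v)`. That parameter, the function names, and the dictionary layout are all hypothetical.

```python
def group_similarity(feature_sims, thresholds):
    """Step E: 1 if any feature similarity reaches its threshold, else 0."""
    return int(any(feature_sims[name] >= thresholds[name] for name in thresholds))


def is_counterfeit(groups, thresholds, u, v, app_threshold, sim_app):
    """Step F: sum the group similarities into G and threshold SIMAPP.

    groups:   one dict of per-feature similarities per Activity group
    u, v:     Activity running interface counts of the two applications
    sim_app:  caller-supplied function (g, u, v) -> SIMAPP, standing in for
              the formula that is not reproduced in the text
    """
    g = sum(group_similarity(sims, thresholds) for sims in groups)
    return sim_app(g, u, v) > app_threshold
```

Note that step E uses "not smaller than" (>=) for the per-feature thresholds, while step F uses a strict "greater than" for the application-level threshold.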
With the interface-layout-based Android counterfeit application detection method designed in the above technical scheme, screenshots of all Activity running interfaces in the genuine Android application and the Android application to be tested, together with the layout information of those interfaces, are first obtained. The layout information of each Activity running interface is then preprocessed, and the interface structure feature vector, text feature vector, control type feature vector, and control ID feature vector of the interface layout are extracted. Similar Activity groups to be analyzed between the genuine application and the application to be tested are screened out using the interface structure feature vectors and the interface screenshots, and the similarity corresponding to each Activity group to be analyzed is calculated from the text, control type, and control ID feature vectors. Finally, the similarity SIMAPP between the genuine Android application and the Android application to be tested is calculated from the similarities between Activity running interfaces, and whether the application to be tested is counterfeit is judged from the result. Compared with existing mainstream counterfeit-app detection algorithms, the main advantages of the method are strong resistance to obfuscation, high execution efficiency, and effective detection of different types of application counterfeiting: it detects not only traditional application counterfeiting but also more complex and more targeted counterfeiting of application interfaces.
The embodiments of the present invention have been described in detail above with reference to the drawings, but the present invention is not limited to these embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (9)

1. An Android counterfeit application detection method based on interface layout, which detects a corresponding Android application to be tested on the basis of a genuine Android application, characterized by comprising the following steps:
step A, for the genuine Android application and the Android application to be tested respectively, acquiring screenshots of all Activity running interfaces in the Android application and layout information of all Activity running interfaces, and then entering step B;
step B, for each Activity running interface in the genuine Android application and the Android application to be tested respectively, obtaining, according to the layout information of the Activity running interface, an interface structure feature vector corresponding to the Activity running interface and a type feature vector of each preset type of feature corresponding to the Activity running interface, and then entering step C;
step C, pairing each Activity running interface in the genuine Android application with each Activity running interface in the Android application to be tested, each pair forming an Activity group, screening the Activity groups according to the interface structure feature vector corresponding to each Activity running interface and the screenshot of each Activity running interface, taking the Activity groups that remain as the Activity groups to be analyzed, and then entering step D;
step D, according to the type feature vectors of each preset type of feature corresponding to each Activity running interface in the genuine Android application and the Android application to be tested, obtaining, for each Activity group to be analyzed, the feature similarity of that group for each preset type of feature, and then entering step E;
step E, for each Activity group to be analyzed, considering, for each preset type of feature, the judgment condition that the feature similarity of the group for that type of feature is not smaller than the similarity threshold of that type of feature; if at least one judgment condition is satisfied, defining the similarity corresponding to the Activity group to be analyzed as 1; if none of the judgment conditions is satisfied, defining the similarity corresponding to the Activity group to be analyzed as 0; and, once the similarity corresponding to each Activity group to be analyzed has been obtained, entering step F;
step F, obtaining the sum G of the similarities corresponding to the Activity groups to be analyzed, and then, according to the following formula:
Figure FDA0003289364530000011
obtaining the similarity SIMAPP between the genuine Android application and the Android application to be tested, and judging whether SIMAPP is greater than a preset application similarity threshold; if so, judging the Android application to be tested to be a counterfeit application; otherwise, judging it to be a non-counterfeit application; wherein U and V respectively denote the number of Activity running interfaces in the genuine Android application and the number of Activity running interfaces in the Android application to be tested.
2. The interface layout-based Android counterfeit application detection method according to claim 1, characterized in that: in step A, the following steps A1 to A3 are executed for the genuine Android application and the Android application to be tested respectively, to obtain screenshots of all Activity running interfaces in the Android application and layout information of all Activity running interfaces, and then step B is entered;
step A1, decompressing and decompiling the APK (Android Package) of the Android application with ApkTool to obtain the decompiling result corresponding to the Android application, and entering step A2;
step A2, filtering out the third-party library Activities registered in AndroidManifest.xml in the decompiling result, adding an intent-filter node, together with its action and category child nodes, to each remaining Activity in the decompiling result, repackaging the decompiling result to form an APK to be processed, and entering step A3;
step A3, installing the APK to be processed on an Android emulator, starting each Activity in the APK to be processed with Appium, calling the getScreenshot() function provided by Appium to obtain a screenshot of each Activity running interface on the Android emulator, and calling the getPageSource() function provided by Appium to obtain the layout information of each Activity running interface on the Android emulator.
3. The interface layout-based Android counterfeit application detection method according to claim 1, characterized in that: in step B, the following steps B1 to B4 are executed for each Activity running interface in the genuine Android application and the Android application to be tested respectively, to obtain the interface structure feature vector corresponding to the Activity running interface and the type feature vector of each preset type of feature, and then step C is entered;
step B1, traversing each control in the layout information in sequence according to the layout information of the Activity running interface, constructing the layers and the controls contained in each layer by extracting the upper and lower bounds of the vertical coordinate from the bounds attribute of each control, the layers together forming the layer set corresponding to the Activity running interface, and then entering step B2;
step B2, for the layer set corresponding to the Activity running interface, merging adjacent layers that contain the same control types and the same number of controls into one independent layer and setting the overlapped-layer attribute of each independent layer formed this way to true, taking each remaining layer directly as an independent layer and setting its overlapped-layer attribute to false, the independent layers together forming the independent layer set corresponding to the Activity running interface, and then entering step B3;
step B3, combining the number of layers in the layer set, the number of independent layers in the independent layer set, and the number of independent layers in the independent layer set whose overlapped-layer attribute is true to form the interface structure feature vector corresponding to the Activity running interface, and then entering step B4;
step B4, traversing each independent layer in the independent layer set corresponding to the Activity running interface to obtain the type feature vector of each preset type of feature corresponding to the Activity running interface.
4. The interface layout-based Android counterfeit application detection method according to claim 3, characterized in that: step B1 comprises the following steps B1-1 to B1-4;
step B1-1, initializing l = 1 and k = 1, and entering step B1-2;
step B1-2, according to the layout information of the Activity running interface, visiting the l-th control in the layout information, extracting the upper and lower bounds of the vertical coordinate from the bounds attribute of that control as the upper and lower bounds corresponding to the l-th control, and entering step B1-3;
step B1-3, if l = 1, taking the upper and lower bounds corresponding to the l-th control as the upper and lower bounds of the k-th layer, adding the l-th control to the k-th layer, and then entering step B1-4;
if l > 1, judging whether the upper and lower bounds corresponding to the l-th control are contained within the upper and lower bounds of the k-th layer; if so, adding the l-th control to the k-th layer and entering step B1-4; otherwise, taking the upper and lower bounds corresponding to the l-th control as the upper and lower bounds of the (k+1)-th layer, adding the l-th control to the (k+1)-th layer, increasing the value of k by 1, and then entering step B1-4;
step B1-4, judging whether l equals the number L of controls in the layout information of the Activity running interface; if so, the layers and the controls contained in each layer form the layer set corresponding to the Activity running interface; otherwise, increasing the value of l by 1 and returning to step B1-2.
5. The interface layout-based Android counterfeit application detection method according to claim 3 or 4, characterized in that: step C comprises the following steps C1 to C3;
step C1, pairing each Activity running interface in the genuine Android application with each Activity running interface in the Android application to be tested, each pair forming an Activity group, and entering step C2;
step C2, for each Activity group, obtaining the absolute value a of the difference between the numbers of layers in the layer sets corresponding to the two Activity running interfaces in the group, and the absolute value b of the difference between the numbers of independent layers whose overlapped-layer attribute is true in the independent layer sets corresponding to the two Activity running interfaces, then judging whether a is greater than a preset first threshold or b is greater than a preset second threshold; if so, deleting the Activity group; otherwise, defining the Activity group as a preliminarily selected Activity group; then entering step C3;
step C3, for each preliminarily selected Activity group, applying the LMgist algorithm to obtain the spatial envelope feature vectors of the screenshots of its two Activity running interfaces, calculating the cosine similarity between the two spatial envelope feature vectors, and judging whether the cosine similarity is greater than a preset third threshold; if so, defining the preliminarily selected Activity group as an Activity group to be analyzed; otherwise, deleting the preliminarily selected Activity group.
6. The interface layout-based Android counterfeit application detection method according to claim 1, characterized in that: in step D, according to the type feature vectors of each preset type of feature corresponding to each Activity running interface in the genuine Android application and the Android application to be tested, the following operation is executed for each Activity group to be analyzed to obtain the feature similarity of each Activity group to be analyzed for each preset type of feature, and then step E is entered;
the operation being: for each preset type of feature, according to the type feature vectors $f_A$ and $f_B$ of that type of feature corresponding to the two Activity running interfaces in the Activity group to be analyzed, applying the following formula:
Figure FDA0003289364530000041
to obtain the feature similarity $SIM(f_A, f_B)$ of the Activity group to be analyzed for that type of feature, wherein $I$ denotes the number of feature elements in $f_A$, the type feature vector of one Activity running interface in the group for that type of feature; $J$ denotes the number of feature elements in $f_B$, the type feature vector of the other Activity running interface in the group for that type of feature; $C_{A,i}$ denotes the i-th feature element of $f_A$; $C_{B,j}$ denotes the j-th feature element of $f_B$; and $SIM(C_{A,i}, C_{B,j})$ is obtained as follows:
Figure FDA0003289364530000042
the feature similarity of the Activity group to be analyzed thereby being obtained for each preset type of feature.
7. The interface layout-based Android counterfeit application detection method according to claim 1, 3 or 6, characterized in that: in step B4, each independent layer in the independent layer set corresponding to the Activity running interface is traversed, the text of the text or content-desc attribute of each control contained in the independent layer is stored in the text feature set corresponding to that independent layer, the text feature set corresponding to each independent layer is thereby obtained, and these text feature sets are combined to form the text feature vector corresponding to the Activity running interface.
8. The interface layout-based Android counterfeit application detection method according to claim 1, 3 or 6, characterized in that: in step B4, each independent layer in the independent layer set corresponding to the Activity running interface is traversed, the text of the class attribute of each control contained in the independent layer is stored in the control type feature set corresponding to that independent layer, the control type feature set corresponding to each independent layer is thereby obtained, and these control type feature sets are combined to form the control type feature vector corresponding to the Activity running interface.
9. The interface layout-based Android counterfeit application detection method according to claim 1, 3 or 6, characterized in that: in step B4, each independent layer in the independent layer set corresponding to the Activity running interface is traversed, the text of the resource-id attribute of each control contained in the independent layer is stored in the control ID feature set corresponding to that independent layer, the control ID feature set corresponding to each independent layer is thereby obtained, and these control ID feature sets are combined to form the control ID feature vector corresponding to the Activity running interface.
CN202111158960.7A 2021-09-30 2021-09-30 Android counterfeit application detection method based on interface layout Pending CN113918944A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111158960.7A CN113918944A (en) 2021-09-30 2021-09-30 Android counterfeit application detection method based on interface layout

Publications (1)

Publication Number Publication Date
CN113918944A true CN113918944A (en) 2022-01-11

Family

ID=79237430

Country Status (1)

Country Link
CN (1) CN113918944A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination