CN107766342B - Application identification method and device - Google Patents

Info

Publication number
CN107766342B
CN107766342B CN201610670627.7A CN201610670627A CN107766342B CN 107766342 B CN107766342 B CN 107766342B CN 201610670627 A CN201610670627 A CN 201610670627A CN 107766342 B CN107766342 B CN 107766342B
Authority
CN
China
Prior art keywords
application
information
identified
legal
illegal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610670627.7A
Other languages
Chinese (zh)
Other versions
CN107766342A (en)
Inventor
邱勤
袁捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd
Priority to CN201610670627.7A
Publication of CN107766342A
Application granted
Publication of CN107766342B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Stored Programmes (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses an application identification method. A legal application list and an illegal application list are established, and the basic information and characteristic information of each application in the two lists are determined. The legality of an application to be identified is determined according to whether its basic information matches the basic information of any application in the legal application list or the illegal application list. When the basic information of the application to be identified matches no application in either list, its legality is determined, according to a preset rule, from the basic information and/or characteristic information of the application to be identified and the basic information and/or characteristic information of the applications in the legal application list and/or the illegal application list. The invention also discloses an application identification device.

Description

Application identification method and device
Technical Field
The invention relates to the field of mobile application security, in particular to an application identification method and device.
Background
Mobile internet applications are spreading rapidly, and the accompanying security threats are growing increasingly severe. In recent years in particular, mobile payment services have grown dramatically, and the large potential profits have prompted illegal organizations to distribute illegal applications by means of imitation, repackaging, tampering and code implantation. Such an illegal application, commonly called a counterfeit application, is a disguised application that is re-released after an official application has been tampered with or imitated. It lures end users into downloading and using it in order to cause damage or obtain illegal gains, and may seriously harm the interests of the developers, operators and end users of the legal application.
Public data show that counterfeiting of mobile applications was rampant in 2015, especially for Android applications. On average, each genuine application had 92 pirated counterparts; each software application had 100 pirated versions, and each game application had 66. Compared with 2014, these figures increased 3.5-fold on average. The more popular an application and the larger its download count, the greater the number of corresponding illegal applications. The main reasons for this problem are:
1. there is no complete and effective feature library of whitelisted official applications, which leaves illegal-application monitoring without a basis or an analysis benchmark;
2. existing techniques for monitoring and analyzing illegal applications fall short in actual service use: their target range is too wide, and the schemes adopted at present require a large investment in network and hardware resources;
3. illegal applications already installed on user terminals cannot be removed, and there is no means of suppressing illegal applications.
A search of the prior-art literature shows that most existing schemes use web crawlers to crawl application information from channels such as application stores, online forums and web pages, download application samples, and then unpack each downloaded sample, extract its characteristic information, and classify and store it. When the owner of an official legal application needs to check whether illegal versions exist, the characteristic information of the legal application is extracted and compared against the stored sample library according to algorithm rules to find illegal or suspected illegal applications. This scheme requires substantial network and hardware resources. At present, the number of application samples on a mainstream domestic application store ranked in the top ten exceeds 300,000; at an average size of 20 MB per application, storing the samples requires at least 5.7 TB, and once unpacked applications and stored characteristic information are included, a single application store requires at least 12 TB of storage. In addition, crawling and updating applications quickly requires considerable bandwidth, so the cost of building a system to identify and discover illegal applications is high.
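The 5.7 TB figure follows from simple arithmetic. The sketch below reproduces it under the assumption of binary units (1 TB = 1024 × 1024 MB), which the source does not state; the function name is illustrative only.

```python
# Figures quoted in the text: 300,000 samples at an average of 20 MB each.
SAMPLE_COUNT = 300_000
AVERAGE_SIZE_MB = 20


def raw_sample_storage_tb(sample_count: int, average_size_mb: float) -> float:
    """Return the storage needed for the raw samples, in terabytes.

    Assumption: binary units, i.e. 1 TB = 1024 * 1024 MB.
    """
    total_mb = sample_count * average_size_mb
    return total_mb / (1024 * 1024)


estimate_tb = raw_sample_storage_tb(SAMPLE_COUNT, AVERAGE_SIZE_MB)
print(f"{estimate_tb:.1f} TB")  # prints "5.7 TB"
```

The 12 TB figure for a single store additionally covers unpacked files and stored characteristic information, for which the source gives no per-item breakdown.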
Another scheme crawls the web pages on which applications are published to obtain basic information such as name, description, version, category and author, then classifies and stores it. When the owner of an official legal application needs to monitor piracy, the basic characteristic information of the legal application is extracted and compared against the application information database to obtain a set of applications requiring further analysis; a crawler engine then downloads those applications from the Internet, their characteristic information is extracted, and algorithm rules are applied to identify illegal and suspected illegal applications. Compared with the previous scheme, this one saves considerable resources but is less effective: because no unified whitelist library of legal applications is established, data are still missed during comparison and analysis.
In summary, the existing illegal application identification method has the following disadvantages:
1. illegal applications are searched for, according to algorithm rules, within a massive set of applications gathered by web crawlers from the Internet entry points under analysis; because the target set is so large, the resource cost is high and the identification coverage for illegal applications is low;
2. as for identification algorithms, the prior art relies on schemes such as comparing binary file characteristics, name word-segmentation similarity, or basic attributes; no correlation analysis is performed on the content characteristics of applications, so identification accuracy is insufficient;
3. existing schemes start from a specific genuine application and match it against a large number of unknown applications according to algorithm rules to obtain a set of illegal applications; on the one hand, identification takes a long time, and on the other hand, all applications installed on a smart terminal cannot be identified and analyzed in a single batch;
4. in the prior art, once an illegal application is identified, it is handled manually through reporting and early warning, and it cannot be directly replaced with the legal application for the end user.
Therefore, improving the accuracy and efficiency of illegal application identification is an urgent problem to be solved.
Disclosure of Invention
In view of this, embodiments of the present invention aim to provide an application identification method and device that improve the accuracy and efficiency of identifying illegal applications and reduce the resource cost of doing so.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
the embodiment of the invention provides an application identification method, which comprises the following steps: establishing a legal application list and an illegal application list, and respectively determining the basic information and the characteristic information of each application in the legal application list and the illegal application list; the method further comprises the following steps:
determining the legality of the application to be identified according to whether the basic information of the application to be identified is matched with the basic information of each application in the legal application list and the illegal application list;
and when the basic information of the application to be identified is not matched with the basic information of each application in the legal application list and the illegal application list, determining the legality of the application to be identified according to the basic information and/or the characteristic information of the application to be identified and the basic information and/or the characteristic information of each application in the legal application list and/or the illegal application list and a preset rule.
In the foregoing solution, the determining the validity of the application to be identified according to the preset rule includes:
comparing the first specified information in the basic information and/or characteristic information of the application to be identified with that of each application in the legal application list, and if they are the same, determining that the application to be identified is a legal application;
comparing the first specified information in the basic information and/or characteristic information of the application to be identified with that of each application in the illegal application list, and if they are the same, determining that the application to be identified is an illegal application;
if the first specified information of the application to be identified differs from the first specified information of every application in the legal application list and the illegal application list, determining the matching degree between the application to be identified and the basic information and/or characteristic information of each application in the legal application list, and determining the legality of the application to be identified according to the matching degree;
the first specified information includes: the application's Message Digest Algorithm 5 (MD5) value, and/or signature certificate information.
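The exact-match check on the first specified information can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary layout, the field names `md5` and `signature_certificate`, and the function name are hypothetical, and the sketch assumes that every selected field must be equal for two applications to count as the same.

```python
from typing import Iterable, Mapping, Optional, Sequence


def first_specified_check(
    candidate: Mapping[str, str],
    legal_list: Iterable[Mapping[str, str]],
    illegal_list: Iterable[Mapping[str, str]],
    fields: Sequence[str] = ("md5", "signature_certificate"),
) -> Optional[str]:
    """Return "legal", "illegal", or None when neither list contains a match.

    Assumption: every record carries all of the selected fields, and a match
    requires all selected fields to be identical.
    """

    def key(app: Mapping[str, str]) -> tuple:
        return tuple(app[name] for name in fields)

    target = key(candidate)
    # The legal list is checked first, then the illegal list, as in the text.
    if any(key(app) == target for app in legal_list):
        return "legal"
    if any(key(app) == target for app in illegal_list):
        return "illegal"
    return None  # fall through to the matching-degree comparison
```

A return value of `None` corresponds to the case in which the matching degree must be computed.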
In the above solution, the determining the matching degree between the application to be identified and the basic information and/or the feature information of each application in the legal application list, and determining the validity of the application to be identified according to the matching degree, includes:
presetting one or more items of second specified information within the basic information and/or characteristic information, and determining the information matching degree between each item of second specified information of the application to be identified and the corresponding item of each application in the legal application list;
determining the product of each information matching degree and the preset weight corresponding to it;
determining the average of the sum of the products as the similarity, and comparing the similarity with a preset legal threshold and a preset illegal threshold;
if the similarity is not smaller than the legal threshold, determining that the application to be identified is a legal application; if the similarity is not larger than the illegal threshold, determining that the application to be identified is an illegal application; otherwise, determining that the application to be identified is an application to be distinguished;
the second specified information includes: application name, and/or application package name, and/or configuration file structure tree, and/or source code directory structure tree, and/or resource file MD5 value.
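The similarity rule above can be written compactly. The symbols are notation introduced here for illustration and do not appear in the source: $m_i$ is the information matching degree of the $i$-th item of second specified information, $w_i$ its preset weight, $n$ the number of items, and $T_{\text{legal}}$, $T_{\text{illegal}}$ the two preset thresholds. Reading "the average of the sum of the products" as the sum divided by $n$ is an interpretation.

```latex
S = \frac{1}{n} \sum_{i=1}^{n} w_i \, m_i ,
\qquad
\text{result} =
\begin{cases}
\text{legal application} & \text{if } S \ge T_{\text{legal}}, \\
\text{illegal application} & \text{if } S \le T_{\text{illegal}}, \\
\text{application to be distinguished} & \text{otherwise.}
\end{cases}
```

The three cases are mutually exclusive only when $T_{\text{illegal}} < T_{\text{legal}}$, which the rule implicitly assumes.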
In the above scheme, determining the characteristic information of each application in the legal application list and the illegal application list includes: unpacking and reverse-engineering each application using an automated reverse-engineering technique to obtain the characteristic information of each application.
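The unpacking part of this step can be illustrated with a short sketch. It assumes the application package is a ZIP archive (as Android APK files are) and that resource files live under a `res/` prefix; the function name is hypothetical. It covers only the resource file directory table and resource file MD5 values, not decompilation of source code.

```python
import hashlib
import io
import zipfile
from typing import BinaryIO, Dict, List, Tuple


def extract_resource_features(
    package: BinaryIO, resource_prefix: str = "res/"
) -> Tuple[List[str], Dict[str, str]]:
    """Return (resource file directory table, resource file MD5 values)."""
    directory_table: List[str] = []
    md5_values: Dict[str, str] = {}
    with zipfile.ZipFile(package) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything outside the resource prefix.
            if info.is_dir() or not info.filename.startswith(resource_prefix):
                continue
            directory_table.append(info.filename)
            md5_values[info.filename] = hashlib.md5(archive.read(info)).hexdigest()
    directory_table.sort()
    return directory_table, md5_values


# Usage with a tiny in-memory archive standing in for a real package.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("res/layout/main.xml", b"<layout/>")
    demo.writestr("classes.dex", b"\x00")
buffer.seek(0)
table, digests = extract_resource_features(buffer)
```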
In the above scheme, the method further comprises: deleting an illegal application or replacing the illegal application with a legitimate application.
In the above scheme, the basic information includes: application name, and/or signature certificate information, and/or application package name, and/or application tag, and/or application file size, and/or application version number, and/or application MD5 value, and/or application installation file information;
the characteristic information includes: a configuration file directory table, and/or a source code directory structure tree, and/or a configuration file structure tree, and/or a resource file directory table, and/or a resource file MD5 value;
the source code directory structure tree includes: source file directory structure, and/or source file size, and/or source file key functions.
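One possible in-memory representation of the basic information and characteristic information is sketched below. The class and field names are hypothetical, and every field is optional because the text joins the items with "and/or".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BasicInfo:
    """Basic information of one application (all items optional)."""
    name: Optional[str] = None
    signature_certificate: Optional[str] = None
    package_name: Optional[str] = None
    tag: Optional[str] = None
    file_size: Optional[int] = None
    version: Optional[str] = None
    md5: Optional[str] = None
    install_file_info: Optional[str] = None


@dataclass
class SourceTree:
    """Source code directory structure tree."""
    directory_structure: List[str] = field(default_factory=list)
    file_sizes: Dict[str, int] = field(default_factory=dict)
    key_functions: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class FeatureInfo:
    """Characteristic information obtained by unpacking the application."""
    config_file_directory: List[str] = field(default_factory=list)
    source_tree: SourceTree = field(default_factory=SourceTree)
    config_file_structure: Dict[str, List[str]] = field(default_factory=dict)
    resource_file_directory: List[str] = field(default_factory=list)
    resource_file_md5: Dict[str, str] = field(default_factory=dict)


@dataclass
class AppRecord:
    """One entry of the legal or illegal application list."""
    basic: BasicInfo
    features: FeatureInfo = field(default_factory=FeatureInfo)
```

Keeping the two groups separate mirrors the two-stage comparison: the first stage needs only `BasicInfo`, while the deeper comparison also reads `FeatureInfo`.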
The embodiment of the invention also provides an application identification device, which comprises: a setting module, a first determining module and a second determining module; wherein:
the setting module is used for establishing a legal application list and an illegal application list and respectively determining the basic information and the characteristic information of each application in the legal application list and the illegal application list;
the first determining module is used for determining the legality of the application to be identified according to whether the basic information of the application to be identified is matched with the basic information of each application in the legal application list and the illegal application list;
and the second determining module is used for determining the legality of the application to be identified according to the basic information and/or the characteristic information of the application to be identified and the basic information and/or the characteristic information of each application in the legal application list and/or the illegal application list when the basic information of the application to be identified is not matched with the basic information of each application in the legal application list and the illegal application list.
In the foregoing solution, the second determining module is specifically configured to:
comparing the first specified information in the basic information and/or characteristic information of the application to be identified with that of each application in the legal application list, and if they are the same, determining that the application to be identified is a legal application;
comparing the first specified information in the basic information and/or characteristic information of the application to be identified with that of each application in the illegal application list, and if they are the same, determining that the application to be identified is an illegal application;
if the first specified information of the application to be identified differs from the first specified information of every application in the legal application list and the illegal application list, determining the matching degree between the application to be identified and the basic information and/or characteristic information of each application in the legal application list, and determining the legality of the application to be identified according to the matching degree;
the first specified information includes: an application MD5 value, and/or signature certificate information.
In the foregoing solution, the second determining module is specifically configured to:
presetting one or more items of second specified information within the basic information and/or characteristic information, and determining the information matching degree between each item of second specified information of the application to be identified and the corresponding item of each application in the legal application list;
determining the product of each information matching degree and the preset weight corresponding to it;
determining the average of the sum of the products as the similarity, and comparing the similarity with a preset legal threshold and a preset illegal threshold;
if the similarity is not smaller than the legal threshold, determining that the application to be identified is a legal application; if the similarity is not larger than the illegal threshold, determining that the application to be identified is an illegal application; otherwise, determining that the application to be identified is an application to be distinguished;
the second specified information includes: application name, and/or application package name, and/or configuration file structure tree, and/or source code directory structure tree, and/or resource file MD5 value.
In the above scheme, the basic information includes: application name, and/or signature certificate information, and/or application package name, and/or application tag, and/or application file size, and/or application version number, and/or application MD5 value, and/or application installation file information;
the characteristic information includes: a configuration file directory table, and/or a source code directory structure tree, and/or a configuration file structure tree, and/or a resource file directory table, and/or a resource file MD5 value.
According to the application identification method and device provided by the embodiment of the invention, a legal application list and an illegal application list are established, and the basic information and the characteristic information of each application in the legal application list and the illegal application list are respectively determined; determining the legality of the application to be identified according to whether the basic information of the application to be identified is matched with the basic information of each application in the legal application list and the illegal application list; and when the basic information of the application to be identified is not matched with the basic information of each application in the legal application list and the illegal application list, determining the legality of the application to be identified according to the basic information and/or the characteristic information of the application to be identified and the basic information and/or the characteristic information of each application in the legal application list and/or the illegal application list and a preset rule. Therefore, by establishing the legal application list and the illegal application list, reference information is established for identifying the application, and multiple identification is carried out on the application to be identified, so that the accuracy of identifying the illegal application can be improved, and the efficiency of identifying the illegal application can be improved; and the resource cost of illegal application identification can be reduced.
Drawings
FIG. 1 is a schematic flow chart of an application identification method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of creating a legal application list and an illegal application list according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart illustrating the preliminary identification of an application to be identified according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating further identification of an application to be identified according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating steps for replacing illegal applications according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an application identification device according to an embodiment of the present invention.
Detailed Description
In the embodiment of the invention, a legal application list and an illegal application list are established, and the basic information and the characteristic information of each application in the legal application list and the illegal application list are respectively determined; determining the legality of the application to be identified according to whether the basic information of the application to be identified is matched with the basic information of each application in the legal application list and the illegal application list; and when the basic information of the application to be identified is not matched with the basic information of each application in the legal application list and the illegal application list, determining the legality of the application to be identified according to the basic information and/or the characteristic information of the application to be identified and the basic information and/or the characteristic information of each application in the legal application list and/or the illegal application list according to a preset rule.
Here, in practice, the legal application list may be what is commonly called a whitelist, and the illegal application list may be what is commonly called a blacklist.
The present invention will be described in further detail with reference to examples.
As shown in FIG. 1, the application identification method according to an embodiment of the present invention includes:
step 101: establishing a legal application list and an illegal application list, and respectively determining the basic information and the characteristic information of each application in the legal application list and the illegal application list;
specifically, as shown in fig. 2, the specific steps of establishing the legal application list and the illegal application list may include:
step 1011: reporting all legal applications, or any illegal applications that have been found, through an automated information system or manually;
step 1012: verifying the uniqueness of the MD5 value of each application and verifying the integrity of each application;
step 1013: if the MD5 value is a duplicate or the application is incomplete, refusing entry into the legal application list or the illegal application list and ending the process; otherwise, proceeding to step 1014;
step 1014: extracting the basic information of each application; here, existing extraction methods, such as existing software or commands, may be used to extract the basic information; according to the reported category, the information is entered into either the legal application list or the illegal application list; the basic information entered into the legal application list includes but is not limited to: application name, and/or signature certificate information, and/or application package name, and/or application tag, and/or application file size, and/or application version number, and/or application MD5 value, and/or application installation file information, developer, operator, etc.; the basic information entered into the illegal application list includes but is not limited to: application name, signature certificate information, application package name, application tag, application file size, application version number, application MD5 value, application installation file information, discovery method, download count, description of illegal features, etc.;
step 1015: extracting the characteristic information of each application; the characteristic information may be extracted in depth using reverse-engineering software and, according to the reported category, entered into either the legal application list or the illegal application list; the characteristic information includes: a configuration file directory table, and/or a source code directory structure tree, and/or a configuration file structure tree, and/or a resource file directory table, and/or a resource file MD5 value, and/or a resource file size, and/or a configuration file structure index table, and/or a resource file structure index table, and/or a library file, and/or a Software Development Kit (SDK) file structure, and/or a resource file structure, etc.; wherein the source code directory structure tree includes: source file directory structure, and/or source file size, and/or source file key functions.
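Steps 1011 to 1014 can be sketched as a small registry that rejects duplicate or incomplete submissions. This is an illustration under stated assumptions: the class and method names are hypothetical, and "integrity" is assumed here to mean a non-empty package whose MD5 value equals the digest declared by the reporter, since the source does not define the integrity check.

```python
import hashlib
from typing import Dict, Optional


class ApplicationRegistry:
    """Holds the legal and illegal application lists, keyed by MD5 value."""

    def __init__(self) -> None:
        self.legal: Dict[str, dict] = {}
        self.illegal: Dict[str, dict] = {}

    def submit(
        self,
        package: bytes,
        category: str,
        basic_info: dict,
        declared_md5: Optional[str] = None,
    ) -> bool:
        """Enter one reported application; return False if it is refused."""
        if category not in ("legal", "illegal"):
            raise ValueError("category must be 'legal' or 'illegal'")
        digest = hashlib.md5(package).hexdigest()
        # Step 1012/1013: integrity check (assumed definition, see lead-in).
        if not package:
            return False
        if declared_md5 is not None and declared_md5.lower() != digest:
            return False
        # Step 1012/1013: the MD5 value must be unique across both lists.
        if digest in self.legal or digest in self.illegal:
            return False
        # Step 1014: store the basic information under the reported category.
        target = self.legal if category == "legal" else self.illegal
        target[digest] = {**basic_info, "md5": digest}
        return True
```

Keying both lists by MD5 value makes the uniqueness check in step 1012 a constant-time lookup.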
In practical applications, the application identification may be performed by a terminal in which the application to be identified is installed or an external terminal through data connection to the terminal in which the application to be identified is installed.
Step 102: determining the legality of the application to be identified according to whether the basic information of the application to be identified is matched with the basic information of each application in the legal application list and the illegal application list;
firstly, comparing basic information of an application to be identified, and if the basic information of the application to be identified is matched with the basic information of the application in a legal application list, determining that the application to be identified is a legal application; if the basic information of the application to be identified is matched with the basic information of the application in the illegal application list, determining that the application to be identified is an illegal application; here, all the acquired basic information may be compared;
as shown in fig. 3, the specific step of determining the validity of the application to be identified by comparing the basic information includes:
step 1021: extracting basic information of an installed application to be identified, wherein the basic information of the application to be identified comprises: application name, and/or signature certificate information, and/or application package name, and/or application tag, and/or application file size, and/or application version number, and/or application MD5 value, and/or application installation file information, etc.;
step 1022: comparing the basic information of the application to be identified with the basic information of each application in the illegal application list; if it matches the basic information of an application in the illegal application list, determining that the application to be identified is an illegal application and ending the preliminary identification; otherwise, proceeding to step 1023;
step 1023: comparing the basic information of the application to be identified with the basic information of each application in the legal application list; if it matches the basic information of an application in the legal application list, determining that the application to be identified is a legal application and ending the preliminary identification; otherwise, proceeding to step 1024;
step 1024: if the basic information of the application to be identified matches no application in either the legal application list or the illegal application list, proceeding to the next step for a deeper comparison.
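Steps 1021 to 1024 amount to two list lookups in a fixed order. In the sketch below, the function names and the dictionary representation are hypothetical, and a "match" is assumed to mean that every basic-information field present in both records is equal; the source says only that all acquired basic information may be compared.

```python
from typing import Iterable, Mapping, Optional


def basic_info_matches(candidate: Mapping[str, object], reference: Mapping[str, object]) -> bool:
    """Assumed rule: all fields present in both records must be equal."""
    shared = candidate.keys() & reference.keys()
    return bool(shared) and all(candidate[name] == reference[name] for name in shared)


def preliminary_identify(
    candidate: Mapping[str, object],
    legal_list: Iterable[Mapping[str, object]],
    illegal_list: Iterable[Mapping[str, object]],
) -> Optional[str]:
    """Return "illegal", "legal", or None when a deeper comparison is needed."""
    # Step 1022: the illegal application list is consulted first.
    if any(basic_info_matches(candidate, entry) for entry in illegal_list):
        return "illegal"
    # Step 1023: then the legal application list.
    if any(basic_info_matches(candidate, entry) for entry in legal_list):
        return "legal"
    # Step 1024: no match in either list.
    return None
```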
Step 103: when the basic information of the application to be identified is not matched with the basic information of each application in the legal application list and the illegal application list, determining the legality of the application to be identified according to the basic information and/or the characteristic information of the application to be identified and the basic information and/or the characteristic information of each application in the legal application list and/or the illegal application list according to a preset rule;
specifically, an application to be identified that has completed the basic information comparison but could not be identified can be identified further. First, the first specified information in the basic information and/or the characteristic information of the application to be identified is compared with that of each application in the legal application list; if they are the same, the application to be identified is determined to be a legal application. The first specified information of the application to be identified is likewise compared with that of each application in the illegal application list; if they are the same, the application to be identified is determined to be an illegal application. Here, the first specified information may be one or more pieces of uniquely identifying information selected by a user from the basic information or the characteristic information, such as the MD5 information or the signature certificate information of an application; comparing the first specified information can thus determine the legality of the application to be identified;
if the legality of the application to be identified still cannot be determined after the first specified information is compared, the matching degree between the application to be identified and the basic information and/or the characteristic information of each application in the legal application list is determined, and the legality is determined according to that matching degree. One or more pieces of second specified information can be preset in the basic information and the characteristic information; for each piece, the information matching degree between the application to be identified and an application in the legal application list is determined and multiplied by the preset weight corresponding to that matching degree; the average value of the sum of these products is taken as the similarity, which is compared with a preset legal threshold and a preset illegal threshold. If the similarity is not smaller than the legal threshold, the application to be identified is determined to be a legal application; if the similarity is not larger than the illegal threshold, it is determined to be an illegal application; otherwise, it is determined to be an application to be distinguished. The second specified information may include information such as: application name, and/or application package name, and/or configuration file structure tree, and/or source code directory structure tree, and/or resource file MD5 value;
here, the application to be identified may be compared against every application in the legal application list, and the highest resulting similarity compared with the thresholds; alternatively, according to the basic information comparison result, only one or more of the legal application list applications from that result are used for the matching degree comparison;
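The weighted matching and threshold comparison described above can be sketched in Python as follows. This is a minimal illustration and not the patented implementation: the function names, the example field names, weights and thresholds, and the normalization by total weight are all assumptions made for the example.

```python
from typing import Dict


def weighted_similarity(match: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-item matching degrees (each in 0..1) into one similarity.

    Assumption: the similarity is the sum of (matching degree x weight)
    over the second specified information items, divided by the total weight.
    """
    total_weight = sum(weights[key] for key in match)
    if total_weight == 0:
        return 0.0
    return sum(match[key] * weights[key] for key in match) / total_weight


def classify(similarity: float, legal_threshold: float, illegal_threshold: float) -> str:
    """Apply the two preset thresholds to a similarity value."""
    if similarity >= legal_threshold:      # not smaller than the legal threshold
        return "legal"
    if similarity <= illegal_threshold:    # not larger than the illegal threshold
        return "illegal"
    return "to be distinguished"


# Hypothetical matching degrees and weights for three items.
match = {"app_name": 1.0, "package_name": 0.5, "config_tree": 0.8}
weights = {"app_name": 0.2, "package_name": 0.3, "config_tree": 0.5}
score = weighted_similarity(match, weights)
verdict = classify(score, legal_threshold=0.8, illegal_threshold=0.4)
```

When the application is compared against several legal list applications, the same two functions could be applied to each candidate and the highest score passed to `classify`.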
specifically, as shown in fig. 4, the step of further identifying the application to be identified by deeply comparing the basic information and the feature information includes:
step 10201: judging whether the MD5 value of the application to be identified is the same as an MD5 value in the illegal application list application library; if so, the application to be identified is an illegal application and the identification process ends; otherwise, going to step 10202; because the MD5 value is unique, an application to be identified whose basic information does not match completely can be judged by the MD5 value alone;
step 10202: judging whether the signature certificate information of the application to be identified is consistent with the signature certificate of an application in the illegal application list application library; if so, the application to be identified and the illegal application list application were issued by the same developer or operator, the application to be identified is an illegal application, and the identification process ends; otherwise, going to step 10203; because the signature certificate information indicates the developer, and applications developed by the same developer are generally considered to have the same legality, an application to be identified whose basic information does not match completely can be judged by the signature certificate information alone;
step 10203: judging whether the MD5 value of the application to be identified is the same as an MD5 value in the legal application list application library; if so, the application to be identified is a legal application list application and therefore a legal application, and the identification process ends; otherwise, going to step 10204;
step 10204: judging whether the signature certificate information of the application to be identified is consistent with the signature certificate of an application in the legal application list application library; if so, the application to be identified and the legal application list application were issued by the same developer or operator, the application to be identified is a legal application, and the identification process ends; otherwise, going to step 10205. Before or after the signature certificates are compared, the application name and package name in the basic information can also be compared with the application names and package names in the legal application list to identify the application further; if the application name, the package name and the certificate are all the same, the application may be an upgraded or historical version of an application in the legal application list and is a legal application; if the name and the package name differ but the signature certificate is the same, the application to be identified and the legal application list application were issued by the same developer, and the application is treated as a legal application list application and is a legal application;
step 10205: calculating similarity values of the application name to be identified and the application name in the legal application list by adopting a keyword matching algorithm; performing word segmentation calculation on application names with a plurality of words, wherein a group with the highest similarity value can be selected;
step 10206: calculating similarity values of the application package names to be identified and the application package names in the legal application list by adopting a keyword matching algorithm; during package name similarity analysis, common keywords such as com, cn, org, android, ios, java, lang, string and the like can be removed, and the accuracy of similarity value results is improved;
step 10207: analyzing the similarity between the configuration file feature tree of the application to be identified and the configuration file feature trees of the applications in the legal application list; for Android applications, the analysis mainly covers the four major component structures (Service, Intent, Content and Activity) declared in the Android manifest file; for iOS, the component structure analysis is skipped;
step 10208: analyzing the similarity between a source code directory structure tree of an application to be identified and an application source code directory structure tree in a legal application list, wherein the analysis on the similarity of the source code directory structure tree can be analyzed from three dimensions of the similarity of a source code file directory structure, the similarity of the size of a source code file and the similarity of a key function of the source code file, and a final similarity value is calculated according to weight;
step 10209: analyzing the similarity between the MD5 values of the resource files under the resource file directory of the application to be identified and the resource file MD5 values of the applications in the legal application list; most illegal applications implant source code, add configuration files and tamper with some resource files, so the original resource files change only slightly;
step 10210: calculating the similarity value of the application to be identified and the application in the legal application list application library according to the weight occupied by each similarity set by the algorithm rule; the calculation rule can be expressed by expression (1):
[Expression (1): equation image not reproduced in this text]
wherein S represents the overall similarity; N represents the application name similarity and w1 the corresponding application name similarity weight; P represents the application package name similarity and w2 the corresponding application package name similarity weight; Ci represents the similarity of a single configuration file structure and w3 the corresponding configuration file structure similarity weight; Si represents the similarity of a single source code file and w4 the corresponding source code structure similarity weight; Ai represents the similarity of a single resource file MD5 value and w5 the corresponding resource file MD5 value similarity weight; n is the number of configuration file structures (i runs from 1 to n), m is the number of source code files (i runs from 1 to m), and q is the number of resource files (i runs from 1 to q); the weights can be dynamically adjusted in the algorithm model to achieve the best recognition effect;
step 10211: comparing the similarity value obtained above with the legal and illegal application similarity thresholds set by the algorithm rule; if the similarity is not smaller than the legal threshold, the application to be identified is determined to be a legal application; if the similarity is not larger than the illegal threshold, it is determined to be an illegal application; otherwise, it is determined to be an application to be distinguished; namely:
if S >= LL: the application is judged to be legal;
if UL < S < LL: the application is judged to be suspected illegal;
if S <= UL: the application is judged to be illegal;
wherein S is the overall similarity calculated according to expression (1); LL is the threshold for judging a legal application, and UL is the threshold for judging an illegal application;
step 10212: a prompt can be sent for suspected illegal application, and the validity of the suspected illegal application is further confirmed manually;
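The first four checks of fig. 4 (steps 10201 to 10204) form a short cascade, which can be sketched in Python as follows. The `AppRecord` type and the function name are hypothetical, and the records are reduced to the two fields the cascade needs; this is an illustration under those assumptions rather than the patented implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AppRecord:
    md5: str   # MD5 value of the application package
    cert: str  # signature certificate information (e.g. a fingerprint)


def check_first_specified_info(app: AppRecord,
                               legal: Iterable[AppRecord],
                               illegal: Iterable[AppRecord]) -> Optional[str]:
    """Return "illegal", "legal", or None when the cascade cannot decide."""
    legal, illegal = list(legal), list(illegal)
    if any(app.md5 == rec.md5 for rec in illegal):    # step 10201
        return "illegal"
    if any(app.cert == rec.cert for rec in illegal):  # step 10202
        return "illegal"
    if any(app.md5 == rec.md5 for rec in legal):      # step 10203
        return "legal"
    if any(app.cert == rec.cert for rec in legal):    # step 10204
        return "legal"
    return None  # continue with the similarity analysis (step 10205 onward)


# Hypothetical list contents for illustration.
legal_list = [AppRecord(md5="aaa", cert="cert-good")]
illegal_list = [AppRecord(md5="bbb", cert="cert-bad")]
```

Because the illegal list is checked first, an application whose certificate appears in both lists is reported as illegal, which follows the step order given above.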
further, the method provided by the embodiment of the present invention further includes: deleting an illegal application or replacing the illegal application with a legal application; as shown in fig. 5, the step of replacing the illegal application with the legal application may include:
step 501: selecting, on a user interface, a piece of software that has been identified as an illegal application, and displaying the information of the illegal application in an information display area; the displayed information includes, but is not limited to: name, icon, version, size, etc.;
step 502: triggering the illegal application replacement operation from the user interface; in an Android system, the identified illegal application on the connected intelligent terminal is uninstalled through an Android Debug Bridge (ADB) command and replaced with the corresponding legal application list application in the legal application list;
step 503: sending an illegal application uninstalling instruction to the intelligent terminal device through a message command;
step 504: after receiving the instruction to uninstall the specified illegal application, the intelligent terminal device executes the application uninstalling instruction and uninstalls the illegal application to be deleted;
step 505: sending a legal application installation instruction;
step 506: and after receiving the application installation instruction, executing installation command operation on the corresponding application software program package, and ending the process.
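For an Android terminal connected over ADB, the uninstall and install commands of the replacement flow might be assembled as in the following Python sketch. The function names, package name, APK path and device serial are hypothetical; only the standard `adb -s <serial> uninstall <package>` and `adb install <apk>` commands are relied upon, and the sketch is an illustration rather than the patented implementation.

```python
import subprocess
from typing import List, Optional


def build_replace_commands(package_name: str,
                           legal_apk_path: str,
                           serial: Optional[str] = None) -> List[List[str]]:
    """Build the ADB commands that uninstall an illegal application
    and then install the corresponding legal application package."""
    base = ["adb"] + (["-s", serial] if serial else [])
    return [
        base + ["uninstall", package_name],  # remove the illegal application
        base + ["install", legal_apk_path],  # install the legal application
    ]


def replace_illegal_app(package_name: str,
                        legal_apk_path: str,
                        serial: Optional[str] = None) -> None:
    """Run the commands in order; raises CalledProcessError on failure.
    Requires the adb tool and a connected device, so it is not called here."""
    for command in build_replace_commands(package_name, legal_apk_path, serial):
        subprocess.run(command, check=True)


commands = build_replace_commands("com.example.fake", "legal.apk", serial="ABC123")
```

Keeping command construction separate from execution lets the command list be inspected or logged before anything is changed on the terminal.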
As shown in fig. 6, the apparatus for identifying an application according to an embodiment of the present invention includes: a setting module 61, a first determining module 62 and a second determining module 63; wherein,
the setting module 61 is configured to establish a legal application list and an illegal application list, and determine basic information and feature information of each application in the legal application list and the illegal application list respectively;
specifically, as shown in fig. 2, the specific steps of the setting module 61 establishing the legal application list and the illegal application list may include:
step 1011: reporting all legal applications or found illegal applications through an automatic information system or a manual mode;
step 1012: the uniqueness of the MD5 value of each application can be verified, and the integrity of each application can be verified;
step 1013: if the MD5 value is repeated or the application is incomplete, refusing to enter a legal application list or an illegal application list, ending the process, otherwise, entering the step 1014;
step 1014: extracting the basic information of each application; here, existing methods for extracting application basic information, such as existing software or instructions, may be used; according to the reported category, the application is entered into the legal application list or the illegal application list; the basic information entered into the legal application list includes, but is not limited to: application name, and/or signature certificate information, and/or application package name, and/or application tag, and/or application file size, and/or application version number, and/or application MD5 value, and/or application installation file information, developer, operator, etc.; the basic information entered into the illegal application list includes, but is not limited to: application name, signature certificate information, application package name, application tag, application file size, application version number, application MD5 value, application installation file information, discovery mode, download amount, illegal feature description, etc.;
step 1015: extracting the characteristic information of the application; the characteristic information can be extracted in depth by reverse-engineering the application software, and is entered into the legal application list or the illegal application list according to the reported category; the characteristic information includes: a configuration file directory table, and/or a source code directory structure tree, and/or a configuration file structure tree, and/or a resource file directory table, and/or a resource file MD5 value, and/or a resource file size, and/or a configuration file structure index table, and/or a resource file structure index table, and/or a library file, and/or an SDK file structure, and/or a resource file structure, etc.; wherein the source code directory structure tree comprises: a source code file directory structure, and/or a source code file size, and/or a source code file key function;
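The MD5 uniqueness check and list entry of steps 1012 to 1014 can be sketched in Python as follows. The in-memory dictionary, the field names and the empty-package test standing in for the integrity check are assumptions made for illustration; a real system would store far more basic information per entry.

```python
import hashlib
from typing import Dict


def md5_of(data: bytes) -> str:
    """Hex MD5 digest of an application package's bytes."""
    return hashlib.md5(data).hexdigest()


def register(app_list: Dict[str, dict], name: str, package_bytes: bytes) -> bool:
    """Enter an application into a list keyed by MD5 value.

    Returns False (entry refused) when the package is empty, which stands in
    for the integrity check, or when its MD5 value is already present.
    """
    if not package_bytes:
        return False
    digest = md5_of(package_bytes)
    if digest in app_list:
        return False
    app_list[digest] = {"name": name, "md5": digest, "size": len(package_bytes)}
    return True


legal_apps: Dict[str, dict] = {}
first = register(legal_apps, "Example App", b"example package contents")
second = register(legal_apps, "Example App copy", b"example package contents")
```

Here `first` is accepted and `second` is refused because the same bytes produce the same MD5 value.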
in practical applications, the application identification may be performed by a terminal in which the application to be identified is installed or an external terminal through data connection to the terminal in which the application to be identified is installed.
The first determining module 62 is configured to determine the legality of the application to be identified according to whether the basic information of the application to be identified matches the basic information of each application in the legal application list and the illegal application list;
first, the first determining module 62 may compare the basic information of the application to be identified: if it matches the basic information of an application in the legal application list, the application to be identified is determined to be a legal application; if it matches the basic information of an application in the illegal application list, the application to be identified is determined to be an illegal application; here, all the acquired basic information may be compared;
as shown in fig. 3, the specific step of determining the validity of the application to be identified by comparing the basic information includes:
step 1021: extracting basic information of an installed application to be identified, wherein the basic information of the application to be identified comprises: application name, and/or signature certificate information, and/or application package name, and/or application tag, and/or application file size, and/or application version number, and/or application MD5 value, and/or application installation file information, etc.;
step 1022: comparing the basic information of the application to be identified with the basic information of each application in an illegal application list, if the basic information of the application to be identified is matched with the basic information of the application in the illegal application list, determining that the application to be identified is the illegal application, and finishing the primary identification; otherwise, go to step 1023;
step 1023: comparing the basic information of the application to be identified with the basic information of each application in a legal application list, if the basic information of the application to be identified is matched with the basic information of the application in the legal application list, determining that the application to be identified is the legal application, and finishing the primary identification; otherwise, go to step 1024;
step 1024: if the basic information of the application to be identified matches neither the applications in the legal application list nor those in the illegal application list, proceeding to the next step for a deeper comparison.
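The preliminary identification of steps 1021 to 1024 can be sketched in Python as follows. The field names in `BASIC_FIELDS`, the dictionary representation and the rule that every listed field must be equal are assumptions for illustration; the text above only states that all acquired basic information may be compared.

```python
from typing import Dict, Iterable, Optional

# Hypothetical subset of basic information fields used for the comparison.
BASIC_FIELDS = ("name", "cert", "package", "version", "md5")


def basic_info_matches(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """True when every compared basic information field is equal."""
    return all(a.get(field) == b.get(field) for field in BASIC_FIELDS)


def preliminary_identify(app: Dict[str, str],
                         illegal_list: Iterable[Dict[str, str]],
                         legal_list: Iterable[Dict[str, str]]) -> Optional[str]:
    """Check the illegal list first (step 1022), then the legal list (step 1023)."""
    if any(basic_info_matches(app, entry) for entry in illegal_list):
        return "illegal"
    if any(basic_info_matches(app, entry) for entry in legal_list):
        return "legal"
    return None  # step 1024: hand over to the deeper comparison


known_good = {"name": "Bank", "cert": "c1", "package": "com.example.bank",
              "version": "1.0", "md5": "aaa"}
repackaged = dict(known_good, md5="bbb")  # same metadata, different package bytes
```

With these values, `known_good` matches the legal list entry exactly, while `repackaged` matches nothing and falls through to the deeper comparison.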
The second determining module 63 is configured to determine, according to a preset rule, validity of the application to be identified according to the basic information and/or the feature information of the application to be identified and the basic information and/or the feature information of each application in the legal application list and/or the illegal application list when the basic information of the application to be identified is not matched with the basic information of each application in the legal application list and the illegal application list;
specifically, the second determining module 63 may further identify an application to be identified that has completed the basic information comparison but could not be identified. First, the first specified information in the basic information and/or the characteristic information of the application to be identified is compared with that of each application in the legal application list; if they are the same, the application to be identified is determined to be a legal application. The first specified information of the application to be identified is likewise compared with that of each application in the illegal application list; if they are the same, the application to be identified is determined to be an illegal application. Here, the first specified information may be one or more pieces of uniquely identifying information selected by a user from the basic information or the characteristic information, such as the MD5 information or the signature certificate information of an application; comparing the first specified information can thus determine the legality of the application to be identified;
if the legality of the application to be identified still cannot be determined after the first specified information is compared, the matching degree between the application to be identified and the basic information and/or the characteristic information of each application in the legal application list is determined, and the legality is determined according to that matching degree. One or more pieces of second specified information can be preset in the basic information and the characteristic information; for each piece, the information matching degree between the application to be identified and an application in the legal application list is determined and multiplied by the preset weight corresponding to that matching degree; the average value of the sum of these products is taken as the similarity, which is compared with a preset legal threshold and a preset illegal threshold. If the similarity is not smaller than the legal threshold, the application to be identified is determined to be a legal application; if the similarity is not larger than the illegal threshold, it is determined to be an illegal application; otherwise, it is determined to be an application to be distinguished. The second specified information may include information such as: application name, and/or application package name, and/or configuration file structure tree, and/or source code directory structure tree, and/or resource file MD5 value;
here, the application to be identified may be compared against every application in the legal application list, and the highest resulting similarity compared with the thresholds; alternatively, according to the basic information comparison result, only one or more of the legal application list applications from that result are used for the matching degree comparison;
specifically, as shown in fig. 4, the step of further identifying the application to be identified by deeply comparing the basic information and the feature information includes:
step 10201: judging whether the MD5 value of the application to be identified is the same as an MD5 value in the illegal application list application library; if so, the application to be identified is an illegal application and the identification process ends; otherwise, going to step 10202; because the MD5 value is unique, an application to be identified whose basic information does not match completely can be judged by the MD5 value alone;
step 10202: judging whether the signature certificate information of the application to be identified is consistent with the signature certificate of an application in the illegal application list application library; if so, the application to be identified and the illegal application list application were issued by the same developer or operator, the application to be identified is an illegal application, and the identification process ends; otherwise, going to step 10203; because the signature certificate information indicates the developer, and applications developed by the same developer are generally considered to have the same legality, an application to be identified whose basic information does not match completely can be judged by the signature certificate information alone;
step 10203: judging whether the MD5 value of the application to be identified is the same as an MD5 value in the legal application list application library; if so, the application to be identified is a legal application list application and therefore a legal application, and the identification process ends; otherwise, going to step 10204;
step 10204: judging whether the signature certificate information of the application to be identified is consistent with the signature certificate of an application in the legal application list application library; if so, the application to be identified and the legal application list application were issued by the same developer or operator, the application to be identified is a legal application, and the identification process ends; otherwise, going to step 10205. Before or after the signature certificates are compared, the application name and package name in the basic information can also be compared with the application names and package names in the legal application list to identify the application further; if the application name, the package name and the certificate are all the same, the application may be an upgraded or historical version of an application in the legal application list and is a legal application; if the name and the package name differ but the signature certificate is the same, the application to be identified and the legal application list application were issued by the same developer, and the application is treated as a legal application list application and is a legal application;
step 10205: calculating similarity values of the application name to be identified and the application name in the legal application list by adopting a keyword matching algorithm; performing word segmentation calculation on application names with a plurality of words, wherein a group with the highest similarity value can be selected;
step 10206: calculating similarity values of the application package names to be identified and the application package names in the legal application list by adopting a keyword matching algorithm; during package name similarity analysis, common keywords such as com, cn, org, android, ios, java, lang, string and the like can be removed, and the accuracy of similarity value results is improved;
step 10207: analyzing the similarity between the configuration file feature tree of the application to be identified and the configuration file feature trees of the applications in the legal application list; for Android applications, the analysis mainly covers the four major component structures (Service, Intent, Content and Activity) declared in the Android manifest file; for iOS, the component structure analysis is skipped;
step 10208: analyzing the similarity between a source code directory structure tree of an application to be identified and an application source code directory structure tree in a legal application list, wherein the analysis on the similarity of the source code directory structure tree can be analyzed from three dimensions of the similarity of a source code file directory structure, the similarity of the size of a source code file and the similarity of a key function of the source code file, and a final similarity value is calculated according to weight;
step 10209: analyzing the similarity between the MD5 values of the resource files under the resource file directory of the application to be identified and the resource file MD5 values of the applications in the legal application list; most illegal applications implant source code, add configuration files and tamper with some resource files, so the original resource files change only slightly;
step 10210: calculating the similarity value between the application to be identified and the applications in the legal application list application library according to the weight assigned to each similarity by the algorithm rule; the calculation rule may be represented by expression (1); wherein S represents the overall similarity; N represents the application name similarity and w1 the corresponding application name similarity weight; P represents the application package name similarity and w2 the corresponding application package name similarity weight; Ci represents the similarity of a single configuration file structure and w3 the corresponding configuration file structure similarity weight; Si represents the similarity of a single source code file and w4 the corresponding source code structure similarity weight; Ai represents the similarity of a single resource file MD5 value and w5 the corresponding resource file MD5 value similarity weight; n is the number of configuration file structures (i runs from 1 to n), m is the number of source code files (i runs from 1 to m), and q is the number of resource files (i runs from 1 to q); the weights can be dynamically adjusted in the algorithm model to achieve the best recognition effect;
step 10211: comparing the similarity value obtained above with the legal and illegal application similarity thresholds set by the algorithm rule; if the similarity is not smaller than the legal threshold, the application to be identified is determined to be a legal application; if the similarity is not larger than the illegal threshold, it is determined to be an illegal application; otherwise, it is determined to be an application to be distinguished; namely:
if S >= LL: the application is judged to be legal;
if UL < S < LL: the application is judged to be suspected illegal;
if S <= UL: the application is judged to be illegal;
wherein S is the overall similarity calculated according to expression (1); LL is the threshold for judging a legal application, and UL is the threshold for judging an illegal application;
step 10212: a prompt can be sent for suspected illegal application, and the validity of the suspected illegal application is further confirmed manually;
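The package name comparison of step 10206, with common keywords removed before matching, can be sketched in Python as follows. The text does not specify the keyword matching algorithm, so the Jaccard overlap of the remaining keywords is used here purely as an assumed stand-in; the function names and example package names are hypothetical.

```python
import re
from typing import Set

# Common keywords named in step 10206 that carry little identifying value.
COMMON_KEYWORDS = {"com", "cn", "org", "android", "ios", "java", "lang", "string"}


def package_keywords(package_name: str) -> Set[str]:
    """Split a package name on dots and underscores and drop common keywords."""
    tokens = re.split(r"[._]", package_name.lower())
    return {token for token in tokens if token and token not in COMMON_KEYWORDS}


def package_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the remaining keywords, in the range 0..1."""
    keys_a, keys_b = package_keywords(a), package_keywords(b)
    if not keys_a and not keys_b:
        return 0.0
    return len(keys_a & keys_b) / len(keys_a | keys_b)


same_vendor = package_similarity("com.example.bank", "cn.example.bank.android")
different_app = package_similarity("com.example.bank", "com.example.pay")
```

Removing the common keywords keeps two package names that differ only in prefixes such as `com` or `cn` from being scored as dissimilar, and keeps unrelated names that merely share those prefixes from being scored as similar.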
further, the apparatus provided in the embodiment of the present invention further includes: a replacement module 64 for deleting an illegal application or replacing the illegal application with a legitimate application; as shown in fig. 5, the step of replacing the illegal application with the legal application by the replacing module 64 may include:
step 501: selecting, on a user interface, a piece of software that has been identified as an illegal application, and displaying the information of the illegal application in an information display area; the displayed information includes, but is not limited to: name, icon, version, size, etc.;
step 502: triggering the illegal application replacement operation from the user interface; in an Android system, the identified illegal application on the connected intelligent terminal is uninstalled through an ADB command and replaced with the corresponding legal application list application in the legal application list;
step 503: sending an illegal application uninstalling instruction to the intelligent terminal device through a message command;
step 504: after receiving the instruction to uninstall the specified illegal application, the intelligent terminal device executes the application uninstalling instruction and uninstalls the illegal application to be deleted;
step 505: sending a legal application installation instruction;
step 506: and after receiving the application installation instruction, executing installation command operation on the corresponding application software program package, and ending the process.
In practical applications, the setting module 61, the first determining module 62, the second determining module 63, and the replacing module 64 can be implemented by a Central Processing Unit (CPU), a Microprocessor (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like in a terminal.
The above description is only exemplary of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (6)

1. A method for identifying an application, the method comprising: establishing a legal application list and an illegal application list, and respectively determining the basic information and the characteristic information of each application in the legal application list and the illegal application list; the method further comprises the following steps:
determining the legality of the application to be identified according to whether the basic information of the application to be identified is matched with the basic information of each application in the legal application list and the illegal application list; the basic information includes at least one of: application name, signature certificate information, application package name, application tag, application file size, application version number, application MD5 value and application installation file information;
when the basic information of the application to be identified does not match the basic information of each application in the legal application list and the illegal application list, determining the legality of the application to be identified according to a preset rule, based on the feature information of the application to be identified and the feature information of each application in the legal application list and the illegal application list, which comprises: comparing the first specified information in the feature information of the application to be identified with the first specified information in the feature information of each application in the legal application list, and if they are the same, determining that the application to be identified is a legal application; comparing the first specified information in the feature information of the application to be identified with the first specified information in the feature information of each application in the illegal application list, and if they are the same, determining that the application to be identified is an illegal application; if the first specified information of the application to be identified differs from the first specified information of each application in the legal application list and the illegal application list, determining the matching degree between the feature information of the application to be identified and the feature information of each application in the legal application list, and determining the legality of the application to be identified according to the matching degree; the feature information includes at least one of: a configuration file directory table, a source code directory structure tree, a configuration file structure tree, a resource file directory table, and a resource file MD5 value; the source code directory structure tree includes at least one of: a source code file directory structure, a source code file size, and a source code file key function; the first specified information includes: a configuration file directory table and/or a resource file directory table.
2. The method according to claim 1, wherein determining the matching degree between the feature information of the application to be identified and the feature information of each application in the legal application list, and determining the legality of the application to be identified according to the matching degree, comprises:
presetting one or more items of second specified information in the feature information, and determining the information matching degree between each item of second specified information of the application to be identified and the corresponding item of second specified information of each application in the legal application list;
determining the product of each information matching degree and a preset weight corresponding to the information matching degree;
determining the average value of the sum of the products as the similarity, and comparing the similarity with a preset legal threshold and a preset illegal threshold;
if the similarity is not smaller than the legal threshold, determining that the application to be identified is a legal application; if the similarity is not larger than the illegal threshold, determining that the application to be identified is an illegal application; otherwise, determining that the application to be identified is an application to be distinguished;
the second specified information includes: a configuration file structure tree, and/or a source code directory structure tree, and/or a resource file MD5 value.
3. The method of claim 1, wherein determining the feature information of each application in the legal application list and the illegal application list comprises: unpacking and reverse-engineering each application using an automated reverse-engineering technique to obtain the feature information of each application.
4. The method of claim 1, further comprising: deleting an illegal application or replacing the illegal application with a legitimate application.
5. An apparatus for identifying an application, the apparatus comprising: a setting module, a first determining module, and a second determining module; wherein:
the setting module is used for establishing a legal application list and an illegal application list, and for determining the basic information and the feature information of each application in the legal application list and in the illegal application list; the basic information includes at least one of: an application name, signature certificate information, an application package name, an application mark, an application file size, an application version number, an application MD5 value, and application installation file information;
the first determining module is used for determining the legality of the application to be identified according to whether the basic information of the application to be identified is matched with the basic information of each application in the legal application list and the illegal application list;
the second determining module is configured to determine, when the basic information of the application to be identified does not match the basic information of each application in the legal application list and the illegal application list, the legality of the application to be identified according to a preset rule, based on the feature information of the application to be identified and the feature information of each application in the legal application list and the illegal application list, which includes: comparing the first specified information in the feature information of the application to be identified with the first specified information in the feature information of each application in the legal application list, and if they are the same, determining that the application to be identified is a legal application; comparing the first specified information in the feature information of the application to be identified with the first specified information in the feature information of each application in the illegal application list, and if they are the same, determining that the application to be identified is an illegal application; if the first specified information of the application to be identified differs from the first specified information of each application in the legal application list and the illegal application list, determining the matching degree between the feature information of the application to be identified and the feature information of each application in the legal application list, and determining the legality of the application to be identified according to the matching degree; the feature information includes at least one of: a configuration file directory table, a source code directory structure tree, a configuration file structure tree, a resource file directory table, and a resource file MD5 value; the source code directory structure tree includes at least one of: a source code file directory structure, a source code file size, and a source code file key function; the first specified information includes: a configuration file directory table and/or a resource file directory table.
6. The apparatus of claim 5, wherein the second determining module is specifically configured to:
presetting one or more items of second specified information in the feature information, and determining the information matching degree between each item of second specified information of the application to be identified and the corresponding item of second specified information of each application in the legal application list;
determining the product of each information matching degree and a preset weight corresponding to the information matching degree;
determining the average value of the sum of the products as the similarity, and comparing the similarity with a preset legal threshold and a preset illegal threshold;
if the similarity is not smaller than the legal threshold, determining that the application to be identified is a legal application; if the similarity is not larger than the illegal threshold, determining that the application to be identified is an illegal application; otherwise, determining that the application to be identified is an application to be distinguished;
the second specified information includes: a configuration file structure tree, and/or a source code directory structure tree, and/or a resource file MD5 value.
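The cascade recited in claims 1 and 2 (basic information first, then first specified information, then a weighted similarity compared against two thresholds) can be sketched as below. This is an illustrative sketch under stated assumptions, not the claimed implementation: the `App` structure, the use of exact equality for "matching", the Jaccard ratio as the information matching degree, and taking the best similarity over all legal applications are all assumptions that the claims do not specify.

```python
from dataclasses import dataclass


@dataclass
class App:
    """Hypothetical record of one application's extracted information."""
    basic: tuple      # e.g. (package name, signature certificate, MD5 value)
    first: frozenset  # first specified information, e.g. resource file directory table
    second: dict      # second specified information: item name -> set of entries


def matching_degree(a: set, b: set) -> float:
    """Assumed matching degree: Jaccard ratio (1.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity(app: App, ref: App, weights: dict) -> float:
    """Average of (matching degree x preset weight) over the weighted items."""
    products = [
        weight * matching_degree(app.second.get(name, set()), ref.second.get(name, set()))
        for name, weight in weights.items()
    ]
    return sum(products) / len(products)


def identify(app, legal, illegal, weights, legal_threshold, illegal_threshold) -> str:
    # Stage 1: basic information (exact equality is an assumption).
    if any(app.basic == ref.basic for ref in legal):
        return "legal"
    if any(app.basic == ref.basic for ref in illegal):
        return "illegal"
    # Stage 2: first specified information in the feature information.
    if any(app.first == ref.first for ref in legal):
        return "legal"
    if any(app.first == ref.first for ref in illegal):
        return "illegal"
    # Stage 3: weighted similarity against the legal list only; using the
    # best-matching legal application is an assumption.
    score = max((similarity(app, ref, weights) for ref in legal), default=0.0)
    if score >= legal_threshold:
        return "legal"
    if score <= illegal_threshold:
        return "illegal"
    return "to be distinguished"
```

With two equally weighted items, a candidate whose configuration file structure tree is identical to a legal application's but whose resource file MD5 values overlap only partially scores between typical thresholds and falls into the "to be distinguished" category.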
CN201610670627.7A 2016-08-15 2016-08-15 Application identification method and device Active CN107766342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610670627.7A CN107766342B (en) 2016-08-15 2016-08-15 Application identification method and device

Publications (2)

Publication Number Publication Date
CN107766342A CN107766342A (en) 2018-03-06
CN107766342B true CN107766342B (en) 2021-11-23

Family

ID=61259930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610670627.7A Active CN107766342B (en) 2016-08-15 2016-08-15 Application identification method and device

Country Status (1)

Country Link
CN (1) CN107766342B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110955450B (en) * 2019-12-16 2023-09-29 北京智游网安科技有限公司 Attribution statistical method, system and storage medium of application package file
CN111143833B (en) * 2019-12-23 2022-03-11 绿盟科技集团股份有限公司 Illegal application program category identification method and device
CN113312591A (en) * 2021-05-28 2021-08-27 杭州迈冲科技有限公司 Control method and device based on Android system application white list
CN113934625A (en) * 2021-09-18 2022-01-14 深圳市飞泉云数据服务有限公司 Software detection method, device and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7062567B2 (en) * 2000-11-06 2006-06-13 Endeavors Technology, Inc. Intelligent network streaming and execution system for conventionally coded applications
CN102222199B (en) * 2011-06-03 2013-05-08 奇智软件(北京)有限公司 Method and system for identifying identification of application program
CN103544046A (en) * 2013-10-25 2014-01-29 苏州通付盾信息技术有限公司 Mobile application software reinforcement method
CN104123493B (en) * 2014-07-31 2017-09-26 百度在线网络技术(北京)有限公司 The safety detecting method and device of application program
CN105426706B (en) * 2015-11-20 2018-06-15 北京奇虎科技有限公司 Piracy applies detection method and device, system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant