CN106557695B

CN106557695B - Malicious application detection method and system

Info

Publication number: CN106557695B
Application number: CN201510621631.XA
Authority: CN
Inventors: 周建宁; 沈岩; 王巍; 刘志诚
Original assignee: Aspire Digital Technologies Shenzhen Co Ltd
Current assignee: Aspire Digital Technologies Shenzhen Co Ltd
Priority date: 2015-09-25
Filing date: 2015-09-25
Publication date: 2019-05-10
Anticipated expiration: 2035-09-25
Also published as: CN106557695A

Abstract

The present invention relates to a malicious application detection method and system. The method includes: S1, performing static code scanning on a received application to be detected and analyzing, along three dimensions (permission requests, function calls, and information output), whether the application exhibits malicious behavior matching any malicious behavior information in a malicious behavior information library; if such behavior exists, the application is labeled as a suspected malicious application, and otherwise it is labeled as a normal application; S2, performing similarity analysis between each application labeled as a suspected malicious application and the malicious application samples in a malicious application sample library, based on application name, package name, signing certificate, directory structure, text files, and image files, and labeling as a malicious application any application whose similarity meets a set value. The invention avoids the performance bottleneck of loading and executing applications in a virtual machine for analysis, effectively reduces the false positive rate, and improves identification accuracy.

Description

Malicious application detection method and system

Technical field

The present invention relates to mobile Internet technology, and more specifically to a malicious application detection method and system.

Background art

With the spread of mobile smart terminals and the rapid development of mobile Internet services, the number of mobile applications is growing rapidly. The disruptive changes brought about by mobile smart terminals have opened the prelude to the development of the mobile Internet industry. Smart terminals have changed the way people work and live, and the security of mobile applications now faces a severe situation.

The rapid growth of mobile applications has brought with it the large-scale spread of pirated applications, malicious applications, viruses, and the like. Compared with traditional PC terminals, the characteristics of malicious applications on mobile terminals are more pronounced: malicious applications mutate quickly, and a large number of variants appear every day.

By the end of 2014, the number of applications on the Android platform had exceeded 2,000,000, making it the platform with the most applications. Owing to the Android application development model, the situation also differs from that of traditional PC terminals. On PCs, variants of a malicious application are mostly developed and distributed by the original publisher, so the variant cycle is relatively long. Android applications, by contrast, are easy to reverse engineer, and malicious code can readily be recompiled, repackaged, and republished as a variant, so variants are much easier to produce, appear frequently, and have a very short cycle. For the prevention and control of malicious applications on mobile terminals, effectively identifying variants of malicious applications is therefore particularly important.

On traditional PC terminals, variants are identified mainly by three methods:

1. Broad-spectrum signatures, also called gene signatures or gene codes. Gene code detection summarizes the characteristics of a class of malicious applications, so that one gene code can correspond to a whole family of malicious applications. In addition, gene codes can deal effectively with variants, which to some extent compensates for the inability of signature-based detection to handle unknown malicious applications.

However, identification based on broad-spectrum signatures has the following limitations:

(1) It increases the probability of false positives. Gene signature detection tends to classify normal software that happens to contain certain signatures as a threat, so some normal software is falsely reported.

(2) Analyzing and extracting gene signatures is very difficult and requires highly specialized technical staff, and the quality of the extracted signatures strongly affects the final verdict. The method is therefore heavily influenced by human factors, and its effectiveness depends on the skill of the security professionals.

(3) A large number of samples must be analyzed, and until the gene signature has been analyzed and extracted, the spread of a malicious application cannot be countered. Because malicious applications on mobile terminals mutate quickly and spread over short periods, this method cannot effectively solve the problem of detecting and removing them.

2. Heuristic scanning, also called behavior-analysis-based malicious application scanning. It distinguishes malicious applications from normal software by analyzing the behavioral characteristics that differ between the two, so it can also effectively discover unknown malicious applications and the various variants of known ones. To a security expert, the behavior of a malicious application differs greatly from that of an ordinary program: an ordinary program does not, for example, create files in critical system directories, install hooks in the system, or register services indiscriminately. Heuristic scanning uses automated computer analysis to implement some of the analytical reasoning of security experts, and decides whether an application is malicious according to whether its behavior is malicious.

The limitations of heuristic scanning are as follows:

(1) The false positive rate is very high. Software exhibiting the same behavior is not necessarily malicious. For example, reading the address book and sending it to a specified address is not necessarily theft of user information; the software may be a data backup tool.

(2) Operating efficiency is low. The application must be run in a virtual machine while its behavior data is collected and analyzed, which is inefficient. This approach is better suited to background servers; in antivirus tools that interact with the user, it gives a poor user experience.

3. Artificial intelligence (AI) techniques. Through comprehensive behavior analysis, an AI technique learns from malicious applications, continually optimizes its own library of malicious behavior features, and automatically extracts signatures. Starting from an initial set of malicious behavior features, continual optimization and extension yields a better behavior feature library that can cope with various unknown malicious applications and variants. At the same time, the automatically extracted signatures improve the efficiency of detecting and removing known malicious applications.

The main problems of artificial intelligence techniques are as follows:

(1) Artificial intelligence requires continual learning. Only when there are enough malicious application samples can the AI engine complete its learning process and refine its behavior features, so the technique lags behind the spread of malicious applications.

(2) AI algorithm models are very complex, and the characteristics of malicious applications vary widely. Designing a good learning model is extremely difficult, and a single model often cannot meet the needs of all applications.

(3) On mobile terminals, variants of malicious applications are numerous, change quickly, and spread for only a short time. A variant may circulate for only a few days before it disappears and other variants emerge. Under these conditions, artificial intelligence is very inefficient.

In short, under the new circumstances of mobile Internet services, the number of mobile terminals has far exceeded the number of PC terminals, and mobile applications are becoming the most important form of application in the future. The malicious application detection methods of the PC era can no longer meet the need. Only a completely new solution can protect users' interests, safeguard their information security, and promote the sustainable development of the industry chain.

Summary of the invention

The technical problem to be solved by the present invention is, in view of the above drawbacks of the prior art, to provide a malicious application detection method and system that raise the level of automation in detecting variants of malicious mobile applications, reduce the misjudgment rate, and improve the efficiency of discovering unknown variants.

The technical solution adopted by the present invention to solve this technical problem is a malicious application detection method comprising the following steps:

S1, performing static code scanning on a received application to be detected, and analyzing, along the three dimensions of permission requests, function calls, and information output, whether the application exhibits malicious behavior matching any malicious behavior information in a malicious behavior information library; if malicious behavior exists, labeling the application as a suspected malicious application; if not, labeling the application as a normal application;

S2, performing similarity analysis between the application labeled as a suspected malicious application and the malicious application samples in a malicious application sample library, based on application name, package name, signing certificate, directory structure, text files, and image files, and labeling the application as a malicious application if the similarity meets a set value.

According to one embodiment of the present invention, the method further comprises:

S3, storing applications not labeled as malicious applications in step S2 in a misjudgment information library;

S4, based on the results of manual analysis of the applications in the misjudgment information library, labeling those applications in the misjudgment information library that are not malicious as normal applications, storing them in a normal application library, and storing the information of these normal applications in a whitelist library;

S5, based on the results of manual analysis of the applications in the misjudgment information library, labeling those applications in the misjudgment information library that are malicious as malicious applications, storing them in a malicious application library, and also storing them in the malicious application sample library.
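The routing in steps S3 to S5 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Libraries` container, the `apply_manual_verdict` function, the use of a package-name string to identify an application, and the removal of an application from the misjudgment information library once it has been reviewed are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Libraries:
    """Hypothetical in-memory stand-ins for the libraries named in S3 to S5."""
    misjudged: list = field(default_factory=list)   # misjudgment information library
    normal: list = field(default_factory=list)      # normal application library
    whitelist: set = field(default_factory=set)     # whitelist library
    malicious: list = field(default_factory=list)   # malicious application library
    samples: list = field(default_factory=list)     # malicious application sample library


def apply_manual_verdict(libs, app, is_malicious):
    """Route one manually analyzed application (steps S4 and S5)."""
    # Assumption: a reviewed application leaves the misjudgment library.
    libs.misjudged.remove(app)
    if is_malicious:
        # S5: new malicious application, also kept as a sample for later matching.
        libs.malicious.append(app)
        libs.samples.append(app)
    else:
        # S4: normal application, and its information goes to the whitelist.
        libs.normal.append(app)
        libs.whitelist.add(app)
```

Feeding confirmed malicious applications back into the sample library and confirmed normal applications into the whitelist is what lets later scans benefit from each manual decision.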

According to one embodiment of the present invention, step S1 further comprises:

S11, decompiling the received application to be detected into code files and the corresponding permission configuration file and resource files, and parsing out the application name, package name, signing certificate, and directory structure of the application;

S12, calling a reachability matrix model to search and analyze, along the three dimensions of permission requests, function calls, and information output, the code files, permission configuration file, and resource files produced by decompilation for malicious behavior matching any malicious behavior information in the malicious behavior information library, wherein the reachability matrix model is generated in advance from the malicious behavior information library and the whitelist library;

S13, labeling applications in which malicious behavior exists as suspected malicious applications and storing them in a suspected malicious application library, and labeling applications in which no malicious behavior exists as normal applications and storing them in the normal application library.
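The three-dimension check in steps S12 and S13 can be illustrated with a simplified rule matcher. The sketch below is an assumption-laden stand-in: it replaces the reachability matrix model, whose construction the text does not detail, with plain set containment, and the `BehaviorRule` and `AppFacts` types, the rule contents, and the use of the package name as the whitelist key are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BehaviorRule:
    """Hypothetical malicious behavior entry spanning the three dimensions."""
    name: str
    permissions: frozenset   # permission requests the behavior needs
    calls: frozenset         # function calls the behavior makes
    outputs: frozenset       # information the behavior outputs


@dataclass
class AppFacts:
    """Facts assumed to have been extracted by decompilation (step S11)."""
    package: str
    permissions: set = field(default_factory=set)
    calls: set = field(default_factory=set)
    outputs: set = field(default_factory=set)


def matched_rules(app, rules):
    """Return the names of rules whose three dimensions all occur in the app."""
    return [
        rule.name for rule in rules
        if rule.permissions <= app.permissions
        and rule.calls <= app.calls
        and rule.outputs <= app.outputs
    ]


def classify(app, rules, whitelist):
    """Step S13: return 'suspected' or 'normal' for one application."""
    # Assumption: the whitelist is keyed by package name.
    if app.package in whitelist:
        return "normal"
    return "suspected" if matched_rules(app, rules) else "normal"
```

Requiring all three dimensions to match keeps a single broad permission from flagging an application on its own; the result is still only a suspicion, which is why step S2 follows.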

According to one embodiment of the present invention, step S2 further comprises:

S21, matching the signing certificate of the application labeled as a suspected malicious application against the malicious application samples in the malicious application sample library; if the signing certificate is present in the malicious application sample library, directly labeling the application as a malicious application and storing it in the malicious application library;

S22, if the signing certificate is not present in the malicious application sample library, further performing similarity analysis on the application name and package name of the application, and finding in the malicious application sample library a sample set similar to that application name and package name;

S23, if the sample set is found in step S22, performing similarity analysis of directory structure, text files, and image files between each sample in the sample set and the application to be analyzed, calculating similarity values, and, when the similarity between any sample and the application to be analyzed meets the set value, labeling the application as a malicious application and storing it in the malicious application library;

S24, if the sample set is not found in step S22, or if in step S23 no sample has a similarity to the application to be analyzed that meets the set value, performing similarity analysis of directory structure, text files, and image files between all samples in the malicious application sample library and the application to be analyzed, calculating similarity values, and, when the similarity between any sample and the application to be analyzed meets the set value, labeling the application as a malicious application and storing it in the malicious application library.
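The cascade of steps S21 to S24 can be sketched as a single function. This is an illustrative outline under stated assumptions: applications and samples are plain dictionaries with a hypothetical `cert` key, the `name_similar` and `content_score` callables are supplied by the caller, and "meets the set value" is read as "greater than or equal to a threshold".

```python
def similarity_cascade(app, samples, name_similar, content_score, threshold):
    """Return 'malicious' or 'unresolved' for one suspected application."""
    # S21: an exact signing-certificate match settles the question at once.
    if any(sample["cert"] == app["cert"] for sample in samples):
        return "malicious"

    # S22: narrow the sample library to samples with a similar name and package.
    candidates = [sample for sample in samples if name_similar(app, sample)]

    # S23: compare directory structure, text files, and images on the candidates.
    if any(content_score(app, sample) >= threshold for sample in candidates):
        return "malicious"

    # S24: no candidate set, or no candidate was close enough, so fall back
    # to comparing against every sample in the library.
    if any(content_score(app, sample) >= threshold for sample in samples):
        return "malicious"

    # Not confirmed; in the embodiment with step S3 such an application
    # would go to the misjudgment information library.
    return "unresolved"
```

The ordering runs the cheapest checks first: a certificate comparison, then name matching to shrink the candidate set, and only then the expensive content comparison, with the full-library scan kept as a last resort.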

According to one embodiment of the present invention, the similarity analysis of application names and package names uses an edit distance algorithm, the directory structure similarity analysis uses a directory comparison method, the text file similarity analysis uses an edit distance algorithm, and the image file similarity analysis uses a perceptual hash algorithm.
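As an illustration of the edit distance algorithm applied to application names and package names, the sketch below computes the Levenshtein distance and turns it into a similarity between 0 and 1. The normalization by the longer string's length is an assumption; the text does not state how a distance is converted into a similarity value, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # delete char_a
                current[j - 1] + 1,                     # insert char_b
                previous[j - 1] + (char_a != char_b),   # substitute or keep
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical strings (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

A repackaged variant often changes only a character or two of the package name, for example `com.example.bank` becoming `com.examp1e.bank`, which this measure scores as highly similar.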

To solve its technical problem, the present invention also proposes a malicious application detection system, comprising:

A malicious behavior information library, for storing various malicious behavior information organized along the three dimensions of permission requests, function calls, and information output;

A malicious application sample library, for storing information on various malicious application samples;

A static heuristic scanning subsystem, for performing static code scanning on a received application to be detected, and analyzing, along the three dimensions of permission requests, function calls, and information output, whether the application exhibits malicious behavior matching any malicious behavior information in the malicious behavior information library; if malicious behavior exists, the application is labeled as a suspected malicious application, and if not, the application is labeled as a normal application;

A similarity analysis subsystem, for performing similarity analysis between applications labeled as suspected malicious applications by the static heuristic scanning subsystem and the malicious application samples in the malicious application sample library, based on application name, package name, signing certificate, directory structure, text files, and image files, and labeling as a malicious application any application whose similarity meets the set value.

According to one embodiment of the present invention, the system further comprises:

A suspected malicious application library, for storing applications labeled as suspected malicious applications by the static heuristic scanning subsystem;

A misjudgment information library, for storing applications not labeled as malicious applications by the similarity analysis subsystem;

A normal application library, for storing applications labeled as normal applications by the static heuristic scanning subsystem and applications labeled as normal applications based on the results of manual analysis of the applications in the misjudgment information library;

A whitelist library, for storing information on applications labeled as normal applications based on the results of manual analysis of the applications in the misjudgment information library;

A malicious application library, for storing applications labeled as malicious applications by the similarity analysis subsystem.

According to one embodiment of the present invention, the static heuristic scanning subsystem further comprises:

A reachability matrix algorithm component, for loading the malicious behavior information library and the whitelist library and generating in advance a reachability matrix model based on the three dimensions of permission requests, function calls, and information output;

A decompilation component, for decompiling the received application to be detected into code files and the corresponding permission configuration file and resource files, and parsing out the application name, package name, signing certificate, and directory structure of the application;

A malicious behavior analysis component, for calling the reachability matrix model to search and analyze, along the three dimensions of permission requests, function calls, and information output, the code files, permission configuration file, and resource files produced by decompilation for malicious behavior matching any malicious behavior information in the malicious behavior information library;

A scheduling component, for labeling applications in which malicious behavior exists as suspected malicious applications and storing them in the suspected malicious application library, and labeling applications in which no malicious behavior exists as normal applications and storing them in the normal application library.

According to one embodiment of the present invention, the similarity analysis subsystem further comprises:

A signing certificate matching component, for obtaining from the suspected malicious application library the signing certificate of an application to be analyzed and matching it against the malicious application samples in the malicious application sample library; if the signing certificate is present in the malicious application sample library, the application is directly labeled as a malicious application and stored in the malicious application library;

A first similarity analysis component, for further performing, when the signing certificate of the application to be analyzed is not present in the malicious application sample library, similarity analysis on the application name and package name of the application, and finding in the malicious application sample library a sample set similar to that application name and package name;

A second similarity analysis component, for performing, when the first similarity analysis component finds the sample set, similarity analysis of directory structure, text files, and image files between each sample in the sample set and the application to be analyzed, calculating similarity values, and, when the similarity between any sample and the application to be analyzed meets the set value, labeling the application as a malicious application and storing it in the malicious application library;

A third similarity analysis component, for performing, when the first similarity analysis component does not find the sample set or the second similarity analysis component finds no sample whose similarity to the application to be analyzed meets the set value, similarity analysis of directory structure, text files, and image files between all samples in the malicious application sample library and the application to be analyzed, calculating similarity values, and, when the similarity between any sample and the application to be analyzed meets the set value, labeling the application as a malicious application and storing it in the malicious application library.

According to one embodiment of the present invention, the first similarity analysis component uses an edit distance algorithm to analyze the similarity of application names and package names; the second similarity analysis component and the third similarity analysis component each use a directory comparison method to analyze directory structure similarity, an edit distance algorithm to analyze text file similarity, and a perceptual hash algorithm to analyze image file similarity.
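The directory comparison and perceptual hash analyses used by the second and third similarity analysis components might look like the sketch below. Both parts rest on assumptions: the text does not define the directory comparison method, so a Jaccard overlap of relative file paths is used as a stand-in, and the perceptual hash shown is the simple average hash variant, applied to an image assumed to have already been decoded and downscaled to an 8x8 grayscale grid. All function names are hypothetical.

```python
def directory_similarity(paths_a, paths_b):
    """Jaccard overlap of two sets of relative file paths (assumed method)."""
    a, b = set(paths_a), set(paths_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def average_hash(pixels):
    """Average hash of a grayscale grid given as a list of rows (0 to 255).

    Each bit records whether a pixel is brighter than the grid's mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (value > mean)
    return bits


def hash_similarity(hash_a, hash_b, nbits=64):
    """1.0 minus the fraction of differing bits (normalized Hamming distance)."""
    return 1.0 - bin(hash_a ^ hash_b).count("1") / nbits
```

Because the hash depends only on which pixels are brighter than average, an icon that has been slightly recolored or recompressed tends to keep the same bits, whereas a byte-level comparison of the two files would report them as different.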

The malicious application detection method and system of the present invention are based on heuristic scanning. To address the low operating efficiency of heuristic scanning, static behavior analysis is used, which avoids the performance bottleneck of loading and executing an application in a virtual machine for analysis. To address the high false positive rate of heuristic scanning, a similarity analysis stage is added on top of the heuristic scan. The similarity analysis performs fuzzy matching on the application's signing certificate, name, and package name, combined with similarity analysis of the application's code, resource files, and directory structure, which effectively reduces the false positive rate and improves identification accuracy.

Brief description of the drawings

The present invention is further described below with reference to the accompanying drawings and embodiments. In the drawings:

Fig. 1 is a schematic structural diagram of the malicious application detection system according to one embodiment of the present invention;

Fig. 2 is a flow chart of the malicious application detection method according to one embodiment of the present invention;

Fig. 3 is a flow chart of a specific embodiment of step S210 in Fig. 2;

Fig. 4 is a flow chart of a specific embodiment of step S220 in Fig. 2.

Specific embodiment

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein serve only to explain the present invention and are not intended to limit it.

Fig. 1 shows a schematic structural diagram of a malicious application detection system 100 according to one embodiment of the present invention. As shown in Fig. 1, the malicious application detection system 100 consists mainly of a static heuristic scanning subsystem 110, a similarity analysis subsystem 120, a malicious behavior information library 130, a whitelist library 140, a misjudgment information library 150, a malicious application sample library 160, a normal application library 170, a suspected malicious application library 180, and a malicious application library 190. The static heuristic scanning subsystem 110 and the similarity analysis subsystem 120 are the core of the system 100. The static heuristic scanning subsystem 110 performs static code scanning on a received application to be detected and analyzes, along the three dimensions of permission requests, function calls, and information output, whether the application exhibits malicious behavior matching any malicious behavior information in the malicious behavior information library 130; if malicious behavior exists, the application is labeled as a suspected malicious application, and if not, it is labeled as a normal application. Suspected malicious applications detected by the static heuristic scanning subsystem 110 pass to the similarity analysis subsystem 120. The similarity analysis subsystem 120 performs similarity analysis between the applications labeled as suspected malicious applications by the static heuristic scanning subsystem 110 and the malicious application samples in the malicious application sample library 160, based on application name, package name, signing certificate, directory structure, text files, and image files, and labels as a malicious application any application whose similarity meets the set value.

The libraries serve the following purposes. The malicious behavior information library 130 stores various malicious behavior information organized along the three dimensions of permission requests, function calls, and information output. The malicious application sample library 160 stores information on various malicious application samples. The suspected malicious application library 180 stores applications labeled as suspected malicious applications by the static heuristic scanning subsystem 110. The misjudgment information library 150 stores applications not labeled as malicious applications by the similarity analysis subsystem 120. The normal application library 170 stores applications labeled as normal applications by the static heuristic scanning subsystem 110 and applications labeled as normal applications based on the results of manual analysis of the applications in the misjudgment information library 150. The whitelist library 140 stores information on applications labeled as normal applications based on the results of manual analysis of the applications in the misjudgment information library 150. The malicious application library 190 stores applications labeled as malicious applications by the similarity analysis subsystem 120.

As further shown in Fig. 1, the static heuristic scanning subsystem 110 consists of a reachability matrix algorithm component 111, a decompilation component 112, a malicious behavior analysis component 113, and a scheduling component 114. When the static heuristic scanning subsystem 110 starts, the reachability matrix algorithm component 111 loads the malicious behavior information library 130 and the whitelist library 140 and generates in advance a reachability matrix model based on the three dimensions of permission requests, function calls, and information output. The decompilation component 112 performs APK decompilation on the application to be detected received by the static heuristic scanning subsystem 110, producing Smali code files and the corresponding permission configuration file and resource files, and parses out the application name, package name, signing certificate, and directory structure of the application. The malicious behavior analysis component 113 then calls the reachability matrix model generated by the reachability matrix algorithm component 111 to search and analyze, along the three dimensions of permission requests, function calls, and information output, the code files, permission configuration file, and resource files produced by the decompilation component 112, and determines whether the application exhibits malicious behavior matching any malicious behavior information in the malicious behavior information library 130. When the application matches some malicious behavior, the scheduling component 114 labels it as a suspected malicious application and stores it in the suspected malicious application library 180. When the application matches no malicious behavior in the malicious behavior information library 130, the scheduling component 114 labels it as a normal application and stores it in the normal application library 170.
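The text does not describe how the reachability matrix model is built. In graph theory, a reachability matrix is the transitive closure of a directed graph's adjacency matrix, and the sketch below shows that general construction with Warshall's algorithm. Applying it here is an assumption made for illustration: the nodes are taken to be permissions, function calls, and output channels, the edges are taken to link items that a malicious behavior entry chains together, and the function names are hypothetical.

```python
def reachability_matrix(nodes, edges):
    """Transitive closure of a directed graph using Warshall's algorithm.

    Returns (index, reach), where reach[i][j] is True when node j can be
    reached from node i by following zero or more edges.
    """
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    # Every node reaches itself; direct edges are added next.
    reach = [[i == j for j in range(size)] for i in range(size)]
    for source, target in edges:
        reach[index[source]][index[target]] = True
    # Allow paths through intermediate node k, one k at a time.
    for k in range(size):
        for i in range(size):
            if reach[i][k]:
                for j in range(size):
                    if reach[k][j]:
                        reach[i][j] = True
    return index, reach


def reaches(index, reach, source, target):
    """True when target is reachable from source in the precomputed matrix."""
    return reach[index[source]][index[target]]
```

Computing the closure once at startup turns each later question, such as whether a requested permission can lead through some chain of calls to an information output, into a single table lookup, which is consistent with the model being generated in advance when the subsystem starts.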

The similarity analysis subsystem 120 further screens the applications in the suspected malicious application library 180. After the similarity analysis subsystem 120 starts, it loads the sample information in the malicious application sample library 160 and then obtains applications to be analyzed from the suspected malicious application library 180 for similarity analysis. Specifically, as shown in Fig. 1, the similarity analysis subsystem 120 consists of a signing certificate matching component 121, a first similarity analysis component 122, a second similarity analysis component 123, and a third similarity analysis component 124. The signing certificate matching component 121 obtains the signing certificate of the suspected malicious application to be analyzed and matches it against the signing certificates of the malicious application samples in the malicious application sample library 160. If the signing certificate used by the suspected malicious application is found in the malicious application sample library 160, the application is directly labeled as a malicious application and stored in the malicious application library 190, and detection ends. If the signing certificate does not match, the first similarity analysis component 122 further analyzes the similarity of the application name and package name, using for example an edit distance algorithm, and finds in the malicious application sample library 160 a sample set similar to that application name and package name. If the sample set exists, the second similarity analysis component 123 takes the sample set as its analysis scope and performs similarity analysis of directory structure, text files, and image files between each sample in the sample set and the application to be analyzed, calculating similarity values. In a specific embodiment, the second similarity analysis component 123 uses a directory comparison method for directory structure similarity analysis, an edit distance algorithm for text file similarity analysis, and a perceptual hash algorithm for image file similarity analysis. When a sample is found whose similarity to the application to be analyzed meets the set value, the application is labeled as a malicious application and stored in the malicious application library 190.

If the first similarity analysis component 122 does not find a sample set, or the second similarity analysis component 123 finds no sample in the sample set whose similarity to the application to be analyzed meets the set value, the third similarity analysis component 124 performs similarity analysis of directory structure, text files, and image files between all samples in the malicious application sample library 160 and the application to be analyzed, calculating similarity values. Similarly, in a specific embodiment, the third similarity analysis component 124 uses a directory comparison method for directory structure similarity analysis, an edit distance algorithm for text file similarity analysis, and a perceptual hash algorithm for image file similarity analysis. When the third similarity analysis component 124 finds a sample whose similarity to the application to be analyzed meets the set value, the application is labeled as a malicious application and stored in the malicious application library 190.

If the application is still not labeled as a malicious application after analysis by the third similarity analysis component 124, it is stored in the misjudgment information library 150 for further manual processing by operation and maintenance personnel 300. After the applications in the misjudgment information library 150 have been processed manually, an application that the manual analysis finds not to be malicious is labeled as a normal application and stored in the normal application library 170, and the information of that normal application is also stored in the whitelist library 140. If the manual analysis finds that the application is a new malicious application, it is labeled as a malicious application and stored in the malicious application library 190, and it is also stored in the malicious application sample library 160. This work by the operation and maintenance personnel 300 is routine maintenance and is carried out on a long-term basis to keep the knowledge base up to date.

To address the low efficiency of heuristic analysis that relies on running the application, the malicious application detection system 100 described above uses static behavior analysis and scanning, thereby avoiding the performance bottleneck of loading and executing the application in a virtual machine for analysis. An Android application is first decompiled into Smali code, and the application's permission requests, function calls, information output, and so on are then analyzed by static code analysis to find applications that exhibit malicious behavior. To address the high false positive rate of heuristic analysis, the malicious application detection system 100 applies similarity analysis to the suspected malicious applications already found, combining analyses of application name, package name, signing certificate, directory structure, text file, and image file similarity, which effectively reduces the false positive rate and improves identification accuracy.

Based on the malicious application detection system of the invention described above, the invention also proposes a malicious application detection method. Fig. 2 shows a flowchart of a malicious application detection method 200 according to an embodiment of the invention. As shown in Fig. 2, the malicious application detection method 200 includes the following steps:

In step S210, static code scanning is performed on the received application to be detected. Based on the three dimensions of permission request, function call, and information output, the application is analyzed for malicious behavior that matches any malicious behavior information in the malicious behavior information library. If malicious behavior exists, the application is labeled as a suspected malicious application; if no malicious behavior exists, the application is labeled as a normal application.

In step S220, similarity analysis based on application name, package name, signing certificate, directory structure, text files, and image files is performed between each application labeled as a suspected malicious application and the malicious application samples in the malicious application sample library, and an application whose similarity meets the set value is labeled as a malicious application.

In step S230, applications not labeled as malicious applications in step S220 are stored in the misjudgment information library.

In step S240, based on the result of manual analysis of the applications in the misjudgment information library, an application in the misjudgment information library that is not a malicious application is labeled as a normal application and stored in the normal application library, and the information of that normal application is stored in the whitelist library.

In step S250, based on the result of manual analysis of the applications in the misjudgment information library, an application in the misjudgment information library that is a malicious application is labeled as a malicious application and stored in the malicious application library, and that malicious application is also stored in the malicious application sample library.

The malicious application detection method of the invention described above combines two techniques, static heuristic analysis and similarity analysis, to detect malicious applications. It avoids the performance bottleneck of loading and executing the application in a virtual machine for analysis, effectively reduces the false positive rate, and improves identification accuracy.

Fig. 3 shows a flowchart of a specific embodiment of the static heuristic analysis step S210 in the malicious application detection method 200 described above. As shown in Fig. 3, step S210 specifically includes the following steps:

In step S211, the received application to be detected is decompiled into Smali code files and the corresponding permission configuration file and resource files, and the application name, package name, signing certificate, and directory structure of the application are parsed.

In the subsequent step S212, the reachability matrix model is called to scan and analyze the code files, permission configuration file, and resource files produced by decompilation along the three dimensions of permission request, function call, and information output, and to judge whether the application contains malicious behavior that matches any malicious behavior information in the malicious behavior information library. The reachability matrix model is pre-generated at system startup by loading the malicious behavior information library and the whitelist library. In a specific example, the algorithm used by the reachability matrix model for scanning and analysis is as follows:

Step 1, construct the basic behavior information table: build information tables for permission configuration, function calls, and information output by extracting from the malicious behavior information library the content of each malicious behavior along the three dimensions of permission request, function call, and information output, and merge them after deduplication into a unified basic behavior information table.

Step 2, construct the malicious behavior information matrix: the number of columns of the matrix is the length of the basic behavior information table, the number of rows is the number of malicious behavior information items, and the matrix elements are 0 or 1.

Step 3, construct the scanning result matrix: this is a single-column matrix whose number of rows is the length of the basic behavior information table. The permission configuration file, Smali code files, and resource files of the application to be detected are scanned and matched against the basic behavior information table; when an entry in the table is matched, the corresponding row of the matrix is 1, otherwise it is 0.

Step 4, construct the malicious behavior decision matrix: multiplying the malicious behavior information matrix by the scanning result matrix yields the malicious behavior decision matrix. This matrix is a row vector whose number of columns is the number of malicious behavior information items.

When the value of a column in the malicious behavior decision matrix is 1, the application satisfies the malicious behavior rule corresponding to that column, that is, malicious behavior exists.
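The four construction steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule names and behavior items in `RULES` are hypothetical, and because the text does not say how the matrix product is reduced to a 0/1 decision value, the sketch assumes a rule is satisfied when the product equals the number of items the rule requires.

```python
# Hypothetical rule data; none of these names come from the patent.
RULES = {
    "send_premium_sms": {"permission:SEND_SMS", "call:SmsManager.sendTextMessage"},
    "leak_contacts": {"permission:READ_CONTACTS", "output:http_post"},
}


def build_basic_table(rules):
    """Step 1: merge all rule items into one deduplicated, ordered table."""
    return sorted(set().union(*rules.values()))


def build_rule_matrix(rules, table):
    """Step 2: one 0/1 row per malicious behavior, one column per table entry."""
    return {name: [1 if item in items else 0 for item in table]
            for name, items in rules.items()}


def build_scan_vector(found_items, table):
    """Step 3: 1 where the scanned application matched a table entry, else 0."""
    return [1 if item in found_items else 0 for item in table]


def decide(rule_matrix, scan_vector):
    """Step 4: multiply each rule row by the scan vector.

    Assumption: a rule is satisfied (decision value 1) when the product
    equals the number of items that the rule requires.
    """
    decision = {}
    for name, row in rule_matrix.items():
        product = sum(r * s for r, s in zip(row, scan_vector))
        decision[name] = 1 if product == sum(row) else 0
    return decision
```

With this data, an application that requests `SEND_SMS`, calls `SmsManager.sendTextMessage`, and requests `READ_CONTACTS` without any matching information output would satisfy only the first hypothetical rule.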

In the subsequent step S213, an application in which malicious behavior exists is labeled as a suspected malicious application and stored in the suspected malicious application library, and an application in which no malicious behavior exists is labeled as a normal application and stored in the normal application library.

Fig. 4 shows a flowchart of a specific embodiment of the similarity analysis step S220 in the malicious application detection method 200 described above. As shown in Fig. 4, step S220 specifically includes the following steps:

In step S221, an application to be analyzed is obtained from the suspected malicious application library.

In the subsequent step S222, the signing certificate of the application to be analyzed is matched against the malicious application samples in the malicious application sample library.

In the subsequent step S223, it is judged whether the signing certificate used by the application to be analyzed exists in the malicious application sample library. If the signing certificate exists in the malicious application sample library, step S224 is executed: the application is directly labeled as a malicious application and stored in the malicious application library, and the process ends. Otherwise, step S225 is executed.

In step S225, similarity analysis is further performed on the application name and package name of the application, and the malicious application sample library is searched for a set of samples similar to that application name and package name.

In the subsequent step S226, it is judged whether a sample set similar to the application name and package name has been found. If so, step S227 is executed; otherwise, step S228 is executed.

In step S227, each sample in the sample set found is compared with the application to be analyzed in terms of directory structure, text files, and image files, and a similarity value is calculated. When a sample has a similarity to the application to be analyzed that meets the set value, the application is labeled as a malicious application and stored in the malicious application library. Further, if no sample in the sample set found has a similarity to the application to be analyzed that meets the set value, the above similarity analysis is performed against all samples in the sample library.

In step S228, for the case where no sample set similar to the application name and package name of the application is found, all samples in the malicious application sample library are compared with the application to be analyzed in terms of directory structure, text files, and image files, and a similarity value is calculated. When a sample has a similarity to the application to be analyzed that meets the set value, the application is labeled as a malicious application and stored in the malicious application library. Applications not labeled as malicious applications in step S228 are stored in the misjudgment information library for further manual handling by operations and maintenance personnel.
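The control flow of steps S221 to S228 can be summarized as a short cascade. The sketch below is a minimal illustration, not the patented implementation: the function name `classify_suspect`, the dictionary key `cert`, and the two predicate callables `name_similar` and `content_similar` are all hypothetical stand-ins for the certificate match, the name and package name analysis, and the directory, text, and image analysis.

```python
def classify_suspect(app, samples, name_similar, content_similar):
    """Return 'malicious' or 'misjudgment' for one suspected application.

    app and each sample are dicts with at least a 'cert' key (assumed layout).
    name_similar(app, sample) and content_similar(app, sample) return bool.
    """
    # S222-S224: a signing certificate already seen in the sample library
    # decides the case immediately.
    if any(app["cert"] == sample["cert"] for sample in samples):
        return "malicious"

    # S225-S226: narrow the sample library by application name and package name.
    candidates = [sample for sample in samples if name_similar(app, sample)]

    # S227: compare content against the narrowed sample set first.
    if candidates and any(content_similar(app, sample) for sample in candidates):
        return "malicious"

    # S228, and the fallback of S227: compare against every sample.
    if any(content_similar(app, sample) for sample in samples):
        return "malicious"

    # Not labeled malicious: hand over to manual analysis.
    return "misjudgment"
```

The narrowed comparison in S227 is an optimization: when it succeeds, the more expensive comparison against the whole sample library is skipped.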

In a specific example according to the invention, the decision rules of the similarity analysis are:

1. Code similarity of 85% or more;

2. Text file similarity of 60% or more;

3. Image file similarity of 75% or more;

4. Directory structure similarity of 70% or more.

An application that meets the above rules is judged to be a malicious application. These parameters can be adjusted after analysis of operational data.
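As a rough illustration, the decision rules above can be expressed as a threshold table. The text does not state whether all four rules must hold at once or whether one is enough, so the sketch assumes that all four must hold; the names `THRESHOLDS` and `meets_decision_rules` are hypothetical.

```python
# Thresholds in percent, taken from the example rules; they are meant to be tunable.
THRESHOLDS = {"code": 85.0, "text": 60.0, "image": 75.0, "directory": 70.0}


def meets_decision_rules(scores, thresholds=THRESHOLDS):
    """Return True when every similarity score reaches its threshold.

    Assumption: all rules must be met together. A missing score counts as 0.
    """
    return all(scores.get(name, 0.0) >= limit for name, limit in thresholds.items())
```

Keeping the thresholds in one table makes the adjustment based on operational data a configuration change rather than a code change.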

In a specific example according to the invention, the similarity analysis of the directory structure uses the directory comparison method, a relatively simple algorithm. Taking the directory structure of the malicious application sample as the baseline, it is compared level by level with the directory structure of the application to be analyzed. The number of directories that the application to be analyzed and the sample application have in common is counted and divided by the total number of directories; the resulting percentage is the directory structure similarity value.
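A minimal sketch of this directory comparison follows. It assumes each directory is represented by its path relative to the package root, so that equal paths imply equal position in the hierarchy, and it assumes the "total number of directories" is that of the malicious sample, since the sample is the baseline; the function name is hypothetical.

```python
def directory_similarity(sample_dirs, app_dirs):
    """Percentage of the sample's directories that also appear in the app.

    Both arguments are iterables of relative directory paths such as
    'res/drawable'. The denominator is the sample's directory count (assumed).
    """
    sample = set(sample_dirs)
    if not sample:
        return 0.0
    shared = sample & set(app_dirs)
    return 100.0 * len(shared) / len(sample)
```

For example, if the sample contains `res`, `res/drawable`, `assets`, and `lib`, and the application shares the first three, the similarity is 75%.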

In a specific example according to the invention, text file similarity analysis uses an edit distance algorithm, that is, the minimum number of edit operations needed to transform the source string into the target string; the smaller this value, the more similar the files. The final similarity formula is: (1 - edit distance / file length) * 100%. The similarity value of each file is calculated separately, and the average of these values gives the final similarity value of the two applications.
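The formula above can be illustrated with a standard Levenshtein edit distance. The text does not say which file's length is used as "file length"; the sketch assumes the longer of the two, which keeps the result between 0% and 100%. All function names are hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def text_similarity(source, target):
    """(1 - edit distance / file length) * 100, using the longer length (assumed)."""
    length = max(len(source), len(target))
    if length == 0:
        return 100.0
    return (1 - edit_distance(source, target) / length) * 100.0


def average_text_similarity(pairs):
    """Average the per-file similarity over (sample_text, app_text) pairs."""
    scores = [text_similarity(a, b) for a, b in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

The same `edit_distance` function could also serve the application name and package name comparison, which the description says may use an edit distance algorithm as well.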

In a specific example according to the invention, image file similarity analysis uses a perceptual hash algorithm. For two pictures of the same name to be compared, a 64-bit "fingerprint" string is generated for each, and the fingerprints of the two pictures are then compared. The closer the results, the more similar the pictures. The "fingerprint" strings are compared using the Hamming distance method: the 64 characters are compared one by one, and the number of differing characters found is the Hamming distance value. The Hamming distance value is capped at a maximum of 10: a value greater than 10 indicates that the images are completely dissimilar, and a value less than 5 indicates that the images are similar. Finally, all images are compared and their Hamming distance values obtained, the average Hamming distance value is calculated, and the similarity of the image resources is calculated from it. The final similarity formula is: (1 - average Hamming distance value / 10) * 100%.
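The comparison stage can be sketched as follows. The sketch assumes the 64-character fingerprints have already been produced by some perceptual hash implementation, which is outside its scope, and it applies the cap of 10 to each pair before averaging, which is one reading of the description; the names are hypothetical.

```python
MAX_DISTANCE = 10  # cap on the Hamming distance given in the description


def hamming_distance(fp_a, fp_b):
    """Count the positions at which two equal-length fingerprints differ."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have equal length")
    return sum(a != b for a, b in zip(fp_a, fp_b))


def image_similarity(fingerprint_pairs):
    """(1 - average Hamming distance / 10) * 100 over same-named image pairs.

    Assumption: each distance is capped at MAX_DISTANCE before averaging.
    """
    if not fingerprint_pairs:
        return 0.0
    capped = [min(hamming_distance(a, b), MAX_DISTANCE) for a, b in fingerprint_pairs]
    average = sum(capped) / len(capped)
    return (1 - average / MAX_DISTANCE) * 100.0
```

With the cap in place, a pair of images differing in 10 or more positions contributes 0% and an identical pair contributes 100%.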

The above are merely preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims

1. A malicious application detection method, characterized by comprising the following steps:

S1. performing static code scanning on a received application to be detected, and analyzing, based on the three dimensions of permission request, function call, and information output, whether the application contains malicious behavior that matches any malicious behavior information in a malicious behavior information library; if malicious behavior exists, labeling the application as a suspected malicious application, and if no malicious behavior exists, labeling the application as a normal application;

S2. performing similarity analysis based on application name, package name, signing certificate, directory structure, text files, and image files between the application labeled as a suspected malicious application and the malicious application samples in a malicious application sample library, and labeling an application whose similarity meets a set value as a malicious application;

wherein step S2 further comprises:

S21. matching the signing certificate of the application labeled as a suspected malicious application against the malicious application samples in the malicious application sample library, and, if the signing certificate exists in the malicious application sample library, directly labeling the application as a malicious application and storing it in a malicious application library;

S22. if the signing certificate does not exist in the malicious application sample library, further performing similarity analysis on the application name and package name of the application, and finding in the malicious application sample library a sample set similar to the application name and package name;

S23. if the sample set is found in step S22, comparing each sample in the sample set with the application to be analyzed in terms of directory structure, text files, and image files, calculating a similarity value, and, when a sample has a similarity to the application to be analyzed that meets the set value, labeling the application as a malicious application and storing it in the malicious application library;

S24. if the sample set is not found in step S22, or no sample in step S23 has a similarity to the application to be analyzed that meets the set value, comparing all samples in the malicious application sample library with the application to be analyzed in terms of directory structure, text files, and image files, calculating a similarity value, and, when a sample has a similarity to the application to be analyzed that meets the set value, labeling the application as a malicious application and storing it in the malicious application library.

2. The malicious application detection method according to claim 1, characterized in that the method further comprises:

S4. based on the result of manual analysis of the applications in a misjudgment information library, labeling an application in the misjudgment information library that is not a malicious application as a normal application and storing it in a normal application library, and storing the information of the normal application in a whitelist library;

S5. based on the result of manual analysis of the applications in the misjudgment information library, labeling an application in the misjudgment information library that is a malicious application as a malicious application and storing it in the malicious application library, and storing the malicious application in the malicious application sample library.

3. The malicious application detection method according to claim 2, characterized in that step S1 further comprises:

S11. decompiling the received application to be detected into code files and the corresponding permission configuration file and resource files, and parsing the application name, package name, signing certificate, and directory structure of the application;

S12. calling a reachability matrix model to scan and analyze, along the three dimensions of permission request, function call, and information output, whether the code files, permission configuration file, and resource files produced by decompilation contain malicious behavior that matches any malicious behavior information in the malicious behavior information library, wherein the reachability matrix model is pre-generated based on the malicious behavior information library and the whitelist library;

S13. labeling an application in which malicious behavior exists as a suspected malicious application and storing it in a suspected malicious application library, and labeling an application in which no malicious behavior exists as a normal application and storing it in the normal application library.

4. The malicious application detection method according to claim 2, characterized in that the similarity analysis of the application name and package name uses an edit distance algorithm, the directory structure similarity analysis uses a directory comparison method, the text file similarity analysis uses an edit distance algorithm, and the image file similarity analysis uses a perceptual hash algorithm.

5. A malicious application detection system, characterized by comprising:

a malicious behavior information library, for storing various malicious behavior information according to the three dimensions of permission request, function call, and information output;

a static heuristic analysis subsystem, for performing static code scanning on a received application to be detected, and analyzing, based on the three dimensions of permission request, function call, and information output, whether the application contains malicious behavior that matches any malicious behavior information in the malicious behavior information library; if malicious behavior exists, labeling the application as a suspected malicious application, and if no malicious behavior exists, labeling the application as a normal application;

a similarity analysis subsystem, for performing similarity analysis based on application name, package name, signing certificate, directory structure, text files, and image files between the application labeled as a suspected malicious application by the static heuristic analysis subsystem and the malicious application samples in a malicious application sample library, and labeling an application whose similarity meets a set value as a malicious application;

wherein the similarity analysis subsystem further comprises:

a signing certificate matching component, for obtaining the signing certificate of an application to be analyzed from a suspected malicious application library and matching it against the malicious application samples in the malicious application sample library, and, if the signing certificate exists in the malicious application sample library, directly labeling the application as a malicious application and storing it in a malicious application library;

a first similarity analysis component, for, when the signing certificate of the application to be analyzed does not exist in the malicious application sample library, further performing similarity analysis on the application name and package name of the application, and finding in the malicious application sample library a sample set similar to the application name and package name;

a second similarity analysis component, for, when the first similarity analysis component finds the sample set, comparing each sample in the sample set with the application to be analyzed in terms of directory structure, text files, and image files, calculating a similarity value, and, when a sample has a similarity to the application to be analyzed that meets the set value, labeling the application as a malicious application and storing it in the malicious application library;

a third similarity analysis component, for, when the first similarity analysis component does not find the sample set or the second similarity analysis component does not find a sample whose similarity to the application to be analyzed meets the set value, comparing all samples in the malicious application sample library with the application to be analyzed in terms of directory structure, text files, and image files, calculating a similarity value, and, when a sample has a similarity to the application to be analyzed that meets the set value, labeling the application as a malicious application and storing it in the malicious application library.

6. The malicious application detection system according to claim 5, characterized in that the system further comprises:

a suspected malicious application library, for storing applications labeled as suspected malicious applications by the static heuristic analysis subsystem;

a normal application library, for storing applications labeled as normal applications by the static heuristic analysis subsystem and applications labeled as normal applications based on the result of manual analysis of the applications in a misjudgment information library;

a whitelist library, for storing the information of applications labeled as normal applications based on the result of manual analysis of the applications in the misjudgment information library.

7. The malicious application detection system according to claim 6, characterized in that the static heuristic analysis subsystem further comprises:

a reachability matrix algorithm component, for loading the malicious behavior information library and the whitelist library and pre-generating a reachability matrix model based on the three dimensions of permission request, function call, and information output;

a decompilation component, for decompiling the received application to be detected into code files and the corresponding permission configuration file and resource files, and parsing the application name, package name, signing certificate, and directory structure of the application;

a malicious behavior analysis component, for calling the reachability matrix model to scan and analyze, along the three dimensions of permission request, function call, and information output, whether the code files, permission configuration file, and resource files produced by decompilation contain malicious behavior that matches any malicious behavior information in the malicious behavior information library;

a dispatch component, for labeling an application in which malicious behavior exists as a suspected malicious application and storing it in the suspected malicious application library, and labeling an application in which no malicious behavior exists as a normal application and storing it in the normal application library.

8. The malicious application detection system according to claim 6, characterized in that the first similarity analysis component uses an edit distance algorithm for application name and package name similarity analysis; the second similarity analysis component and the third similarity analysis component each use a directory comparison method for directory structure similarity analysis, an edit distance algorithm for text file similarity analysis, and a perceptual hash algorithm for image file similarity analysis.