CN108710802A - Android ransomware detection method based on feature selection - Google Patents


Info

Publication number
CN108710802A
Authority
CN
China
Prior art keywords
feature
detection
software
sample
initial
Legal status
Pending
Application number
CN201810585511.2A
Other languages
Chinese (zh)
Inventor
曾庆凯
时良民
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Application filed by Nanjing University
Priority to CN201810585511.2A
Publication of CN108710802A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection

Abstract

The invention discloses an Android ransomware detection method based on feature selection. Features of the training samples are extracted according to an initial feature set to form an initial sample feature library; a measure value is calculated for each feature in the initial sample feature library, and the features whose measure value exceeds a detection feature threshold are selected to form a detection feature set; a ransomware classifier is trained on the detection feature set to obtain a detection classifier. The invention addresses the technical problems that the feature set contains too many features, detection is slow, and detection accuracy is low.

Description

An Android ransomware detection method based on feature selection
Technical field
The present invention relates to a ransomware detection method, and in particular to an Android ransomware detection method based on feature selection.
Background art
With the spread of smartphones, the security threats they face keep growing. Mobile ransomware refers to applications that perform malicious operations on a smartphone or tablet: by locking the user's device or encrypting the user's data, it prevents normal use and coerces the user into paying for unlocking or decryption, posing a serious threat to users' mobile devices. Ransomware on the Android platform has become a problem that demands attention. Many machine-learning-based ransomware detection methods exist, but common Android ransomware detection methods still suffer, in feature selection, from large feature sets, slow detection, and limited classification accuracy. The present invention therefore proposes an Android ransomware detection method based on feature selection.
Summary of the invention
The technical problem to be solved by the present invention is to provide an Android ransomware detection method based on feature selection that overcomes the problems of feature sets containing too many features, slow detection, and low detection accuracy.
To solve the above technical problems, the present invention adopts the following technical solution: an Android ransomware detection method based on feature selection, characterized by comprising the following steps:
Step 1: Sample feature extraction. For each training sample in the training sample set, extract its permission features, intent features, api features, and package name features, and combine the extracted features into the initial sample feature library;
Step 2: Sample feature selection. Process the extracted initial sample feature library with the information gain method: calculate the measure value of each feature in the library, sort the features by measure value in descending order, and select the important features as classification features to form the detection feature set;
Step 3: Classifier generation. Use the selected detection feature set as the input parameter of a support vector machine interface and call the support vector machine interface in Python to obtain the detection classifier;
Step 4: Detection of the software under test. Read the software under test, extract its features, and feed them to the detection classifier; according to the Boolean value output by the detection classifier model, judge whether the test sample is ransomware or benign software.
2. The Android ransomware detection method based on feature selection according to claim 1, characterized in that step 1 is specifically: the apk files in the training sample set are processed with androlyze.py from androguard, the open-source Android static analysis tool. The classes.dex file of each training sample is decompiled by command to extract api features and package name features, and the manifest.xml file of each training sample is decompiled by command to extract permission features and intent features. The corresponding api features, package name features, permission features, and intent features are then written into the initial sample feature library AnalysisFile.
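The manifest half of step 1 can be sketched as follows. This is a minimal illustration, not the patent's androguard-based code: it assumes the binary AndroidManifest.xml has already been decoded to plain XML (androguard can perform that decoding), and the helper name `manifest_features` is hypothetical. It collects `uses-permission` names as permission features and `intent-filter` action names as intent features.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:name live in the Android XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def manifest_features(manifest_xml: str) -> dict:
    """Return permission and intent features from a decoded manifest (hypothetical helper)."""
    root = ET.fromstring(manifest_xml)
    permissions = sorted(
        el.get(ANDROID_NS + "name")
        for el in root.iter("uses-permission")
        if el.get(ANDROID_NS + "name")
    )
    intents = sorted(
        {
            action.get(ANDROID_NS + "name")
            for flt in root.iter("intent-filter")
            for action in flt.iter("action")
            if action.get(ANDROID_NS + "name")
        }
    )
    return {"permission": permissions, "intent": intents}


if __name__ == "__main__":
    sample = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
        package="com.example.locker">
      <uses-permission android:name="android.permission.SYSTEM_ALERT_WINDOW"/>
      <uses-permission android:name="android.permission.RECEIVE_BOOT_COMPLETED"/>
      <application>
        <receiver android:name=".BootReceiver">
          <intent-filter>
            <action android:name="android.intent.action.BOOT_COMPLETED"/>
          </intent-filter>
        </receiver>
      </application>
    </manifest>"""
    print(manifest_features(sample))
```

How the patent names or prefixes each feature inside AnalysisFile is not specified; the plain Android names are used here.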
3. The Android ransomware detection method based on feature selection according to claim 2, characterized in that in step 1 the input is the sample apk files in the qualified training set, and the detailed process is:
1.1 Start;
1.2 Initialize the initial sample feature library AnalysisFile to empty;
1.3 Read a sample apk from the ransomware training sample set Ranset and the benign training sample set;
1.4 Decompile the input apk file with the Android static analysis tool androguard;
1.5 Obtain the classes.dex and manifest.xml files produced by androguard's decompilation;
1.6 Judge whether the file to be processed is the classes.dex file; if it is, go to 1.7; otherwise, go to 1.8;
1.7 Read the classes.dex bytecode file, and analyze and mark it with androgexf.py;
1.8 Read the manifest.xml file, and analyze and mark it with androapkinfo.py;
1.9 Extract api features and package name features from the classes.dex bytecode file;
1.10 Extract permission features and intent features from the manifest.xml file.
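Steps 1.9 and 1.10 leave the exact form of the api and package name features open. One plausible reading, assumed here and not confirmed by the source, is that an api feature is a called method and a package name feature is the Java package of the class that method belongs to. The sketch derives both from Dalvik-style method references such as `Landroid/app/admin/DevicePolicyManager;->lockNow()V`; how those references are collected from classes.dex (for example through androguard's analysis objects) is left out, and `dex_features` is a hypothetical name.

```python
import re

# Dalvik method reference: L<class descriptor>;-><method name>(<args>)<return type>
METHOD_REF = re.compile(r"^L(?P<cls>[\w/$]+);->(?P<name>[\w$<>]+)\(")


def dex_features(method_refs):
    """Map Dalvik method references to api and package-name feature lists."""
    apis, packages = set(), set()
    for ref in method_refs:
        match = METHOD_REF.match(ref)
        if match is None:
            continue  # skip anything that is not a well-formed method reference
        cls = match.group("cls").replace("/", ".")
        apis.add(f"{cls}.{match.group('name')}")
        # The package is the fully qualified class name minus its last component.
        packages.add(cls.rsplit(".", 1)[0] if "." in cls else "")
    packages.discard("")  # classes in the default package contribute no package feature
    return {"api": sorted(apis), "package": sorted(packages)}
```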
4. The Android ransomware detection method based on feature selection according to claim 3, characterized in that in step 1 the output is the initial sample feature library AnalysisFile, and the detailed process is:
2.1 Write the extracted features into the initial sample feature library AnalysisFile;
2.2 Judge whether the classes.dex bytecode file and the manifest.xml file have both been marked and processed; if they have, go to 2.4; if not, go to 2.3;
2.3 Read the classes.dex and manifest.xml files that have not yet been marked;
2.4 Judge whether the sample comes from the ransomware sample set Ranset; if it does, go to 2.5; otherwise, go to step 2.6;
2.5 Set the first-column value in the initial sample feature library to 1;
2.6 Set the first-column value in the initial sample feature library to 0;
2.7 Judge whether traversal of the ransomware sample set Ranset and the benign sample set Benset is complete; if not, go to step 1.1; if it is, go to step 2.8;
2.8 End of this part of the operation.
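Steps 2.1 to 2.8 amount to building one row per sample whose first column is the class label (1 for a sample from Ranset, 0 for one from Benset) followed by the sample's features. A minimal sketch, assuming each sample has already been reduced to a set of feature strings and that AnalysisFile is a CSV table of 0/1 presence columns; the patent does not specify the file layout, and `build_analysis_rows` and `to_csv` are hypothetical names.

```python
import csv
import io


def build_analysis_rows(ranset, benset):
    """Return (header, rows): label in column 0, then 0/1 presence per feature."""
    vocabulary = sorted(set().union(*ranset, *benset))  # every feature seen in training
    header = ["label"] + vocabulary
    rows = []
    for label, samples in ((1, ranset), (0, benset)):  # 1 = ransomware, 0 = benign
        for features in samples:
            rows.append([label] + [1 if f in features else 0 for f in vocabulary])
    return header, rows


def to_csv(header, rows):
    """Serialize the table as CSV text, standing in for AnalysisFile."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```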
5. The Android ransomware detection method based on feature selection according to claim 1, characterized in that step 2 is specifically:
First, compute the total count of each feature in ransomware and in benign software: whether a row belongs to ransomware or to benign software is determined from the value in the first column of the initial sample feature library AnalysisFile, and AnalysisFile is processed accordingly. Next, compute the proportion of each feature in ransomware and in benign software. Then process the initial sample feature library AnalysisFile of the training samples with the information gain algorithm to obtain the information gain measure of each feature, and sort the features by information gain measure to obtain the detection feature set FeatureSet.
6. The Android ransomware detection method based on feature selection according to claim 5, characterized in that the formula of the information gain algorithm is:
where Xi denotes a feature in the training samples and i indexes the i-th feature; Cm denotes the class, which falls into two categories: m = 0 denotes benign software and m = 1 denotes ransomware; P(Xi) denotes the probability that feature Xi occurs, P(X̄i) denotes the probability that feature Xi does not occur, and the conditional probability P(Cm|Xi) denotes the probability of belonging to class Cm given that feature Xi occurs.
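The formula itself is not reproduced here. A reconstruction consistent with every symbol defined above is the standard information-gain criterion used for feature selection in text categorization; it is offered as a likely reading, not as the patent's verbatim formula. It additionally uses the class prior P(Cm) and P(Cm|X̄i), the class probability given that Xi is absent:

```latex
IG(X_i) = -\sum_{m=0}^{1} P(C_m)\log P(C_m)
        + P(X_i)\sum_{m=0}^{1} P(C_m \mid X_i)\log P(C_m \mid X_i)
        + P(\bar{X}_i)\sum_{m=0}^{1} P(C_m \mid \bar{X}_i)\log P(C_m \mid \bar{X}_i)
```

The first term is the class entropy; the other two subtract the class entropy that remains once it is known whether Xi is present, so a larger value marks a more discriminative feature.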
7. The Android ransomware detection method based on feature selection according to claim 5, characterized in that the input of step 2 is the initial sample feature library AnalysisFile of the training sample set, and the detailed process is:
3.1 Start;
3.2 Read the initial sample analysis file AnalysisFile;
3.3 Call a counting function to compute the number of ransomware samples NumTotalR in the initial sample feature library AnalysisFile;
3.4 Call a counting function to compute the number of benign samples NumTotalB in the initial sample feature library AnalysisFile;
3.5 Initialize the feature matrix FeatureArray to empty;
3.6 Judge whether the first-column value in the initial sample feature library AnalysisFile is 1; if it is, go to 3.7; otherwise, go to 3.8;
3.7 Compute numR[feature], the count of each feature in ransomware in the initial sample feature library AnalysisFile;
3.8 Compute numB[feature], the count of each feature in benign software in the initial sample feature library AnalysisFile;
3.9 Compute the proportion of each feature in the ransomware samples, numR[feature]/NumTotalR;
3.10 Compute the proportion of each feature in the benign samples, numB[feature]/NumTotalB.
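Steps 3.3 to 3.10 reduce the labelled table to per-feature counts and proportions. A minimal sketch, assuming rows shaped like `[label, f1, f2, ...]` with 0/1 presence values; the variables mirror NumTotalR, NumTotalB, numR and numB, while the function name `feature_ratios` is hypothetical.

```python
def feature_ratios(header, rows):
    """Compute numR/NumTotalR and numB/NumTotalB for every feature column."""
    features = header[1:]
    num_total_r = sum(1 for row in rows if row[0] == 1)  # NumTotalR
    num_total_b = sum(1 for row in rows if row[0] == 0)  # NumTotalB
    num_r = dict.fromkeys(features, 0)  # numR[feature]
    num_b = dict.fromkeys(features, 0)  # numB[feature]
    for row in rows:
        counts = num_r if row[0] == 1 else num_b  # first column selects the class
        for feature, present in zip(features, row[1:]):
            counts[feature] += present
    # Guard against an empty class so the division never fails.
    ratio_r = {f: num_r[f] / num_total_r if num_total_r else 0.0 for f in features}
    ratio_b = {f: num_b[f] / num_total_b if num_total_b else 0.0 for f in features}
    return num_total_r, num_total_b, ratio_r, ratio_b
```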
8. The Android ransomware detection method based on feature selection according to claim 5, characterized in that the output of step 2 is the detection feature set FeatureSet, and the detailed process is:
4.1 Write the proportion of each feature in the ransomware samples into the feature matrix FeatureArray;
4.2 Write the proportion of each feature in the benign samples into the feature matrix FeatureArray;
4.3 Use the information gain algorithm to compute the information gain measure Ig[feature] of each feature;
4.4 Judge whether the information gain measure in the detection feature matrix FeatureArray exceeds the detection feature threshold; if it does, go to 4.6; otherwise, go to 4.5;
4.5 Discard this feature and stop processing it;
4.6 Write this feature into the initial sample feature library AnalysisFile;
4.7 Judge whether traversal of the detection feature matrix FeatureArray is complete; if it is, go to step 4.8; otherwise, go to 4.4;
4.8 Output the generated detection feature set FeatureSet;
4.9 End of this part of the operation.
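Steps 4.3 to 4.8 can be sketched as below: Ig[feature] is computed from the per-class counts, features above the threshold are kept, and the survivors are sorted in descending order of gain as claim 5 describes. The standard information-gain formula and base-2 logarithms are assumptions, since the formula is not reproduced in this copy; the default threshold 0.05 is the lower end of the 0.05 to 0.25 range given in the embodiment, and `information_gain` and `select_features` are hypothetical names.

```python
from math import log2


def _entropy(probabilities):
    """Shannon entropy in bits, treating 0 * log 0 as 0."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def information_gain(n_r, n_b, total_r, total_b):
    """IG of a feature present in n_r of total_r ransomware and n_b of total_b benign samples."""
    total = total_r + total_b
    present = n_r + n_b
    absent = total - present
    h_class = _entropy([total_r / total, total_b / total])
    h_present = _entropy([n_r / present, n_b / present]) if present else 0.0
    h_absent = (
        _entropy([(total_r - n_r) / absent, (total_b - n_b) / absent]) if absent else 0.0
    )
    # H(C) - P(X) H(C|X) - P(not X) H(C|not X)
    return h_class - (present / total) * h_present - (absent / total) * h_absent


def select_features(num_r, num_b, total_r, total_b, threshold=0.05):
    """Return (feature, Ig) pairs whose gain exceeds the threshold, highest first."""
    scores = {
        f: information_gain(num_r[f], num_b[f], total_r, total_b) for f in num_r
    }
    kept = [(f, ig) for f, ig in scores.items() if ig > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

A feature that appears equally often in both classes scores 0 and is discarded at step 4.5.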
9. The Android ransomware detection method based on feature selection according to claim 1, characterized in that step 3 is specifically:
The detection feature set FeatureSet obtained after information gain processing is processed further: a detection feature subset FeatureSubset is selected from FeatureSet and used as the input of the support vector machine algorithm, a machine learning algorithm, to obtain the detection classifier;
The input of the above process is the detection feature set FeatureSet and the output is the detection classifier model; the specific flow is as follows:
5.0 Start;
5.1 Initialize the detection feature subset FeatureSubset to empty;
5.2 Read the detection feature set FeatureSet;
5.3 Judge whether the detection feature subset has reached the detection feature set threshold; if it has, go to 5.6; if not, go to 5.4;
5.4 Select a feature from the detection feature set FeatureSet and add it to the detection feature subset FeatureSubset;
5.5 Delete this feature from the detection feature set FeatureSet;
5.6 Generate the detection feature subset FeatureSubset;
5.7 Call the support vector machine interface Svm() of the Python machine learning library mlpy, with the detection feature subset FeatureSubset as its parameter;
5.8 Obtain the detection classifier model mysvm;
5.9 End of this part of the operation.
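Steps 5.1 to 5.8 can be sketched as below, with two substitutions that are assumptions rather than the patent's method. First, the patent calls `Svm()` from mlpy; here a small dependency-free linear SVM trained with the Pegasos stochastic sub-gradient method stands in for it, with a constant feature appended so the last weight acts as a (regularised) bias. Second, the subset threshold is read as a cap on how many features are taken from the top of the ranked FeatureSet; the source does not say how its 0.55 to 0.85 value is applied, so `keep_ratio` is hypothetical.

```python
import random


def choose_subset(ranked_features, keep_ratio=0.7):
    """Move features from the ranked FeatureSet into FeatureSubset until the cap is reached."""
    feature_set = list(ranked_features)  # highest information gain first
    cap = max(1, round(keep_ratio * len(feature_set)))
    subset = []
    while feature_set and len(subset) < cap:
        subset.append(feature_set.pop(0))  # add to the subset and delete from the set
    return subset


def train_linear_svm(vectors, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos sub-gradient solver for a linear SVM; returns the weight vector."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in vectors]  # constant 1.0 feeds the bias weight
    ys = [1.0 if y == 1 else -1.0 for y in labels]  # 1 = ransomware, 0 = benign
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x, y = data[i], ys[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the regulariser
            if margin < 1.0:  # hinge loss active: move towards the sample
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """True means ransomware (decision value >= 0), False means benign."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0.0
```

In practice a maintained library SVM (for example scikit-learn) would replace the hand-written solver; it is written out here only so the example runs without dependencies.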
10. The Android ransomware detection method based on feature selection according to claim 1, characterized in that step 4 is specifically:
Extract the feature set TestFeature of the software under test and input it into the detection classifier model; according to the Boolean output of the detection classifier model, determine whether the test sample is Android ransomware or benign Android software;
The input of the above process is the apk file of the software to be detected and the output is the type of the sample under test; the specific flow is as follows:
6.0 Start;
6.1 Read the apk file of the software under test;
6.2 According to the features listed in the detection feature set FeatureSet, extract the corresponding features TestFeature from the apk of the software under test;
6.3 Write the extracted TestFeature into the test feature set TestFeatureSet;
6.4 Call the detection classifier model mysvm with TestFeatureSet as the parameter, using the mysvm.predict() interface to classify the apk of the software under test;
6.5 Judge whether the Boolean result output by the detection classifier is 1; if it is, go to 6.6; otherwise, go to 6.7;
6.6 Output that the software under test is ransomware;
6.7 Output that the software under test is benign software;
6.8 End of this part of the operation.
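Steps 6.2 to 6.7 vectorise the software under test in the same feature order used for training and turn the classifier's output into a verdict. A minimal sketch, assuming a linear model given as a weight vector whose final entry is a bias (as a linear SVM with an appended constant feature produces); `vectorize` and `classify` are hypothetical stand-ins for building TestFeatureSet and calling `mysvm.predict()`.

```python
def vectorize(test_features, feature_subset):
    """TestFeatureSet: 0/1 presence of each selected feature, in training order."""
    present = set(test_features)
    return [1.0 if f in present else 0.0 for f in feature_subset]


def classify(weights, test_features, feature_subset):
    """Return the step 6.6 or 6.7 message for one apk's extracted features."""
    x = vectorize(test_features, feature_subset) + [1.0]  # constant for the bias weight
    decision = sum(w * v for w, v in zip(weights, x))
    is_ransomware = decision >= 0.0  # Boolean output: 1 = ransomware
    if is_ransomware:
        return "the software under test is ransomware"
    return "the software under test is benign software"
```

Features of the apk that are not in the selected subset are simply ignored, matching step 6.2, which extracts only the features listed in FeatureSet.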
Compared with the prior art, the present invention has the following advantages and effects: the Android ransomware feature selection operation provided by the invention uses information-gain-based feature selection to filter out redundant features and keep the most effective features in the feature set. This effectively reduces the number of features, which in turn reduces the time needed to train the detection classifier and to classify samples, so detection is fast. The detection process of the invention trains the detection classifier on a selected feature set that is directly related to ransomware and highly discriminative, so the classification and identification accuracy for ransomware is high.
Brief description of the drawings
Fig. 1 is a flowchart of the Android ransomware detection method based on feature selection of the present invention.
Fig. 2 is a flowchart of step 1 of the present invention.
Fig. 3 is a flowchart of step 2 of the present invention.
Fig. 4 is a flowchart of step 3 of the present invention.
Fig. 5 is a flowchart of step 4 of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. The following embodiments illustrate the present invention, and the invention is not limited to them.
As shown in Fig. 1, the main flow of the Android ransomware detection method based on feature selection of the present invention comprises four parts: sample feature extraction, sample feature selection, classifier generation, and detection of the software under test.
Sample feature extraction: for each training sample in the training sample set, extract its permission features, intent features, api features, and package name features, and combine the extracted features into the initial sample feature library.
Sample feature selection: process the extracted initial sample feature library with the information gain method, calculate the measure value of each feature in the library, sort the features by measure value in descending order, and select the important features as classification features to form the detection feature set.
Classifier generation: use the selected detection feature set as the input parameter of a support vector machine interface and call the support vector machine interface in Python to obtain the detection classifier.
Detection of the software under test: read the software under test, extract its features, and feed them to the detection classifier; according to the Boolean value output by the detection classifier model, judge whether the test sample is ransomware or benign software.
Fig. 2 is the flowchart of the feature extraction operation. This operation mainly processes the apk files in the training sample set with androlyze.py from androguard, the open-source Android static analysis tool. The classes.dex file of each training sample is decompiled by command to extract api features and package name features, and the manifest.xml file of each training sample is decompiled by command to extract permission features and intent features. The corresponding api features, package name features, permission features, and intent features are then written into the initial sample feature library AnalysisFile.
The input of this process is the sample apk files in the qualified training set, and the output is the initial sample feature library AnalysisFile. The specific flow is as follows: step 20 is the start; step 21 initializes the initial sample feature library AnalysisFile to empty; step 22 reads a sample apk from the ransomware training sample set Ranset and the benign training sample set; step 23 decompiles the input apk file with the Android static analysis tool androguard; step 24 obtains the classes.dex and manifest.xml files produced by androguard's decompilation; step 25 judges whether the file to be processed is the classes.dex file, and if it is goes to step 26, otherwise goes to step 27; step 26 reads the classes.dex bytecode file and analyzes and marks it with androgexf.py; step 27 reads the manifest.xml file and analyzes and marks it with androapkinfo.py; step 28 extracts api features and package name features from the classes.dex bytecode file; step 29 extracts permission features and intent features from the manifest.xml file; step 2A writes the extracted features into the initial sample feature library AnalysisFile; step 2B judges whether the classes.dex bytecode file and the manifest.xml file have both been marked and processed, and if they have goes to step 2D, otherwise goes to step 2C; step 2C reads the classes.dex and manifest.xml files that have not yet been marked; step 2D judges whether the sample comes from the ransomware sample set Ranset, and if it does goes to step 2E, otherwise goes to step 2F; step 2E sets the first-column value in the initial sample feature library to 1; step 2F sets the first-column value in the initial sample feature library to 0; step 2G judges whether traversal of the ransomware sample set Ranset and the benign sample set Benset is complete, and if not goes to step 21, otherwise goes to step 2H; step 2H ends this part of the operation.
Fig. 3 is the flowchart of the feature selection operation. This operation mainly processes the extracted initial sample feature library AnalysisFile. First, the total count of each feature in ransomware and in benign software is computed: whether a row belongs to ransomware or to benign software is determined from the value in the first column of AnalysisFile, and AnalysisFile is processed accordingly. Next, the proportion of each feature in ransomware and in benign software is computed. Finally, the information gain algorithm is applied to the initial sample feature library AnalysisFile of the training samples to obtain the information gain measure of each feature, and the features are sorted by information gain measure to obtain the detection feature set FeatureSet.
The formula of the information gain algorithm is:
where Xi denotes a feature in the training samples and i indexes the i-th feature; Cm denotes the class, which falls into two categories: m = 0 denotes benign software and m = 1 denotes ransomware; P(Xi) denotes the probability that feature Xi occurs, P(X̄i) denotes the probability that feature Xi does not occur, and the conditional probability P(Cm|Xi) denotes the probability of belonging to class Cm given that feature Xi occurs.
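As a worked illustration with made-up counts (not data from the patent), take 100 ransomware and 100 benign training samples and a feature present in 90 ransomware and 10 benign samples. Using the standard information-gain form, IG(Xi) = H(C) + P(Xi) Σm P(Cm|Xi) log P(Cm|Xi) + P(X̄i) Σm P(Cm|X̄i) log P(Cm|X̄i), with base-2 logarithms (both assumptions):

```latex
\begin{aligned}
P(C_1) &= P(C_0) = 0.5, \qquad -\sum_m P(C_m)\log_2 P(C_m) = 1 \\
P(X_i) &= \tfrac{100}{200} = 0.5, \quad P(C_1 \mid X_i) = 0.9, \quad P(C_0 \mid X_i) = 0.1 \\
P(\bar{X}_i) &= 0.5, \quad P(C_1 \mid \bar{X}_i) = 0.1, \quad P(C_0 \mid \bar{X}_i) = 0.9 \\
\sum_m P(C_m \mid X_i)\log_2 P(C_m \mid X_i) &= 0.9\log_2 0.9 + 0.1\log_2 0.1 \approx -0.469 \\
IG(X_i) &\approx 1 + 0.5(-0.469) + 0.5(-0.469) \approx 0.531
\end{aligned}
```

A gain of about 0.531 would clear any detection feature threshold in the 0.05 to 0.25 range given for step 3D, whereas a feature that occurs equally often in both classes has a gain of 0.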
The input of this part is the initial sample feature library AnalysisFile of the training sample set, and the output is the detection feature set FeatureSet. The specific flow is as follows: step 30 is the start; step 31 reads the initial sample analysis file AnalysisFile; step 32 calls a counting function to compute the number of ransomware samples NumTotalR in AnalysisFile; step 33 calls a counting function to compute the number of benign samples NumTotalB in AnalysisFile; step 34 initializes the feature matrix FeatureArray to empty; step 35 judges whether the first-column value in AnalysisFile is 1, and if it is goes to step 36, otherwise goes to step 37; step 36 computes numR[feature], the count of each feature in ransomware in AnalysisFile; step 37 computes numB[feature], the count of each feature in benign software in AnalysisFile; step 38 computes the proportion of each feature in the ransomware samples, numR[feature]/NumTotalR; step 39 computes the proportion of each feature in the benign samples, numB[feature]/NumTotalB; step 3A writes the proportion of each feature in the ransomware samples into the feature matrix FeatureArray; step 3B writes the proportion of each feature in the benign samples into FeatureArray; step 3C uses the information gain algorithm to compute the information gain measure Ig[feature] of each feature; step 3D judges whether the information gain measure in the detection feature matrix FeatureArray exceeds the detection feature threshold (for example, the threshold may be set between 0.05 and 0.25), and if it does goes to step 3F, otherwise goes to step 3E; step 3E discards this feature and stops processing it; step 3F writes this feature into the initial sample feature library AnalysisFile; step 3G judges whether traversal of the detection feature matrix FeatureArray is complete, and if it is goes to step 3H, otherwise goes to step 3D; step 3H outputs the generated detection feature set FeatureSet; step 3I ends this part of the operation.
Fig. 4 is the flowchart of the classifier generation operation. This operation mainly processes the detection feature set FeatureSet obtained after information gain processing further: a detection feature subset FeatureSubset is selected from FeatureSet and used as the input of the support vector machine algorithm, a machine learning algorithm, to obtain the detection classifier.
The input of this process is the detection feature set FeatureSet, and the output is the detection classifier model. The specific flow is as follows: step 40 is the start; step 41 initializes the detection feature subset FeatureSubset to empty; step 42 reads the detection feature set FeatureSet; step 43 judges whether the detection feature subset has reached the detection feature set threshold (for example, the threshold may be set between 0.55 and 0.85), and if it has goes to step 46, otherwise goes to step 44; step 44 selects a feature from the detection feature set FeatureSet and adds it to the detection feature subset FeatureSubset; step 45 deletes this feature from FeatureSet; step 46 generates the detection feature subset FeatureSubset; step 47 calls the support vector machine interface Svm() of the Python machine learning library mlpy, with FeatureSubset as its parameter; step 48 obtains the detection classifier model mysvm; step 49 ends this part of the operation.
Fig. 5 is the flowchart of the detection operation for the software under test. This operation mainly extracts the feature set TestFeature of the software under test and inputs it into the detection classifier model; according to the Boolean output of the detection classifier model, the test sample is determined to be Android ransomware or benign Android software.
The input of this process is the apk file of the software to be detected, and the output is the type of the sample under test. The specific flow is as follows: step 50 is the start; step 51 reads the apk file of the software under test; step 52 extracts the corresponding features TestFeature from the apk of the software under test according to the features listed in the detection feature set FeatureSet; step 53 writes the extracted TestFeature into the test feature set TestFeatureSet; step 54 calls the detection classifier model mysvm with TestFeatureSet as the parameter, using the mysvm.predict() interface to classify the apk of the software under test; step 55 judges whether the Boolean result output by the detection classifier is 1, and if it is goes to step 56, otherwise goes to step 57; step 56 outputs that the software under test is ransomware; step 57 outputs that the software under test is benign software; step 58 ends this part of the operation.
The present invention provides an Android ransomware detection method based on feature selection. Features of the training samples are extracted according to an initial feature set to form an initial sample feature library; a measure value is computed for each feature in the library, and the features whose measure value exceeds the detection feature threshold are selected to form the detection feature set. A ransomware classifier is trained on the detection feature set to obtain the detection classifier. The Android ransomware feature selection operation provided by the invention uses information-gain-based feature selection to filter out redundant features and keep the most effective features, which effectively reduces the number of features in the feature set and in turn the time needed to train the detection classifier and to classify samples, so detection is fast. The detection process of the invention trains the detection classifier on a selected feature set that is directly related to ransomware and highly discriminative, so the classification and identification accuracy for ransomware is high.
The content described in this specification is merely an illustration of the present invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the specific embodiments described, or substitute them in a similar manner; as long as these do not depart from the content of the description of the invention or go beyond the scope defined by the claims, they shall fall within the protection scope of the present invention.

Claims (10)

1. An Android ransomware detection method based on feature selection, characterized by comprising the following steps:
Step 1: Sample feature extraction. For each training sample in the training sample set, extract its permission features, intent features, api features, and package name features, and combine the extracted features into the initial sample feature library;
Step 2: Sample feature selection. Process the extracted initial sample feature library with the information gain method: calculate the measure value of each feature in the library, sort the features by measure value in descending order, and select the important features as classification features to form the detection feature set;
Step 3: Classifier generation. Use the selected detection feature set as the input parameter of a support vector machine interface and call the support vector machine interface in Python to obtain the detection classifier;
Step 4: Detection of the software under test. Read the software under test, extract its features, and feed them to the detection classifier; according to the Boolean value output by the detection classifier model, judge whether the test sample is ransomware or benign software.
2. The Android ransomware detection method based on feature selection according to claim 1, characterized in that step 1 is specifically: the apk files in the training sample set are processed with androlyze.py of the open-source Android static analysis tool androguard; the classes.dex file of each training sample is decompiled by command to extract api features and package name features, and the manifest.xml file of each training sample is decompiled by command to extract permission features and intent features; the corresponding permission features, intent features, api features and package name features are then written into the initial sample feature library AnalysisFile.
3. The Android ransomware detection method based on feature selection according to claim 2, characterized in that in step 1 the input is the sample apk files of the training sample sets, and the detailed process is:
1.1 initial action;
1.2 initialize the initial sample feature library AnalysisFile to empty;
1.3 read the sample apk files in the ransomware training sample set Ranset and the benign training sample set Benset;
1.4 decompile the input apk file with the Android static analysis tool androguard;
1.5 obtain the classes.dex file and the manifest.xml file generated by androguard decompilation;
1.6 judge whether the file to be processed is the classes.dex file: if it is, go to 1.7; if not, go to 1.8;
1.7 read the classes.dex bytecode file, and analyze and label it with androgexf.py;
1.8 read the manifest.xml file, and analyze and label it with androapkinfo.py;
1.9 extract api features and package name features from the classes.dex bytecode file;
1.10 extract permission features and intent features from the manifest.xml file.
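Steps 1.9 and 1.10 can be illustrated with the standard library alone. The patent performs decompilation with androguard scripts; the sketch below does not call androguard and instead assumes a decompiler has already produced (a) the decoded, plain-text AndroidManifest.xml and (b) a list of invoked-method references in smali descriptor form such as `Landroid/app/admin/DevicePolicyManager;->lockNow()V`. The function names and the choice of `ClassName.method` as the api feature string are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Set, Tuple

# Attributes such as android:name are namespaced; element tags in the manifest are not.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def manifest_features(manifest_xml: str) -> Tuple[Set[str], Set[str]]:
    """Step 1.10: permission and intent features from a decoded AndroidManifest.xml."""
    root = ET.fromstring(manifest_xml)
    permissions = {el.get(ANDROID_NS + "name") for el in root.iter("uses-permission")}
    intents = {
        action.get(ANDROID_NS + "name")
        for intent_filter in root.iter("intent-filter")
        for action in intent_filter.iter("action")
    }
    permissions.discard(None)  # ignore elements without android:name
    intents.discard(None)
    return permissions, intents


def dex_features(method_refs: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Step 1.9: api and package name features from 'Lpkg/Class;->method(args)ret' references."""
    apis: Set[str] = set()
    packages: Set[str] = set()
    for ref in method_refs:
        class_desc, _, method = ref.partition(";->")
        if not method:  # not a method reference, skip it
            continue
        if class_desc.startswith("L"):
            class_desc = class_desc[1:]
        class_name = class_desc.replace("/", ".")
        apis.add(f"{class_name}.{method.split('(')[0]}")
        packages.add(class_name.rpartition(".")[0])
    return apis, packages
```

Keeping the four feature kinds as plain strings lets step 1 merge them into a single vocabulary for AnalysisFile without caring which file each came from.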
4. The Android ransomware detection method based on feature selection according to claim 3, characterized in that in step 1 the output is the initial sample feature library AnalysisFile, and the detailed process is:
2.1 write the extracted features into the initial sample feature library AnalysisFile;
2.2 judge whether all classes.dex bytecode files and manifest.xml files have been labeled and processed: if they have, go to 2.4; if not, go to 2.3;
2.3 read the classes.dex files and manifest.xml files that have not yet been labeled;
2.4 judge whether the sample comes from the ransomware sample set Ranset: if it does, go to 2.5; if not, go to step 2.6;
2.5 set the value in the first column of the initial sample feature library to 1 for this sample;
2.6 set the value in the first column of the initial sample feature library to 0 for this sample;
2.7 judge whether traversal of the ransomware sample set Ranset and the benign sample set Benset is complete: if not, go to step 1.1; if it is, go to step 2.8;
2.8 this part of the operation ends.
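Steps 2.1 to 2.8 fix only one detail of the AnalysisFile layout: the first column holds the label (1 for Ranset, 0 for Benset). The sketch below assumes the remaining columns are 1/0 presence flags over the union of all extracted features; that encoding and the function name `build_analysis_file` are assumptions for illustration.

```python
from typing import List, Sequence, Set, Tuple


def build_analysis_file(
    ranset: Sequence[Set[str]], benset: Sequence[Set[str]]
) -> Tuple[List[str], List[List[int]]]:
    """Return (vocabulary, rows); row[0] is the label, row[1:] are presence flags."""
    vocabulary = sorted(set().union(*ranset, *benset))  # every feature seen in either set
    rows: List[List[int]] = []
    for label, samples in ((1, ranset), (0, benset)):  # 2.5: Ranset -> 1, 2.6: Benset -> 0
        for features in samples:
            rows.append([label] + [1 if f in features else 0 for f in vocabulary])
    return vocabulary, rows
```

Sorting the vocabulary gives a stable column order, which later steps rely on when they map column positions back to feature names.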
5. The Android ransomware detection method based on feature selection according to claim 1, characterized in that step 2 is specifically:
First, the total number of occurrences of each feature in ransomware and in benign software is calculated; the value in the first column of the initial sample feature library AnalysisFile determines whether a feature is counted as a ransomware feature or a benign-software feature when AnalysisFile is processed. The proportion of each feature in ransomware and in benign software is then calculated. The information gain algorithm is applied to the initial sample feature library AnalysisFile of the training samples to obtain the information gain measured value of each feature, and the features are ranked by this value to obtain the detection feature set FeatureSet.
6. The Android ransomware detection method based on feature selection according to claim 5, characterized in that the formula of the information gain algorithm is:
$$IG(X_i) = -\sum_{m=0}^{1} P(C_m)\log P(C_m) + P(X_i)\sum_{m=0}^{1} P(C_m \mid X_i)\log P(C_m \mid X_i) + P(\bar{X}_i)\sum_{m=0}^{1} P(C_m \mid \bar{X}_i)\log P(C_m \mid \bar{X}_i)$$
where $X_i$ denotes a feature of the training samples, $i$ denoting the i-th feature; $C_m$ denotes the class, of which there are two: $m = 0$ denotes benign software and $m = 1$ denotes ransomware; $P(C_m)$ is the probability of class $C_m$; $P(X_i)$ is the probability that feature $X_i$ occurs and $P(\bar{X}_i)$ the probability that it does not occur; the conditional probability $P(C_m \mid X_i)$ is the probability of belonging to class $C_m$ given that $X_i$ occurs, and $P(C_m \mid \bar{X}_i)$ the corresponding probability given that $X_i$ does not occur.
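The information gain of claim 6 can be computed per feature from a presence column and the label column of AnalysisFile, estimating each probability by its relative frequency. This is a minimal sketch: the patent does not state the logarithm base, so base 2 is assumed here (changing the base rescales every IG value by the same constant, so the ranking is unchanged, although a fixed threshold would need rescaling). The name `information_gain` is hypothetical.

```python
import math
from typing import Sequence


def _p_log_p(probs: Sequence[float]) -> float:
    """Sum of p * log2(p) over the non-zero probabilities (0 * log 0 is taken as 0)."""
    return sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(present: Sequence[int], labels: Sequence[int]) -> float:
    """IG(X_i) for one binary feature.

    present[k] is 1 if sample k contains X_i; labels[k] is 1 for ransomware (C_1)
    and 0 for benign software (C_0).
    """
    n = len(labels)
    ig = -_p_log_p([labels.count(m) / n for m in (0, 1)])  # class entropy term
    for flag in (1, 0):  # X_i occurs, then X_i does not occur
        subset = [y for x, y in zip(present, labels) if x == flag]
        if not subset:
            continue  # P(X_i) or P(not X_i) is 0, so the term vanishes
        p_flag = len(subset) / n
        ig += p_flag * _p_log_p([subset.count(m) / len(subset) for m in (0, 1)])
    return ig
```

A feature that perfectly separates two balanced classes scores 1 bit, while a feature present in every sample scores 0, which is why low-IG features can be dropped without losing discriminative power.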
7. The Android ransomware detection method based on feature selection according to claim 5, characterized in that the input of step 2 is the initial sample feature library AnalysisFile of the training sample set, and the detailed process is:
3.1 initial action;
3.2 read the initial sample analysis file AnalysisFile;
3.3 call the counting function to calculate the number of ransomware samples NumTotalR in the initial sample feature library AnalysisFile;
3.4 call the counting function to calculate the number of benign samples NumTotalB in the initial sample feature library AnalysisFile;
3.5 initialize the feature matrix FeatureArray to empty;
3.6 judge whether the value in the first column of the initial sample feature library AnalysisFile is 1: if it is, go to 3.7; if not, go to 3.8;
3.7 calculate, in the initial sample feature library AnalysisFile, the count numR[feature] of each feature in ransomware;
3.8 calculate, in the initial sample feature library AnalysisFile, the count numB[feature] of each feature in benign software;
3.9 calculate the proportion of each feature in the ransomware samples, numR[feature]/NumTotalR;
3.10 calculate the proportion of each feature in the benign samples, numB[feature]/NumTotalB.
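Steps 3.3 to 3.10 amount to counting samples per class and per feature. The sketch below assumes the AnalysisFile layout of label in column 0 followed by 1/0 presence flags, and that "the count of a feature" means the number of samples of that class containing it; the function name `feature_ratios` is hypothetical, while `NumTotalR`, `numR` and the other names in the comments are the patent's.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def feature_ratios(
    vocabulary: Sequence[str], rows: Sequence[Sequence[int]]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (numR[f]/NumTotalR, numB[f]/NumTotalB) for every feature f."""
    num_total_r = sum(1 for row in rows if row[0] == 1)  # 3.3 NumTotalR
    num_total_b = len(rows) - num_total_r                # 3.4 NumTotalB
    num_r: Counter = Counter()                           # numR[feature]
    num_b: Counter = Counter()                           # numB[feature]
    for row in rows:
        target = num_r if row[0] == 1 else num_b         # 3.6 branch on the label column
        for feature, flag in zip(vocabulary, row[1:]):   # 3.7 / 3.8 per-feature counts
            if flag:
                target[feature] += 1
    # 3.9 / 3.10: proportions, guarding against an empty class
    ratio_r = {f: num_r[f] / num_total_r if num_total_r else 0.0 for f in vocabulary}
    ratio_b = {f: num_b[f] / num_total_b if num_total_b else 0.0 for f in vocabulary}
    return ratio_r, ratio_b
```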
8. The Android ransomware detection method based on feature selection according to claim 5, characterized in that the output of step 2 is the detection feature set FeatureSet, and the detailed process is:
4.1 write the proportion of each feature in the ransomware samples into the feature matrix FeatureArray;
4.2 write the proportion of each feature in the benign samples into the feature matrix FeatureArray;
4.3 calculate the information gain measured value Ig[feature] of each feature with the information gain algorithm;
4.4 judge whether the information gain measured value of the feature in the detection feature matrix FeatureArray exceeds the detection feature threshold: if it does, go to 4.6; if not, go to 4.5;
4.5 discard this feature and do no further processing on it;
4.6 write this feature into the initial sample feature library AnalysisFile;
4.7 judge whether traversal of the detection feature matrix FeatureArray is complete: if it is, go to step 4.8; if not, go to 4.4;
4.8 output the generated detection feature set FeatureSet;
4.9 this part of the operation ends.
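The threshold test of steps 4.4 to 4.8, combined with the descending ranking required by claim 1, can be sketched as below. The sketch assumes the Ig[feature] values of step 4.3 are already available as a dictionary; the patent gives no value for the detection feature threshold, so it is left to the caller, and the name `select_features` is hypothetical.

```python
from typing import Dict, List


def select_features(ig: Dict[str, float], threshold: float) -> List[str]:
    """Keep features whose IG exceeds the threshold, ranked from highest to lowest IG."""
    kept = [feature for feature, score in ig.items() if score > threshold]  # 4.4 / 4.6
    # Ties are broken by feature name so the resulting FeatureSet order is deterministic.
    return sorted(kept, key=lambda feature: (-ig[feature], feature))        # 4.8
```

Returning the features in ranked order means the subset step of claim 9 can simply take features from the front of the list.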
9. The Android ransomware detection method based on feature selection according to claim 1, characterized in that step 3 is specifically:
The detection feature set FeatureSet obtained after information gain processing is processed again: a detection feature subset FeatureSubset is selected from the detection feature set FeatureSet and used as the input of the support vector machine algorithm, a machine learning algorithm, to obtain the detection classifier;
The input of the above process is the detection feature set FeatureSet, and the output is the detection classifier model; the specific flow is as follows:
5.0 initial action;
5.1 initialize the detection feature subset FeatureSubset and set its initial value to empty;
5.2 read the detection feature set FeatureSet;
5.3 judge whether the size of the detection feature subset is below the detection feature set threshold: if the threshold has been reached, go to 5.6; if not, go to 5.4;
5.4 select a feature from the detection feature set FeatureSet and add it to the detection feature subset FeatureSubset;
5.5 delete this feature from the detection feature set FeatureSet;
5.6 generate the detection feature subset FeatureSubset;
5.7 call the support vector machine interface Svm() in the Python machine learning library mlpy, with the detection feature subset FeatureSubset as its parameter;
5.8 obtain the detection classifier model mysvm;
5.9 this part of the operation ends.
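Steps 5.1 to 5.8 can be sketched as below. Step 5.7 calls `Svm()` from mlpy, but the patent gives no call signature, so this sketch does not reproduce an mlpy call; it substitutes a small hand-written linear SVM trained by Pegasos-style subgradient descent on the regularised hinge loss, so that it runs with the standard library only. The names `choose_subset`, `LinearSVM`, `max_size`, `lam` and `epochs` are hypothetical; `fit`/`predict` merely mirror the `mysvm.predict()` call of step 6.4.

```python
from typing import List, Sequence


def choose_subset(feature_set: Sequence[str], max_size: int) -> List[str]:
    """Steps 5.1-5.6: move ranked features into FeatureSubset until max_size is reached."""
    remaining = list(feature_set)                 # 5.2 read FeatureSet (caller's list untouched)
    subset: List[str] = []                        # 5.1 FeatureSubset starts empty
    while len(subset) < max_size and remaining:   # 5.3 threshold check
        subset.append(remaining.pop(0))           # 5.4 add best remaining feature, 5.5 delete it
    return subset                                 # 5.6


class LinearSVM:
    """Linear SVM minimising lam/2 * ||w||^2 + mean hinge loss (Pegasos-style updates).

    A constant 1 is appended to each row so the bias is learned as an ordinary
    (regularised) weight; samples are visited in a fixed order, not sampled at random.
    """

    def __init__(self, lam: float = 0.1, epochs: int = 200) -> None:
        self.lam = lam
        self.epochs = epochs
        self.w: List[float] = []

    @staticmethod
    def _augment(row: Sequence[float]) -> List[float]:
        return [float(v) for v in row] + [1.0]

    def _score(self, x: Sequence[float]) -> float:
        return sum(wj * xj for wj, xj in zip(self.w, x))

    def fit(self, rows: Sequence[Sequence[float]], labels: Sequence[int]) -> "LinearSVM":
        xs = [self._augment(r) for r in rows]
        ys = [1.0 if y == 1 else -1.0 for y in labels]  # ransomware = +1, benign = -1
        self.w = [0.0] * len(xs[0])
        t = 0
        for _ in range(self.epochs):
            for x, y in zip(xs, ys):
                t += 1
                eta = 1.0 / (self.lam * t)                 # Pegasos step size
                margin = y * self._score(x)                # margin under the current w
                self.w = [(1.0 - eta * self.lam) * wj for wj in self.w]  # regulariser shrink
                if margin < 1.0:                           # hinge loss is active
                    self.w = [wj + eta * y * xj for wj, xj in zip(self.w, x)]
        return self

    def predict(self, rows: Sequence[Sequence[float]]) -> List[int]:
        """1 = ransomware, 0 = benign, matching the Boolean output checked in step 6.5."""
        return [1 if self._score(self._augment(r)) > 0 else 0 for r in rows]
```

A library SVM would normally replace `LinearSVM`; the hand-written version only shows what step 5.7 computes. The training rows must be restricted to the FeatureSubset columns, in the same order later used to build test vectors.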
10. The Android ransomware detection method based on feature selection according to claim 1, characterized in that step 4 is specifically:
The feature set TestFeature of the software under test is extracted and input into the detection classifier model; according to the Boolean value output by the detection classifier model, the test sample is judged to be either Android ransomware or Android benign software;
The input of the above process is the apk file of the software to be detected, and the output is the type of the sample under test; the specific flow is as follows:
6.0 initial action;
6.1 read the apk file of the software under test;
6.2 extract the corresponding features TestFeature of the software-under-test apk according to the features listed in the detection feature set FeatureSet;
6.3 write the extracted TestFeature into the feature set under test TestFeatureSet;
6.4 call the detection classifier model mysvm with the feature set under test TestFeatureSet as the parameter, and classify the software-under-test apk through the mysvm.predict() interface;
6.5 judge whether the Boolean output of the detection classifier is 1: if it is, go to 6.6; if not, go to 6.7;
6.6 output that the software under test is ransomware;
6.7 output that the software under test is benign software;
6.8 this part of the operation ends.
CN201810585511.2A 2018-06-08 2018-06-08 A kind of preferred Android of feature extorts software detecting method Pending CN108710802A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810585511.2A CN108710802A (en) 2018-06-08 2018-06-08 A kind of preferred Android of feature extorts software detecting method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810585511.2A CN108710802A (en) 2018-06-08 2018-06-08 A kind of preferred Android of feature extorts software detecting method

Publications (1)

Publication Number Publication Date
CN108710802A true CN108710802A (en) 2018-10-26

Family

ID=63872466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810585511.2A Pending CN108710802A (en) 2018-06-08 2018-06-08 A kind of preferred Android of feature extorts software detecting method

Country Status (1)

Country Link
CN (1) CN108710802A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI682303B (en) * 2018-12-11 2020-01-11 中華電信股份有限公司 Computer system and ransomware detection method thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740424A (en) * 2016-01-29 2016-07-06 湖南大学 Spark platform based high efficiency text classification method
CN107180192A (en) * 2017-05-09 2017-09-19 北京理工大学 Android malicious application detection method and system based on multi-feature fusion
CN107491531A (en) * 2017-08-18 2017-12-19 华南师范大学 Chinese network comment sensibility classification method based on integrated study framework
CN107577942A (en) * 2017-08-22 2018-01-12 中国民航大学 A kind of composite character screening technique for Android malware detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181026
