CN108710802A - A feature-optimized Android ransomware detection method
- Publication number
- CN108710802A (application CN201810585511.2A)
- Authority
- CN
- China
- Prior art keywords
- feature
- detection
- software
- sample
- initial
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
Abstract
The invention discloses a feature-optimized Android ransomware detection method. Features of the training samples are extracted according to an initial feature set to form an initial sample feature library; a metric value is calculated for each feature in the initial sample feature library, and the features whose metric value exceeds a detection feature threshold are chosen to form a detection feature set; a ransomware classifier is then trained on the detection feature set to obtain the detection classifier. The invention addresses the technical problems of an overly large feature set, slow detection, and low detection accuracy.
Description
Technical field
The present invention relates to a ransomware detection method, and in particular to a feature-optimized Android ransomware detection method.
Background art
As smartphones have become widespread, the security threats they face have also multiplied. Mobile ransomware refers to any application that can perform malicious operations on a smartphone or tablet: it locks the user's device or encrypts the user's data so that the device cannot be used normally, and thereby coerces the user into paying an unlocking or decryption fee. Such malware poses a serious security threat to users' mobile devices, and ransomware on the Android platform has become a problem that cannot be ignored. Many machine-learning-based ransomware detection methods exist, but common Android ransomware detection methods still suffer, because of how their features are selected, from large feature sets, slow detection, and low classification accuracy. The present invention therefore proposes a feature-optimized Android ransomware detection method.
Summary of the invention
The technical problem to be solved by the present invention is to provide a feature-optimized Android ransomware detection method that addresses the problems of an overly large feature set, slow detection, and low detection accuracy.
To solve the above technical problem, the technical solution adopted by the present invention is a feature-optimized Android ransomware detection method, characterized by comprising the following steps:
Step 1: Sample feature extraction. For each training sample in the training sample set, extract the permission features, intent features, api features and package name features of that sample, and combine the extracted features into the initial sample feature library;
Step 2: Sample feature selection. Process the extracted initial sample feature library using the information gain method: calculate the metric value of each feature in the initial sample feature library, sort the features by metric value in descending order, and select the important features as classification features to form the detection feature set;
Step 3: Classifier generation. Pass the selected detection feature set as the input parameter of the support vector machine interface and call the support vector machine interface in Python to obtain the detection classifier;
Step 4: Detection of the software under test. Read the software under test, extract its features, and feed them to the detection classifier; according to the Boolean value output by the detection classifier model, determine whether the test sample is ransomware or benign software.
2. The feature-optimized Android ransomware detection method according to claim 1, characterized in that step 1 is specifically: the apk files in the training sample set are processed with androlyze.py from the open-source Android static analysis tool androguard; the classes.dex file in each training sample is decompiled by command and the api features and package name features are extracted from it, and the manifest.xml file in each training sample is decompiled by command and the permission features and intent features are extracted from it; the corresponding permission features, package name features, api features and intent features are then written to the initial sample feature library AnalysisFile.
3. The feature-optimized Android ransomware detection method according to claim 2, characterized in that in step 1 the input is the sample apk files in the qualifying training set, and the detailed process is:
1.1 start;
1.2 initialize the initial sample feature library AnalysisFile, with an empty initial value;
1.3 read the sample apk files in the ransomware training sample set Ranset and the benign training sample set;
1.4 decompile the input apk files using the Android static analysis tool androguard;
1.5 obtain the class.dex file and the manifest.xml file generated by the androguard decompilation;
1.6 determine whether the file to be processed is a class.dex file; if it is, go to 1.7; if it is not, go to 1.8;
1.7 read the class.dex bytecode file, and analyze and mark it using androgexf.py;
1.8 read the manifest.xml file, and analyze and mark it using androapkinfo.py;
1.9 extract the api features and package name features from the class.dex bytecode file;
1.10 extract the permission features and intent features from the manifest.xml file.
4. The feature-optimized Android ransomware detection method according to claim 3, characterized in that in step 1 the output is the initial sample feature library AnalysisFile, and the detailed process is:
2.1 write the extracted features to the initial sample feature library AnalysisFile;
2.2 determine whether the class.dex bytecode files and manifest.xml files have all been marked and processed; if they have, go to 2.4; if they have not all been marked and processed, go to 2.3;
2.3 read the class.dex and manifest.xml files that have not yet been marked;
2.4 determine whether the sample comes from the ransomware sample set Ranset; if it does, go to 2.5; if it does not, go to 2.6;
2.5 set the value in the first column of the initial sample feature library to 1;
2.6 set the value in the first column of the initial sample feature library to 0;
2.7 determine whether traversal of the ransomware sample set Ranset and the benign sample set Benset is complete; if it is not, go to step 1.1; if it is, go to step 2.8;
2.8 this part of the operation ends.
5. The feature-optimized Android ransomware detection method according to claim 1, characterized in that step 2 is specifically:
first, the total count of each feature in ransomware and its total count in benign software are calculated, the value in the first column of the initial sample feature library AnalysisFile being used to determine whether a feature occurrence belongs to ransomware or to benign software; the proportion of ransomware samples and of benign samples containing each feature is then computed; finally, the initial sample feature library AnalysisFile of the training samples is processed with the information gain algorithm to obtain the information gain metric value of each feature, and the features are sorted by that value to obtain the detection feature set FeatureSet.
6. The feature-optimized Android ransomware detection method according to claim 5, characterized in that the formula of the information gain algorithm is:
where X_i denotes a feature in the training samples, i being the index of the feature; C_m denotes the class, of which there are two, m = 0 denoting benign software and m = 1 denoting ransomware; P(X_i) is the probability that feature X_i occurs; P(X̄_i) is the probability that feature X_i does not occur; and the conditional probability P(C_m|X_i) is the probability that a sample belongs to class C_m given that feature X_i occurs.
7. The feature-optimized Android ransomware detection method according to claim 5, characterized in that the input of step 2 is the initial sample feature library AnalysisFile of the training sample set, and the detailed process is:
3.1 start;
3.2 read the initial sample feature library AnalysisFile;
3.3 call the counting function to calculate the number of ransomware samples, NumTotalR, in the initial sample feature library AnalysisFile;
3.4 call the counting function to calculate the number of benign samples, NumTotalB, in the initial sample feature library AnalysisFile;
3.5 initialize the feature matrix FeatureArray, with an empty initial value;
3.6 determine whether the value in the first column of the initial sample feature library AnalysisFile is 1; if it is, go to 3.7; if it is not, go to 3.8;
3.7 calculate, in the initial sample feature library AnalysisFile, the count of each feature in ransomware, numR[feature];
3.8 calculate, in the initial sample feature library AnalysisFile, the count of each feature in benign software, numB[feature];
3.9 calculate the proportion of each feature in the ransomware samples, numR[feature]/NumTotalR;
3.10 calculate the proportion of each feature in the benign samples, numB[feature]/NumTotalB.
8. The feature-optimized Android ransomware detection method according to claim 5, characterized in that the output of step 2 is the detection feature set FeatureSet, and the detailed process is:
4.1 write the proportion of each feature in the ransomware samples to the feature matrix FeatureArray;
4.2 write the proportion of each feature in the benign samples to the feature matrix FeatureArray;
4.3 calculate the information gain metric value Ig[feature] of each feature using the information gain algorithm;
4.4 determine whether the information gain metric value in the feature matrix FeatureArray exceeds the detection feature threshold; if it does, go to 4.6; if it does not, go to 4.5;
4.5 discard the feature without further processing, and end;
4.6 write the feature to the initial sample feature library AnalysisFile;
4.7 determine whether traversal of the feature matrix FeatureArray is complete; if it is, go to step 4.8; if it is not, go to 4.4;
4.8 output the generated detection feature set FeatureSet;
4.9 this part of the operation ends.
9. The feature-optimized Android ransomware detection method according to claim 1, characterized in that step 3 is specifically:
the detection feature set FeatureSet obtained by information gain processing is processed further: a detection feature subset FeatureSubset is selected from FeatureSet and used as the input of the support vector machine algorithm, a machine learning algorithm, to obtain the detection classifier;
the input of the above process is the detection feature set FeatureSet and the output is the detection classifier model; the specific flow is as follows:
5.0 start;
5.1 initialize the detection feature subset FeatureSubset, with an empty initial value;
5.2 read the detection feature set FeatureSet;
5.3 determine whether the detection feature subset is below the detection feature set threshold; if the threshold has been reached, go to 5.6; if it has not, go to 5.4;
5.4 select a feature from the detection feature set FeatureSet and add it to the detection feature subset FeatureSubset;
5.5 delete that feature from the detection feature set FeatureSet;
5.6 generate the detection feature subset FeatureSubset;
5.7 pass the detection feature subset FeatureSubset as the parameter of the support vector machine interface Svm(), calling the interface Svm() in the Python machine learning library mlpy;
5.8 obtain the detection classifier model mysvm;
5.9 this part of the operation ends.
10. The feature-optimized Android ransomware detection method according to claim 1, characterized in that step 4 is specifically:
the feature set TestFeature of the software under test is extracted and fed to the detection classifier model, and according to the Boolean result output by the model the test sample is determined to be Android ransomware or Android benign software;
the input of the above process is the apk file of the software to be detected and the output is the type of the sample under test; the specific flow is as follows:
6.0 start;
6.1 read the apk file of the software under test;
6.2 according to the features listed in the detection feature set FeatureSet, extract the corresponding features TestFeature from the apk of the software under test;
6.3 write the extracted TestFeature to the feature set under test TestFeatureSet;
6.4 call the detection classifier model mysvm with TestFeatureSet as the parameter, using the mysvm.predict() interface to classify the apk of the software under test;
6.5 determine whether the Boolean result output by the detection classifier is 1; if it is, go to 6.6; if it is not, go to 6.7;
6.6 output a message that the software under test is ransomware;
6.7 output a message that the software under test is benign software;
6.8 this part of the operation ends.
Compared with the prior art, the present invention has the following advantages and effects. The Android ransomware feature selection operation provided by the invention uses information gain feature selection to filter out correlated and redundant features and to pick out the effective features in the feature set, which reduces the number of features and in turn the time needed to train the detection classifier and to classify samples, so detection is faster. The detection process of the invention trains the detection classifier on a selected feature set that is directly related to ransomware and highly discriminative, so classification and recognition accuracy for ransomware is higher.
Brief description of the drawings
Fig. 1 is a flowchart of the feature-optimized Android ransomware detection method of the present invention.
Fig. 2 is a flowchart of step 1 of the present invention.
Fig. 3 is a flowchart of step 2 of the present invention.
Fig. 4 is a flowchart of step 3 of the present invention.
Fig. 5 is a flowchart of step 4 of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and by way of embodiments. The following embodiments explain the invention, and the invention is not limited to them.
As shown in Fig. 1, the feature-optimized Android ransomware detection method of the present invention consists of four main operations: sample feature extraction, sample feature selection, classifier generation, and detection of the software under test.
Sample feature extraction: for each training sample in the training sample set, extract the permission features, intent features, api features and package name features of that sample, and combine the extracted features into the initial sample feature library.
Sample feature selection: process the extracted initial sample feature library using the information gain method, calculating the metric value of each feature in the library, sorting the features by metric value in descending order, and selecting the important features as classification features to form the detection feature set.
Classifier generation: pass the selected detection feature set as the input parameter of the support vector machine interface and call the support vector machine interface in Python to obtain the detection classifier.
Detection of the software under test: read the software under test, extract its features, and feed them to the detection classifier; according to the Boolean value output by the detection classifier model, determine whether the test sample is ransomware or benign software.
Fig. 2 is the flowchart of the feature extraction operation. This operation processes the apk files in the training sample set with androlyze.py from the open-source Android static analysis tool androguard. The classes.dex file in each training sample is decompiled by command and the api features and package name features are extracted from it; the manifest.xml file in each training sample is decompiled by command and the permission features and intent features are extracted from it. The corresponding permission features, package name features, api features and intent features are then written to the initial sample feature library AnalysisFile.
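As a rough illustration of the manifest side of this operation, the sketch below pulls permission names and intent-action names out of a manifest that has already been decoded to plain XML. It uses only Python's standard library rather than androguard; the function name `manifest_features` and the sample manifest are assumptions made for the example, and the api and package name features taken from the dex file are not covered.

```python
import xml.etree.ElementTree as ET

# Attribute names such as android:name are namespaced in a decoded manifest.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def manifest_features(manifest_xml):
    """Return the permission and intent-action names declared in a manifest."""
    root = ET.fromstring(manifest_xml)
    permissions = sorted({
        el.get(ANDROID_NS + "name")
        for el in root.iter("uses-permission")
        if el.get(ANDROID_NS + "name")
    })
    intents = sorted({
        el.get(ANDROID_NS + "name")
        for el in root.iter("action")
        if el.get(ANDROID_NS + "name")
    })
    return {"permission": permissions, "intent": intents}


# Hypothetical manifest of a screen-locking sample.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.locker">
  <uses-permission android:name="android.permission.SYSTEM_ALERT_WINDOW"/>
  <uses-permission android:name="android.permission.RECEIVE_BOOT_COMPLETED"/>
  <application>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""

features = manifest_features(SAMPLE_MANIFEST)
print(features["permission"])
print(features["intent"])
```

Sorting the names keeps the output stable from run to run, which matters once the names become column headers in a feature library.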
The input of this process is the sample apk files in the qualifying training set, and the output is the initial sample feature library AnalysisFile. The specific flow is as follows:
Step 20: start;
Step 21: initialize the initial sample feature library AnalysisFile, with an empty initial value;
Step 22: read the sample apk files in the ransomware training sample set Ranset and the benign training sample set;
Step 23: decompile the input apk files using the Android static analysis tool androguard;
Step 24: obtain the class.dex file and the manifest.xml file generated by the androguard decompilation;
Step 25: determine whether the file to be processed is a class.dex file; if it is, go to step 26; if it is not, go to step 27;
Step 26: read the class.dex bytecode file, and analyze and mark it using androgexf.py;
Step 27: read the manifest.xml file, and analyze and mark it using androapkinfo.py;
Step 28: extract the api features and package name features from the class.dex bytecode file;
Step 29: extract the permission features and intent features from the manifest.xml file;
Step 2A: write the extracted features to the initial sample feature library AnalysisFile;
Step 2B: determine whether the class.dex bytecode files and manifest.xml files have all been marked and processed; if they have, go to step 2D; if they have not all been marked and processed, go to step 2C;
Step 2C: read the class.dex and manifest.xml files that have not yet been marked;
Step 2D: determine whether the sample comes from the ransomware sample set Ranset; if it does, go to step 2E; if it does not, go to step 2F;
Step 2E: set the value in the first column of the initial sample feature library to 1;
Step 2F: set the value in the first column of the initial sample feature library to 0;
Step 2G: determine whether traversal of the ransomware sample set Ranset and the benign sample set Benset is complete; if it is not, go to step 21; if it is, go to step 2H;
Step 2H: this part of the operation ends.
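The labelling part of this flow can be sketched as follows. The source does not describe the layout of AnalysisFile beyond the class label in its first column, so the sketch assumes one row per sample followed by 0/1 columns, one per feature; `build_analysis_rows` and the feature strings are hypothetical.

```python
def build_analysis_rows(ransom_samples, benign_samples):
    """Build labelled rows from per-sample feature sets.

    Each sample is a set of feature strings. The first column of every row
    is the label: 1 for samples from Ranset, 0 for samples from Benset.
    """
    feature_names = sorted(set().union(*ransom_samples, *benign_samples))
    rows = []
    for label, samples in ((1, ransom_samples), (0, benign_samples)):
        for feats in samples:
            rows.append([label] + [1 if name in feats else 0
                                   for name in feature_names])
    return feature_names, rows


# Hypothetical extracted features for two ransomware and one benign sample.
ranset = [
    {"perm:SYSTEM_ALERT_WINDOW", "api:lockNow"},
    {"perm:SYSTEM_ALERT_WINDOW"},
]
benset = [{"perm:INTERNET"}]

feature_names, analysis_rows = build_analysis_rows(ranset, benset)
for row in analysis_rows:
    print(row)
```

Keeping the label in column 0 mirrors steps 2E and 2F, so later stages can split the rows by class without a separate lookup.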
Fig. 3 is the flowchart of the feature selection operation. This operation processes the extracted initial sample feature library AnalysisFile. First, the total count of each feature in ransomware and its total count in benign software are calculated, the value in the first column of AnalysisFile being used to determine whether a feature occurrence belongs to ransomware or to benign software. The proportion of ransomware samples and of benign samples containing each feature is then computed. Finally, the initial sample feature library AnalysisFile of the training samples is processed with the information gain algorithm to obtain the information gain metric value of each feature, and the features are sorted by that value to obtain the detection feature set FeatureSet.
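The counting stage described here can be sketched in a few lines, assuming each row of the feature library holds the class label in its first column followed by one 0/1 value per feature. The function name `feature_ratios` and the toy data are illustrative assumptions, and the sketch expects at least one sample of each class.

```python
def feature_ratios(feature_names, rows):
    """Return {feature: (share of ransomware samples, share of benign samples)}."""
    num_total_r = sum(1 for row in rows if row[0] == 1)  # NumTotalR
    num_total_b = sum(1 for row in rows if row[0] == 0)  # NumTotalB
    num_r = {name: 0 for name in feature_names}          # numR[feature]
    num_b = {name: 0 for name in feature_names}          # numB[feature]
    for row in rows:
        counts = num_r if row[0] == 1 else num_b
        for name, value in zip(feature_names, row[1:]):
            counts[name] += value
    return {
        name: (num_r[name] / num_total_r, num_b[name] / num_total_b)
        for name in feature_names
    }


names = ["api:lockNow", "perm:INTERNET"]
rows = [
    [1, 1, 0],  # ransomware sample
    [1, 1, 1],  # ransomware sample
    [0, 0, 1],  # benign sample
    [0, 0, 0],  # benign sample
]
ratios = feature_ratios(names, rows)
print(ratios)
```

A feature present in every ransomware sample and no benign sample, like the first one here, is the kind the later information gain ranking should place at the top.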
The formula of the information gain algorithm is:
where X_i denotes a feature in the training samples, i being the index of the feature; C_m denotes the class, of which there are two, m = 0 denoting benign software and m = 1 denoting ransomware; P(X_i) is the probability that feature X_i occurs; P(X̄_i) is the probability that feature X_i does not occur; and the conditional probability P(C_m|X_i) is the probability that a sample belongs to class C_m given that feature X_i occurs.
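The formula itself is not reproduced in this text. Assuming it is the standard information gain expression for a binary feature, which uses exactly the quantities defined above, it would read as follows. The term $P(C_m \mid \bar{X}_i)$, the probability of class $C_m$ when feature $X_i$ is absent, is the one quantity not defined in the source and is an assumption here.

```latex
IG(X_i) = -\sum_{m=0}^{1} P(C_m)\,\log P(C_m)
          + P(X_i)\sum_{m=0}^{1} P(C_m \mid X_i)\,\log P(C_m \mid X_i)
          + P(\bar{X}_i)\sum_{m=0}^{1} P(C_m \mid \bar{X}_i)\,\log P(C_m \mid \bar{X}_i)
```

Read this way, the first term is the entropy of the class label and the remaining two terms subtract the entropy that is left once the presence or absence of $X_i$ is known.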
The input of this part of the operation is the initial sample feature library AnalysisFile of the training sample set, and the output is the detection feature set FeatureSet. The specific flow is as follows:
Step 30: start;
Step 31: read the initial sample feature library AnalysisFile;
Step 32: call the counting function to calculate the number of ransomware samples, NumTotalR, in the initial sample feature library AnalysisFile;
Step 33: call the counting function to calculate the number of benign samples, NumTotalB, in the initial sample feature library AnalysisFile;
Step 34: initialize the feature matrix FeatureArray, with an empty initial value;
Step 35: determine whether the value in the first column of the initial sample feature library AnalysisFile is 1; if it is, go to step 36; if it is not, go to step 37;
Step 36: calculate, in the initial sample feature library AnalysisFile, the count of each feature in ransomware, numR[feature];
Step 37: calculate, in the initial sample feature library AnalysisFile, the count of each feature in benign software, numB[feature];
Step 38: calculate the proportion of each feature in the ransomware samples, numR[feature]/NumTotalR;
Step 39: calculate the proportion of each feature in the benign samples, numB[feature]/NumTotalB;
Step 3A: write the proportion of each feature in the ransomware samples to the feature matrix FeatureArray;
Step 3B: write the proportion of each feature in the benign samples to the feature matrix FeatureArray;
Step 3C: calculate the information gain metric value Ig[feature] of each feature using the information gain algorithm;
Step 3D: determine whether the information gain metric value in the feature matrix FeatureArray exceeds the detection feature threshold (for example, the detection feature threshold may be set between 0.05 and 0.25); if it does, go to step 3F; if it does not, go to step 3E;
Step 3E: discard the feature without further processing, and end;
Step 3F: write the feature to the initial sample feature library AnalysisFile;
Step 3G: determine whether traversal of the feature matrix FeatureArray is complete; if it is, go to step 3H; if it is not, go to step 3D;
Step 3H: output the generated detection feature set FeatureSet;
Step 3I: this part of the operation ends.
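Steps 3C to 3H can be sketched as below, assuming the metric is the standard entropy-based information gain of a binary feature with respect to the two classes, computed in bits. The function names, the toy counts and the threshold of 0.1 (chosen from the 0.05 to 0.25 range mentioned above) are assumptions for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(n_r_with, n_b_with, n_r_total, n_b_total):
    """Information gain of one binary feature.

    n_r_with / n_b_with: ransomware / benign samples containing the feature.
    n_r_total / n_b_total: total ransomware / benign samples.
    """
    n = n_r_total + n_b_total
    n_with = n_r_with + n_b_with
    groups = (
        (n_with, n_r_with, n_b_with),                             # feature present
        (n - n_with, n_r_total - n_r_with, n_b_total - n_b_with),  # feature absent
    )
    h_class = entropy([n_r_total / n, n_b_total / n])
    h_cond = 0.0
    for count, r, b in groups:
        if count:
            h_cond += (count / n) * entropy([r / count, b / count])
    return h_class - h_cond


# Hypothetical counts: feature -> (ransomware samples, benign samples) containing it.
counts = {"api:lockNow": (2, 0), "perm:INTERNET": (1, 1)}
ig = {name: information_gain(r, b, 2, 2) for name, (r, b) in counts.items()}

THRESHOLD = 0.1  # assumed detection feature threshold
feature_set = [
    name
    for name, value in sorted(ig.items(), key=lambda kv: kv[1], reverse=True)
    if value > THRESHOLD
]
print(ig, feature_set)
```

A feature that splits the two classes perfectly gets the maximum gain of 1 bit, while one that appears equally often in both classes gets 0 and is dropped by the threshold.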
Fig. 4 is the flowchart of the classifier generation operation. This operation further processes the detection feature set FeatureSet obtained by information gain processing: a detection feature subset FeatureSubset is selected from FeatureSet and used as the input of the support vector machine algorithm, a machine learning algorithm, to obtain the detection classifier.
The input of this process is the detection feature set FeatureSet, and the output is the detection classifier model. The specific flow is as follows:
Step 40: start;
Step 41: initialize the detection feature subset FeatureSubset, with an empty initial value;
Step 42: read the detection feature set FeatureSet;
Step 43: determine whether the detection feature subset is below the detection feature set threshold (for example, this threshold may be set between 0.55 and 0.85); if the threshold has been reached, go to step 46; if it has not, go to step 44;
Step 44: select a feature from the detection feature set FeatureSet and add it to the detection feature subset FeatureSubset;
Step 45: delete that feature from the detection feature set FeatureSet;
Step 46: generate the detection feature subset FeatureSubset;
Step 47: pass the detection feature subset FeatureSubset as the parameter of the support vector machine interface Svm(), calling the interface Svm() in the Python machine learning library mlpy;
Step 48: obtain the detection classifier model mysvm;
Step 49: this part of the operation ends.
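The flow above can be sketched in pure Python under two loudly stated assumptions. First, the source does not say what the detection feature set threshold measures; the sketch treats it as the fraction of FeatureSet to move into FeatureSubset. Second, instead of calling mlpy, the sketch trains a small linear classifier on the hinge loss (the loss a linear support vector machine minimizes, here without the regularization term) as a stand-in for Svm(). All function names and data are hypothetical.

```python
def select_subset(feature_set, ratio_threshold):
    """Steps 43 to 46: move top-ranked features until the assumed ratio is reached.

    feature_set is a list of (name, information gain) pairs.
    """
    remaining = sorted(feature_set, key=lambda item: item[1], reverse=True)
    total = len(remaining)
    subset = []
    while remaining and len(subset) / total < ratio_threshold:
        subset.append(remaining.pop(0))  # add to subset, delete from FeatureSet
    return [name for name, _ in subset]


def train_hinge_classifier(X, y, epochs=20):
    """Stand-in for Svm(): unregularized hinge-loss updates with step size 1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, label in zip(X, y):
            sign = 1.0 if label == 1 else -1.0
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if sign * score < 1.0:  # sample is inside the margin: update
                w = [wj + sign * xj for wj, xj in zip(w, xi)]
                b += sign
    return w, b


def predict(model, xi):
    """Return 1 (ransomware) or 0 (benign) for one feature vector."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0


ranked = [
    ("api:lockNow", 0.9),
    ("perm:INTERNET", 0.2),
    ("perm:SYSTEM_ALERT_WINDOW", 0.5),
    ("intent:BOOT_COMPLETED", 0.1),
]
feature_subset = select_subset(ranked, 0.55)

# Toy training matrix with two feature columns; labels: 1 ransomware, 0 benign.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
mysvm = train_hinge_classifier(X, y)
predictions = [predict(mysvm, row) for row in X]
print(feature_subset, mysvm, predictions)
```

In practice the training call would be replaced by a real support vector machine implementation; the stand-in only shows where the selected subset enters the training step and what the resulting model is expected to return.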
Fig. 5 is the flowchart of the detection operation for the software under test. This operation extracts the feature set TestFeature of the software under test and feeds it to the detection classifier model; according to the Boolean result output by the model, the test sample is determined to be Android ransomware or Android benign software.
The input of this process is the apk file of the software to be detected, and the output is the type of the sample under test. The specific flow is as follows:
Step 50: start;
Step 51: read the apk file of the software under test;
Step 52: according to the features listed in the detection feature set FeatureSet, extract the corresponding features TestFeature from the apk of the software under test;
Step 53: write the extracted TestFeature to the feature set under test TestFeatureSet;
Step 54: call the detection classifier model mysvm with TestFeatureSet as the parameter, using the mysvm.predict() interface to classify the apk of the software under test;
Step 55: determine whether the Boolean result output by the detection classifier is 1; if it is, go to step 56; if it is not, go to step 57;
Step 56: output a message that the software under test is ransomware;
Step 57: output a message that the software under test is benign software;
Step 58: this part of the operation ends.
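A minimal sketch of this detection flow follows. It assumes the trained model exposes a `predict` method that returns 1 or 0 for one feature vector; the `StandInModel` class with made-up weights takes the place of the trained mysvm, and the feature strings are hypothetical.

```python
class StandInModel:
    """Placeholder for the trained classifier; the weights are invented."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, vector):
        score = sum(w * x for w, x in zip(self.weights, vector)) + self.bias
        return 1 if score > 0 else 0


def to_test_vector(feature_set, apk_features):
    """Steps 52 and 53: keep only features listed in FeatureSet, in its order."""
    return [1 if name in apk_features else 0 for name in feature_set]


def classify(model, feature_set, apk_features):
    """Steps 54 to 57: predict, then map the Boolean result to a verdict."""
    test_feature_set = to_test_vector(feature_set, apk_features)
    return "ransomware" if model.predict(test_feature_set) == 1 else "benign"


feature_set = ["perm:SYSTEM_ALERT_WINDOW", "perm:INTERNET"]
mysvm = StandInModel(weights=[3.0, -1.0], bias=-1.0)

verdict_a = classify(mysvm, feature_set, {"perm:SYSTEM_ALERT_WINDOW", "perm:CAMERA"})
verdict_b = classify(mysvm, feature_set, {"perm:INTERNET"})
print(verdict_a, verdict_b)
```

Features of the apk that are not in FeatureSet, such as the camera permission above, are simply ignored, so the test vector always has the same columns the classifier was trained on.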
The present invention provides a feature-optimized Android ransomware detection method. Features of the training samples are extracted according to an initial feature set to form an initial sample feature library; the metric value of each feature in the initial sample feature library is calculated, and the features whose metric value exceeds the detection feature threshold are chosen to form the detection feature set. A ransomware classifier is trained on the detection feature set to obtain the detection classifier. The Android ransomware feature selection operation provided by the invention uses information gain feature selection to filter out correlated and redundant features and to pick out the effective features in the feature set, which reduces the number of features and in turn the time needed to train the detection classifier and to classify samples, so detection is faster. The detection process of the invention trains the detection classifier on a selected feature set that is directly related to ransomware and highly discriminative, so classification and recognition accuracy for ransomware is higher.
The content described in this specification is merely an illustration of the present invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the specific embodiments described, or substitute them in a similar manner; provided these do not depart from the content of the description or go beyond the scope defined by the claims, they all fall within the protection scope of the present invention.
Claims (10)
1. A feature-optimized Android ransomware detection method, characterized by comprising the following steps:
Step 1: Sample feature extraction. For each training sample in the training sample set, extract the permission features, intent features, api features and package name features of that sample, and combine the extracted features into the initial sample feature library;
Step 2: Sample feature selection. Process the extracted initial sample feature library using the information gain method: calculate the metric value of each feature in the initial sample feature library, sort the features by metric value in descending order, and select the important features as classification features to form the detection feature set;
Step 3: Classifier generation. Pass the selected detection feature set as the input parameter of the support vector machine interface and call the support vector machine interface in Python to obtain the detection classifier;
Step 4: Detection of the software under test. Read the software under test, extract its features, and feed them to the detection classifier; according to the Boolean value output by the detection classifier model, determine whether the test sample is ransomware or benign software.
2. The feature-optimized Android ransomware detection method according to claim 1, characterized in that step 1 is specifically: the apk files in the training sample set are processed with androlyze.py of the open-source Android static analysis tool androguard; the classes.dex file in each training sample is decompiled by command, and the api features and package name features are extracted from the training sample set; the manifest.xml file in each training sample is decompiled by command, and the permission features and intent features are extracted from the training sample; the corresponding api features, package name features, permission features and intent features are then written into the initial sample feature library AnalysisFile.
3. The feature-optimized Android ransomware detection method according to claim 2, characterized in that in step 1 the input is the sample apk files in the qualifying training sets, and the detailed process is:
1.1 start;
1.2 initialize the initial sample feature library AnalysisFile, with an empty initial value;
1.3 read the sample apk files in the ransomware training sample set Ranset and the benign training sample set;
1.4 decompile the input apk files with the Android static analysis tool androguard;
1.5 obtain the class.dex file and the manifest.xml file generated by the androguard decompilation;
1.6 judge whether the file to be processed is a class.dex file; if it is, go to 1.7; if it is not, go to 1.8;
1.7 read the class.dex bytecode file, and analyze and mark it with androgexf.py;
1.8 read the manifest.xml file, and analyze and mark it with androapkinfo.py;
1.9 extract the api features and package name features from the class.dex bytecode file;
1.10 extract the permission features and intent features from the manifest.xml file.
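Step 1.10 can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: it assumes the manifest has already been decoded to plain XML text (the claim delegates decompilation to androguard), and the function name `extract_manifest_features` and the `permission:` / `intent:` prefixes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:name are expanded to this namespace by ElementTree.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def extract_manifest_features(manifest_xml):
    """Return the permission and intent-action features found in decoded manifest XML."""
    root = ET.fromstring(manifest_xml)
    features = set()
    # <uses-permission android:name="..."> entries give the permission features.
    for node in root.iter("uses-permission"):
        name = node.get(ANDROID_NS + "name")
        if name:
            features.add("permission:" + name)
    # <action android:name="..."> entries inside <intent-filter> give the intent features.
    for intent_filter in root.iter("intent-filter"):
        for action in intent_filter.iter("action"):
            name = action.get(ANDROID_NS + "name")
            if name:
                features.add("intent:" + name)
    return features


# Illustrative manifest; the package and component names are made up.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.demo">
  <uses-permission android:name="android.permission.SYSTEM_ALERT_WINDOW"/>
  <uses-permission android:name="android.permission.RECEIVE_BOOT_COMPLETED"/>
  <application>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""

for feature in sorted(extract_manifest_features(SAMPLE_MANIFEST)):
    print(feature)
```

The api and package name features of step 1.9 come from the bytecode file and are not covered by this sketch.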
4. The feature-optimized Android ransomware detection method according to claim 3, characterized in that in step 1 the output is the initial sample feature library AnalysisFile, and the detailed process is:
2.1 write the extracted features into the initial sample feature library AnalysisFile;
2.2 judge whether all class.dex bytecode files and manifest.xml files have been marked and processed; if they all have, go to 2.4; if they have not all been marked and processed, go to 2.3;
2.3 read the class.dex files and manifest.xml files that have not yet been marked;
2.4 judge whether the sample comes from the ransomware sample set Ranset; if it does, go to 2.5; if it does not, go to 2.6;
2.5 set the value in the first column of the initial sample feature library to 1;
2.6 set the value in the first column of the initial sample feature library to 0;
2.7 judge whether traversal of the ransomware sample set Ranset and the benign sample set Benset is complete; if it is not, go to step 1.1; if it is, go to 2.8;
2.8 this part of the operation ends.
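The labeling in steps 2.4 to 2.6 can be pictured as building one row per sample whose first column is the class label. The sketch below assumes each sample has already been reduced to a set of feature names; the in-memory row layout and the function name `build_analysis_rows` are assumptions, since the claim does not specify the file format of AnalysisFile.

```python
def build_analysis_rows(ransom_samples, benign_samples):
    """Return (feature_names, rows); each row is [label, f1, f2, ...] with 0/1 entries."""
    labeled = [(1, s) for s in ransom_samples] + [(0, s) for s in benign_samples]
    # Column order is the sorted union of every feature seen in any sample.
    feature_names = sorted(set().union(*(s for _, s in labeled)))
    rows = []
    for label, feats in labeled:
        # First column: 1 for samples from Ranset, 0 for samples from Benset.
        rows.append([label] + [1 if name in feats else 0 for name in feature_names])
    return feature_names, rows


# Illustrative feature sets; the feature names are made up.
ranset = [{"permission:SYSTEM_ALERT_WINDOW", "api:lockNow"}, {"api:lockNow"}]
benset = [{"permission:INTERNET"}]

names, rows = build_analysis_rows(ranset, benset)
print(names)
for row in rows:
    print(row)
```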
5. The feature-optimized Android ransomware detection method according to claim 1, characterized in that step 2 is specifically:
first calculate the total count of each feature in ransomware and its total count in benign software, then use the value in the first column of the initial sample feature library AnalysisFile to judge whether the feature is a ransomware feature or a benign software feature, and process the initial sample feature library AnalysisFile accordingly; calculate the proportion of each feature in ransomware and in benign software; process the initial sample feature library AnalysisFile of the training samples with the information gain algorithm to obtain the information gain measured value of each feature, sort the features by the size of the information gain measured value, and obtain the detection feature set FeatureSet.
6. The feature-optimized Android ransomware detection method according to claim 5, characterized in that the formula of the information gain algorithm is:
[formula not reproduced]
where $X_i$ denotes a feature in the training samples and $i$ indexes the i-th feature; $C_m$ denotes one of two classes, with $m = 0$ denoting benign software and $m = 1$ denoting ransomware; $P(X_i)$ is the probability that feature $X_i$ occurs, $P(\bar{X}_i)$ is the probability that feature $X_i$ does not occur, and the conditional probability $P(C_m \mid X_i)$ is the probability that a sample belongs to class $C_m$ given that feature $X_i$ occurs.
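The formula itself does not appear in this text. A commonly used form of information gain for a binary feature, consistent with the symbols defined in claim 6, is given below. Treating it as the claimed formula is an assumption; the terms $P(C_m)$ and $P(C_m \mid \bar{X}_i)$ are not defined in the claim and are introduced here with their usual meanings (the prior probability of class $C_m$, and the probability of class $C_m$ given that $X_i$ does not occur).

```latex
IG(X_i) = -\sum_{m=0}^{1} P(C_m)\,\log P(C_m)
          + P(X_i)\sum_{m=0}^{1} P(C_m \mid X_i)\,\log P(C_m \mid X_i)
          + P(\bar{X}_i)\sum_{m=0}^{1} P(C_m \mid \bar{X}_i)\,\log P(C_m \mid \bar{X}_i)
```

In this form the first term is the class entropy and the remaining two terms are the negated conditional entropy, so a feature whose presence or absence fully determines the class attains the maximum gain.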
7. The feature-optimized Android ransomware detection method according to claim 5, characterized in that the input of step 2 is the initial sample feature library AnalysisFile of the training sample set, and the detailed process is:
3.1 start;
3.2 read the initial sample feature library AnalysisFile;
3.3 call the counting function to calculate the number of ransomware samples NumtotalR in the initial sample feature library AnalysisFile;
3.4 call the counting function to calculate the number of benign samples NumtotalB in the initial sample feature library AnalysisFile;
3.5 initialize the feature matrix FeatureArray, with an empty initial value;
3.6 judge whether the first-column value in the initial sample feature library AnalysisFile is 1; if it is, go to 3.7; if it is not, go to 3.8;
3.7 calculate, in the initial sample feature library AnalysisFile, the count numR[feature] of each feature in ransomware;
3.8 calculate, in the initial sample feature library AnalysisFile, the count numB[feature] of each feature in benign software;
3.9 calculate the ratio of each feature in the ransomware samples, numR[feature]/NumTotalR;
3.10 calculate the ratio of each feature in the benign samples, numB[feature]/NumTotalB.
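Steps 3.3 to 3.10 amount to per-class counting followed by a division. The sketch below assumes the rows use the layout described in claim 4 (label in the first column, then one 0/1 entry per feature); the function name `feature_ratios` and the zero-sample guard are additions for illustration.

```python
def feature_ratios(feature_names, rows):
    """Return (ratio_r, ratio_b): per-feature occurrence ratios in each class."""
    num_total_r = sum(1 for row in rows if row[0] == 1)  # ransomware sample count
    num_total_b = sum(1 for row in rows if row[0] == 0)  # benign sample count
    num_r = {name: 0 for name in feature_names}
    num_b = {name: 0 for name in feature_names}
    for row in rows:
        # The first column decides which counter the row contributes to (step 3.6).
        counts = num_r if row[0] == 1 else num_b
        for name, present in zip(feature_names, row[1:]):
            counts[name] += present
    # Guard against an empty class so the sketch never divides by zero.
    ratio_r = {n: num_r[n] / num_total_r if num_total_r else 0.0 for n in feature_names}
    ratio_b = {n: num_b[n] / num_total_b if num_total_b else 0.0 for n in feature_names}
    return ratio_r, ratio_b


# Illustrative data: two ransomware rows followed by two benign rows.
names = ["api:lockNow", "permission:INTERNET"]
rows = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 1]]
ratio_r, ratio_b = feature_ratios(names, rows)
print(ratio_r)
print(ratio_b)
```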
8. The feature-optimized Android ransomware detection method according to claim 5, characterized in that the output of step 2 is the detection feature set FeatureSet, and the detailed process is:
4.1 write the ratio of each feature in the ransomware samples into the feature matrix FeatureArray;
4.2 write the ratio of each feature in the benign samples into the feature matrix FeatureArray;
4.3 use the information gain algorithm to calculate the information gain measured value Ig[feature] of each feature;
4.4 judge whether the information gain measured value in the detection feature matrix FeatureArray exceeds the detection feature threshold; if it does, go to 4.6; if it does not, go to 4.5;
4.5 discard the feature without further processing, and end;
4.6 write the feature into the initial sample feature library AnalysisFile;
4.7 judge whether traversal of the detection feature matrix FeatureArray is complete; if it is, go to 4.8; if it is not, go to 4.4;
4.8 output and generate the detection feature set FeatureSet;
4.9 this part of the operation ends.
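The scoring and thresholding in steps 4.3 to 4.8 can be sketched as below. The sketch computes information gain as class entropy minus conditional entropy over binary feature columns, which is the standard definition and an assumption about the claimed formula; the function names and the threshold value are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def information_gain(column, labels):
    """Information gain of one 0/1 feature column with respect to 0/1 labels."""
    n = len(labels)
    base = entropy([labels.count(c) / n for c in (0, 1)])
    conditional = 0.0
    for value in (0, 1):  # feature absent, feature present
        subset = [lab for x, lab in zip(column, labels) if x == value]
        if subset:
            weight = len(subset) / n
            conditional += weight * entropy([subset.count(c) / len(subset) for c in (0, 1)])
    return base - conditional


def select_features(feature_names, rows, threshold):
    """Keep features whose gain exceeds the threshold, sorted from high to low."""
    labels = [row[0] for row in rows]
    scored = []
    for j, name in enumerate(feature_names, start=1):
        ig = information_gain([row[j] for row in rows], labels)
        if ig > threshold:
            scored.append((name, ig))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative data: the first feature separates the classes, the second barely does.
names = ["api:lockNow", "permission:INTERNET"]
rows = [[1, 1, 1], [1, 1, 1], [0, 0, 1], [0, 0, 0]]
print(select_features(names, rows, threshold=0.5))
```

With this toy data only the feature that cleanly separates the two classes survives the threshold, which is the filtering effect the claim relies on.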
9. The feature-optimized Android ransomware detection method according to claim 1, characterized in that step 3 is specifically:
the detection feature set FeatureSet obtained after information gain processing is processed again: a detection feature subset FeatureSubset is selected from the detection feature set FeatureSet and used as the input of the support vector machine algorithm among machine learning algorithms, to obtain the detection classifier;
the input of the above process is the detection feature set FeatureSet and the output is the detection classifier model; the specific flow is as follows:
5.0 start;
5.1 initialize the detection feature subset FeatureSubset, with an empty initial value;
5.2 read the detection feature set FeatureSet;
5.3 judge whether the size of the detection feature subset is below the detection feature set threshold; if the threshold has been reached, go to 5.6; if it has not, go to 5.4;
5.4 select a feature from the detection feature set FeatureSet and add it to the detection feature subset FeatureSubset;
5.5 delete that feature from the detection feature set FeatureSet;
5.6 generate the detection feature subset FeatureSubset;
5.7 pass the detection feature subset FeatureSubset as the parameter of the support vector machine interface Svm(), and call the support vector machine interface Svm() in the Python machine learning library mlpy;
5.8 obtain the detection classifier model mysvm;
5.9 this part of the operation ends.
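The flow of claim 9 can be sketched without external libraries. The claim names mlpy's Svm() interface; the sketch below does not call mlpy. Instead it uses `TinyLinearSVM`, a hypothetical stand-in that trains a linear SVM by subgradient descent on the regularized hinge loss, so that the example stays self-contained. The subset size, learning rate, and regularization values are arbitrary assumptions.

```python
def select_subset(feature_set, max_size):
    """Steps 5.1-5.6: move ranked features into the subset until max_size is reached."""
    remaining = list(feature_set)
    subset = []
    while remaining and len(subset) < max_size:
        subset.append(remaining.pop(0))  # take the best-ranked remaining feature
    return subset


class TinyLinearSVM:
    """Hypothetical stand-in for the SVM interface: hinge loss with L2 regularization."""

    def __init__(self, lr=0.1, lam=0.01, epochs=100):
        self.lr, self.lam, self.epochs = lr, lam, epochs
        self.w, self.b = [], 0.0

    def _score(self, x):
        return sum(w * xi for w, xi in zip(self.w, x)) + self.b

    def fit(self, X, y):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                t = 1 if label == 1 else -1  # map labels {0, 1} to {-1, +1}
                margin = t * self._score(x)
                # The regularization term shrinks the weights on every step.
                self.w = [w * (1 - self.lr * self.lam) for w in self.w]
                if margin < 1:  # hinge loss is active: move toward the sample
                    self.w = [w + self.lr * t * xi for w, xi in zip(self.w, x)]
                    self.b += self.lr * t
        return self

    def predict(self, X):
        return [1 if self._score(x) > 0 else 0 for x in X]


# Illustrative training data restricted to a two-feature subset.
feature_subset = select_subset(["api:lockNow", "permission:INTERNET", "intent:BOOT"], 2)
X = [[1, 1], [1, 0], [0, 1], [0, 0]]  # columns follow feature_subset order
y = [1, 1, 0, 0]                      # 1 = ransomware, 0 = benign
mysvm = TinyLinearSVM().fit(X, y)
print(feature_subset, mysvm.predict(X))
```

A production system would more plausibly rely on a maintained SVM library; the stand-in only shows where the feature subset enters training and what kind of 0/1 output the later detection step consumes.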
10. The feature-optimized Android ransomware detection method according to claim 1, characterized in that step 4 is specifically:
the feature set TestFeature of the software under test is extracted and input into the detection classifier model, and the Boolean output of the detection classifier model determines whether the test sample is Android ransomware or Android benign software;
the input of the above process is the apk file of the software to be detected and the output is the type of the sample under test; the specific flow is as follows:
6.0 start;
6.1 read the apk file of the software under test;
6.2 extract the corresponding features TestFeature of the apk under test according to the features listed in the detection feature set FeatureSet;
6.3 write the extracted TestFeature into the feature set under test TestFeatureSet;
6.4 call the detection classifier model mysvm with the feature set under test TestFeatureSet as the parameter, using the mysvm.predict() interface, to classify the apk under test;
6.5 judge whether the Boolean result output by the detection classifier is 1; if it is, go to 6.6; if it is not, go to 6.7;
6.6 output the information that the software under test is ransomware;
6.7 output the information that the software under test is benign software;
6.8 this part of the operation ends.
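Steps 6.2 to 6.7 reduce to building a 0/1 vector in FeatureSet order and mapping the classifier output to a verdict. In the sketch below, `ThresholdModel` is a hypothetical placeholder for the trained model mysvm (it is not an SVM), used only so the example runs on its own; the only assumption about the real model is that it offers a predict() method returning 1 or 0 per sample.

```python
def vectorize(sample_features, feature_set):
    """Steps 6.2-6.3: build the 0/1 vector TestFeatureSet in FeatureSet order."""
    return [1 if name in sample_features else 0 for name in feature_set]


def classify(model, sample_features, feature_set):
    """Steps 6.4-6.7: predict, then map the 1/0 output to a verdict string."""
    result = model.predict([vectorize(sample_features, feature_set)])[0]
    return "ransomware" if result == 1 else "benign"


class ThresholdModel:
    """Hypothetical placeholder for mysvm: flags samples with two or more features set."""

    def predict(self, X):
        return [1 if sum(x) >= 2 else 0 for x in X]


# Illustrative feature set and samples; the feature names are made up.
feature_set = ["api:lockNow", "permission:SYSTEM_ALERT_WINDOW", "intent:BOOT_COMPLETED"]
suspicious = {"api:lockNow", "permission:SYSTEM_ALERT_WINDOW", "permission:INTERNET"}
harmless = {"permission:INTERNET"}

print(classify(ThresholdModel(), suspicious, feature_set))
print(classify(ThresholdModel(), harmless, feature_set))
```

Features of the sample that are not in FeatureSet are ignored, which mirrors step 6.2: only the features retained by the selection stage are extracted from the software under test.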
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810585511.2A CN108710802A (en) | 2018-06-08 | 2018-06-08 | A kind of preferred Android of feature extorts software detecting method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810585511.2A CN108710802A (en) | 2018-06-08 | 2018-06-08 | A kind of preferred Android of feature extorts software detecting method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108710802A true CN108710802A (en) | 2018-10-26 |
Family
ID=63872466
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810585511.2A Pending CN108710802A (en) | 2018-06-08 | 2018-06-08 | A kind of preferred Android of feature extorts software detecting method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108710802A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI682303B (en) * | 2018-12-11 | 2020-01-11 | 中華電信股份有限公司 | Computer system and ransomware detection method thereof |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105740424A (en) * | 2016-01-29 | 2016-07-06 | 湖南大学 | Spark platform based high efficiency text classification method |
CN107180192A (en) * | 2017-05-09 | 2017-09-19 | 北京理工大学 | Android malicious application detection method and system based on multi-feature fusion |
CN107491531A (en) * | 2017-08-18 | 2017-12-19 | 华南师范大学 | Chinese network comment sensibility classification method based on integrated study framework |
CN107577942A (en) * | 2017-08-22 | 2018-01-12 | 中国民航大学 | A kind of composite character screening technique for Android malware detection |
-
2018
- 2018-06-08 CN CN201810585511.2A patent/CN108710802A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107609399A (en) | Malicious code mutation detection method based on NIN neutral nets | |
Singh et al. | Potato plant leaves disease detection and classification using machine learning methodologies | |
CN106096411B (en) | A kind of Android malicious code family classification methods based on bytecode image clustering | |
CN104216349B (en) | Utilize the yield analysis system and method for the sensing data of manufacturing equipment | |
CN109359439A (en) | Software detecting method, device, equipment and storage medium | |
CN107315954A (en) | A kind of file type identification method and server | |
CN103106365B (en) | The detection method of the malicious application software on a kind of mobile terminal | |
CN106709349B (en) | A kind of malicious code classification method based on various dimensions behavioural characteristic | |
CN109753800A (en) | Merge the Android malicious application detection method and system of frequent item set and random forests algorithm | |
CN108229499A (en) | Certificate recognition methods and device, electronic equipment and storage medium | |
CN109101469A (en) | The information that can search for is extracted from digitized document | |
CN109598124A (en) | A kind of webshell detection method and device | |
CN104331436A (en) | Rapid classification method of malicious codes based on family genetic codes | |
CN106530200A (en) | Deep-learning-model-based steganography image detection method and system | |
CN110414277B (en) | Gate-level hardware Trojan horse detection method based on multi-feature parameters | |
CN108022146A (en) | Characteristic item processing method, device, the computer equipment of collage-credit data | |
CN106845220A (en) | A kind of Android malware detecting system and method | |
CN110263566B (en) | Method for detecting and classifying authority-raising behaviors of massive logs | |
CN111915437A (en) | RNN-based anti-money laundering model training method, device, equipment and medium | |
CN109933851B (en) | Bench endurance test data processing and analyzing method | |
Cai et al. | Machine learning algorithms improve the power of phytolith analysis: A case study of the tribe Oryzeae (Poaceae) | |
CN107958154A (en) | A kind of malware detection device and method | |
CN104156690A (en) | Gesture recognition method based on image space pyramid bag of features | |
CN104933365B (en) | A kind of malicious code based on calling custom automates homologous decision method and system | |
Al‐Tahhan et al. | Accurate automatic detection of acute lymphatic leukemia using a refined simple classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20181026 |