CN106203117A

CN106203117A - A machine-learning-based method for determining malicious mobile applications

Info

Publication number: CN106203117A
Application number: CN201610547624.4A
Authority: CN
Inventors: 何清林; 马秀娟; 张家琦; 王子厚; 王大伟; 朱佳伟; 刘培朋; 李海灵
Original assignee: National Computer Network and Information Security Management Center
Current assignee: National Computer Network and Information Security Management Center
Priority date: 2016-07-12
Filing date: 2016-07-12
Publication date: 2016-12-07

Abstract

The invention discloses a machine-learning-based method for determining malicious mobile applications. The method automatically learns and judges whether the network communication behavior of an application is malicious, and from this determines whether the application itself is malicious. The method relates to fields such as mobile application detection. It can be used to develop an application with this detection function that is installed and used on a smartphone on its own, and it can also support third-party testing organizations in developing dedicated toolkits for detecting malicious applications.

Description

A machine-learning-based method for determining malicious mobile applications

Technical field

The invention belongs to the technical field of mobile Internet security, and specifically relates to a machine-learning-based method for determining malicious mobile applications.

Background technology

As smartphones become more and more widespread, mobile applications of all kinds emerge one after another, and malicious applications of all kinds have appeared along with them. Many malicious programs stay resident in the background, steal private user data such as contacts and SMS messages and upload it to a remote server, or infect the device and turn it into a controlled botnet or Trojan node that launches DDoS attacks at set times without the user's knowledge. This causes great harm to personal privacy, network security, and more.

How to identify and detect which applications are malicious has become a difficult problem. Many current detection methods make their judgment from simple behavioral features: a corresponding baseline policy is defined, and if a behavioral feature of an application exceeds the baseline, the application is judged malicious. Such methods generally analyze a single application in isolation; they lack global correlation analysis and make poor use of global knowledge bases and similar resources.

Summary of the invention

In view of this, the object of the invention is to provide a machine-learning-based method for determining malicious mobile applications, which can determine whether a mobile application on a smartphone is a malicious program.

A machine-learning-based method for determining malicious mobile applications comprises the following steps:

S1: first collect a certain number of normal mobile applications and malicious mobile applications;

S2: connect a smartphone to the network; install and start, in turn, each application obtained in S1 on the smartphone, and trigger each application through manual operation; monitor the network continuously to capture all network communication content of the application, and extract all data of the request content that the application sends to remote servers;

S3: for each application, join all captured outgoing request messages head to tail, in sequence, into one long character string, and record the class of the application that the long string corresponds to, namely normal mobile application or malicious mobile application; for each long string, let its length be N; starting in turn from the 1st, 2nd, ..., N-th character, extract forward a character unit of length M, find the character units that repeat, and record the number of repetitions; M is much smaller than the long string length N;

S4: take all the distinct character units obtained in S3 for each application as the feature space and the repetition counts of the character units as the feature values to form one sample; label the class of the sample according to the record from S3; the samples of all applications form the training sample set, on which a machine learning method is used to train a binary classifier;

S5: for a mobile application that needs to be judged, first use the method of S2 to obtain all data of the request content that the application sends, then use the method of S3 to obtain the character units and their repetition counts, and finally use the classifier obtained in S4 to determine whether the application is a malicious mobile application.

Preferably, the applications in S1 are obtained through public channels such as the blacklists and whitelists of applications shared by industry organizations such as the Anti Network-Virus Alliance of China.

Preferably, the machine learning uses the support vector machine (SVM) learning method.

Preferably, the kernel function used in SVM learning is the Gaussian kernel function.

Preferably, M is 4 or 5.
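
A minimal sketch of the character-unit counting in S3, assuming the captured requests are already available as text strings. The function name `count_character_units` and the sample requests are hypothetical. The sketch counts only full-length windows, since the text does not say how windows that run past the end of the long string are handled.

```python
from collections import Counter


def count_character_units(requests, m=4):
    """Join requests head to tail and count every length-m character unit."""
    long_string = "".join(requests)  # S3: concatenate in capture order
    n = len(long_string)
    # Slide a window of length m over the long string, one character at a time.
    # Only full-length windows (start positions 0 .. n - m) are counted here.
    return Counter(long_string[i:i + m] for i in range(n - m + 1))


# Hypothetical captured requests for one application
requests = ["GET /a?id=1", "GET /a?id=2"]
counts = count_character_units(requests, m=4)
repeated = {unit: c for unit, c in counts.items() if c > 1}
```

Here `counts` maps each character unit to its number of occurrences, and `repeated` keeps only the units that occur more than once.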

The invention has the following beneficial effects:

The invention discloses a method for determining whether a mobile application on a smartphone is a malicious program. The method automatically learns and judges whether the network communication behavior of the application is malicious, and from this determines whether the application itself is malicious. The method relates to fields such as mobile application detection. It can be used to develop an application with this detection function that is installed and used on a smartphone on its own, and it can also support third-party testing organizations in developing dedicated toolkits for detecting malicious applications.

Detailed description of the invention

Most malicious programs exhibit accompanying network behavior: they actively send request messages to a remote server, generally over HTTP or some other proprietary protocol, and this protocol data usually contains private user data or information related to Trojan control. The invention provides a method based on machine learning theory that makes good use of the information these malicious programs send. A support vector machine learning algorithm learns a corresponding model, which then automatically judges the network communication behavior of an unknown mobile program, so as to determine whether the application is a malicious program.

To solve the above problem, the invention provides a method that uses machine learning to judge whether the network behavior of an application is malicious, and from this determines whether the application is malicious. The method comprises the following steps:

S1: first collect a certain number of normal mobile programs and malicious mobile programs; these two classes of applications can be obtained through public channels such as the blacklists and whitelists of applications shared by industry organizations such as the Anti Network-Virus Alliance of China (anva.org.cn);

S2: connect a smartphone to the network; install and start these applications in turn on the smartphone, and trigger each application through manual operation; perform continuous network monitoring on the local network or on the network port of the smartphone, obtain all network communication content of the application through packet capture, and extract all data of the request content that the application sends to remote servers;

S3: for each application, join all captured outgoing request messages, such as HTTP request content or other API content, head to tail in sequence, and then process them as follows:

S3.1: if the sample is a malicious program, set its class label to 1; otherwise, set it to 0;

S3.2: for the request data sent by each application sample, let the length of its long string be N; starting in turn from the 1st, 2nd, ..., N-th character, extract forward a character unit of length M, find the character units that repeat, and record the number of repetitions; M is much smaller than the long string length N and is generally chosen empirically as 4 or 5;

S3.3: take the M-character units from all samples as the feature space; the value of each feature of a sample is the number of times the corresponding M-character unit occurs in that sample;
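
Steps S3.1 to S3.3 can be sketched as follows, assuming each sample is given as a pair of its long string and a flag saying whether it is malicious. The helper names `m_grams` and `build_training_set`, the sorted ordering of the feature space, and the toy strings are all hypothetical choices for illustration, and M is set to 3 only to keep the example small.

```python
from collections import Counter


def m_grams(text, m=4):
    """Count every full-length character unit of length m in text (S3.2)."""
    return Counter(text[i:i + m] for i in range(len(text) - m + 1))


def build_training_set(samples, m=4):
    """samples: list of (long_string, is_malicious) pairs."""
    # S3.1: label malicious samples 1 and normal samples 0.
    counted = [(m_grams(text, m), 1 if is_malicious else 0)
               for text, is_malicious in samples]
    # S3.3: the feature space is the union of units seen in all samples.
    feature_space = sorted(set().union(*(c.keys() for c, _ in counted)))
    # Each feature value is the unit's occurrence count in that sample
    # (a Counter returns 0 for units the sample does not contain).
    vectors = [[c[unit] for unit in feature_space] for c, _ in counted]
    labels = [label for _, label in counted]
    return feature_space, vectors, labels


# Toy long strings standing in for concatenated request data
samples = [("abcabc", True), ("abcd", False)]
feature_space, vectors, labels = build_training_set(samples, m=3)
```

Every sample is expressed over the same feature space, so the resulting vectors have equal length and can be passed to a classifier together with `labels`.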

S4: with all the distinct M-character units obtained in S3.3 as the feature space and the number of times each M-character unit occurs in a sample as the feature value, label each sample as normal or malicious according to the record from S3 to form one sample. The samples of all applications collected in S1 form the training sample set, on which the support vector machine (SVM) learning method is used to train a binary classifier;

The kernel function used in SVM learning is the Gaussian kernel function.
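
The Gaussian kernel is commonly written as K(x, y) = exp(-gamma * ||x - y||^2). The sketch below evaluates it on feature vectors of character-unit counts. The value of `gamma`, the function names, and the two vectors are assumptions for illustration, since the source gives no kernel parameters. In practice a library SVM, for example scikit-learn's `SVC(kernel="rbf")`, would evaluate this kernel internally during training and prediction.

```python
import math


def gaussian_kernel(x, y, gamma=0.5):
    """Return exp(-gamma * squared Euclidean distance) for two vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must share the same feature space")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gram_matrix(vectors, gamma=0.5):
    """Pairwise kernel values between all training samples."""
    return [[gaussian_kernel(u, v, gamma) for v in vectors] for u in vectors]


# Hypothetical count vectors for two samples over a four-unit feature space
vectors = [[2, 1, 0, 1], [1, 0, 1, 0]]
kernel_values = gram_matrix(vectors)
```

Identical samples give a kernel value of 1, and the value decays toward 0 as the character-unit counts of two samples diverge, which is what lets the SVM separate samples with dissimilar request content.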

In summary, the above are only preferred embodiments of the invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

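1. A machine-learning-based method for determining malicious mobile applications, characterized in that it comprises the following steps:

S1: first collect a certain number of normal mobile applications and malicious mobile applications;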

S2: connect a smartphone to the network; install and start, in turn, each application obtained in S1 on the smartphone, and trigger each application through manual operation; monitor the network continuously to capture all network communication content of the application, and extract all data of the request content that the application sends to remote servers;

S3: for each application, join all captured outgoing request messages head to tail, in sequence, into one long character string, and record the class of the application that the long string corresponds to, namely normal mobile application or malicious mobile application; for each long string, let its length be N; starting in turn from the 1st, 2nd, ..., N-th character, extract forward a character unit of length M, find the character units that repeat, and record the number of repetitions; M is much smaller than the long string length N;

S4: take all the distinct character units obtained in S3 for each application as the feature space and the repetition counts of the character units as the feature values to form one sample; label the class of the sample according to the record from S3; the samples of all applications form the training sample set, on which a machine learning method is used to train a binary classifier;

S5: for a mobile application that needs to be judged, first use the method of S2 to obtain all data of the request content that the application sends, then use the method of S3 to obtain the character units and their repetition counts, and finally use the classifier obtained in S4 to determine whether the application is a malicious mobile application.

2. The machine-learning-based method for determining malicious mobile applications according to claim 1, characterized in that the applications in S1 are obtained through public channels such as the blacklists and whitelists of applications shared by industry organizations such as the Anti Network-Virus Alliance of China.

3. The machine-learning-based method for determining malicious mobile applications according to claim 1, characterized in that the machine learning uses the support vector machine (SVM) learning method.

4. The machine-learning-based method for determining malicious mobile applications according to claim 3, characterized in that the kernel function used in SVM learning is the Gaussian kernel function.

5. The machine-learning-based method for determining malicious mobile applications according to claim 1, characterized in that M is 4 or 5.