CN107194251B - Malicious application detection method and device for Android platform - Google Patents


Info

Publication number
CN107194251B
Authority
CN
China
Prior art keywords
android application
application
malicious
android
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710214419.0A
Other languages
Chinese (zh)
Other versions
CN107194251A (en
Inventor
朱大立
金昊
杨莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection

Abstract

The invention provides a method and a device for detecting malicious applications on the Android platform. The method comprises the following steps: calling the FlowDroid tool to extract the static data flow features of the Android application under test; processing those static data flow features with the SUSI technique to generate a feature vector of the application's data flows; and inputting the generated feature vector into a pre-trained deep belief network detection model to obtain a detection result indicating whether the Android application under test is a malicious application. The method can accurately detect malicious applications on the Android platform. It avoids the path coverage problem of dynamic taint tracking; overcomes two challenges of static data flow analysis, namely accurately modeling the application's execution flow and accurately resolving the target component of inter-component communication; extracts the sensitive data flows of an Android application accurately and comprehensively; and overcomes the limitations of traditional shallow machine learning algorithms in building a detection model.

Description

Malicious application detection method and device for Android platform
Technical Field
The invention relates to the technical field of mobile security and machine learning, in particular to a method and a device for detecting malicious application of an Android platform.
Background
In the field of mobile intelligent terminals, a large amount of malware exists. Such malware can covertly obtain private data that the user has stored on the device, without the user noticing, and send it to an attacker's mailbox or server, which seriously threatens the user's financial security and privacy. With the popularization of Android smart terminals, privacy-stealing attacks on these terminals and techniques for detecting malicious applications are receiving more and more attention.
At present, the data flow analysis techniques used to detect malicious applications on the Android platform mainly comprise dynamic taint tracking and static data flow analysis. Dynamic taint tracking marks sensitive data as tainted, tracks the tainted data dynamically while the application runs, and judges whether a malicious leak occurs. Static data flow analysis monitors the information flow of the application through the whole system by constructing the application's function call graph, analyzing the reachable functions one by one, and propagating sensitive source information.
However, dynamic taint tracking faces the challenge of covering all code paths in a program. In addition, some malicious applications can detect the presence of a dynamic monitor and hide their malicious behavior, so the detection results contain a certain number of false negatives. Static data flow analysis needs to model the application's execution flow accurately. Moreover, Android applications make heavy use of inter-component communication, in which one component sends an intent, possibly carrying data, to invoke another component. Accurately resolving the target component of inter-component communication is therefore another difficulty of static data flow analysis.
In addition, with the wide use of machine learning and data mining in recent years, traditional machine learning algorithms have also been used to detect malicious applications on the Android platform. Such methods first extract application features such as permissions, API calls and function calls with a static method, or features such as behavior, system parameter changes and system calls with a dynamic method. A machine learning algorithm, such as a decision tree, naive Bayes or a support vector machine, is then trained on the feature data to build a malicious application detection model. Finally, the model is used to judge whether an application is safe.
However, different machine learning algorithms give different detection results for the same type of application behavior feature, so choosing a machine learning algorithm suited to the type of feature being processed is of great importance to the final detection result. Meanwhile, traditional machine learning algorithms have shallow model structures, which also affects the final detection result.
In view of this, the technical problem to be solved at present is how to provide a method and a device for detecting malicious applications on the Android platform that avoid the path coverage problem of dynamic taint tracking; overcome the two challenges of static data flow analysis, namely accurately modeling the application's execution flow and accurately resolving the target component of inter-component communication; extract the sensitive data flows of an Android application accurately and comprehensively; overcome the limitations of traditional shallow machine learning algorithms in building a detection model; and thereby detect malicious applications on the Android platform accurately.
Disclosure of Invention
To solve the above technical problems, the invention provides a method and a device for detecting malicious applications on the Android platform. They avoid the path coverage problem of dynamic taint tracking; overcome the two challenges of static data flow analysis, namely accurately modeling the application's execution flow and accurately resolving the target component of inter-component communication; extract the sensitive data flows of an Android application accurately and comprehensively; overcome the limitations of traditional shallow machine learning algorithms in building a detection model; and detect malicious applications on the Android platform accurately.
In a first aspect, the invention provides a method for detecting malicious applications of an Android platform, which comprises the following steps:
calling the FlowDroid tool to extract the static data flow features of the Android application under test;
processing the static data flow features of the Android application under test with the SUSI technique to generate a feature vector of the application's data flows;
and inputting the generated feature vector into a pre-trained deep belief network detection model to obtain a detection result indicating whether the Android application under test is a malicious application.
Optionally, before the generated feature vector of the data flows of the Android application under test is input into the pre-trained deep belief network detection model, the method further includes:
obtaining Android application samples, wherein the Android application samples comprise safe Android application samples and malicious Android application samples;
calling the FlowDroid tool to extract the static data flow features of the Android application samples;
processing the static data flow features of the Android application samples with the SUSI technique to generate feature vectors of the data flows of the Android application samples;
and training on the feature vectors of the data flows of the Android application samples to build the deep belief network detection model.
Optionally, the training on the feature vectors of the data flows of the Android application samples to build the deep belief network detection model includes:
taking the feature vectors of the data flows of unlabeled safe and malicious Android application samples as the input of the bottommost restricted Boltzmann machine (RBM), and pre-training multiple RBM layers one by one from bottom to top with an unsupervised learning method, thereby generating a deep belief network (DBN), until the DBN reaches an equilibrium state;
adding a classification layer after the last hidden layer of the DBN;
and inputting the feature vectors of the data flows of labeled safe and malicious Android application samples into the classification layer, and fine-tuning the parameters of every layer of the whole network, layer by layer from top to bottom, with a supervised learning method until convergence.
Optionally, the inputting of the feature vectors of the data flows of the labeled safe and malicious Android application samples into the classification layer, and the fine-tuning of the parameters of every layer of the whole network, layer by layer from top to bottom, with a supervised learning method until convergence, includes:
inputting the feature vectors of the data flows of the labeled safe and malicious Android application samples into the classification layer, and fine-tuning the parameters of every layer of the whole network under supervision with the back propagation (BP) algorithm until convergence.
Optionally, after the detection result indicating whether the Android application under test is a malicious application is obtained, the method further includes:
if the Android application under test is a malicious application, prompting the user that it is a malicious application, and displaying to the user an analysis report showing that it is a malicious application;
and if the Android application under test is not a malicious application, prompting the user that it is not a malicious application.
In a second aspect, the present invention provides an apparatus for detecting malicious applications on an Android platform, including:
a second extraction module, configured to call the FlowDroid tool and extract the static data flow features of the Android application under test;
a second processing module, configured to process the static data flow features of the Android application under test with the SUSI technique and generate a feature vector of the application's data flows;
and a detection module, configured to input the generated feature vector into a pre-trained deep belief network detection model to obtain a detection result indicating whether the Android application under test is a malicious application.
Optionally, the apparatus further comprises:
an acquisition module, configured to obtain Android application samples, wherein the Android application samples comprise safe Android application samples and malicious Android application samples;
a first extraction module, configured to call the FlowDroid tool and extract the static data flow features of the Android application samples;
a first processing module, configured to process the static data flow features of the Android application samples with the SUSI technique and generate feature vectors of the data flows of the Android application samples;
and a construction module, configured to train on the feature vectors of the data flows of the Android application samples and build the deep belief network detection model.
Optionally, the construction module includes:
a pre-training unit, configured to take the feature vectors of the data flows of unlabeled safe and malicious Android application samples as the input of the bottommost restricted Boltzmann machine (RBM), and to pre-train multiple RBM layers one by one from bottom to top with an unsupervised learning method, thereby generating a deep belief network (DBN), until the DBN reaches an equilibrium state;
an adding unit, configured to add a classification layer after the last hidden layer of the DBN;
and a fine-tuning unit, configured to input the feature vectors of the data flows of labeled safe and malicious Android application samples into the classification layer, and to fine-tune the parameters of every layer of the whole network, layer by layer from top to bottom, with a supervised learning method until convergence.
Optionally, the fine-tuning unit is specifically configured to
input the feature vectors of the data flows of the labeled safe and malicious Android application samples into the classification layer, and fine-tune the parameters of every layer of the whole network under supervision with the back propagation (BP) algorithm until convergence.
Optionally, the apparatus further comprises:
a first prompting module, configured to, if the Android application under test is a malicious application, prompt the user that it is a malicious application and display to the user an analysis report showing that it is a malicious application;
and a second prompting module, configured to, if the Android application under test is not a malicious application, prompt the user that it is not a malicious application.
According to the above technical solution, the method and device for detecting malicious applications on the Android platform provided by the invention call the FlowDroid tool to extract the static data flow features of the Android application under test, process those features with the SUSI technique to generate a feature vector of the application's data flows, and input the generated feature vector into a pre-trained deep belief network detection model to obtain a detection result indicating whether the application is malicious. The path coverage problem of dynamic taint tracking is thereby avoided; the two challenges of static data flow analysis, namely accurately modeling the application's execution flow and accurately resolving the target component of inter-component communication, are overcome; the sensitive data flows of an Android application are extracted accurately and comprehensively; the limitations of traditional shallow machine learning algorithms in building a detection model are overcome; and malicious applications on the Android platform can be detected accurately.
Drawings
Fig. 1 is a schematic flowchart of a method for detecting malicious applications on an Android platform according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of how the FlowDroid tool, called in step 101 of Fig. 1, extracts the data flow from source to sink in an example application;
Fig. 3 is a schematic structural diagram of an Android platform malicious application detection apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 shows a schematic flowchart of a method for detecting malicious applications on an Android platform according to an embodiment of the present invention. As shown in Fig. 1, the method of this embodiment is as follows.
101. Call the FlowDroid tool and extract the static data flow features of the Android application under test.
It is understood that an Android application contains multiple components, such as Activity, Service, ContentProvider and BroadcastReceiver, of which Activity is the main entry point for analysis. Unlike a traditional Java program, an Android application has no main function, so the analysis cannot construct a control flow graph simply by locating the program's entry and exit through the main function. However, every component of an Android application has functions that reflect the component's lifecycle, and a control flow graph of the application can be constructed from this lifecycle. To generate the control flow graph of an Android application, this embodiment calls FlowDroid, an open-source tool proposed by Arzt, S. et al. The working principle of FlowDroid is explained below using a piece of code from an Android application as an example.
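[Figures BDA0001261859780000071 and BDA0001261859780000081: code listing of the example application, provided as images in the source and not reproduced here.]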
The application reads the user's location information, including longitude, latitude and detailed address, by calling an API provided by Baidu Maps, and then sends this information by SMS to the mobile phone number '+ 111'. This simple application reflects a common privacy attack: sensitive data flows from LocationClient (the source) to the SMS API (the sink), resulting in a privacy leak.
The data flow analysis of the FlowDroid tool is flow-sensitive, that is, it depends on execution order, and can be divided into two parts: a forward taint analysis, which finds where tainted variables are passed, and an on-demand backward alias analysis, which finds all aliases of the same tainted heap location before the source. Referring to Fig. 2, FlowDroid extracts the data flow from source to sink in the above application in the following steps:
(1) the longitude, latitude and detail obtained from the source are passed forward, as tainted variables, to the function loc(detail, latitude, longitude);
(2) when the function loc(detail, latitude, longitude) is called, longitude, latitude and detail are found to be passed in as parameters. A backward analysis is therefore performed, which finds that the arguments a, b and c of the caller correspond to the parameters longitude, latitude and detail of the callee, so a, b and c are tainted;
(3) tracking continues from the tainted a, b and c, and l becomes tainted;
(4) l is found to be the return value of the called function, so a backward analysis is performed and location becomes tainted;
(5) a forward analysis of location determines that it is passed to the sink.
It will be appreciated that, because the analysis of the FlowDroid tool is context-sensitive, it can effectively distinguish different calls to the function loc() according to their arguments. Meanwhile, the FlowDroid tool handles calls to library functions through special hand-written summaries. In addition, the FlowDroid tool uses a large number of optimizations to improve scalability and reduce noise, so the sensitive data flow features of an Android application can be extracted accurately and comprehensively.
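The forward half of the source-to-sink tracking described above can be illustrated with a toy example. The following Python sketch is not FlowDroid's algorithm: it has no alias analysis, call graph or lifecycle modeling, and only propagates taint through a straight-line list of statements. The statement format, the variable names and the function labels `get_location`, `loc`, `concat` and `send_sms` are hypothetical and chosen only for illustration.

```python
def find_leaks(statements, sources, sinks):
    """Propagate taint through straight-line statements.

    Each statement is a tuple (target, function, arguments); target is None
    when the result of the call is discarded.
    Returns a list of (source_function, sink_function) pairs.
    """
    taint = {}   # variable name -> set of source functions it derives from
    leaks = []
    for target, func, args in statements:
        # Collect the taint that flows in through the arguments.
        incoming = set()
        for arg in args:
            incoming |= taint.get(arg, set())
        if func in sinks:
            for src in sorted(incoming):
                leaks.append((src, func))
        if func in sources:
            incoming = incoming | {func}
        if target is not None:
            # An assignment replaces whatever taint the target had before.
            taint[target] = incoming
    return leaks


# Hypothetical program: a location is read, reformatted, and sent by SMS.
program = [
    ("location", "get_location", []),          # source
    ("l", "loc", ["location"]),                # helper call, taint passes through
    ("msg", "concat", ["prefix", "l"]),
    (None, "send_sms", ["number", "msg"]),     # sink
]
leaks = find_leaks(program, sources={"get_location"}, sinks={"send_sms"})
print(leaks)
```

A real analysis must additionally handle branches, method calls with their own bodies, and heap aliases, which is what the backward alias analysis described above is for.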
102. Process the static data flow features of the Android application under test with the SUSI technique to generate a (normalized) feature vector of the application's data flows.
It will be appreciated that the raw static data flow features extracted by the FlowDroid tool contain the full names of the source and sink functions. However, the Android library contains thousands of source and sink functions, and any one application calls only a few of them. If the data flows between individual functions were used as features, the feature vector would be sparse, so the features need to be processed to improve the training results. This embodiment therefore uses the SUSI technique proposed by Rasthofer, S. et al. SUSI is based on a machine learning algorithm and can determine from a function's code whether the function is a source or a sink. SUSI also divides the known source and sink functions into 17 source classes and 19 sink classes, respectively, wherein:
the 17 source classes include:
(1)UNIQUE_IDENTIFIER;
(2)LOCATION_INFORMATION;
(3)NETWORK_INFORMATION;
(4)ACCOUNT_INFORMATION;
(5)FILE_INFORMATION;
(6)BLUETOOTH_INFORMATION;
(7)DATABASE_INFORMATION;
(8)EMAIL;
(9)SYNCHRONIZATION_DATA;
(10)SMS_MMS;
(11)CONTACT_INFORMATION;
(12)CALENDAR_INFORMATION;
(13)SYSTEM_SETTING;
(14)IMAGE;
(15)BROWSER_INFORMATION;
(16)NFC;
(17)NO_CATEGORY.
the 19 sink classes include:
(1)LOCATION_INFORMATION;
(2)PHONE_CONNECTION;
(3)VOIP;
(4)PHONE_STATE;
(5)EMAIL;
(6)BLUETOOTH;
(7)ACCOUNT_SETTING;
(8)AUDIO;
(9)SYNCHRONIZATION_DATA;
(10)NETWORK;
(11)FILE;
(12)LOG;
(13)SMS_MMS;
(14)CONTACT_INFORMATION;
(15)CALENDAR_INFORMATION;
(16)SYSTEM_SETTING;
(17)NFC;
(18)BROWSER_INFORMATION;
(19)NO_CATEGORY.
The SUSI classification makes the types of information leaked by malicious applications, and the paths through which they are leaked, easier to understand.
In this embodiment, the FlowDroid tool and SUSI are combined so that 17 × 19 = 323 data flow features can be extracted from each Android application. The feature vector can be expressed as:
$$\mathrm{features}(app) = \left(src\_category_{1} \to sink\_category_{1},\; src\_category_{1} \to sink\_category_{2},\; \ldots,\; src\_category_{17} \to sink\_category_{18},\; src\_category_{17} \to sink\_category_{19}\right)$$
When a data flow exists between a source class $src\_category_{i}$ ($i = 1, 2, \ldots, 17$) and a sink class $sink\_category_{j}$ ($j = 1, 2, \ldots, 19$), the corresponding entry $src\_category_{i} \to sink\_category_{j}$ of the feature vector is 1; otherwise, it is 0.
It can be understood that the raw data flow features extracted by FlowDroid are poorly suited to training. Step 102 therefore processes them with SUSI, an open-source tool for classifying sensitive APIs, and the resulting feature vector better reflects how sensitive data flows inside each application.
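As a minimal sketch of step 102, the following Python code builds the 323-entry binary vector from a list of (source class, sink class) pairs. It assumes that each raw FlowDroid flow has already been mapped to its SUSI classes; that mapping step and the function name `flow_feature_vector` are hypothetical. The entries are ordered source class first, then sink class, matching the expression above.

```python
SOURCE_CATEGORIES = [
    "UNIQUE_IDENTIFIER", "LOCATION_INFORMATION", "NETWORK_INFORMATION",
    "ACCOUNT_INFORMATION", "FILE_INFORMATION", "BLUETOOTH_INFORMATION",
    "DATABASE_INFORMATION", "EMAIL", "SYNCHRONIZATION_DATA", "SMS_MMS",
    "CONTACT_INFORMATION", "CALENDAR_INFORMATION", "SYSTEM_SETTING",
    "IMAGE", "BROWSER_INFORMATION", "NFC", "NO_CATEGORY",
]
SINK_CATEGORIES = [
    "LOCATION_INFORMATION", "PHONE_CONNECTION", "VOIP", "PHONE_STATE",
    "EMAIL", "BLUETOOTH", "ACCOUNT_SETTING", "AUDIO",
    "SYNCHRONIZATION_DATA", "NETWORK", "FILE", "LOG", "SMS_MMS",
    "CONTACT_INFORMATION", "CALENDAR_INFORMATION", "SYSTEM_SETTING",
    "NFC", "BROWSER_INFORMATION", "NO_CATEGORY",
]


def flow_feature_vector(flows):
    """Return a 17 x 19 = 323 entry 0/1 vector.

    `flows` is an iterable of (source_category, sink_category) pairs.
    Entry i * 19 + j is 1 when a flow from source class i to sink
    class j was observed (0-based indices), otherwise 0.
    """
    vector = [0] * (len(SOURCE_CATEGORIES) * len(SINK_CATEGORIES))
    for src, sink in flows:
        i = SOURCE_CATEGORIES.index(src)
        j = SINK_CATEGORIES.index(sink)
        vector[i * len(SINK_CATEGORIES) + j] = 1
    return vector


# The example application above leaks location data through SMS.
vector = flow_feature_vector([("LOCATION_INFORMATION", "SMS_MMS")])
print(len(vector), sum(vector))
```

Because the vector only records whether a class-to-class flow exists, many different function-level flows collapse into the same entry, which is what keeps the representation compact.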
103. Input the generated feature vector of the data flows of the Android application under test into a pre-trained deep belief network detection model to obtain a detection result indicating whether the application is a malicious application.
In a specific application, the method further includes, before step 103, steps S1 to S4, which are not shown in the figure:
S1. Obtain Android application samples, which comprise safe Android application samples and malicious Android application samples.
Specifically, in step S1, crawler technology may be used to continuously fetch the latest malicious Android applications from the network as the malicious Android application samples, and likewise to continuously fetch the latest safe Android applications from the network as the safe Android application samples.
In a specific application, step S1 may divide the Android application samples into two parts: one part consists of unlabeled safe and malicious Android application samples, and the other part consists of labeled safe and malicious Android application samples.
S2. Call the FlowDroid tool and extract the static data flow features of the Android application samples.
Specifically, for the principle of the FlowDroid tool in this step, refer to the description of step 101; it is not repeated here.
S3. Process the static data flow features of the Android application samples with the SUSI technique to generate feature vectors of the data flows of the Android application samples.
Specifically, for the SUSI technique in this step, refer to the description of step 102; it is not repeated here.
S4. Train on the feature vectors of the data flows of the Android application samples to build the deep belief network detection model.
It can be understood that a deep belief network detection model based on deep learning has a deeper structure and a stronger ability to describe features than a traditional shallow model, so it can mine more deeply the application security reflected by the data flow features and achieve higher detection accuracy.
Specifically, the step S4 may specifically include steps S41-S43 not shown in the figure:
S41. Take the feature vectors of the data flows of the unlabeled safe and malicious Android application samples as the input of the bottommost restricted Boltzmann machine (RBM), and pre-train multiple RBM layers one by one from bottom to top with an unsupervised learning method, thereby generating a deep belief network (DBN), until the DBN reaches an equilibrium state.
It should be noted that the deep belief network (DBN) is composed of multiple stacked restricted Boltzmann machines (RBMs). Each RBM comprises a visible layer, which receives the input data, and a hidden layer, which outputs data. Units in different layers are connected to each other, but units within the same layer are not.
For example, in step S41, the feature vectors of the data flows of the unlabeled safe Android application samples may be used as the input of the bottommost restricted Boltzmann machine (RBM), and a greedy layer-wise algorithm may be used to pre-train multiple RBM layers one by one from bottom to top without supervision, thereby generating the deep belief network (DBN), until the DBN reaches an equilibrium state.
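The greedy layer-wise pre-training of step S41 can be sketched as follows. This is a minimal illustration in plain Python under several assumptions that the source does not specify: binary (Bernoulli) units, one-step contrastive divergence (CD-1) as the unsupervised update, a fixed number of epochs in place of the equilibrium criterion, and toy 6-entry vectors standing in for the 323-entry feature vectors. The class and function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Bernoulli restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def cd1_update(self, v0, lr):
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)       # reconstruction
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(len(v0)):
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hid[j] += lr * (h0[j] - h1[j])


def pretrain_dbn(data, layer_sizes, epochs=20, lr=0.1, seed=0):
    """Train one RBM per entry of layer_sizes, bottom to top.

    Each RBM is trained on the hidden activations of the RBM below it.
    """
    rng = random.Random(seed)
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v, lr)
        rbms.append(rbm)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms


# Toy stand-in for unlabeled feature vectors.
unlabeled = [[1, 0, 1, 0, 0, 0], [1, 0, 1, 0, 0, 1],
             [0, 1, 0, 1, 0, 0], [0, 1, 0, 1, 1, 0]]
dbn = pretrain_dbn(unlabeled, layer_sizes=[4, 3])
print([len(rbm.b_hid) for rbm in dbn])
```

No labels are used at this stage; the resulting weights serve only as a starting point for the supervised fine-tuning that follows.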
S42. Add a classification layer after the last hidden layer of the DBN.
S43. Input the feature vectors of the data flows of the labeled safe and malicious Android application samples into the classification layer, and fine-tune the parameters of every layer of the whole network, layer by layer from top to bottom, with a supervised learning method until convergence.
For example, in step S43, the feature vectors of the data flows of the labeled safe and malicious Android application samples may be input into the classification layer, and the parameters of every layer of the whole network may be fine-tuned under supervision with the back propagation (BP) algorithm until convergence.
It should be noted that, because each RBM layer is trained independently, training a layer only yields the best representation of that layer's visible layer by its hidden layer; the DBN as a whole is not yet the best representation of the input data. Therefore, a classification layer (a supervised learning network), such as a BP neural network, is added after the last hidden layer of the DBN. The whole DBN can then be regarded as a multi-layer BP neural network that is fine-tuned from top to bottom, and the pre-training can be regarded as the initialization of the weight parameters of a deep BP network. This training procedure can effectively mitigate the problems of getting trapped in local optima and of long training times. Depending on the specific application, different classifiers can be chosen for the top supervised learning layer (that is, the classification layer) of the DBN. This completes the training of the deep belief network detection model, based on a deep learning algorithm, for detecting malicious Android applications.
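Steps S42 and S43 can be sketched in the same style. The Python code below treats the stack as a multi-layer sigmoid network, appends a single-unit classification layer, and adjusts all layers by backpropagation. It rests on assumptions not fixed by the source: a logistic output unit with cross-entropy loss is used as the classifier, the hidden-layer weights are drawn at random where the described method would use the pre-trained RBM weights, a fixed epoch count replaces the convergence test, and the 4-entry samples and their labels are toy data. All names are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); weights[i][j] links input i to output j."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def forward(layers, x):
    """Return the activations of every layer, starting with the input."""
    activations = [list(x)]
    for weights, biases in layers:
        prev = activations[-1]
        activations.append([
            sigmoid(biases[j] + sum(prev[i] * weights[i][j] for i in range(len(prev))))
            for j in range(len(biases))
        ])
    return activations


def fine_tune(layers, samples, labels, epochs=1000, lr=0.5):
    """Adjust every layer by backpropagation (sigmoid output, cross-entropy loss)."""
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            acts = forward(layers, x)
            delta = [acts[-1][0] - y]            # error signal at the output unit
            for k in range(len(layers) - 1, -1, -1):
                weights, biases = layers[k]
                prev = acts[k]
                # Error signal for the layer below, using the weights before the update.
                below = [
                    prev[i] * (1.0 - prev[i])
                    * sum(weights[i][j] * delta[j] for j in range(len(delta)))
                    for i in range(len(prev))
                ]
                for i in range(len(prev)):
                    for j in range(len(delta)):
                        weights[i][j] -= lr * prev[i] * delta[j]
                for j in range(len(delta)):
                    biases[j] -= lr * delta[j]
                delta = below
    return layers


rng = random.Random(0)
# Random weights stand in here for the pre-trained RBM weights.
layers = [init_layer(4, 3, rng), init_layer(3, 2, rng)]
layers.append(init_layer(2, 1, rng))             # added classification layer
samples = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 1, 0, 1], [0, 0, 0, 1]]
labels = [1, 1, 0, 0]                            # 1 = malicious, 0 = safe (toy labels)
fine_tune(layers, samples, labels)
probabilities = [forward(layers, x)[-1][0] for x in samples]
print([round(p, 3) for p in probabilities])
```

The printed values are the network's estimated probabilities that each toy sample is malicious; how well they separate the two classes depends on the initialization and the number of epochs.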
In a specific application, after step 103, the method of this embodiment may further include:
if the Android application under test is a malicious application, prompting the user that it is a malicious application, and displaying to the user a detailed analysis report showing that it is a malicious application;
and if the Android application under test is not a malicious application, prompting the user that it is not a malicious application.
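The prompting step could look like the following Python sketch. The source does not specify the content of the analysis report or how the model output is turned into a decision, so the 0.5 threshold, the report layout, the file name and the function name `build_report` are all assumptions made for illustration.

```python
def build_report(app_name, malicious_probability, flows, threshold=0.5):
    """Return the message shown to the user for one analyzed application.

    `flows` is a list of (source_category, sink_category) pairs found in
    the application; they are listed only when the app is flagged.
    """
    if malicious_probability < threshold:
        return f"{app_name}: not detected as malicious."
    lines = [f"{app_name}: detected as MALICIOUS (score {malicious_probability:.2f})."]
    lines.append("Sensitive data flows (source category -> sink category):")
    for src, sink in sorted(set(flows)):
        lines.append(f"  {src} -> {sink}")
    return "\n".join(lines)


report = build_report("demo.apk", 0.93, [("LOCATION_INFORMATION", "SMS_MMS")])
print(report)
```

Listing the class-to-class flows gives the user a reason for the verdict rather than a bare label.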
The Android malicious application for realizing privacy stealing attack is different from the Android security application on one hand and has certain same place with other malicious applications on the other hand in a mode of processing sensitive data in the application. Therefore, in the embodiment, the machine learning algorithm is used for analyzing the differences and the similarities, the development conditions of the data stream analysis technology of the Android platform intelligent terminal and the malicious application detection technology based on the machine learning algorithm are mainly concerned, deep analysis is performed based on sensitive data stream information in the Android application, and the differences between the malicious application and the security application in sensitive data processing are explored, so that malicious application is detected.
In the Android platform malicious application detection method of this embodiment, the FlowDroid tool is invoked to extract the static data flow features of the Android application to be tested; the SUSI technique is used to process these features and generate a feature vector of the application's data flows; and the feature vector is input into a pre-trained deep belief network detection model to obtain a detection result indicating whether the application is malicious. Malicious applications on the Android platform can thus be detected with high accuracy. The method avoids the path coverage problem of dynamic taint tracking, and overcomes the two challenges facing static data flow analysis, namely accurately modeling the application's execution flow and accurately resolving the target components of inter-component communication, so that sensitive data flows in Android applications are extracted accurately and comprehensively. It also overcomes the limitations of traditional shallow machine learning algorithms in constructing a detection model, and can greatly improve the detection rate for unknown malicious applications.
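One way to picture the feature-vector step is the sketch below. It assumes the data-flow analysis yields (source method, sink method) pairs and that each method has been assigned a category. The category names are an illustrative subset and the two lookup tables are hypothetical; neither is taken from the patent, and the real output formats of FlowDroid and SUSI are not reproduced here.

```python
from itertools import product

# Illustrative categories only (assumption); a real category list would be longer.
SOURCE_CATEGORIES = ["UNIQUE_IDENTIFIER", "LOCATION_INFORMATION", "CONTACT_INFORMATION"]
SINK_CATEGORIES = ["NETWORK", "SMS_MMS", "LOG"]

# One feature per (source category, sink category) combination.
FEATURES = list(product(SOURCE_CATEGORIES, SINK_CATEGORIES))

# Hypothetical method-to-category lookup tables.
SOURCE_OF = {
    "TelephonyManager.getDeviceId": "UNIQUE_IDENTIFIER",
    "LocationManager.getLastKnownLocation": "LOCATION_INFORMATION",
}
SINK_OF = {
    "SmsManager.sendTextMessage": "SMS_MMS",
    "Log.d": "LOG",
}

def flows_to_vector(flows, source_of=SOURCE_OF, sink_of=SINK_OF):
    """Map (source method, sink method) pairs to a binary feature vector.

    Element k is 1 if at least one flow falls into category pair FEATURES[k].
    Flows whose methods have no known category are ignored.
    """
    present = set()
    for source_method, sink_method in flows:
        source_cat = source_of.get(source_method)
        sink_cat = sink_of.get(sink_method)
        if source_cat is not None and sink_cat is not None:
            present.add((source_cat, sink_cat))
    return [1 if feature in present else 0 for feature in FEATURES]

vector = flows_to_vector([
    ("TelephonyManager.getDeviceId", "SmsManager.sendTextMessage"),
    ("LocationManager.getLastKnownLocation", "Log.d"),
    ("Unknown.method", "Log.d"),  # uncategorized source, so this flow is skipped
])
```

A fixed-length binary vector of this kind suits an RBM input layer, whose visible units are commonly binary; counting flows per category pair instead would be an equally plausible encoding.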
Fig. 3 shows a schematic structural diagram of an Android platform malicious application detection apparatus according to an embodiment of the present invention. As shown in Fig. 3, the apparatus of this embodiment includes: a second extraction module 31, a second processing module 32 and a detection module 33; wherein:
the second extraction module 31 is configured to invoke a FlowDroid tool and extract a static data flow feature of the Android application to be tested;
the second processing module 32 is configured to process the static data stream characteristics of the Android application to be tested by using the SUSI technology, and generate a characteristic vector of the data stream of the Android application to be tested;
the detection module 33 is configured to input the generated feature vector of the data stream of the Android application to be detected into a pre-trained deep belief network detection model, and obtain a detection result of whether the Android application to be detected is a malicious application.
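The three modules can be read as a simple pipeline. The sketch below wires them together as injected callables; the class name `DetectionApparatus`, the 0.5 decision threshold, and the stub functions are assumptions for illustration, and the stubs merely stand in for FlowDroid extraction, SUSI-based vectorization, and the trained DBN.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Flow = Tuple[str, str]  # (source method, sink method)

@dataclass
class DetectionApparatus:
    extract: Callable[[str], Sequence[Flow]]          # role of extraction module 31
    vectorize: Callable[[Sequence[Flow]], List[int]]  # role of processing module 32
    classify: Callable[[List[int]], float]            # role of detection module 33
    threshold: float = 0.5                            # assumed decision threshold

    def detect(self, apk_path: str) -> bool:
        """Return True if the application is judged malicious."""
        flows = self.extract(apk_path)
        vector = self.vectorize(flows)
        return self.classify(vector) >= self.threshold

# Stub modules (hypothetical) so the sketch runs without external tools.
apparatus = DetectionApparatus(
    extract=lambda apk_path: [("getDeviceId", "sendTextMessage")],
    vectorize=lambda flows: [1 if flows else 0],
    classify=lambda vector: 0.9 if vector[0] else 0.1,
)
result = apparatus.detect("sample.apk")

clean = DetectionApparatus(
    extract=lambda apk_path: [],
    vectorize=lambda flows: [1 if flows else 0],
    classify=lambda vector: 0.9 if vector[0] else 0.1,
)
clean_result = clean.detect("clean.apk")
```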
In a specific application, the apparatus may further include:
the obtaining module is used for obtaining an Android application sample, wherein the Android application sample comprises: a secure Android application sample and a malicious Android application sample;
the first extraction module is used for calling a FlowDroid tool and extracting the static data flow characteristics of the Android application sample;
the first processing module is used for processing the static data stream characteristics of the Android application sample by utilizing the SUSI technology to generate a characteristic vector of the data stream of the Android application sample;
and the construction module is used for training according to the characteristic vector of the data stream of the Android application sample and constructing a deep belief network detection model.
In a specific application, the obtaining module may use crawler technology to continuously capture the latest malicious Android applications from the network as the malicious Android application samples, and to continuously capture the latest safe Android applications from the network as the safe Android application samples.
In a specific application, the obtaining module may divide the Android application samples into two parts: one part consists of unmarked (unlabeled) safe and malicious Android application samples, and the other part consists of marked (labeled) safe and malicious Android application samples.
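A minimal sketch of this split is shown below. The text does not state what proportion of samples is marked, so the `labeled_fraction` parameter, the shuffling, and the function name `split_samples` are assumptions.

```python
import random

def split_samples(samples, labeled_fraction=0.3, seed=0):
    """Split (feature_vector, label) pairs into an unmarked and a marked part.

    The unmarked part keeps only the feature vectors, because unsupervised
    pre-training does not use labels. labeled_fraction is an assumed parameter.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * labeled_fraction)
    marked = shuffled[:cut]
    unmarked = [vector for vector, _ in shuffled[cut:]]
    return unmarked, marked

# Ten toy samples: label 1 = malicious, 0 = safe (hypothetical data).
samples = [([i % 2, 1 - i % 2], i % 2) for i in range(10)]
unmarked, marked = split_samples(samples, labeled_fraction=0.5)
```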
In a specific application, the building module may include:
the pre-training unit is used for taking the feature vectors of the data streams of the unmarked safe Android application samples and malicious Android application samples as the input of the bottommost restricted Boltzmann machine (RBM), and pre-training a plurality of RBM layers layer by layer from bottom to top by an unsupervised learning method until the network reaches an equilibrium state, thereby generating a deep belief network (DBN);
an adding unit, configured to add a classification layer after a last hidden layer of the DBN network;
and the fine tuning unit is used for inputting the feature vectors of the data streams of the marked safe Android application samples and malicious Android application samples into the classification layer, and fine-tuning the parameters of each layer of the whole network layer by layer from top to bottom by a supervised learning method until convergence.
Specifically, for example, the fine tuning unit may input the feature vectors of the data streams of the marked safe Android application samples and malicious Android application samples into the classification layer, and perform supervised fine-tuning of the parameters of each layer of the whole network using the back propagation (BP) algorithm until convergence.
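The pre-training unit's greedy, layer-by-layer procedure can be sketched as follows. The text only requires an unsupervised layer-wise method, so the use of one-step contrastive divergence (CD-1) here is an assumption, as are the class and function names, the learning rate, and the fixed epoch count that replaces a real equilibrium or convergence check.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.a = [0.0] * n_visible  # visible biases
        self.b = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.b[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def visible_probs(self, h):
        return [sigmoid(self.a[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.a))]

    def cd1_step(self, v0, lr):
        """One CD-1 update: positive phase on the data, negative phase on a reconstruction."""
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.a[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b[j] += lr * (h0[j] - h1[j])

def pretrain_dbn(data, hidden_sizes, epochs=50, lr=0.1, seed=0):
    """Train RBMs bottom-up; each layer's hidden activations feed the next layer."""
    rng = random.Random(seed)
    rbms, layer_input = [], data
    for n_hidden in hidden_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(epochs):  # fixed epoch count stands in for an equilibrium test
            for v in layer_input:
                rbm.cd1_step(v, lr)
        rbms.append(rbm)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms

# Toy unmarked feature vectors (hypothetical).
unmarked = [[1, 0, 1, 0, 1, 0], [1, 1, 0, 0, 1, 0], [0, 1, 0, 1, 0, 1], [0, 0, 1, 1, 0, 1]]
rbms = pretrain_dbn(unmarked, hidden_sizes=[4, 3])
top_features = rbms[1].hidden_probs(rbms[0].hidden_probs(unmarked[0]))
```

The resulting weights and hidden biases would then serve as the starting point for supervised fine-tuning once the classification layer is added.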
In a specific application, the apparatus according to this embodiment may further include:
the first prompting module is used for prompting the user that the Android application to be tested is a malicious application if the Android application to be tested is a malicious application, and displaying to the user an analysis report on why the Android application to be tested is a malicious application;
and the second prompting module is used for prompting the user that the Android application to be tested is not a malicious application if the Android application to be tested is not a malicious application.
It should be noted that, for the device/system embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
The Android platform malicious application detection device of this embodiment can detect malicious applications on the Android platform with high accuracy. It avoids the path coverage problem of dynamic taint tracking, and overcomes the two challenges facing static data flow analysis, namely accurately modeling the application's execution flow and accurately resolving the target components of inter-component communication, so that sensitive data flows in Android applications are extracted accurately and comprehensively. It also overcomes the limitations of traditional shallow machine learning algorithms in constructing a detection model, and can greatly improve the detection rate for unknown malicious applications.
The Android platform malicious application detection device of this embodiment may be configured to execute the technical scheme of the foregoing method embodiment, and the implementation principle and the technical effect of the device are similar, which are not described herein again.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element. The terms "upper", "lower", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the referred devices or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Unless expressly stated or limited otherwise, the terms "mounted," "connected," and "connected" are intended to be inclusive and mean, for example, that they may be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the description of the present invention, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description. Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention is not limited to any single aspect, nor is it limited to any single embodiment, nor is it limited to any combination and/or permutation of these aspects and/or embodiments. Moreover, each aspect and/or embodiment of the present invention may be utilized alone or in combination with one or more other aspects and/or embodiments thereof.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims (6)

1. A malicious application detection method for an Android platform is characterized by comprising the following steps:
calling a FlowDroid tool, and extracting the static data flow characteristics of the Android application to be tested;
processing the static data stream characteristics of the Android application to be tested by utilizing an SUSI technology to generate a characteristic vector of the data stream of the Android application to be tested;
inputting the generated characteristic vector of the data stream of the Android application to be detected into a pre-trained deep belief network detection model to obtain a detection result of whether the Android application to be detected is a malicious application;
before the generated feature vector of the data stream of the Android application to be tested is input into the pre-trained deep belief network detection model, the method further comprises the following steps:
obtaining an Android application sample, wherein the Android application sample comprises: a secure Android application sample and a malicious Android application sample;
calling a FlowDroid tool, and extracting the static data flow characteristics of the Android application sample;
processing the static data stream characteristics of the Android application sample by using an SUSI technology to generate a characteristic vector of the data stream of the Android application sample;
training according to the characteristic vector of the data stream of the Android application sample, and constructing a deep belief network detection model;
the training according to the characteristic vector of the data stream of the Android application sample to construct the deep belief network detection model comprises the following steps:
taking the feature vectors of the data streams of unmarked safe Android application samples and malicious Android application samples as the input of the bottommost restricted Boltzmann machine (RBM), and pre-training a plurality of RBM layers layer by layer from bottom to top by an unsupervised learning method until the network reaches an equilibrium state, thereby generating a deep belief network (DBN); the unsupervised learning method comprises a layer-wise greedy algorithm;
adding a classification layer after the last hidden layer of the DBN network;
and inputting the characteristic vectors of the marked data streams of the safe Android application samples and the malicious Android application samples into the classification layer, and finely adjusting the parameters of each layer of the whole network layer by layer from top to bottom by adopting a supervised learning method until convergence.
2. The method according to claim 1, wherein the step of inputting the characteristic vectors of the data streams of the marked safe Android application samples and the malicious Android application samples into the classification layer, and the step of fine-tuning parameters of all layers of the whole network layer by layer from top to bottom by adopting a supervised learning method until convergence comprises the following steps:
and inputting the characteristic vectors of the marked data streams of the safe Android application sample and the malicious Android application sample into the classification layer, and adopting a back propagation BP algorithm to supervise and finely adjust the parameters of each layer of the whole network until convergence.
3. The method according to claim 1 or 2, wherein after the obtaining of the detection result of whether the Android application to be detected is a malicious application, the method further comprises:
if the Android application to be tested is the malicious application, prompting the user that the Android application to be tested is the malicious application, and displaying an analysis report that the Android application to be tested is the malicious application to the user;
and if the to-be-tested Android application is not malicious application, prompting a user that the to-be-tested Android application is not malicious application.
4. An Android platform malicious application detection device, comprising:
the second extraction module is used for calling a FlowDroid tool and extracting the static data flow characteristics of the Android application to be tested;
the second processing module is used for processing the static data stream characteristics of the Android application to be detected by utilizing the SUSI technology and generating the characteristic vector of the data stream of the Android application to be detected;
the detection module is used for inputting the generated characteristic vector of the data stream of the Android application to be detected into a pre-trained deep belief network detection model to obtain a detection result of whether the Android application to be detected is malicious application;
the device further comprises:
the obtaining module is used for obtaining an Android application sample, wherein the Android application sample comprises: a secure Android application sample and a malicious Android application sample;
the first extraction module is used for calling a FlowDroid tool and extracting the static data flow characteristics of the Android application sample;
the first processing module is used for processing the static data stream characteristics of the Android application sample by utilizing the SUSI technology to generate a characteristic vector of the data stream of the Android application sample;
the construction module is used for training according to the characteristic vector of the data stream of the Android application sample and constructing a deep belief network detection model;
the building module comprises:
the pre-training unit is used for taking the feature vectors of the data streams of the unmarked safe Android application samples and malicious Android application samples as the input of the bottommost restricted Boltzmann machine (RBM), and pre-training a plurality of RBM layers layer by layer from bottom to top by an unsupervised learning method until the network reaches an equilibrium state, thereby generating a deep belief network (DBN); the unsupervised learning method comprises a layer-wise greedy algorithm;
an adding unit, configured to add a classification layer after a last hidden layer of the DBN network;
and the fine tuning unit is used for inputting the characteristic vectors of the marked data streams of the safe Android application samples and the malicious Android application samples into the classification layer, and fine tuning parameters of each layer of the whole network layer by layer from top to bottom by adopting a supervised learning method until convergence.
5. The device according to claim 4, wherein the fine tuning unit is specifically configured for:
inputting the feature vectors of the data streams of the marked safe Android application samples and malicious Android application samples into the classification layer, and performing supervised fine-tuning of the parameters of each layer of the whole network using the back propagation (BP) algorithm until convergence.
6. The apparatus of claim 4 or 5, further comprising:
the first prompting module is used for prompting the user that the Android application to be tested is malicious application if the Android application to be tested is the malicious application, and displaying an analysis report that the Android application to be tested is the malicious application to the user;
and the second prompting module is used for prompting the user that the Android application to be tested is not malicious application if the Android application to be tested is not malicious application.
CN201710214419.0A 2017-04-01 2017-04-01 Malicious application detection method and device for Android platform Active CN107194251B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710214419.0A CN107194251B (en) 2017-04-01 2017-04-01 Malicious application detection method and device for Android platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710214419.0A CN107194251B (en) 2017-04-01 2017-04-01 Malicious application detection method and device for Android platform

Publications (2)

Publication Number Publication Date
CN107194251A CN107194251A (en) 2017-09-22
CN107194251B true CN107194251B (en) 2020-02-14

Family

ID=59871820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710214419.0A Active CN107194251B (en) 2017-04-01 2017-04-01 Malicious application detection method and device for Android platform

Country Status (1)

Country Link
CN (1) CN107194251B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108718310B (en) * 2018-05-18 2021-02-26 安徽继远软件有限公司 Deep learning-based multilevel attack feature extraction and malicious behavior identification method
CN110532773B (en) * 2018-05-25 2023-04-07 阿里巴巴集团控股有限公司 Malicious access behavior identification method, data processing method, device and equipment
CN110555305A (en) * 2018-05-31 2019-12-10 武汉安天信息技术有限责任公司 Malicious application tracing method based on deep learning and related device
CN110858247A (en) * 2018-08-23 2020-03-03 北京京东尚科信息技术有限公司 Android malicious application detection method, system, device and storage medium
CN109508545B (en) * 2018-11-09 2021-06-04 北京大学 Android Malware classification method based on sparse representation and model fusion
CN110472415B (en) * 2018-12-13 2021-08-10 成都亚信网络安全产业技术研究院有限公司 Malicious program determination method and device
CN110096265B (en) * 2019-05-09 2023-06-20 趋新科技(北京)有限公司 Software design method, software design tool and software operation platform based on data stream and element
CN113110986A (en) * 2020-01-13 2021-07-13 深信服科技股份有限公司 WebShell script file detection method and system
CN113111346A (en) * 2020-01-13 2021-07-13 深信服科技股份有限公司 Multi-engine WebShell script file detection method and system
CN112287341A (en) * 2020-09-22 2021-01-29 哈尔滨安天科技集团股份有限公司 Android malicious application detection method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793650A (en) * 2013-12-02 2014-05-14 北京邮电大学 Static analysis method and static analysis device for Android application program
CN104392174A (en) * 2014-10-23 2015-03-04 腾讯科技(深圳)有限公司 Generation method and device for characteristic vectors of dynamic behaviors of application program
CN105320887A (en) * 2015-10-12 2016-02-10 湖南大学 Static characteristic extraction and selection based detection method for Android malicious application
CN106096415A (en) * 2016-06-24 2016-11-09 康佳集团股份有限公司 A kind of malicious code detecting method based on degree of depth study and system
CN106228068A (en) * 2016-07-21 2016-12-14 江西师范大学 Android malicious code detecting method based on composite character

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7526804B2 (en) * 2004-02-02 2009-04-28 Microsoft Corporation Hardware assist for pattern matches
CN101266550B (en) * 2007-12-21 2011-02-16 北京大学 Malicious code detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
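Research and Implementation of a Malicious Android Program Detection Method Based on Hybrid Features (基于混合特征的恶意安卓程序检测方法研究与实现); Xu Linxi (徐林溪); China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》); 20170315 (No. 3); thesis Sections 1.2 to 5.3 *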

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant