CN108647517B - Vulnerability detection system and method for Android mixed application code injection - Google Patents

Vulnerability detection system and method for Android mixed application code injection

Publication number
CN108647517B
Authority
CN
China
Prior art keywords
application
code
codes
detected
vulnerability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810473411.0A
Other languages
Chinese (zh)
Other versions
CN108647517A (en
Inventor
李瑞轩
涂建伟
汤俊伟
韩洪木
辜希武
张婧
代德顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201810473411.0A
Publication of CN108647517A
Application granted
Publication of CN108647517B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security

Abstract

The invention discloses a vulnerability detection system and method for code injection in Android hybrid applications. The system comprises a permission feature extraction module, a data channel feature extraction module and a vulnerability detection module. The permission feature extraction module extracts the sensitive permission request set of the hybrid application to be detected from its code. The data channel feature extraction module extracts the source point set and the receiving point set of the data channels from the same code. The first input end of the vulnerability detection module is connected to the output end of the permission feature extraction module, and its second input end is connected to the output end of the data channel feature extraction module; using a vulnerability detection model, the module judges from the sensitive permission request set and the source point and receiving point sets whether the hybrid application to be detected has a code injection vulnerability. Compared with traditional detection methods based on control flow and program call graphs, the method is more efficient, classifies accurately and is easy to use.

Description

Vulnerability detection system and method for Android mixed application code injection
Technical Field
The invention belongs to the field of mobile security and vulnerability detection, and particularly relates to a vulnerability detection system and method for Android mixed application code injection.
Background
With the development of Internet technology and the portability of mobile terminals, smartphones have become increasingly popular. To meet users' everyday and entertainment needs, developers are building more and more applications, and application markets such as Google Play offer applications for social networking, shopping, games, photography, news and more. However, the security problems that come with smartphones are growing more serious: users' private data (geographic location, address book, account passwords) is exposed on the device and becomes a target for attackers. Market research shows that by the fourth quarter of 2017 Android held a market share of 86.1%, far above the 12.1% of iOS. Because of its openness, however, the security problems of the Android platform are becoming more severe. The 2017 report of the Tencent Security Joint Laboratory shows that, as smartphones became widespread and the cost of producing malicious programs for the Android platform fell, the number of Android virus packages increased by 465 thousand, a year-on-year increase of 21.42%. In addition, because operating system platforms differ, developers have to develop a separate version of each application, in a different programming language, for every system.
To solve this problem, more and more developers are turning to hybrid applications built on standard Web technologies such as HTML5, CSS and JavaScript, which are convenient to develop and portable across platforms. Operating systems do not support JS and HTML natively, so to display an HTML5 user interface and execute JavaScript code the application must embed a Web browser component (called WebView on Android and UIWebView on iOS). Android hybrid applications integrate hybrid application frameworks such as PhoneGap, AppMobi and Mosync, through which they can call native Java code and access system resources. Although hybrid applications are portable across platforms, the embedded Web browser breaks through the sandbox protection that the Android operating system applies to Web browsers. Accessing external resources can therefore cause security problems, and the application's JavaScript code is exposed to code injection attacks, which leads to hazards such as leakage of user data.
On Web sites this attack is known as cross-site scripting: an attacker plants malicious JS code in links, databases or forms. If the server does not recognize the attack code, the JS engine executes it when the data is loaded and displayed in the browser. For hybrid applications based on the PhoneGap framework, malicious code can likewise be injected into the device when the user receives external data over channels such as WiFi, Bluetooth, two-dimensional codes and SMS. In the prior art, detection of code injection vulnerabilities in Android hybrid applications largely remains at the level of traditional static analysis. Because a hybrid application has many channels for receiving external data and many APIs that can trigger malicious scripts, the function call graph built by such analysis is complicated to analyze. When an APK file is large or many APKs must be checked, searching for vulnerable function call sequences is inefficient, and some asynchronous function calls may be missed.
Disclosure of Invention
In view of the above defects, the invention provides a vulnerability detection system and a vulnerability detection method for Android hybrid application code injection. It aims to solve the technical problem that existing vulnerability detection systems detect vulnerabilities by constructing a function call graph and searching it for attack paths, which makes existing detection methods complex.
To achieve the above object, according to one aspect of the present invention, there is provided a vulnerability detection system for Android hybrid application code injection, including:
a permission feature extraction module, used for extracting the sensitive permission request set of the hybrid application to be detected from the hybrid application code to be detected;
a data channel feature extraction module, used for extracting the source point set and the receiving point set of the data channels from the hybrid application code to be detected; and
a vulnerability detection module, whose first input end is connected to the output end of the permission feature extraction module and whose second input end is connected to the output end of the data channel feature extraction module, used for judging, with a vulnerability detection model and according to the sensitive permission request set and the source point set and receiving point set of the data channels, whether the hybrid application to be detected has a code injection vulnerability;
wherein the vulnerability detection model is obtained by learning the features of vulnerability-free hybrid application code and the features of vulnerable hybrid application code, the features comprising the sensitive permission request set and the source point set and receiving point set of the data channels.
Preferably, the permission feature extraction module comprises a first data preprocessing unit and a sensitive permission feature extraction unit, the input end of the sensitive permission feature extraction unit being connected to the output end of the first data preprocessing unit;
the first data preprocessing unit is used for extracting the application configuration code from the hybrid application code to be detected, and the sensitive permission feature extraction unit is used for extracting the sensitive permission request set from that configuration code according to a preset sensitive permission set.
Preferably, the data channel feature extraction module comprises a second data preprocessing unit and a channel head and end point extraction unit, the input end of the channel head and end point extraction unit being connected to the output end of the second data preprocessing unit;
the second data preprocessing unit is used for decomposing the code that implements the application interface and business logic into a plurality of code fragments according to calling relationships; the channel head and end point extraction unit is used for extracting, from each code fragment, the plug-in functions as source points of the fragment's data channel and the APIs that trigger script code in the data transmitted by the plug-in as receiving points of the fragment's data channel, and for combining the source points and receiving points of all code fragments and outputting them as the source point set and receiving point set of the data channels.
Preferably, the data channel feature extraction module further includes a data filtering unit whose output end is connected to the input end of the second data preprocessing unit, used for removing plug-in library function code, jQuery API function code, comment code and obfuscated code from the application interface and business logic code.
As another aspect of the present invention, the present invention provides a vulnerability detection method for Android mixed application code injection, including the following steps:
step S110: extracting the sensitive permission request set of the hybrid application to be detected from the application configuration code of the hybrid application code to be detected; extracting the source point set and the receiving point set of the data channels from the code that implements the application interface and business logic of the hybrid application to be detected;
step S120: judging, with a vulnerability detection model and according to the sensitive permission request set and the source point set and receiving point set of the data channels, whether the hybrid application to be detected has a code injection vulnerability;
wherein the vulnerability detection model is obtained by learning the features of vulnerability-free hybrid application code and the features of vulnerable hybrid application code, the features comprising the sensitive permission request set and the source point set and receiving point set of the data channels.
Preferably, in step S110, the sensitive permission request set is extracted from the application configuration code according to a preset sensitive permission set; the preset sensitive permission set is obtained from the correspondence between the plug-in APIs used when JavaScript code in WebView accesses mobile device resources and the permissions those APIs request.
Preferably, step S110 obtains the source points and receiving points of the data channels by:
step S111: fragmenting the application interface and business logic code according to mutual calling relationships, and outputting the code fragments;
step S112: extracting plug-in function features from the code fragments according to a preset plug-in set as source points of the fragment data channels, and extracting features of the APIs that trigger script code in the data transmitted by the plug-ins as receiving points of the fragment data channels;
step S113: combining the source points and receiving points of all fragment data channels and outputting them as the source point set and receiving point set of the data channels;
the preset plug-in set consists of the plug-ins that the mobile device needs to use for data transmission with the outside or with other local applications.
Preferably, the following steps are further included before step S111:
removing plug-in library function code, jQuery API function code, comment code and obfuscated code from the code that implements the application interface and business logic of the hybrid application to be detected.
Through the technical scheme, compared with the existing detection technology, the invention has the following beneficial effects:
1. The vulnerability detection system extracts the sensitive permissions and the source points and receiving points of the data channels in hybrid applications as vulnerability detection features, and obtains a vulnerability detection model with a machine learning method; by extracting the same features from the hybrid application code to be detected and applying the vulnerability detection model, it can detect vulnerabilities in the hybrid application effectively.
2. In the vulnerability detection system, the data filtering unit removes plug-in library function code, jQuery API function code, comment code and obfuscated code from the application interface and business logic code, so that the system resists some forms of code obfuscation and detects vulnerabilities more efficiently.
3. The vulnerability detection system combines code fragmentation with the extraction of injection channels and of the unsafe APIs that trigger malicious code, so that source point and receiving point features are found more effectively, which improves both the effectiveness of feature extraction and the accuracy of vulnerability detection.
Drawings
FIG. 1 is an overall framework diagram of a vulnerability detection system for Android hybrid application code injection provided by the present invention;
FIG. 2 is a processing flow chart of the vulnerability detection method for Android hybrid application code injection provided by the invention;
FIG. 3 is a detailed flowchart of step S210 in the vulnerability detection method provided in the embodiment of the present invention;
FIG. 4 is a detailed flowchart of step S220 in the vulnerability detection method provided in the embodiment of the present invention;
FIG. 5 is a detailed flowchart of step S230 in the vulnerability detection method according to the embodiment of the present invention;
FIG. 6 is a detailed flowchart of step S240 in the vulnerability detection method according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and improvements of the present invention more comprehensible, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The technical terms used in the present invention are explained first:
Android OS: a Linux kernel-based mobile operating system developed by Google, which holds a high share of the mobile operating system market owing to its openness;
Android hybrid application: an Android hybrid application relies mainly on mutual calls between JS and native code. At the development level it realizes a "develop once, run on multiple platforms" mechanism, so it is well suited to cross-platform development and combines the good user experience of native applications with the low cost of cross-platform development in HTML5. An Android hybrid application mainly comprises an HTML5 cloud website and an APP client, and its implementation has two main parts. The first part is a WebView component made up of HTML, CSS and JavaScript code; within the application, WebView uses the WebKit engine to present data and content in a Web page and a JavaScript engine to process JS code. The second part is an intermediate framework implemented in native code, whose APIs can access the resources of the device and interact with the application.
WebView: WebView is a very practical Android component. Like Chrome, it is based on the WebKit web page rendering engine, and it can conveniently display an application's interface by loading HTML data. WebView can also interact with Android native code through functional plug-ins and thereby acquire local resource data.
As the control that hosts a web page, WebView raises events during page display and calls back into the application, so that the application can handle the tasks it cares about (such as page loading speed, page loading errors and other events) while the page is loading.
Hybrid application development framework: a hybrid application integrates Web technologies (HTML, CSS and JavaScript) and can call locally implemented device resource functions of the Android system through plug-in APIs. In general, therefore, a hybrid application framework provides two functions: rendering Web pages locally and calling local Java code from JavaScript. A hybrid application framework (e.g., PhoneGap) consists of two major parts. The bridge module lets WebView interact with the outside, so that JavaScript code in WebView can call external Java functions through interfaces. The plug-in module accesses the device resources of the Android system, mainly through Java APIs. PhoneGap provides common functional plug-ins such as SMS, file, camera, address book, WiFi and two-dimensional code, and developers can also create their own plug-ins to suit their business requirements.
Code injection attacks: in a hybrid application, the addJavascriptInterface method provided by WebView supports mutual calls between JS and local Java. WebView can therefore bind a local Java class, so that JS code can acquire device resources directly by calling functional plug-ins. An attacker can embed malicious code in the data of a resource channel, so that the malicious code is triggered when WebView displays the content, which results in a code injection attack.
The main cause of code injection vulnerabilities in Android hybrid applications is that WebView breaks through the sandbox protection of a traditional Web browser, so that WebView can access local resource files. As described above, a hybrid application supports mutual calls between JS and local Java through the addJavascriptInterface method provided by WebView, so JS code can acquire device resources directly by calling functional plug-ins. Moreover, once WebView binds a Java object, every Web page loaded in that WebView obtains the right to access the resource without any further requirement, which breaks the same-origin policy of traditional browsers. Data can reach the application through two-dimensional codes, WiFi, files, the address book and other channels, so the forms of code injection attack become diverse.
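The mechanism can be illustrated with a small Python analogue. The real attack runs JavaScript inside WebView; this sketch only shows why channel data inserted into a page as markup becomes active content, while escaped data stays inert. The function names and the payload are hypothetical and do not come from the patent.

```python
import html


def render_unsafe(channel_data: str) -> str:
    # Analogue of writing untrusted channel data into the page as markup:
    # any tags inside the data become part of the page.
    return "<li>" + channel_data + "</li>"


def render_escaped(channel_data: str) -> str:
    # Escaping turns markup characters into text, so the data is only displayed.
    return "<li>" + html.escape(channel_data) + "</li>"


# Hypothetical payload, e.g. a WiFi name or two-dimensional code content
# chosen by an attacker.
payload = "<img src=x onerror=alert(1)>"

unsafe_page = render_unsafe(payload)
safe_page = render_escaped(payload)
```

In the unsafe variant the `<img>` tag survives intact and a browser component would run its `onerror` handler; in the escaped variant the same bytes are shown as plain text.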
To address these problems, the present work first analyzes the general architecture of HTML5-based Android hybrid applications, the principle and common function interfaces of the WebView component in hybrid applications, and PhoneGap, the main third-party plug-in framework used in Android hybrid applications. It then targets the security problems, such as private data leakage and theft of account passwords, caused by code injection attacks that may occur when the third-party plug-in framework of a hybrid application exchanges data with the outside and WebView renders Web content in a Web page.
To this end, the invention provides an Android hybrid application code injection vulnerability detection system aimed at hybrid applications built on the PhoneGap development framework. The vulnerability detection system includes a permission feature extraction module 110, a data channel feature extraction module 120 and a vulnerability detection module 130. The output end of the permission feature extraction module 110 is connected to a first input end of the vulnerability detection module 130, and the output end of the data channel feature extraction module 120 is connected to a second input end of the vulnerability detection module 130.
The permission feature extraction module 110 is configured to extract the sensitive permission request set of the hybrid application to be detected from the hybrid application code to be detected. The data channel feature extraction module 120 is configured to extract the source point set and the receiving point set of the data channels from the hybrid application code to be detected. The vulnerability detection module 130 uses a vulnerability detection model to determine whether the hybrid application to be detected has a code injection vulnerability, according to the sensitive permission request set output by the permission feature extraction module 110 and the source point set and receiving point set output by the data channel feature extraction module 120. The vulnerability detection model is obtained by learning the features of vulnerability-free hybrid application code and the features of vulnerable hybrid application code, the features including the sensitive permission request set and the source point set and receiving point set of the data channels.
In this vulnerability detection system, the permission feature extraction module 110 extracts the sensitive permission requests and the data channel feature extraction module 120 extracts the data channels, so only a subset of the features of the hybrid application code to be detected is examined. A vulnerability detection model obtained by training judges whether these features are abnormal and thereby detects vulnerabilities. The function call graph of the application to be detected does not need to be built, which reduces the complexity of vulnerability detection.
In the embodiment provided by the invention, the permission feature extraction module 110 comprises a first data preprocessing unit and a sensitive permission feature extraction unit, the output end of the first data preprocessing unit being connected to the input end of the sensitive permission feature extraction unit. The first data preprocessing unit is used for extracting the application configuration code from the hybrid application code to be detected, and the sensitive permission feature extraction unit is used for extracting the sensitive permission request set from the application configuration code according to a preset sensitive permission set. The preset sensitive permission set is obtained from the correspondence between the plug-in APIs used when JavaScript code in WebView accesses mobile device resources and the permissions those APIs request.
In the embodiment provided by the present invention, the data channel feature extraction module 120 includes a second data preprocessing unit, a channel head and end point extraction unit, and a data filtering unit. The output end of the data filtering unit is connected to the input end of the second data preprocessing unit, and the output end of the second data preprocessing unit is connected to the input end of the channel head and end point extraction unit. The data filtering unit is configured to remove plug-in library function code, jQuery API functions, obfuscated code and comment code from the code that implements the application interface and business logic. The second data preprocessing unit is configured to decompose the application interface and business logic code output by the data filtering unit into a plurality of code fragments according to calling relationships. The channel head and end point extraction unit is configured to extract the source points and receiving points of the fragment data channel from each code fragment.
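A minimal sketch of what the data filtering unit might do is given below, assuming the decompiled application is available as a mapping from file path to JavaScript source. The file-name patterns used to recognize library code are hypothetical, the comment stripping is deliberately naive (it does not parse string literals), and obfuscated-code removal is not attempted.

```python
import re

# Hypothetical name patterns for framework and library files; a real
# preset list would have to be curated for the framework in use.
LIBRARY_FILE_PATTERNS = (r"jquery", r"cordova", r"phonegap")

_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT = re.compile(r"^\s*//.*$", re.MULTILINE)


def is_library_file(path: str) -> bool:
    """Return True if the file name looks like bundled library code."""
    name = path.rsplit("/", 1)[-1].lower()
    return any(re.search(pattern, name) for pattern in LIBRARY_FILE_PATTERNS)


def strip_comments(js_code: str) -> str:
    """Remove block comments and whole-line // comments (naive)."""
    without_blocks = _BLOCK_COMMENT.sub("", js_code)
    return _LINE_COMMENT.sub("", without_blocks)


def filter_sources(files: dict[str, str]) -> dict[str, str]:
    """Drop library files and strip comments from the remaining files."""
    return {
        path: strip_comments(code)
        for path, code in files.items()
        if not is_library_file(path)
    }


example_files = {
    "www/js/jquery.min.js": "/* library code */",
    "www/js/app.js": "// entry point\nvar a = 1; /* note */\nvar u = 'http://x';\n",
}
filtered = filter_sources(example_files)
```

Only whole-line `//` comments are removed so that URLs such as `http://x` inside string literals are left untouched; a production filter would more likely use a JavaScript parser.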
In the above embodiment provided by the present invention, the source point of a data channel is a plug-in function in the code fragment, and the receiving point of the data channel is an API that triggers script code contained in the data transmitted by the plug-in.
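Source and receiving point extraction can be sketched as pattern matching over code fragments. The patent does not list its preset plug-in functions or unsafe APIs, so the patterns below (a barcode scanner call, `readAsText`, `innerHTML`, `document.write`, jQuery `.html()` and `.append()`) are illustrative assumptions, and the feature names are hypothetical.

```python
import re

# Illustrative source patterns: plug-in calls that bring external data in.
SOURCE_PATTERNS = {
    "barcode_scan": r"barcodeScanner\.scan\s*\(",
    "file_read": r"\.readAsText\s*\(",
    "contacts_find": r"navigator\.contacts\.find\s*\(",
}

# Illustrative receiving point patterns: APIs that interpret data as markup.
SINK_PATTERNS = {
    "innerHTML": r"\.innerHTML\s*=",
    "document_write": r"document\.write(ln)?\s*\(",
    "jquery_html": r"\.html\s*\(\s*[^)\s]",
    "jquery_append": r"\.append\s*\(",
}


def match_features(fragment: str, patterns: dict[str, str]) -> set[str]:
    """Return the names of all patterns that occur in the fragment."""
    return {name for name, pattern in patterns.items() if re.search(pattern, fragment)}


def extract_channel_points(fragments: list[str]) -> tuple[set[str], set[str]]:
    """Combine the source and receiving points found in all fragments."""
    sources: set[str] = set()
    sinks: set[str] = set()
    for fragment in fragments:
        sources |= match_features(fragment, SOURCE_PATTERNS)
        sinks |= match_features(fragment, SINK_PATTERNS)
    return sources, sinks


example_fragments = [
    "cordova.plugins.barcodeScanner.scan(function (r) {"
    " document.getElementById('out').innerHTML = r.text; });",
    "function add(a, b) { return a + b; }",
]
found_sources, found_sinks = extract_channel_points(example_fragments)
```

Here the first fragment contributes the source `barcode_scan` and the receiving point `innerHTML`, while the second fragment contributes nothing.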
Following the direction of data flow, when a hybrid application acquires external data and WebView renders the HTML page, the unsafe APIs used to display Web content may trigger malicious code embedded in that data and cause a code injection attack. Accordingly, based on the two ends of a code injection vulnerability, the source point and the receiving point, the invention extracts the plug-in functions of the data channels and the APIs that trigger script code in the data transmitted by the plug-ins, together with the sensitive permission requests of the application, to form feature vectors. Detecting vulnerabilities with these feature vectors improves both the accuracy and the efficiency of vulnerability detection.
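One plausible encoding of the feature vector, not specified in the patent, is a binary vector with one position per known permission, source point and receiving point. The vocabularies below are hypothetical placeholders.

```python
# Hypothetical vocabularies; real ones would come from the preset
# sensitive permission set and the preset plug-in and API sets.
PERM_VOCAB = ["android.permission.CAMERA", "android.permission.READ_CONTACTS"]
SOURCE_VOCAB = ["barcode_scan", "file_read"]
SINK_VOCAB = ["innerHTML", "document_write"]


def build_feature_vector(permissions, sources, sinks,
                         perm_vocab=PERM_VOCAB,
                         source_vocab=SOURCE_VOCAB,
                         sink_vocab=SINK_VOCAB):
    """Encode the three feature sets as one binary vector (1 = present)."""
    vector = []
    for present, vocab in ((permissions, perm_vocab),
                           (sources, source_vocab),
                           (sinks, sink_vocab)):
        vector.extend(1 if name in present else 0 for name in vocab)
    return vector


example_vector = build_feature_vector(
    permissions={"android.permission.CAMERA"},
    sources={"barcode_scan"},
    sinks={"innerHTML"},
)
```

A fixed vocabulary order keeps vectors from different applications comparable, which is what a classifier needs.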
The invention provides a vulnerability detection method for Android mixed application code injection, which comprises the following steps as shown in FIG. 2:
step S110: extracting the sensitive permission request set of the hybrid application to be detected from the application configuration code of the hybrid application code to be detected; extracting the source point set and the receiving point set of the data channels from the code that implements the application interface and business logic of the hybrid application to be detected;
step S120: judging, with a vulnerability detection model and according to the sensitive permission request set and the source point set and receiving point set of the data channels, whether the hybrid application to be detected has a code injection vulnerability;
wherein the vulnerability detection model is obtained by learning the features of vulnerability-free hybrid application code and the features of vulnerable hybrid application code, the features comprising the sensitive permission request set and the source point set and receiving point set of the data channels.
The step S110 of extracting the source point set and the receiving point set of the data channel includes the following steps:
step S111: removing PhoneGap framework plug-in library functions and jQuery API functions from the code that implements the application interface and business logic of the hybrid application to be detected, and removing comments and obfuscated code. The PhoneGap framework plug-in library functions comprise JavaScript library functions, Java code library functions and PhoneGap framework library functions.
step S112: fragmenting the application interface and business logic code that remains after removal; that is, all code linked by calling relationships is grouped into the same fragment.
step S113: extracting plug-in function features from the code fragments according to a preset plug-in set as source points of the data channels, and extracting features of the APIs that can trigger script code in the data transmitted by the plug-ins as receiving points of the data channels.
The preset plug-in set consists of the PhoneGap plug-ins that the mobile device needs to use for data transmission with the outside or with other local applications.
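The fragmentation step can be read as grouping functions into connected components of the call relation. The sketch below takes that reading, which is an interpretation rather than the patent's stated algorithm, and assumes the functions have already been split out into a name-to-body mapping; calls are detected with a simple regular expression rather than a parser.

```python
import re


def fragment_by_calls(functions: dict[str, str]) -> list[set[str]]:
    """Group function names so that callers and callees share a fragment."""
    parent = {name: name for name in functions}

    def find(name: str) -> str:
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for caller, body in functions.items():
        for callee in functions:
            if callee == caller:
                continue
            # A textual "callee(" in the body is treated as a call.
            if re.search(r"\b" + re.escape(callee) + r"\s*\(", body):
                union(caller, callee)

    groups: dict[str, set[str]] = {}
    for name in functions:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


example_functions = {
    "onScan": "show(result.text);",
    "show": "el.innerHTML = t;",
    "helper": "return 1;",
}
example_groups = fragment_by_calls(example_functions)
```

In this example `onScan` and `show` land in one fragment because `onScan` calls `show`, so a source in one and a receiving point in the other are seen together; `helper` forms its own fragment.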
In step S110, the preset sensitive permission set is determined from the correspondence between the plug-in APIs used when JavaScript code in WebView accesses device resources and the permissions those APIs request. All permissions that the hybrid application to be detected may use when transmitting data over external and internal channels are then extracted and compared with the preset sensitive permission set. The permissions requested by the hybrid application that also appear in the preset sensitive permission set form the sensitive permission request set of the hybrid application to be detected.
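The comparison described above amounts to a set intersection. The sketch below assumes the application configuration code is a decoded, plain-text `AndroidManifest.xml` (the binary manifest inside an APK must be decoded first) and uses a hypothetical preset sensitive permission set.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical preset; the patent derives its set from the plug-in APIs.
PRESET_SENSITIVE = {
    "android.permission.CAMERA",
    "android.permission.READ_CONTACTS",
    "android.permission.READ_SMS",
    "android.permission.ACCESS_WIFI_STATE",
}


def requested_permissions(manifest_xml: str) -> set[str]:
    """Collect the android:name of every <uses-permission> element."""
    root = ET.fromstring(manifest_xml)
    key = "{" + ANDROID_NS + "}name"
    return {el.attrib[key] for el in root.iter("uses-permission") if key in el.attrib}


def sensitive_permission_set(manifest_xml: str, preset=PRESET_SENSITIVE) -> set[str]:
    """Requested permissions that also appear in the preset sensitive set."""
    return requested_permissions(manifest_xml) & preset


EXAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <uses-permission android:name="android.permission.CAMERA"/>
  <uses-permission android:name="android.permission.INTERNET"/>
</manifest>"""

example_sensitive = sensitive_permission_set(EXAMPLE_MANIFEST)
```

For the example manifest only `android.permission.CAMERA` is kept, because `android.permission.INTERNET` is not in the assumed preset.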
In step S120, the vulnerability detection model is obtained as follows:
an initial vulnerability detection model is obtained by training machine learning algorithms such as naive Bayes, SVM, decision trees and random forests on the features of vulnerability-free hybrid application code and of vulnerable hybrid application code, wherein the features comprise the sensitive permission application set and the source point set and receiving point set of the data channel.
The remaining features are then classified and predicted, the features are combined and screened according to the feature data scatter plots and distribution plots, the prediction model is optimized, and finally the prediction results are evaluated and analyzed.
The Android hybrid application code injection vulnerability detection method provided by the invention analyzes the principles of WebView and of the development framework of Android hybrid applications, and identifies how an attacker can embed malicious code by exploiting the mutual calls between JavaScript and native Java. The method is implemented by the following steps: decompiling the hybrid application, extracting the AndroidManifest.xml file, the HTML files and the JavaScript files, and performing code fragmentation; extracting the sensitive permission applications from AndroidManifest.xml, analyzing the PhoneGap framework plug-in APIs called when WebView accesses device resources, analyzing the unsafe APIs that trigger malicious code when WebView displays content, extracting the vulnerability features and generating feature vectors; and training on the feature vectors with machine learning algorithms, constructing a classification model, and predicting and evaluating on the test set with the classification model. Compared with traditional detection methods based on control flow and program call graphs, the method is more efficient; it abstracts the features of code injection vulnerabilities into a classification prediction model with high classification accuracy and good usability.
The Android hybrid application code injection vulnerability detection method provided by the embodiment of the invention comprises the following steps:
Step S210: obtaining the Android hybrid application code by the steps shown in FIG. 3:
Step S211: downloading, from the Google Play application market, Android hybrid applications based on HTML5 web standard technology and a data set of Android hybrid applications with code injection vulnerabilities.
Step S212: extracting the hybrid applications based on the PhoneGap hybrid application framework.
Step S213: decompiling the APK of each hybrid application based on the PhoneGap hybrid application framework to obtain the resource files and source code files of the application. Specifically, the APK is decompiled with apktool using the command "apktool d -f {apkPath} -o {outputPath}", which yields smali files, resource files, HTML files, JavaScript files and other files.
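The decompilation command in step S213 could be wrapped as in the following Python sketch. The helper names build_apktool_command and decompile are hypothetical, and the sketch assumes apktool is installed and available on the PATH; only the argument list mirrors the command quoted above.

```python
import subprocess


def build_apktool_command(apk_path: str, output_path: str) -> list[str]:
    """Return the argument list for `apktool d -f <apk> -o <out>`."""
    # "d" decodes the APK, "-f" overwrites an existing output directory,
    # and "-o" names the output directory.
    return ["apktool", "d", "-f", apk_path, "-o", output_path]


def decompile(apk_path: str, output_path: str) -> None:
    """Run apktool; raises CalledProcessError if apktool exits non-zero."""
    subprocess.run(build_apktool_command(apk_path, output_path), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when an APK path contains spaces.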
Step S214: analyzing the APK package structure, and extracting the AndroidManifest.xml file that configures the application and the HTML and JavaScript files that implement the application interface and business logic.
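A minimal sketch of step S214, assuming the apktool output directory already exists. The function name collect_hybrid_files and the dictionary keys are illustrative and are not part of the described method.

```python
from pathlib import Path


def collect_hybrid_files(decoded_dir: str) -> dict[str, list[Path]]:
    """Group the files of interest found under an apktool output directory."""
    root = Path(decoded_dir)
    found: dict[str, list[Path]] = {"manifest": [], "html": [], "js": []}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if path.name == "AndroidManifest.xml":
            found["manifest"].append(path)
        elif path.suffix.lower() in (".html", ".htm"):
            found["html"].append(path)
        elif path.suffix.lower() == ".js":
            found["js"].append(path)
    return found
```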
Step S220: preprocessing the Android hybrid application code by the steps shown in FIG. 4, wherein the preprocessing comprises code fragmentation and acquisition of the permission application set of the hybrid application to be detected;
the code fragmentation comprises the following steps:
Step S221: separating the HTML and JS code implemented by the application according to the file structure, the HTML files' references to the framework plug-ins, and their references to the jQuery library functions;
deleting comments and obfuscated code in the files using the regular expression "/\*{1,2}[\s\S]*?\*\/|\/\/[\s\S]*?\n";
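The comment-removal part of step S221 can be sketched in Python as follows. The pattern is a reconstruction of the regular expression quoted in that step, read as matching /* ... */ and /** ... */ block comments and // line comments. Keeping a newline where a line comment is removed is an assumption added here so that adjacent statements are not joined; the function name strip_comments is hypothetical.

```python
import re

# Block comments (/* ... */ or /** ... */) or line comments (// ... up to the
# end of the line). This is a purely textual match, so "//" inside a string
# literal such as a URL is also treated as a comment.
COMMENT_RE = re.compile(r"/\*{1,2}[\s\S]*?\*/|//[\s\S]*?\n")


def strip_comments(source: str) -> str:
    """Delete block comments; replace each line comment with a newline."""
    return COMMENT_RE.sub(
        lambda match: "\n" if match.group(0).startswith("//") else "",
        source,
    )
```

Because the match is textual, a tokenizer-based approach would be needed to leave string literals containing "//" untouched.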
Step S222: associating each HTML file with the function calls of the JS files it references by using a depth-first traversal algorithm, then merging the functions linked by call relationships, so that all code with call relationships is placed in the same fragment, thereby implementing code fragmentation.
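One way to picture the fragmentation in step S222 is as grouping functions into connected components of a call graph with a depth-first traversal. The sketch below is an illustration under simplifying assumptions, not the patented procedure: function bodies are given as plain strings, a call is detected by a textual "name(" match, and the names build_call_graph and slice_by_calls are hypothetical.

```python
import re


def build_call_graph(functions: dict[str, str]) -> dict[str, set[str]]:
    """Map each function name to the known functions its body calls."""
    graph: dict[str, set[str]] = {name: set() for name in functions}
    for name, body in functions.items():
        for callee in functions:
            pattern = r"\b" + re.escape(callee) + r"\s*\("
            if callee != name and re.search(pattern, body):
                graph[name].add(callee)
    return graph


def slice_by_calls(functions: dict[str, str]) -> list[list[str]]:
    """Group functions linked by calls, in either direction, into fragments."""
    graph = build_call_graph(functions)
    # Make the relation symmetric so caller and callee share a fragment.
    neighbours = {name: set(callees) for name, callees in graph.items()}
    for caller, callees in graph.items():
        for callee in callees:
            neighbours[callee].add(caller)

    fragments: list[list[str]] = []
    seen: set[str] = set()
    for start in functions:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.append(node)
            for nxt in sorted(neighbours[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        fragments.append(sorted(component))
    return fragments
```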
Step S223: extracting the <uses-permission/> permission declarations from the AndroidManifest.xml file to obtain the permission application set of the hybrid application to be detected.
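Step S223 could be sketched with Python's standard XML parser as below, assuming the manifest has already been decoded to text XML by apktool. The function name extract_permissions is hypothetical, and reducing "android.permission.CAMERA" to "CAMERA" is a convenience chosen here so that names line up with the short permission names used in step S233.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"


def extract_permissions(manifest_xml: str) -> list[str]:
    """Return the short names of all <uses-permission> entries, in order."""
    root = ET.fromstring(manifest_xml)
    names = []
    for element in root.iter("uses-permission"):
        # The attribute is android:name, stored under the Android namespace.
        full_name = element.get(f"{{{ANDROID_NS}}}name", "")
        if full_name:
            # "android.permission.CAMERA" -> "CAMERA"
            names.append(full_name.rsplit(".", 1)[-1])
    return names
```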
Step S224: merging the code fragments output in step S222 with the permission application set output in step S223, and outputting the merged result as a target file.
Step S230: extracting features by the steps shown in FIG. 5, including the sensitive permission application set and the source point set and receiving point set of the data channel:
Step S231: according to the functional plug-ins in the PhoneGap framework, constructing a preset plug-in function set containing Camera, Contacts, File, MediaCapture, SMS, Bluetooth, NFC and BarcodeServer; extracting all plug-in functions of a code fragment and comparing them with the preset plug-in function set; taking each plug-in function that belongs to the preset set as a source point of the data channel of the code fragment and discarding the others; and finally constructing the source point features of the data channel of the code fragment;
Step S232: because certain DOM and jQuery library functions may trigger malicious code carried in data when WebView displays content, constructing the function feature set "document.write(), document.innerHTML, html(), append(), prepend(), before() and replaceAll()", extracting these features from the code, and constructing the receiving point feature set of the data channel of the code fragment;
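Steps S231 and S232 can be illustrated together with a small pattern-matching sketch. The source does not say how plug-in calls are recognized in a code fragment, so every regular expression in SOURCE_PATTERNS is an assumption made for illustration (for example, matching the BarcodeServer plug-in through a barcodeScanner identifier). The sink list follows step S232, reading the jQuery insertion functions as append() and prepend(). The function name extract_channel_endpoints is hypothetical.

```python
import re

# Hypothetical recognizers for the preset plug-in set of step S231.
SOURCE_PATTERNS = {
    "Camera": r"\bnavigator\.camera\b",
    "Contacts": r"\bnavigator\.contacts\b",
    "File": r"\brequestFileSystem\b",
    "MediaCapture": r"\bnavigator\.device\.capture\b",
    "SMS": r"\bsms\.",
    "Bluetooth": r"\bbluetooth\w*\.",
    "NFC": r"\bnfc\.",
    "BarcodeServer": r"\bbarcodeScanner\b",
}

# Textual recognizers for the content-display APIs of step S232.
SINK_PATTERNS = {
    "document.write": r"\bdocument\.write\s*\(",
    "innerHTML": r"\.innerHTML\b",
    "html": r"\.html\s*\(",
    "append": r"\.append\s*\(",
    "prepend": r"\.prepend\s*\(",
    "before": r"\.before\s*\(",
    "replaceAll": r"\.replaceAll\s*\(",
}


def extract_channel_endpoints(fragment: str) -> tuple[set[str], set[str]]:
    """Return (source points, receiving points) found in one code fragment."""
    sources = {n for n, p in SOURCE_PATTERNS.items() if re.search(p, fragment)}
    sinks = {n for n, p in SINK_PATTERNS.items() if re.search(p, fragment)}
    return sources, sinks
```

These matches are heuristic: for instance, ".append(" also matches non-jQuery methods with the same name.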
Step S233: extracting the sensitive permission application set as follows:
determining a preset sensitive permission set according to the correspondence between the plug-in APIs used when JavaScript code in WebView accesses device resources and the permission applications they require, wherein the preset sensitive permission set in this embodiment is:
CAMERA, ACCESS_FINE_LOCATION, RECORD_AUDIO, ACCESS_COARSE_LOCATION, READ_CONTACTS, BLUETOOTH, ACCESS_NETWORK_STATE, WRITE_EXTERNAL_STORAGE, RECORD_VIDEO, MODIFY_AUDIO_SETTINGS, GET_ACCOUNTS;
comparing all permission applications extracted from AndroidManifest.xml with the preset sensitive permission set; taking each permission application that exists in the preset sensitive permission set as a sensitive permission application of the hybrid application to be detected and discarding the others; and finally constructing the sensitive permission application set of the hybrid application to be detected.
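The comparison in step S233 amounts to a set intersection, as the following sketch shows. The permission names are copied from the preset set listed in this embodiment; the function name sensitive_permissions and the sorted output are choices made here for illustration.

```python
# Preset sensitive permission set as listed in the embodiment.
PRESET_SENSITIVE = frozenset({
    "CAMERA", "ACCESS_FINE_LOCATION", "RECORD_AUDIO",
    "ACCESS_COARSE_LOCATION", "READ_CONTACTS", "BLUETOOTH",
    "ACCESS_NETWORK_STATE", "WRITE_EXTERNAL_STORAGE", "RECORD_VIDEO",
    "MODIFY_AUDIO_SETTINGS", "GET_ACCOUNTS",
})


def sensitive_permissions(requested: list[str]) -> list[str]:
    """Keep only the requested permissions that are in the preset set."""
    return sorted(set(requested) & PRESET_SENSITIVE)
```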
Step S234: constructing a 0/1 feature vector from the extracted features, wherein each bit indicates whether the corresponding feature is used, the features being the sensitive permissions, the plug-in functions, and the APIs that trigger script code carried in the data transmitted by the plug-ins.
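A sketch of the 0/1 encoding in step S234 follows. The ordering of the bits (permissions, then plug-ins, then trigger APIs) is an assumption, since the source does not fix a layout, and the trigger API names reflect one reading of the list in step S232. The function name to_feature_vector is hypothetical.

```python
PERMISSIONS = [
    "CAMERA", "ACCESS_FINE_LOCATION", "RECORD_AUDIO",
    "ACCESS_COARSE_LOCATION", "READ_CONTACTS", "BLUETOOTH",
    "ACCESS_NETWORK_STATE", "WRITE_EXTERNAL_STORAGE", "RECORD_VIDEO",
    "MODIFY_AUDIO_SETTINGS", "GET_ACCOUNTS",
]
PLUGINS = [
    "Camera", "Contacts", "File", "MediaCapture",
    "SMS", "Bluetooth", "NFC", "BarcodeServer",
]
SINK_APIS = [
    "document.write", "innerHTML", "html", "append",
    "prepend", "before", "replaceAll",
]


def to_feature_vector(permissions: set[str], sources: set[str],
                      sinks: set[str]) -> list[int]:
    """Encode presence (1) or absence (0) of every known feature."""
    return (
        [int(name in permissions) for name in PERMISSIONS]
        + [int(name in sources) for name in PLUGINS]
        + [int(name in sinks) for name in SINK_APIS]
    )
```

For example, an application that requests CAMERA, calls the Camera plug-in and writes results with html() would have exactly three bits set.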
Step S240: obtaining the vulnerability detection model and performing vulnerability prediction by the steps shown in FIG. 6:
Step S241: training on the previously constructed feature vectors with several machine learning algorithms, such as naive Bayes, SVM, decision tree, random tree and random forest, to construct vulnerability detection models.
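To make step S241 concrete, the sketch below trains one of the named algorithms, naive Bayes, on 0/1 feature vectors using only the Python standard library. It is a generic Bernoulli naive Bayes with Laplace smoothing and is not claimed to match the implementation or parameters used in the embodiment; the other algorithms listed would in practice come from a machine learning library. Labels are assumed to be 1 for vulnerable and 0 for vulnerability-free.

```python
import math


class BernoulliNaiveBayes:
    """Naive Bayes for 0/1 feature vectors with Laplace smoothing."""

    def fit(self, X: list[list[int]], y: list[int]) -> "BernoulliNaiveBayes":
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior = {}
        self.prob_one = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # P(feature_j = 1 | class c), smoothed so it is never 0 or 1.
            self.prob_one[c] = [
                (sum(row[j] for row in rows) + 1) / (len(rows) + 2)
                for j in range(n_features)
            ]
        return self

    def predict(self, x: list[int]) -> int:
        def log_posterior(c: int) -> float:
            score = self.log_prior[c]
            for value, p in zip(x, self.prob_one[c]):
                score += math.log(p if value else 1.0 - p)
            return score

        return max(self.classes, key=log_posterior)
```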
Step S242: performing vulnerability prediction using the features of the vulnerability-free hybrid application code and of the vulnerable hybrid application code used for training, counting the TP, TN, FP and FN values for the hybrid application code, and calculating the accuracy, recall, precision and F value of the vulnerability prediction results;
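The evaluation metrics in step S242 can be computed from the four counts as sketched below. The sketch assumes label 1 means vulnerable and label 0 means vulnerability-free, and it takes the "F value" to be the F1 score (the harmonic mean of precision and recall), which the source does not state explicitly. The function name evaluate is hypothetical.

```python
def evaluate(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compute TP/TN/FP/FN and the derived metrics for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_value = (
        2 * precision * recall / (precision + recall)
        if precision + recall else 0.0
    )
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "F": f_value,
    }
```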
analyzing the vulnerability prediction results together with the data scatter plots and distribution plots of each hybrid application code feature, screening and combining the features, and adjusting the feature parameters to obtain a plurality of optimized prediction models.
Step S243: running the optimized models on the features of the test hybrid application code, comparing their prediction performance, and selecting the best optimized prediction model as the final vulnerability detection model.
Step S244: judging, by means of the vulnerability detection model, whether the hybrid application to be detected has a code injection vulnerability according to its sensitive permission application set and the source point set and receiving point set of its data channel.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. A vulnerability detection system for Android hybrid application code injection, characterized by comprising:
a permission feature extraction module (110), used for extracting the sensitive permission application set of the hybrid application to be detected from the configuration application code of the hybrid application code to be detected;
a data channel feature extraction module (120), used for extracting the source point set and the receiving point set of the data channel from the code implementing the application interface and business logic in the hybrid application code to be detected; and
a vulnerability detection module (130), having a first input connected to the output of the permission feature extraction module (110) and a second input connected to the output of the data channel feature extraction module (120), used for judging, by means of a vulnerability detection model, whether the hybrid application to be detected has a code injection vulnerability according to the sensitive permission application set and the source point set and receiving point set of the data channel;
the vulnerability detection model is obtained by learning from the features of vulnerability-free hybrid application code and the features of vulnerable hybrid application code, wherein the features comprise the sensitive permission application set and the source point set and receiving point set of the data channel;
the data channel feature extraction module (120) comprises a second data preprocessing unit and a channel head and tail point extraction unit whose input is connected to the output of the second data preprocessing unit;
the second data preprocessing unit is used for decomposing the code implementing the application interface and business logic into a plurality of code fragments according to call relationships; the channel head and tail point extraction unit is used for extracting the plug-in functions from each code fragment as source points of the data channel of that fragment, taking the APIs that trigger script code carried in the data transmitted by the plug-ins as receiving points of the data channel of that fragment, and merging the source points and receiving points of the data channels of all code fragments into the source point set and receiving point set of the data channel for output;
the specific manner in which the permission feature extraction module (110) extracts the sensitive permission application set of the hybrid application to be detected from the configuration application code of the hybrid application code to be detected comprises: determining a preset sensitive permission set according to the correspondence between the plug-in APIs used when JavaScript code in WebView accesses device resources and the permission applications they require; extracting all permission applications that the hybrid application to be detected may use when transmitting data over external and internal channels; comparing these permission applications with the preset sensitive permission set; and taking the permission applications that also appear in the preset sensitive permission set as the sensitive permission application set of the hybrid application to be detected;
the vulnerability detection model is obtained by:
training on feature vectors formed from the sensitive permission application set and the source point set and receiving point set of the data channel with a plurality of machine learning algorithms, namely naive Bayes, SVM, decision tree, random tree and random forest, to construct vulnerability detection models;
performing vulnerability prediction using the features of the vulnerability-free hybrid application code and of the vulnerable hybrid application code used for training, counting the TP, TN, FP and FN values for the hybrid application code, and calculating the accuracy, recall, precision and F value of the vulnerability prediction results;
analyzing the vulnerability prediction results together with the data scatter plots and distribution plots of each hybrid application code feature, screening and combining the features, and adjusting the feature parameters to obtain a plurality of optimized prediction models;
and running the optimized models on the features of the test hybrid application code, comparing their prediction performance, and selecting the best optimized prediction model as the final vulnerability detection model.
2. The vulnerability detection system of claim 1, wherein the permission feature extraction module (110) comprises a first data preprocessing unit and a sensitive permission feature extraction unit whose input is connected to the output of the first data preprocessing unit;
the first data preprocessing unit is used for extracting the configuration application code from the hybrid application code to be detected, and the sensitive permission feature extraction unit is used for extracting the sensitive permission application set from the configuration application code of the hybrid application code to be detected according to a preset sensitive permission set.
3. The vulnerability detection system of claim 1 or 2, wherein the data channel feature extraction module (120) further comprises a data filtering unit whose output is connected to the input of the second data preprocessing unit, used for removing plug-in library function code, jQuery API function code, comment code and obfuscated code from the code implementing the application interface and business logic.
4. A vulnerability detection method for Android hybrid application code injection, characterized by comprising the following steps:
step S110: extracting the sensitive permission application set of the hybrid application to be detected from the configuration application code of the hybrid application code to be detected; and extracting the source point set and the receiving point set of the data channel from the code implementing the application interface and business logic of the hybrid application code to be detected;
step S120: judging, by means of a vulnerability detection model, whether the hybrid application to be detected has a code injection vulnerability according to the sensitive permission application set and the source point set and receiving point set of the data channel;
the vulnerability detection model is obtained by learning from the features of vulnerability-free hybrid application code and the features of vulnerable hybrid application code, wherein the features comprise the sensitive permission application set and the source point set and receiving point set of the data channel;
step S110 obtains the source point set and the receiving point set of the data channel by:
step S111: performing code fragmentation on the code implementing the application interface and business logic according to the mutual call relationships, and outputting code fragments;
step S112: extracting plug-in function features from the code fragments according to a preset plug-in set to serve as source points of the data channel of each code fragment, and extracting the features of the APIs that trigger script code carried in the data transmitted by the plug-ins to serve as receiving points of the data channel of each code fragment;
step S113: merging the source points and receiving points of the data channels of all code fragments into the source point set and receiving point set of the data channel for output;
the preset plug-in set consists of the plug-ins required for data transmission between the mobile device and the outside or other local applications;
in step S110, the specific manner of extracting the sensitive permission application set of the hybrid application to be detected from the configuration application code of the hybrid application code to be detected comprises: determining a preset sensitive permission set according to the correspondence between the plug-in APIs used when JavaScript code in WebView accesses device resources and the permission applications they require; extracting all permission applications that the hybrid application to be detected may use when transmitting data over external and internal channels; comparing these permission applications with the preset sensitive permission set; and taking the permission applications that also appear in the preset sensitive permission set as the sensitive permission application set of the hybrid application to be detected;
the vulnerability detection model is obtained by:
training on feature vectors formed from the sensitive permission application set and the source point set and receiving point set of the data channel with a plurality of machine learning algorithms, namely naive Bayes, SVM, decision tree, random tree and random forest, to construct vulnerability detection models;
performing vulnerability prediction using the features of the vulnerability-free hybrid application code and of the vulnerable hybrid application code used for training, counting the TP, TN, FP and FN values for the hybrid application code, and calculating the accuracy, recall, precision and F value of the vulnerability prediction results;
analyzing the vulnerability prediction results together with the data scatter plots and distribution plots of each hybrid application code feature, screening and combining the features, and adjusting the feature parameters to obtain a plurality of optimized prediction models;
and running the optimized models on the features of the test hybrid application code, comparing their prediction performance, and selecting the best optimized prediction model as the final vulnerability detection model.
5. The vulnerability detection method of claim 4, wherein in step S110 the sensitive permission application set is extracted from the configuration application code according to a preset sensitive permission set; the preset sensitive permission set is obtained according to the correspondence between the plug-in APIs used when JavaScript code in WebView accesses the resources of the mobile device and the permission applications they require.
6. The vulnerability detection method of claim 4 or 5, further comprising, before step S111, the step of:
removing plug-in library function code, jQuery API function code, comment code and obfuscated code from the code implementing the application interface and business logic of the hybrid application code to be detected.
CN201810473411.0A 2018-05-17 2018-05-17 Vulnerability detection system and method for Android mixed application code injection Active CN108647517B (en)


Publications (2)

Publication Number Publication Date
CN108647517A CN108647517A (en) 2018-10-12
CN108647517B true CN108647517B (en) 2021-02-09





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant