CN110287722B - Sensitive permission extraction method for privacy regulation check in iOS application

Info

Publication number: CN110287722B
Application number: CN201910408770.2A
Authority: CN
Inventors: 徐国爱; 张淼; 贺雪乔; 刘诗楠
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2019-05-13
Filing date: 2019-05-13
Publication date: 2020-11-24
Anticipated expiration: 2039-05-13
Also published as: CN110287722A

Abstract

The invention belongs to the technical field of information security and relates to a sensitive permission extraction method for privacy regulation check in iOS applications. The method comprises the following steps: downloading an application to a mobile phone, decrypting (removing the shell of) the protected application, and transferring the processed IPA package to a PC; decompressing the IPA package, reading the permission information requested by the application, and locating the executable file; statically analyzing the executable file with IDA; establishing a mapping table between sensitive APIs and sensitive permissions by means of text similarity detection; collecting traffic information by combining an automated clicking framework with manual input; identifying class A, B and C sensitive information; outputting how the application uses the sensitive information; and obtaining a mapping table between sensitive information and sensitive permissions. The method combines static detection with traffic detection, which makes up for the inability of static analysis to tell whether a permission is called by a third party, and it can obtain the sensitive APIs called by the application and thereby the sensitive permissions. It also constructs a mapping table between sensitive information in traffic and sensitive permissions.

Description

Sensitive permission extraction method for privacy regulation check in iOS application

Technical Field

The invention belongs to the technical field of software security within information security and relates to a sensitive permission extraction method for privacy regulation check in iOS applications. In particular, it relates to a method for extracting the sensitive permissions that an iOS application calls, for use when checking the consistency between those sensitive permissions and the privacy regulations.

Background

Some tools already exist for checking the consistency between the sensitive permissions of an application and its privacy regulations. The methods they use for processing the text of the privacy regulations, and for the final consistency check between the privacy regulations and the sensitive permissions, are substantially the same.

For privacy regulation analysis, a text analysis method is adopted. It considers not only the privacy regulations provided by the application but also the crawled description of the application, and comprehensively analyzes the sensitive permissions that the application claims to acquire and the sensitive behaviors that the application will produce. In the text analysis, five sentence patterns are screened and processed, negation words are taken into account, and the key verbs in the privacy regulations and their negated forms are analyzed, for example the verbs Collect, Use, Retain and Disclose.

In the consistency check, consistency is judged by looking up the extracted sensitive permissions in an established comparison table of sensitive permissions and behavior descriptions in the privacy regulations. For content that is not in the comparison table, text similarity is calculated with ESA (Explicit Semantic Analysis) to judge consistency.
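The table lookup with a similarity fallback described above can be sketched as follows. This is only an illustration under stated assumptions: the table contents, the function names and the 0.5 threshold are hypothetical, and a plain bag-of-words cosine similarity stands in for ESA, which would require a Wikipedia-derived concept space.

```python
import math
import re
from collections import Counter

# Hypothetical comparison table: sensitive permission -> behavior description.
COMPARISON_TABLE = {
    "location": "collect your geographic location",
    "contacts": "collect your address book contacts",
}


def _vector(text):
    """Lower-case bag-of-words vector for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two texts (a simple stand-in for ESA)."""
    va, vb = _vector(a), _vector(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def match_permission(sentence, threshold=0.5):
    """Return the permission matching a policy sentence, or None."""
    cleaned = sentence.lower().strip()
    # 1) Direct hit in the comparison table.
    for permission, description in COMPARISON_TABLE.items():
        if description == cleaned:
            return permission
    # 2) Fallback: most similar description, accepted only above the threshold.
    best = max(
        COMPARISON_TABLE,
        key=lambda p: cosine_similarity(sentence, COMPARISON_TABLE[p]),
    )
    if cosine_similarity(sentence, COMPARISON_TABLE[best]) >= threshold:
        return best
    return None
```

With this sketch, a sentence such as "we collect your location" falls through the table lookup and is matched to the hypothetical `location` entry by similarity, while unrelated text yields no match.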

The tools differ in how they extract the sensitive permissions or sensitive APIs of an application.

The following outlines the differences between these common tools:

PPChecker is a tool for checking the consistency between privacy regulations and application behavior on the Android platform. To analyze application behavior and extract the sensitive permissions of an application, it combines dynamic and static analysis.

PPChecker performs static analysis on the application: it extracts the sensitive permissions used by the application by analyzing the AndroidManifest file, and it analyzes the dex file to extract the sensitive APIs used in the code. Because malicious software can use the Java reflection mechanism to obtain hidden Android APIs, and thereby the APIs and permissions that can be called in the system, the application is also tested dynamically with the Xposed framework and Droidbot to obtain the hidden APIs it uses and to determine whether an API is called by the application or by a third-party library. The static and dynamic results are then combined, and the sensitive permissions obtained by the application are extracted from the correspondence table of sensitive APIs and sensitive permissions.

Recon is a cross-platform tool, based on traffic analysis, for detecting sensitive information leakage. It mainly detects system privacy, namely the leakage of class A and class B privacy to third parties. Recon obtains traffic packets from iOS, Android and Windows Phone through manual testing, and keywords are compiled by manually labeling the data in these packets. A large number of traffic packets obtained dynamically on Android are then added, and a decision tree model is trained by machine learning to build a model that identifies sensitive data. The model is continuously improved through user feedback, so that leakage of sensitive data can be detected.

Researchers at home and abroad have thus proposed a number of feasible analysis methods for extracting the sensitive permissions of applications, which can serve as a reference for further research into sensitive application behavior.

In existing consistency analysis, sensitive behavior is mainly analyzed in two ways: static and dynamic analysis, and traffic analysis.

One, static analysis and dynamic analysis

Static analysis generally refers to static program analysis, and dynamic analysis generally refers to dynamic program debugging techniques. Both are briefly described below.

Static program analysis is a code analysis technique that scans the program code by means of lexical analysis, syntax analysis, control flow analysis, data flow analysis and similar techniques, without running the code.

For an Android application, static analysis decompiles the apk package to obtain the smali code and the AndroidManifest file, and then analyzes the code to obtain information about the sensitive APIs called in it.

For static detection of an iOS application, the IPA package of the application is unpacked to obtain the application's binary file, the Info.plist file and other resource files. The binary file is either a Mach-O file or a Fat file. Because a Fat file is essentially a collection of Mach-O files, static detection mainly extracts the relevant information from the Mach-O file and detects sensitive method names to obtain information about sensitive APIs.
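As a small illustration of the Mach-O versus Fat distinction, the following sketch classifies a binary by its first four bytes. The magic numbers are those defined in Apple's `<mach-o/loader.h>` and `<mach-o/fat.h>` headers; the function name `classify_binary` is an assumption made for this example and is not part of the described method.

```python
import struct

# Magic numbers read as a big-endian 32-bit value from the start of the file.
# Thin Mach-O: MH_MAGIC, MH_MAGIC_64 and their byte-swapped (CIGAM) forms.
MACHO_MAGICS = {0xFEEDFACE, 0xFEEDFACF, 0xCEFAEDFE, 0xCFFAEDFE}
# Fat (universal) binary: FAT_MAGIC and its byte-swapped form.
FAT_MAGICS = {0xCAFEBABE, 0xBEBAFECA}


def classify_binary(header: bytes) -> str:
    """Classify a file as 'fat', 'mach-o' or 'unknown' from its leading bytes."""
    if len(header) < 4:
        return "unknown"
    (magic,) = struct.unpack(">I", header[:4])
    if magic in FAT_MAGICS:
        return "fat"
    if magic in MACHO_MAGICS:
        return "mach-o"
    return "unknown"
```

In practice the header would be read with something like `open(path, "rb").read(4)`. A Fat result means the individual Mach-O slices still have to be located before method names can be inspected.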

Dynamic program debugging generally analyzes a program by observing its state while it runs, such as register contents, function execution results and memory usage, in order to work out what functions do and to clarify the code logic.

Android applications are usually tested dynamically with the Xposed framework and Droidbot. Droidbot is an automated testing tool based on the application's UI; it triggers the application's runtime behavior, finds the hidden sensitive APIs called by the application, and determines whether an API is called by the application itself or by a third party.

Dynamic detection of an iOS application usually triggers application behavior by using a script to traverse the clickable elements in the UI, or by random clicking, and then detects the triggered APIs and screens them to obtain the sensitive APIs.

Finally, the sensitive permissions used by the application are found from the sensitive API results of the combined dynamic and static analysis and the comparison table of sensitive APIs and sensitive permissions.

Two, traffic analysis

Traffic analysis detects the sensitive information transmitted by an application by analyzing the traffic generated while the application is in use, so that the sensitive behavior of the application can be analyzed indirectly.

The traffic analysis tool first labels sensitive data in manually captured data packets and obtains characteristic keywords by processing the text. Large-scale traffic data is then generated on the Android platform with an automated testing tool and labeled with the characteristic keywords, and a decision tree model from machine learning is trained on the traffic to obtain a classifier that identifies sensitive information. Obtaining sensitive data from traffic in this way indirectly reveals the sensitive permissions that the application may use.

Combining static analysis with dynamic analysis can comprehensively identify the sensitive APIs called by an application, particularly on an open-source system such as Android. On the iOS platform, however, dynamic and static analysis of the code alone cannot determine whether a sensitive API is called by the application itself or by a third-party library. It therefore cannot be determined whether a sensitive permission of the application serves the application itself or is shared with a third party, and consistency with the privacy regulations cannot be checked thoroughly.

Traffic analysis makes up for the inability of combined static and dynamic analysis to tell whether a sensitive API is called by a third party, but traffic analysis alone has the following drawbacks:

first, the sensitive APIs called by the application cannot be obtained, so the sensitive permissions used by the application cannot be determined;

second, when identifying sensitive information in traffic, existing machine learning methods can only identify sensitive information generated by the system and cannot identify sensitive information entered by the user;

third, because manual testing cannot exercise every node of the application in detail in a short time, traffic analysis alone cannot obtain the complete set of sensitive permissions that the application may use.

Permission abuse is a problem in many applications on current mobile platforms: many applications request unnecessary sensitive permissions, or do not request the corresponding permission but still call an interface of the sensitive permission at run time.

Many application developers use third-party libraries that they are not familiar with, without knowing which sensitive permissions those libraries call, so they cannot write detailed privacy regulations for users to refer to.

Static analysis of the code alone cannot comprehensively identify the sensitive permissions called by the application, nor can it determine whether a permission is acquired by a third party. Traffic analysis is therefore added to obtain the sensitive data transmitted by the application, and a correspondence between sensitive data transmission and sensitive permissions is then established. Combining black-box detection with white-box detection gives a comprehensive picture of the sensitive permissions acquired by the application and lays the foundation for the final consistency detection.

Disclosure of Invention

In view of the defects of the prior art, the main objective of the invention is to provide a sensitive permission extraction method for privacy regulation check in iOS applications. Given the traffic data packets of an application and its IPA package as input, the method analyzes the sensitive permissions used by the application and thereby provides the precondition for checking the consistency between the application's sensitive permissions and its privacy regulations.

To achieve the above objective, the technical solution adopted by the invention is as follows:

A sensitive permission extraction method for privacy regulation check in iOS applications mainly comprises the following steps:

1) downloading the application to be detected to a jailbroken mobile phone, using frida to decrypt (remove the shell of) the application, transferring the processed IPA package to a PC, and configuring the path of the IPA package;

2) decompressing the IPA package, reading the permission information requested in the Info.plist file, and locating the executable file;

3) performing static analysis with IDA on the executable file obtained in step 2) and traversing each method in the rule file: for static functions, the cross-reference feature is used directly to see where the function is referenced; for Objective-C (OC) methods, the method name string and the selector are searched for, the addresses that reference the string or the selector and call the method are then located, and the key functions found (sensitive APIs) are recorded;

4) establishing a mapping table between sensitive APIs and sensitive permissions through text similarity detection: for sensitive APIs whose correspondence cannot be found in the mapping table, the ESA text similarity detection method is used, and when the similarity exceeds a given threshold, the sensitive permission corresponding to the sensitive API is considered found;

5) starting the traffic test: configuring a network proxy, starting to use the application to be detected, operating it semi-automatically by combining an automated clicking framework (many open-source clicking frameworks are available online and can be chosen as needed) with manual input of personal information, and collecting the application's traffic information as input;

6) constructing a class A and class B privacy classifier, using the classifier to distinguish the class A and class B sensitive information leaked in the traffic, and identifying whether that information is acquired by a third-party library or by the application itself; for class C privacy, using a text matching method to detect whether class C sensitive information is shared with a third-party library;

7) outputting how the application uses each class of sensitive information;

8) obtaining a mapping table between the sensitive information in the traffic data and the sensitive permissions.
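Step 2) can be illustrated with a short sketch that opens an IPA package as a zip archive, reads Info.plist and locates the executable. It assumes the usual `Payload/<Name>.app/` bundle layout and treats every Info.plist key ending in `UsageDescription` (for example `NSContactsUsageDescription`) as a requested permission; the function name `read_ipa_info` is hypothetical, and the actual implementation of the method may differ.

```python
import plistlib
import zipfile


def read_ipa_info(ipa_path):
    """Return (executable_path, requested_permissions) for an IPA package.

    ipa_path may be a file path or a file-like object containing the archive.
    """
    with zipfile.ZipFile(ipa_path) as ipa:
        # The main bundle's Info.plist sits directly under Payload/<Name>.app/.
        plist_name = next(
            name
            for name in ipa.namelist()
            if name.startswith("Payload/")
            and name.endswith(".app/Info.plist")
            and name.count("/") == 2
        )
        # plistlib.loads handles both XML and binary property lists.
        info = plistlib.loads(ipa.read(plist_name))

    bundle_dir = plist_name.rsplit("/", 1)[0]
    # CFBundleExecutable names the main executable inside the bundle.
    executable = f"{bundle_dir}/{info['CFBundleExecutable']}"
    # Usage-description keys indicate which protected resources the app requests.
    permissions = {
        key: value for key, value in info.items() if key.endswith("UsageDescription")
    }
    return executable, permissions
```

The returned executable path is the archive member that would be extracted and passed to the static analysis of step 3).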

On the basis of the above technical solution, in step 5), Fiddler is used to capture all IP traffic packets, the required HTTP packets are then selected, and the decrypted data of these packets is obtained through the SSL Kill Switch 2 plug-in installed on the mobile phone. SSL Kill Switch 2 is a tool that overrides and disables the system's default certificate validation, as well as any kind of custom certificate validation, by patching specific low-level SSL functions in the Secure Transport API.

The traffic packets sent to third-party libraries are then screened out as output by comparing the 'domain', 'HOST' and 'USER-AGENT' fields in the IP traffic packets.
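The screening of third-party traffic by header fields might look like the sketch below. The first-party domain list, the SDK marker in the User-Agent and both function names are hypothetical values chosen for illustration; the source does not specify the comparison rules.

```python
# Hypothetical first-party domains of the application under test.
FIRST_PARTY_DOMAINS = {"example-app.com"}
# Hypothetical User-Agent fragments that identify an embedded third-party SDK.
SDK_AGENT_MARKERS = ("AdSDK",)


def host_in_domains(host, domains):
    """True if host equals one of the domains or is a subdomain of it."""
    host = host.lower().split(":")[0]  # drop an optional port
    return any(host == d or host.endswith("." + d) for d in domains)


def is_third_party(headers, first_party=FIRST_PARTY_DOMAINS, markers=SDK_AGENT_MARKERS):
    """Decide from the Host and User-Agent headers whether a request goes to a third party."""
    normalized = {key.lower(): value for key, value in headers.items()}
    agent = normalized.get("user-agent", "").lower()
    if any(marker.lower() in agent for marker in markers):
        return True
    # Requests with a missing or unknown host are treated as third-party here.
    return not host_in_domains(normalized.get("host", ""), first_party)
```

Header names are compared case-insensitively because captured traffic may spell them as `Host`, `HOST` or `host`.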

On the basis of the above technical solution, the sensitive information includes the address book, geographic location, e-mail address, home address and the like.

On the basis of the above technical solution, the class C sensitive information in step 6) refers to specific data entered by the user;

because the target is specific data entered by the user, the simplest text matching method is sufficient to meet the requirements;

by controlling the specific data that the user enters and then matching the corresponding character strings in the traffic data, the application's use of class C sensitive information is obtained;

the class C sensitive information includes personal basic information, bank card transaction information and the like.
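The text matching for class C information can be sketched as follows: known test values are entered into the application, and the captured payloads are searched for those values. The seeded values and the function name are made up for this example, and URL decoding is an added assumption so that form-encoded payloads are also matched.

```python
from urllib.parse import unquote_plus

# Hypothetical test data that the tester types into the application.
SEEDED_INPUTS = {
    "name": "Zhang Testuser",
    "bank_card": "6222000011112222",
}


def find_c_type_leaks(payload, seeded=SEEDED_INPUTS):
    """Return the labels of seeded user inputs that appear in a traffic payload."""
    # Decode percent-encoding and '+' so form-encoded values can be matched.
    decoded = unquote_plus(payload).lower()
    return sorted(label for label, value in seeded.items() if value.lower() in decoded)
```

Applying this only to packets already identified as third-party traffic indicates whether the entered data is shared with a third-party library.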

On the basis of the above technical solution, the specific steps for constructing the class A and class B privacy classifier in step 6) are as follows:

first, the feature words corresponding to each class of sensitive information are found by processing the text. The text is preprocessed and segmented into words using NLTK or Stanford NLP. Because of the particular nature of traffic packets, generic words need to be filtered out (that is, removed). Because sensitive information is leaked more than once, words whose frequency of occurrence is too low are also removed by setting a threshold. Combined with manual selection, a batch of feature words for class A and class B sensitive information is determined;

the generic words include: "content-length" and "en-us";

then, the traffic packet data is labeled and a classifier is trained with the open-source data analysis platform KNIME. During training, 10-fold cross-validation is used: 9/10 of the data is selected to train the classifier, and the remaining 1/10 is used to check the accuracy of the classifier. Each traffic flow is labeled 0 or 1 according to the feature words obtained in the previous step, and the labeled flows are fed into the trainer to obtain the class A and class B privacy classifier.
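The feature-word selection and 0/1 labeling described above can be sketched in plain Python. The regular-expression tokenizer stands in for NLTK or Stanford NLP, the frequency threshold of 2 is an arbitrary assumption, and the function names are hypothetical; classifier training itself (done in KNIME according to the description) is not shown.

```python
import re
from collections import Counter

# Generic words named in the description; they carry no privacy signal.
GENERIC_WORDS = {"content-length", "en-us"}


def tokenize(flow):
    """Split a traffic flow into lower-case tokens (stand-in for NLTK / Stanford NLP)."""
    return re.findall(r"[a-z0-9_-]+", flow.lower())


def candidate_feature_words(sensitive_flows, min_count=2):
    """Keep non-generic words that occur in at least min_count sensitive flows."""
    counts = Counter()
    for flow in sensitive_flows:
        # Count each word once per flow, after removing generic words.
        counts.update(set(tokenize(flow)) - GENERIC_WORDS)
    return {word for word, count in counts.items() if count >= min_count}


def label_flow(flow, feature_words):
    """Label a flow 1 if it contains any feature word, otherwise 0."""
    return int(any(token in feature_words for token in tokenize(flow)))
```

The candidate set would still be reviewed manually, as the description states, before the labeled flows are handed to the trainer.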

On the basis of the above technical solution, for word segmentation, the regexp module and the word_tokenize function provided by NLTK are compared with the stanford-postagger module provided by Stanford NLP; the stanford-postagger gives the best word segmentation results.

On the basis of the above technical solution, KNIME supports methods such as decision trees, Bayesian clustering, rule derivation and neural networks, and sensitive information is often structured data (such as device_id: xxxx).

On the basis of the above technical solution, the decision tree supported by KNIME trains the classifier using the J48 algorithm, an implementation of the C4.5 decision tree.

On the basis of the above technical solution, the mapping table between sensitive information and sensitive permissions in step 8) is obtained by combining a large amount of experimental data with Apple's official developer documentation; it is configured through an XML table and is extensible.

On the basis of the above technical solution, the mapping table between sensitive information and sensitive permissions in step 8) includes: READ_PHONE_STATE_DEVICE, which corresponds to information such as the device number, system version and device model detected in the traffic.
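An XML-configured mapping table of this kind could be loaded as in the sketch below. The element and attribute names (`mappings`, `permission`, `info`, `name`) are a hypothetical schema invented for illustration, since the source does not give the actual XML layout; only the READ_PHONE_STATE_DEVICE entry and its three information types come from the description.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout for the extensible mapping table.
MAPPING_XML = """
<mappings>
  <permission name="READ_PHONE_STATE_DEVICE">
    <info>device number</info>
    <info>system version</info>
    <info>device model</info>
  </permission>
</mappings>
"""


def load_mapping(xml_text):
    """Parse the XML table into a dict: sensitive information -> sensitive permission."""
    root = ET.fromstring(xml_text.strip())
    mapping = {}
    for permission in root.iter("permission"):
        for info in permission.iter("info"):
            mapping[info.text.strip()] = permission.get("name")
    return mapping
```

Extending the table then only requires adding another `permission` element to the XML, with no change to the code.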

The invention has the following beneficial technical effects:

(1) Extracting application sensitive permissions by combining static analysis and traffic analysis

Previous research methods either use pure white-box testing, extracting sensitive APIs from the code and finding the sensitive permissions used by the application without revealing who uses the permission, or use pure black-box operation, analyzing application behavior only from the data the application outputs in its traffic. Because short-term testing is incomplete, the black-box approach cannot traverse and exercise every function of the application and therefore cannot comprehensively obtain the sensitive permissions acquired by the application.

The invention combines static detection with traffic detection, which makes up for the inability of static analysis to tell whether a permission is called by a third party, and obtains the sensitive APIs called by the application as completely as possible so as to obtain the sensitive permissions.

(2) Constructing a mapping table between traffic sensitive information and sensitive permissions

Previous research only judges whether sensitive information is leaked in the traffic and does not summarize the correspondence between the sensitive information in the traffic and the sensitive permissions of the application. The invention summarizes a correspondence table between sensitive information and sensitive permissions based on the sensitive information found in the traffic.

Drawings

The invention has the following drawings:

FIG. 1 is a schematic flow diagram of the process of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.

As shown in FIG. 1, the sensitive permission extraction method for privacy regulation check in iOS applications according to the present invention mainly comprises steps 1) to 8) set out in the Disclosure of Invention above. In the construction of the class A and class B privacy classifier, the generic words include "content-length" and "en-us".

The key technical points to be protected by the invention are as follows:

An extensible mapping table between traffic sensitive information and application sensitive permissions.

The correspondence table between the sensitive information in traffic and the sensitive permissions of the application has been compiled through continuous experiments and can be configured through an XML table, so the mapping between detected traffic sensitive information and application sensitive permissions supported by the invention is extensible.

Classification analysis of class A, B and C sensitive information in traffic.

Most analyses of application privacy leakage detect either leaked class A and class B sensitive information alone, or leaked class C sensitive information alone.

The foregoing is considered to be merely preferred embodiments of this invention, rather than all embodiments thereof. All equivalent structural or process changes made by using the contents of the specification and the drawings of the invention, or directly or indirectly applied to other related technical fields, are included in the scope of the patent protection of the invention.

Matters not described in detail in this specification are within the knowledge of those skilled in the art.

Claims

1. A sensitive permission extraction method for privacy regulation check in iOS applications, characterized by comprising the following steps:

3) performing static analysis with IDA on the executable file obtained in step 2) and traversing each method in the rule file: for static functions, the cross-reference feature is used directly to see where the function is referenced; for OC methods, the method name string and the selector are searched for, the addresses that reference the string or the selector and call the method are then located, and the key functions found are recorded;

4) establishing a mapping table between sensitive APIs and sensitive permissions through text similarity detection: for sensitive APIs whose correspondence cannot be found in the mapping table, the ESA text similarity detection method is used, and when the similarity exceeds a given threshold, the sensitive permission corresponding to the sensitive API is found;

5) starting the traffic test: configuring a network proxy, starting to use the application to be detected, operating it semi-automatically by combining an automated clicking framework with manual input of personal information, and collecting the application's traffic information as input;

8) obtaining a mapping table between the sensitive information in the traffic data and the sensitive permissions;

the specific steps for constructing the class A and class B privacy classifier in step 6) are as follows:

first, the feature words corresponding to each class of sensitive information are found by processing the text; the text is preprocessed and segmented into words using NLTK or Stanford NLP; because of the particular nature of traffic packets, generic words need to be filtered out; because sensitive information is leaked more than once, words whose frequency of occurrence is too low are removed by setting a threshold; and, combined with manual selection, a batch of feature words for class A and class B sensitive information is determined;

the generic words include: "content-length" and "en-us";

2. The sensitive permission extraction method for privacy regulation check in iOS applications of claim 1, characterized in that: in step 5), Fiddler is used to capture all IP traffic packets, the required HTTP packets are selected, and the decrypted data of these packets is obtained through the SSL Kill Switch 2 plug-in installed on the mobile phone;

3. The sensitive permission extraction method for privacy regulation check in iOS applications of claim 1, characterized in that: the sensitive information includes the address book, geographic location, e-mail address and home address.

4. The sensitive permission extraction method for privacy regulation check in iOS applications of claim 1, characterized in that: the class C sensitive information in step 6) refers to specific data entered by the user;

the class C sensitive information includes personal basic information and bank card transaction information.

5. The sensitive permission extraction method for privacy regulation check in iOS applications of claim 1, characterized in that: KNIME supports decision tree, Bayesian clustering, rule derivation and neural network methods.

6. The sensitive permission extraction method for privacy regulation check in iOS applications of claim 5, characterized in that: the decision tree supported by KNIME trains the classifier using the J48 algorithm, an implementation of the C4.5 decision tree.

7. The sensitive permission extraction method for privacy regulation check in iOS applications of claim 1, characterized in that: the mapping table between sensitive information and sensitive permissions in step 8) is obtained by combining experimental data with Apple's official developer documentation and is configured through an XML table.

8. The sensitive permission extraction method for privacy regulation check in iOS applications of claim 1 or 7, characterized in that: the mapping table between sensitive information and sensitive permissions in step 8) includes: READ_PHONE_STATE_DEVICE, which corresponds to the device number, system version and device model information detected in the traffic.