CN106802866B - method for restoring execution path of Android program - Google Patents


Info

Publication number
CN106802866B
Authority
CN
China
Legal status
Active
Application number
CN201710062753.9A
Other languages
Chinese (zh)
Other versions
CN106802866A (en)
Inventor
董玮
卜佳俊
陈纯
陈共龙
赵志为
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Application filed by Zhejiang University ZJU
Priority to CN201710062753.9A
Publication of CN106802866A
Application granted
Publication of CN106802866B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3612 Software analysis for verifying properties of programs by runtime analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a method for restoring the execution path of an Android program, comprising the following steps: converting the Android program package into Android virtual machine bytecode files; parsing Android component information from the bytecode files to obtain the Android lifecycle control flow; parsing user-defined function information from the lifecycle control flow to obtain the control flow of the user-defined functions; generating the instrumentation content of code blocks with a coding algorithm, based on the Android lifecycle control flow and the user-defined function control flow; instrumenting the Android virtual machine bytecode files; packaging the instrumented bytecode files into a new Android program package; having a user install the new package; recording the control flow log of the Android program while the user uses it; parsing the execution path coding array of the Android program from the recorded control flow log; restoring the execution path of the Android program from this coding array with a decoding algorithm; and generating an outgoing-edge probability model file from the restored execution path.

Description

Method for restoring execution path of Android program
Technical Field
The invention relates to a method for restoring the execution path of an Android program, and belongs to the field of program analysis and testing.
Background
Android is an open-source mobile operating system based on Linux, used mainly in smartphones and tablets, and led and developed by Google and the Open Handset Alliance (OHA). Its openness makes Android popular among software developers, and a large number of novel and distinctive Android applications have emerged. To keep their applications attractive to users, developers need to discover and fix performance problems promptly. When a test version of an Android application is distributed to users, recording detailed program-flow logs can affect the user experience and hinder testing, whereas reducing the amount of logged content loses data needed for analysis and lowers the efficiency of problem diagnosis. How to encode an application's program-flow log with suitable rules is therefore one of the key supporting technologies for Android testing.
Such testing mainly comprises four steps: 1) selecting an appropriate encoding rule for logging the application's program flow; 2) inserting the designated logging code into the application; 3) distributing the instrumented application to users; 4) collecting and analyzing the program-flow logs generated by the application, reconstructing the scenarios in which performance problems occur, and locating and fixing those problems. In step 3), while users use the application, it executes the encoding rule formulated in step 1) and produces an encoding of its program flow. In step 4), the path executed by the application is restored from this encoding and from the structure of the program flow, which reveals the root cause of the problem. Formulating program-flow encodings for an application in this way is also referred to as path restoration.
The main existing path-restoration methods are B.L. (Ball and Larus) and PAP (Profiling All Paths). For n loop-free paths, B.L. assigns codes in the integer interval [0, n-1]; however, for a path containing a loop that executes multiple times, the algorithm splits the path into several loop-free sub-paths and encodes it as the sequence of their codes, which incurs a large storage overhead. PAP distinguishes the different incoming edges of the same code block through multiplication and addition, and uses these incoming edges to distinguish different paths, so that it can restore both loop-free paths and paths containing loops executed multiple times. Moreover, both schemes use a fixed encoding that is not adjusted to the execution probabilities of different paths, so a frequently executed path may receive a longer code than a rarely executed one.
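For reference, the B.L. numbering mentioned above can be sketched in a few lines of Python. This is a minimal sketch of the textbook Ball-Larus scheme on an acyclic control flow graph, not code from this patent; the graph `CFG` and the helper names are hypothetical. Each edge receives an increment such that the sum of increments along any entry-to-exit path is a distinct integer in [0, n-1].

```python
from functools import lru_cache

# Hypothetical acyclic CFG: block -> ordered list of successor blocks.
CFG = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E", "F"],
    "E": ["F"],
    "F": [],  # exit block
}


def num_paths(cfg):
    """NumPaths(v): number of paths from v to the exit (1 for the exit itself)."""
    @lru_cache(maxsize=None)
    def count(v):
        succs = cfg[v]
        return 1 if not succs else sum(count(w) for w in succs)
    return {v: count(v) for v in cfg}


def edge_values(cfg):
    """Edge increments: the k-th edge out of v gets the sum of NumPaths
    over v's earlier successors."""
    n = num_paths(cfg)
    vals = {}
    for v, succs in cfg.items():
        acc = 0
        for w in succs:
            vals[(v, w)] = acc
            acc += n[w]
    return vals


def path_id(path, vals):
    """Sum of edge increments along a path: unique in [0, NumPaths(entry) - 1]."""
    return sum(vals[(u, v)] for u, v in zip(path, path[1:]))
```

Here the four entry-to-exit paths receive the IDs 0 to 3. Cyclic graphs are not handled: B.L. must first cut loops into loop-free sub-paths, which is the source of the storage overhead described above.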
Disclosure of Invention
The invention aims to provide a method for restoring the execution path of an Android program that reduces log storage overhead, reduces the impact on program functionality and improves testing efficiency.
To achieve this purpose, the invention adopts the following technical solution:
The method for restoring the execution path of an Android program comprises the following steps:
(1) Analyzing and instrumenting the Android program, comprising:
1) Converting the Android program package into Android virtual machine bytecode files;
2) Obtaining the bytecode files of the Android program components from the Android virtual machine bytecode files;
3) Partitioning the bytecode files of the Android program components into blocks according to the Android lifecycle functions, to generate the lifecycle control flow file of the Android program;
4) Obtaining the user-defined functions from the lifecycle control flow file of the Android program, and partitioning them into blocks according to the Android virtual machine bytecode syntax, to generate the control flow files of the user-defined functions;
5) Obtaining, from the lifecycle control flow file of the Android program and the control flow files of the user-defined functions, the initial code block of the Android program, the termination code block of the Android program, all multi-outgoing-edge start code blocks and all multi-outgoing-edge end code blocks, and obtaining the execution probability of each outgoing edge of every multi-outgoing-edge start code block from the current outgoing-edge probability model file;
6) If the current outgoing-edge probability model file is the initial one, setting the probabilities of all outgoing edges of each multi-outgoing-edge start code block in it to be equal and then executing step 7); otherwise, executing step 7) directly;
7) Using a coding algorithm to obtain the instrumentation content for all multi-outgoing-edge start code blocks, all multi-outgoing-edge end code blocks, the initial code block of the Android program and the termination code block of the Android program;
8) Instrumenting the Android virtual machine bytecode files according to the instrumentation content of all multi-outgoing-edge start code blocks, all multi-outgoing-edge end code blocks, the initial code block and the termination code block of the Android program;
9) Packaging the instrumented Android virtual machine bytecode files into an Android program package;
(2) A user installs the Android program package;
(3) Restoring the execution path of the Android program from the program control flow log file generated while the user uses the Android program, comprising the following steps:
(i) obtaining the execution path coding array of the Android program from the program control flow log file;
(ii) restoring the execution path of the Android program with a decoding algorithm, according to the execution path coding array, the lifecycle control flow file of the Android program and the control flow files of the user-defined functions;
(iii) counting, in the restored execution path, how often each outgoing edge of every multi-outgoing-edge start code block is executed, computing the execution probability of each outgoing edge from these counts, and updating the current outgoing-edge probability model file accordingly.
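Steps 6) and (iii) can be illustrated with a small Python sketch. It assumes, hypothetically, that a restored path is a list of block IDs and that `multi_edge_blocks` maps each multi-outgoing-edge start code block to its ordered successors. The add-`alpha` smoothing is an assumption added here so that an edge never observed still keeps a non-zero probability, which an entropy coder needs.

```python
from collections import Counter


def uniform_model(multi_edge_blocks):
    """Step 6): every outgoing edge of a start block gets the same probability."""
    return {b: {s: 1 / len(succs) for s in succs}
            for b, succs in multi_edge_blocks.items()}


def update_model(paths, multi_edge_blocks, alpha=1):
    """Step (iii): count how often each outgoing edge of every
    multi-outgoing-edge start block appears in the restored paths and turn
    the counts into probabilities. alpha > 0 is additive smoothing
    (an assumption, not specified in the text)."""
    counts = {b: Counter() for b in multi_edge_blocks}
    for path in paths:
        for u, v in zip(path, path[1:]):
            if u in counts:
                counts[u][v] += 1
    model = {}
    for b, succs in multi_edge_blocks.items():
        total = sum(counts[b][s] for s in succs) + alpha * len(succs)
        model[b] = {s: (counts[b][s] + alpha) / total for s in succs}
    return model
```

With no restored paths, `update_model` reduces to the uniform initial model of step 6).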
Further, step 7) of the present invention is executed as follows:
After the last statement in the initial code block of the Android program, instrumentation content is generated that consists of a first code followed by a second code: the first code outputs a signal indicating that the current Android program has started executing, and the second code initializes the current coding interval;
After the last statement of every multi-outgoing-edge start code block in the Android program, the same instrumentation content is generated, consisting of a third code that records the multi-outgoing-edge start code block in which it resides as the start point of all of that block's outgoing edges;
Before the first statement of every multi-outgoing-edge end code block in the Android program, the same instrumentation content is generated, consisting of the following codes in order:
a) a fourth code, which records the multi-outgoing-edge end code block in which it resides as the end point of all of that block's incoming edges;
b) a fifth code, which sets the incoming edge actually traversed when the Android program reaches the multi-outgoing-edge end code block in which the code resides as that block's current edge;
c) a sixth code, which retrieves the execution probability of the current edge of that block from the current outgoing-edge probability model file;
d) a seventh code, which updates the current coding interval to the coding subinterval corresponding to the current edge of that block, as retrieved from the current outgoing-edge probability model file;
e) an eighth code, which, when the number of fractional digits of the current coding interval exceeds the capacity of the register, records the minimum binary value within the current coding interval and enlarges the current coding interval proportionally, step by step, until the following formula (1) is satisfied:
in formula (1), beg_bin denotes the number of fractional digits of beg after the current coding interval [beg, end) is enlarged, end_bin denotes the number of fractional digits of end after enlargement, L denotes the capacity of the register, max(x) denotes the maximum number of fractional digits of x, and sub(x) denotes all fractional-digit counts of beg_bin or end_bin that satisfy condition x;
f) a ninth code, which increases the execution probability of the current edge of that block in the current outgoing-edge probability model file;
g) a tenth code, which, in the current outgoing-edge probability model file, decreases the execution probabilities of all outgoing edges, other than the current edge, of the start code block of that block's current edge;
Before the last statement in the termination code block of the Android program, instrumentation content is generated consisting of an eleventh code, which records the minimum binary value within the current coding interval.
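The runtime side of step 7) can be sketched with exact rational arithmetic. This is a minimal arithmetic-coding sketch under stated assumptions, not the patent's instrumentation code: `L` is a hypothetical register capacity in fractional bits, probabilities are passed as `fractions.Fraction` values, and the eighth code is simplified to "log a value and reset to [0, 1)" instead of the proportional enlargement described in step e).

```python
import math
from fractions import Fraction

L = 16  # hypothetical register capacity, in fractional bits


def subinterval(beg, end, probs, index):
    """Seventh-code analogue: narrow [beg, end) to the slice that the CDF of
    `probs` assigns to outgoing edge `index`."""
    width = end - beg
    lo = sum(probs[:index], Fraction(0))
    return beg + width * lo, beg + width * (lo + probs[index])


def shortest_binary_in(beg, end):
    """The dyadic value k / 2**n in [beg, end) with the fewest bits n
    (and the smallest such value for that n)."""
    n = 0
    while True:
        k = math.ceil(beg * 2 ** n)
        if Fraction(k, 2 ** n) < end:
            return Fraction(k, 2 ** n)
        n += 1


class PathEncoder:
    """Runtime side: second, seventh, eighth and eleventh codes.

    Simplification (assumption): once the interval is narrower than 2**-L,
    a value inside it is logged and the interval is reset to [0, 1)."""

    def __init__(self):
        self.beg, self.end = Fraction(0), Fraction(1)  # second code
        self.log = []  # stands in for the program control flow log

    def take_edge(self, probs, index):
        self.beg, self.end = subinterval(self.beg, self.end, probs, index)
        if self.end - self.beg < Fraction(1, 2 ** L):  # eighth code (simplified)
            self.log.append(shortest_binary_in(self.beg, self.end))
            self.beg, self.end = Fraction(0), Fraction(1)

    def finish(self):
        # eleventh code: record a value inside the final interval
        self.log.append(shortest_binary_in(self.beg, self.end))
        return self.log
```

Each call to `take_edge` corresponds to one pass through a multi-outgoing-edge end code block, and `finish` corresponds to the termination code block.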
Further, in step (ii) of the present invention, "restoring the execution path of the Android program using a decoding algorithm" comprises the following steps:
① Initializing execution-path restoration of the Android program, as follows:
initializing the current coding interval;
taking the first element of the execution path coding array of the Android program as the current execution path coding value;
obtaining, from the lifecycle control flow file of the Android program, the initial function executed when the Android program starts, and taking it as the current function;
② Extracting the initial code block of the current function from the lifecycle control flow file of the Android program and the control flow files of the user-defined functions, and taking it as the current code block;
③ Checking whether the current code block contains code that calls a function; if so, recording the file name of the calling code and its line number in that file, locating the file of the called function, adding the recorded file name and line number to all termination code blocks of the called function, taking the called function as the current function, and returning to step ②; otherwise, executing step ④;
④ Determining whether the current code block is a code block with only one outgoing edge, a multi-outgoing-edge start code block, or a termination code block of the current function, and then executing step ⑤:
⑤ If the current code block has only one outgoing edge and this determination is made for the first time, recording its outgoing edge in a variable-length array;
if the current code block has only one outgoing edge and this is not the first such determination, appending its outgoing edge after the previously recorded edge in the variable-length array, taking the code block that this edge points to directly as the current code block, and returning to step ③;
if the current code block is a multi-outgoing-edge start code block, executing the following steps:
i) selecting, from the current outgoing-edge probability model file, the outgoing edge whose coding subinterval contains the current execution path coding value;
ii) updating the current coding interval to the coding subinterval containing the current execution path coding value;
iii) if the number of fractional digits of the current coding interval exceeds the capacity of the register, taking the next path coding value from the execution path coding array as the current execution path coding value, and enlarging the current coding interval step by step by the ratio used in step e) until it satisfies formula (1);
iv) taking the code block that the outgoing edge selected in step i) points to directly as the current code block, and returning to step ③;
if the current code block is a termination code block of the current function, further determining whether it records the file name and line number of calling code; if so, locating the calling code from this record, taking it as the current code block, and returning to step ④; if not, executing step (iii).
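The decoding steps ①-⑤ can likewise be sketched for a single function without calls (step ③ is omitted). The sketch assumes a simplified encoder that resets the interval to [0, 1) after logging a value once the interval is narrower than 2**-L, a hypothetical `cfg` adjacency map in which an empty successor list marks a termination code block, and a hypothetical `model` that lists `Fraction` probabilities in the same order as the successors.

```python
from fractions import Fraction

L = 16  # must equal the encoder's hypothetical register capacity


def decode_path(cfg, model, codes, entry):
    """Walk the CFG from `entry`, choosing at each multi-outgoing-edge start
    block the edge whose coding subinterval contains the current code value."""
    codes = list(codes)
    value = codes.pop(0)                     # step 1: first coding value
    beg, end = Fraction(0), Fraction(1)      # step 1: initial interval
    block, path = entry, [entry]
    while cfg[block]:                        # empty list = termination block
        succs = cfg[block]
        if len(succs) == 1:                  # one outgoing edge: nothing encoded
            block = succs[0]
        else:                                # multi-outgoing-edge start block
            width, lo = end - beg, Fraction(0)
            for succ, p in zip(succs, model[block]):
                hi = lo + p
                if beg + width * lo <= value < beg + width * hi:   # step i)
                    beg, end = beg + width * lo, beg + width * hi  # step ii)
                    block = succ
                    break
                lo = hi
            else:
                raise ValueError(f"code value outside interval at {block!r}")
            if end - beg < Fraction(1, 2 ** L):  # step iii), simplified
                value = codes.pop(0)
                beg, end = Fraction(0), Fraction(1)
        path.append(block)                   # step iv): follow the edge
    return path
```

Because the interval simply keeps narrowing on every pass through a branch, a loop executed many times needs no special splitting, in contrast to B.L.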
Further, the "ratio" in "enlarging the current coding interval proportionally" of the present invention satisfies the following formula (4):
In formula (4), scale denotes the enlargement ratio of the current coding interval; Bin(x) denotes the number of fractional digits of x; beg_bin and end_bin denote the numbers of fractional digits of beg and end after the current coding interval is enlarged; L denotes the capacity of the register; and the final condition requires that, after the coding interval has been enlarged by scale one or more times, the conditions on beg and end are satisfied simultaneously.
Further, in step 7) of the present invention, the coding algorithm is based on entropy coding.
Further, the coding algorithm of the present invention is an arithmetic coding algorithm or a Huffman coding algorithm.
Further, in step (ii) of the present invention, the decoding algorithm is based on entropy coding.
Further, the decoding algorithm of the present invention is an arithmetic decoding algorithm or a Huffman decoding algorithm.
Compared with the prior art, the invention has the following beneficial effects: (1) By instrumenting Android applications and restoring their execution paths, the invention achieves semi-automatic testing of Android applications; compared with developers testing programs manually, it reduces the space of program paths to be tested, thereby reducing log storage overhead, reducing the impact on program functionality and improving testing efficiency. (2) When restoring the execution path, the invention can infer breakpoints from the condition that the number of fractional digits of the coding interval exceeds the maximum capacity of the register, eliminating the overhead that the PAP (Profiling All Paths) algorithm incurs to record breakpoints. (3) The invention adjusts the probability of each outgoing edge of the multi-outgoing-edge start code blocks according to the execution probabilities of the different paths of the Android program, so that paths with high execution probability use codes that occupy less storage space; compared with the B.L. (Ball and Larus) and PAP algorithms, which use fixed encodings for all execution paths, this reduces the expected log storage overhead. (4) The invention records encoded data only when the number of fractional digits of the coding interval exceeds the maximum capacity of the register and when the Android program terminates, whereas the B.L. algorithm records encoded data at the end of every loop iteration and when the program terminates; especially for Android programs with many loops, this reduces the number of register accesses, reduces the impact on program functionality and improves testing efficiency.
Drawings
Fig. 1 is a flowchart of the method for restoring the execution path of an Android program according to the present invention.
Detailed Description
The invention is described in detail below, using a Nexus 5 smartphone running Android 4.4 as the test platform and, as the example, the testing of the open-source project Android Wifi Tether, which has been downloaded more than one million times on Google Play. The method specifically comprises the following steps:
(1) Analyzing and instrumenting Android Wifi Tether, comprising the following steps:
(1.1) Android Wifi Tether is converted into Android virtual machine (i.e., Dalvik) bytecode files using the Apktool tool.
(1.2) The bytecode files of the Android program components (Activity, Service, Broadcast Receiver and Content Provider) are obtained from the Android virtual machine bytecode files.
(1.3) According to the control flow relations among the Android lifecycle functions (such as 'onCreate()' and 'onStop()'), the bytecode files of the Android program components are partitioned into blocks to generate the Android lifecycle control flow file. For example, after the Android program starts, the lifecycle functions 'onCreate()' and 'onStart()' run in sequence, so there is an edge pointing directly from 'onCreate()' to 'onStart()'.
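As an illustration of the lifecycle control flow in (1.3), the Activity callbacks and their documented transitions can be written as an adjacency map. This is a simplified sketch based on the Android Activity lifecycle (process death omitted), not the file format the method actually generates; `LIFECYCLE` and the helper functions are hypothetical names.

```python
# Activity lifecycle transitions (process death omitted). Each key is a
# lifecycle callback; the list holds the callbacks that can run next,
# i.e. the outgoing edges of that callback in the lifecycle control flow.
LIFECYCLE = {
    "onCreate": ["onStart"],
    "onStart": ["onResume"],
    "onResume": ["onPause"],
    "onPause": ["onResume", "onStop"],
    "onStop": ["onRestart", "onDestroy"],
    "onRestart": ["onStart"],
    "onDestroy": [],
}


def lifecycle_edges(graph):
    """Flatten the adjacency lists into (from_callback, to_callback) edges."""
    return [(src, dst) for src, dsts in graph.items() for dst in dsts]


def multi_edge_callbacks(graph):
    """Callbacks with two or more possible successors."""
    return sorted(f for f, nxt in graph.items() if len(nxt) >= 2)
```

Even at this level, 'onPause()' and 'onStop()' behave as multi-outgoing-edge start blocks, since either of two callbacks can follow each of them.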
(1.4) The user-defined functions are obtained from the lifecycle control flow file of the Android program and partitioned into blocks according to the Android virtual machine bytecode syntax (such as ":cond_0" and ":goto_0") to generate the control flow files of the user-defined functions. Each code block records information related to the program control flow (such as the number of outgoing edges, the number of incoming edges and the addresses at which instrumentation code can be inserted), and code blocks are connected according to keywords in their first and last statements. For example, if the last statement of code block C1 is "goto :goto_0" and the first statement of code block C2 is ":goto_0", then C1 is connected to C2.
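The block partitioning and label matching of (1.4) can be sketched on smali text as produced by Apktool. The sketch is a simplification under assumptions: it recognizes only `if-*`, `goto`, `return*` and `throw` as block terminators and ignores `packed-switch`, `sparse-switch` and try/catch edges; `split_blocks` and `connect` are hypothetical names.

```python
import re

BRANCH = re.compile(r"^(if-\w+|goto(/16|/32)?)\b")  # conditional or unconditional jump
TARGET = re.compile(r"(:\w+)\s*$")                   # trailing label operand, e.g. ':cond_0'


def split_blocks(lines):
    """Split a method body into code blocks: a label starts a new block,
    and a branch, return or throw ends the current one."""
    blocks, cur = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(":") and cur:
            blocks.append(cur)
            cur = []
        cur.append(line)
        if BRANCH.match(line) or line.startswith(("return", "throw")):
            blocks.append(cur)
            cur = []
    if cur:
        blocks.append(cur)
    return blocks


def connect(blocks):
    """Connect blocks by matching a branch's target label with the label on
    the first line of another block (e.g. 'goto :goto_0' -> ':goto_0'),
    plus fall-through edges to the next block."""
    label_of = {b[0]: i for i, b in enumerate(blocks) if b[0].startswith(":")}
    edges = []
    for i, b in enumerate(blocks):
        last = b[-1]
        m = TARGET.search(last)
        if BRANCH.match(last) and m:
            edges.append((i, label_of[m.group(1)]))
        falls_through = not last.startswith(("goto", "return", "throw"))
        if falls_through and i + 1 < len(blocks):
            edges.append((i, i + 1))
    return edges
```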
(1.5) The initial code block, the termination code block, all multi-outgoing-edge start code blocks and all multi-outgoing-edge end code blocks of the Android program are obtained from the lifecycle control flow file of the Android program and the control flow files of the user-defined functions. The initial code block of the Android program is the first code block of the first function executed when the Android program starts; the termination code block is the last code block of the last function executed when the Android program ends; a multi-outgoing-edge start code block is a code block with two or more outgoing edges; and a multi-outgoing-edge end code block is a code block to which a multi-outgoing-edge start code block points directly.
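Given the block edges from (1.4) as (source, target) pairs, the two multi-outgoing-edge categories defined in (1.5) follow from out-degree alone. A minimal sketch with a hypothetical `classify` helper; the initial and termination code blocks come from lifecycle information and are not derived here.

```python
def classify(edges, n_blocks):
    """Return (multi-outgoing-edge start blocks, multi-outgoing-edge end blocks)
    for blocks numbered 0..n_blocks-1 and a list of (src, dst) edges."""
    succs = {i: [] for i in range(n_blocks)}
    for src, dst in edges:
        succs[src].append(dst)
    # start blocks: two or more outgoing edges
    starts = {b for b, ss in succs.items() if len(ss) >= 2}
    # end blocks: any block a start block points to directly
    ends = {d for b in starts for d in succs[b]}
    return starts, ends
```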
(1.6) The execution probability of each outgoing edge of every multi-outgoing-edge start code block is obtained from the current outgoing-edge probability model file in the /data/ directory of Android Wifi Tether. If the current outgoing-edge probability model file is the initial one, the probabilities of all outgoing edges of each multi-outgoing-edge start code block in it are set to be equal and step (1.7) is then executed; otherwise, step (1.7) is executed directly.
(1.7) After the last statement of the Android program's initial code block in the 'onCreate()' function of the main Activity, instrumentation content is generated that consists of a first code followed by a second code: the first code outputs a signal ('Android Wifi TetherStart') indicating that Android Wifi Tether has started executing, and the second code initializes the current coding interval [beg, end) to [0, 1).
After the last statement of every multi-outgoing-edge start code block in Android Wifi Tether, the same instrumentation content is generated, consisting of a third code that records the multi-outgoing-edge start code block in which it resides, outedge_start_id, as the start point of all of that block's outgoing edges.
Before the first statement of every multi-outgoing-edge end code block in Android Wifi Tether, instrumentation content is generated consisting of the following fourth to tenth codes in order:
a) A fourth code, which records the multi-outgoing-edge end code block in which it resides, outedge_end_id, as the end point of all of that block's incoming edges.
b) A fifth code, which sets the incoming edge traversed when Android Wifi Tether reaches the multi-outgoing-edge end code block in which the code resides as that block's current edge, outedge_current; outedge_current is uniquely identified by the ordered pair <outedge_start_id, outedge_end_id> of its start and end code blocks.
c) A sixth code, which retrieves the execution probability of outedge_current from the current outgoing-edge probability model file.
d) A seventh code, which updates the current coding interval [beg, end) to the coding subinterval [beg_outedge, end_outedge) corresponding to outedge_current, retrieved from the current outgoing-edge probability model file; in this file, all outgoing edges that share the same multi-outgoing-edge start code block as their start point satisfy condition (5):
where outedge denotes the set of all outgoing edges that share the same multi-outgoing-edge start code block as their start point, outedge_i denotes the i-th outgoing edge in this set, Pr(outedge_i) denotes the execution probability of outedge_i, CDF(outedge_i) denotes the cumulative distribution function of outedge_i, and CDF(outedge_{-1}) is defined as 0. The coding subinterval corresponding to outedge_current is calculated by formula (6):
where CDF(outedge_i) is the cumulative distribution function corresponding to outedge_current.
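Formulas (5) and (6) are not reproduced in the text above. Under the standard arithmetic-coding reading of the surrounding definitions (an assumption, not the patent's verbatim formulas), with the n outgoing edges of one start block indexed i = 0, ..., n-1 and outedge_i = outedge_current, they would take the following form:

```latex
% Condition (5), assumed form
\sum_{i=0}^{n-1} \Pr(\mathit{outedge}_i) = 1, \qquad
\mathrm{CDF}(\mathit{outedge}_i) = \sum_{k=0}^{i} \Pr(\mathit{outedge}_k), \qquad
\mathrm{CDF}(\mathit{outedge}_{-1}) = 0

% Formula (6), assumed form
[\mathit{beg\_outedge},\ \mathit{end\_outedge}) =
\bigl[\, beg + (end - beg)\,\mathrm{CDF}(\mathit{outedge}_{i-1}),\;
        beg + (end - beg)\,\mathrm{CDF}(\mathit{outedge}_{i}) \,\bigr)
```

For example, with two outgoing edges of probabilities 3/4 and 1/4 and a current interval [1/2, 1), taking the second edge (i = 1) gives [1/2 + (1/2)(3/4), 1/2 + (1/2)(1)) = [7/8, 1). The new interval's width is (end - beg)·Pr(outedge_i), so a more probable edge narrows the interval less and costs fewer bits.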
e) An eighth code, which, when the number of fractional digits of beg or end exceeds the capacity of the register, records the minimum binary value within the current coding interval and enlarges beg and end step by step by the ratio scale until the condition in formula (8) is satisfied, where scale must satisfy condition (7):
where beg and end delimit the current coding interval [beg, end); Bin(x) denotes the number of fractional digits of x; beg_bin and end_bin denote the numbers of fractional digits of beg and end after enlargement; L denotes the capacity of the register; max(x) denotes the maximum number of fractional digits of x; sub(x) denotes all fractional-digit counts of beg_bin or end_bin that satisfy condition x; and the final condition requires that, after the coding interval has been enlarged by scale one or more times, the conditions on beg and end are satisfied simultaneously.
f) A ninth code, which increases the execution probability of outedge_current in the current outgoing-edge probability model file.
g) A tenth code, which decreases, in the current outgoing-edge probability model file, the execution probabilities of all outgoing edges of outedge_start_id other than outedge_current; the execution probabilities are increased and decreased according to formula (9):
where Pr(outedge_i) denotes the execution probability of outedge_i, inc denotes the probability increment, count(outedge) denotes the number of elements in the set outedge, outedge_i denotes the current edge outedge_current, and outedge_j denotes each non-current outgoing edge.
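The exact form of formula (9) is not given in the prose above, so the following Python sketch uses one plausible rule consistent with its definitions (an assumption): the current edge gains `inc`, and each of the other count(outedge) - 1 edges loses inc / (count(outedge) - 1), so the probabilities still sum to 1. The `floor` parameter is also an assumption, added so that no edge's probability reaches zero; `reinforce` is a hypothetical name.

```python
from fractions import Fraction


def reinforce(probs, current, inc, floor=Fraction(1, 64)):
    """Hypothetical reading of formula (9): raise edge `current` by `inc` and
    lower each other edge by inc / (n - 1), keeping the total at 1.
    `inc` is capped so that no other edge drops below `floor`."""
    n = len(probs)
    if n < 2:
        return list(probs)
    # largest amount each non-current edge can still give up
    share_cap = min(p - floor for i, p in enumerate(probs) if i != current)
    step = max(Fraction(0), min(inc, share_cap * (n - 1)))
    share = step / (n - 1)
    return [p + step if i == current else p - share
            for i, p in enumerate(probs)]
```

Whatever rule is used, the instrumented program and the decoder must apply the same updates in the same order; otherwise their coding intervals diverge and decoding fails.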
By increasing the execution probability of outedge_current and decreasing those of the other outgoing edges, the method lets paths with high execution probability be represented at lower coding cost when the Android program executes the encoding, thereby reducing the expected path-encoding cost.
Before the last statement in the termination code blocks of the 'onStop()' and 'onDestroy()' functions of all components in Android Wifi Tether, instrumentation content is generated consisting of an eleventh code, which records the minimum binary value within the current coding interval.
(1.8) All Dalvik bytecode files are instrumented according to the analyzed instrumentation content of all code blocks.
(1.9) The instrumented Dalvik bytecode files are packaged into the Android program package Android Wifi Tether Beta using the Apktool tool.
(2) The packaged Android Wifi Tether Beta is installed on a Nexus 5 phone, and the users participating in the test use it for one week according to their own habits.
(3) The collected program control flow log is read on a computer, and the execution path of Android Wifi Tether Beta is restored as follows:
(3.1) The execution path coding array Encoded_Path of Android Wifi Tether Beta is obtained from the program control flow log file.
(3.2) According to Encoded_Path of Android Wifi Tether Beta, the lifecycle control flow file of the Android program and the control flow files of the user-defined functions, the execution path of the Android program is restored using the arithmetic-coding decoding algorithm, specifically as follows:
(3.2.1) Execution-path restoration of Android Wifi Tether Beta is initialized as follows:
The current coding interval [beg, end) is initialized to [0, 1).
The first element of the execution path coding array Encoded_Path of Android Wifi Tether Beta is taken as the current execution path coding value (the first element is the minimum binary value within the current coding interval, recorded the first time the number of fractional digits of the current coding interval exceeded the capacity of the register);
The 'onCreate()' function of the main Activity, which is executed when the Android program starts, is obtained from the lifecycle control flow file of Android Wifi Tether Beta and taken as the current function.
and (3.2.2) obtaining an initial code block in the current function according to the lifecycle control flow file of the Android Wifi Tether Beta and the control flow file of the user-defined function, and taking the initial code block as the current code block.
(3.2.3) searching whether the current code block contains a code of a calling function (namely, whether a smal keyword 'invoke-') or not, if so, recording the file name ClassPath _ Activity of the code of the calling function and the line number LineNum of the code in the file, finding the file of the called function, adding the ClassPath _ Activity and LineNum of the recorded code of the calling function into all termination code blocks of the called function, taking the called function as the current function, and returning to the execution step (3.2.2); otherwise, executing the step (3.2.4);
(3.2.4) Determine whether the current code block is a code block with only one outgoing edge, a multi-outgoing-edge starting-point code block, or a termination code block of the current function, and perform step (3.2.5):
(3.2.5) If the current code block has only one outgoing edge and this determination is made for the first time, record its outgoing edge in a variable-length array;
If the current code block has only one outgoing edge and this determination is not made for the first time, append its outgoing edge after the most recently recorded outgoing edge in the variable-length array, take the code block that this outgoing edge points to directly as the current code block, and return to step (3.2.3);
If the current code block is a multi-outgoing-edge starting-point code block, perform the following steps:
(3.2.5.1) From the current outgoing-edge probability model file, select the outgoing edge edge_current whose coding subinterval [beg, end) contains the current execution path coding value;
(3.2.5.2) Let outedge_i be edge_current, and update the current coding interval to the coding subinterval containing the current execution path coding value according to formula (6);
(3.2.5.3) If the number of fractional digits of the current coding interval exceeds the register capacity, take the next path coding value from the execution path coding array Encoded_Path as the current execution path coding value. Then enlarge the current coding interval stepwise by the expansion ratio scale of step e) until it satisfies the condition of formula (8). In this way, a path coding value is fetched every time the number of fractional digits of beg or end exceeds the register capacity. This removes the overhead of recording breakpoints that mark when to fetch a path coding value;
(3.2.5.4) Take the code block pointed to by the outgoing edge selected in step (3.2.5.1) as the current code block, and return to step (3.2.3);
If the current code block is the termination code block of the current function, further check whether it records the file name ClassPath_Activity and the line number LineNum of the calling code. If so, locate the calling code from this record, take it as the current code block, and return to step (3.2.4). If not, perform step (3.3).
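The decoding loop of step (3.2) can be sketched in Python for the simplest case. The sketch assumes one function, no `invoke-` calls and a single coding value, so the register-capacity check of step (3.2.5.3) never fires. It is a minimal illustration under those assumptions, not the patented implementation. The following are all hypothetical:

- the dictionary layout of `cfg` and `model`;
- the names `restore_path` and `max_steps`;
- the assumption that formula (6) is the usual cumulative-probability narrowing of arithmetic decoding.

```python
from fractions import Fraction


def restore_path(cfg, model, entry, value, max_steps=10_000):
    """Sketch of step (3.2) for one function without calls or renormalization.

    cfg:   {block: [successor, ...]}; an empty list marks a termination block.
    model: {block: [(successor, probability), ...]} for multi-outgoing-edge
           blocks, listed in the same order the encoder used.
    value: one execution path coding value (a single element of Encoded_Path).
    """
    beg, end = Fraction(0), Fraction(1)            # (3.2.1): [beg, end) = [0, 1)
    block, path = entry, [entry]
    for _ in range(max_steps):
        succs = cfg[block]
        if not succs:                               # termination code block: done
            return path
        if len(succs) == 1:                         # single outgoing edge: nothing to decode
            block = succs[0]
        else:                                       # multi-outgoing-edge starting point
            width, low = end - beg, beg
            for succ, p in model[block]:
                high = low + width * p
                if low <= value < high:             # (3.2.5.1): subinterval holding the value
                    beg, end, block = low, high, succ   # (3.2.5.2): narrow the interval
                    break
                low = high
            else:
                raise ValueError("coding value lies outside the current interval")
        path.append(block)                          # record the block just entered
    raise RuntimeError("no termination block reached within max_steps")
```

Take a toy graph where B0 leads to B1, B1 branches to B2 or B3, and B2 loops back to B1. With equal branch probabilities and the coding value 1/4, the walk visits B0, B1, B2, B1, B3. The first branch decision lands in [0, 1/2) and the second in [1/4, 1/2).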
(3.3) Based on the restored execution path of Android Wifi Tether Beta, count the execution frequency of each outgoing edge of all multi-outgoing-edge starting-point code blocks on the path. From these counts, compute the execution probability of each outgoing edge and update the current outgoing-edge probability model file.
In the present invention, the encoding algorithm is preferably an entropy-coding-based encoding algorithm, and the decoding algorithm is preferably an entropy-coding-based decoding algorithm. Besides the arithmetic coding algorithm of the above embodiment, the encoding algorithm may be Huffman coding, PAP (Profiling All Paths) coding, B.L (Ball and Larus) coding, and so on. Likewise, besides arithmetic decoding, the decoding algorithm may be Huffman decoding, PAP decoding, B.L decoding, and so on.
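Step (3.3) amounts to turning edge counts on the restored path into per-block probabilities. Below is a minimal Python sketch using the same hypothetical model layout as a list of (successor, probability) pairs. The add-one smoothing is an assumption, not from the source. It is included because a zero probability would make an unseen edge impossible to encode on a later run.

```python
from collections import Counter
from fractions import Fraction


def reestimate(model, path):
    """Step (3.3): recompute outgoing-edge probabilities from one restored path.

    model: {block: [(successor, probability), ...]} for multi-outgoing-edge blocks.
    path:  restored sequence of code blocks, e.g. ["B0", "B1", "B2", "B1", "B3"].
    """
    counts = Counter(zip(path, path[1:]))   # execution frequency of every traversed edge
    new_model = {}
    for block, edges in model.items():
        seen = sum(counts[(block, succ)] for succ, _ in edges)
        if seen == 0:
            new_model[block] = list(edges)   # block not executed: keep the old estimate
            continue
        # Add-one smoothing (an assumption, not from the source) keeps every edge's
        # probability above zero, so an edge unseen so far can still be encoded later.
        total = seen + len(edges)
        new_model[block] = [(succ, Fraction(counts[(block, succ)] + 1, total))
                            for succ, _ in edges]
    return new_model
```

Suppose B1 goes to B2 twice and to B3 once. The smoothed estimates become 3/5 and 2/5. A block that never ran keeps its previous probabilities.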
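For comparison with arithmetic coding, the Huffman alternative mentioned above could give each outgoing edge of a multi-outgoing-edge block its own prefix-free bit string. Each branch decision would then emit that string. This is a sketch of standard Huffman code construction for one block's edge probabilities; the function name is hypothetical. Unlike arithmetic coding, it spends a whole number of bits per branch, so it can waste space when one edge is much more likely than the others.

```python
import heapq
from itertools import count


def huffman_codes(probs):
    """probs: {edge: probability} for one block. Returns {edge: bit string}."""
    if len(probs) == 1:
        return {next(iter(probs)): "0"}
    tie = count()  # unique tie-breaker so heapq never has to compare the dicts
    heap = [(p, next(tie), {edge: ""}) for edge, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {e: "0" + code for e, code in c1.items()}
        merged.update({e: "1" + code for e, code in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]
```

With edge probabilities 1/2, 1/4 and 1/4, the three edges receive the codes 0, 10 and 11.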

Claims (12)

1. A method for restoring an execution path of an Android program, characterized by comprising the following steps:
(1) analyzing and instrumenting the Android program, comprising:
1) converting the Android program package into Android virtual machine bytecode files;
2) obtaining the bytecode files of the Android program components from the Android virtual machine bytecode files;
3) partitioning the bytecode files of the Android program components according to the Android lifecycle functions to generate a lifecycle control flow file of the Android program;
4) obtaining the user-defined functions from the lifecycle control flow file of the Android program, and partitioning them according to the bytecode syntax of the Android virtual machine to generate a control flow file of the user-defined functions;
5) obtaining the initial code block of the Android program, the termination code block of the Android program, all multi-outgoing-edge starting-point code blocks and all multi-outgoing-edge end-point code blocks from the lifecycle control flow file of the Android program and the control flow file of the user-defined functions, and obtaining the execution probability of each outgoing edge of all multi-outgoing-edge starting-point code blocks from the current outgoing-edge probability model file;
6) if the current outgoing-edge probability model file is the initial outgoing-edge probability model file, making the probabilities of all outgoing edges of each multi-outgoing-edge starting-point code block in it equal and then executing step 7); otherwise, executing step 7) directly;
7) obtaining, by using an encoding algorithm, the instrumentation content of all multi-outgoing-edge starting-point code blocks, all multi-outgoing-edge end-point code blocks, the initial code block of the Android program and the termination code block of the Android program, as follows:
after the last statement in the initial code block of the Android program, generating instrumentation content that comprises, in order, a first code and a second code, the first code being used to output a signal indicating that the Android program starts to execute, and the second code being used to initialize the current coding interval;
after the last statement of each multi-outgoing-edge starting-point code block in the Android program, generating the same instrumentation content, which contains a third code used to configure the multi-outgoing-edge starting-point code block in which it is located as the starting point of all outgoing edges of that block;
before the first statement of each multi-outgoing-edge end-point code block in the Android program, generating the same instrumentation content, which comprises the following codes in order:
a) a fourth code, used to configure the multi-outgoing-edge end-point code block in which it is located as the end point of all incoming edges of that block;
b) a fifth code, used to set the incoming edge traversed when the Android program reaches the multi-outgoing-edge end-point code block in which the fifth code is located as the current edge of that block;
c) a sixth code, used to retrieve, from the current outgoing-edge probability model file, the execution probability of the current edge of the multi-outgoing-edge end-point code block in which it is located;
d) a seventh code, used to update the current coding interval to the coding subinterval corresponding to the current edge of the multi-outgoing-edge end-point code block in which it is located, the coding subinterval being retrieved from the current outgoing-edge probability model file;
e) an eighth code, used, when the number of fractional digits of the current coding interval exceeds the register capacity, to record the minimum binary value within the current coding interval and to enlarge the current coding interval stepwise in proportion until it satisfies the following formula (1):
In formula (1), beg_bin denotes the number of fractional digits of beg after the current coding interval [beg, end) is enlarged, end_bin denotes the number of fractional digits of end after the current coding interval [beg, end) is enlarged, L denotes the register capacity, max(x) denotes the maximum number of fractional digits of x, and sub(x) denotes all fractional-digit counts of beg_bin or end_bin that satisfy condition x;
f) a ninth code, used to increase, in the current outgoing-edge probability model file, the execution probability of the current edge of the multi-outgoing-edge end-point code block in which it is located;
g) a tenth code, used to decrease, in the current outgoing-edge probability model file, the execution probabilities of all non-current outgoing edges of the starting-point code block of the current edge of the multi-outgoing-edge end-point code block in which it is located;
before the last statement in the termination code block of the Android program, generating instrumentation content that contains an eleventh code, which is used to record the minimum binary value within the current coding interval;
8) instrumenting the Android virtual machine bytecode files according to the instrumentation content of all multi-outgoing-edge starting-point code blocks, all multi-outgoing-edge end-point code blocks, the initial code block of the Android program and the termination code block of the Android program;
9) packaging the instrumented Android virtual machine bytecode files into an Android program package;
(2) installing, by a user, the Android program package;
(3) restoring the execution path of the Android program from the program control flow log file generated while the user uses the Android program, comprising the following steps:
(i) obtaining the execution path coding array of the Android program from the program control flow log file;
(ii) restoring the execution path of the Android program by using a decoding algorithm, according to the execution path coding array of the Android program, the lifecycle control flow file of the Android program and the control flow file of the user-defined functions;
(iii) counting, according to the restored execution path of the Android program, the execution frequency of each outgoing edge of all multi-outgoing-edge starting-point code blocks on the execution path, thereby calculating the execution probability of each outgoing edge and updating the current outgoing-edge probability model file.
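The encoder side of step 7) can be sketched in Python with exact fractions. The correspondences are as follows:

- `initial_model` mirrors step 6);
- `narrow` mirrors the seventh code;
- `adapt` mirrors the ninth and tenth codes;
- `shortest_binary_in` mirrors the eleventh code.

This assumes that "minimum value in binary form" means the binary fraction with the fewest digits inside [beg, end). The sketch omits the register-capacity handling of the eighth code and formula (1). The update rate `delta` is invented for illustration, and all function names are hypothetical.

```python
from fractions import Fraction


def initial_model(multi_edge_blocks):
    """Step 6): each outgoing edge of a multi-outgoing-edge block gets probability 1/k."""
    return {block: [(succ, Fraction(1, len(succs))) for succ in succs]
            for block, succs in multi_edge_blocks.items()}


def narrow(beg, end, edges, taken):
    """Seventh code: shrink [beg, end) to the subinterval of the edge actually taken.
    The edge order must match the order the decoder uses."""
    width, low = end - beg, beg
    for succ, p in edges:
        high = low + width * p
        if succ == taken:
            return low, high
        low = high
    raise KeyError(taken)


def adapt(edges, taken, delta=Fraction(1, 8)):
    """Ninth and tenth codes: raise the taken edge, scale the others down (sum stays 1).
    delta is a hypothetical rate; any such update must be mirrored by the decoder."""
    return [(succ, p + delta * (1 - p) if succ == taken else p * (1 - delta))
            for succ, p in edges]


def shortest_binary_in(beg, end):
    """Eleventh code (assumed reading): the binary fraction with the fewest digits
    in [beg, end); among those, the smallest. Returns (bit string, value)."""
    k = 0
    while True:
        n = -((-beg.numerator * 2 ** k) // beg.denominator)  # ceil(beg * 2**k)
        if Fraction(n, 2 ** k) < end:
            return (format(n, f"0{k}b") if k else ""), Fraction(n, 2 ** k)
        k += 1


def encode_branches(model, taken_edges):
    """Encode (block, successor) branch decisions with a static model, i.e. without
    adapt() and without the register-capacity handling of the eighth code."""
    beg, end = Fraction(0), Fraction(1)  # second code: initialize the coding interval
    for block, succ in taken_edges:
        beg, end = narrow(beg, end, model[block], succ)
    return shortest_binary_in(beg, end)
```

Suppose block B1 has two equally likely outgoing edges and the branch sequence is B1 to B2, then B1 to B3. Encoding it narrows [0, 1) to [1/4, 1/2), so the recorded value is the two-bit fraction 0.01, that is, 1/4.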
2. The method for restoring the execution path of the Android program according to claim 1, characterized in that in step (ii), "restoring the execution path of the Android program by using the decoding algorithm" comprises the following steps:
① initializing the execution path restoration of the Android program, which comprises:
initializing the current coding interval;
taking the first element of the execution path coding array of the Android program as the current execution path coding value;
obtaining, from the lifecycle control flow file of the Android program, the initial function executed when the Android program starts, and taking the initial function as the current function;
② extracting the initial code block of the current function from the lifecycle control flow file of the Android program and the control flow file of the user-defined functions, and taking it as the current code block;
③ checking whether the current code block contains code that calls a function; if so, recording the file name of the calling code and its line number in the file, locating the file of the called function, adding the recorded file name and line number to all termination code blocks of the called function, taking the called function as the current function, and returning to step ②; otherwise, executing step ④;
④ determining whether the current code block is a code block with only one outgoing edge, a multi-outgoing-edge starting-point code block, or a termination code block of the current function, and executing step ⑤:
⑤ if the current code block is a code block with only one outgoing edge and this determination is made for the first time, recording its outgoing edge in a variable-length array;
if the current code block is a code block with only one outgoing edge and this determination is not made for the first time, recording its outgoing edge after the most recently recorded outgoing edge in the variable-length array, taking the code block directly pointed to by the outgoing edge as the current code block, and returning to step ③;
if the current code block is a multi-outgoing-edge starting-point code block, executing the following steps:
i) selecting, from the current outgoing-edge probability model file, the outgoing edge corresponding to the coding subinterval that contains the current execution path coding value;
ii) updating the current coding interval to the coding subinterval that contains the current execution path coding value;
iii) if the number of fractional digits of the current coding interval exceeds the register capacity, taking the next path coding value from the execution path coding array as the current execution path coding value, and enlarging the current coding interval stepwise by the ratio of step e) until it satisfies formula (1);
iv) taking the code block directly pointed to by the outgoing edge selected in step i) as the current code block, and returning to step ③;
if the current code block is the termination code block of the current function, further determining whether it records the file name and line number of the calling code; if so, locating the calling code from the record, taking it as the current code block, and returning to step ④; if not, performing step (iii) of claim 1.
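Step (3.2.3) of the embodiment detects calls through the smali keyword `invoke-` and records (ClassPath_Activity, LineNum) so that the termination block can return to the caller. Below is a minimal Python sketch of both parts. It substitutes a last-in-first-out stack of call sites for the records the method attaches to every termination code block of the callee. Treating the two as equivalent is an assumption; the stack also keeps nested and recursive calls in order. The regular expression covers only the common `invoke-kind {registers}, method` form, and the names `find_call` and `CallSites` are hypothetical.

```python
import re

# Matches e.g. "invoke-virtual {p0, v1}, Lcom/example/Foo;->bar(I)V" and /range forms
INVOKE = re.compile(r"^\s*invoke-[\w/-]+\s*\{[^}]*\},\s*(\S+)")


def find_call(block_lines):
    """Return (LineNum, method_reference) of the first 'invoke-' in a code block, or None.

    block_lines: iterable of (line_number, smali_text) pairs for one code block.
    """
    for line_num, text in block_lines:
        match = INVOKE.match(text)
        if match:
            return line_num, match.group(1)
    return None


class CallSites:
    """Last-in-first-out record of (ClassPath_Activity, LineNum) for calls in progress."""

    def __init__(self):
        self._stack = []

    def enter(self, class_path, line_num):
        # Step (3.2.3): remember where the caller must resume
        self._stack.append((class_path, line_num))

    def leave(self):
        # Termination block reached: return to the caller, or None when the entry
        # function itself has finished (the restoration then moves on to step (3.3))
        return self._stack.pop() if self._stack else None
```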
3. The method for restoring the execution path of the Android program according to claim 1 or 2, characterized in that the "proportion" in "enlarging the current coding interval stepwise in proportion" satisfies the following formula (2):
In formula (2), scale denotes the expansion ratio of the current coding interval; bin(x) denotes the number of fractional digits of the value x; beg_bin denotes the number of fractional digits of beg after the current coding interval is enlarged; end_bin denotes the number of fractional digits of end after the current coding interval is enlarged; L denotes the register capacity; and […] means that, after the coding interval has been enlarged by scale one or more times, […] and […] are satisfied simultaneously.
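The quantity bin(x) used in formulas (1) and (2) can be computed exactly when the interval endpoints are binary fractions. The sketch assumes they are held as Python `Fraction` values whose denominators are powers of two. It also shows the register-overflow test that triggers the eighth code and step (3.2.5.3). It does not attempt formulas (1) or (2), and the names `frac_bits` and `exceeds_register` are hypothetical.

```python
from fractions import Fraction


def frac_bits(x):
    """bin(x): number of binary fractional digits of a binary fraction x (0.011 -> 3)."""
    d = Fraction(x).denominator
    if d & (d - 1):                    # denominator is not a power of two
        raise ValueError("x has no finite binary expansion")
    return d.bit_length() - 1


def exceeds_register(beg, end, L):
    """True when beg or end needs more than L fractional bits, i.e. the condition that
    triggers the eighth code (encoder) and step (3.2.5.3) (decoder)."""
    return max(frac_bits(beg), frac_bits(end)) > L
```

For example, [1/2, 3/4) needs two fractional bits (0.1 and 0.11). It therefore overflows a one-bit register but not a two-bit one.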
4. The method for restoring the execution path of the Android program according to claim 1 or 2, characterized in that: in step 7), the encoding algorithm is an entropy-coding-based encoding algorithm.
5. The method for restoring the execution path of the Android program according to claim 3, characterized in that: in step 7), the encoding algorithm is an entropy-coding-based encoding algorithm.
6. The method for restoring the execution path of the Android program according to claim 4, characterized in that: the encoding algorithm is arithmetic coding or Huffman coding.
7. The method for restoring the execution path of the Android program according to claim 5, characterized in that: the encoding algorithm is arithmetic coding or Huffman coding.
8. The method for restoring the execution path of the Android program according to claim 1, 2, 5, 6 or 7, characterized in that: in step (ii), the decoding algorithm is an entropy-coding-based decoding algorithm.
9. The method for restoring the execution path of the Android program according to claim 3, characterized in that: in step (ii), the decoding algorithm is an entropy-coding-based decoding algorithm.
10. The method for restoring the execution path of the Android program according to claim 4, characterized in that: in step (ii), the decoding algorithm is an entropy-coding-based decoding algorithm.
11. The method for restoring the execution path of the Android program according to claim 8, characterized in that: the decoding algorithm is an arithmetic-coding decoding algorithm or a Huffman-coding decoding algorithm.
12. The method for restoring the execution path of the Android program according to claim 9 or 10, characterized in that: the decoding algorithm is an arithmetic-coding decoding algorithm or a Huffman-coding decoding algorithm.
CN201710062753.9A 2017-01-23 2017-01-23 method for restoring execution path of Android program Active CN106802866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710062753.9A CN106802866B (en) 2017-01-23 2017-01-23 method for restoring execution path of Android program

Publications (2)

Publication Number Publication Date
CN106802866A CN106802866A (en) 2017-06-06
CN106802866B true CN106802866B (en) 2019-12-10

Family

ID=58988466

Country Status (1)

Country Link
CN (1) CN106802866B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183655A (en) * 2015-09-25 2015-12-23 南京大学 Android application program data race detection based on predictability analysis
CN105426298A (en) * 2014-08-25 2016-03-23 腾讯科技(深圳)有限公司 ADB (Android debug bridge)-based software test method and system
CN105677569A (en) * 2016-01-11 2016-06-15 南京理工大学 Automatic Android testing tool based on event processor and testing method
CN105701403A (en) * 2014-11-25 2016-06-22 卓望数码技术(深圳)有限公司 Password processing path identification method of Android application, and device thereby

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10460112B2 (en) * 2014-02-07 2019-10-29 Northwestern University System and method for privacy leakage detection and prevention system without operating system modification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
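"A Survey of Malicious Code Analysis Techniques" (恶意代码分析技术综述); Chen Gonglong (陈共龙); Wireless Internet Technology (《无线互联科技》); 2015-10-15; full text *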

Also Published As

Publication number Publication date
CN106802866A (en) 2017-06-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant