CN113407442B - Pattern-based Python code memory leak detection method - Google Patents



Publication number
CN113407442B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110586274.3A
Other languages
Chinese (zh)
Other versions
CN113407442A (en)
Inventor
陈洁
姜涛
俞东进
胡海洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202110586274.3A
Publication of CN113407442A
Application granted
Publication of CN113407442B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software
    • G06F11/362: Software debugging
    • G06F11/366: Software debugging using diagnostics
    • G06F11/3644: Software debugging by instrumenting at runtime

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a pattern-based Python code memory leak detection method. The method obtains the type information of Python code through type inference and performs memory leak detection with custom patterns to find the cyclic references that cause memory leaks. The detection method is accurate and fast: it can detect memory leaks in the code before the code runs, so that the relevant developers can be notified in time to adopt corresponding solutions. Unlike existing detection methods, which only analyze memory usage while the code runs, the pattern-based Python code memory leak detection method is suited to detection at the coding stage of software development and helps find defective code as early as possible.

Description

Pattern-based Python code memory leak detection method
Technical Field
The invention relates to the technical field of software, and in particular to a pattern-based Python code memory leak detection method.
Background
A memory leak is a common error in software engineering: after a program dynamically allocates memory, the memory is not released once the program has finished using it, so the memory resource stays occupied for a long time. Memory leaks usually show no obvious symptoms early in a project. Leaks occur continuously, and as they accumulate in the system the number of leaked objects grows, which steadily reduces the memory resources available in the system. When the memory resources in the system are exhausted, applications may be suspended temporarily while waiting for memory to be reallocated. If the memory reallocation process takes a long time, it will cause a system crash.
In recent years, programming languages have adopted dynamic memory management mechanisms such as garbage collection to help programmers prevent leak defects. However, garbage collection handles memory leaks by analyzing the program at run time, which requires the program to be instrumented, monitored, or tested. Enough test cases are also needed to ensure that the code causing a memory leak is triggered during testing. On the other hand, researchers also use static analysis methods to detect memory leaks. Static analysis treats memory leak detection as a reachability analysis problem, allowing memory leaks to be detected without actually running the program. However, Python code has no explicit memory allocation and release statements, so statically determining the reachability of objects can be very expensive.
Disclosure of Invention
To overcome the shortcomings of the prior art, a pattern-based Python code memory leak detection method is provided. The invention uses code patterns to solve the problem of detecting memory leaks. The technical scheme adopted by the invention is as follows:
A pattern-based Python code memory leak detection method comprises the following steps:
S1, inputting the source code of a project, traversing all code files in the project, loading each Python code file, and obtaining the abstract syntax tree corresponding to each Python code file by using the ast module in the Python standard library;
S2, performing type inference on the abstract syntax trees obtained in step S1 by using the abstract interpreter technique to obtain a type tree T, wherein the nodes of the type tree represent abstract types and the relationships between the nodes represent dependency relationships;
S3, traversing each instance type in the type tree T obtained in step S2, obtaining all child nodes of that instance type in the type tree, and checking each child node b against the predefined memory leak patterns; if any one of the memory leak patterns is satisfied, recording the cyclic reference causing the memory leak, wherein each cyclic reference is recorded as a node sequence $[v_0, v_1, \ldots, v_n]$, each node $v_i$ in the sequence is a node of the type tree, a reference relationship exists between adjacent nodes, and $v_0 = v_n$.
Preferably, the specific steps of deriving the type tree through type inference in step S2 are as follows:
S21, encapsulating a plurality of abstract types according to the types defined by Python;
S22, for the abstract syntax tree corresponding to each Python code file, performing type inference with the abstract interpreter to derive the type tree T, wherein each node of the type tree represents an abstract type, the module type corresponding to the abstract syntax tree is added to the type tree T as a root node, and the relationships between the nodes represent dependency relationships, that is, a child node is defined inside its parent node;
S23, traversing the type tree T in sequence to collect the nodes of all function types; for each function type, judging whether it was called during type inference, that is, whether at least one call type calls this function type; if so, skipping to the next function type; if not, treating the function type as one that has not yet been analyzed, creating a call type for it with the unknown type as the incoming parameters, and using the abstract interpreter to invoke the newly created call type once.
Preferably, the abstract types encapsulated in step S21 include the following 11 types:
1) module type: mod<id>, where id represents the unique identifier of the module;
2) function type: fun<id>, where id represents the unique identifier of the function;
3) call type: invoke<fun, [τ], τ>, where fun represents the function type being called, [τ] represents the types of the parameters required by the call, and τ represents the type of the return value of the call;
4) class type: cls<id, [cls]>, where id represents the unique identifier of the class and [cls] represents the types of the parent classes of the class;
5) instance type: ins<cls>, where cls represents the class type to which the instance belongs;
6) method type: meth<fun, ins>, where fun represents the function type and ins represents the instance type to which the method belongs;
7) combination type: a set of arbitrary types;
8) dictionary type: dict<τ, τ>, where the two τ represent the types of the keys and the values in the dictionary, respectively;
9) list type: list<τ>, where τ represents the type of the elements in the list;
10) tuple type: tuple<τ>, where τ represents the type of the elements in the tuple;
11) set type: set<τ>, where τ represents the type of the elements in the set.
Preferably, the specific steps of checking whether each child node b exhibits a memory leak by using the predefined memory leak patterns in step S3 are as follows:
S31, pattern 1 is predefined as a memory leak caused by self-reference; judging whether the child node b satisfies pattern 1: if b is the same as the instance type being traversed, pattern 1 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image];
S32, pattern 2 is predefined as a memory leak caused by a cyclic reference between an instance and a container; judging whether the child node b satisfies pattern 2: if b strictly contains the instance type being traversed, pattern 2 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image];
S33, pattern 3 is predefined as a memory leak caused by a cyclic reference between an instance and a method; whether the child node b satisfies pattern 3 is judged by steps S331 and S332:
S331, if the child node b belongs to the method type, checking the instance type b.ins to which b belongs; if b.ins is the same as the instance type being traversed, pattern 3 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image];
S332, if the child node b belongs to the combination type, checking each type t contained in b; if there is a t of the method type whose instance type t.ins is the same as the instance type being traversed, pattern 3 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image];
S34, pattern 4 is predefined as a memory leak caused by a cyclic reference between two instances; whether the child node b satisfies pattern 4 is judged by steps S341 and S342:
S341, if the child node b belongs to the instance type and b is not the same as the instance type being traversed, checking each child node of b; if there is a child node c that is of the instance type and is equivalent to the instance type being traversed, pattern 4 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image];
S342, if the child node b belongs to the combination type, checking each type t contained in b; if there is a t that belongs to the instance type, further checking each child node c of t; if there is a child node c that is of the instance type and is equivalent to the instance type being traversed, pattern 4 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image];
S35, pattern 5 is predefined as a memory leak caused by a cyclic reference among two instances and a container; whether the child node b satisfies pattern 5 is judged by steps S351 and S352:
S351, if the child node b belongs to the instance type and b is not the same as the instance type being traversed, further checking each child node of b; if there is a child node c that is of the instance type and contains the instance type being traversed, pattern 5 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image];
S352, if the child node b belongs to the combination type, checking each type t contained in b; if there is a t that belongs to the instance type, further checking each child node c of t; if there is a child node c that is of the instance type and contains the instance type being traversed, pattern 5 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image];
S36, pattern 6 is predefined as a memory leak caused by a cyclic reference among two instances and a method; whether the child node b satisfies pattern 6 is judged as follows:
if the child node b belongs to the instance type, further checking each child node c of b; if there is a child node c that is of the method type and whose instance type c.ins is the same as the instance type being traversed, pattern 6 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image].
Preferably, in step S32, for any two types x and y, the strict inclusion relationship between x and y is determined as follows:
1) if x belongs to the dictionary type, check whether the key or the value of x is equivalent to y; if so, type x is considered to contain type y;
2) if x belongs to the set type, list type or tuple type, check whether y belongs to x; if so, type x is considered to contain type y;
3) if x belongs to the combination type, check whether some type t in x contains type y; if so, type x is considered to contain type y.
Preferably, in step S34, for any two types x and y, the equivalence relationship between x and y is determined as follows:
1) if x does not belong to the combination type, judge whether x and y are the same; if so, x and y are considered equivalent;
2) if x belongs to the combination type, check whether some type t in x is equivalent to y; if so, x and y are considered equivalent.
Preferably, in step S35, for any two types x and y, the inclusion relationship between x and y is determined as follows:
1) if x belongs to the dictionary type, check whether the key or the value of x is the same as y; if so, type x is considered to contain type y;
2) if x belongs to the set type, list type or tuple type, check whether y belongs to x; if so, type x is considered to contain type y;
3) if x belongs to the combination type, check whether some type t in x contains type y; if so, type x is considered to contain type y.
The invention uses an abstract interpreter for type inference and, based on the resulting type tree, applies a pattern-based detection method to detect memory leaks in the code. The invention has the following benefits: 1. type inference supplies the type information of the code, which makes memory leak detection applicable to dynamic languages such as Python; 2. the pattern-based memory leak detection method is highly accurate and fast.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
On the contrary, the invention is intended to cover alternatives, modifications and equivalents which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.
As shown in FIG. 1, the pattern-based Python code memory leak detection method of the present invention includes the following steps:
S1, code extraction: inputting the source code of a project, traversing all code files in the project, loading each Python code file, and obtaining the abstract syntax tree corresponding to each Python code file by using the corresponding function of the ast module in the Python standard library.
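Step S1 can be illustrated with a minimal sketch. It assumes that "all code files" means every `.py` file under a project root and that files which fail to parse are simply skipped; neither detail is stated in the text, and the helper name `load_asts` is hypothetical.

```python
import ast
from pathlib import Path


def load_asts(project_root):
    """Map each Python file under project_root to its abstract syntax tree.

    Hypothetical helper: the patent only states that the standard-library
    ast module is used; the file discovery and error handling are assumptions.
    """
    trees = {}
    for path in sorted(Path(project_root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        try:
            trees[str(path)] = ast.parse(source, filename=str(path))
        except SyntaxError:
            # Assumption: unparsable files are skipped rather than aborting.
            continue
    return trees
```

Each value in the returned dictionary is an `ast.Module` node, which would be the input to the type inference of step S2.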
S2, type analysis: performing type inference on the abstract syntax trees obtained in step S1 by using the abstract interpreter technique to obtain a type tree T, wherein the nodes of the type tree represent abstract types and the relationships between the nodes represent dependency relationships.
In the present embodiment, the specific steps of performing type inference on the abstract syntax trees obtained in step S1 with the abstract interpreter technique to obtain the type tree are as follows:
S21, encapsulating a plurality of abstract types according to the types defined by Python;
In this embodiment, the encapsulated abstract types include the following 11 types:
1) module type: mod<id>, where id represents the unique identifier of the module;
2) function type: fun<id>, where id represents the unique identifier of the function;
3) call type: invoke<fun, [τ], τ>, where fun represents the function type being called, [τ] represents the types of the parameters required by the call, and τ represents the type of the return value of the call;
4) class type: cls<id, [cls]>, where id represents the unique identifier of the class and [cls] represents the types of the parent classes of the class;
5) instance type: ins<cls>, where cls represents the class type to which the instance belongs;
6) method type: meth<fun, ins>, where fun represents the function type and ins represents the instance type to which the method belongs;
7) combination type: a set of arbitrary types, i.e., a collection of multiple abstract types;
8) dictionary type: dict<τ, τ>, where the two τ represent the types of the keys and the values in the dictionary, respectively;
9) list type: list<τ>, where τ represents the type of the elements in the list;
10) tuple type: tuple<τ>, where τ represents the type of the elements in the tuple;
11) set type: set<τ>, where τ represents the type of the elements in the set.
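The 11 abstract types listed above could be modelled as small immutable records. The sketch below is only one possible encoding: the class and field names (`Comb`, `DictT`, `bases`, `members`, and so on) are illustrative choices, not names taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Mod:            # mod<id>
    id: str


@dataclass(frozen=True)
class Fun:            # fun<id>
    id: str


@dataclass(frozen=True)
class Cls:            # cls<id, [cls]>
    id: str
    bases: Tuple[Cls, ...] = ()


@dataclass(frozen=True)
class Ins:            # ins<cls>
    cls: Cls


@dataclass(frozen=True)
class Meth:           # meth<fun, ins>
    fun: Fun
    ins: Ins


@dataclass(frozen=True)
class Invoke:         # invoke<fun, [τ], τ>
    fun: Fun
    args: Tuple[object, ...]
    ret: object


@dataclass(frozen=True)
class Comb:           # combination type: a set of arbitrary types
    members: FrozenSet[object]


@dataclass(frozen=True)
class DictT:          # dict<τ, τ>
    key: object
    value: object


@dataclass(frozen=True)
class ListT:          # list<τ>
    elem: object


@dataclass(frozen=True)
class TupleT:         # tuple<τ>
    elem: object


@dataclass(frozen=True)
class SetT:           # set<τ>
    elem: object
```

Frozen dataclasses compare by value and are hashable, which makes the "same type" checks used by the patterns a plain `==` and lets types be stored in a `frozenset` for the combination type.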
S22, for the abstract syntax tree corresponding to each Python code file, performing type inference with the abstract interpreter to derive the type tree T. Each node of the type tree represents an abstract type, the relationships between the nodes represent dependency relationships, that is, a child node is defined inside its parent node, and the module type corresponding to the abstract syntax tree is added to the type tree T as a root node.
S23, traversing the type tree T in sequence to collect the nodes of all function types; for each of these function types, judging whether it was called during type inference, that is, whether at least one call type calls this function type; if so, skipping to the next function type; if not, treating the function type as one that has not yet been analyzed, creating a call type for it with the unknown type as the incoming parameters, and using the abstract interpreter to invoke the newly created call type once.
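The bookkeeping part of step S23 can be sketched as follows. The abstract interpreter itself is out of scope here; the sketch only finds function types that no call type refers to and builds a call type with unknown arguments for each. The `UNKNOWN` marker, the `arity` mapping and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

UNKNOWN = "unknown"  # illustrative stand-in for the unknown type


@dataclass(frozen=True)
class Fun:
    id: str


@dataclass(frozen=True)
class Invoke:
    fun: Fun
    args: Tuple[object, ...]


def invocations_for_uncalled(functions, invokes, arity):
    """Create one call type per function type that was never called.

    functions: all function types collected from the type tree
    invokes:   call types produced during type inference
    arity:     assumed mapping from function id to its parameter count
    """
    called = {inv.fun for inv in invokes}
    created = []
    for fun in functions:
        if fun in called:
            continue  # already analyzed through an existing call
        unknown_args = (UNKNOWN,) * arity.get(fun.id, 0)
        created.append(Invoke(fun, unknown_args))
    return created
```

In a full implementation each returned call type would then be handed to the abstract interpreter once, so that the bodies of functions never called in the project are still analyzed.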
S3, memory leak detection: traversing each instance type in the type tree T obtained in step S2, obtaining all child nodes of that instance type in the type tree, and checking each child node b against the predefined memory leak patterns; if any one of the memory leak patterns is satisfied, recording the cyclic reference causing the memory leak, wherein each cyclic reference is recorded as a node sequence $[v_0, v_1, \ldots, v_n]$, each node $v_i$ in the sequence is a node of the type tree, a reference relationship exists between adjacent nodes, and $v_0 = v_n$.
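To see at run time why such cyclic references matter, the following sketch builds two small cycles and checks whether the objects outlive their last external reference. It is an illustration only, not part of the patented method, and it relies on CPython behaviour: reference counting alone cannot free objects in a cycle, so they survive until the cycle collector runs. The class names and the helper `survives_del` are hypothetical.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.me = self            # the instance refers to itself


class Holder:
    def __init__(self):
        self.items = [self]       # the instance and its list refer to each other


def survives_del(factory):
    """Return True if the object is still alive after its last name is deleted.

    The cycle collector is disabled during the check so that only reference
    counting is in effect (a CPython-specific assumption).
    """
    gc.collect()
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        obj = factory()
        ref = weakref.ref(obj)
        del obj
        alive = ref() is not None
    finally:
        gc.collect()              # clean up the cycle created above
        if was_enabled:
            gc.enable()
    return alive
```

On CPython, `survives_del(Node)` and `survives_del(Holder)` would be expected to report that the objects are still alive, while an ordinary class without such a cycle is freed immediately.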
In this embodiment, the specific steps of checking whether each child node b exhibits a memory leak by using the predefined memory leak patterns are as follows:
S31, pattern 1 is predefined as a memory leak caused by self-reference; judging whether the child node b satisfies pattern 1: if b is the same as the instance type being traversed, pattern 1 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image].
S32, pattern 2 is predefined as a memory leak caused by a cyclic reference between an instance and a container; judging whether the child node b satisfies pattern 2: if b strictly contains the instance type being traversed, pattern 2 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image].
When judging whether b strictly contains the instance type being traversed, a rule for the strict inclusion relationship between the two must be set. Since both represent abstract types, in this embodiment the strict inclusion relationship between any two abstract types x and y is determined as follows:
1) if x belongs to the dictionary type, check whether the key or the value of x is equivalent to y; if so, type x is considered to contain type y;
2) if x belongs to the set type, list type or tuple type, check whether y belongs to x; if so, type x is considered to contain type y;
3) if x belongs to the combination type, check whether some type t in x contains type y; if so, type x is considered to contain type y.
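The three strict inclusion rules can be sketched as a recursive function. The type classes below are simplified stand-ins, and two points are assumptions: rule 2 ("y belongs to x") is read as "the element type of x is equivalent to y", and the equivalence test follows the rule given for pattern 4 (a combination type is equivalent to y if one of its members is).

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Ins:
    cls_id: str


@dataclass(frozen=True)
class Comb:               # combination type
    members: FrozenSet[object]


@dataclass(frozen=True)
class DictT:
    key: object
    value: object


@dataclass(frozen=True)
class SeqT:               # stands for list<τ>, tuple<τ> or set<τ>
    kind: str
    elem: object


def equivalent(x, y):
    """Equivalence rule: identity, or membership when x is a combination."""
    if isinstance(x, Comb):
        return any(equivalent(t, y) for t in x.members)
    return x == y


def strictly_contains(x, y):
    if isinstance(x, DictT):                      # rule 1
        return equivalent(x.key, y) or equivalent(x.value, y)
    if isinstance(x, SeqT):                       # rule 2 (assumed reading)
        return equivalent(x.elem, y)
    if isinstance(x, Comb):                       # rule 3
        return any(strictly_contains(t, y) for t in x.members)
    return False
```

For example, a child node of type `SeqT("list", Ins("A"))` found under the instance type `Ins("A")` would satisfy pattern 2, because the list type strictly contains the instance type.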
S33, pattern 3 is predefined as a memory leak caused by a cyclic reference between an instance and a method; whether the child node b satisfies pattern 3 is judged by steps S331 and S332:
S331, if the child node b belongs to the method type, checking the instance type b.ins to which b belongs; if b.ins is the same as the instance type being traversed, pattern 3 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image].
S332, if the child node b belongs to the combination type, checking each type t contained in b; if there is a t of the method type whose instance type t.ins is the same as the instance type being traversed, pattern 3 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image].
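A concrete Python situation that pattern 3 targets is an instance that stores one of its own bound methods, since a bound method keeps a reference to its instance in `__self__`. The sketch below is a run-time illustration with hypothetical names, not the static check of the patent.

```python
class Button:
    def __init__(self):
        # The bound method refers back to self, so instance and method
        # form a cycle: self -> self.handler -> self.
        self.handler = self.on_click

    def on_click(self):
        return "clicked"


def stores_own_bound_method(obj, attr):
    """Return True if obj stores, under attr, a method bound to obj itself."""
    stored = vars(obj).get(attr)          # only attributes kept on the instance
    return getattr(stored, "__self__", None) is obj
```

Here `stores_own_bound_method(Button(), "handler")` holds, whereas looking up `"on_click"` does not, because `on_click` lives on the class and a bound method is only created on access.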
S34, pattern 4 is predefined as a memory leak caused by a cyclic reference between two instances; whether the child node b satisfies pattern 4 is judged by steps S341 and S342:
S341, if the child node b belongs to the instance type and b is not the same as the instance type being traversed, checking each child node of b; if there is a child node c that is of the instance type and is equivalent to the instance type being traversed, pattern 4 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image].
S342, if the child node b belongs to the combination type, checking each type t contained in b; if there is a t that belongs to the instance type, further checking each child node c of t; if there is a child node c that is of the instance type and is equivalent to the instance type being traversed, pattern 4 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image].
When judging whether c is equivalent to the instance type being traversed, a rule for the equivalence relationship between the two must be set. Since both represent abstract types, in this embodiment the equivalence relationship between any two abstract types x and y is determined as follows:
1) if x does not belong to the combination type, judge whether x and y are the same; if so, x and y are considered equivalent;
2) if x belongs to the combination type, check whether some type t in x is equivalent to y; if so, x and y are considered equivalent.
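The equivalence rule and the pattern 4 check (S341 and S342) can be sketched together. The type tree is represented here simply as a dictionary from a node to its list of child nodes, and the recorded sequence `[a, b, a]` is an assumption derived from the requirement that a recorded sequence starts and ends at the same node; the exact sequence in the patent is given only as a formula image.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Ins:
    cls_id: str


@dataclass(frozen=True)
class Comb:               # combination type
    members: FrozenSet[object]


def equivalent(x, y):
    if isinstance(x, Comb):                       # rule 2
        return any(equivalent(t, y) for t in x.members)
    return x == y                                 # rule 1


def check_pattern4(a, children):
    """Return the cyclic references of pattern 4 found under instance type a.

    children: assumed mapping from a type-tree node to its child nodes.
    """
    cycles = []
    for b in children.get(a, []):
        if isinstance(b, Ins) and b != a:         # S341
            candidates = [b]
        elif isinstance(b, Comb):                 # S342
            candidates = [t for t in b.members if isinstance(t, Ins)]
        else:
            candidates = []
        for inst in candidates:
            if any(isinstance(c, Ins) and equivalent(c, a)
                   for c in children.get(inst, [])):
                cycles.append([a, inst, a])       # assumed sequence shape
    return cycles
```

With `children = {Ins("A"): [Ins("B")], Ins("B"): [Ins("A")]}`, the call `check_pattern4(Ins("A"), children)` reports one cycle between the two instance types.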
S35, pattern 5 is predefined as a memory leak caused by a cyclic reference among two instances and a container; whether the child node b satisfies pattern 5 is judged by steps S351 and S352:
S351, if the child node b belongs to the instance type and b is not the same as the instance type being traversed, further checking each child node of b; if there is a child node c that is of the instance type and contains the instance type being traversed, pattern 5 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image].
S352, if the child node b belongs to the combination type, checking each type t contained in b; if there is a t that belongs to the instance type, further checking each child node c of t; if there is a child node c that is of the instance type and contains the instance type being traversed, pattern 5 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image].
When judging whether c contains the instance type being traversed, a rule for the inclusion relationship between the two must be set. Since both represent abstract types, in this embodiment the inclusion relationship between any two types x and y is determined as follows:
1) if x belongs to the dictionary type, check whether the key or the value of x is the same as y; if so, type x is considered to contain type y;
2) if x belongs to the set type, list type or tuple type, check whether y belongs to x; if so, type x is considered to contain type y;
3) if x belongs to the combination type, check whether some type t in x contains type y; if so, type x is considered to contain type y.
S36, pattern 6 is predefined as a memory leak caused by a cyclic reference among two instances and a method; whether the child node b satisfies pattern 6 is judged as follows:
if the child node b belongs to the instance type, further checking each child node c of b; if there is a child node c that is of the method type and whose instance type c.ins is the same as the instance type being traversed, pattern 6 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [formula image].
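Pattern 6 (two instances and a method) can be sketched in the same style. As before, the type tree is assumed to be a dictionary from a node to its child nodes, the class and function names are illustrative, and the recorded sequence `[a, b, c, a]` is an assumed shape that merely respects the rule that a sequence starts and ends at the same node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ins:
    cls_id: str


@dataclass(frozen=True)
class Meth:
    fun_id: str
    ins: Ins              # instance type the method is bound to


def check_pattern6(a, children):
    """Return the cyclic references of pattern 6 found under instance type a.

    children: assumed mapping from a type-tree node to its child nodes.
    """
    cycles = []
    for b in children.get(a, []):
        if not isinstance(b, Ins):
            continue
        for c in children.get(b, []):
            # c is a method stored on b but bound to the instance type a.
            if isinstance(c, Meth) and c.ins == a:
                cycles.append([a, b, c, a])       # assumed sequence shape
    return cycles
```

A typical source situation would be a window object that owns a button, while the button stores one of the window's bound methods as its callback.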
Steps S31-S36 above define six different memory leak patterns. A child node b is regarded as having a memory leak as soon as it satisfies any one of the patterns, and as having no memory leak if it satisfies none of them. Through steps S1-S3, memory leak detection of Python code can be realized: the cyclic references causing memory leaks are located, and the set of all cyclic references causing memory leaks is finally obtained. The method is suited to detection at the coding stage of software development, can effectively detect memory leaks in the code before the code runs, and allows the relevant developers to be notified in time to adopt corresponding solutions.
The above-mentioned steps S1-S3 are applied to an embodiment to show the technical effects thereof.
Examples
The steps of this embodiment are the same as those of the specific embodiment, and are not described herein again. The following shows some of the implementation processes and implementation results:
Data source acquisition: the code used in this embodiment is the source code of 4 real open source projects obtained from the GitHub open source community. Since the projects do not record information about real memory leaks, the test cases of each project must be run to collect the memory leak information produced at run time; the relevant statistics of each project are shown in Table 1. The collected memory leaks are then manually classified into the 6 patterns according to the technical scheme of the invention.
Result verification: in this embodiment, the set of cyclic references causing memory leaks that was collected while the code actually ran is compared with the set of cyclic references that the technical scheme of the invention detects in the project source code as possibly causing memory leaks, so as to evaluate the validity of the scheme. To verify the technical effect of the technical scheme of the invention, four indexes are selected to measure the detection performance:
SP is the number of cyclic references which are found by the technical scheme of the invention and exist in actual operation.
SN is the number of cyclic references found by the technical scheme of the invention and not existing in actual operation.
DP is the number of cyclic references that exist in actual operation and are found by the technical solution of the present invention.
DN is the number of cyclic references which exist in actual operation and cannot be found by the technical scheme of the invention.
In order to calculate the above four indexes, it is necessary to determine a cyclic reference S that may cause memory leakage and is found by the technical solution of the present inventioniWhether to compare with a cyclic reference D collected in actual operation and causing memory leakageiThe same is true. Assume that the present embodiment sets the cyclic references that cause actual memory leakage collected during the actual code execution process into D ═ D1,D2,...,Dn}. The technical scheme of the invention is applied to the cyclic reference set which is detected by the project source code and possibly causes memory leakage, and the cyclic reference set is S ═ S1,S2,...,Sm}. Suppose Si=[s1,s2,...,sk]And Di=[d1,d2,...,dk]Are respectively cyclic references containing k types and belonging to the same mode, if a constant l (0 ≦ l)<k),si=di+lThen, determine SiAnd DiThe same is true. Wherein s isi=djThe judgment method is as follows:
(1) $s_i$ and $d_j$ are the same;
(2) if $d_j$ and $s_i$ are the same, then $d_j$ and $s_i$ are the same subtype of
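The comparison of two cyclic references described above can be sketched in Python as a rotation test. This is only an illustration under stated assumptions: the function names `same_type` and `same_cycle` are hypothetical, the index $i + l$ is assumed to wrap around modulo $k$, and the element comparison defaults to plain equality, which covers only condition (1).

```python
def same_type(s, d):
    """Element comparison. Assumption: plain equality, i.e. condition (1) only."""
    return s == d


def same_cycle(S, D, eq=same_type):
    """Return True if S equals D up to a rotation by some offset l, 0 <= l < k.

    Assumption: the index i + l wraps around modulo k, since both
    sequences describe a cycle.
    """
    k = len(S)
    if k == 0 or k != len(D):
        return False
    return any(
        all(eq(S[i], D[(i + l) % k]) for i in range(k))
        for l in range(k)
    )


# Hypothetical type names, used only for illustration.
print(same_cycle(["A", "B", "C"], ["C", "A", "B"]))  # rotation of the same cycle
print(same_cycle(["A", "B", "C"], ["A", "C", "B"]))  # different order, not a rotation
```

A different element comparison, for example one that also accepts subtypes, can be passed through the `eq` parameter without changing the rotation logic.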
Table 2 compares the memory leaks detected by the technical scheme of the invention on the 4 data sets with the memory leaks collected from actual code runs. As the table shows, DN is 0 for all 4 projects, which demonstrates the effectiveness of the pattern-based memory leak detection method provided by the invention: in these 4 projects, the technical scheme of the invention finds every memory leak that occurs in actual operation. In addition, for the three projects boto, djblets and libNeuroML, the method of the invention finds 3, 3 and 6 cyclic references, respectively, that do not appear in the actual code runs; manual verification showed that these 12 memory leaks, although not collected in actual operation, really exist. This is because the memory leaks collected by running the code depend on the test cases, and it is difficult to collect all memory leaks when the test cases are insufficient. The technical scheme of the invention is a static analysis and does not depend on test cases. Overall, SP is larger than DP, which shows that the method has high accuracy: besides the memory leaks collected in actual operation, it can also find memory leaks that cannot be collected in actual operation because the test cases do not cover them.
Table 1. Statistics of the memory leak data sets obtained by actually running the 4 real projects
(Table 1 is provided as an image: Figure GDA0003452615200000101)
Table 2. Comparison of the detection results of the method of the invention with the results collected from actual code runs
Project      SP    SN   DP    DN
boto         315   3    274   0
djblets      26    3    25    0
libNeuroML   256   6    63    0
pymtl        56    0    37    0
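As a rough illustration of how the four indexes can be counted from a detected set S and a collected set D, consider the following Python sketch. The function name `count_indexes`, the `match` parameter and the sample data are hypothetical, and exact list equality stands in for the comparison of cyclic references; a rotation-aware comparison could be passed instead.

```python
def count_indexes(S, D, match=lambda s, d: s == d):
    """Count SP, SN, DP and DN for detected set S and collected set D.

    `match` decides whether a detected and a collected cyclic reference
    are the same. Assumption: exact list equality by default.
    """
    SP = sum(1 for s in S if any(match(s, d) for d in D))  # detected and observed
    SN = len(S) - SP                                       # detected, not observed
    DP = sum(1 for d in D if any(match(s, d) for s in S))  # observed and detected
    DN = len(D) - DP                                       # observed, not detected
    return {"SP": SP, "SN": SN, "DP": DP, "DN": DN}


# Made-up data: two detected cyclic references, one of which was also
# collected at run time.
detected = [["A", "B"], ["C"]]
collected = [["A", "B"]]
print(count_indexes(detected, collected))
```

With this made-up data the detected reference `["C"]` counts toward SN, which mirrors the situation in Table 2 where static analysis reports references that the test cases never exercised.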
The above-described embodiments are merely preferred embodiments of the present invention and should not be construed as limiting the invention. Various changes and modifications may be made by one of ordinary skill in the relevant art without departing from the spirit and scope of the present invention. Therefore, any technical scheme obtained by equivalent replacement or equivalent transformation falls within the protection scope of the invention.

Claims (6)

1. A pattern-based method for detecting memory leaks in Python code, characterized by comprising the following steps:
S1, inputting the source code of a project, traversing all code files in the project, loading each Python code file, and using the ast module in the Python standard library to obtain the abstract syntax tree corresponding to each Python code file;
S2, using abstract interpreter technology, performing type inference on each abstract syntax tree obtained in step S1 to obtain a type tree T, wherein the nodes in the type tree T represent abstract types and the relationships among the nodes represent dependency relationships;
S3, traversing the type tree T obtained in step S2 to obtain all child nodes of each instance type i in the type tree, then checking whether each child node b exhibits a memory leak using predefined memory leak patterns; if one memory leak pattern is satisfied, recording the cyclic reference causing the memory leak, each cyclic reference being recorded as a node sequence $[v_0, v_1, \ldots, v_n]$, wherein each node $v_i$ in the node sequence is a node in the type tree, a reference relationship exists between adjacent nodes, and $v_0 = v_n$;
the specific steps of deriving the type tree by type inference in step S2 are as follows:
S21, encapsulating a plurality of abstract types according to the types defined by Python;
S22, for the abstract syntax tree corresponding to each Python code file, performing type inference using the abstract interpreter to derive the type tree T, wherein each node in the type tree T represents an abstract type, the module type corresponding to the abstract syntax tree is added to the type tree T as a root node, and the relationships among the nodes represent dependency relationships, that is, the definition of a child node lies within its parent node;
S23, traversing the nodes of each function type in the type tree T in sequence to obtain all function types; for each function type, judging whether it has been called during type inference, that is, whether at least one call type calls this function type; if so, skipping to the judgment of the next function type; if not, regarding the function type as a new function type, creating a call type with the unknown type as the incoming parameters, and using the abstract interpreter to invoke the newly created call type once.
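Step S1 and the idea behind step S23 can be loosely illustrated with the standard `ast` module. The sketch below is a purely syntactic simplification and not the claimed type inference: it parses a source string and lists the functions that are never called by name, whereas the method above decides this from the call types produced by the abstract interpreter. The function name `uncalled_functions` and the sample source are hypothetical.

```python
import ast

# Hypothetical sample module, used only for illustration.
SOURCE = '''
def used(x):
    return x

def unused(y):
    return y

used(1)
'''


def uncalled_functions(source):
    """Return the names of functions defined in `source` but never called by name.

    Simplification: only direct calls of the form name(...) are recognised.
    """
    tree = ast.parse(source)  # S1: build the abstract syntax tree
    defined = [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    return sorted(name for name in defined if name not in called)


print(uncalled_functions(SOURCE))
```

In the claimed method, a function found to be uncalled is then invoked once with unknown-typed parameters so that its body is still analysed; the sketch stops at identifying such functions.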
2. The method according to claim 1, wherein the abstract types encapsulated in step S21 include the following 11 types:
1) module type: mod<id>, where id represents the unique identifier of the module;
2) function type: fun<id>, where id represents the unique identifier of the function;
3) call type: invoke<fun, [τ], τ>, where fun represents the function type being called, [τ] represents the types of the parameters required by the call, and τ represents the type of the return value of the call;
4) class type: cls<id, [cls]>, where id represents the unique identifier of the class and [cls] represents the types of the parent classes of the class;
5) instance type: ins<cls>, where cls represents the class type to which the instance belongs;
6) method type: meth<fun, ins>, where fun represents the function type and ins represents the instance type to which the method belongs;
7) combination type: a collection of arbitrary types;
8) dictionary type: dict<τ, τ>, where the two τ represent the types of the keys and of the values in the dictionary, respectively;
9) list type: list<τ>, where τ represents the type of the elements in the list;
10) tuple type: tuple<τ>, where τ represents the type of the elements in the tuple;
11) set type: set<τ>, where τ represents the type of the elements in the set.
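One possible way to encode these 11 abstract types is as immutable dataclasses. The following is a minimal sketch, not the encoding used by the invention; all class and field names (`ModType`, `CombType`, `elem` and so on) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import FrozenSet, Tuple


class AbstractType:
    """Common base class for the abstract types (hypothetical name)."""


@dataclass(frozen=True)
class ModType(AbstractType):       # mod<id>
    id: str


@dataclass(frozen=True)
class FunType(AbstractType):       # fun<id>
    id: str


@dataclass(frozen=True)
class InvokeType(AbstractType):    # invoke<fun, [τ], τ>
    fun: FunType
    params: Tuple[AbstractType, ...]
    ret: AbstractType


@dataclass(frozen=True)
class ClsType(AbstractType):       # cls<id, [cls]>
    id: str
    bases: Tuple[ClsType, ...] = ()


@dataclass(frozen=True)
class InsType(AbstractType):       # ins<cls>
    cls: ClsType


@dataclass(frozen=True)
class MethType(AbstractType):      # meth<fun, ins>
    fun: FunType
    ins: InsType


@dataclass(frozen=True)
class CombType(AbstractType):      # combination type: a collection of types
    types: FrozenSet[AbstractType]


@dataclass(frozen=True)
class DictType(AbstractType):      # dict<τ, τ>
    key: AbstractType
    value: AbstractType


@dataclass(frozen=True)
class ListType(AbstractType):      # list<τ>
    elem: AbstractType


@dataclass(frozen=True)
class TupleType(AbstractType):     # tuple<τ>
    elem: AbstractType


@dataclass(frozen=True)
class SetType(AbstractType):       # set<τ>
    elem: AbstractType


# Example: an instance of a class Node, a list of such instances,
# and a method bound to that instance.
node = InsType(ClsType("Node"))
node_list = ListType(node)
callback = MethType(FunType("Node.on_event"), node)
```

Frozen dataclasses compare by value and are hashable, so two structurally identical types count as "the same" and can be stored in the frozen set of a combination type.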
3. The method according to claim 1, wherein the specific steps of checking whether each child node b exhibits a memory leak using the predefined memory leak patterns in step S3 are as follows:
S31, predefining pattern 1 as a memory leak caused by self-reference, and judging whether the child node b satisfies pattern 1: if b is the same as i, pattern 1 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [i];
S32, predefining pattern 2 as a memory leak caused by a cyclic reference between an instance and a container, and judging whether the child node b satisfies pattern 2: if b strictly contains i, pattern 2 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [b, i];
S33, predefining pattern 3 as a memory leak caused by a cyclic reference between an instance and a method, and judging whether pattern 3 is satisfied, the judging steps being S331 and S332:
S331, if the child node b belongs to the method type, checking the instance type b.ins to which it belongs; if b.ins is the same as i, pattern 3 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [b.ins, i];
S332, if the child node b belongs to the combination type, checking each type t contained in b; if there is a t that is a method type and the instance type t.ins to which it belongs is the same as i, pattern 3 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [t.ins, i];
S34, predefining pattern 4 as a memory leak caused by a cyclic reference between two instances, and judging whether the child node b satisfies pattern 4, the judging steps being S341 and S342:
S341, if the child node b belongs to the instance type and b is different from i, checking each child node of b; if there is a child node c that is an instance type and is equivalent to i, pattern 4 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [c, i];
S342, if the child node b belongs to the combination type, checking each type t contained in b; if a t belongs to the instance type, further checking each child node c of t; if there is a child node c that is an instance type and is equivalent to i, pattern 4 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [c, i];
S35, predefining pattern 5 as a memory leak caused by a cyclic reference between two instances and a container, and judging whether the child node b satisfies pattern 5, the judging steps being S351 and S352:
S351, if the child node b belongs to the instance type and b is different from i, further checking each child node of b; if there is a child node c that is an instance type and contains i, pattern 5 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [c, i];
S352, if the child node b belongs to the combination type, checking each type t contained in b; if a t belongs to the instance type, further checking each child node c of t; if there is a child node c that is an instance type and contains i, pattern 5 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [c, i];
S36, predefining pattern 6 as a memory leak caused by a cyclic reference between two instances and a method, and judging whether the child node b satisfies pattern 6, the judging step being as follows:
if the child node b belongs to the instance type, further checking each child node c of b; if there is a child node c that is a method type and the instance type c.ins to which it belongs is the same as i, pattern 6 is considered satisfied, and the cyclic reference causing the memory leak is recorded as [c.ins, i].
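To make the first four patterns more concrete, the following Python sketch shows what such cyclic references can look like in ordinary code and checks, at run time on CPython, that the objects involved are not released by reference counting alone. It is an illustration only: the class names and the helper `needs_cycle_collector` are hypothetical, and the run-time check is not part of the static method described in the claims.

```python
import gc
import weakref


class SelfRef:                       # pattern 1: an instance refers to itself
    def __init__(self):
        self.me = self


class InContainer:                   # pattern 2: instance <-> container
    def __init__(self):
        self.items = [self]


class BoundMethod:                   # pattern 3: instance <-> bound method
    def __init__(self):
        self.callback = self.run     # the bound method holds a reference to self

    def run(self):
        pass


class Peer:                          # pattern 4: two instances refer to each other
    def __init__(self):
        self.other = None


def make_pair():
    a, b = Peer(), Peer()
    a.other, b.other = b, a
    return a


class Plain:                         # no cycle, for comparison
    pass


def needs_cycle_collector(factory):
    """Return True if the object survives deletion of its last external reference.

    The cyclic garbage collector is disabled during the check so that only
    reference counting is at work (CPython behaviour is assumed).
    """
    gc.collect()
    gc.disable()
    try:
        obj = factory()
        ref = weakref.ref(obj)
        del obj
        alive_after_del = ref() is not None
    finally:
        gc.enable()
    gc.collect()                     # clean up the cycle afterwards
    return alive_after_del


for factory in (SelfRef, InContainer, BoundMethod, make_pair, Plain):
    print(factory.__name__, needs_cycle_collector(factory))
```

On CPython the first four factories are expected to report that the object is still alive after `del`, while `Plain` is freed immediately.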
4. The method according to claim 1, wherein in step S32, for any two types x and y, the strict containment relationship between them is determined as follows:
1) if x belongs to the dictionary type, checking whether the key or the value of x is equivalent to y, and if so, type x is considered to contain type y;
2) if x belongs to the set type, the list type or the tuple type, checking whether y belongs to one of x, and if so, type x is considered to contain type y;
3) if x belongs to the combination type, checking whether there is a type t in x that contains type y, and if so, type x is considered to contain type y.
5. The method according to claim 1, wherein in step S34, for any two types x and y, the equivalence relationship between them is determined as follows:
1) if x does not belong to the combination type, judging whether x and y are the same, and if so, x and y are considered equivalent;
2) if x belongs to the combination type, checking whether there is a type t in x that is equivalent to y, and if so, x and y are considered equivalent.
6. The method according to claim 1, wherein in step S35, for any two types x and y, the containment relationship between them is determined as follows:
1) if x belongs to the dictionary type, checking whether the key or the value of x is the same as y, and if so, type x is considered to contain type y;
2) if x belongs to the set type, the list type or the tuple type, checking whether y belongs to x, and if so, type x is considered to contain type y;
3) if x belongs to the combination type, checking whether there is a type t in x that contains type y, and if so, type x is considered to contain type y.
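The equivalence relationship of claim 5 and the strict containment relationship of claim 4 can be sketched as two short recursive functions. This is a simplified illustration under stated assumptions: types are encoded as plain tuples (a hypothetical encoding), the tag `"comb"` denotes the combination type, and for set, list and tuple types the phrase "y belongs to one of x" is interpreted as "the element type of x is equivalent to y".

```python
# Hypothetical tuple encoding of types:
#   ("ins", class_id), ("list", elem), ("set", elem), ("tuple", elem),
#   ("dict", key, value), ("comb", (t1, t2, ...))


def equivalent(x, y):
    """Claim 5: x and y are equivalent."""
    if x[0] != "comb":
        return x == y
    return any(equivalent(t, y) for t in x[1])


def strictly_contains(x, y):
    """Claim 4: x strictly contains y."""
    kind = x[0]
    if kind == "dict":
        return equivalent(x[1], y) or equivalent(x[2], y)
    if kind in ("set", "list", "tuple"):
        return equivalent(x[1], y)   # assumption: compare the element type
    if kind == "comb":
        return any(strictly_contains(t, y) for t in x[1])
    return False


A = ("ins", "A")
B = ("ins", "B")
A_or_B = ("comb", (A, B))

print(equivalent(A_or_B, A))                               # A is one alternative
print(strictly_contains(("list", A_or_B), A))              # list whose elements may be A
print(strictly_contains(("dict", ("ins", "str"), B), A))   # neither key nor value is A
```

Because containment of a combination type recurses into its members, a combination of a list of A and an unrelated type is also treated as containing A.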
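The containment relationship of claim 6 differs from the strict variant in that it compares with "the same" rather than "equivalent". A minimal sketch, again using a hypothetical tuple encoding of types and interpreting "y belongs to x" for set, list and tuple types as "the element type of x is the same as y":

```python
# Hypothetical tuple encoding of types:
#   ("ins", class_id), ("list", elem), ("set", elem), ("tuple", elem),
#   ("dict", key, value), ("comb", (t1, t2, ...))


def contains(x, y):
    """Claim 6: x contains y, using sameness instead of equivalence."""
    kind = x[0]
    if kind == "dict":
        return x[1] == y or x[2] == y
    if kind in ("set", "list", "tuple"):
        return x[1] == y             # assumption: compare the element type
    if kind == "comb":
        return any(contains(t, y) for t in x[1])
    return False


A = ("ins", "A")
B = ("ins", "B")
A_or_B = ("comb", (A, B))

print(contains(("dict", A, B), A))               # the key is exactly A
print(contains(("dict", A_or_B, B), A))          # the key is a combination, not A itself
print(contains(("comb", (("set", A), B)), A))    # one alternative is a set of A
```

Under these assumptions a dictionary whose key type is a combination including A does not contain A in this sense, although an equivalence-based check would accept it.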
CN202110586274.3A 2021-05-27 2021-05-27 Pattern-based Python code memory leak detection method Active CN113407442B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110586274.3A CN113407442B (en) 2021-05-27 2021-05-27 Pattern-based Python code memory leak detection method

Publications (2)

Publication Number Publication Date
CN113407442A CN113407442A (en) 2021-09-17
CN113407442B true CN113407442B (en) 2022-02-18

Family

ID=77674737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110586274.3A Active CN113407442B (en) 2021-05-27 2021-05-27 Pattern-based Python code memory leak detection method

Country Status (1)

Country Link
CN (1) CN113407442B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101017458A (en) * 2007-03-02 2007-08-15 北京邮电大学 Software safety code analyzer based on static analysis of source code and testing method therefor
CN105912381A (en) * 2016-04-27 2016-08-31 华中科技大学 Compile-time code security detection method based on rule base

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198260B (en) * 2013-03-28 2016-06-08 中国科学院信息工程研究所 A kind of binary program leak automatization localization method
US20150363196A1 (en) * 2014-06-13 2015-12-17 The Charles Stark Draper Laboratory Inc. Systems And Methods For Software Corpora
CN107967208B (en) * 2016-10-20 2020-01-17 南京大学 Python resource sensitive defect code detection method based on deep neural network
CN111736980B (en) * 2019-03-25 2024-01-16 华为技术有限公司 Memory management method and device
CN111352829A (en) * 2019-11-21 2020-06-30 杭州迪普科技股份有限公司 Memory leak test method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant