CN114691197A - Code analysis method and device, electronic equipment and storage medium - Google Patents

Publication number
CN114691197A
Authority
CN
China
Prior art keywords
analysis
code
analyzed
source code
syntax tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210335613.5A
Other languages
Chinese (zh)
Inventor
付威
李粒
章磊
李孝岩
齐向东
吴云坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qianxin Technology Group Co Ltd
Secworld Information Technology Beijing Co Ltd
Original Assignee
Qianxin Technology Group Co Ltd
Secworld Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qianxin Technology Group Co Ltd, Secworld Information Technology Beijing Co Ltd
Priority to CN202210335613.5A
Publication of CN114691197A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a code analysis method, a code analysis device, electronic equipment and a storage medium. The method comprises the following steps: acquiring a syntax tree of a source code to be analyzed and an analysis rule corresponding to a target defect type; and analyzing the syntax tree by using the analysis rule to determine the code content of the target defect type in the source code to be analyzed. Thus, defects in the source code in the code library can be determined by this method.

Description

Code analysis method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a code analysis method, apparatus, electronic device, and storage medium.
Background
The code library can be used for storing source code of an application program, and in practical application, the source code in the code library can be subjected to static analysis to determine defects in the source code. Therefore, how to perform static analysis on the source code is crucial.
Disclosure of Invention
An object of the embodiments of the present application is to provide a code analysis method, apparatus, electronic device and storage medium, which are used to solve the problems in the prior art.
A first aspect of an embodiment of the present application provides a code analysis method, where the method includes:
acquiring a syntax tree of a source code to be analyzed and an analysis rule corresponding to a target defect type;
and analyzing the syntax tree by using the analysis rule to determine the code content of the target defect type in the source code to be analyzed.
In one embodiment, the method further comprises:
acquiring a code analysis request, wherein the code analysis request comprises the target defect type and identification information of the source code to be analyzed; accordingly,
obtaining a syntax tree of a source code to be analyzed and an analysis rule corresponding to a target defect type, specifically comprising:
acquiring a syntax tree of the source code to be analyzed according to the identification information of the source code to be analyzed; and acquiring the analysis rule according to the target defect type.
In one embodiment, the method further comprises:
respectively constructing corresponding code analysis tasks for source codes of different application programs and/or source codes of different versions of the same application program in a code library; and,
the code analysis request acquisition specifically includes:
and acquiring the code analysis request under the condition that the code analysis task is triggered.
In an embodiment, obtaining the syntax tree of the source code to be analyzed specifically includes: and obtaining the syntax tree from a syntax tree database.
In an embodiment, obtaining the syntax tree of the source code to be analyzed specifically includes:
according to the grammatical rule of the programming language of the source code to be analyzed, performing lexical analysis on the source code to be analyzed;
and carrying out syntactic analysis on the result of the lexical analysis to generate the syntactic tree.
In one embodiment, the analysis rule corresponding to the target defect type is obtained as follows:
and acquiring the analysis rule corresponding to the target defect type by utilizing the preset corresponding relation between the defect type and the analysis rule.
In an embodiment, analyzing the syntax tree by using the analysis rule to determine the code content of the target defect type in the source code to be analyzed specifically includes:
and performing type analysis, constant analysis, syntax tree analysis, control flow analysis, data flow analysis and/or taint analysis on the syntax tree by using the analysis rule to determine the code content of the target defect type in the source code to be analyzed.
In one embodiment, the method further comprises: highlighting the code content.
A second aspect of the embodiments of the present application provides a code analysis apparatus, including:
an acquisition unit, configured to acquire a syntax tree of a source code to be analyzed and an analysis rule corresponding to a target defect type;
and the analysis unit is used for analyzing the syntax tree by using the analysis rule so as to determine the code content of the target defect type in the source code to be analyzed.
In one embodiment, the apparatus further comprises: a request obtaining unit, configured to obtain a code analysis request, where the code analysis request includes identification information of the target defect type and identification information of the source code to be analyzed; accordingly,
the obtaining unit is configured to obtain a syntax tree of the source code to be analyzed according to the identification information of the source code to be analyzed; and acquiring the analysis rule through the identification information of the target defect type.
In one embodiment, the apparatus further comprises: a task construction unit, configured to respectively construct corresponding code analysis tasks for source codes of different application programs in the code library and/or source codes of different versions of the same application program; and,
the request acquiring unit specifically includes: a request obtaining subunit, configured to obtain the code analysis request when the code analysis task is triggered.
In an embodiment, the obtaining unit specifically includes: a first obtaining subunit, configured to obtain the syntax tree from a syntax tree database.
In an embodiment, the obtaining unit specifically includes: the second obtaining subunit is used for performing lexical analysis on the source code to be analyzed according to a grammatical rule of the programming language of the source code to be analyzed; and carrying out syntactic analysis on the result of the lexical analysis to generate the syntactic tree.
In an embodiment, the obtaining unit specifically includes: and the third acquisition subunit is used for acquiring the analysis rule corresponding to the target defect type by using the preset corresponding relation between the defect type and the analysis rule.
In one embodiment, the analysis unit specifically includes: and the analysis subunit is used for performing type analysis, constant analysis, syntax tree analysis, control flow analysis, data flow analysis and/or taint analysis on the syntax tree by using the analysis rule so as to determine the code content of the target defect type in the source code to be analyzed.
In one embodiment, the apparatus further comprises: and the display unit is used for highlighting the code content.
A third aspect of embodiments of the present application provides an electronic device, including:
a memory to store a computer program;
a processor configured to perform the method of any of the method embodiments of the present application.
A fourth aspect of an embodiment of the present application provides a storage medium, including: a program which, when run on an electronic device, causes the electronic device to perform the method of any one of the method embodiments of the present application.
The code analysis method provided by the embodiment of the application comprises the steps of obtaining a syntax tree of a source code to be analyzed and an analysis rule corresponding to a target defect type, and analyzing the syntax tree by using the analysis rule, so that the code content of the target defect type in the source code to be analyzed is determined. Thus, defects in the source code in the code library can be determined by this method.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 2 is a schematic flowchart illustrating a specific code analysis method according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of a code analysis method in a specific application scenario according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a code analysis apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. In the description of the present application, terms such as "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying a relative importance or order.
As previously described, defects in the source code may be determined by static analysis of the source code in the code library.
For example, the source code of a plurality of different applications may be stored in a code library and may be updated continuously, so that even for the same application, the source code may have a plurality of different versions. How to analyze the source code of these different applications, and even of different versions of the same application, to determine their defects is therefore important.
As shown in fig. 1, the present embodiment provides an electronic apparatus 1 including: at least one processor 11 and a memory 12, one processor being exemplified in fig. 1. The processor 11 and the memory 12 may be connected by a bus 10, and the memory 12 stores instructions executable by the processor 11, and the instructions are executed by the processor 11, so that the electronic device 1 may perform all or part of the flow of the method in the embodiments described below.
The electronic device 1 may be a mobile phone, a notebook computer, a desktop computer, a large server, or a server cluster formed from such devices.
In an embodiment, static analysis (program static analysis) may need to be performed on source code stored in a code library; that is, without running the source code, the source code is scanned by techniques such as syntax tree analysis, control flow analysis, and data flow analysis to verify whether it meets criteria such as normalization, security, reliability, and maintainability. In this case, the electronic device 1 may perform all or part of the procedures of the methods in the following embodiments to implement the static analysis of the source code.
Fig. 2 is a schematic flow chart of a code analysis method according to an embodiment of the present application, and some or all of the steps of the method may be performed by the electronic device 1 shown in fig. 1, which may be taken as an example to describe the method. The method comprises the following steps:
Step S21: acquiring a syntax tree of the source code to be analyzed.
The source code to be analyzed may be a source code of any application program in the code library, or may be a source code of any version of any application program in the code library, for example, a source code of any version of an application program is randomly acquired from the code library and is used as the source code to be analyzed; of course, in practical applications, the source code to be analyzed may also be a source code of a specified version of an application program selected from a code library according to analysis needs.
The syntax tree (abstract syntax tree, AST) is a tree-like representation of the abstract syntax structure of the source code to be analyzed. The syntax tree represents the syntax structure of the programming language of the source code to be analyzed in tree form, wherein each node on the syntax tree represents one construct in the source code to be analyzed, and the syntax tree does not depend on the concrete syntax details of the programming language of the source code to be analyzed.
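As an illustration only, the idea that each node of a syntax tree represents one construct of the source code can be seen with Python's standard `ast` module. Python is used here purely as a stand-in language; the application does not specify a parser or a programming language, and the sample statement is hypothetical.

```python
import ast

# Parse one hypothetical statement into a syntax tree.
tree = ast.parse("total = price * 2")

# The first (and only) top-level node is the assignment statement.
assign = tree.body[0]

# Collect the kind of every node below the assignment: the assignment
# itself, the multiplication, the variable names, and the constant.
node_kinds = [type(node).__name__ for node in ast.walk(assign)]
```

Here `node_kinds` contains entries such as `Assign`, `BinOp`, `Name`, and `Constant`, one per construct in the statement.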
There may be several specific ways of obtaining the syntax tree of the source code to be analyzed in practical applications; a few are described here:
The first mode: generating the syntax tree of the source code to be analyzed.
In the first mode, the source code to be analyzed may be obtained first, and then the syntax tree may be generated by the source code to be analyzed. The process of generating the syntax tree through the source code to be analyzed may include performing lexical analysis on the source code to be analyzed according to a syntax rule of a programming language of the source code to be analyzed, and then performing syntactic analysis on a result of the lexical analysis to generate the syntax tree.
For example, lexical analysis is performed on the source code to be analyzed according to the grammar rules of its programming language, so as to split the source code into tokens. In this process, finite character sequences in the source code to be analyzed are read in order, and the grammar rules of the corresponding programming language are used to identify whether a sequence is a word with lexical meaning; if so, the sequence is taken as a token, and if not, further characters are read. For example, if the programming language of the source code to be analyzed is C, the grammar rules of C, including its operators, control characters, and legal identifiers, may be used to determine whether the obtained finite character sequence is a word with lexical meaning and therefore a token.
After lexical analysis is performed on the source code to be analyzed, the lexical analysis result is a sequence of tokens. Syntactic analysis is then performed to determine whether syntax errors exist; if so, an error prompt is output, and if not, the lexical analysis result is converted into tree form, thereby generating the syntax tree.
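The two stages described above (lexical analysis into tokens, then syntactic analysis into a tree or an error) can be sketched as follows. This is a minimal sketch for a tiny arithmetic-expression language, not the grammar of C or of any language named in the application; the token classes, the function names `tokenize` and `parse`, and the nested-tuple tree shape are all assumptions made for illustration.

```python
import re

# Hypothetical token classes for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Lexical analysis: split the character sequence into (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse(tokens):
    """Syntactic analysis: build a nested-tuple tree, or raise SyntaxError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        kind, text = advance()
        if kind in ("NUMBER", "IDENT"):
            return (kind, text)
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():
        node = factor()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            node = ("BINOP", op, node, factor())
        return node

    def expr():
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = ("BINOP", op, node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected trailing token {peek()[1]!r}")
    return tree
```

For instance, `parse(tokenize("a + 2 * b"))` yields a tree whose root is the addition and whose right child is the multiplication, while an incomplete input such as `"(a + "` raises `SyntaxError`, which corresponds to the error prompt mentioned above.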
The second mode: obtaining the syntax tree from a syntax tree database.
The syntax tree database is used for storing syntax trees of all source codes, so that the syntax trees of the source codes to be analyzed can be obtained from the syntax tree database.
For example, the syntax tree database may be queried by using the identification information of the source code to be analyzed, so as to obtain the syntax tree of the source code to be analyzed. The identification information of the source code to be analyzed may be a name and a version number of an application program corresponding to the source code to be analyzed, or may be other characters or character strings that can be used to uniquely identify the source code to be analyzed.
In addition, the syntax tree stored in the syntax tree database may be generated in the first mode, and after the syntax tree is generated, the syntax tree is stored in the syntax tree database.
In practical applications, the syntax tree of the source code to be analyzed may also be obtained in other ways, for example by combining the first mode and the second mode. For example, the syntax tree database is first queried using the identification information of the source code to be analyzed; if the syntax tree of the source code to be analyzed is stored in the syntax tree database, the syntax tree is obtained from the database using the second mode; and if it is not stored in the syntax tree database, the syntax tree of the source code to be analyzed is generated using the first mode.
Of course, after the syntax tree of the source code to be analyzed is generated, the syntax tree may also be stored in the syntax tree database for subsequent retrieval.
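The combined lookup described above (query the database first, generate on a miss, then store the result) might look like the following sketch. The dictionary `syntax_tree_db`, the function `get_syntax_tree`, and the `load_source` callback are hypothetical names; an in-memory dictionary stands in for the syntax tree database, and Python's `ast.parse` stands in for the tree generator.

```python
import ast

# Hypothetical in-memory stand-in for the syntax tree database,
# keyed by (application name, version number).
syntax_tree_db = {}


def get_syntax_tree(app_name, version, load_source):
    """Return the stored tree if present; otherwise generate and store it."""
    key = (app_name, version)
    tree = syntax_tree_db.get(key)
    if tree is None:
        # Miss: fetch the source via a hypothetical code-library accessor,
        # generate the tree, and keep it for subsequent retrieval.
        source = load_source(app_name, version)
        tree = ast.parse(source)
        syntax_tree_db[key] = tree
    return tree
```

A second request for the same application name and version returns the stored tree without reading or parsing the source again.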
Step S22: acquiring an analysis rule corresponding to the target defect type.
In practical applications, static analysis of source code generally includes analyzing whether the source code contains a specified vulnerability, fails to conform to the program specification, and so on. Thus, the target defect type may be any one or more of the following defect types: presence of a specified vulnerability, non-conformance with the program specification, and the like.
In addition, according to the standard process of the operation, corresponding analysis rules can be set for each defect type in advance, and a preset corresponding relationship between the defect type and the analysis rule is constructed, wherein whether the defect of the corresponding defect type exists in the source code can be determined according to the analysis rule. In this way, when the analysis rule corresponding to the target defect type is obtained, the analysis rule corresponding to the target defect type can be obtained by using the preset corresponding relationship between the defect type and the analysis rule.
For example, identification information of the target defect type may be obtained first, where the identification information may be a number of the target defect type, or other characters or character strings capable of uniquely identifying the target defect type; and then, combining the preset corresponding relation between the defect type and the analysis rule and the identification information of the target defect type to obtain the analysis rule corresponding to the target defect type.
Analysis rules corresponding to various defect types can be generally stored in a rule database, and after identification information of a target defect type is obtained, the analysis rules corresponding to the target defect type can be obtained from the rule database by using a preset corresponding relationship between the defect type and the analysis rules and the identification information of the target defect type.
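A minimal sketch of this lookup, assuming the preset correspondence and the rule database are both simple key-value mappings. The identifiers (`DT-COMMAND-INJECTION`, `R001`, and so on), the `AnalysisRule` type, and the rule descriptions are invented for illustration and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisRule:
    rule_id: str
    description: str


# Hypothetical rule database and preset defect-type-to-rule correspondence.
RULE_DATABASE = {
    "R001": AnalysisRule("R001", "flag untrusted input reaching a command-execution call"),
    "R002": AnalysisRule("R002", "flag identifiers that break the naming convention"),
}
DEFECT_TYPE_TO_RULE_ID = {
    "DT-COMMAND-INJECTION": "R001",
    "DT-NAMING": "R002",
}


def get_analysis_rule(defect_type_id):
    """Resolve a defect type identifier to its rule, or None if none is stored."""
    rule_id = DEFECT_TYPE_TO_RULE_ID.get(defect_type_id)
    if rule_id is None:
        return None
    return RULE_DATABASE.get(rule_id)
```

Returning `None` for an unknown defect type leaves the caller free to decide how to handle a defect type for which no rule has been stored yet.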
In practical applications, the analysis rule corresponding to the target defect type may also be obtained in other ways. For example, when the rule database does not store an analysis rule corresponding to the target defect type (for example, because the target defect type is a newly appeared defect type), the target defect type may first be analyzed to determine a plurality of pieces of feature information of the target defect type. An analysis flow, including the analysis steps to be executed and the order in which they are executed, is then constructed from the feature information, and the analysis rule is generated from the analysis flow, so that whether a defect of the target defect type exists in the source code can be determined by using the analysis rule. In addition, after the analysis rule is generated, corresponding identification information can be assigned to the analysis rule, and the rule can be stored in the rule database.
Step S23: analyzing the obtained syntax tree by using the obtained analysis rule to determine the code content of the target defect type in the source code to be analyzed.
After the syntax tree and the analysis rule are obtained in the above steps S21 and S22, respectively, the syntax tree may be analyzed by using the analysis rule, so as to determine the code content of the target defect type in the source code to be analyzed. The specific manner of analyzing the syntax tree by using the analysis rule may include: performing type analysis, constant analysis, syntax tree analysis, control flow analysis, data flow analysis, and/or taint analysis and the like on the syntax tree by using the analysis rule to determine the code content of the target defect type in the source code to be analyzed.
For example, the analysis rule can be used to perform data flow analysis on the syntax tree. Data flow analysis is a set of techniques for obtaining information about how data flows along program execution paths (paths in the control-flow graph). In every data flow analysis application, each program point is associated with a data-flow value, which is an abstract representation of the set of all program states that may be observed at that point. The set of all possible data-flow values is called the domain of the data flow analysis application.
Therefore, when the analysis rule is used to perform data flow analysis on the syntax tree, the data-flow values before and after the statement s corresponding to each node in the syntax tree can be denoted IN[s] and OUT[s], respectively, and the data-flow problem is to solve a set of constraints. This set of constraints defines the relationship between IN[s] and OUT[s] for all statements s, and the constraints fall into two categories: constraints based on statement semantics (transfer functions) and constraints based on control flow.
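The application does not write these constraints out. Under the usual textbook formulation of a forward data flow analysis (an assumption here), they can be stated as below, where f_s denotes the transfer function of statement s, pred(s) denotes the set of statements that immediately precede s in the control-flow graph, and the large wedge denotes the meet operator of the analysis (for example, set union or set intersection):

```latex
\begin{aligned}
\mathrm{OUT}[s] &= f_s\bigl(\mathrm{IN}[s]\bigr)
  && \text{(constraint based on statement semantics)} \\
\mathrm{IN}[s] &= \bigwedge_{p \,\in\, \mathrm{pred}(s)} \mathrm{OUT}[p]
  && \text{(constraint based on control flow)}
\end{aligned}
```

For a backward analysis the roles of IN[s] and OUT[s] are exchanged and successors are used in place of predecessors.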
For example, the analysis rule can be used to perform taint analysis on the syntax tree. Taint analysis is a specific application of data flow analysis that can determine whether the code content of the source code to be analyzed has taint data propagation defects. Taint analysis generally uses the following 3 analysis rules:
1. source, i.e., taint source function; the data produced by a source function is the starting point of taint propagation, and all data on the program execution path that depends on a source is marked as taint data.
2. sink, i.e., danger function; if taint data enters a sink function, this indicates a potential security vulnerability. The vulnerability can be detected by performing taint propagation analysis between externally exposed interface functions and internal sensitive functions.
3. transformer, i.e., taint propagation function; taint data is propagated from one variable to another through a propagation function. The taint propagation function (transformer) determines how taint information flows in the program.
Taint analysis can detect security vulnerabilities by marking external input points as sources and analyzing, according to the data flow of the program, whether the data can reach a danger function (sink, e.g., SQL query, command execution, etc.). If the origin of a variable x is untrustworthy, x can generally be considered tainted, and the tainted variable x is referred to as taint data. The taint source is the untrusted input of the program, which may be files, network data, keyboard and mouse input, return values of APIs of unknown origin, and so on. A variable is tainted if its value is computed from a taint source. Likewise, taint can propagate through variables: assuming that the variable x is a taint source and information flows from the variable x to the variable y and then from the variable y to the variable z, both the variable y and the variable z are tainted by the variable x.
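The x to y to z propagation described above can be sketched on a real syntax tree using Python's standard `ast` module as a stand-in. This is a deliberately simplified illustration, not the method of the application: the `SOURCES` and `SINKS` sets and the function names are hypothetical, the only propagation (transformer) rule modeled is plain assignment, and only top-level statements are visited in source order, so branches, loops, and function boundaries are ignored.

```python
import ast

# Hypothetical rule set: call names treated as taint sources and as sinks.
SOURCES = {"input"}
SINKS = {"system", "execute"}


def call_name(node):
    """Return the simple name of a called function, e.g. 'system' for os.system(...)."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def is_tainted(expr, tainted):
    """An expression is tainted if it reads a tainted variable or calls a source."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
        if isinstance(node, ast.Call) and call_name(node) in SOURCES:
            return True
    return False


def find_taint_flows(source_code):
    """Return line numbers of sink calls that receive tainted arguments."""
    tainted = set()
    findings = []
    for stmt in ast.parse(source_code).body:
        # Check sinks first, using the taint state from before this statement.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and call_name(node) in SINKS:
                if any(is_tainted(arg, tainted) for arg in node.args):
                    findings.append(node.lineno)
        # Propagation: an assignment copies taint from its right side to its
        # targets, and overwriting with untainted data clears the taint.
        if isinstance(stmt, ast.Assign):
            names = [t.id for t in stmt.targets if isinstance(t, ast.Name)]
            if is_tainted(stmt.value, tainted):
                tainted.update(names)
            else:
                tainted.difference_update(names)
    return findings
```

Given the four statements `x = input()`, `y = x`, `z = 'ls ' + y`, and `os.system(z)`, one per line, the sketch reports line 4, because taint flows from `x` through `y` to `z` before reaching the sink. A production analyzer would also need control flow, function calls, and sanitizer rules.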
The code analysis method provided by the embodiment of the application comprises the steps of obtaining a syntax tree of a source code to be analyzed and an analysis rule corresponding to a target defect type, and analyzing the syntax tree by using the analysis rule, so that the code content of the target defect type in the source code to be analyzed is determined. Thus, defects in the source code in the code library can be determined by this method.
It should be further noted that the execution order of the steps in the code analysis method is not strictly limited. For example, step S21 may be executed first, followed by step S22 and step S23; step S22 may be executed first, followed by step S21 and step S23; or step S21 and step S22 may be executed simultaneously, followed by step S23. Other execution orders are also possible.
In addition, after step S23 is executed to determine the code content of the target defect type in the source code to be analyzed, the method may further include: highlighting the code content. For example, if the target defect type is a specified vulnerability, the code content in the source code to be analyzed that contains the specified vulnerability may be highlighted, so that it can subsequently be modified to repair the vulnerability. The code content may be highlighted by, for example, marking it in a highlight color, in bold, or with an underline. Of course, after the code content of the target defect type in the source code to be analyzed is determined, a prompt message may also be sent to prompt relevant personnel to modify the code content.
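As a plain-text stand-in for color, bold, or underline, highlighting can be sketched by prefixing the flagged lines with a marker. The function name `highlight_lines`, the marker string, and the line-number layout are assumptions for illustration; the application does not prescribe a rendering.

```python
def highlight_lines(source_code, defect_lines, marker=">> "):
    """Prefix each flagged line with a marker; other lines get equal-width padding."""
    flagged = set(defect_lines)
    padding = " " * len(marker)
    out = []
    for number, line in enumerate(source_code.splitlines(), start=1):
        prefix = marker if number in flagged else padding
        out.append(f"{prefix}{number:>4} | {line}")
    return "\n".join(out)
```

The `defect_lines` argument is expected to hold 1-based line numbers of the code content determined in step S23.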
In practical applications, the source code in the code library may also be statically analyzed in other manners. For example, a source code to be analyzed may be obtained from the code library and then analyzed in the compiling and building environment corresponding to it, where the compiling and building environment includes a code editor, a compiler, a debugger, a graphical user interface, and the like. However, this analysis method depends on the compiling and building environment corresponding to the source code to be analyzed. The source codes of different applications usually depend on different compiling and building environments, and even the source codes of different versions of the same application may depend on different ones, so performing static analysis on the source codes of different applications and/or of different versions of the same application in the code library may require generating a plurality of different compiling and building environments. Generating a compiling and building environment is often time-consuming and labor-intensive, which makes the static analysis time-consuming and labor-intensive as well.
By the code analysis method provided by the embodiment of the application, when the source code to be analyzed is analyzed, the syntax tree of the source code to be analyzed is analyzed through the analysis rule instead of the static analysis of the source code to be analyzed by relying on the compiling construction environment, so that the code analysis method provided by the embodiment of the application does not need to generate a corresponding compiling construction environment, and the static analysis cost is reduced.
In practical applications, before step S21, the method may further include obtaining a code analysis request, where the code analysis request includes identification information of a target defect type and identification information of a source code to be analyzed. The syntax tree of the source code to be analyzed is then obtained by using the identification information of the source code to be analyzed. For example, the syntax tree database is queried with the identification information; if the syntax tree of the source code to be analyzed is stored in the syntax tree database, the syntax tree is obtained from the database, and if it is not, the source code to be analyzed is obtained first and the syntax tree is then generated from it. Likewise, the analysis rule corresponding to the target defect type may be obtained by using the identification information of the target defect type in the code analysis request.
The code analysis request may be a code analysis request generated when a code analysis task is triggered. For example, the source codes stored in the code library include source codes of different application programs and/or source codes of different versions of the same application program, and in order to perform static analysis on the source codes in the code library, corresponding code analysis tasks may be respectively constructed for the source codes of different application programs and/or the source codes of different versions of the same application program in the code library; in this way, when a code analysis task is triggered, a corresponding code analysis request can be generated, and thus the code analysis request can be acquired. For example, a task list may be generated by ordering the code analysis tasks by execution time or generation time, and the tasks may then be executed in sequence according to the task list.
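One possible way to model such a task list is a priority queue ordered by execution time, where triggering a due task produces a code analysis request carrying the two pieces of identification information. The class names, field names, and the dictionary shape of the request are hypothetical, and times are plain numbers for simplicity.

```python
from dataclasses import dataclass, field
import heapq


@dataclass(order=True)
class CodeAnalysisTask:
    execute_at: float                          # scheduled execution time
    source_id: str = field(compare=False)      # identifies the source code
    defect_type_id: str = field(compare=False)  # identifies the target defect type


class TaskList:
    """Tasks ordered by execution time; due tasks become analysis requests."""

    def __init__(self):
        self._heap = []

    def add(self, task):
        heapq.heappush(self._heap, task)

    def pop_due_requests(self, now):
        """Trigger every task whose time has come and return the generated requests."""
        requests = []
        while self._heap and self._heap[0].execute_at <= now:
            task = heapq.heappop(self._heap)
            requests.append({"source_id": task.source_id,
                             "defect_type_id": task.defect_type_id})
        return requests
```

Tasks can be added in any order; `pop_due_requests` returns requests for the due tasks in execution-time order and leaves later tasks in the list.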
The electronic device (referred to as a first electronic device) shown in fig. 1 may be connected to another electronic device (referred to as a second electronic device), at this time, the code analysis tasks may be set in the second electronic device, and when a certain code analysis task in the second electronic device is triggered, a code analysis request is generated and sent to the first electronic device, so that the first electronic device can obtain the code analysis request. For example, the second electronic device may be a mobile phone of a user, a notebook computer, etc., and the first electronic device may be a server, etc.
The above is a specific description of the code analysis method provided in the embodiments of the present application. For ease of understanding, the method is further described below in combination with a specific application scenario. In this application scenario, source codes of multiple versions of the application program A, namely A1, A2, A3 to An, are stored in the code library, and the source codes A1 to An need to be statically analyzed to determine whether a certain specified vulnerability exists. The syntax tree database stores syntax trees corresponding to the source codes A1 to An, respectively.
In this case, corresponding code analysis tasks can be respectively constructed for the source codes A1 to An in the code library, where each code analysis task includes identification information of the corresponding source code and identification information of the specified vulnerability. A task list is then generated according to the order of the execution times of the code analysis tasks, and the tasks are executed in sequence according to the order in the task list. In addition, an analysis scheduling module and a defect analysis module can be arranged in the electronic device. The analysis scheduling module monitors whether the task list contains a code analysis task that needs to be executed; if so, it obtains the corresponding code analysis task according to the order of the task list; if not, it may perform no action, or monitor again after a time interval (e.g., 5 minutes or another duration). The method in this scenario can be explained with reference to fig. 3:
Step S31: the analysis scheduling module monitors whether a code analysis task needing to be executed exists in the task list, and if so, executes step S32.
Step S32: the code analysis task is obtained.
Step S33: the syntax tree database is queried by using the identification information of the source code in the code analysis task, and the corresponding syntax tree is acquired.
Of course, if the corresponding syntax tree is not stored in the syntax tree database, the corresponding source code may be further obtained by using the identification information, and the syntax tree may be generated by using the source code.
Step S34: the analysis rule corresponding to the specified vulnerability is acquired from a rule database by using the identification information of the specified vulnerability in the code analysis task and the preset corresponding relation between the defect type and the analysis rule.
Step S35: the analysis scheduling module sends the analysis rule and the syntax tree to a defect analysis module.
Step S36: the defect analysis module analyzes the syntax tree by using the analysis rule and outputs an analysis result, wherein the analysis result comprises the code content in which the specified vulnerability exists in the source code to be analyzed.
After the code analysis task is completed, the process may return to step S31, and the analysis scheduling module continues to loop until the task list is empty.
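Steps S31 to S36 can be sketched as a simple loop. The sketch below is illustrative only: the names `RULE_DB`, `SYNTAX_TREE_DB`, `CODE_LIBRARY`, `run_scheduler`, and `defect_analysis_module`, as well as the example defect type `"unsafe-eval"` and its rule `find_eval_calls`, are assumptions made for the example and are not defined in the application. Python's `ast` module stands in for the syntax tree representation.

```python
import ast
from collections import deque


def find_eval_calls(tree: ast.AST) -> list[int]:
    """Hypothetical analysis rule: report line numbers of calls to eval()."""
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    )


RULE_DB = {"unsafe-eval": find_eval_calls}  # defect type -> analysis rule
CODE_LIBRARY = {"A1": "x = eval(s)\n", "A2": "y = int(s)\n"}
SYNTAX_TREE_DB = {"A1": ast.parse(CODE_LIBRARY["A1"])}  # A2 is deliberately absent


def defect_analysis_module(rule, tree):
    """S36: analyze the syntax tree with the rule and return the analysis result."""
    return rule(tree)


def run_scheduler(task_list: deque) -> dict:
    results = {}
    while task_list:                                  # S31: any task left to execute?
        source_id, defect_type = task_list.popleft()  # S32: obtain the next task
        tree = SYNTAX_TREE_DB.get(source_id)          # S33: query the syntax tree database
        if tree is None:                              # fallback: generate from source code
            tree = ast.parse(CODE_LIBRARY[source_id])
        rule = RULE_DB[defect_type]                   # S34: look up the analysis rule
        results[source_id] = defect_analysis_module(rule, tree)  # S35 and S36
    return results
```

In this sketch the loop simply ends when the task list is empty; re-checking the list after a time interval, as described for the monitoring case, is left out.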
Based on the same inventive concept as the code analysis method provided in the embodiments of the present application, the embodiments of the present application also provide a code analysis apparatus. For any point that is unclear in the apparatus embodiments, reference may be made to the corresponding content of the method embodiments. Fig. 4 is a specific structural diagram of the apparatus 40. The apparatus 40 includes an obtaining unit 401 and an analyzing unit 402, wherein:
an obtaining unit 401, configured to obtain a syntax tree of a source code to be analyzed and an analysis rule corresponding to a target defect type;
an analyzing unit 402, configured to analyze the syntax tree by using the analysis rule to determine the code content of the target defect type in the source code to be analyzed.
Because the apparatus 40 provided in the embodiments of the present application adopts the same inventive concept as the code analysis method provided in the embodiments of the present application, the apparatus 40 can solve the same technical problem that the code analysis method solves, and details are not repeated here.
In addition, in practical applications, the technical effects obtained by combining the apparatus 40 with specific hardware devices, cloud technologies, and the like are also within the scope of the present application. For example, different units of the apparatus 40 may be deployed on different nodes of a distributed cluster so as to improve efficiency.
In practical applications, before the obtaining unit 401, the apparatus 40 may further include a request obtaining unit, configured to obtain a code analysis request, where the code analysis request includes the target defect type and the identification information of the source code to be analyzed. Accordingly, the obtaining unit 401 is configured to obtain a syntax tree of the source code to be analyzed according to the identification information of the source code to be analyzed, and to acquire the analysis rule according to the target defect type.
The apparatus 40 may further include a task construction unit, configured to respectively construct corresponding code analysis tasks for source codes of different application programs and/or source codes of different versions of the same application program in the code library. In this case, the request obtaining unit may specifically include a request obtaining subunit, configured to obtain the code analysis request when the code analysis task is triggered.
In practical applications, the obtaining unit 401 may include a first obtaining sub-unit, configured to obtain the syntax tree from a syntax tree database.
The obtaining unit 401 may further include a second obtaining subunit, configured to generate a syntax tree of the source code to be analyzed. For example, lexical analysis is first performed on the source code to be analyzed according to the syntax rules of its programming language, and syntactic analysis is then performed on the result of the lexical analysis to generate the syntax tree.
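The two-stage process of lexical analysis followed by syntactic analysis can be illustrated with a toy example. The sketch below assumes a hypothetical arithmetic language (integers, `+ - * /`, parentheses) rather than any real programming language, and represents the syntax tree as nested tuples; the function names `lex` and `parse` are illustrative and not taken from the application.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def lex(text: str) -> list[str]:
    """Lexical analysis: split the source text into a list of tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens: list[str]):
    """Syntactic analysis: build a syntax tree from the token list."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():  # factor := NUMBER | '(' expr ')'
        tok = advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return ("num", int(tok))
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():  # term := factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = ("binop", op, node, factor())
        return node

    def expr():  # expr := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = ("binop", op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

For instance, `parse(lex("1 + 2*3"))` yields a tree whose root is the `+` node, with the `*` subtree as its right child, reflecting operator precedence encoded in the grammar.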
The obtaining unit 401 may further include a third obtaining subunit, configured to obtain, by using a preset corresponding relationship between the defect type and the analysis rule, the analysis rule corresponding to the target defect type.
The analysis unit 402 may further include an analysis subunit, configured to perform type analysis, constant analysis, syntax tree analysis, control flow analysis, data flow analysis, and/or taint analysis on the syntax tree using the analysis rule to determine the code content of the target defect type in the source code to be analyzed.
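Of the analyses listed above, taint analysis is perhaps the easiest to illustrate on a syntax tree. The following is a deliberately simplified sketch, not the analysis rule of the application: it assumes Python source code parsed with the standard `ast` module, treats `input()` as the only taint source and `eval`/`exec` as the only sinks (both sets are hypothetical), and propagates taint only through top-level assignments in program order, ignoring branches, loops, and function calls.

```python
import ast

SOURCES = {"input"}       # hypothetical taint sources
SINKS = {"eval", "exec"}  # hypothetical sensitive sinks


def call_name(node):
    """Return the called function's name for simple calls like f(...), else None."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None


def is_tainted(expr, tainted):
    """An expression is tainted if it reads a tainted name or calls a source."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
        if call_name(node) in SOURCES:
            return True
    return False


def find_tainted_sinks(source: str) -> list[int]:
    """Return line numbers where tainted data reaches a sink."""
    tree = ast.parse(source)
    tainted, findings = set(), []
    for stmt in tree.body:  # top-level statements, in program order
        # Check sink calls against the taint state before this statement.
        for node in ast.walk(stmt):
            if call_name(node) in SINKS and any(is_tainted(a, tainted) for a in node.args):
                findings.append(node.lineno)
        # Propagate (or clear) taint through simple assignments.
        if isinstance(stmt, ast.Assign):
            value_tainted = is_tainted(stmt.value, tainted)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if value_tainted:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return findings
```

The reported line numbers correspond to the "code content of the target defect type" in this toy setting; a production analysis would additionally need control flow and inter-procedural data flow information.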
After the analyzing unit 402, the apparatus 40 may further include a display unit, configured to highlight the determined code content of the target defect type in the source code to be analyzed.
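One simple way to highlight the determined code content, assuming the analysis result is available as a set of line numbers, is to prefix the affected lines with a marker when printing the source code. The function `highlight` and its `marker` parameter are hypothetical; a real display unit could equally use colors or an editor integration.

```python
def highlight(source: str, defect_lines: set[int], marker: str = ">>") -> str:
    """Render the source with a marker in front of each line that has a defect."""
    rendered = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        prefix = marker if lineno in defect_lines else " " * len(marker)
        rendered.append(f"{prefix} {lineno:>3} | {line}")
    return "\n".join(rendered)
```

Calling `print(highlight(source_text, {2}))` would mark only the second line of `source_text`.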
An embodiment of the present invention further provides a storage medium storing a program that, when executed on an electronic device, causes the electronic device to perform all or part of the procedures of the methods in the above embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid state drive (SSD), or the like. The storage medium may also comprise a combination of memories of the kinds described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (11)

1. A method of code analysis, the method comprising:
acquiring a syntax tree of a source code to be analyzed and an analysis rule corresponding to a target defect type;
and analyzing the syntax tree by using the analysis rule to determine the code content of the target defect type in the source code to be analyzed.
2. The method of claim 1, further comprising:
acquiring a code analysis request, wherein the code analysis request comprises identification information of the target defect type and identification information of the source code to be analyzed; accordingly,
acquiring the syntax tree of the source code to be analyzed and the analysis rule corresponding to the target defect type specifically comprises:
acquiring a syntax tree of the source code to be analyzed according to the identification information of the source code to be analyzed; and acquiring the analysis rule through the identification information of the target defect type.
3. The method of claim 2, further comprising:
respectively constructing corresponding code analysis tasks for source codes of different application programs in a code library and/or source codes of different versions of the same application program; and
acquiring the code analysis request specifically comprises:
acquiring the code analysis request when the code analysis task is triggered.
4. The method of claim 1, wherein obtaining the syntax tree of the source code to be analyzed specifically comprises: acquiring the syntax tree from a syntax tree database.
5. The method of claim 1, wherein obtaining the syntax tree of the source code to be analyzed specifically comprises:
performing lexical analysis on the source code to be analyzed according to the syntax rules of the programming language of the source code to be analyzed;
and performing syntactic analysis on the result of the lexical analysis to generate the syntax tree.
6. The method of claim 1, wherein the analysis rule corresponding to the target defect type is obtained by:
acquiring the analysis rule corresponding to the target defect type by using the preset corresponding relation between the defect type and the analysis rule.
7. The method according to claim 1, wherein analyzing the syntax tree using the analysis rule to determine the code content of the target defect type in the source code to be analyzed comprises:
performing type analysis, constant analysis, syntax tree analysis, control flow analysis, data flow analysis and/or taint analysis on the syntax tree by using the analysis rule to determine the code content of the target defect type in the source code to be analyzed.
8. The method of claim 1, further comprising: highlighting the code content.
9. A code analysis apparatus, comprising:
an acquisition unit, configured to acquire a syntax tree of a source code to be analyzed and an analysis rule corresponding to a target defect type;
and an analysis unit, configured to analyze the syntax tree by using the analysis rule so as to determine the code content of the target defect type in the source code to be analyzed.
10. An electronic device, comprising:
a memory to store a computer program;
a processor to perform the method of any one of claims 1 to 8.
11. A storage medium, comprising: a program which, when run on an electronic device, causes the electronic device to perform the method of any one of claims 1 to 8.
CN202210335613.5A 2022-03-31 2022-03-31 Code analysis method and device, electronic equipment and storage medium Pending CN114691197A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210335613.5A CN114691197A (en) 2022-03-31 2022-03-31 Code analysis method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114691197A true CN114691197A (en) 2022-07-01

Family

ID=82141412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210335613.5A Pending CN114691197A (en) 2022-03-31 2022-03-31 Code analysis method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114691197A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117150996A (en) * 2023-10-30 2023-12-01 北京云枢创新软件技术有限公司 Method for determining problem source code generating burr signal, electronic equipment and medium
CN117150996B (en) * 2023-10-30 2024-01-19 北京云枢创新软件技术有限公司 Method for determining problem source code generating burr signal, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination