WO2023024714A1 - Static analysis method, apparatus, and device, and computer-readable storage medium - Google Patents

Static analysis method, apparatus, and device, and computer-readable storage medium

Info

Publication number
WO2023024714A1
Authority
WO
WIPO (PCT)
Prior art keywords
function
queue
analysis
popularity
analysis result
Prior art date
Application number
PCT/CN2022/104055
Other languages
French (fr)
Inventor
Pavel MEZHUEV
Lijun Huang
Alexander Gerasimov
Jingliang Shang
Zhenhua Zhang
Veronika BUTKEVICH
Dousheng Zhao
Original Assignee
Xfusion Digital Technologies Co., Ltd.
Priority date
Filing date
Publication date
Priority claimed from RU2021124956A external-priority patent/RU2021124956A/en
Application filed by Xfusion Digital Technologies Co., Ltd. filed Critical Xfusion Digital Technologies Co., Ltd.
Priority to CN202280057286.XA priority Critical patent/CN117897694A/en
Publication of WO2023024714A1 publication Critical patent/WO2023024714A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding

Definitions

  • This application relates to the field of computer technologies, and more specifically, to a static analysis method, apparatus, and device, and a computer-readable storage medium in a software detection technology.
  • a static analysis technology is a technology of analyzing software program code without executing a program.
  • a key function of the static analysis technology is to check whether a representation and a description of software are consistent and whether there is a conflict or ambiguity.
  • the static analysis technology plays an important role in testing quality of the software program code.
  • a storage scenario is used as an example.
  • One storage system may include a plurality of subsystems.
  • a scale of source code of a single subsystem is up to 5 million lines.
  • analysis is usually performed in a line-by-line code scanning manner. Consequently, analysis processing efficiency is low in an entire analysis process. Therefore, how to provide an efficient static analysis method becomes a technical problem to be resolved as soon as possible.
  • This application provides a static analysis method, apparatus, and device, and a computer-readable storage medium, to improve static analysis efficiency.
  • a static analysis method is provided.
  • the method may be performed by a static analysis tool, for example, static analysis software, or may be performed by a device in which the static analysis tool is installed. This is not limited in this application.
  • the method includes: receiving a first request, where the first request is used to perform problem analysis on a first object; performing problem analysis on a translation unit TU associated with the first object, to generate a problem analysis result of the first object after the analysis is completed; and sending the problem analysis result.
  • the first object includes at least one source file.
  • the source file of the first object is associated with the TU, and the TU may include the at least one source file.
  • the TU is used as a granularity, a call relationship graph of all functions included in the first object does not need to be generated, and a plurality of TUs may be simultaneously selected for parallel processing, to improve static analysis efficiency.
  • the source file included in the first object is scanned, to generate a TU queue.
  • the TU queue includes N TUs, N is an integer greater than or equal to 1, and each TU in the TU queue is associated with at least one function in at least one source file.
  • problem analysis is performed, based on the TU queue, on a function associated with the N TUs in the TU queue, to generate the problem analysis result of the first object.
  • the TU in the TU queue may be associated with a function in one source file, or may be associated with functions in a plurality of source files. If the TU is associated with functions in a plurality of source files, a solution in which problem analysis is performed at a granularity of a TU in this application can be applied to a more complex cross-file problem analysis scenario.
  • a first TU in the TU queue is selected, and a first function set associated with the first TU is determined.
  • the first function set includes a function in a source file associated with the first TU. Any function, for example, a first function, in the first function set is analyzed, and an analysis result of the first function is determined.
  • the analysis result of the first function includes at least one of a call relationship of the first function and a quality problem of the first function.
  • a second function set is analyzed, and an analysis result of a function in the second function set is determined.
  • the second function set includes at least one function having the call relationship with the first function (in other words, the function included in the second function set is called by the first function).
  • At least two TUs may be selected from the TU queue, and synchronous analysis is performed on functions associated with the two TUs.
  • problem analysis is performed by performing parallel processing on the plurality of TUs, to improve static analysis efficiency.
  • the first TU is deleted from the TU queue.
  • the TU queue is maintained in a memory of a system. Therefore, when analysis of the first TU in the TU queue is completed, or when analysis of the first TU in the TU queue is completed and the function associated with the first TU is not called by the function associated with another TU, the first TU is deleted from the TU queue, to reduce memory consumption of the system.
  • for ease of description, the queue in which the first TU is located before the analysis is completed is referred to as a first TU queue.
  • after the first TU is deleted, the first TU queue is updated to a second TU queue. If a change in the TU queue is presented in a visual interface in a static analysis process, it can be seen that a quantity of TUs in the first TU queue is greater than a quantity of TUs in the second TU queue, because a TU whose analysis is completed is deleted from the TU queue in the process of analyzing the TUs in the TU queue.
  • each TU in the TU queue includes popularity, and the popularity is used to identify a quantity of times that a function in the TU is called, so that when a TU is selected from the TU queue for analysis, the first TU may be selected based on the popularity of the TU in the TU queue, and a popularity identifier of the first TU is higher than a first threshold.
  • a processing priority of each TU in the TU queue may be determined based on a quantity of times that a function associated with each TU in the TU queue is called, so that a function that is associated with the TU and that is called for a large quantity of times is preferentially analyzed.
  • the function associated with the first TU calls another function, for example, the function associated with the first TU is a third function, the third function calls a fourth function, and a TU associated with the fourth function is a second TU.
  • popularity of the second TU in the TU queue is increased.
  • the TU queue in which the first TU is located before the analysis of the first TU is started is referred to as a third TU queue.
  • the third TU queue is updated to the fourth TU queue. It should be understood that, in this process, the analysis of the first TU is not completed. If a change in the TU queue is presented in the visual interface in the static analysis process, it can be learned that popularity of the second TU in the third TU queue is lower than popularity of the second TU in the fourth TU queue.
  • the first TU queue, the second TU queue, the third TU queue, and the fourth TU queue that are involved in the foregoing description are presented as one TU queue in the memory of the system.
  • the first TU queue, the second TU queue, the third TU queue, and the fourth TU queue are used only to distinguish a change in the TU queue in the static analysis process.
  • the popularity of the TU in the TU queue is updated based on popularity of a function associated with the TU in the TU queue.
  • a larger quantity of times that the function associated with the TU is called leads to higher popularity of the TU in the TU queue.
  • a priority of the TU in the TU queue is ranked based on the updated popularity (namely, popularity that is updated as the quantity of times that a function associated with the TU is called increases).
  • the function associated with the first TU includes a second function.
  • summary information of the second function includes, for example, a return value, a function of the function, internal logic, or an environment variable.
  • the summary information of the second function is compressed, and the compressed summary information is stored in a lookup table.
  • the summary information of the function is compressed, to further reduce memory consumption of the system.
  • this application provides a static analysis apparatus.
  • the apparatus includes each module configured to perform the method in any one of the first aspect or the possible implementations of the first aspect.
  • this application provides a static analysis device.
  • the device includes a processor, the processor is coupled to a memory, the memory is configured to store a computer program or instructions, and the processor is configured to execute the computer program or the instructions in the memory, so that the device performs the method in any one of the first aspect or the possible implementations of the first aspect.
  • this application provides a static analysis device.
  • the device includes a processor, and the processor is configured to: call a computer program from a memory, and run the computer program, so that the device performs the method in any one of the first aspect or the possible implementations of the first aspect.
  • this application provides a computer-readable storage medium.
  • the computer-readable medium stores program code to be executed by a computing device, and the program code includes instructions used to perform the method in any one of the first aspect or the implementations of the first aspect.
  • this application may provide more implementations through further combination.
  • FIG. 1 is a diagram of a scenario in which a static analysis method provided in this application is used.
  • FIG. 2 is a flowchart of a static analysis method according to this application.
  • FIG. 3 is a schematic diagram of a change in a ranking of a TU in a TU queue according to this application.
  • FIG. 4 is a principle flowchart of performing analysis at a granularity of a TU according to this application.
  • FIG. 5 is a schematic diagram of a structure of a static analysis tool according to this application.
  • FIG. 6 is an interaction flowchart of another static analysis method according to this application.
  • FIG. 7 is a schematic block diagram of a static analysis apparatus according to this application.
  • FIG. 8 is a schematic diagram of a structure of another static analysis apparatus according to this application.
  • FIG. 1 is a diagram of a scenario in which a static analysis method provided in this application is used.
  • the scenario includes a user 10, a first device 20, and a static analysis tool 30.
  • the static analysis tool 30 is installed in the first device 20.
  • the user 10 operates the first device 20 and initiates a problem analysis request to the static analysis tool 30 installed in the first device 20, to run the static analysis tool 30. For example, the problem analysis request may be used to request analysis of a problem of the object such as a security vulnerability, a code error, or a null pointer. The static analysis tool 30 then performs problem analysis on the object and, after completing the analysis, outputs a problem analysis result, for example, a problem analysis report, to the first device 20.
  • a to-be-analyzed object may be stored in the first device 20 or another device connected to the first device 20.
  • the static analysis tool 30 in this application may be static analysis software, may be a data packet for static analysis, or may be an executable file. This is not limited in this application.
  • the first device 20 in an example in FIG. 1 is a notebook computer, but this example does not constitute a limitation on a protection scope of this application.
  • the first device 20 in this application may be a server, including various servers classified based on a network scale, an architecture, a purpose, an appearance, or the like.
  • the first device 20 may alternatively be an intelligent terminal, for example, a mobile phone, a tablet computer (pad), a computer having a wireless transceiver function, a virtual reality (VR) terminal, an augmented reality (AR) terminal, a wireless terminal in industrial control, a wireless terminal in self driving, a wireless terminal in telemedicine, a wireless terminal in smart grid, a wireless terminal in transportation safety, a wireless terminal in smart city, a wireless terminal in smart home, a cellular phone, a cordless telephone set, a session initiation protocol (SIP) telephone, a wireless local loop (WLL) station, a personal digital assistant (PDA), a handheld device having a wireless communication function, a computing device or another processing device connected to a wireless modem, a vehicle-mounted device, a wearable device, a terminal in a 5G network, or a terminal in a future evolved network. This is not limited in this application.
  • FIG. 2 is a flowchart of a static analysis method according to this application.
  • a method 200 shown in FIG. 2 is performed by a static analysis tool, and the method 200 includes the following steps.
  • Step S210: Receive a first request, where the first request is used to perform problem analysis on a first object.
  • the first object may be a project, software, or a set of source code. This is not limited in this application.
  • Step S220: Perform problem analysis on a translation unit (TU) associated with the first object, to generate a problem analysis result of the first object.
  • a source file included in the first object is scanned, to generate a TU queue.
  • the TU queue includes N TUs, N is an integer greater than or equal to 1, and each TU in the TU queue is associated with at least one function in at least one source file.
  • Problem analysis is performed, based on the TU queue, on a function associated with the N TUs in the TU queue, to generate a problem analysis result of the first object.
  • one TU in the TU queue includes one or more source files, and one source file includes one or more functions.
  • the first object includes 100 source files, and a TU queue is generated in a scanning process.
  • the TU queue is maintained in a memory.
  • the TU queue may be an array in the memory, or the TU queue may be a data organization manner.
  • the TU queue may include 100 TUs, or may include 10 TUs. It is assumed that the TU queue includes 100 TUs, and based on an equal distribution principle, each TU in the TU queue is associated with a function included in one source file. It is assumed that the TU queue includes 10 TUs, and based on an equal distribution principle, each TU in the TU queue is associated with a function included in 10 source files. Certainly, the equal distribution principle may alternatively not be used. To be specific, each TU in the TU queue is associated with a different quantity of source files, but all the TUs in the TU queue are associated with a total of 100 source files. This is not limited in this application.
  • a TU is established based on a quantity of source files, or a TU may be established in another manner. This is not limited in this application.
  • all source files of the project are scanned once, to generate the TU queue and a correspondence between a TU (each TU is associated with one source file) in the TU queue and a function.
  • each source file has an identifier, and the identifier is used to uniquely identify each source file.
  • a TU 1 in the TU queue is associated with one source file.
  • the TU 1 is associated with a source file #1
  • the source file #1 includes three functions (for example, a function 1, a function 2, and a function 3). Therefore, three correspondences are generated in the scanning process, to be specific, a correspondence between the source file #1 and the function 1, a correspondence between the source file #1 and the function 2, and a correspondence between the source file #1 and the function 3.
  • a correspondence between the TU 1 and the function 1, a correspondence between the TU 1 and the function 2, and a correspondence between the TU 1 and the function 3 are generated in the scanning process.
  • a correspondence between another TU in the TU queue and a function may be generated by analogy.
  • the finally generated TU queue includes the N TUs and a correspondence between each of the N TUs and a function.
  • a correspondence between a TU and a function may also be described as follows:
  • the TU is associated with the function.
  • the function may be described as a function associated with the TU. That a TU is associated with a source file means that there is a correspondence between a TU and a source file.
  • the source file may be described as a source file associated with the TU. This is not limited in this application.
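  • the scanning step described above can be sketched as follows. This is only an illustrative sketch and not the implementation of this application: the names TranslationUnit, build_tu_queue, and files_per_tu are hypothetical, and the input dictionary stands in for whatever a real scanner would extract from the source files. The sketch groups source files into TUs (one file per TU by default) and records the correspondence between each function and its TU.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationUnit:
    name: str
    source_files: list
    functions: list = field(default_factory=list)
    popularity: int = 0  # popularity value, 0 before any call is recorded


def build_tu_queue(sources, files_per_tu=1):
    """Group source files into TUs and record TU-to-function correspondences.

    sources: dict mapping a source file identifier to the function names it
    contains (hypothetical input; a real tool would obtain it by scanning).
    """
    if files_per_tu < 1:
        raise ValueError("files_per_tu must be at least 1")
    queue = []
    function_to_tu = {}
    file_ids = list(sources)
    for start in range(0, len(file_ids), files_per_tu):
        group = file_ids[start:start + files_per_tu]
        tu = TranslationUnit(name=f"TU {len(queue) + 1}", source_files=group)
        for file_id in group:
            for function_name in sources[file_id]:
                tu.functions.append(function_name)
                function_to_tu[function_name] = tu.name
        queue.append(tu)
    return queue, function_to_tu


sources = {
    "source file #1": ["function 1", "function 2", "function 3"],
    "source file #2": ["function 4"],
}
tu_queue, function_to_tu = build_tu_queue(sources)
```

  • with files_per_tu set to 1, the TU 1 corresponds to the source file #1 and its three functions, as in the example above. A larger files_per_tu corresponds to the equal distribution principle with several source files per TU.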
  • any TU (which may also be referred to as a first TU for ease of description) is selected from the TU queue, and a first function set associated with the first TU is determined.
  • the first function set includes a function in a source file associated with the first TU. Any function, for example, a first function, in the first function set is analyzed, and an analysis result of the first function is determined.
  • the analysis result of the first function includes at least one of a call relationship of the first function and a quality problem of the first function.
  • a second function set is analyzed, and an analysis result of a function in the second function set is determined.
  • the second function set includes at least one function having the call relationship with the first function (in other words, the function included in the second function set is called by the first function).
  • for example, when the TU 1 in the TU queue is selected, the first function set is determined.
  • the function 1, the function 2, and the function 3 are determined.
  • the function 1, the function 2, and the function 3 are analyzed.
  • an analysis result of the foregoing functions is determined.
  • the analysis result includes at least one of a call relationship of the functions and a quality problem of the functions. If it is assumed that the function 2 calls a function 4 and a function 5 in an analysis process, the function 2 needs to be analyzed after analysis of the second function set (the function 4 and the function 5) is completed.
  • the function 4 and the function 5 may be associated with a same TU, or may be associated with different TUs.
  • analyzing the TU 1 is analyzing all functions associated with the TU 1. If it is found in the analysis process that a function associated with the TU 1 calls another function, the function that calls the another function is placed in a waiting pool. If a function associated with the TU 1 does not call another function, analysis of the function is directly completed. After processing of the function associated with the TU 1 is completed, a next TU may be analyzed.
  • the waiting pool has specific storage space in the memory, and may be an array in the memory. An implementation form of placing the function in the waiting pool may be to store the function by using a queue or a database, or by specifying the storage space. The function placed in the waiting pool may be analyzed after analysis of the called function is completed.
  • the functions associated with the TU 1 include the function 1, the function 2, and the function 3.
  • the function 1 and the function 3 do not call another function, and the function 2 calls the function 4. Therefore, analysis of the function 1 and the function 3 is completed, and the function 2 is placed in the waiting pool. In other words, processing of the TU 1 is completed.
  • another TU may be selected from the TU queue based on popularity for analysis.
  • the function 2 may be analyzed after analysis of the function 4 is completed.
  • when a TU is selected from the TU queue for analysis, sequential selection may be performed, and one TU may be selected for analysis each time, or a plurality of TUs may be simultaneously selected for analysis.
  • When a plurality of TUs are simultaneously selected for analysis, static analysis efficiency may be further improved.
  • functions associated with the plurality of TUs may call the same function.
  • in this case, the shared called function needs to be analyzed only once, to reduce redundant analysis.
  • that the function is analyzed may be identified by using identification information.
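  • one possible way to analyze a shared called function only once while several TUs are processed at the same time is sketched below. The sketch is an assumption for illustration, not the mechanism of this application: the class AnalyzedMarks plays the role of the identification information, the names claim, analyze_tu, and analyze_in_parallel are hypothetical, and the actual analysis is replaced by recording the function name. A real tool would also make a caller wait until the worker that claimed the callee finishes it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AnalyzedMarks:
    """Thread-safe identification information for functions already claimed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._marked = set()

    def claim(self, function_name):
        """Return True only for the first worker that claims the function."""
        with self._lock:
            if function_name in self._marked:
                return False
            self._marked.add(function_name)
            return True


def analyze_tu(functions, calls, marks):
    """Return the functions that this worker actually analyzed."""
    analyzed_here = []
    for function_name in functions:
        # called functions first, then the calling function itself
        for target in list(calls.get(function_name, ())) + [function_name]:
            if marks.claim(target):
                analyzed_here.append(target)  # placeholder for real analysis
    return analyzed_here


def analyze_in_parallel(tus, calls):
    marks = AnalyzedMarks()
    with ThreadPoolExecutor(max_workers=max(1, len(tus))) as pool:
        return list(pool.map(lambda fns: analyze_tu(fns, calls, marks), tus))


calls = {"function 2": ["function 9"], "function 3": ["function 9"]}
results = analyze_in_parallel([["function 2"], ["function 3"]], calls)
```

  • in this usage, both TUs call the hypothetical function 9, but only one worker claims and analyzes it.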
  • each TU in the TU queue includes popularity, and the popularity is used to identify a quantity of times that a function in a TU associated with the popularity is called.
  • the first TU may be selected based on the popularity of each TU in the TU queue.
  • a popularity identifier (which may be understood as a popularity value) of the first TU is higher than a first threshold.
  • the popularity is a presentation of a weight
  • a value of the popularity is represented by using a popularity value
  • the popularity value is a presentation of a weight value.
  • the popularity value may be identified by using a color or another manner. This is not limited in this application.
  • when a TU is selected from the TU queue, a TU whose popularity value is higher than the first threshold is preferentially selected for analysis.
  • the popularity of each TU in the TU queue may be determined based on a quantity of times or a frequency of calling a function associated with the TU.
  • the popularity of each TU in the TU queue may alternatively be determined based on a space occupation rate of the TU when a function associated with the TU is processed.
  • each time the function associated with the first TU is called, the popularity value of the first TU is updated once. Therefore, a larger quantity of times that the function associated with the first TU is called indicates a higher popularity value of the first TU.
  • a larger quantity of times that the function associated with the first TU is called indicates a higher priority of the first TU in the TU queue.
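  • the popularity update and the threshold-based selection described above can be sketched as follows. The function names record_call and select_tu and the default threshold value of 0 are assumptions made for this sketch, and falling back to the first TU in queue order when no TU exceeds the threshold is one possible policy, not a policy stated in this application.

```python
def record_call(popularity, function_to_tu, callee):
    """Increase the popularity value of the TU associated with a called function."""
    tu = function_to_tu[callee]
    popularity[tu] += 1
    return tu


def select_tu(popularity, first_threshold=0):
    """Prefer the most popular TU above the threshold, else take the first TU.

    popularity: dict mapping a TU name to its popularity value, in queue order.
    """
    if not popularity:
        return None
    above = [tu for tu, value in popularity.items() if value > first_threshold]
    if above:
        return max(above, key=popularity.get)
    return next(iter(popularity))


popularity = {"TU 1": 0, "TU 2": 0, "TU 3": 0}
function_to_tu = {"function 5": "TU 3"}
first_choice = select_tu(popularity)
record_call(popularity, function_to_tu, "function 5")
second_choice = select_tu(popularity)
```

  • before any call is recorded, no TU exceeds the threshold and the TU 1 is taken. After the call to the function 5 is recorded, the TU 3 is selected.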
  • CFG refers to a control flow graph, and AST refers to an abstract syntax tree.
  • summary information of the function is stored in a lookup table.
  • the summary information of the function may be understood as a description of behavior of the function. If a TU that calls the function exists during subsequent analysis, the specific operation that the function performs under a specific condition, and the specific result that it outputs, may be learned from the summary information of the function, to further analyze the TU.
  • the summary information of the function may include one or more of the following: a call parameter of the function, a return value of the function, a function of the function, internal logic of the function, and an environment variable of the function.
  • the summary information of the function is compressed and stored in the lookup table.
  • an existing compression tool may be selected to compress the summary information of the function.
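  • storing compressed summary information in a lookup table can be sketched as follows. The sketch assumes that zlib from the Python standard library serves as the existing compression tool and that JSON is the serialization format. Neither choice comes from this application, and the field names in the example summary are hypothetical.

```python
import json
import zlib


def store_summary(lookup_table, function_name, summary):
    """Serialize, compress, and store the summary information of a function."""
    raw = json.dumps(summary, sort_keys=True).encode("utf-8")
    lookup_table[function_name] = zlib.compress(raw)


def load_summary(lookup_table, function_name):
    """Decompress and deserialize the summary information of a function."""
    raw = zlib.decompress(lookup_table[function_name])
    return json.loads(raw.decode("utf-8"))


lookup_table = {}
summary = {
    "call_parameters": ["buf", "len"],
    "return_value": "0 on success, -1 on error",
    "environment_variables": [],
}
store_summary(lookup_table, "function 1", summary)
restored = load_summary(lookup_table, "function 1")
```

  • when a TU that calls the function is analyzed later, only the compressed entry needs to be read back. The CFG and the AST of the function do not need to be kept.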
  • the first function and a CFG and an AST that correspond to the first function may be cleared from the memory.
  • when the analysis of the first function associated with the first TU is completed and the analysis of all other functions that call the first function is completed, the first function is no longer called by any function.
  • in this case, the first function and the CFG and the AST that correspond to the first function are cleared from the memory, and only the summary information of the first function is retained.
  • the first TU is cleared from the TU queue.
  • when the analysis of all the functions associated with the first TU is completed, and the analysis of all the other functions that call the function associated with the first TU is completed, the first TU is cleared from the TU queue.
  • the first TU is cleared from the TU queue, to reduce memory overheads of the system.
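  • the clearing conditions described above can be sketched as follows. This is an illustrative sketch only: the names can_release and release_finished are hypothetical, the dictionary ir_cache stands in for the CFG and the AST held in the memory, and the callers map is assumed to have been accumulated while the calling functions were analyzed.

```python
def can_release(function_name, callers, analyzed):
    """A function can be cleared once it and all functions that call it are analyzed."""
    return function_name in analyzed and all(
        caller in analyzed for caller in callers.get(function_name, ())
    )


def release_finished(tu_queue, tu_functions, callers, analyzed, ir_cache):
    """Clear finished functions from ir_cache and finished TUs from tu_queue."""
    for tu in list(tu_queue):
        functions = tu_functions[tu]
        for function_name in functions:
            if can_release(function_name, callers, analyzed):
                ir_cache.pop(function_name, None)  # drop the CFG and the AST
        if all(can_release(f, callers, analyzed) for f in functions):
            tu_queue.remove(tu)


tu_queue = ["TU 1", "TU 3"]
tu_functions = {
    "TU 1": ["function 1", "function 2"],
    "TU 3": ["function 5", "function 6"],
}
callers = {"function 5": {"function 2"}}
ir_cache = {f: {"cfg": None, "ast": None} for fs in tu_functions.values() for f in fs}
analyzed = {"function 1", "function 5", "function 6"}
release_finished(tu_queue, tu_functions, callers, analyzed, ir_cache)
kept_while_function_2_pending = sorted(ir_cache)
analyzed.add("function 2")
release_finished(tu_queue, tu_functions, callers, analyzed, ir_cache)
```

  • while the function 2 is still pending, the function 5 that it calls is kept, and both TUs stay in the TU queue. After the function 2 is analyzed, both TUs are cleared.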
  • Step S230: Send the problem analysis result of the first object.
  • in step S220, in a process of selecting a TU from the TU queue for analysis, a change in a ranking of a priority of the TU in the TU queue is shown in FIG. 3.
  • the source file of the first object is associated with the TU, and the TU may include the at least one source file.
  • the TU is used as a granularity, a call relationship graph of all the functions included in the first object does not need to be generated, and a plurality of TUs may be simultaneously selected for parallel processing, to improve static analysis efficiency.
  • a TU or a function whose analysis is completed and that is not called may be cleared from the memory, to reduce memory overheads of the system.
  • FIG. 3 is a schematic diagram of a change in a ranking of a TU in a TU queue according to this application.
  • the TU queue 10 shown in FIG. 3 includes four TUs, and each TU is associated with one source file (the file in FIG. 3 is used as an example, and the same is true of FIG. 4); in other words, each TU includes one source file. Each source file is associated with two functions; in other words, each source file includes two functions.
  • a quantity of TUs in the TU queue 10, a quantity of source files included in each TU, and a quantity of functions included in each source file are merely examples. This is not limited in this application.
  • a ranking of a TU in an initial TU queue 10 is a random ranking, and a TU 1, a TU 2, a TU 3, and a TU 4 are respectively ranked from top to bottom.
  • the TU 1 is associated with a source file 1
  • the source file 1 includes a function (function, func) 1 and a function 2
  • the TU 2 is associated with a source file 2
  • the source file 2 includes a function 3 and a function 4
  • the TU 3 is associated with a source file 3
  • the source file 3 includes a function 5 and a function 6
  • the TU 4 is associated with a source file 4
  • the source file 4 includes a function 7 and a function 8.
  • the TU 1 is randomly selected from the TU queue 10 for analysis.
  • the function 2 in the TU 1 calls the function 5 in the TU 3. Therefore, a popularity value of the TU 3 is increased.
  • before the analysis starts, all popularity values of the TU 1, the TU 2, the TU 3, and the TU 4 are 0.
  • the function 2 in the TU 1 calls the function 5 in the TU 3, which increases the popularity value of the TU 3 to 1.
  • the TU 3 then has the highest popularity value in the TU queue 10.
  • therefore, the function 5 and the function 6 that correspond to the TU 3 are analyzed.
  • the function 2 and the function 1 are then analyzed.
  • An analysis result of the function 5 needs to be used during analysis of the function 2. It should be understood that, after analysis of the functions corresponding to the TU 3 and the TU 1 is completed, if it is determined that no other function calls the function 1, the function 2, the function 5, or the function 6, the TU 3 and the TU 1 are cleared from the TU queue 10. As shown in the figure, after reranking, there are only the TU 2 and the TU 4 in the TU queue 10. In comparison with an initial analysis period, a quantity of TUs in the TU queue 10 is reduced, to reduce memory overheads of the system.
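  • the change in the TU queue 10 in the FIG. 3 example can be traced with the following sketch. The helper names rerank and remove_cleared are hypothetical, and the three recorded states correspond to the initial ranking, the ranking after the popularity value of the TU 3 is increased, and the queue after the TU 3 and the TU 1 are cleared. The sketch assumes a stable descending sort, so TUs with equal popularity values keep their original relative order.

```python
def rerank(queue, popularity):
    """Rank TUs by popularity value, highest first; ties keep their order."""
    return sorted(queue, key=lambda tu: popularity[tu], reverse=True)


def remove_cleared(queue, cleared):
    """Drop TUs whose analysis is completed and that are no longer needed."""
    return [tu for tu in queue if tu not in cleared]


queue = ["TU 1", "TU 2", "TU 3", "TU 4"]
popularity = {tu: 0 for tu in queue}
snapshots = [list(queue)]

# The function 2 in the TU 1 calls the function 5 in the TU 3.
popularity["TU 3"] += 1
queue = rerank(queue, popularity)
snapshots.append(list(queue))

# The TU 3 and the TU 1 are fully analyzed and no other function calls them.
queue = remove_cleared(queue, {"TU 3", "TU 1"})
snapshots.append(list(queue))
```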
  • FIG. 4 is a principle flowchart of performing analysis at a granularity of a TU according to this application.
  • Step S410: Scan the source file included in the first object, to generate the TU queue 10 and a correspondence between each TU in the TU queue 10 and a function.
  • a TU queue shown in FIG. 4 includes n TUs, and the n TUs are sequentially a TU 1, a TU 2, a TU 3, ..., and a TUn.
  • Each TU is associated with one source file, and each source file includes two functions.
  • a quantity of TUs in the TU queue is similar to that in FIG. 3.
  • a quantity of source files included in each TU and a quantity of functions included in each source file are merely examples. This is not limited in this application.
  • FIG. 4 shows only a correspondence between each of the TU 1, the TU 2, and the TU 3 and each of a source file and a function.
  • the TU 1 is associated with a source file 1
  • the source file 1 includes a function 1 and a function 2
  • the TU 2 is associated with a source file 2
  • the source file 2 includes a function 3 and a function 4
  • the TU 3 is associated with a source file 3
  • the source file 3 includes a function 5 and a function 6.
  • Step S420: Select a TU from the TU queue 10.
  • if selection is performed for the first time, a TU may be randomly selected from the TU queue 10, and one or more TUs may be simultaneously selected. If selection is not performed for the first time, a TU may be selected from the TU queue 10 based on the policy of preferentially selecting a TU having a high popularity value (for example, a TU is selected if a popularity value of the TU is higher than a first threshold), and one or more TUs may be simultaneously selected.
  • Step S430: Analyze a function associated with the selected TU.
  • an example in which the selected TU is the TU 1 is used.
  • all functions associated with the TU 1 such as the function 1 and the function 2 are analyzed.
  • a CFG and an AST that correspond to each of the function 1 and the function 2 are generated.
  • Step S440: Further analyze whether the function 1 and the function 2 call another function, where the other function may be an analyzed function, or may be a function that is not analyzed.
  • Step S4410: It is assumed that the function associated with the TU 1 calls another function. For example, the function 2 calls the function 5 in the TU 3.
  • Step S4411: Search the TU queue 10 for a TU corresponding to the function 5, where it is assumed that the TU corresponding to the function 5 is the TU 3.
  • Step S4412: Increase a popularity value of the TU 3, so that the TU 3 is ranked first in the TU queue 10.
  • Step S4413: Place the function 2 associated with the TU 1 in a waiting pool 20 for analysis.
  • the waiting pool 20 has specific storage space in a memory, and may be an array in the memory.
  • An implementation form of placing the function 2 in the waiting pool 20 may be to store the function 2 by using a queue or a database, or by specifying the storage space.
  • after the analysis of the function 5 is completed, the function 2 in the waiting pool 20 may be notified that the analysis of the function 5 is completed, and the function 2 may be further analyzed.
  • both the function 2 and the function 5 may be placed in the waiting pool 20 for analysis, or all functions, namely, the function 2, the function 5, and the function 6 associated with TUs corresponding to the function 2 and the function 5 are placed in the waiting pool 20 for analysis. This is not limited in this application.
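  • The waiting pool behavior described in steps S4410 to S4413 can be sketched as follows. This is a hedged Python illustration only: the class `WaitingPool` and its methods `park` and `notify_analyzed` are hypothetical names, and keeping the pool as in-memory dictionaries is one of several storage forms the text allows (an array, a queue, or a database).

```python
from collections import defaultdict

class WaitingPool:
    """Parks callers until the functions they call have been analyzed (sketch)."""

    def __init__(self):
        self.pending = {}                 # caller -> set of unanalyzed callees
        self.waiters = defaultdict(set)   # callee -> callers waiting on it

    def park(self, caller, callees):
        """Place `caller` in the pool until every function in `callees` is analyzed."""
        self.pending[caller] = set(callees)
        for callee in callees:
            self.waiters[callee].add(caller)

    def notify_analyzed(self, callee):
        """Report that `callee` is analyzed; return callers that can now resume."""
        ready = []
        for caller in sorted(self.waiters.pop(callee, ())):
            self.pending[caller].discard(callee)
            if not self.pending[caller]:
                # Every callee of this caller is done, so it leaves the pool.
                del self.pending[caller]
                ready.append(caller)
        return ready
```

  • In the example above, `park("function 2", ["function 5"])` corresponds to step S4413, and `notify_analyzed("function 5")` returns the function 2 so that it can be further analyzed.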
  • Step S4420 It is assumed that the function associated with the TU 1 does not call another function. For example, the function 1 does not call another function.
  • Step S4421 Create summary information of the function 1. It should be understood that the created summary information is not a value of specific summary information, and the value of the specific summary information needs to be filled in after the analysis of the function is completed.
  • Step S4422 Analyze the function 1, and after the analysis of the function 1 is completed, fill the created summary information with the value of the specific summary information, and store the summary information of the function 1 in a lookup table.
  • the summary information of the function 1 may include one or more of the following: a call parameter of the function 1, a return value of the function 1, a function of the function 1, internal logic of the function 1, and an environment variable of the function 1.
  • Step S4423 Analyze whether the function 1 is called, and if no function calls the function 1, clear the function 1 and a CFG and an AST that are generated in a process of analyzing the function 1.
  • Step S4424 If a function calls the function 1, the function that calls the function 1 in the waiting pool is notified that the analysis of the function 1 is completed, and a next step may be performed, in other words, the function that calls the function 1 is analyzed.
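  • Steps S4421 and S4422 separate creating the summary information from filling it in. A minimal Python sketch of that two-phase handling is shown below; the `FunctionSummary` fields, the `filled` flag, and the module-level `lookup_table` are assumptions chosen for illustration, and only two of the possible summary items (call parameters and return value) are modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class FunctionSummary:
    name: str
    call_parameters: List[str] = field(default_factory=list)
    return_value: Optional[Any] = None
    filled: bool = False   # False while the analysis is still running

# Hypothetical lookup table: function name -> completed summary.
lookup_table: Dict[str, FunctionSummary] = {}

def create_summary(name):
    """Step S4421: create an empty record before the analysis finishes."""
    return FunctionSummary(name=name)

def complete_summary(summary, call_parameters, return_value):
    """Step S4422: fill in the values and store the record in the lookup table."""
    summary.call_parameters = list(call_parameters)
    summary.return_value = return_value
    summary.filled = True
    lookup_table[summary.name] = summary
    return summary
```

  • The record reaches the lookup table only after it has been filled in, so a caller that finds an entry can rely on its values.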
  • one or more TUs are selected based on the policy of preferentially selecting a TU having a high popularity value. If a function associated with the TU is no longer called during analysis of the function, or analysis of each function that calls the function is completed, after the analysis of the function is completed, the function and a CFG and an AST that are generated in a process of analyzing the function are cleared from the memory, and only the summary information of the function is retained.
  • the TU is cleared from the TU queue.
  • the quantity of TUs in the TU queue gradually decreases. When analysis of all the TUs in the TU queue is completed, problem analysis of the first object is completed.
  • the static analysis method provided in this application is to analyze a system project or a type of problem (for example, a null pointer problem) in software.
  • the method 200 is performed by a static analysis tool in software.
  • the static analysis tool may be static analysis software, a data packet for static analysis, or an executable file.
  • the static analysis tool 300 may be divided based on functional modules, and may include a code compilation module, a data conversion module, a software processing module, and a problem analysis module.
  • the code compilation module is configured to compile code of a problem analysis object, to generate a required AST and CFG.
  • the code compilation module includes an AST coding module and a CFG compilation module.
  • the problem analysis object is a static analysis object, and may be a system project or software. This is not limited in this application.
  • the code compilation module may be Clang or a Z3 solver.
  • Clang and Z3 are public libraries. In other words, code of a platform of an existing analysis tool is used for the code compilation module.
  • Clang is used to compile, analyze, and generate the required AST and CFG.
  • Clang is an existing executable file.
  • An input of Clang is a source file of C/C++, and an output is an AST and a CFG that are generated through parsing by using Clang.
  • Z3 is an open source constraint solver produced by Microsoft, and can resolve a constraint solving problem in many cases.
  • An input of Z3 is a series of propositional equations, and an output is a solution that satisfies an equation.
  • the code compilation module in the static analysis tool may alternatively be another public library.
  • the Z3 solver is one of the optional solvers in a symbolic execution process, and may be replaced with another solver.
  • Clang and Z3 are used as examples in the following description. This is not limited in this application.
  • the data conversion module is configured to convert an output of the code compilation module into a self-defined data structure, and includes an AST conversion module and a CFG conversion module.
  • an AST and a CFG that are generated by Clang are converted into an AST and a CFG of an independent data format, so that a static analysis engine does not need to excessively rely on a library function of the code compilation module.
  • the software processing module is a core algorithm module of the static analysis method provided in this application, may also be referred to as a static analysis engine, and may be a segment of code that may be invoked.
  • the software processing module includes an AST processing module and a DFA processing module.
  • the software processing module calls the code compilation module by using the data conversion module, to analyze a file of the problem analysis object.
  • the software processing module calls underlying Clang by using the data conversion module, to analyze the file, and converts the source file into the AST and the CFG that are output by Clang. Then the data is parsed by the data conversion module, to generate a data structure defined by the software processing module.
  • the problem analysis module performs analysis and a check based on the data structure provided by the software processing module, for example, a null pointer check, and a variable initialization check.
  • the problem analysis module includes an AST analysis module and a DFA analysis module.
  • the AST analysis module traverses all nodes of the AST to check whether there is a potential quality problem, or the DFA analysis module traverses all branches of the CFG to determine, in a symbolic execution method, whether logic of a branch is valid.
  • the symbolic execution method is used to determine whether a branch is reachable.
  • the symbolic execution method described herein is specifically to obtain an equation by performing a conjunction on all logical expressions representing a condition on a branch (the branch may be understood as a path), and the equation is used as an input of Z3 to obtain a solution.
  • the conjunction is to connect a plurality of logical expressions by using logical "and". The final conjunction equation is valid only when each expression is valid.
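  • Written as a formula, with the symbols $c_i$ and $\varphi_{\text{branch}}$ introduced here only as illustrative notation (they do not appear in this application), the equation for a branch guarded by the conditions $c_1, \dots, c_n$ is:

```latex
\varphi_{\text{branch}} \;=\; c_1 \land c_2 \land \dots \land c_n ,
\qquad
\text{the branch is reachable} \iff \varphi_{\text{branch}} \text{ is satisfiable.}
```

  • For example, a path that requires both $x > 0$ and $x < 0$ yields $\varphi_{\text{branch}} = (x > 0) \land (x < 0)$, which has no solution, so the solver reports that the branch cannot be reached.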
  • FIG. 6 is an interaction flowchart of another static analysis method according to this application. It can be learned from FIG. 5 that a static analysis tool includes a code compilation module, a data conversion module, a software processing module, and a problem analysis module. The code compilation module and the data conversion module perform data conversion on a source file of an analysis object, so that the software processing module performs specific analysis based on the converted data, and the problem analysis module performs a problem check based on the analysis performed by the software processing module, so as to output a problem analysis result. Because the code compilation module and the data conversion module do not perform a specific analysis process, FIG. 6 shows only the interaction between the problem analysis module and the software processing module.
  • the method 200 is described from a perspective of a functional module included in an execution body (the static analysis tool 300) in the method 200.
  • the method 300 shown in FIG. 6 includes the following steps.
  • Step S310 The problem analysis module sends a first request to the software processing module, where the first request is used to perform problem analysis on a first object.
  • the software processing module receives the first request.
  • the first object may be a project or software. This is not limited in this application.
  • the problem analysis module receives a first request from an application or a server.
  • the first request may correspond to the first request in step S210.
  • the first request may be an operation performed by a user on a first device.
  • the problem analysis module forwards the first request to the software processing module, so that the software processing module starts to initiate a task.
  • Step S320 The software processing module performs problem analysis on a TU associated with the first object, to generate a problem analysis result of the first object.
  • the software processing module scans a source file included in the first object, to generate a TU queue.
  • the TU queue includes N TUs, N is an integer greater than or equal to 1, and each TU in the TU queue is associated with at least one function in at least one source file.
  • the software processing module performs, based on the TU queue, problem analysis on a function associated with the N TUs in the TU queue, to generate the problem analysis result of the first object.
  • one TU in the TU queue is associated with one or more source files, and one source file includes one or more functions.
  • the first object includes 100 source files.
  • the software processing module generates the TU queue in a scanning process.
  • the TU queue is maintained in a memory (the TU queue may be an array in the memory, or the TU queue may be a data organization manner) .
  • the TU queue may include 100 TUs, or may include 10 TUs. It is assumed that the TU queue includes 100 TUs, and based on an equal distribution principle, each TU in the TU queue is associated with a function included in one source file. It is assumed that the TU queue includes 10 TUs, and based on an equal distribution principle, each TU in the TU queue is associated with a function included in 10 source files. Certainly, the equal distribution principle may alternatively not be used. To be specific, each TU in the TU queue is associated with a different quantity of source files, but the TUs in the TU queue are associated with a total of 100 source files. This is not limited in this application.
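  • The equal distribution principle in the example above can be sketched as follows. This Python fragment is an illustration only: the function name `build_tu_queue` is hypothetical, and handling a file count that does not divide evenly by giving the first TUs one extra file is an assumption not stated in this application.

```python
def build_tu_queue(source_files, tu_count):
    """Distribute source files over `tu_count` TUs as evenly as possible."""
    if tu_count < 1:
        raise ValueError("the TU queue must contain at least one TU")
    base, extra = divmod(len(source_files), tu_count)
    queue, start = [], 0
    for index in range(tu_count):
        # Assumption: the first `extra` TUs each take one additional file.
        size = base + (1 if index < extra else 0)
        queue.append(source_files[start:start + size])
        start += size
    return queue
```

  • With 100 source files, `build_tu_queue(files, 100)` associates each TU with one source file, and `build_tu_queue(files, 10)` associates each TU with 10 source files.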
  • in a process of generating the TU queue, the software processing module establishes a TU based on a quantity of source files.
  • the software processing module may alternatively establish the TU in another manner. This is not limited in this application.
  • the software processing module scans all source files of the project once, to generate the TU queue and a correspondence between a TU (the TU is associated with one source file) in the TU queue and a function.
  • in the scanning process, the software processing module generates an identifier of each source file, and the identifier is used to uniquely identify the source file. It is assumed that a TU 1 in the TU queue is associated with one source file. For example, the TU 1 is associated with a file #1, and the file #1 includes three functions (for example, a function 1, a function 2, and a function 3). Therefore, correspondences between the file #1 and the three functions, to be specific, a correspondence between the file #1 and the function 1, a correspondence between the file #1 and the function 2, and a correspondence between the file #1 and the function 3, are generated in the scanning process.
  • a correspondence between the TU 1 and the function 1, a correspondence between the TU 1 and the function 2, and a correspondence between the TU 1 and the function 3 are generated in the scanning process. It should be understood that a correspondence between another TU in the TU queue and a function may be generated by analogy.
  • the finally generated TU queue includes the N TUs and a correspondence between each of the N TUs and a function.
  • when the software processing module performs, based on the TU queue, problem analysis on the function associated with the N TUs in the TU queue, the software processing module selects a TU from the TU queue, and analyzes a function associated with the TU. When analysis of all the TUs in the TU queue is completed, it may be considered that the software processing module completes analysis of the first object.
  • the software processing module selects any TU (which may also be referred to as a first TU for ease of description) from the TU queue, and determines a first function set associated with the first TU, where the first function set includes a function in a source file associated with the first TU; analyzes any function, for example, a first function, in the first function set, and determines an analysis result of the first function, where the analysis result of the first function includes at least one of a call relationship of the first function and a quality problem of the first function; when there is a function having a call relationship with the first function (for example, the first function calls another function) , analyzes a second function set, and determines an analysis result of a function in the second function set, where the second function set includes at least one function having the call relationship with the first function (in other words, the function included in the second function set is called by the first function) ; and when the analysis of all the TUs in the TU queue is completed, generates the problem
  • the software processing module selects the TU 1 in the TU queue, determines the first function set, to be specific, determines the function 1, the function 2, and the function 3, and then analyzes the function 1, the function 2, and the function 3.
  • the analysis result includes at least one of a calling relationship of the functions and a quality problem of the functions. If it is assumed that the function 2 calls a function 4 and a function 5 in the analysis process, the function 2 needs to be analyzed after analysis of the function 4 and the function 5 is completed.
  • the function 4 and the function 5 may be associated with a same TU, or may be associated with different TUs.
  • analyzing the TU 1 is analyzing all functions associated with the TU 1. If it is found in the analysis process that a function associated with the TU 1 calls another function, the calling function is placed in a waiting pool. If a function associated with the TU 1 does not call another function, analysis of the function is directly completed. After processing of the functions associated with the TU 1 is completed, a next TU may be analyzed. The function placed in the waiting pool may be analyzed after analysis of the called function is completed.
  • the functions associated with the TU 1 include the function 1, the function 2, and the function 3.
  • the function 1 and the function 3 do not call another function, and the function 2 calls the function 4. Therefore, analysis of the function 1 and the function 3 is completed, and the function 2 is placed in the waiting pool. In other words, processing of the TU 1 is completed.
  • another TU may be selected from the TU queue based on popularity for analysis.
  • the function 2 may be analyzed after the analysis of the function 4 is completed.
  • the software processing module may perform sequential selection, and select one TU for analysis each time, or may simultaneously select a plurality of TUs for analysis.
  • when the software processing module simultaneously selects a plurality of TUs for analysis, static analysis efficiency may be further improved.
  • functions associated with the plurality of TUs may call the same other function.
  • in this case, the function that is called in common needs to be analyzed only once, to reduce redundant analysis.
  • whether a function has been analyzed may be identified by using identification information.
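  • One way to use such identification information when a plurality of TUs are analyzed simultaneously is sketched below in Python. The class `AnalysisRegistry` and its method `claim` are hypothetical, and guarding the set of claimed functions with a lock is an assumption about how concurrent workers might be coordinated.

```python
import threading

class AnalysisRegistry:
    """Records which functions have already been taken for analysis (sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claimed = set()   # identification information: claimed function names

    def claim(self, function_name):
        """Return True only for the first caller that claims the function."""
        with self._lock:
            if function_name in self._claimed:
                return False
            self._claimed.add(function_name)
            return True
```

  • A worker analyzes a called function only if `claim` returns True, so a function that is called from several TUs is analyzed once.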
  • each TU in the TU queue includes popularity, and the popularity is used to identify a quantity of times that a function in a TU associated with the popularity is called.
  • when the software processing module selects the first TU from the TU queue, the software processing module may select the first TU based on the popularity of each TU in the TU queue.
  • a popularity identifier (which may be understood as a popularity value) of the first TU is higher than a first threshold.
  • when selecting a TU from the TU queue, the software processing module preferentially selects a TU with a high popularity value for analysis.
  • the popularity of each TU in the TU queue may be determined based on a quantity of times or a frequency of calling a function associated with each TU, and the popularity of each TU in the TU queue may alternatively be determined based on a space occupation rate of the TU when the function associated with the TU is processed.
  • each time the function associated with the first TU is called, the popularity value of the first TU is updated once. Therefore, a larger quantity of times that the function associated with the first TU is called indicates a higher popularity value of the first TU.
  • a larger quantity of times that the function associated with the first TU is called indicates a higher priority of the first TU in the TU queue.
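  • The relationship between call counts, popularity, and priority in the TU queue can be sketched as follows. This Python fragment is a hedged illustration: the class `TuQueue`, the methods `record_call` and `ranked`, and the rule of breaking ties by TU name are assumptions, and popularity is modeled only as a call count, not as a space occupation rate.

```python
class TuQueue:
    """Tracks per-TU popularity as the number of calls into the TU (sketch)."""

    def __init__(self, function_to_tu):
        # function_to_tu: correspondence between each function and its TU.
        self.function_to_tu = dict(function_to_tu)
        self.popularity = {tu: 0 for tu in set(self.function_to_tu.values())}

    def record_call(self, callee):
        """Update the popularity of the TU that owns `callee`; return that TU."""
        tu = self.function_to_tu[callee]
        self.popularity[tu] += 1
        return tu

    def ranked(self):
        """Return TUs from highest to lowest popularity (ties ordered by name)."""
        return sorted(self.popularity, key=lambda tu: (-self.popularity[tu], tu))
```

  • After `record_call("function 5")`, the TU that owns the function 5 moves to the front of `ranked()`, which matches the statement that more calls give a TU a higher priority.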
  • the software processing module correspondingly generates a control flow graph (control flow graph, CFG) and an abstract syntax tree (abstract syntax tree, AST) corresponding to the function, and stores summary information of the function in a lookup table.
  • the summary information of the function may alternatively be understood as a description of behavior of the function. If a TU that calls the function exists during subsequent analysis, it may be learned, based on the summary information of the function, that the function performs a specific operation in a specific condition to output a specific result, so that the TU can be further analyzed.
  • the summary information of the function may include one or more of the following: a call parameter of the function, a return value of the function, a function of the function, internal logic of the function, and an environment variable of the function.
  • the software processing module compresses the summary information of the function and stores the summary information of the function in the lookup table.
  • the software processing module includes a compression module for compressing the summary information of a function.
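  • Compressing summary information before it is stored can be sketched as follows. This application does not name a compression algorithm or a serialization format; the use of JSON and zlib from the Python standard library, and the names `store_summary` and `load_summary`, are assumptions made for illustration.

```python
import json
import zlib

# Hypothetical lookup table: function name -> compressed summary bytes.
lookup_table = {}

def store_summary(name, summary):
    """Serialize the summary, compress it, and store it in the lookup table."""
    payload = json.dumps(summary, sort_keys=True).encode("utf-8")
    lookup_table[name] = zlib.compress(payload)

def load_summary(name):
    """Decompress and deserialize the stored summary of a function."""
    return json.loads(zlib.decompress(lookup_table[name]).decode("utf-8"))
```

  • Only the compressed bytes stay in memory, which fits the goal of keeping the summary while the function, the CFG, and the AST are cleared.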
  • the software processing module may clear, from the memory, the first function and a CFG and an AST that correspond to the first function.
  • the software processing module clears, from the memory, the first function and the CFG and the AST that correspond to the first function, and retains only the summary information of the first function.
  • the software processing module clears the first TU from the TU queue.
  • the first TU is cleared from the TU queue, to reduce memory overheads of the system.
  • Step S330 The software processing module sends a processing result of the first object to the problem analysis module, and correspondingly, the problem analysis module receives the processing result.
  • After completing analysis of the TUs in the TU queue, that is, after clearing the TUs from the TU queue, the software processing module outputs a processing result to the problem analysis module.
  • the processing result may include data obtained after processing performed by an AST processing module and a DFA processing module.
  • Step S340 The problem analysis module performs problem analysis on the first object based on the processing result.
  • analysis is performed based on the processing result output by the software processing module. For example, all nodes of the AST are traversed to check whether there is a potential problem (for example, a null pointer problem or a variable initialization problem), or all branches of the CFG are traversed to determine, in a symbolic execution method, whether logic of a branch is valid.
  • After completing problem analysis of the first object based on the processing result, the problem analysis module outputs the problem analysis result (corresponding to the problem analysis result in step S230 in the method 200). The problem analysis result may be a problem analysis report. For example, for the null pointer problem, the problem analysis report may indicate that analysis found a null pointer problem at a specific location on a specific line of code in a specific source file of the first object. This is not limited in this application.
  • the source file of the first object is associated with the TU, and the TU may include the at least one source file.
  • the TU is used as a granularity, a call relationship graph of all the functions included in the first object does not need to be generated, and a plurality of TUs may be simultaneously selected for parallel processing, to improve static analysis efficiency.
  • a TU or a function whose analysis is completed and that is not called may be cleared from the memory, to reduce memory overheads of the system.
  • a static analysis apparatus described below includes a corresponding hardware structure and/or software module for executing each function.
  • a person skilled in the art should be aware that, units and algorithm steps in the examples described with reference to the embodiments disclosed in this specification can be implemented by hardware or a combination of hardware and computer software in this application. Whether the functions are performed by hardware or hardware driven by computer software depends on particular applications and design constraints of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of this application.
  • the static analysis apparatus may be divided into functional modules based on the foregoing method examples.
  • each functional module may be obtained through division for a corresponding function, or two or more functions may be integrated into one processing module.
  • the integrated module may be implemented in a form of hardware, or may be implemented in a form of a software functional module. It should be noted that, in this application, division into the modules is an example and is merely logical function division, and may be other division in an actual implementation. An example in which each functional module is obtained through division based on a corresponding function is used below for description.
  • FIG. 7 is a schematic block diagram of a static analysis apparatus according to this application.
  • a static analysis apparatus 400 includes a receiving module 410, a processing module 420, and a sending module 430.
  • the receiving module 410 may be configured to receive information sent from the outside, and may be a receiving module in the problem analysis module 310. The processing module 420 is configured to process data inside the static analysis apparatus, and an operation performed by the processing module 420 may correspond to an operation performed by the problem analysis module 310 and the software processing module 320 in the static analysis tool 300. The sending module 430 may be configured to send information to the outside, and may be a sending module in the problem analysis module 310.
  • the static analysis apparatus 400 in this application may be implemented by using a central processing unit (CPU) , may be implemented by using an application-specific integrated circuit (ASIC) , or may be implemented by using a programmable logic device (PLD) .
  • the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
  • the static analysis apparatus 400 and each module thereof may alternatively be a software module.
  • the static analysis apparatus 400 may further include a storage module 440.
  • the storage module 440 may be configured to store instructions and/or data generated in a processing process, and the processing module 420 may read the instructions and/or data in the storage module 440.
  • the receiving module 410 is configured to receive a first request.
  • the first request is used to perform problem analysis on a first object, and the first object includes at least one source file.
  • the processing module 420 is configured to perform problem analysis on a translation unit (TU) associated with the first object, to generate a problem analysis result of the first object.
  • the sending module 430 is configured to send the problem analysis result.
  • the processing module 420 is specifically configured to: scan the source file included in the first object, to generate a TU queue, where the TU queue includes N TUs, each TU in the TU queue is associated with at least one function in one source file, and N is an integer greater than or equal to 1; and perform, based on the TU queue, problem analysis on a function associated with the N TUs in the TU queue, to generate the problem analysis result of the first object.
  • the processing module 420 is specifically configured to: select a first TU in the TU queue, and determine a first function set associated with the first TU, where the first function set includes at least one function in a source file associated with the first TU; analyze a first function, and determine an analysis result of the first function, where the first function is any function in the first function set, and the analysis result of the first function includes at least one of a call relationship of the first function and a quality problem of the first function;
  • the processing module 420 is specifically configured to delete the first TU from the TU queue after analysis of all functions associated with the first TU in the TU queue is completed.
  • each TU in the TU queue includes popularity
  • the popularity is used to identify a quantity of times that a function in a TU associated with the popularity is called
  • the processing module 420 is specifically configured to select the first TU based on the popularity of the TU in the TU queue, where the first TU is a TU that meets a first condition in the TU queue, and the first condition includes that a popularity identifier of the first TU is higher than a first threshold.
  • the processing module 420 is specifically configured to: update the popularity of the TU in the TU queue based on popularity of a function associated with the TU in the TU queue; and rank the TU in the TU queue based on the updated popularity.
  • the static analysis apparatus 400 may be deployed in a server, so that when the server runs, the method performed by the static analysis tool in the foregoing method embodiments may be implemented.
  • the static analysis apparatus 400 may alternatively be deployed in a cloud environment.
  • the cloud environment is an entity that provides a cloud service for a user by using a basic resource in a cloud computing mode.
  • the cloud environment includes a cloud data center and a cloud service platform.
  • the cloud data center includes a large quantity of basic resources (including compute resources, storage resources, and network resources) owned by a cloud service provider.
  • the compute resources included in the cloud data center may be a large quantity of computing devices (for example, servers) .
  • the static analysis apparatus 400 may be a server for static analysis in the cloud data center.
  • the static analysis apparatus 400 may alternatively be a virtual machine created in the cloud data center for static analysis.
  • the static analysis apparatus 400 may alternatively be a software apparatus deployed on the server or the virtual machine in the cloud data center.
  • the software apparatus is configured to perform static analysis.
  • the software apparatus may be deployed on a plurality of servers in a distributed manner, deployed on a plurality of virtual machines in a distributed manner, or deployed on the virtual machine and the server in a distributed manner.
  • the module 410, the module 420, and the module 430 in the static analysis apparatus 400 may be deployed on a plurality of servers in a distributed manner, deployed on a plurality of virtual machines in a distributed manner, or deployed on the virtual machine and the server in a distributed manner.
  • the plurality of submodules may be deployed on a plurality of servers, deployed on a plurality of virtual machines in a distributed manner, or deployed on the virtual machine and the server in a distributed manner.
  • the static analysis apparatus 400 may be abstracted by the cloud service provider into a cloud service of static analysis on the cloud service platform, to provide the cloud service to the user. After the user purchases the cloud service on the cloud service platform, the cloud environment provides the cloud service of static analysis for the user by using the cloud service.
  • the user may upload an analysis object by using an application programming interface (API) or by using a web page interface provided by the cloud service platform, and the apparatus 400 performs static analysis and outputs an analysis result.
  • When the static analysis apparatus 400 is a software apparatus, the static analysis apparatus 400 may be independently deployed on a computing device in any environment. When the static analysis apparatus 400 is hardware, the static analysis apparatus 400 may be a computing device or a chip.
  • the static analysis device 500 includes one or more processors 501.
  • the processor 501 is configured to execute a computer program or instructions and/or data stored in a memory 504, so that the method in the foregoing method embodiments is performed.
  • the processor 501 is coupled to the memory 504, or the memory 504 and the processor 501 may be disposed separately.
  • the memory 504 may also be referred to as a memory unit, and stores executable code, for example, stores the computer program or instructions and/or the data.
  • the memory 504 provides the instructions and the data for the processor 501.
  • the memory 504 may further include a software module required for another running process such as an operating system.
  • the memory 504 includes a kernel, a program, a file generation module, a file transmission module, a file obtaining module, and a decoding module.
  • the processor 501 may be one or more CPUs, or the processor 501 may be another general-purpose processor, a digital signal processor (DSP) , an application-specific integrated circuit (ASIC) , a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or a transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor, any conventional processor, or the like.
  • the memory 504 may be a volatile memory or a nonvolatile memory, or may include both a volatile memory and a nonvolatile memory.
  • the nonvolatile memory may be a read-only memory (ROM) , a programmable read-only memory (programmable ROM, PROM) , an erasable programmable read-only memory (erasable PROM, EPROM) , an electrically erasable programmable read-only memory (electrically EPROM, EEPROM) , or a flash memory.
  • the volatile memory may be a random access memory (RAM) that is used as an external cache.
  • many forms of RAM may be used, for example, a static random access memory (static RAM, SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), an enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), a synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and a direct rambus random access memory (direct rambus RAM, DR RAM).
  • the static analysis device 500 may further include a storage medium 505.
  • the storage medium 505 may be configured to store data generated in a process of performing the foregoing method embodiments.
  • the storage medium 505 may be, for example, a magnetic tape, a hard disk, a solid state drive (SSD), a floppy disk, or the like.
  • the static analysis device 500 may further include a communications interface 506.
  • the communications interface 506 is configured to receive and/or send a signal and/or data.
  • the processor 501 may be configured to control the communications interface 506 to receive and/or send a signal and/or data.
  • the static analysis device 500 may further include an output device 502 and an input device 503.
  • the input device 503 may input, to the static analysis device 500, an object on which problem analysis needs to be performed. After completing analysis of the object, the static analysis device 500 may output a problem analysis result to the output device 502.
  • the static analysis device 500 may further include a bus 507.
  • the bus is a public communication trunk line for transferring information between various functional components of a computer, and is a common channel for transferring information between a CPU, a memory, an input device, and an output device. Components of a host are connected by using the bus. An external device is connected to the bus through a corresponding interface circuit, to form a computer hardware system.
  • a bus of the computer may be divided into a data bus, an address bus, and a control bus, and the data bus, the address bus, and the control bus are respectively used to transmit data, a data address, and a control signal.
  • various buses in FIG. 8 are marked as the bus 507.
  • the static analysis device 500 is configured to implement an operation performed by the static analysis tool in the foregoing method embodiments.
  • the processing module 420 in the static analysis apparatus 400 shown in FIG. 7 may be the processor 501 in FIG. 8, the receiving module 410 and the sending module 430 may be the communications interface 506 in FIG. 8, and the storage module 440 may be the memory 504 in FIG. 8.
  • for the processor 501, refer to the foregoing description of the processing module 420.
  • for the communications interface 506, refer to the description of the receiving module 410 and the sending module 430.
  • for the memory 504, refer to the description of the storage module 440. Details are not described herein again.
  • This application further provides a computer-readable storage medium, and the computer-readable storage medium stores instructions used to implement the method performed by the static analysis tool in the foregoing method embodiments.
  • the computer-readable storage medium stores a computer program or instructions, and when the computer program or instructions are executed by a computer, the computer may be enabled to implement the method performed by the static analysis tool in the foregoing method embodiments.
  • the static analysis device may be a server, and a static analysis tool is installed on the static analysis device, so that when the static analysis device runs, the method performed by the static analysis tool in the foregoing method embodiments can be implemented.
  • All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof.
  • when software is used to implement the embodiments, all or some of the foregoing embodiments may be implemented in a form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus.
  • the computer instructions may be stored in a computer-readable storage medium or may be transmitted from a computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner.
  • the computer-readable storage medium may be any usable medium accessible by the computer, or a data storage device, for example, a server or a data center, integrating one or more usable media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape) , an optical medium (for example, a DVD) , or a semiconductor medium.
  • the semiconductor medium may be a solid-state drive (SSD).

Abstract

A static analysis method is provided, including: receiving a first request, where the first request is used to perform problem analysis on a first object; performing problem analysis on a translation unit TU associated with the first object, to generate a problem analysis result of the first object after the analysis is completed; and sending the problem analysis result. The first object includes at least one source file. Therefore, a method of performing problem analysis on the first object at a granularity of a TU is implemented, to improve static analysis efficiency.

Description

STATIC ANALYSIS METHOD, APPARATUS, AND DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM
This application claims priority to Russian Patent Application No. RU2021124956, filed on August 24, 2021, which is hereby incorporated by reference in its entirety.
TECHNICAL FIELD
This application relates to the field of computer technologies, and more specifically, to a static analysis method, apparatus, and device, and a computer-readable storage medium in a software detection technology.
BACKGROUND
A static analysis technology is a technology of analyzing software program code without executing a program. A key function of the static analysis technology is to check whether a representation and a description of software are consistent and whether there is a conflict or ambiguity. As a type of white-box testing, the static analysis technology plays an important role in testing quality of the software program code.
However, with the rapid development of enterprise-level software (for example, storage software) and cloud software, today's software products are trending toward a larger scale and a more complex structure, and the limitations of the existing static analysis technology are gradually being revealed. A storage scenario is used as an example. One storage system may include a plurality of subsystems, and the source code of a single subsystem may be up to 5 million lines. In the existing static analysis technology, analysis is usually performed in a line-by-line code scanning manner. Consequently, processing efficiency is low throughout the analysis process. Therefore, how to provide an efficient static analysis method becomes a technical problem to be resolved as soon as possible.
SUMMARY
This application provides a static analysis method, apparatus, and device, and a computer-readable storage medium, to improve static analysis efficiency.
According to a first aspect, a static analysis method is provided. The method may be performed by a static analysis tool, for example, static analysis software, or may be performed by a device in which the static analysis tool is installed. This is not limited in this application. The method includes: receiving a first request, where the first request is used to perform problem analysis on a first object; performing problem analysis on a translation unit TU associated with the first object, to generate a problem analysis result of the first object after the analysis is completed; and sending the problem analysis result. The first object includes at least one source file.
Based on the foregoing solution, in the static analysis method provided in this application, during problem analysis of the first object, the source file of the first object is associated with the TU, and the TU may include the at least one source file. During specific problem analysis, the TU is used as a granularity, a call relationship graph of all functions included in the first object does not need to be generated, and a plurality of TUs may be simultaneously selected for parallel processing, to improve static analysis efficiency.
In a possible implementation, the source file included in the first object is scanned, to generate a TU queue. The TU queue includes N TUs, N is an integer greater than or equal to 1, and each TU in the TU queue is associated with at least one function in at least one source file. Specifically, problem analysis is performed, based on the TU queue, on a function associated with the N TUs in the TU queue, to generate the problem analysis result of the first object.
Based on the foregoing solution, the TU in the TU queue may be associated with a function in one source file, or may be associated with functions in a plurality of source files. If the TU is associated with functions in a plurality of source files, a solution in which problem analysis is performed at a granularity of a TU in this application can be applied to a more complex cross-file problem analysis scenario.
With reference to the first aspect, in some implementations of the first aspect, during analysis of the first object, a first TU in the TU queue is selected, and a first function set associated with the first TU is determined. The first function set includes a function in a source file associated with the first TU. Any function, for example, a first function, in the first function set is analyzed, and an analysis result of the first function is determined. The analysis result of the first function includes at least one of a call relationship of the first function and a quality problem of the first function. When there is a function having a call relationship with the first function (for example, the first function calls another function), a second function set is analyzed, and an analysis result of a function in the second function set is determined. The second function set includes at least one function having the call relationship with the first function (in other words, the function included in the second function set is called by the first function). When analysis of all the TUs in the TU queue is completed, the problem analysis result of the first object is generated.
In another possible implementation, during analysis of the first object, at least two TUs may be selected from the TU queue, and functions associated with the selected TUs are analyzed in parallel.
Based on the foregoing solution, problem analysis is performed by performing parallel processing on the plurality of TUs, to improve static analysis efficiency.
In another possible implementation, after analysis of all functions associated with the first TU in the TU queue is completed, the first TU is deleted from the TU queue.
Based on the foregoing solution, the TU queue is maintained in a memory of a system. Therefore, when analysis of the first TU in the TU queue is completed, or when analysis of the first TU in the TU queue is completed and the function associated with the first TU is not called by the function associated with another TU, the first TU is deleted from the TU queue, to reduce memory consumption of the system.
In another possible implementation, if a queue in which the first TU is located before the analysis is completed is referred to as a first TU queue, as analysis of the first TU is completed, the first TU queue is updated to a second TU queue. If a change in the TU queue is presented in a visual interface in a static analysis process, it can be learned that a quantity of TUs in the first TU queue is greater than a quantity of TUs in the second TU queue, because in a process of analyzing the TU in the TU queue, a TU whose analysis is completed is deleted from the TU queue.
In another possible implementation, each TU in the TU queue has a popularity, and the popularity is used to identify a quantity of times that a function in the TU is called, so that when a TU is selected from the TU queue for analysis, the first TU may be selected based on the popularity of the TU in the TU queue, where a popularity identifier of the first TU is higher than a first threshold.
Based on the foregoing solution, a processing priority of each TU in the TU queue may be determined based on a quantity of times that a function associated with each TU in the TU queue is called, so that a function that is associated with the TU and that is called for a large quantity of times is preferentially analyzed.
In another possible implementation, during analysis of the first TU, it may be found that the function associated with the first TU calls another function. For example, the function associated with the first TU is a third function, the third function calls a fourth function, and a TU associated with the fourth function is a second TU. In this case, popularity of the second TU in the TU queue is increased. It is assumed that the TU queue in which the first TU is located before the analysis of the first TU is started is referred to as a third TU queue. When it is determined that the function associated with the first TU calls another function, the third TU queue is updated to a fourth TU queue. It should be understood that, in this process, the analysis of the first TU is not completed. If a change in the TU queue is presented in the visual interface in the static analysis process, it can be learned that popularity of the second TU in the third TU queue is lower than popularity of the second TU in the fourth TU queue.
It should be understood that, the first TU queue, the second TU queue, the third TU queue, and the fourth TU queue that are involved in the foregoing description are presented as one TU queue in the memory of the system. The first TU queue, the second TU queue, the third TU queue, and the fourth TU queue are used only to distinguish a change in the TU queue in the static analysis process.
In another possible implementation, the popularity of the TU in the TU queue is updated based on popularity of a function associated with the TU in the TU queue. In other words, a larger quantity of times that the function associated with the TU is called leads to higher popularity of the TU in the TU queue. A priority of the TU in the TU queue is ranked based on the updated popularity (namely, popularity that is updated as the quantity of times that the function associated with the TU is called increases).
In another possible implementation, specifically, during analysis of the first TU, if the function associated with the first TU includes a second function, after analysis of the second function is completed, it is determined whether analysis of a function that calls the second function is completed. If analysis of each function that calls the second function is completed, the second function and data generated in a process of analyzing the second function are cleared from the memory.
Based on the foregoing solution, some data that is not required for subsequent analysis is deleted from the memory, to avoid a memory expansion problem caused by further static analysis, in other words, reduce memory consumption of the system.
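For illustration only, the clearing condition described above may be sketched in Python as follows. The names `release_if_unneeded`, `callers`, `analyzed`, and `memory` are hypothetical and are not defined in this application, and the sketch assumes that the set of functions that call each function is already known.

```python
def release_if_unneeded(function, callers, analyzed, memory):
    """Clear a function's data once it and all of its callers are analyzed.

    Returns True if the data was cleared, False if it must be kept.
    """
    if function not in analyzed:
        return False
    if any(caller not in analyzed for caller in callers.get(function, [])):
        return False  # a caller that is not yet analyzed may still need the data
    memory.pop(function, None)
    return True


# Hypothetical data: the second function is called by two other functions.
callers = {"second_function": ["caller_a", "caller_b"]}
memory = {"second_function": {"cfg": "placeholder", "ast": "placeholder"}}
analyzed = {"second_function", "caller_a"}

kept = release_if_unneeded("second_function", callers, analyzed, memory)
analyzed.add("caller_b")
cleared = release_if_unneeded("second_function", callers, analyzed, memory)
```

In this sketch, the data of the second function is kept while `caller_b` is still pending and is cleared only after analysis of every calling function is completed.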
In another possible implementation, specifically, during analysis of the first TU, if the function associated with the first TU includes a second function, after analysis of the second function is completed, summary information (for example, a return value, functionality, internal logic, or an environment variable of the function) of the second function is generated, the summary information of the second function is compressed, and the compressed summary information is stored in a lookup table.
Based on the foregoing solution, the summary information of the function is compressed, to further reduce memory consumption of the system.
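For illustration only, compressing summary information and storing it in a lookup table may be sketched in Python as follows. This application does not specify a serialization format or a compression algorithm; JSON and zlib are assumptions made for the sketch, and the names `store_summary`, `load_summary`, and the summary fields are hypothetical.

```python
import json
import zlib

lookup_table = {}  # function name -> compressed summary bytes


def store_summary(function_name, summary):
    """Serialize, compress, and store the summary of an analyzed function."""
    raw = json.dumps(summary, sort_keys=True).encode("utf-8")
    lookup_table[function_name] = zlib.compress(raw)


def load_summary(function_name):
    """Decompress and return a stored summary for use by a calling function."""
    raw = zlib.decompress(lookup_table[function_name])
    return json.loads(raw.decode("utf-8"))


# Hypothetical summary fields for the second function.
summary = {
    "return_value": "pointer, may be null",
    "internal_logic": "allocates a buffer and returns it",
    "environment_variables": [],
}
store_summary("second_function", summary)
restored = load_summary("second_function")
```

The memory saving depends on the size and redundancy of the summaries; for very small summaries, the compressed form may not be smaller than the original.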
According to a second aspect, this application provides a static analysis apparatus. The apparatus includes each module configured to perform the method in any one of the first aspect or the possible implementations of the first aspect.
According to a third aspect, this application provides a static analysis device. The device includes a processor, the processor is coupled to a memory, the memory is configured to store a computer program or instructions, and the processor is configured to execute the computer program or the instructions in the memory, so that the device performs the method in any one of the first aspect or the possible implementations of the first aspect.
According to a fourth aspect, this application provides a static analysis device. The device includes a processor, and the processor is configured to: call a computer program from a memory, and run the computer program, so that the device performs the method in any one of the first aspect or the possible implementations of the first aspect.
According to a fifth aspect, this application provides a computer-readable storage medium. The computer-readable storage medium stores program code to be executed by a computing device, and the program code includes instructions used to perform the method in any one of the first aspect or the possible implementations of the first aspect.
Based on the implementations provided in the foregoing aspects, this application may provide more implementations through further combination.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a diagram of a scenario in which a static analysis method provided in this application is used;
FIG. 2 is a flowchart of a static analysis method according to this application;
FIG. 3 is a schematic diagram of a change in a ranking of a TU in a TU queue according to this application;
FIG. 4 is a principle flowchart of performing analysis at a granularity of a TU according to this application;
FIG. 5 is a schematic diagram of a structure of a static analysis tool according to this application;
FIG. 6 is an interaction flowchart of another static analysis method according to this application;
FIG. 7 is a schematic block diagram of a static analysis apparatus according to this application; and
FIG. 8 is a schematic diagram of a structure of another static analysis apparatus according to this application.
DESCRIPTION OF EMBODIMENTS
The following describes the technical solutions in this application with reference to the accompanying drawings.
FIG. 1 is a diagram of a scenario in which a static analysis method provided in this application is used. As shown in the figure, the scenario includes a user 10, a first device 20, and a static analysis tool 30. The static analysis tool 30 is installed in the first device 20. When the user 10 needs to perform static analysis on an object (a project or software), the user 10 operates the first device 20 and initiates a problem analysis request to the static analysis tool 30 installed in the first device 20, to run the static analysis tool 30. For example, the problem analysis request may be used to request analysis of a problem of the object such as a security vulnerability, a code error, or a null pointer. The static analysis tool 30 then performs problem analysis on the object and, after completing the analysis, outputs a problem analysis result, for example, a problem analysis report, to the first device 20. This is not limited in this application. A to-be-analyzed object may be stored in the first device 20 or another device connected to the first device 20.
The static analysis tool 30 in this application may be static analysis software, may be a data packet for static analysis, or may be an executable file. This is not limited in this application.
It should be understood that the first device 20 in an example in FIG. 1 is a notebook computer, but this example does not constitute a limitation on a protection scope of this application. The first device 20 in this application may be a server, including various servers classified based on a network scale, an architecture, a purpose, an appearance, or the like. The first device 20 may alternatively be an intelligent terminal, for example, a mobile phone, a tablet computer (pad) , a computer having a wireless transceiver function, a virtual reality (VR) terminal, an augmented reality (AR) terminal, a wireless terminal in industrial control, a wireless terminal in self driving, a wireless terminal in TeleMedicine, a wireless terminal in smart grid, a wireless terminal in transportation safety, a wireless terminal in smart city, a wireless terminal in smart home, a cellular phone, a cordless telephone set, a session initiation protocol (SIP) telephone, a wireless local loop (WLL) station, a personal digital assistant (PDA) , a handheld device having a wireless communication function, a computing device or another processing device connected to a wireless modem, a vehicle-mounted device, a wearable device, a terminal in a 5G network, or a terminal in a future evolved network. This is not limited in this application.
FIG. 2 is a flowchart of a static analysis method according to this application. A method 200 shown in FIG. 2 is performed by a static analysis tool, and the method 200 includes the following steps.
Step S210: Receive a first request, where the first request is used to perform problem analysis on a first object.
For example, the first object may be a project, software, or a set of source code. This is not limited in this application.
Step S220: Perform problem analysis on a translation unit (TU) associated with the first object, to generate a problem analysis result of the first object.
Specifically, a source file included in the first object is scanned, to generate a TU queue. The TU queue includes N TUs, N is an integer greater than or equal to 1, and each TU in the TU queue is associated with at least one function in at least one source file.
Problem analysis is performed, based on the TU queue, on a function associated with the N TUs in the TU queue, to generate a problem analysis result of the first object.
Optionally, one TU in the TU queue includes one or more source files, and one source file includes one or more functions.
For example, the first object includes 100 source files, and a TU queue is generated in a scanning process. It should be understood that the TU queue is maintained in a memory. For example, the TU queue may be an array in the memory, or may be organized as another data structure. The TU queue may include 100 TUs, or may include 10 TUs. If the TU queue includes 100 TUs, based on an equal distribution principle, each TU in the TU queue is associated with the functions included in one source file. If the TU queue includes 10 TUs, based on the equal distribution principle, each TU in the TU queue is associated with the functions included in 10 source files. Certainly, the equal distribution principle may alternatively not be used. To be specific, the TUs in the TU queue may be associated with different quantities of source files, provided that all the TUs in the TU queue are associated with a total of 100 source files. This is not limited in this application.
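For illustration only, grouping source files into TUs based on the equal distribution principle may be sketched in Python as follows. The function name `build_tu_queue`, the file names, and the dictionary layout of a TU are hypothetical and are not defined in this application.

```python
from collections import deque


def build_tu_queue(source_files, tu_count):
    """Group source files into tu_count TUs, as evenly as possible."""
    if tu_count < 1 or tu_count > len(source_files):
        raise ValueError("tu_count must be between 1 and the number of source files")
    base, extra = divmod(len(source_files), tu_count)
    queue = deque()
    start = 0
    for index in range(tu_count):
        # The first `extra` TUs take one more file when the split is uneven.
        size = base + (1 if index < extra else 0)
        queue.append({"tu_id": index + 1, "files": source_files[start:start + size]})
        start += size
    return queue


files = [f"source_file_{i}.c" for i in range(1, 101)]
queue_of_10 = build_tu_queue(files, 10)    # each TU is associated with 10 source files
queue_of_100 = build_tu_queue(files, 100)  # each TU is associated with 1 source file
```

In both cases, all the TUs in the queue are associated with a total of 100 source files.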
It should be understood that, in the foregoing example, in a process of generating the TU queue, a TU is established based on a quantity of source files, or a TU may be established in another manner. This is not limited in this application.
For example, all source files of the project are scanned once, to generate the TU queue and a correspondence between a TU (each TU is associated with one source file) in the TU queue and a function.
Specifically, in the scanning process, each source file has an identifier, and the identifier is used to uniquely identify each source file. It is assumed that a TU 1 in the TU queue is associated with one source file. For example, the TU 1 is associated with a source file #1, and the source file #1 includes three functions (for example, a function 1, a function 2, and a function 3). Therefore, three correspondences are generated in the scanning process, to be specific, a correspondence between the source file #1 and the function 1, a correspondence between the source file #1 and the function 2, and a correspondence between the source file #1 and the function 3. In other words, a correspondence between the TU 1 and the function 1, a correspondence between the TU 1 and the function 2, and a correspondence between the TU 1 and the function 3 are generated in the scanning process. It should be understood that a correspondence between another TU in the TU queue and a function may be generated by analogy. The finally generated TU queue includes the N TUs and a correspondence between each of the N TUs and a function. It should be understood that, a correspondence between a TU and a function may also be described as follows: The TU is associated with the function. The function may be described as a function associated with the TU. That a TU is associated with a source file means that there is a correspondence between a TU and a source file. The source file may be described as a source file associated with the TU. This is not limited in this application.
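For illustration only, deriving the correspondence between a TU and a function from the correspondence between a source file and a function may be sketched in Python as follows. The names `build_correspondences`, `tu_to_files`, and `file_to_functions` are hypothetical, and the sketch assumes that the functions in each source file have already been identified by the scan.

```python
def build_correspondences(tu_to_files, file_to_functions):
    """Map each TU to the functions of the source files associated with it."""
    tu_to_functions = {}
    for tu_id, files in tu_to_files.items():
        functions = []
        for source_file in files:
            functions.extend(file_to_functions.get(source_file, []))
        tu_to_functions[tu_id] = functions
    return tu_to_functions


# The TU 1 is associated with the source file #1, which includes three functions.
file_to_functions = {"source_file_1": ["function_1", "function_2", "function_3"]}
tu_to_files = {"TU_1": ["source_file_1"]}

tu_to_functions = build_correspondences(tu_to_files, file_to_functions)
```

Here, `tu_to_functions` holds the three correspondences between the TU 1 and the function 1, the function 2, and the function 3.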
It should be understood that, when problem analysis is performed, based on the TU queue, on the function associated with the N TUs in the TU queue, a TU is selected from the TU queue, and a function associated with the TU is analyzed. When analysis of all the TUs in the TU queue is completed, the problem analysis of the first object is considered complete.
Specifically, during analysis of the first object, any TU (which may also be referred to as a first TU for ease of description) is selected from the TU queue, and a first function set associated with the first TU is determined. The first function set includes a function in a source file associated with the first TU. Any function, for example, a first function, in the first function set is analyzed, and an analysis result of the first function is determined. The analysis result of the first function includes at least one of a call relationship of the first function and a quality problem of the first function. When there is a function having a call relationship with the first function (for example, the first function calls another function) , a second function set is analyzed, and an analysis result of a function in the second function set is determined. The second function set includes at least one function having the call relationship with the first function (in other words, the function included in the second function set is called by the first function) . When the analysis of all the TUs in the TU queue is completed, the problem analysis result of the first object is generated.
For example, during analysis of the first object, the TU 1 in the TU queue is selected, and the first function set is determined. To be specific, the function 1, the function 2, and the function 3 are determined. Then, the function 1, the function 2, and the function 3 are analyzed. In an analysis process, if the function 1, the function 2, and the function 3 do not call another function, an analysis result of the foregoing functions is determined. The analysis result includes at least one of a call relationship of the functions and a quality problem of the functions. If it is found in the analysis process that the function 2 calls a function 4 and a function 5, the function 2 is analyzed after analysis of the second function set (the function 4 and the function 5) is completed. It should be understood that the function 4 and the function 5 may be associated with a same TU, or may be associated with different TUs. When the analysis of all the TUs in the TU queue is completed according to the foregoing analysis rule, the problem analysis result of the first object is generated.
It should be understood that, analyzing the TU 1 is analyzing all functions associated with the TU 1. If it is found in the analysis process that a function associated with the TU 1 calls another function, the calling function is placed in a waiting pool. If a function associated with the TU 1 does not call another function, analysis of the function is directly completed. After processing of the functions associated with the TU 1 is completed, a next TU may be analyzed. The waiting pool occupies specific storage space in the memory, and may be an array in the memory. The function may be placed in the waiting pool by using a queue or a database, or by specifying the storage space. The function placed in the waiting pool may be analyzed after analysis of the called function is completed.
For example, the functions associated with the TU 1 include the function 1, the function 2, and the function 3. The function 1 and the function 3 do not call another function, and the function 2 calls the function 4. Therefore, analysis of the function 1 and the function 3 is completed, and the function 2 is placed in the waiting pool. In other words, processing of the TU 1 is completed. In a next step, another TU may be selected from the TU queue based on popularity for analysis. The function 2 may be analyzed after analysis of the function 4 is completed.
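For illustration only, the waiting pool in the foregoing example may be sketched in Python as follows. The names `analyze_tu`, `calls`, `analyzed`, and `waiting_pool` are hypothetical, and the actual analysis of a function is reduced to recording its name. The sketch also assumes, as a simplification, that a function is placed in the waiting pool only while analysis of a function that it calls is not yet completed.

```python
def analyze_tu(functions, calls, analyzed, waiting_pool):
    """Analyze one TU; park functions whose called functions are not yet analyzed."""
    for function in functions:
        if all(callee in analyzed for callee in calls.get(function, [])):
            analyzed.append(function)      # no pending callee: analysis completes
        else:
            waiting_pool.append(function)  # wait for the called functions
    # Resume parked functions whose called functions have all been analyzed by now.
    progressed = True
    while progressed:
        progressed = False
        for function in list(waiting_pool):
            if all(callee in analyzed for callee in calls.get(function, [])):
                waiting_pool.remove(function)
                analyzed.append(function)
                progressed = True


calls = {"function_2": ["function_4"]}  # the function 2 calls the function 4
analyzed, waiting_pool = [], []

# The TU 1 is associated with the function 1, the function 2, and the function 3.
analyze_tu(["function_1", "function_2", "function_3"], calls, analyzed, waiting_pool)
after_tu_1 = (list(analyzed), list(waiting_pool))

# Another TU is associated with the function 4.
analyze_tu(["function_4"], calls, analyzed, waiting_pool)
after_tu_2 = (list(analyzed), list(waiting_pool))
```

After the TU 1 is processed, the function 2 remains in the waiting pool. It is analyzed only after analysis of the function 4 is completed.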
Optionally, when a TU is selected from the TU queue for analysis, sequential selection may be performed, and one TU may be selected for analysis each time, or a plurality of TUs may be simultaneously selected for analysis. When a plurality of TUs are simultaneously selected for analysis, static analysis efficiency may be further improved.
It should be understood that, when a plurality of TUs are simultaneously selected for analysis, functions associated with the plurality of TUs may call a same function. In this case, the called function needs to be analyzed only once for the plurality of TUs, to reduce redundant analysis. Alternatively, identification information may be used to indicate that the function has already been analyzed.
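For illustration only, analyzing a plurality of TUs in parallel while analyzing a shared called function only once may be sketched in Python as follows. The thread pool, the lock, and the names `analyze_function`, `analyze_tu`, `summaries`, and `analysis_count` are assumptions made for the sketch and are not defined in this application. Holding the lock for the whole analysis of a function is a simplification that keeps the sketch short.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

summaries = {}        # function name -> analysis result; also marks "already analyzed"
analysis_count = {}   # function name -> number of times analysis was actually run
lock = threading.Lock()


def analyze_function(name):
    """Analyze a function at most once, even when several TUs need it."""
    with lock:
        if name in summaries:  # identification information: already analyzed
            return summaries[name]
        analysis_count[name] = analysis_count.get(name, 0) + 1
        summaries[name] = f"summary of {name}"
        return summaries[name]


def analyze_tu(tu):
    """Analyze the called functions first, then the TU's own functions."""
    for callee in tu["calls"]:
        analyze_function(callee)
    for function in tu["functions"]:
        analyze_function(function)
    return tu["tu_id"]


# Two hypothetical TUs whose functions call the same helper function.
tus = [
    {"tu_id": "TU_1", "functions": ["function_1"], "calls": ["shared_helper"]},
    {"tu_id": "TU_2", "functions": ["function_2"], "calls": ["shared_helper"]},
]

with ThreadPoolExecutor(max_workers=2) as pool:
    finished = sorted(pool.map(analyze_tu, tus))
```

Whichever TU reaches `shared_helper` first performs the analysis, and the other TU reuses the stored result.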
Optionally, each TU in the TU queue has a popularity, and the popularity is used to identify a quantity of times that a function associated with the TU is called. When the first TU is selected from the TU queue, the first TU may be selected based on the popularity of each TU in the TU queue. A popularity identifier (which may be understood as a popularity value) of the first TU is higher than a first threshold.
It should be understood that, the popularity is a representation of a weight, a value of the popularity is represented by using a popularity value, and the popularity value is a representation of a weight value. The popularity value may be identified by using a color or in another manner. This is not limited in this application.
For example, when a TU is selected from the TU queue, a TU whose popularity value is higher than the first threshold is preferentially selected for analysis.
Optionally, the popularity of each TU in the TU queue may be determined based on a quantity of times or a frequency of calling a function associated with the TU.
Optionally, the popularity of each TU in the TU queue may alternatively be determined based on a space occupation rate of the TU when a function associated with the TU is processed.
For example, it is assumed that for the first TU in the TU queue, each time a function associated with another TU calls the function associated with the first TU, the popularity value of the first TU is updated once. Therefore, a larger quantity of times that the function associated with the first TU is called indicates a higher popularity value of the first TU. Alternatively, it can be understood that a larger quantity of times that the function associated with the first TU is called indicates a higher priority of the first TU in the TU queue. When a TU is subsequently selected from the TU queue, the TU is selected, for analysis, based on a policy of preferentially selecting a TU having a high popularity value.
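The popularity update and the selection policy can be sketched as follows. This is a simplified Python illustration: the class `TUQueue`, its methods, and the fallback to queue order when no TU exceeds the threshold are assumptions made for the example.

```python
class TUQueue:
    """TU queue that tracks a popularity value for each TU."""

    def __init__(self, tu_names):
        # Every TU starts with a popularity value of 0.
        self.popularity = {name: 0 for name in tu_names}

    def record_call(self, callee_tu):
        """A function in another TU called a function associated with callee_tu."""
        self.popularity[callee_tu] += 1

    def select(self, first_threshold=0):
        """Remove and return the next TU, preferring a high popularity value."""
        if not self.popularity:
            return None
        hot = [n for n, p in self.popularity.items() if p > first_threshold]
        if hot:
            chosen = max(hot, key=self.popularity.get)
        else:
            # Assumed fallback: take the first TU in queue order.
            chosen = next(iter(self.popularity))
        del self.popularity[chosen]
        return chosen


queue = TUQueue(["TU 1", "TU 2", "TU 3", "TU 4"])
first = queue.select()       # no TU is above the threshold yet
queue.record_call("TU 3")    # a function in TU 1 calls a function in TU 3
second = queue.select()      # TU 3 now has the highest popularity value
```

In this sketch a TU is removed from the queue when it is selected, which keeps the example short. The method described here instead clears a TU only after analysis of its functions is completed.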
Optionally, during analysis of the function associated with the TU, each time analysis of a function is completed, a control flow graph (control flow graph, CFG) and an abstract syntax tree (abstract syntax tree, AST) corresponding to the function are generated, and summary information of the function is stored in a lookup table. The summary information of the function may be understood as a description of behavior of the function. If a TU that calls the function exists during subsequent analysis, it may be learned, based on the summary information of the function, that the function performs a specific operation under a specific condition to output a specific result, so that the TU can be further analyzed.
Optionally, the summary information of the function may include one or more of the following: a call parameter of the function, a return value of the function, a function of the function, internal logic of the function, and an environment variable of the function.
Optionally, the summary information of the function is compressed and stored in the lookup table.
For example, after the summary information of the function is compressed, memory space occupied by the summary information of the function is reduced, and memory overheads in a static analysis process are further reduced.
Specifically, an existing compression tool may be selected to compress the summary information of the function.
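As an illustration only, the following Python sketch stores compressed summary information in a lookup table. The standard `zlib` and `json` modules are used as the "existing compression tool", which is an assumption, and the field names in the summary are hypothetical.

```python
import json
import zlib

lookup_table = {}  # function name -> compressed summary information


def store_summary(name, summary):
    """Serialize and compress the summary before storing it in the lookup table."""
    raw = json.dumps(summary, sort_keys=True).encode("utf-8")
    lookup_table[name] = zlib.compress(raw)


def load_summary(name):
    """Decompress and deserialize the summary of a previously analyzed function."""
    return json.loads(zlib.decompress(lookup_table[name]).decode("utf-8"))


summary_1 = {
    "call_parameters": ["ptr"],
    "return_value": "int",
    "behavior": "dereferences ptr without a null check",
}
store_summary("function_1", summary_1)
restored = load_summary("function_1")
```

For very small summaries the compressed form is not necessarily shorter than the original. A saving is expected mainly for larger or repetitive summary information.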
Optionally, when analysis of the first function associated with the first TU is completed, and analysis of all other functions that call the first function is completed, the first function and a CFG and an AST that correspond to the first function may be cleared from the memory.
It should be understood that when the analysis of the first function associated with the first TU is completed and the analysis of all other functions that call the first function is completed, it indicates that the first function is no longer called by any function. In this case, the first function and the CFG and the AST that correspond to the first function are cleared from the memory, and only the summary information of the first function is retained.
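The clearing condition can be sketched in Python by tracking, for each function, the callers whose analysis is not yet completed. The names `FunctionRecord`, `try_release`, and `finish` are hypothetical, and the strings standing in for the CFG and the AST are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionRecord:
    callers: set                                  # callers not yet analyzed
    callees: set = field(default_factory=set)     # functions this one calls
    cfg: object = None
    ast: object = None
    summary: dict = None
    analyzed: bool = False


def try_release(rec):
    """Clear the CFG and the AST once no unanalyzed caller still needs them."""
    if rec.analyzed and not rec.callers:
        rec.cfg = None
        rec.ast = None


def finish(records, name, summary):
    """Complete the analysis of a function and update the functions it calls."""
    rec = records[name]
    rec.summary, rec.analyzed = summary, True
    try_release(rec)
    for callee in rec.callees:
        records[callee].callers.discard(name)
        try_release(records[callee])


records = {
    "function_5": FunctionRecord(callers={"function_2"}, cfg="cfg 5", ast="ast 5"),
    "function_2": FunctionRecord(callers=set(), callees={"function_5"},
                                 cfg="cfg 2", ast="ast 2"),
}
finish(records, "function_5", {"return_value": "int"})
kept_while_caller_pending = records["function_5"].cfg is not None
finish(records, "function_2", {"return_value": "void"})
```

After both calls, the CFG and the AST of the two functions are cleared and only the summary information remains.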
Optionally, when the analysis of all the functions associated with the first TU is completed, the first TU is cleared from the TU queue. Alternatively, when the analysis of all the functions associated with the first TU is completed, and the analysis of all the other functions that call the function associated with the first TU is completed, the first TU is cleared from the TU queue.
It should be understood that, when all the functions associated with the first TU are analyzed, and the analysis of all the other functions that call the function associated with the first TU is completed, it indicates that all the functions associated with the first TU are no longer called by any function. In this case, the first TU is cleared from the TU queue, to reduce memory overheads of the system.
Step S230: Send the problem analysis result of the first object.
In step S220, in a process of selecting a TU from the TU queue for analysis, a change in a ranking of a priority of the TU in the TU queue is shown in FIG. 3.
In the static analysis method 200 provided in this application, during problem analysis of the first object, the source file of the first object is associated with the TU, and the TU may include the at least one source file. During specific problem analysis, the TU is used as a granularity, a call relationship graph of all the functions included in the first object does not need to be generated, and a plurality of TUs may be simultaneously selected for parallel processing, to improve static analysis efficiency. In addition, in a specific analysis process, a TU or a function whose analysis is completed and that is not called may be cleared from the memory, to reduce memory overheads of the system.
FIG. 3 is a schematic diagram of a change in a ranking of a TU in a TU queue according to this application.
The TU queue 10 shown in FIG. 3 includes four TUs, and each TU is associated with one source file (the file in FIG. 3 is used as an example, and the same is true of FIG. 4). In other words, each TU includes one source file, and each source file includes two functions. A quantity of TUs in the TU queue 10, a quantity of source files included in each TU, and a quantity of functions included in each source file are merely examples. This is not limited in this application. As shown in the figure, before analysis is performed, the TUs in the initial TU queue 10 are ranked randomly, and a TU 1, a TU 2, a TU 3, and a TU 4 are ranked from top to bottom. The TU 1 is associated with a source file 1, the source file 1 includes a function (function, func) 1 and a function 2, the TU 2 is associated with a source file 2, the source file 2 includes a function 3 and a function 4, the TU 3 is associated with a source file 3, the source file 3 includes a function 5 and a function 6, the TU 4 is associated with a source file 4, and the source file 4 includes a function 7 and a function 8.
During first analysis, the TU 1 is randomly selected from the TU queue 10 for analysis. As shown in the figure, the function 2 in the TU 1 calls the function 5 in the TU 3. Therefore, a popularity value of the TU 3 is increased. For example, before the first analysis, the popularity values of the TU 1, the TU 2, the TU 3, and the TU 4 are all 0. After the TU 1 is analyzed, it is found that the function 2 in the TU 1 calls the function 5 in the TU 3. Therefore, the popularity value of the TU 3 is increased and updated to 1. As shown in the figure, after reranking, the TU 3 has a highest popularity value in the TU queue 10.
During specific analysis, the function 5 and the function 6 that correspond to the TU 3 are analyzed. After the analysis is completed, the function 2 and the function 1 are analyzed. An analysis result of the function 5 needs to be used during analysis of the function 2. It should be understood that, after analysis of the functions corresponding to the TU 3 and the TU 1 is completed, if it is determined that no other function calls the function 1, the function 2, the function 5, and the function 6, the TU 3 and the TU 1 are cleared from the TU queue 10. As shown in the figure, after reranking, there are only the TU 2 and the TU 4 in the TU queue 10. In comparison with an initial analysis period, a quantity of TUs in the TU queue 10 is reduced, to reduce memory overheads of the system.
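The change in the TU queue 10 described for FIG. 3 can be traced with a short Python sketch. The helper `rerank` is a hypothetical name, and the relative order of TUs that have the same popularity value after reranking is an assumption, because only the position of the TU 3 is stated.

```python
def rerank(queue, popularity):
    """Order TUs by descending popularity value.

    sorted() is stable, so TUs with equal values keep their relative order.
    """
    return sorted(queue, key=lambda tu: -popularity[tu])


queue = ["TU 1", "TU 2", "TU 3", "TU 4"]
popularity = {tu: 0 for tu in queue}

# Analyzing TU 1 reveals that function 2 calls function 5 in TU 3.
popularity["TU 3"] += 1
after_rerank = rerank(queue, popularity)

# Functions 5 and 6, then functions 2 and 1, are analyzed and are no longer
# called, so TU 3 and TU 1 are cleared from the queue.
finished = {"TU 3", "TU 1"}
remaining = [tu for tu in after_rerank if tu not in finished]
```

The queue shrinks from four TUs to the TU 2 and the TU 4, which corresponds to the reduction described above.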
It should be understood that, according to the static analysis method provided in this application, if a processing progress in the static analysis process is presented in the TU queue 10, it can be learned that as the processing progress advances, the quantity of TUs in the queue 10 gradually decreases.
FIG. 4 is a principle flowchart of performing analysis at a granularity of a TU according to this application.
When problem analysis needs to be performed on the first object, the following steps are performed.
Step S410: Scan the source file included in the first object, to generate the TU queue 10 and a correspondence between each TU in the TU queue 10 and a function.
A TU queue shown in FIG. 4 includes n TUs, and the n TUs are sequentially a TU 1, a TU 2, a TU 3, ..., and a TU n. Each TU is associated with one source file, and each source file includes two functions. A quantity of TUs in the TU queue is similar to that in FIG. 3. A quantity of source files included in each TU and a quantity of functions included in each source file are merely examples. This is not limited in this application. In addition, FIG. 4 shows only a correspondence between each of the TU 1, the TU 2, and the TU 3 and each of a source file and a function. To be specific, the TU 1 is associated with a source file 1, the source file 1 includes a function 1 and a function 2, the TU 2 is associated with a source file 2, the source file 2 includes a function 3 and a function 4, the TU 3 is associated with a source file 3, and the source file 3 includes a function 5 and a function 6. A correspondence between another TU and each of a source file and a function is not shown in this figure.
Step S420: Select a TU from the TU queue 10.
In this step, if selection is performed for the first time, a TU may be randomly selected from the TU queue 10, and one or more TUs may be simultaneously selected. If selection is not performed for the first time, a TU may be selected from the TU queue 10 based on the policy of preferentially selecting a TU having a high popularity value (for example, a TU is selected if a popularity value of the TU is higher than a first threshold), and one or more TUs may be simultaneously selected.
Step S430: Analyze a function associated with the selected TU.
For clear description, that the selected TU is the TU 1 is used as an example. In this step, all functions associated with the TU 1 such as the function 1 and the function 2 are analyzed. In an analysis process, a CFG and an AST that correspond to each of the function 1 and the function 2 are generated.
Step S440: Further analyze whether the function 1 and the function 2 call another function, where the another function may be an analyzed function, or may be a function that is not analyzed.
Step S4410: It is assumed that the function associated with the TU 1 calls another function. For example, the function 2 calls the function 5 in the TU 3.
Step S4411: Search the TU queue 10 for a TU corresponding to the function 5, where it is assumed that the TU corresponding to the function 5 is the TU 3.
Step S4412: Increase a popularity value of the TU 3, so that the TU 3 is ranked the first in the TU queue 10.
In a possible implementation, in step S4413, the function 2 associated with the TU 1 is placed in a waiting pool 20 for analysis. The waiting pool 20 has specific storage space in a memory, and may be an array in the memory. An implementation form of placing the function 2 in the waiting pool 20 may be to store the function 2 by using a queue or a database, or by specifying the storage space.
During subsequent analysis of the TU 3, after analysis of the function 5 associated with the TU 3 is completed, the function 2 in the waiting pool 20 may be notified that the analysis of the function 5 is completed, and the function 2 may be further analyzed.
In another possible implementation, when the function 2 calls the function 5 in the TU 3, in step S4413, both the function 2 and the function 5 may be placed in the waiting pool 20 for analysis, or all functions, namely, the function 2, the function 5, and the function 6 associated with TUs corresponding to the function 2 and the function 5 are placed in the waiting pool 20 for analysis. This is not limited in this application.
Step S4420: It is assumed that the function associated with the TU 1 does not call another function. For example, the function 1 does not call another function.
Step S4421: Create summary information of the function 1. It should be understood that the created summary information is not a value of specific summary information, and the value of the specific summary information needs to be filled in after the analysis of the function is completed.
Step S4422: Analyze the function 1, and after the analysis of the function 1 is completed, fill the created summary information with the value of the specific summary information, and store the summary information of the function 1 in a lookup table.
For example, the summary information of the function 1 may include one or more of the following: a call parameter of the function 1, a return value of the function 1, a function of the function 1, internal logic of the function 1, and an environment variable of the function 1.
Step S4423: Analyze whether the function 1 is called, and if no function calls the function 1, clear the function 1 and a CFG and an AST that are generated in a process of analyzing the function 1.
Step S4424: If a function calls the function 1, the function that calls the function 1 in the waiting pool is notified that the analysis of the function 1 is completed, and a next step may be performed, in other words, the function that calls the function 1 is analyzed.
When a TU is subsequently selected from the TU queue, one or more TUs are selected based on the policy of preferentially selecting a TU having a high popularity value. If a function associated with the TU is no longer called during analysis of the function, or analysis of each function that calls the function is completed, after the analysis of the function is completed, the function and a CFG and an AST that are generated in a process of analyzing the function are cleared from the memory, and only the summary information of the function is retained.
If the analysis of the functions associated with the TU is completed, or if the analysis of the functions associated with the TU is completed and the functions associated with the TU are no longer called, the TU is cleared from the TU queue.
Therefore, with further analysis, the quantity of TUs in the TU queue gradually decreases. When analysis of all the TUs in the TU queue is completed, problem analysis of the first object is completed.
The static analysis method provided in this application is used to analyze a type of problem (for example, a null pointer problem) in a system project or software. The method 200 is performed by a static analysis tool in software. The static analysis tool may be static analysis software, a data packet for static analysis, or an executable file. As shown in FIG. 5, the static analysis tool 300 may be divided based on functional modules, and may include a code compilation module, a data conversion module, a software processing module, and a problem analysis module.
The code compilation module is configured to compile code of a problem analysis object, to generate a required AST and CFG. The code compilation module includes an AST coding module and a CFG compilation module. The problem analysis object is a static analysis object, and may be a system project or software. This is not limited in this application.
For example, the code compilation module may be Clang or a Z3 solver. Clang and Z3 are public libraries. In other words, code of a platform of an existing analysis tool is used for the code compilation module. Clang is used to compile, analyze, and generate the required AST and CFG. Clang is an existing executable file. An input of Clang is a source file of C/C++, and an output is an AST and a CFG that are generated through parsing by using Clang. Z3 is an open source constraint solver produced by Microsoft, and can resolve a constraint solving problem in many cases. An input of Z3 is a series of propositional equations, and an output is a solution that satisfies an equation.
It should be understood that the code compilation module in the static analysis tool may alternatively be another public library. To replace Clang, only an underlying adaptation layer needs to be modified. Similarly, the Z3 solver is also one of optional solvers in a symbolic execution process, and may be replaced with another solver. Clang and Z3 are used as examples in the following description. This is not limited in this application.
The data conversion module is configured to convert an output of the code compilation module into a self-defined data structure, and includes an AST conversion module and a CFG conversion module.
For example, an AST and a CFG that are generated by Clang are converted into an AST and a CFG of an independent data format, so that a static analysis engine does not need to excessively rely on a library function of the code compilation module.
The software processing module is a core algorithm module of the static analysis method provided in this application, may also be referred to as a static analysis engine, and may be a segment of code that may be invoked. The software processing module includes an AST processing module and a DFA processing module. During specific static analysis, the software processing module calls the code compilation module by using the data conversion module, to analyze a file of the problem analysis object.
For example, when analyzing the source file, the software processing module calls underlying Clang by using the data conversion module, to analyze the file, and converts the source file into the AST and the CFG that are output by Clang. Then the data is parsed by the data conversion module, to generate a data structure defined by the software processing module.
The problem analysis module performs analysis and a check based on the data structure provided by the software processing module, for example, a null pointer check, and a variable initialization check. The problem analysis module includes an AST analysis module and a DFA analysis module.
For example, the AST analysis module traverses all nodes of the AST to check whether there is a potential quality problem, or the DFA analysis module traverses all branches of the CFG to determine, by using a symbolic execution method, whether logic of a branch is valid. In other words, the symbolic execution method is used to determine whether a branch is reachable. The symbolic execution method described herein specifically obtains an equation by performing a conjunction on all logical expressions representing conditions on a branch (the branch may be understood as a path), and the equation is used as an input of Z3 to obtain a solution. The conjunction connects a plurality of logical expressions by using logical "and". The final conjunction equation is valid only when every expression is valid.
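The reachability check based on a conjunction of branch conditions can be illustrated with the following Python sketch. It is only a stand-in for a constraint solver: instead of calling Z3, it searches a small finite domain exhaustively, and the function `branch_reachable` and the example conditions are hypothetical.

```python
from itertools import product


def branch_reachable(conditions, variables, domain):
    """Return an assignment that satisfies every condition, or None.

    The conjunction holds only if each condition holds for the same values.
    """
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(cond(env) for cond in conditions):
            return env
    return None


# Path A: if (x > 3) { if (x < 10) { ... } }
path_a = [lambda e: e["x"] > 3, lambda e: e["x"] < 10]
# Path B: if (x > 3) { if (x < 2) { ... } }
path_b = [lambda e: e["x"] > 3, lambda e: e["x"] < 2]

model_a = branch_reachable(path_a, ["x"], range(-16, 17))
model_b = branch_reachable(path_b, ["x"], range(-16, 17))
```

Path A is reachable because a value of `x` satisfies both conditions, whereas no value satisfies the conjunction for path B, so that branch is unreachable. A real solver reasons about the expressions symbolically rather than enumerating values.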
FIG. 6 is an interaction flowchart of another static analysis method according to this application. It can be learned from FIG. 5 that a static analysis tool includes a code compilation module, a data conversion module, a software processing module, and a problem analysis module. Functions of the code compilation module and the data conversion module are to perform data conversion on a source file of an analysis object, so that the software processing module performs specific analysis based on data obtained after conversion is performed on the source file, and the problem analysis module performs a problem check based on analysis performed by the software processing module, so as to output a problem analysis result. Because the code compilation module and the data conversion module do not perform a specific analysis process, FIG. 6 only shows an interaction diagram between the problem analysis module and the software processing module.
It should be understood that in a method 300 shown in FIG. 6, the method 200 is described from a perspective of a functional module included in an execution body (the static analysis tool 300) in the method 200. The method 300 shown in FIG. 6 includes the following steps.
Step S310: The problem analysis module sends a first request to the software processing module, where the first request is used to perform problem analysis on a first object. Correspondingly, the software processing module receives the first request. For example, the first object may be a project or software. This is not limited in this application.
It should be understood that, when problem analysis needs to be performed on the first object, before step S310, the problem analysis module receives a first request from an application or a server. The first request may correspond to the first request in step S210. In the scenario shown in FIG. 1, the first request may be an operation performed by a user on a first device. After receiving the first request, the problem analysis module forwards the first request to the software processing module, so that the software processing module starts to initiate a task.
Step S320: The software processing module performs problem analysis on a TU associated with the first object, to generate a problem analysis result of the first object.
Specifically, the software processing module scans a source file included in the first object, to generate a TU queue. The TU queue includes N TUs, N is an integer greater than or equal to 1, and each TU in the TU queue is associated with at least one function in at least one source file.
The software processing module performs, based on the TU queue, problem analysis on a function associated with the N TUs in the TU queue, to generate the problem analysis result of the first object.
Optionally, one TU in the TU queue is associated with one or more source files, and one source file includes one or more functions.
For example, the first object includes 100 source files. The software processing module generates the TU queue in a scanning process. It should be understood that the TU queue is maintained in a memory (the TU queue may be an array in the memory, or the TU queue may be a data organization manner) . The TU queue may include 100 TUs, or may include 10 TUs. It is assumed that the TU queue includes 100 TUs, and based on an equal distribution principle, each TU in the TU queue is associated with a function included in one source file. It is assumed that the TU queue includes 10 TUs, and based on an equal distribution principle, each TU in the TU queue is associated with a function included in 10 source files. Certainly, the equal distribution principle may alternatively not be used. To be specific, each TU in the TU queue is associated with a different quantity of source files, but the TUs in the TU queue are associated with a total of 100 source files. This is not limited in this application.
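The equal distribution principle in this example can be sketched in Python as follows. The function `build_tu_queue` and the file names are hypothetical, and the handling of a remainder (earlier TUs receive one extra file) is an assumption.

```python
def build_tu_queue(source_files, tu_count):
    """Distribute source files across tu_count TUs as evenly as possible."""
    if not 1 <= tu_count <= len(source_files):
        # Each TU must be associated with at least one source file.
        raise ValueError("tu_count must be between 1 and the number of files")
    base, extra = divmod(len(source_files), tu_count)
    queue, start = [], 0
    for index in range(tu_count):
        size = base + (1 if index < extra else 0)
        queue.append(source_files[start:start + size])
        start += size
    return queue


files = [f"file_{i}.c" for i in range(1, 101)]
queue_100 = build_tu_queue(files, 100)  # one source file per TU
queue_10 = build_tu_queue(files, 10)    # ten source files per TU
```

In both cases the TUs in the queue together cover all 100 source files.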
It should be understood that, in the foregoing example, in a process of generating the TU queue, the software processing module establishes a TU based on a quantity of source files. In addition, the software processing module may alternatively establish the TU in another manner. This is not limited in this application.
For example, the software processing module scans all source files of the project once, to generate the TU queue and a correspondence between a TU (the TU is associated with one source file) in the TU queue and a function.
Specifically, in the scanning process, the software processing module generates an identifier of each source file, where the identifier is used to uniquely identify the source file. It is assumed that a TU 1 in the TU queue is associated with one source file. For example, the TU 1 is associated with a file #1, and the file #1 includes three functions (for example, a function 1, a function 2, and a function 3). Therefore, correspondences between the file #1 and the three functions, to be specific, a correspondence between the file #1 and the function 1, a correspondence between the file #1 and the function 2, and a correspondence between the file #1 and the function 3 are generated in the scanning process. In other words, a correspondence between the TU 1 and the function 1, a correspondence between the TU 1 and the function 2, and a correspondence between the TU 1 and the function 3 are generated in the scanning process. It should be understood that a correspondence between another TU in the TU queue and a function may be generated by analogy. The finally generated TU queue includes the N TUs and a correspondence between each of the N TUs and a function.
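A minimal Python sketch of the scanning result is shown below. It assumes one TU per source file and that function names are unique within the project. The function `scan`, the dictionary layout, and the file paths are hypothetical, and the extraction of function names from source code is outside the scope of the sketch.

```python
def scan(project):
    """Build the TU queue and the function-to-TU correspondence.

    project maps a source file path to the names of the functions it defines.
    """
    tu_queue = []
    function_to_tu = {}
    for index, (path, functions) in enumerate(project.items(), start=1):
        tu = {
            "tu": f"TU {index}",
            "file_id": f"file #{index}",  # unique identifier of the source file
            "path": path,
            "functions": list(functions),
        }
        tu_queue.append(tu)
        for func in functions:
            function_to_tu[func] = tu["tu"]
    return tu_queue, function_to_tu


project = {
    "src/a.c": ["function_1", "function_2", "function_3"],
    "src/b.c": ["function_4"],
}
tu_queue, function_to_tu = scan(project)
```

With the `function_to_tu` mapping, the TU that corresponds to a called function can be found directly when the popularity value needs to be updated.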
It should be understood that, when the software processing module performs, based on the TU queue, problem analysis on the function associated with the N TUs in the TU queue, the software processing module selects a TU from the TU queue, and analyzes a function associated with the TU. When analysis of all the TUs in the TU queue is completed, it may be considered that the software processing module completes analysis of the first object.
Specifically, during analysis of the first object, the software processing module selects any TU (which may also be referred to as a first TU for ease of description) from the TU queue, and determines a first function set associated with the first TU, where the first function set includes a function in a source file associated with the first TU; analyzes any function, for example, a first function, in the first function set, and determines an analysis result of the first function, where the analysis result of the first function includes at least one of a call relationship of the first function and a quality problem of the first function; when there is a function having a call relationship with the first function (for example, the first function calls another function), analyzes a second function set, and determines an analysis result of a function in the second function set, where the second function set includes at least one function having the call relationship with the first function (in other words, the function included in the second function set is called by the first function); and when the analysis of all the TUs in the TU queue is completed, generates the problem analysis result of the first object. For example, during analysis of the first object, the software processing module selects the TU 1 in the TU queue, determines the first function set, to be specific, determines the function 1, the function 2, and the function 3, and then analyzes the function 1, the function 2, and the function 3. In the analysis process, if the function 1, the function 2, and the function 3 do not call another function, an analysis result of the foregoing functions is determined. The analysis result includes at least one of a call relationship of the functions and a quality problem of the functions.
If it is assumed that the function 2 calls a function 4 and a function 5 in the analysis process, the function 2 needs to be analyzed after analysis of the function 4 and the function 5 is completed. It should be understood that the function 4 and the function 5 may be associated with a same TU, or may be associated with different TUs. When the analysis of all the TUs in the TU queue is completed according to the foregoing analysis rule, the problem analysis result of the first object is generated.
It should be understood that analyzing the TU 1 means analyzing all functions associated with the TU 1. If it is found in the analysis process that a function associated with the TU 1 calls another function, the calling function is placed in a waiting pool. If a function associated with the TU 1 does not call another function, analysis of the function is directly completed. After processing of the functions associated with the TU 1 is completed, a next TU may be analyzed. The function placed in the waiting pool may be analyzed after analysis of the called function is completed.
For example, the functions associated with the TU 1 include the function 1, the function 2, and the function 3. The function 1 and the function 3 do not call another function, and the function 2 calls the function 4. Therefore, analysis of the function 1 and the function 3 is completed, and the function 2 is placed in the waiting pool. In other words, processing of the TU 1 is completed. In a next step, another TU may be selected from the TU queue based on popularity for analysis. The function 2 may be analyzed after the analysis of the function 4 is completed.
Optionally, when selecting a TU from the TU queue for analysis, the software processing module may perform sequential selection, and select one TU for analysis each time, or may simultaneously select a plurality of TUs for analysis. When the software processing module simultaneously selects a plurality of TUs for analysis, static analysis efficiency may be further improved.
It should be understood that, when the software processing module simultaneously selects a plurality of TUs for analysis, functions associated with the plurality of TUs may call the same function. In this case, when the plurality of TUs are analyzed, the commonly called function needs to be analyzed only once, to reduce redundant analysis. Alternatively, identification information may be used to indicate that the function has been analyzed.
Optionally, each TU in the TU queue includes popularity, and the popularity is used to identify a quantity of times that a function in a TU associated with the popularity is called. When the software processing module selects the first TU from the TU queue, the software processing module may select the first TU based on the popularity of each TU in the TU queue. A popularity identifier (which may be understood as a popularity value) of the first TU is higher than a first threshold.
For example, when selecting a TU from the TU queue, the software processing module preferentially selects a TU with a high popularity value for analysis.
Optionally, the popularity of each TU in the TU queue may be determined based on a quantity of times or a frequency of calling a function associated with each TU, and the popularity of each TU in the TU queue may alternatively be determined based on a space occupation rate of the TU when the function associated with the TU is processed.
For example, it is assumed that for the first TU in the TU queue, each time a function associated with another TU calls the function associated with the first TU, the popularity value of the first TU is updated once. Therefore, a larger quantity of times that the function associated with the first TU is called indicates a higher popularity value of the first TU. Alternatively, it can be understood that a larger quantity of times that the function associated with the first TU is called indicates a higher priority of the first TU in the TU queue. When a TU is subsequently selected from the TU queue, the TU is selected, for analysis, based on a policy of preferentially selecting a TU having a high popularity value.
Optionally, during analysis of the function associated with the TU, each time analysis of a function is completed, the software processing module correspondingly generates a control flow graph (control flow graph, CFG) and an abstract syntax tree (abstract syntax tree, AST) corresponding to the function, and stores summary information of the function in a lookup table. The summary information of the function may alternatively be understood as a description of behavior of the function. If a TU that calls the function exists during subsequent analysis, it may be learned, based on the summary information of the function, that the function performs a specific operation under a specific condition to output a specific result, so that the TU can be further analyzed.
Optionally, the summary information of the function may include one or more of the following: a call parameter of the function, a return value of the function, a function of the function, internal logic of the function, and an environment variable of the function.
Optionally, the software processing module compresses the summary information of the function and stores the summary information of the function in the lookup table.
For example, after the software processing module compresses the summary information of the function, memory space occupied by the summary information of the function is reduced, and memory overheads in the static analysis process are further reduced.
Specifically, the software processing module includes a compression module for compressing the summary information of a function.
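As an illustrative sketch only, storing compressed summary information in a lookup table could be done as follows. The JSON encoding, the use of zlib, and the field names in the example summary are assumptions. This application does not specify a compression algorithm or a summary format.

```python
import json
import zlib

# Hypothetical lookup table: function name -> compressed summary bytes.
summary_table = {}


def store_summary(function_name, summary):
    # Serialize the summary deterministically, then compress it before storing.
    raw = json.dumps(summary, sort_keys=True).encode("utf-8")
    summary_table[function_name] = zlib.compress(raw)


def load_summary(function_name):
    # Decompress and decode the summary when a caller of the function is analyzed.
    raw = zlib.decompress(summary_table[function_name])
    return json.loads(raw.decode("utf-8"))


store_summary("get_item", {
    "parameters": ["index"],
    "returns": "pointer, may be null when index is out of range",
})
restored = load_summary("get_item")
```

For very small summaries the compressed form may not be smaller than the raw form. The memory saving depends on the size and redundancy of the summary information.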
Optionally, when analysis of the first function associated with the first TU is completed, and analysis of all other functions that call the first function is completed, the software processing module may clear, from the memory, the first function and a CFG and an AST that correspond to the first function.
It should be understood that when the analysis of the first function associated with the first TU is completed and the analysis of all other functions that call the first function is completed, it indicates that the first function is no longer called by any function. In this case, the software processing module clears, from the memory, the first function and the CFG and the AST that correspond to the first function, and retains only the summary information of the first function.
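As an illustrative sketch only, the clearing rule described above can be modeled by tracking, for each function, the callers whose analysis is still pending. The class AnalysisMemory and the assumption that the set of callers of each function is known in advance are hypothetical simplifications.

```python
class AnalysisMemory:
    """Keeps the CFG and the AST of a function only while they may still be needed."""

    def __init__(self, callers_of):
        # callers_of: function name -> names of the functions that call it (assumed known).
        self.pending_callers = {f: set(c) for f, c in callers_of.items()}
        self.analyzed = set()
        self.heavy = {}      # function name -> (cfg, ast), cleared when no longer needed
        self.summaries = {}  # function name -> summary information, always retained

    def complete(self, name, cfg, ast_tree, summary):
        # Record the completed analysis of `name`.
        self.analyzed.add(name)
        self.heavy[name] = (cfg, ast_tree)
        self.summaries[name] = summary
        # `name` is now analyzed, so it no longer counts as a pending caller.
        for callers in self.pending_callers.values():
            callers.discard(name)
        self._sweep()

    def _sweep(self):
        # Clear the CFG and the AST of every analyzed function with no pending caller.
        for name in list(self.heavy):
            if not self.pending_callers.get(name):
                del self.heavy[name]


memory = AnalysisMemory({"helper": {"main"}, "main": set()})
memory.complete("helper", cfg="cfg(helper)", ast_tree="ast(helper)", summary="returns int")
helper_kept = "helper" in memory.heavy  # True, because "main" is not analyzed yet
memory.complete("main", cfg="cfg(main)", ast_tree="ast(main)", summary="entry point")
# memory.heavy is now empty, and both summaries are retained.
```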
Optionally, when analysis of all the functions associated with the first TU is completed, the software processing module clears the first TU from the TU queue. Alternatively, when analysis of all the functions associated with the first TU is completed, and analysis of all the other functions that call the function associated with the first TU is completed, the software processing module clears the first TU from the TU queue.
It should be understood that, when all the functions associated with the first TU are analyzed, and analysis of all the other functions that call the function associated with the first TU is completed, it indicates that all the functions associated with the first TU are no longer called by any function. In this case, the first TU is cleared from the TU queue, to reduce memory overheads of the system.
Step S330: The software processing module sends a processing result of the first object to the problem analysis module, and correspondingly, the problem analysis module receives the processing result.
For example, after completing analysis of the TU in the TU queue, that is, after clearing the TU in the TU queue, the software processing module outputs a processing result to the problem analysis module. The processing result may include data obtained after processing performed by an AST processing module and a DFA processing module.
Step S340: The problem analysis module performs problem analysis on the first object based on the processing result.
For example, analysis is performed based on the processing result output by the software processing module. For example, all nodes of the AST are traversed to check whether there is a potential problem (for example, a null pointer problem or a variable initialization problem), or all branches of the CFG are traversed to determine, by using a symbolic execution method, whether logic of a branch is valid.
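As an illustrative sketch only, an AST traversal that looks for a simple null-dereference pattern is shown below. It uses the standard Python ast module on Python source as a stand-in for the analyzed language, and the check itself (an attribute read from a name whose latest top-level assignment in the function was the literal None) is a deliberately simplified, hypothetical rule that ignores branches and loops.

```python
import ast


def find_none_dereferences(source):
    """Return (line, name) pairs for attribute reads on names last assigned None."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        none_names = set()
        for stmt in func.body:  # top-level statements of the function only
            # Report attribute accesses on names currently known to be None.
            for node in ast.walk(stmt):
                if (isinstance(node, ast.Attribute)
                        and isinstance(node.value, ast.Name)
                        and node.value.id in none_names):
                    findings.append((node.lineno, node.value.id))
            # Then update the set of names known to hold None.
            if isinstance(stmt, ast.Assign):
                is_none = (isinstance(stmt.value, ast.Constant)
                           and stmt.value.value is None)
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        if is_none:
                            none_names.add(target.id)
                        else:
                            none_names.discard(target.id)
    return findings


bad = "def f():\n    p = None\n    return p.value\n"
good = "def g():\n    p = None\n    p = make()\n    return p.value\n"
bad_findings = find_none_dereferences(bad)    # [(3, "p")]
good_findings = find_none_dereferences(good)  # []
```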
After completing problem analysis of the first object based on the processing result, the problem analysis module outputs the problem analysis result (corresponding to the problem analysis result in step S230 in the method 200). The problem analysis result may be a problem analysis report. For example, for the null pointer problem, the problem analysis report may indicate that analysis found a null pointer problem at a specific location in a specific line of code in a specific source file of the object. This is not limited in this application.
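As an illustrative sketch only, one possible shape of such a problem analysis report is shown below. The Finding fields and the "file:line:column: kind" text layout are assumptions. This application does not limit the report format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source_file: str  # source file of the analyzed object
    line: int         # line of code where the problem was found
    column: int       # location within the line
    kind: str         # for example, "null pointer problem"

    def render(self):
        return f"{self.source_file}:{self.line}:{self.column}: {self.kind}"


def build_report(findings):
    # Order findings by file and position so that the report is stable.
    ordered = sorted(findings, key=lambda f: (f.source_file, f.line, f.column))
    return "\n".join(f.render() for f in ordered)


report = build_report([
    Finding("net.c", 42, 9, "null pointer problem"),
    Finding("io.c", 7, 5, "variable initialization problem"),
])
```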
In the static analysis method 300 provided in this application, during problem analysis of the first object, the source file of the first object is associated with the TU, and the TU may include the at least one source file. During specific problem analysis, the TU is used as a granularity, a call relationship graph of all the functions included in the first object does not need to be generated, and a plurality of TUs may be simultaneously selected for parallel processing, to improve static analysis efficiency. In addition, in a specific analysis process, a TU or a function whose analysis is completed and that is not called may be cleared from the memory, to reduce memory overheads of the system. The foregoing describes the method embodiments of this application with reference to the accompanying drawings, and the following describes an apparatus embodiment of this application. It can be understood that the description of the method embodiments and the description of the apparatus embodiment may correspond to each other. Therefore, for a part that is not described, refer to the foregoing method embodiments.
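As an illustrative sketch only, the parallel processing of a plurality of TUs mentioned for the method 300 could be arranged as follows. The function analyze_tu and its "unsafe_" name check are hypothetical placeholders for the real per-TU analysis, and a thread pool is only one possible way to run TUs concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_tu(tu):
    # Placeholder per-TU analysis: flag functions whose names start with "unsafe_".
    name, functions = tu
    return name, [f for f in functions if f.startswith("unsafe_")]


def analyze_in_parallel(tus, workers=4):
    # Each TU is an independent unit of work, so several TUs can be submitted at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(analyze_tu, tus))


results = analyze_in_parallel([
    ("a.c", ["unsafe_copy", "main"]),
    ("b.c", ["helper"]),
])
# results is {"a.c": ["unsafe_copy"], "b.c": []}
```

In standard CPython builds, threads do not run CPU-bound Python code in parallel, so a process pool may be a better fit for a CPU-bound analysis. The sketch shows only the TU-level decomposition.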
To implement functions in the foregoing method embodiments, a static analysis apparatus described below includes a corresponding hardware structure and/or software module for executing each function. A person skilled in the art should be aware that, units and algorithm steps in the examples described with reference to the embodiments disclosed in this specification can be implemented by hardware or a combination of hardware and computer software in this application. Whether the functions are performed by hardware or hardware driven by computer software depends on particular applications and design constraints of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of this application.
In this application, the static analysis apparatus may be divided into functional modules based on the foregoing method examples. For example, each functional module may be obtained through division for a corresponding function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in a form of hardware, or may be implemented in a form of a software functional module. It should be noted that, in this application, division into the modules is an example and is merely logical function division, and may be other division in an actual implementation. An example in which each functional module is obtained through division based on a corresponding function is used below for description.
FIG. 7 is a schematic block diagram of a static analysis apparatus according to this application. A static analysis apparatus 400 includes a receiving module 410, a processing module 420, and a sending module 430. The receiving module 410 may be configured to receive information sent from the outside, and may be a receiving module in the problem analysis module 310. The processing module 420 is configured to process data inside the static analysis apparatus, and an operation performed by the processing module 420 may correspond to an operation performed by the problem analysis module 310 and the software processing module 320 in the static analysis tool 300. The sending module 430 may be configured to send information to the outside, and may be a sending module in the problem analysis module 310.
It should be understood that the static analysis apparatus 400 in this application may be implemented by using a central processing unit (CPU), may be implemented by using an application-specific integrated circuit (ASIC), or may be implemented by using a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof. Alternatively, when the static analysis method shown in FIG. 2 to FIG. 6 is implemented by using software, the static analysis apparatus 400 and each module thereof may alternatively be a software module.
Optionally, the static analysis apparatus 400 may further include a storage module 440. The storage module 440 may be configured to store instructions and/or data generated in a processing process, and the processing module 420 may read the instructions and/or data in the storage module 440.
Optionally, the receiving module 410 is configured to receive a first request. The first request is used to perform problem analysis on a first object, and the first object includes at least one source file. The processing module 420 is configured to perform problem analysis on a translation unit TU associated with the first object, to generate a problem analysis result of the first object. The sending module 430 is configured to send the problem analysis result.
Optionally, the processing module 420 is specifically configured to: scan the source file included in the first object, to generate a TU queue, where the TU queue includes N TUs, each TU in the TU queue is associated with at least one function in one source file, and N is an integer greater than or equal to 1; and perform, based on the TU queue, problem analysis on a function associated with the N TUs in the TU queue, to generate the problem analysis result of the first object.
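As an illustrative sketch only, scanning source files to generate a TU queue could look as follows. Python modules parsed with the standard ast module stand in for the source files of the first object, and the function build_tu_queue and the in-memory sources mapping are hypothetical.

```python
import ast


def build_tu_queue(sources):
    """sources: file name -> source text. Returns a list of (file name, function names)."""
    queue = []
    for filename, text in sources.items():
        # Collect the functions defined in this source file.
        functions = [node.name for node in ast.walk(ast.parse(text))
                     if isinstance(node, ast.FunctionDef)]
        # A TU is associated with at least one function, so skip files without any.
        if functions:
            queue.append((filename, functions))
    return queue


tu_queue = build_tu_queue({
    "a.py": "def main():\n    helper()\n",
    "b.py": "def helper():\n    pass\n\ndef unused():\n    pass\n",
    "c.py": "X = 1\n",
})
# tu_queue is [("a.py", ["main"]), ("b.py", ["helper", "unused"])]
```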
Optionally, the processing module 420 is specifically configured to: select a first TU in the TU queue, and determine a first function set associated with the first TU, where the first function set includes at least one function in a source file associated with the first TU; analyze a first function, and determine an analysis result of the first function, where the first function is any function in the first function set, and the analysis result of the first function includes at least one of a call relationship of the first function and a quality problem of the first function;
when there is a function having a call relationship with the first function, analyze a second function set, and determine an analysis result of a function in the second function set, where the second function set includes at least one function having the call relationship with the first function; and generate the problem analysis result of the first object when analysis of all the TUs in the TU queue is completed.
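As an illustrative sketch only, the selection and analysis steps above can be written as a worklist loop. The function analyze_object, the shape of the calls mapping, and the per-function result (which records only the call relationship) are hypothetical simplifications. Quality-problem checks are omitted.

```python
def analyze_object(tu_queue, calls):
    """tu_queue: list of (TU name, function names).
    calls: function name -> functions that have a call relationship with it."""
    results = {}
    pending = list(tu_queue)
    while pending:
        # Select a first TU and determine its first function set.
        _tu_name, first_function_set = pending.pop(0)
        for first_function in first_function_set:
            # Analyze the first function, then the second function set,
            # that is, every function reachable through a call relationship.
            stack = [first_function]
            while stack:
                fn = stack.pop()
                if fn in results:
                    continue  # already analyzed, which also guards against cycles
                related = list(calls.get(fn, []))
                results[fn] = {"calls": related}
                stack.extend(reversed(related))
    # All TUs in the queue are analyzed: `results` feeds the problem analysis result.
    return results


analysis = analyze_object(
    [("a.c", ["main"]), ("b.c", ["helper", "unused"])],
    {"main": ["helper"], "helper": []},
)
# list(analysis) is ["main", "helper", "unused"]
```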
Optionally, the processing module 420 is specifically configured to delete the first TU from the TU queue after analysis of all functions associated with the first TU in the TU queue is completed.
Optionally, each TU in the TU queue includes popularity, the popularity is used to identify a quantity of times that a function in a TU associated with the popularity is called, and the processing module 420 is specifically configured to select the first TU based on the popularity of the TU in the TU queue, where the first TU is a TU that meets a first condition in the TU queue, and the first condition includes that a popularity identifier of the first TU is higher than a first threshold.
Optionally, the processing module 420 is specifically configured to: update the popularity of the TU in the TU queue based on popularity of a function associated with the TU in the TU queue; and rank the TUs in the TU queue based on the updated popularity.
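As an illustrative sketch only, updating TU popularity from function popularity, ranking the TUs, and applying the first threshold could be done as follows. Aggregating function popularity by summation and the helper names rerank and select_above_threshold are assumptions not stated in this application.

```python
def rerank(tu_popularity, function_popularity, tu_functions):
    # Update the popularity of each TU from the popularity of its functions
    # (summation is an assumed aggregation rule).
    for tu, functions in tu_functions.items():
        tu_popularity[tu] = sum(function_popularity.get(f, 0) for f in functions)
    # Rank TUs from the highest popularity to the lowest, with ties ordered by name.
    return sorted(tu_popularity, key=lambda tu: (-tu_popularity[tu], tu))


def select_above_threshold(ranked, tu_popularity, threshold):
    # Return the highest-ranked TU whose popularity is higher than the threshold.
    for tu in ranked:
        if tu_popularity[tu] > threshold:
            return tu
    return None


tu_popularity = {"a.c": 0, "b.c": 0, "c.c": 0}
ranked = rerank(
    tu_popularity,
    {"parse": 5, "emit": 2, "log": 1},
    {"a.c": ["log"], "b.c": ["parse", "emit"], "c.c": []},
)
first_tu = select_above_threshold(ranked, tu_popularity, threshold=3)
# ranked is ["b.c", "a.c", "c.c"], and first_tu is "b.c"
```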
The static analysis apparatus 400 may be deployed in a server, so that when the server runs, the method performed by the static analysis tool in the foregoing method embodiments may be implemented.
In a possible implementation, the static analysis apparatus 400 may alternatively be deployed in a cloud environment. The cloud environment is an entity that provides a cloud service for a user by using a basic resource in a cloud computing mode. The cloud environment includes a cloud data center and a cloud service platform. The cloud data center includes a large quantity of basic resources (including compute resources, storage resources, and network resources) owned by a cloud service provider. The compute resources included in the cloud data center may be a large quantity of computing devices (for example, servers) . The static analysis apparatus 400 may be a server for static analysis in the cloud data center. The static analysis apparatus 400 may alternatively be a virtual machine created in the cloud data center for static analysis. The static analysis apparatus 400 may alternatively be a software apparatus deployed on the server or the virtual machine in the cloud data center. The software apparatus is configured to perform static analysis. The software apparatus may be deployed on a plurality of servers in a distributed manner, deployed on a plurality of virtual machines in a distributed manner, or deployed on the virtual machine and the server in a distributed manner. For example, the module 410, the module 420, and the module 430 in the static analysis apparatus 400 may be deployed on a plurality of servers in a distributed manner, deployed on a plurality of virtual machines in a distributed manner, or deployed on the virtual machine and the server in a distributed manner. For another example, when the module 420 includes a plurality of submodules, the plurality of submodules may be deployed on a plurality of servers, deployed on a plurality of virtual machines in a distributed manner, or deployed on the virtual machine and the server in a distributed manner.
The static analysis apparatus 400 may be abstracted by the cloud service provider into a cloud service of static analysis on the cloud service platform, to provide the cloud service to the user. After the user purchases the cloud service on the cloud service platform, the cloud environment provides the cloud service of static analysis for the user by using the cloud service. The user may upload an analysis object by using an application programming interface (API) or by using a web page interface provided by the cloud service platform, and the apparatus 400 performs static analysis and outputs an analysis result.
When the static analysis apparatus 400 is a software apparatus, the static analysis apparatus 400 may be independently deployed on a computing device in any environment. When the static analysis apparatus 400 is hardware, the static analysis apparatus 400 may be a computing device or a chip.
As shown in FIG. 8, this application further provides a static analysis device 500. The static analysis device 500 includes one or more processors 501. The processor 501 is configured to execute a computer program or instructions stored in a memory 504 and/or data, so that the method in the foregoing method embodiments is performed.
Optionally, the processor 501 is coupled to the memory 504, or the memory 504 and the processor 501 may be disposed separately.
The memory 504 may also be referred to as a memory unit, and stores executable code, for example, stores the computer program or instructions and/or the data. The memory 504 provides the instructions and the data for the processor 501. The memory 504 may further include a software module required for another running process such as an operating system. For example, the memory 504 includes a kernel, a program, a file generation module, a file transmission module, a file obtaining module, and a decoding module.
It should be understood that, in this application, the processor 501 may be one or more CPUs, or the processor 501 may be another general-purpose processor, a digital signal processor (DSP) , an application-specific integrated circuit (ASIC) , a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or a transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, any conventional processor, or the like.
The memory 504 may be a volatile memory or a nonvolatile memory, or may include both a volatile memory and a nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM) that is used as an external cache. By way of example but not limitative description, many forms of RAMs may be used, for example, a static random access memory (static RAM, SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), an enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), a synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and a direct rambus random access memory (direct rambus RAM, DR RAM).
Optionally, as shown in FIG. 8, the static analysis device 500 may further include a storage medium 505. The storage medium 505 may be configured to store data generated in a process of performing the foregoing method embodiments. The storage medium 505 may be, for example, a magnetic tape, a hard disk, a solid state drive (solid state drive, SSD), a floppy disk, or the like.
Optionally, as shown in FIG. 8, the static analysis device 500 may further include a communications interface 506. The communications interface 506 is configured to receive and/or send a signal and/or data. The processor 501 may be configured to control the communications interface 506 to receive and/or send a signal and/or data.
Optionally, as shown in FIG. 8, the static analysis device 500 may further include an output device 502 and an input device 503. The input device 503 may input, to the static analysis device 500, an object on which problem analysis needs to be performed. After completing analysis of the object, the static analysis device 500 may output a problem analysis result to the output device 502.
Optionally, the static analysis device 500 may further include a bus 507. The bus is a public communication trunk line for transferring information between various functional components of a computer, and is a common channel for transferring information between a CPU, a memory, an input device, and an output device. Components of a host are connected by using the bus. An external device is connected to the bus through a corresponding interface circuit, to form a computer hardware system. According to a type of information transmitted by the computer, a bus of the computer may be divided into a data bus, an address bus, and a control bus, and the data bus, the address bus, and the control bus are respectively used to transmit data, a data address, and a control signal. However, for clear descriptions, various buses in FIG. 8 are marked as the bus 507.
In a design, the static analysis device 500 is configured to implement an operation performed by the static analysis tool in the foregoing method embodiments.
It should be understood that the processing module 420 in the static analysis apparatus 400 shown in FIG. 7 may be the processor 501 in FIG. 8, the receiving module 410 and the sending module 430 may be the communications interface 506 in FIG. 8, and the storage module 440 may be the memory 504 in FIG. 8. For an operation performed by the processor 501, specifically refer to the foregoing description of the processing module 420. For an operation performed by the communications interface 506, refer to the description of the receiving module 410 and the sending module 430. For an operation performed by the memory 504, refer to the description of the storage module 440. Details are not described herein again.
This application further provides a computer-readable storage medium, and the computer-readable storage medium stores instructions used to implement the method performed by the static analysis tool in the foregoing method embodiments.
For example, the computer-readable storage medium stores a computer program or instructions, and when the computer program or instructions are executed by a computer, the computer may be enabled to implement the method performed by the static analysis tool in the foregoing method embodiments.
This application further provides a static analysis device. The static analysis device may be a server, and a static analysis tool is installed on the static analysis device, so that when the static analysis device runs, the method performed by the static analysis tool in the foregoing method embodiments can be implemented.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When the software is used to implement the embodiments, all or some of the foregoing embodiments may be implemented in a form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the procedures or functions according to this application are all or partially generated. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or may be transmitted from a computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by the computer, or a data storage device, for example, a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive (solid-state drive, SSD).
A person of ordinary skill in the art may be aware that, units and algorithm steps in the examples described with reference to the embodiments disclosed in this specification can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether the functions are performed by the hardware or the software depends on particular applications and design constraints of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of this application.
The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (14)

  1. A static analysis method, comprising:
    receiving a first request, wherein the first request is used to perform problem analysis on a first object, and the first object comprises at least one source file;
    performing problem analysis on a translation unit TU associated with the first object, to generate a problem analysis result of the first object; and
    sending the problem analysis result.
  2. The method according to claim 1, wherein the performing problem analysis on a TU associated with the first object, to generate a problem analysis result of the first object comprises:
    scanning the source file comprised in the first object, to generate a TU queue, wherein the TU queue comprises N TUs, each TU in the TU queue is associated with at least one function in one source file, and N is an integer greater than or equal to 1; and
    performing, based on the TU queue, problem analysis on a function associated with the N TUs in the TU queue, to generate the problem analysis result of the first object.
  3. The method according to claim 2, wherein the performing, based on the TU queue, problem analysis on a function associated with the N TUs in the TU queue, to generate the problem analysis result of the first object comprises:
    selecting a first TU in the TU queue, and determining a first function set associated with the first TU, wherein the first function set comprises at least one function in a source file associated with the first TU;
    analyzing a first function, and determining an analysis result of the first function, wherein the first function is any function in the first function set, and the analysis result of the first function comprises at least one of a call relationship of the first function and a quality problem of the first function;
    when there is a function having a call relationship with the first function, analyzing a second function set, and determining an analysis result of a function in the second function set, wherein the second function set comprises at least one function having the call relationship with the first function; and
    generating the problem analysis result of the first object when analysis of all the TUs in the TU queue is completed.
  4. The method according to claim 3, wherein the method further comprises:
    deleting the first TU from the TU queue after analysis of all functions associated with the first TU in the TU queue is completed.
  5. The method according to claim 3, wherein each TU in the TU queue comprises popularity, the popularity is used to identify a quantity of times that a function in a TU associated with the popularity is called, and the selecting a first TU in the TU queue comprises:
    selecting the first TU based on the popularity of the TU in the TU queue, wherein a popularity identifier of the first TU is higher than a first threshold.
  6. The method according to claim 5, wherein the method further comprises:
    updating the popularity of the TU in the TU queue based on popularity of a function associated with the TU in the TU queue; and
    ranking the TU in the TU queue based on the updated popularity.
  7. A static analysis apparatus, comprising:
    a receiving module, wherein the receiving module is configured to receive a first request, wherein the first request is used to perform problem analysis on a first object, and the first object comprises at least one source file;
    a processing module, wherein the processing module is configured to perform problem analysis on a translation unit TU associated with the first object, to generate a problem analysis result of the first object; and
    a sending module, wherein the sending module is configured to send the problem analysis result.
  8. The apparatus according to claim 7, wherein the processing module is specifically configured to:
    scan the source file comprised in the first object, to generate a TU queue, wherein the TU queue comprises N TUs, each TU in the TU queue is associated with at least one function in one source file, and N is an integer greater than or equal to 1; and
    perform, based on the TU queue, problem analysis on a function associated with the N TUs in the TU queue, to generate the problem analysis result of the first object.
  9. The apparatus according to claim 8, wherein the processing module is specifically configured to:
    select a first TU in the TU queue, and determine a first function set associated with the first TU, wherein the first function set comprises at least one function in a source file associated with the first TU;
    analyze a first function, and determine an analysis result of the first function, wherein the first function is any function in the first function set, and the analysis result of the first function comprises at least one of a call relationship of the first function and a quality problem of the first function;
    when there is a function having a call relationship with the first function, analyze a second function set, and determine an analysis result of a function in the second function set, wherein the second function set comprises at least one function having the call relationship with the first function; and
    generate the problem analysis result of the first object when analysis of all the TUs in the TU queue is completed.
  10. The apparatus according to claim 9, wherein the processing module is specifically configured to:
    delete the first TU from the TU queue after analysis of all functions associated with the first TU in the TU queue is completed.
  11. The apparatus according to claim 9, wherein each TU in the TU queue comprises popularity, the popularity is used to identify a quantity of times that a function in a TU associated with the popularity is called, and the processing module is specifically configured to:
    select the first TU based on the popularity of the TU in the TU queue, wherein a popularity identifier of the first TU is higher than a first threshold.
  12. The apparatus according to claim 11, wherein the processing module is specifically configured to:
    update the popularity of the TU in the TU queue based on popularity of a function associated with the TU in the TU queue; and
    rank the TU in the TU queue based on the updated popularity.
  13. A static analysis device, comprising a processor, wherein the processor is coupled to a memory, the memory is configured to store a computer program or instructions, and the processor is configured to execute the computer program or the instructions in the memory, so that the device performs the method according to any one of claims 1 to 6.
  14. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program or instructions, and when the computer program or the instructions are executed, the computer is enabled to perform the method according to any one of claims 1 to 6.
PCT/CN2022/104055 2021-08-24 2022-07-06 Static analysis method, apparatus, and device, and computer-readable storage medium WO2023024714A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202280057286.XA CN117897694A (en) 2021-08-24 2022-07-06 Static analysis method, device and equipment and computer readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
RU2021124956A RU2021124956A (en) 2021-08-24 Static analysis method, equipment and apparatus and computer-readable storage medium
RU2021124956 2021-08-24

Publications (1)

Publication Number Publication Date
WO2023024714A1 true WO2023024714A1 (en) 2023-03-02

Family

ID=85321373

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/104055 WO2023024714A1 (en) 2021-08-24 2022-07-06 Static analysis method, apparatus, and device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN117897694A (en)
WO (1) WO2023024714A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101286132A (en) * 2008-06-02 2008-10-15 北京邮电大学 Test method and system based on software defect mode
US20130014093A1 (en) * 2010-03-29 2013-01-10 Soft4Soft Co., Ltd. Code inspection executing system for performing a code inspection of abap source codes
CN104021084A (en) * 2014-06-19 2014-09-03 国家电网公司 Method and device for detecting defects of Java source codes
CN106294156A (en) * 2016-08-11 2017-01-04 北京邮电大学 A kind of static code fault detection analysis method and device
CN111694570A (en) * 2019-03-13 2020-09-22 南京大学 JavaScript function parameter mismatching detection method based on static program analysis

Also Published As

Publication number Publication date
CN117897694A (en) 2024-04-16

Similar Documents

Publication Publication Date Title
CN109542399B (en) Software development method and device, terminal equipment and computer readable storage medium
US8219575B2 (en) Method and system for specifying, preparing and using parameterized database queries
CN110019080B (en) Data access method and device
CN109361628B (en) Message assembling method and device, computer equipment and storage medium
CN114531477B (en) Method and device for configuring functional components, computer equipment and storage medium
CN110895471A (en) Installation package generation method, device, medium and electronic equipment
CN111694572A (en) Code format conversion method, device, computer equipment and storage medium
CN107341106B (en) Application compatibility detection method, development terminal and storage medium
CN113360300B (en) Interface call link generation method, device, equipment and readable storage medium
CN112214250A (en) Application program assembly loading method and device
CN108959294B (en) Method and device for accessing search engine
WO2023024714A1 (en) Static analysis method, apparatus, and device, and computer-readable storage medium
WO2023143545A1 (en) Resource processing method and apparatus, electronic device, and computer-readable storage medium
CN109597825B (en) Rule engine calling method, device, equipment and computer readable storage medium
CN108595160B (en) Method and storage medium for calling native object by JS
US11706156B2 (en) Method and system for changing resource state, terminal, and storage medium
WO2021088686A1 (en) Compiler optimization information generating method and apparatus, and electronic device
CN113918129A (en) Front-end and back-end separated interface request processing method and device
CN114547604A (en) Application detection method and device, storage medium and electronic equipment
CN113742385A (en) Data query method and device
CN112650502A (en) Batch processing task processing method and device, computer equipment and storage medium
CN111198614A (en) Method and apparatus for processing input content of human interface device
CN114270309A (en) Resource acquisition method and device and electronic equipment
CN112463214B (en) Data processing method and device, computer readable storage medium and electronic equipment
WO2021051958A1 (en) Model operation method and system

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22860061

Country of ref document: EP

Kind code of ref document: A1