CN116483423B

CN116483423B - Incremental code scanning method and system based on genetic algorithm

Info

Publication number: CN116483423B
Application number: CN202310744349.5A
Authority: CN
Inventors: 蒋玉芳; 高家祺; 郑晨晨; 王翱宇
Original assignee: Hangzhou Harmonycloud Technology Co Ltd
Current assignee: Hangzhou Harmonycloud Technology Co Ltd
Priority date: 2023-06-25
Filing date: 2023-06-25
Publication date: 2023-09-05
Anticipated expiration: 2043-06-25
Also published as: CN116483423A

Abstract

The invention discloses an incremental code scanning method based on a genetic algorithm, and belongs to the technical field of computers; the method comprises the following steps: acquiring updated file information; constructing a current code file information tree according to the updated file information; optimizing the current code file information tree through a genetic algorithm to obtain an optimized current code file information tree; replacing the unaffected nodes by the optimized current code file information tree to obtain a final information tree; and scanning the directory and the file list of the final information tree to obtain the increment code. The invention also provides an incremental code scanning system based on the genetic algorithm. The invention can realize incremental scanning in any scene, improve the code scanning efficiency and reduce the resource cost.

Description

Incremental code scanning method and system based on genetic algorithm

Technical Field

The invention relates to the technical field of computers, in particular to an incremental code scanning method and system based on a genetic algorithm.

Background

Along with digital transformation, software research and development enters the lean management era. In the lean development process, code scanning is important to improve code quality, reliability and security. Code scanning can bring the following advantages to enterprises:

1) Finding potential defects of the code;

2) Discovering potential security problems of the code;

3) Reducing the cost of manual code walkthroughs, automating defect detection, and improving research and development efficiency;

4) Reducing software complexity and maintenance cost, and helping new team members get up to speed quickly.

Given these benefits, enterprises have proposed the concept of continuous scanning, but have encountered several challenges in putting it into practice:

1) High time cost. General code scanning tools support full scanning, but a full scan of an enterprise project usually takes a long time and slows the research and development iteration rate. Enterprises therefore generally scan code only before going online, so that code quality control happens in the last link of the process;

2) High resource cost. Because the whole code repository must be scanned, a full scan requires a large amount of computing resources (CPU, memory and disk);

3) High repair cost. The return on investment (ROI) of fixing problems in historical code that has been tested and runs stably is low, so in practice continuous scanning usually starts from new code.

To address the above problems, the conventional schemes and their drawbacks are as follows:

1) Using the incremental scanning function of the open-source code scanning tool Sonar. This function mainly sets a baseline and lets the user focus on problems introduced after the baseline, i.e. it addresses the third problem. In terms of scanning efficiency, however, it still performs a full scan; it merely runs a differential analysis on the two scan results and keeps the latest ones. This scheme therefore has limited applicability, since it does not address the first two problems.

2) Combining the version control tool Git with the Sonar scanning tool. This scheme uses Git diff to analyze the file differences between two versions, and then uses the file inclusion parameter of the sonar-scanner command so that only new files are analyzed. It solves the above three problems in most cases. However, the number of files that can be passed through the inclusion parameter is limited; when the limit is exceeded, an error is reported directly, and the scan must fall back to a full scan.

3) Compiler integration. Some compiler-integrated code scanning tools can perform incremental scanning while compiling the code. However, this approach depends strongly on the compiler, and not every compiler for every language supports it.

Disclosure of Invention

The invention aims to provide an efficient incremental code scanning method and system based on a genetic algorithm.

In order to solve the technical problems, the invention provides an incremental code scanning method based on a genetic algorithm, which comprises the following steps:

acquiring updated file information;

constructing a current code file information tree according to the updated file information;

expanding or merging nodes of the current code file information tree through a genetic algorithm to obtain an optimized current code file information tree;

expanding the unaffected nodes in the optimized current code file information tree to obtain a final information tree;

and scanning the final information tree to obtain an increment code.

Preferably, the method for acquiring the updated file information specifically comprises the following steps:

and acquiring the current commit data by a Git diff method to be used as updated file information.
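As an illustration of this acquisition step, the following is a minimal Python sketch, assuming the updated file information is simply the list of paths printed by `git diff --name-only` between the last scanned commit and the current commit. The function names `parse_name_only` and `changed_files` are hypothetical and do not appear in the original text.

```python
import subprocess
from typing import List


def parse_name_only(diff_output: str) -> List[str]:
    """Turn the text printed by `git diff --name-only` into a list of paths."""
    return [line.strip() for line in diff_output.splitlines() if line.strip()]


def changed_files(repo_dir: str, last_commit: str, current_commit: str) -> List[str]:
    """Ask Git which files differ between two commits (needs a Git checkout)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", last_commit, current_commit],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_name_only(result.stdout)


# Parsing can be tried without a repository by feeding it sample output.
sample = "src/app/main.py\nsrc/app/util/io.py\n\ndocs/readme.md\n"
print(parse_name_only(sample))
```

Separating the parsing from the Git call keeps the parsing testable without a repository; paths that Git quotes because of special characters are not handled in this sketch.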

Preferably, the method constructs the current code file information tree according to the updated file information, and specifically comprises the following steps:

judging whether the last commit data has a corresponding historical code file information tree or not;

if the last commit data has a corresponding historical code file information tree, reconstructing the historical code file information tree according to the current commit data and the last commit data to obtain a current code file information tree;

if the last commit data does not have the corresponding historical code file information tree, constructing a current code file information tree according to the current commit data.

Preferably, the method comprises the steps of reconstructing a historical code file information tree according to the current commit data and the last commit data to obtain a current code file information tree, and specifically comprises the following steps:

comparing the current commit data with the last commit data to obtain new modification code file information;

and adding the new modified code file information into the corresponding historical code file information tree to obtain a current code file information tree.

Preferably, the current code file information tree is constructed according to the present commit data, and the method specifically comprises the following steps:

respectively constructing a code file information tree according to the current commit data and the last commit data;

comparing the current commit data with the code file information tree of the last commit data to obtain a public directory;

and deleting the public directory from the code file information tree of the current commit data to obtain a current code file information tree.
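The removal of the public directory can be sketched as follows, under the assumption that the public directory is the chain of leading directory components shared by every path in both sets of commit data. The helper names `common_directory` and `strip_common_directory` are hypothetical.

```python
from typing import List, Tuple


def common_directory(paths: List[str]) -> List[str]:
    """Leading directory components shared by every path (file names excluded)."""
    dir_parts = [p.split("/")[:-1] for p in paths]
    prefix: List[str] = []
    for level in zip(*dir_parts):          # walk the levels from the root down
        if len(set(level)) != 1:           # paths diverge at this level
            break
        prefix.append(level[0])
    return prefix


def strip_common_directory(current: List[str], last: List[str]) -> Tuple[str, List[str]]:
    """Return the public directory and the current paths with it removed."""
    prefix = common_directory(current + last)
    n = len(prefix)
    stripped = ["/".join(p.split("/")[n:]) for p in current]
    return "/".join(prefix), stripped


public_dir, remaining = strip_common_directory(
    ["repo/src/a/x.py", "repo/src/b/y.py"],   # current commit data
    ["repo/src/a/z.py"],                      # last commit data
)
print(public_dir, remaining)
```

Storing only the stripped paths, together with the public directory once, is what saves memory; the full path can be restored by prepending the public directory when decoding.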

Preferably, the nodes of the current code file information tree are unfolded or combined through a genetic algorithm to obtain an optimized current code file information tree, and the method specifically comprises the following steps:

encoding the current code file information tree to obtain an encoding result;

taking the encoding result in which the root node is expanded and all other nodes are merged as an initial population;

and repeatedly executing selection, crossover and mutation on the initial population to generate a next generation population until the number of iterations is reached or no expandable node remains, so as to obtain the optimized current code file information tree.
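The encoding and the initial population can be illustrated with a small Python sketch. It assumes one row per updated file and one column per tree level, with column 0 standing for the root, and uses the code values described in the embodiment (0 merged, 1 expanded, 2 leaf, -1 missing level). The function name `encode_initial` and the sample paths are hypothetical.

```python
from typing import List

MERGED, EXPANDED, LEAF, MISSING = 0, 1, 2, -1


def encode_initial(files: List[str]) -> List[List[int]]:
    """One row per updated file; column 0 is the root, then one column per path level."""
    depth = max(len(f.split("/")) for f in files) + 1   # +1 for the root column
    rows = []
    for f in files:
        parts = f.split("/")
        row = [EXPANDED]                        # the root starts expanded
        row += [MERGED] * (len(parts) - 1)      # every directory starts merged
        row.append(LEAF)                        # the updated file itself
        row += [MISSING] * (depth - len(row))   # pad branches that are shallower
        rows.append(row)
    return rows


for row in encode_initial(["a/b/x.py", "a/y.py", "z.py"]):
    print(row)
```

In this layout a mutation is a change of one directory cell from 0 to 1, and leaf and padding cells are never touched.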

Preferably, the selection, crossover and mutation are repeatedly performed on the initial population to generate a next generation population, specifically comprising the following steps:

counting the number of nodes of the next layer;

judging whether the leaf nodes of the layer meet constraint conditions according to the number of the nodes of the next layer;

if the leaf nodes of the layer do not meet constraint conditions, calculating the fitness of the leaf nodes and sorting the leaf nodes by fitness to determine the nodes to be mutated;

and selecting, by bisection (binary search), the nodes to be expanded from the nodes to be mutated, the result serving as a next generation population.
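A possible reading of the fitness ranking and the bisection step is sketched below in Python. It assumes that expanding a directory replaces one entry with its child entries, that candidates are ranked with low fitness on the right, and that the boundary is the leftmost position from which all nodes to the right can be expanded without exceeding the limit. The `Candidate` class, its `children` field and the function `pick_expansions` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    path: str
    f_new: int      # updated files under this directory
    f_total: int    # all files under this directory
    children: int   # entries that replace this directory if it is expanded

    @property
    def fitness(self) -> float:
        return self.f_new / self.f_total * 100


def pick_expansions(cands: List[Candidate], count: int, limit: int) -> List[Candidate]:
    """Binary-search the boundary; everything to its right is expanded as a batch."""
    ranked = sorted(cands, key=lambda c: c.fitness, reverse=True)  # low fitness on the right
    # suffix_cost[b] = extra entries created by expanding ranked[b:]
    suffix_cost = [0] * (len(ranked) + 1)
    for i in range(len(ranked) - 1, -1, -1):
        suffix_cost[i] = suffix_cost[i + 1] + ranked[i].children - 1
    budget = limit - count
    if budget < 0:                   # already over the limit, nothing can be expanded
        return []
    lo, hi = 0, len(ranked)          # invariant: suffix_cost[hi] <= budget
    while lo < hi:
        mid = (lo + hi) // 2
        if suffix_cost[mid] <= budget:
            hi = mid                 # boundary is at mid or further left
        else:
            lo = mid + 1             # expanding from mid would exceed the limit
    return ranked[lo:]


cands = [Candidate("a", 1, 100, 3), Candidate("b", 5, 10, 4), Candidate("c", 2, 50, 2)]
print([c.path for c in pick_expansions(cands, count=3, limit=6)])
```

The bisection is valid here because the suffix cost never increases as the boundary moves right, provided every directory has at least one child entry.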

Preferably, the final information tree is scanned to obtain the increment code, which specifically comprises the following steps:

decoding the final information tree to obtain a final directory and file list;

and scanning the final directory and file list to obtain the incremental code.
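The decoding step can be sketched as a tree walk, assuming each node of the final information tree records whether it is a file and whether a directory is expanded. A merged directory is emitted as a directory, an expanded directory is replaced by whatever its children emit, and a file is emitted as a file. The `Node` class and the `decode` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    is_file: bool = False
    expanded: bool = False
    children: Dict[str, "Node"] = field(default_factory=dict)


def decode(node: Node, prefix: str = "") -> Tuple[List[str], List[str]]:
    """Return (directories, files) that should be handed to the scanner."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    if node.is_file:
        return [], [path]
    if not node.expanded:            # merged directory: scan it as a whole
        return [path], []
    dirs: List[str] = []
    files: List[str] = []
    for child in node.children.values():
        d, f = decode(child, path)
        dirs += d
        files += f
    return dirs, files


root = Node("src", expanded=True, children={
    "a": Node("a"),                                              # merged directory
    "b": Node("b", expanded=True,
              children={"y.py": Node("y.py", is_file=True)}),    # expanded directory
    "z.py": Node("z.py", is_file=True),
})
print(decode(root))
```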

Preferably, the method comprises the steps of expanding non-affected nodes in the optimized current code file information tree to obtain a final information tree, and specifically comprises the following steps:

taking directory nodes with only one file node in the optimized current code file information tree as influence-free nodes;

and expanding the influence-free nodes to obtain a final information tree.
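A minimal sketch of this post-processing follows, assuming the optimized tree has already been reduced to a mapping from each merged directory to the updated files beneath it. The function name `expand_single_file_dirs` and the sample data are hypothetical.

```python
from typing import Dict, List


def expand_single_file_dirs(merged_dirs: Dict[str, List[str]]) -> List[str]:
    """Replace every merged directory that holds exactly one updated file by that file."""
    targets: List[str] = []
    for directory, updated in merged_dirs.items():
        if len(updated) == 1:
            targets.append(updated[0])    # same number of entries, narrower scan
        else:
            targets.append(directory)
    return targets


merged = {"src/a": ["src/a/x.py", "src/a/y.py"], "src/b": ["src/b/deep/z.py"]}
print(expand_single_file_dirs(merged))
```

Because one directory entry is swapped for one file entry, the number of entries passed to the scanner stays the same.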

The invention also provides an incremental code scanning system based on the genetic algorithm, which comprises:

the acquisition module is used for acquiring updated file information;

the construction module is used for constructing a current code file information tree according to the updated file information;

the optimizing module is used for expanding or combining the nodes of the current code file information tree through a genetic algorithm to obtain an optimized current code file information tree;

the replacing module is used for expanding the unaffected nodes in the optimized current code file information tree to obtain a final information tree;

and the scanning module is used for scanning the final information tree to obtain an increment code.

Compared with the prior art, the invention has the beneficial effects that:

The invention provides a general incremental scanning method that balances scanning accuracy against scanning efficiency, helps enterprises realize incremental scanning in any scenario, improves code scanning efficiency and reduces resource cost.

Drawings

The following describes the embodiments of the present invention in further detail with reference to the accompanying drawings.

FIG. 1 is a schematic flow chart of constructing a code file information tree;

FIG. 2 is a schematic diagram of the removal of public directories;

FIG. 3 is a schematic flow chart of genetic algorithm optimization of the current code file information tree;

FIG. 4 is a schematic flow chart of an incremental scan;

FIG. 5 is a flow chart of a method for scanning incremental codes based on a genetic algorithm according to the present invention.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be embodied in many other forms than those herein described, and those skilled in the art will readily appreciate that the present invention may be similarly embodied without departing from the spirit or essential characteristics thereof, and therefore the present invention is not limited to the specific embodiments disclosed below.

The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "when", "upon" or "in response to determining", depending on the context.

The invention is described in further detail below with reference to the attached drawing figures:

As shown in FIG. 5, the invention provides an incremental code scanning method based on a genetic algorithm, which comprises the following steps:

acquiring updated file information;

constructing a current code file information tree according to the updated file information;

expanding or merging nodes of the current code file information tree through a genetic algorithm to obtain an optimized current code file information tree;

expanding the unaffected nodes in the optimized current code file information tree to obtain a final information tree;

and scanning the final information tree to obtain the incremental code.

The preferred sub-steps of each step, and the modules of the corresponding system, are the same as those set out in the Disclosure of Invention.

In order to better illustrate the technical effects of the present invention, the present invention provides the following specific embodiments to illustrate the above technical flow:

embodiment 1, a genetic algorithm-based incremental code scanning method, as shown in fig. 4;

Given the shortcomings of the prior art, the present method and system aim to improve the second conventional scheme described in the Background, and to seek an optimal solution under the limit on the number of file parameters. The invention provides a general incremental scanning method that balances scanning accuracy against scanning efficiency, helps enterprises realize incremental scanning in any scenario, improves code scanning efficiency and reduces resource cost.

The specific steps of the method are as follows:

1) The updated files are obtained by the Git diff method, and a code file information tree is then constructed from this information. The specific steps are shown in FIG. 1.

(1) Git diff obtains the newly modified code file information between the scanned commits;

(2) Judge whether history tree information from a previous scan exists;

(3) If not, the code file information tree is built from scratch. First, the longest common subtree of the information tree is found using the property that each of its nodes has exactly one child node; this longest common subtree is the public directory. The current commit data is then traversed, and for each file the total file count of every directory on that file's path is updated. Each stored tree node holds the total file number and the new file number. The public directory is deleted when the tree is stored, which reduces the memory occupied by the tree information, as shown in FIG. 2.

(4) If a history tree exists, the history tree is loaded, the directory information is parsed from the newly modified code file information obtained in step (1), the directories are inserted into the tree, and the total file number and the new file number of the affected tree nodes are updated.
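The per-directory statistics kept in the tree can be sketched in Python as follows. The sketch assumes the full file list of the repository is available for the totals, and it keeps only directories that contain at least one updated file. The function name `directory_counts` and the sample paths are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def directory_counts(all_files: Iterable[str],
                     changed_files: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each directory holding updated files to (total file number, new file number)."""
    changed = set(changed_files)
    counts = defaultdict(lambda: [0, 0])
    for path in all_files:
        parts = path.split("/")[:-1]                  # directories above the file
        for depth in range(1, len(parts) + 1):
            entry = counts["/".join(parts[:depth])]
            entry[0] += 1                             # total file number
            if path in changed:
                entry[1] += 1                         # new file number
    return {d: (t, n) for d, (t, n) in counts.items() if n > 0}


stats = directory_counts(
    ["src/a/x.py", "src/a/y.py", "src/b/z.py"],
    ["src/a/x.py"],
)
print(stats)
```

When a history tree exists, the same two counters would be updated in place for the directories of the newly modified files rather than rebuilt.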

2) After the code file information tree is built, files must be merged and passed in as directories so that the number limit is satisfied; when directories are passed in, they should contain as high a share of new files as possible, so that the incremental scan is more accurate. This step is a search for an optimal solution, so a genetic algorithm can be used to select the files and directories while ensuring that their number does not exceed the command limit of sonar-scanner. The specific steps are shown in FIG. 3:

(1) The problem is abstracted as deciding, for each directory related to a new file, whether it is merged or expanded. Merged is coded as 0 and expanded as 1. The first step is to encode the constructed code file information tree. A two-dimensional array is built from the file depth L and the number C of leaf nodes; for example, the first row records whether each directory above the first updated file is expanded (1) or merged (0). Because branches of the tree differ in depth, a missing directory level is represented by -1. Leaf nodes are represented by 2 and are not updated in any iteration. The tree can then be encoded as follows:

Treecodes[C][L]=

(2) Obtain the initial population. Because of the number limit, once the number of directories exceeds the limit there is no need to consider merging at intermediate levels, so the whole population evolves from top to bottom. The initial population assumes that the root node is expanded and all other directories are merged, so the encoded gene results are as follows:

(3) Selection. The number of leaf nodes (directories or files) is counted from the second column of the encoding, and the constraint condition is checked; if the constraint is not met, the algorithm ends. If the constraint is met, the nodes are sorted by fitness and a mutation operation is performed on the next level for at most N - Count nodes (Count being the number of leaf nodes), changing 0 to 1 (expansion). Nodes with low fitness are selected for mutation; leaf nodes with a value of 2 are left unchanged, consistent with step (1). Using bisection, nodes are selected for trial mutation to determine a mutation boundary: nodes to the left of the boundary exceed the limit, and nodes to the right do not. The low-fitness nodes to the right of the boundary are then mutated as a batch.

The quantities used are:

Total number of files: Ftotal

Number of new files: Fnew

Number of finally selected nodes: Count

Limit on the number: N

Fitness function: fitness = Fnew / Ftotal × 100

Constraint: Count ≤ N

(4) Mutate the nodes to the right of the boundary by setting their node values to expanded.

(5) Obtain the next generation, and repeat steps (3) to (5) until the optimal solution is obtained.
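The overall top-down evolution can be approximated by the following Python sketch. It is a simplification under stated assumptions rather than the method itself: a greedy expansion in order of ascending fitness stands in for the selection and mutation steps, there is no crossover, and the bisection on the mutation boundary is replaced by a per-node check against the limit. The function names `children_of` and `select_targets`, the `totals` mapping and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def children_of(directory: str, changed: List[str]) -> Set[str]:
    """Next-level entries (directories or files) under `directory` that hold updated files."""
    prefix = directory + "/" if directory else ""
    out = set()
    for path in changed:
        if path.startswith(prefix):
            out.add(prefix + path[len(prefix):].split("/")[0])
    return out


def select_targets(changed: List[str], totals: Dict[str, int],
                   limit: int, max_iter: int = 20) -> List[str]:
    """Expand low-fitness directories level by level while Count <= N holds."""
    changed_set = set(changed)

    def fitness(d: str) -> float:
        f_new = sum(1 for p in changed if p.startswith(d + "/"))
        return f_new / totals[d] * 100

    selection = children_of("", changed)        # initial population: root expanded
    for _ in range(max_iter):
        dirs = sorted(d for d in selection if d not in changed_set)
        progressed = False
        for d in sorted(dirs, key=fitness):     # low fitness first
            kids = children_of(d, changed)
            if len(selection) - 1 + len(kids) <= limit:
                selection = (selection - {d}) | kids   # mutate 0 -> 1 (expand)
                progressed = True
        if not progressed:                      # no expandable node remains
            break
    return sorted(selection)


changed = ["src/a/x.py", "src/a/y.py", "src/b/z.py", "docs/r.md"]
totals = {"src": 100, "src/a": 10, "src/b": 50, "docs": 2}
print(select_targets(changed, totals, limit=3))
```

With a limit of 3 the directory `src/a` stays merged, because expanding it would produce four entries; with a limit of 4 every updated file is passed individually.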

3) Based on the result obtained by the genetic algorithm, directories that contain only one updated file are expanded again. This mutation does not affect the final result (replacing such a directory with its single file leaves the number of entries unchanged), and it lets the second pass finish as quickly as possible.

4) The files and directories in the final information tree obtained in step 3) are passed in as command parameters to perform the scan.
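How the final directories and files might be turned into command parameters is sketched below. The sketch assumes that the inclusion parameter referred to above is the `sonar.inclusions` analysis property, passed to `sonar-scanner` as a `-D` option with comma-separated path patterns, and that a directory is expressed as the pattern `dir/**/*`. The function name `build_scanner_args` is hypothetical, and other required scanner properties (project key, server URL and so on) are omitted.

```python
from typing import List


def build_scanner_args(dirs: List[str], files: List[str]) -> List[str]:
    """Build a sonar-scanner argument list restricted to the given directories and files."""
    patterns = [f"{d}/**/*" for d in dirs] + list(files)   # directories become wildcards
    return ["sonar-scanner", f"-Dsonar.inclusions={','.join(patterns)}"]


args = build_scanner_args(["src/a"], ["src/b/z.py"])
print(" ".join(args))
```

The argument list can be handed to a process launcher as is; the number of comma-separated patterns is the Count that the constraint Count ≤ N is meant to bound.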

In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and the division of modules, or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units, modules, or components may be combined or integrated into another apparatus, or some features may be omitted, or not performed.

The units may or may not be physically separate, and the components shown as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via a communication portion, and/or installed from a removable medium. The above-described functions defined in the method of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU). The computer readable medium of the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The foregoing is merely illustrative of specific embodiments of the present invention, and the scope of the present invention is not limited thereto, but any changes or substitutions within the technical scope of the present invention should be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. The incremental code scanning method based on the genetic algorithm is characterized by comprising the following steps of:

acquiring updated file information;

coding the current code file information tree to obtain a coding result;

repeatedly executing selection, crossover and mutation on the initial population to generate a next generation population until the number of iterations is reached or no expandable node remains, so as to obtain an optimized current code file information tree;

expanding the influence-free nodes to obtain a final information tree;

and scanning the final information tree to obtain an increment code.

2. The incremental code scanning method based on a genetic algorithm according to claim 1, wherein the step of acquiring the updated file information comprises the steps of:

3. The incremental code scanning method based on genetic algorithm according to claim 2, wherein the current code file information tree is constructed based on the updated file information, specifically comprising the steps of:

4. The incremental code scanning method according to claim 3, wherein the step of reconstructing the historical code file information tree based on the present commit data and the last commit data to obtain the current code file information tree comprises the steps of:

5. The incremental code scanning method based on genetic algorithm of claim 4 wherein constructing the current code file information tree based on the current commit data comprises the steps of:

6. The genetic algorithm-based incremental code scan method of claim 1 wherein the selecting, crossing and mutating are repeatedly performed on the initial population to generate a next generation population, comprising the steps of:

counting the number of nodes of the next layer;

7. The method for scanning incremental codes based on genetic algorithm according to claim 1, wherein the final information tree is scanned to obtain the incremental codes, and specifically comprising the steps of:

decoding the final information tree to obtain a final catalog and a file;

and scanning the final catalogue and the file to obtain the increment code.

8. A genetic algorithm-based incremental code scanning system for implementing the genetic algorithm-based incremental code scanning method according to any one of claims 1 to 7, comprising:

the acquisition module is used for acquiring updated file information;