CN109885479B - Software fuzzy test method and device based on path record truncation - Google Patents


Info

Publication number
CN109885479B
Authority
CN
China
Prior art keywords
code
path
truncation
mark
condition code
Legal status
Active
Application number
CN201910012433.1A
Other languages
Chinese (zh)
Other versions
CN109885479A (en)
Inventor
宋晓斌
柳晓龙
王允超
武泽慧
魏强
曹琰
Current Assignee
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN201910012433.1A
Publication of CN109885479A
Application granted
Publication of CN109885479B
Legal status: Active

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention belongs to the technical field of software testing and relates in particular to a software fuzz testing method and device based on path record truncation. The method comprises: constructing a project data set, extracting condition code structures, obtaining model input data for a classification model of low-frequency path transfer condition code structures, and training the model, where the classification model uses an LSTM network structure; adding a path truncation flag and a flag check instruction to the instrumentation code of the fuzzer; and, for the program under test, extracting condition code structures and obtaining model input data, feeding the data to the trained classification model to identify low-frequency path transfer condition code structures, performing source-level instrumentation at the corresponding positions in the source files, truncating path records according to the path truncation flag, and completing the fuzz test. The method identifies low-frequency path transfer condition code before the program is executed and uses the path truncation strategy to stop further testing of high-frequency path samples, which improves fuzzing efficiency and coverage and gives the method strong prospects for engineering application.

Description

Software fuzz testing method and device based on path record truncation
Technical Field
The invention belongs to the technical field of software testing and relates in particular to a software fuzz testing method and device based on path record truncation.
Background
Fuzz testing (fuzzing) is an automated software testing technique that feeds semi-valid data to a target program as input and monitors the program for anomalies. Because it is simple and efficient, it is widely used by large software vendors and in the development and testing of open-source software, and it has found a large number of bugs in many kinds of software. However, as software security testing tools have become widespread and developers have grown more aware of code security, vulnerabilities increasingly lie in code with more complex structure. Existing fuzzing is very effective at finding bugs in structurally simple code but often fails to trigger exceptions in structurally complex code. The cause is that most test samples execute the same high-frequency paths, while low-frequency paths are hard to explore.
To address this problem, researchers have combined other vulnerability analysis techniques with fuzzing and proposed a series of fuzzing methods, which fall mainly into three categories: fuzzing based on symbolic execution, fuzzing based on taint analysis, and fuzzing based on static analysis. Symbolic-execution-based fuzzing balances fuzzing against selective concolic execution to find deeper bugs: selective concolic execution is used to test paths that the fuzzer judges to be more "valuable" but cannot get past. Combining lightweight fuzzing with concolic execution avoids both the path explosion inherent in symbolic execution and the incompleteness of fuzzing. Taint-analysis-based fuzzing uses dynamic taint analysis to determine which bytes of a test sample, when mutated, are more likely to trigger exploration of unknown code, so that mutation becomes more targeted and better input samples are generated to reach deep code. Static-analysis-based fuzzing uses program static analysis to adjust how much attention different seeds receive and uses edge coverage information to optimize seed ordering and selection, thereby raising the probability of testing low-frequency paths and increasing code coverage. Although these three methods use different techniques to raise the probability of testing low-frequency paths, samples on high-frequency paths are still tested, so the gain in low-frequency path testing probability is limited and the overall improvement in test efficiency is modest.
Disclosure of Invention
Therefore, the invention provides a software fuzz testing method and device based on path record truncation, which improve the testing efficiency and test coverage of deep code and have strong prospects for engineering application.
According to the design scheme provided by the invention, the software fuzz testing method based on path record truncation comprises:
A) constructing a project data set, extracting condition code structures, converting the extracted structures into model input data, and using that data to train a classification model for low-frequency path transfer condition code structures, where the classification model uses an LSTM network structure;
B) adding a path truncation flag and a flag check instruction to the instrumentation code of the fuzzer;
C) for the program under test, extracting condition code structures and converting them into model input data, feeding the data to the trained classification model to identify low-frequency path transfer condition code structures, performing source-level instrumentation at the corresponding positions in the source files, truncating path records according to the path truncation flag, and completing the source code fuzz test.
In A), the condition code structures are extracted as follows: first, a symbolic description for the source code data set is defined and all condition code structures in the data set are extracted; then the extracted structures are parsed and the parse results are tokenized to obtain code token sequences; finally, the token sequences are converted into vectors that serve as the input data of the classification model.
Preferably, in A), extracting all condition code structures in the data set comprises: first, preprocessing the source code data set to extract the effective code; then extracting the set of condition structures from the effective code using a stack structure, recording the correspondence between each code segment and its position in the source code data set, and iteratively processing nested condition structures according to the stack structure to obtain minimal code segments.
Preferably, in A), code parsing comprises: extracting code sequences with an abstract syntax tree and symbolizing them, grouping words in the code sequences by meaning to obtain synonym sets, and using those sets to expand the code sequences.
Preferably, in A), converting the token sequences into vectors comprises: vectorizing the text with word2vec, where a word vector model is produced by setting the feature vector dimension and word frequency parameters; building a dictionary index and a word vector dictionary from the word vector model; and obtaining the classification model input data from these two dictionaries.
In B), when the path truncation flag and the flag check instruction are added, the path truncation flag is inserted into the source code and defined in the .bss section, and a path truncation flag check is performed at the entry of the original instrumentation.
Preferably, in C), for the program under test, the error handling code segments identified by the classification model trained earlier are instrumented both inside and outside the condition structure, based on the condition code structures and their position index in the source code data set.
Further, in C), where the path truncation flag takes effect after the instrumentation code, the original instrumentation for the current condition is cancelled and replaced with a null instruction, and a mark is added in the instruction comment.
Further, in C), because compiler optimization can assemble the same conditional statement differently in different source files, instrumentation is cancelled at every marked branch.
A software fuzz testing device based on path record truncation comprises a training module, a marking module, and a testing module, wherein:
the training module is used to construct a project data set, extract condition code structures, convert the extracted structures into model input data, and use that data to train a classification model for low-frequency path transfer condition code structures, where the classification model uses an LSTM network structure;
the marking module is used to add a path truncation flag and a flag check instruction to the instrumentation code of the fuzzer; and
the testing module is used to extract condition code structures from the program under test, convert them into model input data, feed the data to the trained classification model to identify low-frequency path transfer condition code structures, perform source-level instrumentation at the corresponding positions in the source files, truncate path records according to the path truncation flag, and complete the source code fuzz test.
The beneficial effects of the invention are as follows:
1. To address the problem that high-frequency paths reduce fuzzing efficiency, a deep learning neural network is used to recognize low-frequency path transfer condition code; source-level instrumentation is applied to the transfer condition code structures recognized by the trained model, and path records are truncated according to the marker code. This increases the probability that low-frequency samples are mutated and ultimately improves fuzzing efficiency.
2. The method recognizes low-frequency path transfer condition code before the program is actually executed and uses the path truncation strategy to stop further testing of high-frequency path samples, filling a gap in the analysis of how high-frequency path samples affect testing. It does not rely on complex dynamic analysis techniques and therefore adds no such overhead; it can be combined effectively with other grey-box testing techniques to raise coverage further on top of existing testing tools, and it offers useful guidance for the development of software testing technology.
Description of the drawings:
FIG. 1 is a flow chart of a software fuzz testing method in an embodiment;
FIG. 2 is a schematic diagram of software fuzz testing in an embodiment;
FIG. 3 is a schematic diagram of a classification model constructed by combining word2vec and LSTM neural network models in the embodiment;
FIG. 4 is a schematic diagram of an embodiment of a software fuzz testing apparatus.
Detailed description of embodiments:
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and technical solutions.
Existing software fuzz testing limits the probability of testing low-frequency paths and, with it, overall test efficiency. In view of this, an embodiment of the present invention, referring to FIG. 1, provides a software fuzz testing method based on path record truncation, which comprises:
S101, constructing a project data set, extracting condition code structures, converting the extracted structures into model input data, and using that data to train a classification model for low-frequency path transfer condition code structures, where the classification model uses an LSTM network structure;
S102, adding a path truncation flag and a flag check instruction to the instrumentation code of the fuzzer;
S103, for the program under test, extracting condition code structures and converting them into model input data, feeding the data to the trained classification model to identify low-frequency path transfer condition code structures, performing source-level instrumentation at the corresponding positions in the source files, truncating path records according to the path truncation flag, and completing the source code fuzz test.
Referring to FIG. 2: a data set is constructed, all condition code structures in it are extracted, and the structures are parsed; the parse results are preprocessed to extract code token sequences, which are converted into vectors; labels are generated for the code, and the resulting vectors are used as input to an LSTM neural network to train the classification model for low-frequency path transfer condition code structures; a path truncation flag and a flag check instruction are added to the instrumentation code of the fuzzer; finally, condition code structures are extracted from the program under test to obtain its low-frequency path transfer condition code structures, lightweight source-level instrumentation is applied at the corresponding positions of the identified structures in the source files, and testing of the program begins.
Most input-parsing programs perform many format checks, each with a corresponding error handling structure for when the check fails, and such structures usually make coverage hard to increase. Error handling code, meaning the code segment executed when some program input causes the program to exit with an error for any of various reasons, is therefore the most representative type of low-frequency path transfer condition code.
In another embodiment of the invention, condition code structures are extracted as follows: first, a symbolic description for the source code data set is defined and all condition code structures in the data set are extracted; then the extracted structures are parsed, and the parse results are tokenized to obtain code token sequences; finally, the token sequences are converted into vectors to obtain the input data of the classification model. Preferably, extracting all condition code structures in the data set comprises: first, preprocessing the source code data set to extract the effective code; then extracting the set of condition structures from the effective code using a stack structure, recording the correspondence between each code segment and its position in the source code data set, and iteratively processing nested condition structures according to the stack structure to obtain minimal code segments. The symbols used are defined in Table 1.
Table 1 Symbol descriptions
(table image not reproduced)
First, $S_o$ is preprocessed to remove unnecessary information such as code comments and line breaks, and $R$ is used to extract the effective code, yielding $S_n$. Next, $I_e$ is extracted from the processed $S_n$ using a method based on bracket stack balancing, namely
(formula image not reproduced)
A stack-like structure is built so that brackets can be identified: the counter $C_s$ starts at 0, is incremented by 1 when $B_l$ is identified and decremented by 1 when $B_r$ is identified, and when $C_s$ returns to 0, $C_t$ is added to $I_e$. At the same time, the correspondence between each code segment and its position in the source file is recorded for the later instrumentation step. However, extraction in this way yields $I_n$, and $I_n$ can cause false positives. Suppose
(formula image not reproduced)
and
(formula image not reproduced)
If $I_n \in E$ is assumed, then
(formula image not reproduced)
which is a contradiction. Therefore, $I_n$ must also be processed iteratively so that the extracted code fragments are minimal: the first-extracted $I_e$ is processed again in the same way until every structure is an $I_r$.
In another embodiment of the present invention, code parsing comprises: extracting code sequences with an abstract syntax tree and symbolizing them, grouping words by meaning to obtain synonym sets, and using those sets to expand the code sequences.
The extracted code segments are parsed into token sequences, and all segments are brought to the same length so that they can be fed to the LSTM model. In the parsing stage, code sequences are extracted with an abstract syntax tree (AST), and tokens are symbolized at the same time, for example representing integers as num and strings as str. For this classification, however, string content matters: most error handling code segments share the characteristic that any string they contain usually includes words with similar meanings such as "error" or "fail". Strings are therefore symbolized in two ways, depending on whether they contain such keywords, denoted errstr and str respectively. Although the preceding analysis yields some error-expressing words, their number is limited. In the embodiment of the invention, WordNet, an English lexical database built and maintained by Princeton University, can be used to expand this vocabulary: WordNet groups entries by meaning, entries with the same meaning form a synonym set, and the entries in a synonym set can be used to extend the list of error-expressing words. Because supervised learning is used, every sample must be labeled: a code segment is labeled 1 if it is an error handling code segment and 0 otherwise.
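The symbolization rule above (integers become num, strings become errstr or str) can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: it swaps the AST for a simple regular-expression tokenizer, handles decimal integers only, ignores character literals, and uses a small hypothetical seed list `ERROR_WORDS` where the patent would expand the vocabulary with WordNet synonym sets.

```python
import re

# Hypothetical seed vocabulary; the patent expands such a list with WordNet synsets.
ERROR_WORDS = {"error", "err", "fail", "failed", "failure", "invalid", "cannot", "unable"}

# String literals first, then integers, identifiers, two-char operators, any other char.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|\d+|[A-Za-z_]\w*|==|!=|<=|>=|&&|\|\||\S')


def symbolize(segment: str) -> list[str]:
    """Tokenize a C-like code segment and replace literals with symbols."""
    tokens = []
    for tok in TOKEN_RE.findall(segment):
        if tok.startswith('"'):
            words = re.findall(r"[a-z]+", tok.lower())
            # errstr if the literal contains an error-expressing word, else str.
            tokens.append("errstr" if ERROR_WORDS.intersection(words) else "str")
        elif tok.isdigit():
            tokens.append("num")
        else:
            tokens.append(tok)
    return tokens
```

For example, `symbolize('if (n < 0) { printf("read failed"); return -1; }')` yields a sequence in which the literal `"read failed"` appears as `errstr` and both integers appear as `num`.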
Labeling uses a heuristic approach, and the following five heuristics were summarized from analysis of a large number of source files: 1) a comparison is typically contained in if (…); 2) if a string is contained, it contains an error-expressing word; 3) a return or jump keyword such as return or goto may be contained; 4) if a function call is contained, its name usually contains an error-expressing word; 5) a system error macro such as 'EPERM' or 'ENOENT' may be contained.
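The patent lists the five heuristics but does not say how they are combined into a label. The sketch below is therefore an assumption: a segment is labeled 1 only if its if-condition contains a comparison and at least `min_extra` (here 2) of the other four heuristics hold. The word list, the errno subset, and the threshold are hypothetical choices, and the regular expressions are deliberately simple (for instance, a condition containing a nested call such as `f(x) < 0` is not recognized as a comparison).

```python
import re

# Hypothetical vocabulary and a subset of standard POSIX errno macro names.
ERROR_WORDS = ("error", "fail", "invalid", "unable", "cannot")
ERRNO_MACROS = {"EPERM", "ENOENT", "EIO", "ENOMEM", "EACCES", "EEXIST", "EINVAL"}
KEYWORDS = {"if", "while", "for", "switch", "return", "sizeof"}


def heuristic_hits(seg: str) -> dict[str, bool]:
    """Evaluate the five labeling heuristics on one code segment."""
    strings = re.findall(r'"((?:\\.|[^"\\])*)"', seg)
    calls = [n for n in re.findall(r"\b([A-Za-z_]\w*)\s*\(", seg) if n not in KEYWORDS]
    return {
        # 1) a comparison inside if (...)
        "comparison_in_if": bool(re.search(r"\bif\s*\([^)]*(==|!=|<=|>=|<|>)", seg)),
        # 2) a string literal containing an error-expressing word
        "error_string": any(w in s.lower() for s in strings for w in ERROR_WORDS),
        # 3) a return or jump keyword
        "return_or_jump": bool(re.search(r"\b(return|goto)\b", seg)),
        # 4) a called function whose name contains an error-expressing word
        "error_function": any(w in n.lower() for n in calls for w in ERROR_WORDS),
        # 5) a system error macro
        "errno_macro": any(t in ERRNO_MACROS for t in re.findall(r"\b[A-Z]+\b", seg)),
    }


def label(seg: str, min_extra: int = 2) -> int:
    """Return 1 (error handling code) or 0, using an assumed voting rule."""
    hits = heuristic_hits(seg)
    extra = sum(v for k, v in hits.items() if k != "comparison_in_if")
    return int(hits["comparison_in_if"] and extra >= min_extra)
```

With this rule, `if (fd < 0) { perror("open failed"); return -EACCES; }` satisfies all five heuristics and is labeled 1, while `if (i == n) { total += i; }` is labeled 0.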
In another embodiment of the present invention, the token sequences are converted into vectors as follows: the text is vectorized with word2vec, where a word vector model is produced by setting the feature vector dimension and word frequency parameters; a dictionary index and a word vector dictionary are built from the word vector model; and the classification model input data are obtained from these two dictionaries.
The token sequences obtained above are vectorized with word2vec, a tool widely used for text vectorization: a word vector model is produced by setting the feature vector dimension and word frequency parameters, and a dictionary index and a word vector dictionary are built from the model as input to the subsequent LSTM model. Referring to FIG. 3, different code segments contain different numbers of tokens, but the LSTM accepts only inputs of the same length, so the sequences must be padded or trimmed. Once the code segments have been vectorized and labeled, training of the LSTM network can begin. In addition to the essential Embedding layer and LSTM units, a Dropout layer is added to the model to prevent overfitting.
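The dictionary index and the pad-or-trim step can be sketched in plain Python. In practice the word vectors would come from a word2vec implementation (for example gensim's `Word2Vec`, whose `vector_size` and `min_count` parameters correspond to the feature dimension and word frequency settings mentioned above); the sketch covers only the index side. The conventions here are assumptions: index 0 is reserved for padding, 1 for unknown tokens, tokens are ordered by descending frequency with ties broken alphabetically, and long sequences are trimmed at the end and short ones padded at the end.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved indices (assumed convention)


def build_index(sequences: list[list[str]], min_count: int = 1) -> dict[str, int]:
    """Map each token seen at least min_count times to an integer index >= 2."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = sorted((t for t, c in counts.items() if c >= min_count),
                   key=lambda t: (-counts[t], t))
    return {tok: i + 2 for i, tok in enumerate(vocab)}


def encode(seq: list[str], index: dict[str, int], maxlen: int) -> list[int]:
    """Convert a token sequence to a fixed-length list of indices."""
    ids = [index.get(tok, UNK) for tok in seq][:maxlen]  # trim long segments
    return ids + [PAD] * (maxlen - len(ids))             # pad short segments
```

Row `i` of the Embedding layer's weight matrix would then hold the word2vec vector of the token with index `i`, so that the LSTM receives equal-length inputs regardless of segment size.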
The detection stage determines the type of a given unknown code segment and, if the segment is error handling code, outputs the file it belongs to and its position in that source file. Given an unknown project, detection proceeds as follows: 1. extract the error handling code segment structures in each source file of the project and record their file names and positions; 2. parse the code segments to obtain their code sequences; 3. vectorize the code sequences from the previous step according to the same rules, using the word2vec model trained earlier; 4. feed the resulting vectors to the trained LSTM network for classification.
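The four detection steps can be tied together as a small driver. This is a hypothetical sketch: `extract` stands in for the structure extraction of step 1 and must return (offset, segment) pairs, and `classify` stands in for steps 2 to 4 (parsing, word2vec vectorization, and the trained LSTM), returning 1 for error handling code. Both are passed in as functions because the trained models themselves are not part of this text.

```python
from collections.abc import Callable, Iterable


def detect(project: dict[str, str],
           extract: Callable[[str], Iterable[tuple[int, str]]],
           classify: Callable[[str], int]) -> list[tuple[str, int]]:
    """Return (file name, line number) for every segment classified as error handling."""
    hits = []
    for filename, source in sorted(project.items()):
        for offset, segment in extract(source):   # step 1: structures and positions
            if classify(segment) == 1:            # steps 2-4: parse, vectorize, judge
                line = source.count("\n", 0, offset) + 1
                hits.append((filename, line))
    return hits
```

The recorded (file, line) pairs are what the later instrumentation step needs in order to place source-level probes at the right locations.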
Further, in another embodiment of the present invention, when the path truncation flag and the flag check instruction are added, the path truncation flag is inserted into the source code and defined in the .bss section, and a path truncation flag check is performed at the entry of the original instrumentation.
Assembly code is embedded in the source code. A flag, continue_log, is set by marker code inserted into the source code; it cancels recording by the subsequent instrumentation. The flag is defined in the .bss section: because .bss holds uninitialized data and that memory is zeroed before each run, setting the flag to 1 can indicate that subsequent basic blocks are no longer recorded. At the entry of the original instrumentation, the continue_log flag is queried, and if it is 1, execution jumps to the return point, cancelling the record of the current basic block. Because the flag stays at 1 for the rest of the run, no subsequent basic block is recorded: if the original path is 1 → 2 → 3 → … and block 3 contains the marker, the path recorded in this run is 1 → 2. When the next round of testing starts, the flag is cleared and execution paths are recorded normally again.
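The effect of the continue_log flag can be illustrated with a small simulation. This sketch assumes an AFL-style edge map (block IDs are combined as `cur ^ prev`, with `prev = cur >> 1`) and a hypothetical map size; it models the flag as a local variable that starts at 0 on every run, as the zeroed .bss section does. The marked block sets the flag before its own record, which corresponds to the case in which the block's own instrumentation has been cancelled as described for the nop replacement.

```python
from collections import Counter

MAP_SIZE = 1 << 16  # AFL-style coverage map size (assumed here)


def run_once(path: list[int], block_ids: dict[int, int],
             marked_blocks: set[int]) -> tuple[list[int], Counter]:
    """Simulate one execution and return (recorded blocks, edge hit counts)."""
    continue_log = 0            # lives in .bss, so it is 0 at the start of every run
    prev = 0
    trace: Counter = Counter()
    recorded = []
    for block in path:
        if block in marked_blocks:
            continue_log = 1    # marker inserted in an identified low-frequency block
        # Check added at the entry of the original instrumentation.
        if continue_log:
            continue            # jump to the return point: nothing is recorded
        cur = block_ids[block]
        trace[(cur ^ prev) % MAP_SIZE] += 1
        prev = cur >> 1
        recorded.append(block)
    return recorded, trace
```

Running the path 1 → 2 → 3 → 4 → 5 with block 3 marked records only 1 → 2, while a later run that avoids block 3 records its whole path, because the flag starts from 0 again.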
The error handling code segments identified by the error code classification model trained earlier with the LSTM network are instrumented using an index built from the if-else structures and their positions in the source files. Instrumentation is placed both outside and inside the if structure. An if statement compiles to a conditional jump instruction, and the fuzzer decides where to instrument based on conditional jump instructions, so source instrumentation before the if statement, that is, outside the if structure, determines whether the subsequent basic blocks are instrumented. Source instrumentation before the first statement of the if structure, that is, inside it, determines how the subsequently instrumented basic blocks are recorded: the first statement begins the jump-target basic block, and although further conditional jumps may follow, it determines the recording results of all subsequent basic blocks traversed on that path. Three cases are distinguished:
In the first case, the error handling code is located in the if structure; it is then only necessary to instrument before the if statement and before the first statement inside the if structure.
In the second case, the error handling code is located in the else structure; instrumentation is then needed before the first statement in the else structure and before the adjacent if statement that precedes the else. If the preceding structure is an else if, it must first be converted into an if structure, and instrumentation is then placed before that if.
In the third case, the error handling code is located in an else if structure; after the same structure conversion, instrumentation is placed before the if statement and before the first statement in the else if structure. Instrumenting an else if structure breaks the original code structure, so the code must be repaired to ensure that it still compiles and runs correctly. The source code structure repair algorithm in the embodiment of the invention can be designed as follows:
(algorithm figure not reproduced)
This algorithm ensures that the program compiles and runs correctly.
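The repair algorithm itself is only available as a figure, so the following is a hypothetical sketch of one way to perform the else-if conversion described in the second and third cases, not the patent's algorithm. Each `else if (...) {...}` chain is rewritten as `else { PROBE if (...) {...} }`, which gives the nested if a position where a source-level probe can be placed while keeping the braces balanced. The sketch assumes braced C-like code with no brackets inside strings or comments, and `probe` is a placeholder for whatever instrumentation statement is inserted.

```python
import re


def match(src: str, idx: int, open_ch: str, close_ch: str) -> int:
    """Index of the bracket that balances src[idx] (no brackets in strings/comments)."""
    depth = 0
    for i in range(idx, len(src)):
        if src[i] == open_ch:
            depth += 1
        elif src[i] == close_ch:
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced brackets")


def chain_end(src: str, i: int) -> int:
    """Given the index of an 'if', return the index just past its whole if/else chain."""
    while True:
        i = match(src, src.index("(", i), "(", ")") + 1  # skip the condition
        i = match(src, src.index("{", i), "{", "}") + 1  # skip the braced body
        m = re.match(r"\s*else\b\s*", src[i:])
        if not m:
            return i
        j = i + m.end()
        if re.match(r"if\s*\(", src[j:]):
            i = j                                        # another else-if link
        else:
            return match(src, src.index("{", j), "{", "}") + 1  # final else block


def split_else_if(src: str, probe: str = "/* instrument */") -> str:
    """Rewrite every 'else if (...) {...}' as 'else { probe if (...) {...} }'."""
    while (m := re.search(r"\belse\s+if\s*\(", src)) is not None:
        if_idx = src.index("if", m.start() + 4)         # the 'if' after 'else'
        end = chain_end(src, if_idx)
        src = src[:m.start()] + "else { " + probe + " " + src[if_idx:end] + " }" + src[end:]
    return src
```

For example, `if (a) { x(); } else if (b) { y(); } else { z(); }` becomes `if (a) { x(); } else { /* instrument */ if (b) { y(); } else { z(); } }`, which has the same control flow as the original.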
For the program under test, the error handling code segments identified by the classification model trained earlier are instrumented both inside and outside the condition structure, based on the condition code structures and their position index in the source code data set. Preferably, where the path truncation flag takes effect after the instrumentation code, the original instrumentation for the current condition is cancelled and replaced with a null instruction, and a mark is added in the instruction comment. Preferably, because compiler optimization can assemble the same conditional statement differently in different source files, instrumentation is cancelled at every marked branch.
If the continue_log marker takes effect only after the instrumentation code of a basic block, that block is still recorded. Although subsequent code blocks are no longer recorded because the flag is already 1, the record of this block still counts as a new path, so the test sample is retained and the purpose of targeted testing is defeated. In this case the original instrumentation of the current conditional jump must be cancelled: a null instruction (nop) is instrumented instead and a mark is placed in the instruction comment, which leaves the program's normal execution flow unaffected. Since comments in assembly code are not removed when the source code is compiled to assembly, the comment mark can be used to make this decision. When the comment mark is encountered, an instrumentation flag is set; when the next conditional jump instruction is encountered, instrumentation is decided by this flag, which is then cleared; the process repeats until all code has been instrumented. This implements skipping instrumentation for the subsequent code block. Compiler optimization causes a further problem: the same if (i != 0) statement may be assembled as jz in one source file and jnz in another, while the original instrumentation only instruments at the negative jump. This case is handled by cancelling instrumentation at all marked branches; although one valid basic block record is lost, information about valid paths is not affected.
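The comment-mark pass can be sketched as a filter over assembly lines, loosely modelled on assembler-level instrumentation such as AFL's afl-as but not reproducing it. The mark text `# PRT_NOINSTR` and the default probe `call __instrument_block` are hypothetical placeholders (assuming the source marker is emitted into the .s file as an inline-assembly comment), and only the probe placed after conditional jumps is modelled. A mark arms the instrumentation flag, the next conditional jump receives a nop instead of the probe, and the flag is then cleared.

```python
import re

MARK = "# PRT_NOINSTR"                            # hypothetical comment emitted by the source marker
COND_JUMP = re.compile(r"^\s*j(?!mp\b)[a-z]+\s")  # jz, jnz, jne, jl, ... but not jmp


def instrument(asm_lines: list[str], probe: str = "\tcall __instrument_block") -> list[str]:
    """Insert a probe after each conditional jump, or a nop if a mark preceded it."""
    out = []
    skip_next = False                 # the instrumentation flag described above
    for line in asm_lines:
        out.append(line)
        if MARK in line:
            skip_next = True          # comment mark seen: arm the flag
        elif COND_JUMP.match(line):
            out.append("\tnop" if skip_next else probe)
            skip_next = False         # cleared after deciding one conditional jump
    return out
```

Given the lines `# PRT_NOINSTR`, `jne .L2`, `movl $1, %eax`, `je .L3`, the marked `jne` is followed by `nop` while the unmarked `je` receives the normal probe; unconditional `jmp` lines are left alone.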
Based on the above software fuzz testing method, an embodiment of the present invention further provides a software fuzz testing device based on path record truncation, which, as shown in FIG. 4, comprises a training module 101, a marking module 102, and a testing module 103, wherein:
the training module 101 is used to construct a project data set, extract condition code structures, convert the extracted structures into model input data, and use that data to train a classification model for low-frequency path transfer condition code structures, where the classification model uses an LSTM network structure;
the marking module 102 is used to add a path truncation flag and a flag check instruction to the instrumentation code of the fuzzer; and
the testing module 103 is used to extract condition code structures from the program under test, convert them into model input data, feed the data to the trained classification model to identify low-frequency path transfer condition code structures, perform source-level instrumentation at the corresponding positions in the source files, truncate path records according to the path truncation flag, and complete the source code fuzz test.
Unless specifically stated otherwise, the relative steps, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.
Based on the foregoing method, an embodiment of the present invention further provides a server, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.
Based on the above method, the embodiment of the present invention further provides a computer readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the above method.
The device provided by the embodiment of the present invention has the same implementation principle and technical effects as the method embodiments; for brevity, for anything not mentioned in the device embodiments, refer to the corresponding content of the method embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In all examples shown and described herein, any particular value should be construed as merely exemplary and not as a limitation; other examples of the exemplary embodiments may therefore have different values.
It should be noted that like reference numbers and letters refer to like items in the figures; thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and shall be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A software fuzz testing method based on path record truncation, characterized by comprising the following steps:
A) constructing a project data set, extracting condition code structures, converting the extracted condition code structures into model input data, and using that data to train a low-frequency path transfer condition code structure classification model, wherein the classification model adopts an LSTM network structure;
B) adding a path truncation mark and a mark check instruction to the instrumentation code of the fuzzer;
C) for the program under test, extracting condition code structures and converting them into model input data, inputting the processed data into the trained classification model to identify low-frequency path transfer condition code structures, performing source-code-level instrumentation at the corresponding positions in the source files, and performing path truncation according to the path truncation mark to complete source code fuzz testing.
2. The software fuzz testing method based on path record truncation according to claim 1, wherein extracting condition code structures in A) comprises: first defining a symbolic description for the source code data set and extracting all condition code structures in the data set; then parsing the extracted condition code structures and labeling the parsing results to obtain code token sequences; and converting the token sequences into vectors to obtain the input data of the classification model.
3. The software fuzz testing method based on path record truncation according to claim 2, wherein extracting all condition code structures in the data set in A) comprises: first preprocessing the source code data set to extract the effective code; then extracting a set of condition structures from the effective code, constructing a stack structure that records the positional correspondence between code segments and the source code data set, and iteratively processing nested condition structures according to the stack structure to obtain minimized code segments.
4. The software fuzz testing method based on path record truncation according to claim 2, wherein the code parsing in A) comprises: extracting code sequences using an abstract syntax tree and symbolizing them, grouping the entries in the code sequences by meaning to obtain synonym sets, and expanding the code sequences.
5. The software fuzz testing method based on path record truncation according to claim 2, wherein converting the token sequences into vectors in A) comprises: vectorizing the text with word2vec and setting the feature vector dimension and word frequency parameters to output a word vector model; obtaining a word index dictionary and a word vector dictionary from the word vector model; and obtaining the classification model input data from the word index dictionary and the word vector dictionary.
6. The software fuzz testing method based on path record truncation according to claim 1, wherein, when adding the path truncation mark and the mark check instruction in B), the path truncation mark is defined in the .bss section of the code by inserting it into the source code data set, and a path truncation mark check is performed at the original instrumentation entry.
7. The software fuzz testing method based on path record truncation according to claim 6, wherein in C), for the program under test, the error-handling code segments identified by the classification model during training are instrumented by internal and external insertion, using the condition code structures and their position indices in the source code data set.
8. The software fuzz testing method based on path record truncation according to claim 7, wherein in C), when the path truncation mark is located after the instrumentation code, the original instrumentation for the current condition is cancelled and replaced with a null (no-op) instruction, and an annotation mark is added in the instruction's comment field.
9. The software fuzz testing method based on path record truncation according to claim 7, wherein in C), to address compiler optimization of the same conditional statement in different source code data sets, instrumentation is cancelled at the marked branch.
10. A software fuzz testing device based on path record truncation, characterized by comprising a training module, a marking module, and a testing module, wherein
the training module is used for constructing a project data set, extracting condition code structures, converting the extracted condition code structures into model input data, and using that data to train a low-frequency path transfer condition code structure classification model, wherein the classification model adopts an LSTM network structure;
the marking module is used for adding a path truncation mark and a mark check instruction to the instrumentation code of the fuzzer;
and the testing module is used for extracting condition code structures from the program under test and converting them into model input data, inputting the processed data into the trained classification model to identify low-frequency path transfer condition code structures, performing source-code-level instrumentation at the corresponding positions in the source files, and performing path truncation according to the path truncation mark to complete source code fuzz testing.
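For claim 7, the source-level insertion of a truncation mark at branches classified as low-frequency (for example, error-handling code) might look like the sketch below. It assumes the condition's opening brace is on the same line as the condition, inserts a hypothetical statement `path_trunc_flag = 1;` as the first statement of the branch body, and processes lines bottom-up so earlier line numbers stay valid. Declaring `int path_trunc_flag;` at file scope would, with common toolchains, place the flag in `.bss` as claim 6 describes; the "internal and external insertion" modes of claim 7 are not modeled.

```python
from __future__ import annotations

MARK_STMT = "path_trunc_flag = 1;"  # hypothetical truncation-mark statement


def insert_marks(src: str, cond_lines: list[int]) -> str:
    """Insert MARK_STMT as the first statement of each branch body whose condition
    (with its opening brace) starts on one of cond_lines (1-based)."""
    lines = src.split("\n")
    for ln in sorted(set(cond_lines), reverse=True):  # bottom-up keeps indices valid
        text = lines[ln - 1]
        if "{" not in text:
            continue  # brace-less bodies would need rewriting first; skipped here
        indent = text[: len(text) - len(text.lstrip())]
        lines.insert(ln, indent + "    " + MARK_STMT)
    return "\n".join(lines)
```

The classifier's output supplies `cond_lines`; keeping the insertion at the source level lets the compiler place the store inside the branch's own basic block.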
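Claim 8 cancels an instrumentation call when the truncation mark follows it in the same block, replacing the call with a no-op and annotating it. The sketch below operates on x86 GNU assembler text in which each instrumented block contains AFL's `call __afl_maybe_log`. It treats any line mentioning the hypothetical `path_trunc_flag` symbol as the mark and any `label:` line as a block boundary. These matching rules are simplifications. Only the call line is replaced, so the register save and restore code that AFL emits around the call is left intact and the stack stays balanced.

```python
from __future__ import annotations

import re

INSTR_CALL = "call __afl_maybe_log"  # call emitted by AFL's assembly-level trampoline
MARK_SYMBOL = "path_trunc_flag"       # hypothetical truncation-mark variable
LABEL_RE = re.compile(r"^\s*[.\w$]+:\s*(#.*)?$")  # e.g. ".L3:" starts a new block


def cancel_instrumentation(asm: str) -> str:
    """Replace the instrumentation call with an annotated NOP in every block where
    the truncation mark appears after that call."""
    lines = asm.split("\n")
    last_call = None  # index of the instrumentation call in the current block
    for idx, line in enumerate(lines):
        if LABEL_RE.match(line):
            last_call = None  # block boundary
        elif INSTR_CALL in line:
            last_call = idx
        elif MARK_SYMBOL in line and last_call is not None:
            original = lines[last_call]
            indent = original[: len(original) - len(original.lstrip())]
            # x86 GAS treats '#' as a comment, so the annotation sits in the comment field.
            lines[last_call] = indent + "nop  # path truncation: instrumentation cancelled"
            last_call = None
    return "\n".join(lines)
```

Blocks without a mark keep their original call, so coverage for ordinary branches is unaffected.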
CN201910012433.1A 2019-01-07 2019-01-07 Software fuzzy test method and device based on path record truncation Active CN109885479B

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910012433.1A CN109885479B 2019-01-07 2019-01-07 Software fuzzy test method and device based on path record truncation

Publications (2)

Publication Number Publication Date
CN109885479A 2019-06-14
CN109885479B 2022-02-01

Family

ID=66925678

Country Status (1)

Country Link
CN (1): CN109885479B

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112306853B * 2019-08-01 2023-12-12 Shenzhen Tencent Computer Systems Co., Ltd. Fuzzy test method, device, equipment and medium
CN110851830B * 2019-10-24 2021-08-03 Information Engineering University of PLA Strategic Support Force Central Processing Unit (CPU)-oriented undisclosed instruction discovery method based on instruction format identification
CN111191245B * 2019-12-24 2022-06-17 Information Engineering University of PLA Strategic Support Force Fuzzy test method based on path perception mutation strategy
CN111563040B * 2020-05-08 2023-08-15 Industrial and Commercial Bank of China Ltd. Blockchain smart contract code testing method and device
CN111913878B * 2020-07-13 2023-09-15 Suzhou Dongchayun Information Technology Co., Ltd. Byte code instrumentation method, device and storage medium based on program analysis result
CN112069061B * 2020-08-19 2021-08-20 University of Science and Technology Beijing Software security vulnerability detection method and system for deep learning gradient guidance variation
CN114546816A * 2020-11-25 2022-05-27 Tencent Technology (Shenzhen) Co., Ltd. Test method, test platform, test device, electronic equipment and storage medium
CN112905493B * 2021-04-07 2023-07-18 Nanjing University Structured fuzzy test method based on conversion test
CN113434386B * 2021-05-26 2022-10-04 Shenzhen Kaiyuan Internet Security Technology Co., Ltd. Method, system and storage medium for fuzz testing
CN113688036A * 2021-08-13 2021-11-23 Beijing Lynxi Technology Co., Ltd. Data processing method, device, equipment and storage medium
CN114064506B * 2021-11-29 2023-04-04 University of Electronic Science and Technology of China Binary program fuzzy test method and system based on deep neural network
CN114491424B * 2021-12-31 2024-05-03 Xidian University Binary code clipping method based on fuzzy test

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102385550A * 2010-08-30 2012-03-21 Beijing Institute of Technology Detection method for software vulnerability
CN109032942A * 2018-07-24 2018-12-18 Beijing Institute of Technology AFL-based fuzz testing framework
CN109117367A * 2018-07-24 2019-01-01 Beijing Institute of Technology Method and apparatus for determining fuzz testing mutation quantity

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
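RankFuzz: Fuzz Testing Based on Comprehensive Evaluation; Cheng Li et al.; 2012 Fourth International Conference on Multimedia Information Networking and Security; 20121231; full text *
S2F: Discover Hard-to-Reach Vulnerabilities by Semi-Symbolic Fuzz Testing; Bin Zhang et al.; 2017 13th International Conference on Computational Intelligence and Security; 20171231; full text *
Fuzzy comprehensive evaluation of software quality based on functional testing; Wang Yunjun et al.; Electronic Engineer; 20061031; full text *
Intelligent fuzzing method guided by exception distribution; Ouyang Yongji et al.; Journal of Electronics & Information Technology; 20150131; full text *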

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant