CN116108449A - Software fuzzy test method, device, equipment and storage medium - Google Patents
- Publication number
- CN116108449A (application CN202310067956.2A)
- Authority
- CN
- China
- Prior art keywords
- field
- software
- program
- corresponding relation
- test
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
- G06F21/577—Assessing vulnerabilities and evaluating computer system security
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The application provides a software fuzz testing method, apparatus, device and storage medium. The method comprises: inputting an initial seed file into the software under test, and acquiring the correspondences among the input bytes, binary program instructions and program basic blocks produced during the test; merging minimal runs of consecutive input bytes into fields with a minimum clustering algorithm to obtain field boundary information; determining the correspondence between the program basic blocks and the fields; inputting the program basic block information corresponding to each field into a pre-trained neural network model to determine the field type; recording the field types in a file of a preset format to obtain a format template file; and using a fuzz testing tool to fuzz test the software based on the format template file, recording the mutation execution results, and adaptively optimizing the fuzz test accordingly, thereby improving software testing efficiency.
Description
Technical Field
The present disclosure relates to the field of computer and software technologies, and in particular to a software fuzz testing method, apparatus, device and storage medium.
Background
As information technology continues to spread into every aspect of social production and daily life, the requirements placed on software security keep rising. Discovering software security vulnerabilities in advance and repairing them in a targeted way is of great significance for maintaining social order and stable development. Software fuzzing is one of the effective methods of discovering software vulnerabilities.
In the prior art, black-box fuzzing is a common software fuzz testing method. In black-box fuzzing, an initial seed file is input into a program, and testers must analyze and test according to the program's output and crash state.
However, the inventors found that the prior art has at least the following technical problem: black-box fuzzing requires a great deal of manpower and expert knowledge, and its test efficiency is low.
Disclosure of Invention
The application provides a software fuzz testing method, apparatus, device and storage medium to solve the problem of low test efficiency.
In a first aspect, the present application provides a software fuzz testing method, including:
inputting an initial seed file into the software under test, and acquiring the input bytes used and the binary program instructions executed during the test;
acquiring a first correspondence between the binary program instructions and program basic blocks;
determining a second correspondence between the input bytes and the binary program instructions by dynamic taint analysis;
determining a third correspondence between the input bytes and the program basic blocks according to the first and second correspondences;
according to the third correspondence, merging minimal runs of consecutive input bytes into fields with a minimum clustering algorithm to obtain field boundary information;
determining a fourth correspondence between the program basic blocks and the fields according to the third correspondence and the field boundary information;
according to the fourth correspondence, inputting the program basic block information corresponding to each field into a pre-trained neural network model to determine the field type of the field;
recording the field types in a file of a preset format according to a format information model to obtain a format template file corresponding to the initial seed file;
and fuzz testing the software based on the format template file with a fuzz testing tool, recording mutation execution results, and adaptively optimizing the software fuzz test according to the mutation execution results.
In one possible design, inputting the program basic block information corresponding to the field into the pre-trained neural network model according to the fourth correspondence and determining the field type includes: obtaining the program basic blocks corresponding to the field according to the fourth correspondence; vectorizing those program basic blocks to obtain vectorization information of the binary program basic blocks corresponding to the field; and inputting that vectorization information into the pre-trained neural network model to obtain the field type of the field.
In one possible design, fuzz testing the software based on the format template file with a fuzz testing tool, recording mutation execution results, and adaptively optimizing the software fuzz test according to those results includes: using the fuzz testing tool to input the initial seed file and its corresponding format template file into the software under test, and mutating the initial seed file based on the format information to generate new test cases; determining whether the code coverage increment of a test case during fuzzing is greater than or equal to a first preset value and, if so, re-extracting the format template file of that test case; and recording, during fuzzing, the number of mutation executions of the format template file corresponding to each test case, calculating the average number of mutation executions, and, when the code coverage growth rate of the format template files corresponding to the test cases is below a second preset value, mutating and executing again the test cases whose format templates have fewer mutation executions than the average.
In one possible design, determining the second correspondence between the input bytes and the binary program instructions by dynamic taint analysis includes: performing dynamic taint analysis on the input bytes and the binary program instructions with a dynamic binary instrumentation tool, and taking the resulting correspondence between input byte offsets and the registers or memory accessed by the binary program instructions as the second correspondence.
In one possible design, before the program basic block information corresponding to the field is input into the pre-trained neural network model to determine the field type, the method further includes: acquiring field boundary information of a sample file and the field types of the sample fields; obtaining the program basic block information corresponding to the sample fields according to the field boundary information, and vectorizing it; and training a neural network model on the field types of the sample fields and the vectorized program basic block information to obtain the pre-trained neural network model.
In a second aspect, the present application provides a software fuzz testing apparatus, including:
a first acquisition module, configured to input the initial seed file into the software under test and acquire the input bytes used and the binary program instructions executed during the test;
a second acquisition module, configured to acquire a first correspondence between the binary program instructions and program basic blocks;
a first determining module, configured to determine a second correspondence between the input bytes and the binary program instructions by dynamic taint analysis;
a second determining module, configured to determine a third correspondence between the input bytes and the program basic blocks according to the first and second correspondences;
a clustering module, configured to merge, according to the third correspondence, minimal runs of consecutive input bytes into fields with a minimum clustering algorithm to obtain field boundary information;
a third determining module, configured to determine a fourth correspondence between the program basic blocks and the fields according to the third correspondence and the field boundary information;
a fourth determining module, configured to input, according to the fourth correspondence, the program basic block information corresponding to each field into the pre-trained neural network model to determine the field type of the field;
a recording module, configured to record the field types in a file of a preset format according to the format information model to obtain a format template file corresponding to the initial seed file;
and a testing module, configured to fuzz test the software based on the format template file with a fuzz testing tool, record mutation execution results, and adaptively optimize the software fuzz test according to those results.
In one possible design, the fourth determining module is configured to: obtain the program basic blocks corresponding to the field according to the fourth correspondence; vectorize them to obtain vectorization information of the binary program basic blocks corresponding to the field; and input that vectorization information into the pre-trained neural network model to obtain the field type of the field.
In a third aspect, the present application provides a computer device comprising at least one processor and a memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the software fuzz testing method of the first aspect and its various possible designs.
In a fourth aspect, the present application provides a computer storage medium storing computer-executable instructions which, when executed by a processor, implement the software fuzz testing method of the first aspect and its various possible designs.
In a fifth aspect, embodiments of the present application provide a computer program product, including a computer program which, when executed by a processor, implements the software fuzz testing method of the first aspect and its various possible designs.
In the software fuzz testing method, apparatus, device and storage medium provided by the application, dynamic taint analysis yields the correspondence between the input bytes and the binary program instructions produced during testing; the input bytes are merged into fields by a minimum clustering algorithm to obtain field boundary information; the program basic block information corresponding to each field is input into a neural network to obtain the field type; the field types are recorded in a format template file of a preset format; a fuzz testing tool fuzzes the software based on the format template file; and adaptive optimization is performed according to the mutation execution results, improving software testing efficiency.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an application scenario of a software fuzz testing method provided in an embodiment of the present application;
Fig. 2 is a flowchart of a software fuzz testing method according to one embodiment of the present application;
Fig. 3 is a flowchart of a software fuzz testing method according to another embodiment of the present application;
Fig. 4 is a flowchart of a software fuzz testing method according to another embodiment of the present application;
Fig. 5 is a schematic structural diagram of a software fuzz testing apparatus according to an embodiment of the present application;
Fig. 6 is a schematic diagram of the hardware structure of a computer device according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions are described below clearly and completely with reference to the drawings. The described embodiments are some, but not all, embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without inventive effort fall within the scope of the present application.
To address the low test efficiency of the prior art, the embodiments of the present application provide the following technical solution: an initial seed file is input into the software under test; the input bytes used during testing are merged into fields to obtain field boundaries; the field types are analyzed and a format template file is generated; the software is fuzz tested based on the different format templates; and the number of mutation executions of each format template file is recorded. The embodiments are described in detail below.
Fig. 1 is a schematic diagram of an application scenario of a software fuzz testing method provided in an embodiment of the present application. As shown in Fig. 1, the computer device 101 inputs an initial seed file into the software under test, performs the software test, and sends the test result to the display terminal 102 for display.
The technical solution of the present application, and how it solves the above technical problem, are described in detail below with specific embodiments. The following embodiments may be combined with each other, and identical or similar concepts or processes may not be repeated in some embodiments. The embodiments are described below with reference to the drawings.
Fig. 2 is a schematic flowchart of a software fuzz testing method provided in an embodiment of the present application. The execution body of this embodiment may be the computer device of the embodiment shown in Fig. 1 or any other computer processing device; this embodiment is not limited in this respect. As shown in Fig. 2, the method includes:
S201: The initial seed file is input into the software under test, and the input bytes used and the binary program instructions executed during the test are acquired.
Here, the initial seed file refers broadly to any type of input, including but not limited to files in a file system, command-line input, and network message input.
Specifically, the initial seed file is input into the software program under test, which starts running; the input bytes of the initial seed file that are used are acquired, together with the binary program instructions executed while the software runs.
S202: A first correspondence between the binary program instructions and the program basic blocks is acquired.
Specifically, the open-source static analysis tool Angr is used to analyze the program's control flow graph and obtain the first correspondence between the binary program instructions and the program basic blocks.
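Once the basic-block boundaries are known, the first correspondence reduces to an address lookup. The following is a minimal sketch of that lookup, assuming blocks are given as hypothetical (start address, size) pairs, for example taken from the control flow graph of an Angr `CFGFast` analysis (not shown). The function name and addresses are illustrative assumptions, not part of the described method.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical input: basic blocks as (start_address, size_in_bytes) pairs,
# assumed not to overlap (e.g. recovered from a control flow graph).
Block = Tuple[int, int]


def map_instructions_to_blocks(
    blocks: Iterable[Block], instruction_addrs: Iterable[int]
) -> Dict[int, Optional[int]]:
    """Return {instruction address: start address of its basic block}.

    Instructions that fall outside every known block map to None.
    """
    ordered: List[Block] = sorted(blocks)
    starts = [start for start, _ in ordered]
    mapping: Dict[int, Optional[int]] = {}
    for addr in instruction_addrs:
        # Index of the last block whose start address is <= addr.
        i = bisect_right(starts, addr) - 1
        if i >= 0 and addr < ordered[i][0] + ordered[i][1]:
            mapping[addr] = ordered[i][0]
        else:
            mapping[addr] = None
    return mapping
```

For example, with blocks `[(0x1000, 0x10), (0x1010, 0x8)]`, the executed instruction at `0x1012` maps to the block starting at `0x1010`, while `0x2000` maps to `None`. Sorting once and using binary search keeps each lookup logarithmic in the number of blocks.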
S203: A second correspondence between the input bytes and the binary program instructions is determined by dynamic taint analysis.
Specifically, a dynamic binary instrumentation tool performs dynamic taint analysis on the input bytes and the binary program instructions, yielding the correspondence between input byte offsets and the registers or memory accessed by the binary program instructions; this is taken as the second correspondence.
In this embodiment, the dynamic binary instrumentation tool may be Intel Pin, with the taint analysis implemented as a Pintool.
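Real taint tracking runs inside the instrumentation tool, but the propagation rule itself is simple. The pure-Python sketch below illustrates it over a hypothetical, pre-recorded instruction trace. Each written location inherits the union of its source locations' taint, plus the input offset if the instruction loads an input byte directly. A location overwritten with untainted data loses its taint. `TraceStep`, `propagate_taint` and the register names are illustrative assumptions. Real shadow memory also handles byte granularity, flags and address taint, and this sketch ignores all of these.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class TraceStep:
    """One executed instruction in a hypothetical recorded trace."""
    addr: int                                       # instruction address
    dst: Optional[str]                              # register/memory written, if any
    srcs: List[str] = field(default_factory=list)   # registers/memory read
    input_offset: Optional[int] = None              # input byte loaded directly, if any


def propagate_taint(trace: List[TraceStep]) -> Dict[int, Set[int]]:
    """Return {instruction address: input byte offsets whose taint reaches it}."""
    shadow: Dict[str, Set[int]] = {}   # location -> tainting input offsets
    touched: Dict[int, Set[int]] = {}
    for step in trace:
        taint: Set[int] = set()
        for src in step.srcs:
            taint |= shadow.get(src, set())
        if step.input_offset is not None:
            taint.add(step.input_offset)
        if taint:
            touched.setdefault(step.addr, set()).update(taint)
        if step.dst is not None:
            if taint:
                shadow[step.dst] = set(taint)
            else:
                shadow.pop(step.dst, None)  # overwritten with untainted data
    return touched
```

Inverting the returned mapping gives, for each input byte offset, the instructions it influences, which is the form of the second correspondence described above.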
S204: A third correspondence between the input bytes and the program basic blocks is determined according to the first and second correspondences.
Specifically, the third correspondence is obtained by composing the correspondence between input bytes and binary program instructions with the correspondence between binary program instructions and program basic blocks.
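The composition in S204 is a join of two mappings through the instruction address. A minimal sketch follows, assuming the second correspondence is given as {instruction address: input offsets} and the first as {instruction address: block start}. Both representations and the function name are assumptions made for illustration.

```python
from typing import Dict, Optional, Set


def bytes_to_blocks(
    instr_to_offsets: Dict[int, Set[int]],
    instr_to_block: Dict[int, Optional[int]],
) -> Dict[int, Set[int]]:
    """Compose the relations into {input byte offset: set of basic-block starts}."""
    result: Dict[int, Set[int]] = {}
    for instr, offsets in instr_to_offsets.items():
        block = instr_to_block.get(instr)
        if block is None:
            continue  # instruction outside any known basic block
        for off in offsets:
            result.setdefault(off, set()).add(block)
    return result
```

A byte may map to several blocks. That multiplicity is exactly what the field-clustering step later uses to detect overlapping fields.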
S205: According to the third correspondence, minimal runs of consecutive input bytes are merged into fields with a minimum clustering algorithm to obtain field boundary information.
Specifically, the input bytes used by the same basic block are grouped with the minimum clustering algorithm, and each run of consecutive offsets is taken as a field. If fields from different basic blocks overlap, the smallest unit within the overlapping fields is taken as the field, which yields the field boundary information.
The field boundary information consists of the field's offset within the input bytes and the field's length.
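The text does not define the minimum clustering algorithm precisely. One plausible reading is sketched below under that assumption, and it proceeds in three steps:

1. Collect the offsets each basic block uses.
2. Merge consecutive offsets into runs.
3. Cut every run at every other run's boundaries, so that overlapping runs from different blocks break into their smallest common pieces.

`infer_fields` is a hypothetical name. It returns (offset, length) pairs, matching the field boundary information described above.

```python
from typing import Dict, List, Set, Tuple


def infer_fields(byte_to_blocks: Dict[int, Set[int]]) -> List[Tuple[int, int]]:
    """Return field boundaries as sorted (offset, length) pairs."""
    # 1. Invert: which input offsets does each basic block use?
    block_to_offsets: Dict[int, Set[int]] = {}
    for off, blocks in byte_to_blocks.items():
        for b in blocks:
            block_to_offsets.setdefault(b, set()).add(off)

    # 2. Within each block, merge consecutive offsets into half-open runs [start, end).
    runs: Set[Tuple[int, int]] = set()
    for offsets in block_to_offsets.values():
        ordered = sorted(offsets)
        start = prev = ordered[0]
        for off in ordered[1:]:
            if off != prev + 1:
                runs.add((start, prev + 1))
                start = off
            prev = off
        runs.add((start, prev + 1))

    # 3. Cut each run at every boundary of every other run that falls inside it,
    #    so overlapping runs from different blocks split into minimal units.
    cuts = sorted({p for run in runs for p in run})
    fields: Set[Tuple[int, int]] = set()
    for start, end in runs:
        inner = [start] + [c for c in cuts if start < c < end] + [end]
        for lo, hi in zip(inner, inner[1:]):
            fields.add((lo, hi - lo))
    return sorted(fields)
```

For instance, if block A reads bytes 0 to 3 and block B reads bytes 2 to 5, the shared bytes 2 to 3 become their own two-byte field. The result is `(0, 2)`, `(2, 2)` and `(4, 2)`.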
S206: A fourth correspondence between the program basic blocks and the fields is determined according to the third correspondence and the field boundary information.
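Given the byte-to-block relation and the field boundaries, the fourth correspondence can be read off by collecting the blocks of every byte inside each field. A minimal sketch follows, assuming the representations used in the previous steps. The function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def fields_to_blocks(
    fields: List[Tuple[int, int]],
    byte_to_blocks: Dict[int, Set[int]],
) -> Dict[Tuple[int, int], Set[int]]:
    """Return {(offset, length): basic blocks that use any byte of the field}."""
    mapping: Dict[Tuple[int, int], Set[int]] = {}
    for offset, length in fields:
        blocks: Set[int] = set()
        for off in range(offset, offset + length):
            blocks |= byte_to_blocks.get(off, set())
        mapping[(offset, length)] = blocks
    return mapping
```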
S207: According to the fourth correspondence, the program basic block information corresponding to each field is input into the pre-trained neural network model to determine the field type of the field.
The field type is one or more of the following: length, enumeration, magic number, string, checksum, and offset. A length field gives a number of data bytes or an array length. An enumeration field can take only certain specific values. A magic number is a hard-coded special byte sequence, such as a file header. A string is an encoded character sequence, such as ASCII or Unicode. A checksum is a special field used to verify the integrity of other bytes in the input. An offset field indicates the position of another specific part of the input.
Specifically, the program basic blocks corresponding to the field are obtained according to the fourth correspondence. They are vectorized to obtain vectorization information of the binary program basic blocks corresponding to the field, and this vectorization information is input into the pre-trained neural network model to obtain the field type.
In this embodiment, the binary program instructions of the program basic blocks corresponding to the field may be obtained according to the second correspondence. These instructions are vectorized by one-hot encoding using the open-source projects VEX and Keras, which gives vectorization information of the binary program instructions corresponding to the field. The instruction vectors are then aggregated to obtain vectorization information of the binary program basic blocks corresponding to the field, and this is input into the pre-trained neural network model to obtain the field type.
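The vectorization step can be sketched without external libraries. In this sketch, each instruction is reduced to a token, one-hot encoded against a vocabulary, and the instruction vectors of a block are aggregated. Several parts are assumptions:

- The token strings are illustrative VEX-style names.
- Mean aggregation is assumed, because the text only says "integrated".
- The plain one-hot helper stands in for something like Keras's `to_categorical`.

The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def build_vocab(blocks: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign each distinct instruction token its one-hot index."""
    vocab: Dict[str, int] = {}
    for block in blocks:
        for tok in block:
            vocab.setdefault(tok, len(vocab))
    return vocab


def one_hot(tok: str, vocab: Dict[str, int]) -> List[float]:
    """One-hot vector for a token; unknown tokens become the all-zero vector."""
    vec = [0.0] * len(vocab)
    if tok in vocab:
        vec[vocab[tok]] = 1.0
    return vec


def vectorize_block(block: Sequence[str], vocab: Dict[str, int]) -> List[float]:
    """Aggregate per-instruction one-hot vectors into one block vector (mean)."""
    total = [0.0] * len(vocab)
    for tok in block:
        for i, v in enumerate(one_hot(tok, vocab)):
            total[i] += v
    n = max(len(block), 1)  # avoid division by zero for an empty block
    return [v / n for v in total]
```

With the illustrative blocks `["Iop_Add32", "Iop_CmpEQ32"]` and `["Iop_CmpEQ32", "Ist_Store", "Ist_Store"]`, the first block vectorizes to `[0.5, 0.5, 0.0]`.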
S208: The field types are recorded in a file of a preset format according to the format information model to obtain a format template file corresponding to the initial seed file.
In this embodiment, the preset file format may be a pit file, which records the offset, length and field type of each input field.
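Peach-style pit files are XML. As a hedged illustration only, the sketch below writes the recorded offset, length and type of each field into a simplified, pit-like XML document using the standard library. The `FormatTemplate` and `Field` element names and their attributes are assumptions, not the actual Peach Pit schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# (offset, length, field_type) triples produced by the earlier steps.
FieldRecord = Tuple[int, int, str]


def write_format_template(fields: List[FieldRecord], name: str = "SeedFormat") -> str:
    """Serialize inferred fields into a simplified, pit-like XML template.

    Element and attribute names are illustrative; this is not a validated
    Peach Pit document.
    """
    root = ET.Element("FormatTemplate", {"name": name})
    for offset, length, ftype in sorted(fields):
        ET.SubElement(root, "Field", {
            "offset": str(offset),
            "length": str(length),
            "type": ftype,
        })
    return ET.tostring(root, encoding="unicode")
```

Sorting by offset keeps the template in input order, so a mutator can walk the fields sequentially.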
S209: A fuzz testing tool is used to fuzz test the software based on the format template file, the mutation execution results are recorded, and the software fuzz test is adaptively optimized according to those results.
Specifically, a fuzz testing tool fuzzes the software based on the format template file. The input bytes of the initial seed file are mutated to obtain new test cases, format template extraction is performed again on test cases that satisfy preset conditions, and the mutation test results are recorded and used for adaptive optimization.
In summary, in the software fuzz testing method provided by this embodiment, dynamic taint analysis yields the correspondence between the input bytes and the binary program instructions produced during testing. The input bytes are merged into fields by the minimum clustering algorithm to obtain field boundary information, and the program basic block information corresponding to each field is input into the neural network to obtain the field type. The field types are recorded in a format template file of a preset format, a fuzz testing tool fuzzes the software based on that file, and adaptive optimization is performed according to the mutation execution results, improving software testing efficiency.
Fig. 3 is a flowchart of a software fuzz testing method according to another embodiment of the present application. This embodiment describes S209 in detail on the basis of the embodiment of Fig. 2. As shown in Fig. 3, the method includes:
S301: A fuzz testing tool inputs the initial seed file and its corresponding format template file into the software under test, and the initial seed file is mutated based on the format information to generate new test cases.
The format information is the information recorded in the format template. It includes the field boundary information (the offset and length of each field within the input bytes) and the corresponding field type.
Specifically, the input bytes of the initial seed file are mutated based on the format information to generate new test cases. Different targeted mutation strategies are used for different field types, which markedly increases the probability of generating valid input and improves software testing efficiency.
In this embodiment, the fuzz testing tool may be AFL. The targeted mutation strategy may, for example, set a length field to specific integer values or an enumeration field to specific enumeration values.
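A minimal sketch of type-aware mutation follows. It makes several assumptions:

- Fields are (offset, length, type) triples.
- Length fields are little-endian unsigned integers.
- The caller supplies the known values of each enumeration field.
- Magic numbers and checksums are left untouched. A real implementation would recompute checksums after mutating other fields, which is not shown.

The boundary values in `LENGTH_CANDIDATES`, the `mutate` function and its parameters are hypothetical. AFL's own mutators are not used here.

```python
import random
from typing import Dict, List, Sequence, Tuple

# (offset, length, field_type) triples; type names follow the list in S207.
FieldRecord = Tuple[int, int, str]

# Hypothetical boundary values tried for length-typed fields.
LENGTH_CANDIDATES = [0, 1, 0x7F, 0x80, 0xFF, 0x100, 0x7FFF, 0xFFFF, 0xFFFFFFFF]

# Field types this sketch never mutates.
PRESERVED_TYPES = {"magic", "checksum"}


def mutate(seed: bytes, fields: Sequence[FieldRecord],
           enum_values: Dict[int, List[bytes]], rng: random.Random) -> bytes:
    """Return one mutated test case derived from seed.

    enum_values maps the offset of an enumeration field to its known values.
    """
    mutable = [f for f in fields if f[2] not in PRESERVED_TYPES]
    if not mutable:
        return seed
    data = bytearray(seed)
    offset, length, ftype = rng.choice(mutable)
    if ftype == "length":
        # Truncate the candidate to the field width, then store it little-endian.
        value = rng.choice(LENGTH_CANDIDATES) % (1 << (8 * length))
        data[offset:offset + length] = value.to_bytes(length, "little")
    elif ftype == "enumeration" and enum_values.get(offset):
        # Swap in another known value, truncated or zero-padded to the field width.
        value_bytes = rng.choice(enum_values[offset])[:length].ljust(length, b"\x00")
        data[offset:offset + length] = value_bytes
    else:
        # Strings, offsets and enumerations without known values: random bytes.
        for i in range(offset, offset + length):
            data[i] = rng.randrange(256)
    return bytes(data)
```

Because only one field changes per call and its width is preserved, every test case keeps the seed's overall layout. This is what lets structured inputs pass the program's early format checks more often than blind byte flips do.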
S302: It is determined whether the code coverage increment of a test case during fuzzing is greater than or equal to a first preset value. If it is, the format template file of that test case is re-extracted.
Specifically, the code coverage of each test case is obtained with the fuzz testing tool's built-in facilities, and a function added to the fuzz testing tool in advance determines whether the coverage increment is greater than or equal to the first preset value. If so, the format template file of the test case is extracted again.
For example, code coverage is expressed as a fraction no greater than 1, such as 80% or 90%.
S303: The number of mutation executions of the format template file corresponding to each test case is recorded during fuzzing, and the average is calculated. When the code coverage growth rate of the format template files corresponding to the test cases is below a second preset value, the test cases whose format templates have fewer mutation executions than the average are mutated and executed again.
Specifically, the test cases are run with the testing tool, and a function added to the fuzz testing tool in advance determines whether the code coverage growth rate of the corresponding format template files is below the second preset value. If it is, the input bytes of the test cases whose format templates have fewer mutation executions than the average are mutated again to obtain new test cases, which are then fuzz tested.
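The two decisions in S302 and S303 can be sketched as a small scheduler. It tracks mutation executions per format template, flags a test case for template re-extraction when its coverage increment reaches the first threshold, and, once coverage growth stalls below the second threshold, returns the templates mutated fewer times than average. `TemplateScheduler`, its methods and both threshold values are hypothetical, since the text does not give the preset values. Coverage is treated as a fraction in [0, 1].

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TemplateScheduler:
    """Sketch of the adaptive energy redistribution in S302 and S303.

    first_threshold: coverage increment that triggers template re-extraction
        (stands in for the 'first preset value').
    second_threshold: coverage growth rate below which energy is
        redistributed (stands in for the 'second preset value').
    """
    first_threshold: float
    second_threshold: float
    mutation_counts: Dict[str, int] = field(default_factory=dict)

    def record_execution(self, template_id: str) -> None:
        # Count one mutation execution for the given format template.
        self.mutation_counts[template_id] = self.mutation_counts.get(template_id, 0) + 1

    def should_reextract(self, coverage_increment: float) -> bool:
        # S302: a test case that adds enough new coverage gets its own template.
        return coverage_increment >= self.first_threshold

    def starved_templates(self, coverage_growth_rate: float) -> List[str]:
        # S303: when coverage growth stalls, return the templates mutated fewer
        # times than the average so they can be scheduled again.
        if coverage_growth_rate >= self.second_threshold or not self.mutation_counts:
            return []
        mean = sum(self.mutation_counts.values()) / len(self.mutation_counts)
        return sorted(t for t, n in self.mutation_counts.items() if n < mean)
```

For example, if templates `a`, `b` and `c` have been mutated 10, 2 and 6 times (mean 6), a stalled growth rate selects only `b` for further mutation.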
In summary, according to the software fuzzy testing method provided by the embodiment, the software to be tested is tested based on the initial seed file by adopting the improved fuzzy testing tool, and the testing energy is secondarily distributed according to the increment amount and the increment speed of the code coverage rate corresponding to the test case, so that the efficiency of software testing is further improved.
Fig. 4 is a flowchart of a software ambiguity test method according to another embodiment of the present application. The embodiment of the present application is based on the embodiment provided in fig. 2, and the training model is described in detail before S206. As shown in fig. 4, the method includes:
S401: Acquire field boundary information of a sample file and the field types of its sample fields.
Sample files include, but are not limited to, files of known format in a file system, command-line input, network message input, etc.
Specifically, the field boundary information and field types of the sample file are extracted automatically using the AutoIt scripting language and the 010 Editor software, and are sent to a display terminal for a tester to inspect and correct.
S402: Obtain the program basic block information corresponding to the sample fields according to the field boundary information, and vectorize the program basic block information of the sample.
Specifically, a first correspondence between binary program instructions and program basic blocks is obtained; a second correspondence between the input bytes contained in the sample fields and the binary program instructions is determined by dynamic taint analysis; and a third correspondence between the input bytes and the program basic blocks is determined from the first and second correspondences. From this, the program basic block information corresponding to each sample field is obtained and vectorized, yielding vectorized binary program basic block information.
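Deriving the third correspondence amounts to composing two mappings: byte offset to instructions (second) and instruction to basic block (first). The minimal Python sketch below assumes instruction addresses are integers, basic blocks are string IDs and field boundaries are half-open byte ranges; the function names `compose_byte_to_blocks` and `blocks_for_field` are hypothetical and only illustrate the data flow described above.

```python
from collections import defaultdict


def compose_byte_to_blocks(insn_to_block, byte_to_insns):
    """Third correspondence: input byte offset -> set of program basic blocks.

    insn_to_block: first correspondence, instruction address -> basic block ID.
    byte_to_insns: second correspondence, byte offset -> instruction addresses
        that used that byte (as produced by dynamic taint analysis).
    """
    byte_to_blocks = defaultdict(set)
    for offset, insns in byte_to_insns.items():
        for addr in insns:
            block = insn_to_block.get(addr)
            if block is not None:  # ignore instructions outside mapped blocks
                byte_to_blocks[offset].add(block)
    return dict(byte_to_blocks)


def blocks_for_field(byte_to_blocks, start, end):
    """Basic blocks touched by any byte of the field [start, end)."""
    blocks = set()
    for offset in range(start, end):
        blocks |= byte_to_blocks.get(offset, set())
    return blocks


first = {0x10: "B1", 0x14: "B1", 0x20: "B2"}      # hypothetical addresses
second = {0: {0x10}, 1: {0x14, 0x20}, 2: {0x99}}   # 0x99 is unmapped
third = compose_byte_to_blocks(first, second)
print(sorted(blocks_for_field(third, 0, 2)))  # ['B1', 'B2']
```

Bytes whose instructions fall outside every mapped basic block simply get no entry, which keeps the later per-field lookup well defined.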
S403: Train a neural network model on the field types of the sample fields and the vectorized program basic block information to obtain a pre-trained neural network model.
Specifically, the field types of the sample fields and the vectorized program basic block information are input into the neural network model for training, yielding the pre-trained neural network model.
The neural network model may be, for example, a convolutional neural network model.
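The text names a convolutional neural network but does not give its architecture. To show only the training data flow of S403 (vectorized basic block information in, field type out), the sketch below swaps in a much simpler model: a single-layer softmax classifier trained by per-sample gradient descent on cross-entropy loss, in pure Python. It is not a CNN, and the integer field-type labels, learning rate, epoch count and training data are hypothetical.

```python
import math


def softmax(logits):
    top = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_softmax(samples, labels, n_classes, epochs=200, lr=0.5):
    """Single-layer softmax classifier trained with per-sample gradient descent.

    samples: equal-length feature vectors (vectorized basic block information).
    labels: integer field-type IDs in range(n_classes) (hypothetical encoding).
    Returns weights W (n_classes x dim) and biases b.
    """
    dim = len(samples[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            logits = [sum(w * xi for w, xi in zip(W[k], x)) + b[k]
                      for k in range(n_classes)]
            probs = softmax(logits)
            for k in range(n_classes):
                # Gradient of cross-entropy loss w.r.t. logit k: p_k - 1[k == y].
                grad = probs[k] - (1.0 if k == y else 0.0)
                for j in range(dim):
                    W[k][j] -= lr * grad * x[j]
                b[k] -= lr * grad
    return W, b


def predict(W, b, x):
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical data: three field types (0, 1, 2) with clearly separated vectors.
X = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0],
     [0.0, 0.8, 0.2], [0.0, 0.0, 1.0], [0.1, 0.0, 0.9]]
y = [0, 0, 1, 1, 2, 2]
W, b = train_softmax(X, y, n_classes=3)
print([predict(W, b, x) for x in X])  # [0, 0, 1, 1, 2, 2]
```

A real deployment would replace this model with the convolutional network mentioned above; the interface (fixed-length vector in, field-type ID out) is the part the sketch is meant to convey.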
In summary, in the software fuzzy test method provided by this embodiment, the neural network model is trained on the field boundary information and field types of sample fields, which improves the accuracy of the neural network model and thereby the efficiency of software fuzzy testing.
Fig. 5 is a schematic structural diagram of a software fuzzy test apparatus according to an embodiment of the present application. As shown in Fig. 5, the software fuzzy test apparatus includes: a first acquisition module 501, a second acquisition module 502, a first determining module 503, a second determining module 504, a clustering module 505, a third determining module 506, a fourth determining module 507, a recording module 508, and a test module 509.
a first acquisition module 501, configured to input an initial seed file into the software to be tested and acquire the input bytes used and the binary program instructions executed during testing;
a second acquisition module 502, configured to acquire a first correspondence between binary program instructions and program basic blocks;
a first determining module 503, configured to determine a second correspondence between the input bytes and the binary program instructions using a dynamic taint analysis method;
a second determining module 504, configured to determine a third correspondence between the input bytes and the program basic blocks according to the first correspondence and the second correspondence;
a clustering module 505, configured to merge, according to the third correspondence, the minimal contiguous bytes among the input bytes into fields using a minimum clustering algorithm, so as to obtain field boundary information;
a third determining module 506, configured to determine a fourth correspondence between the program basic blocks and the fields according to the third correspondence and the field boundary information;
a fourth determining module 507, configured to input, according to the fourth correspondence, the program basic block information corresponding to each field into the pre-trained neural network model, and determine the field type of the field;
a recording module 508, configured to record the field types in a file of a preset format according to the format information model, so as to obtain the format template file corresponding to the initial seed file;
a test module 509, configured to perform fuzzy testing on the software based on the format template file using a fuzzy test tool, record the mutation execution results, and adaptively optimize the fuzzy testing of the software according to those results.
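The minimum clustering algorithm used by the clustering module 505 is not specified further here. One plausible reading, assumed for this sketch only, is to merge adjacent byte offsets that map to exactly the same set of program basic blocks, starting a new field at any gap or change. The fourth correspondence used by the third determining module 506 then follows by taking, for each field, the union of its bytes' blocks. Both function names and the sample data are hypothetical.

```python
def cluster_fields(byte_to_blocks):
    """Merge adjacent byte offsets with identical basic-block sets into fields.

    byte_to_blocks: third correspondence, byte offset -> set of basic block IDs.
    Returns field boundary information as half-open (start, end) ranges.
    Assumed rule: a gap in offsets or a change in block set starts a new field.
    """
    fields = []
    start = prev = None
    for off in sorted(byte_to_blocks):
        continues = (prev is not None and off == prev + 1
                     and byte_to_blocks[off] == byte_to_blocks[prev])
        if not continues:
            if start is not None:
                fields.append((start, prev + 1))
            start = off
        prev = off
    if start is not None:
        fields.append((start, prev + 1))
    return fields


def field_to_blocks(byte_to_blocks, fields):
    """Fourth correspondence: field (start, end) -> union of its bytes' blocks."""
    return {(s, e): set().union(*(byte_to_blocks[o] for o in range(s, e)))
            for s, e in fields}


third = {0: {"B1"}, 1: {"B1"}, 2: {"B1"}, 3: {"B1"},
         4: {"B2", "B3"}, 5: {"B2", "B3"}, 7: {"B4"}}   # hypothetical data
fields = cluster_fields(third)
print(fields)  # [(0, 4), (4, 6), (7, 8)]
fourth = field_to_blocks(third, fields)
```

With this particular merge rule the union is simply the set shared by all bytes of a field, but `field_to_blocks` also works unchanged if a coarser clustering rule produces fields whose bytes touch different blocks.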
In one possible implementation, the fourth determining module 507 is specifically configured to: obtain the program basic blocks corresponding to a field according to the fourth correspondence; vectorize those program basic blocks to obtain vectorized binary program basic block information for the field; and input that vectorized information into the pre-trained neural network model to obtain the field type of the field.
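The vectorization method is not specified. As a hypothetical example, the sketch below turns the basic blocks tied to one field into a bag-of-mnemonics vector: the relative frequency of each instruction mnemonic over a small fixed vocabulary. The vocabulary, the `other` bucket and the input structures are assumptions; the output is the kind of fixed-length input a classifier such as the one trained in S403 expects.

```python
from collections import Counter

# Hypothetical mnemonic vocabulary; must contain "other" for unknown mnemonics.
VOCAB = ["mov", "cmp", "jz", "jnz", "add", "sub", "call", "other"]


def vectorize_blocks(blocks, block_insns, vocab=VOCAB):
    """Relative mnemonic frequencies over all basic blocks tied to one field.

    blocks: basic block IDs for the field (from the fourth correspondence).
    block_insns: basic block ID -> list of instruction mnemonics in that block.
    """
    counts = Counter()
    for blk in blocks:
        for mnem in block_insns.get(blk, []):
            counts[mnem if mnem in vocab else "other"] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocab)      # field reached no known instructions
    return [counts[m] / total for m in vocab]


insns = {"B1": ["mov", "cmp", "jz"], "B2": ["mov", "xor"]}  # hypothetical
print(vectorize_blocks(["B1", "B2"], insns))
# [0.4, 0.2, 0.2, 0.0, 0.0, 0.0, 0.0, 0.2]
```

Normalizing by the total count keeps vectors comparable between fields that touch many blocks and fields that touch only one.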
In one possible implementation, the test module 509 is specifically configured to: input, using the fuzzy test tool, the initial seed file and its corresponding format template file into the software to be tested, and generate new test cases by mutating the initial seed file based on the format information; judge whether the code coverage increment corresponding to a test case in the fuzzy test is greater than or equal to a first preset value and, if so, re-extract the format template file of that test case; and record the number of mutation executions of the format template file corresponding to each test case during fuzzy testing, calculate the average, and, when the rate of increase in code coverage of the format template files corresponding to the test cases is smaller than a second preset value, mutate again the test cases whose format templates have been mutated fewer times than the average.
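The first-threshold check can be sketched as below, assuming coverage is tracked as a set of covered edge IDs and the increment is measured as the share of all instrumented edges that a test case newly covers. The representation, the `total_edges` parameter, the sample runs and the function name are hypothetical; the text does not fix how the increment is measured.

```python
def cases_to_reextract(case_edges, total_edges, first_preset):
    """Test cases whose code coverage increment reaches the first preset value.

    case_edges: (case_id, set of covered edge IDs) pairs in execution order.
    total_edges: number of instrumented edges in the target (assumed known).
    first_preset: first preset value, as a fraction of total_edges.
    """
    covered = set()
    selected = []
    for case_id, edges in case_edges:
        increment = len(edges - covered) / total_edges  # newly covered share
        if increment >= first_preset:
            selected.append(case_id)    # re-extract this case's format template
        covered |= edges
    return selected


runs = [("c1", {1, 2, 3}), ("c2", {1, 2}), ("c3", {4, 5, 6, 7})]  # hypothetical
print(cases_to_reextract(runs, total_edges=10, first_preset=0.25))  # ['c1', 'c3']
```

Test case `c2` only revisits edges that `c1` already covered, so its increment is zero and its template is not re-extracted.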
In one possible implementation, the first determining module 503 is specifically configured to perform dynamic taint analysis on the input bytes and the binary program instructions using a dynamic binary instrumentation tool, obtaining the correspondence between input byte offsets and the registers or memory involved in the binary program instructions; this correspondence serves as the second correspondence between the input bytes and the binary program instructions.
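To make the second correspondence concrete, the sketch below propagates byte-level taint through a simplified, already-recorded instruction trace. In practice this kind of analysis runs inside a dynamic binary instrumentation framework such as Intel Pin; here the trace is a plain Python list in which each entry names a destination location and its source locations, and the destination receives the union of the sources' taint. The trace format, location names and the assumption that the input file is loaded at a known base address are all hypothetical.

```python
from collections import defaultdict


def taint_trace(trace, input_base, input_len):
    """Byte-level taint propagation over a simplified instruction trace.

    trace: (insn_addr, dst, srcs) entries; dst and srcs are location names such
        as "eax" or "mem:0x1000", and dst receives the union of the srcs' taint.
    input_base: address at which the input file was loaded (assumed known).
    Returns the second correspondence: byte offset -> instruction addresses.
    """
    # Shadow state: location -> set of input byte offsets it depends on.
    shadow = {f"mem:{input_base + i:#x}": {i} for i in range(input_len)}
    byte_to_insns = defaultdict(set)
    for addr, dst, srcs in trace:
        taint = set().union(*(shadow.get(s, set()) for s in srcs))
        for off in taint:
            byte_to_insns[off].add(addr)
        if taint:
            shadow[dst] = taint
        else:
            shadow.pop(dst, None)   # dst overwritten with untainted data
    return dict(byte_to_insns)


trace = [                                    # hypothetical recorded trace
    (0x400, "eax", ["mem:0x1000"]),          # load input byte 0
    (0x404, "ebx", ["mem:0x1001"]),          # load input byte 1
    (0x408, "eax", ["eax", "ebx"]),          # add eax, ebx
    (0x40C, "ecx", ["mem:0x2000"]),          # load untainted memory
    (0x410, "mem:0x3000", ["eax"]),          # store result
]
second = taint_trace(trace, input_base=0x1000, input_len=4)
print(sorted(second[0]) == [0x400, 0x408, 0x410])  # True
```

Input bytes 2 and 3 are never read in this trace, so they have no entry; a real tool would also handle partial-register writes, address-dependent taint and implicit flows, which this sketch ignores.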
In one possible implementation, the software fuzzy test apparatus further includes a training module 510, specifically configured to: acquire field boundary information of a sample file and the field types of its sample fields; obtain the program basic block information corresponding to the sample fields according to the field boundary information, and vectorize the program basic block information of the sample; and train a neural network model on the field types of the sample fields and the vectorized program basic block information to obtain the pre-trained neural network model.
The apparatus provided in this embodiment can be used to carry out the technical solutions of the foregoing method embodiments; its implementation principles and technical effects are similar and are not repeated here.
Fig. 6 is a schematic diagram of the hardware structure of a computer device according to an embodiment of the present application. As shown in Fig. 6, the computer device of this embodiment includes a processor 601 and a memory 602, where:
the memory 602 is configured to store computer-executable instructions;
the processor 601 is configured to execute the computer-executable instructions stored in the memory to implement the steps performed by the computer device in the above embodiments. For details, refer to the related description in the method embodiments above.
Optionally, the memory 602 may be separate from, or integrated with, the processor 601.
When the memory 602 is provided separately, the computer device further comprises a bus 603 for connecting the memory 602 and the processor 601.
An embodiment of the present application further provides a computer storage medium storing computer-executable instructions; when a processor executes the computer-executable instructions, the software fuzzy test method described above is implemented.
An embodiment of the present application further provides a computer program product comprising a computer program; when the computer program is executed by a processor, the software fuzzy test method described above is implemented.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation, e.g., multiple modules may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or modules, and may be electrical, mechanical or of other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to implement the solution of this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, each module may exist physically on its own, or two or more modules may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated modules, if implemented in the form of software functional modules, may be stored in a computer-readable storage medium. Such software functional modules are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a computer device, a network device, etc.) or a processor to perform some of the steps of the methods in the embodiments of the present application.
It should be understood that the above processor may be a central processing unit (Central Processing Unit, CPU for short), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP for short), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), etc. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the present invention may be executed directly by a hardware processor, or by a combination of hardware and software modules in a processor.
The memory may include a high-speed RAM and may further include a non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, the bus in the drawings of the present application is not limited to a single bus or a single type of bus.
The storage medium may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application specific integrated circuit (Application Specific Integrated Circuits, ASIC for short). The processor and the storage medium may also reside as discrete components in an electronic device or a master device.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
Claims (10)
1. A software fuzzy test method, comprising:
inputting an initial seed file into software to be tested, and acquiring the input bytes used and the binary program instructions executed during the test process;
acquiring a first corresponding relation between the binary program instruction and a program basic block;
determining a second correspondence between the input bytes and the binary program instructions by using a dynamic taint analysis method;
determining a third corresponding relation between the input byte and the program basic block according to the first corresponding relation and the second corresponding relation;
according to the third corresponding relation, merging the minimal contiguous bytes among the input bytes into fields using a minimum clustering algorithm to obtain field boundary information;
determining a fourth corresponding relation between the program basic block and the field according to the third corresponding relation and the field boundary information;
inputting the program basic block information corresponding to the field into a pre-trained neural network model according to the fourth corresponding relation, and determining the field type corresponding to the field;
recording the field type in a file with a preset format according to a format information model to obtain a format template file corresponding to the initial seed file;
and performing a fuzzy test on the software based on the format template file using a fuzzy test tool, recording a mutation execution result, and performing adaptive optimization of the software fuzzy test according to the mutation execution result.
2. The method according to claim 1, wherein inputting the program basic block information corresponding to the field into the pre-trained neural network model according to the fourth corresponding relation and determining the field type corresponding to the field comprises:
obtaining a program basic block corresponding to the field according to the fourth corresponding relation;
vectorizing the program basic blocks corresponding to the fields to obtain vectorization information of binary program basic blocks corresponding to the fields;
and inputting the vectorization information of the binary program basic blocks corresponding to the fields into a pre-trained neural network model to obtain the field types corresponding to the fields.
3. The method according to claim 1, wherein performing the fuzzy test on the software based on the format template file using the fuzzy test tool, recording the mutation execution result, and performing adaptive optimization of the software fuzzy test according to the mutation execution result comprises:
inputting, using the fuzzy test tool, the initial seed file and the format template file corresponding to the initial seed file into the software to be tested, and generating new test cases by mutating the initial seed file based on the format information;
judging whether the code coverage rate increment corresponding to the test case in the fuzzy test is larger than or equal to a first preset value, and if the code coverage rate increment corresponding to the test case is larger than or equal to the first preset value, re-extracting the format template file of the test case;
recording the number of mutation executions of the format template file corresponding to each test case during the fuzzy test, calculating the average number of mutation executions, and, when the rate of increase in code coverage of the format template files corresponding to the test cases is smaller than a second preset value, mutating again the test cases corresponding to format templates whose number of mutation executions is smaller than the average.
4. The method according to claim 1, wherein determining the second corresponding relation between the input bytes and the binary program instructions using the dynamic taint analysis method comprises:
performing dynamic taint analysis on the input bytes and the binary program instructions using a dynamic binary instrumentation tool to obtain a corresponding relation between input byte offsets and the registers or memory involved in the binary program instructions, this corresponding relation serving as the second corresponding relation between the input bytes and the binary program instructions.
5. The method according to any one of claims 1 to 4, further comprising, before inputting the program basic block information corresponding to the field into the pre-trained neural network model and determining the field type corresponding to the field:
acquiring field boundary information of a sample file and a field type of a sample field;
obtaining program basic block information corresponding to the sample field according to the field boundary information, and vectorizing the program basic block information of the sample;
training a neural network model based on the field type of the sample field and the vectorized program basic block information to obtain a pre-trained neural network model.
6. A software fuzzy test apparatus, comprising:
the first acquisition module is used for inputting an initial seed file into the software to be tested and acquiring the input bytes used and the binary program instructions executed during the test process;
the second acquisition module is used for acquiring a first corresponding relation between the binary program instruction and the program basic block;
the first determining module is used for determining a second corresponding relation between the input byte and the binary program instruction by adopting a dynamic taint analysis method;
the second determining module is used for determining a third corresponding relation between the input bytes and the program basic block according to the first corresponding relation and the second corresponding relation;
the clustering module is used for merging the minimal contiguous bytes among the input bytes into fields using a minimum clustering algorithm according to the third corresponding relation, to obtain field boundary information;
a third determining module, configured to determine a fourth correspondence between the program basic block and the field according to the third correspondence and field boundary information;
a fourth determining module, configured to input, according to the fourth correspondence, program basic block information corresponding to the field to a pre-trained neural network model, and determine a field type corresponding to the field;
the recording module is used for recording the field types in a file with a preset format according to a format information model to obtain a format template file corresponding to the initial seed file;
and the testing module is used for performing a fuzzy test on the software based on the format template file using a fuzzy test tool, recording a mutation execution result, and performing adaptive optimization of the software fuzzy test according to the mutation execution result.
7. The apparatus according to claim 6, wherein the fourth determining module is specifically configured to: obtain the program basic block corresponding to the field according to the fourth corresponding relation; vectorize the program basic blocks corresponding to the field to obtain vectorized binary program basic block information for the field; and input the vectorized binary program basic block information into the pre-trained neural network model to obtain the field type corresponding to the field.
8. A computer device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the software fuzzy test method according to any one of claims 1 to 5.
9. A computer storage medium storing computer-executable instructions which, when executed by a processor, implement the software fuzzy test method according to any one of claims 1 to 5.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the software fuzzy test method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310067956.2A CN116108449B (en) | 2023-01-12 | 2023-01-12 | Software fuzzy test method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310067956.2A CN116108449B (en) | 2023-01-12 | 2023-01-12 | Software fuzzy test method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116108449A (en) | 2023-05-12 |
CN116108449B CN116108449B (en) | 2024-02-23 |
Family
ID=86257699
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310067956.2A Active CN116108449B (en) | 2023-01-12 | 2023-01-12 | Software fuzzy test method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116108449B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117827685A (en) * | 2024-03-05 | 2024-04-05 | 国网浙江省电力有限公司丽水供电公司 | Fuzzy test input generation method, device, terminal and medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102622558A (en) * | 2012-03-01 | 2012-08-01 | 北京邮电大学 | Excavating device and excavating method of binary system program loopholes |
CN103440201A (en) * | 2013-09-05 | 2013-12-11 | 北京邮电大学 | Dynamic taint analysis device and application thereof to document format reverse analysis |
CN107025175A (en) * | 2017-05-12 | 2017-08-08 | 北京理工大学 | A kind of fuzz testing seed use-case variable-length field pruning method |
CN108416219A (en) * | 2018-03-18 | 2018-08-17 | 西安电子科技大学 | A kind of Android binary files leak detection method and system |
CN112905184A (en) * | 2021-01-08 | 2021-06-04 | 浙江大学 | Pile-insertion-based industrial control protocol grammar reverse analysis method under basic block granularity |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117827685A (en) * | 2024-03-05 | 2024-04-05 | 国网浙江省电力有限公司丽水供电公司 | Fuzzy test input generation method, device, terminal and medium |
CN117827685B (en) * | 2024-03-05 | 2024-04-30 | 国网浙江省电力有限公司丽水供电公司 | Fuzzy test input generation method, device, terminal and medium |
Also Published As
Publication number | Publication date |
---|---|
CN116108449B (en) | 2024-02-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108304720A (en) | A kind of Android malware detection methods based on machine learning | |
CN104881611A (en) | Method and apparatus for protecting sensitive data in software product | |
CN116108449B (en) | Software fuzzy test method, device, equipment and storage medium | |
CN112052160A (en) | Code case obtaining method and device, electronic equipment and medium | |
CN111338622B (en) | Supply chain code identification method, device, server and readable storage medium | |
CN112966113A (en) | Data risk prevention and control method, device and equipment | |
CN112884569A (en) | Credit assessment model training method, device and equipment | |
CN112181430A (en) | Code change statistical method and device, electronic equipment and storage medium | |
US11868465B2 (en) | Binary image stack cookie protection | |
CN111400695A (en) | Equipment fingerprint generation method, device, equipment and medium | |
CN110162472A (en) | A kind of method for generating test case based on fuzzing test | |
CN114238980A (en) | Industrial control equipment vulnerability mining method, system, equipment and storage medium | |
CN113946826A (en) | Method, system, equipment and medium for analyzing and monitoring vulnerability fingerprint silence | |
CN113901463A (en) | Concept drift-oriented interpretable Android malicious software detection method | |
CN114285587A (en) | Domain name identification method and device and domain name classification model acquisition method and device | |
CN110070383B (en) | Abnormal user identification method and device based on big data analysis | |
CN114880637B (en) | Account risk verification method and device, computer equipment and storage medium | |
CN114792007A (en) | Code detection method, device, equipment, storage medium and computer program product | |
CN115828244A (en) | Memory leak detection method and device and related equipment | |
Alexandra-Cristina et al. | Material survey on source code plagiarism detection in programming courses | |
Ahn et al. | Data embedding scheme for efficient program behavior modeling with neural networks | |
JP2022505341A (en) | Systems and methods for selectively instrumenting programs according to performance characteristics | |
CN117688564B (en) | Detection method, device and storage medium for intelligent contract event log | |
CN116578979B (en) | Cross-platform binary code matching method and system based on code features | |
CN113778839B (en) | Regression testing method and device and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |