CN109885479A - Software fuzz testing method and device based on path record truncation
- Publication number
- CN109885479A; CN201910012433.1A
- Authority
- CN
- China
- Prior art keywords
- code
- path
- truncation
- label
- instrumentation
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Debugging And Monitoring (AREA)
Abstract
The invention belongs to the field of software testing technology, and in particular relates to a software fuzz testing method and device based on path record truncation. The method comprises: building a project data set and extracting conditional code structures, obtaining the model input data for a classification model of low-frequency-path jump-condition code structures, and training the model, where the classification model uses an LSTM network structure; adding a path truncation flag and a flag check instruction to the fuzzer's instrumentation code; and, for the program under test, extracting conditional code structures and obtaining model input data, passing the data to the trained classification model, identifying low-frequency-path jump-condition code structures, performing source-level instrumentation at the corresponding positions in the source files, and truncating the path according to the path truncation flag to complete fuzz testing. The invention identifies low-frequency-path jump-condition code before the program is executed and uses a path truncation strategy to eliminate high-frequency-path test samples, improving fuzz testing efficiency and coverage, and has strong prospects for engineering application.
Description
Technical field
The invention belongs to the field of software testing technology, and in particular relates to a software fuzz testing method and device based on path record truncation.
Background technique
Fuzz testing (fuzzing) is an automated software testing technique that supplies semi-valid data as input to the program under test and monitors whether the program behaves abnormally. Because it is simple and efficient, it is widely used by major software vendors and in open-source software testing, and it has found a large number of vulnerabilities in all kinds of software. However, with the wide use of software security testing tools and growing awareness of secure coding, vulnerabilities now tend to appear where the code structure is more complex. Existing fuzz testing is effective at uncovering vulnerabilities in code with a relatively simple structure, but it often fails to trigger exceptions when facing complex code. The cause is that most test samples execute the same high-frequency paths and rarely explore low-frequency paths.
To address this, researchers have combined other vulnerability analysis techniques with fuzzing and proposed a series of fuzzing methods. These fall mainly into fuzzing based on symbolic execution, fuzzing based on taint analysis, and fuzzing based on static analysis. Fuzzing based on symbolic execution balances fuzzing with selective concolic execution in order to find deeper bugs. Selective concolic execution is used to test paths that the fuzzer judges more "valuable" but cannot get through. Combining lightweight fuzzing with concolic execution avoids both the path explosion inherent in symbolic execution and the incompleteness of fuzzing. Fuzzing based on taint analysis uses dynamic taint analysis to determine which bytes of a test sample, when mutated, are more likely to trigger exploration of unknown code, so that mutation is better targeted and the generated inputs reach deeper code. Fuzzing based on static analysis draws on static program analysis to adjust how much attention each seed receives, and uses edge coverage information to optimize seed ordering and selection, which raises the probability of testing low-frequency paths and improves code coverage. Although these three approaches raise the low-frequency path testing probability by different means, high-frequency-path samples are still tested, so the gain in low-frequency path testing probability is limited and the improvement in overall testing efficiency is not significant.
Summary of the invention
To this end, the invention provides a software fuzz testing method and device based on path record truncation, which improve the testing efficiency and test coverage of deep code and have strong prospects for engineering application.
According to the design scheme provided by the invention, a software fuzz testing method based on path record truncation comprises the following:
A) building a project data set and extracting conditional code structures; after the extracted conditional code structures are processed into model input data, they are used as input to a classification model of low-frequency-path jump-condition code structures for model training, where the classification model uses an LSTM network structure;
B) adding a path truncation flag and a flag check instruction to the fuzzer's instrumentation code;
C) for the program under test, extracting conditional code structures and processing them into model input data, feeding the processed data to the trained classification model to identify low-frequency-path jump-condition code structures, performing source-level instrumentation at the corresponding positions in the source files, and truncating the path according to the path truncation flag to complete fuzz testing of the source code.
As above, when conditional code structures are extracted in A), a symbolic description of the source code data set is first defined and all conditional code structures in the data set are extracted; the extracted conditional code structures are then parsed, and the parsing results are tokenized to obtain code token sequences; the token sequences are converted to vectors to obtain the classification model input data.
Preferably, in A), extracting all conditional code structures in the data set comprises: first preprocessing the source code data set to extract the valid code; then extracting the set of conditional structures from the valid code and building a stack-like structure, while recording the correspondence between each code segment and its position in the source code data set; and iteratively processing the set of nested conditional structures according to the stack-like structure to obtain minimized code fragments.
Preferably, in A), code parsing comprises: extracting the code sequence with an abstract syntax tree and symbolizing it, while grouping entries in the code sequence by meaning to obtain synonym sets that expand the code sequence.
Preferably, in A), converting the token sequences to vectors comprises: performing text vectorization with word2vec, setting the feature vector dimension and the word frequency parameter to output a word vector model; obtaining a word index dictionary and a word vector dictionary from the word vector model; and obtaining the classification model input data from the word index dictionary and the word vector dictionary.
As above, in B), adding the path truncation flag and the flag check instruction comprises inserting the path truncation flag into the source code data set, with the flag defined in the bss section, and performing the path truncation flag check at the entry of the original instrumentation.
Preferably, in C), for the program under test, the error-handling code segments identified by the trained classification model are combined with the index of conditional code structures and their positions in the source code data set, and the error-handling code segments are instrumented using the inner-and-outer instrumentation approach.
Further, in C), where the path truncation flag comes after the instrumentation code, the original instrumentation of the current conditional is cancelled: a no-op instruction is inserted and a comment mark is added in the instruction's comment.
Further, in C), to handle compiler optimization of the same conditional statement in different source code data sets, instrumentation is cancelled at the marked branch.
A software fuzz testing device based on path record truncation comprises a training module, a marking module, and a testing module, wherein:
the training module is configured to build a project data set and extract conditional code structures and, after the extracted conditional code structures are processed into model input data, to use them as input to a classification model of low-frequency-path jump-condition code structures for model training, where the classification model uses an LSTM network structure;
the marking module is configured to add a path truncation flag and a flag check instruction to the fuzzer's instrumentation code;
the testing module is configured, for the program under test, to extract conditional code structures and process them into model input data, feed the processed data to the trained classification model to identify low-frequency-path jump-condition code structures, perform source-level instrumentation at the corresponding positions in the source files, and truncate the path according to the path truncation flag to complete fuzz testing of the source code.
Beneficial effects of the present invention:
1. To address the effect of high-frequency paths on fuzzing efficiency, the invention uses a deep-learning neural network to identify low-frequency-path jump-condition code, performs source-level instrumentation at the jump-condition code structures identified by the trained model, and truncates path recording according to the flag code. This raises the probability that low-frequency samples are mutated and ultimately improves fuzzing efficiency.
2. The invention identifies low-frequency-path jump-condition code before the program is actually executed and uses a path truncation strategy to eliminate high-frequency-path test samples. It fills a gap in the analysis of the influence of high-frequency-path samples, does not rely on complex dynamic analysis and introduces no overhead problem, can be combined effectively with other grey-box testing techniques to further raise coverage on top of existing testing tools, and offers important guidance for the development of software testing technology.
Brief description of the drawings:
Fig. 1 is a flow diagram of the software fuzz testing method in an embodiment;
Fig. 2 is a schematic diagram of the software fuzz testing principle in an embodiment;
Fig. 3 is a schematic diagram of the classification model built by combining word2vec with an LSTM neural network model in an embodiment;
Fig. 4 is a schematic diagram of the software fuzz testing device in an embodiment.
Detailed description of the embodiments:
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and the technical solution.
To address the limited low-frequency path testing probability and limited overall testing efficiency of existing software fuzz testing, an embodiment of the invention, shown in Fig. 1, provides a software fuzz testing method based on path record truncation, comprising:
S101: building a project data set and extracting conditional code structures; after the extracted conditional code structures are processed into model input data, they are used as input to a classification model of low-frequency-path jump-condition code structures for model training, where the classification model uses an LSTM network structure;
S102: adding a path truncation flag and a flag check instruction to the fuzzer's instrumentation code;
S103: for the program under test, extracting conditional code structures and processing them into model input data, feeding the processed data to the trained classification model to identify low-frequency-path jump-condition code structures, performing source-level instrumentation at the corresponding positions in the source files, and truncating the path according to the path truncation flag to complete fuzz testing of the source code.
As shown in Fig. 2, a data set is built, all conditional code structures in it are extracted, and the code structures are parsed; the parsing results are preprocessed to extract code token sequences, which are converted to vectors; labels are generated for the code, and the resulting vectors are used as input to the LSTM neural network model to train the classification model of low-frequency-path jump-condition code structures; a path truncation flag and a flag check instruction are added to the fuzzer's instrumentation code; conditional code structures are extracted from the program under test and the low-frequency-path jump-condition code structures among them are identified; lightweight source-level instrumentation is performed at the corresponding positions in the source files for the identified structures; and testing of the program under test begins.
Most input-parsing programs contain many format checks, together with error-handling structures that deal with failed checks, and structures of this kind usually make coverage hard to improve. Error-handling code is therefore the most representative kind of low-frequency-path jump-condition code; it is the code segment executed when the program exits with an error because of some problem in its input. In another embodiment of the invention, when conditional code structures are extracted, a symbolic description of the source code data set is first defined and all conditional code structures in the data set are extracted; the extracted conditional code structures are then parsed, and the parsing results are tokenized to obtain code token sequences; the token sequences are converted to vectors to obtain the classification model input data. Preferably, extracting all conditional code structures in the data set comprises: first preprocessing the source code data set to extract the valid code; then extracting the set of conditional structures from the valid code and building a stack-like structure, while recording the correspondence between each code segment and its position in the source code data set; and iteratively processing the set of nested conditional structures according to the stack-like structure to obtain minimized code fragments. Specifically, the symbols used are defined as shown in Table 1.
Table 1: Symbol descriptions
First, S_o is preprocessed to remove unnecessary information such as code comments and line breaks. R is used here to extract the valid code, giving S_n. I_e is then extracted from the processed S_n. A method based on bracket-stack balance is proposed here, i.e. […] C_l ≡ 0, so a stack-like structure can be built: when B_l is recognized, C_s is incremented by 1; when B_r is recognized, C_s is decremented by 1; and when C_s = 0, I_t is added to I_e. The correspondence between each code segment and its position in the source file is recorded at the same time for the later instrumentation step. Extraction in this way can, however, produce I_n, and I_n leads to misclassification: assuming […], if I_n ∈ E is accepted, then […], which is a contradiction. I_n therefore has to be processed iteratively so that the extracted code fragments are minimized. The I_e obtained from the first extraction is extracted again in the same way until every structure is an I_r.
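As a non-limiting illustration, the preprocessing and bracket-balance extraction described above could be sketched in Python as follows. The function names `strip_comments`, `_match`, and `extract_if_blocks` are assumptions made for this sketch and do not appear in the original disclosure. The sketch handles only braced `if` bodies and ignores delimiters inside string literals. Instead of re-extracting iteratively, it scans for every `if`, so nested structures are reported as separate entries with their own offsets.

```python
import re

def strip_comments(source):
    """Remove /* ... */ and // comments (simplified: ignores string literals)."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.S)
    return re.sub(r"//[^\n]*", "", source)

def _match(source, start, open_ch, close_ch):
    """Index of the delimiter that closes the one at `start`, or -1."""
    counter = 0  # plays the role of the balance counter
    for i in range(start, len(source)):
        if source[i] == open_ch:
            counter += 1
        elif source[i] == close_ch:
            counter -= 1
            if counter == 0:
                return i
    return -1

def extract_if_blocks(source):
    """Return (offset, text) for every braced `if (...) {...}` structure."""
    blocks = []
    for m in re.finditer(r"\bif\s*\(", source):
        cond_end = _match(source, m.end() - 1, "(", ")")
        if cond_end == -1:
            continue
        body_start = cond_end + 1
        while body_start < len(source) and source[body_start].isspace():
            body_start += 1
        if body_start >= len(source) or source[body_start] != "{":
            continue  # unbraced body: outside the scope of this sketch
        body_end = _match(source, body_start, "{", "}")
        if body_end != -1:
            # The offset is kept so the segment can be located again later.
            blocks.append((m.start(), source[m.start():body_end + 1]))
    return blocks
```

The recorded offset corresponds to the segment-to-position mapping that the later instrumentation step relies on.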
In a further embodiment of the invention, code parsing comprises: extracting the code sequence with an abstract syntax tree and symbolizing it, while grouping entries in the code sequence by meaning to obtain synonym sets that expand the code sequence.
Each extracted code segment is parsed into a word sequence, and all segments are brought to sequences of equal length so that they can be fed to the LSTM model. In the parsing stage the code sequence is extracted with an abstract syntax tree (AST) and symbolized at the same time: for example, an integer is represented as num and a string as str. For this classification task, however, string content matters. Most error-handling code shares one feature: if the code fragment contains a string, the string usually contains words such as error or fail, or words close to them in meaning. Strings are therefore symbolized in one of two ways according to whether they contain such keywords, as errstr or str. Preliminary analysis yields some error-indicating words, but only a limited number. In an embodiment of the invention, WordNet, an English lexical database created and maintained by Princeton University, can be used: it groups entries by meaning, each group of interchangeable entries forming a synonym set, and these synonym sets can be used to expand the error-indicating vocabulary. Because a supervised learning method is used, every sample must be labeled: a code segment is labeled 1 if it is an error-handling code segment and 0 otherwise. Labeling here is heuristic; analysis of a large number of source files yields the following five heuristics: 1) the if (...) usually contains a comparison; 2) if the segment contains a string, the string contains error-indicating words; 3) the segment may contain return or jump keywords such as return or goto; 4) if the segment contains a function call, the function name usually contains error-indicating words; 5) the segment may contain system error macros such as 'EPERM' or 'ENOENT'.
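By way of example only, the symbolization of integers and strings into num, str, and errstr might look like the Python sketch below. It replaces the AST-based extraction described above with a simple regular-expression tokenizer to stay self-contained. The keyword set `ERROR_WORDS` and the function name `symbolize` are hypothetical; a fuller vocabulary would come from the synonym expansion described above.

```python
import re
from typing import List

# Hypothetical seed vocabulary of error-indicating words.
ERROR_WORDS = {"error", "fail", "invalid"}

# String literal | integer | identifier | any single punctuation character.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|\d+|[A-Za-z_]\w*|[^\sA-Za-z_0-9]')

def symbolize(snippet: str) -> List[str]:
    """Turn a code fragment into a token sequence with symbolic literals."""
    tokens = []
    for tok in TOKEN_RE.findall(snippet):
        if tok.startswith('"'):
            lowered = tok.lower()
            has_error_word = any(word in lowered for word in ERROR_WORDS)
            tokens.append("errstr" if has_error_word else "str")
        elif tok.isdigit():
            tokens.append("num")
        else:
            tokens.append(tok)  # identifiers, keywords, punctuation kept as-is
    return tokens
```

For instance, `symbolize('printf("hello")')` yields `['printf', '(', 'str', ')']`, whereas a literal such as `"open failed"` is mapped to `errstr`.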
In another embodiment of the invention, converting the token sequences to vectors comprises: performing text vectorization with word2vec, setting the feature vector dimension and the word frequency parameter to output a word vector model; obtaining a word index dictionary and a word vector dictionary from the word vector model; and obtaining the classification model input data from the word index dictionary and the word vector dictionary.
The token sequences obtained are vectorized to serve as input. The conversion uses word2vec, a tool widely used for text vectorization; setting the feature vector dimension and the word frequency parameter yields a word vector model, from which a word index dictionary and a word vector dictionary are built as input to the subsequent LSTM model. As shown in Fig. 3, different code segments contain different numbers of tokens, but the model accepts only inputs of equal length, so padding and truncation are required. Once the vectorized code segments and their labels are available, training of the LSTM network can begin. Besides the necessary basic layers such as the Embedding layer and the LSTM units, the model adds a Dropout layer to prevent overfitting.
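The padding and truncation step can be illustrated with a minimal pure-Python sketch. The helper names `build_index` and `encode` are assumptions for illustration, as are the choices to pad on the right, to reserve index 0 for padding, and to map unknown tokens to 0. The disclosure does not fix these details.

```python
def build_index(sequences):
    """Map each distinct token to a positive integer; 0 is reserved for padding."""
    index = {}
    for seq in sequences:
        for tok in seq:
            index.setdefault(tok, len(index) + 1)
    return index

def encode(seq, index, max_len, pad=0, unknown=0):
    """Convert a token sequence to exactly `max_len` integer ids."""
    ids = [index.get(tok, unknown) for tok in seq][:max_len]  # truncate
    return ids + [pad] * (max_len - len(ids))                 # pad on the right
```

With a word index built this way, every code segment becomes a fixed-length integer vector that an embedding layer can consume.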
The detection stage determines the type of a given unknown code segment: if a code segment is error-handling code, its file and its position in the source file are output. Given an unknown project, detection proceeds as follows: 1. extract the candidate error-handling code structures from every source file in the project and record the file name and position of each; 2. parse the code segments to obtain the corresponding code sequences; 3. vectorize the code sequences from the previous step, according to the rules, with the word2vec model trained earlier; 4. feed the resulting vectors into the trained LSTM network for classification.
Further, in another embodiment of the invention, adding the path truncation flag and the flag check instruction comprises inserting the path truncation flag into the source code data set, with the flag defined in the bss section, and performing the path truncation flag check at the entry of the original instrumentation.
This is done with inline assembly in the source code. A flag, continue_log, is inserted into the source code to make subsequent instrumentation stop recording. The flag is defined in the bss section. Because bss holds uninitialized data and this memory is zeroed before every run, the flag can be set to 1 to indicate that subsequent basic blocks are no longer recorded. The continue_log flag is checked at the entry of the original instrumentation; if it is 1, control jumps to the return, which cancels recording of the current basic block. The flag remains 1 for the rest of the run, so subsequent basic blocks are not recorded either: if the original path is 1 → 2 → 3 → ... and block 3 contains the flag, this run is recorded as 1 → 2. When the next round of testing starts, the flag is reset and the execution path is recorded normally again.
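The effect of the continue_log flag on path recording can be modelled with the following Python simulation. It is a sketch only: the real mechanism is inline assembly inside the instrumented program, and the class `PathRecorder` and helper `run` are hypothetical names introduced here. Setting the flag before the marked block's own record call is likewise an assumption, chosen to match the 1 → 2 example above.

```python
class PathRecorder:
    """Toy model of instrumentation entry code with a truncation flag."""

    def __init__(self):
        self.continue_log = 0  # models the zero-initialised flag in bss
        self.trace = []

    def start_run(self):
        """Model a fresh run: bss is zeroed, so the flag is cleared."""
        self.continue_log = 0
        self.trace = []

    def truncate(self):
        """Model the source-level stub placed in a marked code segment."""
        self.continue_log = 1

    def on_basic_block(self, block_id):
        """Model the original instrumentation entry plus the added flag check."""
        if self.continue_log == 1:
            return  # jump to the return: this block is not recorded
        self.trace.append(block_id)


def run(recorder, path, marked):
    """Execute one test round along `path`; `marked` holds truncating blocks."""
    recorder.start_run()
    for block_id in path:
        if block_id in marked:
            recorder.truncate()
        recorder.on_basic_block(block_id)
    return recorder.trace
```

In this model, a run along blocks 1, 2, 3, 4 with block 3 marked records only blocks 1 and 2, and a later run that avoids block 3 is recorded in full.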
The error-handling code segments identified by the LSTM-trained error-code classification model are instrumented using the index built from the if-else structures in the source files and their positions. Instrumentation is done in an inner-and-outer fashion. An if statement is compiled to a conditional jump instruction, and the fuzzer places its instrumentation by reference to conditional jump instructions; source-level instrumentation before the if statement, i.e. outside the if structure, therefore determines whether subsequent basic blocks are instrumented. Source-level instrumentation before the first statement of the if structure, i.e. inside the if structure, determines how subsequently instrumented basic blocks are recorded: the first statement begins the basic block that is the jump target, and conditional jump instructions still follow it, so it determines the recording result of all subsequent basic blocks on the path through that block. There are three main cases:
In the first case the error-handling code is in an if structure; instrumentation is needed before the if statement and before the first statement inside the if structure.
In the second case the error-handling code is in an else structure; instrumentation is needed before the first statement inside the else structure and before the if statement adjacent to the else structure. If the preceding structure is an else if structure, it must first be converted to an if structure, and instrumentation is then placed before that if.
In the third case the error-handling code is in an else if structure; after the same structure conversion, instrumentation is placed before the if statement and before the first statement inside the else if structure. Instrumenting an else if structure breaks the original code structure, so the code must be repaired to ensure that it still compiles and runs correctly. The source code structure repair algorithm in an embodiment of the invention can be designed as follows:
The algorithm above ensures that the program compiles and runs correctly.
For the program under test, the error-handling code segments identified by the trained classification model are combined with the index of conditional code structures and their positions in the source code data set, and the error-handling code segments are instrumented using the inner-and-outer instrumentation approach. Preferably, where the path truncation flag comes after the instrumentation code, the original instrumentation of the current conditional is cancelled: a no-op instruction is inserted and a comment mark is added in the instruction's comment. Preferably, to handle compiler optimization of the same conditional statement in different source code data sets, instrumentation is cancelled at the marked branch.
If the continue_log flag comes after the instrumentation code, the basic block is still recorded. Subsequent code blocks are not recorded because the flag has been set to 1, but the recorded basic block can still be treated as a new path, so the test sample is retained and the goal of targeted testing is not met. In this case the original instrumentation of the current conditional jump must be cancelled. So as not to affect the program's normal execution flow, a no-op instruction (nop) is inserted and a mark is added in the instruction's comment. Assembly comments are not removed when source code is compiled to assembly code, so the comment mark can be used for the decision. When a comment mark is encountered, an instrumentation flag is set; when a conditional jump instruction is then encountered, the instrumentation decision is made according to the instrumentation flag, and the flag is then cleared. This process repeats until all code has been instrumented, so that instrumentation is skipped for the following code block. Compiler optimization adds a complication: the same statement if (i!=0) may be compiled to jz in one source file and jnz in another, while the original instrumentation instruments only the not-taken side of the jump. Cancelling instrumentation at the marked branch handles this case; one valid basic block record is lost, but the information about valid paths is not affected.
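The comment-mark handling in the assembly instrumentation pass might be sketched in Python as below. This is an illustrative simplification and not the actual instrumentation tool. The mark text `SKIP_MARK`, the placeholder probe line `PROBE`, the abbreviated jump list `COND_JUMPS`, and the function name `instrument` are all hypothetical.

```python
# Abbreviated, illustrative list of x86 conditional jump mnemonics.
COND_JUMPS = {"je", "jne", "jz", "jnz", "jl", "jle", "jg", "jge",
              "ja", "jae", "jb", "jbe"}
SKIP_MARK = "# SKIP_INSTRUMENT"   # hypothetical comment mark attached to a nop
PROBE = "\tcall\t__probe"         # hypothetical stand-in for the probe code

def instrument(asm_lines):
    """Insert a probe after each conditional jump unless a mark precedes it."""
    out = []
    skip_next = False  # the instrumentation flag set by a comment mark
    for line in asm_lines:
        out.append(line)
        stripped = line.strip()
        if SKIP_MARK in stripped:
            skip_next = True
            continue
        parts = stripped.split()
        if parts and parts[0] in COND_JUMPS:
            if skip_next:
                skip_next = False  # consume the mark; leave this jump alone
            else:
                out.append(PROBE)  # instrument the fall-through side
    return out
```

Because the mark is consumed at the first conditional jump that follows it, the decision does not depend on whether the compiler emitted jz or jnz for the marked condition.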
Based on the software fuzz testing method above, an embodiment of the invention also provides a software fuzz testing device based on path record truncation, shown in Fig. 4, comprising a training module 101, a marking module 102, and a testing module 103, wherein:
the training module 101 is configured to build a project data set and extract conditional code structures and, after the extracted conditional code structures are processed into model input data, to use them as input to a classification model of low-frequency-path jump-condition code structures for model training, where the classification model uses an LSTM network structure;
the marking module 102 is configured to add a path truncation flag and a flag check instruction to the fuzzer's instrumentation code;
the testing module 103 is configured, for the program under test, to extract conditional code structures and process them into model input data, feed the processed data to the trained classification model to identify low-frequency-path jump-condition code structures, perform source-level instrumentation at the corresponding positions in the source files, and truncate the path according to the path truncation flag to complete fuzz testing of the source code.
Unless specifically stated otherwise, the relative steps of the components and steps, the numerical expressions, and the numerical values set out in these embodiments do not limit the scope of the invention.
Based on the method above, an embodiment of the invention also provides a server, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method above.
Based on the method above, an embodiment of the invention also provides a computer-readable medium on which a computer program is stored, wherein the program implements the method above when executed by a processor.
The implementation principle and technical effects of the device provided by the embodiments of the invention are the same as those of the foregoing method embodiments. For brevity, where the device embodiments are silent, refer to the corresponding content of the method embodiments.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems and devices described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In all examples shown and described here, any specific value should be construed as merely illustrative and not as a limitation; other examples of the exemplary embodiments may therefore have different values.
It should also be noted that similar reference numerals and letters denote similar items in the drawings, so once an item has been defined in one drawing it need not be further defined or explained in subsequent drawings.
The flowcharts and block diagrams in the drawings show the possible architecture, functions, and operation of systems, methods, and computer program products according to multiple embodiments of the invention. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two consecutive boxes may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of such boxes, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented as software functional units and sold or used as an independent product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
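As an illustration of the conditional-structure extraction that feeds the classification model, the following Python sketch strips comments from C-like source, uses a stack of open braces to find the innermost braced `if` blocks, records the source line of each, and splits each fragment into a token sequence. It is a sketch under stated assumptions, not the patented implementation: the function names, the regular expressions and the sample program are hypothetical, only braced `if` statements are handled, and braces inside string literals are not treated specially.

```python
import re

TOKEN_RE = re.compile(r"\bif\b|[{};]")

def strip_comments(src):
    """Drop C comments but keep newlines so line numbers still match the file."""
    keep_newlines = lambda m: "\n" * m.group(0).count("\n")
    src = re.sub(r"/\*.*?\*/", keep_newlines, src, flags=re.S)
    return re.sub(r"//[^\n]*", "", src)

def innermost_conditions(src):
    """Return (line, fragment) for every braced `if` that contains no nested braced `if`."""
    code = strip_comments(src)
    stack = []          # one [if_offset_or_None, has_nested_if] entry per open brace
    pending_if = None   # offset of an `if` keyword still waiting for its '{'
    found = []
    for m in TOKEN_RE.finditer(code):
        tok = m.group(0)
        if tok == "if":
            pending_if = m.start()
        elif tok == ";":
            pending_if = None            # brace-less `if`: ignored in this sketch
        elif tok == "{":
            if pending_if is not None:   # a new `if` block makes every enclosing `if` non-minimal
                for frame in stack:
                    if frame[0] is not None:
                        frame[1] = True
            stack.append([pending_if, False])
            pending_if = None
        elif stack:                      # tok == "}"
            start, nested = stack.pop()
            if start is not None and not nested:
                line = code.count("\n", 0, start) + 1
                found.append((line, code[start:m.end()]))
    return found

def tokenize(fragment):
    """Split a fragment into identifiers, numbers, two-character operators and punctuation."""
    return re.findall(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|&&|\|\||\S", fragment)

# Hypothetical program under test.
SAMPLE = """int f(int x) {
    /* check */
    if (x > 0) {
        if (x > 10) {
            return -1;   // rare
        }
        x++;
    }
    if (x == 0) {
        return 0;
    }
    return x;
}
"""

for line, fragment in innermost_conditions(SAMPLE):
    print(line, tokenize(fragment))
```

Recording the line number alongside each fragment is what later allows a classification result to be mapped back to an instrumentation position in the source file.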
Claims (10)
1. A software fuzz testing method based on path record truncation, characterized by comprising:
A) building a project data set and extracting conditional code structures; processing the extracted conditional code structures into model input data and using that data as the input of a classification model for low-frequency path jump conditional code structures to train the model, wherein the classification model uses an LSTM network architecture;
B) adding a path truncation flag and a flag check instruction to the instrumentation code of the fuzzer;
C) for a program under test, extracting conditional code structures and processing them into model input data; feeding the processed data to the trained classification model to identify low-frequency path jump conditional code structures; performing source-level instrumentation at the corresponding positions in the source file; and performing path truncation according to the path truncation flag to complete source code fuzz testing.
2. The software fuzz testing method based on path record truncation according to claim 1, characterized in that, in A), extracting the conditional code structures comprises: first defining a symbolic description for the source code data set and extracting all conditional code structures in the data set; then parsing the extracted conditional code structures and tokenizing the parsing result to obtain code token sequences; and converting the token sequences to vectors to obtain the classification model input data.
3. The software fuzz testing method based on path record truncation according to claim 2, characterized in that, in A), extracting all conditional code structures in the data set comprises: first preprocessing the source code data set to extract valid code; then extracting a set of conditional structures from the valid code and building a stack-like structure while recording the correspondence between code segments and their positions in the source code data set; and iteratively processing the set of nested conditional structures according to the stack-like structure to obtain minimal code fragments.
4. The software fuzz testing method based on path record truncation according to claim 2, characterized in that, in A), the code parsing process comprises: extracting code sequences with an abstract syntax tree and symbolizing them, while grouping the entries of the code sequences by meaning to obtain synonym sets and thereby expand the code sequences.
5. The software fuzz testing method based on path record truncation according to claim 2, characterized in that, in A), converting the token sequences to vectors comprises: performing text vectorization with word2vec, setting the feature vector dimension and word frequency parameters to obtain a word vector model; obtaining a word index dictionary and a word vector dictionary from the word vector model; and obtaining the classification model input data from the word index dictionary and the word vector dictionary.
6. The software fuzz testing method based on path record truncation according to claim 1, characterized in that, in B), adding the path truncation flag and the flag check instruction comprises: inserting the path truncation flag into the source code data set, the path truncation flag being defined in the bss section of the code; and performing the path truncation flag check at the entry of the original instrumentation.
7. The software fuzz testing method based on path record truncation according to claim 6, characterized in that, in C), for the program under test, the error-handling code segments identified by the classification model during training are instrumented, in combination with the conditional code structures and position indexes in the source code data set, using an inner-and-outer instrumentation mode.
8. The software fuzz testing method based on path record truncation according to claim 7, characterized in that, in C), where the path truncation flag falls after the instrumentation code, the original instrumentation of the current condition is cancelled, instrumentation is performed with a no-op instruction, and a comment marker is added in the instruction's comment.
9. The software fuzz testing method based on path record truncation according to claim 7, characterized in that, in C), to address compiler optimization of the same conditional statement across different source code data sets, instrumentation is set to be cancelled at the flagged branch point.
10. A software fuzz testing device based on path record truncation, characterized by comprising a training module, a marking module and a test module, wherein:
the training module is configured to build a project data set and extract conditional code structures, process the extracted conditional code structures into model input data, and use that data as the input of a classification model for low-frequency path jump conditional code structures to train the model, wherein the classification model uses an LSTM network architecture;
the marking module is configured to add a path truncation flag and a flag check instruction to the instrumentation code of the fuzzer;
the test module is configured to, for a program under test, extract conditional code structures and process them into model input data, feed the processed data to the trained classification model to identify low-frequency path jump conditional code structures, perform source-level instrumentation at the corresponding positions in the source file, and perform path truncation according to the path truncation flag to complete source code fuzz testing.
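The flag check recited in claims 1 and 6 can be illustrated with a small simulation. The Python sketch below models an AFL-style edge bitmap whose instrumentation entry first tests a truncation flag and records nothing once the flag is set. It is only one plausible reading of the claims: the `Tracer` class, the `truncate` stub, the block identifiers and the choice of which branch sets the flag are all hypothetical, and a real fuzzer would implement this in compiled instrumentation with the flag held in the bss section.

```python
MAP_SIZE = 1 << 16  # size of the simulated coverage bitmap (assumed value)

class Tracer:
    """Simulated path recorder with a truncation flag checked at the instrumentation entry."""

    def __init__(self):
        self.bitmap = bytearray(MAP_SIZE)
        self.prev_loc = 0
        self.truncated = False  # stands in for the flag the claims place in the bss section

    def on_block(self, cur_loc):
        """Original instrumentation entry, extended with the flag check."""
        if self.truncated:
            return  # path record is truncated: later edges are not recorded
        idx = (cur_loc ^ self.prev_loc) % MAP_SIZE
        self.bitmap[idx] = min(self.bitmap[idx] + 1, 255)
        self.prev_loc = cur_loc >> 1

    def truncate(self):
        """Hypothetical stub inserted at a conditional flagged by the classifier."""
        self.truncated = True

    def edges(self):
        """Number of distinct edges recorded so far."""
        return sum(1 for hits in self.bitmap if hits)

def run(tracer, take_flagged_branch):
    """Hypothetical instrumented target with one flagged branch."""
    tracer.on_block(0x1111)
    if take_flagged_branch:
        tracer.truncate()        # stub placed by source-level instrumentation
        tracer.on_block(0x2222)  # ignored after truncation
        tracer.on_block(0x3333)  # ignored after truncation
    else:
        tracer.on_block(0x4444)

flagged, normal = Tracer(), Tracer()
run(flagged, True)
run(normal, False)
print(flagged.edges(), normal.edges())
```

In this simulation an input that reaches the flagged branch contributes no further coverage, so a coverage-guided fuzzer would have no reason to keep it as a new seed.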
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910012433.1A CN109885479B (en) | 2019-01-07 | 2019-01-07 | Software fuzzy test method and device based on path record truncation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910012433.1A CN109885479B (en) | 2019-01-07 | 2019-01-07 | Software fuzzy test method and device based on path record truncation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109885479A | 2019-06-14 |
CN109885479B | 2022-02-01 |
Family
ID=66925678
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910012433.1A Active CN109885479B (en) | 2019-01-07 | 2019-01-07 | Software fuzzy test method and device based on path record truncation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109885479B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110851830A (en) * | 2019-10-24 | 2020-02-28 | 中国人民解放军战略支援部队信息工程大学 | CPU (Central processing Unit) -oriented undisclosed instruction discovery method based on instruction format identification |
CN111191245A (en) * | 2019-12-24 | 2020-05-22 | 中国人民解放军战略支援部队信息工程大学 | Fuzzy test method based on path perception variation strategy |
CN111563040A (en) * | 2020-05-08 | 2020-08-21 | 中国工商银行股份有限公司 | Block chain intelligent contract code testing method and device |
CN111913878A (en) * | 2020-07-13 | 2020-11-10 | 苏州洞察云信息技术有限公司 | Program analysis result-based bytecode instrumentation method, device and storage medium |
CN112069061A (en) * | 2020-08-19 | 2020-12-11 | 北京科技大学 | Software security vulnerability detection method and system for deep learning gradient guidance variation |
CN112306853A (en) * | 2019-08-01 | 2021-02-02 | 深圳市腾讯计算机系统有限公司 | Fuzzy test method, device, equipment and medium |
CN112905493A (en) * | 2021-04-07 | 2021-06-04 | 南京大学 | Structured fuzzy test method based on conversion test |
CN113434386A (en) * | 2021-05-26 | 2021-09-24 | 深圳开源互联网安全技术有限公司 | Method, system and storage medium for fuzz testing |
CN113688036A (en) * | 2021-08-13 | 2021-11-23 | 北京灵汐科技有限公司 | Data processing method, device, equipment and storage medium |
CN114064506A (en) * | 2021-11-29 | 2022-02-18 | 电子科技大学 | Binary program fuzzy test method and system based on deep neural network |
CN114491424A (en) * | 2021-12-31 | 2022-05-13 | 西安电子科技大学 | Binary code clipping method based on fuzzy test |
CN114546816A (en) * | 2020-11-25 | 2022-05-27 | 腾讯科技(深圳)有限公司 | Test method, test platform, test device, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102385550A (en) * | 2010-08-30 | 2012-03-21 | 北京理工大学 | Detection method for software vulnerability |
CN109032942A (en) * | 2018-07-24 | 2018-12-18 | 北京理工大学 | A kind of fuzz testing frame based on AFL |
CN109117367A (en) * | 2018-07-24 | 2019-01-01 | 北京理工大学 | A kind of fuzz testing variation quantity determines method and apparatus |
Non-Patent Citations (4)
Title |
---|
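BIN ZHANG et al.: "S2F: Discover Hard-to-Reach Vulnerabilities by Semi-Symbolic Fuzz Testing", 《2017 13TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY》 * |
CHENG LI et al.: "RankFuzz: Fuzz Testing Based on Comprehensive Evaluation", 《2012 FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION NETWORKING AND SECURITY》 * |
OUYANG Yongji (欧阳永基) et al.: "Intelligent Fuzzing Method Guided by Anomaly Distribution" (基于异常分布导向的智能Fuzzing方法), 《Journal of Electronics & Information Technology》 (电子与信息学报) * |
WANG Yunjun (王蕴君) et al.: "Fuzzy Comprehensive Evaluation of Software Quality Based on Functional Testing" (基于功能性测试的软件质量模糊综合评判), 《Electronic Engineer》 (电子工程师) * |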
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112306853A (en) * | 2019-08-01 | 2021-02-02 | 深圳市腾讯计算机系统有限公司 | Fuzzy test method, device, equipment and medium |
CN112306853B (en) * | 2019-08-01 | 2023-12-12 | 深圳市腾讯计算机系统有限公司 | Fuzzy test method, device, equipment and medium |
CN110851830A (en) * | 2019-10-24 | 2020-02-28 | 中国人民解放军战略支援部队信息工程大学 | CPU (Central processing Unit) -oriented undisclosed instruction discovery method based on instruction format identification |
CN110851830B (en) * | 2019-10-24 | 2021-08-03 | 中国人民解放军战略支援部队信息工程大学 | CPU (Central processing Unit) -oriented undisclosed instruction discovery method based on instruction format identification |
CN111191245A (en) * | 2019-12-24 | 2020-05-22 | 中国人民解放军战略支援部队信息工程大学 | Fuzzy test method based on path perception variation strategy |
CN111191245B (en) * | 2019-12-24 | 2022-06-17 | 中国人民解放军战略支援部队信息工程大学 | Fuzzy test method based on path perception mutation strategy |
CN111563040A (en) * | 2020-05-08 | 2020-08-21 | 中国工商银行股份有限公司 | Block chain intelligent contract code testing method and device |
CN111563040B (en) * | 2020-05-08 | 2023-08-15 | 中国工商银行股份有限公司 | Block chain intelligent contract code testing method and device |
CN111913878A (en) * | 2020-07-13 | 2020-11-10 | 苏州洞察云信息技术有限公司 | Program analysis result-based bytecode instrumentation method, device and storage medium |
CN111913878B (en) * | 2020-07-13 | 2023-09-15 | 苏州洞察云信息技术有限公司 | Byte code instrumentation method, device and storage medium based on program analysis result |
CN112069061A (en) * | 2020-08-19 | 2020-12-11 | 北京科技大学 | Software security vulnerability detection method and system for deep learning gradient guidance variation |
CN114546816A (en) * | 2020-11-25 | 2022-05-27 | 腾讯科技(深圳)有限公司 | Test method, test platform, test device, electronic equipment and storage medium |
CN112905493B (en) * | 2021-04-07 | 2023-07-18 | 南京大学 | Structured fuzzy test method based on conversion test |
CN112905493A (en) * | 2021-04-07 | 2021-06-04 | 南京大学 | Structured fuzzy test method based on conversion test |
CN113434386A (en) * | 2021-05-26 | 2021-09-24 | 深圳开源互联网安全技术有限公司 | Method, system and storage medium for fuzz testing |
CN113688036A (en) * | 2021-08-13 | 2021-11-23 | 北京灵汐科技有限公司 | Data processing method, device, equipment and storage medium |
CN114064506A (en) * | 2021-11-29 | 2022-02-18 | 电子科技大学 | Binary program fuzzy test method and system based on deep neural network |
CN114491424A (en) * | 2021-12-31 | 2022-05-13 | 西安电子科技大学 | Binary code clipping method based on fuzzy test |
CN114491424B (en) * | 2021-12-31 | 2024-05-03 | 西安电子科技大学 | Binary code clipping method based on fuzzy test |
Also Published As
Publication number | Publication date |
---|---|
CN109885479B (en) | 2022-02-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109885479A (en) | Software obfuscation test method and device based on path record truncation | |
Harer et al. | Automated software vulnerability detection with machine learning | |
CN102339252B (en) | Static state detecting system based on XML (Extensive Makeup Language) middle model and defect mode matching | |
Rabin et al. | Understanding neural code intelligence through program simplification | |
CN111459799B (en) | Software defect detection model establishing and detecting method and system based on Github | |
Shen et al. | A survey of automatic software vulnerability detection, program repair, and defect prediction techniques | |
CN105787367B (en) | A kind of the patch safety detecting method and system of software upgrading | |
CN106663003A (en) | Systems and methods for software analysis | |
Ding et al. | VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements | |
CN112307473A (en) | Malicious JavaScript code detection model based on Bi-LSTM network and attention mechanism | |
CN116049831A (en) | Software vulnerability detection method based on static analysis and dynamic analysis | |
CN111475820A (en) | Binary vulnerability detection method and system based on executable program and storage medium | |
CN112256271B (en) | Block chain intelligent contract safety detection system based on static analysis | |
CN108563561B (en) | Program implicit constraint extraction method and system | |
CN112131122B (en) | Method and device for source code defect detection tool misinformation evaluation | |
CN114911711A (en) | Code defect analysis method and device, electronic equipment and storage medium | |
CN105487983B (en) | Sensitive spot approach method based on intelligent Route guiding | |
CN115269427A (en) | Intermediate language representation method and system for WEB injection vulnerability | |
Ahmed et al. | Synfix: Automatically fixing syntax errors using compiler diagnostics | |
CN114385491B (en) | JS translator defect detection method based on deep learning | |
Wang et al. | NLP-EYE: Detecting Memory Corruptions via Semantic-Aware Memory Operation Function Identification | |
Ahmed et al. | Learning to find usages of library functions in optimized binaries | |
CN115935369A (en) | Method for evaluating source code using numeric array representation of source code elements | |
CN110286912A (en) | Code detection method, device and electronic equipment | |
Wu et al. | Code vulnerability detection based on deep sequence and graph models: A survey |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||