CN109885479A - Software fuzz testing method and device based on path record truncation - Google Patents


Info

Publication number
CN109885479A
Authority
CN
China
Prior art keywords
code
path
truncation
label
instrumentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910012433.1A
Other languages
Chinese (zh)
Other versions
CN109885479B (en)
Inventor
宋晓斌
柳晓龙
王允超
武泽慧
魏强
曹琰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN201910012433.1A
Publication of CN109885479A
Application granted
Publication of CN109885479B
Legal status: Active
Anticipated expiration

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention belongs to the field of software testing technology, and in particular relates to a software fuzz testing method and device based on path record truncation. The method comprises: building a project data set and extracting conditional code structures to obtain the model input data for a classification model of low-frequency path jump condition code structures, and training the model, wherein the classification model uses an LSTM network architecture; adding a path truncation flag and a flag check instruction to the instrumentation code of the fuzzer; and, for a program under test, extracting conditional code structures and obtaining model input data, passing the data to the trained classification model to identify low-frequency path jump condition code structures, performing source-level instrumentation at the corresponding positions in the source files, and performing path truncation according to the path truncation flag to complete fuzz testing. The invention identifies low-frequency path jump condition code before the program executes and uses a path truncation strategy to discard high-frequency path test samples, improving fuzz testing efficiency and coverage, and has strong prospects for engineering application.

Description

Software fuzz testing method and device based on path record truncation
Technical field
The invention belongs to the field of software testing technology, and in particular relates to a software fuzz testing method and device based on path record truncation.
Background
Fuzz testing (fuzzing) is an automated software testing technique that supplies semi-valid data as input to the program under test and monitors whether the program behaves abnormally. Because it is simple and efficient, it is widely used by major software vendors and in open-source software testing, and it has uncovered a large number of vulnerabilities in all kinds of software. However, with the widespread use of software security testing tools and growing awareness of secure coding, vulnerabilities now tend to appear where the code structure is more complex. Existing fuzz testing is effective at finding vulnerabilities in structurally simple code, but it often fails to trigger exceptions in complex code. The cause is that most test samples execute the same high-frequency paths, so low-frequency paths are rarely explored.
To address this problem, researchers have combined other vulnerability analysis techniques with fuzz testing and proposed a series of fuzz testing methods. These fall mainly into three groups: methods based on symbolic execution, methods based on taint analysis, and methods based on static analysis. Fuzz testing based on symbolic execution balances fuzzing with selective concolic execution in order to find deeper bugs: selective concolic execution is applied to paths that the fuzzer judges to be more valuable but cannot get past. Combining lightweight fuzzing with concolic execution avoids both the path explosion inherent in symbolic execution and the incompleteness of fuzzing. Fuzz testing based on taint analysis uses dynamic taint analysis to determine which bytes of a test sample, when mutated, are more likely to reach unexplored code; mutation is then targeted at those bytes, producing better input samples that reach deep code. Fuzz testing based on static analysis uses static program analysis to adjust how much attention each seed receives and uses edge coverage information to optimize seed ordering and selection, which raises the probability of testing low-frequency paths and improves code coverage. Although these three kinds of methods raise the probability of testing low-frequency paths by different means, samples on high-frequency paths are still tested, so the gain in low-frequency path testing probability is limited and the improvement in overall testing efficiency is not significant.
Summary of the invention
To this end, the invention provides a software fuzz testing method and device based on path record truncation, which improves the efficiency and coverage of testing deep code and has strong prospects for engineering application.
According to the design provided by the invention, a software fuzz testing method based on path record truncation comprises the following:
A) building a project data set and extracting conditional code structures; after the extracted conditional code structures are processed into model input data, they are used as the input of a classification model of low-frequency path jump condition code structures for model training, wherein the classification model uses an LSTM network architecture;
B) adding a path truncation flag and a flag check instruction to the instrumentation code of the fuzzer;
C) for a program under test, extracting conditional code structures and processing them into model input data; the processed data is fed to the trained classification model to identify low-frequency path jump condition code structures; source-level instrumentation is performed at the corresponding positions in the source files, and path truncation is performed according to the path truncation flag to complete fuzz testing of the source code.
In the above, when extracting conditional code structures in A), a symbolic description of the source code data set is first defined and all conditional code structures in the data set are extracted; the extracted conditional code structures are then parsed, and labeling is applied to the parsing result to obtain code token sequences; the token sequences are converted to vectors to obtain the input data of the classification model.
Preferably, in A), extracting all conditional code structures in the data set comprises: first preprocessing the source code data set to extract the effective code; then extracting the set of conditional structures from the effective code by building a stack-like structure, while recording the correspondence between each code segment and its position in the source code data set; and iterating over nested conditional structures using the stack-like structure to obtain minimal code snippets.
Preferably, in A), code parsing comprises: extracting code sequences using an abstract syntax tree and symbolizing them; and grouping words in the code sequences by meaning to obtain synonym sets, which are used to expand the code sequences.
Preferably, in A), converting the token sequences to vectors comprises: performing text vectorization using word2vec, with the feature vector dimension and word frequency parameters set, to obtain a word vector model; obtaining a word index dictionary and a word vector dictionary from the word vector model; and obtaining the input data of the classification model from the word index dictionary and the word vector dictionary.
In the above, in B), adding the path truncation flag and the flag check instruction comprises: inserting the path truncation flag into the source code, the flag being defined in the bss section; and checking the path truncation flag at the entry of the original instrumentation.
Preferably, in C), for the program under test, the error handling code segments identified by the trained classification model are instrumented using inner and outer instrumentation, in combination with the index of conditional code structures and their positions in the source code data set.
Further, in C), where the path truncation flag is set after the instrumentation code, the original instrumentation of the current condition is cancelled: a no-op instruction is inserted and a comment marker is added in the instruction's comment.
Further, in C), to handle the case where compiler optimization compiles the same conditional statement differently in different source files, instrumentation is also cancelled at the label branch.
A software fuzz testing device based on path record truncation comprises a training module, a marking module and a testing module, wherein
the training module is used to build a project data set and extract conditional code structures; after the extracted conditional code structures are processed into model input data, they are used as the input of a classification model of low-frequency path jump condition code structures for model training, wherein the classification model uses an LSTM network architecture;
the marking module is used to add a path truncation flag and a flag check instruction to the instrumentation code of the fuzzer;
the testing module is used, for a program under test, to extract conditional code structures and process them into model input data, feed the processed data to the trained classification model to identify low-frequency path jump condition code structures, perform source-level instrumentation at the corresponding positions in the source files, and perform path truncation according to the path truncation flag to complete fuzz testing of the source code.
Beneficial effects of the invention:
1. To address the impact of high-frequency paths on fuzz testing efficiency, the invention uses a deep learning neural network to identify low-frequency path jump condition code, performs source-level instrumentation on the jump condition code structures identified by the trained model, and truncates path recording according to the inserted flag code. This raises the probability that low-frequency samples are mutated and ultimately improves fuzz testing efficiency.
2. The invention identifies low-frequency path jump condition code before the program is actually executed and uses a path truncation strategy to discard high-frequency path test samples, filling the gap in analyzing the impact of high-frequency path samples. It does not rely on complex dynamic analysis and introduces no extra overhead, can be combined effectively with other grey-box testing techniques to further raise coverage on top of existing testing tools, and is of guiding significance for the development of software testing technology.
Description of the drawings:
Fig. 1 is a flowchart of the software fuzz testing method in an embodiment;
Fig. 2 is a schematic diagram of the software fuzz testing principle in an embodiment;
Fig. 3 is a schematic diagram of building the classification model by combining word2vec with an LSTM neural network model in an embodiment;
Fig. 4 is a schematic diagram of the software fuzz testing device in an embodiment.
Detailed description:
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and technical solutions.
To address the limited low-frequency path testing probability and limited overall testing efficiency of existing software fuzz testing, an embodiment of the invention, shown in Fig. 1, provides a software fuzz testing method based on path record truncation, comprising:
S101, building a project data set and extracting conditional code structures; after the extracted conditional code structures are processed into model input data, they are used as the input of a classification model of low-frequency path jump condition code structures for model training, wherein the classification model uses an LSTM network architecture;
S102, adding a path truncation flag and a flag check instruction to the instrumentation code of the fuzzer;
S103, for a program under test, extracting conditional code structures and processing them into model input data; the processed data is fed to the trained classification model to identify low-frequency path jump condition code structures; source-level instrumentation is performed at the corresponding positions in the source files, and path truncation is performed according to the path truncation flag to complete fuzz testing of the source code.
As shown in Fig. 2, a data set is built, all conditional code structures in the data set are extracted, and the code structures are parsed; the parsing result is preprocessed to extract code token sequences, which are converted to vectors; labels are generated for the code, and the vectors are used as the input of the LSTM neural network model to train the classification model of low-frequency path jump condition code structures; a path truncation flag and a flag check instruction are added to the fuzzer's instrumentation code; conditional code structures are extracted from the program under test and the low-frequency path jump condition code structures among them are identified; lightweight source-level instrumentation is performed at the corresponding positions in the source files, and testing of the program under test begins.
Most input-parsing programs contain a large number of format checks, together with error handling structures that deal with failed checks, and this kind of structure usually makes coverage hard to raise. Error handling code is therefore the most representative kind of low-frequency path jump condition code; it is the code segment executed when program input causes the program to exit abnormally for any of a variety of reasons. In another embodiment of the invention, when extracting conditional code structures, a symbolic description of the source code data set is first defined and all conditional code structures in the data set are extracted; the extracted conditional code structures are then parsed, and labeling is applied to the parsing result to obtain code token sequences; the token sequences are converted to vectors to obtain the input data of the classification model. Preferably, extracting all conditional code structures in the data set comprises: first preprocessing the source code data set to extract the effective code; then extracting the set of conditional structures from the effective code by building a stack-like structure, while recording the correspondence between each code segment and its position in the source code data set; and iterating over nested conditional structures using the stack-like structure to obtain minimal code snippets. Specifically, the symbols used are defined in Table 1.
Table 1. Symbol descriptions
First, S_o is preprocessed to remove unnecessary information such as code comments and line breaks. R is used here to extract the effective code, giving S_n. I_e is then extracted from the processed S_n. A method based on bracket-stack balance is proposed here, i.e., C_l ≡ 0, so a stack-like structure can be built: when B_l is recognized, C_s is incremented by 1; when B_r is recognized, C_s is decremented by 1; when C_s = 0, I_t is added to I_e. The correspondence between each code segment and its position in the source file is recorded at the same time for the later instrumentation step. However, this extraction can yield I_n, and I_n leads to misjudgment: assuming […], if I_n ∈ E is concluded, then […], which is a contradiction. I_n therefore needs to be processed iteratively to ensure that the extracted code snippets are minimal: the I_e obtained from the first extraction is extracted again in the same way until every structure is I_r.
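The bracket-balance extraction described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes that B_l and B_r denote the opening and closing braces and that C_s is the running brace counter, and the function names (strip_comments, extract_if_blocks, innermost) are hypothetical. Treating "minimal" as "contains no other extracted if block" is one possible reading of the iterative step.

```python
import re

def strip_comments(src: str) -> str:
    """Remove C block and line comments (string literals are not handled)."""
    src = re.sub(r"/\*.*?\*/", "", src, flags=re.S)
    return re.sub(r"//[^\n]*", "", src)

def extract_if_blocks(src: str):
    """Return (offset, text) for every braced `if (...) {...}` block.

    A brace counter plays the role of the stack: +1 on '{', -1 on '}',
    and the block is complete when the counter returns to 0.
    """
    blocks = []
    for m in re.finditer(r"\bif\s*\(", src):
        # Balance the parentheses of the condition.
        i, depth = m.end(), 1
        while i < len(src) and depth:
            depth += {"(": 1, ")": -1}.get(src[i], 0)
            i += 1
        while i < len(src) and src[i].isspace():
            i += 1
        if i >= len(src) or src[i] != "{":
            continue  # unbraced bodies are skipped in this sketch
        count, j = 0, i
        while j < len(src):
            if src[j] == "{":
                count += 1
            elif src[j] == "}":
                count -= 1
                if count == 0:
                    break
            j += 1
        if count == 0:
            # Keep the offset so the block can be located in the file later.
            blocks.append((m.start(), src[m.start(): j + 1]))
    return blocks

def innermost(blocks):
    """Keep only blocks that contain no other extracted block."""
    spans = [(off, off + len(text)) for off, text in blocks]
    return [b for b, (s, e) in zip(blocks, spans)
            if not any(s < s2 < e for s2, _ in spans)]
```

Calling extract_if_blocks(strip_comments(source)) returns nested blocks as separate entries together with their offsets, which corresponds to the position index that the later instrumentation step relies on.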
In a further embodiment of the invention, code parsing comprises: extracting code sequences using an abstract syntax tree and symbolizing them; and grouping words in the code sequences by meaning to obtain synonym sets, which are used to expand the code sequences.
The extracted code segments are parsed into word sequences, and all segments are parsed into sequences of equal length so that they can be fed to the LSTM model. In the parsing stage, code sequences are extracted using an abstract syntax tree (AST) and symbolized at the same time: for example, an integer is represented as num and a string as str. In this classification task, however, string content matters. Most error handling code snippets share one feature: if the snippet contains a string, the string usually contains words such as error or fail or words close to them in meaning. Strings are therefore symbolized in two ways, depending on whether they contain such keywords, as errstr or str. A preliminary analysis yields some error description vocabulary, but only a limited amount. An embodiment of the invention can use WordNet, an English lexical database built and maintained by Princeton University, in which words are grouped by meaning and each group of interchangeable words forms a synonym set (synset); these synsets can be used to expand the error description vocabulary. Because a supervised learning method is used, every sample must be labeled: a code segment is labeled 1 if it is an error handling code segment and 0 otherwise. Labeling here is heuristic, based on five heuristics summarized from the analysis of a large number of source files: 1) the condition in if (...) usually contains a comparison; 2) if the segment contains a string, the string contains error description vocabulary; 3) the segment may contain a return or jump keyword, such as return or goto; 4) if the segment contains a function call, the function name usually contains error description vocabulary; 5) the segment may contain a system error macro, such as 'EPERM' or 'ENOENT'.
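A minimal Python sketch of the symbolization and heuristic labeling may make this step concrete. Several points are assumptions rather than statements of the source: a regular-expression tokenizer stands in for the AST-based parser, the ERROR_WORDS list is a hypothetical seed vocabulary (the source names only error and fail), and the rule that a segment is labeled 1 when at least two heuristics fire is invented for illustration, since the source does not say how the five heuristics are combined.

```python
import re

ERROR_WORDS = ("error", "fail", "invalid", "abort")  # hypothetical seed list
ERRNO_MACROS = ("EPERM", "ENOENT")                   # examples given in the source

STRING_RE = r'"(?:\\.|[^"\\])*"'
# Strings, integers, identifiers, then single punctuation characters
# (multi-character operators such as == are split; fine for a sketch).
TOKEN_RE = re.compile(STRING_RE + r"|\d+|[A-Za-z_]\w*|[^\s\w]")

def symbolize(code: str):
    """Replace literals with the symbols num, str and errstr."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok.startswith('"'):
            low = tok.lower()
            tokens.append("errstr" if any(w in low for w in ERROR_WORDS) else "str")
        elif tok.isdigit():
            tokens.append("num")
        else:
            tokens.append(tok)
    return tokens

def heuristics(code: str):
    """Evaluate the five heuristics listed in the text on one code segment."""
    strings = re.findall(STRING_RE, code)
    calls = re.findall(r"\b([A-Za-z_]\w*)\s*\(", code)
    cond = re.search(r"\bif\s*\((.*?)\)\s*\{", code, flags=re.S)
    return {
        "comparison": bool(cond and re.search(r"[<>]|==|!=", cond.group(1))),
        "error_string": any(w in s.lower() for s in strings for w in ERROR_WORDS),
        "return_or_goto": bool(re.search(r"\b(return|goto)\b", code)),
        "error_call": any(w in c.lower() for c in calls for w in ERROR_WORDS),
        "errno_macro": any(re.search(rf"\b{m}\b", code) for m in ERRNO_MACROS),
    }

def label(code: str, threshold: int = 2) -> int:
    """Assumed combination rule: 1 if at least `threshold` heuristics fire."""
    return int(sum(heuristics(code).values()) >= threshold)
```

The vocabulary could be widened with WordNet synsets as the text suggests; that lookup is left out here to keep the sketch free of external dependencies.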
In another embodiment of the invention, converting the token sequences to vectors comprises: performing text vectorization using word2vec, with the feature vector dimension and word frequency parameters set, to obtain a word vector model; obtaining a word index dictionary and a word vector dictionary from the word vector model; and obtaining the input data of the classification model from the word index dictionary and the word vector dictionary.
The token sequences obtained are used as input for vectorization. word2vec, a tool widely used for text vectorization, performs the conversion; a word vector model is obtained by setting the feature vector dimension and word frequency parameters, and a word index dictionary and a word vector dictionary built from this model serve as the input of the subsequent LSTM model. As shown in Fig. 3, different code segments contain different numbers of tokens, but the LSTM accepts only inputs of equal length, so padding and truncation are required. Once the vectorized code segments and their labels are available, training of the LSTM network can begin. In addition to the necessary basic layers such as the Embedding layer and the LSTM units, a Dropout layer is added to the model to prevent overfitting.
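The index dictionary and the padding and truncation step can be illustrated with plain Python. This sketch deliberately omits word2vec training and the LSTM itself; the reserved indices PAD and UNK, the min_count parameter name and the choice to pad at the end of the sequence are assumptions made for the example, not details given in the source.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved indices (an assumption of this sketch)

def build_index(sequences, min_count=1):
    """Map each token seen at least `min_count` times to an integer index."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = sorted(t for t, c in counts.items() if c >= min_count)
    return {tok: i + 2 for i, tok in enumerate(vocab)}

def encode(seq, index, max_len):
    """Convert tokens to indices, then truncate or pad to exactly `max_len`."""
    ids = [index.get(tok, UNK) for tok in seq][:max_len]
    return ids + [PAD] * (max_len - len(ids))
```

In a full pipeline, each index would select a row of the word vector matrix produced by word2vec, and the fixed-length index sequences would be the input of the Embedding layer.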
The detection stage determines the type of a given unknown code segment; if a segment is error handling code, the file it belongs to and its position in the source file are output. Given an unknown project, detection proceeds as follows: 1. extract the candidate error handling code segment structures from every source file in the project and record the file name and position of each; 2. parse the code segments to obtain the corresponding code sequences; 3. vectorize the code sequences obtained in the previous step according to the rules, using the word2vec model trained earlier; 4. feed the resulting vectors to the trained LSTM network for classification.
Further, in another embodiment of the invention, adding the path truncation flag and the flag check instruction comprises: inserting the path truncation flag into the source code, the flag being defined in the bss section; and checking the path truncation flag at the entry of the original instrumentation.
Assembly code is embedded inline in the source code. A flag inserted in the source code provides the function of cancelling subsequent instrumentation records; the flag is named continue_log. It is defined in the bss section: because bss data is uninitialized data and this memory is zeroed before each run, the flag can be set to 1 to indicate that subsequent basic blocks are no longer recorded. The continue_log flag is queried at the entry of the original instrumentation; if it is 1, control jumps to the return, cancelling the record for this basic block. The flag stays at 1 for the rest of the run, so subsequent basic blocks are not recorded either. That is, if the original path is 1 → 2 → 3 → ... and block 3 contains the flag, the path recorded this time is 1 → 2. When the next round of testing starts, the flag is reset to zero and the execution path is recorded normally.
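The effect of the continue_log flag on path recording can be modelled with a small Python simulation. This is a toy model under stated assumptions, not the inline-assembly implementation: the class and function names are hypothetical, basic blocks are plain integers, and the flag is assumed to be set before the recording call of the marked block so that the 1 → 2 → 3 example from the text is reproduced.

```python
class CoverageRecorder:
    """Toy model of fuzzer instrumentation with a path truncation flag."""

    def __init__(self):
        self.continue_log = 0  # in the source this lives in .bss, zeroed per run
        self.trace = []

    def start_run(self):
        # Each new execution starts with a cleared flag and an empty trace.
        self.continue_log = 0
        self.trace = []

    def set_truncation(self):
        # Stands in for the marker inserted into the source code.
        self.continue_log = 1

    def on_basic_block(self, block_id):
        # Flag check at the entry of the original instrumentation:
        # once the flag is 1, nothing more is recorded in this run.
        if self.continue_log == 1:
            return
        self.trace.append(block_id)


def run(recorder, path, truncate_at):
    """Execute one path; blocks listed in `truncate_at` carry the marker."""
    recorder.start_run()
    for block in path:
        if block in truncate_at:
            recorder.set_truncation()
        recorder.on_basic_block(block)
    return list(recorder.trace)
```

With the marker at block 3, the path 1 → 2 → 3 → 4 is recorded as 1 → 2, while a later run that never reaches block 3 is recorded in full.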
The error handling code segments identified by the error code classification model trained earlier with the LSTM network are instrumented, in combination with the index of if-else structures and positions in the source files. Inner and outer instrumentation is used because an if statement is compiled into a conditional jump instruction and the fuzzer's instrumentation relies precisely on conditional jump instructions. Source-level instrumentation before the if statement, i.e., outside the if structure, therefore determines whether subsequent basic blocks are instrumented. Source-level instrumentation before the first statement of the if structure, i.e., inside the if structure, determines how subsequent instrumented basic blocks are recorded: the first statement is the start of the jump-target basic block and conditional jump instructions may still follow, so the first statement determines the record result of all subsequent basic blocks on the path through that basic block. There are three main cases:
In the first case, the error handling code is inside an if structure; instrumentation is needed before the if statement and before the first statement inside the if structure.
In the second case, the error handling code is inside an else structure; instrumentation is needed before the first statement inside the else structure and before the if statement adjacent to the else structure. If the preceding structure is an else if structure, it must first be converted to an if structure, and instrumentation is then performed before that if.
In the third case, the error handling code is inside an else if structure; after the same structure conversion, instrumentation is performed before the if statement and before the first statement inside the else if structure. Because instrumenting an else if structure breaks the original code structure, the code must be repaired to ensure that it still compiles and runs correctly. In an embodiment of the invention, the source code structure repair algorithm can be designed as follows:
With the above algorithm, the program compiles and runs correctly.
For the program under test, the error handling code segments identified by the trained classification model are instrumented using inner and outer instrumentation, in combination with the index of conditional code structures and their positions in the source code data set. Preferably, where the path truncation flag is set after the instrumentation code, the original instrumentation of the current condition is cancelled: a no-op instruction is inserted and a comment marker is added in the instruction's comment. Preferably, to handle the case where compiler optimization compiles the same conditional statement differently in different source files, instrumentation is also cancelled at the label branch.
If the continue_log flag is set after the instrumentation code, the basic block is still recorded. Subsequent code blocks are no longer recorded because the flag has been set to 1, but the record of this basic block can still be treated as a new path, so the test sample is retained and the goal of targeted testing is not met. In this case the original instrumentation of the current conditional jump must be cancelled. So as not to affect the normal execution flow of the program, a no-op instruction (nop) is inserted and a marker is added in the instruction's comment. Because compiling source code to assembly code does not remove comments in inline assembly, the comment marker can be used for the decision. When a comment marker is encountered, an instrumentation flag is set; when a conditional jump instruction is later encountered, the instrumentation decision is made according to that flag and the flag is then cleared. This process repeats until all code has been instrumented, so that instrumentation is skipped for the code block that follows. Because of compiler optimization, the same if (i != 0) statement may be compiled to different assembly instructions, such as jz or jnz, in different source files, and the original instrumentation instruments only the fall-through (not-taken) side of the jump. Instrumentation is therefore also cancelled at the label branch to handle this case; although one valid basic block record is lost, the information about valid paths is not affected.
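The comment-marker mechanism can be sketched as a pass over assembly lines. The sketch below is a simplified illustration under assumptions: the marker text SKIP_NEXT_BRANCH and the trampoline name __log_block are hypothetical, only the conditional-jump case is handled, and the additional cancellation at branch target labels described in the text is left out.

```python
import re

MARKER = "SKIP_NEXT_BRANCH"  # hypothetical comment tag placed after the nop
# Conditional jumps such as je, jne, jz, jnz; the unconditional jmp is excluded.
COND_JUMP = re.compile(r"^\s*j(?!mp\b)[a-z]+\s")
TRAMPOLINE = "\tcall __log_block  # instrumentation (hypothetical name)"

def instrument(asm_lines):
    """Insert the logging call after each conditional jump, except the
    first conditional jump that follows a marker comment."""
    out, skip = [], False
    for line in asm_lines:
        out.append(line)
        if MARKER in line:
            skip = True          # remember the marker until the next branch
        elif COND_JUMP.match(line):
            if not skip:
                out.append(TRAMPOLINE)
            skip = False         # the marker applies to one branch only
    return out
```

Clearing the skip flag after a single conditional jump keeps the cancellation local to the marked condition, so later branches are instrumented as usual.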
Based on the above software fuzz testing method, an embodiment of the invention further provides a software fuzz testing device based on path record truncation which, as shown in Fig. 4, comprises a training module 101, a marking module 102 and a testing module 103, wherein
the training module 101 is used to build a project data set and extract conditional code structures; after the extracted conditional code structures are processed into model input data, they are used as the input of a classification model of low-frequency path jump condition code structures for model training, wherein the classification model uses an LSTM network architecture;
the marking module 102 is used to add a path truncation flag and a flag check instruction to the instrumentation code of the fuzzer;
the testing module 103 is used, for a program under test, to extract conditional code structures and process them into model input data, feed the processed data to the trained classification model to identify low-frequency path jump condition code structures, perform source-level instrumentation at the corresponding positions in the source files, and perform path truncation according to the path truncation flag to complete fuzz testing of the source code.
Unless specifically stated otherwise, the opposite step of the component and step that otherwise illustrate in these embodiments, digital table It is not limit the scope of the invention up to formula and numerical value.
Based on above-mentioned method, the embodiment of the present invention also provides a kind of server, comprising: one or more processors;It deposits Storage device, for storing one or more programs, when one or more of programs are executed by one or more of processors, So that one or more of processors realize above-mentioned method.
Based on above-mentioned method, the embodiment of the present invention also provides a kind of computer-readable medium, is stored thereon with computer Program, wherein the program realizes above-mentioned method when being executed by processor.
The technical effect and preceding method embodiment phase of device provided by the embodiment of the present invention, realization principle and generation Together, to briefly describe, Installation practice part does not refer to place, can refer to corresponding contents in preceding method embodiment.
It is apparent to those skilled in the art that for convenience and simplicity of description, the system of foregoing description It with the specific work process of device, can refer to corresponding processes in the foregoing method embodiment, details are not described herein.
In all examples being illustrated and described herein, any occurrence should be construed as merely illustratively, without It is as limitation, therefore, other examples of exemplary embodiment can have different values.
It should also be noted that similar label and letter indicate similar terms in following attached drawing, therefore, once a certain Xiang Yi It is defined in a attached drawing, does not then need that it is further defined and explained in subsequent attached drawing.
The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to multiple embodiments of the invention. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through communication interfaces, devices, or units, and may be electrical, mechanical, or of another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented as software functional units and sold or used as an independent product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments described above are only specific implementations of the invention, intended to illustrate its technical solution rather than to limit it, and the scope of protection of the invention is not limited to them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention, and shall all be covered by the scope of protection of the invention. Therefore, the scope of protection of the invention shall be subject to the scope of protection of the claims.

Claims (10)

1. A software fuzz testing method based on path record truncation, characterized by comprising the following:
A) constructing a project data set and extracting conditional code structures; after the extracted conditional code structures are processed into model input data, using them as the input of a classification model for low-frequency path jump conditional code structures and training the model, wherein the classification model uses an LSTM network architecture;
B) adding a path truncation flag and a flag-check instruction to the instrumentation code of the fuzzer;
C) for a program under test, extracting conditional code structures and processing them into model input data, feeding the processed data to the trained classification model to identify low-frequency path jump conditional code structures, performing source-level instrumentation at the corresponding positions in the source file, and carrying out path truncation according to the path truncation flag to complete fuzz testing of the source code.
2. The software fuzz testing method based on path record truncation according to claim 1, characterized in that, in A), extracting the conditional code structures comprises: first defining a symbolic description for the source code data set and extracting all conditional code structures in the data set; then parsing the extracted conditional code structures and labeling the parsing results to obtain a code token sequence; and converting the token sequence into vectors to obtain the classification model input data.
3. The software fuzz testing method based on path record truncation according to claim 2, characterized in that, in A), extracting all conditional code structures in the data set comprises: first preprocessing the source code data set to extract valid code; then extracting the set of conditional structures from the valid code and constructing a stack-like structure, while recording the correspondence between each code segment and its position in the source code data set; and iteratively processing the set of nested conditional structures according to the stack-like structure to obtain minimal code fragments.
4. The software fuzz testing method based on path record truncation according to claim 2, characterized in that, in A), the code parsing process comprises: extracting the code sequence by means of an abstract syntax tree and symbolizing it, while grouping the entries in the code sequence by meaning to obtain a synonym set and expand the code sequence.
5. The software fuzz testing method based on path record truncation according to claim 2, characterized in that, in A), converting the token sequence into vectors comprises: performing text vector conversion with word2vec, setting the feature vector dimension and word frequency parameters, and outputting a word vector model; obtaining a word index dictionary and a word vector dictionary from the word vector model; and obtaining the classification model input data from the word index dictionary and the word vector dictionary.
6. The software fuzz testing method based on path record truncation according to claim 1, characterized in that, in B), adding the path truncation flag and the flag-check instruction comprises: inserting the path truncation flag into the source code data set, the path truncation flag being defined in the .bss section of the code; and checking the path truncation flag at the entry of the original instrumentation.
7. The software fuzz testing method based on path record truncation according to claim 6, characterized in that, in C), for the program under test, error-handling code segments are identified by the classification model obtained in training, and, in combination with the conditional code structures and position indexes in the source code data set, the error-handling code segments are instrumented using inner and outer instrumentation.
8. The software fuzz testing method based on path record truncation according to claim 7, characterized in that, in C), for the case where the path truncation flag follows the instrumentation code, the original instrumentation of the current condition is cancelled, instrumentation is performed with a no-op instruction, and a comment marker is added at the instruction comment.
9. The software fuzz testing method based on path record truncation according to claim 7, characterized in that, in C), for the compile-time optimization problem of the same conditional statement in different source code data sets, instrumentation is set to be cancelled at the flagged branch.
10. A software fuzz testing device based on path record truncation, characterized by comprising a training module, a marking module, and a test module, wherein:
the training module is configured to construct a project data set and extract conditional code structures, and, after the extracted conditional code structures are processed into model input data, to use them as the input of a classification model for low-frequency path jump conditional code structures and train the model, wherein the classification model uses an LSTM network architecture;
the marking module is configured to add a path truncation flag and a flag-check instruction to the instrumentation code of the fuzzer;
the test module is configured to, for a program under test, extract conditional code structures and process them into model input data, feed the processed data to the trained classification model to identify low-frequency path jump conditional code structures, perform source-level instrumentation at the corresponding positions in the source file, and carry out path truncation according to the path truncation flag to complete fuzz testing of the source code.
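The preprocessing described in claims 2 to 5 (stack-based extraction of nested conditional structures, symbolization into a token sequence, and construction of a word index dictionary) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: it handles brace-delimited C-like source with balanced braces and no braces inside strings or comments, uses a regular-expression tokenizer instead of an abstract syntax tree, and omits the word2vec embedding and synonym grouping. All function names are hypothetical.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|&&|\|\||[^\s\w]")
KEYWORDS = {"if", "else", "return", "while", "for", "NULL"}


def extract_conditionals(source):
    """Return (start_line, end_line, text) for each `if` block, innermost first."""
    lines = source.splitlines()
    stack = []      # one entry per open brace: start line of an `if`, else None
    fragments = []
    for lineno, line in enumerate(lines, start=1):
        opens_if = re.search(r"\bif\b", line) is not None
        for ch in line:
            if ch == "{":
                stack.append(lineno if opens_if else None)
                opens_if = False
            elif ch == "}":
                start = stack.pop()
                if start is not None:
                    text = "\n".join(lines[start - 1:lineno])
                    fragments.append((start, lineno, text))
    return fragments


def symbolize(fragment):
    """Map identifiers to ID0, ID1, ... and numbers to NUM; keep the rest."""
    names, tokens = {}, []
    for tok in TOKEN_RE.findall(fragment):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isdigit():
            tokens.append("NUM")
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append(names.setdefault(tok, f"ID{len(names)}"))
        else:
            tokens.append(tok)
    return tokens


def build_index(sequences, min_count=1):
    """Word index dictionary; index 0 is reserved for padding or unknown tokens."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = sorted(tok for tok, n in counts.items() if n >= min_count)
    return {tok: i + 1 for i, tok in enumerate(vocab)}


SAMPLE = """\
int parse(char *buf, int len) {
    if (buf == NULL) {
        return -1;
    }
    if (len > 4) {
        if (buf[0] == 'F') {
            return 1;
        }
    }
    return 0;
}
"""

fragments = extract_conditionals(SAMPLE)
sequences = [symbolize(text) for _, _, text in fragments]
index = build_index(sequences)
encoded = [[index.get(tok, 0) for tok in seq] for seq in sequences]
print([(start, end) for start, end, _ in fragments])
print(sequences[0])
```

Because inner blocks close before their enclosing blocks, the stack yields the innermost (minimal) fragments first, and the recorded line ranges keep the link back to the source positions needed later for instrumentation. The integer sequences in `encoded` are the kind of input an embedding layer in front of an LSTM classifier would consume.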
CN201910012433.1A 2019-01-07 2019-01-07 Software fuzzy test method and device based on path record truncation Active CN109885479B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910012433.1A CN109885479B (en) 2019-01-07 2019-01-07 Software fuzzy test method and device based on path record truncation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910012433.1A CN109885479B (en) 2019-01-07 2019-01-07 Software fuzzy test method and device based on path record truncation

Publications (2)

Publication Number Publication Date
CN109885479A true CN109885479A (en) 2019-06-14
CN109885479B CN109885479B (en) 2022-02-01

Family

ID=66925678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910012433.1A Active CN109885479B (en) 2019-01-07 2019-01-07 Software fuzzy test method and device based on path record truncation

Country Status (1)

Country Link
CN (1) CN109885479B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102385550A (en) * 2010-08-30 2012-03-21 北京理工大学 Detection method for software vulnerability
CN109032942A (en) * 2018-07-24 2018-12-18 北京理工大学 A kind of fuzz testing frame based on AFL
CN109117367A (en) * 2018-07-24 2019-01-01 北京理工大学 A kind of fuzz testing variation quantity determines method and apparatus

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BIN ZHANG 等: "S2F:Discover Hard-to-Reach Vulnerabilities by Semi-Symbolic Fuzz Testing", 《2017 13TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY》 *
CHENG LI 等: "RankFuzz: Fuzz Testing Based on Comprehensive Evaluation", 《2012 FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION NETWORKING AND SECURITY》 *
欧阳永基 等: "基于异常分布导向的智能Fuzzing方法", 《电子与信息学报》 *
王蕴君 等: "基于功能性测试的软件质量模糊综合评判", 《电子工程师》 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112306853A (en) * 2019-08-01 2021-02-02 深圳市腾讯计算机系统有限公司 Fuzzy test method, device, equipment and medium
CN112306853B (en) * 2019-08-01 2023-12-12 深圳市腾讯计算机系统有限公司 Fuzzy test method, device, equipment and medium
CN110851830A (en) * 2019-10-24 2020-02-28 中国人民解放军战略支援部队信息工程大学 CPU (Central processing Unit) -oriented undisclosed instruction discovery method based on instruction format identification
CN110851830B (en) * 2019-10-24 2021-08-03 中国人民解放军战略支援部队信息工程大学 CPU (Central processing Unit) -oriented undisclosed instruction discovery method based on instruction format identification
CN111191245A (en) * 2019-12-24 2020-05-22 中国人民解放军战略支援部队信息工程大学 Fuzzy test method based on path perception variation strategy
CN111191245B (en) * 2019-12-24 2022-06-17 中国人民解放军战略支援部队信息工程大学 Fuzzy test method based on path perception mutation strategy
CN111563040A (en) * 2020-05-08 2020-08-21 中国工商银行股份有限公司 Block chain intelligent contract code testing method and device
CN111563040B (en) * 2020-05-08 2023-08-15 中国工商银行股份有限公司 Block chain intelligent contract code testing method and device
CN111913878A (en) * 2020-07-13 2020-11-10 苏州洞察云信息技术有限公司 Program analysis result-based bytecode instrumentation method, device and storage medium
CN111913878B (en) * 2020-07-13 2023-09-15 苏州洞察云信息技术有限公司 Byte code instrumentation method, device and storage medium based on program analysis result
CN112069061A (en) * 2020-08-19 2020-12-11 北京科技大学 Software security vulnerability detection method and system for deep learning gradient guidance variation
CN114546816A (en) * 2020-11-25 2022-05-27 腾讯科技(深圳)有限公司 Test method, test platform, test device, electronic equipment and storage medium
CN112905493B (en) * 2021-04-07 2023-07-18 南京大学 Structured fuzzy test method based on conversion test
CN112905493A (en) * 2021-04-07 2021-06-04 南京大学 Structured fuzzy test method based on conversion test
CN113434386A (en) * 2021-05-26 2021-09-24 深圳开源互联网安全技术有限公司 Method, system and storage medium for fuzz testing
CN113688036A (en) * 2021-08-13 2021-11-23 北京灵汐科技有限公司 Data processing method, device, equipment and storage medium
CN114064506A (en) * 2021-11-29 2022-02-18 电子科技大学 Binary program fuzzy test method and system based on deep neural network
CN114491424A (en) * 2021-12-31 2022-05-13 西安电子科技大学 Binary code clipping method based on fuzzy test
CN114491424B (en) * 2021-12-31 2024-05-03 西安电子科技大学 Binary code clipping method based on fuzzy test

Also Published As

Publication number Publication date
CN109885479B (en) 2022-02-01

Similar Documents

Publication Publication Date Title
CN109885479A (en) Software obfuscation test method and device based on path record truncation
Harer et al. Automated software vulnerability detection with machine learning
CN102339252B (en) Static state detecting system based on XML (Extensive Makeup Language) middle model and defect mode matching
Rabin et al. Understanding neural code intelligence through program simplification
CN111459799B (en) Software defect detection model establishing and detecting method and system based on Github
Shen et al. A survey of automatic software vulnerability detection, program repair, and defect prediction techniques
CN105787367B (en) A kind of the patch safety detecting method and system of software upgrading
CN106663003A (en) Systems and methods for software analysis
Ding et al. VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements
CN112307473A (en) Malicious JavaScript code detection model based on Bi-LSTM network and attention mechanism
CN116049831A (en) Software vulnerability detection method based on static analysis and dynamic analysis
CN111475820A (en) Binary vulnerability detection method and system based on executable program and storage medium
CN112256271B (en) Block chain intelligent contract safety detection system based on static analysis
CN108563561B (en) Program implicit constraint extraction method and system
CN112131122B (en) Method and device for source code defect detection tool misinformation evaluation
CN114911711A (en) Code defect analysis method and device, electronic equipment and storage medium
CN105487983B (en) Sensitive spot approach method based on intelligent Route guiding
CN115269427A (en) Intermediate language representation method and system for WEB injection vulnerability
Ahmed et al. Synfix: Automatically fixing syntax errors using compiler diagnostics
CN114385491B (en) JS translator defect detection method based on deep learning
Wang et al. {NLP-EYE}: Detecting Memory Corruptions via {Semantic-Aware} Memory Operation Function Identification
Ahmed et al. Learning to find usages of library functions in optimized binaries
CN115935369A (en) Method for evaluating source code using numeric array representation of source code elements
CN110286912A (en) Code detection method, device and electronic equipment
Wu et al. Code vulnerability detection based on deep sequence and graph models: A survey

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant