CN107688748A - Vulnerable code clone detection method and device based on vulnerability fingerprints - Google Patents

Vulnerable code clone detection method and device based on vulnerability fingerprints

Info

Publication number
CN107688748A
Authority
CN
China
Prior art keywords
code
leak
fingerprint
fragility
bitmap
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710789364.6A
Other languages
Chinese (zh)
Other versions
CN107688748B (en
Inventor
魏强
刘臻
林超
麻荣宽
柳晓龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA Information Engineering University
Original Assignee
PLA Information Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PLA Information Engineering University
Priority to CN201710789364.6A
Publication of CN107688748A
Application granted
Publication of CN107688748B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Collating Specific Patterns (AREA)
  • Storage Device Security (AREA)

Abstract

The present invention relates to a vulnerable code clone detection method and device based on vulnerability fingerprints. The method comprises: collecting code samples and building a vulnerability database; selecting a vulnerability, querying its patch information, and obtaining vulnerable code samples; building a code parser; preprocessing the vulnerable code samples with the code parser to obtain a normalized intermediate representation; dividing the intermediate representation into code blocks of s lines, computing a feature value for each code block with a hash function, and combining the feature values into a vulnerability fingerprint; preprocessing the code under test with the code parser to obtain its feature value sequence; and mapping the vulnerability fingerprint to an n-bit bitmap, using the bitmap to identify whether the feature value sequence contains a clone of the vulnerable code, and outputting the result. The invention copes with the code modifications commonly found in code clones, strikes a better balance between detection efficiency and detection accuracy, and maintains good accuracy while efficiently scanning large-scale targets.

Description

Vulnerable code clone detection method and device based on vulnerability fingerprints
Technical field
The invention belongs to the technical field of computer software vulnerability discovery, and in particular relates to a vulnerable code clone detection method and device based on vulnerability fingerprints.
Background
Vulnerable code is the key code that gives rise to a software vulnerability, and cloning vulnerable code can introduce the same vulnerability during development. As Internet applications become ever more widespread, growing demand for software has created a need for efficient development, so reuse of existing components and code templates has become a routine development practice, and open source software (OSS) has become a good way to improve development efficiency and quality while reducing programming cost. However, the many vulnerabilities in OSS naturally propagate through code cloning into a large number of software vulnerabilities, which poses a serious threat to system security. The US company Black Duck has pointed out that about two thirds of commercial applications contain code with known vulnerabilities, and it predicted that the number of attacks based on vulnerabilities in open source code would increase by 20% in 2017. Research on detecting vulnerable code clones is therefore essential. Code cloning is usually accompanied by several kinds of code modification, such as comment changes, variable renaming, data type changes, operator changes, statement reordering, code block reordering, redundant code insertion, and equivalent transformation of control structures.
Roy et al. proposed a generally accepted classification of code clones, and we use the following definitions of modification levels in code cloning: Level 1 changes the layout of the code by modifying spaces and tabs and edits comments, while the code itself is not modified; Level 2 changes variable data types and function return types, and renames identifiers and variables; Level 3 adds or deletes some code statements and modifies expressions or function calls without changing the original function of the code; Level 4 restructures the code without changing its semantics, for example by reordering code blocks or applying equivalent transformations to control structures. A higher modification level directly increases the difficulty of clone detection. Existing work can detect Level 1 and Level 2 modifications effectively, whereas further modifications usually incur a larger computational overhead. In practice, however, Level 3 and Level 4 modifications are quite common.
As an important means of source-level vulnerability discovery, many vulnerable code clone detection methods have been proposed in China and abroad. For example, Jang et al. proposed ReDeBug, a system for quickly finding vulnerable code clones in operating-system code bases. It uses the code line as the detection granularity and looks up cloned code with a sliding window algorithm and a Bloom filter. ReDeBug has a clear advantage in speed and supports detection over large code bases, but it cannot cope with common modifications such as variable renaming or data type changes, and it shows high false negative and false positive rates. Li et al. proposed CLORIFI for buffer overflow vulnerabilities; it locates potential vulnerabilities by searching for clones of known vulnerable code with an n-token algorithm, and then verifies them with concolic testing to reduce false positives. CLORIFI improves detection accuracy, but the limitations of its clone detection algorithm lead to a high false negative rate, and detection consumes considerable resources. Gan Shuitao et al. proposed CVdetector, a vulnerable code clone detection method based on feature matrices. It traverses the key nodes of the syntax parse tree of a vulnerable code fragment to build a vulnerability feature matrix and feature vector, and uses a clustering algorithm to detect multiple types of vulnerability. Although the time overhead of this method is linear in the amount of code examined, there is still considerable room for improvement in efficiency. Kim et al. proposed VUDDY, a method for efficiently detecting function-level vulnerable code clones; it achieves high efficiency and scalability through function signature comparison and function length filtering, and can identify clones of known vulnerabilities with high accuracy. However, this method handles only simple code modifications and does not support common modifications such as statement reordering or redundant code insertion, which considerably limits its application scenarios.
Existing methods for detecting vulnerable code reuse usually convert the target code into an intermediate representation, such as a parse tree or control flow graph, and then find vulnerabilities caused by code cloning by matching identical structures or features against known cases. A complex intermediate representation may help to improve accuracy, but it also incurs a higher computational cost; a high level of abstraction may be efficient to work with, but it can lose the factors that make the code vulnerable. How to balance efficiency and accuracy at an acceptable cost, and how to handle the modifications common in code cloning effectively, is therefore of significant research interest.
Summary of the invention
To address the shortcomings of the prior art, the present invention provides a vulnerable code clone detection method and device based on vulnerability fingerprints. It addresses the high false negative rate, low detection efficiency, and limited applicability that existing vulnerable code clone detection exhibits when code has been modified to varying degrees. The vulnerable code samples are preprocessed and their features extracted to obtain the characteristics of the software vulnerability and generate a vulnerability fingerprint; the fingerprint is then used to identify and locate clones in the code under test. This copes with the various modifications found in code clones, maintains good accuracy while efficiently scanning large-scale targets, and gives the method a wider scope of application.
According to the design provided by the present invention, a vulnerable code clone detection method based on vulnerability fingerprints comprises the following steps:
Step 1) Select the vulnerability v for which a fingerprint is to be built, query its patch information from a public vulnerability information database, and obtain the corresponding vulnerable code samples from the patch;
Step 2) Build a code parser;
Step 3) Preprocess the vulnerable code samples with the code parser to obtain a normalized intermediate representation;
Step 4) Divide the intermediate representation into code blocks of s lines, compute the feature value of each code block with a hash function, and combine the feature values into the vulnerability fingerprint;
Step 5) Preprocess the code under test with the code parser, and compute its feature values with the hash function using a sliding window, obtaining its feature value sequence seq_f;
Step 6) Map the vulnerability fingerprint to an n-bit bitmap bitmap_v, and use bitmap_v to identify whether seq_f contains a clone of the vulnerable code; if so, record and output it.
In step 1), obtaining the corresponding vulnerable code samples from the patch specifically means: obtaining all diff files in the patch and using them as the vulnerable code samples.
In step 2), building the code parser means: using ANTLR to generate a C/C++ code parser from the C/C++ lexical and syntactic definitions.
Before the preprocessing in step 3), the code is first converted to lowercase; redundant spaces, tabs, newlines, and all comments are deleted; and the indentation style is changed to Lisp style.
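The layout cleanup described above can be illustrated with a minimal Python sketch. It is only an approximation under stated assumptions: it uses regular expressions rather than the ANTLR parser, it does not handle comment markers that appear inside string literals, it does not implement the Lisp-style indentation change, and the function name `strip_layout` is hypothetical.

```python
import re

def strip_layout(code: str) -> str:
    """Lowercase C-like code and remove comments and redundant whitespace."""
    # Remove /* ... */ block comments, then // line comments.
    # Simplification: comment markers inside string literals are not handled.
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.DOTALL)
    code = re.sub(r"//[^\n]*", " ", code)
    lines = []
    for line in code.lower().splitlines():
        line = " ".join(line.split())  # collapse runs of spaces and tabs
        if line:                       # drop lines that became empty
            lines.append(line)
    return "\n".join(lines)

sample = "int  Foo( int A ) {\t// entry\n\n  /* add one */\n  return A+1;\n}\n"
print(strip_layout(sample))
```

Line boundaries are kept here because the later steps divide the code into blocks of s lines.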
The preprocessing in step 3) comprises uniformly replacing the function and parameter names, variable identifiers, data types, string constants, and function call names in the code.
Specifically, the preprocessing in step 3) comprises: recording all formal parameter names from the function declaration and replacing each parameter in the function body with the symbol _PARAM, while replacing the function name in the function declaration with the symbol _FUNCDEC; replacing all variables defined in the function with the symbol _DATA; replacing all data types declared in the ISO C standard with the symbol _TYPE, and likewise all user-defined structure types in the code, while keeping the keyword struct in structure declarations; replacing string constants with the symbol _STR; and replacing the name of each called function with the symbol _FUNCTION, while keeping its usage and argument values.
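As a rough illustration of the symbol replacement, the sketch below abstracts a single already-lowercased line. It is not the ANTLR-based parser of the method: the parameter and variable names are passed in by hand (in the method they would come from the parser), the type list is a small hypothetical subset, and the _FUNCDEC replacement for function declarations is left out.

```python
import re

# Hypothetical subsets, for illustration only.
C_TYPES = {"int", "char", "long", "short", "float", "double", "void"}
KEYWORDS = {"if", "else", "for", "while", "return", "sizeof", "switch", "struct"}
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[a-z_]\w*|\d+|\S')

def abstract_line(line, params, local_vars):
    """Replace names in one lowercased line with the abstract symbols."""
    tokens = TOKEN.findall(line)
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok.startswith('"'):
            # Strings that contain a format specifier are kept as they are.
            out.append(tok if "%" in tok else "_STR")
        elif tok in C_TYPES:
            out.append("_TYPE")
        elif tok in params:
            out.append("_PARAM")
        elif tok in local_vars:
            out.append("_DATA")
        elif re.fullmatch(r"[a-z_]\w*", tok) and tok not in KEYWORDS and nxt == "(":
            out.append("_FUNCTION")   # an identifier followed by "(" is a call
        else:
            out.append(tok)           # operators, keywords, modifiers, numbers
    return " ".join(out)

print(abstract_line("char *p = strdup(name);", {"name"}, {"p"}))
```

Two lines that differ only in identifier names or basic types map to the same abstract line, which is what lets the later hashing step tolerate Level 2 modifications.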
Step 4) comprises the following:
Step 41) Determine the code changes in the vulnerability patch from the diff file, and select a code block size of s lines;
Step 42) In the intermediate representation obtained after preprocessing, denote each deleted s-line code block as block_D and each added s-line code block as block_A; when the deleted or added code is shorter than s lines it is padded to an s-line block, and longer changes are split into multiple consecutive code blocks, with the last block padded with context;
Step 43) Compute the feature value of each block_D and block_A with the hash function, prefixing each feature value with an addition mark or a deletion mark to distinguish the two; combine the feature values by type into the added feature sequence seq_A and the deleted feature sequence seq_D, forming the feature vector V_diff = [seq_A, seq_D] of one diff file;
Step 44) Perform the above steps for every diff file in the vulnerability patch to obtain the corresponding feature vectors V_diff1, V_diff2, …, V_diffn; for a vulnerability v whose patch contains n diff files, all the feature vectors together constitute the vulnerability fingerprint F_v = {V_diff1, V_diff2, …, V_diffn}.
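Steps 43) and 44) can be sketched in Python as follows. This is a sketch under assumptions rather than the patented implementation: the blocks are assumed to be already normalized and cut to s lines, the hash is an MD5 digest truncated to 8 bytes (16 hex characters), and all function names are hypothetical.

```python
import hashlib

def block_hash(block_lines):
    """8-byte feature value of one code block (truncated MD5, an assumption)."""
    data = "\n".join(block_lines).encode("utf-8")
    return hashlib.md5(data).hexdigest()[:16]

def diff_vector(added_blocks, deleted_blocks):
    """V_diff = [seq_A, seq_D], with '+' and '-' marks in front of the values."""
    seq_a = ["+" + block_hash(b) for b in added_blocks]
    seq_d = ["-" + block_hash(b) for b in deleted_blocks]
    return [seq_a, seq_d]

def vulnerability_fingerprint(diffs):
    """F_v: one V_diff per diff file; diffs is a list of (added, deleted) pairs."""
    return [diff_vector(added, deleted) for added, deleted in diffs]

# Illustrative blocks with s = 2.
added = [["_type _data ;", "_function ( _data ) ;"]]
deleted = [["_function ( _param ) ;", "return _data ;"]]
print(vulnerability_fingerprint([(added, deleted)]))
```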
Preferably, step 6) comprises the following:
Step 61) Map the vulnerability fingerprint F_v to an m-bit bitmap bitmap_v, where m satisfies a condition (the formula giving m is missing from this text); in the initial state every bit of bitmap_v is set to 0;
Step 62) Traverse the feature value sequence seq_f of the code under test and check whether each feature value is in the vulnerability fingerprint F_v; if it is, set the corresponding bit of bitmap_v to 1, otherwise leave the bitmap unchanged; at the same time, attach to that bit a label Tag composed of the name of the function to which the feature value belongs and the name of the file containing it;
Step 63) Judge code fragments that satisfy the preset conditions to be vulnerable code clones and output them. The preset conditions comprise condition 1 and condition 2, wherein:
Condition 1: none of the code added in any diff is present in the code under test, and all of the code deleted in every diff is fully present in the code under test;
Condition 2: in bitmap_v, for the bits corresponding to all feature values within the same V_diff, the file names in their labels are equal.
Preferably, condition 1 is judged from bitmap_v, where it appears as follows: all values in V_diff carrying the addition mark have their corresponding bits in bitmap_v equal to 0, and all values in V_diff carrying the deletion mark have their corresponding bits in bitmap_v equal to 1.
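The bitmap check can be sketched as below. Because the formula for the bitmap length is not available, the sketch simply assumes one bit per marked fingerprint value; it also keeps only the most recent label per bit, which is a simplification. The function name `detect_clone` and the tuple layout of `features` are hypothetical.

```python
def detect_clone(fingerprint, features):
    """Return True if the code under test satisfies conditions 1 and 2.

    fingerprint: list of [seq_A, seq_D], values prefixed with '+' or '-'.
    features:    iterable of (feature_value, function_name, file_name).
    """
    # Assumption: one bit per marked value in the fingerprint.
    index = {}
    for seq_a, seq_d in fingerprint:
        for marked in seq_a + seq_d:
            index.setdefault(marked, len(index))
    bitmap = [0] * len(index)
    tags = [None] * len(index)

    # Step 62: set the bit and attach the label for every matching value.
    for value, func, fname in features:
        for mark in "+-":
            pos = index.get(mark + value)
            if pos is not None:
                bitmap[pos] = 1
                tags[pos] = (func, fname)

    # Step 63: test both conditions for every V_diff.
    for seq_a, seq_d in fingerprint:
        if any(bitmap[index[m]] for m in seq_a):      # added code is present
            return False
        if not all(bitmap[index[m]] for m in seq_d):  # deleted code is missing
            return False
        if len({tags[index[m]][1] for m in seq_d}) > 1:  # different files
            return False
    return True

fp = [[["+aa"], ["-bb", "-cc"]]]
print(detect_clone(fp, [("bb", "f", "x.c"), ("cc", "f", "x.c")]))
```

In this sketch, code that still contains the deleted blocks and none of the added blocks is reported as a clone, while patched code (which contains an added block) is not.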
A vulnerable code clone detection device based on vulnerability fingerprints comprises a collection module, a sample acquisition module, a code parser construction module, a sample preprocessing module, a vulnerability fingerprint generation module, a module for preprocessing the code under test, and a code clone detection module, wherein:
Collection module: collects code samples and builds a vulnerability database using an extraction technique;
Sample acquisition module: selects the vulnerability v for which a fingerprint is to be built, queries its patch information from a public vulnerability information database, and obtains the corresponding vulnerable code samples from the patch;
Code parser construction module: generates the code parser with ANTLR using an island grammar;
Sample preprocessing module: preprocesses the vulnerable code samples with the code parser to obtain a normalized intermediate representation;
Vulnerability fingerprint generation module: divides the intermediate representation into code blocks of s lines, computes the feature value of each code block with a hash function, and combines the feature values into the vulnerability fingerprint;
Module for preprocessing the code under test: preprocesses the code under test with the code parser, and computes its feature values with the hash function using a sliding window, obtaining the feature value sequence seq_f;
Code clone detection module: maps the vulnerability fingerprint to an n-bit bitmap bitmap_v, and uses bitmap_v to identify whether seq_f contains a clone of the vulnerable code; if so, records and outputs it.
Beneficial effects of the present invention:
1. The invention obtains software vulnerability features by preprocessing the vulnerable code samples and extracting features from them, generates a vulnerability fingerprint, and uses the fingerprint to identify and locate clones in the code under test. It thereby copes with the various modifications found in code clones, maintains good accuracy while efficiently scanning large-scale targets, has a wide scope of application, and can effectively reuse the knowledge in existing vulnerability information databases. Building fingerprints from lightweight features and using an efficient recognition method improves detection efficiency and handles the code modifications common in many code cloning processes. The method offers high detection efficiency, a low false positive rate, and strong scalability, and can provide useful support for vulnerability detection in software source code.
2. The invention improves the accuracy of vulnerable code clone detection, copes with a variety of common code modifications, and is widely applicable. Compared with the prior art, it effectively addresses the following shortcomings of current methods: 1) they cannot cope with modifications common in code clones, such as formatting changes, variable renaming, junk code insertion, and statement reordering; 2) detection methods based on complex intermediate representations are computationally expensive and inefficient, and cannot improve accuracy without sacrificing efficiency. Maintaining good accuracy while efficiently scanning large-scale targets is of great significance for computer network security and vulnerability discovery.
Brief description of the drawings:
Fig. 1 is a schematic flow chart of the method of the present invention;
Fig. 2 is a flow chart of vulnerability fingerprint generation in an embodiment;
Fig. 3 is a flow chart of code clone detection using the bitmap in an embodiment;
Fig. 4 is a schematic diagram of the device of the present invention;
Fig. 5 is a schematic diagram of the implementation flow in an embodiment;
Fig. 6 is a schematic diagram of the vulnerability fingerprint generated in an embodiment;
Fig. 7 is a schematic diagram of the fingerprint bitmaps corresponding to the two cases of a vulnerable code clone and non-clone code in an embodiment.
Detailed description:
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and technical solutions.
Vulnerable code is the key code that gives rise to a software vulnerability, and cloning vulnerable code can introduce the same vulnerability during development. To address the weak handling of code modification and the low detection efficiency of existing vulnerable code clone detection, this embodiment provides a vulnerable code clone detection method based on vulnerability fingerprints which, as shown in Fig. 1, comprises the following steps:
Step 11: Select the vulnerability v for which a fingerprint is to be built, query its patch information from a public vulnerability information database, and obtain the corresponding vulnerable code samples from the patch;
Step 12: Build a code parser;
Step 13: Preprocess the vulnerable code samples with the code parser to obtain a normalized intermediate representation;
Step 14: Divide the intermediate representation into code blocks of s lines, compute the feature value of each code block with a hash function, and combine the feature values into the vulnerability fingerprint;
Step 15: Preprocess the code under test with the code parser, and compute its feature values with the hash function using a sliding window, obtaining its feature value sequence seq_f;
Step 16: Map the vulnerability fingerprint to an n-bit bitmap bitmap_v, and use bitmap_v to identify whether seq_f contains a clone of the vulnerable code; if so, record and output it.
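The sliding window of step 15 can be sketched in Python as follows, assuming the code under test has already been normalized into a list of lines. The window advances one line at a time so that a cloned block is found regardless of where it starts; the truncated MD5 hash and the default s = 4 are assumptions of this sketch, and `feature_sequence` is a hypothetical name.

```python
import hashlib

def feature_sequence(lines, s=4):
    """Hash every window of s consecutive normalized lines (seq_f)."""
    seq = []
    # A file shorter than s lines yields an empty sequence.
    for i in range(max(len(lines) - s + 1, 0)):
        window = "\n".join(lines[i:i + s]).encode("utf-8")
        seq.append(hashlib.md5(window).hexdigest()[:16])
    return seq

code = ["l1", "l2", "l3", "l4", "l5", "l6"]
print(len(feature_sequence(code)))  # windows starting at l1, l2 and l3
```

The window size must equal the block size s used for the fingerprint, and the same hash must be used on both sides; otherwise the values cannot match.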
Using the code parser and the vulnerability fingerprint to extract the features of vulnerable code copes with the code modifications common in code clones and strikes a better balance between detection efficiency and detection accuracy. Maintaining good accuracy while efficiently scanning large-scale targets is of great significance for computer network security and vulnerability discovery.
A diff file is a commonly used representation of a code change, consisting of a sequence of code lines that carry special marks. In another embodiment of the present invention, obtaining the corresponding vulnerable code samples from the patch specifically means: obtaining all diff files in the patch and using them as the vulnerable code samples.
The code parser is the key component for lexical analysis and parsing of code, and it is the basis for preprocessing and feature extraction. ANTLR is an open source parser generator that can automatically generate a syntax tree according to the input grammar and display it visually. For reasons of efficiency and robustness, this scheme does not use a parser integrated in a compiler (such as llvm or gcc); instead it builds the parser with the open source project ANTLR v4, generating a C/C++ code parser from the corresponding ANTLR v4 island grammar.
To further improve detection efficiency and accuracy, before the vulnerable code samples are preprocessed with the code parser, the code is first converted to lowercase; redundant spaces, tabs, newlines, and all comments are deleted; and the indentation style is changed to Lisp style.
Preprocessing the vulnerable code samples with the code parser comprises uniformly replacing the function and parameter names, variable identifiers, data types, string constants, and function call names in the code.
Further, preprocessing the vulnerable code samples with the code parser comprises:
A) Function and parameter name replacement: record all formal parameter names from the function declaration and replace each parameter in the function body with the symbol _PARAM; at the same time, replace the function name in the function declaration with the symbol _FUNCDEC.
B) Variable identifier replacement: replace all variables defined in the function with the symbol _DATA.
C) Data type replacement: replace all data types declared in the ISO C standard with the symbol _TYPE, and likewise all user-defined structure types in the code, keeping the keyword "struct" in structure declarations so that data types can still be classified normally. Member variables declared in user-defined structures are not replaced, nor are variable type modifiers such as "signed" or "unsigned".
D) String constant replacement: replace string constants with the symbol _STR. Strings containing format specifiers such as "%s", "%d", or "%f" are not replaced.
E) Function call name replacement: replace the name of each called function with the symbol _FUNCTION, keeping its usage and argument values.
In another embodiment of the present invention, dividing the intermediate representation into code blocks of s lines, computing the feature value of each code block with a hash function, and combining the feature values into the vulnerability fingerprint comprises, as shown in Fig. 2, the following:
41) Determine the code changes in the vulnerability patch from the diff file, and select a code block size of s lines;
42) In the intermediate representation obtained after preprocessing, denote each deleted s-line code block as block_D and each added s-line code block as block_A; when the deleted or added code is shorter than s lines it is padded to an s-line block, and longer changes are split into multiple consecutive code blocks, with the last block padded with context;
43) Compute the feature value of each block_D and block_A with the hash function, prefixing each feature value with an addition mark or a deletion mark to distinguish the two; combine the feature values by type into the added feature sequence seq_A and the deleted feature sequence seq_D, forming the feature vector V_diff = [seq_A, seq_D] of one diff file;
44) Perform the above steps for every diff file in the vulnerability patch to obtain the corresponding feature vectors V_diff1, V_diff2, …, V_diffn; for a vulnerability v whose patch contains n diff files, all the feature vectors together constitute the vulnerability fingerprint F_v = {V_diff1, V_diff2, …, V_diffn}.
In yet another embodiment of the present invention, mapping the vulnerability fingerprint to an n-bit bitmap bitmap_v and using bitmap_v to identify whether the feature value sequence seq_f contains a clone of the vulnerable code comprises, as shown in Fig. 3, the following:
61) Map the vulnerability fingerprint F_v to an m-bit bitmap bitmap_v, where m satisfies a condition (the formula giving m is missing from this text); in the initial state every bit of bitmap_v is set to 0;
62) Traverse the feature value sequence seq_f of the code under test and check whether each feature value is in the vulnerability fingerprint F_v; if it is, set the corresponding bit of bitmap_v to 1, otherwise leave the bitmap unchanged; at the same time, attach to that bit a label Tag composed of the name of the function to which the feature value belongs and the name of the file containing it;
63) Judge code fragments that satisfy the preset conditions to be vulnerable code clones and output them. The preset conditions comprise condition 1 and condition 2, wherein:
Condition 1: none of the code added in any diff is present in the code under test, and all of the code deleted in every diff is fully present in the code under test;
Condition 2: in bitmap_v, for the bits corresponding to all feature values within the same V_diff, the file names in their labels are equal.
Preferably, condition 1 is judged from bitmap_v, where it appears as follows: all values in V_diff carrying the addition mark have their corresponding bits in bitmap_v equal to 0, and all values in V_diff carrying the deletion mark have their corresponding bits in bitmap_v equal to 1.
Corresponding to the above method, an embodiment of the present invention also provides a vulnerable code clone detection device based on vulnerability fingerprints which, as shown in Fig. 4, comprises a collection module 201, a sample acquisition module 202, a code parser construction module 203, a sample preprocessing module 204, a vulnerability fingerprint generation module 205, a module 206 for preprocessing the code under test, and a code clone detection module 207, wherein:
Collection module 201: collects code samples and builds a vulnerability database using an extraction technique;
Sample acquisition module 202: selects the vulnerability v for which a fingerprint is to be built, queries its patch information from a public vulnerability information database, and obtains the corresponding vulnerable code samples from the patch;
Code parser construction module 203: uses ANTLR to generate a C/C++ code parser from the C/C++ lexical and syntactic definitions;
Sample preprocessing module 204: preprocesses the vulnerable code samples with the code parser to obtain a normalized intermediate representation;
Vulnerability fingerprint generation module 205: divides the intermediate representation into code blocks of s lines, computes the feature value of each code block with a hash function, and combines the feature values into the vulnerability fingerprint;
Module 206 for preprocessing the code under test: preprocesses the code under test with the code parser, and computes its feature values with the hash function using a sliding window, obtaining the feature value sequence seq_f;
Code clone detection module 207: maps the vulnerability fingerprint to an n-bit bitmap bitmap_v, and uses bitmap_v to identify whether seq_f contains a clone of the vulnerable code; if so, records and outputs it.
The effectiveness of the invention is further illustrated below with a concrete example. As shown in Fig. 5, the implementation process is as follows:
1. Select the vulnerability for which a fingerprint is to be built, and obtain all diff files in its patch from the CVE vulnerability information database as the vulnerable code samples.
A diff file is a commonly used representation of a code change, consisting of a sequence of code lines that carry special marks. A "+" before a code line marks added code, a "-" marks deleted code, and a line without a mark is unmodified.
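To make the "+" and "-" marks concrete, the following Python sketch groups the consecutive added and deleted lines of a unified diff into runs. The diff text is an invented illustration, not the CVE patch discussed in this document, and `changed_runs` is a hypothetical helper. As a simplification, any line starting with "+++", "---", or "@@" is treated as a header, so a changed line whose own text begins with "--" or "++" would be misread.

```python
def changed_runs(diff_text):
    """Return (added, deleted): lists of runs of consecutive '+' / '-' lines."""
    added, deleted = [], []
    sign, run = None, []

    def flush():
        # Store the run collected so far under the sign it was collected with.
        if run:
            (added if sign == "+" else deleted).append(run[:])
            run.clear()

    for line in diff_text.splitlines():
        if line.startswith(("+++", "---", "@@")):
            new_sign = None              # file header or hunk header
        elif line[:1] in ("+", "-"):
            new_sign = line[0]           # added or deleted line
        else:
            new_sign = None              # unmodified context line
        if new_sign != sign:
            flush()
            sign = new_sign
        if new_sign:
            run.append(line[1:].strip())
    flush()
    return added, deleted

demo = """--- a/demo.c
+++ b/demo.c
@@ -1,4 +1,5 @@
 int f(char *s) {
-    char buf[8];
-    strcpy(buf, s);
+    char buf[8];
+    strncpy(buf, s, 7);
+    buf[7] = 0;
     return 0;
 }
"""
added, deleted = changed_runs(demo)
print(added)
print(deleted)
```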
Taking the patch for vulnerability CVE-2016-6198 as an example, the patch modifies two functions in the files fs/namei.c and fs/open.c, which can be expressed as the following two diff files:
Table 1: diff (1) in the patch for vulnerability CVE-2016-6198
Table 2: diff (2) in the patch for vulnerability CVE-2016-6198
2. generating code parser, resolver is built using open source projects ANTLR v4, utilizes island corresponding to ANTLR v4 Grammer generates C/C++ code parsers.
3. Code preprocessing
The vulnerable code samples are preprocessed with the code parser generated in step 2. All code is converted to lowercase, and redundant spaces, tabs, newlines, and all comments are deleted. The indentation style is changed to Lisp style. The preprocessing then proceeds as follows:
Function and parameter name replacement: all formal parameter names are recorded from the function declaration, and each parameter in the function body is replaced with the symbol _PARAM. The function name in the function declaration is likewise replaced with the symbol _FUNCDEC.
Variable identifier replacement: all variables defined in the function are replaced with the symbol _DATA.
Data type replacement: all data types declared in the ISO C standard are replaced with the symbol _TYPE, as are all user-defined structures in the code. The keyword "struct" in structure declarations is retained during replacement so that data type classification stays regular. Member variables declared in user-defined structures are not replaced, nor are type modifiers such as "signed" or "unsigned".
String constant replacement: string constants are replaced with the symbol _STR. Strings that contain format specifiers such as "%s", "%d", or "%f" are not replaced.
Function call name replacement: each function call is replaced with the symbol _FUNCTION, while its usage and parameter values are retained.
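The replacement rules above can be approximated with the following minimal Python sketch. It is an assumption-laden illustration, not the ANTLR-based parser of step 2: it uses regular expressions, requires the caller to supply the function name, parameter names, and local variable names (the hypothetical arguments `func_name`, `params`, `local_vars`), and omits the structure handling and the Lisp-style indentation change.

```python
import re

# Simplified sets; a real parser would derive these from the grammar.
C_TYPES = {"void", "char", "short", "int", "long", "float", "double"}
KEYWORDS = {"if", "else", "for", "while", "do", "switch", "case", "return",
            "sizeof", "struct", "signed", "unsigned", "break", "continue", "goto"}
FORMAT_SPEC = re.compile(r"%[sdf]")
# A token is either a string literal or an identifier (code is lowercased first).
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[a-z_][a-z0-9_]*')


def normalize(code, func_name="", params=(), local_vars=()):
    """Return the normalized lines of one C function (regex approximation)."""
    # Comment stripping by regex is a simplification: comment markers that
    # appear inside string literals are not handled.
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.S)
    code = re.sub(r"//[^\n]*", " ", code)
    code = code.lower()
    func_name = func_name.lower()
    params = {p.lower() for p in params}
    local_vars = {v.lower() for v in local_vars}

    def repl(m):
        tok = m.group(0)
        if tok.startswith('"'):
            # Strings containing format specifiers are kept as they are.
            return tok if FORMAT_SPEC.search(tok) else "_STR"
        if tok in KEYWORDS:
            return tok
        if tok in C_TYPES:
            return "_TYPE"
        if tok == func_name:
            return "_FUNCDEC"
        if tok in params:
            return "_PARAM"
        if tok in local_vars:
            return "_DATA"
        # Any other identifier directly followed by "(" is treated as a call.
        if re.match(r"\s*\(", m.string[m.end():]):
            return "_FUNCTION"
        return tok

    code = TOKEN.sub(repl, code)
    # Collapse redundant whitespace and drop empty lines.
    return [" ".join(line.split()) for line in code.splitlines() if line.strip()]


# Hypothetical input, for illustration only.
SRC = '''
int copy_name(char *dst, int len) {
    /* copy at most len bytes */
    int n = strlen(dst);
    printf("%d bytes", n);
    log_msg("done");
    return n;
}
'''
for line in normalize(SRC, "copy_name", ("dst", "len"), ("n",)):
    print(line)
```

Because identifiers are abstracted before hashing, two functions that differ only in naming produce the same normalized lines, which is what allows renamed clones to match the fingerprint.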
The result of preprocessing diff (1) shown in Table 1 is as follows:
4. Fingerprint generation
The MD5 algorithm with an 8-byte output is used as the hash function to compute the feature values of code blocks, and the vulnerability fingerprint is generated through the following steps:
4.1 Determine the code changes in the vulnerability patch from the diff files and select an appropriate code block size of s lines; s is usually 4.
4.2 In the preprocessed code, the s-line code blocks deleted in the diff file are selected as block_D, and the s-line code blocks added in the diff file are selected as block_A. When the deleted or added code is shorter than s lines, it is padded to an s-line block; otherwise it is split into multiple consecutive blocks, and the last block is padded with context lines so that the total is a multiple of s lines.
4.3 Compute the feature value of each block_D and block_A with the hash function. A feature value is prefixed with "+" to mark it as added and with "-" to mark it as deleted. The feature values are then grouped by type into the added feature sequence seq_A and the deleted feature sequence seq_D, forming the feature vector V_diff = [seq_A, seq_D] for one diff file.
4.4 Perform the above steps for every diff file in the patch for this vulnerability to obtain the corresponding feature vectors V_diff1, V_diff2, …, V_diffn. For a vulnerability v whose patch contains n diff files, all the feature vectors together make up the vulnerability fingerprint F_v = {V_diff1, V_diff2, …, V_diffn}.
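Steps 4.1 to 4.4 can be sketched in Python as below. Several details are assumptions made for the illustration: the 8-byte value is taken as the first 8 bytes of the MD5 digest, padding uses the context lines in the order supplied, and the names `block_hash`, `to_blocks`, `diff_vector`, and `fingerprint` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Diff = Tuple[List[str], List[str], List[str]]  # (added, deleted, context) lines


def block_hash(lines: List[str]) -> str:
    """Feature value of one code block: MD5 truncated to 8 bytes (assumption)."""
    digest = hashlib.md5("\n".join(lines).encode("utf-8")).digest()
    return digest[:8].hex()


def to_blocks(changed: List[str], context: List[str], s: int = 4) -> List[List[str]]:
    """Split changed lines into s-line blocks, padding the last one with context."""
    blocks = []
    for i in range(0, len(changed), s):
        block = changed[i:i + s]
        if len(block) < s:
            # If too little context is available, the block stays short.
            block = block + context[: s - len(block)]
        blocks.append(block)
    return blocks


def diff_vector(added: List[str], deleted: List[str], context: List[str],
                s: int = 4) -> Dict[str, List[str]]:
    """Build V_diff = [seq_A, seq_D] for one diff file."""
    seq_a = ["+" + block_hash(b) for b in to_blocks(added, context, s)]
    seq_d = ["-" + block_hash(b) for b in to_blocks(deleted, context, s)]
    return {"seqA": seq_a, "seqD": seq_d}


def fingerprint(diffs: List[Diff], s: int = 4) -> List[Dict[str, List[str]]]:
    """Build F_v = {V_diff1, ..., V_diffn} from all diff files of one patch."""
    return [diff_vector(a, d, c, s) for a, d, c in diffs]


# Hypothetical normalized lines, for illustration only.
fp = fingerprint([(["a;"], ["b;", "c;"], ["x;", "y;", "z;"])])
print(fp)
```

Keeping the "+" or "-" prefix on each value preserves the distinction between patched and vulnerable blocks that the later judgment step needs.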
Applying the above steps to the patch for vulnerability CVE-2016-6198 shown in Table 1 and Table 2 yields the vulnerability fingerprint shown in Figure 6.
5. Compute the feature value sequence of the code to be detected
Each function f in the code under test is parsed and preprocessed, then divided into code blocks of equal size by a sliding-window algorithm, and the corresponding feature value sequence seq_f is computed with the same hash function as in step 4. The sliding-window size size_win is determined by the code block size fixed in the vulnerability fingerprint. The feature value generation algorithm is described in Algorithm 1.
The algorithm takes the preprocessed function to be detected and the window size as input, where the window size size_win equals the code block size s determined in step 4. For the code block in each window, the feature value is computed with the same hash function as in step 4. The feature values of all windows make up the feature sequence seq_f of the code under test, which is used for detection in the next step.
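A minimal Python sketch of the sliding-window computation follows. It is not a reproduction of Algorithm 1: the one-line stride, the truncation of MD5 to its first 8 bytes, and the names `block_hash` and `feature_sequence` are assumptions made for the illustration.

```python
import hashlib
from typing import List


def block_hash(lines: List[str]) -> str:
    """Same hash as in fingerprint generation: MD5 truncated to 8 bytes (assumption)."""
    return hashlib.md5("\n".join(lines).encode("utf-8")).digest()[:8].hex()


def feature_sequence(func_lines: List[str], size_win: int = 4) -> List[str]:
    """Slide a size_win-line window over a normalized function, one line at a time."""
    # A function shorter than the window yields an empty sequence.
    return [block_hash(func_lines[i:i + size_win])
            for i in range(len(func_lines) - size_win + 1)]


# Hypothetical normalized function body, for illustration only.
lines = ["l1;", "l2;", "l3;", "l4;", "l5;", "l6;"]
seq_f = feature_sequence(lines, 4)
print(len(seq_f))
```

A stride of one line means a cloned block is found wherever it starts inside the function, at the cost of hashing every overlapping window.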
6. Vulnerability identification
6.1 Bitmap mapping
The vulnerability fingerprint F_v of vulnerability v is mapped to an m-bit bitmap bitmap_v, where m satisfies m = |V_diff1| + |V_diff2| + … + |V_diffn|; that is, the size of the bitmap equals the total number of feature values contained in the fingerprint F_v. Each bit in the bitmap indicates whether the corresponding hash value in the fingerprint is present, which simplifies fingerprint identification. In the initial state, every bit of bitmap_v is set to 0.
6.2 Feature sequence traversal
For the seq_f obtained in step 5, traverse seq_f and check whether each feature value is in the vulnerability fingerprint F_v. If it is, the bit corresponding to that feature value in bitmap_v is set to 1; otherwise nothing is modified. At the same time, a tag Tag composed of the name of the function to which seq_f belongs and the name of the file containing it is attached to that bit, so that the location of the vulnerable code clone can be traced.
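The bitmap mapping and traversal of steps 6.1 and 6.2 can be sketched in Python as follows. The layout is an assumption: bits are assigned in fingerprint order (seq_A then seq_D within each V_diff), a later match overwrites an earlier tag on the same bit, and the names `build_index` and `scan` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Tag = Tuple[str, str]  # (file name, function name)


def build_index(fingerprint: List[Dict[str, List[str]]]) -> Tuple[Dict[str, List[int]], int]:
    """Assign one bitmap position to every feature value in the fingerprint."""
    index: Dict[str, List[int]] = {}
    pos = 0
    for vec in fingerprint:
        for value in vec["seqA"] + vec["seqD"]:
            # Strip the leading "+" or "-" marker to get the bare hash value.
            index.setdefault(value[1:], []).append(pos)
            pos += 1
    return index, pos  # pos is m, the total number of feature values


def scan(index: Dict[str, List[int]], m: int,
         functions: Dict[Tag, List[str]]) -> Tuple[List[int], List[Optional[Tag]]]:
    """Traverse every seq_f, set matching bits, and attach (file, function) tags."""
    bitmap = [0] * m                       # all bits start at 0
    tags: List[Optional[Tag]] = [None] * m
    for tag, seq_f in functions.items():
        for value in seq_f:
            for pos in index.get(value, ()):
                bitmap[pos] = 1
                tags[pos] = tag
    return bitmap, tags


# Hypothetical fingerprint and code under test, for illustration only.
fp = [{"seqA": ["+aa"], "seqD": ["-bb", "-cc"]}]
index, m = build_index(fp)
bitmap, tags = scan(index, m, {("fs/x.c", "f"): ["bb", "zz", "cc"]})
print(bitmap, tags)
```

Looking up each feature value in a hash-keyed index keeps the traversal linear in the length of seq_f, independent of the fingerprint size.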
6.3 Vulnerability judgment
Each vulnerable code clone must satisfy both of the following conditions:
Condition 1:
None of the code added in any diff may be present in the code under test, and all of the code deleted in every diff must be fully present in the code under test.
In the bitmap, this condition appears as:
All values in V_diff marked with "+" have their corresponding bits in bitmap_v equal to 0, and all values in V_diff marked with "-" have their corresponding bits in bitmap_v equal to 1.
Condition 2:
In bitmap_v, for the bits corresponding to all feature values within the same V_diff, the file names in their tags must be equal. This reduces the false positives that large-scale code may introduce.
Figure 7 shows the bitmaps corresponding to two samples. In Figure 7, (a) satisfies all conditions and is regarded as a vulnerable code clone; (b) does not satisfy the conditions and is not regarded as a vulnerable code clone.
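The two conditions can be checked with a short Python sketch such as the one below. It assumes that bits are laid out in fingerprint order (seq_A then seq_D within each V_diff) and that each set bit carries a (file name, function name) tag. The function name `is_vulnerable_clone` and the sample values are hypothetical and do not reproduce Figure 7.

```python
from typing import Dict, List, Optional, Tuple

Tag = Tuple[str, str]  # (file name, function name)


def is_vulnerable_clone(fingerprint: List[Dict[str, List[str]]],
                        bitmap: List[int],
                        tags: List[Optional[Tag]]) -> bool:
    """Return True only if condition 1 and condition 2 both hold."""
    pos = 0
    for vec in fingerprint:
        files = set()
        for value in vec["seqA"] + vec["seqD"]:
            # Condition 1: "+" values must be absent (0), "-" values present (1).
            expected = 0 if value.startswith("+") else 1
            if bitmap[pos] != expected:
                return False
            if expected == 1:
                files.add(tags[pos][0])
            pos += 1
        # Condition 2: all matched bits of one V_diff come from the same file.
        if len(files) > 1:
            return False
    return True


# Hypothetical fingerprint, for illustration only.
fp = [{"seqA": ["+aa"], "seqD": ["-bb", "-cc"]}]
same_file = [None, ("a.c", "f"), ("a.c", "f")]
print(is_vulnerable_clone(fp, [0, 1, 1], same_file))
```

A bitmap in which an added ("+") bit is set indicates that the patched code is already present, so the function is not reported.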
Through the above steps, detection of the vulnerable code clones present in the code under test is completed, and the file name and function name in Tag are reported as the location of the clone, thereby achieving vulnerable code clone detection.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be cross-referenced. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for related details, refer to the description of the method.
The units and method steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate clearly the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Those of ordinary skill in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
Those of ordinary skill in the art will understand that all or some of the steps of the above method can be completed by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments can also be implemented with one or more integrated circuits; accordingly, each module/unit in the above embodiments can be implemented in the form of hardware or in the form of a software functional module. The present invention is not limited to any particular combination of hardware and software.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for detecting vulnerable code clones based on a vulnerability fingerprint, characterized by comprising the following steps:
Step 1): select the vulnerability v for which a fingerprint is to be built, query the vulnerability patch information from a public vulnerability information database, and obtain the corresponding vulnerable code samples in the patch;
Step 2): build a code parser;
Step 3): preprocess the vulnerable code samples using the code parser to obtain a normalized intermediate representation;
Step 4): divide the intermediate representation into code blocks of s lines, compute the feature values of the code blocks with a hash function, and combine them to generate the vulnerability fingerprint;
Step 5): preprocess the code to be detected using the code parser, and compute its feature values with the sliding-window method and the hash function to obtain its feature value sequence seq_f;
Step 6): map the vulnerability fingerprint to an n-bit bitmap bitmap_v, and use bitmap_v to identify whether the feature value sequence seq_f contains a clone of the vulnerable code; if so, record and output it.
2. The method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 1, characterized in that obtaining the corresponding vulnerable code samples in the patch in step 1) specifically means: obtaining all the corresponding diff files in the patch and using the obtained diff files as the vulnerable code samples.
3. The method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 1, characterized in that building the code parser in step 2) means: using ANTLR to generate a C/C++-oriented code parser according to the C/C++ lexical and syntactic definitions.
4. The method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 1, characterized in that, before the preprocessing in step 3), the code is first converted to lowercase, redundant spaces, tabs, newlines, and all comments are deleted, and the indentation style is changed to Lisp style.
5. The method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 1, characterized in that the preprocessing in step 3) comprises: uniformly replacing, respectively, the function and parameter names, variable identifiers, data types, string constants, and function call names in the code.
6. The method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 1 or 5, characterized in that the preprocessing in step 3) comprises: recording all formal parameter names from the function declaration, replacing each parameter in the function body with the symbol _PARAM, and replacing the function name in the function declaration with the symbol _FUNCDEC; replacing all variables defined in the function with the symbol _DATA; replacing all data types declared in the ISO C standard with the symbol _TYPE, and likewise replacing all user-defined structures in the code, while retaining the keyword struct in structure declarations during replacement; replacing string constants with the symbol _STR; and replacing each function call with the symbol _FUNCTION while retaining its usage and parameter values.
7. The method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 2, characterized in that step 4) comprises the following:
Step 41): determine the code changes in the vulnerability patch from the diff files, and select a code block size of s lines;
Step 42): in the intermediate representation obtained after preprocessing, denote the deleted s-line code blocks as block_D and the added s-line code blocks as block_A; when a deleted or added code block is shorter than s lines, it is padded, and otherwise it is split into multiple consecutive code blocks, with the last code block padded with context;
Step 43): compute the feature value of each block_D and block_A with the hash function, prefixing each feature value with an addition marker or a deletion marker to distinguish them; group the feature values by type into the added feature sequence seq_A and the deleted feature sequence seq_D, forming the feature vector V_diff = [seq_A, seq_D] for one diff file;
Step 44): perform the above steps for all diff files in the vulnerability patch to obtain the corresponding feature vectors V_diff1, V_diff2, …, V_diffn; for a vulnerability v with n diff files in its patch, all the feature vectors together make up the vulnerability fingerprint F_v = {V_diff1, V_diff2, …, V_diffn}.
8. The method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 7, characterized in that step 6) comprises the following:
Step 61): map the vulnerability fingerprint F_v to an m-bit bitmap bitmap_v, where m satisfies m = |V_diff1| + |V_diff2| + … + |V_diffn|, that is, m equals the total number of feature values contained in F_v; in the initial state, every bit of bitmap_v is set to 0;
Step 62): for the feature value sequence seq_f of the code to be detected, traverse seq_f and check whether each feature value is in the vulnerability fingerprint F_v; if it is, set the bit corresponding to that feature value in bitmap_v to 1, and otherwise make no modification; at the same time, attach to that bit a tag Tag composed of the name of the function to which seq_f belongs and the name of the file containing it;
Step 63): judge the code fragments that satisfy the preset conditions to be vulnerable code clones and output them, the preset conditions comprising condition 1 and condition 2, wherein
Condition 1: none of the code added in any diff may be present in the code under test, and all of the code deleted in every diff must be fully present in the code under test;
Condition 2: in bitmap_v, for the bits corresponding to all feature values within the same V_diff, the file names in their tags are equal.
9. The method for detecting vulnerable code clones based on a vulnerability fingerprint according to claim 8, characterized in that condition 1 is judged according to bitmap_v, in which it appears as: all values in V_diff carrying the addition marker have their corresponding bits in bitmap_v equal to 0, and all values in V_diff carrying the deletion marker have their corresponding bits in bitmap_v equal to 1.
10. A device for detecting vulnerable code clones based on a vulnerability fingerprint, characterized by comprising: a collection module, a sample acquisition module, a code parser construction module, a sample preprocessing module, a vulnerability fingerprint generation module, a code-under-test preprocessing module, and a code clone detection module, wherein
the collection module collects code samples and establishes the vulnerability scan using extraction techniques;
the sample acquisition module selects the vulnerability v for which a fingerprint is to be built, queries the vulnerability patch information from a public vulnerability information database, and obtains the corresponding vulnerable code samples in the patch;
the code parser construction module generates the code parser with ANTLR using an island grammar;
the sample preprocessing module preprocesses the vulnerable code samples using the code parser to obtain a normalized intermediate representation;
the vulnerability fingerprint generation module divides the intermediate representation into code blocks of s lines, computes the feature values of the code blocks with a hash function, and combines them to generate the vulnerability fingerprint;
the code-under-test preprocessing module preprocesses the code to be detected using the code parser, and computes its feature values with the sliding-window method and the hash function to obtain the feature value sequence seq_f;
the code clone detection module maps the vulnerability fingerprint to an n-bit bitmap bitmap_v and uses bitmap_v to identify whether the feature value sequence seq_f contains a clone of the vulnerable code; if so, it locates and outputs the clone.
CN201710789364.6A 2017-09-05 2017-09-05 Fragility Code Clones detection method and its device based on loophole fingerprint Active CN107688748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710789364.6A CN107688748B (en) 2017-09-05 2017-09-05 Fragility Code Clones detection method and its device based on loophole fingerprint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710789364.6A CN107688748B (en) 2017-09-05 2017-09-05 Fragility Code Clones detection method and its device based on loophole fingerprint

Publications (2)

Publication Number Publication Date
CN107688748A true CN107688748A (en) 2018-02-13
CN107688748B CN107688748B (en) 2019-09-24

Family

ID=61155236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710789364.6A Active CN107688748B (en) 2017-09-05 2017-09-05 Fragility Code Clones detection method and its device based on loophole fingerprint

Country Status (1)

Country Link
CN (1) CN107688748B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065095A (en) * 2013-01-29 2013-04-24 四川大学 WEB vulnerability scanning method and vulnerability scanner based on fingerprint recognition technology
CN106295335A (en) * 2015-06-11 2017-01-04 中国科学院信息工程研究所 The firmware leak detection method of a kind of Embedded equipment and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SEULBAE KIM,ET AL: "VUDDY: A Scalable Approach for Vulnerable Code Clone Discovery", 《2017 IEEE SYMPOSIUM ON SECURITY AND PRIVACY》 *
Deng Xuefeng et al.: "Sliding blocking algorithm based on data bitmap", 《Journal of Computer Research and Development》 *

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209425B (en) * 2018-09-21 2022-03-15 电子科技大学 C language-oriented source code clone detection method
CN110209425A (en) * 2018-09-21 2019-09-06 电子科技大学 Source code towards C language clones detection method
CN109445844A (en) * 2018-11-05 2019-03-08 浙江网新恒天软件有限公司 Code Clones detection method based on cryptographic Hash, electronic equipment, storage medium
CN110147673A (en) * 2019-03-29 2019-08-20 中国科学院信息工程研究所 A kind of loophole position mask method and device based on text and source code symbol extraction
CN110427316A (en) * 2019-07-04 2019-11-08 沈阳航空航天大学 Embedded software defect-restoration method therefor based on access behavior perception
CN110427316B (en) * 2019-07-04 2023-02-14 沈阳航空航天大学 Embedded software defect repairing method based on access behavior perception
CN111046390A (en) * 2019-07-12 2020-04-21 哈尔滨安天科技集团股份有限公司 Cooperative defense patch protection method and device and storage equipment
CN111046390B (en) * 2019-07-12 2023-07-07 安天科技集团股份有限公司 Collaborative defense patch protection method and device and storage equipment
CN111368305A (en) * 2019-07-12 2020-07-03 北京关键科技股份有限公司 Code security risk detection method
CN112329012A (en) * 2019-07-19 2021-02-05 中国人民解放军战略支援部队信息工程大学 Detection method for malicious PDF document containing JavaScript and electronic equipment
CN110989991A (en) * 2019-10-25 2020-04-10 深圳开源互联网安全技术有限公司 Method and system for detecting source code clone open source software in application program
CN110989991B (en) * 2019-10-25 2023-12-01 深圳开源互联网安全技术有限公司 Method and system for detecting source code clone open source software in application program
CN111506900A (en) * 2020-04-15 2020-08-07 北京字节跳动网络技术有限公司 Vulnerability detection method and device, electronic equipment and computer storage medium
CN111506900B (en) * 2020-04-15 2023-07-18 抖音视界有限公司 Vulnerability detection method and device, electronic equipment and computer storage medium
CN112131570B (en) * 2020-09-03 2022-06-24 苏州浪潮智能科技有限公司 PCA-based password hard code detection method, device and medium
CN112131570A (en) * 2020-09-03 2020-12-25 苏州浪潮智能科技有限公司 PCA-based password hard code detection method, device and medium
CN112528290A (en) * 2020-12-04 2021-03-19 扬州大学 Vulnerability positioning method, system, computer equipment and storage medium
CN112528290B (en) * 2020-12-04 2023-07-18 扬州大学 Vulnerability positioning method, vulnerability positioning system, computer equipment and storage medium
CN112651028A (en) * 2021-01-05 2021-04-13 西安工业大学 Vulnerability code clone detection method based on context semantics and patch verification
CN112685080B (en) * 2021-01-08 2023-08-11 深圳开源互联网安全技术有限公司 Open source component duplicate checking method, system, device and readable storage medium
CN112685080A (en) * 2021-01-08 2021-04-20 深圳开源互联网安全技术有限公司 Open source component duplicate checking method, system, device and readable storage medium
CN113434870A (en) * 2021-07-14 2021-09-24 中国电子科技网络信息安全有限公司 Vulnerability detection method, device, equipment and medium based on software dependence analysis
WO2023028721A1 (en) * 2021-08-28 2023-03-09 Huawei Technologies Co.,Ltd. Systems and methods for detection of code clones
CN113901474B (en) * 2021-09-13 2022-07-26 四川大学 Vulnerability detection method based on function-level code similarity
CN113901474A (en) * 2021-09-13 2022-01-07 四川大学 Vulnerability detection method based on function-level code similarity
CN114880674A (en) * 2022-04-28 2022-08-09 西安交通大学 Vulnerability detection method and system based on novel vulnerability fingerprint
CN114880674B (en) * 2022-04-28 2024-05-31 西安交通大学 Vulnerability detection method and system based on novel vulnerability fingerprint
CN115586920B (en) * 2022-12-13 2023-03-14 北京安普诺信息技术有限公司 Fragile code segment clone detection method and device, electronic equipment and storage medium
CN115586920A (en) * 2022-12-13 2023-01-10 北京安普诺信息技术有限公司 Fragile code segment clone detection method and device, electronic equipment and storage medium
CN117873905A (en) * 2024-03-11 2024-04-12 北京安普诺信息技术有限公司 Method, device, equipment and medium for code homology detection
CN117873905B (en) * 2024-03-11 2024-05-31 北京安普诺信息技术有限公司 Method, device, equipment and medium for code homology detection

Also Published As

Publication number Publication date
CN107688748B (en) 2019-09-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant