CN117171759A - Vulnerability detection method and device based on clone codes and computer equipment - Google Patents

Info

Publication number
CN117171759A
Authority
CN
China
Prior art keywords
code
vulnerability
fingerprint
function
patch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311118877.6A
Other languages
Chinese (zh)
Inventor
钱方
陈强
李林城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Super High Transmission Co of China South Electric Net Co Ltd
Original Assignee
Super High Transmission Co of China South Electric Net Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Super High Transmission Co of China South Electric Net Co Ltd
Priority to CN202311118877.6A
Publication of CN117171759A

Abstract

The application relates to a vulnerability detection method and device based on clone codes and computer equipment. The method comprises the following steps: obtaining vulnerability information and code segments corresponding to the vulnerability information, the code segments comprising historical version code segments and patch version code segments; extracting functions from the code segments, and extracting contrast features based on the functions; generating code fingerprints according to the contrast features, and storing the code fingerprints in a vulnerability code fingerprint library; and acquiring target software code, matching the target software code against the vulnerability code fingerprint library, and generating a detection result according to the matching result. The method can improve the accuracy of detecting vulnerabilities in cloned code.

Description

Vulnerability detection method and device based on clone codes and computer equipment
Technical Field
The present application relates to the field of network security technologies, and in particular, to a vulnerability detection method, device and computer equipment based on clone codes.
Background
Software vulnerabilities are the root cause of various security risks. Once a vulnerability is exploited in a malicious attack, the security of the system is greatly compromised and catastrophic losses may result. As the Open Source Software (OSS) ecosystem grows, code cloning based on open source software, that is, code segments copied and pasted within or between software systems, is also increasing. Although code cloning can accelerate software development, it often severely impacts software security because vulnerabilities are easily propagated through cloned code. As OSS grows, so do these vulnerable code clones, potentially contaminating many systems. Researchers have worked on code clone detection for decades, and the possible presence of a vulnerability in code can be judged by detecting clones of code known to be vulnerable.
The main current approach is to detect the vulnerable code listed in a vulnerability database by matching vulnerability code fingerprints. This approach has two problems: code obfuscation causes matching to fail, and code whose vulnerability has already been patched is reported as vulnerable, producing false alarms. Later researchers proposed combining code context with patch fingerprints to detect cloned vulnerable code, but this still fails to address vulnerability inheritance caused by clones of old code branches. In that case, the patch fingerprint often does not align well with the cloned code on the old-version branch, and the detection method fails.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a clone code-based vulnerability detection method, apparatus, and computer device that can improve detection accuracy.
In a first aspect, the present application provides a vulnerability detection method based on a clone code. The method comprises the following steps:
obtaining code segments corresponding to the vulnerability information; the code segments comprise historical version code segments and patch version code segments;
extracting function codes in the code fragments, and extracting contrast characteristics based on the function codes;
Generating code fingerprints according to the comparison characteristics, and storing the code fingerprints in a vulnerability code fingerprint library;
and acquiring target software codes, performing target software code matching based on the vulnerability code fingerprint library, and generating a detection result according to the matching result.
In one embodiment, extracting function code in a code segment includes:
and searching the code fragments, and respectively extracting the code fragments of the historical version code vulnerability function, the patch version code vulnerability function and the corresponding function after patching.
In one embodiment, performing contrast feature extraction based on function code includes:
extracting, from the code segments, the code lines that are deleted in the patch version code segment and that exist in both the historical version code vulnerability function and the patch version code vulnerability function, to obtain vulnerability code lines;
extracting, from the code segments, the code lines that are added in the patch version code segment and that exist in neither the historical version code vulnerability function nor the patch version code vulnerability function, to obtain patch code lines;
extracting, from the code segments, the code lines that have a direct control and data dependency relationship in the historical version code vulnerability function and the patch version code vulnerability function and that exist in both functions, to obtain the code lines on which the vulnerability depends;
extracting, from the code segment of the corresponding function after patching, the code lines that have a direct control and data dependency relationship in that function, to obtain the code lines on which the patch depends;
and extracting, from the code segments, the conditional statements directly related to the control flow from the function entry to the vulnerability code lines, to obtain control flow path code lines.
In one embodiment, generating a code fingerprint from the comparison features and saving the code fingerprint in the vulnerability code fingerprint library comprises:
normalizing the contrast characteristic;
obtaining a vulnerability fingerprint according to the standardized vulnerability code row, the vulnerability dependent code row and the control flow path code row, and obtaining a patch fingerprint according to the standardized patch code row and the patch dependent code row; both the vulnerability fingerprint and the patch fingerprint are represented by a string length and a hash value; and storing the vulnerability fingerprints, the patch fingerprints and the contrast characteristics in a vulnerability code fingerprint library.
In one embodiment, obtaining the target software code, and performing target software code matching based on the vulnerability code fingerprint library comprises:
retrieving the target software code, and extracting each function code segment in the target software code to obtain target function code;
normalizing the target function code;
generating a target code fingerprint according to the normalized target function code, the target code fingerprint being represented by a string length and a hash value;
extracting vulnerability fingerprints and patch fingerprints in the same language as the target software code from a vulnerability code fingerprint library;
and matching the target code fingerprint with the extracted vulnerability fingerprint and patch fingerprint to obtain a matching result.
In one embodiment, matching the target code fingerprint with the extracted vulnerability fingerprint and patch fingerprint, the obtaining a matching result includes:
matching based on string length, and taking the vulnerability fingerprints and patch fingerprints that have the same string length as the target code fingerprint as a first fingerprint;
matching the target code fingerprint with the first fingerprint based on the hash value, the matching succeeding if the target function code meets three preset conditions at the same time; the three preset conditions are: all code lines in the target code fingerprint are contained in the vulnerability fingerprint, none of the code lines in the target code fingerprint are contained in the patch fingerprint, and the syntactic similarity between the target function code and the historical version code vulnerability function or the patch version code vulnerability function exceeds a threshold value.
In a second aspect, the application further provides a vulnerability detection device based on the clone codes. The device comprises:
The acquisition module is used for acquiring the vulnerability information and the code fragments corresponding to the vulnerability information; the code segments comprise historical version code segments and patch version code segments;
the comparison feature extraction module is used for extracting function codes in the code fragments and extracting comparison features based on the function codes;
the fingerprint library generating module is used for generating code fingerprints according to the comparison characteristics and storing the code fingerprints in the vulnerability code fingerprint library;
and the matching detection module is used for acquiring the target software code, carrying out target software code matching based on the vulnerability code fingerprint library, and generating a detection result according to the matching result.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory and a processor, the memory stores a computer program, and the processor executes the computer program to realize the following steps:
obtaining code segments corresponding to the vulnerability information; the code segments comprise historical version code segments and patch version code segments;
extracting function codes in the code fragments, and extracting contrast characteristics based on the function codes;
generating code fingerprints according to the comparison characteristics, and storing the code fingerprints in a vulnerability code fingerprint library;
And acquiring target software codes, performing target software code matching based on the vulnerability code fingerprint library, and generating a detection result according to the matching result.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
obtaining code segments corresponding to the vulnerability information; the code segments comprise historical version code segments and patch version code segments;
extracting function codes in the code fragments, and extracting contrast characteristics based on the function codes;
generating code fingerprints according to the comparison characteristics, and storing the code fingerprints in a vulnerability code fingerprint library;
and acquiring target software codes, performing target software code matching based on the vulnerability code fingerprint library, and generating a detection result according to the matching result.
In a fifth aspect, the present application also provides a computer program product. Computer program product comprising a computer program which, when executed by a processor, realizes the steps of:
obtaining code segments corresponding to the vulnerability information; the code segments comprise historical version code segments and patch version code segments;
Extracting function codes in the code fragments, and extracting contrast characteristics based on the function codes;
generating code fingerprints according to the comparison characteristics, and storing the code fingerprints in a vulnerability code fingerprint library;
and acquiring target software codes, performing target software code matching based on the vulnerability code fingerprint library, and generating a detection result according to the matching result.
According to the vulnerability detection method, device and computer equipment based on clone codes, vulnerability information and the code segments corresponding to the vulnerability information are obtained, the code segments comprising historical version code segments and patch version code segments; function code is extracted from the code segments, and contrast features are extracted based on the function code; code fingerprints are generated according to the contrast features and stored in a vulnerability code fingerprint library; and target software code is acquired and matched against the vulnerability code fingerprint library, and a detection result is generated according to the matching result. Because historical version code segments are acquired together with the vulnerability information, vulnerability propagation through historical version code can be captured. The code version in which a vulnerability is published has usually changed relative to the historical version code, so software that cloned the historical version code rather than the version in which the vulnerability was published is otherwise difficult to flag; the method addresses this problem. In addition, by combining historical version code features, vulnerable version code features, and patch features, the method can find vulnerable code segments whose variable names, function names, and the like were modified over multiple rounds of propagation, and it reduces false positives. The method is applicable to code in various languages and has strong extensibility and generality.
Drawings
FIG. 1 is an application environment diagram of a clone code based vulnerability detection method in one embodiment;
FIG. 2 is a flow diagram of a method of vulnerability detection based on cloned code in one embodiment;
FIG. 3 is a flow chart illustrating a method of detecting vulnerabilities based on clone codes according to one embodiment;
FIG. 4 is a block diagram of a vulnerability detection method application system based on clone codes in one embodiment;
FIG. 5 is a block diagram of a clone code based vulnerability detection apparatus in one embodiment;
FIG. 6 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The vulnerability detection method based on the clone codes, provided by the embodiment of the application, can be applied to an application environment shown in figure 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In one embodiment, as shown in fig. 2, a vulnerability detection method based on clone codes is provided, and the method is applied to the terminal 102 in fig. 1 for illustration, and includes the following steps:
step 202, obtaining vulnerability information and code segments corresponding to the vulnerability information; the code fragments include historical versions of code fragments and patch versions of code fragments.
Based on a given open source software list, this embodiment periodically retrieves vulnerability information and the corresponding code segments from an open source software platform such as GitHub, and downloads this code, together with the earliest version of the code that contains the same logic, to local storage, thereby obtaining the historical version code segments and the patch version code segments. For example, when CVE-xxx is published, the vulnerability and the patch in the corresponding open source software are obtained from the published information.
The historical version code is the earliest released code. After multiple updates, the historical version code becomes the latest version code, that is, the patch version code segment. The patch version code segment discloses the existing vulnerability and provides a patch for it.
Step 204, extracting the function codes in the code segments, and performing contrast feature extraction based on the function codes.
The code segments realize a certain function by calling a series of functions, the called functions are extracted from the code segments, and the distinction between the bug codes and the patch codes is extracted based on the functions, so that the comparison characteristics are obtained.
Step 206, generating code fingerprints according to the contrast features, and storing the code fingerprints in a vulnerability code fingerprint library.
A code fingerprint is a marker that identifies code by its string length and hash value. Converting the contrast features into code fingerprints allows code identity to be confirmed with limited time and resources and speeds up subsequent code matching.
Step 208, acquiring target software code, matching the target software code against the vulnerability code fingerprint library, and generating a detection result according to the matching result.
And matching the target software code with the vulnerability code fingerprint library, judging whether the target software code is a patch code or a vulnerability code, and further judging whether the target software code has a vulnerability.
On top of traditional clone code detection, this embodiment introduces historical version code segments and patch version code segments for each vulnerability. Contrast features are extracted from the two, focusing on the differences between code segments of different versions, and these differences are matched against the target software code. This avoids the false positives in which cloned code that has already been patched is reported as vulnerable, traces back to earlier source code branch versions that contain the vulnerability, and improves the accuracy of vulnerability detection.
Because historical version code segments are acquired together with the vulnerability information, vulnerability propagation through historical version code can be captured. The new version code has changed relative to the historical version code, so a vulnerability in software that cloned the historical version code rather than the new version code is otherwise difficult to find; the method addresses this problem. In addition, by combining historical version code features, new version code features, and patch features, the method can find vulnerable code segments whose variable names, function names, and the like were modified over multiple rounds of propagation, and it reduces false positives. The method is applicable to code in various languages and has strong extensibility and generality.
In one embodiment, extracting function code in a code segment includes: and searching the code fragments, and respectively extracting the code fragments of the historical version code vulnerability function, the patch version code vulnerability function and the corresponding function after patching.
The historical version code vulnerability function, the patch version code vulnerability function, and the code segment of the corresponding function after patching are all code. The historical version code segment is the earliest version of the code, and the historical version code vulnerability function is the code segment of the vulnerable function within it. The patch version code segment is the code of the version in which the vulnerability was published, and the patch version code vulnerability function is the code segment of the function containing the published vulnerability within it. The code of the corresponding function after patching is the result of applying the published patch to the patch version code segment, which adds some code and deletes some code.
The present embodiment extracts functions from code fragments by a syntax parsing engine.
Extracting the code function by function makes it possible to extract contrast features for the vulnerability in each function and its corresponding patch, limits the comparison to an effective range, and improves comparison efficiency.
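The function extraction step can be sketched as follows. The text does not name the syntax parsing engine, so this minimal example assumes the code under analysis is Python and uses the standard-library `ast` module as a stand-in parser; the helper name `extract_functions` and the sample source are hypothetical.

```python
import ast


def extract_functions(source: str) -> dict[str, str]:
    """Map each function name in `source` to the text of its definition."""
    tree = ast.parse(source)
    functions = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # get_source_segment returns the exact source text of the node
            functions[node.name] = ast.get_source_segment(source, node)
    return functions


# Hypothetical sample input, not taken from the patent
sample = (
    "def read_buf(data, n):\n"
    "    return data[:n]\n"
    "\n"
    "def main():\n"
    "    print(read_buf(b'abc', 2))\n"
)
funcs = extract_functions(sample)
```

For other languages a language-specific parser would take the place of `ast`; the rest of the pipeline only needs the per-function source text.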
In one embodiment, performing contrast feature extraction based on function code includes: extracting, from the code segments, the code lines that are deleted in the patch version code segment and that exist in both the historical version code vulnerability function and the patch version code vulnerability function, to obtain vulnerability code lines; extracting, from the code segments, the code lines that are added in the patch version code segment and that exist in neither the historical version code vulnerability function nor the patch version code vulnerability function, to obtain patch code lines; extracting, from the code segments, the code lines that have a direct control and data dependency relationship in the historical version code vulnerability function and the patch version code vulnerability function and that exist in both functions, to obtain the code lines on which the vulnerability depends; extracting, from the code segment of the corresponding function after patching, the code lines that have a direct control and data dependency relationship in that function, to obtain the code lines on which the patch depends; and extracting, from the code segments, the conditional statements directly related to the control flow from the function entry to the vulnerability code lines, to obtain control flow path code lines.
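The first two contrast features (vulnerability code lines and patch code lines) can be illustrated with set operations. This is a minimal sketch under stated assumptions: `fo`, `fd`, and `fp` are the source text of the historical version vulnerability function, the patch version vulnerability function, and the patched function; lines are compared as stripped strings, ignoring order and duplicates; and the C snippets are invented for illustration. The dependency and control flow features need program analysis and are not shown.

```python
def split_lines(func_src: str) -> set[str]:
    """Return the set of non-empty, stripped lines of a function body."""
    return {ln.strip() for ln in func_src.splitlines() if ln.strip()}


def vulnerability_and_patch_lines(fo: str, fd: str, fp: str):
    """Compute vulnerability code lines and patch code lines from fo, fd, fp."""
    lo, ld, lp = split_lines(fo), split_lines(fd), split_lines(fp)
    deleted = ld - lp          # lines removed by the patch
    added = lp - ld            # lines introduced by the patch
    vuln_lines = deleted & lo  # deleted lines present in both fo and fd
    patch_lines = added - lo   # added lines present in neither fo nor fd
    return vuln_lines, patch_lines


# Hypothetical example functions
fo = "int f(char *s) {\nchar buf[8];\nstrcpy(buf, s);\nreturn 0;\n}"
fd = "int f(char *s) {\nchar buf[8];\nlog(s);\nstrcpy(buf, s);\nreturn 0;\n}"
fp = ("int f(char *s) {\nchar buf[8];\nlog(s);\n"
      "strncpy(buf, s, 7);\nbuf[7] = 0;\nreturn 0;\n}")

vuln_lines, patch_lines = vulnerability_and_patch_lines(fo, fd, fp)
```

Here `log(s);` is excluded from both sets: it differs between `fo` and `fd` but is untouched by the patch, which is the kind of version drift the historical version is meant to absorb.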
In one embodiment, generating a code fingerprint from the contrast features and saving the code fingerprint in the vulnerability code fingerprint library comprises: normalizing the contrast characteristic; obtaining a vulnerability fingerprint according to the standardized vulnerability code row, the vulnerability dependent code row and the control flow path code row, and obtaining a patch fingerprint according to the standardized patch code row and the patch dependent code row; both the vulnerability fingerprint and the patch fingerprint are represented by a string length and a hash value; and storing the vulnerability fingerprints, the patch fingerprints and the contrast characteristics in a vulnerability code fingerprint library.
Wherein the code fingerprint, i.e. the code represented by the string length and hash value, is a representation of the code.
The contrast features are normalized line by line. Specifically, the input parameters of a function are replaced with IN_PARAM, local variables defined and used in the function are replaced with LOCAL_VAR, constants are replaced with CONST, macro-defined global variables used in the function are replaced with GLOBAL_VAR, data types in the function are replaced with DTYPE, function calls in the function are replaced with CALL_FUNC, all characters are converted to lowercase, and spaces are removed.
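The line-level normalization can be sketched as below. This is an illustrative simplification, not the patent's implementation: it assumes the sets of parameter names, local variables, global variables, and data types have already been obtained from a parser, and it treats any remaining identifier followed by `(` as a function call. The tokenizer regex, the keyword list, and the function name `normalize_line` are assumptions.

```python
import re

# String literals, numbers, identifiers, or any other single non-space character
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\d+(?:\.\d+)?|[A-Za-z_]\w*|\S')
KEYWORDS = {"if", "else", "for", "while", "switch", "return", "sizeof"}


def normalize_line(line, params, local_vars, global_vars, dtypes):
    """Replace names with generic tokens, lowercase, and drop whitespace."""
    tokens = TOKEN.findall(line)
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok[0] == '"' or tok[0].isdigit():
            out.append("CONST")
        elif tok in KEYWORDS:
            out.append(tok)
        elif tok in dtypes:
            out.append("DTYPE")
        elif tok in params:
            out.append("IN_PARAM")
        elif tok in local_vars:
            out.append("LOCAL_VAR")
        elif tok in global_vars:
            out.append("GLOBAL_VAR")
        elif (tok[0].isalpha() or tok[0] == "_") and nxt == "(":
            out.append("CALL_FUNC")
        else:
            out.append(tok)
    # Joining without separators removes spaces; lower() applies the case rule
    return "".join(out).lower()


ctx = dict(params={"s", "n"}, local_vars={"buf"},
           global_vars={"MAX_LEN"}, dtypes={"char", "int"})
a = normalize_line("strcpy(buf, s);", **ctx)
b = normalize_line("char buf[8];", **ctx)
c = normalize_line("if (n > MAX_LEN) return -1;", **ctx)
```

Because names and constants collapse to generic tokens, two clones that differ only in identifiers normalize to the same string.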
The contrast features are divided into two categories, one category is a feature containing vulnerability information and the other category is a feature containing patch information. And respectively converting the two types of features into a vulnerability fingerprint and a patch fingerprint, and performing subsequent matching tasks in a fingerprint mode, thereby improving the matching efficiency.
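A fingerprint made of a string length and a hash value can be sketched as follows. The text does not specify the hash algorithm or whether hashing is done per line or per feature group, so this example assumes SHA-256 over the concatenated normalized lines; `Fingerprint` and `make_fingerprint` are hypothetical names.

```python
import hashlib
from typing import NamedTuple


class Fingerprint(NamedTuple):
    length: int   # length of the normalized string
    digest: str   # hex digest of the normalized string


def make_fingerprint(normalized_lines):
    """Build a (length, hash) fingerprint from normalized code lines."""
    body = "".join(normalized_lines)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()  # assumed hash
    return Fingerprint(len(body), digest)


vuln_fp = make_fingerprint(["dtypelocal_var[const];",
                            "call_func(local_var,in_param);"])
patch_fp = make_fingerprint(["call_func(local_var,in_param,const);"])
```

Storing the length next to the digest allows a cheap integer comparison to rule out most candidates before any hash comparison is made.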
In one embodiment, obtaining the target software code, performing target software code matching based on the vulnerability code fingerprint library comprises: retrieving target software codes, extracting each function code segment in the target software codes, and obtaining target function codes; normalizing the objective function code; generating an object code fingerprint according to the standardized object function code, wherein the object code fingerprint is represented by a character string length and a hash value; extracting vulnerability fingerprints and patch fingerprints in the same language as the target software code from a vulnerability code fingerprint library; and matching the target code fingerprint with the extracted vulnerability fingerprint and patch fingerprint to obtain a matching result.
The target function code is extracted in the same way as the functions in the above embodiment, that is, it is retrieved and extracted by the syntax parsing engine.
The target function code is normalized in the same way as the contrast features in the above embodiment.
The target code fingerprint is generated in the same way as the vulnerability fingerprint and the patch fingerprint in the above embodiment.
Before target code fingerprint matching, vulnerability fingerprints and patch fingerprints in the same language as the target software code need to be screened out, so that the subsequent matching range is reduced, and the matching efficiency is improved.
And matching the target code fingerprint with the screened vulnerability fingerprint and patch fingerprint to obtain a matching result.
The aim of the embodiment is to convert the target software code to be detected into a fingerprint form so as to be matched with the vulnerability code fingerprint library, and ensure that the target software code and the vulnerability code can be matched with each other in the fingerprint form.
In one embodiment, matching the target code fingerprint with the extracted vulnerability fingerprint and patch fingerprint to obtain a matching result comprises: matching based on string length, and taking the vulnerability fingerprints and patch fingerprints that have the same string length as the target code fingerprint as a first fingerprint; matching the target code fingerprint with the first fingerprint based on the hash value, the matching succeeding if the target function code meets three preset conditions at the same time; the three preset conditions are: all code lines in the target code fingerprint are contained in the vulnerability fingerprint, none of the code lines in the target code fingerprint are contained in the patch fingerprint, and the syntactic similarity between the target function code and the historical version code vulnerability function or the patch version code vulnerability function exceeds a threshold value.
The similarity is calculated with the Jaccard algorithm. A threshold value is set, and the preset condition is considered met when the calculated similarity is greater than the threshold value.
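The Jaccard similarity of two sets A and B is |A ∩ B| / |A ∪ B|. A minimal sketch follows; the text does not say what the set elements are, so this example assumes they are normalized code lines, and the threshold value 0.4 is a placeholder rather than a value from the patent.

```python
def jaccard(a, b) -> float:
    """Jaccard similarity of two collections, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(sa & sb) / len(sa | sb)


THRESHOLD = 0.4  # hypothetical threshold

target = ["dtypelocal_var[const];", "call_func(local_var,in_param);", "returnconst;"]
vuln_func = ["call_func(local_var,in_param);", "returnconst;", "call_func(in_param);"]

similarity = jaccard(target, vuln_func)   # 2 shared lines out of 4 distinct lines
is_similar = similarity > THRESHOLD
```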
A preliminary screening by string length removes the vulnerability fingerprints and patch fingerprints whose string length differs from that of the target code fingerprint and keeps those whose string length is the same, which further narrows the matching range and yields the first fingerprint. The first fingerprint is the set of vulnerability fingerprints and patch fingerprints that have the same string length as the target code fingerprint.
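The length-based screening can be sketched with an index keyed by string length, so that the first fingerprint is a single dictionary lookup. The entry layout (`id`, `kind`, `length`, `digest`) and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def build_length_index(entries):
    """Group fingerprint entries by their string length."""
    index = defaultdict(list)
    for entry in entries:
        index[entry["length"]].append(entry)
    return index


def first_fingerprint(index, target_length):
    """Return the entries whose length equals the target fingerprint length."""
    return list(index.get(target_length, []))


# Hypothetical library content
library = [
    {"id": "VULN-A", "kind": "vulnerability", "length": 30, "digest": "aa"},
    {"id": "VULN-A", "kind": "patch", "length": 41, "digest": "bb"},
    {"id": "VULN-B", "kind": "vulnerability", "length": 30, "digest": "cc"},
]
index = build_length_index(library)
candidates = first_fingerprint(index, 30)
# Only the surviving candidates need a hash comparison
hits = [c for c in candidates if c["digest"] == "cc"]
```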
The target function code is matched with the first fingerprint based on the hash value. In general, different code produces different hash values, so matching the hash values determines whether the first fingerprint contains code identical to the target function code.
In hash value matching, the matching succeeds when the target function code meets the three preset conditions at the same time, and the vulnerability information corresponding to the vulnerability fingerprint that contains all code lines of the target function code can be output. When the target function code fails to meet any one of the three preset conditions, the matching fails, and a detection result indicating that no vulnerability was detected in the target software code is output.
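The three preset conditions can be sketched as one predicate. This example follows the conditions as worded in the text, including the containment direction (target lines contained in the vulnerability fingerprint), and it assumes that fingerprints are compared as sets of per-line SHA-256 hashes so that containment can be tested. The function names, the threshold of 0.5, and the sample lines are all hypothetical.

```python
import hashlib


def line_hashes(lines):
    """Hash each normalized line; the result stands in for a line-level fingerprint."""
    return {hashlib.sha256(ln.encode("utf-8")).hexdigest() for ln in lines}


def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 1.0


def matches(target, vuln, patch, fo, fd, threshold=0.5):
    """Return True only if all three preset conditions hold."""
    t, v, p = line_hashes(target), line_hashes(vuln), line_hashes(patch)
    cond1 = t <= v            # all target lines are in the vulnerability fingerprint
    cond2 = t.isdisjoint(p)   # no target line is in the patch fingerprint
    cond3 = max(jaccard(target, fo), jaccard(target, fd)) > threshold
    return cond1 and cond2 and cond3


vuln = ["dtypelocal_var[const];", "call_func(local_var,in_param);"]
patch = ["call_func(local_var,in_param,const);"]
fo = vuln + ["returnconst;"]
fd = fo + ["call_func(in_param);"]

unpatched_clone = ["call_func(local_var,in_param);", "dtypelocal_var[const];"]
patched_clone = ["dtypelocal_var[const];", "call_func(local_var,in_param,const);"]

found = matches(unpatched_clone, vuln, patch, fo, fd)
clean = matches(patched_clone, vuln, patch, fo, fd)
```

The patched clone fails the first two conditions, which is how the patch fingerprint keeps already-fixed code from being reported.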
As shown in FIG. 3, an application system for clone code based vulnerability detection in one embodiment is illustrated. The system implements a vulnerability discovery method that detects clone code based on contrast features. The method compares the old version with the new version corresponding to the patch, extracts contrast features, normalizes the source code, and then extracts fingerprints of the code structure and the code semantics separately to build a vulnerability code fingerprint library for open source software; based on this library, vulnerabilities in the code of target software can be discovered quickly.
In this embodiment, the application system includes an OSS (Open Source Software) vulnerability code fingerprint library, a clone code detection module, a code fingerprint calculation module, a code standardization module, an OSS vulnerability code collection module, and a software code scanning module.
The OSS vulnerability code collection module is responsible for obtaining code segments from an open source software platform based on an open source software list. The code segments include the vulnerability information and the corresponding patch code; the module downloads this code, together with the earliest version of the code that contains the same logic, to local storage.
The code standardization module is responsible for normalizing code, including replacing function calls, variable names, and variable data types with generic tokens, converting all characters to lowercase, and removing comments, line breaks, and the like.
The code fingerprint calculation module is responsible for calculating code fingerprints. It submits the code fingerprints generated with the contrast feature method to the OSS vulnerability code fingerprint library, and submits the fingerprints generated from the target software code to be detected to the clone code detection module.
The OSS vulnerability code fingerprint library is responsible for storing vulnerability code fingerprints and original code fragments corresponding to the vulnerability code fingerprints, and provides inquiry and extraction interfaces of the code fingerprints.
The software code scanning module is responsible for acquiring the target software code.
The clone code detection module uses the code fingerprints submitted for the target software under test to perform fingerprint matching in the OSS vulnerability code fingerprint library; if a corresponding fingerprint is matched, the vulnerability in the code is reported.
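As a rough sketch of the storage and query role of the fingerprint library described above, the following in-memory class stores each fingerprint alongside its original code fragment and returns records by language. All names here (`FingerprintRecord`, `FingerprintLibrary`, `submit`, `query`) and the in-memory storage are hypothetical; the text only states that the library stores fingerprints with their original fragments and provides query and extraction interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FingerprintRecord:
    vuln_id: str          # identifier of the vulnerability (hypothetical field)
    language: str         # source language of the code fragment
    vuln_fp: frozenset    # (length, hash) pairs of the vulnerability fingerprint
    patch_fp: frozenset   # (length, hash) pairs of the patch fingerprint
    fragment: str         # original code fragment kept with the fingerprints


class FingerprintLibrary:
    """Minimal in-memory stand-in for the vulnerability code fingerprint library."""

    def __init__(self):
        self._records = []

    def submit(self, record):
        """Store a fingerprint record produced by the fingerprint calculation step."""
        self._records.append(record)

    def query(self, language):
        """Return the records whose language matches the target software."""
        return [r for r in self._records if r.language == language]


library = FingerprintLibrary()
library.submit(FingerprintRecord("VULN-1", "c", frozenset(), frozenset(), ""))
library.submit(FingerprintRecord("VULN-2", "java", frozenset(), frozenset(), ""))
print([r.vuln_id for r in library.query("c")])
```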
In one embodiment, as shown in FIG. 4, the clone code based vulnerability detection method comprises the steps of:
1. Source code acquisition from the vulnerability library. Vulnerability release information for software is periodically checked on the open source software platform to obtain vulnerability information and the corresponding code fragments, which include code fragments of the historical version and code fragments of the patch version. The code fragments are stored in a vulnerability database, and the source code is read directly from that database when vulnerability detection is required.
2. Function extraction. A syntax parsing engine retrieves the functions in the given code fragments and extracts the historical version code vulnerability function fo, the patch version code vulnerability function fd, and the code fragment fp of the corresponding function after patching.
3. Contrast feature extraction, comprising:
3.1. Vulnerability code line extraction: extracting code lines deleted in the code fragments of the patch version and existing in both fo and fd, formally expressed as
3.2. Patch code line extraction: extracting code lines added in the code fragments of the patch version and not existing in both fo and fd, formally expressed as
3.3. Vulnerability-dependent code line extraction: extracting code lines that have a direct control and data dependency relationship with fo and fd and that exist in both fo and fd, formally expressed as
3.4. Patch-dependent code line extraction: extracting code lines that have a direct control and data dependency relationship with fp, formally expressed as
3.5. Control flow path code line extraction: extracting the conditional statements F_l directly related to the control flow from the function entry to the vulnerability code line.
4. Code normalization. The code is normalized line by line: the input parameter names of the function are replaced with IN_PARAM, local variables defined and used in the function with LOCAL_VAR, constants with CONST, macro-defined global variables used in the function with GLOBAL_VAR, data types in the function with DTYPE, and function calls in the function with CALL_FUNC; all characters are then converted to lowercase, and spaces are removed.
5. Code fingerprint generation. A vulnerability fingerprint and a patch fingerprint are generated respectively, where the vulnerability fingerprint is V_s = (V_l, VD_l, F_l) and the patch fingerprint is P_s = (P_l, PD_l). The length and hash value of each normalized code string are calculated to generate a key-value pair, with the string length as the key and the hash value as the value. Both V_s and P_s are multi-line fingerprints.
6. Code fingerprint storage. The vulnerability fingerprints and the patch fingerprints are saved in the vulnerability code fingerprint library.
7. Target software code acquisition. The code files are read from the target software.
8. Target software code function extraction. A syntax parsing engine retrieves the functions in the given target software code and extracts each function code segment to obtain the target function code.
9. Code normalization. The same normalization operation as in step 4 is performed.
10. Code fingerprint generation. Based on the normalization result of step 9, the length and hash value of each normalized code string are calculated to generate a key-value pair, with the string length as the key and the hash value as the value.
11. Vulnerability code fingerprint library. The vulnerability fingerprints and patch fingerprints in the same language as the target software are taken out of the vulnerability code fingerprint library.
12. Code fingerprint matching. A key-based query first finds the code hash values of code strings with the same length; matching is then performed on the hash values, and whether the match succeeds is judged according to three preset conditions:
Condition 1: all code lines of the target function code are contained in V_s;
Condition 2: no code line of the target function code is contained in P_s;
Condition 3: the syntax of the target function code is similar to fo or fd, where the similarity is calculated with the Jaccard algorithm against a preset threshold, and condition 3 is considered satisfied when the calculated similarity is greater than the threshold.
The match succeeds if and only if the target function code satisfies all three preset conditions simultaneously.
13. Matching result generation. The matching result is returned, and a detection result is output according to the matching result.
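Steps 3.1, 3.2, 4, and 5 above can be illustrated with a short sketch. It makes several simplifying assumptions that are not part of the described method: deleted and added lines come from a plain line diff (Python's `difflib`) rather than a syntax parsing engine; the caller supplies a symbol table that already classifies each identifier, whereas a real implementation would derive it by parsing; every numeric literal is treated as a constant; and SHA-256 stands in for the unspecified hash. The function names and the toy C function are hypothetical.

```python
import difflib
import hashlib
import re


def changed_lines(old_func, new_func):
    """Return (deleted, added) lines between the old and the patched function."""
    old = [l.strip() for l in old_func.splitlines() if l.strip()]
    new = [l.strip() for l in new_func.splitlines() if l.strip()]
    deleted, added = [], []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(old[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return deleted, added


def normalize(line, symbols):
    """Replace classified identifiers and numeric literals, then lowercase and strip spaces."""
    def swap(match):
        token = match.group(0)
        if token[0].isdigit():
            return "CONST"
        return symbols.get(token, token)  # keywords and unknown names stay as-is

    replaced = re.sub(r"[A-Za-z_]\w*|\d+", swap, line)
    return re.sub(r"\s+", "", replaced).lower()


def make_fingerprint(lines, symbols):
    """Build {string length: set of hash values} from normalized lines."""
    result = {}
    for line in lines:
        text = normalize(line, symbols)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        result.setdefault(len(text), set()).add(digest)
    return result


# Toy function pair, invented for illustration only.
old_version = "int f(int n) {\n  char buf[8];\n  copy(buf, n);\n}"
new_version = "int f(int n) {\n  char buf[8];\n  copy_checked(buf, n, 8);\n}"
symbols = {"n": "IN_PARAM", "buf": "LOCAL_VAR", "copy": "CALL_FUNC",
           "copy_checked": "CALL_FUNC", "int": "DTYPE", "char": "DTYPE"}

deleted, added = changed_lines(old_version, new_version)
print(deleted, added)
print(normalize(deleted[0], symbols))
print(make_fingerprint(deleted, symbols).keys())
```

Because the fingerprint is keyed by the length of the normalized string, two lines that differ only in identifier names or literal values normalize to the same string and therefore to the same key-value pair.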
In one embodiment, the detection method is described using the detection of a specific vulnerability as an example. The method comprises the following steps:
1. Update information published on GitHub is monitored in real time for the keyword "CVE-". The IDs of the new and old files corresponding to the patch commit are retrieved, the files are obtained, and the patched code fragments are then located by comparing the files. This embodiment uses a piece of code affected by CVE-2016-8654, a heap buffer overflow vulnerability in JasPer.
2. Extracting a function body in the code, wherein the extracted function body is as follows:
3. The code released for the CVE-2016-8654 vulnerability and the corresponding patch code are as follows:
4. Extract V_l, VD_l, and F_l respectively, where V_l is "hstartcol=(numrows+1-parity)>>1;", VD_l is "if(numrows>=2){" and "srcptr=&a[(1-parity)*stride];", and F_l is "if(bufsize>QMFB_SPLITBUFSIZE){".
5. Extract P_l and PD_l respectively, where P_l is "hstartcol=(numrows+1-parity)>>1;" and "m=numrows-hstartcol;", and PD_l is "if(numrows>=2){" and "srcptr=&a[(1-parity)*stride];".
6. Normalizing the code gives V_l as "local_var=(in_param+1-in_param)>>1;", VD_l as "if(in_param>=const){" and "local_var=&local_var[(1-in_param)*local_var];", F_l as "if(local_var>global_var){", P_l as "local_var=(in_param+1-in_param)>>1;" and "local_var=in_param-in_param;", and PD_l as "if(in_param>=const){" and "local_var=&local_var[(1-in_param)*local_var];".
7. Code fingerprints are generated from the normalized strings: the vulnerability fingerprint is V_s = (V_l, VD_l, F_l) and the patch fingerprint is P_s = (P_l, PD_l).
8. The vulnerability fingerprint and the patch fingerprint are saved in the vulnerability code fingerprint library.
9. The rewritten target software code segment is obtained from the software source code as follows:
10. The target code segment is derived from the old version of the function with modifications. After normalization and code fingerprint generation, a threshold of 0.5 is set, and the Jaccard algorithm is used to compute the similarity between the code fingerprint under test and the code fingerprint of the vulnerability CVE-2016-8654.
11. If the result matches the fingerprint information, the vulnerability CVE-2016-8654 is found in the software code.
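The Jaccard comparison in step 10 can be illustrated with the normalized strings from step 6, written in lowercase without spaces. The sketch below computes the similarity over sets of normalized lines, which is an assumption, since the text does not state what the sets contain. The target lines are hypothetical because the target code segment is not reproduced here; only the vulnerability-side strings and the 0.5 threshold come from the example.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections, as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Vulnerability-side normalized lines taken from step 6 (V_l, VD_l, F_l).
vuln_lines = {
    "local_var=(in_param+1-in_param)>>1;",
    "if(in_param>=const){",
    "local_var=&local_var[(1-in_param)*local_var];",
    "if(local_var>global_var){",
}

# Hypothetical target: shares three lines and adds one line of its own.
target_lines = {
    "local_var=(in_param+1-in_param)>>1;",
    "if(in_param>=const){",
    "local_var=&local_var[(1-in_param)*local_var];",
    "local_var=call_func(local_var,const);",
}

THRESHOLD = 0.5  # value used in the example above
similarity = jaccard(target_lines, vuln_lines)
print(similarity, similarity > THRESHOLD)
```

With three shared lines out of five distinct lines, the similarity is 3/5 = 0.6, which exceeds the 0.5 threshold.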
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the sequence indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially either, and may be performed in turn or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides a vulnerability detection device based on the clone code for realizing the vulnerability detection method based on the clone code. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the device for detecting a vulnerability based on a clone code provided below may be referred to the limitation of the method for detecting a vulnerability based on a clone code hereinabove, and will not be described herein.
In one embodiment, as shown in fig. 5, there is provided a vulnerability detection apparatus based on clone codes, including: an acquisition module 502, a contrast feature extraction module 504, a fingerprint library generation module 506, and a match detection module 508, wherein:
The acquisition module 502 is configured to obtain vulnerability information and the code fragments corresponding to the vulnerability information; the code fragments include code fragments of the historical version and code fragments of the patch version.
The contrast feature extraction module 504 is configured to extract a function code in the code segment, and perform contrast feature extraction based on the function code.
The fingerprint library generating module 506 is configured to generate a code fingerprint according to the comparison feature, and store the code fingerprint in the vulnerability code fingerprint library.
The match detection module 508 is configured to acquire the target software code, perform target software code matching based on the vulnerability code fingerprint library, and generate a detection result according to the matching result.
The contrast feature extraction module 504 is further configured to retrieve code segments, and extract code segments of a historical version code vulnerability function, a patch version code vulnerability function, and a corresponding function after patching, respectively.
The contrast feature extraction module 504 is further configured to extract, from the code fragments, code lines that are deleted in the code fragments of the patch version and that exist in both the code vulnerability function of the historical version and the code vulnerability function of the patch version, and obtain vulnerability code lines; extracting code lines which are added in the code fragments of the patch version from the code fragments and are not existed in the code vulnerability functions of the historical version and the patch version, and obtaining patch code lines; extracting code lines which have a direct control and data dependency relationship with a historical version code vulnerability function and a patch version code vulnerability function from the code fragments, and acquiring code lines on which the vulnerability depends, wherein the code lines exist in the historical version code vulnerability function and the patch version code vulnerability function; extracting code lines of the function corresponding to the patched code segments, which have direct control and data dependency relationship, from the code segments, and obtaining code lines of the patched code; and extracting the conditional statement directly related to the control flow from the function entry to the vulnerability code line from the code segment, and obtaining the control flow path code line.
The fingerprint library generating module 506 is further configured to normalize the comparison feature; obtaining a vulnerability fingerprint according to the standardized vulnerability code row, the vulnerability dependent code row and the control flow path code row, and obtaining a patch fingerprint according to the standardized patch code row and the patch dependent code row; both the vulnerability fingerprint and the patch fingerprint are represented by a string length and a hash value; and storing the vulnerability fingerprints, the patch fingerprints and the contrast characteristics in a vulnerability code fingerprint library.
The fingerprint library generating module 506 is further configured to retrieve target software codes, extract each function code segment in the target software codes, and obtain target function codes; normalizing the objective function code; generating an object code fingerprint according to the standardized object function code; the object code fingerprint is represented by a string length and a hash value; extracting vulnerability fingerprints and patch fingerprints in the same language as the target software code from a vulnerability code fingerprint library; and matching the target code fingerprint with the extracted vulnerability fingerprint and patch fingerprint to obtain a matching result.
The match detection module 508 is further configured to perform matching based on the string length, and obtain a first fingerprint from the vulnerability fingerprints and patch fingerprints that have the same string length as the target code fingerprint; and to match the target code fingerprint against the first fingerprint based on the hash value, the match succeeding if the target function code satisfies three preset conditions simultaneously. The three preset conditions are: all code lines in the target code fingerprint are contained in the vulnerability fingerprint; none of the code lines in the target code fingerprint is contained in the patch fingerprint; and the similarity between the syntax of the target function code and the historical version code vulnerability function or the patch version code vulnerability function is greater than a threshold.
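The three preset conditions can be written as a small predicate over fingerprint sets. The sketch below follows the wording of the conditions literally and assumes that each fingerprint is a set of (string length, hash value) pairs and that the syntactic similarity has already been computed elsewhere, for example with a Jaccard measure. The names `line_fingerprints` and `is_vulnerable_clone`, the use of SHA-256, and the default threshold of 0.5 (taken from the worked example) are illustrative assumptions.

```python
import hashlib


def line_fingerprints(lines):
    """Map normalized code lines to a set of (string length, hash value) pairs."""
    return {
        (len(line), hashlib.sha256(line.encode("utf-8")).hexdigest())
        for line in lines
    }


def is_vulnerable_clone(target_fp, vuln_fp, patch_fp, similarity, threshold=0.5):
    """Return True only when all three preset conditions hold."""
    cond1 = target_fp <= vuln_fp             # every target line is in the vulnerability fingerprint
    cond2 = target_fp.isdisjoint(patch_fp)   # no target line is in the patch fingerprint
    cond3 = similarity > threshold           # similarity strictly above the threshold
    return cond1 and cond2 and cond3


# Hypothetical normalized lines used only to exercise the predicate.
vuln_fp = line_fingerprints(["local_var=call_func(in_param);", "if(local_var>global_var){"])
patch_fp = line_fingerprints(["if(in_param>const){"])

unpatched = line_fingerprints(["local_var=call_func(in_param);"])
patched = line_fingerprints(["local_var=call_func(in_param);", "if(in_param>const){"])

print(is_vulnerable_clone(unpatched, vuln_fp, patch_fp, similarity=0.8))
print(is_vulnerable_clone(patched, vuln_fp, patch_fp, similarity=0.8))
```

Condition 2 is what separates an unpatched clone from a patched one: a function that already carries a patch line is rejected even if it still shares lines with the vulnerability fingerprint.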
The above-described modules in the clone-code based vulnerability detection apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 6. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used to store clone code data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a clone code based vulnerability detection method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing all the method embodiments described above when executing the computer program.
In one embodiment, a computer readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, implements all of the method embodiments described above.
In an embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, implements all the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (10)

1. A vulnerability detection method based on clone codes, the method comprising:
obtaining vulnerability information and code segments corresponding to the vulnerability information; the code segments comprise historical version code segments and patch version code segments;
extracting function codes in the code segments, and extracting contrast characteristics based on the function codes;
generating code fingerprints according to the contrast characteristics, and storing the code fingerprints in a vulnerability code fingerprint library;
and acquiring a target software code, performing target software code matching based on the vulnerability code fingerprint library, and generating a detection result according to a matching result.
2. The method of claim 1, wherein the extracting the function code in the code segment comprises:
and searching the code fragments, and respectively extracting the code fragments of the historical version code vulnerability function, the patch version code vulnerability function and the corresponding function after patching.
3. The method of claim 2, wherein the comparing feature extraction based on the function code comprises:
extracting, from the code fragments, code lines which are deleted in the code fragments of the patch version and which exist in both the historical version code vulnerability function and the patch version code vulnerability function, and obtaining vulnerability code lines;
extracting code lines which are added in the code fragments of the patch version from the code fragments and are not existed in the code vulnerability functions of the historical version and the patch version, and obtaining patch code lines;
extracting, from the code fragments, code lines which have a direct control and data dependency relationship with the historical version code vulnerability function and the patch version code vulnerability function and which exist in both the historical version code vulnerability function and the patch version code vulnerability function, and obtaining code lines on which the vulnerability depends;
extracting, from the code fragments, code lines which have a direct control and data dependency relationship with the code fragment of the corresponding function after patching, and obtaining code lines on which the patch depends;
and extracting a conditional statement directly related to a control flow from a function entry to the vulnerability code line from the code segment, and obtaining a control flow path code line.
4. A method as claimed in claim 3, wherein said generating a code fingerprint from said comparison features and saving said code fingerprint in a vulnerability code fingerprint library comprises:
normalizing the contrast features;
obtaining a vulnerability fingerprint according to the standardized vulnerability code row, the vulnerability dependent code row and the control flow path code row, and obtaining a patch fingerprint according to the standardized patch code row and the patch dependent code row; the vulnerability fingerprint and the patch fingerprint are both represented by a string length and a hash value; and storing the vulnerability fingerprint, the patch fingerprint and the contrast characteristic in the vulnerability code fingerprint library.
5. The method of claim 1, wherein the obtaining the target software code, the matching the target software code based on the vulnerability code fingerprint library comprises:
retrieving the target software codes, extracting each function code segment in the target software codes, and obtaining target function codes;
normalizing the objective function code;
generating an object code fingerprint according to the standardized object function code; the target code fingerprint is represented by a string length and a hash value;
extracting the vulnerability fingerprints and the patch fingerprints in the same language as the target software code from the vulnerability code fingerprint library;
and matching the target code fingerprint with the extracted vulnerability fingerprint and the patch fingerprint to obtain a matching result.
6. The method of claim 5, wherein said matching the object code fingerprint with the extracted vulnerability fingerprint and patch fingerprint, obtaining a matching result comprises:
matching based on the character string length, and acquiring a first fingerprint according to the vulnerability fingerprint and the patch fingerprint which have the same character string length as the target code fingerprint;
matching the target code fingerprint with the first fingerprint based on the hash value, the matching being successful if the target function code satisfies three preset conditions simultaneously; the three preset conditions are: all code lines in the target code fingerprint are contained in the vulnerability fingerprint, none of the code lines in the target code fingerprint is contained in the patch fingerprint, and the similarity between the syntax of the target function code and the historical version code vulnerability function or the patch version code vulnerability function is greater than a threshold.
7. A clone code based vulnerability detection apparatus, the apparatus comprising:
the acquisition module is used for acquiring the vulnerability information and the code segments corresponding to the vulnerability information; the code segments comprise historical version code segments and patch version code segments;
the comparison feature extraction module is used for extracting function codes in the code fragments and extracting comparison features based on the function codes;
the fingerprint library generating module is used for generating code fingerprints according to the contrast characteristics and storing the code fingerprints in a vulnerability code fingerprint library;
and the matching detection module is used for acquiring target software codes, carrying out target software code matching based on the vulnerability code fingerprint library, and generating a detection result according to the matching result.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202311118877.6A 2023-08-31 2023-08-31 Vulnerability detection method and device based on clone codes and computer equipment Pending CN117171759A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311118877.6A CN117171759A (en) 2023-08-31 2023-08-31 Vulnerability detection method and device based on clone codes and computer equipment


Publications (1)

Publication Number Publication Date
CN117171759A true CN117171759A (en) 2023-12-05

Family

ID=88935925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311118877.6A Pending CN117171759A (en) 2023-08-31 2023-08-31 Vulnerability detection method and device based on clone codes and computer equipment

Country Status (1)

Country Link
CN (1) CN117171759A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination