CN114490313A - Security inspection defect detection method based on key semantic features - Google Patents


Info

Publication number
CN114490313A
Authority
CN
China
Prior art keywords
statement
patch
critical
function
variable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111507955.2A
Other languages
Chinese (zh)
Inventor
薄德芳
李丰
肖扬
俞晨东
许丽丽
卢志刚
霍玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN202111507955.2A
Publication of CN114490313A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software
    • G06F11/3604: Software analysis for verifying properties of programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a security inspection defect detection method based on key semantic features. It relates to the field of program analysis and addresses the problem that security inspection defects in large-scale open-source software, such as operating systems, are difficult to detect and locate with conventional methods.

Description

Security inspection defect detection method based on key semantic features
Technical Field
The invention relates to the field of program analysis, in particular to security inspection defect detection for software programs, and more specifically to a security inspection defect detection technique based on key semantic features.
Background
A security check is a type of conditional branch statement used to check the condition or state of program execution. A security check defect is a type of semantic error in a software program that arises when an incorrect execution state or condition is not properly checked and handled. According to how they are repaired, security check defects can be divided into four types: missing security check, incorrect check position, incorrect check (an incorrect check predicate or incorrect handling logic after the check), and redundant check. Security check defects are prevalent in large-scale open-source system software and foundational software, as represented by operating system kernels. Such software often receives untrusted inputs or performs error-prone complex tasks, so the program frequently enters an erroneous state at run time. If the untrusted inputs or erroneous states are not properly checked, or are not handled with correct logic, the software may be attacked by malicious code (e.g., stack overflow) or suffer a fatal error (e.g., system crash) at run time.
Current detection techniques for security check defects (such as CRIX, USENIX 2019) usually focus only on the missing security check type. Their approach is to identify critical variables by identifying security checks, and then to detect security check defects on the basis that the same critical variable should have substantially the same security checks. This approach neglects critical operations (critical verbs); it therefore has a high false-positive rate and does not apply to the other three types of security check defects: incorrect check position, incorrect check, and redundant check.
Defect detection techniques based on code similarity (e.g., MVP, USENIX 2020) can detect unknown defects that have the same or similar code features as disclosed defects. Although these techniques can also be used to detect security check defects, they do not distinguish the code features related to security checks from other code features related to the vulnerable function's functionality or execution path when extracting and comparing features. The resulting code features are too general, which leads to false alarms.
Disclosure of Invention
To address the problem that security inspection defects in large-scale open-source software such as operating systems are difficult to detect and locate with conventional methods, the invention provides a security inspection defect detection method based on key semantic features.
To achieve this purpose, the invention adopts the following technical scheme:
A security inspection defect detection method based on key semantic features comprises the following steps:
1) collecting patches related to security inspection defects for the target software under test;
2) for each patch, obtaining the patch ID, the function pair (F_v, F_p) before and after the patch, and the patch difference statements S_diff, forming them into a triple, and adding the triple to a patch difference information list;
3) for each triple in the patch difference information list, examining each statement in its patch difference statements S_diff to obtain the true branch statements, marking them as security checks, and adding them to a security check set;
4) consolidating the security check set to remove redundant security checks;
5) identifying each critical variable of each security check in the security check set and adding it to a critical variable list; identifying whether each critical variable is a state variable or a key usage variable and adding its type to a critical variable type list; forming a quadruple from the patch ID, the security check, the critical variable list, and the critical variable type list, and adding the quadruple to the critical variable information set of the patch;
6) for each quadruple in the critical variable information set, determining, according to the statement type of the security check in the quadruple, the repaired function F and the corresponding function F' before repair, to obtain the function pair (F, F') used for extracting semantic features; forming a triple from the patch ID, the security check, and (F, F'), and adding the triple to the patch's function pair information list for semantic feature extraction;
7) for each triple in the function pair information list, obtaining the quadruple in the critical variable information set that corresponds to its patch ID and security check, and obtaining the critical operation set of the repaired function F according to the type of each critical variable in the quadruple's critical variable list;
8) for each critical operation CV_i in the critical operation set, finding the corresponding critical operation CV_i' in the function F' before repair; if no CV_i' is found, deleting CV_i from the critical operation set; pairing each remaining CV_i in the critical operation set with its corresponding CV_i' to form a critical operation pair (CV_i, CV_i') of the function pair (F, F'), and adding the pair to a critical operation pair set;
9) for each critical operation pair (CV_i, CV_i') in the critical operation pair set, obtaining, in the function pair (F, F') of the patch, the statement sets that have a control dependence on the critical operations according to the type of the critical variable, and combining the critical operations with these statement sets to obtain the key semantic feature statement sets before and after patch repair [formula image BDA0003404956460000021] and [formula image BDA0003404956460000022];
10) for each critical operation pair (CV_i, CV_i') in the critical operation pair set, obtaining, in the function pair (F, F') of the patch, the statement sets that have a data dependence on the critical operations according to the type of the critical variable, and combining the critical operations with these statement sets to obtain the common semantic feature statement set before and after patch repair [formula image BDA0003404956460000023];
11) merging, respectively, the key semantic feature statement sets before and after patch repair [formula image BDA0003404956460000024] and the common semantic feature statement sets [formula image BDA0003404956460000031] to obtain a triple (S_vul, S_fix, S_context), and forming a function feature triple from the patch ID, the security check, and the triple (S_vul, S_fix, S_context);
12) normalizing the function features and the source code of the target software under test, and storing in a feature database a two-tuple consisting of the function features, which comprise the patch ID and (G_vul, G_fix, G_context), and the function feature set F_T of the target software under test;
13) for each two-tuple in the feature database, comparing each function F_t in F_T with G_context [formula image BDA0003404956460000032] to find the functions that have a security check defect similar to that of the patch.
Further, collecting the patches related to security inspection defects in step 1) includes:
using vulnerability numbers, vulnerability types, or words related to vulnerability information as keywords, searching the patches in the code hosting repository of the target software under test for patches whose textual description contains a keyword, or obtaining the relevant patches directly from the reference links of the historical vulnerabilities of the target software under test;
screening the obtained patches for those whose difference code S_diff before and after the patch contains a conditional branch statement, an error code indicating an exception, or a function statement related to exception handling; these are the patches related to security inspection defects.
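The two-stage screening described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the keyword list, the regular expressions, and the function names are all assumptions, and the sketch only covers the keyword-search route (not the reference-link route).

```python
import re

# Hypothetical keywords and patterns; the patent does not enumerate them.
KEYWORDS = ("cve-", "null pointer", "overflow", "use-after-free")
BRANCH_RE = re.compile(r"^\s*(?:\}\s*)?(?:else\s+)?(?:if|while|for|switch)\s*\(")
ERROR_CODE_RE = re.compile(r"-E[A-Z]{2,}\b")  # e.g. -EINVAL, -ENOMEM
ERROR_FUNC_RE = re.compile(r"\b(?:BUG_ON|WARN_ON|panic)\s*\(")


def diff_statements(patch_text):
    """Return S_diff: added and deleted lines with the +/- marker removed."""
    stmts = []
    for line in patch_text.splitlines():
        if line.startswith(("+++", "---")):  # file header lines, not code
            continue
        if line.startswith(("+", "-")):
            stmts.append(line[1:])
    return stmts


def message_matches(message):
    """Stage 1: keyword search over the textual description of the patch."""
    lowered = message.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def is_security_check_patch(message, patch_text):
    """Stage 2: keep the patch only if S_diff contains a conditional branch,
    an error code, or an exception-handling call (all approximated by regex)."""
    if not message_matches(message):
        return False
    return any(
        BRANCH_RE.search(s) or ERROR_CODE_RE.search(s) or ERROR_FUNC_RE.search(s)
        for s in diff_statements(patch_text)
    )
```

A real implementation would classify statements on a parsed representation rather than with regular expressions; the regex form is used here only to keep the sketch self-contained.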
Further, obtaining the function pair (F_v, F_p) in step 2) includes:
parsing the header information of the patch to identify the files modified by the patch, the patch difference statements S_diff, and the line numbers of S_diff;
obtaining, from the files before and after the patch, the function F_v in which the line numbers of S_diff are located, and obtaining its paired function F_p before or after the patch, to form the function pair (F_v, F_p).
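As a rough illustration of locating the function that contains a changed line and pairing it across the two file versions, the sketch below assumes that function line ranges have already been extracted by some parser and that functions are paired by name. Both assumptions, and all identifiers, are hypothetical; the patent does not state how the pairing is done.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class FuncRange(NamedTuple):
    name: str
    start: int  # first line of the function, 1-based, inclusive
    end: int    # last line of the function, 1-based, inclusive


def enclosing_function(ranges: Sequence[FuncRange], line: int) -> Optional[FuncRange]:
    """Return the function whose line range contains `line`, if any."""
    for r in ranges:
        if r.start <= line <= r.end:
            return r
    return None


def function_pair(
    pre_ranges: Sequence[FuncRange],
    post_ranges: Sequence[FuncRange],
    line: int,
    line_in_post: bool,
) -> Optional[Tuple[FuncRange, FuncRange]]:
    """Return (F_v, F_p) for a changed line.

    `line_in_post` is True for an added line (numbered in the patched file)
    and False for a deleted line (numbered in the pre-patch file).
    """
    located_in, other = (post_ranges, pre_ranges) if line_in_post else (pre_ranges, post_ranges)
    hit = enclosing_function(located_in, line)
    if hit is None:
        return None
    # Assumption: the paired function has the same name in the other version.
    partner = next((r for r in other if r.name == hit.name), None)
    if partner is None:
        return None
    return (partner, hit) if line_in_post else (hit, partner)
```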
Further, obtaining the true branch statements and marking them as security checks in step 3) includes:
traversing each statement in S_diff; if the statement is a true branch of a conditional branch statement, marking it as a security check and adding it to the security check set;
if the statement is a false branch of a conditional branch statement, a returned error code, or a function related to exception handling, obtaining all predecessors of the basic block containing the statement on the control flow graph of the function, and traversing these predecessors, where each node of the control flow graph represents a basic block of the function and each edge represents a control flow transfer;
if a predecessor ends with a true branch statement, obtaining that true branch statement, marking it as a security check, and adding it to the security check set.
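The traversal above can be sketched on a toy control flow graph. The statement kinds, the `preds` and `terminator` maps, and the restriction to direct predecessors are assumptions made for illustration; a real implementation would obtain them from a compiler front end.

```python
# Statement kinds that are not checks themselves but point back to one.
INDIRECT_KINDS = {"false_branch", "error_code", "error_handler"}


def collect_security_checks(diff_stmts, preds, terminator):
    """Identify security checks from S_diff.

    diff_stmts: list of (statement, kind, block_id) for each statement in S_diff
    preds:      dict block_id -> list of direct predecessor block ids on the CFG
    terminator: dict block_id -> conditional branch statement ending the block,
                or None if the block does not end with a branch
    """
    checks = []

    def add(stmt):
        if stmt not in checks:  # keep insertion order, skip exact duplicates
            checks.append(stmt)

    for stmt, kind, block in diff_stmts:
        if kind == "true_branch":
            add(stmt)
        elif kind in INDIRECT_KINDS:
            # Look at each predecessor block and take the branch that ends it.
            for pred in preds.get(block, ()):
                branch = terminator.get(pred)
                if branch is not None:
                    add(branch)
    return checks
```

For example, when a patch adds only `return -EINVAL;`, the check guarding that return is recovered from the block that branches into it.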
Further, consolidating the security check set in step 4) includes: for the security check statements in the security check set, if two security check statements are the same conditional branch statement, or are conditional branch statements that were partially modified but have the same semantics, treating them as the same security check and keeping the one whose statement type is a newly added statement S_add.
Further, identifying the type of a critical variable in step 5) includes:
obtaining the definition point set and the use point set of the critical variable;
if the definition point set of the critical variable is not empty and the use point set is empty, or the use point set is not empty and contains no function call other than return statements or conditional branch statements, the critical variable is a state variable;
if the use point set is not empty and contains function calls other than return statements or conditional branch statements, the critical variable is a key usage variable.
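The classification rule above is a small decision procedure. The sketch below encodes it literally, assuming each use point has already been labeled with a kind such as "return", "branch", or "call"; those labels and the `UNKNOWN` fallback for the case the rule does not cover are assumptions.

```python
from enum import Enum


class VarType(Enum):
    STATE = "state variable"
    KEY_USAGE = "key usage variable"
    UNKNOWN = "unknown"  # neither rule applies (not covered by the text)


def classify_critical_variable(def_points, use_kinds):
    """Classify a critical variable from its definition and use points.

    def_points: collection of definition points of the variable
    use_kinds:  list with the kind of each use point, e.g. "return",
                "branch", or "call" (hypothetical labels)
    """
    has_call_use = any(kind == "call" for kind in use_kinds)
    if def_points and not use_kinds:
        return VarType.STATE
    if use_kinds and not has_call_use:
        return VarType.STATE
    if use_kinds and has_call_use:
        return VarType.KEY_USAGE
    return VarType.UNKNOWN
```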
Further, in step 6), if the security check is a newly added statement, the repaired function F is F_p and the corresponding function F' before repair is F_v; if the security check is a deleted statement, the repaired function F is F_v and the corresponding function F' before repair is F_p.
Further, obtaining the critical operation set FCV of the repaired function F according to the type of the critical variable in step 7) includes: if the critical variable is a state variable, obtaining in F the set of statements on which the critical variable has a direct backward data dependence as the critical operation set FCV of F; if the critical variable is a key usage variable, obtaining in F the set of statements that have a direct forward data dependence on the critical variable as the critical operation set FCV of F.
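One way to picture this step is on a list of def-use edges. In the sketch below each edge `(def_stmt, use_stmt, var)` means that `use_stmt` reads the value of `var` written by `def_stmt`; this edge format, the function name, and the simplification that statement order relative to the check is ignored are all assumptions.

```python
def critical_operations(edges, check, var, var_type):
    """Return the critical operation set FCV for one critical variable.

    edges:    iterable of (def_stmt, use_stmt, var) def-use edges in function F
    check:    the security check statement that tests `var`
    var_type: "state" or "key_usage"
    """
    # Definitions of `var` whose value reaches the security check.
    reaching_defs = {d for d, u, v in edges if u == check and v == var}
    if var_type == "state":
        # The checked value directly depends on these statements.
        return reaching_defs
    if var_type == "key_usage":
        # Other statements that directly use the same checked value.
        return {
            u for d, u, v in edges
            if v == var and d in reaching_defs and u != check
        }
    raise ValueError(f"unknown variable type: {var_type}")
```

For a state variable such as a return code, the result is the call that produced the code; for a key usage variable such as a pointer parameter, the result is the set of statements that consume the pointer.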
Further, redundant critical operation pairs in the critical operation pair set CVpairs obtained in step 8) are merged.
Further, obtaining the statement sets having a control dependence on the critical operations in step 9) includes:
if the critical variable is a state variable, obtaining in F and F' the statement sets FC_i and FC_i' that have a forward control dependence on the critical operation CV_i and its corresponding CV_i', respectively;
if the critical variable is a key usage variable, obtaining in F and F' the statement sets FC_i and FC_i' that have a backward control dependence on the critical operation CV_i and its corresponding CV_i', respectively;
combining CV_i and FC_i to form the key semantic feature statement set of F [formula image BDA0003404956460000041];
combining CV_i' and FC_i' to form the key semantic feature statement set of F' [formula image BDA0003404956460000042].
Further, combining the critical operations and the statement sets in step 10) includes:
if the critical variable is a state variable, obtaining in F and F' the statement sets FDB_i and FDB_i' that have a backward data flow dependence on the critical operation CV_i and its corresponding CV_i', and the statement sets FCB_i and FCB_i' that have a backward control flow dependence on them; combining CV_i, FDB_i, and FCB_i to form the common semantic feature statement set S_context-F of F, and combining CV_i', FDB_i', and FCB_i' to form the common semantic feature statement set S_context-F' of F'; taking the two-tuple (S_context-F, S_context-F') as the common semantic feature statement set of the vulnerability [formula image BDA0003404956460000051];
if the critical variable is a key usage variable, obtaining in F and F' the statement sets FDB_i and FDB_i' that have a backward data flow dependence on the critical operation CV_i and its corresponding CV_i', and the statement sets FDF_i and FDF_i' that have a forward data flow dependence on them; combining CV_i, FDB_i, and FDF_i to form the common semantic feature statement set S_context-F of F, and combining CV_i', FDB_i', and FDF_i' to form the common semantic feature statement set S_context-F' of F'; taking the two-tuple (S_context-F, S_context-F') as the common semantic feature statement set of the security check defect [formula image BDA0003404956460000052].
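The set construction in step 10) can be sketched on small dependence graphs. The sketch assumes dependences are followed transitively, which the text does not state, and the graph encoding and helper names are hypothetical: `data_deps[s]` and `ctrl_deps[s]` hold the statements that `s` directly depends on.

```python
def closure(graph, start):
    """All nodes reachable from `start` by following edges in `graph`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    seen.discard(start)
    return seen


def invert(graph):
    """Reverse every edge, turning 'depends on' into 'is depended on by'."""
    inverted = {}
    for src, dsts in graph.items():
        for dst in dsts:
            inverted.setdefault(dst, set()).add(src)
    return inverted


def context_set(cv, data_deps, ctrl_deps, var_type):
    """Build S_context for one critical operation `cv` in one function."""
    backward_data = closure(data_deps, cv)                 # FDB
    if var_type == "state":
        backward_ctrl = closure(ctrl_deps, cv)             # FCB
        return {cv} | backward_data | backward_ctrl
    if var_type == "key_usage":
        forward_data = closure(invert(data_deps), cv)      # FDF
        return {cv} | backward_data | forward_data
    raise ValueError(f"unknown variable type: {var_type}")
```

Running `context_set` once on F and once on F' yields the two-tuple (S_context-F, S_context-F').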
Further, in step 11), the common semantic feature statement sets S_context before and after patch repair are screened according to information entropy.
Further, the comparison in step 13) includes: a11 = (number of statements in F_t that match H_v) / (total number of statements in H_v); a12 = (number of statements in F_t that match H_p) / (total number of statements in H_p); a2 = (number of statements in F_t that match G_vul) / (total number of statements in G_vul); a3 = (number of statements in F_t that match G_fix) / (total number of statements in G_fix). F_t is considered to have a security check defect similar to that of the patch if a11 > S1 and a2 > S2 and a3 < S3, or if a12 > S1 and a2 > S2 and a3 < S3.
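The matching rule above reduces to four ratios and three thresholds. The sketch below treats every feature as a set of statement hashes; the handling of an empty feature set (ratio 0) and any concrete threshold values are assumptions, since the text does not give them.

```python
def match_ratio(target, feature):
    """Fraction of `feature` statements that also occur in `target`."""
    if not feature:
        return 0.0  # assumption: an empty feature set matches nothing
    return len(target & feature) / len(feature)


def has_similar_defect(f_t, h_v, h_p, g_vul, g_fix, s1, s2, s3):
    """Apply the rule: (a11 > S1 or a12 > S1) and a2 > S2 and a3 < S3."""
    a11 = match_ratio(f_t, h_v)
    a12 = match_ratio(f_t, h_p)
    a2 = match_ratio(f_t, g_vul)
    a3 = match_ratio(f_t, g_fix)
    return (a11 > s1 or a12 > s1) and a2 > s2 and a3 < s3
```

Intuitively, a target function is reported when it shares the context of the patched function, still contains the vulnerable key statements, and lacks the fixing ones.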
The method analyzes the source code and the intermediate representation obtained from it. It extracts code features from the function containing the security inspection defect and from the patched function, compares the extracted features one by one with the code features of the functions in the target software under test, and finally determines whether the target software contains functions with similar security inspection defects. The invention has the following advantages: 1) By extracting the key semantic features of security checks, it reduces the interference of other code features in the function containing the defect with detection precision, improving the accuracy and efficiency of security inspection defect detection. 2) It overcomes the limitation of existing detection techniques to a specific type of security inspection defect, and supports the detection of all four types: missing security check, incorrect check position, incorrect check (an incorrect check predicate or incorrect handling logic after the check), and redundant check.
Drawings
FIG. 1 is a flow chart of a security inspection defect detection method based on key semantic features according to the present invention.
Fig. 2 is an example patch (a0323b979f81).
Fig. 3 illustrates, using patch a0323b979f81, a security check defect in which a missing security check leads to a null pointer dereference.
Fig. 4 illustrates, using patch b82175750131, a security check defect in which a missing security check leads to an infinite loop.
Detailed Description
In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.
To facilitate the presentation of the method of the invention, some key definitions and formalized descriptions used in the method of the invention are explained below.
Vulnerability patch (vulnerability patch). Each commit to open-source software under version control (e.g., git) is called a patch (commit), and a patch that fixes a vulnerability is a vulnerability patch. A patch provides information on why the change was made, the versions before and after the fix, and what changed in the code. Assuming that each vulnerability lies within one function, the function pair (F_v, F_p) denotes all program statements of the function F_v before patching and all program statements of the patched function F_p for that vulnerability. (F_v, F_p) is obtained from the version containing the patch and the version preceding it. Each vulnerability patch contains one or more modified blocks (hunks), which are the basic units of a patch. Each hunk contains context statements (context lines) S_context, deleted statements (deleted lines) S_del, and added statements (added lines) S_add. S_del consists of the statements that exist in F_v but not in F_p, and S_add consists of the statements that exist in F_p but not in F_v. S_context consists of up to 3 function statements before and after the changes of a hunk that are not in S_del or S_add. The patch difference statements S_diff comprise S_del and S_add, i.e., S_diff = S_add ∪ S_del. As shown in FIG. 2, the commit number in the first line identifies the ID of the patch, i.e., the commit ID; lines 5-11 record why the patch was made; lines 17-20 record the files and versions before and after the patch; line 21 records the function modified by the patch; and lines 25-31 record the code modified by the patch, where line 25 is S_del, lines 26-28 are S_add, lines 22-24 and 29-31 are S_context, and lines 21-31 form one hunk.
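To make the hunk structure concrete, the sketch below splits a unified diff (the format produced by git) into S_add, S_del, and S_context per hunk and tracks line numbers. The dictionary layout and function names are illustrative assumptions.

```python
import re

# Unified-diff hunk header, e.g. "@@ -10,4 +10,6 @@ static int probe(...)".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def parse_hunks(patch_text):
    """Split a unified diff into hunks with S_add, S_del, and S_context."""
    hunks, current = [], None
    old_no = new_no = 0
    for line in patch_text.splitlines():
        header = HUNK_RE.match(line)
        if header:
            old_no, new_no = int(header.group(1)), int(header.group(2))
            current = {"add": [], "del": [], "context": []}
            hunks.append(current)
        elif line.startswith("diff --git"):
            current = None          # a new file section starts
        elif current is None:
            continue                # file header lines before the first hunk
        elif line.startswith("+"):
            current["add"].append((new_no, line[1:]))   # line number in F_p's file
            new_no += 1
        elif line.startswith("-"):
            current["del"].append((old_no, line[1:]))   # line number in F_v's file
            old_no += 1
        elif line.startswith(" "):
            current["context"].append(line[1:])
            old_no += 1
            new_no += 1
    return hunks


def s_diff(hunk):
    """S_diff = S_add ∪ S_del for one hunk, as (line number, text) pairs."""
    return hunk["add"] + hunk["del"]
```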
Security check (security check). As described above, a security check is a type of conditional branch statement used to check the condition or state of program execution; the set of security checks is denoted SC. For scalability when analyzing patches at large scale, in the concrete implementation a patch is considered related to fixing a security check defect if a statement S modified by the patch is a conditional branch statement BS_i. Since the patch itself is a security-related patch, if S is BS_i, then S is a security check SC_i; if S is an error code indicating an exception or a function statement related to exception handling, then the direct predecessor BS_i from which the branch P containing S can be reached is a security check SC_i.
Critical operations (critical verb). Critical operations are the operations protected by security checks, the erroneous operations or functions executed because of an incorrect check, and the error-prone functions whose status must be verified by a security check; the set of critical operations is denoted CV.
Critical variables (critical variables). Critical variables are variables that represent the running state of the system or that are used by critical operations, and are typically the variables checked in security checks. According to the semantics of the security check, critical variables fall into two categories: key usage variables V_use and state variables V_cond. V_use are variables used by critical operations. They are characterized by usually being checked before a critical operation uses them, so they tend to have a def-use relationship with forward (subsequent) statements, and the use point of the variable is often the critical operation. V_cond are variables that indicate whether a critical operation succeeded. They are characterized by usually being checked after a critical operation, so they have a def-use relationship with backward (preceding) statements, the def point closest to the variable is often the critical operation, and V_cond has no use other than as a flag indicating the success or failure of the critical operation.
Function signature (function signature). For a given function, the function features are obtained by taking the critical operations as anchor points, collecting the statements that have a data or control dependence on the critical operations according to the type of the critical variable, and normalizing them to obtain a set of hash values of function statements. For a vulnerability, the function features corresponding to its function pair (F_v, F_p) are (G_vul, G_fix, G_context). G_vul and G_fix are the distinguishing key semantic features before and after the security check defect is repaired. G_context is a two-tuple of the common semantic features related to F_v and F_p, and can reflect the functionality or execution path of the vulnerable or patched function.
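A minimal sketch of turning statement sets into hashed function features follows. The normalization shown (dropping comments and all whitespace) and the truncated SHA-256 digest are assumptions; the text only says that statements are normalized and hashed.

```python
import hashlib
import re


def normalize(stmt):
    """Hypothetical normalization: drop comments and all whitespace."""
    stmt = re.sub(r"/\*.*?\*/", "", stmt)   # C block comments on one line
    stmt = re.sub(r"//.*", "", stmt)        # C line comments
    return re.sub(r"\s+", "", stmt)


def hash_set(stmts):
    """Map a statement set to a set of short hash values."""
    hashes = set()
    for stmt in stmts:
        norm = normalize(stmt)
        if norm:                            # skip lines that are empty after normalization
            hashes.add(hashlib.sha256(norm.encode("utf-8")).hexdigest()[:16])
    return hashes


def function_signature(s_vul, s_fix, s_context_pair):
    """Return (G_vul, G_fix, G_context) from (S_vul, S_fix, S_context).

    `s_context_pair` is the two-tuple of context statement sets for the
    functions before and after the patch.
    """
    before, after = s_context_pair
    return hash_set(s_vul), hash_set(s_fix), (hash_set(before), hash_set(after))
```

Hashing normalized statements lets the later comparison step work with set intersections instead of string matching.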
The invention discloses a security inspection defect detection method based on key semantic features, which is used for detecting whether a security inspection defect exists in target software in a given source code form, outputting the position of the security inspection defect in a source code and additional information for assisting a user in analyzing and repairing the defect. The present invention also supports detecting, for a given target software in source code form, whether there is a security check defect therein similar to a user-specified historical security check defect, and outputting the location of the latter in the source code, as well as additional information to assist the user in analyzing, repairing the defect. The flow of the method is shown in figure 1, and comprises the following steps:
1) If the user has not specified a set of patches related to historical security inspection defects, collect the security inspection defects that were fixed in historical versions of the target software under test together with their patches; otherwise, proceed directly to the next step. Using vulnerability numbers, vulnerability types, or words related to vulnerability information as keywords, search the patches in the code hosting repository of the target software under test for patches whose textual description contains a keyword, or obtain the relevant patch set directly from the reference links of the historical vulnerabilities of the target software under test. From these patches, screen out those whose difference code S_diff before and after patching contains conditional branch statements, error codes indicating an exception, or function statements related to exception handling, to obtain the patch set, denoted C_target.
2) Obtain the difference information of the patches. For each patch c in the screened patch set C_target related to security check defects: first, parse the header information of the patch to identify the files modified by the patch, the patch difference statements S_diff, and their line numbers; then, from the files before and after the patch, obtain the function F_v in which the line numbers of S_diff are located and its paired function F_p before or after the patch, forming the function pair (F_v, F_p); finally, add the triple consisting of the commit ID of the patch, (F_v, F_p), and S_diff to the patch difference information list diff_info.
3) Identify the set of security checks involved in the patch. For each tuple in the patch difference information list diff_info, perform the following steps. First, traverse each statement S in S_diff: if S is a true branch of a conditional branch statement, identify S as a security check and add it to the security check set SC; if S is a false branch of a conditional branch statement, a returned error code, or a function related to exception handling, obtain all predecessors of the basic block containing S on the control flow graph (CFG) of the function and traverse them. Each node of the control flow graph represents a basic block of the function, and each edge represents a control flow transfer. If a predecessor ends with a true branch statement, obtain that true branch statement, identify it as a security check, and add it to SC. Finally, add the two-tuple consisting of the commit ID of the patch and the set SC to the security check information set sc_info of the patch.
4) Consolidate the security check set SC to remove redundant security checks. For each tuple in the security check information set sc_info of the patch, perform the following step: for the security check statements SC_i in SC, if two SC_i are the same conditional branch statement, or are conditional branch statements that were partially modified but have the same semantics, treat them as the same security check and keep the one whose statement type is S_add.
5) Identify the critical variables and their types from the security check set SC. For each tuple in the security check information set sc_info of the patch, perform the following steps. First, identify the critical variables of the security checks: traverse each SC_i in SC, record the position of each critical variable v checked by SC_i, and add v to the critical variable list V_list. Next, identify the type of each critical variable: obtain the definition point set def and the use point set use of v. If def is not empty and use is empty, or use is not empty and contains no function call other than return statements or conditional branch statements, v is a state variable; if use is not empty and contains function calls other than return statements or conditional branch statements, v is a key usage variable. Add the type of v to the critical variable type list V_type. Finally, add the quadruple consisting of the commit ID of the patch, the security check SC_i, and the corresponding V_list and V_type to the critical variable information set critical_variable_info of the patch.
6) Obtain the function pair of the patch used for extracting semantic features. For each tuple in the critical variable information set critical_variable_info of the patch, perform the following step: determine the function pair (F, F') used for extracting semantic features according to the statement type of SC_i, in one of two cases: 1) if SC_i is a newly added statement, the repaired function F is F_p and the corresponding function F' before repair is F_v; 2) if SC_i is a deleted statement, the repaired function F is F_v and the corresponding function F' before repair is F_p. Add the triple consisting of the commit ID of the patch, the security check SC_i, and the corresponding function pair (F, F') to the patch's function pair information list F_pairs for semantic feature extraction.
7) Obtain the critical operations of the repaired function. For each tuple in F_pairs, perform the following steps: using the tuple's commit ID and security check SC_i, obtain the corresponding critical variable information quadruple in critical_variable_info; for each critical variable v in the quadruple's critical variable list V_list, obtain the critical operation set FCV of the repaired function F according to the type of v. If v is a state variable, obtain in F the set of statements on which v has a direct backward data dependence as the critical operation set FCV of F (v is data-dependent on FCV); if v is a key usage variable, obtain in F the set of statements that have a direct forward data dependence on v as the critical operation set FCV of F (FCV is data-dependent on v).
8) Obtain the critical operation pairs for the function pair used for semantic feature extraction. For each critical operation CV_i in the critical operation set FCV of the repaired function F output by step 7): if CV_i is a newly added statement, find in the function F' before repair a function statement that is similar or located at a similar position as the critical operation CV_i' corresponding to CV_i, and add it to the critical operation set FCV' of F'; if no CV_i' is found, delete CV_i from FCV. Pair each CV_i in FCV with its corresponding CV_i' in FCV' to form a critical operation pair (CV_i, CV_i') of the function pair (F, F'), add it to the critical operation pair set CVpairs, and finally merge redundant critical operation pairs in CVpairs. Add the triple consisting of the commit ID of the patch, the security check SC_i, and the critical operation pair set CVpairs to the critical operation pair information set cvinfo.
9) Obtain the key semantic features before and after patching. For each tuple in the key operation pair information set CVinfo, perform the following operations: for each key operation pair (CV_i, CV_i') in the key operation pair set CVpairs, obtain, in the function pair (F, F') where the pair is located, the sets of statements that have a control dependence with the key operation, according to the type of the critical variable. If v is a state variable, obtain in F and F' the statement sets FC_i and FC_i' that have a forward control dependence with the key operation CV_i and its corresponding CV_i' (CV_i and CV_i' have a forward control dependence on FC_i and FC_i', respectively). If v is a key use variable, obtain in F and F' the statement sets FC_i and FC_i' that have a backward control dependence with the key operation CV_i and its corresponding CV_i' (CV_i and CV_i' have a backward control dependence on FC_i and FC_i', respectively). Taking F as F_v and F' as F_p as an example, CV_i and FC_i are combined (their union is taken) to form the key semantic feature statement set of the vulnerability function F_v
Figure BDA0003404956460000091
that is,
Figure BDA0003404956460000092
and CV_i' and FC_i' are combined to form the key semantic feature statement set of the patch function F_p
Figure BDA0003404956460000093
that is,
Figure BDA0003404956460000094
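As a rough sketch of step 9), the key semantic feature set can be built as the key operation plus its control-dependence neighbours. The edge representation `(controller, dependent)`, the function name, and the reading of "forward" as the statements controlled by the key operation and "backward" as the statements controlling it are assumptions made for illustration; only direct neighbours are collected.

```python
def key_semantic_features(cv, control_edges, var_type):
    """Return CV_i together with its control-dependence set FC_i.

    cv: identifier (here a line number) of the key operation.
    control_edges: set of (controller, dependent) pairs, meaning that
        `dependent` is control-dependent on `controller`.
    """
    if var_type == "state":
        # forward direction: statements controlled by the key operation
        fc = {dst for src, dst in control_edges if src == cv}
    elif var_type == "key_use":
        # backward direction: statements that control the key operation
        fc = {src for src, dst in control_edges if dst == cv}
    else:
        raise ValueError(f"unknown critical variable type: {var_type}")
    return {cv} | fc


# Hypothetical patched function: the check on line 12 guards the use on line 17.
patched = key_semantic_features(17, {(12, 17), (12, 23)}, "key_use")
# Hypothetical vulnerable function: the same use is not guarded by any check.
vulnerable = key_semantic_features(16, set(), "key_use")
print(patched, vulnerable)
```

Under this reading, the patched feature set contains the guarding check while the vulnerable one does not, which is what lets the two sets distinguish a fixed function from an unfixed one.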
10) Obtain the common semantic features before and after patching. For each tuple in the key operation pair information set CVinfo, perform the following operations: in the function pair where the key operation pair (CV_i, CV_i') is located, obtain the sets of statements that have a data dependence with the key operation, according to the type of the critical variable. If v is a state variable, obtain in F and F' the statement sets FDB_i and FDB_i' that have a backward data flow dependence with the key operation CV_i and its corresponding CV_i', and the statement sets FCB_i and FCB_i' that have a backward control flow dependence. Combine CV_i, FDB_i and FCB_i to form the common semantic feature statement set of F, S_context-F = CV_i ∪ FDB_i ∪ FCB_i, and combine CV_i', FDB_i' and FCB_i' to form the common semantic feature statement set of F', S_context-F' = CV_i' ∪ FDB_i' ∪ FCB_i'. The pair (S_context-F, S_context-F') is taken as the common semantic feature statement set of the vulnerability
Figure BDA0003404956460000101
If v is a key use variable, obtain in F and F' the statement sets FDB_i and FDB_i' that have a backward data flow dependence with the key operation CV_i and its corresponding CV_i', and the statement sets FDF_i and FDF_i' that have a forward data flow dependence. Combine CV_i, FDB_i and FDF_i to form the common semantic feature statement set of F, S_context-F = CV_i ∪ FDB_i ∪ FDF_i, and combine CV_i', FDB_i' and FDF_i' to form the common semantic feature statement set of F', S_context-F' = CV_i' ∪ FDB_i' ∪ FDF_i'. The pair (S_context-F, S_context-F') is taken as the common semantic feature set of the security check defect
Figure BDA0003404956460000102
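The two unions of step 10) can be illustrated with a small sketch. Dependence edges are assumed to be given as `(source, dependent)` pairs, only direct neighbours are collected, and the names `neighbours` and `common_features` are hypothetical.

```python
def neighbours(edges, node, direction):
    """Direct dependence neighbours of `node` in a set of (source, dependent) edges."""
    if direction == "backward":
        return {src for src, dst in edges if dst == node}
    return {dst for src, dst in edges if src == node}


def common_features(cv, data_edges, control_edges, var_type):
    """Build S_context for one key operation.

    state variable:   CV_i ∪ FDB_i ∪ FCB_i
    key use variable: CV_i ∪ FDB_i ∪ FDF_i
    """
    fdb = neighbours(data_edges, cv, "backward")
    if var_type == "state":
        extra = neighbours(control_edges, cv, "backward")  # FCB_i
    elif var_type == "key_use":
        extra = neighbours(data_edges, cv, "forward")      # FDF_i
    else:
        raise ValueError(f"unknown critical variable type: {var_type}")
    return {cv} | fdb | extra
```

Applying `common_features` once to F and once to F' gives the pair (S_context-F, S_context-F').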
11) Merge the key semantic features and common semantic features obtained before and after patching. Merge all function features obtained in steps 9) and 10) to obtain a function feature signature, a triple consisting of the patch's commit ID, the security check SC_i, and the triple (S_vul, S_fix, S_context) of the key semantic feature statement sets before and after patching and the common semantic feature statement set; add the signature to the patch function feature list signatures. The merging rules are as follows:
Figure BDA0003404956460000103
Figure BDA0003404956460000104
Figure BDA0003404956460000105
Preferably, the common semantic feature statement set S_context of the vulnerability function F_v and the patch function F_p before and after patching is screened according to information entropy; this step is optional. The common semantic features S_context of a vulnerability may contain too many statements, which introduces considerable noise and hence false positives. It has been observed that the farther a statement is from the key operation statement CV, the weaker its relation to the actual defect. For the common semantic features S_context of each item in signatures, S_context can be screened based on information entropy; the specific steps are similar to those of MVP [USENIX, 2020].
12) Normalize the function feature statements and the source code of the target software under test. Normalize the function feature signatures output by step 11) and the source code of the target software under test (function by function), obtaining the two-tuple patch_info, which contains the patch's commit ID and the patch's function features (G_vul, G_fix, G_context) (obtained by normalizing (S_vul, S_fix, S_context)), and the function feature set F_T of the target software under test. patch_info is saved to the feature database.
13) Compare the features extracted from published and suspected vulnerabilities in the feature database with each function feature of the target software under test. The inputs of this step are the list of two-tuple records patch_info in the feature database, the function feature set F_T of the target software under test, and the matching thresholds S1, S2 and S3 for the common semantic features G_context, the vulnerability key semantic features G_vul and the patch key semantic features G_fix, respectively. The outputs are the functions F_t that have similar security check defects, together with additional information that helps the user analyze and repair the defect.
For each record in the feature database, namely the two-tuple patch_info, traverse each function F_t in F_T of the target software under test and compute: a11 = (number of statements in F_t that match H_v) / (total number of statements in H_v); a12 = (number of statements in F_t that match H_p) / (total number of statements in H_p); a2 = (number of statements in F_t that match G_vul) / (total number of statements in G_vul); a3 = (number of statements in F_t that match G_fix) / (total number of statements in G_fix). If a11 > S1, a2 > S2 and a3 < S3, or a12 > S1, a2 > S2 and a3 < S3, F_t is considered likely to have a security check defect similar to the one repaired by the patch. If S_add and S_del of the patch overlap completely, the order of the statements must also be considered when matching. Finally, manual review is required to confirm whether F_t really has a security check defect.
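The text of step 12) does not spell out the normalization rules, so the following is only a sketch under assumptions: comments are stripped and all whitespace is removed, a convention often used in clone detection. The function names are hypothetical, and the simple regular expression does not handle comment markers inside string literals.

```python
import re

# Matches C block comments and line comments (assumption: no "//" or "/*"
# appears inside a string literal in the statement).
_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def normalize_statement(stmt):
    """Strip comments and all whitespace from one C statement."""
    without_comments = _COMMENT.sub("", stmt)
    return re.sub(r"\s+", "", without_comments)


def normalize_feature(statements):
    """Normalize a statement set such as S_vul into its counterpart G_vul."""
    normalized = (normalize_statement(s) for s in statements)
    return {n for n in normalized if n}  # drop statements that become empty
```

Applying `normalize_feature` to each of S_vul, S_fix and S_context, and to the statements of every target function, yields sets that can be compared by plain set intersection.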
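The matching rule of step 13) can be written down directly. In this sketch every feature is a set of normalized statements; H_v and H_p are assumed to be the two components of G_context for the pre-patch and post-patch functions, an empty feature set is assumed to give a ratio of 0, statement order is ignored, and the threshold values are left to the caller.

```python
def match_ratio(target_stmts, feature):
    """Fraction of `feature` statements that also occur in the target function."""
    if not feature:
        return 0.0  # assumption: an empty feature set never matches
    return len(feature & target_stmts) / len(feature)


def is_suspect(f_t, h_v, h_p, g_vul, g_fix, s1, s2, s3):
    """Apply the rule (a11 > S1 or a12 > S1) and a2 > S2 and a3 < S3."""
    a11 = match_ratio(f_t, h_v)
    a12 = match_ratio(f_t, h_p)
    a2 = match_ratio(f_t, g_vul)
    a3 = match_ratio(f_t, g_fix)
    return (a11 > s1 or a12 > s1) and a2 > s2 and a3 < s3
```

A function flagged by `is_suspect` is only a candidate: it shares context and vulnerable statements with the patch while lacking the fix statements, and still needs manual confirmation.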
The execution of the above steps is described below by way of an example.
As shown in fig. 3, a suspected vulnerability patch with security impact in the Linux kernel is taken as an example to describe the process of detecting security check defects based on key semantic features. The inputs are the source code of the defective Linux kernel version v5.9.0-rc5 and the selected suspected vulnerability patch a0323b979f81. The defect is that the NFC_ATTR_TARGET_INDEX and NFC_ATTR_PROTOCOLS attributes are used without first checking whether they exist, so a malicious user-mode program can trigger a null pointer dereference, which may lead to a program crash or even privilege escalation.
First, patches related to security check repairs are screened; since the selected patch modifies a conditional branch statement, it is identified as a patch related to security check defect repair (corresponding to step 1)). The difference information of the patch is then identified. As shown in fig. 3, the only file modified by this patch is netlink.c. Parsing the header information of the patch shows that the versions v_p and v_f of the file before and after patching are 6b0850e and b251fb9, and that the line numbers of S_del and S_add fall within the start and end line numbers of the function nfc_gen_activate_target, so nfc_gen_activate_target is the only function modified by this patch. Finally, the source code of nfc_gen_activate_target is obtained from netlink.c at versions 6b0850e and b251fb9 respectively and used as the function pair (F_v, F_p) before and after patching (corresponding to step 2)). Since the patch contains newly added statements and these include the conditional branch statement if, the statements on lines 11, 12, 13 and 14 are identified as security checks and their positions are recorded in the security check set SC (corresponding to step 3)). Since the deleted security check on line 11 and the added security check on lines 12, 13 and 14 are the same security check, only the positions of the security check on lines 12, 13 and 14 are kept, and the security check corresponding to line 11 is deleted from SC (corresponding to step 4)).
Then, the definition point set def (line 5) and the use point set use (lines 17, 23 and 24) that have a direct data dependence with the critical variable info of the security check are obtained; since use is not empty and contains function calls other than return statements or conditional branch statements, the type of info is key use variable (corresponding to step 5)). The features of the function pair before and after patching are then obtained (corresponding to steps 6) to 11)). Next, the optional step of screening the common semantic feature statement set S_context of the vulnerability function F_v and the patch function F_p according to information entropy is performed (corresponding to step 11)). The extracted function feature statements and the source code of the target software under test are normalized (corresponding to step 12)). Finally, the features extracted from the published and suspected vulnerability patches in the feature database are compared with each function of the target software under test, yielding the function nfc_gen_fw_download (net/nfc/netlink.c) with a similar security check defect, shown in fig. 4. Manual audit confirmed that this function does not check whether the NFC_ATTR_FIRMWARE_NAME attribute exists before using it, so a malicious user-mode program can trigger a null pointer dereference (corresponding to step 13)).
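The type decision applied to the critical variable info in the example follows the rule of step 5) and can be sketched as below. Tagging each use with a kind ("return", "branch" or "call") is an assumption of this sketch, as is the function name; the kinds used in the demonstration are hypothetical and are not taken from the patch.

```python
def classify_critical_variable(def_lines, uses):
    """Classify a critical variable as 'state' or 'key_use'.

    def_lines: line numbers of the definition point set.
    uses: list of (line, kind) pairs, kind in {"return", "branch", "call"}.
    """
    has_call_use = any(kind == "call" for _, kind in uses)
    if uses and has_call_use:
        # the variable is consumed by a call other than a return or branch
        return "key_use"
    if uses or def_lines:
        # only defined, or only used in return/branch statements
        return "state"
    return None  # neither defined nor used: the rule does not apply


# Hypothetical tagging of the use points of the example (lines 17, 23, 24).
print(classify_critical_variable([5], [(17, "call"), (23, "call"), (24, "call")]))
```

A variable that only feeds return or branch statements, such as a checked return code, would be classified as a state variable instead.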
Although the present invention has been described with reference to the above embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A security inspection defect detection method based on key semantic features is characterized by comprising the following steps:
1) collecting patches related to security check defects for the target software under test;
2) obtaining the ID of each patch, the function pair (F_v, F_p) before and after patching, and the patch difference statements S_diff; forming them into a triple and adding the triple to a patch difference information list;
3) for each statement of the patch difference statements S_diff in each triple of the patch difference information list, judging the statement, obtaining the true branch statement, identifying the true branch statement as a security check, and adding it to a security check set;
4) integrating the security check set to remove redundant security checks;
5) identifying each critical variable of each security check in the security check set and adding it to a critical variable list; identifying whether the type of each critical variable is state variable or key use variable and adding the type to a critical variable type list; forming a quadruple from the ID of the patch, the security check, the critical variable list and the critical variable type list, and adding the quadruple to the critical variable information set of the patch;
6) for each quadruple in the critical variable information set, extracting the repaired function F and the corresponding pre-repair function F' according to the statement type of the security check in the quadruple, to obtain the function pair (F, F') used for extracting semantic features; forming a triple from the ID of the patch, the security check and (F, F'), and adding the triple to the patch's function pair information list for extracting semantic features;
7) for each triple in the function pair information list, obtaining the quadruple in the critical variable information set that corresponds to the triple's patch ID and security check, and obtaining the key operation set of the repaired function F according to the type of each critical variable in the quadruple's critical variable list;
8) for each key operation CV_i in the key operation set, finding the corresponding key operation CV_i' in the pre-patch function F'; if no CV_i' is found, deleting CV_i from the key operation set; pairing each CV_i in the key operation set with its corresponding CV_i' to form a key operation pair (CV_i, CV_i') of the function pair (F, F'), and adding the pair to a key operation pair set;
9) for each key operation pair (CV_i, CV_i') in the key operation pair set, obtaining, in the function pair (F, F') where the pair is located, the statement sets that have a control dependence with the key operation according to the type of the critical variable, and combining the key operation with the statement sets to obtain the key semantic feature statement sets before and after patching
Figure FDA0003404956450000011
And
Figure FDA0003404956450000012
10) for each key operation pair (CV_i, CV_i') in the key operation pair set, obtaining, in the function pair (F, F') where the pair is located, the statement sets that have a data dependence with the key operation according to the type of the critical variable, and combining the key operation with the statement sets to obtain the common semantic feature statement set before and after patching
Figure FDA0003404956450000013
11) merging the key semantic feature statement sets before and after patching
Figure FDA0003404956450000021
and the common semantic feature statement sets
Figure FDA0003404956450000022
respectively, to obtain the triple (S_vul, S_fix, S_context); forming a triple function feature from the patch ID, the security check and the triple (S_vul, S_fix, S_context);
12) normalizing the function features and the source code of the target software under test to obtain a two-tuple containing the ID of the patch and the function features (G_vul, G_fix, G_context), and the function feature set F_T of the target software under test; storing the two-tuple in a feature database;
13) for each two-tuple in the feature database, comparing each function F_t in F_T with G_context,
Figure FDA0003404956450000023
to find the functions that have a security check defect similar to the patch.
2. The method of claim 1, wherein the step of collecting patches related to security check defects in step 1) comprises:
searching the patches in the code hosting repository of the target software under test for patches whose textual description contains a keyword, the keyword being a vulnerability number, a vulnerability type or a word related to vulnerability information; or directly obtaining the related patches from the reference links of historical vulnerabilities of the target software under test;
screening, from the obtained patches, those whose patch difference statements S_diff contain conditional branch statements, error codes indicating exceptions, or function statements related to exception handling; these are the patches related to security check defects.
3. The method according to claim 1, wherein obtaining the function pair (F_v, F_p) in step 2) comprises:
parsing the header information of the patch to identify the files modified by the patch, the patch difference statements S_diff and the line numbers of S_diff;
obtaining the function F_v in which the line numbers of S_diff are located in the files before and after patching, and obtaining its paired function F_p before or after patching, to form the function pair (F_v, F_p).
4. The method of claim 1, wherein the step of fetching the true branch statement and identifying it as a security check in step 3) comprises:
traversing each statement in S_diff; if the statement is a true branch of a conditional branch statement, identifying the statement as a security check and adding it to the security check set;
if the statement is a false branch of a conditional branch statement, a return error code or a function related to exception handling, acquiring all predecessors of a basic block where the statement is located on a control flow graph of the function, and traversing all the predecessors; each node of the control flow graph represents a basic block in the function, and edges represent a control flow transfer relation;
if a predecessor ends with a true branch statement, its corresponding true branch statement is obtained, the true branch statement is identified as a security check and added to the security check set.
5. The method of claim 1, wherein integrating the security check set in step 4) comprises: for the security check statements in the security check set, if two security check statements are the same conditional branch statement, or conditional branch statements that are partially modified but semantically identical, regarding them as the same security check and keeping the security check statement whose statement type is newly added statement S_add;
identifying the type of the critical variable in step 5) comprises: obtaining the definition point set and the use point set of the critical variable; if the definition point set of the critical variable is not empty and the use point set is empty, or the use point set is not empty and contains no function call other than return statements or conditional branch statements, the critical variable is a state variable; if the use point set is not empty and contains function calls other than return statements or conditional branch statements, the critical variable is a key use variable.
6. The method as claimed in claim 1, wherein in step 6), if the security check is a newly added statement, the repaired function F is F_p and the corresponding pre-repair function F' is F_v; if the security check is a deleted statement, the repaired function F is F_v and the corresponding pre-repair function F' is F_p.
7. The method according to claim 1, wherein obtaining the key operation set FCV of the repaired function F according to the type of the critical variable in step 7) comprises: if the critical variable is a state variable, taking the set of statements in F on which the critical variable has a direct backward data dependence as the key operation set FCV of F; if the critical variable is a key use variable, taking the set of statements in F that have a direct forward data dependence on the critical variable as the key operation set FCV of F;
in step 8), redundant key operation pairs in the resulting key operation pair set CVpairs are merged;
in step 11), the common semantic feature statement set S_context before and after patching is screened according to information entropy.
8. The method of claim 1, wherein the step of 9) obtaining a set of statements having control dependencies with the critical operation comprises:
if the critical variable is a state variable, obtaining in F and F' the statement sets FC_i and FC_i' that have a forward control dependence with the key operation CV_i and its corresponding CV_i';
if the critical variable is a key use variable, obtaining in F and F' the statement sets FC_i and FC_i' that have a backward control dependence with the key operation CV_i and its corresponding CV_i';
combining CV_i and FC_i to form the key semantic feature statement set of F
Figure FDA0003404956450000031
and combining CV_i' and FC_i' to form the key semantic feature statement set of F'
Figure FDA0003404956450000032
9. The method of claim 1, wherein the step of integrating the set of critical operations and statements in step 10) comprises:
if the critical variable is a state variable, obtaining in F and F' the statement sets FDB_i and FDB_i' that have a backward data flow dependence with the key operation CV_i and its corresponding CV_i', and the statement sets FCB_i and FCB_i' that have a backward control flow dependence; combining CV_i, FDB_i and FCB_i to form the common semantic feature statement set S_context-F of F, and combining CV_i', FDB_i' and FCB_i' to form the common semantic feature statement set S_context-F' of F'; taking the pair (S_context-F, S_context-F') as the common semantic feature statement set of the vulnerability
Figure FDA0003404956450000041
if the critical variable is a key use variable, obtaining in F and F' the statement sets FDB_i and FDB_i' that have a backward data flow dependence with the key operation CV_i and its corresponding CV_i', and the statement sets FDF_i and FDF_i' that have a forward data flow dependence; combining CV_i, FDB_i and FDF_i to form the common semantic feature statement set S_context-F of F, and combining CV_i', FDB_i' and FDF_i' to form the common semantic feature statement set S_context-F' of F'; taking the pair (S_context-F, S_context-F') as the common semantic feature set of the security check defect
Figure FDA0003404956450000042
10. The method of claim 1, wherein the comparing in step 13) comprises: computing a11 = (number of statements in F_t that match H_v) / (total number of statements in H_v), a12 = (number of statements in F_t that match H_p) / (total number of statements in H_p), a2 = (number of statements in F_t that match G_vul) / (total number of statements in G_vul), and a3 = (number of statements in F_t that match G_fix) / (total number of statements in G_fix); if a11 > S1, a2 > S2 and a3 < S3, or a12 > S1, a2 > S2 and a3 < S3, F_t is considered to have a security check defect similar to the patch.
CN202111507955.2A 2021-12-10 2021-12-10 Security inspection defect detection method based on key semantic features Pending CN114490313A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111507955.2A CN114490313A (en) 2021-12-10 2021-12-10 Security inspection defect detection method based on key semantic features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111507955.2A CN114490313A (en) 2021-12-10 2021-12-10 Security inspection defect detection method based on key semantic features

Publications (1)

Publication Number Publication Date
CN114490313A true CN114490313A (en) 2022-05-13

Family

ID=81492395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111507955.2A Pending CN114490313A (en) 2021-12-10 2021-12-10 Security inspection defect detection method based on key semantic features

Country Status (1)

Country Link
CN (1) CN114490313A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7392545B1 (en) * 2002-01-18 2008-06-24 Cigital, Inc. Systems and methods for detecting software security vulnerabilities
CN103955426A (en) * 2014-04-21 2014-07-30 中国科学院计算技术研究所 Method and device for detecting code C null-pointer reference
KR20150100586A (en) * 2015-08-07 2015-09-02 단국대학교 산학협력단 Appratus for detectiing similarity of software and method thereof
KR20190030490A (en) * 2017-09-14 2019-03-22 국방과학연구소 Apparatus and method for detecting security weakness of program source code
US20200159934A1 (en) * 2018-11-15 2020-05-21 ShiftLeft Inc System and method for information flow analysis of application code
CN111611586A (en) * 2019-02-25 2020-09-01 上海信息安全工程技术研究中心 Software vulnerability detection method and device based on graph convolution network
US10769250B1 (en) * 2017-10-26 2020-09-08 Amazon Technologies, Inc. Targeted security monitoring using semantic behavioral change analysis
CN111967013A (en) * 2020-07-13 2020-11-20 复旦大学 C/C + + patch existence detection method based on patch summary comparison
CN112651028A (en) * 2021-01-05 2021-04-13 西安工业大学 Vulnerability code clone detection method based on context semantics and patch verification
CN113297580A (en) * 2021-05-18 2021-08-24 广东电网有限责任公司 Code semantic analysis-based electric power information system safety protection method and device
CN113553593A (en) * 2021-07-21 2021-10-26 浙江大学 Internet of things firmware kernel vulnerability mining method and system based on semantic analysis

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
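刘臻; 武泽慧; 曹琰; 魏强: "Method for detecting software vulnerable code reuse based on vulnerability fingerprints", Journal of Zhejiang University (Engineering Science), no. 11, 15 November 2018 (2018-11-15) *
文琪; 江喆越; 张源: "Security patch presence detection based on critical path testing", Computer Applications and Software, no. 03, 12 March 2020 (2020-03-12) *
王赞; 郜健; 陈翔; 傅浩杰; 樊向宇: "A review of research on automated program repair methods", Chinese Journal of Computers, no. 03, 28 July 2017 (2017-07-28) *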

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination