CN101908006B - GCC abstract syntax tree-based buffer overflow vulnerability detection method - Google Patents

Info

Publication number
CN101908006B
Authority
CN
China
Prior art keywords
node
var
buffer zone
syntax tree
abstract syntax
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2010102409081A
Other languages
Chinese (zh)
Other versions
CN101908006A (en
Inventor
胡昌振
邹家莘
王崑声
马锐
薛静锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN2010102409081A
Publication of CN101908006A
Application granted
Publication of CN101908006B
Expired - Fee Related
Anticipated expiration

Abstract

The invention relates to buffer overflow vulnerability detection, in particular to a buffer overflow vulnerability detection method based on the GCC abstract syntax tree, and belongs to the technical field of information security. The method comprises the following steps: processing a source program with the GCC compiler to generate an abstract syntax tree; eliminating from the abstract syntax tree text all information unrelated to data-flow and control-flow analysis while preserving the integrity of the useful information; and applying the result to program analysis, monitoring the relevant nodes of the abstract syntax tree in order to analyze and detect buffer overflow vulnerabilities. Compared with traditional analysis methods that do not eliminate redundancy, the method is more practical, more efficient, and more precise.

Description

A buffer overflow vulnerability detection method based on the GCC abstract syntax tree
Technical field
The present invention relates to a method for detecting buffer overflow vulnerabilities, in particular a buffer overflow vulnerability detection method based on the GCC abstract syntax tree, and belongs to the field of information security technology.
Background
With the rapid development of computer technology, human society is becoming ever more information-based, and politics, the economy, the military, culture, and other fields depend more and more heavily on computer information systems. The security of computer systems has therefore drawn increasing attention. Large software systems, however, are written jointly by many programmers: the software is divided into modules that are written separately, then integrated and tested, and finally patched and released. Security vulnerabilities in software are therefore almost inevitable. A software security vulnerability is a defect in data access, behavioral logic, or similar aspects that is introduced during the design and implementation of the software. Attackers often exploit these vulnerabilities to make the program's behavior violate a security policy. For these reasons, research on software vulnerability detection techniques is receiving growing attention.
Depending on whether the program must be executed during detection, software vulnerability detection techniques are divided into dynamic detection and static detection.
(1) Static detection
Static detection methods fall broadly into three classes:
The first class is detection based on lexical analysis, which corresponds to early detection tools such as Grep. These tools have existed for a long time and are mature. Their advantages are that vulnerability signatures exist as data, independent of the analysis program, and can be extended flexibly, and that lexical analysis runs efficiently. The drawback is equally clear: a signature database stored as data cannot describe vulnerabilities fully, so the vulnerability information collected is incomplete, and the accompanying algorithms are restricted to lexical analysis, which limits detection capability.
The second class is constraint analysis and annotation-driven detection. Although it introduces syntactic analysis, it is built on the ideas and methods of program verification systems. It requires the operator to know the target program well, and even to write program specifications and annotations by hand, so the degree of automation is low. Splint, developed by David Evans and David Larochelle, and the improvements based on Splint belong to this class.
The third class abstracts and models features of the source code and converts the vulnerability detection problem into a constraint analysis and solving problem. Such methods are generally built on existing program analysis tools (such as the commercial software CodeSurfer). Their advantage is that these tools are powerful and can generate syntactic and semantic information such as abstract syntax trees, function call graphs, control flow graphs, and even points-to graphs. Analysis can be performed directly on this information through the programming interfaces that the tools provide, which reduces design complexity. Within this class there is a fairly general buffer overflow detection method with reasonably good results, which works as follows:
The 1st step: For the source program to be analyzed, use the GCC compiler directly to generate the abstract syntax tree (AST).
The 2nd step: Traverse the AST produced in the 1st step and associate each buffer with an integer pair, called the attribute information of that buffer. The pair consists of the allocated size max_length and the used length used_length.
The 3rd step: Abstract the statements and function calls in the source program that involve buffers into operations on the buffer attribute information (the allocated size max_length and the used length used_length).
The 4th step: Compare the allocated size max_length with the used length used_length. If max_length < used_length, a buffer overflow is judged to occur.
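The integer-pair check of this prior-art method can be sketched as follows. This is a minimal illustration only: the class name BufferInfo, the function overflows, and the 8-byte and 12-byte figures are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class BufferInfo:
    """Integer-pair attribute information of one buffer (prior-art form)."""
    max_length: int   # size of the allocated space
    used_length: int  # length currently in use


def overflows(info: BufferInfo) -> bool:
    """4th step: an overflow is reported when max_length < used_length."""
    return info.max_length < info.used_length


# Hypothetical example: an 8-byte buffer that receives 12 bytes.
buf = BufferInfo(max_length=8, used_length=0)
buf.used_length = 12       # abstracted effect of a copy into the buffer
print(overflows(buf))      # True
```

A single integer per quantity cannot express a length that is only known to lie within a range, which is the imprecision discussed next.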
This method has two shortcomings. 1. It analyzes the AST exactly as the GCC compiler generates it, and the AST text that GCC produces contains a great deal of detail that helps compilation but hinders code analysis: for example, functions and structures that are pulled in by "#include" directives but never used by the source program, as well as built-in functions, type declarations, error messages, and constants generated during compilation. In quantitative terms, even a very small compilation unit may produce AST text about 1000 times its own size, and the resulting AST can exhaust memory. For complex source programs the detection efficiency of such methods drops sharply. 2. In the 2nd step the attribute information of a buffer is represented by an integer pair; this representation is not precise enough and can make the verdict inaccurate.
(2) Dynamic detection
Dynamic detection injects test data while the program runs and analyzes the program's runtime environment (environment variables, memory, heap, stack, and so on), observing whether the program runs normally and whether its behavior meets requirements, in order to determine whether it contains vulnerabilities. Its advantage is that it does not work on the source code directly and does not require modifying the target program's source, which to some extent protects the confidentiality of the program. Its obvious weakness is its dependence on input: a vulnerability is found only when a specific input drives the program to the dangerous point. As a result, localization is imprecise and the false-negative rate is high.
Summary of the invention
The object of the present invention is to address the above deficiencies of the prior art by providing a buffer overflow vulnerability detection method based on the GCC abstract syntax tree. The basic idea is as follows: process the source program to be analyzed with the GCC compiler to generate an abstract syntax tree; eliminate from the abstract syntax tree text all information unrelated to data-flow and control-flow analysis while preserving the integrity of the useful information; then apply the result to program analysis, analyzing and detecting buffer overflow vulnerabilities by monitoring the relevant nodes of the abstract syntax tree. Compared with traditional analysis methods that do not eliminate redundancy, this method is more practical, more efficient, and more accurate.
The object of the present invention is achieved through the following technical solution.
A buffer overflow vulnerability detection method based on the GCC abstract syntax tree, with the following specific steps:
Step 1: For the source program to be analyzed, use the GCC compiler directly to generate the abstract syntax tree (AST).
The source program to be analyzed is a C/C++ source program.
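The patent does not say which GCC option produces the AST text or what that text looks like. As an assumption about the tooling, older GCC releases could write a textual dump of the tree with -fdump-translation-unit, in which each node starts with an "@" index followed by its kind and "name: value" fields such as "srcp". The sketch below parses a simplified, hypothetical excerpt in that style; the sample text, the regular expressions, and the function parse_dump are illustrative only and ignore the quirks of real dumps (for example field names or values that contain spaces).

```python
import re

# Hypothetical, simplified excerpt in the style of a GCC tree dump.
SAMPLE_DUMP = """\
@1      function_decl    name: @2       type: @3       srcp: test.c:3
                         body: @4
@2      identifier_node  strg: main     lngt: 4
@3      function_decl    name: @5       srcp: stdio.h:361
@4      call_expr        type: @6       fn: @3
"""

NODE_START = re.compile(r"^@(\d+)\s+(\w+)\s*(.*)$")  # "@index kind fields..."
FIELD = re.compile(r"(\w+)\s*:\s*(\S+)")             # "name: value"


def parse_dump(text):
    """Return {node_id: {"kind": str, "fields": {name: value}}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        match = NODE_START.match(line)
        if match:
            node_id, kind, rest = match.groups()
            current = {"kind": kind, "fields": {}}
            nodes[int(node_id)] = current
        elif current is not None:
            rest = line  # continuation line belonging to the previous node
        else:
            continue     # text before the first node is ignored
        current["fields"].update(FIELD.findall(rest))
    return nodes


nodes = parse_dump(SAMPLE_DUMP)
print(nodes[1]["kind"], nodes[1]["fields"]["srcp"])  # function_decl test.c:3
```

In this excerpt node 2 has no "srcp" field, which corresponds to the case the method treats as undetermined.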
Step 2: On the basis of step 1, eliminate the redundant information in the AST, as follows:
The 1st step: Traverse all nodes in the AST and divide them into three types according to the "srcp" field of each node (the "srcp" field identifies the source of the node):
a. If the value of the "srcp" field is the source file name, the node is judged relevant to the data-flow and control-flow analysis of the program and is marked as a useful node;
b. If the value of the "srcp" field is not the source file name, the node is judged irrelevant to the data-flow and control-flow analysis of the program and is marked as a useless node;
c. If the node has no "srcp" field, its source cannot yet be determined and needs further examination; the node is marked as a pending node.
The 2nd step: Traverse all nodes in the AST again, treating every node marked useful or useless as a parent node. For each parent node, examine each of its child nodes in turn and apply the following rules:
A. If the parent node is useful and the child node is pending, mark the child node as useful;
B. If the parent node is useful and the child node is useless, the child node remains marked as useless;
C. If the parent node is useless and the child node is pending, mark the child node as useless;
D. If the parent node is useless and the child node is useful, the child node remains marked as useful.
Repeat this step until the number of pending nodes is zero.
The 3rd step: Because the 1st and 2nd steps have removed all library functions, built-in functions, and their related information, the library functions that the source file actually uses must be recovered. Traverse all nodes in the AST once more: if a node or its child node contains "call_expr" (that is, a call expression) and the node or its child node is marked as useless, mark the node and its child nodes as useful.
After step 2, an abstract syntax tree AST' with the redundant information eliminated is obtained.
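The three sub-steps of step 2 can be sketched on a small in-memory tree. This is an illustration under stated assumptions, not the patented implementation: the Node class, the mark names, and the miniature tree are hypothetical; "its child node" is read as the direct children only; and propagation stops when a pass changes nothing, so that it terminates even if a pending node has no decided ancestor.

```python
from dataclasses import dataclass, field
from typing import List, Optional

USEFUL, USELESS, PENDING = "useful", "useless", "pending"


@dataclass
class Node:
    kind: str                      # node type, e.g. "call_expr"
    srcp: Optional[str] = None     # "file:line", or None when the field is absent
    children: List["Node"] = field(default_factory=list)
    mark: str = PENDING


def walk(node):
    """Yield a node followed by all of its descendants (pre-order)."""
    yield node
    for child in node.children:
        yield from walk(child)


def classify(root, source_file):
    """1st step: mark every node according to its srcp field."""
    for node in walk(root):
        if node.srcp is None:
            node.mark = PENDING
        elif node.srcp.rsplit(":", 1)[0] == source_file:
            node.mark = USEFUL
        else:
            node.mark = USELESS


def propagate(root):
    """2nd step: pending children take the mark of a decided parent."""
    changed = True
    while changed:  # repeat until a full pass changes nothing
        changed = False
        for parent in walk(root):
            if parent.mark == PENDING:
                continue
            for child in parent.children:
                if child.mark == PENDING:
                    child.mark = parent.mark
                    changed = True


def recover_calls(root):
    """3rd step: re-mark a node and its children when they involve a call_expr."""
    for node in walk(root):
        family = [node] + node.children
        has_call = any(n.kind == "call_expr" for n in family)
        has_useless = any(n.mark == USELESS for n in family)
        if has_call and has_useless:
            for n in family:
                n.mark = USEFUL


def prune(root, source_file):
    """Run the three sub-steps and return the kinds of the nodes that are kept."""
    classify(root, source_file)
    propagate(root)
    recover_calls(root)
    return [n.kind for n in walk(root) if n.mark == USEFUL]


# Hypothetical miniature tree; real GCC trees are far larger.
tree = Node("function_decl", "test.c:3", [
    Node("bind_expr", None, [
        Node("call_expr", None, [Node("function_decl", "string.h:125")]),
    ]),
    Node("type_decl", "stddef.h:212", [Node("integer_type")]),
])
kept = prune(tree, "test.c")
print(len(kept), "of", sum(1 for _ in walk(tree)), "nodes kept")  # 4 of 6 nodes kept
```

In this toy tree the library declaration reached through the call expression is first marked useless because its "srcp" points at a header, and is then recovered by the 3rd step, while the unused header type and its child are dropped.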
Step 3: Traverse the abstract syntax tree AST' produced by step 2 and attach an interval pair to each buffer; this pair is called the attribute information of the buffer. It consists of the initial-allocation interval alloc and the actual-length interval len of the buffer. The initial-allocation interval is written alloc(var) = [m_var, n_var] and the actual-length interval is written len(var) = [x_var, y_var], where var is the buffer variable; m_var, n_var, x_var, and y_var are positive integers; m_var and n_var are the lower and upper bounds of alloc; and x_var and y_var are the lower and upper bounds of len.
Step 4: On the basis of step 3, abstract the statements and function calls in the source program that involve buffers into operations on the buffer attribute information, that is, updates of the initial-allocation interval alloc and the actual-length interval len.
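The interval pair of step 3 and an update of the kind described in step 4 can be sketched as follows. The patent does not list the abstraction rules for individual statements, so the two rules shown here (declare for a fixed-size declaration, copy_into for an unbounded copy) and all names are assumptions made for illustration. The numbers are those of the embodiment's buffer test; which statement causes the update is not stated in the source.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    low: int    # lower bound
    high: int   # upper bound


@dataclass
class BufferAttr:
    alloc: Interval   # initial-allocation interval [m_var, n_var]
    length: Interval  # actual-length interval [x_var, y_var]


def declare(size):
    """Assumed rule: a fixed-size declaration gives alloc = [size, size], len = [0, 0]."""
    return BufferAttr(alloc=Interval(size, size), length=Interval(0, 0))


def copy_into(attr, src_min, src_max):
    """Assumed rule: an unbounded copy sets len to the length range of the source."""
    attr.length = Interval(src_min, src_max)


test = declare(2)        # alloc(test) = [2, 2], len(test) = [0, 0]
copy_into(test, 5, 5)    # len(test) becomes [5, 5]
print(test.alloc, test.length)
```

Keeping a lower and an upper bound lets the analysis record a length that is only known to lie within a range, for example when the copied data depends on input.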
Step 5: On the basis of step 4, judge the state of each buffer, as follows:
A. If y_var ≤ n_var, the buffer is judged to be in a safe state: no buffer overflow can occur;
B. If x_var ≤ n_var ≤ y_var, the buffer is judged to be in an unsafe state: a buffer overflow may occur;
C. If n_var ≤ x_var, the buffer is judged to be in a dangerous state: a buffer overflow is certain to occur.
An alarm is raised for every buffer that may overflow or is certain to overflow.
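The three-way judgment of step 5 can be sketched as a small function. The conditions as written overlap at their boundaries (for example y_var = n_var satisfies both A and B), and the text does not say which takes precedence; the sketch assumes they are tested in the listed order A, B, C, so that C applies only when x_var is strictly greater than n_var. The function names and the state strings are hypothetical.

```python
def buffer_state(n_var, x_var, y_var):
    """Classify a buffer from alloc's upper bound n_var and len = [x_var, y_var]."""
    if y_var <= n_var:            # rule A: even the longest content fits
        return "safe"
    if x_var <= n_var <= y_var:   # rule B: some lengths fit, some do not
        return "unsafe"
    return "dangerous"            # rule C: here n_var < x_var, nothing fits


def needs_alarm(state):
    """An alarm is raised for the unsafe and dangerous states."""
    return state in ("unsafe", "dangerous")


print(buffer_state(n_var=2, x_var=5, y_var=5))     # dangerous
print(buffer_state(n_var=10, x_var=0, y_var=5))    # safe
print(buffer_state(n_var=10, x_var=5, y_var=15))   # unsafe
```

The first call uses the values of the embodiment's buffer test, alloc = [2, 2] and len = [5, 5].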
Buffer overflow vulnerabilities can be detected through the above steps.
Beneficial effects
Compared with the prior art, the method proposed by the present invention has the following advantages:
1. It eliminates the redundant information in the abstract syntax tree while preserving the integrity of the useful information, and is therefore more practical and more efficient;
2. It solves the problem in existing methods that the abstract syntax tree occupies a large amount of space;
3. It replaces the integer pair used in the prior art for buffer attribute information with the two intervals alloc and len; the representation is more precise, so the verdict is more accurate.
Embodiment
The technical solution of the present invention is described in detail below with reference to a specific embodiment.
This embodiment applies the method of the invention to a 7-line C source program. The source code is as follows:
[Figures GDA0000081872880000061 and GDA0000081872880000071: source code listing of the 7-line C test program; images not reproduced]
The procedure is as follows:
Step 1: For the source program to be analyzed, use the GCC compiler directly to generate the abstract syntax tree (AST), which contains 2280 nodes.
Step 2: On the basis of step 1, eliminate the redundant information in the AST, as follows:
The 1st step: Traverse all nodes in the AST and divide them into three types according to the "srcp" field of each node:
a. If the value of the "srcp" field is the source file name, the node is judged relevant to the data-flow and control-flow analysis of the program and is marked as a useful node;
b. If the value of the "srcp" field is not the source file name, the node is judged irrelevant to the data-flow and control-flow analysis of the program and is marked as a useless node;
c. If the node has no "srcp" field, its source cannot yet be determined and needs further examination; the node is marked as a pending node.
The 2nd step: Traverse all nodes in the AST again, treating every node marked useful or useless as a parent node. For each parent node, examine each of its child nodes in turn and apply the following rules:
A. If the parent node is useful and the child node is pending, mark the child node as useful;
B. If the parent node is useful and the child node is useless, the child node remains marked as useless;
C. If the parent node is useless and the child node is pending, mark the child node as useless;
D. If the parent node is useless and the child node is useful, the child node remains marked as useful.
Repeat this step until the number of pending nodes is zero.
The 3rd step: Because the 1st and 2nd steps have removed all library functions, built-in functions, and their related information, the library functions that the source file actually uses must be recovered. Traverse all nodes in the AST once more: if a node or its child node contains "call_expr" (that is, a call expression) and the node or its child node is marked as useless, mark the node and its child nodes as useful.
After step 2, an abstract syntax tree AST' with the redundant information eliminated is obtained; it contains 82 nodes.
Step 3: Traverse the abstract syntax tree AST' produced by step 2 and attach an interval pair to each buffer; this pair is called the attribute information of the buffer. It consists of the initial-allocation interval alloc and the actual-length interval len of the buffer. The initial-allocation interval is written alloc(var) = [m_var, n_var] and the actual-length interval is written len(var) = [x_var, y_var], where var is the buffer variable; m_var, n_var, x_var, and y_var are positive integers; m_var and n_var are the lower and upper bounds of alloc; and x_var and y_var are the lower and upper bounds of len.
In this example, the attribute information of the buffer test is:
alloc(test) = [2, 2]; len(test) = [0, 0].
Step 4: On the basis of step 3, abstract the statements and function calls in the source program that involve buffers into operations on the buffer attribute information, that is, updates of the initial-allocation interval alloc and the actual-length interval len.
In this example, the attribute information of the buffer test is updated. After the update it is:
alloc(test) = [2, 2]; len(test) = [5, 5].
Step 5: On the basis of step 4, judge the state of each buffer:
For the buffer test, n_var ≤ x_var (2 ≤ 5), so the buffer is judged to be in a dangerous state: a buffer overflow is certain to occur, and an alarm is raised.
This embodiment shows that the ratio of AST node counts before and after redundancy elimination is 2280:82, and that recovering the nodes, or their child nodes, that contain "call_expr" preserves the integrity of the program's data-flow and control-flow information, so that an accurate alarm can be raised for the buffer overflow vulnerability.
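To put the 2280:82 figure in more familiar terms, the short calculation below derives the reduction factor and the share of nodes removed from the two node counts reported in the embodiment. Only the two counts come from the source; the variable names are illustrative.

```python
nodes_before = 2280  # nodes in the AST generated by GCC (step 1)
nodes_after = 82     # nodes in AST' after redundancy elimination (step 2)

ratio = nodes_before / nodes_after         # reduction factor
removed = 1 - nodes_after / nodes_before   # share of nodes eliminated

print(f"{ratio:.1f}:1, {removed:.1%} of nodes removed")  # 27.8:1, 96.4% of nodes removed
```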
The above is only a preferred embodiment of the present invention. It should be noted that a person skilled in the art may, without departing from the principle of the invention, make improvements or replace some of the technical features with equivalents; such improvements and replacements shall also be regarded as falling within the scope of protection of the present invention.

Claims (1)

1. A buffer overflow vulnerability detection method based on the GCC abstract syntax tree, characterized in that its specific steps are as follows:
Step 1: For the source program to be analyzed, use the GCC compiler directly to generate the abstract syntax tree (AST);
The source program to be analyzed is a C/C++ source program;
Step 2: On the basis of step 1, eliminate the redundant information in the AST, as follows:
The 1st step: Traverse all nodes in the AST and divide them into three types according to the "srcp" field of each node:
a. If the value of the "srcp" field is the source file name, mark the node as a useful node;
b. If the value of the "srcp" field is not the source file name, mark the node as a useless node;
c. If the node has no "srcp" field, mark the node as a pending node;
The 2nd step: Traverse all nodes in the AST again, treating every node marked useful or useless as a parent node; for each parent node, examine each of its child nodes in turn and apply the following rules:
A. If the parent node is useful and the child node is pending, mark the child node as useful;
B. If the parent node is useful and the child node is useless, the child node remains marked as useless;
C. If the parent node is useless and the child node is pending, mark the child node as useless;
D. If the parent node is useless and the child node is useful, the child node remains marked as useful;
Repeat this step until the number of pending nodes is zero;
The 3rd step: Traverse all nodes in the AST once more; if a node or its child node contains "call_expr" (that is, a call expression) and the node or its child node is marked as useless, mark the node and its child nodes as useful;
After step 2, an abstract syntax tree AST' with the redundant information eliminated is obtained;
Step 3: Traverse the abstract syntax tree AST' produced by step 2 and attach an interval pair to each buffer; this pair is called the attribute information of the buffer and consists of the initial-allocation interval alloc and the actual-length interval len of the buffer; the initial-allocation interval is written alloc(var) = [m_var, n_var]; the actual-length interval is written len(var) = [x_var, y_var]; where var is the buffer variable; m_var, n_var, x_var, and y_var are positive integers; m_var and n_var are the lower and upper bounds of alloc; x_var and y_var are the lower and upper bounds of len;
Step 4: On the basis of step 3, abstract the statements and function calls in the source program that involve buffers into operations on the buffer attribute information, that is, updates of the initial-allocation interval alloc and the actual-length interval len;
Step 5: On the basis of step 4, judge the state of each buffer, as follows:
A. If y_var ≤ n_var, the buffer is judged to be in a safe state: no buffer overflow can occur;
B. If x_var ≤ n_var ≤ y_var, the buffer is judged to be in an unsafe state: a buffer overflow may occur;
C. If n_var ≤ x_var, the buffer is judged to be in a dangerous state: a buffer overflow is certain to occur;
An alarm is raised for every buffer that may overflow or is certain to overflow;
Buffer overflow vulnerabilities can be detected through the above steps.
CN2010102409081A 2010-07-30 2010-07-30 GCC abstract syntax tree-based buffer overflow vulnerability detection method Expired - Fee Related CN101908006B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102409081A CN101908006B (en) 2010-07-30 2010-07-30 GCC abstract syntax tree-based buffer overflow vulnerability detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010102409081A CN101908006B (en) 2010-07-30 2010-07-30 GCC abstract syntax tree-based buffer overflow vulnerability detection method

Publications (2)

Publication Number Publication Date
CN101908006A CN101908006A (en) 2010-12-08
CN101908006B true CN101908006B (en) 2011-12-14

Family

ID=43263470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102409081A Expired - Fee Related CN101908006B (en) 2010-07-30 2010-07-30 GCC abstract syntax tree-based buffer overflow vulnerability detection method

Country Status (1)

Country Link
CN (1) CN101908006B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111214

Termination date: 20210730