CN105389195A - Static analysis tool improvement method based on code substitution and regular expression - Google Patents

Info

Publication number
CN105389195A
Authority
CN
China
Prior art keywords
code
source code
token
regular expression
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510707442.4A
Other languages
Chinese (zh)
Other versions
CN105389195B (en)
Inventor
胡昌振
单纯
于泽群
蔡弘非
王勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201510707442.4A
Publication of CN105389195A
Application granted
Publication of CN105389195B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)
  • Stored Programmes (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention discloses a static analysis tool improvement method based on code substitution and regular expressions. The static analysis tool uses a preprocessing module to preprocess source code and generate intermediate code; a syntax analysis module to analyze the intermediate code and finally obtain a doubly linked token list; and a defect pattern matching module to compare the doubly linked token list with defect patterns, find the matching parts, and process them to obtain the static analysis result. When the preprocessing module preprocesses the source code, i++ is replaced with (i=i+1)-1, ++i with i=i+1, i-- with (i=i-1)+1, and --i with i=i-1. The following regular expressions are added to the defect pattern matching module: if(%var%=%num%), if(%any%&%any%), scanf(%str%,%var%), where %var%, %num%, %any%, %any%, %str% and %var% are all variables. The method improves the accuracy of integer overflow detection, reduces the false-negative rate, and reduces the cost of manual inspection.

Description

Static analysis tool improvement method based on code substitution and regular expressions
Technical field
The invention belongs to the field of computer programming and specifically relates to a static analysis tool improvement method based on code substitution and regular expressions.
Background
The processing flow of current static analysis tools can be divided into four parts: a preprocessing module, a syntax analysis module, a defect pattern matching module, and a defect pattern processing module. First, the source code in the project file or folder to be checked is processed by the preprocessor to obtain intermediate code that is convenient to check. The intermediate code is processed by the syntax analysis module to generate an abstract syntax tree. The defect pattern matching module then matches known defect patterns against the abstract syntax tree and detects the defects in it. The detected defects are formatted by the defect location and output module, which generates the run result according to the user's requirements.
The working process of each module is as follows:
The preprocessing module processes source code in three parts: traversing the directories at every level to obtain the list of files to be checked that contain source code, combining header files with source code files, and preprocessing macro definitions. The first step is optional and is enabled only when the input is a folder.
The main function of the syntax analysis module is to analyze the code produced by the preprocessor. Its work comprises three parts: lexical analysis, syntax analysis, and semantic analysis. Lexical analysis first cuts the processed intermediate code into minimal meaningful units called "tokens". Syntax analysis then establishes the relations between the tokens and builds a doubly linked token list. Finally, semantic analysis is performed on the doubly linked token list to construct an abstract syntax tree.
The main function of the defect pattern matching module is to analyze the doubly linked token list generated by the syntax analysis module. The various defect patterns summarized from industrial production and software development are compared with the doubly linked token list, the matching parts are found, and the results are handed to the defect processing module.
The main function of the defect processing module is to process the defect patterns matched by the defect pattern matching module together with the token information, and to generate a defect report that the user can read intuitively or that follows a user-defined format. The work of this module has two parts. The first is to extract the defect type, defect location, defect-related variables and so on from the matched defect patterns and token information. The second is to process the defect type, defect location and defect-related variables extracted in the previous step and generate a defect report, either in an intuitive format or in a format specified by the user, so that the user can analyze the defect detection results.
Existing static analysis tools are deficient in detecting integer overflow problems that involve increment and decrement. The false-negative rate is high, so some serious defects cannot be discovered in time, which affects system stability. In addition, current static analysis tools do not consider likely typing errors. Such problems are very common in everyday programming, cannot be detected by the compiler, and are not easy to find by manual inspection, so it is necessary to check for them with a static analysis tool.
Summary of the invention
In view of this, the invention provides a static analysis tool improvement method based on code substitution and regular expressions. Its purpose is to use expression substitution to improve the accuracy of integer overflow detection and reduce the false-negative rate, and at the same time to use regular expressions to match code that may contain typing errors, reducing the impact of carelessly mistyped code and the cost of manual inspection.
To achieve the above purpose, the technical solution of the invention is as follows. The static analysis tool comprises a preprocessing module, a syntax analysis module and a defect pattern matching module. The preprocessing module preprocesses the source code to generate intermediate code. The syntax analysis module analyzes the intermediate code and finally obtains a doubly linked token list. The defect pattern matching module compares the doubly linked token list with the defect patterns, finds the matching parts, and processes them to obtain the static analysis result. The method is characterized in that, when the preprocessing module preprocesses the source code, the increment operators i++ and ++i and the decrement operators i-- and --i that appear in the source code are replaced as follows: i++ is replaced with (i=i+1)-1, ++i is replaced with i=i+1, i-- is replaced with (i=i-1)+1, and --i is replaced with i=i-1, where i is the variable being incremented or decremented.
The following regular expressions are added to the defect pattern matching module: if(%var%=%num%); if(%any%&%any%); scanf(%str%,%var%);
where %var%, %num%, %any%, %any%, %str% and %var% are all variables, if is the if statement in the source code, and scanf is the scanf statement in the source code.
Further, the preprocessing module receives an externally input folder containing source code, first combines the header files and source code in the folder, then performs the above replacement of the increment operators i++ and ++i and the decrement operators i-- and --i that appear in the source code, then preprocesses the macro definitions, and finally obtains the intermediate code.
Further, the syntax analysis module performs the following steps. Lexical analysis is first performed on the intermediate code, that is, the source code made up of a character sequence is divided as follows: a doubly linked token list for storing tokens is created, each character of the source code is read byte by byte, the characters are divided into complete minimal units with a definite meaning (tokens), and all tokens are added to the doubly linked token list in the order in which they appear in the source code.
Further, the defect patterns include the various defect patterns summarized from industrial production and software development.
Beneficial effects:
The invention uses expression substitution to improve the accuracy of integer overflow detection and reduce the false-negative rate, and uses regular expressions to match code that may contain typing errors, reducing the impact of carelessly mistyped code and the cost of manual inspection.
Detailed description
The invention is described in detail below with reference to an embodiment.
Step (1): preprocessing
The main function of the preprocessing module is to preprocess the source code in the project file or project folder specified by the user and to generate intermediate code that is convenient for syntax analysis.
The preprocessing module processes source code in four parts: traversing the directories at every level to obtain the list of files to be checked that contain source code, combining header files with source code files, replacing increments and decrements, and preprocessing macro definitions. The first step is optional and is enabled only when the input is a folder. The substitutions of the third step are as follows:
i++ => ((i=i+1)-1)
++i => (i=i+1)
i-- => ((i=i-1)+1)
--i => (i=i-1)
Take the replacement of i++ with ((i=i+1)-1) as an example and suppose i=3. i++ means that i is used first and then incremented by one, so after the assignment statement a=i++ is executed, a=3 and i=4. Now consider a=((i=i+1)-1). (i=i+1) is evaluated first, after which the value of i is 4; this value minus one is then assigned to a, so a=3. i is not assigned again at this point, so its value remains 4. The two forms therefore have the same effect.
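The equivalence argued above can be checked mechanically. The following is a minimal sketch in Python 3.8+, where the assignment expression `:=` stands in for C's `(i=i+1)`; the function names are hypothetical and are not part of the patent.

```python
# Each function evaluates one substituted form and returns (a, i),
# i.e. the value seen by the surrounding expression and the final value of i.

def post_increment_substitute(i):
    a = ((i := i + 1) - 1)   # stands in for a = ((i=i+1)-1), replacing a = i++
    return a, i

def pre_increment_substitute(i):
    a = (i := i + 1)         # stands in for a = (i=i+1), replacing a = ++i
    return a, i

def post_decrement_substitute(i):
    a = ((i := i - 1) + 1)   # stands in for a = ((i=i-1)+1), replacing a = i--
    return a, i

def pre_decrement_substitute(i):
    a = (i := i - 1)         # stands in for a = (i=i-1), replacing a = --i
    return a, i

print(post_increment_substitute(3))  # (3, 4): a keeps the old value, i is incremented
```

With i=3 the post-increment form yields a=3 and i=4, which matches the behavior of a=i++ described in the text.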
When the original method records the changes to a variable, it does not record the value change caused by increments and decrements of the form i++, but it does record ordinary additions and subtractions of the form i+1. After the substitution, the change can therefore be detected.
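One possible way to perform the substitution step on source text is sketched below. This is an illustration under simplifying assumptions, not the patent's implementation: it only handles plain identifiers as operands, it does not skip string literals or comments, and it can mishandle ambiguous sequences such as i+++j. The names `RULES` and `substitute` are hypothetical.

```python
import re

IDENT = r"[A-Za-z_]\w*"

# Prefix rules run first so that "return ++i" is not mistaken for a
# postfix operator applied to the word "return".
RULES = [
    (re.compile(rf"\+\+\s*({IDENT})"), r"(\1=\1+1)"),        # ++i => (i=i+1)
    (re.compile(rf"--\s*({IDENT})"), r"(\1=\1-1)"),          # --i => (i=i-1)
    (re.compile(rf"\b({IDENT})\s*\+\+"), r"((\1=\1+1)-1)"),  # i++ => ((i=i+1)-1)
    (re.compile(rf"\b({IDENT})\s*--"), r"((\1=\1-1)+1)"),    # i-- => ((i=i-1)+1)
]

def substitute(code):
    """Apply the four increment/decrement substitutions to a C source string."""
    for pattern, replacement in RULES:
        code = pattern.sub(replacement, code)
    return code

print(substitute("a = i++;"))     # a = ((i=i+1)-1);
print(substitute("return ++i;"))  # return (i=i+1);
```

The replacement text contains no ++ or -- sequences, so a later rule never rewrites the output of an earlier one.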
Step (2): syntax analysis
The first step in analyzing program source code is lexical analysis. Its function is to divide the source code, which consists of a character sequence, into complete minimal units with a definite meaning, namely tokens.
The lexical analysis part first creates a doubly linked list for storing tokens, then reads each character of the source code byte by byte and divides the characters into individual tokens, and then adds the tokens to the doubly linked list in the order in which they appear in the source code, for use by syntax analysis.
The analysis process is as follows. First, check whether the file can be opened, then start the loop. The loop checks whether the current character is EOF; if it is, the program ends. If it is not, a section of data is read into the buffer and a new loop over its characters starts, which ends when the character is "\n" (newline). Inside this loop, first check whether the character is a letter; if so, go to keyword and identifier processing. If not, check whether the character is a digit; if so, go to number processing. If not, check whether the character is an operator; if so, go to operator processing. If not, continue the loop and read the next character. When "\n" (newline) is read, jump back to the end-of-file check and continue the loop.
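The lexical analysis loop described above can be sketched as follows. This is a simplified illustration, not the patent's code: it reads from a string instead of a file, treats each line as one buffer, and the names `Token`, `TokenList`, `tokenize`, `OP_CHARS` and `TWO_CHAR_OPS` are hypothetical. Characters that are not letters, digits or operators (for example quotes and whitespace) are skipped, as in the described flow.

```python
class Token:
    """One node of the doubly linked token list."""
    def __init__(self, text, kind):
        self.text, self.kind = text, kind   # kind: "name", "number" or "op"
        self.prev = self.next = None

class TokenList:
    def __init__(self):
        self.head = self.tail = None

    def append(self, token):
        # Link the new token behind the current tail.
        if self.tail is None:
            self.head = self.tail = token
        else:
            token.prev, self.tail.next = self.tail, token
            self.tail = token

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node
            node = node.next

    def backward(self):
        node = self.tail
        while node is not None:
            yield node
            node = node.prev

OP_CHARS = set("+-*/%=<>!&|^~?:;,.(){}[]")
TWO_CHAR_OPS = {"==", "!=", "<=", ">=", "&&", "||", "+=", "-="}

def tokenize(source):
    tokens = TokenList()
    for line in source.splitlines():        # one buffer per line
        pos = 0
        while pos < len(line):
            ch = line[pos]
            end = pos + 1
            if ch.isalpha() or ch == "_":    # keyword or identifier
                while end < len(line) and (line[end].isalnum() or line[end] == "_"):
                    end += 1
                tokens.append(Token(line[pos:end], "name"))
            elif ch.isdigit():               # number
                while end < len(line) and line[end].isdigit():
                    end += 1
                tokens.append(Token(line[pos:end], "number"))
            elif ch in OP_CHARS:             # operator or punctuation
                if line[pos:pos + 2] in TWO_CHAR_OPS:
                    end = pos + 2
                tokens.append(Token(line[pos:end], "op"))
            pos = end                        # other characters are skipped
    return tokens

print([t.text for t in tokenize("if (x = 5)\n  y = x + 10;")])
```

Because every token carries both `prev` and `next` links, a later matching stage can look at the tokens on either side of a hit without rescanning the list.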
Syntax analysis is performed next. Its main function is to combine the ordered token sequence into an acceptable expression. This process usually refers to a context-free grammar that recursively defines the components of an expression, and finally combines a series of tokens with a fixed order into an expression. During syntax analysis the program analyzes the doubly linked token list directly and constructs an abstract syntax tree that is logically equivalent to the original code. After constructing the abstract syntax tree, the program adds the corresponding attributes to the tokens according to the relations between the tokens that form each node of the tree, to facilitate later analysis.
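To make the idea of a recursively defined context-free grammar concrete, here is a minimal recursive-descent sketch for arithmetic expressions. The grammar, the tuple-based tree and the names `lex`, `Parser` and `parse` are assumptions chosen for illustration; the patent does not specify its grammar or tree representation, and this sketch works on a plain Python list rather than a doubly linked list.

```python
import re

def lex(text):
    """Split an arithmetic expression into identifier, number and operator tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)

class Parser:
    # Assumed grammar:
    #   expr   := term (("+" | "-") term)*
    #   term   := factor (("*" | "/") factor)*
    #   factor := NAME | NUMBER | "(" expr ")"
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token: {token!r}")
        return token          # a name or number is a leaf of the tree

def parse(text):
    parser = Parser(lex(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing tokens")
    return tree

print(parse("a + 2 * (b - 1)"))  # ('+', 'a', ('*', '2', ('-', 'b', '1')))
```

Each grammar rule becomes one method, and the nesting of the returned tuples mirrors operator precedence, which is what makes the tree logically equivalent to the original expression.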
Step (3): defect pattern matching
First, a base check class is constructed. Every check class must inherit from this base class and implement the virtual functions that it defines as the interface.
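The base-class arrangement can be sketched with Python's abstract base classes, which play the role of virtual functions here. The class names, the `run` method and the tuple-based findings are hypothetical, and the token list is simplified to a list of strings.

```python
from abc import ABC, abstractmethod

class Check(ABC):
    """Base check class: every concrete check must implement run()."""
    name = "base"

    @abstractmethod
    def run(self, tokens):
        """Return a list of token indexes where the defect pattern starts."""

class AssignmentInCondition(Check):
    """Hypothetical check: flags `if ( x = ...` where `==` was probably meant."""
    name = "assignment-in-condition"

    def run(self, tokens):
        hits = []
        for i in range(len(tokens) - 3):
            if tokens[i] == "if" and tokens[i + 1] == "(" and tokens[i + 3] == "=":
                hits.append(i)
        return hits

def run_checks(checks, tokens):
    """Run every registered check instance over the same token list."""
    return [(check.name, index) for check in checks for index in check.run(tokens)]

checks = [AssignmentInCondition()]
print(run_checks(checks, ["if", "(", "x", "=", "5", ")", "{", "}"]))
```

Instantiating `Check` directly raises a `TypeError`, which enforces the rule that every check class must supply its own implementation of the interface.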
Next, the instances created from all check classes are placed in a list used for checking, and the doubly linked token list produced by syntax analysis is matched against the defect patterns one by one. The regular expressions used to match likely typing errors are as follows:
if(%var%=%num%)
if(%any%&%any%)
scanf(%str%,%var%)
Finally, for every matched defect pattern, the defect pattern, the tokens found by the check and other related information are passed to the defect processing module, which formats and outputs the defects.
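The three patterns can be matched against a token sequence roughly as sketched below. The source only states that the placeholders are variables, so the meanings assumed here are illustrative: %var% matches an identifier, %num% a decimal number, %str% a double-quoted string literal, and %any% any single token. The patterns presumably target = typed for ==, & typed for &&, and a missing & before a scanf argument, but the source does not spell this out. All function names are hypothetical.

```python
import re

# Assumed meaning of each placeholder, expressed as a regex on a single token.
PLACEHOLDERS = {
    "%var%": re.compile(r"[A-Za-z_]\w*"),
    "%num%": re.compile(r"\d+"),
    "%str%": re.compile(r'"[^"]*"'),
    "%any%": re.compile(r".+"),
}

def split_pattern(pattern):
    """Split a pattern such as 'if(%var%=%num%)' into its elements."""
    return re.findall(r"%\w+%|\w+|\S", pattern)

def lex(code):
    """Very small C tokenizer: strings, names, numbers, ==, &&, ||, single chars."""
    return re.findall(r'"[^"]*"|[A-Za-z_]\w*|\d+|==|&&|\|\||\S', code)

def element_matches(element, token):
    regex = PLACEHOLDERS.get(element)
    if regex is None:                 # literal element: must be identical
        return element == token
    return regex.fullmatch(token) is not None

def find_matches(pattern, tokens):
    """Return the start index of every window of tokens that fits the pattern."""
    elements = split_pattern(pattern)
    hits = []
    for start in range(len(tokens) - len(elements) + 1):
        window = tokens[start:start + len(elements)]
        if all(element_matches(e, t) for e, t in zip(elements, window)):
            hits.append(start)
    return hits

PATTERNS = ["if(%var%=%num%)", "if(%any%&%any%)", "scanf(%str%,%var%)"]

code = 'scanf("%d", n); if (n = 5) { if (flags & mask) { n = 0; } }'
tokens = lex(code)
for pattern in PATTERNS:
    print(pattern, find_matches(pattern, tokens))
```

Matching token by token rather than on raw text means that spacing differences such as `if(x=5)` and `if ( x = 5 )` are treated the same way.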
In summary, the above is only a preferred embodiment of the invention and is not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (4)

1. A static analysis tool improvement method based on code substitution and regular expressions, wherein the static analysis tool comprises a preprocessing module, a syntax analysis module and a defect pattern matching module; the preprocessing module is used to preprocess source code and generate intermediate code; the syntax analysis module is used to analyze the intermediate code and finally obtain a doubly linked token list; the defect pattern matching module is used to compare the doubly linked token list with defect patterns, find the matching parts, and process them to obtain a static analysis result; characterized in that, when the preprocessing module preprocesses the source code, the increment operators i++ and ++i and the decrement operators i-- and --i that appear in the source code are replaced as follows: i++ is replaced with (i=i+1)-1, ++i is replaced with i=i+1, i-- is replaced with (i=i-1)+1, and --i is replaced with i=i-1, where i is the variable being incremented or decremented;
the following regular expressions are added to the defect pattern matching module: if(%var%=%num%); if(%any%&%any%); scanf(%str%,%var%);
where %var%, %num%, %any%, %any%, %str% and %var% are all variables, if is the if statement in the source code, and scanf is the scanf statement in the source code.
2. The static analysis tool improvement method based on code substitution and regular expressions according to claim 1, characterized in that the preprocessing module receives an externally input folder containing source code, first combines the header files and source code in the folder, then performs the replacement of the increment operators i++ and ++i and the decrement operators i-- and --i that appear in the source code, then preprocesses the macro definitions, and finally obtains the intermediate code.
3. The static analysis tool improvement method based on code substitution and regular expressions according to claim 1 or 2, characterized in that the syntax analysis module performs the following steps: lexical analysis is first performed on the intermediate code, that is, the source code made up of a character sequence is divided as follows: a doubly linked token list for storing tokens is created, each character of the source code is read byte by byte, the characters are divided into complete minimal units with a definite meaning (tokens), and all tokens are added to the doubly linked token list in the order in which they appear in the source code.
4. The static analysis tool improvement method based on code substitution and regular expressions according to claim 1 or 2, characterized in that the defect patterns include the various defect patterns summarized from industrial production and software development.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510707442.4A CN105389195B (en) 2015-10-27 2015-10-27 Static analysis tool improvement method based on code substitution and regular expression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510707442.4A CN105389195B (en) 2015-10-27 2015-10-27 Static analysis tool improvement method based on code substitution and regular expression

Publications (2)

Publication Number Publication Date
CN105389195A (en) 2016-03-09
CN105389195B (en) 2018-08-10

Family

ID=55421502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510707442.4A Active CN105389195B (en) 2015-10-27 2015-10-27 Static analysis tool improvement method based on code substitution and regular expression

Country Status (1)

Country Link
CN (1) CN105389195B (en)

Also Published As

Publication number Publication date
CN105389195B (en) 2018-08-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant