CN105389195A - Static analysis tool improvement method based on code substitution and regular expression - Google Patents
- Publication number: CN105389195A
- Application number: CN201510707442.4A
- Authority: CN (China)
- Prior art keywords: code, source code, token, regular expression, module
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/43—Checking; Contextual analysis
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Devices For Executing Special Programs (AREA)
- Stored Programmes (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present invention discloses a method for improving a static analysis tool based on code substitution and regular expressions. The static analysis tool uses a preprocessing module to preprocess source code and generate intermediate code; a syntax analysis module to perform syntax analysis on the intermediate code and finally obtain a doubly linked token list; and a defect pattern matching module to compare the doubly linked token list with defect patterns, find the matching parts, and process them to obtain a static analysis result. When the source code is preprocessed in the preprocessing module, i++ is replaced with (i=i+1)-1, ++i with i=i+1, i-- with (i=i-1)+1, and --i with i=i-1. The following regular expressions are added to the defect pattern matching module: if(%var%=%num%), if(%any%&%any%), scanf(%str%,%var%), where %var%, %num%, %any% and %str% are all variables. The method can improve the accuracy of integer overflow detection, reduce the false-negative (missed report) rate, and reduce the cost of manual inspection.
Description
Technical field
The invention belongs to the field of computer programming and specifically relates to a method for improving a static analysis tool based on code substitution and regular expressions.
Background art
The processing flow of current static analysis tools can be divided into four parts: a preprocessing module, a syntax analysis module, a defect pattern matching module, and a defect processing module. First, the source code in the project file or folder to be checked is processed by the preprocessor to obtain intermediate code that is convenient to check. The intermediate code is processed by the syntax analysis module to generate an abstract syntax tree. The defect pattern matching module then matches known defect patterns against the abstract syntax tree and detects the defects in it. The detected defects are formatted by the defect locating and output module, which generates the result according to the user's requirements.
The working process of each module is as follows:
The preprocessing module's processing of the source code mainly comprises three parts: traversing the directory levels to obtain the list of files containing source code to be checked, combining header files with source code files, and preprocessing macro definitions. The first step is optional and is enabled only when the input is a folder.
The main function of the syntax analysis module is to perform syntax analysis on the code produced by the preprocessor. Its main work comprises three parts: lexical analysis, syntax analysis, and semantic analysis. The intermediate code is first cut by lexical analysis into minimal meaningful units called "tokens"; syntax analysis then establishes the relationships between the tokens and builds a doubly linked token list; finally, semantic analysis is performed on the doubly linked token list to construct an abstract syntax tree.
The main function of the defect pattern matching module is to analyze the doubly linked token list generated by the syntax analysis module: the various defect patterns summarized from industrial production and software development are compared with the doubly linked token list, the matching parts are found, and the results are handed to the defect processing module.
The main function of the defect processing module is to process the defect patterns and token information matched by the defect pattern matching module and to generate a defect report that is either intuitive for the user or in a user-defined format. The work of this module has two parts. The first is to extract the defect type, defect location, defect-related variables, and so on from the matched defect patterns and token information. The second is to process the defect type, defect location, and defect-related variables extracted in the first step and to generate the defect report in an intuitive or user-specified format, so that the user can analyze the results of defect pattern detection.
Existing static analysis tools are deficient in detecting integer overflow problems that involve increment and decrement: the false-negative rate is high, so some serious defects cannot be discovered in time, which affects system stability. In addition, current static analysis tools do not consider likely typing mistakes. Such mistakes are very common in everyday programming, cannot be detected by the compiler, and are not easy to find by manual inspection, so it is necessary to check for them with a static analysis tool.
Summary of the invention
In view of this, the invention provides a method for improving a static analysis tool based on code substitution and regular expressions. The purpose of the invention is to use expression substitution to improve the accuracy of integer overflow detection and reduce the false-negative rate, and at the same time to use regular expressions to match code that is likely mistyped, reducing the impact of code written wrongly through carelessness and reducing the cost of manual inspection.
To achieve the above purpose, the technical solution of the invention is as follows. The static analysis tool comprises a preprocessing module, a syntax analysis module, and a defect pattern matching module. The preprocessing module preprocesses the source code and generates intermediate code; the syntax analysis module performs syntax analysis on the intermediate code and finally obtains a doubly linked token list; the defect pattern matching module compares the doubly linked token list with defect patterns, finds the matching parts, and processes them to obtain the static analysis result. The method is characterized in that, when the source code is preprocessed in the preprocessing module, the increment operators i++ and ++i and the decrement operators i-- and --i appearing in the source code are replaced as follows: i++ is replaced with (i=i+1)-1, ++i is replaced with i=i+1, i-- is replaced with (i=i-1)+1, and --i is replaced with i=i-1, where i is the variable being incremented or decremented.
The following regular expressions are added to the defect pattern matching module: if(%var%=%num%); if(%any%&%any%); scanf(%str%,%var%);
where %var%, %num%, %any%, %any%, %str% and %var% are variables; if is the if statement in the source code, and scanf is the scanf statement in the source code.
Further, the preprocessing module receives an externally input folder containing source code, first combines the header files and source code in the folder, then replaces the increment operators i++ and ++i and the decrement operators i-- and --i appearing in the source code, then preprocesses the macro definitions, and finally obtains the intermediate code.
Further, the syntax analysis module performs the following steps. It first performs lexical analysis on the intermediate code: it creates a doubly linked token list for storing tokens, then reads each character of the source code byte by byte and divides the source code, which consists of a sequence of characters, into tokens, the smallest complete units with definite meaning. All tokens are then added to the doubly linked token list in the order in which they appear in the source code.
Further, the defect patterns include the various defect patterns summarized from industrial production and software development.
Beneficial effects:
The invention uses expression substitution to improve the accuracy of integer overflow detection and reduce the false-negative rate, and uses regular expressions to match code that is likely mistyped, reducing the impact of code written wrongly through carelessness and reducing the cost of manual inspection.
Embodiment
The invention is described below by way of an embodiment.
Step (1), preprocessing
The main function of the preprocessing module is to preprocess the source code in the project file or project folder specified by the user and to generate intermediate code that is convenient for syntax analysis.
The preprocessing module's processing of the source code mainly comprises four parts: traversing the directory levels to obtain the list of files containing source code to be checked, combining header files with source code files, replacing increment and decrement operators, and preprocessing macro definitions. The first step is optional and is enabled only when the input is a folder. The substitutions of the third step are as follows:
i++=>((i=i+1)-1)
++i=>(i=i+1)
i--=>((i=i-1)+1)
--i=>(i=i-1)
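The four rules above can be sketched as a small text rewriter. This is a minimal illustration and not the patented implementation: the function name `substitute_inc_dec` and the rule table are hypothetical, the operand is assumed to be a plain identifier, and string literals, comments, and ambiguous runs such as `i+++j` are not handled.

```python
import re

# Assumption: the operand is a simple C identifier (no array or member access).
_IDENT = r"[A-Za-z_]\w*"

# Prefix forms are rewritten first; postfix forms follow.
_RULES = [
    (re.compile(rf"\+\+\s*({_IDENT})"), r"(\1=\1+1)"),      # ++i -> (i=i+1)
    (re.compile(rf"--\s*({_IDENT})"), r"(\1=\1-1)"),        # --i -> (i=i-1)
    (re.compile(rf"({_IDENT})\s*\+\+"), r"((\1=\1+1)-1)"),  # i++ -> ((i=i+1)-1)
    (re.compile(rf"({_IDENT})\s*--"), r"((\1=\1-1)+1)"),    # i-- -> ((i=i-1)+1)
]


def substitute_inc_dec(source: str) -> str:
    """Apply the four increment/decrement substitutions to a piece of C text."""
    for pattern, replacement in _RULES:
        source = pattern.sub(replacement, source)
    return source


if __name__ == "__main__":
    print(substitute_inc_dec("a = i++; --count;"))
```

A production preprocessor would more likely perform this rewrite on tokens than on raw text, so that literals and comments are left untouched.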
Take the replacement of i++ with ((i=i+1)-1) as an example. Suppose i=3. i++ means that i is used first and then incremented by one; that is, given the assignment statement a=i++, after the statement executes a=3 and i=4. Now consider a=((i=i+1)-1). (i=i+1) is processed first, after which the value of i is 4; this value minus one is then assigned to a, so a=3. i is not assigned again at this point, so its value remains 4. The two forms therefore have the same effect.
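The equivalence argued above can be checked mechanically. The sketch below models the C statements with Python assignment expressions; the function names are hypothetical, and because Python integers do not overflow, it only confirms the values of a and i, not any overflow behaviour.

```python
def post_increment_reference(i):
    """Model C's `a = i++`: a receives the old value, then i grows by one."""
    a = i
    i = i + 1
    return a, i


def post_increment_substituted(i):
    """Model `a = ((i = i + 1) - 1)` using an assignment expression."""
    a = (i := i + 1) - 1
    return a, i


def post_decrement_reference(i):
    """Model C's `a = i--`: a receives the old value, then i shrinks by one."""
    a = i
    i = i - 1
    return a, i


def post_decrement_substituted(i):
    """Model `a = ((i = i - 1) + 1)` using an assignment expression."""
    a = (i := i - 1) + 1
    return a, i


if __name__ == "__main__":
    # With i = 3, both forms of the increment give a = 3 and i = 4.
    print(post_increment_reference(3), post_increment_substituted(3))
```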
When the original method tracks the changes of a variable, it does not record the change in value caused by increment and decrement forms such as i++, but it does record ordinary additions and subtractions such as i+1. After the substitution, the change is therefore recorded and the defect can be detected.
Step (2), syntax analysis
When analyzing program source code, the first step is lexical analysis. The function of lexical analysis is to divide the source code, which consists of a sequence of characters, into the smallest complete units with definite meaning, namely "tokens".
The lexical analysis part first creates a doubly linked list for storing tokens, then reads each character of the source code byte by byte and divides the text into individual tokens, which are then added to the doubly linked list in the order in which they appear in the source code, for use by syntax analysis.
The lexical analysis process is specifically as follows. First determine whether the file can be opened, then start the loop. The loop checks whether the current character is EOF; if it is, the program ends. If it is not, a segment of data is read into the buffer and a new loop over its characters begins, ending when the character is "\n" (newline). Inside this loop, first check whether the character is a letter; if so, go to keyword and identifier processing. If not, check whether the character is a digit; if so, go to number processing. If not, check whether the character is an operator; if so, go to operator processing. If not, continue the loop and read the next character. When "\n" (newline) is read, jump back to the end-of-file check and continue the loop.
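The character-classification loop and the doubly linked token list can be sketched as follows. This is an illustrative simplification under stated assumptions: the names `Token`, `TokenList`, and `tokenize` are hypothetical, the input is an in-memory string instead of a file read line by line, keywords are not separated from identifiers, and every operator is a single character.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: a small set of single-character operators and punctuators.
OPERATORS = set("+-*/=<>!&|%(){};,")


@dataclass
class Token:
    text: str
    kind: str  # "identifier", "number", or "operator"
    prev: Optional["Token"] = field(default=None, repr=False, compare=False)
    next: Optional["Token"] = field(default=None, repr=False, compare=False)


class TokenList:
    """Doubly linked list that keeps tokens in source order."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, token):
        token.prev = self.tail
        if self.tail is None:
            self.head = token
        else:
            self.tail.next = token
        self.tail = token

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node
            node = node.next


def tokenize(source):
    tokens = TokenList()
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch.isalpha() or ch == "_":          # keyword / identifier branch
            end = pos
            while end < len(source) and (source[end].isalnum() or source[end] == "_"):
                end += 1
            tokens.append(Token(source[pos:end], "identifier"))
            pos = end
        elif ch.isdigit():                      # number branch
            end = pos
            while end < len(source) and source[end].isdigit():
                end += 1
            tokens.append(Token(source[pos:end], "number"))
            pos = end
        elif ch in OPERATORS:                   # operator branch
            tokens.append(Token(ch, "operator"))
            pos += 1
        else:                                   # whitespace, newline, anything else
            pos += 1
    return tokens
```

Because each token keeps both `prev` and `next` links, a later matching stage can look backwards as well as forwards from any token.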
Syntax analysis is performed next. The main function of syntax analysis is to combine the ordered token sequence into an acceptable expression. In this process, a context-free grammar that recursively defines the components of an expression is usually consulted, and a series of symbols in a fixed order is finally combined into an expression. During syntax analysis, the program analyzes the doubly linked token list directly and constructs an abstract syntax tree that is logically equivalent to the original code. After the abstract syntax tree has been constructed, the program adds corresponding attributes to the tokens according to the relationships between the tokens that form each node of the tree, to facilitate later analysis.
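To illustrate how a recursively defined context-free grammar turns an ordered token sequence into a tree, here is a minimal recursive-descent sketch for arithmetic expressions. The grammar, the `Parser` class, and the tuple-based tree are assumptions made for the example; the source does not specify its grammar, and the step of attaching attributes to tokens is omitted.

```python
import re

# Assumed grammar:
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := IDENT | NUMBER | "(" expr ")"


def lex(text):
    """Split text into identifiers, numbers, and single-character symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())   # left-associative binary node
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok                            # identifier or number leaf


def parse(text):
    parser = Parser(lex(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Each grammar rule maps to one method, so operator precedence falls out of the call structure: `term` binds tighter than `expr`.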
Step (3), defect pattern matching
First, a base check class is constructed. All check classes must inherit from this base class and implement the virtual functions that it defines as the interface.
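The base-class arrangement can be sketched with an abstract base class, which plays the role of a class with pure virtual functions. The names `Check`, `AssignmentInCondition`, and `run_checks` are hypothetical, and tokens are represented as a plain list of strings to keep the example short.

```python
from abc import ABC, abstractmethod


class Check(ABC):
    """Base check class; every concrete check must implement run()."""

    name = "base"

    @abstractmethod
    def run(self, tokens):
        """Return a list of (check name, token index) findings."""


class AssignmentInCondition(Check):
    """Hypothetical check: flags `if ( x = ...` where `==` was likely intended."""

    name = "assignment-in-condition"

    def run(self, tokens):
        findings = []
        for idx in range(len(tokens) - 3):
            if tokens[idx] == "if" and tokens[idx + 1] == "(" and tokens[idx + 3] == "=":
                findings.append((self.name, idx))
        return findings


def run_checks(checks, tokens):
    """Run every check instance over the same token sequence, one by one."""
    findings = []
    for check in checks:
        findings.extend(check.run(tokens))
    return findings
```

Instantiating `Check` directly raises `TypeError`, which enforces the rule that each check class supplies its own implementation of the interface.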
Next, the instances produced by instantiating all the check classes are placed in a list of checks, and the doubly linked token list produced by syntax analysis is matched against the defect patterns one by one. The regular expressions used to match likely typing defects are as follows:
if(%var%=%num%)
if(%any%&%any%)
scanf(%str%,%var%)
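A matcher for the three patterns above might look like the following sketch. The placeholder semantics are an assumption, since the source only says the placeholders are variables: here %var% matches an identifier, %num% an integer literal, %str% a string literal, and %any% any single token. The function name `find_matches` is hypothetical.

```python
import re

# Source tokens: string literals, identifiers, integers, or any other single character.
_SOURCE_RE = r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|\S'
_SOURCE_TOKENS = re.compile(_SOURCE_RE)
# Pattern tokens additionally recognise %name% placeholders.
_PATTERN_TOKENS = re.compile(r"%\w+%|" + _SOURCE_RE)

# Assumed meaning of each placeholder.
PLACEHOLDERS = {
    "%var%": lambda t: re.fullmatch(r"[A-Za-z_]\w*", t) is not None,
    "%num%": lambda t: t.isdigit(),
    "%str%": lambda t: len(t) >= 2 and t[0] == '"' and t[-1] == '"',
    "%any%": lambda t: True,
}


def _token_matches(pattern_token, source_token):
    predicate = PLACEHOLDERS.get(pattern_token)
    if predicate is not None:
        return predicate(source_token)
    return pattern_token == source_token      # literal token must match exactly


def find_matches(pattern, source):
    """Return every run of source tokens that matches the pattern."""
    pat = _PATTERN_TOKENS.findall(pattern)
    src = _SOURCE_TOKENS.findall(source)
    hits = []
    for start in range(len(src) - len(pat) + 1):
        window = src[start:start + len(pat)]
        if all(_token_matches(p, t) for p, t in zip(pat, window)):
            hits.append(" ".join(window))
    return hits


if __name__ == "__main__":
    print(find_matches("if(%var%=%num%)", "if (x = 5) { y = 1; }"))
```

Under these assumptions the patterns catch `=` typed for `==`, `&` typed for `&&`, and a `scanf` argument passed without `&`, while the correctly typed forms do not match.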
Finally, for every defect pattern matched, the defect pattern, the tokens found, and other related information are passed to the subsequent defect processing module, which produces the formatted output of the defects.
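The formatted output step could be sketched as below. The record fields (`kind`, `path`, `line`, `variable`) and the template syntax are assumptions chosen for illustration; the source only states that defect type, location, and related variables are reported in a default or user-specified format.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Defect:
    kind: str       # defect type
    path: str       # file in which the defect was found
    line: int       # line number of the defect
    variable: str   # variable involved in the defect


# Assumed default layout; a user-defined template can replace it.
DEFAULT_TEMPLATE = "{path}:{line}: [{kind}] variable '{variable}'"


def format_report(defects, template=DEFAULT_TEMPLATE):
    """Render defects, sorted by location, one per line."""
    ordered = sorted(defects, key=lambda d: (d.path, d.line))
    return "\n".join(template.format(**asdict(d)) for d in ordered)


if __name__ == "__main__":
    found = [
        Defect("assignment-in-condition", "b.c", 7, "x"),
        Defect("integer-overflow", "a.c", 12, "i"),
    ]
    print(format_report(found))
```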
In summary, the above are only preferred embodiments of the invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (4)
1. A method for improving a static analysis tool based on code substitution and regular expressions, wherein the static analysis tool comprises a preprocessing module, a syntax analysis module, and a defect pattern matching module; the preprocessing module is used to preprocess source code and generate intermediate code; the syntax analysis module is used to perform syntax analysis on the intermediate code and finally obtain a doubly linked token list; the defect pattern matching module is used to compare the doubly linked token list with defect patterns, find the matching parts, and process them to obtain a static analysis result; characterized in that, when the source code is preprocessed in the preprocessing module, the increment operators i++ and ++i and the decrement operators i-- and --i appearing in the source code are replaced as follows: i++ is replaced with (i=i+1)-1, ++i is replaced with i=i+1, i-- is replaced with (i=i-1)+1, and --i is replaced with i=i-1, where i is the variable being incremented or decremented;
the following regular expressions are added to the defect pattern matching module: if(%var%=%num%); if(%any%&%any%); scanf(%str%,%var%);
where %var%, %num%, %any%, %any%, %str% and %var% are variables; if is the if statement in the source code, and scanf is the scanf statement in the source code.
2. The method for improving a static analysis tool based on code substitution and regular expressions according to claim 1, characterized in that the preprocessing module receives an externally input folder containing source code, first combines the header files and source code in the folder, then replaces the increment operators i++ and ++i and the decrement operators i-- and --i appearing in the source code, then preprocesses the macro definitions, and finally obtains the intermediate code.
3. The method for improving a static analysis tool based on code substitution and regular expressions according to claim 1 or 2, characterized in that the syntax analysis module performs the following steps: it first performs lexical analysis on the intermediate code, creating a doubly linked token list for storing tokens, then reads each character of the source code byte by byte and divides the source code, which consists of a sequence of characters, into tokens, the smallest complete units with definite meaning, and then adds all tokens to the doubly linked token list in the order in which they appear in the source code.
4. The method for improving a static analysis tool based on code substitution and regular expressions according to claim 1 or 2, characterized in that the defect patterns include the various defect patterns summarized from industrial production and software development.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510707442.4A CN105389195B (en) | 2015-10-27 | 2015-10-27 | A kind of static analysis tools improved method replaced based on code with regular expression |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105389195A (en) | 2016-03-09 |
CN105389195B (en) | 2018-08-10 |
Family
ID=55421502
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510707442.4A Active CN105389195B (en) | 2015-10-27 | 2015-10-27 | A kind of static analysis tools improved method replaced based on code with regular expression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105389195B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102955914A (en) * | 2011-08-19 | 2013-03-06 | 百度在线网络技术(北京)有限公司 | Method and device for detecting security flaws of source files |
CN102968367A (en) * | 2012-08-28 | 2013-03-13 | 华南理工大学 | Static detection method on basis of embedded software and system thereof |
CN104298594A (en) * | 2014-09-25 | 2015-01-21 | 南京航空航天大学 | Automatic detection and positioning method for source code mid-value miscalculation |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108062474A (en) * | 2016-11-08 | 2018-05-22 | 阿里巴巴集团控股有限公司 | The detection method and device of file |
WO2018232767A1 (en) * | 2017-06-24 | 2018-12-27 | 拜椰特(上海)软件技术有限公司 | Lexical analysis tool |
CN107908405A (en) * | 2017-11-17 | 2018-04-13 | 苏州蜗牛数字科技股份有限公司 | The static examination & verification device and method of code |
CN109542555A (en) * | 2018-10-26 | 2019-03-29 | 深圳点猫科技有限公司 | A kind of international programming implementation method of realization educational applications and device |
CN109582567A (en) * | 2018-11-07 | 2019-04-05 | 深圳竹云科技有限公司 | A kind of software defect mode research method based on static analysis |
CN112733153A (en) * | 2021-01-27 | 2021-04-30 | 腾讯科技(深圳)有限公司 | Source code scanning method and device, electronic equipment and storage medium |
CN113778852A (en) * | 2021-06-04 | 2021-12-10 | 南方科技大学 | Code analysis method based on regular expression |
CN113778852B (en) * | 2021-06-04 | 2023-07-28 | 南方科技大学 | Code analysis method based on regular expression |
Also Published As
Publication number | Publication date |
---|---|
CN105389195B (en) | 2018-08-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||