CN110011986A - Deep learning-based source code vulnerability detection method - Google Patents

Deep learning-based source code vulnerability detection method

Info

Publication number
CN110011986A
Authority
CN
China
Prior art keywords
function
code
vector
deep learning
source code
Legal status
Granted
Application number
CN201910214764.3A
Other languages
Chinese (zh)
Other versions
CN110011986B (en)
Inventor
金舒原
吴跃隆
Current Assignee
Sun Yat Sen University
National Sun Yat Sen University
Original Assignee
National Sun Yat Sen University
Priority date
2019-03-20
Filing date
2019-03-20
Publication date
2019-07-12
Application filed by National Sun Yat Sen University
Priority to CN201910214764.3A (CN110011986B)
Publication of CN110011986A
Application granted
Publication of CN110011986B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Signal Processing (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Machine Translation (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention proposes a deep learning-based source code vulnerability detection method. Deep learning is used to extract source code features automatically; these automatically extracted features are combined with code metrics, and a vulnerability detection model is built with a random deep forest algorithm. The method offers a higher degree of automation, reduces dependence on domain expert knowledge, substantially lowers the cost of code auditing, and improves its efficiency. Compared with other deep learning-based vulnerability detection methods, it combines several representations of the code to preserve as much of the code's syntactic and semantic information as possible, so that the automatically extracted features characterize the code better; combining them with common code metrics as detection features further improves detection performance.

Description

Deep learning-based source code vulnerability detection method
Technical field
The present invention relates to the technical field of network security, and more particularly to a deep learning-based source code vulnerability detection method.
Background art
In today's highly digitized environment, nearly every aspect of daily life involves software of one kind or another. People communicate through instant messaging software, shop online through shopping software, and pay through payment software. Software is equally important within organizations, for example in financial systems, self-service systems at schools and other institutions, and database management systems in enterprises. Because errors can arise during software design, implementation, and use, most software inevitably contains vulnerabilities. Once a vulnerability is exploited by an attacker, it not only directly harms the software's users but also indirectly harms the software vendor. Software vendors therefore often invest heavily in code audits to reduce the vulnerabilities their software may contain and to improve its security.
Code audits commonly use vulnerability detection tools to locate vulnerabilities quickly and improve audit efficiency. Depending on whether the program is executed, software vulnerability detection methods fall into static analysis and dynamic analysis. Static analysis determines whether a program contains vulnerabilities by analyzing its syntactic and semantic information, and is mainly applied to source code. The main problem with existing mature static analysis methods is their heavy dependence on domain expert knowledge: experts must spend considerable time and effort analyzing the source code, and both the false negative rate and the false positive rate are relatively high. Dynamic analysis determines whether a program contains vulnerabilities by analyzing information generated while the program runs, and is usually applied to executable files. The mainstream dynamic techniques are taint analysis and symbolic execution. Taint analysis cannot cover all execution paths of a program, still requires experts to spend considerable time and effort, and has a high false negative rate. Symbolic execution turns the program's inputs into symbols and its execution into formulas; in theory it can cover all execution paths, but the cost of constraint solving is so high that it is difficult to apply in practice.
Summary of the invention
To address the high false negative rate of existing vulnerability detection methods and their heavy dependence on domain expert knowledge, the present invention proposes a deep learning-based source code vulnerability detection method. The technical solution adopted is as follows:
A deep learning-based source code vulnerability detection method, comprising the following steps:
S1. Compute the code metrics of each function in the source code files and combine them into a code metric vector V_cm;
S2. Taking the function in the source code as the basic unit, extract function features automatically using deep learning: extract the abstract syntax tree (AST), control flow graph (CFG), and program dependence graph (PDG) of each function and convert them into numeric vectors V_ast, V_cfg, and V_pdg;
S3. Concatenate V_cm, V_ast, V_cfg, and V_pdg into a single vector V_f as the function's feature vector, and input V_f together with the function's label into the random deep forest algorithm for training to obtain the final vulnerability detection model M(V_f);
S4. For each function in the source code to be checked, obtain the four vectors V_cm, V_ast, V_cfg, and V_pdg through S1 and S2, concatenate them into the feature vector V_f, and use it as the input of M(V_f). An output of 1 means the function contains a vulnerability; an output of 0 means it does not.
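Steps S3 and S4 reduce to concatenating four per-function vectors and mapping a classifier's output to 1 or 0. The sketch below shows only that glue logic in plain Python; the random deep forest itself is not reproduced, and `build_feature_vector`, `detect`, the score-returning `model` callable, and the 0.5 threshold are hypothetical names and choices, not part of the patent.

```python
from typing import Callable, Sequence

def build_feature_vector(v_cm: Sequence[float], v_ast: Sequence[float],
                         v_cfg: Sequence[float], v_pdg: Sequence[float]) -> list[float]:
    """Concatenate V_cm, V_ast, V_cfg and V_pdg (in that order) into V_f."""
    return [*v_cm, *v_ast, *v_cfg, *v_pdg]

def detect(model: Callable[[Sequence[float]], float],
           v_f: Sequence[float], threshold: float = 0.5) -> int:
    """Return 1 (vulnerable) or 0 (not vulnerable).

    Assumption: `model` returns a vulnerability score in [0, 1]; the
    threshold of 0.5 is illustrative only.
    """
    return 1 if model(v_f) >= threshold else 0

# Usage with a stand-in scorer (NOT a trained random deep forest).
v_f = build_feature_vector([35.0, 28.0], [0.2, 0.7], [0.1], [0.4])
toy_model = lambda v: 0.8 if v[3] > 0.5 else 0.2
print(detect(toy_model, v_f))
```

The concatenation order only has to be the same during training (S3) and detection (S4); any fixed order works.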
In a preferred embodiment, step S2 comprises the following specific steps:
S21. Use a bidirectional long short-term memory network (BLSTM) to extract features automatically from the function's AST:
S211. Traverse the AST depth-first and store its tokens, in traversal order, in a token vector;
S212. Use the set of tokens appearing in all token vectors as the dictionary, apply word embedding to the tokens, and thereby convert each AST token vector into a numeric vector. Choose a suitable fixed length according to the distribution of all vector lengths, and truncate or zero-pad each vector so that the numeric vectors of all functions have the same length;
S213. Label each function according to whether it contains a vulnerability, and input the length-normalized vectors and their labels into the BLSTM network for training. Test the trained model with cross-validation and evaluate it using the F1 score. Tune the parameters and repeat training and testing; when the F1 score reaches its maximum, take the output of the global max pooling layer as the feature vector V_ast extracted from the function's AST;
S22. Represent the extracted CFG and PDG as adjacency matrices and input each into a graph embedding model to obtain two fixed-length vectors, V_cfg and V_pdg, as the features extracted from the function's CFG and PDG.
In a preferred embodiment, the code metrics comprise statistical metrics and complexity metrics. The line-count statistical metrics are: total lines, code lines, blank lines, comment lines, preprocessor lines, inactive lines, and comment-to-code ratio. The statement-count statistical metrics are: total statements, declarative statements, executable statements, and empty statements. The complexity metrics comprise cyclomatic complexity, modified cyclomatic complexity, strict cyclomatic complexity, essential complexity, and maximum nesting depth.
In a preferred embodiment, the code line, blank line, comment line, and statement-count metrics count only active code, i.e., code that does not lie inside a preprocessor conditional block.
In a preferred embodiment, in step S213 the labels are assigned as follows: label 0 means the function contains no vulnerability, and label 1 means it contains a vulnerability.
In a preferred embodiment, the BLSTM network comprises an input layer, a BLSTM layer, a global max pooling layer, several fully connected layers, and an output layer.
In a preferred embodiment, the parameters to be tuned in the BLSTM network include: learning rate, number of epochs, batch size, number of units in each hidden layer, number of fully connected layers, number of units in each fully connected layer, activation functions of the hidden and fully connected layers, loss function, and optimizer.
Compared with the prior art, the technical solution of the present invention has the following beneficial effects:
The deep learning-based source code vulnerability detection method provided by the present invention offers a higher degree of automation, reduces dependence on domain expert knowledge, substantially lowers code audit cost, and improves code audit efficiency. Compared with other deep learning-based vulnerability detection methods, it combines several representations of the code to preserve as much syntactic and semantic information as possible, so that the features extracted automatically by the deep learning algorithm characterize the code better; combining them with common code metrics as detection features further improves detection performance.
Brief description of the drawings
Fig. 1 is the overall framework of the deep learning-based source code vulnerability detection method provided by the present invention;
Fig. 2 is the example function used in Embodiment 2;
Fig. 3 shows the basic structures of structured programming used in Embodiment 2;
Fig. 4 shows the recursive reduction of the basic structures of the example function in Embodiment 2;
Fig. 5 shows the extraction of a function's AST, PDG, and CFG in Embodiment 2;
Fig. 6 shows the network structure of the BLSTM in Embodiment 2.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, are given for illustration only, and shall not be construed as limiting this patent. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
The following further describes the technical solution of the present invention with reference to the accompanying drawings and examples.
Embodiment 1
Referring to Fig. 1, a deep learning-based source code vulnerability detection method comprises the following steps:
S1. Compute the code metrics of each function in the source code files and combine them into a code metric vector V_cm;
S2. Taking the function in the source code as the basic unit, extract function features automatically using deep learning: extract the abstract syntax tree AST, control flow graph CFG, and program dependence graph PDG of each function;
S21. Use a bidirectional long short-term memory network (BLSTM) to extract features automatically from the function's AST:
S211. Traverse the AST depth-first and store its tokens, in traversal order, in a token vector;
S212. Use the set of tokens appearing in all token vectors as the dictionary, apply word embedding to the tokens, and thereby convert each AST token vector into a numeric vector. Choose a suitable fixed length according to the distribution of all vector lengths, and truncate or zero-pad each vector so that the numeric vectors of all functions have the same length;
S213. Label each function according to whether it contains a vulnerability, and input the length-normalized vectors and their labels into the BLSTM network for training. Test the trained model with cross-validation and evaluate it using the F1 score. Tune the parameters and repeat training and testing; when the F1 score reaches its maximum, take the output of the global max pooling layer as the feature vector V_ast extracted from the function's AST;
S22. Represent the extracted CFG and PDG as adjacency matrices and input each into a graph embedding model to obtain two fixed-length vectors, V_cfg and V_pdg, as the features extracted from the function's CFG and PDG.
S3. Concatenate V_cm, V_ast, V_cfg, and V_pdg into a single vector V_f as the function's feature vector, and input V_f together with the function's label into the random deep forest algorithm for training to obtain the final vulnerability detection model M(V_f);
S4. For each function in the source code to be checked, obtain the four vectors V_cm, V_ast, V_cfg, and V_pdg through S1 and S2, concatenate them into the feature vector V_f, and use it as the input of M(V_f). An output of 1 means the function contains a vulnerability; an output of 0 means it does not.
In a preferred embodiment, the code metrics comprise statistical metrics and complexity metrics. The line-count statistical metrics are: total lines, code lines, blank lines, comment lines, preprocessor lines, inactive lines, and comment-to-code ratio. The statement-count statistical metrics are: total statements, declarative statements, executable statements, and empty statements. The complexity metrics comprise cyclomatic complexity, modified cyclomatic complexity, strict cyclomatic complexity, essential complexity, and maximum nesting depth.
In a preferred embodiment, the code line, blank line, comment line, and statement-count metrics count only active code, i.e., code that does not lie inside a preprocessor conditional block.
In a preferred embodiment, in step S213 the labels are assigned as follows: label 0 means the function contains no vulnerability, and label 1 means it contains a vulnerability.
In a preferred embodiment, the BLSTM network comprises an input layer, a BLSTM layer, a global max pooling layer, several fully connected layers, and an output layer.
Embodiment 2
This embodiment is consistent with Embodiment 1. Its precondition is that a large, usable software vulnerability database is available from which the vulnerability type and the location of each vulnerability in the source code can be determined unambiguously, so that source code containing a given type of vulnerability and written in the same programming language can be collected from it as the data set.
A deep learning-based source code vulnerability detection method, comprising the following steps:
S1. Compute the code metrics of the functions in the source code. The code metrics of a function comprise statistical metrics and complexity metrics. The line-count statistical metrics are: total lines, code lines, blank lines, comment lines, preprocessor lines, inactive lines, and comment-to-code ratio. The statement-count statistical metrics are: total statements, declarative statements, executable statements, and empty statements. The code line, blank line, comment line, and statement-count metrics count only active code, i.e., code that does not lie inside a preprocessor conditional block.
To illustrate the calculation of these metrics, take the example function in Fig. 2 as the target function. It has 35 lines in total: 28 code lines, 3 blank lines, 2 comment lines, and 2 inactive lines (28 + 3 + 2 + 2 = 35). There are 2 preprocessor lines, namely lines 31 and 34 of the preprocessor block, and the 2 inactive lines are lines 32 and 33 inside the preprocessor block. The comment-to-code ratio is 2/28, or 0.07 rounded to two decimal places. There are 19 statements in total: 16 executable statements, 2 declarative statements (int count=5 on line 2 and int j=0 on line 5), and 1 empty statement, i.e., lines 32 and 33 in the preprocessor block.
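The line counts above can be approximated with a simple per-line classifier. This is a rough sketch, not the tool the patent used, and `count_lines` is a hypothetical name. It assumes that full-line `//` comments and `/* ... */` blocks are comment lines, that lines strictly between `#if 0` and its matching `#endif` are inactive, and that lines starting with `#` are preprocessor lines that are also counted as code lines (which matches 28 + 3 + 2 + 2 = 35 in the example). Trailing comments on code lines and other conditional directives are not handled.

```python
def count_lines(src: str) -> dict:
    """Naive line-count metrics for one C function (illustrative only)."""
    stats = dict(total=0, code=0, blank=0, comment=0, preprocessor=0, inactive=0)
    in_block_comment = False
    inactive_depth = 0          # > 0 while inside an '#if 0' region
    for raw in src.splitlines():
        line = raw.strip()
        stats["total"] += 1
        if inactive_depth:
            if line.startswith("#if"):          # nested conditional inside '#if 0'
                inactive_depth += 1
            elif line.startswith("#endif"):
                inactive_depth -= 1
                if inactive_depth == 0:         # closing directive is an active preprocessor line
                    stats["preprocessor"] += 1
                    stats["code"] += 1
                    continue
            stats["inactive"] += 1
            continue
        if in_block_comment:
            stats["comment"] += 1
            if "*/" in line:
                in_block_comment = False
            continue
        if not line:
            stats["blank"] += 1
        elif line.startswith("//"):
            stats["comment"] += 1
        elif line.startswith("/*"):
            stats["comment"] += 1
            in_block_comment = "*/" not in line
        else:
            stats["code"] += 1
            if line.startswith("#"):
                stats["preprocessor"] += 1
                if line.replace(" ", "") == "#if0":
                    inactive_depth = 1
    code = stats["code"]
    stats["comment_to_code"] = round(stats["comment"] / code, 2) if code else 0.0
    return stats

SAMPLE = "\n".join([
    "int f(int x) {",
    "    // comment",
    "",
    "    int y = x;",
    "#if 0",
    "    y = 0;",
    "#endif",
    "    /* block",
    "       comment */",
    "    return y;",
    "}",
])
print(count_lines(SAMPLE))
```

For the Fig. 2 figures, the ratio is `round(2 / 28, 2)`, i.e. 0.07.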
The complexity metrics comprise cyclomatic complexity, modified cyclomatic complexity, strict cyclomatic complexity, essential complexity, and maximum nesting depth. Cyclomatic complexity equals the number of decision points in the function plus 1; in C/C++ the decision points are if, for, while, case, catch, and ?. Modified cyclomatic complexity is calculated in the same way as cyclomatic complexity, except that a multi-way decision structure is counted as a whole: the individual cases of a switch statement in C are not counted, and the whole structure counts as 1. Strict cyclomatic complexity equals cyclomatic complexity plus the number of logical AND and logical OR operators in the conditional expressions of the decision points. Essential complexity is the cyclomatic complexity computed after all basic structures in the function have been recursively reduced; in a structured language the basic structures are sequence, selection, and loop, whose control flow graphs are shown in Fig. 3. Maximum nesting depth is the maximum nesting level of control structures in the function; for the example function in Fig. 2 it is 3.
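Maximum nesting depth can be computed with one recursive pass over a statement tree. The patent does not say which representation it uses, so the `Stmt` class below and the set of control-structure kinds (including `do` and `switch`) are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass, field

# Statement kinds treated as nesting control structures (assumption).
CONTROL = {"if", "for", "while", "do", "switch"}

@dataclass
class Stmt:
    kind: str                                   # e.g. "block", "if", "for", "expr"
    children: list["Stmt"] = field(default_factory=list)

def max_nesting(stmt: Stmt, depth: int = 0) -> int:
    """Deepest nesting level of control structures at or below `stmt`."""
    here = depth + (1 if stmt.kind in CONTROL else 0)
    return max([here] + [max_nesting(child, here) for child in stmt.children])

# while { for { if { expr } } }  ->  nesting depth 3
body = Stmt("block", [
    Stmt("while", [Stmt("for", [Stmt("if", [Stmt("expr")])])]),
    Stmt("expr"),
])
print(max_nesting(body))
```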
Denote cyclomatic complexity as v(G), modified cyclomatic complexity as mv(G), strict cyclomatic complexity as sv(G), and essential complexity as ev(G). They are calculated as follows:
$$v(G) = dp(F) + 1 \tag{1}$$
$$mv(G) = v(G) - c(F) + 1 \tag{2}$$
$$sv(G) = v(G) + lce(F) \tag{3}$$
$$ev(G) = v(SG) \tag{4}$$
where dp(F) is the number of decision points in the function, c(F) is the number of non-default branches of the multi-branch structure in the function, lce(F) is the number of logical AND and logical OR operators in the decision points of the function, and v(SG) is the cyclomatic complexity of the function after its basic structures have been recursively reduced. The recursive reduction of the example function in Fig. 2 is shown in Fig. 4: first extract the function's control flow graph, then reduce it recursively starting from the innermost structures; every basic structure can be reduced to a single node. Finally, compute the cyclomatic complexity of the reduced function.
As shown in Fig. 2, the function has 6 decision points, so dp(F) = 6; the multi-way decision structure switch has 2 non-default branches, so c(F) = 2; and the conditional expressions contain one logical AND, so lce(F) = 1. By formulas (1), (2), and (3), the example function has v(G) = 7, mv(G) = 6, and sv(G) = 8. Reducing the example function's control flow graph using the basic structures of Fig. 3, the cyclomatic complexity of the fully reduced control flow graph is 1, so ev(G) = 1.
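Formulas (1) to (3) can be checked with a few lines of Python, together with a deliberately naive token count for dp(F), c(F), and lce(F). This sketch is an assumption, not the patent's tool: the regex counting ignores comments and string literals and counts every `&&`/`||`, not only those inside decision conditions. Formula (2) is written for a single multi-branch structure; the sketch generalizes it as mv = v - c + s, where s is the number of `switch` statements, which reduces to formula (2) when s = 1 and leaves mv = v when there is no switch.

```python
import re

DECISION_KEYWORDS = ("if", "for", "while", "case", "catch")

def decision_points(src: str) -> int:
    """dp(F): decision keywords plus the '?' operator (naive token count)."""
    n = sum(len(re.findall(rf"\b{kw}\b", src)) for kw in DECISION_KEYWORDS)
    return n + src.count("?")

def non_default_cases(src: str) -> int:
    """c(F): number of non-default 'case' labels."""
    return len(re.findall(r"\bcase\b", src))

def switch_count(src: str) -> int:
    """s: number of multi-branch (switch) structures."""
    return len(re.findall(r"\bswitch\b", src))

def logical_ops(src: str) -> int:
    """lce(F): number of && and || operators (naive: counts all of them)."""
    return src.count("&&") + src.count("||")

def complexities(dp: int, c: int, lce: int, switches: int) -> tuple[int, int, int]:
    v = dp + 1                 # formula (1)
    mv = v - c + switches      # formula (2) when switches == 1
    sv = v + lce               # formula (3)
    return v, mv, sv

SAMPLE = """
int f(int a, int b) {
    if (a > 0 && b > 0) return 1;
    for (int i = 0; i < a; i++) { b++; }
    switch (a) { case 1: break; case 2: break; default: break; }
    return a ? b : 0;
}
"""
print(complexities(6, 2, 1, 1))   # Fig. 2 values: dp=6, c=2, lce=1
print(complexities(decision_points(SAMPLE), non_default_cases(SAMPLE),
                   logical_ops(SAMPLE), switch_count(SAMPLE)))
```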
Finally, all computed code metrics are stored in the vector V_cm.
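Storing the metrics in V_cm only requires a fixed metric order shared by every function. The order and the snake_case names below are a hypothetical choice for this sketch (the patent does not fix an order); the example values are the Fig. 2 figures given above.

```python
METRIC_NAMES = (
    "total_lines", "code_lines", "blank_lines", "comment_lines",
    "preprocessor_lines", "inactive_lines", "comment_to_code_ratio",
    "total_statements", "declarative_statements", "executable_statements",
    "empty_statements",
    "cyclomatic", "modified_cyclomatic", "strict_cyclomatic",
    "essential", "max_nesting",
)

def metric_vector(metrics: dict) -> list[float]:
    """Return V_cm with the metrics in METRIC_NAMES order; fail loudly if any is missing."""
    missing = [name for name in METRIC_NAMES if name not in metrics]
    if missing:
        raise KeyError(f"missing metrics: {missing}")
    return [float(metrics[name]) for name in METRIC_NAMES]

FIG2 = dict(
    total_lines=35, code_lines=28, blank_lines=3, comment_lines=2,
    preprocessor_lines=2, inactive_lines=2, comment_to_code_ratio=0.07,
    total_statements=19, declarative_statements=2, executable_statements=16,
    empty_statements=1,
    cyclomatic=7, modified_cyclomatic=6, strict_cyclomatic=8,
    essential=1, max_nesting=3,
)
print(metric_vector(FIG2))
```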
S2. Taking the function as the basic unit, extract function features automatically using deep learning. A static code analysis tool can extract the function's abstract syntax tree AST, control flow graph CFG, and program dependence graph PDG. Fig. 5 uses a simple function as an example and shows the AST, CFG, and PDG extracted from it;
S21. For the function's AST, use a bidirectional long short-term memory network (BLSTM) to extract features automatically;
S211. First traverse the function's AST depth-first and store the contents of the AST nodes, in traversal order, in a token vector. For example, depth-first traversal of the AST in Fig. 5(b) yields the token vector [func, Fool, DECL, int, =, temp, CALL, number, IF, PRED, ==, temp, DATA, CALL, print, ARG, temp];
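Step S211 is a pre-order depth-first traversal. The sketch uses an explicit stack so that large ASTs do not hit Python's recursion limit. The `AstNode` class and the tree below are hypothetical: the tree is one shape consistent with the token vector quoted for Fig. 5(b), not necessarily the exact tree in the figure.

```python
from dataclasses import dataclass, field

@dataclass
class AstNode:
    label: str
    children: list["AstNode"] = field(default_factory=list)

def token_vector(root: AstNode) -> list[str]:
    """Pre-order DFS: emit a node, then its children from left to right."""
    tokens, stack = [], [root]
    while stack:
        node = stack.pop()
        tokens.append(node.label)
        stack.extend(reversed(node.children))   # leftmost child is popped first
    return tokens

N = AstNode
tree = N("func", [
    N("Fool"),
    N("DECL", [N("int"), N("="), N("temp"), N("CALL", [N("number")])]),
    N("IF", [
        N("PRED", [N("=="), N("temp"), N("DATA")]),
        N("CALL", [N("print"), N("ARG", [N("temp")])]),
    ]),
])
print(token_vector(tree))
```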
S212. Build a dictionary from the tokens in all token vectors and encode each token with one-hot encoding as input to the word embedding model word2vec. Let the total number of tokens be n and the embedding dimension be m. After one-hot encoding, the i-th token is represented as a vector v_i of length n whose i-th element is 1 and all other elements are 0. When word2vec training is complete, it yields a word embedding matrix E_{n,m} of dimension (n, m), and the i-th token is embedded as the length-m vector V_i = v_i · E_{n,m}; in this way each AST token vector is converted into a numeric vector. Because the ASTs of different functions differ in size, the extracted token vectors and their numeric vectors also differ in length, while the BLSTM requires fixed-length input. A suitable fixed length is therefore chosen according to the distribution of all vector lengths, and each vector is truncated or zero-padded to that length, yielding the vector V_std;
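The one-hot product V_i = v_i · E_{n,m} simply selects row i of the embedding matrix, and length normalization is truncation or zero-padding of the sequence of m-dimensional token vectors. In the sketch below the matrix `E` is a made-up stand-in for a trained word2vec matrix, and choosing the fixed length as a nearest-rank quantile of the observed lengths (`choose_fixed_length`, default 0.9) is a hypothetical rule; the patent only says the length is chosen from the length distribution.

```python
import math

def one_hot(i: int, n: int) -> list[float]:
    return [1.0 if j == i else 0.0 for j in range(n)]

def embed(v: list[float], E: list[list[float]]) -> list[float]:
    """Row vector v (length n) times matrix E (n x m) -> vector of length m."""
    m = len(E[0])
    return [sum(v[r] * E[r][c] for r in range(len(E))) for c in range(m)]

def choose_fixed_length(lengths: list[int], q: float = 0.9) -> int:
    """Hypothetical rule: nearest-rank q-quantile of the observed lengths."""
    ordered = sorted(lengths)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def pad_or_truncate(seq: list[list[float]], length: int, m: int) -> list[list[float]]:
    """Cut a sequence of m-dim token vectors to `length`, or append zero vectors."""
    seq = seq[:length]
    return seq + [[0.0] * m for _ in range(length - len(seq))]

E = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]          # n = 3 tokens, m = 2 (stand-in values)
print(embed(one_hot(1, 3), E))                     # row 1 of E
print(pad_or_truncate([[1.0, 2.0]], 3, 2))
```

Truncation keeps the beginning of the traversal order; whether to keep the head or the tail of long sequences is a design choice the patent leaves open.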
S213. Label each function according to whether it contains a vulnerability: label 0 means the function contains no vulnerability, and label 1 means it contains a vulnerability. Input the data set formed by the length-normalized vectors V_std and their labels into the BLSTM network for training and testing, and evaluate the BLSTM model with n-times repeated k-fold cross-validation using the F1 score as the evaluation metric, where n and k are chosen according to the size of the data set. The BLSTM network structure is shown in Fig. 6. The parameters to be tuned in the BLSTM network include: learning rate, number of epochs, batch size, number of units in each hidden layer, number of fully connected layers, number of units in each fully connected layer, activation functions of the hidden and fully connected layers, loss function, and optimizer;
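n-times repeated k-fold cross-validation reshuffles the samples n times and, for each shuffle, uses each of k folds once as the test set. The generator below produces only the index splits; `repeated_kfold` and its fixed seed are hypothetical, and it does not stratify by label, which may matter if vulnerable functions are rare.

```python
import random

def repeated_kfold(num_samples: int, k: int, n_repeats: int, seed: int = 0):
    """Yield (train_idx, test_idx) for n_repeats rounds of k-fold cross-validation."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    for _ in range(n_repeats):
        rng.shuffle(indices)
        folds = [indices[f::k] for f in range(k)]      # k roughly equal folds
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
            yield train, test

for train, test in repeated_kfold(6, 3, 1):
    print(train, test)
```

The F1 score from each of the n × k test folds would then be averaged to compare parameter settings.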
To explain the calculation of the F1 score, denote the four possible prediction outcomes of the BLSTM model, true positive, false positive, true negative, and false negative, as TP, FP, TN, and FN respectively. A true positive means the test sample is actually a vulnerable function and is predicted to be vulnerable; a false positive means the test sample is actually a non-vulnerable function but is predicted to be vulnerable; a true negative means the test sample is actually a non-vulnerable function and is predicted to be non-vulnerable; a false negative means the test sample is actually a vulnerable function but is predicted to be non-vulnerable. With precision P and recall R, F1 is calculated as follows:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2PR}{P + R}$$
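The F1 computation can be sketched directly from the four counts, with label 1 meaning vulnerable as in S213. `f1_score` here is a hypothetical plain-Python helper; returning 0.0 when there are no true positives is a common convention rather than something the patent specifies.

```python
def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 for the positive class (1 = vulnerable)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0            # precision or recall is 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# TP = 2, FP = 1, FN = 1, so P = R = 2/3 and F1 = 2/3.
print(f1_score([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
```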
Tune the parameters and repeat training and testing; when the cross-validated F1 score reaches its maximum, take the output vector of the global max pooling layer as the feature vector V_ast extracted from the function's AST.
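Global max pooling turns the BLSTM's per-time-step outputs (T steps × d features) into a single length-d vector by taking the maximum of each feature over all time steps. This is what makes V_ast fixed-length regardless of T. The helper below is a hypothetical plain-Python illustration. If the forward and backward outputs are concatenated, d is twice the number of hidden units.

```python
def global_max_pool(outputs: list[list[float]]) -> list[float]:
    """Element-wise max over time steps: (T x d) -> length-d vector."""
    return [max(column) for column in zip(*outputs)]

# Three time steps, two features per step.
print(global_max_pool([[0.1, 0.9], [0.5, -0.2], [0.3, 0.4]]))
```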
S22. Represent the extracted CFG and PDG as adjacency matrices. Use the adjacency matrices of the CFGs of all functions as a data set and input it into the graph embedding model graph2vec, which outputs a fixed-length vector representation V_cfg of each CFG. Likewise, input the adjacency matrices of the PDGs of all functions into graph2vec to obtain a fixed-length vector representation V_pdg of each PDG. V_cfg and V_pdg serve as the features extracted from the function's CFG and PDG;
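Building the adjacency matrix of a CFG or PDG from its edge list is straightforward; the graph2vec embedding itself is not reproduced here. In this sketch `adjacency_matrix` is a hypothetical helper that assumes nodes are numbered 0..N-1 and edges are directed; it records only whether an edge exists, so PDG edge types (data vs. control dependence) are not distinguished.

```python
def adjacency_matrix(num_nodes: int, edges: list[tuple[int, int]]) -> list[list[int]]:
    """Directed graph -> N x N 0/1 matrix with A[u][v] = 1 for each edge u -> v."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        A[u][v] = 1
    return A

# CFG of an if/else: condition (0) -> then (1) / else (2) -> join (3).
for row in adjacency_matrix(4, [(0, 1), (0, 2), (1, 3), (2, 3)]):
    print(row)
```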
S3. Concatenate V_cm, V_ast, V_cfg, and V_pdg into a single vector V_f as the function's feature vector, and input V_f together with the function's label into the random deep forest algorithm for training to obtain the final vulnerability detection model M(V_f);
S4. For each function in the source code to be checked, obtain the four vectors V_cm, V_ast, V_cfg, and V_pdg through steps S1 and S2, concatenate them into the feature vector V_f, and use it as the input of M(V_f). An output of 1 means the function contains a vulnerability; an output of 0 means it does not.
Obviously, the above embodiments of the present invention are merely examples given to illustrate the present invention clearly and do not limit its implementations. Those of ordinary skill in the art may make other changes or modifications in different forms on the basis of the above description; it is neither necessary nor possible to enumerate all implementations here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (7)

1. a kind of source code leak detection method based on deep learning, which comprises the following steps:
S1. the code metric index of function in source code file is calculated, and is integrated into a code metric vector Vcm
S2. using the function in source code as basic unit, the automatic extraction of Function feature is completed using deep learning method;Make The pumping of function abstract syntax tree AST, controlling stream graph CFG and program dependency graph PDG can be completed with code static analysis tool It takes and is successively converted into numerical value vector Vast、Vcfg、Vpdg
S3. by vector Vcm、Vast、Vcfg、VpdgIt is merged into a vector VfAs the feature vector of function, by the feature of function to Measure VfTraining in random deep woods algorithm, which is input to, with the label of function obtains final Hole Detection model M (Vf);
S4. for the function in source code to be detected, four vector V are obtained by S1 and S2cm、Vast、Vcfg、Vpdg, it is spliced into Feature vector VfAs Hole Detection model M (Vf) input, output result be 1 representative function there are loophole, input results 0 Loophole is not present in representative function.
2. the source code leak detection method according to claim 1 based on deep learning, which is characterized in that the S2 Specific step is as follows:
S21. two-way length memory network automatic extraction feature from function AST in short-term is used;
S211. the traversing operation that depth-first is carried out to AST, the mark in AST is stored in order in a conceptual vector;
Mark is carried out word insertion and turns AST conceptual vector by the set S212. indicated using in all conceptual vectors as dictionary Numerical value vector is turned to, selects suitable numerical value as regular length according to the distribution of all vector lengths, vector is carried out with this It cuts or 0 padding is by the length normalization of the numerical value vector of all functions;
S213. loophole whether there is according to function, label is added for function, by the corresponding mark of vector sum vector of length normalization Label are input to two-way length and are trained in memory network in short-term;Trained model is tested simultaneously using the method for cross validation F1 value is used to carry out model evaluation as evaluation index;Tuning parameter, the training of duplication model, test process, when F1 value reaches When maximum value, by the output of global maximum pond layer as the feature vector V extracted from function ASTast
S22. CFG, PDG of extraction are indicated and are separately input to figure incorporation model to obtain two regular lengths with adjacency matrix Vector Vcfg、Vpdg, as the feature extracted from function CFG and PDG.
3. the source code leak detection method according to claim 1 based on deep learning, which is characterized in that the generation Code Measure Indexes include statistical indicator and complexity index, and wherein there is line number statistical indicator in statistical indicator: total line number, code line Number, blank line number, annotation line number, pretreatment lines of code, inactive line number, annotation and lines of code ratio, sentence number statistics refer to Indicate: total sentence number, executes sentence number, null statement number at declarative statement number;Complexity index includes circulation complexity, corrects and follow Ring complexity, weighted shift complexity, essential complexity and the maximum nested number of plies.
4. The deep learning-based source code vulnerability detection method according to claim 3, wherein the lines-of-code, blank-line, comment-line, and statement-count metrics count only active code.
5. The deep learning-based source code vulnerability detection method according to claim 2, wherein in step S213 the functions are labeled as follows: a label of 0 indicates that the function has no vulnerability, and a label of 1 indicates that the function contains a vulnerability.
6. The deep learning-based source code vulnerability detection method according to claim 2, wherein the network structure of the bidirectional long short-term memory network comprises an input layer, a BLSTM layer, a global max pooling layer, several fully connected layers, and an output layer.
7. The deep learning-based source code vulnerability detection method according to claim 6, wherein the parameters to be tuned in the BLSTM network include: the learning rate, the number of epochs, the batch size, the number of units in each hidden layer, the number of fully connected layers, the number of units in the fully connected layers, the activation functions of the hidden and fully connected layers, the loss function, and the optimizer.
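Steps S211 and S212 turn each function's AST into a fixed-length numerical vector. The Python sketch below illustrates that preprocessing under stated assumptions: `Node` is a hypothetical stand-in for a real C/C++ AST node (no parser is specified here), tokens are mapped to integer indices with index 0 reserved for zero padding (the form an embedding layer consumes; the embedding itself is not shown), and the fixed length is assumed to be a quantile of the observed lengths (default 90th percentile), since step S212 only says the length is chosen from the length distribution.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal AST node: a token label plus ordered children."""
    label: str
    children: list[Node] = field(default_factory=list)


def dfs_tokens(root: Node) -> list[str]:
    """S211: depth-first (pre-order) traversal, recording labels in visit order."""
    tokens, stack = [], [root]
    while stack:
        node = stack.pop()
        tokens.append(node.label)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return tokens


def build_vocab(sequences: list[list[str]]) -> dict[str, int]:
    """Dictionary over every token seen; index 0 is reserved for padding."""
    vocab: dict[str, int] = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def choose_length(sequences: list[list[str]], quantile: float = 0.9) -> int:
    """Assumed rule: the given quantile of the sorted sequence lengths."""
    lengths = sorted(len(s) for s in sequences)
    idx = min(len(lengths) - 1, math.ceil(quantile * len(lengths)) - 1)
    return lengths[idx]


def encode_and_pad(sequences, vocab, length):
    """Map tokens to indices, then truncate or zero-pad each vector to `length`."""
    out = []
    for seq in sequences:
        ids = [vocab[t] for t in seq][:length]
        out.append(ids + [0] * (length - len(ids)))
    return out
```

Choosing a high quantile rather than the maximum length keeps a few very long functions from forcing heavy padding onto all the others, at the cost of truncating those outliers.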
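Step S213 evaluates the BLSTM with cross-validation and the F1 score, using the 0/1 labels of claim 5 (1 = vulnerable). The stdlib-only sketch below shows that evaluation loop. `train_and_predict` is a hypothetical callback in which the BLSTM would be trained on the training fold and applied to the test fold; the folds are contiguous and unshuffled, which is a simplifying assumption (shuffled or stratified folds are more usual).

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive (vulnerable) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validated_f1(train_and_predict, X, y, k=5):
    """Mean F1 over k folds; train_and_predict(X_train, y_train, X_test) -> predictions."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        preds = train_and_predict(
            [X[i] for i in train_idx], [y[i] for i in train_idx], [X[i] for i in test_idx]
        )
        scores.append(f1_score([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)
```

The parameter-tuning loop in S213 would call `cross_validated_f1` once per candidate configuration and keep the configuration with the highest mean F1.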
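Step S22 represents the CFG and PDG as adjacency matrices and maps each graph, whatever its size, to a fixed-length vector. The graph embedding model is not named in this section, so the sketch below only illustrates that interface: it builds a directed adjacency matrix from an edge list and then, as a deliberately simple stand-in that is not a learned embedding, reads out a fixed-length vector of normalized out-degree and in-degree histograms. The function names and the `bins` parameter are hypothetical.

```python
def adjacency_matrix(num_nodes, edges):
    """Directed adjacency matrix: A[i][j] = 1 if there is an edge i -> j."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        A[src][dst] = 1
    return A


def degree_histogram_vector(A, bins=4):
    """Stand-in graph-level vector of fixed length 2 * bins.

    Out-degrees and in-degrees are bucketed into `bins` buckets (the last
    bucket collects every degree >= bins - 1) and normalized by node count.
    """
    n = len(A)
    if n == 0:
        return [0.0] * (2 * bins)
    out_deg = [sum(row) for row in A]
    in_deg = [sum(A[i][j] for i in range(n)) for j in range(n)]
    vec = []
    for degrees in (out_deg, in_deg):
        hist = [0.0] * bins
        for d in degrees:
            hist[min(d, bins - 1)] += 1
        vec.extend(h / n for h in hist)
    return vec
```

For example, the CFG of an `if`/`else` (entry node 0 branching to nodes 1 and 2, which both join at node 3) yields an 8-element vector, and so does any larger graph, which is the property V_cfg and V_pdg need for concatenation with V_ast.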
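Claims 3 and 4 list line-count and complexity metrics computed over active code. The rough sketch below, assuming C-like source, computes a subset of them. Its simplifications are assumptions rather than the patent's definitions: code inside `#if 0 ... #endif` is treated as inactive (`#else`/`#elif` are ignored), comment markers inside string literals are not handled, a line holding both code and a comment counts toward both totals, preprocessor lines are also counted as code lines, and cyclomatic complexity is approximated as one plus the number of `if`, `for`, `while`, `case`, `&&`, and `||` occurrences in active lines. Statement counts and the other complexity variants are not covered; a real pipeline would more likely take these figures from a dedicated code-metrics tool.

```python
import re

# Decision points for the rough cyclomatic-complexity estimate (assumption).
DECISION = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\|")


def strip_if0(lines):
    """Drop lines inside `#if 0 ... #endif` regions (treated as inactive code)."""
    active, depth = [], 0  # depth > 0 while inside an #if 0 region
    for line in lines:
        s = line.strip()
        if depth:
            if re.match(r"#\s*if", s):
                depth += 1  # nested #if/#ifdef/#ifndef inside the dead region
            elif re.match(r"#\s*endif\b", s):
                depth -= 1
            continue
        if re.match(r"#\s*if\s+0\b", s):
            depth = 1
            continue
        active.append(line)
    return active


def classify(line, in_block):
    """Return (has_code, has_comment, still_in_block_comment) for one line."""
    has_code = has_comment = False
    i = 0
    while i < len(line):
        if in_block:
            has_comment = True
            end = line.find("*/", i)
            if end == -1:
                return has_code, has_comment, True
            in_block, i = False, end + 2
        elif line.startswith("//", i):
            return has_code, True, False
        elif line.startswith("/*", i):
            in_block, i = True, i + 2
        else:
            has_code = has_code or not line[i].isspace()
            i += 1
    return has_code, has_comment, in_block


def line_metrics(source):
    """Line-count metrics; all but `total` and `inactive` use active lines only."""
    all_lines = source.splitlines()
    active = strip_if0(all_lines)
    code = comment = blank = preprocessor = 0
    in_block = False
    for line in active:
        if not line.strip():
            blank += 1
            continue
        has_code, has_comment, in_block = classify(line, in_block)
        code += int(has_code)
        comment += int(has_comment)
        if has_code and line.lstrip().startswith("#"):
            preprocessor += 1
    return {
        "total": len(all_lines),
        "code": code,
        "blank": blank,
        "comment": comment,
        "preprocessor": preprocessor,
        "inactive": len(all_lines) - len(active),
        "comment_to_code": comment / code if code else 0.0,
    }


def cyclomatic_complexity(source):
    """1 + decision points found in active lines (comments are not excluded)."""
    return 1 + len(DECISION.findall("\n".join(strip_if0(source.splitlines()))))
```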
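Claim 6 fixes the layer order: input, BLSTM, global max pooling, fully connected layers, output. The NumPy sketch below is a forward pass only, with random untrained weights, meant to show how the shapes flow and where V_ast comes from (the pooled BLSTM states). Several details are assumptions not stated in the claims: an embedding lookup on the padded index vector serves as the input layer (padding index 0 is not masked), the hidden fully connected layers use ReLU, the output is a single sigmoid unit giving the probability of label 1, and the class names and default sizes are hypothetical. Training would normally be done in a deep learning framework.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTM:
    """Single-direction LSTM with randomly initialized (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix for the four gates: input, forget, candidate, output.
        self.W = rng.normal(0.0, 0.1, (input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def run(self, xs):
        """xs: (T, input_dim) -> hidden states (T, hidden_dim)."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        outs = []
        for x in xs:
            z = np.concatenate([x, h]) @ self.W + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outs.append(h)
        return np.stack(outs)


class BLSTMClassifier:
    """Embedding -> BLSTM -> global max pooling -> dense layers -> sigmoid output."""

    def __init__(self, vocab_size, embed_dim=16, hidden_dim=8, dense_dims=(16,), seed=0):
        rng = np.random.default_rng(seed)
        self.embed = rng.normal(0.0, 0.1, (vocab_size, embed_dim))
        self.fwd = LSTM(embed_dim, hidden_dim, rng)
        self.bwd = LSTM(embed_dim, hidden_dim, rng)
        dims = [2 * hidden_dim, *dense_dims, 1]
        self.dense = [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
                      for a, b in zip(dims[:-1], dims[1:])]

    def features(self, token_ids):
        """Global max pooling over BLSTM states: the candidate V_ast."""
        xs = self.embed[np.asarray(token_ids)]
        fwd = self.fwd.run(xs)
        bwd = self.bwd.run(xs[::-1])[::-1]  # re-align backward states with time steps
        states = np.concatenate([fwd, bwd], axis=1)  # (T, 2 * hidden_dim)
        return states.max(axis=0)

    def predict_proba(self, token_ids):
        """Probability that the function is vulnerable (label 1)."""
        x = self.features(token_ids)
        for k, (W, b) in enumerate(self.dense):
            x = x @ W + b
            x = sigmoid(x) if k == len(self.dense) - 1 else np.maximum(x, 0.0)
        return float(x[0])
```

Global max pooling makes the feature vector length depend only on `hidden_dim`, not on the sequence length, which is what lets the pooled output be reused as a fixed-length V_ast.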
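Claim 7 lists the hyperparameters to tune, and step S213 keeps the setting with the highest F1 score. A minimal exhaustive grid search is sketched below. The candidate values in `SEARCH_SPACE` are purely illustrative assumptions, and `evaluate_f1` is a hypothetical callback that would train the BLSTM with a configuration and return its cross-validated F1.

```python
from itertools import product

# Illustrative candidate values only; claim 7 names the parameters, not the values.
SEARCH_SPACE = {
    "learning_rate": [1e-3, 1e-4],
    "epochs": [10, 20],
    "batch_size": [32, 64],
    "lstm_units": [64, 128],
    "dense_layers": [1, 2],
    "dense_units": [64],
    "activation": ["relu", "tanh"],
    "loss": ["binary_crossentropy"],
    "optimizer": ["adam", "rmsprop"],
}


def grid(space):
    """Yield every combination of candidate values as a config dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def tune(evaluate_f1, space=SEARCH_SPACE):
    """Return (best_config, best_f1), maximizing the score from evaluate_f1."""
    best_cfg, best_f1 = None, float("-inf")
    for cfg in grid(space):
        score = evaluate_f1(cfg)
        if score > best_f1:
            best_cfg, best_f1 = cfg, score
    return best_cfg, best_f1
```

Even this small space has 128 combinations, each requiring a full cross-validated training run, so random or sequential search over the same dictionary is a common cheaper alternative.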
CN201910214764.3A 2019-03-20 2019-03-20 Deep learning-based source code vulnerability detection method Active CN110011986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910214764.3A CN110011986B (en) 2019-03-20 2019-03-20 Deep learning-based source code vulnerability detection method

Publications (2)

Publication Number Publication Date
CN110011986A true CN110011986A (en) 2019-07-12
CN110011986B CN110011986B (en) 2021-04-02

Family

ID=67167516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910214764.3A Active CN110011986B (en) 2019-03-20 2019-03-20 Deep learning-based source code vulnerability detection method

Country Status (1)

Country Link
CN (1) CN110011986B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110737899A (en) * 2019-09-24 2020-01-31 暨南大学 machine learning-based intelligent contract security vulnerability detection method
CN110825642A (en) * 2019-11-11 2020-02-21 浙江大学 Software code line-level defect detection method based on deep learning
CN111259394A (en) * 2020-01-15 2020-06-09 中山大学 Fine-grained source code vulnerability detection method based on graph neural network
CN111832028A (en) * 2020-07-27 2020-10-27 中国工商银行股份有限公司 Code auditing method and device, electronic equipment and medium
WO2021037196A1 (en) * 2019-08-28 2021-03-04 杭州趣链科技有限公司 Smart contract code vulnerability detection method and apparatus, computer device and storage medium
CN112883378A (en) * 2021-03-30 2021-06-01 北京理工大学 Android malicious software detection method integrating graph embedding and deep neural network
CN113220286A (en) * 2021-04-27 2021-08-06 浙大城市学院 Evaluation method of graphical programming product
CN113342318A (en) * 2021-04-19 2021-09-03 山东师范大学 Fine-grained code automatic generation method and system based on multi-view code characteristics
CN113378178A (en) * 2021-06-21 2021-09-10 大连海事大学 Deep learning-based graph confidence learning software vulnerability detection method
CN113448857A (en) * 2021-07-09 2021-09-28 北京理工大学 Software code quality measurement method based on deep learning
CN113742205A (en) * 2020-05-27 2021-12-03 南京大学 Code vulnerability intelligent detection method based on man-machine cooperation
CN115130110A (en) * 2022-07-08 2022-09-30 国网浙江省电力有限公司电力科学研究院 Vulnerability mining method, device, equipment and medium based on parallel ensemble learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018016671A2 (en) * 2016-07-20 2018-01-25 주식회사 이븐스타 Dangerous code detection system for checking security vulnerability and method thereof
CN107885999A (en) * 2017-11-08 2018-04-06 华中科技大学 Deep learning-based vulnerability detection method and system
CN108446540A (en) * 2018-03-19 2018-08-24 中山大学 Program code plagiarism type detection method and system based on a source code multi-label graph neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
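Li Zhen (李珍): "面向源代码的软件漏洞静态检测综述" [A survey of static software vulnerability detection for source code], 《网络与信息安全学报》 [Chinese Journal of Network and Information Security] *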

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021037196A1 (en) * 2019-08-28 2021-03-04 杭州趣链科技有限公司 Smart contract code vulnerability detection method and apparatus, computer device and storage medium
CN110737899A (en) * 2019-09-24 2020-01-31 暨南大学 machine learning-based intelligent contract security vulnerability detection method
CN110737899B (en) * 2019-09-24 2022-09-06 暨南大学 Intelligent contract security vulnerability detection method based on machine learning
CN110825642A (en) * 2019-11-11 2020-02-21 浙江大学 Software code line-level defect detection method based on deep learning
CN111259394A (en) * 2020-01-15 2020-06-09 中山大学 Fine-grained source code vulnerability detection method based on graph neural network
CN111259394B (en) * 2020-01-15 2022-08-05 中山大学 Fine-grained source code vulnerability detection method based on graph neural network
CN113742205A (en) * 2020-05-27 2021-12-03 南京大学 Code vulnerability intelligent detection method based on man-machine cooperation
CN113742205B (en) * 2020-05-27 2024-04-23 南京大学 Code vulnerability intelligent detection method based on man-machine cooperation
CN111832028A (en) * 2020-07-27 2020-10-27 中国工商银行股份有限公司 Code auditing method and device, electronic equipment and medium
CN112883378A (en) * 2021-03-30 2021-06-01 北京理工大学 Android malicious software detection method integrating graph embedding and deep neural network
CN112883378B (en) * 2021-03-30 2023-02-10 北京理工大学 Android malicious software detection method integrating graph embedding and deep neural network
CN113342318A (en) * 2021-04-19 2021-09-03 山东师范大学 Fine-grained code automatic generation method and system based on multi-view code characteristics
CN113220286B (en) * 2021-04-27 2022-04-19 浙大城市学院 Evaluation method of graphical programming product
CN113220286A (en) * 2021-04-27 2021-08-06 浙大城市学院 Evaluation method of graphical programming product
CN113378178A (en) * 2021-06-21 2021-09-10 大连海事大学 Deep learning-based graph confidence learning software vulnerability detection method
CN113378178B (en) * 2021-06-21 2023-08-22 大连海事大学 Deep learning-based graph self-confidence learning software vulnerability detection method
CN113448857A (en) * 2021-07-09 2021-09-28 北京理工大学 Software code quality measurement method based on deep learning
CN113448857B (en) * 2021-07-09 2022-03-22 北京理工大学 Software code quality measurement method based on deep learning
CN115130110A (en) * 2022-07-08 2022-09-30 国网浙江省电力有限公司电力科学研究院 Vulnerability mining method, device, equipment and medium based on parallel ensemble learning
CN115130110B (en) * 2022-07-08 2024-03-19 国网浙江省电力有限公司电力科学研究院 Vulnerability discovery method, device, equipment and medium based on parallel integrated learning

Also Published As

Publication number Publication date
CN110011986B (en) 2021-04-02

Similar Documents

Publication Publication Date Title
CN110011986A (en) Deep learning-based source code vulnerability detection method
CN111259394B (en) Fine-grained source code vulnerability detection method based on graph neural network
Wang et al. Learning semantic program embeddings with graph interval neural network
Harman et al. Optimizing for the number of tests generated in search based test data generation with an application to the oracle cost problem
Siddiq et al. An empirical study of code smells in transformer-based code generation techniques
Shen et al. A survey of automatic software vulnerability detection, program repair, and defect prediction techniques
Fraser et al. Assessing and generating test sets in terms of behavioural adequacy
Del Carpio et al. Trends in software engineering processes using deep learning: a systematic literature review
CN112256271B (en) Block chain intelligent contract safety detection system based on static analysis
Pashakhanloo et al. Codetrek: Flexible modeling of code using an extensible relational representation
Huang et al. Unseen entity handling in complex question answering over knowledge base via language generation
White et al. Reassert: Deep learning for assert generation
Palacio et al. Toward a theory of causation for interpreting neural code models
Kim et al. Predictive mutation analysis via the natural language channel in source code
Pan et al. Refactoring packages of object–oriented software using genetic algorithm based community detection technique
Aghdasifam et al. A new metaheuristic-based hierarchical clustering algorithm for software modularization
Deng et al. Model-based testing and maintenance
Jiang et al. Evaluating Natural Language Inference Models: A Metamorphic Testing Approach
Feyzi et al. Bayes‐TDG: effective test data generation using Bayesian belief network: toward failure‐detection effectiveness and maximum coverage
Martins et al. Online verification through model checking of medical critical intelligent systems
Walkinshaw et al. Evaluation and comparison of inferred regular grammars
Anthony et al. Software development automation: An approach to automate the processes of SDLC
Li et al. Hybrid model with multi-level code representation for multi-label code smell detection (077)
Kollár et al. Abstraction in programming languages according to domain-specific patterns
Fontes et al. Automated support for unit test generation: a tutorial book chapter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant