CN110011986A - Source code vulnerability detection method based on deep learning - Google Patents
Source code vulnerability detection method based on deep learning
- Publication number
- CN110011986A (application number CN201910214764.3A)
- Authority
- CN
- China
- Prior art keywords
- function
- code
- vector
- deep learning
- source code
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1433—Vulnerability analysis
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computer Networks & Wireless Communication (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Signal Processing (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Machine Translation (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention proposes a source code vulnerability detection method based on deep learning. The method uses deep learning to extract source code features automatically, combines these features with code metrics, and builds a vulnerability detection model using the random forest algorithm. The method is highly automated, reduces the dependence on domain expert knowledge, lowers the cost of code auditing, and improves its efficiency. Compared with other methods that use deep learning for vulnerability detection, this method combines several representations of the code to retain as much of its syntactic and semantic information as possible, so that the features extracted automatically by the deep learning algorithm characterize the code better. Common code metrics are added as further detection features, which further improves detection performance.
Description
Technical field
The present invention relates to the technical field of network security, and more particularly to a source code vulnerability detection method based on deep learning.
Background art
In today's highly informatized environment, every aspect of people's lives is closely tied to software of all kinds. In daily life, people communicate through instant messaging software, shop online through shopping software, and pay using payment software. Software plays an equally important role in organizations, for example financial systems in schools, self-service systems in various institutions, and database management systems in enterprises. Because errors may be introduced during software design, implementation, and use, most software inevitably contains vulnerabilities. Once a vulnerability is exploited by an attacker, it not only directly harms the interests of the software's users but also indirectly affects the interests of the software vendor. Vendors therefore often invest heavily in code audits to reduce potential vulnerabilities and improve software security.
Code audits usually employ vulnerability detection tools to locate vulnerabilities quickly and improve audit efficiency. Software vulnerability detection methods are divided into static analysis and dynamic analysis according to whether the program is executed. Static analysis determines whether a program contains vulnerabilities by analyzing its syntactic and semantic information, and is mainly applied to source code. The main problem with existing, relatively mature static analysis methods is their heavy dependence on domain expert knowledge: experts must spend considerable time and effort analyzing the source code, and both the false negative rate and the false positive rate are relatively high. Dynamic analysis determines whether a program contains vulnerabilities by analyzing information produced while the program runs, and is commonly applied to executable files. The mainstream dynamic analysis techniques are taint analysis and symbolic execution. Taint analysis cannot cover all execution paths of a program, also requires domain experts to spend considerable time and effort, and has a high false negative rate. Symbolic execution treats program inputs as symbols and program execution as formulas, so in theory it can cover all execution paths, but the cost of constraint solving makes it difficult to apply in practice.
Summary of the invention
To address the high false negative rate of existing vulnerability detection methods and their heavy dependence on domain expert knowledge, the present invention proposes a source code vulnerability detection method based on deep learning. The technical solution adopted by the invention is as follows:
A source code vulnerability detection method based on deep learning, comprising the following steps:
S1. Calculate the code metrics of each function in the source code file and combine them into a code metric vector V_cm;
S2. Taking the function in the source code as the basic unit, extract function features automatically using deep learning; extract the abstract syntax tree (Abstract Syntax Tree, AST), control flow graph (Control Flow Graph, CFG), and program dependency graph (Program Dependency Graph, PDG) from the source code;
S3. Merge the vectors V_cm, V_ast, V_cfg, and V_pdg into a single vector V_f as the feature vector of the function, and input the feature vector V_f together with the label of the function into the random forest algorithm for training to obtain the final vulnerability detection model M(V_f);
S4. For each function in the source code to be detected, obtain the four vectors V_cm, V_ast, V_cfg, and V_pdg through S1 and S2, and concatenate them into the feature vector V_f as the input of the vulnerability detection model M(V_f). An output of 1 indicates that the function contains a vulnerability, and an output of 0 indicates that it does not.
In a preferred embodiment, the specific steps of S2 are as follows:
S21. Use a bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BLSTM) to extract features automatically from the function AST;
S211. Traverse the AST depth-first and store the tokens in the AST, in traversal order, in a token vector;
S212. Use the set of tokens in all token vectors as a dictionary, apply word embedding (Word Embedding) to the tokens, and convert each AST token vector into a numerical vector. Select a suitable value as the fixed length according to the distribution of all vector lengths, and truncate or zero-pad the vectors accordingly so that the numerical vectors of all functions have a standardized length;
S213. Label each function according to whether it contains a vulnerability, and input the length-standardized vectors and their corresponding labels into the BLSTM network for training. Test the trained model using cross-validation and evaluate it using the F1 score as the evaluation metric. Tune the parameters and repeat the training and testing process; when the F1 score reaches its maximum, take the output of the global max pooling layer as the feature vector V_ast extracted from the function AST.
S22. Represent the extracted CFG and PDG as adjacency matrices and input each into a graph embedding model to obtain two fixed-length vectors V_cfg and V_pdg, which serve as the features extracted from the function CFG and PDG.
In a preferred embodiment, the code metrics include count metrics and complexity metrics. The line count metrics are: total lines, code lines, blank lines, comment lines, preprocessor lines, inactive lines, and the comment-to-code line ratio. The statement count metrics are: total statements, declaration statements, executable statements, and empty statements. The complexity metrics are cyclomatic complexity, modified cyclomatic complexity, strict cyclomatic complexity, essential complexity, and maximum nesting depth.
In a preferred embodiment, the code line, blank line, comment line, and statement count metrics count only active code, that is, code that is not inside a preprocessor block.
In a preferred embodiment, in step S213, functions are labeled as follows: a label of 0 indicates that the function has no vulnerability, and a label of 1 indicates that the function contains a vulnerability.
In a preferred embodiment, the structure of the BLSTM network comprises an input layer, a BLSTM layer, a global max pooling layer, several fully connected layers, and an output layer.
In a preferred embodiment, the parameters to be tuned in the BLSTM network include: learning rate, number of epochs, batch size, number of units in each hidden layer, number of fully connected layers, number of units in each fully connected layer, activation functions of the hidden layers and fully connected layers, loss function, and optimizer (Optimizer).
Compared with the prior art, the beneficial effects of the technical solution of the present invention are:
The source code vulnerability detection method based on deep learning provided by the invention is highly automated, reduces the dependence on domain expert knowledge, lowers the cost of code auditing, and improves its efficiency. Compared with other methods that use deep learning for vulnerability detection, this method combines several representations of the code to retain as much of its syntactic and semantic information as possible, so that the features extracted automatically by the deep learning algorithm characterize the code better. Common code metrics are added as further detection features, which further improves detection performance.
Brief description of the drawings
Fig. 1 is the overall framework diagram of the source code vulnerability detection method based on deep learning provided by the invention;
Fig. 2 is the example function used in Embodiment 2 of the source code vulnerability detection method based on deep learning;
Fig. 3 shows the basic structures of structured programming referred to in Embodiment 2;
Fig. 4 shows the process of recursively reducing the basic structures of the example function in Embodiment 2;
Fig. 5 shows the extraction of the AST, PDG, and CFG of a function in Embodiment 2;
Fig. 6 is the network structure of the BLSTM in Embodiment 2.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention; they are for illustration only and shall not be construed as limiting this patent. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
The technical solution of the present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
Referring to Fig. 1, a source code vulnerability detection method based on deep learning comprises the following steps:
S1. Calculate the code metrics of each function in the source code file and combine them into a code metric vector V_cm;
S2. Taking the function in the source code as the basic unit, extract function features automatically using deep learning; extract the abstract syntax tree (AST), control flow graph (CFG), and program dependency graph (PDG) from the source code;
S21. Use a bidirectional long short-term memory network (BLSTM) to extract features automatically from the function AST;
S211. Traverse the AST depth-first and store the tokens in the AST, in traversal order, in a token vector;
S212. Use the set of tokens in all token vectors as a dictionary, apply word embedding to the tokens, and convert each AST token vector into a numerical vector. Select a suitable value as the fixed length according to the distribution of all vector lengths, and truncate or zero-pad the vectors accordingly so that the numerical vectors of all functions have a standardized length;
S213. Label each function according to whether it contains a vulnerability, and input the length-standardized vectors and their corresponding labels into the BLSTM network for training. Test the trained model using cross-validation and evaluate it using the F1 score as the evaluation metric. Tune the parameters and repeat the training and testing process; when the F1 score reaches its maximum, take the output of the global max pooling layer as the feature vector V_ast extracted from the function AST.
S22. Represent the extracted CFG and PDG as adjacency matrices and input each into a graph embedding model to obtain two fixed-length vectors V_cfg and V_pdg, which serve as the features extracted from the function CFG and PDG.
S3. Merge the vectors V_cm, V_ast, V_cfg, and V_pdg into a single vector V_f as the feature vector of the function, and input the feature vector V_f together with the label of the function into the random forest algorithm for training to obtain the final vulnerability detection model M(V_f);
S4. For each function in the source code to be detected, obtain the four vectors V_cm, V_ast, V_cfg, and V_pdg through S1 and S2, and concatenate them into the feature vector V_f as the input of the vulnerability detection model M(V_f). An output of 1 indicates that the function contains a vulnerability, and an output of 0 indicates that it does not.
In a preferred embodiment, the code metrics include count metrics and complexity metrics. The line count metrics are: total lines, code lines, blank lines, comment lines, preprocessor lines, inactive lines, and the comment-to-code line ratio. The statement count metrics are: total statements, declaration statements, executable statements, and empty statements. The complexity metrics are cyclomatic complexity, modified cyclomatic complexity, strict cyclomatic complexity, essential complexity, and maximum nesting depth.
In a preferred embodiment, the code line, blank line, comment line, and statement count metrics count only active code, that is, code that is not inside a preprocessor block.
In a preferred embodiment, in step S213, functions are labeled as follows: a label of 0 indicates that the function has no vulnerability, and a label of 1 indicates that the function contains a vulnerability.
In a preferred embodiment, the structure of the BLSTM network comprises an input layer, a BLSTM layer, a global max pooling layer, several fully connected layers, and an output layer.
Embodiment 2
This embodiment is consistent with Embodiment 1. The precondition for implementation is that a large software vulnerability database is available from which the vulnerability type and the location of each vulnerability in the source code can be determined. From this database, source code that contains a given type of vulnerability and is written in the same programming language can be collected as the data set.
A source code vulnerability detection method based on deep learning, comprising the following steps:
S1. Calculate the code metrics of each function in the source code. The code metrics of a function include count metrics and complexity metrics. The line count metrics are: total lines, code lines, blank lines, comment lines, preprocessor lines, inactive lines, and the comment-to-code line ratio. The statement count metrics are: total statements, declaration statements, executable statements, and empty statements. The code line, blank line, comment line, and statement count metrics count only active code, that is, code that is not inside a preprocessor block.
To make the calculation of these code metrics concrete, take the example function in Fig. 2 as the target function. Its total line count is 35, with 28 code lines, 3 blank lines, and 2 comment lines. The number of preprocessor lines is 2, namely lines 31 and 34 of the preprocessor block. The number of inactive lines is 2, namely lines 32 and 33 inside the preprocessor block. The comment-to-code line ratio is 2/28, or 0.07 to two decimal places. The total number of statements is 19, of which 16 are executable statements and 2 are declaration statements, namely int count = 5 on line 2 and int j = 0 on line 5. The number of empty statements is 1, i.e. lines 32 and 33 in the preprocessor block.
The complexity metrics are cyclomatic complexity, modified cyclomatic complexity, strict cyclomatic complexity, essential complexity, and maximum nesting depth. Cyclomatic complexity equals the number of decision points in the function plus 1; in C/C++ the decision points are if, for, while, case, catch, and the ternary operator ?. Modified cyclomatic complexity is calculated in the same way as cyclomatic complexity, except that each multi-way decision structure is counted as a whole: for a switch structure in C, the individual case labels are not counted and the whole structure counts as 1. Strict cyclomatic complexity equals the cyclomatic complexity plus the number of logical AND and logical OR operators in the conditional expressions of the decision points. Essential complexity is the cyclomatic complexity calculated after all basic structures in the function have been recursively reduced; the basic structures of a structured language are the sequence, selection, and loop structures, whose control flow graphs are shown in Fig. 3. Maximum nesting depth is the maximum nesting depth of the control structures in the function; for the example function in Fig. 2 it is 3.
Denote the cyclomatic complexity by v(G), the modified cyclomatic complexity by mv(G), the strict cyclomatic complexity by sv(G), and the essential complexity by ev(G). They are calculated as follows:
v(G) = dp(F) + 1 (1)
mv(G) = v(G) - c(F) + 1 (2)
sv(G) = v(G) + lce(F) (3)
ev(G) = v(SG) (4)
Here dp(F) is the number of decision points in the function, c(F) is the number of non-default branches of the multi-way branch structure in the function, lce(F) is the number of logical AND and logical OR operators at the decision points of the function, and v(SG) is the cyclomatic complexity of the function after its basic program structures have been recursively reduced. The recursive reduction of the example function in Fig. 2 is shown in Fig. 4: first extract the control flow graph of the function, then recursively reduce the control flow graph starting from the innermost level, where every basic structure can be reduced to a single flow node, and finally calculate the cyclomatic complexity of the reduced function.
As shown in Fig. 2, the function has 6 decision points, so dp(F) is 6. The multi-way decision structure switch has 2 non-default branches, so c(F) is 2. The conditional expressions contain one logical AND, so lce(F) is 1. By formulas (1), (2), and (3), v(G) of the example function is 7, mv(G) is 6, and sv(G) is 8. Reducing the control flow graph of the example function according to the basic structures in Fig. 3 yields a final reduced control flow graph whose cyclomatic complexity is 1, so ev(G) is 1.
Finally, store all calculated code metrics in the vector V_cm.
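Formulas (1) to (3) and the worked numbers for the example function can be checked with a short sketch. This is only an illustration: the class and function names below are hypothetical and not part of the patent, and the counts dp(F), c(F), and lce(F) are entered by hand rather than derived from source code. Formula (2) is transcribed exactly as given, which presumes the function contains one multi-way branch structure.

```python
from dataclasses import dataclass


@dataclass
class ComplexityCounts:
    """Hand-supplied counts for one function (names are illustrative)."""
    decision_points: int        # dp(F)
    non_default_branches: int   # c(F)
    logical_operators: int      # lce(F)


def v(counts: ComplexityCounts) -> int:
    # Formula (1): v(G) = dp(F) + 1
    return counts.decision_points + 1


def mv(counts: ComplexityCounts) -> int:
    # Formula (2): mv(G) = v(G) - c(F) + 1
    return v(counts) - counts.non_default_branches + 1


def sv(counts: ComplexityCounts) -> int:
    # Formula (3): sv(G) = v(G) + lce(F)
    return v(counts) + counts.logical_operators


# Counts stated in the text for the Fig. 2 example function.
example = ComplexityCounts(decision_points=6,
                           non_default_branches=2,
                           logical_operators=1)
v_g, mv_g, sv_g = v(example), mv(example), sv(example)
print(v_g, mv_g, sv_g)
```

The three values agree with those given in the text (7, 6, and 8). Essential complexity, formula (4), is omitted because it requires reducing the control flow graph itself.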
S2. Taking the function as the basic unit, extract function features automatically using deep learning. A static code analysis tool can be used to extract the abstract syntax tree (AST), control flow graph (CFG), and program dependency graph (PDG) of a function. Fig. 5 takes a simple function as an example and shows the AST, CFG, and PDG extracted from it;
S21. For the AST of the function, use a bidirectional long short-term memory network (BLSTM) to extract features from it automatically;
S211. First traverse the AST of the function depth-first and store the contents of the AST nodes, in traversal order, in a token vector. For example, a depth-first traversal of the AST in Fig. 5(b) yields the token vector [func, Fool, DECL, int, =, temp, CALL, number, IF, PRED, ==, temp, DATA, CALL, print, ARG, temp];
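The depth-first traversal in S211 can be sketched as a pre-order walk. The tree below is a hypothetical fragment for a declaration such as `int temp = number();`; its shape is an assumption made for illustration and is not taken from Fig. 5(b), and the tuple-based node representation is likewise not the patent's.

```python
def preorder_tokens(node):
    """Return node labels in depth-first (pre-order) order.

    A node is a (label, children) tuple; this representation is an
    assumption for illustration only.
    """
    label, children = node
    tokens = [label]
    for child in children:
        tokens.extend(preorder_tokens(child))
    return tokens


# Hypothetical AST fragment for: int temp = number();
ast_fragment = (
    "DECL", [
        ("int", []),
        ("=", [
            ("temp", []),
            ("CALL", [("number", [])]),
        ]),
    ],
)

tokens = preorder_tokens(ast_fragment)
print(tokens)
```

For very deep ASTs, an explicit stack may be preferable to recursion to stay within Python's recursion limit.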
S212. Build a dictionary from the tokens in all token vectors and encode each token using one-hot encoding as the input of the word embedding model word2vec. Let the total number of tokens be n and the embedding dimension be m. After one-hot encoding, the i-th token is represented by a vector v_i of length n whose i-th element is 1 and whose remaining elements are 0. After the word2vec model has been trained, it produces a word embedding matrix E_{n,m} of dimension (n, m). After word embedding, the i-th token becomes a vector V_i of length m, where V_i = v_i * E_{n,m}, and the AST token vector is thereby converted into a numerical vector. Because the ASTs of different functions differ in size, the extracted token vectors and their corresponding numerical vectors also differ in length, whereas the BLSTM requires fixed-length input. A suitable value is therefore selected as the fixed length according to the distribution of all vector lengths, and the vectors are truncated or zero-padded to standardize their length, giving the vector V_std;
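The relation V_i = v_i * E_{n,m} means that embedding a token simply selects row i of the embedding matrix. The sketch below shows this with plain Python lists, followed by the truncate-or-pad step. The matrix values are made up (no word2vec model is trained here), the function names are hypothetical, and applying the padding to a sequence of token indices is an assumption about where in the pipeline the standardization happens.

```python
def one_hot(index, size):
    """Length-`size` vector with a 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def embed(index, embedding_matrix):
    """Compute v_i * E for an (n, m) matrix stored as a list of rows."""
    n = len(embedding_matrix)
    m = len(embedding_matrix[0])
    v = one_hot(index, n)
    return [sum(v[r] * embedding_matrix[r][c] for r in range(n))
            for c in range(m)]


def standardize(sequence, fixed_length, pad_value=0):
    """Truncate to fixed_length, or right-pad with pad_value."""
    truncated = list(sequence[:fixed_length])
    return truncated + [pad_value] * (fixed_length - len(truncated))


# Made-up embedding matrix with n = 3 tokens and m = 2 dimensions.
E = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]

embedded = embed(1, E)                        # same values as row 1 of E
short = standardize([5, 3, 8], 5)             # padded with zeros
long_ = standardize([5, 3, 8, 1, 2, 9], 4)    # truncated
print(embedded, short, long_)
```

In practice the multiplication is replaced by a direct row lookup, which gives the same result without building the one-hot vector.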
S213. Label each function according to whether it contains a vulnerability: a label of 0 indicates that the function has no vulnerability, and a label of 1 indicates that it contains one. The data set consisting of the length-standardized vectors V_std and their corresponding labels is input into the BLSTM network for training and testing. The BLSTM model is evaluated with n repetitions of k-fold cross-validation using the F1 score as the evaluation metric, where n and k are chosen according to the size of the data set. The network structure of the BLSTM is shown in Fig. 6. The parameters to be tuned in the BLSTM network include: learning rate, number of epochs, batch size, number of units in each hidden layer, number of fully connected layers, number of units in each fully connected layer, activation functions of the hidden layers and fully connected layers, loss function, and optimizer;
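The evaluation scheme of n repetitions of k-fold cross-validation can be sketched as an index generator. This is a minimal illustration under stated assumptions: the function name, the seeding, and the round-robin fold assignment are choices made here, not details given in the patent, and no model is trained.

```python
import random


def repeated_k_fold(num_samples, k, repeats, seed=0):
    """Yield (train_indices, test_indices) for `repeats` rounds of k-fold CV.

    Each round reshuffles the sample indices and splits them into k folds;
    every fold serves once as the held-out test set.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(num_samples))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for held_out in range(k):
            test_idx = folds[held_out]
            train_idx = [i for j, fold in enumerate(folds)
                         if j != held_out for i in fold]
            yield train_idx, test_idx


# Hypothetical sizes: 10 samples, 5 folds, 2 repetitions.
splits = list(repeated_k_fold(num_samples=10, k=5, repeats=2))
print(len(splits))
```

Each split would be used to train the model on `train_idx` and compute the F1 score on `test_idx`; the scores are then averaged over all splits.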
To explain the calculation of the F1 score, denote the four possible prediction outcomes of the BLSTM model, true positive (True Positive), false positive (False Positive), true negative (True Negative), and false negative (False Negative), by TP, FP, TN, and FN respectively. A true positive means the test sample is actually a vulnerable function and is predicted to be vulnerable. A false positive means the test sample is actually a non-vulnerable function but is predicted to be vulnerable. A true negative means the test sample is actually a non-vulnerable function and is predicted to be non-vulnerable. A false negative means the test sample is actually a vulnerable function but is predicted to be non-vulnerable. F1 is calculated by the standard formula F1 = 2 * P * R / (P + R), where the precision P = TP / (TP + FP) and the recall R = TP / (TP + FN).
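The F1 score is the harmonic mean of precision and recall, both of which follow from the TP, FP, and FN counts defined above. A minimal sketch, in which the function name and the example counts are made up and returning 0.0 when there are no true positives is a convention chosen here:

```python
def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), with P = TP/(TP+FP) and R = TP/(TP+FN)."""
    if tp == 0:
        # Convention chosen for this sketch: no true positives gives F1 = 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 8 true positives, 2 false positives, 2 false negatives.
score = f1_score(tp=8, fp=2, fn=2)
print(round(score, 2))
```

True negatives do not enter the formula, which is why F1 is often preferred over accuracy when vulnerable functions are rare in the data set.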
Tune the parameters and repeat the training and testing process. When the cross-validated F1 score reaches its maximum, take the output vector of the global max pooling layer as the feature vector V_ast extracted from the function AST;
S22. Represent the extracted CFG and PDG as adjacency matrices. Use the adjacency matrices of the CFGs of all functions as the data set and as the input of the graph2vec graph embedding model, whose output is a fixed-length vector representation V_cfg of each CFG. Likewise, use the adjacency matrices of the PDGs of all functions as the data set and input them into the graph2vec graph embedding model to obtain a fixed-length vector representation V_pdg of each PDG. V_cfg and V_pdg then serve as the features extracted from the function CFG and PDG;
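The adjacency matrix representation used in S22 can be sketched as follows. The four-node graph is a hypothetical control flow graph for a function with a single `if` statement, not one of the patent's figures, and the graph embedding step itself (graph2vec) is not shown.

```python
def adjacency_matrix(num_nodes, edges):
    """Build a directed adjacency matrix: entry [s][d] is 1 for an edge s -> d."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        matrix[src][dst] = 1
    return matrix


# Hypothetical CFG: 0 = entry, 1 = if condition, 2 = then-branch, 3 = exit.
cfg_edges = [(0, 1), (1, 2), (1, 3), (2, 3)]
cfg = adjacency_matrix(4, cfg_edges)
for row in cfg:
    print(row)
```

A PDG can be encoded the same way, with edges standing for data and control dependencies instead of control flow.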
S3. Merge the vectors V_cm, V_ast, V_cfg, and V_pdg into a single vector V_f as the feature vector of the function, and input the feature vector V_f together with the label of the function into the random forest algorithm for training to obtain the final vulnerability detection model M(V_f);
S4. For each function in the source code to be detected, obtain the four vectors V_cm, V_ast, V_cfg, and V_pdg through steps S1 and S2, and concatenate them into the feature vector V_f as the input of the vulnerability detection model M(V_f). An output of 1 indicates that the function contains a vulnerability, and an output of 0 indicates that it does not.
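The feature fusion in S3 and S4 amounts to concatenating the four vectors in a fixed order and interpreting the model's binary output. The sketch below covers only those two pieces: the vector values are made up, the function names are hypothetical, and the random forest classifier is deliberately left out.

```python
def build_feature_vector(v_cm, v_ast, v_cfg, v_pdg):
    """Concatenate the four per-function vectors into V_f (order is fixed)."""
    return list(v_cm) + list(v_ast) + list(v_cfg) + list(v_pdg)


def describe_prediction(label):
    """Map the detector's output label to its meaning as stated in S4."""
    if label == 1:
        return "vulnerable"
    if label == 0:
        return "not vulnerable"
    raise ValueError(f"unexpected label: {label!r}")


# Made-up vectors; real ones come from S1 and S2.
v_f = build_feature_vector(v_cm=[35, 28, 3],
                           v_ast=[0.2, 0.7],
                           v_cfg=[0.1],
                           v_pdg=[0.9, 0.4])
print(len(v_f), describe_prediction(1))
```

The same concatenation order must be used at training time and at detection time, since the classifier treats each position of V_f as a distinct feature.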
Obviously, the above embodiments of the present invention are merely examples given to illustrate the invention clearly and do not limit its embodiments. Those of ordinary skill in the art can make other variations or changes in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the protection scope of the claims of the present invention.
Claims (7)
1. a kind of source code leak detection method based on deep learning, which comprises the following steps:
S1. the code metric index of function in source code file is calculated, and is integrated into a code metric vector Vcm;
S2. using the function in source code as basic unit, the automatic extraction of Function feature is completed using deep learning method;Make
The pumping of function abstract syntax tree AST, controlling stream graph CFG and program dependency graph PDG can be completed with code static analysis tool
It takes and is successively converted into numerical value vector Vast、Vcfg、Vpdg。
S3. by vector Vcm、Vast、Vcfg、VpdgIt is merged into a vector VfAs the feature vector of function, by the feature of function to
Measure VfTraining in random deep woods algorithm, which is input to, with the label of function obtains final Hole Detection model M (Vf);
S4. for the function in source code to be detected, four vector V are obtained by S1 and S2cm、Vast、Vcfg、Vpdg, it is spliced into
Feature vector VfAs Hole Detection model M (Vf) input, output result be 1 representative function there are loophole, input results 0
Loophole is not present in representative function.
2. the source code leak detection method according to claim 1 based on deep learning, which is characterized in that the S2
Specific step is as follows:
S21. two-way length memory network automatic extraction feature from function AST in short-term is used;
S211. the traversing operation that depth-first is carried out to AST, the mark in AST is stored in order in a conceptual vector;
Mark is carried out word insertion and turns AST conceptual vector by the set S212. indicated using in all conceptual vectors as dictionary
Numerical value vector is turned to, selects suitable numerical value as regular length according to the distribution of all vector lengths, vector is carried out with this
It cuts or 0 padding is by the length normalization of the numerical value vector of all functions;
S213. loophole whether there is according to function, label is added for function, by the corresponding mark of vector sum vector of length normalization
Label are input to two-way length and are trained in memory network in short-term;Trained model is tested simultaneously using the method for cross validation
F1 value is used to carry out model evaluation as evaluation index;Tuning parameter, the training of duplication model, test process, when F1 value reaches
When maximum value, by the output of global maximum pond layer as the feature vector V extracted from function ASTast。
S22. CFG, PDG of extraction are indicated and are separately input to figure incorporation model to obtain two regular lengths with adjacency matrix
Vector Vcfg、Vpdg, as the feature extracted from function CFG and PDG.
3. the source code leak detection method according to claim 1 based on deep learning, which is characterized in that the generation
Code Measure Indexes include statistical indicator and complexity index, and wherein there is line number statistical indicator in statistical indicator: total line number, code line
Number, blank line number, annotation line number, pretreatment lines of code, inactive line number, annotation and lines of code ratio, sentence number statistics refer to
Indicate: total sentence number, executes sentence number, null statement number at declarative statement number;Complexity index includes circulation complexity, corrects and follow
Ring complexity, weighted shift complexity, essential complexity and the maximum nested number of plies.
4. the source code leak detection method according to claim 3 based on deep learning, which is characterized in that the generation
Code line number, blank line number, annotation line number and sentence number statistical indicator statistical activity code.
5. the source code leak detection method according to claim 2 based on deep learning, which is characterized in that in step
In S213, the label of function addition is as follows, and label is that 0 representative function does not have loophole, and there are loopholes for 1 representative function.
6. the source code leak detection method according to claim 2 based on deep learning, which is characterized in that described is double
Network structure to long memory network in short-term include an input layer, one BLSTM layer, it is a global maximum pond layer, several
A full articulamentum and an output layer.
7. The deep learning-based source code vulnerability detection method according to claim 6, characterized in that the parameters to be tuned in the BLSTM network include: the learning rate, the number of epochs, the batch size, the number of units in each hidden layer, the number of fully connected layers, the number of units in the fully connected layers, the activation functions of the hidden layers and fully connected layers, the loss function, and the optimizer.
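One way to keep the tunable parameters of claim 7 together is a small configuration object plus a candidate grid, as in the sketch below. Every default value here is a placeholder chosen for illustration, not a value taken from the patent, and the names `BLSTMConfig` and `grid` are hypothetical.

```python
from dataclasses import dataclass, replace
from itertools import product
from typing import List


@dataclass(frozen=True)
class BLSTMConfig:
    # All defaults are illustrative placeholders, not values from the patent.
    learning_rate: float = 1e-3
    epochs: int = 10
    batch_size: int = 32
    hidden_units: int = 64
    num_dense_layers: int = 1
    dense_units: int = 32
    activation: str = "relu"
    loss: str = "binary_crossentropy"
    optimizer: str = "adam"


def grid(base: BLSTMConfig, learning_rates: List[float], batch_sizes: List[int]) -> List[BLSTMConfig]:
    """Enumerate candidate configurations that vary two of the tunable parameters."""
    return [replace(base, learning_rate=lr, batch_size=bs)
            for lr, bs in product(learning_rates, batch_sizes)]


candidates = grid(BLSTMConfig(), [1e-2, 1e-3], [16, 32])
for cfg in candidates:
    print(cfg.learning_rate, cfg.batch_size)
```

Each candidate would then be trained and compared on a validation set; the remaining fields can be varied in the same way.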
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910214764.3A CN110011986B (en) | 2019-03-20 | 2019-03-20 | Deep learning-based source code vulnerability detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110011986A (en) | 2019-07-12 |
CN110011986B (en) | 2021-04-02 |
Family
ID=67167516
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910214764.3A Active CN110011986B (en) | 2019-03-20 | 2019-03-20 | Deep learning-based source code vulnerability detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110011986B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018016671A2 (en) * | 2016-07-20 | 2018-01-25 | 주식회사 이븐스타 | Dangerous code detection system for checking security vulnerability and method thereof |
CN107885999A (en) * | 2017-11-08 | 2018-04-06 | 华中科技大学 | A kind of leak detection method and system based on deep learning |
CN108446540A (en) * | 2018-03-19 | 2018-08-24 | 中山大学 | Program code based on source code multi-tag figure neural network plagiarizes type detection method and system |
Non-Patent Citations (1)
Title |
---|
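Li Zhen (李珍): "Survey on static software vulnerability detection for source code" (面向源代码的软件漏洞静态检测综述), Chinese Journal of Network and Information Security (《网络与信息安全学报》) * |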
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021037196A1 (en) * | 2019-08-28 | 2021-03-04 | 杭州趣链科技有限公司 | Smart contract code vulnerability detection method and apparatus, computer device and storage medium |
CN110737899A (en) * | 2019-09-24 | 2020-01-31 | 暨南大学 | machine learning-based intelligent contract security vulnerability detection method |
CN110737899B (en) * | 2019-09-24 | 2022-09-06 | 暨南大学 | Intelligent contract security vulnerability detection method based on machine learning |
CN110825642A (en) * | 2019-11-11 | 2020-02-21 | 浙江大学 | Software code line-level defect detection method based on deep learning |
CN111259394A (en) * | 2020-01-15 | 2020-06-09 | 中山大学 | Fine-grained source code vulnerability detection method based on graph neural network |
CN111259394B (en) * | 2020-01-15 | 2022-08-05 | 中山大学 | Fine-grained source code vulnerability detection method based on graph neural network |
CN113742205A (en) * | 2020-05-27 | 2021-12-03 | 南京大学 | Code vulnerability intelligent detection method based on man-machine cooperation |
CN113742205B (en) * | 2020-05-27 | 2024-04-23 | 南京大学 | Code vulnerability intelligent detection method based on man-machine cooperation |
CN111832028A (en) * | 2020-07-27 | 2020-10-27 | 中国工商银行股份有限公司 | Code auditing method and device, electronic equipment and medium |
CN112883378A (en) * | 2021-03-30 | 2021-06-01 | 北京理工大学 | Android malicious software detection method integrating graph embedding and deep neural network |
CN112883378B (en) * | 2021-03-30 | 2023-02-10 | 北京理工大学 | Android malicious software detection method integrating graph embedding and deep neural network |
CN113342318A (en) * | 2021-04-19 | 2021-09-03 | 山东师范大学 | Fine-grained code automatic generation method and system based on multi-view code characteristics |
CN113220286B (en) * | 2021-04-27 | 2022-04-19 | 浙大城市学院 | Evaluation method of graphical programming product |
CN113220286A (en) * | 2021-04-27 | 2021-08-06 | 浙大城市学院 | Evaluation method of graphical programming product |
CN113378178A (en) * | 2021-06-21 | 2021-09-10 | 大连海事大学 | Deep learning-based graph confidence learning software vulnerability detection method |
CN113378178B (en) * | 2021-06-21 | 2023-08-22 | 大连海事大学 | Deep learning-based graph self-confidence learning software vulnerability detection method |
CN113448857A (en) * | 2021-07-09 | 2021-09-28 | 北京理工大学 | Software code quality measurement method based on deep learning |
CN113448857B (en) * | 2021-07-09 | 2022-03-22 | 北京理工大学 | Software code quality measurement method based on deep learning |
CN115130110A (en) * | 2022-07-08 | 2022-09-30 | 国网浙江省电力有限公司电力科学研究院 | Vulnerability mining method, device, equipment and medium based on parallel ensemble learning |
CN115130110B (en) * | 2022-07-08 | 2024-03-19 | 国网浙江省电力有限公司电力科学研究院 | Vulnerability discovery method, device, equipment and medium based on parallel integrated learning |
Also Published As
Publication number | Publication date |
---|---|
CN110011986B (en) | 2021-04-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||