CN113297584A

CN113297584A - Vulnerability detection method, device, equipment and storage medium

Info

Publication number: CN113297584A
Application number: CN202110855058.4A
Authority: CN
Inventors: 贾鹏; 王炎; 刘嘉勇
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2021-07-28
Filing date: 2021-07-28
Publication date: 2021-08-24

Abstract

The embodiments of the application provide a vulnerability detection method, apparatus, device and storage medium, and relate to the technical field of network information security. The method comprises the following steps: first, the binary code of the program to be detected is decompiled into pseudo code. A dangerous function is then detected in the pseudo code and taken as the slicing point, the slice code related to the call of the dangerous function is extracted and converted into a vector representation, and the vectorized slice code is fed to a detection neural network that judges whether the program to be detected contains a vulnerability. The method is applicable to cross-architecture and cross-platform binary code vulnerability identification. It realizes fine-grained vulnerability detection at the binary code level, extracts features automatically, mitigates the high false positive rate caused by different compilation options and patch code, and achieves very high accuracy with very low false positive and false negative rates.

Description

Vulnerability detection method, device, equipment and storage medium

Technical Field

The embodiment of the application relates to the technical field of network information security, in particular to a vulnerability detection method, device, equipment and storage medium.

Background

Many current network attacks are carried out by exploiting vulnerabilities, so vulnerability discovery is an important research direction in the security field.

Because software systems contain a large number of reused code libraries or shared code logic (e.g., similar objects have similar processing logic in different uses), real programs contain many recurring vulnerabilities that share similar characteristics but remain undiscovered. Moreover, many developers do not perform in-depth security analysis to discover potential vulnerabilities when reusing libraries. As a result, the detection of recurring vulnerabilities has attracted wide attention, particularly as vulnerabilities become more exploitable.

Existing detection of cloned vulnerabilities in binary code falls mainly into two approaches: pattern matching and code similarity. Pattern matching requires vulnerability patterns defined in advance by experts. Code similarity detection builds a vulnerability library in advance and compares unknown code against it; if the similarity is high, the code is deemed to contain a vulnerability, and otherwise it is deemed normal. However, these methods suffer from high false positive or false negative rates and low accuracy.

Therefore, those skilled in the art urgently need a new detection method with low false positive and false negative rates and high detection accuracy.

Disclosure of Invention

The embodiment of the application provides a vulnerability detection method, a vulnerability detection device, vulnerability detection equipment and a storage medium, and aims to solve at least one technical problem.

A first aspect of an embodiment of the present application provides a vulnerability detection method, where the method includes:

decompiling a program to be detected to obtain a pseudo code of the program to be detected;

detecting whether the pseudo code contains a dangerous function;

when a dangerous function is contained, extracting a forward slice code segment and a backward slice code segment of the dangerous function;

combining the forward slice code segment and the backward slice code segment to obtain a complete slice code of the dangerous function;

converting the complete slice code into a vector representation;

and inputting the vector representation into a detection neural network to detect whether the program to be detected contains a vulnerability.

Optionally, the detection neural network comprises a BiGRU layer, a self-attention layer, a flattening layer, a fully connected layer and an activation layer, and the detection neural network is pre-trained.

Optionally, extracting the forward slice code segment and the backward slice code segment of the dangerous function includes:

extracting a control dependency graph and a data dependency graph of each function in the pseudo code, and constructing a program dependency graph;

analyzing parameters and return values of dangerous function call to perform forward slicing based on the program dependency graph to obtain forward slicing code segments;

and analyzing parameters and return values of the dangerous function call to perform backward slicing based on the program dependency graph, and obtaining backward slicing code segments.

Optionally, the method further comprises:

removing all non-ASCII characters and comments in the pseudo code;

and performing symbolization processing on the variable name and the function name.

A second aspect of the embodiments of the present application provides a vulnerability detection apparatus, the apparatus including:

a decompiling module, configured to decompile the program to be detected to obtain a pseudo code of the program to be detected;

a dangerous function determination module, configured to detect whether the pseudo code contains a dangerous function;

a slicing module, configured to extract a forward slice code segment and a backward slice code segment of the dangerous function when a dangerous function is contained;

a slice code combining module, configured to combine the forward slice code segment and the backward slice code segment to obtain a complete slice code of the dangerous function;

a vector representation conversion module, configured to convert the complete slice code into a vector representation;

and a detection neural network, configured to receive the vector representation so as to detect whether the program to be detected contains a vulnerability.

Optionally, the slicing module includes:

a program dependency graph constructing submodule, configured to extract a control dependency graph and a data dependency graph of each function in the pseudo code and construct a program dependency graph;

a forward slice code segment obtaining submodule, configured to analyze the parameters and return value of the dangerous function call based on the program dependency graph and perform forward slicing to obtain the forward slice code segment;

and a backward slice code segment obtaining submodule, configured to analyze the parameters and return value of the dangerous function call based on the program dependency graph and perform backward slicing to obtain the backward slice code segment.

Optionally, the apparatus further comprises:

a removing module, configured to remove all non-ASCII characters and comments in the pseudo code;

and a symbolization processing module, configured to symbolize the variable names and the function names.

A third aspect of embodiments of the present application provides a readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps in the method according to the first aspect of the present application.

A fourth aspect of the embodiments of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps in the method according to the first aspect of the present application.

With the vulnerability detection method provided by the application, the binary code of the program to be detected is first decompiled into pseudo code. A dangerous function is then detected in the pseudo code and taken as the slicing point, the slice code related to the call of the dangerous function is extracted and converted into a vector representation, and the vectorized slice code is fed to a detection neural network that judges whether the program to be detected contains a vulnerability. The method is applicable to cross-architecture and cross-platform binary code vulnerability identification. It realizes fine-grained vulnerability detection at the binary code level, extracts features automatically, mitigates the high false positive rate caused by different compilation options and patch code, and achieves very high accuracy with very low false positive and false negative rates.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flowchart of a vulnerability detection method according to an embodiment of the present application;

FIG. 2 is a flowchart of slicing a dangerous function according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a detection neural network according to an embodiment of the present application;

FIG. 4 is a schematic diagram of the functional modules of a vulnerability detection apparatus according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The existing binary code clone vulnerability detection techniques still have some problems, for example:

First, pattern matching requires experts to define vulnerability characteristics, which is costly in manpower and resources and usually produces a high false positive or false negative rate.

Second, detection based on code similarity can only find vulnerabilities arising from code reuse; vulnerabilities that have different code structures but similar triggering scenarios lead to a high false positive rate.

Third, because of vulnerability patches, the difference between vulnerable code and patched code is very small, and detecting vulnerabilities at function granularity leads to a high false positive rate.

Therefore, current binary code vulnerability detection methods cannot meet the demand for high-precision detection; in particular, when binary code is compiled with different compilation options, the detection precision of existing techniques drops further.

To address these problems in the prior art, the application provides a new vulnerability detection method that can identify more similar vulnerabilities and resist the influence of different compilation options and patch code.

Referring to fig. 1, fig. 1 is a flowchart of a vulnerability detection method according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:

Step S110: decompiling the program to be detected to obtain the pseudo code of the program to be detected.

Decompilation, in full reverse compilation, is a form of software reverse engineering: by analyzing a target program (such as an executable) it derives the design elements used by the software, such as its ideas, principles, structures, algorithms, processing procedures and operation methods, and under certain conditions may recover the source code. Decompilation is the inverse of compilation. Because the code of an application program is machine code, the relationships between its functions are hard to understand directly, so the program to be detected needs to be decompiled before subsequent detection.

In the prior art, a binary program is usually converted into the corresponding assembly code, but each assembly instruction carries little semantic information and is easily affected by different architectures, platforms and compilation options. To avoid these drawbacks and support cross-architecture, cross-platform binary code vulnerability identification, the present method decompiles the binary code of the program to be detected directly into pseudo code. The pseudo code has the following advantages: it resembles ordinary C code, it supports the corresponding syntactic and semantic analysis, and the recovered pseudo code is highly similar to the source code.

Step S120: detecting whether the pseudo code contains a dangerous function.

The obtained pseudo code is checked for dangerous functions, a dangerous function being a predefined high-risk function. Because not all library/API functions give rise to vulnerabilities, analyzing every library/API-related function in the program is unnecessary and inefficient; only the parts that call dangerous functions need to be checked. Once the dangerous library/API functions are determined, the functions containing calls to them are extracted and analyzed further to determine whether a vulnerability exists.

In an embodiment of the application, detection is performed by matching against a data set: the vulnerability triggering conditions caused by improper use of functions in all common vulnerability types are analyzed, the functions involved are summarized, and a set of dangerous library/API functions is constructed. The data set summarizes 66 common dangerous library/API functions: access, alloca, alert, calloc, close, connect, execute, execlp, fclose, fgets, fopen, fprintf, fputc, fread, free, freopen, fscafcaff, fwrite, getev, gets, listen, malloc, memcpy, memmset, mkstemp, mktemp, openn, pop, printf, putc, tchpuar, putev, puts, RAND32, RAND64, realloc, remove, rename, scanf, setsockopt, snprintf, sprintf, sqrt, ssf, sstchthredfree, Loacthredstredcastquardt, Louth, strostryredtstr, stryre, str, scriptmark, script, netdown, copy, replay, copy, runtime, replay.

When a dangerous function from this set is detected in the pseudo code (that is, the program calls a library function through the API and the called function is recorded in the dangerous function set), the program to be detected is at risk of containing a vulnerability and needs to be analyzed further.
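The set-matching check of step S120 can be sketched as follows. This is a minimal illustration rather than the embodiment's implementation: the regular expression, the helper name `find_dangerous_calls`, the sample pseudo code and the small subset of function names are all assumptions made for the example.

```python
import re

# Illustrative subset only; the embodiment's set is described as having 66 entries.
DANGEROUS_FUNCTIONS = {"fgets", "gets", "memcpy", "sprintf", "scanf", "malloc", "free"}

# An identifier immediately followed by "(" is treated as a call site.
CALL_PATTERN = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def find_dangerous_calls(pseudo_code, dangerous=DANGEROUS_FUNCTIONS):
    """Return (line_number, function_name) pairs for calls to dangerous functions."""
    hits = []
    for line_no, line in enumerate(pseudo_code.splitlines(), start=1):
        for name in CALL_PATTERN.findall(line):
            if name in dangerous:
                hits.append((line_no, name))
    return hits


# Hypothetical decompiler-style pseudo code used only for this example.
SAMPLE = """void FUN_00101189(char *param_1)
{
  char local_28[32];
  fgets(local_28, 64, stdin);
  memcpy(param_1, local_28, 64);
  return;
}"""

print(find_dangerous_calls(SAMPLE))
```

Each reported line number can then serve as a slicing point for step S130.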

Step S130: when a dangerous function is contained, extracting a forward slice code segment and a backward slice code segment of the dangerous function.

When the pseudo code is found to contain a dangerous function, program slicing is applied to the dangerous function to obtain its forward slice code segment and backward slice code segment. Program slicing is a program analysis technique that decomposes a program in order to extract the code fragments satisfying certain constraints. By finding the relevant characteristics in the program and studying the resulting slices, the program as a whole can be understood. In short, slicing keeps the code relevant to a point of interest and discards the irrelevant parts, which facilitates debugging, testing and maintenance.

The present method slices starting from the statements that may cause a vulnerability, so the statements that actually cause the vulnerability can be located in subsequent analysis. Compared with existing methods that analyze at function granularity, the analysis granularity is finer, and more accurate detection can therefore be achieved.

In one embodiment of the present application, the program slicing process is illustrated in FIG. 2. Extracting the forward slice code segment and the backward slice code segment of the dangerous function comprises:

Step A, extracting a control dependency graph and a data dependency graph of each function in the pseudo code, and constructing a program dependency graph.

A control dependency graph and a data dependency graph are extracted for each function in the pseudo code. A control dependency exists when, for example, whether statement a is executed is determined by the execution result of statement b; a data dependency exists when, for example, statement a reads a variable written by statement b. The program dependency graph of the program to be detected is constructed from the control dependency graph and the data dependency graph of each function.

As shown in FIG. 2, when a dangerous function call is detected in the pseudo code, for example when line 5 is found to contain a dangerous function call, that line of code is extracted.

Step B, analyzing the parameters and return value of the dangerous function call based on the program dependency graph to perform forward slicing and obtain the forward slice code segment.

Forward slicing is performed on the dangerous function based on the program dependency graph constructed in step A. For a given point of interest, a forward slice consists of all statements and predicates that are affected by the value of a variable at that point. The call parameters and return value of the dangerous function are analyzed to construct a set affect(v/n), where v is a variable output by the dangerous function and n is the point of interest; this set is the forward slice code segment of the dangerous function.

Step C, analyzing the parameters and return value of the dangerous function call based on the program dependency graph to perform backward slicing and obtain the backward slice code segment.

Backward slicing is the opposite of forward slicing: it constructs a set select(v/n) consisting of all statements and predicates that affect v at point n, where v is a variable received by the dangerous function and n is the point of interest.

When the pseudo code contains multiple dangerous functions, each detected dangerous function is sliced so that a complete slice is obtained and analyzed for every dangerous function call.

As shown in FIG. 2, forward and backward slicing are performed with that line of code and its position as the slicing reference, yielding the forward slice code segment and the backward slice code segment.
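Steps A to C can be illustrated with a small graph sketch. It assumes the control and data dependencies have already been extracted as (source line, dependent line) edges; the function names and the example edges are hypothetical, and plain graph reachability stands in here for the variable-level analysis of parameters and return values described above.

```python
from collections import defaultdict, deque


def build_pdg(control_edges, data_edges):
    """Merge control and data dependency edges; (src, dst) means dst depends on src."""
    successors, predecessors = defaultdict(set), defaultdict(set)
    for src, dst in list(control_edges) + list(data_edges):
        successors[src].add(dst)
        predecessors[dst].add(src)
    return successors, predecessors


def _reachable(start, neighbours):
    """Breadth-first search over the given adjacency map, excluding the start node."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


def forward_slice(pdg, point):
    """Lines affected by the statement at `point`."""
    return sorted(_reachable(point, pdg[0]))


def backward_slice(pdg, point):
    """Lines that affect the statement at `point`."""
    return sorted(_reachable(point, pdg[1]))


# Hypothetical dependencies for a function whose line 5 calls a dangerous function.
pdg = build_pdg(control_edges=[(2, 5)],
                data_edges=[(1, 4), (3, 5), (4, 5), (5, 6)])
print(backward_slice(pdg, 5), forward_slice(pdg, 5))
```

Keeping both successor and predecessor maps lets the same traversal serve both slicing directions.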

Step S140: combining the forward slice code segment and the backward slice code segment to obtain the complete slice code of the dangerous function.

After the forward slice code segment and the backward slice code segment of the dangerous function are obtained, they are assembled in code order, with repeated code statements removed, into the final complete slice code of the dangerous library/API function call. As shown in FIG. 2, assembling the code obtained by slicing in code order yields the complete slice code.
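The assembly in step S140 amounts to an ordered, duplicate-free merge. A minimal sketch, assuming slices are represented as lists of 1-based line numbers (a representation chosen for this example, not stated in the source):

```python
def assemble_slice(source_lines, backward, call_line, forward):
    """Merge slices in original line order, keeping each line number once."""
    ordered = sorted(set(backward) | {call_line} | set(forward))
    return [source_lines[n - 1] for n in ordered]


# Hypothetical six-line function; line 4 holds the dangerous call.
lines = ["a = input();", "b = 0;", "n = len(a);", "memcpy(dst, a, n);", "log();", "use(dst);"]
print(assemble_slice(lines, backward=[1, 3], call_line=4, forward=[3, 6]))
```

Line 3 appears in both slices but is emitted only once, and lines 2 and 5 are dropped because neither slice contains them.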

In an embodiment of the present application, after the complete slice code is obtained, the slice code segment is further symbolized as follows:

1. All non-ASCII characters and comments in the pseudo code are removed.

During decompilation the decompilation tool automatically adds some comments, and some abnormal conditions can leave non-ASCII characters in the decompiled pseudo code. These are useless for vulnerability detection or interfere with it, and need to be removed.

2. The variable names and function names are symbolized.

Variable names in the pseudo code are assigned by the decompiler. Although the naming scheme is fixed, some names are derived from memory addresses, which enlarges the vocabulary of the word vector model and hinders extension of the vulnerability detection model. The variable names therefore need to be symbolized uniformly.

The variable name symbolization rule is as follows: each variable name is replaced by the symbol "VAR" + "number", where the number is the order in which the variable first appears in the slice code, starting from 1, as in "VAR1". For example, a variable named local_1b in the code fragment is renamed VAR1.

Function names are also symbolized, but only user-defined ones: library/API function names are highly relevant to vulnerabilities and are kept, whereas user-defined names vary widely and this variation needs to be reduced. The rule is as follows: each user-defined function name is replaced by the symbol "FUN" + "number", where the number again is the order in which the function name first appears in the slice code. Unlike variable numbering, function numbering starts from 0: "FUN0" is reserved for the function in which the slice code is located, and the user-defined functions appearing within the code segment are numbered from "FUN1". For example, the name of the function in which the slice code is located, char_param_1, is replaced with FUN0, and any user-defined functions in the code segment are replaced starting from FUN1.
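The symbolization rules can be sketched with regular expressions. This is a simplified illustration under several assumptions: the keyword list is partial, an identifier followed by "(" is treated as a function name, string literals are not handled, and the names `symbolize`, `C_KEYWORDS` and the sample code are invented for the example.

```python
import re

# Partial list of C keywords and types that must not be renamed (assumption).
C_KEYWORDS = {"void", "char", "int", "long", "unsigned", "return", "if", "else",
              "while", "for", "sizeof", "struct"}


def symbolize(slice_code, enclosing_function, library_functions):
    """Strip comments and non-ASCII characters, then rename identifiers."""
    code = re.sub(r"/\*.*?\*/", "", slice_code, flags=re.S)   # block comments
    code = re.sub(r"//[^\n]*", "", code)                      # line comments
    code = code.encode("ascii", "ignore").decode("ascii")     # non-ASCII characters

    names = {enclosing_function: "FUN0"}   # FUN0 is reserved for the enclosing function
    counters = {"VAR": 0, "FUN": 0}

    def rename(match):
        name = match.group(0)
        if name in C_KEYWORDS or name in library_functions:
            return name
        if name not in names:
            is_call = match.string[match.end():].lstrip().startswith("(")
            kind = "FUN" if is_call else "VAR"
            counters[kind] += 1
            names[name] = f"{kind}{counters[kind]}"
        return names[name]

    return re.sub(r"\b[A-Za-z_]\w*", rename, code)


SAMPLE = """void process(char *param_1)
{
  char local_1b[16];  /* buffer */
  helper(local_1b);
  memcpy(local_1b, param_1, 32);
}"""

print(symbolize(SAMPLE, "process", {"memcpy"}))
```

In this sample `param_1` appears first and becomes VAR1, `local_1b` becomes VAR2, the user-defined `helper` becomes FUN1, and the library function `memcpy` is left unchanged.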

Step S150: converting the complete slice code into a vector representation.

The complete slice code of the dangerous function is input into a pre-trained word vector model to obtain the vector representation of the slice code. In one embodiment of the present application, independent units in the slice code, such as void and fun0, are fed to the word vector model as words, and the model converts each input word, for example fun0, into a vector. The word vector model is a Word2Vec model in skip-gram mode. It is trained in advance: a base Word2Vec model is trained on a corpus of pseudo code to obtain a word vector model that can be used to convert pseudo code.
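In skip-gram mode, Word2Vec learns word vectors by predicting the context tokens around each center token. The sketch below shows only how such (center, context) training pairs are formed from a token sequence; the window size and the function name are assumptions, and the actual vector training would be left to a Word2Vec implementation.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbours within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["void", "FUN0", "(", "char"], window=1))
```

Tokens that occur in similar contexts across the pseudo code corpus end up with similar vectors, which is what lets symbolized names such as FUN0 carry usable meaning for the detection network.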

Converting the complete slice code into a vector representation comprises:

performing lexical analysis on the symbolized slice code, so that each slice code is decomposed into a fixed number of tokens.

A token, or lexical unit, is the product of lexically scanning a program and represents a syntactic element. The slice code is analyzed and converted into a sequence of tokens, and the number of tokens input for each slice code is fixed at 500: if a slice has more than 500 tokens, only the first 500 are kept, and if it has fewer than 500, it is padded with 0.
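The lexical analysis and length normalization can be sketched as follows. The token pattern is a deliberately simple assumption (multi-character operators are split into single characters), and mapping tokens to integer ids with 0 reserved for padding is one possible encoding chosen for the example, not necessarily the embodiment's.

```python
import re

# Identifiers, hexadecimal or decimal numbers, then any single non-space character.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|0[xX][0-9a-fA-F]+|\d+|\S")
MAX_TOKENS = 500
PAD = 0


def tokenize(code):
    """Split slice code into a flat list of tokens."""
    return TOKEN_PATTERN.findall(code)


def to_fixed_length(token_ids, length=MAX_TOKENS, pad=PAD):
    """Keep the first `length` ids, or right-pad with `pad` up to `length`."""
    return token_ids[:length] + [pad] * max(0, length - len(token_ids))


def encode(code, vocab):
    """Map tokens to ids (starting at 1 so that 0 stays free for padding)."""
    ids = [vocab.setdefault(tok, len(vocab) + 1) for tok in tokenize(code)]
    return to_fixed_length(ids)


vocab = {}
encoded = encode("memcpy(VAR1, VAR2, 0x20);", vocab)
print(tokenize("memcpy(VAR1, VAR2, 0x20);"), len(encoded))
```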

Step S160: inputting the vector representation into a detection neural network to detect whether the program to be detected contains a vulnerability.

The vector representation obtained in step S150 is input into the detection neural network, which determines from the vector representation whether the program to be detected contains a vulnerability.

In one embodiment of the present application, the detection neural network is shown in FIG. 3 and includes a BiGRU layer, a self-attention layer, a flattening layer, a fully connected layer and an activation layer; the detection neural network is pre-trained.

GRU (Gated Recurrent Unit) is a type of recurrent neural network (RNN). Like LSTM (Long Short-Term Memory), it was proposed to address long-term dependencies and the gradient problems that arise in back-propagation. In a unidirectional architecture, state is always propagated from front to back. A bidirectional network instead determines the current output from both the preceding and the following time steps, which facilitates extraction of deep features from text.

A vulnerability is often caused by several semantically related code statements that lie far apart. To capture these long-distance semantic associations, a BiGRU network, which is better at capturing contextual semantic information in long text, is used to establish the contextual semantic associations. The BiGRU (Bidirectional Gated Recurrent Unit) used in the present application is a neural network model composed of two unidirectional GRUs running in opposite directions. At each time step, the input is supplied to both GRUs simultaneously, and the output is determined jointly by the two.

In one embodiment of the present application, the BiGRU part mainly comprises 2 BiGRU layers, each with 256 nodes.
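To make the bidirectional idea concrete, the following pure-Python sketch runs one GRU front to back and another back to front, then concatenates their states at each time step. It is a toy under stated assumptions: random untrained weights, a hidden size of 3 instead of 256, a single layer instead of 2, and a standard GRU gate formulation that the source does not spell out.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def make_params(input_size, hidden_size, seed):
    """Random toy weights; a trained model would learn these."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate_name in ("z", "r", "n"):
        params["W" + gate_name] = mat(hidden_size, input_size)
        params["U" + gate_name] = mat(hidden_size, hidden_size)
        params["b" + gate_name] = [0.0] * hidden_size
    return params


def gate(p, name, x, h, activation):
    pre = zip(matvec(p["W" + name], x), matvec(p["U" + name], h), p["b" + name])
    return [activation(a + b + c) for a, b, c in pre]


def gru_step(x, h, p):
    z = gate(p, "z", x, h, sigmoid)              # update gate
    r = gate(p, "r", x, h, sigmoid)              # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    n = gate(p, "n", x, reset_h, math.tanh)      # candidate state
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def run_gru(sequence, p, hidden_size):
    h, outputs = [0.0] * hidden_size, []
    for x in sequence:
        h = gru_step(x, h, p)
        outputs.append(h)
    return outputs


def bigru(sequence, fwd, bwd, hidden_size):
    """Concatenate forward and backward GRU states at each time step."""
    forward = run_gru(sequence, fwd, hidden_size)
    backward = run_gru(sequence[::-1], bwd, hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]


seq = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4]]
fwd, bwd = make_params(2, 3, seed=1), make_params(2, 3, seed=2)
out = bigru(seq, fwd, bwd, 3)
print(len(out), len(out[0]))
```

The backward half of the first output already depends on the last input, which is how a statement early in a slice can be related to one that occurs much later.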

Although BiGRU extracts long-distance contextual semantic information from slice code very well, the different time steps in BiGRU differ in importance: code statements that are more relevant to the vulnerability should contribute more. To reflect this, the detection neural network also adopts a self-attention mechanism, which processes sequential data efficiently and takes the context of each time step into account. The self-attention layer adopts a sigmoid activation function. Its calculation involves the following quantities: the query data and the input matrix of the query data; the key data and the input matrix of the key data; the transpose operation T; the query parameter, the key parameter and the attention parameter; two bias values; an intermediate representation of the self-attention value; the self-attention value of one input relative to another; the final output obtained after multiplying by the attention value; and the sigmoid activation function.
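One additive self-attention formulation that is consistent with the quantities listed for this layer can be sketched as follows. It is an assumption, not the patent's formula: the symbol names (`Wq`, `Wk`, `bh`, `wa`, `ba`), the tanh intermediate representation and the softmax normalization of the sigmoid scores are all choices made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def self_attention(xs, Wq, Wk, bh, wa, ba):
    """Additive self-attention over a sequence of vectors.

    For each pair (t, t'): h = tanh(Wq x_t + Wk x_t' + bh),
    e = sigmoid(wa . h + ba), weights = softmax(e) over t',
    output_t = sum over t' of weights[t'] * x_t'.
    """
    keys = [matvec(Wk, x) for x in xs]
    outputs, all_weights = [], []
    for xt in xs:
        query = matvec(Wq, xt)
        scores = []
        for key in keys:
            h = [math.tanh(q + k + b) for q, k, b in zip(query, key, bh)]
            scores.append(sigmoid(sum(w * v for w, v in zip(wa, h)) + ba))
        total = sum(math.exp(s) for s in scores)
        weights = [math.exp(s) / total for s in scores]
        outputs.append([sum(w * x[i] for w, x in zip(weights, xs))
                        for i in range(len(xt))])
        all_weights.append(weights)
    return outputs, all_weights


# Toy inputs and hypothetical parameters.
xs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
Wq = [[0.2, -0.1], [0.4, 0.3]]
Wk = [[0.1, 0.5], [-0.3, 0.2]]
out, weights = self_attention(xs, Wq, Wk, bh=[0.0, 0.0], wa=[0.7, -0.2], ba=0.1)
print(weights[0])
```

Each row of `weights` shows how strongly one time step attends to every other time step, which corresponds to giving vulnerability-relevant statements more influence on the output.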

A flattening layer follows the self-attention layer and flattens the multidimensional input into one dimension. A fully connected layer follows the flattening layer, and an activation layer follows the fully connected layer, which may use sigmoid as the activation function to produce the final output. Together, the flattening layer, the fully connected layer and the activation layer form a binary classifier that receives the features extracted by the BiGRU layer and the self-attention layer and performs vulnerability detection. The loss of the neural network is calculated with binary cross entropy.

The detection neural network is obtained through training. The training process is similar to the detection process of the present application, and the two descriptions may be read in light of each other. Specifically, training samples are established to train a base model, where each training sample is a labeled complete slice code sample. The complete slice code sample is obtained through the first to third steps below, and the label indicates whether the program corresponding to the sample contains a vulnerability. A preset network, which may be a self-attention neural network, is then trained with the training samples, as described in the fifth step below.
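The classifier head and its loss can be written out in a few lines. This is a minimal sketch with hand-picked weights; the function names and the clipping constant `eps` are assumptions for the example.

```python
import math


def flatten(matrix):
    """Turn a list of feature vectors into one flat vector."""
    return [value for row in matrix for value in row]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(features, weights, bias):
    """Flatten, apply one fully connected unit, then a sigmoid activation."""
    flat = flatten(features)
    return sigmoid(sum(w * v for w, v in zip(weights, flat)) + bias)


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Loss for a single sample; predictions are clipped to avoid log(0)."""
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


score = predict([[0.2, -0.4], [0.1, 0.3]], weights=[0.5, 0.5, -1.0, 2.0], bias=0.0)
print(score, binary_cross_entropy(1, score))
```

A score above a chosen threshold such as 0.5 would be read as "contains a vulnerability"; the threshold itself is not specified in the source.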

The whole training stage mainly comprises five steps.

First, a number of binary programs are randomly selected and decompiled to obtain the corresponding pseudo code.

Second, the dangerous functions in the program pseudo code are found (this may be based on the dangerous function set in the above embodiment of the present application), the dangerous library/API function calls in each function of the pseudo code are extracted, and program slicing is then used to extract the forward slice segments and backward slice segments of the parameters and return values of those library/API function calls.

Third, the forward slice segment and the backward slice segment of each dangerous library/API function call in the program are assembled, so that each assembled code segment is related to the corresponding library/API function call. During training, after a complete code slice is assembled, the statements that actually trigger the vulnerability in each function are analyzed to confirm whether the assembled slice code segment contains a vulnerability, and each code slice is labeled: '1' if it contains a vulnerability, and '0' otherwise.

Fourth, the assembled slice code segments are symbolized to reduce the variation introduced by non-ASCII characters, comments, user-defined function names, variable names and the like. The training stage further comprises performing lexical analysis on the symbolized slice code, converting each slice code into a sequence of tokens, and then constructing a corpus from the tokens of all slice sequences. A base Word2Vec word vector model with a word vector dimension of 100 is trained on this corpus. The trained Word2Vec model maps every token to a vector representation, and assembling the vector representations of its tokens vectorizes each slice code for use as input to the deep learning model. Once the Word2Vec model has been trained, the vector representation of each code slice can be obtained.

Fifth, the vectorized data is input into the BiGRU neural network based on the self-attention mechanism for training, yielding the detection neural network model.
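The labeling rule in the third training step can be sketched as below. It assumes that each assembled slice records the line numbers it covers and that the vulnerability-triggering lines are known from prior analysis; both representations, and the function names, are hypothetical.

```python
def label_slice(slice_line_numbers, vulnerable_line_numbers):
    """Return 1 if the slice contains a vulnerability-triggering line, else 0."""
    return 1 if set(slice_line_numbers) & set(vulnerable_line_numbers) else 0


def build_training_set(slices, vulnerable_line_numbers):
    """Pair each slice's code with its label."""
    return [(s["code"], label_slice(s["lines"], vulnerable_line_numbers))
            for s in slices]


# Hypothetical slices from one function whose line 5 triggers the vulnerability.
slices = [
    {"code": "memcpy(VAR2, VAR1, 32);", "lines": [1, 3, 5]},
    {"code": "free(VAR3);", "lines": [2, 8]},
]
print(build_training_set(slices, vulnerable_line_numbers=[5]))
```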

According to the vulnerability detection method, firstly, the binary codes of the program to be detected are inversely compiled into the pseudo codes. Detecting a danger function in the pseudo code, taking the danger function as a slicing point, extracting a slicing code related to the calling of the danger function, converting the slicing code into vector representation, taking the vectorized slicing code as input, and judging whether the program to be detected contains a bug or not through a detection neural network. The method can be used for cross-architecture and cross-platform binary code vulnerability recognition scenes, fine-grained vulnerability detection is realized on the level of binary codes, automatic feature extraction can be effectively realized, high false alarm influence caused by different compiling options and patch codes is relieved, and the method has extremely high accuracy and extremely low false alarm rate and false alarm rate.

Compared with existing binary code vulnerability detection methods, the present application has the following advantages: 1. semantic features of the code related to dangerous functions are extracted automatically for judgment, without manual work, which removes the need for experts to define vulnerability features; 2. the analysis is based on decompilation, so it effectively resists the influence of compilation differences and patch code; 3. because features are extracted automatically rather than relying on fixed, predefined features, the method can identify cloned vulnerabilities and can also detect unknown vulnerabilities that share similar vulnerability features; 4. the method is highly extensible, and the accuracy and feasibility of the model improve as the vulnerability sample set grows, so it can be used effectively for cloned vulnerability identification, unknown vulnerability identification, vulnerability type identification, and other scenarios.

Based on the same inventive concept, an embodiment of the present application provides a vulnerability detection apparatus. Referring to fig. 4, fig. 4 is a schematic diagram of a vulnerability detection apparatus according to an embodiment of the present application. As shown in fig. 4, the apparatus includes:

the decompiling module 410 is used for decompiling the program to be detected to obtain a pseudo code of the program to be detected;

a dangerous function determining module 420, configured to detect whether a dangerous function is included in the pseudo code;

a slicing module 430, configured to, when a dangerous function is included, extract a forward slice code segment and a backward slice code segment of the dangerous function;

a slice code combining module 440, configured to combine the forward slice code segment and the backward slice code segment to obtain the complete slice code of the dangerous function;

a vector representation conversion module 450 for converting the full slice code into a vector representation;

a detection neural network 460, configured to receive the vector representation and detect whether the program to be detected contains a vulnerability.
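The way modules 410 to 460 feed one another can be sketched as a small Python pipeline. The class name, the callable signatures, the stub stages and the 0.5 decision threshold are all assumptions for illustration; the application does not prescribe this interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Matrix = List[List[float]]


@dataclass
class DetectionPipeline:
    decompile: Callable[[bytes], List[str]]                              # module 410
    find_dangerous: Callable[[List[str]], List[int]]                     # module 420
    slice_call: Callable[[List[str], int], Tuple[List[str], List[str]]]  # module 430
    vectorize: Callable[[List[str]], Matrix]                             # module 450
    classify: Callable[[Matrix], float]                                  # network 460
    threshold: float = 0.5                                               # assumed cut-off

    def detect(self, binary: bytes) -> List[Tuple[int, bool]]:
        """Return (statement index, flagged as vulnerable) per dangerous call."""
        pseudo = self.decompile(binary)
        findings = []
        for idx in self.find_dangerous(pseudo):
            backward, forward = self.slice_call(pseudo, idx)
            full = backward + [pseudo[idx]] + forward                    # module 440
            score = self.classify(self.vectorize(full))
            findings.append((idx, score >= self.threshold))
        return findings


# Stub stages, standing in for a real decompiler, slicer and trained model.
pipeline = DetectionPipeline(
    decompile=lambda b: b.decode("ascii").splitlines(),
    find_dangerous=lambda p: [i for i, s in enumerate(p) if "strcpy(" in s],
    slice_call=lambda p, i: (p[:i], p[i + 1:]),
    vectorize=lambda code: [[float(len(s))] for s in code],
    classify=lambda vec: 0.9 if len(vec) > 2 else 0.1,
)
findings = pipeline.detect(b"char d[4];\nstrcpy(d, s);\nputs(d);")
print(findings)
```

Passing the stages in as callables keeps each module replaceable, which mirrors the separation of modules in fig. 4.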

Preferably, the slicing module includes:

In an optional embodiment of the present application, the detection neural network includes a BiGRU layer, a self-attention layer, a flattening layer, a fully connected layer, and an activation layer, and the detection neural network is pre-trained.
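The data flow through the layers that follow the BiGRU layer can be illustrated with a dependency-free Python sketch. It uses a generic scaled dot-product self-attention with the queries, keys and values all equal to the input sequence (no learned projections), which may differ from the attention formulation used in the embodiment; the input values, weights and function names are made up, and the BiGRU layer itself is not reproduced.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with Q = K = V = seq."""
    d = len(seq[0])
    weights = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights.append(softmax(scores))
    # Each output step is a weighted sum of all input steps.
    context = [[sum(w * v[j] for w, v in zip(row, seq)) for j in range(d)]
               for row in weights]
    return context, weights


def flatten(matrix):
    """Flattening layer: concatenate all time steps into one vector."""
    return [x for row in matrix for x in row]


def dense_sigmoid(x, w, b):
    """Fully connected layer with one output unit, then sigmoid activation."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1.0 / (1.0 + math.exp(-z))


# Stand-in for BiGRU outputs: 3 time steps with 2 hidden features each.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = self_attention(hidden)
flat = flatten(context)
prob = dense_sigmoid(flat, [0.1] * len(flat), 0.0)  # untrained, made-up weights
print(round(prob, 3))
```

Each row of the attention weights sums to 1, so every output step is a convex combination of the input steps; the final value lies strictly between 0 and 1 and can be read as a vulnerability score once the weights are trained.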

Further, the apparatus further comprises:

Based on the same inventive concept, another embodiment of the present application provides a readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps in the vulnerability detection method according to any of the above embodiments of the present application.

Based on the same inventive concept, another embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps in the vulnerability detection method according to any of the above embodiments of the present application are implemented.

Since the device embodiment is basically similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.

The embodiments in the present specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.

As will be appreciated by one of skill in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.

The vulnerability detection method, device, equipment and storage medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the application, and the description of the above embodiments is only intended to help in understanding the method and core idea of the application. Meanwhile, a person skilled in the art may, following the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. A vulnerability detection method, the method comprising:

detecting whether the pseudo code contains a dangerous function;

converting the complete slice code into a vector representation;

2. The method of claim 1, wherein the neural network comprises a BiGRU layer, a self-attention layer, a flattening layer, a fully connected layer, and an activation layer, and wherein the neural network is pre-trained.

3. The method of claim 1, wherein extracting forward slice code segments and backward slice code segments of the dangerous function comprises:

4. The method of claim 1, further comprising:

removing all non-ASCII characters and comments in the pseudo code;

5. A vulnerability detection apparatus, the apparatus comprising:

6. The apparatus of claim 5, wherein the neural network comprises a BiGRU layer, a self-attention layer, a flattening layer, a fully connected layer, and an activation layer, and wherein the neural network is pre-trained.

7. The apparatus of claim 5, wherein the slicing module comprises:

8. The apparatus of claim 5, further comprising:

9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as claimed in any one of claims 1 to 4.

10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 4 when executing the computer program.