CN117171757A - Model construction method for software vulnerability discovery and software vulnerability discovery method - Google Patents

Info

Publication number
CN117171757A
Authority
CN
China
Prior art keywords
vulnerability
model
output result
initial
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311052928.XA
Other languages
Chinese (zh)
Inventor
崔旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202311052928.XA
Publication of CN117171757A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Stored Programmes (AREA)

Abstract

The embodiments of this specification relate to the field of information security and provide a model construction method for software vulnerability discovery and a software vulnerability discovery method. The model construction method includes: obtaining vulnerability code and vulnerability description information corresponding to vulnerabilities of known vulnerability types; inputting the vulnerability description information into an initial vulnerability semantic model, which performs vulnerability mining according to vulnerability semantics, to obtain a first output result; inputting the vulnerability code into an initial vulnerability structure model, which performs vulnerability mining according to vulnerability structure, to obtain a second output result; and combining the first output result and the second output result through a fully connected layer, training the initial vulnerability semantic model and the initial vulnerability structure model, and constructing a fusion model. The embodiments make it possible to construct a software vulnerability mining model for mining software vulnerabilities and to improve the accuracy and efficiency of vulnerability mining.

Description

Model construction method for software vulnerability discovery and software vulnerability discovery method
Technical Field
The embodiment of the specification relates to the field of information security, in particular to a model construction method for software vulnerability discovery and a software vulnerability discovery method.
Background
Vulnerability discovery helps organizations and enterprises identify and resolve potential security threats, and prevents malicious attackers from exploiting these vulnerabilities to invade systems, steal sensitive information, or disrupt business processes. With the continuous development and application of information technology, vulnerability discovery is becoming more and more important.
Existing vulnerability mining algorithms may produce a large number of false positives during mining, that is, normal code or data is misjudged as a vulnerability. This creates a significant workload for security researchers and reduces the efficiency of vulnerability discovery.
Therefore, a model construction method for software vulnerability discovery is needed that can construct a model for mining software vulnerabilities and improve the accuracy and efficiency of vulnerability mining.
Disclosure of Invention
The embodiment of the specification aims to provide a model construction method and a software vulnerability mining method for software vulnerability mining, so as to construct a software vulnerability mining model to mine software vulnerabilities and improve accuracy and efficiency of vulnerability mining.
In order to achieve the above objective, in one aspect, an embodiment of the present disclosure provides a method for constructing a model for software vulnerability discovery, including:
obtaining vulnerability codes and vulnerability description information corresponding to vulnerabilities of known vulnerability types;
inputting the vulnerability description information into an initial vulnerability semantic model for vulnerability mining according to vulnerability semantics to obtain a first output result;
inputting the vulnerability codes into an initial vulnerability structure model for vulnerability mining according to a vulnerability structure to obtain a second output result;
and combining the first output result and the second output result through a fully connected layer, training the initial vulnerability semantic model and the initial vulnerability structure model, and constructing a fusion model.
Preferably, combining the first output result and the second output result through the fully connected layer, training the initial vulnerability semantic model and the initial vulnerability structure model, and constructing the fusion model further includes:
performing max pooling on the first output result and the second output result to obtain a first output result and a second output result of fixed length;
and inputting the fixed-length first output result and second output result into the fully connected layer, and simultaneously training the initial vulnerability semantic model and the initial vulnerability structure model through a comprehensive loss function deployed in the fully connected layer to obtain the fusion model.
Preferably, the method for determining the comprehensive loss function further includes:
setting a first loss function for training the initial vulnerability semantic model;
setting a second loss function for training the initial vulnerability structure model;
and synthesizing the first loss function and the second loss function by using different weight factors to obtain a comprehensive loss function.
Preferably, the method further includes:
constructing the first loss function from a first output result and a first actual result corresponding to at least one vulnerability of a known vulnerability type;
and constructing the second loss function from a second output result and a second actual result corresponding to the at least one vulnerability of the known vulnerability type.
Preferably, the weight factor corresponding to the first loss function is smaller than the weight factor corresponding to the second loss function.
In another aspect, an embodiment of the present specification provides a model construction apparatus, including:
the known type acquisition module is used for acquiring vulnerability codes and vulnerability description information corresponding to vulnerabilities of known vulnerability types;
the information input module is used for inputting the vulnerability description information into an initial vulnerability semantic model for vulnerability mining according to vulnerability semantics to obtain a first output result;
the code input module is used for inputting the vulnerability code into an initial vulnerability structure model for carrying out vulnerability mining according to a vulnerability structure to obtain a second output result;
the construction module is used for integrating the first output result and the second output result through the full connection layer, training the initial vulnerability semantic model and the initial vulnerability structural model and constructing a fusion model.
In another aspect, an embodiment of the present specification provides a software vulnerability discovery method based on the above model construction method, including:
obtaining vulnerability codes and vulnerability description information corresponding to vulnerabilities of unknown vulnerability types;
and inputting the vulnerability codes and the vulnerability description information into a fusion model to obtain the vulnerability types corresponding to the vulnerabilities.
In another aspect, an embodiment of the present specification provides a software vulnerability discovery apparatus based on the above model construction apparatus, including:
the unknown type acquisition module is used for acquiring the vulnerability codes and vulnerability description information corresponding to the vulnerabilities of the unknown vulnerability types;
and the vulnerability type determining module is used for inputting the vulnerability codes and the vulnerability description information into a fusion model to obtain the vulnerability types corresponding to the vulnerabilities.
In yet another aspect, embodiments of the present disclosure further provide a computer device including a memory, a processor, and a computer program stored on the memory, which when executed by the processor, performs instructions of any one of the methods described above.
In yet another aspect, embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor of a computer device, performs instructions of any of the methods described above.
In yet another aspect, embodiments of the present disclosure also provide a computer program product which, when executed by a processor of a computer device, performs instructions of any of the methods described above.
With the technical solution provided by the embodiments of this specification, the vulnerability description information and the vulnerability code corresponding to vulnerabilities of known vulnerability types are input into the initial vulnerability semantic model and the initial vulnerability structure model respectively, the first output result and the second output result are combined through the fully connected layer, and the two models are trained simultaneously to construct the fusion model. Once constructed, the fusion model can be used to predict the type of a vulnerability whose type is unknown. Because it considers both the vulnerability code and the vulnerability description information of that vulnerability, it can improve the accuracy and efficiency of vulnerability type prediction.
The foregoing and other objects, features and advantages of the application will be apparent from the following more particular description of preferred embodiments, as illustrated in the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a model construction method for software vulnerability discovery according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of training an initial vulnerability semantic model and an initial vulnerability structure model to obtain a fusion model according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart of a method for determining a comprehensive loss function according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart of a software vulnerability discovery method according to an embodiment of the present disclosure;
FIG. 5 is a schematic block diagram of a model construction apparatus for software vulnerability discovery according to an embodiment of the present disclosure;
FIG. 6 is a schematic block diagram of a software vulnerability discovery apparatus according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
Description of reference numerals:
100. a known type acquisition module;
200. an information input module;
300. a code input module;
400. a construction module;
500. an unknown type acquisition module;
600. a vulnerability type determining module;
702. a computer device;
704. a processor;
706. a memory;
708. a drive mechanism;
710. an input/output module;
712. an input device;
714. an output device;
716. a presentation device;
718. a graphical user interface;
720. a network interface;
722. a communication link;
724. a communication bus.
Detailed Description
The technical solutions of the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the embodiments herein, are intended to be within the scope of the embodiments herein.
Existing vulnerability mining algorithms may produce a large number of false positives during mining, that is, normal code or data is misjudged as a vulnerability. This creates a significant workload for security researchers and reduces the efficiency of vulnerability discovery.
To solve the above problems, an embodiment of this specification provides a model construction method for software vulnerability discovery. FIG. 1 is a schematic flow chart of this method. Although this specification presents the operation steps of the method as described in the embodiments or flow charts, the method may include more or fewer steps based on conventional or non-inventive effort. The order of steps listed in the embodiments is only one of many possible execution orders and is not the only one. When an actual system or apparatus product executes the method, the steps may be performed sequentially or in parallel according to the method shown in the embodiments or the drawings.
It should be noted that the terms "first," "second," and the like in the description and the claims of the embodiments of the present specification and the above-described drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the present description described herein may be capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or device.
Referring to fig. 1, an embodiment of the present disclosure provides a method for constructing a model for software vulnerability discovery, including:
S101: obtaining vulnerability code and vulnerability description information corresponding to vulnerabilities of known vulnerability types;
S102: inputting the vulnerability description information into an initial vulnerability semantic model for vulnerability mining according to vulnerability semantics to obtain a first output result;
S103: inputting the vulnerability code into an initial vulnerability structure model for vulnerability mining according to vulnerability structure to obtain a second output result;
S104: combining the first output result and the second output result through a fully connected layer, training the initial vulnerability semantic model and the initial vulnerability structure model, and constructing a fusion model.
Vulnerabilities of known vulnerability types can be obtained from a public vulnerability database and used as sample data for training the initial vulnerability semantic model and the initial vulnerability structure model. The vulnerability code and the vulnerability description information corresponding to each of these vulnerabilities must be obtained. The vulnerability code may be source code or binary code containing the vulnerability, and the vulnerability description information may be natural language text related to the vulnerability, such as vulnerability descriptions, vulnerability reports, vulnerability annotations, and vulnerability comments.
For example, the vulnerability code may be:
In this code, ${bussinesstable} indicates a risk of SQL injection.
For example, vulnerability description information may be:
OpenSSH is an open-source implementation of the SSH protocol that has been ported to run on a variety of systems. The portable version of OpenSSH has a timing problem when PAM support is enabled. A remote attacker can exploit this vulnerability to determine whether a user exists, resulting in information disclosure.
The vulnerability description information is input into the initial vulnerability semantic model, which mines vulnerabilities using the semantics expressed in natural language. The vulnerability code is input into the initial vulnerability structure model, which mines vulnerabilities using the code that represents the vulnerability structure. The initial vulnerability semantic model may be a pre-trained model such as BERT, and the initial vulnerability structure model may be a neural network model such as a CNN.
The first output result and the second output result can be combined through the fully connected layer while the initial vulnerability semantic model and the initial vulnerability structure model are trained simultaneously. Training yields a fusion model that makes its prediction from both the vulnerability code and the vulnerability description, and thereby determines the vulnerability type.
In this method, the vulnerability description information and the vulnerability code corresponding to vulnerabilities of known vulnerability types are input into the initial vulnerability semantic model and the initial vulnerability structure model respectively, the first output result and the second output result are combined through the fully connected layer, and the two models are trained simultaneously to construct the fusion model. Once constructed, the fusion model can be used to predict the type of a vulnerability whose type is unknown. Because it considers both the vulnerability code and the vulnerability description information of that vulnerability, it can improve the accuracy and efficiency of vulnerability type prediction.
In this embodiment of the present disclosure, referring to FIG. 2, combining the first output result and the second output result through the fully connected layer, training the initial vulnerability semantic model and the initial vulnerability structure model, and constructing the fusion model further includes:
S201: performing max pooling on the first output result and the second output result to obtain a first output result and a second output result of fixed length;
S202: inputting the fixed-length first output result and second output result into the fully connected layer, and simultaneously training the initial vulnerability semantic model and the initial vulnerability structure model through a comprehensive loss function deployed in the fully connected layer to obtain the fusion model.
The vulnerability description information, which is natural language text, is converted by Word2Vec into vectors that a computer can process. The initial vulnerability semantic model may be a pre-trained model in which a dropout layer is provided to prevent overfitting.
The vulnerability code is likewise processed with Word2Vec and converted into vectors that a computer can process. The initial vulnerability structure model may be a neural network model with 512 convolution kernels of size m × k, where k is the length of the vectors produced by Word2Vec, and ReLU is used as the activation function of the convolution layer.
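The convolution step described above can be illustrated with a minimal sketch. It assumes the code tokens have already been embedded as vectors of length k, and it uses a single hand-picked kernel rather than the 512 kernels mentioned in the text. The names `conv_over_tokens`, `token_vectors`, and `kernel` are hypothetical and do not come from the source.

```python
from typing import List


def relu(value: float) -> float:
    """Rectified linear unit: negative activations are clipped to zero."""
    return value if value > 0 else 0.0


def conv_over_tokens(embeddings: List[List[float]],
                     kernel: List[List[float]],
                     bias: float = 0.0) -> List[float]:
    """Slide one m x k kernel over a (length x k) embedding matrix."""
    m, k = len(kernel), len(kernel[0])
    if any(len(row) != k for row in embeddings):
        raise ValueError("kernel width must equal the embedding length k")
    feature_map = []
    # Each window covers m consecutive tokens across the full vector width k.
    for start in range(len(embeddings) - m + 1):
        total = bias
        for i in range(m):
            for j in range(k):
                total += kernel[i][j] * embeddings[start + i][j]
        feature_map.append(relu(total))
    return feature_map


# Four code tokens, each already embedded as a vector of length k = 2.
token_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -2.0]]
# One illustrative kernel with m = 2 (assumed values, not from the source).
kernel = [[1.0, -1.0], [0.5, 0.5]]
feature_map = conv_over_tokens(token_vectors, kernel)
print(feature_map)  # [1.5, 0.0, 0.0]
```

Because the kernel spans the whole vector width k, it moves only along the token axis, so the length of the feature map depends on the length of the input code.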
Before the first output result and the second output result are combined through the fully connected layer, the outputs of the initial vulnerability semantic model and the initial vulnerability structure model are variable-length sequences whose lengths depend on the lengths of the input vulnerability description information and vulnerability code. Max pooling is therefore applied to the first output result and to the second output result separately, converting each into a result of fixed length.
Specifically, the first output result and the second output result are generally output vectors. The features extracted by the initial vulnerability semantic model and the initial vulnerability structure model are processed by the max pooling operation, which is performed only along the length dimension (dimension 1) of the feature matrix, so that each model yields fixed-length features that serve as input to the fully connected layer. Each node of the fully connected layer is connected to all nodes of the preceding layer, so the layer can integrate all features of the feature vectors.
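As a small illustration of why max pooling over the length dimension yields a fixed-length result, the sketch below pools two feature matrices of different lengths. It assumes each matrix is stored as a list of time steps with one value per channel. The function name and sample values are hypothetical.

```python
from typing import List


def max_pool_over_length(features: List[List[float]]) -> List[float]:
    """Collapse a (length x channels) matrix to one maximum per channel."""
    if not features:
        raise ValueError("need at least one time step")
    channels = len(features[0])
    return [max(step[c] for step in features) for c in range(channels)]


# Two inputs of different lengths (2 and 4 steps) with 3 channels each.
short_seq = [[0.1, 0.9, -0.3], [0.4, 0.2, 0.5]]
long_seq = [[0.0, 0.1, 0.2], [0.7, -0.5, 0.1], [0.3, 0.3, 0.9], [0.2, 0.8, 0.0]]

pooled_short = max_pool_over_length(short_seq)
pooled_long = max_pool_over_length(long_seq)
print(pooled_short)  # [0.4, 0.9, 0.5]
print(pooled_long)   # [0.7, 0.8, 0.9]
```

Both pooled vectors have one entry per channel regardless of input length, which is what allows them to feed a fully connected layer with a fixed number of inputs.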
It should be noted that, in the embodiments of the present disclosure, the fully connected (FC) layer can perform classification and recognition. While the initial vulnerability semantic model and the initial vulnerability structure model are trained simultaneously through the comprehensive loss function deployed in the fully connected layer, the value of the comprehensive loss function decreases continuously. Training of both models is complete once this value falls below a set value. The two models are then combined to obtain the fusion model, which can identify software vulnerabilities of unknown vulnerability type.
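The stopping rule described above (train until the comprehensive loss falls below a set value) can be sketched with plain gradient descent. This is only an illustration under strong assumptions: a toy quadratic objective with two parameters stands in for the real loss of the two neural models, and the optimizer, learning rate, and all names are hypothetical rather than taken from the source.

```python
from typing import Callable, List, Tuple

LossAndGrad = Callable[[List[float]], Tuple[float, List[float]]]


def train_until_below(loss_and_grad: LossAndGrad,
                      params: List[float],
                      threshold: float,
                      learning_rate: float = 0.1,
                      max_steps: int = 10_000) -> Tuple[List[float], float, int]:
    """Run gradient descent until the loss drops below the set value."""
    for step in range(max_steps):
        loss, grad = loss_and_grad(params)
        if loss < threshold:
            return params, loss, step
        params = [p - learning_rate * g for p, g in zip(params, grad)]
    final_loss, _ = loss_and_grad(params)
    return params, final_loss, max_steps


def toy_combined_loss(params: List[float]) -> Tuple[float, List[float]]:
    """Toy stand-in: a weighted sum of two quadratic losses, with gradient."""
    a, b = params
    loss_1 = (a - 1.0) ** 2   # stands in for the semantic-model loss
    loss_2 = (b + 2.0) ** 2   # stands in for the structure-model loss
    loss = 0.3 * loss_1 + 0.7 * loss_2
    grad = [0.3 * 2.0 * (a - 1.0), 0.7 * 2.0 * (b + 2.0)]
    return loss, grad


trained, final_loss, steps = train_until_below(
    toy_combined_loss, [0.0, 0.0], threshold=1e-4)
print(trained, final_loss, steps)
```

A single combined objective updates both parameter groups in the same step, which mirrors the idea of training the two models simultaneously rather than one after the other.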
Referring to FIG. 3, the method for determining the comprehensive loss function further includes:
S301: setting a first loss function for training the initial vulnerability semantic model;
S302: setting a second loss function for training the initial vulnerability structure model;
S303: combining the first loss function and the second loss function using different weight factors to obtain the comprehensive loss function.
In this embodiment of the present specification, the method further includes:
constructing the first loss function from a first output result and a first actual result corresponding to at least one vulnerability of a known vulnerability type;
and constructing the second loss function from a second output result and a second actual result corresponding to the at least one vulnerability of the known vulnerability type.
Specifically, the first loss function includes:
where n is the number of vulnerabilities of known vulnerability type, $y_i$ is the first output result corresponding to the i-th vulnerability of the known vulnerability type, and the other term is the first actual result corresponding to the i-th vulnerability of the known vulnerability type;
the second loss function includes:
where n is the number of vulnerabilities of known vulnerability type, $y_j$ is the second output result corresponding to the j-th vulnerability of the known vulnerability type, and the other term is the second actual result corresponding to the j-th vulnerability of the known vulnerability type.
Taking the first loss function as an example, $y_i$ is the first output result corresponding to the i-th vulnerability of the known vulnerability type, that is, the vulnerability type predicted for the i-th vulnerability by the initial vulnerability semantic model. The first actual result corresponding to the i-th vulnerability is its actual vulnerability type, which is known. The first loss function measures the gap between the vulnerability type $y_i$ predicted by the initial vulnerability semantic model and the actual vulnerability type of the corresponding vulnerability, which helps optimize the model. The second loss function works in the same way and is not described again.
The weight factor corresponding to the first loss function is smaller than the weight factor corresponding to the second loss function. In general, the vulnerability code is considered to represent the vulnerability type better, while the vulnerability description information serves as a reference that assists in judging the vulnerability type. The weight factors corresponding to the first loss function and the second loss function may be, for example, 30% and 70%, or 40% and 60%, which is not limited in the embodiments of the present specification.
For example, the weight factors corresponding to the first and second loss functions may be 30% and 70%. The comprehensive loss function at this time is:
$L = 0.3 \cdot L_1 + 0.7 \cdot L_2$
where $L$ is the comprehensive loss function, $L_1$ is the first loss function, and $L_2$ is the second loss function.
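The weighted combination can be worked through numerically. The sketch below assumes, purely for illustration, that each branch loss is a mean cross-entropy over predicted type probabilities. The source does not state the form of the individual loss functions in this text, so `mean_cross_entropy` and the sample probabilities are hypothetical. Only the 0.3 and 0.7 weights come from the example above.

```python
import math
from typing import Sequence


def mean_cross_entropy(predicted: Sequence[Sequence[float]],
                       actual: Sequence[int]) -> float:
    """Assumed branch loss: mean negative log-probability of the true type."""
    return -sum(math.log(p[t]) for p, t in zip(predicted, actual)) / len(actual)


def combined_loss(loss_1: float, loss_2: float,
                  weight_1: float = 0.3, weight_2: float = 0.7) -> float:
    """L = w1 * L1 + w2 * L2, with the smaller weight on the semantic branch."""
    return weight_1 * loss_1 + weight_2 * loss_2


# Hypothetical type probabilities for two known vulnerabilities (two types).
semantic_probs = [[0.5, 0.5], [0.25, 0.75]]    # from the description text
structural_probs = [[0.9, 0.1], [0.2, 0.8]]    # from the vulnerability code
actual_types = [0, 1]

loss_1 = mean_cross_entropy(semantic_probs, actual_types)
loss_2 = mean_cross_entropy(structural_probs, actual_types)
total = combined_loss(loss_1, loss_2)
print(loss_1, loss_2, total)
```

With weights of 0.3 and 0.7, an error in the code branch moves the total more than an equal error in the description branch, which reflects the stated preference for the vulnerability code as the stronger signal.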
Based on the above model construction method for software vulnerability discovery, referring to FIG. 4, an embodiment of the present disclosure further provides a software vulnerability discovery method, including:
S401: obtaining vulnerability code and vulnerability description information corresponding to a vulnerability of unknown vulnerability type;
S402: inputting the vulnerability code and the vulnerability description information into the fusion model to obtain the vulnerability type corresponding to the vulnerability.
For a vulnerability of unknown vulnerability type, its vulnerability code and vulnerability description information must be obtained and input into the fusion model. Within the fusion model, the vulnerability description information is input into the trained vulnerability semantic model and the vulnerability code is input into the trained vulnerability structure model, and the fusion model then predicts the type of the unknown vulnerability.
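The prediction step can be sketched as follows, assuming the two trained branches have already produced fixed-length pooled feature vectors. The fusion is shown as concatenation followed by one fully connected layer and a softmax. The weights, feature values, and type names are hypothetical placeholders, not values from the source.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fully_connected(features: List[float],
                    weights: List[List[float]],
                    biases: List[float]) -> List[float]:
    """One output score per vulnerability type; every input feeds every output."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def predict_type(semantic_features: List[float],
                 structural_features: List[float],
                 weights: List[List[float]],
                 biases: List[float],
                 type_names: List[str]) -> Tuple[str, List[float]]:
    """Fuse both feature vectors and return the most probable type."""
    fused = semantic_features + structural_features  # concatenation
    probs = softmax(fully_connected(fused, weights, biases))
    best = max(range(len(probs)), key=probs.__getitem__)
    return type_names[best], probs


# Hypothetical pooled features: 2 semantic values and 3 structural values.
semantic = [0.4, 0.9]
structural = [0.7, 0.8, 0.9]
# Hypothetical trained weights: one row of 5 weights per vulnerability type.
weights = [[1.0, 0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 1.0, 1.0]]
biases = [0.0, 0.0]
type_names = ["sql_injection", "information_disclosure"]

label, probs = predict_type(semantic, structural, weights, biases, type_names)
print(label)  # information_disclosure
```

In this toy setup the second type wins because its score (0.8 + 0.9 = 1.7) exceeds the first type's score (0.4). In a trained model the weights would be learned jointly with the two branches.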
The user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party. In addition, the technical scheme described by the embodiment of the application accords with the relevant regulations of national laws and regulations for acquiring, storing, using, processing and the like of the data.
Based on the above model construction method for software vulnerability discovery, an embodiment of the present specification correspondingly provides a model construction apparatus for software vulnerability discovery. The apparatus may include a system (including a distributed system), software (applications), modules, components, servers, clients, and the like that use the methods described in the embodiments of the present specification, combined with the hardware necessary for implementation. Based on the same inventive concept, the apparatus provided in one or more embodiments of the present specification is described in the following embodiments. Because the apparatus solves the problem in a manner similar to the method, the implementation of the apparatus may refer to the implementation of the foregoing method, and repeated details are omitted. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements the intended function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Specifically, fig. 5 is a schematic block diagram of an embodiment of a model building device for software vulnerability discovery provided in an embodiment of the present disclosure, and referring to fig. 5, the model building device for software vulnerability discovery provided in an embodiment of the present disclosure includes: a known type acquisition module 100, an information input module 200, a code input module 300, a construction module 400.
The known type acquisition module 100 is configured to acquire vulnerability codes and vulnerability description information corresponding to vulnerabilities of known vulnerability types;
the information input module 200 is configured to input the vulnerability description information into an initial vulnerability semantic model for performing vulnerability mining according to vulnerability semantics, so as to obtain a first output result;
the code input module 300 is configured to input the vulnerability code into an initial vulnerability structure model for performing vulnerability mining according to a vulnerability structure, so as to obtain a second output result;
the construction module 400 is configured to integrate the first output result and the second output result through a full connection layer, train the initial vulnerability semantic model and the initial vulnerability structural model, and construct a fusion model.
Based on the above software vulnerability discovery method, and referring to fig. 6, the embodiments of the present specification further provide a corresponding software vulnerability discovery device. The software vulnerability discovery device includes an unknown type acquisition module 500 and a vulnerability type determination module 600:
the unknown type acquisition module 500 is configured to acquire vulnerability code and vulnerability description information corresponding to a vulnerability of an unknown vulnerability type;
and the vulnerability type determination module 600 is configured to input the vulnerability code and the vulnerability description information into the fusion model, so as to obtain the vulnerability type corresponding to the vulnerability.
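The division of work between the two modules of the discovery device can be illustrated with a short sketch. Everything named here is an assumption made for illustration: the class and method names, the record identifier `"v1"`, the type labels, and in particular `toy_fusion_model`, which is a keyword-counting placeholder standing in for a trained fusion model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class UnknownTypeAcquisitionModule:
    """Holds vulnerability records keyed by a hypothetical identifier."""
    records: Dict[str, Tuple[str, str]]

    def acquire(self, vuln_id: str) -> Tuple[str, str]:
        # Returns (vulnerability code, vulnerability description information).
        return self.records[vuln_id]

@dataclass
class VulnerabilityTypeDeterminationModule:
    """Feeds code and description to a fusion model and picks the top type."""
    fusion_model: Callable[[str, str], List[float]]
    type_labels: List[str]

    def determine(self, code: str, description: str) -> str:
        scores = self.fusion_model(code, description)
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.type_labels[best]

def toy_fusion_model(code: str, description: str) -> List[float]:
    """Placeholder scoring by keyword counts; NOT a trained fusion model."""
    text = (code + " " + description).lower()
    return [float(text.count("strcpy") + text.count("overflow")),
            float(text.count("select") + text.count("injection"))]

acquirer = UnknownTypeAcquisitionModule(
    records={"v1": ("strcpy(buf, user_input);",
                    "Possible overflow of a stack buffer")})
determiner = VulnerabilityTypeDeterminationModule(
    fusion_model=toy_fusion_model,
    type_labels=["buffer overflow", "SQL injection"])

code, description = acquirer.acquire("v1")
predicted_type = determiner.determine(code, description)
```

Keeping the fusion model behind a plain callable means the determination module does not depend on how the model was trained or which framework produced it.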
Referring to fig. 7, based on the above model construction method for software vulnerability discovery or the above software vulnerability discovery method, an embodiment of the present specification further provides a computer device 702 on which the above method runs. The computer device 702 may include one or more processors 704, such as one or more Central Processing Units (CPUs) or Graphics Processing Units (GPUs), each of which may implement one or more hardware threads. The computer device 702 may also include any memory 706 for storing any kind of information, such as code, settings, and data. In a particular embodiment, a computer program is stored on the memory 706 and is executable on the processor 704; when executed by the processor 704, the computer program may execute instructions according to the methods described above. For example, and without limitation, the memory 706 may include any one or a combination of the following: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, etc. More generally, any memory may store information using any technique. Further, any memory may provide volatile or non-volatile retention of information. Further, any memory may represent a fixed or removable component of the computer device 702. In one case, when the processor 704 executes the associated instructions stored in any memory or combination of memories, the computer device 702 can perform any of the operations of those instructions. The computer device 702 also includes one or more drive mechanisms 708, such as a hard disk drive mechanism or an optical disk drive mechanism, for interacting with any memory.
The computer device 702 may also include an input/output module 710 (I/O) for receiving various inputs (via an input device 712) and for providing various outputs (via an output device 714). One particular output mechanism may include a presentation device 716 and an associated graphical user interface 718 (GUI). In other embodiments, the input/output module 710 (I/O), the input device 712, and the output device 714 may be omitted, with the computer device 702 serving merely as one computer device in a network. The computer device 702 can also include one or more network interfaces 720 for exchanging data with other devices via one or more communication links 722. One or more communication buses 724 couple the above-described components together.
Communication link 722 may be implemented in any manner, for example, through a local area network, a wide area network (e.g., the internet), a point-to-point connection, etc., or any combination thereof. Communication link 722 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, etc., governed by any protocol or combination of protocols.
Corresponding to the method in fig. 1-4, the present description also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.
The embodiments of the present specification also provide computer readable instructions which, when executed by a processor, cause the processor to perform the method shown in fig. 1 to 4.
The present description also provides a computer program product, wherein the method as shown in fig. 1 to 4 is performed when the computer program product is run by a processor of a computer device.
It should be understood that, in various embodiments of the present disclosure, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation of the embodiments of the present disclosure.
It should also be understood that, in the embodiments of the present specification, the term "and/or" merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In the embodiments of the present specification, the character "/" generally indicates an "or" relationship between the associated objects before and after it.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the various illustrative elements and steps have been described above generally in terms of function in order to best explain the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments herein.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this specification, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the embodiments of the present specification.
In addition, each functional unit in each embodiment of the present specification may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present specification, in essence, or the part that contributes over the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present specification. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Specific embodiments are used in the present specification to illustrate the principles and implementations of the embodiments; the description of the above embodiments is intended only to help in understanding the methods of the embodiments of the present specification and their core ideas. Meanwhile, those skilled in the art may, based on the ideas of the embodiments of the present specification, make changes to the specific implementations and the scope of application. In view of the above, the contents of the present specification should not be construed as limiting the embodiments of the present specification.

Claims (11)

1. A model construction method for software vulnerability discovery, characterized by comprising the following steps:
obtaining vulnerability codes and vulnerability description information corresponding to vulnerabilities of known vulnerability types;
inputting the vulnerability description information into an initial vulnerability semantic model for vulnerability mining according to vulnerability semantics to obtain a first output result;
inputting the vulnerability codes into an initial vulnerability structure model for vulnerability mining according to a vulnerability structure to obtain a second output result;
and integrating the first output result and the second output result through a fully connected layer, training the initial vulnerability semantic model and the initial vulnerability structure model, and constructing a fusion model.
2. The model construction method according to claim 1, wherein integrating the first output result and the second output result through the fully connected layer, training the initial vulnerability semantic model and the initial vulnerability structure model, and constructing the fusion model further comprises:
performing max pooling on the first output result and the second output result to obtain a first output result and a second output result of fixed length;
and inputting the fixed-length first output result and second output result into the fully connected layer, and simultaneously training the initial vulnerability semantic model and the initial vulnerability structure model through a composite loss function deployed at the fully connected layer, to obtain the fusion model.
3. The model construction method according to claim 2, wherein determining the composite loss function further comprises:
setting a first loss function for training the initial vulnerability semantic model;
setting a second loss function for training the initial vulnerability structure model;
and combining the first loss function and the second loss function using different weight factors to obtain the composite loss function.
4. The model construction method according to claim 3, further comprising:
constructing the first loss function from a first output result and a first actual result corresponding to a vulnerability of at least one known vulnerability type;
and constructing the second loss function from a second output result and a second actual result corresponding to the vulnerability of the at least one known vulnerability type.
5. The model construction method according to claim 3, wherein the weight factor corresponding to the first loss function is smaller than the weight factor corresponding to the second loss function.
6. A model construction apparatus, characterized in that the apparatus comprises:
the known type acquisition module is used for acquiring vulnerability codes and vulnerability description information corresponding to vulnerabilities of known vulnerability types;
the information input module is used for inputting the vulnerability description information into an initial vulnerability semantic model for vulnerability mining according to vulnerability semantics to obtain a first output result;
the code input module is used for inputting the vulnerability code into an initial vulnerability structure model for carrying out vulnerability mining according to a vulnerability structure to obtain a second output result;
the construction module is used for integrating the first output result and the second output result through a fully connected layer, training the initial vulnerability semantic model and the initial vulnerability structure model, and constructing a fusion model.
7. A software vulnerability discovery method, characterized in that it is based on the fusion model constructed by the model construction method according to any one of claims 1 to 5 and comprises:
obtaining vulnerability codes and vulnerability description information corresponding to vulnerabilities of unknown vulnerability types;
and inputting the vulnerability codes and the vulnerability description information into a fusion model to obtain the vulnerability types corresponding to the vulnerabilities.
8. A software vulnerability discovery apparatus based on the model construction apparatus according to claim 6, characterized by comprising:
the unknown type acquisition module is used for acquiring the vulnerability codes and vulnerability description information corresponding to the vulnerabilities of the unknown vulnerability types;
and the vulnerability type determining module is used for inputting the vulnerability codes and the vulnerability description information into a fusion model to obtain the vulnerability types corresponding to the vulnerabilities.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory, characterized in that the computer program, when being executed by the processor, executes instructions of the method according to any one of claims 1-5 and 7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor of a computer device, executes instructions of a method according to any one of claims 1-5 and 7.
11. A computer program product, characterized in that the computer program product, when being executed by a processor of a computer device, executes instructions of the method according to any one of claims 1-5 and 7.
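The max pooling of claim 2 and the weighted combination of loss functions in claims 3 to 5 can be made concrete with a small sketch. The claims do not fix the pooling scheme, the form of the individual losses, or the weight values, so the following choices are assumptions made only for illustration: pooling takes the maximum over equal contiguous bins, each loss is a cross-entropy between a predicted probability vector and the index of the actual vulnerability type, and the weights are 0.4 and 0.6 (the first smaller than the second, as in claim 5). The combined loss is then L = w1 * L1 + w2 * L2.

```python
import math

def max_pool_to_fixed_length(sequence, length):
    """Split `sequence` into `length` contiguous bins and keep each bin's max.

    Assumed pooling scheme; requires len(sequence) >= length.
    """
    n = len(sequence)
    if n < length:
        raise ValueError("sequence is shorter than the target length")
    pooled = []
    for i in range(length):
        start = i * n // length
        end = (i + 1) * n // length
        pooled.append(max(sequence[start:end]))
    return pooled

def cross_entropy(predicted, actual_index):
    """Assumed per-model loss: negative log-probability of the actual type."""
    return -math.log(predicted[actual_index])

def combined_loss(first_pred, first_actual, second_pred, second_actual,
                  w1=0.4, w2=0.6):
    """Weighted sum of the two losses; w1 and w2 are hypothetical values."""
    first_loss = cross_entropy(first_pred, first_actual)     # semantic model
    second_loss = cross_entropy(second_pred, second_actual)  # structure model
    return w1 * first_loss + w2 * second_loss

# A variable-length output reduced to a fixed length of 3.
pooled = max_pool_to_fixed_length([0.1, 0.8, 0.3, 0.5, 0.9, 0.2, 0.4], 3)

# Hypothetical predictions over two vulnerability types.
loss = combined_loss([0.7, 0.3], 0, [0.2, 0.8], 1)
```

Giving the structure model the larger weight makes its error dominate the gradient during joint training, which is one plausible reading of why claim 5 orders the weights this way.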
CN202311052928.XA 2023-08-21 2023-08-21 Model construction method for software vulnerability discovery and software vulnerability discovery method Pending CN117171757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311052928.XA CN117171757A (en) 2023-08-21 2023-08-21 Model construction method for software vulnerability discovery and software vulnerability discovery method

Publications (1)

Publication Number Publication Date
CN117171757A true CN117171757A (en) 2023-12-05

Family

ID=88935887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311052928.XA Pending CN117171757A (en) 2023-08-21 2023-08-21 Model construction method for software vulnerability discovery and software vulnerability discovery method

Country Status (1)

Country Link
CN (1) CN117171757A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117574393A (en) * 2024-01-16 2024-02-20 国网浙江省电力有限公司 Method, device, equipment and storage medium for mining loopholes of information terminal
CN117574393B (en) * 2024-01-16 2024-03-29 国网浙江省电力有限公司 Method, device, equipment and storage medium for mining loopholes of information terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination