CN113742733A - Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device - Google Patents

Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device Download PDF

Info

Publication number
CN113742733A
CN113742733A (application CN202110909147.2A)
Authority
CN
China
Prior art keywords
vulnerability
event trigger
description
answer
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110909147.2A
Other languages
Chinese (zh)
Other versions
CN113742733B (en)
Inventor
Li Lili (李莉莉)
Sun Xiaobing (孙小兵)
Bo Lili (薄莉莉)
Wei Ying (魏颖)
Li Bin (李斌)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangzhou University
Original Assignee
Yangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangzhou University filed Critical Yangzhou University
Priority to CN202110909147.2A priority Critical patent/CN113742733B/en
Publication of CN113742733A publication Critical patent/CN113742733A/en
Application granted granted Critical
Publication of CN113742733B publication Critical patent/CN113742733B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577Assessing vulnerabilities and evaluating computer system security
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a reading understanding-based method and device for vulnerability event trigger word extraction and vulnerability type identification. The method comprises the following steps: collecting vulnerability data; learning representations of the vulnerability description sentences; constructing the syntactic dependencies of the vulnerability description text with a graph convolution network (GCN) and extracting vulnerability features; and recognizing and classifying vulnerability event trigger words based on the question-answering task in the BERT fine-tuning model. The invention makes better use of the syntactic and semantic information in vulnerability descriptions, fully mines their context information, and achieves recognition and classification of vulnerability event trigger words, which alleviates the problem of inaccurate vulnerability classification to a certain extent. Compared with currently popular event trigger word extraction methods, it captures the dependencies among different events and outputs the trigger words of vulnerability events to assist developers in analyzing vulnerabilities.

Description

Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device
Technical Field
The invention belongs to the field of software security, and particularly relates to a reading understanding-based method and device for vulnerability event trigger word extraction and vulnerability type identification.
Background
Software vulnerabilities weaken the security of computer software and lead to problems such as data loss, data tampering, and privacy disclosure. In terms of their causes, software vulnerabilities mainly include buffer overflows, missing validation of input content, and the like. With the rapid development of computers and the Internet and the increase of 0-day vulnerabilities, software vulnerabilities cause huge damage to individuals, communities, and countries; to perform security assessment, it is necessary to identify and classify vulnerability trigger words (the causes of occurrence). In previous work, features are learned from the original text by typical neural networks (CNNs, RNNs, and the like), and additional fine-grained information, such as entity-level, document-level, and syntax-level features, is used to improve the representation, with the aim of locating each event trigger/argument and identifying its class. Much recent work has explored pre-trained language models for feature learning. Because pre-trained language models can learn universal language representations from large amounts of unlabeled data, methods that learn features with pre-trained language models usually improve considerably over methods based on traditional neural networks; however, the pre-trained models are not combined with other fine-grained information, so the model loses the syntactic dependencies in vulnerability description sentences and the recognition accuracy of trigger words is not high.
At present, some work uses machine learning/deep learning methods to extract event trigger words and identify event types. For example, the document "Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation" proposes a new joint multi-event extraction framework: words are characterized by concatenating word embedding vectors, part-of-speech tagging vectors, position embedding vectors, and entity tagging vectors; syntactic shortcut arcs are introduced to enhance information flow; and an attention-based graph convolution network models the graph information, so that multiple event trigger words and arguments are jointly extracted. However, the generalization capability of the model's word vectors is poor, and character-level, word-level, sentence-level, and inter-sentence relationship features cannot be fully described. Other work has begun to use pre-training methods to identify vulnerability trigger words; for example, the document "Event Extraction as Machine Reading Comprehension" performs sentence representation learning with a BERT pre-trained model, extracts the trigger words of events based on a reading comprehension task, and classifies events with a logistic regression model, but the syntactic relationships in the text are not used and the associations among multiple events cannot be captured, so event type identification is limited.
Disclosure of Invention
Purpose of the invention: aiming at the above problems, the invention provides a reading understanding-based method and device for vulnerability event trigger word extraction and vulnerability type identification, which can determine the cause of a vulnerability and assist developers in vulnerability repair.
Technical scheme: the reading understanding-based vulnerability event trigger word extraction and vulnerability type identification method of the invention specifically comprises the following steps:
(1) acquiring vulnerability data, acquiring a CVE-ID of a vulnerability entry, vulnerability description and vulnerability type corresponding to each ID, and designing a question Q for a vulnerability event;
(2) based on a BERT pre-training model, performing vulnerability description statement representation learning as initial node characteristics input by GCN;
(3) extracting node characteristics of vulnerability information by using a Graph Convolution Network (GCN);
(4) and recognizing and classifying vulnerability event trigger words based on the question-answering task in the BERT fine tuning model.
Further, the step (2) comprises the steps of:
(21) converting the designed question Q and the description Text of the vulnerability entry into an input sequence of the BERT pre-training model; a special token [CLS] is placed at the beginning to fuse the semantic information of each word in the description, and the question and the vulnerability description are separated by [SEP]; each word is converted into a Token embedding, a Segment embedding, and a Position embedding, and these embedded representations are summed to obtain a representation vector;
(22) transmitting the representation vector to the encoder layer of BERT; the Transformer, combined with the masked language model and next sentence prediction tasks, realizes a bidirectional language model, and representation learning yields an embedding vector X that serves as the initial node features input to the GCN.
Further, the step (3) includes the steps of:
(31) based on the text description of the vulnerability entry, acquiring the syntactic dependency relationship of the vulnerability description text by using a Stanford syntactic analysis tool;
(32) constructing a syntactic information graph G = (V, E) of the vulnerability description according to the syntactic dependencies; where V is the set of word nodes {v_1, v_2, ..., v_i, ..., v_n}, v_i represents the i-th word in the vulnerability description, n is the number of words in the vulnerability description, and E is the set of directed edges (v_i, v_j) from node v_i to node v_j; a reverse edge (v_j, v_i) is added for each directed edge, a self-loop edge (v_i, v_i) is added for each node v_i, and a relationship type label K(v_i, v_j) is added for each edge;
(33) obtaining the adjacency matrix A based on the syntactic information graph G, i.e., if node v_i and node v_j are connected, the element a_ij in the i-th row and j-th column of the adjacency matrix A is 1, otherwise a_ij = 0; Â is the normalized matrix of the adjacency matrix A, obtained by the following transformation:
Â = D'^(-1/2) · A' · D'^(-1/2)
where A' = A + I, I is the identity matrix, and D' is the degree matrix of A';
(34) performing gradient descent training on the vulnerability node information and extracting vulnerability node features, with the following transformation:
H^(l+1) = σ(Â · H^(l) · W_K^(l))
where H^(l) is the vulnerability node information input to layer l of the graph convolutional network; the normalized matrix Â and the weight matrix W_K^(l) of layer l for the relationship type label K(v_i, v_j) apply a linear transformation, which is then passed through the nonlinear activation function σ to obtain the vulnerability node information H^(l+1) input to the next layer; after multiple rounds of convolution, the feature vector of each vulnerability node is obtained;
(35) the same operations are performed for the question about the vulnerability event trigger word: its syntactic dependencies are constructed and the feature vector of the question sentence is obtained.
Further, the step (4) includes the steps of:
(41) feeding the question feature vector A and the vulnerability description feature vector B into the fully connected layer and the softmax layer of the BERT question-answering task;
(42) introducing a start vector S and an end vector E for the BERT question-answering task, and calculating the probability P_i that the i-th word in the vulnerability description is the start of the answer span; the word with the highest probability is taken as the start of the answer span, with the following transformation:
P_i = exp(S · T_i) / Σ_j exp(S · T_j)
where T_i is the feature vector of word i; the end of the answer span is calculated in the same way with the formula P'_i = exp(E · T_i) / Σ_j exp(E · T_j); the score of the candidate answer from position i to position j is defined as S_ij = S · T_i + E · T_j, and the maximum-score span with j ≥ i is taken as the prediction result;
no-answer prediction is performed at the same time: a question without an answer is regarded as having an answer span that starts and ends at the [CLS] token, and the no-answer score is calculated as S_null = S · C + E · C, where C is the vector of the special token [CLS];
the no-answer score S_null is compared with the score S_ij of the best non-null span; when S_ij > S_null + τ, where τ is a user-defined threshold, a non-null answer is predicted, and this answer is the vulnerability event trigger word;
(43) based on the vulnerability event trigger words, the feature vector of each word is used as the input of a logistic regression model, and the probability that the trigger words belong to different vulnerability types is calculated to predict the categories of vulnerability events.
Based on the same inventive concept, the invention also provides a device for extracting the read understanding vulnerability event trigger words and identifying the vulnerability type, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the computer program realizes the method for extracting the read understanding vulnerability event trigger words and identifying the vulnerability type when being loaded to the processor.
Advantageous effects: compared with the prior art, the invention has the following beneficial effects: 1. a BERT pre-training model is used to construct word vector representations of the vulnerability description, fully describing character-level, word-level, sentence-level, and inter-sentence relationship features; 2. the syntactic information of the vulnerability description is expressed from the perspective of a graph, the syntactic dependencies of the vulnerability description are constructed, the relationships between words are brought into the model's learning process, and different weights are given to different relationship types to learn the influence of different dependencies on trigger word recognition and classification; 3. unlike traditional vulnerability classification methods, the invention outputs the vulnerability trigger words and the vulnerability types as the final result, which clarifies the cause of the vulnerability and assists developers in vulnerability repair.
Drawings
FIG. 1 is a flowchart of the reading understanding-based vulnerability event trigger word extraction and vulnerability type identification method;
FIG. 2 is a description of vulnerability entries and their types;
FIG. 3 is a syntactic dependency of a vulnerability description;
FIG. 4 is a schematic diagram of the BERT question-answering task.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The invention provides a reading understanding-based vulnerability event trigger word extraction and vulnerability type identification method, which specifically comprises the following steps of:
step 1: and (5) vulnerability data acquisition.
(1.1) acquiring the CVE-IDs of all vulnerability entries from the vulnerability database NVD from 1999 to the present, together with the vulnerability description and vulnerability type corresponding to each ID, as shown in FIG. 2.
(1.2) first, the descriptions of the vulnerability entries collected in step (1.1) are compiled into a word list WordList in which each word corresponds to a unique sequence number, without repetition; then a question Q is designed for the vulnerability event: What is the cause of the vulnerability?
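As a non-limiting illustrative sketch of step (1.2), the WordList and the fixed question Q can be prepared as follows in Python; the example descriptions are hypothetical placeholders, not real NVD entries.

    # Sketch: build a duplicate-free WordList (word -> sequence number) from the
    # collected vulnerability descriptions and fix the question Q for every event.
    descriptions = [
        "Buffer overflow in the foo component allows remote attackers to execute arbitrary code.",
        "Improper input validation in the bar parser leads to SQL injection.",
    ]

    word_list = {}                                   # WordList: word -> unique sequence number
    for desc in descriptions:
        for word in desc.lower().split():
            if word not in word_list:
                word_list[word] = len(word_list)

    question = "What is the cause of the vulnerability?"   # question Q designed for the vulnerability event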
Step 2: vulnerability description sentence representation learning based on the BERT pre-training model, used as the initial node features input to the GCN.
(2.1) converting the question Q designed in step (1.2) and the description Text of the vulnerability entry into an input sequence of the BERT model: a special token [CLS] is placed at the beginning to fuse the semantic information of each word in the description, and the question and the vulnerability description are separated by [SEP]. Each word is converted into a Token embedding (generated from the WordList of step (1.2), mapping a word to a vector of fixed dimension), a Segment embedding (indicating the sentence to which the word belongs), and a Position embedding (indicating the position of the word in the sentence), and these embedded representations are summed to obtain a representation vector.
(2.2) passing the representation vector generated in step (2.1) to the encoder layer of BERT. The Transformer, combined with the masked language model and next sentence prediction tasks, realizes a bidirectional language model; representation learning yields an embedding vector X that serves as the initial node features input to the GCN.
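A minimal sketch of steps (2.1)-(2.2) is given below. It assumes the HuggingFace transformers library and the bert-base-uncased checkpoint, neither of which is prescribed by the invention; the tokenizer inserts [CLS] and [SEP], and the model sums the token, segment, and position embeddings internally.

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    question = "What is the cause of the vulnerability?"
    text = "Buffer overflow in the foo component allows remote attackers to execute arbitrary code."

    # "[CLS] question [SEP] text [SEP]" packing and embedding summation are handled by the library.
    inputs = tokenizer(question, text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    X = outputs.last_hidden_state   # (1, sequence length, 768): embedding vector X, the initial GCN node features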
Step 3: extracting the node features of the vulnerability information with the graph convolution network (GCN).
(3.1) acquiring the syntactic dependencies of the vulnerability description text with the Stanford syntactic analysis tool, based on the vulnerability descriptions obtained in step (1.1), as shown in FIG. 3.
(3.2) constructing a syntactic information graph G = (V, E) of the description text according to the syntactic dependencies obtained in step (3.1), where V is the set of vulnerability nodes {v_1, v_2, ..., v_i, ..., v_n}, v_i represents the i-th word in the vulnerability description, n is the number of words in the vulnerability description, and E is the set of directed edges (v_i, v_j) from node v_i to node v_j. To facilitate information flow, a reverse edge (v_j, v_i) is added for each directed edge; at the same time, a self-loop edge (v_i, v_i) is added for each node v_i, and a relationship type label K(v_i, v_j) is added for each edge.
(3.3) obtaining the adjacency matrix A from the syntactic information graph G of the vulnerability description, i.e., if node v_i and node v_j are connected, the element a_ij in the i-th row and j-th column of the adjacency matrix A is 1, otherwise a_ij = 0. The adjacency matrix A is normalized by the following transformation:
Â = D'^(-1/2) · A' · D'^(-1/2)
where A' = A + I, I is the identity matrix, and D' is the degree matrix of A'.
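A minimal sketch of steps (3.2)-(3.3) follows; the dependency edges are hypothetical stand-ins for the Stanford parser output, given as (head index, dependent index, relation label) triples.

    import numpy as np

    n = 5                                                   # number of words in the description
    dep_edges = [(1, 0, "amod"), (1, 3, "nmod"), (3, 2, "case"), (1, 4, "acl")]   # hypothetical parse

    A = np.zeros((n, n))
    K = {}                                                  # relationship type label K(vi, vj)
    for i, j, rel in dep_edges:
        A[i, j] = 1.0                                       # directed edge (vi, vj)
        A[j, i] = 1.0                                       # added reverse edge (vj, vi)
        K[(i, j)] = rel
        K[(j, i)] = rel + "_rev"

    A_prime = A + np.eye(n)                                 # self-loop edges: A' = A + I
    D_prime = np.diag(A_prime.sum(axis=1))                  # degree matrix of A'
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D_prime)))   # D'^(-1/2)
    A_hat = D_inv_sqrt @ A_prime @ D_inv_sqrt               # normalized adjacency matrix Â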
(3.4) performing gradient descent training on the vulnerability node information and extracting vulnerability node features, with the following transformation:
H^(l+1) = σ(Â · H^(l) · W_K^(l))
where H^(l) is the vulnerability node information input to layer l of the graph convolutional network; when l = 0, H^(0) is the embedding vector X obtained in step (2.2). The normalized matrix Â and the weight matrix W_K^(l) of layer l for the relationship type label K(v_i, v_j) apply a linear transformation, which is then passed through the nonlinear activation function σ to obtain the vulnerability node information H^(l+1) input to the next layer. After multiple rounds of convolution, the feature vector of each vulnerability node is obtained.
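A minimal sketch of one graph-convolution layer of step (3.4) is given below. For brevity it uses a single shared weight matrix per layer, whereas the method keeps a separate weight matrix for each relationship type label K(vi, vj); the input features are random stand-ins for the BERT embedding X.

    import torch
    import torch.nn as nn

    class GCNLayer(nn.Module):
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim)        # weight matrix W^(l)

        def forward(self, H, A_hat):
            # H: (n, in_dim) node features, A_hat: (n, n) normalized adjacency matrix
            return torch.relu(self.linear(A_hat @ H))       # sigma(A_hat . H^(l) . W^(l))

    n, hidden = 5, 768
    A_hat = torch.rand(n, n)                                # stands in for the normalized adjacency matrix
    H0 = torch.randn(n, hidden)                             # stands in for the BERT embedding X (layer l = 0)
    H1 = GCNLayer(hidden, 256)(H0, A_hat)                   # first convolution
    H2 = GCNLayer(256, 256)(H1, A_hat)                      # repeated convolutions give the node feature vectors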
(3.5) the same operations are performed for the question about the vulnerability event trigger word: its syntactic dependencies are constructed and the feature vector of the question sentence is obtained.
Step 4: training the vulnerability event trigger word recognition and classification model based on the question-answering task in the BERT fine-tuning model.
(4.1) feeding the feature vectors of the question and the vulnerability description obtained in step 3 (the question feature vector is denoted A and the vulnerability description feature vector is denoted B) into the fully connected layer and the softmax layer of the BERT question-answering task, as shown in FIG. 4.
(4.2) during fine-tuning, a start vector S and an end vector E are introduced. The probability P_i that the i-th word in the vulnerability description is the start of the answer span is calculated, and the word with the highest probability is taken as the start of the answer span, with the following transformation:
P_i = exp(S · T_i) / Σ_j exp(S · T_j)
where T_i is the feature vector of word i; the end of the answer span is calculated in the same way with the formula P'_i = exp(E · T_i) / Σ_j exp(E · T_j). The score of the candidate answer from position i to position j is defined as S_ij = S · T_i + E · T_j, and the maximum-score span with j ≥ i is taken as the prediction result.
No-answer prediction is performed at the same time: a question without an answer is regarded as having an answer span that starts and ends at the [CLS] token, and the no-answer score is calculated as S_null = S · C + E · C, where C is the vector of the special token [CLS].
The no-answer score S_null is compared with the score S_ij of the best non-null span. When S_ij > S_null + τ (τ is a user-defined threshold), a non-null answer is predicted, and this answer is the vulnerability event trigger word.
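The span scoring and no-answer threshold of step (4.2) can be sketched as follows; all tensors are random stand-ins for the fine-tuned feature vectors, and the value of tau is an arbitrary example.

    import torch

    n, d = 20, 256
    T = torch.randn(n, d)                            # feature vector T_i of each word i
    C = torch.randn(d)                               # vector of the special token [CLS]
    S, E = torch.randn(d), torch.randn(d)            # start vector S and end vector E
    tau = 0.0                                        # user-defined threshold

    start_scores = T @ S                             # S . T_i for every i
    end_scores = T @ E                               # E . T_j for every j
    P = torch.softmax(start_scores, dim=0)           # P_i = exp(S.T_i) / sum_j exp(S.T_j)

    # S_ij = S.T_i + E.T_j, restricted to spans with j >= i
    span_scores = start_scores.unsqueeze(1) + end_scores.unsqueeze(0)
    valid = torch.triu(torch.ones(n, n)).bool()
    span_scores = span_scores.masked_fill(~valid, float("-inf"))
    best_score = span_scores.max()
    i, j = divmod(int(span_scores.argmax()), n)

    S_null = float(S @ C + E @ C)                    # no-answer score on the [CLS] span
    if best_score > S_null + tau:
        print(f"predicted trigger word span: words {i}..{j}")
    else:
        print("no answer: no trigger word predicted for this description")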
And (4.3) based on the vulnerability event trigger words obtained in the step (4.2), taking the feature vector of each word as the input of a logistic regression model, and calculating the probability that the trigger words belong to different vulnerability types so as to predict the categories of vulnerability events.
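Step (4.3) can be sketched with scikit-learn's logistic regression; the training features and type labels below are random stand-ins for the trigger-word feature vectors and the vulnerability type labels.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 256))            # feature vectors of known trigger words
    y_train = rng.integers(0, 5, size=200)           # vulnerability type labels (e.g., five classes)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    x_new = rng.normal(size=(1, 256))                # feature vector of a newly extracted trigger word
    probs = clf.predict_proba(x_new)                 # probability of each vulnerability type
    print("predicted vulnerability type:", int(clf.predict(x_new)[0]), "probabilities:", probs.round(3))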
Based on the same inventive concept, the invention also provides a device for extracting the read understanding vulnerability event trigger words and identifying the vulnerability type, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the computer program realizes the method for extracting the read understanding vulnerability event trigger words and identifying the vulnerability type when being loaded to the processor.
According to the invention, the answer span and its classification are used as the final output, so that recognition and classification of vulnerability event trigger words can be achieved, developers are assisted in locating the cause of a vulnerability, and the resulting vulnerability classification benefits vulnerability management and helps vulnerability mitigation. The invention makes better use of the syntactic and semantic information in vulnerability descriptions, fully mines their context information, and achieves recognition and classification of vulnerability event trigger words, which alleviates the problem of inaccurate vulnerability classification to a certain extent; compared with currently popular vulnerability classification methods, it outputs the trigger words (causes) of vulnerability events and assists developers in analyzing vulnerabilities.
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (5)

1. A reading understanding vulnerability event trigger word extraction and vulnerability type identification method is characterized by comprising the following steps:
(1) acquiring vulnerability data, acquiring a CVE-ID of a vulnerability entry, vulnerability description and vulnerability type corresponding to each ID, and designing a question Q for a vulnerability event;
(2) based on a BERT pre-training model, performing vulnerability description statement representation learning as initial node characteristics input by GCN;
(3) extracting node characteristics of vulnerability information by using a Graph Convolution Network (GCN);
(4) and recognizing and classifying vulnerability event trigger words based on the question-answering task in the BERT fine tuning model.
2. The reading understanding vulnerability event trigger word extraction and vulnerability type identification method according to claim 1, wherein the step (2) comprises the steps of:
(21) converting the designed question Q and the description Text of the vulnerability entry into an input sequence of the BERT pre-training model; a special token [CLS] is placed at the beginning to fuse the semantic information of each word in the description, and the question and the vulnerability description are separated by [SEP]; each word is converted into a Token embedding, a Segment embedding, and a Position embedding, and these embedded representations are summed to obtain a representation vector;
(22) transmitting the representation vector to the encoder layer of BERT; the Transformer, combined with the masked language model and next sentence prediction tasks, realizes a bidirectional language model, and representation learning yields an embedding vector X that serves as the initial node features input to the GCN.
3. The reading understanding vulnerability event trigger word extraction and vulnerability type identification method according to claim 1, wherein the step (3) comprises the steps of:
(31) based on the text description of the vulnerability entry, acquiring the syntactic dependency relationship of the vulnerability description text by using a Stanford syntactic analysis tool;
(32) constructing a syntactic information graph G = (V, E) of the vulnerability description according to the syntactic dependencies; where V is the set of vulnerability nodes {v_1, v_2, ..., v_i, ..., v_n}, v_i represents the i-th word in the vulnerability description, n is the number of words in the vulnerability description, and E is the set of directed edges (v_i, v_j) from node v_i to node v_j; a reverse edge (v_j, v_i) is added for each directed edge, a self-loop edge (v_i, v_i) is added for each node v_i, and a relationship type label K(v_i, v_j) is added for each edge;
(33) obtaining the adjacency matrix A based on the syntactic information graph G, i.e., if node v_i and node v_j are connected, the element a_ij in the i-th row and j-th column of the adjacency matrix A is 1, otherwise a_ij = 0; Â is the normalized matrix of the adjacency matrix A, obtained by the following transformation:
Â = D'^(-1/2) · A' · D'^(-1/2)
where A' = A + I, I is the identity matrix, and D' is the degree matrix of A';
(34) performing gradient descent training on the vulnerability node information and extracting vulnerability node features, with the following transformation:
H^(l+1) = σ(Â · H^(l) · W_K^(l))
where H^(l) is the vulnerability node information input to layer l of the graph convolutional network; the normalized matrix Â and the weight matrix W_K^(l) of layer l for the relationship type label K(v_i, v_j) apply a linear transformation, which is then passed through the nonlinear activation function σ to obtain the vulnerability node information H^(l+1) input to the next layer; after multiple rounds of convolution, the feature vector of each vulnerability node is obtained;
(35) the same operations are performed for the question about the vulnerability event trigger word: its syntactic dependencies are constructed and the feature vector of the question sentence is obtained.
4. The reading understanding vulnerability event trigger word extraction and vulnerability type identification method according to claim 1, wherein the step (4) comprises the steps of:
(41) feeding the question feature vector A and the vulnerability description feature vector B into the fully connected layer and the softmax layer of the BERT question-answering task;
(42) introducing a start vector S and an end vector E for the BERT question-answering task, and calculating the probability P_i that the i-th word in the vulnerability description is the start of the answer span; the word with the highest probability is taken as the start of the answer span, with the following transformation:
P_i = exp(S · T_i) / Σ_j exp(S · T_j)
where T_i is the feature vector of word i; the end of the answer span is calculated in the same way with the formula P'_i = exp(E · T_i) / Σ_j exp(E · T_j); the score of the candidate answer from position i to position j is defined as S_ij = S · T_i + E · T_j, and the maximum-score span with j ≥ i is taken as the prediction result;
no-answer prediction is performed at the same time: a question without an answer is regarded as having an answer span that starts and ends at the [CLS] token, and the no-answer score is calculated as S_null = S · C + E · C, where C is the vector of the special token [CLS];
the no-answer score S_null is compared with the score S_ij of the best non-null span; when S_ij > S_null + τ, where τ is a user-defined threshold, a non-null answer is predicted, and this answer is the vulnerability event trigger word;
(43) based on the vulnerability event trigger words, the feature vector of each word is used as the input of a logistic regression model, and the probability that the vulnerability event trigger words belong to different vulnerability types is calculated to predict the categories of vulnerability events.
5. A reading understanding vulnerability event trigger word extraction and vulnerability type identification apparatus, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program when loaded into the processor implements the reading understanding vulnerability event trigger word extraction and vulnerability type identification method according to any one of claims 1-4.
CN202110909147.2A 2021-08-09 2021-08-09 Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type Active CN113742733B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110909147.2A CN113742733B (en) 2021-08-09 2021-08-09 Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110909147.2A CN113742733B (en) 2021-08-09 2021-08-09 Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type

Publications (2)

Publication Number Publication Date
CN113742733A true CN113742733A (en) 2021-12-03
CN113742733B CN113742733B (en) 2023-05-26

Family

ID=78730392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110909147.2A Active CN113742733B (en) 2021-08-09 2021-08-09 Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type

Country Status (1)

Country Link
CN (1) CN113742733B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866254A (en) * 2019-09-29 2020-03-06 华为终端有限公司 Vulnerability detection method and electronic equipment
CN111274134A (en) * 2020-01-17 2020-06-12 扬州大学 Vulnerability identification and prediction method and system based on graph neural network, computer equipment and storage medium
CN111460450A (en) * 2020-03-11 2020-07-28 西北大学 Source code vulnerability detection method based on graph convolution network
CN111723182A (en) * 2020-07-10 2020-09-29 云南电网有限责任公司曲靖供电局 Key information extraction method and device for vulnerability text
CN112163416A (en) * 2020-10-09 2021-01-01 北京理工大学 Event joint extraction method for merging syntactic and entity relation graph convolution network
CN112364352A (en) * 2020-10-21 2021-02-12 扬州大学 Interpretable software vulnerability detection and recommendation method and system
CN112668013A (en) * 2020-12-31 2021-04-16 西安电子科技大学 Java source code-oriented vulnerability detection method for statement-level mode exploration

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
LILI BO et al.: "Bug Question Answering with Pretrained Encoders", 《IEEE》 *
ZHIBIN LU et al.: "VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification", 《WEB OF SCIENCE》 *
刘莉莉 [LIU Lili]: "面向产品需求分析的事件抽取研究" [Research on Event Extraction for Product Requirement Analysis], 《信息科技》 [Information Science and Technology] *
吴凡等 [WU Fan et al.]: "基于字词联合表示的中文事件检测方法" [Chinese Event Detection Method Based on Joint Character-Word Representation], 《计算机科学》 [Computer Science] *
李建鹏 [LI Jianpeng]: "面向软件语言的漏洞分析技术应用研究" [Applied Research on Vulnerability Analysis Technology for Software Languages], 《信息科技》 [Information Science and Technology] *
查云杰等 [ZHA Yunjie et al.]: "基于BERT和GCN的引文推荐模型" [A Citation Recommendation Model Based on BERT and GCN], 《计算机应用与软件》 [Computer Applications and Software] *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114239566A (en) * 2021-12-14 2022-03-25 公安部第三研究所 Method, device and processor for realizing two-step Chinese event accurate detection based on information enhancement and computer readable storage medium thereof
CN114239566B (en) * 2021-12-14 2024-04-23 公安部第三研究所 Method, device, processor and computer readable storage medium for realizing accurate detection of two-step Chinese event based on information enhancement
CN114491209A (en) * 2022-01-24 2022-05-13 南京中新赛克科技有限责任公司 Method and system for mining enterprise business label based on internet information capture
CN115329347A (en) * 2022-10-17 2022-11-11 中国汽车技术研究中心有限公司 Prediction method, device and storage medium based on car networking vulnerability data
CN116777908A (en) * 2023-08-18 2023-09-19 新疆塔林投资(集团)有限责任公司 Auxiliary method and system for plugging casing of oil-gas well
CN116777908B (en) * 2023-08-18 2023-11-03 新疆塔林投资(集团)有限责任公司 Auxiliary method and system for plugging casing of oil-gas well

Also Published As

Publication number Publication date
CN113742733B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN110765265B (en) Information classification extraction method and device, computer equipment and storage medium
US20210200961A1 (en) Context-based multi-turn dialogue method and storage medium
CN109992664B (en) Dispute focus label classification method and device, computer equipment and storage medium
CN111738004A (en) Training method of named entity recognition model and named entity recognition method
CN113011533A (en) Text classification method and device, computer equipment and storage medium
CN113742733B (en) Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type
CN110633366B (en) Short text classification method, device and storage medium
CN109726745B (en) Target-based emotion classification method integrating description knowledge
CN113806563A (en) Architect knowledge graph construction method for multi-source heterogeneous building humanistic historical material
CN113449204B (en) Social event classification method and device based on local aggregation graph attention network
CN113138920B (en) Software defect report allocation method and device based on knowledge graph and semantic role labeling
CN113157859A (en) Event detection method based on upper concept information
CN114417851A (en) Emotion analysis method based on keyword weighted information
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN117076653A (en) Knowledge base question-answering method based on thinking chain and visual lifting context learning
CN115859980A (en) Semi-supervised named entity identification method, system and electronic equipment
CN115221332A (en) Construction method and system of dangerous chemical accident event map
CN110852071A (en) Knowledge point detection method, device, equipment and readable storage medium
CN113239694B (en) Argument role identification method based on argument phrase
CN113377844A (en) Dialogue type data fuzzy retrieval method and device facing large relational database
CN113705207A (en) Grammar error recognition method and device
CN117574898A (en) Domain knowledge graph updating method and system based on power grid equipment
CN110377753B (en) Relation extraction method and device based on relation trigger word and GRU model
CN114648029A (en) Electric power field named entity identification method based on BiLSTM-CRF model
CN115204140A (en) Legal provision prediction method based on attention mechanism and knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant