CN112989831A - Entity extraction method applied to network security field - Google Patents
Entity extraction method applied to the network security field
- Publication number
- CN112989831A (application number CN202110333374.5A)
- Authority
- CN
- China
- Prior art keywords
- network security
- vector
- model
- word
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Probability & Statistics with Applications (AREA)
- Animal Behavior & Ethology (AREA)
- Databases & Information Systems (AREA)
- Computer And Data Communications (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses an entity extraction method applied to the field of network security, comprising the following steps: inputting segmented network security text data into a trained word2vec model to obtain network security domain word vectors; performing manual corpus annotation on the text data to construct a network security data set; inputting the network security data set into a SecurityBERT model to obtain character-level vectors; fusing the network security domain word vectors with the character-level vectors; and inputting the output of a BiLSTM model into a self-attention layer, where a self-attention mechanism performs local key network security word feature enhancement on the character vectors to obtain semantic information. The invention further models with the BiLSTM model and the self-attention mechanism to obtain context semantics and capture local key information, thereby improving entity extraction performance in the network security field and achieving better precision, recall and F1 scores.
Description
Technical Field
The invention relates to the field of network security, in particular to an entity extraction method applied to the field of network security.
Background
The rapid development and wide application of Internet technology have greatly promoted social prosperity and progress, but at the same time the cyberspace environment has become increasingly complex and severe. Network attacks, ransomware, Trojans, backdoor programs, security vulnerabilities and the like pose serious threats to cyberspace. Frequent network security incidents cause economic losses to countries, enterprises and individuals, and seriously affect social stability.
Cyberspace contains a great deal of valuable security information, such as network security logs, alert information and traffic data; important security data, including system logs, attack events, security blogs, security intelligence and vulnerability databases, can also be acquired from security forums and websites. This massive security data is highly valuable, and extracting effective security information from massive, fragmented network security data is an important research direction in the network security field. Entity extraction technology oriented to the network security field has therefore emerged.
Network security entity extraction is a domain-specific entity extraction technology. It generally refers to extracting entities with network-security-related semantics from unstructured network security text data, such as attackers, vulnerabilities, virus Trojans, attack methods and software. The entity extraction task generally comprises related subtasks such as ontology design; data collection, cleaning and construction; text word segmentation; and entity extraction and classification. Compared with traditional domains, data in the network security field is characterized by scarce data sets, mixed Chinese and English, mixed upper and lower case, and embedded digits; new entities appear and change frequently, fall into many categories and carry strong domain-specific features, and the same entity may even exhibit semantic diversity and ambiguity. Traditional approaches, such as word2vec pre-training combined with RNN, LSTM or CRF entity extraction models, struggle to identify such entities accurately and do not adapt well to the network security field.
Disclosure of Invention
The invention provides an entity extraction method applied to the field of network security, aiming to solve the problem that existing methods achieve low precision, recall and F1 scores for entity extraction in this field.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
an entity extraction method applied in the field of network security comprises the following steps: acquiring unstructured text data in the field of network security, and constructing a network security dictionary from the text data; preprocessing and segmenting the text data; inputting the segmented network security text data into a trained word2vec or GloVe model to obtain network security domain word vectors; performing manual corpus annotation on the text data to construct a network security data set; inputting the network security data set into a domain-pre-trained SecurityBERT model to obtain character-level vectors; fusing the network security domain word vectors with the character-level vectors output by the SecurityBERT model to obtain word-level-enhanced vectors; inputting the fused vector sequence into a BiLSTM model for further modeling, the BiLSTM model outputting character vectors containing contextual semantic feature information; inputting the output of the BiLSTM model into a self-attention layer, which performs local key network security word feature enhancement on the character vectors to obtain semantic information; and fusing the output of the self-attention layer with the output of the BiLSTM model, then inputting the result sequentially into a softmax layer and a conditional random field (CRF) model to obtain the final label sequence, i.e. the entity extraction result.
Preferably, preprocessing and segmenting the text data comprises: parsing HTML web pages with Python and the BeautifulSoup HTML parser, removing useless tag information, and retaining the core network-security-related text; removing special characters, converting between simplified and traditional Chinese, and normalizing case in the network-security-related text; and segmenting the text data with a word segmentation tool.
Preferably, performing manual corpus annotation on the text data and constructing the network security data set comprise: designing an ontology model of the network security field to obtain the categories of network security entities; and, according to the ontology model, structurally annotating the text with the brat tool and converting the structural annotation results into the BIO or BIOES annotation format.
Preferably, the step of training the SecurityBERT model comprises: performing domain pre-training on the BERT-Base-Chinese pre-trained model using collected unlabeled text data from the network security field, so that the trained SecurityBERT model is adapted to the network security domain.
Preferably, fusing the network security domain word vectors with the character-level vectors output by the SecurityBERT model uses a vector-splicing-based method and/or a vector-addition-based method.
Preferably, each character in the character-level vectors output by the SecurityBERT model has a corresponding word segmentation result in the sentence; the corresponding word vector is looked up in the network security word vector table according to that segmentation result, and the retrieved word vector is fused with the character-level vector output by the SecurityBERT model to enhance word-level features. If no corresponding word vector can be found in the network security word vector table for the segmentation result, a < padding > vector or a random vector is fused with the character-level vector output by the SecurityBERT model instead. If one character corresponds to multiple segmentation results, either all retrieved word vectors are fused with the character-level vector output by the SecurityBERT model, or one or more of the retrieved word vectors are selected for fusion.
Preferably, inputting the word vector sequence into a BiLSTM model for further modeling, the BiLSTM model outputting character vectors containing contextual semantic feature information, comprises: splicing the output vectors of the forward LSTM and the backward LSTM in the BiLSTM model to obtain feature vectors with context information.
Preferably, the local key network security word feature enhancement of the character vectors using a self-attention mechanism comprises: the self-attention mechanism assigns, through a weighting method, weights greater than M to network security words in the sentence whose importance value is greater than K, thereby realizing local key network security word feature enhancement; K is greater than 0 and M is greater than 0, and the weights are calculated with a scaled dot-product function.
Preferably, fusing the output of the self-attention layer with the output of the BiLSTM model and sequentially inputting the result into the softmax layer and the conditional random field CRF model to obtain the final label sequence comprises: adding or splicing the output vector of the self-attention layer and the output vector of the BiLSTM layer to obtain a new vector; and inputting the new vector into a softmax layer for multi-class classification and probability normalization, then into a CRF layer for sequence label transition modeling, and outputting the label sequence, i.e. the entity extraction result.
Compared with the prior art, the embodiment of the invention has the beneficial effects that:
the entity extraction method applied to the network security field of the invention carries out field pre-training based on the BERT model to obtain a SecurityBERT model facing the network security field, has field adaptability, is more suitable for downstream security entity extraction tasks, simultaneously fuses network security word vectors and SecurityBERT word vectors, enhances the expression capability of word level, is more easy to distinguish the boundary information of the network security entity, further models by using a BilSTM model and a self-attention mechanism, obtains context semantics and captures local key information, improves the entity extraction performance of the network security field, obtains better accuracy rate, recall rate and F1 value, also improves the automatic extraction capability of the information of the network security field, greatly reduces the workload of security expert analysis, and lays a foundation for the construction of a subsequent network security knowledge map.
Drawings
Fig. 1 is a flowchart illustrating an entity extraction method applied in the field of network security according to this embodiment.
Fig. 2 is a structural diagram of a model of an entity extraction method applied in the network security field according to the present embodiment.
Detailed Description
The invention is further described below with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the workflow of the entity extraction method applied in the network security field provided in this embodiment comprises the following steps:
step 101: acquiring unstructured text data in the field of network security from the Internet, and constructing a network security dictionary according to the text data;
step 102: cleaning, preprocessing and segmenting the text data;
step 103: carrying out artificial corpus annotation on the text data to construct a network security data set;
step 104: pre-training a SecurityBERT model oriented to the network security field;
step 105: inputting the segmented network security text data into a trained word2vec or GloVe model to obtain network security domain word vectors;
step 106: inputting the data set into a SecurityBERT model to obtain character-level vector output;
step 107: fusing the network security domain word vectors with the character-level vectors output by SecurityBERT;
step 108: inputting the fused word vector sequence into a BiLSTM model to further model contextual semantic features;
step 109: performing local key network security word feature enhancement on the character vector by using a self-attention mechanism;
step 110: fusing the outputs of the self-attention layer and the BiLSTM layer and inputting the result into a softmax layer and a CRF model;
step 111: and outputting the extraction result of the network security entity.
The entities include vulnerabilities (Vulnerability), software (Software), malware (Malware), and the like. A vulnerability is a flaw in the concrete implementation of hardware, software, protocols, or a system's security policy, for example the EternalBlue vulnerability, UAF vulnerabilities, or CVE-2018-5002. Software denotes an entity that runs on a computer, such as data, a program or a business system, for example Office, the IE browser, softenable, or a Web server. Malware denotes software or files that perform unauthorized functions on or compromise a computer system, for example decoy documents, the Havex Trojan, or remote-access Trojans.
In this embodiment, the method in step 102 specifically comprises: parsing HTML web pages with Python and the BeautifulSoup HTML parser, removing useless tag information, and retaining the core network-security-related text content; applying preprocessing operations to the network security text such as special-character removal, simplified/traditional Chinese conversion and case normalization; and segmenting the text data with jieba or another word segmentation tool.
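The embodiment names BeautifulSoup and jieba for step 102; as those are third-party packages, the following is a minimal standard-library-only sketch of the same cleaning steps (tag stripping, script/style removal, whitespace and case normalization). The segmentation step itself is left to a dedicated tool.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>/<style> blocks."""
    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def clean(html: str) -> str:
    p = TextExtractor()
    p.feed(html)
    text = " ".join(p.chunks)
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text.lower()                       # unify case, as in step 102

page = "<html><body><h1>CVE-2018-5002</h1><script>var x=1;</script><p>Flash 漏洞利用</p></body></html>"
print(clean(page))  # "cve-2018-5002 flash 漏洞利用"
```

A production pipeline would substitute `BeautifulSoup(page, "html.parser").get_text()` for the parser class and pass the cleaned text to `jieba.cut`.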
In this embodiment, the method in step 103 specifically comprises: first, designing an ontology model of the network security field to obtain the categories of network security entities; then, structurally annotating the text with the brat tool according to the ontology model; and finally, converting the structural annotation results into the BIO or BIOES annotation format.
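The conversion from brat's character-offset annotations to the BIO format in step 103 can be sketched as follows. This is an illustrative reconstruction, not the patent's code: it assumes spans are given as `(start, end, label)` triples, as in a brat `.ann` file, and emits one BIO tag per character, as is common for Chinese NER.

```python
def brat_to_bio(text, spans):
    """spans: list of (start, end_exclusive, label) triples from brat.
    Returns (character, BIO-tag) pairs at character level."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = "B-" + label            # first character of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label            # remaining characters
    return list(zip(text, tags))

sample = "Havex木马感染工控系统"
# one annotated Malware entity covering "Havex木马" (characters 0..6)
pairs = brat_to_bio(sample, [(0, 7, "Malware")])
print(pairs[0])  # ('H', 'B-Malware')
print(pairs[5])  # ('木', 'I-Malware')
print(pairs[7])  # ('感', 'O')
```

Extending to BIOES only requires rewriting the last in-entity tag to `E-` (or `S-` for single-character entities).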
In this embodiment, the method in step 104 specifically comprises: based on the BERT-Base-Chinese pre-trained model, performing in-depth domain pre-training with the collected massive unlabeled text data of the network security field, so that the trained SecurityBERT model is adapted to the network security domain.
In this embodiment, the method in step 107 specifically comprises: the SecurityBERT model outputs a vector for each character, and each character has a corresponding word segmentation result in the sentence; the corresponding word vector is looked up in the network security word vector table according to that segmentation result and fused with the character-level vector output by the SecurityBERT model to enhance word-level features. For words that are not found, a < padding > vector or a random vector is used instead. When one character corresponds to multiple segmentation results, all the word vectors may be fused with the character-level vector output by the SecurityBERT model, or one or more word vectors may be selected according to a strategy for fusion. The fusion method may be based on vector splicing, vector addition, or a combination of methods.
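A minimal numpy sketch of the step-107 fusion, covering both the splicing and addition variants and the < padding > fallback for out-of-vocabulary words. Dimensions and table contents are toy values chosen for illustration (real SecurityBERT character vectors are 768-dimensional).

```python
import numpy as np

DIM = 4  # toy dimension; BERT-Base character vectors are 768-d
word_table = {"漏洞": np.ones(DIM), "<padding>": np.zeros(DIM)}

def fuse(char_vec, word, table, mode="concat"):
    """Look up the word vector for the character's segmentation result;
    fall back to the <padding> vector when the word is out of vocabulary."""
    w = table.get(word, table["<padding>"])
    if mode == "concat":
        return np.concatenate([char_vec, w])  # vector-splicing fusion
    return char_vec + w                       # vector-addition fusion

c = np.full(DIM, 0.5)                         # stand-in character vector
print(fuse(c, "漏洞", word_table).shape)      # (8,): dimension doubles after splicing
print(fuse(c, "未登录词", word_table, "add")) # OOV word adds zeros, char vector unchanged
```

Splicing preserves both representations at the cost of a larger input dimension for the BiLSTM; addition keeps the dimension fixed but mixes the two spaces.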
In this embodiment, the method in step 108 specifically comprises: splicing the output vector of the forward LSTM with the output vector of the backward LSTM to obtain a feature vector with context information.
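The BiLSTM splicing in step 108 amounts to concatenating, at every sequence position, the forward and backward hidden states, so a hidden size of h yields a 2h-dimensional context vector per character. A numpy illustration of just the splicing (random arrays stand in for trained LSTM outputs):

```python
import numpy as np

T, H = 5, 3                    # sequence length, hidden size
fwd = np.random.rand(T, H)     # stand-in for forward-LSTM outputs
bwd = np.random.rand(T, H)     # stand-in for backward-LSTM outputs
ctx = np.concatenate([fwd, bwd], axis=-1)  # per-position splice
print(ctx.shape)               # (5, 6): each position carries both directions
```

In PyTorch, `nn.LSTM(..., bidirectional=True)` produces this concatenated form directly.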
In this embodiment, the method in step 109 specifically comprises: the self-attention mechanism assigns higher weights to the more important network security words in the sentence through a weighting method, realizing local key network security word feature enhancement; the weights are calculated with a scaled dot-product function.
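The scaled dot-product weighting of step 109 can be sketched in numpy as below. For simplicity Q = K = V = X (a real self-attention layer would first project X with learned matrices); each output row is a weighted mixture of all positions, with weights given by a softmax over scaled dot products.

```python
import numpy as np

def scaled_dot_attention(X):
    """X: (T, d) sequence of BiLSTM outputs. Returns attended sequence and weights."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                 # (T, T) scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # softmax rows = attention weights
    return w @ X, w

X = np.random.rand(4, 8)
out, w = scaled_dot_attention(X)
print(out.shape, np.allclose(w.sum(axis=1), 1.0))  # (4, 8) True
```

Positions with large dot products against many others receive the higher weights, which is how the mechanism emphasizes locally important security words.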
In this embodiment, the method in step 110 specifically comprises: adding or splicing the output vector of the self-attention layer and the output vector of the BiLSTM layer to obtain a new vector; inputting the new vector into a softmax layer for multi-class classification and probability normalization, then into a CRF layer for sequence label transition modeling; and outputting the label sequence, i.e. the entity extraction result.
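The CRF layer in step 110 contributes label-transition modeling: decoding picks the label sequence that maximizes emission plus transition scores, which is Viterbi decoding. The toy transition matrix below is hand-made for illustration (a real CRF learns it during training); it forbids the illegal O → I transition so the CRF can repair a locally wrong softmax emission.

```python
import numpy as np

def viterbi(emissions, transitions, labels):
    """Best label sequence under emission + transition scores (log-domain)."""
    T, L = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)   # best previous label for each current label
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [labels[i] for i in reversed(path)]

labels = ["O", "B-Malware", "I-Malware"]
NEG = -1e4
trans = np.array([[0.0, 0.0, NEG],   # O -> I forbidden
                  [0.0, NEG, 0.0],   # B -> B forbidden in this toy
                  [0.0, 0.0, 0.0]])
emis = np.array([[0.1, 2.0, 0.0],    # char 1: looks like B-Malware
                 [0.0, 1.5, 1.0],    # char 2: emission prefers B, CRF corrects to I
                 [2.0, 0.0, 0.0]])   # char 3: O
print(viterbi(emis, trans, labels))  # ['B-Malware', 'I-Malware', 'O']
```

Note the middle character: its softmax emission alone favors B-Malware, but the transition constraints make B → I → O the highest-scoring path, illustrating why the CRF layer follows the softmax.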
The above-described embodiments are only preferred embodiments of the present invention. It should be understood that many variations and modifications can be made by one of ordinary skill in the art in light of the above inventive concept without creative effort. Therefore, technical solutions that those skilled in the art can obtain through logical analysis, reasoning or limited experimentation based on the prior art and the concept of the present invention shall fall within the scope of protection defined by the claims.
Claims (9)
1. An entity extraction method applied in the field of network security is characterized by comprising the following steps:
acquiring unstructured text data in the field of network security, and constructing a network security dictionary according to the text data;
preprocessing text data and segmenting words; inputting the segmented network security text data into a trained word2vec model or Glove model to obtain a network security field word vector;
carrying out artificial corpus annotation on the text data to construct a network security data set; inputting the network security data set into a SecurityBERT model which is subjected to field pre-training to obtain a character-level vector;
fusing the word vector of the network security field and the character-level vector output by the SecurityBERT model to obtain a word vector enhanced by the network security word level;
inputting the word vector sequence into a BiLSTM model for further modeling, wherein the BiLSTM model outputs character vectors containing contextual semantic feature information;
inputting the output of the BiLSTM model into a self-attention layer, and performing local key network security word feature enhancement on the character vectors by using a self-attention mechanism to obtain semantic information;
and fusing the output of the self-attention layer with the output of the BiLSTM model, and sequentially inputting the fused output into the softmax layer and the conditional random field CRF model to obtain the final label sequence, namely the entity extraction result.
2. The entity extraction method applied to the network security field as claimed in claim 1, wherein the preprocessing and word segmentation of the text data comprises:
parsing the HTML web page by using Python and the BeautifulSoup HTML parser, removing useless tag information, and retaining the core network-security-related text;
removing special characters, converting between simplified and traditional Chinese, and normalizing case in the network-security-related text;
and segmenting the text data using a word segmentation tool.
3. The entity extraction method applied to the network security field of claim 1, wherein the manual corpus labeling is performed on the text data, and the constructing of the network security data set comprises:
designing an ontology model of the network security field to obtain the categories of network security entities;
and according to the ontology model, carrying out structural annotation on the text by using a brat tool, and converting the structural annotation result into a BIO or BIOES annotation format.
4. The entity extraction method applied in the network security field according to claim 1, wherein the step of training the SecurityBERT model comprises:
and performing field pre-training on the BERT-Base-Chinese pre-training model by using the collected text data of the unmarked network security field, so that the trained SecurityBERT model has network security field adaptability.
5. The entity extraction method applied to the network security domain according to claim 1, wherein fusing the network security domain word vector and the character-level vector output by the SecurityBERT model comprises: vector stitching based methods and/or vector addition based methods.
6. The entity extraction method applied in the network security field according to claim 5, wherein each character in the character level vector output by the SecurityBERT model has a corresponding word segmentation result in a sentence, the corresponding word vector is searched in the network security word vector table according to the word segmentation result, and the searched word vector is fused with the character level vector output by the SecurityBERT model to enhance the word level characteristics;
if no corresponding word vector can be found in the network security word vector table according to the word segmentation result, fusing a < padding > vector or a random vector with the character-level vector output by the SecurityBERT model;
if one character corresponds to a plurality of word segmentation results, all searched word vectors are fused with the character level vectors output by the SecurityBERT model, or one or more searched word vectors are selected to be fused with the character level vectors output by the SecurityBERT model.
7. The entity extraction method applied in the network security field of claim 1, wherein inputting the word vector sequence into a BiLSTM model for further modeling, the BiLSTM model outputting the character vectors containing contextual semantic feature information, comprises:
splicing the output vector of the forward LSTM with the output vector of the backward LSTM in the BiLSTM model to obtain feature vectors with context information.
8. The entity extraction method applied to the network security field according to claim 1, wherein the local key network security word feature enhancement of the character vector using a self-attention mechanism comprises:
the self-attention mechanism assigns, through a weighting method, weights greater than M to network security words in the sentence whose importance value is greater than K, thereby realizing local key network security word feature enhancement; K is greater than 0 and M is greater than 0, and the weights are calculated with a scaled dot-product function.
9. The method for extracting entities applied in the network security field of claim 1, wherein fusing the output of the self-attention layer with the output of the BiLSTM model and then sequentially inputting the fused output into the softmax layer and the conditional random field CRF model to obtain the final label sequence comprises:
adding or splicing the output vector of the self-attention layer and the output vector of the BiLSTM layer to obtain a new vector;
and inputting the new vector into a softmax layer for multi-class classification and probability normalization, then into a CRF layer for sequence label transition modeling, and outputting the label sequence, namely the entity extraction result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110333374.5A CN112989831B (en) | 2021-03-29 | 2021-03-29 | Entity extraction method applied to network security field |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112989831A true CN112989831A (en) | 2021-06-18 |
CN112989831B CN112989831B (en) | 2023-04-28 |
Family
ID=76337838
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110333374.5A Active CN112989831B (en) | 2021-03-29 | 2021-03-29 | Entity extraction method applied to network security field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112989831B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113673219A (en) * | 2021-08-20 | 2021-11-19 | 合肥中科类脑智能技术有限公司 | Power failure plan text analysis method |
CN113743104A (en) * | 2021-08-31 | 2021-12-03 | 合肥智能语音创新发展有限公司 | Entity linking method and related device, electronic equipment and storage medium |
CN114297987A (en) * | 2022-03-09 | 2022-04-08 | 杭州实在智能科技有限公司 | Document information extraction method and system based on text classification and reading understanding |
CN115587594A (en) * | 2022-09-20 | 2023-01-10 | 广东财经大学 | Network security unstructured text data extraction model training method and system |
CN115687754A (en) * | 2022-10-21 | 2023-02-03 | 四川大学 | Active network information mining method based on intelligent conversation |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9992209B1 (en) * | 2016-04-22 | 2018-06-05 | Awake Security, Inc. | System and method for characterizing security entities in a computing environment |
CN109299262A (en) * | 2018-10-09 | 2019-02-01 | 中山大学 | A kind of text implication relation recognition methods for merging more granular informations |
CN111310470A (en) * | 2020-01-17 | 2020-06-19 | 西安交通大学 | Chinese named entity recognition method fusing word and word features |
CN111460820A (en) * | 2020-03-06 | 2020-07-28 | 中国科学院信息工程研究所 | Network space security domain named entity recognition method and device based on pre-training model BERT |
CN111709241A (en) * | 2020-05-27 | 2020-09-25 | 西安交通大学 | Named entity identification method oriented to network security field |
CN111783462A (en) * | 2020-06-30 | 2020-10-16 | 大连民族大学 | Chinese named entity recognition model and method based on dual neural network fusion |
CN111914097A (en) * | 2020-07-13 | 2020-11-10 | 吉林大学 | Entity extraction method and device based on attention mechanism and multi-level feature fusion |
WO2020252950A1 (en) * | 2019-06-17 | 2020-12-24 | 五邑大学 | Named entity recognition method for medical texts based on pre-training model and fine turning technology |
US20210021621A1 (en) * | 2019-07-16 | 2021-01-21 | Hewlett Packard Enterprise Development Lp | Methods and systems for using embedding from natural language processing (nlp) for enhanced network analytics |
Legal events
- 2021-03-29: CN application CN202110333374.5A filed; granted as patent CN112989831B (status: Active)
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9992209B1 (en) * | 2016-04-22 | 2018-06-05 | Awake Security, Inc. | System and method for characterizing security entities in a computing environment |
CN109299262A (en) * | 2018-10-09 | 2019-02-01 | 中山大学 | A kind of text implication relation recognition methods for merging more granular informations |
WO2020252950A1 (en) * | 2019-06-17 | 2020-12-24 | 五邑大学 | Named entity recognition method for medical texts based on pre-training model and fine turning technology |
US20210021621A1 (en) * | 2019-07-16 | 2021-01-21 | Hewlett Packard Enterprise Development Lp | Methods and systems for using embedding from natural language processing (nlp) for enhanced network analytics |
CN111310470A (en) * | 2020-01-17 | 2020-06-19 | 西安交通大学 | Chinese named entity recognition method fusing word and word features |
CN111460820A (en) * | 2020-03-06 | 2020-07-28 | 中国科学院信息工程研究所 | Network space security domain named entity recognition method and device based on pre-training model BERT |
CN111709241A (en) * | 2020-05-27 | 2020-09-25 | 西安交通大学 | Named entity identification method oriented to network security field |
CN111783462A (en) * | 2020-06-30 | 2020-10-16 | 大连民族大学 | Chinese named entity recognition model and method based on dual neural network fusion |
CN111914097A (en) * | 2020-07-13 | 2020-11-10 | 吉林大学 | Entity extraction method and device based on attention mechanism and multi-level feature fusion |
Non-Patent Citations (2)
Title |
---|
Shen Si et al.: "Research on an automatic entity extraction model for food safety events based on deep learning", Information & Computer (Theoretical Edition) * |
Lu Yiqin et al.: "SDN topology attacks and their defense", Journal of South China University of Technology (Natural Science Edition) * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113673219A (en) * | 2021-08-20 | 2021-11-19 | 合肥中科类脑智能技术有限公司 | Power failure plan text analysis method |
CN113743104A (en) * | 2021-08-31 | 2021-12-03 | 合肥智能语音创新发展有限公司 | Entity linking method and related device, electronic equipment and storage medium |
CN113743104B (en) * | 2021-08-31 | 2024-04-16 | 合肥智能语音创新发展有限公司 | Entity linking method, related device, electronic equipment and storage medium |
CN114297987A (en) * | 2022-03-09 | 2022-04-08 | 杭州实在智能科技有限公司 | Document information extraction method and system based on text classification and reading understanding |
CN114297987B (en) * | 2022-03-09 | 2022-07-19 | 杭州实在智能科技有限公司 | Document information extraction method and system based on text classification and reading understanding |
CN115587594A (en) * | 2022-09-20 | 2023-01-10 | 广东财经大学 | Network security unstructured text data extraction model training method and system |
CN115587594B (en) * | 2022-09-20 | 2023-06-30 | 广东财经大学 | Unstructured text data extraction model training method and system for network security |
CN115687754A (en) * | 2022-10-21 | 2023-02-03 | 四川大学 | Active network information mining method based on intelligent conversation |
CN115687754B (en) * | 2022-10-21 | 2024-01-23 | 四川大学 | Active network information mining method based on intelligent dialogue |
Also Published As
Publication number | Publication date |
---|---|
CN112989831B (en) | 2023-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | A stacking model using URL and HTML features for phishing webpage detection | |
CN112989831B (en) | Entity extraction method applied to network security field | |
CN109005145B (en) | Malicious URL detection system and method based on automatic feature extraction | |
CN103544255B (en) | Text semantic relativity based network public opinion information analysis method | |
KR102452123B1 (en) | Apparatus for Building Big-data on unstructured Cyber Threat Information, Method for Building and Analyzing Cyber Threat Information | |
CN112307473A (en) | Malicious JavaScript code detection model based on Bi-LSTM network and attention mechanism | |
CN113596007B (en) | Vulnerability attack detection method and device based on deep learning | |
CN111931935B (en) | Network security knowledge extraction method and device based on One-shot learning | |
Liu et al. | Multi-scale semantic deep fusion models for phishing website detection | |
CN112148956A (en) | Hidden net threat information mining system and method based on machine learning | |
Gong et al. | Model uncertainty based annotation error fixing for web attack detection | |
CN111538893B (en) | Method for extracting network security new words from unstructured data | |
Kim et al. | Towards attention based vulnerability discovery using source code representation | |
CN112445862A (en) | Internet of things equipment data set construction method and device, electronic equipment and storage medium | |
CN115567306B (en) | APT attack traceability analysis method based on bidirectional long-short-term memory network | |
CN116702143A (en) | Intelligent malicious software detection method based on API (application program interface) characteristics | |
Li et al. | PipCKG-BS: A Method to Build Cybersecurity Knowledge Graph for Blockchain Systems via the Pipeline Approach | |
CN110413909B (en) | Machine learning-based intelligent identification method for online firmware of large-scale embedded equipment | |
Zhu et al. | SQL Injection Attack Detection Framework Based on HTTP Traffic | |
Khan | Detecting phishing attacks using nlp | |
Zhang et al. | Survey of research on named entity recognition in cyber threat intelligence | |
Sithole et al. | Attributes extraction for fine-grained differentiation of the Internet of Things patterns | |
Wan et al. | Generation of malicious webpage samples based on GAN | |
Zhen et al. | Chinese Cyber Threat Intelligence Named Entity Recognition via RoBERTa-wwm-RDCNN-CRF. | |
CN117278322B (en) | Web intrusion detection method, device, terminal equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||