CN112989831A - Entity extraction method applied to network security field - Google Patents

Entity extraction method applied to network security field

Publication number
CN112989831A
Authority
CN
China
Prior art keywords
network security
vector
model
word
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110333374.5A
Other languages
Chinese (zh)
Other versions
CN112989831B (en)
Inventor
陆以勤
陈帅豪
覃健诚
谢树禄
李智鹏
洪炜妍
陈嘉睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202110333374.5A
Publication of CN112989831A
Application granted
Publication of CN112989831B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Computer And Data Communications (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an entity extraction method applied to the field of network security, which comprises the following steps: inputting the segmented network security text data into a trained word2vec model to obtain network security domain word vectors; manually annotating the text data to construct a network security data set; inputting the network security data set into a SecurityBERT model to obtain character-level vectors; fusing the network security domain word vectors with the character-level vectors; inputting the fused vector sequence into a BiLSTM model; and inputting the output of the BiLSTM model into a self-attention layer, where a self-attention mechanism enhances the features of locally important network security words in the character vectors to obtain semantic information. By further modeling with a BiLSTM model and a self-attention mechanism, the invention obtains context semantics and captures locally important information, thereby improving entity extraction performance in the field of network security and achieving better accuracy, recall, and F1 score.

Description

Entity extraction method applied to network security field
Technical Field
The invention relates to the field of network security, in particular to an entity extraction method applied to the field of network security.
Background
The rapid development and wide application of internet technology have greatly promoted the prosperity and progress of society, but at the same time the cyberspace environment has become increasingly complex and hostile. Network attacks of many kinds, ransomware, trojans, backdoor programs, security vulnerabilities and the like pose serious threats to cyberspace. Frequent network security incidents cause economic losses to countries, enterprises and individuals, and seriously affect social stability.
Cyberspace contains a great deal of valuable security information, such as network security logs, alarm information and traffic data, as well as important security data that can be obtained from security forums or websites, including system logs, attack events, security blogs, security intelligence, and vulnerability databases. This massive body of security data is highly valuable, and how to extract useful security information from massive, fragmented network security data is an important research direction in the field of network security. Entity extraction technology oriented to the network security field has emerged to meet this need.
Network security entity extraction is a domain-specific entity extraction technology. It generally refers to extracting entities with network-security-related semantics from unstructured network security text data, such as attackers, vulnerabilities, viruses and trojans, attack methods, software, and the like. The entity extraction task generally comprises related subtasks such as ontology design; data collection, cleaning and construction; text word segmentation; and entity extraction and classification. Compared with traditional domains, data in the network security field has few available data sets and mixes Chinese and English, upper and lower case, and digits. New entities are added and changed frequently, entity categories are numerous and strongly domain-specific, and the same entity may even have multiple or ambiguous meanings. Traditional entity extraction approaches, such as word2vec pre-training, RNN and LSTM models, and CRF models, have difficulty identifying such entities accurately and do not adapt well to the field of network security.
Disclosure of Invention
The invention provides an entity extraction method applied to the field of network security, which aims to solve the problem that existing methods achieve low entity extraction accuracy, recall, and F1 score in the field of network security.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
an entity extraction method applied in the field of network security comprises the following steps: acquiring unstructured text data in the field of network security, and constructing a network security dictionary according to the text data; preprocessing the text data and performing word segmentation; inputting the segmented network security text data into a trained word2vec model or GloVe model to obtain network security domain word vectors; manually annotating the text data to construct a network security data set; inputting the network security data set into a domain-pre-trained SecurityBERT model to obtain character-level vectors; fusing the network security domain word vectors with the character-level vectors output by the SecurityBERT model to obtain vectors enhanced with network security word-level features; inputting the enhanced vector sequence into a BiLSTM model for further modeling, wherein the BiLSTM model outputs character vectors containing context semantic feature information; inputting the output of the BiLSTM model into a self-attention layer, and using a self-attention mechanism to enhance the features of locally important network security words in the character vectors to obtain semantic information; and fusing the output of the self-attention layer with the output of the BiLSTM model, and inputting the fused output sequentially into a softmax layer and a conditional random field (CRF) model to obtain a final label sequence, namely the entity extraction result.
Preferably, preprocessing and word segmentation of the text data comprises: parsing HTML web pages with Python and the BeautifulSoup HTML parser, removing useless tag information, and retaining the core network-security-related text; removing special characters from the network-security-related text, converting between simplified and traditional Chinese, and converting letter case; and segmenting the text data into words using a word segmentation tool.
Preferably, manually annotating the text data to construct the network security data set comprises: designing an ontology model of the network security field to obtain the categories of network security entities; and, according to the ontology model, performing structured annotation of the text using the brat tool, and converting the structured annotation result into the BIO or BIOES annotation format.
Preferably, the step of training the SecurityBERT model comprises: performing domain pre-training on the BERT-Base-Chinese pre-trained model using the collected unlabeled text data of the network security field, so that the trained SecurityBERT model is adapted to the network security domain.
Preferably, fusing the network security domain word vectors and the character-level vectors output by the SecurityBERT model comprises: a method based on vector concatenation and/or a method based on vector addition.
Preferably, each character in the character-level vectors output by the SecurityBERT model has a corresponding word segmentation result in the sentence; the corresponding word vector is looked up in the network security word vector table according to the word segmentation result, and the word vector found is fused with the character-level vector output by the SecurityBERT model to enhance word-level features; if no corresponding word vector is found in the network security word vector table according to the word segmentation result, a <padding> vector or a random vector is fused with the character-level vector output by the SecurityBERT model; if one character corresponds to multiple word segmentation results, either all of the word vectors found are fused with the character-level vector output by the SecurityBERT model, or one or more of the word vectors found are selected and fused with the character-level vector output by the SecurityBERT model.
Preferably, inputting the vector sequence into the BiLSTM model for further modeling, with the BiLSTM model outputting character vectors containing context semantic feature information, comprises: concatenating the output vector of the forward LSTM and the output vector of the backward LSTM in the BiLSTM model to obtain a feature vector with context information.
Preferably, enhancing the features of locally important network security words in the character vectors using a self-attention mechanism comprises: through weighting, the self-attention mechanism assigns weights greater than M to network security words in the sentence whose importance value is greater than K, thereby enhancing the features of locally important network security words; K is greater than 0 and M is greater than 0, and the weights are computed with a scaled dot-product function.
Preferably, fusing the output of the self-attention layer and the output of the BiLSTM model, and inputting the fused output sequentially into the softmax layer and the conditional random field (CRF) model to obtain the final label sequence, comprises: adding or concatenating the output vector of the self-attention layer and the output vector of the BiLSTM layer to obtain a new vector; and inputting the new vector into a softmax layer for multi-class classification and probability normalization, then into a CRF layer that models transitions between sequence labels, and outputting a label sequence, namely the entity extraction result.
Compared with the prior art, the embodiment of the invention has the beneficial effects that:
the entity extraction method of the invention performs domain pre-training based on the BERT model to obtain a SecurityBERT model oriented to the network security field, which is domain-adapted and better suited to downstream security entity extraction tasks. The method also fuses network security word vectors with the SecurityBERT character-level vectors, which strengthens word-level representation and makes the boundaries of network security entities easier to distinguish. Further modeling with a BiLSTM model and a self-attention mechanism obtains context semantics and captures locally important information. Together these improve entity extraction performance in the network security field, giving better accuracy, recall, and F1 score. The method also improves the ability to extract network security information automatically, greatly reduces the analysis workload of security experts, and lays a foundation for the subsequent construction of a network security knowledge graph.
Drawings
Fig. 1 is a flowchart illustrating an entity extraction method applied in the field of network security according to this embodiment.
Fig. 2 is a structural diagram of a model of an entity extraction method applied in the network security field according to the present embodiment.
Detailed Description
The invention is further described below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the entity extraction method applied in the network security field provided in this embodiment includes the following steps:
step 101: acquiring unstructured text data in the field of network security from the Internet, and constructing a network security dictionary according to the text data;
step 102: cleaning, preprocessing, and segmenting the text data into words;
step 103: manually annotating the text data to construct a network security data set;
step 104: performing domain pre-training to obtain a SecurityBERT model oriented to the network security field;
step 105: inputting the segmented network security text data into a trained word2vec model or GloVe model to obtain network security domain word vectors;
step 106: inputting the data set into the SecurityBERT model to obtain character-level vector output;
step 107: fusing the network security domain word vectors with the character-level vectors output by SecurityBERT;
step 108: inputting the fused vector sequence into a BiLSTM model to further model context semantic features;
step 109: enhancing the features of locally important network security words in the character vectors using a self-attention mechanism;
step 110: fusing the outputs of the self-attention layer and the BiLSTM layer and inputting the fused output into a softmax layer and a CRF model;
step 111: outputting the network security entity extraction result.
The entity types include Vulnerability, Software, Malware, and the like. A vulnerability is a flaw in the specific implementation of hardware, software, or protocols, or in a system security policy. For example: the EternalBlue vulnerability, UAF vulnerabilities, CVE-2018-5002, etc. Software is an entity that runs on a computer, such as data, a program, or a business system. For example: Office, the IE browser, softenable, Web servers, etc. Malware is software or files that perform unauthorized functions or processes on a computer system. For example: decoy documents, the Havex trojan, remote-control trojans, and the like.
In this embodiment, the method in step 102 specifically includes: parsing HTML web pages with Python and the BeautifulSoup HTML parser, removing useless tag information, and retaining the core network-security-related text content; applying preprocessing operations to the network security text such as special character removal, conversion between simplified and traditional Chinese, and letter case conversion; and segmenting the text data into words using jieba or another word segmentation tool.
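The cleaning and segmentation in step 102 can be sketched roughly as follows. This is a minimal illustration, not the embodiment's implementation: to stay dependency-free it uses Python's standard `html.parser` in place of BeautifulSoup and a simple forward-maximum-matching segmenter in place of jieba. The function names `clean_html` and `segment`, the choice to lowercase, and the set of characters kept are all assumptions. Simplified/traditional conversion is omitted because it needs an external mapping table.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def clean_html(html):
    """Strip tags, drop special characters, and lowercase (assumed case policy)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Keep CJK characters, ASCII letters and digits, and the symbols . - _
    # that commonly appear inside identifiers such as CVE numbers.
    text = re.sub(r"[^\u4e00-\u9fffA-Za-z0-9.\-_ ]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def segment(text, dictionary, max_len=16):
    """Forward maximum matching against a security dictionary.

    Stand-in for a real segmenter: at each position take the longest
    dictionary entry, falling back to a single character.
    """
    tokens = []
    for piece in text.split():
        i = 0
        while i < len(piece):
            for j in range(min(len(piece), i + max_len), i, -1):
                if j - i == 1 or piece[i:j] in dictionary:
                    tokens.append(piece[i:j])
                    i = j
                    break
    return tokens
```

The dictionary passed to `segment` plays the role of the network security dictionary built in step 101, which is what keeps identifiers such as CVE numbers from being split apart.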
In this embodiment, the method in step 103 specifically includes: first, designing an ontology model of the network security field to obtain the categories of network security entities; then, according to the ontology model, performing structured annotation of the text using the brat tool; and finally, converting the result of the structured annotation into the BIO or BIOES annotation format.
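The conversion from span annotations to BIO or BIOES tags can be illustrated as below. The sketch assumes annotations have already been read from the brat output into `(start, end, label)` character offsets with an exclusive end. Parsing of brat's own file format is not shown, and the function names are hypothetical.

```python
def spans_to_bio(text, spans):
    """Turn (start, end, label) character spans into one BIO tag per character."""
    tags = ["O"] * len(text)
    for start, end, label in sorted(spans):
        if start >= end or any(t != "O" for t in tags[start:end]):
            continue  # skip empty spans and spans overlapping an earlier one
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def bio_to_bioes(tags):
    """Rewrite BIO tags as BIOES: single-character entities become S-,
    and the last character of a longer entity becomes E-."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        label = tag[2:]
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == "I-" + label
        if tag.startswith("B-"):
            out.append(tag if continues else "S-" + label)
        else:  # an I- tag
            out.append(tag if continues else "E-" + label)
    return out
```

Tagging at character level matches the character-level vectors that SecurityBERT produces later in the pipeline, so each vector has exactly one target label.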
In this embodiment, the method in step 104 specifically includes: starting from the BERT-Base-Chinese pre-trained model, the collected large volume of unlabeled network security text data is used for further in-domain pre-training, so that the trained SecurityBERT model is adapted to the network security domain.
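The text does not state which pre-training objective is used. Assuming it continues BERT's standard masked-language-model objective, the way training examples are prepared from unlabeled security text might look like the sketch below. The 15% selection rate and the 80/10/10 replacement split follow the original BERT recipe; the function name `mask_for_mlm` is hypothetical, and the model update itself is not shown.

```python
import random


def mask_for_mlm(chars, vocab, rng, mask_prob=0.15):
    """Build one masked-LM example from a list of characters.

    Returns (inputs, labels). labels[i] holds the original character at
    positions selected for prediction and None everywhere else.
    """
    inputs = list(chars)
    labels = [None] * len(chars)
    for i, ch in enumerate(chars):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = ch
        r = rng.random()
        if r < 0.8:
            inputs[i] = "[MASK]"           # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original character unchanged
    return inputs, labels
```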
In this embodiment, the method in step 107 specifically includes: the SecurityBERT model outputs a vector for each character, and each character belongs to a word in the segmentation result of the sentence; the corresponding word vector is looked up in the network security word vector table according to that word and fused with the character-level vector output by the SecurityBERT model to enhance word-level features; for words that are not found, a <padding> vector or a random vector is used in place of the word vector; when one character corresponds to multiple word segmentation results, either all of the word vectors are fused with the character-level vector output by the SecurityBERT model, or one or more word vectors are selected according to a strategy and fused with it; the fusion method may be vector concatenation, vector addition, or a combination of methods.
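A minimal sketch of this fusion step, using plain Python lists in place of tensors. It assumes exactly one character vector per character of the sentence and a single segmentation per sentence; the case of several candidate words per character is not covered. The function name `fuse_char_and_word` and its parameters are illustrative only.

```python
def fuse_char_and_word(char_vecs, words, word_table, pad_vec, mode="concat"):
    """Attach the vector of the containing word to each character vector.

    char_vecs:  one vector (list of floats) per character of the sentence
    words:      the sentence's word segmentation, in order
    word_table: dict mapping a word to its vector
    pad_vec:    fallback vector for words missing from word_table
    mode:       "concat" joins the two vectors, "add" sums them element-wise
    """
    if sum(len(w) for w in words) != len(char_vecs):
        raise ValueError("segmentation does not cover the character sequence")
    fused = []
    idx = 0
    for word in words:
        word_vec = word_table.get(word, pad_vec)
        for _ in word:
            char_vec = char_vecs[idx]
            if mode == "concat":
                fused.append(list(char_vec) + list(word_vec))
            elif mode == "add":
                if len(char_vec) != len(word_vec):
                    raise ValueError("addition needs equal dimensions")
                fused.append([c + w for c, w in zip(char_vec, word_vec)])
            else:
                raise ValueError("unknown mode: " + mode)
            idx += 1
    return fused
```

Concatenation keeps both sources intact at the cost of a wider input to the next layer, while addition keeps the dimension fixed but requires the word and character vectors to have the same size.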
In this embodiment, the method in step 108 specifically includes: concatenating the output vector of the forward LSTM with the output vector of the backward LSTM to obtain a feature vector with context information.
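To make the concatenation concrete, the following sketch runs a small pure-Python LSTM cell over the sequence in both directions and joins the two hidden states at each position. The weights are random and untrained, and the class and function names are assumptions; a real system would use a deep learning framework's BiLSTM layer.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Single-layer LSTM cell with randomly initialised weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, candidate, output.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifgo"}
        self.b = {g: [0.0] * hidden_size for g in "ifgo"}

    def _gate(self, name, act, z):
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(self.W[name], self.b[name])]

    def step(self, x, h, c):
        z = list(x) + list(h)  # gates see the input and the previous hidden state
        i = self._gate("i", _sigmoid, z)
        f = self._gate("f", _sigmoid, z)
        g = self._gate("g", math.tanh, z)
        o = self._gate("o", _sigmoid, z)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def run_direction(cell, inputs):
    """Run the cell over the inputs in the given order; return all hidden states."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for x in inputs:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


def bilstm(forward_cell, backward_cell, inputs):
    """Concatenate forward and backward hidden states position by position."""
    fwd = run_direction(forward_cell, inputs)
    bwd = run_direction(backward_cell, inputs[::-1])[::-1]
    return [f + b for f, b in zip(fwd, bwd)]
```

Each output vector has twice the hidden size: the first half summarises everything to the left of the character and the second half everything to its right.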
In this embodiment, the method in step 109 specifically includes: through weighting, the self-attention mechanism assigns higher weights to the more important network security words in the sentence, thereby enhancing the features of locally important network security words; the weights are computed with a scaled dot-product function.
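Scaled dot-product attention is conventionally written as softmax(Q K^T / sqrt(d_k)) V. The sketch below applies it as self-attention with Q = K = V = the BiLSTM outputs and no learned projection matrices. That simplification is an assumption made for brevity, not something the text specifies, and the function names are hypothetical.

```python
import math


def _softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """Scaled dot-product self-attention over a sequence of vectors.

    Returns (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    d = len(X[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in X]
        w = _softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * X[j][t] for j in range(len(X)))
                        for t in range(d)])
    return outputs, weights
```

Each row of `weights` sums to 1, so a position whose vector resembles several security-relevant positions draws more of its output from them.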
In this embodiment, the method in step 110 specifically includes: adding or concatenating the output vector of the self-attention layer and the output vector of the BiLSTM layer to obtain a new vector; inputting the new vector into a softmax layer for multi-class classification and probability normalization, then into a CRF layer that models transitions between sequence labels, and outputting a label sequence, namely the entity extraction result.
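The decoding side of this step can be sketched as follows, under stated assumptions: per-character label scores are assumed to have already been produced from the fused vector by some projection (not shown), the log of the softmax output is used as the CRF emission score, and the transition scores are hand-set for illustration rather than learned. The function names are hypothetical. Viterbi decoding then picks the label sequence with the highest total score.

```python
import math


def log_softmax(scores):
    """Normalise raw label scores into log-probabilities."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def viterbi(emissions, transitions, labels):
    """Best label sequence under emission and transition scores.

    emissions[t][j]:   score of label j at position t
    transitions[i][j]: score of moving from label i to label j
    """
    n = len(labels)
    score = list(emissions[0])
    backpointers = []
    for em in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: score[i] + transitions[i][j])
            ptr.append(best)
            new_score.append(score[best] + transitions[best][j] + em[j])
        score = new_score
        backpointers.append(ptr)
    last = max(range(n), key=lambda i: score[i])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return [labels[i] for i in path]
```

The transition scores are what let the CRF layer rule out invalid sequences, such as an I- tag directly after O, that a per-character softmax alone could produce.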
The above-described embodiments are only preferred embodiments of the present invention, and it should be understood that many variations and modifications can be made by one of ordinary skill in the art in light of the above-described inventive concept without undue experimentation. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (9)

1. An entity extraction method applied in the field of network security, characterized by comprising the following steps:
acquiring unstructured text data in the field of network security, and constructing a network security dictionary according to the text data;
preprocessing the text data and performing word segmentation; inputting the segmented network security text data into a trained word2vec model or GloVe model to obtain network security domain word vectors;
manually annotating the text data to construct a network security data set; inputting the network security data set into a domain-pre-trained SecurityBERT model to obtain character-level vectors;
fusing the network security domain word vectors with the character-level vectors output by the SecurityBERT model to obtain vectors enhanced with network security word-level features;
inputting the enhanced vector sequence into a BiLSTM model for further modeling, wherein the BiLSTM model outputs character vectors containing context semantic feature information;
inputting the output of the BiLSTM model into a self-attention layer, and using a self-attention mechanism to enhance the features of locally important network security words in the character vectors to obtain semantic information;
and fusing the output of the self-attention layer with the output of the BiLSTM model, and inputting the fused output sequentially into a softmax layer and a conditional random field (CRF) model to obtain a final label sequence, namely the entity extraction result.
2. The entity extraction method applied to the network security field as claimed in claim 1, wherein the preprocessing and word segmentation of the text data comprises:
parsing HTML web pages with Python and the BeautifulSoup HTML parser, removing useless tag information, and retaining the core network-security-related text;
removing special characters from the network-security-related text, converting between simplified and traditional Chinese, and converting letter case;
and segmenting the text data into words using a word segmentation tool.
3. The entity extraction method applied to the network security field of claim 1, wherein manually annotating the text data to construct the network security data set comprises:
designing an ontology model of the network security field to obtain the categories of network security entities;
and, according to the ontology model, performing structured annotation of the text using the brat tool, and converting the structured annotation result into the BIO or BIOES annotation format.
4. The entity extraction method applied in the network security field according to claim 1, wherein the step of training the SecurityBERT model comprises:
performing domain pre-training on the BERT-Base-Chinese pre-trained model using the collected unlabeled text data of the network security field, so that the trained SecurityBERT model is adapted to the network security domain.
5. The entity extraction method applied to the network security domain according to claim 1, wherein fusing the network security domain word vectors and the character-level vectors output by the SecurityBERT model comprises: a method based on vector concatenation and/or a method based on vector addition.
6. The entity extraction method applied in the network security field according to claim 5, wherein each character in the character-level vectors output by the SecurityBERT model has a corresponding word segmentation result in the sentence, the corresponding word vector is looked up in the network security word vector table according to the word segmentation result, and the word vector found is fused with the character-level vector output by the SecurityBERT model to enhance word-level features;
if no corresponding word vector is found in the network security word vector table according to the word segmentation result, a <padding> vector or a random vector is fused with the character-level vector output by the SecurityBERT model;
if one character corresponds to multiple word segmentation results, either all of the word vectors found are fused with the character-level vector output by the SecurityBERT model, or one or more of the word vectors found are selected and fused with the character-level vector output by the SecurityBERT model.
7. The entity extraction method applied in the network security field of claim 1, wherein inputting the vector sequence into a BiLSTM model for further modeling, with the BiLSTM model outputting character vectors containing context semantic feature information, comprises:
concatenating the output vector of the forward LSTM and the output vector of the backward LSTM in the BiLSTM model to obtain a feature vector with context information.
8. The entity extraction method applied to the network security field according to claim 1, wherein enhancing the features of locally important network security words in the character vectors using a self-attention mechanism comprises:
through weighting, the self-attention mechanism assigns weights greater than M to network security words in the sentence whose importance value is greater than K, thereby enhancing the features of locally important network security words; K is greater than 0 and M is greater than 0, and the weights are computed with a scaled dot-product function.
9. The entity extraction method applied in the network security field of claim 1, wherein fusing the output of the self-attention layer and the output of the BiLSTM model and then inputting the fused output sequentially into the softmax layer and the conditional random field (CRF) model to obtain the final label sequence comprises:
adding or concatenating the output vector of the self-attention layer and the output vector of the BiLSTM layer to obtain a new vector;
and inputting the new vector into a softmax layer for multi-class classification and probability normalization, then into a CRF layer that models transitions between sequence labels, and outputting a label sequence, namely the entity extraction result.
CN202110333374.5A 2021-03-29 2021-03-29 Entity extraction method applied to network security field Active CN112989831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110333374.5A CN112989831B (en) 2021-03-29 2021-03-29 Entity extraction method applied to network security field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110333374.5A CN112989831B (en) 2021-03-29 2021-03-29 Entity extraction method applied to network security field

Publications (2)

Publication Number Publication Date
CN112989831A (en) 2021-06-18
CN112989831B (en) 2023-04-28

Family

ID=76337838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110333374.5A Active CN112989831B (en) 2021-03-29 2021-03-29 Entity extraction method applied to network security field

Country Status (1)

Country Link
CN (1) CN112989831B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673219A (en) * 2021-08-20 2021-11-19 合肥中科类脑智能技术有限公司 Power failure plan text analysis method
CN113743104A (en) * 2021-08-31 2021-12-03 合肥智能语音创新发展有限公司 Entity linking method and related device, electronic equipment and storage medium
CN114297987A (en) * 2022-03-09 2022-04-08 杭州实在智能科技有限公司 Document information extraction method and system based on text classification and reading understanding
CN115587594A (en) * 2022-09-20 2023-01-10 广东财经大学 Network security unstructured text data extraction model training method and system
CN115687754A (en) * 2022-10-21 2023-02-03 四川大学 Active network information mining method based on intelligent conversation

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9992209B1 (en) * 2016-04-22 2018-06-05 Awake Security, Inc. System and method for characterizing security entities in a computing environment
CN109299262A (en) * 2018-10-09 2019-02-01 中山大学 A kind of text implication relation recognition methods for merging more granular informations
CN111310470A (en) * 2020-01-17 2020-06-19 西安交通大学 Chinese named entity recognition method fusing word and word features
CN111460820A (en) * 2020-03-06 2020-07-28 中国科学院信息工程研究所 Network space security domain named entity recognition method and device based on pre-training model BERT
CN111709241A (en) * 2020-05-27 2020-09-25 西安交通大学 Named entity identification method oriented to network security field
CN111783462A (en) * 2020-06-30 2020-10-16 大连民族大学 Chinese named entity recognition model and method based on dual neural network fusion
CN111914097A (en) * 2020-07-13 2020-11-10 吉林大学 Entity extraction method and device based on attention mechanism and multi-level feature fusion
WO2020252950A1 (en) * 2019-06-17 2020-12-24 五邑大学 Named entity recognition method for medical texts based on pre-training model and fine turning technology
US20210021621A1 (en) * 2019-07-16 2021-01-21 Hewlett Packard Enterprise Development Lp Methods and systems for using embedding from natural language processing (nlp) for enhanced network analytics

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
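沈思 et al.: "Research on an automatic entity extraction model for food safety events based on deep learning", 《信息与电脑(理论版)》 *
陆以勤 et al.: "SDN topology attacks and their defense", 《华南理工大学学报(自然科学版)》 *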

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673219A (en) * 2021-08-20 2021-11-19 合肥中科类脑智能技术有限公司 Power failure plan text analysis method
CN113743104A (en) * 2021-08-31 2021-12-03 合肥智能语音创新发展有限公司 Entity linking method and related device, electronic equipment and storage medium
CN113743104B (en) * 2021-08-31 2024-04-16 合肥智能语音创新发展有限公司 Entity linking method, related device, electronic equipment and storage medium
CN114297987A (en) * 2022-03-09 2022-04-08 杭州实在智能科技有限公司 Document information extraction method and system based on text classification and reading understanding
CN114297987B (en) * 2022-03-09 2022-07-19 杭州实在智能科技有限公司 Document information extraction method and system based on text classification and reading understanding
CN115587594A (en) * 2022-09-20 2023-01-10 广东财经大学 Network security unstructured text data extraction model training method and system
CN115587594B (en) * 2022-09-20 2023-06-30 广东财经大学 Unstructured text data extraction model training method and system for network security
CN115687754A (en) * 2022-10-21 2023-02-03 四川大学 Active network information mining method based on intelligent conversation
CN115687754B (en) * 2022-10-21 2024-01-23 四川大学 Active network information mining method based on intelligent dialogue

Also Published As

Publication number Publication date
CN112989831B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
Li et al. A stacking model using URL and HTML features for phishing webpage detection
CN112989831B (en) Entity extraction method applied to network security field
CN109005145B (en) Malicious URL detection system and method based on automatic feature extraction
CN103544255B (en) Text semantic relativity based network public opinion information analysis method
KR102452123B1 (en) Apparatus for Building Big-data on unstructured Cyber Threat Information, Method for Building and Analyzing Cyber Threat Information
CN112307473A (en) Malicious JavaScript code detection model based on Bi-LSTM network and attention mechanism
CN113596007B (en) Vulnerability attack detection method and device based on deep learning
CN111931935B (en) Network security knowledge extraction method and device based on One-shot learning
Liu et al. Multi-scale semantic deep fusion models for phishing website detection
CN112148956A (en) Hidden net threat information mining system and method based on machine learning
Gong et al. Model uncertainty based annotation error fixing for web attack detection
CN111538893B (en) Method for extracting network security new words from unstructured data
Kim et al. Towards attention based vulnerability discovery using source code representation
CN112445862A (en) Internet of things equipment data set construction method and device, electronic equipment and storage medium
CN115567306B (en) APT attack traceability analysis method based on bidirectional long-short-term memory network
CN116702143A (en) Intelligent malicious software detection method based on API (application program interface) characteristics
Li et al. PipCKG-BS: A Method to Build Cybersecurity Knowledge Graph for Blockchain Systems via the Pipeline Approach
CN110413909B (en) Machine learning-based intelligent identification method for online firmware of large-scale embedded equipment
Zhu et al. SQL Injection Attack Detection Framework Based on HTTP Traffic
Khan Detecting phishing attacks using nlp
Zhang et al. Survey of research on named entity recognition in cyber threat intelligence
Sithole et al. Attributes extraction for fine-grained differentiation of the Internet of Things patterns
Wan et al. Generation of malicious webpage samples based on GAN
Zhen et al. Chinese Cyber Threat Intelligence Named Entity Recognition via RoBERTa-wwm-RDCNN-CRF.
CN117278322B (en) Web intrusion detection method, device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant