WO2023138047A1 - Cyber threat information extraction method, device, storage medium, and apparatus

Cyber threat information extraction method, device, storage medium, and apparatus

Info

Publication number
WO2023138047A1
WO2023138047A1 PCT/CN2022/113831 CN2022113831W WO2023138047A1 WO 2023138047 A1 WO2023138047 A1 WO 2023138047A1 CN 2022113831 W CN2022113831 W CN 2022113831W WO 2023138047 A1 WO2023138047 A1 WO 2023138047A1
Authority
WO
WIPO (PCT)
Prior art keywords
threat
sentence
attack
threat information
network
Application number
PCT/CN2022/113831
Other languages
French (fr)
Chinese (zh)
Inventor
唐杰
吴龙平
莫建平
余凯
Original Assignee
三六零科技集团有限公司
Application filed by 三六零科技集团有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the present invention relates to the technical field of the Internet, in particular to a network threat information extraction method, equipment, storage medium and device.
  • the main purpose of the present invention is to provide a network threat information extraction method, equipment, storage medium and device, aiming to solve the technical problem in the prior art that analyzing complex and unstructured threat analysis reports is very time-consuming and laborious due to the lack of standard structured language description and automatic extraction and analysis technology of network threat report technical and tactical intelligence.
  • the present invention provides a network threat information extraction method
  • the network threat information extraction method includes the following steps:
  • the step of performing natural language processing on the unstructured network threat information to obtain the attack purpose and attack means includes:
  • the attack purpose and attack means are determined according to the target corpus.
  • the step of performing semantic dependency analysis on the threat sentence to obtain the dependency relationship between each vocabulary in the threat sentence includes:
  • the threat sentence is standardized according to the dependency relationship to obtain a standard threat sentence.
  • the step of standardizing the threat statement according to the dependency relationship to obtain a standard threat statement includes:
  • the step of performing synonym expansion on the threat corpus to obtain a target corpus includes:
  • the threat keywords are synonymously expanded based on a preset dictionary to obtain a target corpus.
  • the step of performing in-depth sentence segmentation on the simplified network threat information to obtain a threat sentence includes:
  • the simplified network threat information is segmented in depth according to the sentence end symbol, the coordinating relative conjunction and the progressive relative conjunction to obtain a threat sentence.
  • the step of performing lexical tagging on the standard threat sentence to obtain a threat corpus includes:
  • the standard threat sentence is simplified according to the necessary score to obtain a threat corpus.
  • the step of performing text preprocessing on unstructured network threat information to obtain simplified network threat information further includes:
  • the random information is simplified to obtain simplified network threat information.
  • the step of predicting the attack means of the attack purpose through the preset machine learning model to obtain the unknown attack means also includes:
  • the initial machine learning model is trained according to the training set corpus to obtain a preset machine learning model.
  • the step of constructing a training set corpus according to the target corpus includes:
  • a training set corpus is constructed based on the threat sentence samples.
  • the step of selecting a threat sentence sample from the target corpus according to the number of sentences includes:
  • the step of predicting the attack means of the attack purpose through a preset machine learning model, and obtaining unknown attack means includes:
  • the attack method is predicted for the attack purpose through a preset machine learning model, and an unknown attack method is obtained.
  • the present invention also proposes a network threat information extraction device, the network threat information extraction device includes a memory, a processor, and a network threat information extraction program stored in the memory and operable on the processor, the network threat information extraction program is configured to implement the network threat information extraction method as described above.
  • the present invention also proposes a storage medium on which a network threat information extraction program is stored, and when the network threat information extraction program is executed by a processor, the network threat information extraction method as described above is realized.
  • the present invention also proposes a network threat information extraction device, the network threat information extraction device includes: a language processing module, a means prediction module and an information generation module;
  • the language processing module is used to perform natural language processing on unstructured network threat information to obtain attack purposes and attack methods;
  • the method prediction module is used to predict the attack method for the attack purpose through a preset machine learning model, and obtain an unknown attack method;
  • the information generating module is configured to generate structured network threat information according to the attack purpose, the attack means and the unknown attack means.
  • the language processing module is further configured to perform text preprocessing on unstructured network threat information to obtain simplified network threat information;
  • the language processing module is further configured to perform in-depth sentence segmentation on the simplified network threat information to obtain threat sentences;
  • the language processing module is further configured to perform semantic dependency analysis on the threat sentence to obtain a standard threat sentence
  • the language processing module is further configured to perform lexical tagging on the standard threat sentence to obtain a threat corpus;
  • the language processing module is further configured to expand synonyms to the threat corpus to obtain a target corpus;
  • the language processing module is further configured to determine an attack purpose and an attack method according to the target corpus.
  • the language processing module is further configured to perform semantic dependency analysis on the threat sentence to obtain the dependency relationship between the words in the threat sentence;
  • the language processing module is further configured to standardize the threat sentence according to the dependency relationship to obtain a standard threat sentence.
  • the language processing module is further configured to obtain part-of-speech information of each vocabulary in the threat sentence;
  • the language processing module is further configured to standardize the threat sentence according to the part-of-speech information and the dependency relationship to obtain a standard threat sentence.
  • the language processing module is further configured to obtain the frequency of occurrence of each keyword in the threat corpus, and determine the threat keyword according to the frequency of occurrence;
  • the language processing module is further configured to perform synonym expansion on the threat keywords based on a preset dictionary to obtain a target corpus.
  • the language processing module is further configured to acquire sentence-ending symbols, coordinating relative conjunctions, and progressive relative conjunctions in the simplified network threat information;
  • the language processing module is further configured to perform in-depth sentence segmentation on the simplified network threat information according to the sentence end symbol, the coordinating relative conjunction and the progressive relative conjunction, to obtain a threat sentence.
  • In the present invention, natural language processing is performed on unstructured network threat information to obtain the attack purpose and attack means; attack means are predicted for the attack purpose by a preset machine learning model to obtain unknown attack means; and structured network threat information is generated according to the attack purpose, attack means, and unknown attack means. Because the present invention automatically identifies and extracts the attacker's attack purpose and attack means from unstructured network threat information based on natural language processing and a preset machine learning model, the analysis of network threat information can be simplified and the security defense capability can be improved.
  • FIG. 1 is a schematic structural diagram of a network threat information extraction device in a hardware operating environment involved in the solution of an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a first embodiment of a method for extracting network threat information according to the present invention
  • FIG. 3 is a schematic flowchart of a second embodiment of a method for extracting network threat information according to the present invention
  • FIG. 4 is a schematic flowchart of a third embodiment of a method for extracting network threat information according to the present invention.
  • FIG. 5 is a schematic diagram of semantic dependency analysis of an embodiment of the network threat information extraction method of the present invention.
  • FIG. 6 is a schematic flowchart of a fourth embodiment of a method for extracting network threat information according to the present invention.
  • FIG. 7 is a structural block diagram of the first embodiment of the device for extracting network threat information according to the present invention.
  • FIG. 1 is a schematic structural diagram of a device for extracting network threat information in a hardware operating environment according to an embodiment of the present invention.
  • the network threat information extraction device may include: a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • a communication bus 1002 is used to realize connection and communication between these components.
  • the user interface 1003 may include a display screen (Display).
  • optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the wired interface of the user interface 1003 may be a USB interface in the present invention.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless-Fidelity (Wi-Fi) interface).
  • the memory 1005 may be a high-speed random access memory (RAM), or a non-volatile memory (NVM), such as a disk memory.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001 .
  • the structure shown in FIG. 1 does not constitute a limitation on the network threat information extraction device, which may include more or fewer components than those shown in the figure, combine some components, or arrange the components differently.
  • the memory 1005 identified as a computer storage medium may include an operating system, a network communication module, a user interface module, and a network threat information extraction program.
  • the network interface 1004 is mainly used to connect to a background server and perform data communication with the background server;
  • the user interface 1003 is mainly used to connect to a user device;
  • the network threat information extraction device calls the network threat information extraction program stored in the memory 1005 through the processor 1001, and executes the network threat information extraction method provided by the embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of the first embodiment of the network threat information extraction method of the present invention, and proposes the first embodiment of the network threat information extraction method of the present invention.
  • the method for extracting network threat information includes the following steps:
  • Step S10 Perform natural language processing on unstructured network threat information to obtain attack purpose and attack means.
  • the execution subject of the method in this embodiment may be a network threat information extraction device with functions of data processing, network communication, and program operation, such as a server, or other electronic devices capable of achieving the same or similar functions, which is not limited in this embodiment.
  • the existing TRAM project implemented by MITRE based on machine learning (ML) technology and the TTPDrill project implemented by the University of North Carolina based on information retrieval (IR) technology can process threat analysis reports relatively easily, but because both process English reports, they cannot be applied to Chinese threat analysis reports, whose descriptions are complex and changeable. Moreover, the output accuracy and false alarm rate of these two projects are not ideal.
  • the attacker's attack purpose and attack means in unstructured network threat information are automatically identified and extracted, thereby simplifying the analysis process of network threat information and improving security defense capabilities.
  • the natural language processing may be at least one of text preprocessing, text deep sentence segmentation, sentence semantic dependency analysis, vocabulary tokenization, and vocabulary synonym expansion, which is not limited in this embodiment.
  • the attack purpose may be the techniques and tactics adopted by the attacker in the unstructured network threat information, for example, the techniques and tactics may be that the virus continues to run on the computer.
  • the attack means can be the attack implementation process of the attacker in the unstructured network threat information.
  • the attack implementation process can be modifying the registry or booting automatically.
  • Step S20 Predict the attack means of the attack purpose through the preset machine learning model, and obtain the unknown attack means.
  • the preset machine learning model can be configured in advance.
  • a Bag of Words (BOW) model is used as an example for illustration.
  • the bag-of-words model puts all words into a bag, regardless of their grammar and word order, that is, each word is independent.
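  • The bag-of-words idea described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: whitespace tokenization, lowercasing, and the function names `build_vocabulary` and `bag_of_words` are assumptions made for the example (Chinese text would first need a word segmenter).

```python
from collections import Counter


def build_vocabulary(sentences):
    """Map every distinct token to a column index, in first-seen order."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.lower().split():  # assumption: whitespace tokens
            vocab.setdefault(token, len(vocab))
    return vocab


def bag_of_words(sentence, vocab):
    """Count tokens per vocabulary column; grammar and word order are discarded."""
    counts = Counter(sentence.lower().split())
    vector = [0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:  # tokens outside the vocabulary are ignored
            vector[vocab[token]] = n
    return vector


corpus = ["modify the registry", "the trojan will modify the registry"]
vocab = build_vocabulary(corpus)
for sentence in corpus:
    print(bag_of_words(sentence, vocab))
```

  • Because only counts are kept, "modify the registry" and "registry the modify" map to the same vector, which is the word-independence property the text describes.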
  • Step S30 Generate structured network threat information according to the attack purpose, the attack means and the unknown attack means.
  • generating structured network threat information according to attack purpose, attack means and unknown attack means may be to aggregate attack purpose, attack means and unknown attack means to obtain structured network threat information.
  • the attack purpose is predicted by the preset machine learning model to obtain unknown attack means
  • structured network threat information is generated according to the attack purpose, attack means and unknown attack means; since this embodiment automatically recognizes and extracts the attack purpose and attack means of the attacker in the unstructured network threat information based on natural language processing and preset machine learning models, the analysis process of network threat information can be simplified, and security defense capabilities can be improved.
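  • As a rough sketch of the aggregation in step S30, the three outputs can be collected into one record. The class name `StructuredThreatInfo`, its field names, the JSON output, and the choice to drop predicted means that were already extracted are all assumptions for illustration; the source only states that the three items are aggregated.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StructuredThreatInfo:
    """Hypothetical container for one structured threat record."""
    attack_purpose: str
    attack_means: list = field(default_factory=list)
    unknown_attack_means: list = field(default_factory=list)


def aggregate(purpose, means, predicted_means):
    """Combine extracted and predicted items into a single record.

    Assumption: a predicted means that was already extracted from the
    report is not reported again as "unknown".
    """
    known = set(means)
    unknown = [m for m in predicted_means if m not in known]
    return StructuredThreatInfo(purpose, list(means), unknown)


record = aggregate(
    "the virus continues to run on the computer",
    ["modifying the registry"],
    ["modifying the registry", "booting automatically"],
)
print(json.dumps(asdict(record), indent=2))
```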
  • FIG. 3 is a schematic flowchart of a second embodiment of the network threat information extraction method of the present invention. Based on the first embodiment shown in FIG. 2 above, a second embodiment of the network threat information extraction method of the present invention is proposed.
  • step S10 includes:
  • Step S101 Perform text preprocessing on unstructured network threat information to obtain simplified network threat information.
  • text preprocessing may be performed on unstructured network threat information first, then in-depth sentence segmentation, semantic dependency analysis, vocabulary tagging, and synonym expansion may be performed to obtain the attack purpose and attack means of the attacker in the unstructured network threat information.
  • performing text preprocessing on unstructured network threat information to obtain simplified network threat information may be obtaining random information in the unstructured network threat information, performing simplified processing on the random information, and obtaining simplified network threat information.
  • Step S102 Perform in-depth sentence segmentation on the simplified network threat information to obtain threat sentences.
  • each sentence to be analyzed independently expresses a technique, tactics and implementation process (TTP).
  • the simplified network threat information can be segmented in depth to obtain threat sentences.
  • performing in-depth sentence segmentation on the simplified network threat information to obtain the threat sentence may be: obtaining the sentence end symbols, coordinating relative conjunctions, and progressive relative conjunctions in the simplified network threat information, and segmenting the simplified network threat information in depth according to those sentence end symbols, coordinating relative conjunctions, and progressive relative conjunctions to obtain the threat sentence.
  • Step S103 Perform semantic dependency analysis on the threat sentence to obtain a standard threat sentence.
  • semantic dependency analysis may also be performed on the threat sentence to obtain the dependency relationship between each vocabulary in the threat sentence, and standardize the threat sentence according to the dependency relationship to obtain a standard threat sentence.
  • performing semantic dependency analysis on the threat sentence to obtain a standard threat sentence may be performing semantic dependency analysis on the threat sentence to obtain a dependency relationship between each vocabulary in the threat sentence, and standardizing the threat sentence according to the dependency relationship to obtain a standard threat sentence.
  • Step S104 Perform vocabulary tagging on the standard threat sentences to obtain a threat corpus.
  • the necessary scores of each part of the standard threat sentence can be obtained first, and then the standard threat sentence can be simplified according to the necessary score to obtain the threat corpus.
  • the lexical tagging of the standard threat sentences to obtain the threat corpus may be to obtain the necessary scores of each part of the standard threat sentences, and simplify the standard threat sentences according to the necessary scores to obtain the threat corpus.
  • Step S105 performing synonym expansion on the threat corpus to obtain a target corpus.
  • synonym expansion may be performed on high-frequency keywords in the threat corpus.
  • the synonym expansion of the threat corpus and the acquisition of the target corpus may be obtained by obtaining the occurrence frequency of each keyword in the threat corpus, determining the threat keywords according to the frequency of occurrence, and performing synonym expansion on the threat keywords based on a preset dictionary to obtain the target corpus.
  • Step S106 Determine the attack purpose and attack means according to the target corpus.
  • the attack purpose can be the techniques and tactics adopted by the attacker in the unstructured network threat information, for example, the technique and tactics can be that the virus continues to run on the computer.
  • the attack means can be the attack implementation process of the attacker in the unstructured network threat information.
  • the attack implementation process can be modifying the registry or booting automatically.
  • in this embodiment, text preprocessing is performed on unstructured network threat information to obtain simplified network threat information; in-depth sentence segmentation is performed on the simplified network threat information to obtain threat sentences; semantic dependency analysis is performed on the threat sentences to obtain standard threat sentences; lexical tagging is performed on the standard threat sentences to obtain a threat corpus; synonym expansion is performed on the threat corpus to obtain a target corpus; and the attack purpose and attack means are determined according to the target corpus. Because this embodiment applies text preprocessing, in-depth sentence segmentation, semantic dependency analysis, vocabulary tagging, and synonym expansion in sequence to obtain the attacker's attack purpose and attack means from the unstructured network threat information, the effect of natural language processing is improved.
  • step S20 includes:
  • Step S201 Obtain multi-platform network threat information.
  • multi-platform network threat information may be obtained first, and then based on the multi-platform network threat information, the attack method is predicted for the attack purpose through a preset machine learning model to obtain unknown attack methods.
  • the multi-platform network threat information may be network threat information detected and obtained by multiple security platforms.
  • Step S202 Based on the multi-platform network threat information, predict the attack means of the attack purpose through a preset machine learning model, and obtain unknown attack means.
  • the preset machine learning model can be configured in advance.
  • a Bag of Words (BOW) model is used as an example for illustration.
  • multi-platform network threat information is obtained, and based on the multi-platform network threat information, attack means are predicted for the attack purpose through a preset machine learning model to obtain unknown attack means; since this embodiment predicts attack means based on multi-platform network threat information, multi-dimensional unknown attack means can be obtained.
  • FIG. 4 is a schematic flowchart of a third embodiment of the network threat information extraction method of the present invention. Based on the second embodiment shown in FIG. 3 above, a third embodiment of the network threat information extraction method of the present invention is proposed.
  • step S101 includes:
  • Step S1011 Obtain random information in unstructured network threat information.
  • random information in unstructured network threat information may be obtained first, and then the random information may be simplified to obtain simplified network threat information.
  • Step S1012 Simplify the random information to obtain simplified network threat information.
  • the unstructured network threat information is "a Trojan horse program that releases a normal TP program TPHelper.exe and a malicious TPHelperBase.dll in the %TEMP% directory after it runs to constitute a dll hijacking".
  • after simplification, the simplified network threat information "a Trojan horse program that releases a normal TP program EXE file and a malicious DLL file in a specific directory after it runs to constitute a dll hijacking" is obtained.
  • the random information in the unstructured network threat information is obtained, and the random information is simplified to obtain simplified network threat information; because this embodiment first obtains the random information in the unstructured network threat information and then simplifies it, input randomness is reduced and information processing speed is improved.
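  • The simplification of random information can be sketched with a few regular-expression rewrites. The rule list `SIMPLIFICATION_RULES` and the function `simplify` are hypothetical; they are chosen only so that the source's TPHelper example is reproduced, and a real rule set would need to cover far more kinds of report-specific detail.

```python
import re

# Hypothetical rewrite rules: each maps a report-specific ("random") detail
# to a generic placeholder. Illustrative only, not the patent's rule set.
SIMPLIFICATION_RULES = [
    (re.compile(r"\b[\w-]+\.exe\b", re.IGNORECASE), "EXE file"),
    (re.compile(r"\b[\w-]+\.dll\b", re.IGNORECASE), "DLL file"),
    (re.compile(r"the %\w+% directory"), "a specific directory"),
]


def simplify(text: str) -> str:
    """Apply every rewrite rule in order and return the simplified text."""
    for pattern, placeholder in SIMPLIFICATION_RULES:
        text = pattern.sub(placeholder, text)
    return text


report = (
    "a Trojan horse program that releases a normal TP program TPHelper.exe "
    "and a malicious TPHelperBase.dll in the %TEMP% directory after it runs "
    "to constitute a dll hijacking."
)
print(simplify(report))
```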
  • step S102 includes:
  • Step S1021 Obtain the sentence-end symbols, coordinating relative conjunctions and progressive relative conjunctions in the simplified network threat information.
  • each sentence to be analyzed independently expresses a technique, tactics and implementation process (TTP).
  • the simplified network threat information can be segmented in depth to obtain threat sentences.
  • Coordinating relative conjunctions can include words meaning "and" or "with", etc.
  • Progressive relative conjunctions can include "not only", "but also", "not to mention", etc.
  • Step S1022 Perform in-depth sentence segmentation on the simplified network threat information according to the sentence end symbol, the coordinating relative conjunction and the progressive relative conjunction to obtain a threat sentence.
  • performing in-depth sentence segmentation on the simplified network threat information according to the sentence end symbol, the coordinating relative conjunction and the progressive relative conjunction to obtain the threat sentence may be: obtaining the positions of the sentence end symbols, coordinating relative conjunctions and progressive relative conjunctions in the simplified network threat information, and segmenting the simplified network threat information in depth according to those positions to obtain the threat sentence.
  • in the third embodiment, the sentence end symbols, coordinating relative conjunctions and progressive relative conjunctions in the simplified network threat information are obtained, and in-depth sentence segmentation is performed on the simplified network threat information according to them to obtain threat sentences; because this embodiment performs in-depth sentence segmentation on the simplified network threat information to obtain threat sentences, the network threat information can be simplified, ensuring that each sentence to be analyzed independently expresses a technique, tactics and implementation process (TTP).
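  • A minimal sketch of the in-depth segmentation, assuming English stand-ins for the conjunctions: the delimiter pattern, the conjunction list ("and", "but also", "not only"), and the function name `deep_segment` are illustrative assumptions, since the source targets Chinese reports and does not specify an implementation.

```python
import re

# Assumed delimiters: sentence-end symbols (followed by whitespace or end of
# text) plus English stand-ins for coordinating and progressive conjunctions.
DELIMITER = re.compile(
    r"\s*(?:[.!?。！？]+(?=\s|$)|,?\s*\b(?:and|but also|not only)\b)\s*"
)


def deep_segment(text: str) -> list:
    """Split text at every delimiter position and drop empty pieces."""
    return [piece for piece in DELIMITER.split(text) if piece]


text = (
    "The Trojan modifies the registry and boots automatically. "
    "It sends keyboard logs to an email address, but also collects account names."
)
for clause in deep_segment(text):
    print(clause)
```

  • Splitting at conjunctions leaves some clauses without an explicit subject ("boots automatically"); in the source, the later standardization step is what unifies subjects and wording.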
  • step S103 includes:
  • Step S1031 Perform semantic dependency analysis on the threat sentence to obtain the dependency relationship between the words in the threat sentence.
  • semantic dependency analysis may also be performed on the threat sentence to obtain the dependency relationship between each vocabulary in the threat sentence, and standardize the threat sentence according to the dependency relationship to obtain a standard threat sentence.
  • dependency relationship may be a dependency relationship between parent and child words.
  • Step S1032 Perform standardization processing on the threat sentence according to the dependency relationship to obtain a standard threat sentence.
  • the subject words and the words and sentences can be standardized and unified.
  • the step S1032 includes:
  • FIG. 5 is a schematic diagram of semantic dependency analysis.
  • the threat sentence is "the Trojan will send the obtained keyboard log to a configurable email address”
  • ROOT represents the root node, which is the core node of the whole sentence
  • mDEPD represents the auxiliary word
  • FEAT represents the modifier
  • PAT represents the object of the subject operation (the object changes)
  • rPAT represents the object of the subject operation (the object changes, passive)
  • CONT represents the object of the subject operation (the object does not change significantly)
  • rCONT represents the object of the subject operation (the object changes).
  • mRELA represents conjunctions and prepositions, such as "but" and "and".
  • AGT represents the subject
  • LOC represents space
  • mPUNC represents punctuation marks.
  • the semantic dependency analysis is performed on the threat sentence to obtain the dependency relationship between the vocabulary in the threat sentence, and the threat sentence is standardized according to the dependency relationship to obtain the standard threat sentence; because the semantic dependency analysis is performed on the threat sentence in this embodiment, the dependency relationship between each vocabulary in the threat sentence is obtained, and the threat sentence is standardized according to the dependency relationship to obtain the standard threat sentence, so that complex and changeable description methods can be standardized and unified.
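  • One possible reading of "standardizing according to the dependency relationship" is to map the labelled arcs onto fixed slots. In the sketch below, the `Arc` type, the `standardize` function, the slot names, and the hand-written arcs are all hypothetical: the arcs are not parser output and are not a transcription of FIG. 5; only the relation labels (ROOT, AGT, PAT, CONT, LOC, FEAT, mDEPD, mRELA) come from the text.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    word: str       # dependent (child) word
    relation: str   # semantic dependency label, e.g. AGT, CONT, LOC
    head: str       # parent word ("<root>" for the ROOT arc)


def standardize(arcs):
    """Map the arcs attached to the root word onto a fixed slot layout."""
    root = next(a.word for a in arcs if a.relation == "ROOT")
    slots = {"AGT": None, "PAT": None, "CONT": None, "LOC": None}
    for arc in arcs:
        if arc.head == root and arc.relation in slots and slots[arc.relation] is None:
            slots[arc.relation] = arc.word
    return {
        "subject": slots["AGT"],
        "action": root,
        "object": slots["PAT"] or slots["CONT"],
        "location": slots["LOC"],
    }


# Hand-written, hypothetical arcs for
# "the Trojan will send the obtained keyboard log to a configurable email address".
arcs = [
    Arc("send", "ROOT", "<root>"),
    Arc("Trojan", "AGT", "send"),
    Arc("will", "mDEPD", "send"),
    Arc("log", "CONT", "send"),
    Arc("obtained", "FEAT", "log"),
    Arc("keyboard", "FEAT", "log"),
    Arc("address", "LOC", "send"),
    Arc("to", "mRELA", "address"),
    Arc("configurable", "FEAT", "address"),
    Arc("email", "FEAT", "address"),
]
print(standardize(arcs))
```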
  • step S104 includes:
  • Step S1041 Obtain the necessary scores of each part in the standard threat sentence.
  • the necessary scores of each part of the standard threat sentence can be obtained first, and then the standard threat sentence can be simplified according to the necessary score to obtain the threat corpus.
  • the necessary score measures how necessary each word of the threat sentence is to that sentence.
  • Step S1042 Simplify the standard threat sentence according to the necessary score to obtain a threat corpus.
  • the standard threat sentence is "the Trojan horse will send the obtained keyboard logs to a configurable email address".
  • after simplification, the entry "send keyboard logs to an email address" is obtained in the threat corpus.
  • in this embodiment, the necessary score of each part of the standard threat sentence is obtained first, and the standard threat sentence is then simplified according to the necessary scores to obtain the threat corpus, so that the threat corpus can be converged.
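The simplification by necessary score can be sketched as a threshold filter. The sketch below is only an illustration under stated assumptions: the way the sentence is split into parts, the score values, and the threshold are all invented for the example, since the text does not say how the scores are computed.

```python
def simplify(parts: list[tuple[str, float]], threshold: float = 0.5) -> str:
    """Keep only the parts whose necessary score reaches the threshold."""
    return " ".join(text for text, score in parts if score >= threshold)


# Scores below are invented for illustration only.
scored_parts = [
    ("the Trojan horse", 0.3),
    ("will", 0.1),
    ("send", 1.0),
    ("the obtained", 0.2),
    ("keyboard logs", 0.9),
    ("to", 0.7),
    ("a configurable", 0.2),
    ("email address", 0.9),
]
print(simplify(scored_parts))  # send keyboard logs to email address
```

The result is close to the simplified entry quoted above; it drops the article because the toy scoring groups the article with the modifier "configurable".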
  • step S105 includes:
  • Step S1051 Obtain the occurrence frequency of each keyword in the threat corpus, and determine the threat keyword according to the occurrence frequency.
  • synonym expansion may be performed on high-frequency keywords in the threat corpus.
  • determining the threat keyword according to the frequency of occurrence may be sorting the keywords according to the frequency of occurrence, and determining the threat keyword according to the ranking result.
  • Step S1052 Perform synonym expansion on the threat keywords based on a preset dictionary to obtain a target corpus.
  • the preset dictionary can be preset, and synonyms corresponding to each keyword can be stored in the preset dictionary.
  • the "account name in the Trojan horse collection domain" in the threat corpus can be expanded to "user name in the Trojan horse collection domain", "user account in the Trojan horse collection domain", and "user login name in the Trojan horse collection domain".
  • in this embodiment, the occurrence frequency of each keyword in the threat corpus is obtained, the threat keywords are determined according to the occurrence frequency, and synonym expansion is performed on the threat keywords based on the preset dictionary to obtain the target corpus; because synonym expansion is performed on the high-frequency keywords in the threat corpus, the recall rate of subsequent model predictions can be improved.
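Steps S1051 and S1052 can be sketched with a frequency count followed by a dictionary lookup. This is a minimal sketch under assumptions: the function names, the candidate keyword list, the toy corpus, and the use of simple substring matching are all hypothetical; only the synonym examples come from the text above.

```python
from collections import Counter


def top_keywords(corpus: list[str], candidates: list[str], top_n: int) -> list[str]:
    """Rank candidate keywords by the number of corpus entries containing them."""
    counts: Counter = Counter()
    for entry in corpus:
        for keyword in candidates:
            if keyword in entry:
                counts[keyword] += 1
    return [keyword for keyword, _ in counts.most_common(top_n)]


def expand(corpus: list[str], keywords: list[str],
           synonyms: dict[str, list[str]]) -> list[str]:
    """Add one variant per synonym for every entry that contains a threat keyword."""
    expanded = list(corpus)
    for entry in corpus:
        for keyword in keywords:
            if keyword not in entry:
                continue
            for synonym in synonyms.get(keyword, []):
                variant = entry.replace(keyword, synonym)
                if variant not in expanded:
                    expanded.append(variant)
    return expanded


threat_corpus = [
    "collect account name in the domain",
    "upload account name to server",
    "send keyboard logs to an email address",
]
# Hypothetical preset dictionary; the synonyms mirror the example in the text.
preset_dictionary = {"account name": ["user name", "user account", "user login name"]}

threat_keywords = top_keywords(threat_corpus, ["account name", "keyboard logs"], top_n=1)
target_corpus = expand(threat_corpus, threat_keywords, preset_dictionary)
print(threat_keywords)     # ['account name']
print(len(target_corpus))  # 9
```

Only the original entries are scanned during expansion, so a synonym that itself contains a keyword does not trigger further rounds of expansion.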
  • FIG. 6 is a schematic flowchart of a fourth embodiment of the network threat information extraction method of the present invention. Based on the second embodiment shown in FIG. 3 above, the fourth embodiment of the network threat information extraction method of the present invention is proposed.
  • before step S201, the method further includes:
  • Step S110 Construct a training set corpus according to the target corpus.
  • the initial machine learning model may be trained first to obtain the preset machine learning model.
  • constructing the training set corpus according to the target corpus may be to randomly select training samples from the target corpus to construct the training set corpus.
  • the step S110 includes:
  • a training set corpus is constructed based on the threat sentence samples.
  • a synonymous threat statement may be a statement with the same semantics.
  • selecting the threat sentence samples from the target corpus according to the number of sentences may be sorting the synonymous threat sentences according to the number of sentences in descending order, and using the top-ranked preset number of synonymous threat sentences as the threat sentence samples.
  • selecting the threat sentence samples from the target corpus according to the number of sentences includes:
  • the user may also input semantic tags to mark each threat sentence in the target corpus.
  • selecting the threat sentence samples from the target corpus according to the sorting result and the semantic labels may be taking a preset number of top-ranked synonymous threat sentences that carry preset semantic labels as the threat sentence samples.
  • the preset semantic label may be set in advance, which is not limited in this embodiment.
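The sample selection described above can be sketched as a sort followed by a label filter. This is an illustrative sketch, not the patent's procedure: the `SynonymGroup` class, the label names, and the choice to filter by label before taking the top groups are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SynonymGroup:
    sentences: list[str]         # threat sentences that share the same semantics
    label: Optional[str] = None  # semantic label entered by the user, if any


def select_samples(groups: list[SynonymGroup], preset_labels: set[str],
                   top_n: int) -> list[str]:
    """Sort groups by sentence count (descending) and keep the top labelled ones."""
    ranked = sorted(groups, key=lambda g: len(g.sentences), reverse=True)
    labelled = [g for g in ranked if g.label in preset_labels]
    return [sentence for g in labelled[:top_n] for sentence in g.sentences]


# Toy target corpus; sentences and labels are invented for illustration.
groups = [
    SynonymGroup(["modify the registry", "change the registry", "edit the registry"],
                 "persistence"),
    SynonymGroup(["open a window", "show a window"]),
    SynonymGroup(["send keyboard logs to an email address"], "exfiltration"),
]
samples = select_samples(groups, {"persistence", "exfiltration"}, top_n=2)
print(len(samples))  # 4
```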
  • Step S120 Train the initial machine learning model according to the training set corpus to obtain a preset machine learning model.
  • training the initial machine learning model according to the training set corpus to obtain the preset machine learning model may be inputting each threat sentence sample in the training set corpus into the initial machine learning model, and adjusting the initial machine learning model according to the output results, so as to train the initial machine learning model and obtain the preset machine learning model.
  • in this embodiment, the training set corpus is constructed according to the target corpus, and the initial machine learning model is trained according to the training set corpus to obtain the preset machine learning model; because the initial machine learning model is trained in advance, the accuracy of the preset machine learning model is improved.
  • an embodiment of the present invention also proposes a storage medium, on which a network threat information extraction program is stored, and when the network threat information extraction program is executed by a processor, the network threat information extraction method as described above is implemented.
  • an embodiment of the present invention also proposes a network threat information extraction device, the network threat information extraction device includes: a language processing module 10, a method prediction module 20, and an information generation module 30;
  • the language processing module 10 is configured to perform natural language processing on unstructured network threat information to obtain attack purpose and attack means.
  • the existing TRAM project, implemented by MITRE based on machine learning (ML) technology, and the TTPDrill project, implemented by the University of North Carolina based on information retrieval (IR) technology, can process threat analysis reports relatively easily, but because both are designed for English reports, they cannot be applied to Chinese threat analysis reports with complex and variable descriptions. Moreover, the accuracy and false alarm rate of these two projects are not ideal.
  • the attacker's attack purpose and attack means in unstructured network threat information are automatically identified and extracted, thereby simplifying the analysis process of network threat information and improving security defense capabilities.
  • the natural language processing may be at least one of text preprocessing, text deep sentence segmentation, sentence semantic dependency analysis, vocabulary tokenization, and vocabulary synonym expansion, which is not limited in this embodiment.
  • the purpose of the attack may be the techniques and tactics adopted by the attacker in the unstructured network threat information, for example, the techniques and tactics may be that the virus continues to run on the computer.
  • the attack means can be the attack implementation process of the attacker in the unstructured network threat information.
  • the attack implementation process can be modifying the registry or booting automatically.
  • the method prediction module 20 is configured to predict the attack method for the attack purpose through a preset machine learning model, and obtain an unknown attack method.
  • the preset machine learning model can be preset.
  • a Bag of Words (BOW) model is used as an example for illustration.
  • the bag-of-words model puts all words into a bag, regardless of their grammar and word order, that is, each word is independent.
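The bag-of-words idea can be shown in a few lines. This is a minimal pure-Python sketch for illustration; the function names and the whitespace tokenization are assumptions and say nothing about how the preset machine learning model in this embodiment is actually built.

```python
from collections import Counter


def build_vocabulary(sentences: list[str]) -> list[str]:
    """Collect every distinct word, sorted so that vector positions are stable."""
    return sorted({word for s in sentences for word in s.lower().split()})


def vectorize(sentence: str, vocabulary: list[str]) -> list[int]:
    """Count each vocabulary word in the sentence, ignoring grammar and word order."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]


vocabulary = build_vocabulary(["modify the registry", "boot automatically"])
print(vocabulary)                                    # ['automatically', 'boot', 'modify', 'registry', 'the']
print(vectorize("modify the registry", vocabulary))  # [0, 0, 1, 1, 1]
print(vectorize("registry the modify", vocabulary))  # [0, 0, 1, 1, 1]
```

The last two lines produce the same vector, which is what "each word is independent" means in practice: reordering the words does not change the representation.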
  • the information generating module 30 is configured to generate structured network threat information according to the attack purpose, the attack means and the unknown attack means.
  • generating structured network threat information according to the attack purpose, the attack means and the unknown attack means may consist of aggregating the attack purpose, the attack means and the unknown attack means to obtain the structured network threat information.
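The aggregation performed by the information generating module can be sketched as building one record and serializing it. The record layout, the field names, and the JSON output format below are assumptions chosen for illustration; the text does not specify a schema. The split between known and predicted means in the example is also invented.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StructuredThreatInfo:
    attack_purpose: str
    attack_means: list[str]
    unknown_attack_means: list[str] = field(default_factory=list)


def aggregate(purpose: str, means: list[str], unknown_means: list[str]) -> str:
    """Combine the three extracted items into one JSON document."""
    record = StructuredThreatInfo(purpose, list(means), list(unknown_means))
    return json.dumps(asdict(record), ensure_ascii=False, indent=2)


structured = aggregate(
    "the virus continues to run on the computer",
    ["modify the registry"],
    ["boot automatically"],  # illustrative stand-in for a model prediction
)
print(structured)
```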
  • the computer software product is stored in a storage medium (such as a read-only memory (ROM)/random access memory (RAM), a magnetic disk, or an optical disc) and includes several instructions to enable a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to perform the methods described in each embodiment of the present invention.
  • the present invention discloses A1, a method for extracting network threat information, which includes the following steps:
  • A2 The method for extracting network threat information as described in A1, wherein the steps of performing natural language processing on unstructured network threat information to obtain attack purpose and attack means include:
  • the attack purpose and attack means are determined according to the target corpus.
  • A3 The method for extracting network threat information as described in A2 includes:
  • the threat sentence is standardized according to the dependency relationship to obtain a standard threat sentence.
  • A4 The method for extracting network threat information as described in A3, wherein the step of standardizing the threat statement according to the dependency relationship to obtain a standard threat statement includes:
  • A5 The method for extracting network threat information as described in A2, wherein the step of performing synonym expansion on the threat corpus to obtain the target corpus includes:
  • the threat keywords are synonymously expanded based on a preset dictionary to obtain a target corpus.
  • A6 The method for extracting network threat information as described in A2, wherein the step of performing in-depth sentence segmentation on the simplified network threat information to obtain a threat sentence includes:
  • the simplified network threat information is segmented in depth according to the sentence end symbol, the coordinating relative conjunction and the progressive relative conjunction to obtain a threat sentence.
  • A7 The method for extracting network threat information as described in A2, wherein the step of performing lexical tagging on the standard threat sentence to obtain a threat corpus includes:
  • the standard threat sentence is simplified according to the necessary score to obtain a threat corpus.
  • A8 The method for extracting network threat information as described in A2, wherein the step of performing text preprocessing on unstructured network threat information to obtain simplified network threat information further includes:
  • the random information is simplified to obtain simplified network threat information.
  • A9 The method for extracting network threat information as described in A2, wherein before the step of predicting attack means for the attack purpose through the preset machine learning model to obtain the unknown attack means, the method further includes:
  • the initial machine learning model is trained according to the training set corpus to obtain a preset machine learning model.
  • A10 The method for extracting network threat information as described in A9 includes:
  • a training set corpus is constructed based on the threat sentence samples.
  • A11 The method for extracting network threat information as described in A10, wherein the step of selecting a threat sentence sample from the target corpus according to the number of sentences includes:
  • A12 The method for extracting network threat information as described in any one of A1 to A11, wherein the step of predicting attack means for the attack purpose through a preset machine learning model to obtain unknown attack means includes:
  • the attack method is predicted for the attack purpose through a preset machine learning model, and an unknown attack method is obtained.
  • the present invention also discloses B13, a network threat information extraction device.
  • the network threat information extraction device includes: a memory, a processor, and a network threat information extraction program stored in the memory and operable on the processor. When the network threat information extraction program is executed by the processor, the network threat information extraction method as described above is realized.
  • the present invention also discloses C14, a storage medium, on which a network threat information extraction program is stored, and when the network threat information extraction program is executed by a processor, the above-mentioned network threat information extraction method is realized.
  • the present invention also discloses D15, a network threat information extraction device, the network threat information extraction device includes: a language processing module, a means prediction module and an information generation module;
  • the language processing module is used to perform natural language processing on unstructured network threat information to obtain attack purposes and attack means;
  • the method prediction module is used to predict the attack method for the attack purpose through a preset machine learning model, and obtain an unknown attack method;
  • the information generating module is configured to generate structured network threat information according to the attack purpose, the attack means and the unknown attack means.
  • D16 The device for extracting network threat information as described in D15, wherein the language processing module is further configured to perform text preprocessing on unstructured network threat information to obtain simplified network threat information;
  • the language processing module is further configured to perform in-depth sentence segmentation on the simplified network threat information to obtain threat sentences;
  • the language processing module is further configured to perform semantic dependency analysis on the threat sentence to obtain a standard threat sentence;
  • the language processing module is further configured to perform lexical tagging on the standard threat sentence to obtain a threat corpus;
  • the language processing module is further configured to expand synonyms to the threat corpus to obtain a target corpus;
  • the language processing module is further configured to determine an attack purpose and an attack method according to the target corpus.
  • D17 The device for extracting network threat information as described in D16, wherein the language processing module is further configured to perform semantic dependency analysis on the threat sentence, and obtain a dependency relationship between each vocabulary in the threat sentence;
  • the language processing module is further configured to standardize the threat sentence according to the dependency relationship to obtain a standard threat sentence.
  • D18 The device for extracting network threat information as described in D17, wherein the language processing module is further configured to obtain part-of-speech information of each vocabulary in the threat sentence;
  • the language processing module is further configured to standardize the threat sentence according to the part-of-speech information and the dependency relationship to obtain a standard threat sentence.
  • D19 The device for extracting network threat information as described in D16, wherein the language processing module is further configured to obtain the frequency of occurrence of each keyword in the threat corpus, and determine the threat keyword according to the frequency of occurrence;
  • the language processing module is further configured to perform synonym expansion on the threat keywords based on a preset dictionary to obtain a target corpus.
  • D20 The device for extracting network threat information as described in D16, wherein the language processing module is further configured to acquire the sentence-end symbols, coordinating relative conjunctions, and progressive relative conjunctions in the simplified network threat information;
  • the language processing module is further configured to perform in-depth sentence segmentation on the simplified network threat information according to the sentence end symbol, the coordinating relative conjunction and the progressive relative conjunction, to obtain a threat sentence.
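The in-depth sentence segmentation described in A6 and D20 can be sketched with a regular expression that splits on sentence end symbols as well as on coordinating and progressive conjunctions. This is a minimal sketch on English text for readability; the symbol set and the conjunction lists are hypothetical stand-ins, and a Chinese report would use the corresponding Chinese punctuation and conjunctions.

```python
import re

# All three marker sets are assumptions chosen for this example.
SENTENCE_END = r"[.!?;]"
COORDINATING = ("and", "meanwhile")
PROGRESSIVE = ("moreover", "furthermore")


def deep_split(text: str) -> list[str]:
    """Split on sentence end symbols and on coordinating/progressive conjunctions."""
    conjunctions = "|".join(re.escape(c) for c in COORDINATING + PROGRESSIVE)
    pattern = rf"{SENTENCE_END}|,?\s+\b(?:{conjunctions})\b\s+"
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    return [part.strip() for part in parts if part and part.strip()]


report = ("The virus modifies the registry, and boots automatically; "
          "furthermore it uploads account names.")
for sentence in deep_split(report):
    print(sentence)
# The virus modifies the registry
# boots automatically
# it uploads account names
```

Splitting on a bare period would also break file names or IP addresses, so a sketch like this only makes sense after such strings have been simplified by the text preprocessing step.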

Abstract

The present invention relates to the field of internet technologies, and discloses a cyber threat information extraction method, a device, a storage medium, and an apparatus. The method comprises: performing natural language processing on unstructured cyber threat information to obtain a purpose of attack and a means of attack; predicting a means of attack for the purpose of attack by means of a preset machine learning model to obtain an unknown means of attack; and generating structured cyber threat information according to the purpose of attack, the means of attack, and the unknown means of attack. Because the purpose of attack and the means of attack of an attacker are automatically identified and extracted from the unstructured cyber threat information on the basis of natural language processing and the preset machine learning model, the process of analyzing cyber threat information can be simplified and security defense capability can be improved.

Description

Network threat information extraction method, equipment, storage medium and device
Technical field
The present invention relates to the field of Internet technology, and in particular to a network threat information extraction method, equipment, storage medium and device.
Background
With the explosive growth of network threat attacks, extracting and sharing intelligence about the tactics, techniques and attack implementation procedures (TTP) that attackers use, as described in threat analysis reports, has become crucial to network security. However, because there is no standard structured description language and no technology for automatically extracting and analyzing tactical and technical intelligence from network threat reports, analyzing complex, unstructured threat analysis reports is time-consuming and laborious.
The above content is provided only to assist in understanding the technical solution of the present invention and is not an admission that it is prior art.
Summary of the invention
The main purpose of the present invention is to provide a network threat information extraction method, equipment, storage medium and device, aiming to solve the technical problem in the prior art that analyzing complex, unstructured threat analysis reports is time-consuming and laborious because there is no standard structured description language and no technology for automatically extracting and analyzing tactical and technical intelligence from network threat reports.
In order to achieve the above purpose, the present invention provides a network threat information extraction method, which includes the following steps:
performing natural language processing on unstructured network threat information to obtain an attack purpose and attack means;
predicting attack means for the attack purpose through a preset machine learning model to obtain unknown attack means;
generating structured network threat information according to the attack purpose, the attack means and the unknown attack means.
Optionally, the step of performing natural language processing on the unstructured network threat information to obtain the attack purpose and the attack means includes:
performing text preprocessing on the unstructured network threat information to obtain simplified network threat information;
performing in-depth sentence segmentation on the simplified network threat information to obtain a threat sentence;
performing semantic dependency analysis on the threat sentence to obtain a standard threat sentence;
performing lexical tagging on the standard threat sentence to obtain a threat corpus;
performing synonym expansion on the threat corpus to obtain a target corpus;
determining the attack purpose and the attack means according to the target corpus.
Optionally, the step of performing semantic dependency analysis on the threat sentence to obtain the dependency relationships between the words in the threat sentence includes:
performing semantic dependency analysis on the threat sentence to obtain the dependency relationships between the words in the threat sentence;
standardizing the threat sentence according to the dependency relationships to obtain a standard threat sentence.
Optionally, the step of standardizing the threat sentence according to the dependency relationships to obtain a standard threat sentence includes:
obtaining the part-of-speech information of each word in the threat sentence;
standardizing the threat sentence according to the part-of-speech information and the dependency relationships to obtain a standard threat sentence.
Optionally, the step of performing synonym expansion on the threat corpus to obtain a target corpus includes:
obtaining the occurrence frequency of each keyword in the threat corpus, and determining threat keywords according to the occurrence frequency;
performing synonym expansion on the threat keywords based on a preset dictionary to obtain a target corpus.
Optionally, the step of performing in-depth sentence segmentation on the simplified network threat information to obtain a threat sentence includes:
obtaining the sentence end symbols, coordinating conjunctions and progressive conjunctions in the simplified network threat information;
performing in-depth sentence segmentation on the simplified network threat information according to the sentence end symbols, the coordinating conjunctions and the progressive conjunctions to obtain a threat sentence.
Optionally, the step of performing lexical tagging on the standard threat sentence to obtain a threat corpus includes:
obtaining the necessary score of each part of the standard threat sentence;
simplifying the standard threat sentence according to the necessary scores to obtain a threat corpus.
Optionally, the step of performing text preprocessing on the unstructured network threat information to obtain simplified network threat information further includes:
obtaining random information in the unstructured network threat information;
simplifying the random information to obtain simplified network threat information.
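The text does not define "random information" at this point. One plausible reading, assumed here purely for illustration, is that it covers machine-generated strings such as file hashes and IP addresses, which carry little meaning for sentence analysis. Under that assumption the simplification could be sketched as placeholder substitution; the patterns and placeholder names below are hypothetical.

```python
import re

# Assumed kinds of "random information"; the source does not enumerate them.
PATTERNS = [
    (re.compile(r"\b[a-fA-F0-9]{64}\b"), "<SHA256>"),
    (re.compile(r"\b[a-fA-F0-9]{32}\b"), "<MD5>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
]


def simplify_text(text: str) -> str:
    """Replace each matched random-looking string with a fixed placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


raw = "The sample d41d8cd98f00b204e9800998ecf8427e connects to 192.168.1.10"
print(simplify_text(raw))  # The sample <MD5> connects to <IP>
```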
Optionally, before the step of predicting attack means for the attack purpose through the preset machine learning model to obtain the unknown attack means, the method further includes:
constructing a training set corpus according to the target corpus;
training an initial machine learning model according to the training set corpus to obtain the preset machine learning model.
Optionally, the step of constructing a training set corpus according to the target corpus includes:
obtaining the number of sentences of synonymous threat sentences in the target corpus;
selecting threat sentence samples from the target corpus according to the number of sentences;
constructing the training set corpus based on the threat sentence samples.
Optionally, the step of selecting threat sentence samples from the target corpus according to the number of sentences includes:
sorting the threat sentences in the target corpus according to the number of sentences;
receiving semantic labels input by the user based on the target corpus;
selecting threat sentence samples from the target corpus according to the sorting result and the semantic labels.
Optionally, the step of predicting attack means for the attack purpose through the preset machine learning model to obtain the unknown attack means includes:
obtaining multi-platform network threat information;
predicting attack means for the attack purpose through the preset machine learning model based on the multi-platform network threat information to obtain the unknown attack means.
In addition, in order to achieve the above purpose, the present invention also proposes network threat information extraction equipment, which includes a memory, a processor, and a network threat information extraction program stored in the memory and executable on the processor, the network threat information extraction program being configured to implement the network threat information extraction method described above.
In addition, in order to achieve the above purpose, the present invention also proposes a storage medium on which a network threat information extraction program is stored; when the network threat information extraction program is executed by a processor, the network threat information extraction method described above is implemented.
In addition, in order to achieve the above purpose, the present invention also proposes a network threat information extraction device, which includes a language processing module, a means prediction module and an information generation module;
the language processing module is configured to perform natural language processing on unstructured network threat information to obtain an attack purpose and attack means;
the means prediction module is configured to predict attack means for the attack purpose through a preset machine learning model to obtain unknown attack means;
the information generation module is configured to generate structured network threat information according to the attack purpose, the attack means and the unknown attack means.
Optionally, the language processing module is further configured to perform text preprocessing on the unstructured network threat information to obtain simplified network threat information;
the language processing module is further configured to perform in-depth sentence segmentation on the simplified network threat information to obtain a threat sentence;
the language processing module is further configured to perform semantic dependency analysis on the threat sentence to obtain a standard threat sentence;
the language processing module is further configured to perform lexical tagging on the standard threat sentence to obtain a threat corpus;
the language processing module is further configured to perform synonym expansion on the threat corpus to obtain a target corpus;
the language processing module is further configured to determine the attack purpose and the attack means according to the target corpus.
Optionally, the language processing module is further configured to perform semantic dependency analysis on the threat sentence to obtain the dependency relationships between the words in the threat sentence;
the language processing module is further configured to standardize the threat sentence according to the dependency relationships to obtain a standard threat sentence.
Optionally, the language processing module is further configured to obtain the part-of-speech information of each word in the threat sentence;
the language processing module is further configured to standardize the threat sentence according to the part-of-speech information and the dependency relationships to obtain a standard threat sentence.
Optionally, the language processing module is further configured to obtain the occurrence frequency of each keyword in the threat corpus and determine threat keywords according to the occurrence frequency;
the language processing module is further configured to perform synonym expansion on the threat keywords based on a preset dictionary to obtain a target corpus.
Optionally, the language processing module is further configured to obtain the sentence end symbols, coordinating conjunctions and progressive conjunctions in the simplified network threat information;
the language processing module is further configured to perform in-depth sentence segmentation on the simplified network threat information according to the sentence end symbols, the coordinating conjunctions and the progressive conjunctions to obtain a threat sentence.
在本发明中,公开了对非结构化网络威胁信息进行自然语言处理,获得攻击目的和攻击手段,通过预设机器学习模型对攻击目的进行攻击手段预测,获得未知攻击手段,根据攻击目的、攻击手段以及未知攻击手段生成结构化网络威胁信息;由于本发明基于自然语言处理和预设机器学习模型对非结构化网络威胁信息中攻击者的攻击目的和攻击手段进行自动化识别和提取,从而能够简化网络威胁信息的分析过程,进而能够提高安全防御能力。In the present invention, it is disclosed that natural language processing is performed on unstructured network threat information to obtain the attack purpose and attack means, and the attack means are predicted by a preset machine learning model to obtain unknown attack means, and structured network threat information is generated according to the attack purpose, attack means, and unknown attack means; since the present invention automatically identifies and extracts the attack purpose and attack means of the attacker in the unstructured network threat information based on natural language processing and preset machine learning models, the analysis process of network threat information can be simplified, and the security defense capability can be improved.
附图说明Brief Description of the Drawings
图1是本发明实施例方案涉及的硬件运行环境的网络威胁信息提取设备的结构示意图;FIG. 1 is a schematic structural diagram of a network threat information extraction device in a hardware operating environment involved in the solution of an embodiment of the present invention;
图2为本发明网络威胁信息提取方法第一实施例的流程示意图;FIG. 2 is a schematic flowchart of a first embodiment of a method for extracting network threat information according to the present invention;
图3为本发明网络威胁信息提取方法第二实施例的流程示意图;FIG. 3 is a schematic flowchart of a second embodiment of a method for extracting network threat information according to the present invention;
图4为本发明网络威胁信息提取方法第三实施例的流程示意图;FIG. 4 is a schematic flowchart of a third embodiment of a method for extracting network threat information according to the present invention;
图5为本发明网络威胁信息提取方法一实施例的语义依存分析示意图;FIG. 5 is a schematic diagram of semantic dependency analysis in an embodiment of the network threat information extraction method of the present invention;
图6为本发明网络威胁信息提取方法第四实施例的流程示意图;FIG. 6 is a schematic flowchart of a fourth embodiment of the network threat information extraction method of the present invention;
图7为本发明网络威胁信息提取装置第一实施例的结构框图。FIG. 7 is a structural block diagram of a first embodiment of the network threat information extraction apparatus of the present invention.
本发明目的的实现、功能特点及优点将结合实施例,参照附图做进一步说明。The realization of the objectives, the functional characteristics, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
具体实施方式Detailed Description
应当理解,此处所描述的具体实施例仅仅用以解释本发明,并不用于限定本发明。It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.
参照图1,图1为本发明实施例方案涉及的硬件运行环境的网络威胁信息提取设备结构示意图。Referring to FIG. 1 , FIG. 1 is a schematic structural diagram of a device for extracting network threat information in a hardware operating environment according to an embodiment of the present invention.
如图1所示,该网络威胁信息提取设备可以包括:处理器1001,例如中央处理器(Central Processing Unit,CPU),通信总线1002、用户接口1003,网络接口1004,存储器1005。其中,通信总线1002用于实现这些组件之间的连接通信。用户接口1003可以包括显示屏(Display),可选用户接口1003还可以包括标准的有线接口、无线接口,对于用户接口1003的有线接口在本发明中可为USB接口。网络接口1004可选的可以包括标准的有线接口、无线接口(如无线保真(Wireless-Fidelity,Wi-Fi)接口)。存储器1005可以是高速的随机存取存储器(Random Access Memory,RAM)存储器,也可以是稳定的存储器(Non-volatile Memory,NVM),例如磁盘存储器。存储器1005可选的还可以是独立于前述处理器1001的存储装置。As shown in FIG. 1, the network threat information extraction device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 provides connection and communication between these components. The user interface 1003 may include a display and, optionally, a standard wired interface and a wireless interface; in the present invention, the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a non-volatile memory (NVM), such as a disk memory. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
本领域技术人员可以理解,图1中示出的结构并不构成对网络威胁信息提取设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。Those skilled in the art will understand that the structure shown in FIG. 1 does not limit the network threat information extraction device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
如图1所示,认定为一种计算机存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及网络威胁信息提取程序。As shown in FIG. 1 , the memory 1005 identified as a computer storage medium may include an operating system, a network communication module, a user interface module, and a network threat information extraction program.
在图1所示的网络威胁信息提取设备中,网络接口1004主要用于连接后台服务器,与所述后台服务器进行数据通信;用户接口1003主要用于连接用户设备;所述网络威胁信息提取设备通过处理器1001调用存储器1005中存储的网络威胁信息提取程序,并执行本发明实施例提供的网络威胁信息提取方法。In the network threat information extraction device shown in FIG. 1 , the network interface 1004 is mainly used to connect to a background server and perform data communication with the background server; the user interface 1003 is mainly used to connect to a user device; the network threat information extraction device calls the network threat information extraction program stored in the memory 1005 through the processor 1001, and executes the network threat information extraction method provided by the embodiment of the present invention.
基于上述硬件结构,提出本发明网络威胁信息提取方法的实施例。Based on the above hardware structure, an embodiment of the network threat information extraction method of the present invention is proposed.
参照图2,图2为本发明网络威胁信息提取方法第一实施例的流程示意图,提出本发明网络威胁信息提取方法第一实施例。Referring to FIG. 2, which is a schematic flowchart of the first embodiment of the network threat information extraction method of the present invention, the first embodiment of the method is proposed.
在第一实施例中,所述网络威胁信息提取方法包括以下步骤:In the first embodiment, the method for extracting network threat information includes the following steps:
步骤S10:对非结构化网络威胁信息进行自然语言处理,获得攻击目的和攻击手段。Step S10: Perform natural language processing on unstructured network threat information to obtain attack purpose and attack means.
应当理解的是,本实施例方法的执行主体可以是具有数据处理、网络通信以及程序运行功能的网络威胁信息提取设备,例如,服务器等,或者是其他能够实现相同或相似功能的电子设备,本实施例对此不加限制。It should be understood that the execution subject of the method in this embodiment may be a network threat information extraction device with functions of data processing, network communication, and program operation, such as a server, or other electronic devices capable of achieving the same or similar functions, which is not limited in this embodiment.
可以理解的是,随着网络威胁攻击的爆炸式增长,威胁分析报告中攻击者所使用的技战术及攻击实现过程(TTP)的相关情报的提取和共享对网络安全建设显得至关重要。然而,由于缺乏标准结构化语言描述和网络威胁报告技战术情报的自动化提取及分析技术,分析复杂和非结构化的威胁分析报告非常耗时费力。It can be understood that, with the explosive growth of cyber attacks, extracting and sharing intelligence about the tactics, techniques, and attack procedures (TTP) used by attackers, as described in threat analysis reports, is crucial to building network security. However, because standard structured description languages and technologies for automatically extracting and analyzing tactical and technical intelligence from threat reports are lacking, analyzing complex, unstructured threat analysis reports is time-consuming and laborious.
现有美国MITRE公司基于机器学习(ML)技术实现的TRAM项目及美国北卡罗来纳大学基于信息检索(IR)技术实现的TTPDrill项目能够较为简单的处理威胁分析报告,但由于处理方式都是基于英文报告,无法适用于描述方式复杂多变的中文威胁分析报告,并且这两个项目输出的精准度及误报率都不理想,皆为研究性质,基本无实用性可言。The existing TRAM project, implemented by MITRE in the United States using machine learning (ML), and the TTPDrill project, implemented by the University of North Carolina using information retrieval (IR), can process threat analysis reports in a relatively simple manner. However, because both are designed for English reports, they cannot be applied to Chinese threat analysis reports, whose descriptions are complex and varied. Moreover, the precision and false positive rate of both projects' output are unsatisfactory; both are research projects with little practical utility.
因此,为了克服上述缺陷,本实施例中基于自然语言处理和预设机器学习模型对非结构化网络威胁信息中攻击者的攻击目的和攻击手段进行自动化识别和提取,从而能够简化网络威胁信息的分析过程,进而能够提高安全防御能力。Therefore, in order to overcome the above-mentioned defects, in this embodiment, based on natural language processing and preset machine learning models, the attacker's attack purpose and attack means in unstructured network threat information are automatically identified and extracted, thereby simplifying the analysis process of network threat information and improving security defense capabilities.
需要说明的是,自然语言处理可以是文本预处理、文本深层次断句、语句语义依存分析、词汇标记化以及词汇同义词扩充中的至少一种,本实施例对此不加以限制。It should be noted that the natural language processing may be at least one of text preprocessing, text deep sentence segmentation, sentence semantic dependency analysis, vocabulary tokenization, and vocabulary synonym expansion, which is not limited in this embodiment.
需要说明的是,攻击目的可以是非结构化网络威胁信息中攻击者采用的技战术,例如,技战术可以是病毒持续在电脑上运行。It should be noted that the attack purpose may be the tactic or technique adopted by the attacker as described in the unstructured network threat information; for example, keeping a virus running persistently on a computer.
攻击手段可以是非结构化网络威胁信息中攻击者的攻击实现过程,例如,攻击实现过程可以是修改注册表或开机自启动等。The attack means may be the procedure by which the attacker carries out the attack as described in the unstructured network threat information; for example, modifying the registry or configuring automatic launch at startup.
步骤S20:通过预设机器学习模型对所述攻击目的进行攻击手段预测,获得未知攻击手段。Step S20: Use a preset machine learning model to predict attack means for the attack purpose, and obtain unknown attack means.
需要说明的是,预设机器学习模型可以预先设置,在本实施例和其他实施例中,以词袋(Bag of Words,BOW)模型为例进行说明。It should be noted that the preset machine learning model can be preset. In this embodiment and other embodiments, a Bag of Words (BOW) model is used as an example for illustration.
词袋模型将所有词语装进一个袋子里,不考虑其词法和语序的问题,即每个词语都是独立的。The bag-of-words model puts all words into one bag without regard to morphology or word order; that is, each word is treated independently.
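The source does not say how the bag-of-words model turns an attack purpose into predicted attack means. The following is only a minimal sketch under stated assumptions: each reference record pairs purpose tokens with one attack means, similarity is cosine similarity over word counts, and the function names `bow`, `cosine`, and `predict_unknown_means` as well as the sample data are hypothetical.

```python
import math
from collections import Counter


def bow(tokens):
    """Bag-of-words vector: word counts only, word order is discarded."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_unknown_means(purpose_tokens, known_means, reference, top_k=1):
    """Return up to top_k attack means from `reference` that are not already known.

    `reference` is a list of (purpose_tokens, attack_means) pairs. This data
    layout is an assumption made for illustration, not the patent's format.
    """
    query = bow(purpose_tokens)
    ranked = sorted(reference, key=lambda rec: cosine(query, bow(rec[0])), reverse=True)
    predicted = []
    for _, means in ranked:
        if len(predicted) >= top_k:
            break
        if means not in known_means and means not in predicted:
            predicted.append(means)
    return predicted


# Hypothetical reference data
reference = [
    (["malware", "persist", "run", "startup"], "modify registry"),
    (["malware", "persist", "run"], "scheduled task"),
    (["steal", "credential"], "keylogging"),
]
unknown = predict_unknown_means(["malware", "persist", "run"], {"modify registry"}, reference)
```

Because word order is ignored, two purposes that use the same words in a different order receive identical vectors, which is the property the paragraph above describes.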
步骤S30:根据所述攻击目的、所述攻击手段以及所述未知攻击手段生成结构化网络威胁信息。Step S30: Generate structured network threat information according to the attack purpose, the attack means and the unknown attack means.
应当理解的是,根据攻击目的、攻击手段以及未知攻击手段生成结构化网络威胁信息可以是将攻击目的、攻击手段以及未知攻击手段进行聚合,获得结构化网络威胁信息。It should be understood that generating structured network threat information according to attack purpose, attack means and unknown attack means may be to aggregate attack purpose, attack means and unknown attack means to obtain structured network threat information.
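The aggregation in step S30 can be pictured as assembling one record from the three inputs. The sketch below is an assumption-laden illustration: the record layout, the field names, and the JSON output are hypothetical, since the source only states that the three items are aggregated.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StructuredThreatInfo:
    """Hypothetical record layout; the source does not define one."""
    attack_purpose: str
    attack_means: list = field(default_factory=list)
    unknown_attack_means: list = field(default_factory=list)


def aggregate(purpose, means, unknown_means):
    """Combine the extracted and predicted items into one structured record."""
    known = list(dict.fromkeys(means))  # drop duplicates, keep order
    predicted = [m for m in dict.fromkeys(unknown_means) if m not in known]
    return StructuredThreatInfo(purpose, known, predicted)


record = aggregate(
    "virus keeps running on the computer",
    ["modify registry", "launch at startup", "modify registry"],
    ["scheduled task", "modify registry"],
)
print(json.dumps(asdict(record), ensure_ascii=False, indent=2))
```

Filtering the predicted list against the extracted list keeps the "unknown" field limited to means that the report itself did not mention.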
在第一实施例中,公开了对非结构化网络威胁信息进行自然语言处理,获得攻击目的和攻击手段,通过预设机器学习模型对攻击目的进行攻击手段预测,获得未知攻击手段,根据攻击目的、攻击手段以及未知攻击手段生成结构化网络威胁信息;由于本实施例基于自然语言处理和预设机器学习模型对非结构化网络威胁信息中攻击者的攻击目的和攻击手段进行自动化识别和提取,从而能够简化网络威胁信息的分析过程,进而能够提高安全防御能力。In the first embodiment, it is disclosed that natural language processing is performed on unstructured network threat information to obtain the attack purpose and attack means, and the attack purpose is predicted by the preset machine learning model to obtain unknown attack means, and structured network threat information is generated according to the attack purpose, attack means and unknown attack means; since this embodiment automatically recognizes and extracts the attack purpose and attack means of the attacker in the unstructured network threat information based on natural language processing and preset machine learning models, the analysis process of network threat information can be simplified, and security defense capabilities can be improved.
参照图3,图3为本发明网络威胁信息提取方法第二实施例的流程示意图,基于上述图2所示的第一实施例,提出本发明网络威胁信息提取方法的第二实施例。Referring to FIG. 3 , FIG. 3 is a schematic flowchart of a second embodiment of the network threat information extraction method of the present invention. Based on the first embodiment shown in FIG. 2 above, a second embodiment of the network threat information extraction method of the present invention is proposed.
在第二实施例中,所述步骤S10,包括:In the second embodiment, the step S10 includes:
步骤S101:对非结构化网络威胁信息进行文本预处理,获得简化网络威胁信息。Step S101: Perform text preprocessing on unstructured network threat information to obtain simplified network threat information.
应当理解的是,为了提高自然语言处理的处理效果,本实施例中,可以先对非结构化网络威胁信息进行文本预处理,再进行深层次断句,再进行语义依存分析,再进行词汇标记,再进行同义词扩充,以获得非结构化网络威胁信息中攻击者的攻击目的和攻击手段。It should be understood that, in order to improve the processing effect of natural language processing, in this embodiment, text preprocessing may be performed on unstructured network threat information first, then in-depth sentence segmentation, semantic dependency analysis, vocabulary tagging, and synonym expansion may be performed to obtain the attack purpose and attack means of the attacker in the unstructured network threat information.
可以理解的是,对非结构化网络威胁信息进行文本预处理,获得简化网络威胁信息可以是获取非结构化网络威胁信息中的随机信息,对所述随机信息进行简化处理,获得简化网络威胁信息。It can be understood that performing text preprocessing on unstructured network threat information to obtain simplified network threat information may be obtaining random information in the unstructured network threat information, performing simplified processing on the random information, and obtaining simplified network threat information.
步骤S102:对所述简化网络威胁信息进行深层次断句,获得威胁语句。Step S102: Perform in-depth sentence segmentation on the simplified network threat information to obtain threat sentences.
应当理解的是,为了简化网络威胁信息,确保每个待分析的句子都独立的表达了一个技战术及实现过程(TTP)。本实施例中,可以对简化网络威胁信息进行深层次断句,获得威胁语句。It should be understood that, to simplify the network threat information and ensure that each sentence to be analyzed independently expresses a single tactic, technique, and procedure (TTP), in this embodiment in-depth sentence segmentation may be performed on the simplified network threat information to obtain threat sentences.
可以理解的是,对简化网络威胁信息进行深层次断句,获得威胁语句可以是获取简化网络威胁信息中的语句结束符号、并列关系连词以及递进关系连词,根据语句结束符号、并列关系连词以及递进关系连词对简化网络威胁信息进行深层次断句,获得威胁语句。It can be understood that performing in-depth sentence segmentation on the simplified network threat information to obtain threat sentences may consist of obtaining the sentence-ending symbols, coordinating conjunctions, and progressive conjunctions in the simplified network threat information, and segmenting the simplified network threat information according to them to obtain threat sentences.
步骤S103:对所述威胁语句进行语义依存分析,获得标准威胁语句。Step S103: Perform semantic dependency analysis on the threat sentence to obtain a standard threat sentence.
应当理解的是,为了将复杂多变的描述方式标准化、统一化,本实施例中,还可以对威胁语句进行语义依存分析,获得所述威胁语句中各词汇之间的依赖关系,并根据依赖关系对威胁语句进行标准化处理,获得标准威胁语句。It should be understood that, in order to standardize and unify complex and changeable description methods, in this embodiment, semantic dependency analysis may also be performed on the threat sentence to obtain the dependency relationship between each vocabulary in the threat sentence, and standardize the threat sentence according to the dependency relationship to obtain a standard threat sentence.
可以理解的是,对所述威胁语句进行语义依存分析,获得标准威胁语句可以是对威胁语句进行语义依存分析,获得威胁语句中各词汇之间的依赖关系,根据依赖关系对威胁语句进行标准化处理,获得标准威胁语句。It can be understood that performing semantic dependency analysis on the threat sentence to obtain a standard threat sentence may be performing semantic dependency analysis on the threat sentence to obtain a dependency relationship between each vocabulary in the threat sentence, and standardizing the threat sentence according to the dependency relationship to obtain a standard threat sentence.
步骤S104:对所述标准威胁语句进行词汇标记,获得威胁语料库。Step S104: Perform vocabulary tagging on the standard threat sentences to obtain a threat corpus.
应当理解的是,为了收敛威胁语料库,本实施例中,可以先获取标准威胁语句中各部分的必要分值,再根据必要分值对标准威胁语句进行简化,获得威胁语料库。It should be understood that, in order to converge the threat corpus, in this embodiment, the necessary scores of each part of the standard threat sentence can be obtained first, and then the standard threat sentence can be simplified according to the necessary score to obtain the threat corpus.
可以理解的是,对标准威胁语句进行词汇标记,获得威胁语料库可以是获取标准威胁语句中各部分的必要分值,根据必要分值对标准威胁语句进行简化,获得威胁语料库。It can be understood that the lexical tagging of the standard threat sentences to obtain the threat corpus may be to obtain the necessary scores of each part of the standard threat sentences, and simplify the standard threat sentences according to the necessary scores to obtain the threat corpus.
步骤S105:对所述威胁语料库进行同义词扩充,获得目标语料库。Step S105: performing synonym expansion on the threat corpus to obtain a target corpus.
应当理解的是,为了提高后续模型预测的召回率,本实施例中,可以对威胁语料库中的高频关键词进行同义词扩展。It should be understood that, in order to improve the recall rate of subsequent model predictions, in this embodiment, synonym expansion may be performed on high-frequency keywords in the threat corpus.
可以理解的是,对威胁语料库进行同义词扩充,获得目标语料库可以是获取威胁语料库中各关键词的出现频率,并根据出现频率确定威胁关键词,基于预设词典对威胁关键词进行同义词扩充,获得目标语料库。It can be understood that the synonym expansion of the threat corpus and the acquisition of the target corpus may be obtained by obtaining the occurrence frequency of each keyword in the threat corpus, determining the threat keywords according to the frequency of occurrence, and performing synonym expansion on the threat keywords based on a preset dictionary to obtain the target corpus.
步骤S106:根据所述目标语料库确定攻击目的和攻击手段。Step S106: Determine the attack purpose and attack means according to the target corpus.
需要说明的是,攻击目的可以是非结构化网络威胁信息中攻击者采用的技战术,例如,技战术可以是病毒持续在电脑上运行。It should be noted that the attack purpose may be the tactic or technique adopted by the attacker as described in the unstructured network threat information; for example, keeping a virus running persistently on a computer.
攻击手段可以是非结构化网络威胁信息中攻击者的攻击实现过程,例如,攻击实现过程可以是修改注册表或开机自启动等。The attack means may be the procedure by which the attacker carries out the attack as described in the unstructured network threat information; for example, modifying the registry or configuring automatic launch at startup.
在第二实施例中,公开了对非结构化网络威胁信息进行文本预处理,获得简化网络威胁信息,对简化网络威胁信息进行深层次断句,获得威胁语句,对威胁语句进行语义依存分析,获得标准威胁语句,对标准威胁语句进行词汇标记,获得威胁语料库,对威胁语料库进行同义词扩充,获得目标语料库,根据目标语料库确定攻击目的和攻击手段;由于本实施例先对非结构化网络威胁信息进行文本预处理,再进行深层次断句,再进行语义依存分析,再进行词汇标记,再进行同义词扩充,以获得非结构化网络威胁信息中攻击者的攻击目的和攻击手段,从而能够提高自然语言处理的处理效果。In the second embodiment, text preprocessing of unstructured network threat information is disclosed, simplified network threat information is obtained, simplified network threat information is subjected to in-depth sentence segmentation, threat sentences are obtained, threat sentences are subjected to semantic dependency analysis, standard threat sentences are obtained, standard threat sentences are lexically marked, threat corpus is obtained, threat corpus is synonymously expanded, target corpus is obtained, and attack purpose and attack means are determined according to the target corpus; since this embodiment first performs text preprocessing on unstructured network threat information, and then performs deep sentence segmentation , then semantic dependency analysis, vocabulary tagging, and synonym expansion to obtain the attack purpose and attack means of the attacker in the unstructured network threat information, so as to improve the processing effect of natural language processing.
在第二实施例中,所述步骤S20,包括:In the second embodiment, the step S20 includes:
步骤S201:获取多平台网络威胁信息。Step S201: Obtain multi-platform network threat information.
应当理解的是,为了获得多维度的未知攻击手段,本实施例中,可以先获取多平台网络威胁信息,再基于多平台网络威胁信息通过预设机器学习模型对所述攻击目的进行攻击手段预测,获得未知攻击手段。It should be understood that, in order to obtain multi-dimensional unknown attack methods, in this embodiment, multi-platform network threat information may be obtained first, and then based on the multi-platform network threat information, the attack method is predicted for the attack purpose through a preset machine learning model to obtain unknown attack methods.
需要说明的是,多平台网络威胁信息可以是多个安全平台检测获得的网络威胁信息。It should be noted that the multi-platform network threat information may be network threat information detected and obtained by multiple security platforms.
步骤S202:基于所述多平台网络威胁信息通过预设机器学习模型对所述攻击目的进行攻击手段预测,获得未知攻击手段。Step S202: Based on the multi-platform network threat information, use the preset machine learning model to predict attack means for the attack purpose, and obtain unknown attack means.
需要说明的是,预设机器学习模型可以预先设置,在本实施例和其他实施例中,以词袋(Bag of Words,BOW)模型为例进行说明。It should be noted that the preset machine learning model can be preset. In this embodiment and other embodiments, a Bag of Words (BOW) model is used as an example for illustration.
在第二实施例中,公开了获取多平台网络威胁信息,基于所述多平台网络威胁信息通过预设机器学习模型对所述攻击目的进行攻击手段预测,获得未知攻击手段;由于本实施例基于多平台网络威胁信息进行攻击手段预测,从而能够获得多维度的未知攻击手段。In the second embodiment, it is disclosed that multi-platform network threat information is obtained, and based on the multi-platform network threat information, attack means are predicted for the attack purpose through a preset machine learning model to obtain unknown attack means; since this embodiment predicts attack means based on multi-platform network threat information, multi-dimensional unknown attack means can be obtained.
参照图4,图4为本发明网络威胁信息提取方法第三实施例的流程示意图,基于上述图3所示的第二实施例,提出本发明网络威胁信息提取方法的第三实施例。Referring to FIG. 4 , FIG. 4 is a schematic flowchart of a third embodiment of the network threat information extraction method of the present invention. Based on the second embodiment shown in FIG. 3 above, a third embodiment of the network threat information extraction method of the present invention is proposed.
在第三实施例中,所述步骤S101,包括:In the third embodiment, the step S101 includes:
步骤S1011:获取非结构化网络威胁信息中的随机信息。Step S1011: Obtain random information in unstructured network threat information.
应当理解的是,为了降低输入随机性,提高信息处理速度,本实施例中,可以先获取非结构化网络威胁信息中的随机信息,再对随机信息进行简化处理,以获得简化网络威胁信息。It should be understood that, in order to reduce input randomness and improve information processing speed, in this embodiment, random information in unstructured network threat information may be obtained first, and then the random information may be simplified to obtain simplified network threat information.
步骤S1012:对所述随机信息进行简化处理,获得简化网络威胁信息。Step S1012: Simplify the random information to obtain simplified network threat information.
在具体实现中,例如,非结构化网络威胁信息为“一种木马程序,它运行后会在%TEMP%目录下释放正常的TP程序TPHelper.exe和恶意的TPHelperBase.dll以构成dll劫持。”,对非结构化网络威胁信息中的随机信息进行简化处理后,获得简化网络威胁信息“一种木马程序,它运行后会在特定目录下释放正常的TP程序EXE文件和恶意的DLL文件以构成dll劫持”。In a specific implementation, for example, the unstructured network threat information is "A Trojan program that, after running, releases a legitimate TP program TPHelper.exe and a malicious TPHelperBase.dll in the %TEMP% directory to carry out DLL hijacking." After the random information in it is simplified, the simplified network threat information is "A Trojan program that, after running, releases a legitimate TP program EXE file and a malicious DLL file in a specific directory to carry out DLL hijacking".
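The simplification in this example can be approximated with pattern substitution. The sketch below is one possible reading, not the patent's stated implementation: the regular expressions, the placeholder strings, and the function name `simplify` are assumptions chosen so that the example above is reproduced.

```python
import re

# Each rule maps a pattern for "random" (sample-specific) text to a generic
# placeholder. Patterns and placeholders are illustrative assumptions.
_RULES = [
    (re.compile(r"%[A-Za-z0-9_]+%"), "特定"),                            # %TEMP% and similar variables
    (re.compile(r"[A-Za-z0-9_\-]+\.exe", re.IGNORECASE), "EXE文件"),     # executable file names
    (re.compile(r"[A-Za-z0-9_\-]+\.dll", re.IGNORECASE), "DLL文件"),     # library file names
]


def simplify(text):
    """Replace sample-specific tokens with generic placeholders."""
    for pattern, placeholder in _RULES:
        text = pattern.sub(placeholder, text)
    return text


raw = "一种木马程序,它运行后会在%TEMP%目录下释放正常的TP程序TPHelper.exe和恶意的TPHelperBase.dll以构成dll劫持"
print(simplify(raw))
```

The character classes are restricted to ASCII on purpose: in Python 3, `\w` also matches Chinese characters, which would swallow the surrounding text into the file-name match.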
在第三实施例中,公开了获取非结构化网络威胁信息中的随机信息,对所述随机信息进行简化处理,简化网络威胁信息;由于本实施例先获取非结构化网络威胁信息中的随机信息,再对随机信息进行简化处理,以获得简化网络威胁信息,从而能够降低输入随机性,提高信息处理速度。In the third embodiment, random information in the unstructured network threat information is obtained and simplified to obtain simplified network threat information. Because this embodiment first obtains the random information in the unstructured network threat information and then simplifies it to obtain the simplified network threat information, input randomness is reduced and information processing speed is improved.
在第三实施例中,所述步骤S102,包括:In the third embodiment, the step S102 includes:
步骤S1021:获取所述简化网络威胁信息中的语句结束符号、并列关系连词以及递进关系连词。Step S1021: Obtain the sentence-ending symbols, coordinating conjunctions, and progressive conjunctions in the simplified network threat information.
应当理解的是,为了简化网络威胁信息,确保每个待分析的句子都独立的表达了一个技战术及实现过程(TTP)。本实施例中,可以对简化网络威胁信息进行深层次断句,获得威胁语句。It should be understood that, to simplify the network threat information and ensure that each sentence to be analyzed independently expresses a single tactic, technique, and procedure (TTP), in this embodiment in-depth sentence segmentation may be performed on the simplified network threat information to obtain threat sentences.
需要说明的是,语句结束符号可以包括?!………“‘“’等,并列关系连词可以包括并、和、跟、与等,递进关系连词可以包括不仅、不但、而且、何况等。It should be noted that sentence-ending symbols may include question marks, exclamation marks, ellipses, quotation marks, and the like; coordinating conjunctions may include 并, 和, 跟, 与 (each roughly "and"), and the like; and progressive conjunctions may include 不仅 ("not only"), 不但 ("not only"), 而且 ("but also"), 何况 ("let alone"), and the like.
步骤S1022:根据所述语句结束符号、所述并列关系连词以及所述递进关系连词对所述简化网络威胁信息进行深层次断句,获得威胁语句。Step S1022: Perform in-depth sentence segmentation on the simplified network threat information according to the sentence-ending symbols, the coordinating conjunctions, and the progressive conjunctions, to obtain threat sentences.
可以理解的是,根据语句结束符号、并列关系连词以及递进关系连词对简化网络威胁信息进行深层次断句,获得威胁语句可以是获取语句结束符号、并列关系连词以及递进关系连词在简化网络威胁信息中的位置,并根据位置对简化网络威胁信息进行深层次断句,获得威胁语句。It can be understood that this segmentation may consist of obtaining the positions of the sentence-ending symbols, coordinating conjunctions, and progressive conjunctions in the simplified network threat information, and splitting the simplified network threat information at those positions to obtain threat sentences.
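The position-based splitting described above can be sketched as follows. This is an illustrative approximation only: the delimiter lists reuse the conjunctions named in the text plus a few common end marks, the names `SPLITTER` and `deep_split` are hypothetical, and plain string matching will also split inside ordinary words that happen to contain 和 or 与, which a real implementation would need to guard against (for example, by working on word-segmented text).

```python
import re

END_MARKS = ["。", "？", "！", "?", "!", "……"]
COORDINATING = ["并", "和", "跟", "与"]
PROGRESSIVE = ["不仅", "不但", "而且", "何况"]

# Longer delimiters first so that two-character conjunctions match as a unit.
_TOKENS = sorted(END_MARKS + COORDINATING + PROGRESSIVE, key=len, reverse=True)
SPLITTER = re.compile("|".join(re.escape(t) for t in _TOKENS))


def deep_split(text):
    """Split text at the position of every end mark or conjunction."""
    sentences, start = [], 0
    for match in SPLITTER.finditer(text):
        piece = text[start:match.start()].strip(" ,，、")
        if piece:
            sentences.append(piece)
        start = match.end()
    tail = text[start:].strip(" ,，、")
    if tail:
        sentences.append(tail)
    return sentences


print(deep_split("木马修改注册表,而且窃取浏览器密码。木马上传文件!"))
```

Each returned fragment is meant to carry one action, which is the property the later dependency analysis relies on.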
在第三实施例中,公开了获取简化网络威胁信息中的语句结束符号、并列关系连词以及递进关系连词,根据语句结束符号、并列关系连词以及递进关系连词对简化网络威胁信息进行深层次断句,获得威胁语句;由于本实施例对简化网络威胁信息进行深层次断句,获得威胁语句,从而能够简化网络威胁信息,确保每个待分析的句子都独立的表达了一个技战术及实现过程。In the third embodiment, the sentence-ending symbols, coordinating conjunctions, and progressive conjunctions in the simplified network threat information are obtained, and in-depth sentence segmentation is performed on the simplified network threat information according to them to obtain threat sentences. Because this embodiment performs in-depth sentence segmentation on the simplified network threat information, the network threat information is simplified and each sentence to be analyzed independently expresses a single tactic, technique, and procedure.
在第三实施例中,所述步骤S103,包括:In the third embodiment, the step S103 includes:
步骤S1031:对所述威胁语句进行语义依存分析,获得所述威胁语句中各词汇之间的依赖关系。Step S1031: Perform semantic dependency analysis on the threat sentence to obtain the dependency relationship between the words in the threat sentence.
应当理解的是,为了将复杂多变的描述方式标准化、统一化,本实施例中,还可以对威胁语句进行语义依存分析,获得所述威胁语句中各词汇之间的依赖关系,并根据依赖关系对威胁语句进行标准化处理,获得标准威胁语句。It should be understood that, in order to standardize and unify complex and changeable description methods, in this embodiment, semantic dependency analysis may also be performed on the threat sentence to obtain the dependency relationship between each vocabulary in the threat sentence, and standardize the threat sentence according to the dependency relationship to obtain a standard threat sentence.
需要说明的是,依赖关系可以是父子词之间的依赖关系。It should be noted that the dependency relationship may be a dependency relationship between parent and child words.
步骤S1032:根据所述依赖关系对所述威胁语句进行标准化处理,获得标准威胁语句。Step S1032: Perform standardization processing on the threat sentence according to the dependency relationship to obtain a standard threat sentence.
可以理解的是,根据依赖关系对所述威胁语句进行标准化处理后,可以将语句中攻击者所使用的工具、途径方法、空间位置、实施范围、达成效果等进行标准化归位输出。It can be understood that after the threat statement is standardized according to the dependency relationship, the tools, approaches, spatial locations, implementation scope, and achieved effects, etc. used by the attacker in the statement can be standardized and output.
在具体实现中,例如,可以将被字句和把字句标准化、统一化。In a specific implementation, for example, 被 (bei, passive) sentences and 把 (ba, disposal) sentences can be standardized into a unified form.
进一步地,为了提高标准化处理的效果,所述步骤S1032,包括:Further, in order to improve the effect of standardization processing, the step S1032 includes:
获取所述威胁语句中各词汇的词性信息;Obtain the part-of-speech information of each vocabulary in the threat sentence;
根据所述词性信息和所述依赖关系对所述威胁语句进行标准化处理,获得标准威胁语句。Standardize the threat sentence according to the part-of-speech information and the dependency relationship to obtain a standard threat sentence.
为了便于理解,参考图5进行说明,但并不对本方案进行限定。图5为语义依存分析示意图,图中,威胁语句为“木马将得到的键盘日志发送到可配置的电子邮箱地址”,ROOT表示根节点,是全句核心节点,mDEPD表示助词,FEAT表示修饰语,PAT表示主语操作的客体(客体发生变化),rPAT表示主语操作的客体(客体发生变化,被动),CONT表示主语操作的客体(客体未发生明显变化),rCONT表示主语操作的客体(客体未发生明显变化,被动语句),mRELA表示连词、介词,如但是,而且等,AGT表示主语,LOC表示空间,mPUNC表示标点符号。For ease of understanding, a description is given with reference to FIG. 5, which does not limit this solution. FIG. 5 is a schematic diagram of semantic dependency analysis. In the figure, the threat sentence is "木马将得到的键盘日志发送到可配置的电子邮箱地址" ("The Trojan sends the obtained keyboard logs to a configurable email address"). ROOT denotes the root node, which is the core node of the whole sentence; mDEPD denotes an auxiliary word (particle); FEAT denotes a modifier; PAT denotes the object acted on by the subject (the object changes); rPAT denotes the object acted on by the subject (the object changes, passive voice); CONT denotes the object acted on by the subject (the object does not change noticeably); rCONT denotes the object acted on by the subject (the object does not change noticeably, passive sentence); mRELA denotes conjunctions and prepositions, such as 但是 ("but") and 而且 ("moreover"); AGT denotes the subject (agent); LOC denotes spatial location; and mPUNC denotes punctuation.
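To illustrate how dependency roles could drive standardization, the sketch below maps each role label from the legend above into a fixed slot, so that active, 把, and 被 phrasings of the same action produce the same frame. It is only a sketch under assumptions: the slot names, the function `normalize`, and the role assignments in the sample inputs are hypothetical and are not read from FIG. 5; a real system would take the (word, role) pairs from a semantic dependency parser.

```python
from collections import defaultdict

# Role labels come from the legend above; the slot names are assumptions.
SLOT_OF_ROLE = {
    "AGT": "actor",
    "ROOT": "action",
    "PAT": "object", "rPAT": "object",
    "CONT": "object", "rCONT": "object",
    "LOC": "location",
}
SLOT_ORDER = ["actor", "action", "object", "location"]


def normalize(parsed):
    """Place each (word, role) pair into its slot; other roles are dropped."""
    slots = defaultdict(list)
    for word, role in parsed:
        slot = SLOT_OF_ROLE.get(role)
        if slot is not None:
            slots[slot].append(word)
    return {slot: slots[slot] for slot in SLOT_ORDER if slot in slots}


# Hypothetical parser output for three phrasings of the same action
active = [("木马", "AGT"), ("发送", "ROOT"), ("键盘日志", "PAT")]
ba_form = [("木马", "AGT"), ("把", "mRELA"), ("键盘日志", "PAT"), ("发送", "ROOT")]
bei_form = [("键盘日志", "rPAT"), ("被", "mRELA"), ("木马", "AGT"), ("发送", "ROOT")]

print(normalize(bei_form))
```

Because the output order is fixed by `SLOT_ORDER` rather than by the input order, word-order differences between the three phrasings disappear.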
在第三实施例中,公开了对威胁语句进行语义依存分析,获得威胁语句中各词汇之间的依赖关系,根据依赖关系对威胁语句进行标准化处理,获得标准威胁语句;由于本实施例对威胁语句进行语义依存分析,获得所述威胁语句中各词汇之间的依赖关系,并根据依赖关系对威胁语句进行标准化处理,获得标准威胁语句,从而能够将复杂多变的描述方式标准化、统一化。In the third embodiment, it is disclosed that the semantic dependency analysis is performed on the threat sentence to obtain the dependency relationship between the vocabulary in the threat sentence, and the threat sentence is standardized according to the dependency relationship to obtain the standard threat sentence; because the semantic dependency analysis is performed on the threat sentence in this embodiment, the dependency relationship between each vocabulary in the threat sentence is obtained, and the threat sentence is standardized according to the dependency relationship to obtain the standard threat sentence, so that complex and changeable description methods can be standardized and unified.
在第三实施例中,所述步骤S104,包括:In the third embodiment, the step S104 includes:
步骤S1041:获取所述标准威胁语句中各部分的必要分值。Step S1041: Obtain the necessary scores of each part in the standard threat sentence.
应当理解的是,为了收敛威胁语料库,本实施例中,可以先获取标准威胁语句中各部分的必要分值,再根据必要分值对标准威胁语句进行简化,获得威胁语料库。It should be understood that, in order to converge the threat corpus, in this embodiment, the necessary scores of each part of the standard threat sentence can be obtained first, and then the standard threat sentence can be simplified according to the necessary score to obtain the threat corpus.
需要说明的是,必要分值用于衡量威胁语句中各词汇在语句中的必要程度。It should be noted that the necessary score is used to measure the degree of necessity of each vocabulary in the threat sentence in the sentence.
步骤S1042:根据所述必要分值对所述标准威胁语句进行简化,获得威胁语料库。Step S1042: Simplify the standard threat sentence according to the necessary score to obtain a threat corpus.
在具体实现中,例如,标准威胁语句为“木马将得到的键盘日志发送到可配置的电子邮件地址”,根据标准威胁语句中各部分的必要分值对标准威胁语句进行简化后,获得威胁语料库中的威胁语料“发送键盘日志电子邮件地址”。In a specific implementation, for example, the standard threat sentence is "木马将得到的键盘日志发送到可配置的电子邮件地址" ("The Trojan sends the obtained keyboard logs to a configurable email address"). After it is simplified according to the necessary scores of its parts, the threat corpus entry "发送键盘日志电子邮件地址" (literally "send keyboard-log email-address") is obtained.
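The source does not state how necessary scores are computed or what cutoff is used. As a minimal sketch, assume each part of the standardized sentence already carries a score and that parts below a threshold are discarded; the scores, the threshold, the part ordering, and the function name `simplify_by_score` below are all hypothetical.

```python
def simplify_by_score(scored_parts, threshold=0.5):
    """Keep only the parts whose necessary score reaches the threshold."""
    return "".join(word for word, score in scored_parts if score >= threshold)


# Parts of the standardized sentence with made-up scores. The ordering
# (actor, action, object, location, then modifiers) is also an assumption.
scored_parts = [
    ("木马", 0.2),
    ("发送", 1.0),
    ("得到的", 0.1),
    ("键盘日志", 0.9),
    ("可配置的", 0.1),
    ("电子邮件地址", 0.8),
]
print(simplify_by_score(scored_parts))
```

With these illustrative scores the modifiers and the generic actor are dropped, which reproduces the corpus entry given in the example above.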
在第三实施例中,公开了获取标准威胁语句中各部分的必要分值,根据必要分值对标准威胁语句进行简化,获得威胁语料库;由于本实施例先获取标准威胁语句中各部分的必要分值,再根据必要分值对标准威胁语句进行简化,获得威胁语料库,从而能够使威胁语料库收敛。In the third embodiment, it is disclosed that the necessary scores of each part of the standard threat sentence are obtained, and the standard threat sentence is simplified according to the necessary score to obtain the threat corpus; since this embodiment first obtains the necessary score of each part of the standard threat sentence, and then according to the necessary score, the standard threat sentence is simplified to obtain the threat corpus, so that the threat corpus can be converged.
在第三实施例中,所述步骤S105,包括:In the third embodiment, the step S105 includes:
步骤S1051:获取所述威胁语料库中各关键词的出现频率,并根据所述出现频率确定威胁关键词。Step S1051: Obtain the occurrence frequency of each keyword in the threat corpus, and determine the threat keyword according to the occurrence frequency.
应当理解的是,为了提高后续模型预测的召回率,本实施例中,可以对威胁语料库中的高频关键词进行同义词扩展。It should be understood that, in order to improve the recall rate of subsequent model predictions, in this embodiment, synonym expansion may be performed on high-frequency keywords in the threat corpus.
可以理解的是,根据出现频率确定威胁关键词可以是根据出现频率对各关键词进行排序,并根据排序结果确定威胁关键词。It can be understood that determining the threat keyword according to the frequency of occurrence may be sorting the keywords according to the frequency of occurrence, and determining the threat keyword according to the ranking result.
步骤S1052:基于预设词典对所述威胁关键词进行同义词扩充,获得目标语料库。Step S1052: Perform synonym expansion on the threat keywords based on a preset dictionary to obtain a target corpus.
需要说明的是,预设词典可以预先设置,预设词典中可以存储各关键词对应的同义词。It should be noted that the preset dictionary can be preset, and synonyms corresponding to each keyword can be stored in the preset dictionary.
在具体实现中,例如,威胁语料库中的“木马搜集域内账户名”可以扩展为“木马收集域内用户名”、“木马采集域内用户账户”以及“木马收割域内用户登录名”等。In a specific implementation, for example, the entry "the Trojan collects account names in the domain" in the threat corpus can be expanded to "the Trojan gathers usernames in the domain", "the Trojan acquires user accounts in the domain", "the Trojan harvests user login names in the domain", and so on.
在第三实施例中,公开了获取威胁语料库中各关键词的出现频率,并根据出现频率确定威胁关键词,基于预设词典对威胁关键词进行同义词扩充,获得目标语料库;由于本实施例对威胁语料库中的高频关键词进行同义词扩展,从而能够提高后续模型预测的召回率。In the third embodiment, it is disclosed that the frequency of occurrence of each keyword in the threat corpus is obtained, and the threat keyword is determined according to the frequency of occurrence, and the threat keyword is synonymously expanded based on the preset dictionary to obtain the target corpus; since this embodiment performs synonym expansion on the high-frequency keywords in the threat corpus, the recall rate of subsequent model predictions can be improved.
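Steps S1051 and S1052 can be sketched as follows. This is a minimal illustration, not the patented implementation: the tokenised corpus, the synonym dictionary contents, and the helper names `top_keywords` and `expand_corpus` are hypothetical, and choosing the `top_n` most frequent tokens is only one way to "determine the threat keywords according to the ranking result".

```python
from collections import Counter
from typing import Dict, List, Set


def top_keywords(corpus: List[List[str]], top_n: int) -> Set[str]:
    """Rank tokens by occurrence frequency and keep the top_n as threat keywords."""
    counts = Counter(token for sentence in corpus for token in sentence)
    return {token for token, _ in counts.most_common(top_n)}


def expand_corpus(corpus: List[List[str]],
                  synonyms: Dict[str, List[str]],
                  keywords: Set[str]) -> List[List[str]]:
    """Add one variant sentence per synonym of each threat keyword."""
    expanded = []
    for sentence in corpus:
        expanded.append(list(sentence))
        for i, token in enumerate(sentence):
            if token in keywords:
                for alternative in synonyms.get(token, []):
                    variant = list(sentence)
                    variant[i] = alternative
                    expanded.append(variant)
    return expanded


# Hypothetical corpus and preset dictionary, for illustration only.
corpus = [["Trojan", "collect", "account_name"],
          ["Trojan", "collect", "password"]]
preset_dictionary = {"collect": ["gather", "harvest"]}

keywords = top_keywords(corpus, top_n=2)
target_corpus = expand_corpus(corpus, preset_dictionary, keywords)
print(len(target_corpus))
```

Only high-frequency keywords are expanded, which keeps the target corpus from growing with synonyms of rare words.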
参照图6,图6为本发明网络威胁信息提取方法第四实施例的流程示意图,基于上述图3所示的第二实施例,提出本发明网络威胁信息提取方法的第四实施例。Referring to FIG. 6 , FIG. 6 is a schematic flowchart of a fourth embodiment of the network threat information extraction method of the present invention. Based on the second embodiment shown in FIG. 3 above, the fourth embodiment of the network threat information extraction method of the present invention is proposed.
在第四实施例中,步骤S201之前,还包括:In the fourth embodiment, before step S201, it also includes:
步骤S110:根据所述目标语料库构建训练集语料库。Step S110: Construct a training set corpus according to the target corpus.
应当理解的是,为了提高预设机器学习模型的精度,本实施例中,可以先对初始机器学习模型进行训练,以获得预设机器学习模型。It should be understood that, in order to improve the accuracy of the preset machine learning model, in this embodiment, the initial machine learning model may be trained first to obtain the preset machine learning model.
可以理解的是,根据目标语料库构建训练集语料库可以是从目标语料库中随机选取训练样本构建训练集语料库。It can be understood that constructing the training set corpus according to the target corpus may be to randomly select training samples from the target corpus to construct the training set corpus.
进一步地,为了使训练集语料库聚类,所述步骤S110,包括:Further, in order to cluster the training set corpus, the step S110 includes:
获取所述目标语料库中同义威胁语句的语句数量;Obtain the number of sentences of synonymous threat sentences in the target corpus;
根据所述语句数量从所述目标语料库中选取威胁语句样本;Selecting threat sentence samples from the target corpus according to the number of sentences;
基于所述威胁语句样本构建训练集语料库。A training set corpus is constructed based on the threat sentence samples.
应当理解的是,同义威胁语句可以是具有相同语义的语句。It should be understood that a synonymous threat statement may be a statement with the same semantics.
可以理解的是,根据语句数量从目标语料库中选取威胁语句样本可以是根据语句数量从大到小对同义威胁语句进行排序,并将排序靠前的预设数量的同义威胁语句作为威胁语句样本。It can be understood that selecting the threat sentence samples from the target corpus according to the number of sentences may be sorting the synonymous threat sentences according to the number of sentences in descending order, and using the top-ranked preset number of synonymous threat sentences as the threat sentence samples.
进一步地,为了提高威胁语句样本的可靠性,所述根据所述语句数量从所述目标语料库中选取威胁语句样本,包括:Further, in order to improve the reliability of the threat sentence samples, the selecting of threat sentence samples from the target corpus according to the number of sentences includes:
根据所述语句数量对所述目标语料库中的威胁语句进行排序;sorting the threat sentences in the target corpus according to the number of sentences;
接收用户基于所述目标语料库输入语义标签;Receiving semantic labels input by a user based on the target corpus;
根据排序结果和所述语义标签从所述目标语料库中选取威胁语句样本。Selecting threat sentence samples from the target corpus according to the sorting result and the semantic labels.
应当理解的是,为了提高威胁语句样本的可靠性,还可以由用户输入语义标签来对目标语料库中的各威胁语句进行标记。It should be understood that, in order to improve the reliability of the threat sentence samples, the user may also input semantic labels to mark each threat sentence in the target corpus.
可以理解的是,根据排序结果和语义标签从所述目标语料库中选取威胁语句样本可以是将排序靠前,且语义标签为预设标签的预设数量的同义威胁语句作为威胁语句样本。其中,预设标签可以预先设置,本实施例对此不加以限制。It can be understood that selecting threat sentence samples from the target corpus according to the sorting result and the semantic labels may be taking, as the threat sentence samples, a preset number of synonymous threat sentences that rank highest and whose semantic label is a preset label. The preset label may be set in advance, which is not limited in this embodiment.
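The selection rule above can be sketched under one possible reading of the text: synonymous threat sentences are grouped, groups are ranked by how many sentences they contain, and the highest-ranked groups carrying the preset label are kept. The grouping keys, the label values, and the function name `select_samples` are hypothetical and not taken from the source.

```python
from typing import Dict, List


def select_samples(groups: Dict[str, List[str]],
                   labels: Dict[str, str],
                   preset_label: str,
                   preset_count: int) -> List[str]:
    """Pick sentences from the largest synonym groups that carry the preset label."""
    # Rank groups by the number of synonymous sentences, largest first.
    ranked = sorted(groups, key=lambda group_id: len(groups[group_id]), reverse=True)
    # Keep only groups whose user-supplied label matches, then cut to the preset count.
    kept = [g for g in ranked if labels.get(g) == preset_label][:preset_count]
    return [sentence for g in kept for sentence in groups[g]]


# Hypothetical synonym groups and user-supplied labels.
groups = {
    "g1": ["modify registry", "edit registry", "change registry"],
    "g2": ["send keyboard log"],
    "g3": ["collect account_name", "gather account_name"],
}
labels = {"g1": "persistence", "g2": "persistence", "g3": "collection"}

samples = select_samples(groups, labels, preset_label="persistence", preset_count=1)
print(samples)
```

Combining the ranking with a human-supplied label filters out large but irrelevant groups before they reach the training set.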
步骤S120:根据所述训练集语料库对初始机器学习模型进行训练,获得预设机器学习模型。Step S120: Train the initial machine learning model according to the training set corpus to obtain a preset machine learning model.
应当理解的是,根据训练集语料库对初始机器学习模型进行训练,获得预设机器学习模型可以是将训练集语料库中的各威胁语句样本输入初始机器学习模型,并根据输出结果对初始机器学习模型进行调整,以对初始机器学习模型进行训练,获得预设机器学习模型。It should be understood that training the initial machine learning model according to the training set corpus to obtain the preset machine learning model may be inputting each threat sentence sample in the training set corpus into the initial machine learning model, and adjusting the initial machine learning model according to the output results, so as to train the initial machine learning model and obtain the preset machine learning model.
在第四实施例中,公开了根据目标语料库构建训练集语料库,根据训练集语料库对初始机器学习模型进行训练,获得预设机器学习模型;由于本实例预先对初始机器学习模型进行训练,以获得预设机器学习模型,从而提高了预设机器学习模型的精度。The fourth embodiment discloses constructing a training set corpus according to the target corpus and training the initial machine learning model according to the training set corpus to obtain the preset machine learning model. Because this embodiment trains the initial machine learning model in advance to obtain the preset machine learning model, the accuracy of the preset machine learning model is improved.
此外,本发明实施例还提出一种存储介质,所述存储介质上存储有网络威胁信息提取程序,所述网络威胁信息提取程序被处理器执行时实现如上文所述的网络威胁信息提取方法。In addition, an embodiment of the present invention also proposes a storage medium, on which a network threat information extraction program is stored, and when the network threat information extraction program is executed by a processor, the network threat information extraction method as described above is implemented.
此外,参照图7,本发明实施例还提出一种网络威胁信息提取装置,所述网络威胁信息提取装置包括:语言处理模块10、手段预测模块20以及信息生成模块30;In addition, referring to FIG. 7, an embodiment of the present invention also proposes a network threat information extraction device, which includes: a language processing module 10, a means prediction module 20, and an information generation module 30;
所述语言处理模块10,用于对非结构化网络威胁信息进行自然语言处理,获得攻击目的和攻击手段。The language processing module 10 is configured to perform natural language processing on unstructured network threat information to obtain attack purpose and attack means.
可以理解的是,随着网络威胁攻击的爆炸式增长,威胁分析报告中攻击者所使用的技战术及攻击实现过程(TTP)的相关情报的提取和共享对网络安全建设显得至关重要。然而,由于缺乏标准结构化语言描述和网络威胁报告技战术情报的自动化提取及分析技术,分析复杂和非结构化的威胁分析报告非常耗时费力。It is understandable that with the explosive growth of network threat attacks, the extraction and sharing of relevant information on the techniques and tactics used by the attackers and the attack implementation process (TTP) in the threat analysis report is crucial to the construction of network security. However, due to the lack of standard structured language description and automatic extraction and analysis technology of network threat reporting technical and tactical intelligence, analyzing complex and unstructured threat analysis reports is very time-consuming and laborious.
现有美国MITRE公司基于机器学习(ML)技术实现的TRAM项目及美国北卡罗来纳大学基于信息检索(IR)技术实现的TTPDrill项目能够较为简单的处理威胁分析报告,但由于处理方式都是基于英文报告,无法适用于描述方式复杂多变的中文威胁分析报告,并且这两个项目输出的精准度及误报率都不理想,皆为研究性质,基本无实用性可言。The existing TRAM project, implemented by the US company MITRE using machine learning (ML), and the TTPDrill project, implemented by the University of North Carolina using information retrieval (IR), can process threat analysis reports in a relatively simple way. However, both are designed for English reports and cannot be applied to Chinese threat analysis reports, whose descriptions are complex and variable. In addition, the accuracy and false-positive rate of both projects' output are unsatisfactory; both are research projects with essentially no practical applicability.
因此,为了克服上述缺陷,本实施例中基于自然语言处理和预设机器学习模型对非结构化网络威胁信息中攻击者的攻击目的和攻击手段进行自动化识别和提取,从而能够简化网络威胁信息的分析过程,进而能够提高安全防御能力。Therefore, in order to overcome the above-mentioned defects, in this embodiment, based on natural language processing and preset machine learning models, the attacker's attack purpose and attack means in unstructured network threat information are automatically identified and extracted, thereby simplifying the analysis process of network threat information and improving security defense capabilities.
需要说明的是,自然语言处理可以是文本预处理、文本深层次断句、语句语义依存分析、词汇标记化以及词汇同义词扩充中的至少一种,本实施例对此不加以限制。It should be noted that the natural language processing may be at least one of text preprocessing, text deep sentence segmentation, sentence semantic dependency analysis, vocabulary tokenization, and vocabulary synonym expansion, which is not limited in this embodiment.
需要说明的是,攻击目的可以是非结构化网络威胁信息中攻击者采用的技战术,例如,技战术可以是病毒持续在电脑上运行。It should be noted that the attack purpose may be the tactic adopted by the attacker in the unstructured network threat information; for example, the tactic may be keeping a virus running persistently on the computer.
攻击手段可以是非结构化网络威胁信息中攻击者的攻击实现过程,例如,攻击实现过程可以是修改注册表或开机自启动等。The attack means may be the attacker's attack implementation process in the unstructured network threat information; for example, the attack implementation process may be modifying the registry or configuring auto-start at boot.
所述手段预测模块20,用于通过预设机器学习模型对所述攻击目的进行攻击手段预测,获得未知攻击手段。The means prediction module 20 is configured to predict attack means for the attack purpose through a preset machine learning model to obtain unknown attack means.
需要说明的是,预设机器学习模型可以预先设置,在本实施例和其他实施例中,以词袋(Bag of Words,BOW)模型为例进行说明。It should be noted that the preset machine learning model can be preset. In this embodiment and other embodiments, a Bag of Words (BOW) model is used as an example for illustration.
词袋模型将所有词语装进一个袋子里,不考虑其词法和语序的问题,即每个词语都是独立的。The bag-of-words model puts all words into one bag and ignores their morphology and word order; that is, each word is treated as independent.
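A minimal bag-of-words sketch makes the "order is ignored" property concrete. It uses only the Python standard library; the helper names `build_vocabulary` and `bow_vector` and the sample tokens are assumptions for illustration, and the source does not specify how its model is actually vectorised.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(documents: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct token a column index in first-seen order."""
    vocabulary: Dict[str, int] = {}
    for document in documents:
        for token in document:
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def bow_vector(document: List[str], vocabulary: Dict[str, int]) -> List[int]:
    """Count how often each vocabulary token occurs; word order is discarded."""
    counts = Counter(document)
    vector = [0] * len(vocabulary)
    for token, count in counts.items():
        if token in vocabulary:
            vector[vocabulary[token]] = count
    return vector


# Two token sequences with the same words in a different order.
first = ["modify", "registry", "registry"]
second = ["registry", "modify", "registry"]

vocabulary = build_vocabulary([first, second])
print(bow_vector(first, vocabulary), bow_vector(second, vocabulary))
```

Both sequences map to the same count vector, which is why the earlier standardisation and synonym expansion steps matter: the model sees only which words occur and how often.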
所述信息生成模块30,用于根据所述攻击目的、所述攻击手段以及所述未知攻击手段生成结构化网络威胁信息。The information generating module 30 is configured to generate structured network threat information according to the attack purpose, the attack means and the unknown attack means.
应当理解的是,根据攻击目的、攻击手段以及未知攻击手段生成结构化网络威胁信息可以是将攻击目的、攻击手段以及未知攻击手段进行聚合,获得结构化网络威胁信息。It should be understood that generating the structured network threat information according to the attack purpose, the attack means, and the unknown attack means may consist of aggregating the attack purpose, the attack means, and the unknown attack means to obtain the structured network threat information.
在本实施例中,公开了对非结构化网络威胁信息进行自然语言处理,获得攻击目的和攻击手段,通过预设机器学习模型对攻击目的进行攻击手段预测,获得未知攻击手段,根据攻击目的、攻击手段以及未知攻击手段生成结构化网络威胁信息;由于本实施例基于自然语言处理和预设机器学习模型对非结构化网络威胁信息中攻击者的攻击目的和攻击手段进行自动化识别和提取,从而能够简化网络威胁信息的分析过程,进而能够提高安全防御能力。This embodiment discloses performing natural language processing on unstructured network threat information to obtain the attack purpose and attack means, predicting attack means for the attack purpose through a preset machine learning model to obtain unknown attack means, and generating structured network threat information according to the attack purpose, the attack means, and the unknown attack means. Because this embodiment automatically identifies and extracts the attacker's attack purpose and attack means from unstructured network threat information based on natural language processing and a preset machine learning model, the analysis of network threat information is simplified and security defense capability is improved.
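The aggregation performed by the information generation module can be sketched as a small record type. The field names, the `aggregate` helper, and the JSON output format are hypothetical; the source only states that the three items are aggregated. Filtering out predicted means that were already extracted from the report is an added assumption, not something the source requires.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class StructuredThreatInfo:
    attack_purpose: str
    attack_means: List[str]
    unknown_attack_means: List[str]


def aggregate(purpose: str, means: List[str], predicted: List[str]) -> StructuredThreatInfo:
    """Combine extracted and predicted items into one structured record."""
    # Assumption: a predicted means already extracted from the report is not "unknown".
    unknown = [m for m in predicted if m not in means]
    return StructuredThreatInfo(purpose, list(means), unknown)


record = aggregate(
    purpose="persistence",
    means=["modify registry"],
    predicted=["modify registry", "auto-start at boot"],
)
print(json.dumps(asdict(record), ensure_ascii=False, indent=2))
```

A fixed record shape like this is what makes the output shareable between tools, in contrast to the free-text report it was derived from.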
本发明所述网络威胁信息提取装置的其他实施例或具体实现方式可参照上述各方法实施例,此处不再赘述。For other embodiments or specific implementations of the device for extracting network threat information in the present invention, reference may be made to the above-mentioned method embodiments, which will not be repeated here.
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者系统不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者系统所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者系统中还存在另外的相同要素。It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "including a..." does not preclude the presence of additional identical elements in the process, method, article, or system that includes that element.
上述本发明实施例序号仅仅为了描述,不代表实施例的优劣。The serial numbers of the above embodiments of the present invention are for description only, and do not represent the advantages and disadvantages of the embodiments.
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如只读存储器镜像(Read Only Memory image,ROM)/随机存取存储器(Random Access Memory,RAM)、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本发明各个实施例所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a read-only memory image (Read Only Memory image, ROM)/random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
以上仅为本发明的优选实施例,并非因此限制本发明的专利范围,凡是利用本发明说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本发明的专利保护范围内。The above are only preferred embodiments of the present invention, and do not limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made by using the description of the present invention and the contents of the accompanying drawings, or directly or indirectly used in other related technical fields, are all included in the scope of patent protection of the present invention.
本发明公开了A1、一种网络威胁信息提取方法,所述网络威胁信息提取方法包括以下步骤:The present invention discloses A1. A method for extracting network threat information. The method for extracting network threat information includes the following steps:
对非结构化网络威胁信息进行自然语言处理,获得攻击目的和攻击手段;Perform natural language processing on unstructured network threat information to obtain attack purpose and attack means;
通过预设机器学习模型对所述攻击目的进行攻击手段预测,获得未知攻击手段;Predicting attack means for the attack purpose through a preset machine learning model to obtain unknown attack means;
根据所述攻击目的、所述攻击手段以及所述未知攻击手段生成结构化网络威胁信息。Generate structured network threat information according to the attack purpose, the attack means and the unknown attack means.
A2、如A1所述的网络威胁信息提取方法,所述对非结构化网络威胁信息进行自然语言处理,获得攻击目的和攻击手段的步骤,包括:A2. The method for extracting network threat information as described in A1, wherein the steps of performing natural language processing on unstructured network threat information to obtain attack purpose and attack means include:
对非结构化网络威胁信息进行文本预处理,获得简化网络威胁信息;Perform text preprocessing on unstructured network threat information to obtain simplified network threat information;
对所述简化网络威胁信息进行深层次断句,获得威胁语句;performing in-depth sentence segmentation on the simplified network threat information to obtain threat sentences;
对所述威胁语句进行语义依存分析,获得标准威胁语句;Performing semantic dependency analysis on the threat sentence to obtain a standard threat sentence;
对所述标准威胁语句进行词汇标记,获得威胁语料库;performing lexical tagging on the standard threat sentence to obtain a threat corpus;
对所述威胁语料库进行同义词扩充,获得目标语料库;performing synonym expansion on the threat corpus to obtain a target corpus;
根据所述目标语料库确定攻击目的和攻击手段。The attack purpose and attack means are determined according to the target corpus.
A3、如A2所述的网络威胁信息提取方法,所述对所述威胁语句进行语义依存分析,获得所述威胁语句中各词汇之间的依赖关系的步骤,包括:A3. The method for extracting network threat information as described in A2, the step of performing semantic dependency analysis on the threat sentence and obtaining the dependency relationship between the words in the threat sentence includes:
对所述威胁语句进行语义依存分析,获得所述威胁语句中各词汇之间的依赖关系;Performing semantic dependency analysis on the threat sentence to obtain the dependency relationship between the words in the threat sentence;
根据所述依赖关系对所述威胁语句进行标准化处理,获得标准威胁语句。The threat sentence is standardized according to the dependency relationship to obtain a standard threat sentence.
A4、如A3所述的网络威胁信息提取方法,所述根据所述依赖关系对所述威胁语句进行标准化处理,获得标准威胁语句的步骤,包括:A4. The method for extracting network threat information as described in A3, wherein the step of standardizing the threat statement according to the dependency relationship to obtain a standard threat statement includes:
获取所述威胁语句中各词汇的词性信息;Obtain the part-of-speech information of each vocabulary in the threat sentence;
根据所述词性信息和所述依赖关系对所述威胁语句进行标准化处理,获得标准威胁语句。Standardize the threat sentence according to the part-of-speech information and the dependency relationship to obtain a standard threat sentence.
A5、如A2所述的网络威胁信息提取方法,所述对所述威胁语料库进行同义词扩充,获得目标语料库的步骤,包括:A5. The method for extracting network threat information as described in A2, the step of performing synonym expansion on the threat corpus to obtain the target corpus includes:
获取所述威胁语料库中各关键词的出现频率,并根据所述出现频率确定威胁关键词;Obtain the frequency of occurrence of each keyword in the threat corpus, and determine the threat keyword according to the frequency of occurrence;
基于预设词典对所述威胁关键词进行同义词扩充,获得目标语料库。The threat keywords are synonymously expanded based on a preset dictionary to obtain a target corpus.
A6、如A2所述的网络威胁信息提取方法,所述对所述简化网络威胁信息进行深层次断句,获得威胁语句的步骤,包括:A6. The method for extracting network threat information as described in A2, the step of performing in-depth sentence segmentation on the simplified network threat information to obtain a threat sentence includes:
获取所述简化网络威胁信息中的语句结束符号、并列关系连词以及递进关系连词;Obtaining the sentence-ending symbols, coordinating conjunctions, and progressive conjunctions in the simplified network threat information;
根据所述语句结束符号、所述并列关系连词以及所述递进关系连词对所述简化网络威胁信息进行深层次断句,获得威胁语句。Performing in-depth sentence segmentation on the simplified network threat information according to the sentence-ending symbols, the coordinating conjunctions, and the progressive conjunctions to obtain threat sentences.
A7、如A2所述的网络威胁信息提取方法,所述对所述标准威胁语句进行词汇标记,获得威胁语料库的步骤,包括:A7. The method for extracting network threat information as described in A2, wherein the step of performing lexical tagging on the standard threat sentence and obtaining a threat corpus includes:
获取所述标准威胁语句中各部分的必要分值;obtaining the necessary scores for each part of said standard threat statement;
根据所述必要分值对所述标准威胁语句进行简化,获得威胁语料库。The standard threat sentence is simplified according to the necessary score to obtain a threat corpus.
A8、如A2所述的网络威胁信息提取方法,所述对非结构化网络威胁信息进行文本预处理,获得简化网络威胁信息的步骤,还包括:A8. The method for extracting network threat information as described in A2, wherein the step of performing text preprocessing on unstructured network threat information to obtain simplified network threat information further includes:
获取非结构化网络威胁信息中的随机信息;Obtain random information in unstructured cyber threat information;
对所述随机信息进行简化处理,获得简化网络威胁信息。The random information is simplified to obtain simplified network threat information.
A9、如A2所述的网络威胁信息提取方法,所述通过预设机器学习模型对所述攻击目的进行攻击手段预测,获得未知攻击手段的步骤之前,还包括:A9. The method for extracting network threat information as described in A2, before the step of predicting the attack means of the attack purpose through the preset machine learning model, and obtaining the unknown attack means, it also includes:
根据所述目标语料库构建训练集语料库;Constructing a training set corpus according to the target corpus;
根据所述训练集语料库对初始机器学习模型进行训练,获得预设机器学习模型。The initial machine learning model is trained according to the training set corpus to obtain a preset machine learning model.
A10、如A9所述的网络威胁信息提取方法,所述根据所述目标语料库构建训练集语料库的步骤,包括:A10, the method for extracting network threat information as described in A9, the step of constructing a training set corpus according to the target corpus includes:
获取所述目标语料库中同义威胁语句的语句数量;Obtain the number of sentences of synonymous threat sentences in the target corpus;
根据所述语句数量从所述目标语料库中选取威胁语句样本;Selecting threat sentence samples from the target corpus according to the number of sentences;
基于所述威胁语句样本构建训练集语料库。A training set corpus is constructed based on the threat sentence samples.
A11、如A10所述的网络威胁信息提取方法,所述根据所述语句数量从所述目标语料库中选取威胁语句样本的步骤,包括:A11. The method for extracting network threat information as described in A10, the step of selecting a threat sentence sample from the target corpus according to the number of sentences includes:
根据所述语句数量对所述目标语料库中的威胁语句进行排序;sorting the threat sentences in the target corpus according to the number of sentences;
接收用户基于所述目标语料库输入语义标签;Receiving semantic labels input by a user based on the target corpus;
根据排序结果和所述语义标签从所述目标语料库中选取威胁语句样本。Selecting threat sentence samples from the target corpus according to the sorting result and the semantic labels.
A12、如A1至A11中任一项所述的网络威胁信息提取方法,所述通过预设机器学习模型对所述攻击目的进行攻击手段预测,获得未知攻击手段的步骤,包括:A12. The network threat information extraction method according to any one of A1 to A11, wherein the step of predicting attack means for the attack purpose through a preset machine learning model to obtain unknown attack means includes:
获取多平台网络威胁信息;Obtain multi-platform cyber threat information;
基于所述多平台网络威胁信息通过预设机器学习模型对所述攻击目的进行攻击手段预测,获得未知攻击手段。Based on the multi-platform network threat information, the attack method is predicted for the attack purpose through a preset machine learning model, and an unknown attack method is obtained.
本发明还公开了B13、一种网络威胁信息提取设备,所述网络威胁信息提取设备包括:存储器、处理器及存储在所述存储器上并可在所述处理器上运行的网络威胁信息提取程序,所述网络威胁信息提取程序被所述处理器执行时实现如上文所述的网络威胁信息提取方法。The present invention also discloses B13, a network threat information extraction device. The network threat information extraction device includes: a memory, a processor, and a network threat information extraction program stored in the memory and operable on the processor. When the network threat information extraction program is executed by the processor, the network threat information extraction method as described above is realized.
本发明还公开了C14、一种存储介质,所述存储介质上存储有网络威胁信息提取程序,所述网络威胁信息提取程序被处理器执行时实现如上文所述的网络威胁信息提取方法。The present invention also discloses C14, a storage medium, on which a network threat information extraction program is stored, and when the network threat information extraction program is executed by a processor, the above-mentioned network threat information extraction method is realized.
本发明还公开了D15、一种网络威胁信息提取装置,所述网络威胁信息提取装置包括:语言处理模块、手段预测模块以及信息生成模块;The present invention also discloses D15, a network threat information extraction device, the network threat information extraction device includes: a language processing module, a means prediction module and an information generation module;
所述语言处理模块,用于对非结构化网络威胁信息进行自然语言处理,获得攻击目的和攻击手段;The language processing module is used to perform natural language processing on unstructured network threat information to obtain attack purposes and attack means;
所述手段预测模块,用于通过预设机器学习模型对所述攻击目的进行攻击手段预测,获得未知攻击手段;The means prediction module is configured to predict attack means for the attack purpose through a preset machine learning model to obtain unknown attack means;
所述信息生成模块,用于根据所述攻击目的、所述攻击手段以及所述未知攻击手段生成结构化网络威胁信息。The information generating module is configured to generate structured network threat information according to the attack purpose, the attack means and the unknown attack means.
D16、如D15所述的网络威胁信息提取装置,所述语言处理模块,还用于对非结构化网络威胁信息进行文本预处理,获得简化网络威胁信息;D16. The device for extracting network threat information as described in D15, wherein the language processing module is further configured to perform text preprocessing on unstructured network threat information to obtain simplified network threat information;
所述语言处理模块,还用于对所述简化网络威胁信息进行深层次断句,获得威胁语句;The language processing module is further configured to perform in-depth sentence segmentation on the simplified network threat information to obtain threat sentences;
所述语言处理模块,还用于对所述威胁语句进行语义依存分析,获得标准威胁语句;The language processing module is further configured to perform semantic dependency analysis on the threat sentence to obtain a standard threat sentence;
所述语言处理模块,还用于对所述标准威胁语句进行词汇标记,获得威胁语料库;The language processing module is further configured to perform lexical tagging on the standard threat sentence to obtain a threat corpus;
所述语言处理模块,还用于对所述威胁语料库进行同义词扩充,获得目标语料库;The language processing module is further configured to expand synonyms to the threat corpus to obtain a target corpus;
所述语言处理模块,还用于根据所述目标语料库确定攻击目的和攻击手段。The language processing module is further configured to determine an attack purpose and an attack method according to the target corpus.
D17、如D16所述的网络威胁信息提取装置,所述语言处理模块,还用于对所述威胁语句进行语义依存分析,获得所述威胁语句中各词汇之间的依赖关系;D17. The device for extracting network threat information as described in D16, wherein the language processing module is further configured to perform semantic dependency analysis on the threat sentence, and obtain a dependency relationship between each vocabulary in the threat sentence;
所述语言处理模块,还用于根据所述依赖关系对所述威胁语句进行标准化处理,获得标准威胁语句。The language processing module is further configured to standardize the threat sentence according to the dependency relationship to obtain a standard threat sentence.
D18、如D17所述的网络威胁信息提取装置,所述语言处理模块,还用于获取所述威胁语句中各词汇的词性信息;D18. The device for extracting network threat information as described in D17, wherein the language processing module is further configured to obtain part-of-speech information of each vocabulary in the threat sentence;
所述语言处理模块,还用于根据所述词性信息和所述依赖关系对所述威胁语句进行标准化处理,获得标准威胁语句。The language processing module is further configured to standardize the threat sentence according to the part-of-speech information and the dependency relationship to obtain a standard threat sentence.
D19、如D16所述的网络威胁信息提取装置,所述语言处理模块,还用于获取所述威胁语料库中各关键词的出现频率,并根据所述出现频率确定威胁关键词;D19. The device for extracting network threat information as described in D16, wherein the language processing module is further configured to obtain the frequency of occurrence of each keyword in the threat corpus, and determine the threat keyword according to the frequency of occurrence;
所述语言处理模块,还用于基于预设词典对所述威胁关键词进行同义词扩充,获得目标语料库。The language processing module is further configured to perform synonym expansion on the threat keywords based on a preset dictionary to obtain a target corpus.
D20、如D16所述的网络威胁信息提取装置,所述语言处理模块,还用于获取所述简化网络威胁信息中的语句结束符号、并列关系连词以及递进关系连词;D20. The device for extracting network threat information as described in D16, wherein the language processing module is further configured to obtain the sentence-ending symbols, coordinating conjunctions, and progressive conjunctions in the simplified network threat information;
所述语言处理模块,还用于根据所述语句结束符号、所述并列关系连词以及所述递进关系连词对所述简化网络威胁信息进行深层次断句,获得威胁语句。The language processing module is further configured to perform in-depth sentence segmentation on the simplified network threat information according to the sentence-ending symbols, the coordinating conjunctions, and the progressive conjunctions to obtain threat sentences.

Claims (10)

  1. 一种网络威胁信息提取方法,其特征在于,所述网络威胁信息提取方法包括以下步骤:A method for extracting network threat information, characterized in that the method for extracting network threat information comprises the following steps:
    对非结构化网络威胁信息进行自然语言处理,获得攻击目的和攻击手段;Perform natural language processing on unstructured network threat information to obtain attack purpose and attack means;
    通过预设机器学习模型对所述攻击目的进行攻击手段预测,获得未知攻击手段;Predicting attack means for the attack purpose through a preset machine learning model to obtain unknown attack means;
    根据所述攻击目的、所述攻击手段以及所述未知攻击手段生成结构化网络威胁信息。Generate structured network threat information according to the attack purpose, the attack means and the unknown attack means.
  2. 如权利要求1所述的网络威胁信息提取方法,其特征在于,所述对非结构化网络威胁信息进行自然语言处理,获得攻击目的和攻击手段的步骤,包括:The network threat information extraction method according to claim 1, wherein the step of performing natural language processing on the unstructured network threat information to obtain the attack purpose and attack means includes:
    对非结构化网络威胁信息进行文本预处理,获得简化网络威胁信息;Perform text preprocessing on unstructured network threat information to obtain simplified network threat information;
    对所述简化网络威胁信息进行深层次断句,获得威胁语句;performing in-depth sentence segmentation on the simplified network threat information to obtain threat sentences;
    对所述威胁语句进行语义依存分析,获得标准威胁语句;Performing semantic dependency analysis on the threat sentence to obtain a standard threat sentence;
    对所述标准威胁语句进行词汇标记,获得威胁语料库;performing lexical tagging on the standard threat sentence to obtain a threat corpus;
    对所述威胁语料库进行同义词扩充,获得目标语料库;performing synonym expansion on the threat corpus to obtain a target corpus;
    根据所述目标语料库确定攻击目的和攻击手段。The attack purpose and attack means are determined according to the target corpus.
  3. 如权利要求2所述的网络威胁信息提取方法,其特征在于,所述对所述威胁语句进行语义依存分析,获得所述威胁语句中各词汇之间的依赖关系的步骤,包括:The method for extracting network threat information according to claim 2, wherein the step of performing semantic dependency analysis on the threat sentence to obtain the dependency relationship between the words in the threat sentence includes:
    对所述威胁语句进行语义依存分析,获得所述威胁语句中各词汇之间的依赖关系;Performing semantic dependency analysis on the threat sentence to obtain the dependency relationship between the words in the threat sentence;
    根据所述依赖关系对所述威胁语句进行标准化处理,获得标准威胁语句。The threat sentence is standardized according to the dependency relationship to obtain a standard threat sentence.
  4. 如权利要求3所述的网络威胁信息提取方法,其特征在于,所述根据所述依赖关系对所述威胁语句进行标准化处理,获得标准威胁语句的步骤,包括:The method for extracting network threat information according to claim 3, wherein the step of standardizing the threat sentence according to the dependency relationship to obtain a standard threat sentence includes:
    获取所述威胁语句中各词汇的词性信息;Obtain the part-of-speech information of each vocabulary in the threat sentence;
    根据所述词性信息和所述依赖关系对所述威胁语句进行标准化处理,获得标准威胁语句。Standardize the threat sentence according to the part-of-speech information and the dependency relationship to obtain a standard threat sentence.
  5. 如权利要求2所述的网络威胁信息提取方法,其特征在于,所述对所述威胁语料库进行同义词扩充,获得目标语料库的步骤,包括:The method for extracting network threat information according to claim 2, wherein the step of expanding the threat corpus with synonyms to obtain the target corpus includes:
    获取所述威胁语料库中各关键词的出现频率,并根据所述出现频率确定威胁关键词;Obtain the frequency of occurrence of each keyword in the threat corpus, and determine the threat keyword according to the frequency of occurrence;
    基于预设词典对所述威胁关键词进行同义词扩充,获得目标语料库。The threat keywords are synonymously expanded based on a preset dictionary to obtain a target corpus.
  6. The cyber threat information extraction method according to claim 2, wherein the step of performing deep sentence segmentation on the simplified cyber threat information to obtain threat sentences comprises:
    obtaining the sentence-ending symbols, coordinating conjunctions, and progressive conjunctions in the simplified cyber threat information;
    performing deep sentence segmentation on the simplified cyber threat information according to the sentence-ending symbols, the coordinating conjunctions, and the progressive conjunctions to obtain the threat sentences.
  7. The cyber threat information extraction method according to claim 2, wherein the step of performing lexical tagging on the standard threat sentence to obtain a threat corpus comprises:
    obtaining a necessity score for each part of the standard threat sentence;
    simplifying the standard threat sentence according to the necessity scores to obtain the threat corpus.
  8. A cyber threat information extraction device, wherein the cyber threat information extraction device comprises: a memory, a processor, and a cyber threat information extraction program stored in the memory and executable on the processor, and the cyber threat information extraction program, when executed by the processor, implements the cyber threat information extraction method according to any one of claims 1 to 7.
  9. A storage medium, wherein a cyber threat information extraction program is stored on the storage medium, and the cyber threat information extraction program, when executed by a processor, implements the cyber threat information extraction method according to any one of claims 1 to 7.
  10. A cyber threat information extraction apparatus, wherein the cyber threat information extraction apparatus comprises: a language processing module, a means prediction module, and an information generation module;
    the language processing module is configured to perform natural language processing on unstructured cyber threat information to obtain an attack purpose and attack means;
    the means prediction module is configured to predict attack means for the attack purpose through a preset machine learning model to obtain unknown attack means;
    the information generation module is configured to generate structured cyber threat information according to the attack purpose, the attack means, and the unknown attack means.
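
The segmentation step of claim 6 and the synonym expansion step of claim 5 can be illustrated with a minimal Python sketch. This is not the applicant's implementation. The claims name only the categories of markers and a "preset dictionary", so the marker lists, the frequency threshold `min_count`, the whitespace tokenization, and the function names `deep_segment`, `threat_keywords`, and `expand_synonyms` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical marker lists: the claims name these categories but not their members.
SENTENCE_END = r"[.!?;。!?;]"
COORDINATING = ["and then", "and also"]
PROGRESSIVE = ["furthermore", "moreover", "in addition"]


def deep_segment(text):
    """Split text at sentence-ending symbols and at the listed conjunctions."""
    connectives = "|".join(re.escape(c) for c in COORDINATING + PROGRESSIVE)
    pattern = re.compile(
        rf"{SENTENCE_END}+|,?\s*\b(?:{connectives})\b,?", re.IGNORECASE
    )
    # Drop the empty fragments left between adjacent delimiters.
    return [part.strip() for part in pattern.split(text) if part.strip()]


def threat_keywords(corpus, min_count=2):
    """Pick words whose corpus frequency reaches min_count (assumed threshold)."""
    counts = Counter(w for sentence in corpus for w in sentence.lower().split())
    return {word for word, count in counts.items() if count >= min_count}


def expand_synonyms(corpus, dictionary, min_count=2):
    """Append one sentence variant per dictionary synonym of each threat keyword."""
    keywords = threat_keywords(corpus, min_count)
    expanded = list(corpus)
    for sentence in corpus:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word in keywords:
                for synonym in dictionary.get(word, []):
                    expanded.append(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return expanded


report = ("The malware steals credentials and then uploads them to a remote "
          "server. Moreover, it deletes the system logs!")
sentences = deep_segment(report)

corpus = ["steals credentials", "steals files"]
target_corpus = expand_synonyms(corpus, {"steals": ["exfiltrates"]})
```

A pure frequency threshold also selects common function words, so a practical version would probably filter stop words before choosing threat keywords; the claims do not say how this is handled.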
PCT/CN2022/113831 2022-01-20 2022-08-22 Cyber threat information extraction method, device, storage medium, and apparatus WO2023138047A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210069037.4A CN116522331A (en) 2022-01-20 2022-01-20 Network threat information extraction method, device, storage medium and apparatus
CN202210069037.4 2022-01-20

Publications (1)

Publication Number Publication Date
WO2023138047A1 (en) 2023-07-27

Family

ID=87347723

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/113831 WO2023138047A1 (en) 2022-01-20 2022-08-22 Cyber threat information extraction method, device, storage medium, and apparatus

Country Status (2)

Country Link
CN (1) CN116522331A (en)
WO (1) WO2023138047A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118611983A (en) * 2024-07-31 2024-09-06 国网江西省电力有限公司信息通信分公司 Behavior gene identification method for network attack organization

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858018A (en) * 2018-12-25 2019-06-07 中国科学院信息工程研究所 A kind of entity recognition method and system towards threat information
KR20200118712A (en) * 2019-04-08 2020-10-16 한전케이디엔주식회사 Detecting Method for Cyber Threats using Machine Learning and Natural Language Processing
US20210051162A1 (en) * 2019-08-12 2021-02-18 Bank Of America Corporation Network threat detection and information security using machine learning
CN112769821A (en) * 2021-01-07 2021-05-07 中国电子科技集团公司第十五研究所 Threat response method and device based on threat intelligence and ATT & CK

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XUN, SHUANG: "Research and Implementation of Threat Intelligence Automatic Extraction Model Based on Natural Language Processing", INFORMATION & TECHNOLOGY, CHINA MASTER'S THESES FULL-TEXT DATABASE, no. 4, 5 June 2020 (2020-06-05), CN, pages 1 - 64, XP009547770, DOI: 10.26969/d.cnki.gbydu.2020.000408 *

Also Published As

Publication number Publication date
CN116522331A (en) 2023-08-01

Similar Documents

Publication Publication Date Title
CN107992585B (en) Universal label mining method, device, server and medium
JP6860070B2 (en) Analytical equipment, log analysis method and analysis program
CN113590739B (en) Model-based semantic text search
KR101893090B1 (en) Vulnerability information management method and apparastus thereof
Long et al. Collecting indicators of compromise from unstructured text of cybersecurity articles using neural-based sequence labelling
US20090248595A1 (en) Name verification using machine learning
WO2019169858A1 (en) Searching engine technology based data analysis method and system
WO2014000576A1 (en) Network searching method and network searching system
US8386238B2 (en) Systems and methods for evaluating a sequence of characters
NL2026782B1 (en) Method and system for determining affiliation of software to software families
US10649970B1 (en) Methods and apparatus for detection of functionality
US10740570B2 (en) Contextual analogy representation
Sworna et al. Apiro: A framework for automated security tools api recommendation
Saxe et al. CrowdSource: Automated inference of high level malware functionality from low-level symbols using a crowd trained machine learning model
CN110928871A (en) Table header detection using global machine learning features from orthogonal rows and columns
KR102516454B1 (en) Method and apparatus for generating summary of url for url clustering
WO2023138047A1 (en) Cyber threat information extraction method, device, storage medium, and apparatus
KR100691400B1 (en) Method for analyzing morpheme using additional information and morpheme analyzer for executing the method
CN115470489A (en) Detection model training method, detection method, device and computer readable medium
US8065283B2 (en) Term synonym generation
Shang et al. A framework to construct knowledge base for cyber security
CN116830099A (en) Inferring information about a web page based on a uniform resource locator of the web page
WO2010132062A1 (en) System and methods for sentiment analysis
JP2006178599A (en) Document retrieval device and method
JP2012141681A (en) Query segment position determining device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22921476

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE