CN110177114B

CN110177114B - Network security threat indicator identification method, equipment, device and computer readable storage medium

Info

Publication number: CN110177114B
Application number: CN201910493265.2A
Authority: CN
Inventors: 郭豪; 洪春华; 梁玉
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-06-06
Filing date: 2019-06-06
Publication date: 2021-07-13
Anticipated expiration: 2039-06-06
Also published as: CN110177114A

Abstract

Disclosed is a network security threat indicator identification method, comprising: acquiring network intelligence; and identifying the network intelligence for at least two types of network security threat indicators to obtain identification results for the at least two types, wherein the at least two types of network security threat indicators are divided into at least two groups in advance, different identification modes are adapted to the at least two groups in advance, and the different identification modes include an identification mode based on a machine learning model. A network security threat indicator identification apparatus, device and computer-readable storage medium are also disclosed.

Description

Network security threat indicator identification method, equipment, device and computer readable storage medium

Technical Field

The present application relates to network security, and more particularly, to a network security threat indicator identification method, apparatus, device, and computer-readable storage medium.

Background

As defined by Gartner, threat intelligence is evidence-based knowledge, including context, mechanisms, indicators, implications and actionable advice, about an existing or emerging threat or hazard to assets, which can be used to inform decisions about the subject's response to that threat or hazard. Most of the threat intelligence referred to in the industry can be regarded as threat intelligence in the narrow sense, whose main content is used to identify and detect network security threat indicators (IOCs), such as file hashes, IP addresses and domain names; such threat intelligence is referred to herein as network security threat intelligence. Network intelligence broadly covers network security threat intelligence and non-network security threat intelligence: a given body of network intelligence may contain only network security threat intelligence, only non-network security threat intelligence, or both. Extracting network security threat intelligence from network intelligence that may contain both kinds is time-consuming and labor-intensive. Network security threat intelligence contains threat information that can be analyzed to identify network security threat indicators (IOCs), for example to build a threat intelligence repository for subsequent use. By source, network security threat intelligence falls into two main categories: internal and external. Internal network security threat intelligence is mostly collected and processed by analyzing a system's internal data, whereas external network security threat intelligence mainly comes from shared or paid intelligence provided by enterprises and/or communities. 
Because internal network security threat intelligence is closed and specific, it is generally not used when verifying network security threat indicators. In the field of network security, external network security threat intelligence plays an important role in overall network security awareness. Its data volume is huge, however, so identifying it item by item manually is time-consuming and labor-intensive, and missed reports and false reports can occur.

Disclosure of Invention

Embodiments of the present invention provide a network security threat indicator identification method, apparatus, device and computer-readable storage medium, which at least partially solve the above-mentioned problems.

According to a first aspect of the present invention, there is provided a network security threat indicator identification method, including: acquiring network intelligence; and identifying the network intelligence for at least two types of network security threat indicators to obtain identification results for the at least two types, wherein the at least two types of network security threat indicators are divided into at least two groups in advance, different identification modes are adapted to the at least two groups in advance, and the different identification modes include an identification mode based on a machine learning model.

According to one embodiment, prior to said identifying, the method further comprises: classifying the network intelligence into network security threat intelligence or non-network security threat intelligence by using a pre-configured machine learning classification model; and filtering out the non-network security threat intelligence from the network intelligence.

According to one embodiment, the pre-configured machine learning classification model comprises an embedding layer, a convolutional layer, a max pooling layer and a fully connected layer, and the classifying further comprises: acquiring the text of the network intelligence and inputting it into the embedding layer to encode it into a distributed representation; inputting the distributed representation into the convolutional layer to extract features of the text of the network intelligence; inputting the features into the max pooling layer to extract the maximum value of each feature, and concatenating the extracted maximum values as the output of the max pooling layer; and inputting the output of the max pooling layer into the fully connected layer, and obtaining the classification result based on the output of the fully connected layer.

According to one embodiment, after the classifying and the filtering, the method further comprises: determining, by using a pre-configured machine learning judgment model, whether the network intelligence classified as network security threat intelligence is valid network security threat intelligence; and filtering out invalid network security threat intelligence from the network intelligence.

According to one embodiment, the machine learning judgment model comprises an embedding layer and a random forest layer, and the determining comprises: inputting the text of the network intelligence classified as network security threat intelligence into the embedding layer to encode it into a distributed representation; and inputting the distributed representation into the random forest layer, and determining from the output of the random forest layer whether the network intelligence classified as network security threat intelligence is valid network security threat intelligence.

According to one embodiment, the different identification modes further include: a word bank-based identification mode, in which words in the network intelligence are matched against words in a pre-established word bank and the matched words are taken as identification results; and a rule-based identification mode, in which the text of the network intelligence is analyzed using predetermined rules and the content conforming to the rules is taken as identification results.

According to one embodiment, the method further comprises: displaying the identification result; and correcting the identification result upon receiving a correction instruction for it.

According to one embodiment, displaying the identification result comprises displaying it on a web page.

According to one embodiment, a first group of the at least two groups comprises the following types of network security threat indicators: affected area and platform, and the identifying comprises: for any type of network security threat indicator in the first group, identifying the network intelligence by using a word bank-based identification mode, in which words in the network intelligence are matched against words in a pre-established word bank and the matched words are taken as identification results.

According to one embodiment, a second group of the at least two groups comprises the following types of network security threat indicators: program database (PDB) files, registry, services and startup items, and the identifying comprises: for any type of network security threat indicator in the second group, identifying the network intelligence by using a rule-based identification mode, in which the network intelligence is analyzed using predetermined rules and the content conforming to the rules is taken as identification results.

According to one embodiment, a third group of the at least two groups comprises the following types of network security threat indicators: Trojan family, threat organization, threat object, threat approach, exploited vulnerability, file hash, IP address, domain name, file information, uniform resource locator, mutex and mailbox, and the identifying comprises: for any type of network security threat indicator in the third group, identifying the network intelligence by using an identification mode based on a machine learning model.

According to one embodiment, the method further comprises: statistically analyzing the identification results to obtain the correlation between the types of network security threat indicators; and/or outputting a network security threat alert to a user based on the identification results.
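The text does not specify how the correlation between indicator types is computed. The sketch below assumes that correlation is approximated by counting how often two indicator types are identified in the same report. The function name `type_cooccurrence` and the per-report result format, which maps an indicator type to its matched strings, are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List


def type_cooccurrence(reports: Iterable[Dict[str, List[str]]]) -> Counter:
    """Count how often each pair of indicator types is identified in the same report.

    Each report is a hypothetical identification result mapping an indicator
    type (e.g. "threat organization") to the strings identified for it.
    """
    counts: Counter = Counter()
    for result in reports:
        # Only types with at least one hit count as present; sorting makes
        # each pair's key independent of dictionary order.
        present = sorted(t for t, hits in result.items() if hits)
        counts.update(combinations(present, 2))
    return counts


reports = [
    {"threat organization": ["APT10"], "platform": ["Windows"], "trojan family": ["Trickbot"]},
    {"threat organization": ["MuddyWater"], "platform": ["Windows"]},
    {"platform": []},
]
pairs = type_cooccurrence(reports)
print(pairs.most_common(1))  # [(('platform', 'threat organization'), 2)]
```

Raw pair counts favor frequent types. A normalized measure such as pointwise mutual information could be layered on top of these counts if needed.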

According to one embodiment, obtaining the network intelligence comprises crawling external intelligence sources with a web crawler.

According to one embodiment, the method further comprises: when an identification result was obtained with the machine learning model-based identification mode, further training the machine learning model with the corrected identification result; and/or when an identification result was obtained with the word bank-based identification mode, updating the word bank with the corrected identification result, wherein the word bank-based identification mode matches words in the network intelligence against words in a pre-established word bank and takes the matched words as identification results.

According to one embodiment, the machine learning model comprises a first embedding layer, a second embedding layer, a first bidirectional long short-term memory (BiLSTM) layer, a second BiLSTM layer, a feedforward neural network layer and an optimization layer. Identifying the network intelligence using the machine learning model comprises:

- inputting the lower-level elements of a word of the network intelligence into the first embedding layer to encode them into distributed representations;
- inputting these distributed representations into the first BiLSTM layer to obtain its output;
- inputting the word into the second embedding layer to encode it into a distributed representation of the word;
- concatenating the output of the first BiLSTM layer with the distributed representation of the word, and inputting the result into the second BiLSTM layer to obtain its output;
- inputting the output of the second BiLSTM layer into a feedforward neural network layer with one hidden layer to obtain the probability of each type of network security threat indicator for the word; and
- inputting the probabilities into the optimization layer, whose output is the network security threat indicator in the network intelligence.

According to a second aspect of the present invention, there is provided a network security threat indicator identification apparatus, comprising: an acquirer configured to acquire network intelligence; and an identifier configured to identify the network intelligence for at least two types of network security threat indicators to obtain identification results for the at least two types, wherein the at least two types of network security threat indicators are pre-divided into at least two groups, different identification modes are pre-adapted to the at least two groups, and the different identification modes include an identification mode based on a machine learning model.

According to a third aspect of the present invention, there is provided a network security threat indicator identifying apparatus, comprising: a processor; and a memory configured to have computer-executable instructions stored thereon, which when executed in the processor, cause the processor to implement the method of the first aspect and any embodiment thereof described above.

According to a fourth aspect of the present invention, there is provided a computer-readable storage medium, wherein instructions are stored therein, which when run on a computer, cause the computer to implement the method of the first aspect and any embodiment thereof.

According to the embodiments, network security threat indicators are identified automatically, which avoids the time and labor of manual identification. The types of indicators to be identified are grouped by identification mode, and each group is handled by its own mode: machine learning model-based, word bank-based or rule-based. This exploits the characteristics of the different indicator types and avoids the limitations of a single identification mode. For example, the rule-based mode is ineffective for some indicator types (such as attack organizations), which the machine learning model-based mode can identify effectively; this mitigates missed and false reports to some extent. Through interaction with a web page, for example, the machine learning model can continuously receive results fed back from the front end. The model is thus continuously trained and optimized, its identification accuracy keeps improving, and the word bank can be continuously updated, which further mitigates missed and false reports. In addition, the machine learning model captures contextual features, so it can distinguish non-malicious indicators that appear in a report, which also mitigates missed and false reports to some extent. 
In the embodiments, a pre-configured machine learning classification model classifies the network intelligence into network security threat intelligence and non-network security threat intelligence, and the latter is removed. No manual screening is needed, which further frees up manpower, and the method can be applied flexibly to a variety of intelligence sources. In a further embodiment, a pre-configured machine learning judgment model determines whether network intelligence classified as network security threat intelligence is valid, and invalid network security threat intelligence is removed as well, which further improves identification efficiency.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application; a person skilled in the art may derive other drawings from them without creative effort.

FIG. 1 illustrates a flow diagram of a network security threat indicator identification method, according to an embodiment of the invention.

FIG. 2 illustrates an example of a structure of a machine learning model according to an embodiment of the present invention.

FIG. 3 illustrates one example of the structure and processing of a machine learning classification model according to an embodiment of the present invention.

FIG. 4 illustrates one example of the structure and processing of a machine learning judgment model according to an embodiment of the present invention.

FIG. 5 illustrates one output example of a machine learning model according to an embodiment of the present invention.

Fig. 6a illustrates a display interface of recognition results according to an embodiment of the present invention.

Fig. 6b illustrates another display interface of the recognition result according to an embodiment of the present invention.

FIG. 7 illustrates a block diagram of an apparatus for network security threat indicator identification, in accordance with an embodiment of the invention.

FIG. 8 illustrates a hardware implementation environment diagram according to an embodiment of the invention.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Network security threat intelligence, as used herein, refers to intelligence containing threat information that can be analyzed to identify network security threat indicators (IOCs). Network intelligence, as used herein, generally covers both network security threat intelligence and non-network security threat intelligence; a given item may contain only one of them, or both. Network security threat indicators, as used herein, refer to evidence data that identifies potentially malicious activity in a system or network.

FIG. 1 illustrates a flow diagram of a network security threat indicator identification method according to an embodiment of the invention. The order in which the steps are described below does not necessarily represent their order of execution: unless a step depends on the result of a preceding step, the steps may be executed in any reasonable order, sequentially or concurrently. The method starts in step 101, in which network intelligence is obtained. In one example, external network security threat intelligence sources may be crawled with a web crawler to obtain network intelligence. These external sources are typically cyber threat intelligence sharing platforms, such as the shared intelligence on the website www.freebuf.com. In another example, the network intelligence may also be mixed with non-network security threat intelligence.

Then, in step 106, the network intelligence is identified for at least two types of network security threat indicators using pre-adapted identification modes, to obtain identification results for those types. The at least two types are pre-divided into at least two groups, and a different identification mode is pre-adapted to each group. In one example, 18 types of network security threat indicators are selected: Trojan family, threat organization, threat object, affected area, threat approach, vulnerability, platform, file hash, IP address, domain name, file information, uniform resource locator, program database file, mutex, registry, service, startup item and mailbox.

Examples of Trojan families are Trickbot, jasperloader, artardownloader and bulehero. Threat organizations are the organizations that launch threats, such as APT10, Manlinghua (BITTER) and MuddyWater. The threat object is the target of a threat, such as a financial department, a government agency or an educational institution. The affected area is the geographic extent affected by the threat. The threat approach, as the name implies, is the means used by the threat. One example is Distributed Denial of Service (DDoS): an attacker uses client/server technology to combine multiple computers into an attack platform and launches a DDoS attack on one or more targets, multiplying the power of the denial-of-service attack. Other threat approaches include exploits, decoy files, malicious emails, Windows PowerShell (a command-line shell and scripting environment) and phishing. 
Phishing refers to a fraudster masquerading as a trusted brand, such as an online bank, an online retailer or a credit card company. The fraudster uses deceptive emails and forged websites to trick victims into revealing private data such as credit card numbers, bank account details and ID card numbers. The vulnerability is the exploited vulnerability, such as the vulnerability with the CVE (Common Vulnerabilities and Exposures) number CVE-2017-. The platform is the platform targeted by the threat, such as Windows, Linux or Mac OS. A file hash, also called a file signature, changes if even one bit of the file changes, so it can be used to distinguish different files. Common file hash algorithms include MD5 and SHA-1, and 12 file hashes are listed in the lower right of FIG. 6b. Examples of IP addresses are 65.182.100.42, 81.88.24.211 and 103.219.22.63. Examples of domain names are:

breed.wanttobea.com,

zzi.aircargox.com,

nono.littlebodiesbigsouls.com,

tribunaledinapoli.recsinc.com,

tribunaledinapoli.prepperpillbox.com,

tribunaledinapoli.lowellunderwood.com,

tributedinapoli.rntman.com, and the like.

Examples of file information are kernel.dll, winerv.exe, rudll32.exe, rtegre.exe and wprgxyyeqd79.exe. Examples of uniform resource locators (URLs) are:

http://planasolutions.com/wordpress/wp-content/nq3sqe-x875-tt/,

http://mattheweidem.com/ikn0owm-g991-syvw/,

http://irose.com/lpo7qje-wg556-pnv/, and the like.

Examples of program database (PDB) files of a program are:

C:\Users\CN_ide\Desktop\TSSL_v3.2.7_BypassSymantec_20180528\TClient\Release\FakeRun.pdb,

D:\Soft\DevelopedCode_Last\yty2.0\Release\C++\Setup.pdb,

c:\users\803\documents\visual studio2010\Projects\hellpdll\Release\hellpdll.

Examples of mutexes are {531511FA-190D-5D85-8A4A-279F2F592CC7} and the like. Examples of registry keys are:

Software\Microsoft\Office\12.0\Word\Resiliency\DisabledItems,

Software\Microsoft\Office\12.0\Word\Resiliency\StartupItems,

Software\Microsoft\Office\11.0\Word\Resiliency\DocumentRecovery,

Software\Microsoft\Office\11.0\Word\Resiliency\DisabledItems, and the like.

Examples of startup items are memory optimizer.lnk, SLVjiAEwaK.url and SMTPLoader.lnk. Examples of services are ndisproxy-mn, Wmmvsvc and SCardPrv. Examples of mailboxes are ijuqodisounvib98@o2.pl, sayanwalsworth96@protomail.com, abbschevis@protomail.com, cotteakela@protomail.com, aperywsqaroci@o2.pl, asuxidoruraep@o2.pl, couweizotofo@o2.pl and dharmaparrack@protomail.co.
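The file hash property described above is that changing even one bit of a file yields a different hash. Python's standard `hashlib` module, which provides both algorithms named in the text, can illustrate it. The sample bytes are arbitrary placeholders, not data from the patent, and `file_hashes` is a hypothetical helper name.

```python
import hashlib


def file_hashes(data: bytes) -> dict:
    """Return the MD5 and SHA-1 hex digests of the given file contents."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


original = b"placeholder file contents"
# Flip the lowest bit of the first byte to simulate a one-bit change.
modified = bytes([original[0] ^ 0x01]) + original[1:]

before, after = file_hashes(original), file_hashes(modified)
print(before["md5"] != after["md5"], before["sha1"] != after["sha1"])  # True True
```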

In one example, the 18 types of network security threat indicators are divided into three groups. Each group is pre-adapted with its own identification mode, and the modes differ between the groups:

- The first group comprises affected area and platform; in step 1061, the network intelligence is identified with the word bank-based identification mode.
- The second group comprises program database files, registry, services and startup items; in step 1062, the rule-based identification mode is used.
- The third group comprises the remaining types: Trojan family, threat organization, threat object, threat approach, vulnerability, file hash, IP address, domain name, file information, uniform resource locator, mutex and mailbox; in step 1063, the machine learning model-based identification mode is used.

The grouping is therefore based on identification mode.
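The grouping above amounts to a fixed mapping from indicator type to identification mode. The minimal Python sketch below assumes each identification mode is wrapped in a recognizer function. The names `GROUPS`, `mode_for`, `identify` and the stub recognizers are hypothetical; the stubs stand in for the real word bank, rule and model recognizers.

```python
from typing import Callable, Dict, List

# Indicator types per group, following the three-group example in the text.
GROUPS: Dict[str, List[str]] = {
    "word_bank": ["affected area", "platform"],                    # step 1061
    "rule": ["program database file", "registry", "service",
             "startup item"],                                      # step 1062
    "ml_model": ["trojan family", "threat organization", "threat object",
                 "threat approach", "vulnerability", "file hash",
                 "ip address", "domain name", "file information", "url",
                 "mutex", "mailbox"],                              # step 1063
}

# A recognizer takes the intelligence text and the indicator types it is
# responsible for, and returns {indicator type: [identified strings]}.
Recognizer = Callable[[str, List[str]], Dict[str, List[str]]]


def mode_for(indicator_type: str) -> str:
    """Return the identification mode pre-adapted to an indicator type."""
    for mode, types in GROUPS.items():
        if indicator_type in types:
            return mode
    raise KeyError(f"unknown indicator type: {indicator_type}")


def identify(text: str, recognizers: Dict[str, Recognizer]) -> Dict[str, List[str]]:
    """Run each group's recognizer on the same text and merge the results."""
    results: Dict[str, List[str]] = {}
    for mode, types in GROUPS.items():
        results.update(recognizers[mode](text, types))
    return results


# Stub recognizers for illustration only.
stubs: Dict[str, Recognizer] = {
    "word_bank": lambda text, types: {"platform": ["Windows"]},
    "rule": lambda text, types: {"program database file": ["Setup.pdb"]},
    "ml_model": lambda text, types: {"threat organization": ["APT10"]},
}
print(identify("sample report text", stubs))
```

Each recognizer receives the list of types it is responsible for. Moving a type to another group therefore only changes `GROUPS`.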

In the word bank-based identification mode, all words in the full text of the network intelligence are matched directly against the words in a pre-established word bank. The word banks, including a platform word bank and an affected-area word bank, can be obtained from public data sources of network security threat indicators, and can also be built or modified manually. Platforms and affected areas are relatively stable and can be enumerated, which makes these two types well suited to word bank-based identification. The platform word bank includes, for example, "Linux" and "Windows", and the affected-area word bank includes, for example, "China", "US" and "Japan"; the word banks may also include the corresponding Chinese or other-language terms. A word that can be matched is identified as the corresponding network security threat indicator, such as a platform or an affected area.
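The minimal Python sketch below shows word bank matching, seeded only with the example entries from the text. It also includes the word bank update that the summary describes for corrected identification results.

The following choices are assumptions, not taken from the text:

- Matching is exact and case-sensitive, so that "US" does not match the pronoun "us".
- Tokenization is a naive split on word characters. Chinese text would first need a word segmenter, and multi-word entries such as "Mac OS" are not handled.
- The function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set


def default_word_banks() -> Dict[str, Set[str]]:
    """Hypothetical word banks seeded with the examples given in the text."""
    return {
        "platform": {"Linux", "Windows"},
        "affected area": {"China", "US", "Japan"},
    }


def match_word_bank(text: str, banks: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per indicator type, the words of `text` found in that type's bank."""
    words = re.findall(r"\w+", text)  # naive tokenization (assumption)
    results: Dict[str, List[str]] = {}
    for indicator_type, bank in banks.items():
        # dict.fromkeys removes duplicates while keeping first-seen order
        hits = list(dict.fromkeys(w for w in words if w in bank))
        if hits:
            results[indicator_type] = hits
    return results


def add_corrections(banks: Dict[str, Set[str]], indicator_type: str,
                    corrected: Iterable[str]) -> None:
    """Add words confirmed through a user correction to the given word bank."""
    banks.setdefault(indicator_type, set()).update(corrected)


banks = default_word_banks()
print(match_word_bank("Targets Windows and Linux hosts in Japan and the US; contact us.", banks))
# {'platform': ['Windows', 'Linux'], 'affected area': ['Japan', 'US']}
```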

The rule-based identification mode analyzes the full text of the network intelligence using predetermined rules, for example one rule each for identifying program database files, registry keys, services and startup items. The content conforming to a rule is taken as the identification result, e.g., a program database file, a registry key, a service or a startup item. The rules for these types are relatively fixed and need no frequent maintenance or changes. For example, the rule for identifying program database files may be expressed as the following regular expression:

r'\b([A-Za-z0-9-_\.]+\.(pdb))\b'

Here the prefix r marks a raw string. The pattern matches a string ending in .pdb that is preceded by one or more characters from the set of uppercase and lowercase letters, digits and the listed symbols (-, _ and .); \b denotes a word boundary. The regular expression works in a variety of programming environments, although some environments may require minor modifications.
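The PDB rule can be exercised with Python's standard `re` module on the two full PDB paths listed earlier. The pattern is copied unchanged from the text. The helper `find_pdb_files` is a hypothetical name, and returning capture group 1 is an illustrative choice.

```python
import re
from typing import List

# The rule for program database (PDB) files, copied from the text.
PDB_RULE = re.compile(r'\b([A-Za-z0-9-_\.]+\.(pdb))\b')


def find_pdb_files(text: str) -> List[str]:
    """Return every substring matched by the PDB rule (capture group 1)."""
    return [m.group(1) for m in PDB_RULE.finditer(text)]


sample = (r"C:\Users\CN_ide\Desktop\TSSL_v3.2.7_BypassSymantec_20180528"
          r"\TClient\Release\FakeRun.pdb and "
          r"D:\Soft\DevelopedCode_Last\yty2.0\Release\C++\Setup.pdb")
print(find_pdb_files(sample))  # ['FakeRun.pdb', 'Setup.pdb']
```

Backslashes and colons are not in the character class, so only the file-name part of a full path is returned. As written, the rule is case-sensitive and does not match SETUP.PDB; compiling it with `re.IGNORECASE` would be one way to relax that, if desired.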

The machine learning model in the machine learning model-based identification mode may take a variety of structures. FIG. 2 illustrates an example of a structure of a machine learning model according to an embodiment of the present invention. The machine learning model comprises a first embedding layer, a second embedding layer, a first bidirectional long short-term memory (BiLSTM) layer, a second BiLSTM layer, a feedforward neural network layer and an optimization layer. Each BiLSTM layer consists of recurrent neural network (RNN) units of the long short-term memory (LSTM) type. Identifying the network intelligence using the machine learning model includes the following operations:

- inputting the lower-level elements of a word of the network intelligence into the first embedding layer to encode them into distributed representations;
- inputting these distributed representations into the first BiLSTM layer to obtain its output;
- inputting the word into the second embedding layer to encode it into a distributed representation of the word;
- concatenating the output of the first BiLSTM layer with the distributed representation of the word, and inputting the result into the second BiLSTM layer to obtain its output;
- inputting the output of the second BiLSTM layer into a feedforward neural network layer with one hidden layer to obtain the probability of each type of network security threat indicator for the word; and
- inputting the probabilities into the optimization layer, whose output is the network security threat indicator in the network intelligence.

Referring to FIG. 2, the input X_ij is the j-th lower-level element of the word X_i, such as a character, a morpheme (prefix or suffix) or a root. Here i = 1, ..., n, and j runs from 1 to the number of characters in X_i. The words X_i come from the network intelligence to be identified.

- V_C maps a lower-level element of a word to its distributed representation (vector) and serves as the first embedding layer. Each X_ij, after mapping by V_C, is input into the first BiLSTM layer.
- V_T maps the word X_i (i = 1, ..., n) to its distributed representation (word vector) and serves as the second embedding layer.
- The output of the first BiLSTM layer is concatenated with V_T(X_i) to obtain e_i (i = 1, ..., n), the input of the second BiLSTM layer, whose output is d_i (i = 1, ..., n).
- Passing d_i through a feedforward neural network with one hidden layer yields the probability vector a_i (i = 1, ..., n). The t-th component of a_i is the probability that the i-th word is an IOC of the t-th type.
- Taking a_i as input, the optimization layer outputs y_i (i = 1, ..., n), the network security threat indicator identified in the word, for example the IOC with the highest probability in a_i.

In one example, the training data set is derived from the text of 200 manually labeled APT (Advanced Persistent Threat) reports. The training data set is preprocessed (e.g., special-character replacement and word segmentation) and then input into the machine learning model shown in FIG. 2 for training. Once trained, the model can be used to identify network security threat indicators.
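The last two stages above, from the second BiLSTM output d_i to the probability vector a_i and then to y_i, can be sketched in pure Python.

This is a minimal sketch under the following assumptions:

- The BiLSTM layers are not reproduced.
- The hidden layer uses tanh and the output uses softmax; the text does not specify these activations.
- The weights are set by hand instead of trained.
- The label set, including an "O" label for words that are not IOCs, is hypothetical.
- The decoding step takes the IOC type with the highest probability, which the text gives only as one example of the optimization layer's output.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def affine(m: Matrix, v: Sequence[float], b: Sequence[float]) -> Vector:
    """Compute m @ v + b with plain lists."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(m, b)]


def softmax(z: Sequence[float]) -> Vector:
    top = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def word_probabilities(d_i: Sequence[float], w1: Matrix, b1: Vector,
                       w2: Matrix, b2: Vector) -> Vector:
    """Feedforward network with one hidden layer: second-BiLSTM output d_i -> a_i."""
    hidden = [math.tanh(h) for h in affine(w1, d_i, b1)]
    return softmax(affine(w2, hidden, b2))


def decode(a: List[Vector], labels: List[str]) -> List[str]:
    """For each word, pick the label with the highest probability (y_i)."""
    return [labels[max(range(len(a_i)), key=a_i.__getitem__)] for a_i in a]


labels = ["O", "ip address", "domain name"]   # "O" = not an IOC (hypothetical)
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]], [0.0, 0.0, 0.0]
d = [[1.0, 0.0], [0.0, 1.0]]                  # stand-ins for second-BiLSTM outputs
a = [word_probabilities(d_i, w1, b1, w2, b2) for d_i in d]
print(decode(a, labels))                      # ['ip address', 'domain name']
```

A real implementation would typically use a deep learning framework for the BiLSTM layers and learn all weights from the labeled reports.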
Tests show that the F1 score of the network security threat indicators identified by the machine learning model is about 0.9. The F1 score is a statistical measure of the accuracy of a binary classification model that takes both its precision and its recall into account. It is the harmonic mean of precision and recall, with a maximum value of 1 and a minimum value of 0.
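With P for precision and R for recall, the F1 score is F1 = 2PR / (P + R). The short Python sketch below computes it from raw counts of true positives, false positives and false negatives. The counts are made up for illustration and are not the test data behind the reported value of about 0.9.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives and false negatives."""
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined): F1 taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
print(round(f1_score(tp=90, fp=10, fn=10), 3))  # 0.9
print(round(f1_score(tp=80, fp=20, fn=0), 3))   # 0.889
```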

It should be noted that although the various identification modes may involve matching or inputting text, this does not mean that the network intelligence must be in text form; it may be in any other form, such as pictures or audio, that can be converted into text for matching or input.

The recognition mode based on a machine learning model is more flexible and suitable for recognizing a wide variety of targets. It is better suited to kinds of network security threat indicators that cannot be recognized well by the word-bank-based or rule-based recognition modes, or for which maintaining a word bank or rules would require a great deal of effort.

The inventors recognized that different kinds of network security threat indicators have different characteristics and differ in how well they suit word-bank-based, rule-based, or machine-learning-model-based recognition. By adapting recognition modes to groups of indicators, network security threat indicators can therefore be identified more efficiently and accurately than with a single recognition mode, or with an indiscriminate mixture of recognition modes that ignores these differences.

Optionally, after step 101 and before steps 1061 and 1063, in step 102, considering that the obtained network intelligence may contain non-cyber-security threat intelligence, the network intelligence obtained in step 101 is classified as cyber-security threat intelligence or non-cyber-security threat intelligence by a preconfigured machine learning classification model, and in step 103 the non-cyber-security threat intelligence is filtered out of the network intelligence. This further reduces manual effort, since no manual screening is needed, and allows the method to be applied flexibly to various intelligence sources. An example of a preconfigured machine learning classification model and its processing is shown in FIG. 3. In FIG. 3, the preconfigured machine learning classification model 300 includes an embedding layer 301, a convolutional layer 302, a max-pooling layer 303, and a fully connected layer 304. The classification proceeds as follows: first, the text of the network intelligence is obtained and input into the embedding layer 301, which encodes it into a distributed representation; the distributed representation is then input into the convolutional layer 302 to extract features of the text; the features are then input into the max-pooling layer 303 to extract the maximum value corresponding to each feature, and the extracted maximum values are concatenated as the output of the max-pooling layer. Finally, the output of the max-pooling layer is input into the fully connected layer 304, and the classification result is obtained from the output of the fully connected layer. The machine learning classification model may be trained using, for example, the titles and keywords of 1 million cyber-security threat intelligence items and ten thousand non-cyber-security threat intelligence items.
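The data flow through the four layers of FIG. 3 can be sketched as a forward pass. This is a minimal pure-Python illustration that assumes tokenized titles or keywords as input; the parameters are random and untrained, and the embedding size, filter count, filter width, and the helper names `make_params` and `classify` are hypothetical choices, so the sketch shows only the shape of the computation, not a usable classifier.

```python
import math
import random


def make_params(vocab, emb_dim=4, n_filters=3, width=2, seed=0):
    """Random, untrained placeholder parameters for the four layers of FIG. 3."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.5, 0.5)

    return {
        "emb": {tok: [rand() for _ in range(emb_dim)] for tok in vocab},
        "filters": [[[rand() for _ in range(emb_dim)] for _ in range(width)]
                    for _ in range(n_filters)],
        "fc_w": [[rand() for _ in range(n_filters)] for _ in range(2)],  # 2 classes
        "fc_b": [0.0, 0.0],
    }


def classify(tokens, params):
    """Return [P(non-threat intelligence), P(threat intelligence)] for a token list."""
    emb, filters = params["emb"], params["filters"]
    emb_dim, width = len(next(iter(emb.values()))), len(filters[0])
    # Embedding layer (301): tokens -> distributed representation; unknown tokens -> zeros.
    x = [emb.get(tok, [0.0] * emb_dim) for tok in tokens]
    x += [[0.0] * emb_dim] * max(0, width - len(x))  # pad texts shorter than a filter
    pooled = []
    for f in filters:
        # Convolutional layer (302): slide the filter over windows of `width` tokens, then ReLU.
        feature_map = [
            max(0.0, sum(f[k][d] * x[i + k][d] for k in range(width) for d in range(emb_dim)))
            for i in range(len(x) - width + 1)
        ]
        # Max-pooling layer (303): keep each feature map's maximum; the maxima are concatenated.
        pooled.append(max(feature_map))
    # Fully connected layer (304) followed by a softmax over the two classes.
    logits = [sum(w * v for w, v in zip(row, pooled)) + b
              for row, b in zip(params["fc_w"], params["fc_b"])]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

A practical implementation would typically use several filter widths and a deep learning framework for training; those choices are not specified in the document.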

Optionally, after step 103 and before steps 1061 and 1063, in step 104, a preconfigured machine learning judgment model is used to judge whether the network intelligence classified as cyber-security threat intelligence is valid cyber-security threat intelligence. For example, the same word may have different meanings in different contexts, so that it is sometimes a network security threat indicator and sometimes not; intelligence in which it is not an indicator is invalid cyber-security threat intelligence. Based on this judgment, the invalid cyber-security threat intelligence can be filtered out of the network intelligence in step 105. An example of a preconfigured machine learning judgment model and its processing is shown in FIG. 4. In FIG. 4, the machine learning judgment model 400 includes an embedding layer 401 and a random forest layer 402. The judgment proceeds as follows: first, the text of the network intelligence classified as cyber-security threat intelligence is input into the embedding layer 401 to be encoded into a distributed representation; the distributed representation is then input into the random forest layer 402, and whether the network intelligence is valid cyber-security threat intelligence is judged from the output of the random forest layer. These steps can further help improve recognition efficiency. The machine learning judgment model may be trained using 2,000 manually labeled cyber-security threat intelligence messages, of which 800 are valid and 1,200 are invalid.
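The embedding-then-random-forest judgment of FIG. 4 can be sketched as below. This is a toy illustration under loud assumptions: the 2-D embedding table is hypothetical and stands in for a learned embedding layer, texts are embedded by averaging token vectors, and the forest is deliberately tiny, made of depth-1 trees (decision stumps) that each see a bootstrap sample and one randomly chosen feature, which is a simplified form of a random forest. The training texts are made up for illustration.

```python
import random
from collections import Counter

# Hypothetical 2-D embedding table standing in for a learned embedding layer.
EMBEDDINGS = {
    "apt": [0.95, 0.85], "trojan": [0.9, 1.0], "c2": [1.0, 0.8], "payload": [0.8, 0.9],
    "weather": [0.1, 0.0], "football": [0.2, 0.1], "recipe": [0.0, 0.2], "concert": [0.05, 0.15],
}
DIM = 2


def embed(text):
    """Embedding step: mean of the known token vectors (zeros if none are known)."""
    vecs = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(DIM)]


def fit_stump(X, y, feature):
    """Fit a depth-1 decision tree on one feature; returns a predict(x) function."""
    majority = Counter(y).most_common(1)[0][0]
    values = sorted(set(row[feature] for row in X))
    if len(set(y)) == 1 or len(values) == 1:
        return lambda x: majority  # nothing to split on: predict the sample's class
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        for above in (1, 0):  # label predicted when x[feature] >= thr
            errors = sum((above if row[feature] >= thr else 1 - above) != label
                         for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, thr, above)
    _, thr, above = best
    return lambda x: above if x[feature] >= thr else 1 - above


def fit_forest(X, y, n_trees=15, seed=0):
    """Each tree is trained on a bootstrap sample using one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        feature = rng.randrange(len(X[0]))
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def is_valid(forest, text):
    """Majority vote of the trees: True means valid cyber-security threat intelligence."""
    votes = Counter(tree(embed(text)) for tree in forest)
    return votes[1] > votes[0]


valid = ["apt trojan payload", "c2 trojan", "apt c2", "payload c2 trojan"]
invalid = ["weather football", "recipe concert", "football recipe", "concert weather"]
forest = fit_forest([embed(t) for t in valid + invalid], [1] * len(valid) + [0] * len(invalid))
```

An odd number of trees avoids tied votes; a production model would use deeper trees, many more features, and a library implementation.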

It should be noted that the text of the network intelligence obtained here may be in various languages. In one example, texts are distinguished by language and processed with a machine learning classification model, a machine learning judgment model, and a machine learning model for identifying IOCs, each trained for the corresponding language.
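Routing each text to the models trained for its language might look like the sketch below. The language detector here is a deliberately crude, hypothetical heuristic (any CJK Unified Ideograph means Chinese, otherwise English), and the shape of the per-language model bundle is an assumption; a real deployment would likely use a proper language-identification component.

```python
def detect_language(text):
    """Crude heuristic: any CJK Unified Ideograph (U+4E00..U+9FFF) means 'zh', else 'en'."""
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in text) else "en"


def route(text, models_by_language):
    """Return the language and its (classifier, judge, IOC model) bundle for one text."""
    lang = detect_language(text)
    if lang not in models_by_language:
        raise KeyError(f"no models trained for language {lang!r}")
    return lang, models_by_language[lang]
```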

According to this embodiment, the word-bank-based, rule-based, and machine-learning-model-based recognition modes identify network security threat indicators automatically, avoiding the time and labor of manual identification. Because the at least two kinds of network security threat indicators to be identified are grouped by recognition mode and identified with the machine-learning-model-based, word-bank-based, or rule-based recognition mode corresponding to each group, the characteristics of different kinds of indicators can be exploited and the limitations of a single recognition mode avoided. For example, a rule-based recognition mode is ineffective for some kinds of indicators (such as attack organizations), whereas a machine-learning-model-based recognition mode can identify them effectively, which mitigates missed reports (false negatives) and false reports (false positives) to a certain extent. Because the machine learning model takes context into account, it can also distinguish non-malicious indicators appearing in a report, further mitigating missed and false reports. In this embodiment, the preconfigured machine learning classification model classifies the network intelligence into cyber-security threat intelligence and non-cyber-security threat intelligence and removes the latter, which further reduces manual effort, eliminates manual screening, and allows the method to be applied flexibly to various intelligence sources.
In a further embodiment, a preconfigured machine learning judgment model judges whether the network intelligence classified as cyber-security threat intelligence is valid, and invalid cyber-security threat intelligence is removed from the network intelligence, further improving identification efficiency.

FIG. 5 illustrates an output example of a machine learning model according to an embodiment of the present invention. The trained model shown in FIG. 2 is used to identify network intelligence, such as that obtained from www.freebuf.com, producing the output shown in FIG. 5, where the first column is a word in the network intelligence, such as 194.70.136, and the last column is the identified network security threat indicator label, such as B-IP (IP address), B-DOMAIN (domain name), or B-FILEASH (file hash).
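Per-word labels such as B-IP in FIG. 5 look like the common BIO tagging convention, in which B- begins an entity, I- continues it, and O marks other words. Assuming that convention (only B- labels are actually shown in the document; the I- and O labels and the ORG kind used below are assumptions), the per-word output can be grouped into IOC spans as sketched here.

```python
def extract_iocs(tagged):
    """Group (word, tag) pairs into (ioc_type, text) spans under an assumed BIO scheme."""
    spans, cur_type, cur_words = [], None, []
    for word, tag in tagged:
        if tag.startswith("I-") and tag[2:] == cur_type:
            cur_words.append(word)  # continue the open span
            continue
        if cur_type:  # any other tag closes the open span
            spans.append((cur_type, " ".join(cur_words)))
        if tag.startswith("B-"):
            cur_type, cur_words = tag[2:], [word]  # start a new span
        else:
            cur_type, cur_words = None, []  # "O", or a stray "I-" tag, is ignored
    if cur_type:
        spans.append((cur_type, " ".join(cur_words)))
    return spans
```

For example, `[("Example", "B-ORG"), ("Group", "I-ORG"), ("used", "O"), ("203.0.113.5", "B-IP")]` yields `[("ORG", "Example Group"), ("IP", "203.0.113.5")]`.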

Optionally, in step 107, the identification results of the at least two kinds of network security threat indicators are displayed, for example through a web page. FIG. 6a illustrates a display interface of the recognition results, showing several items related to the recognized network intelligence, according to an embodiment of the present invention: the first column, GUID, is the unique identifier of the acquired network intelligence; the second column is its title; the third column is its labeling status; the fourth column is the operator; the fifth column is the time at which the network intelligence was crawled; and the sixth column is the manual check time.

FIG. 6b illustrates another display interface of the recognition results according to an embodiment of the present invention, namely an operation interface for manual checking and modification. The network security threat indicators to be labeled are listed on the right. Before manual checking, the recognition result of step 106 is loaded into the corresponding indicator fields on the right; clicking an indicator shows or hides its loaded results, and results can be manually added, deleted, searched, and modified. On the left, a large text box presents the text of the network intelligence, in which text can be selected.

Optional step 108 is discussed below, in which the computer corrects the recognition result when it receives an instruction to correct it, for example from a user via a user interface. For example, to manually label a word as a network security threat indicator, the user selects the word, clicks the "label" button above, and then selects the kind of network security threat indicator from the menu that appears. Modifications and deletions can likewise be made manually through other menus or buttons, and such labels, modifications, or deletions are reflected in the corresponding fields on the right. After manual labeling is finished, the user clicks the "save modify" button. In one example, the result is not only saved locally but also submitted to a server for saving. This completes the correction of the recognition result of step 106. The interface shown in FIG. 6b also provides the following buttons: a "reset" button, which abandons all manual labeling and modification and resets to the automatic recognition result of step 106; and a "delete all labels" button, which deletes all labels in the current network intelligence, both automatic and manual. Other buttons may of course be provided to assist in manually adding, deleting, searching, and modifying the computer's recognition results.
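The label, delete, reset, delete-all, and save operations described for FIG. 6b can be modeled as a small state object. In this sketch, the class name `LabelSession`, the representation of labels as sets of strings keyed by indicator kind, and the choice to have `save` return what was added and removed relative to the step 106 result are all hypothetical; the local and server persistence mentioned above is not modeled.

```python
class LabelSession:
    """Hypothetical model of the FIG. 6b correction workflow for one piece of intelligence."""

    def __init__(self, auto_labels):
        # auto_labels: {indicator_kind: set of strings} produced by step 106.
        self.auto = {kind: set(values) for kind, values in auto_labels.items()}
        self.reset()

    def reset(self):
        """'reset': discard all manual edits and restore the automatic result."""
        self.labels = {kind: set(values) for kind, values in self.auto.items()}

    def label(self, text, kind):
        """'label': mark the selected text as an indicator of the chosen kind."""
        self.labels.setdefault(kind, set()).add(text)

    def delete(self, text, kind):
        """Remove a single label, whether automatic or manual."""
        self.labels.get(kind, set()).discard(text)

    def delete_all(self):
        """'delete all labels': remove automatic and manual labels alike."""
        self.labels = {}

    def save(self):
        """'save modify': return (added, removed) relative to the step 106 result."""
        kinds = set(self.auto) | set(self.labels)
        added = {k: self.labels.get(k, set()) - self.auto.get(k, set()) for k in kinds}
        removed = {k: self.auto.get(k, set()) - self.labels.get(k, set()) for k in kinds}
        return ({k: v for k, v in added.items() if v},
                {k: v for k, v in removed.items() if v})
```

Returning the difference rather than the full label set makes it easy to forward only the corrections to a retraining or word-bank update step.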

The corrected results obtained in step 108 can be fed back to the computer to optimize its recognition, in particular machine-learning-model-based and word-bank-based recognition. Therefore, optionally, in step 109, if the recognition result was obtained with a machine-learning-model-based recognition mode, the machine learning model is further trained using the result corrected in step 108; and/or, if the recognition result was obtained with a word-bank-based recognition mode, the word bank is updated using the result corrected in step 108.
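Step 109 can be sketched as routing each corrected result to the recognizer responsible for its indicator kind. The function name and data shapes below are hypothetical: word-bank kinds (such as platforms, which claim 7 assigns to word-bank-based recognition) have confirmed terms added to the word bank, while machine-learning kinds (such as IP addresses, per claim 9) have their corrected examples queued for the next retraining run. Whether deleted labels should also be removed from the word bank is a design choice the document does not address, so this sketch only adds terms.

```python
def apply_feedback(corrected, word_banks, training_buffer, word_bank_kinds, ml_kinds):
    """Route corrected results to the recognizer that produced them (step 109).

    corrected: list of (report_text, {indicator_kind: set_of_strings}) pairs.
    """
    for text, labels in corrected:
        for kind, values in labels.items():
            if kind in word_bank_kinds:
                # Word-bank-based kinds: newly confirmed terms join the word bank.
                word_banks.setdefault(kind, set()).update(values)
        ml_labels = {k: set(v) for k, v in labels.items() if k in ml_kinds}
        if ml_labels:
            # ML-based kinds: keep the corrected example for further training.
            training_buffer.append((text, ml_labels))
```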

Through interaction with, for example, a web page, the machine learning model continuously receives results fed back from the front end, so that it is continuously trained and optimized and its identification accuracy keeps improving; the word bank can likewise be updated continuously, which mitigates missed and false reports to a certain extent.

Either the recognition result obtained by the computer in step 106 or the corrected result obtained in step 108 can be used as a basis for outputting network security threat alerts to the user, or for further analysis, for example statistical analysis of the associations between categories of network security threat indicators: for a given attack organization, which attack methods it commonly uses, which targets it attacks, which malicious IP addresses and domain names it uses, which MD5 values appear as file hashes, which mailbox the domain name registrant uses, and so on. Associating these pieces of information is helpful for other applications, for example for identifying associated IOCs more quickly. Therefore, optionally, in step 110, the recognition results are analyzed to obtain the associations between categories of network security threat indicators; and optionally, in step 111, a network security threat alert is output to the user based on the recognition results.
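One simple form of the association analysis in step 110 is a co-occurrence count: how often a value of one indicator kind appears in the same report as a value of another kind. The sketch below assumes each report's recognition results are available as a dictionary of sets keyed by indicator kind; the function name, kind names, and sample values are hypothetical.

```python
from collections import Counter
from itertools import product


def co_occurrence(reports, kind_a, kind_b):
    """Count (value_a, value_b) pairs that appear together in the same report."""
    counts = Counter()
    for labels in reports:
        for a, b in product(labels.get(kind_a, ()), labels.get(kind_b, ())):
            counts[(a, b)] += 1
    return counts
```

For instance, counting `("organization", "IP")` pairs across reports surfaces which IP addresses are repeatedly attributed to the same attack organization.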

The process may be performed periodically for continued optimization. For example, new network intelligence is crawled every day (step 101); optionally after classification, judgment, and filtering (steps 102 to 105), automatic identification is performed (step 106); manual checking is accepted through steps 107 and 108; and the corrected results are used to optimize the identification modes (step 109). Repeating this cycle gradually and continuously improves the reliability of automatic identification.
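One pass of this cycle can be sketched as a function whose stages are supplied by the caller, so that the crawler, the classification and judgment models, the recognizers, the review interface, and the optimizer stay interchangeable. The function name and stage signatures are hypothetical, each report is assumed to be hashable (for example its text), and scheduling (such as a daily job) is left to the caller.

```python
def run_cycle(crawl, classify, judge, identify, review, optimize):
    """One pass of steps 101-109; each stage is a caller-supplied function."""
    reports = crawl()                                         # step 101
    threat = [r for r in reports if classify(r)]              # steps 102-103 (optional)
    valid = [r for r in threat if judge(r)]                   # steps 104-105 (optional)
    results = {r: identify(r) for r in valid}                 # step 106
    corrected = {r: review(r, res) for r, res in results.items()}  # steps 107-108
    optimize(corrected)                                       # step 109
    return corrected
```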

FIG. 7 illustrates a block diagram of a network security threat indicator identification apparatus according to an embodiment of the present invention. The apparatus comprises an obtainer 701 and an identifier 702. The obtainer 701 is configured to obtain network intelligence; in one example, it crawls external cyber-security threat intelligence sources using crawler technology. External cyber-security threat intelligence sources are typically chosen from cyber threat intelligence sharing platforms, such as the intelligence shared on www.freebuf.com. In another example, the network intelligence may also contain non-cyber-security threat intelligence. The identifier 702 is configured to identify the network intelligence for at least two kinds of network security threat indicators to obtain identification results for them. The at least two kinds of indicators are divided into at least two groups in advance, and different identification modes are adapted to these groups in advance. For examples of the indicator kinds and their grouping, and of the correspondingly adapted identification modes (word-bank-based, rule-based, and machine-learning-model-based), refer to the description of step 106, which is not repeated here. In FIG. 7, a word bank recognizer 7021 implements the word-bank-based identification mode, a rule recognizer 7022 implements the rule-based identification mode, and a machine learning model recognizer 7023 implements the machine-learning-model-based identification mode.
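The group-to-recognizer adaptation inside the identifier 702 can be sketched as a dispatch table. Following the grouping in the claims, platforms are handled by word bank matching, registry entries by a rule, and IP addresses by the machine learning model; however, the word bank contents, the registry regular expression, and all function names are hypothetical, and the machine learning recognizer is only a stub where a trained model such as the one in FIG. 2 would be called.

```python
import re

# Hypothetical word bank and rule; real ones would be curated for each indicator kind.
WORD_BANK = {"platform": {"windows", "linux", "android"}}
RULES = {"registry": re.compile(r"\bHK(?:LM|CU)\\[\w\\]+", re.IGNORECASE)}


def word_bank_recognizer(text, kind):
    """Word bank recognizer (cf. 7021): words that match the pre-built word bank."""
    words = re.findall(r"[\w.]+", text.lower())
    return sorted(set(words) & WORD_BANK[kind])


def rule_recognizer(text, kind):
    """Rule recognizer (cf. 7022): content that satisfies the preset rule."""
    return RULES[kind].findall(text)


def ml_recognizer(text, kind):
    """ML recognizer (cf. 7023): stub; a trained sequence-labeling model would go here."""
    return []


# Each indicator kind is pre-assigned to the recognizer adapted to its group.
RECOGNIZER_FOR_KIND = {
    "platform": word_bank_recognizer,
    "registry": rule_recognizer,
    "IP": ml_recognizer,
}


def identify(text):
    """Identifier (cf. 702): run every kind through its adapted recognizer."""
    return {kind: recognizer(text, kind) for kind, recognizer in RECOGNIZER_FOR_KIND.items()}
```

Keeping the kind-to-recognizer mapping in one table means a kind can be moved to a different group, for example from rules to the machine learning model, without touching the recognizers themselves.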

Optionally, the network security threat indicator identification apparatus may further include a human-machine interface 703 comprising an input unit 7031 and an output unit 7032. The output unit 7032 displays the recognition results of the at least two kinds of network security threat indicators, and the input unit 7031 corrects a recognition result in response to a received correction instruction for it. The corrected results may be fed back to the identifier 702 to optimize its recognition, in particular the machine learning model recognizer 7023 and the word bank recognizer 7021. For the optimization method, refer to the description of step 109 above, which is not repeated here.

Optionally, the network security threat indicator identification apparatus may further comprise a classifier 704 and a first filter 705. Before the network intelligence is identified, the classifier 704 classifies it as cyber-security threat intelligence or non-cyber-security threat intelligence using a preconfigured machine learning classification model, and the first filter 705 then filters the non-cyber-security threat intelligence out of the network intelligence. For further explanation of the classifier, refer to the description of step 102 above.

Optionally, the network security threat indicator identification apparatus may further comprise a decider 706 and a second filter 707. After the classification and filtering, the decider 706 judges, using a preconfigured machine learning judgment model, whether the network intelligence classified as cyber-security threat intelligence is valid cyber-security threat intelligence, and the second filter 707 filters the invalid cyber-security threat intelligence out of the network intelligence. For further explanation of the decider 706, refer to the description of step 104 above.

FIG. 8 illustrates a hardware implementation environment according to an embodiment of the invention. Referring to FIG. 8, in an embodiment of the invention, a cyber security threat indicator identification apparatus 800 includes a processor 804 including a hardware element 810. The processor 804 includes, for example, one or more processors such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. The term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for network security threat indicator identification, or incorporated in combined hardware and/or software modules. The techniques may also be fully implemented in one or more circuits or logic elements. The methods in this disclosure may be implemented in various components, modules, or units, but need not be implemented by different hardware units; rather, as noted above, the various components, modules, or units may be combined, or provided by a collection of interoperative hardware units (including one or more processors as noted above) in combination with appropriate software and/or firmware.

In one or more examples, the aspects described above in connection with fig. 1-7 may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium 806 and executed by a hardware-based processor. Computer-readable media 806 may include computer-readable storage media corresponding to tangible media, such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, such as according to a communication protocol. In this manner, the computer-readable medium 806 may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. The data storage medium can be any available medium that can be read by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium 806.

By way of example, and not limitation, such computer-readable storage media can comprise memory such as RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other memory 812 that can be used to store desired program code in the form of instructions or data structures and that can be read by a computer. Also, any connection is properly termed a computer-readable medium 806. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media 806.

The cyber security threat indicator identification apparatus 800 may also include an I/O interface for transmitting data, and other functionality 814. The cyber security threat indicator identification apparatus 800 may be included in various apparatuses such as a mobile phone, smart phone, tablet, laptop, desktop, game console, car mounted device, home appliance such as a television, player, or any apparatus capable of networking or otherwise receiving information, here illustrated as a computer 816, a mobile apparatus 818, and other apparatuses 820. Each of these configurations includes devices that may have generally different configurations and capabilities, and thus the network security threat indicator identification apparatus 800 may be configured according to one or more of the different device classes. The techniques of this disclosure may also be implemented, in whole or in part, on the "cloud" 822 through the use of a distributed system, such as through the platform 824 described below.

Cloud 822 includes and/or is representative of platform 824 for resources 826. The platform 824 abstracts underlying functionality of hardware (e.g., servers) and software resources of the cloud 822. Resources 826 may include applications and/or data that may be used when executing computer processes on servers remote from computing device 802. Resources 826 may also include services provided over the internet and/or over a subscriber network such as a cellular or Wi-Fi network.

The platform 824 may abstract resources and functionality to connect the computing device 802 with other computing devices. The platform 824 may also be used to abstract the scaling of resources, providing a level of scale that corresponds to the encountered demand for the resources 826 implemented via the platform 824. Thus, in interconnected device embodiments, implementation of the functions described herein may be distributed throughout the system. For example, the functionality may be implemented in part on the computing device 802 and in part through the platform 824 that abstracts the functionality of the cloud 822.

According to the embodiments, network security threat indicators are identified automatically, avoiding the time and labor of manual identification. Because the multiple kinds of indicators to be identified are grouped by identification mode and identified with the machine-learning-model-based, word-bank-based, or rule-based identification mode corresponding to each group, the characteristics of each kind of indicator can be exploited and the limitations of a single identification mode avoided: for example, a rule-based mode cannot effectively identify some kinds of indicators (such as attack organizations), whereas a machine-learning-model-based mode can, which mitigates missed and false reports to a certain extent. Through interaction with, for example, a web page, the machine learning model continuously receives feedback from the front end and is continuously retrained and optimized, so its identification accuracy keeps improving, and the word bank can be updated continuously, further mitigating missed and false reports. In addition, because the machine learning model takes context into account, it can distinguish non-malicious indicators appearing in a report, which also mitigates missed and false reports to a certain extent.

It should be noted that terms such as "first" and "second" in this disclosure do not indicate any importance or order of the steps and are used only for distinction. Unless otherwise specified or constrained by a prerequisite (i.e., one step depends on the result of another), the order in which method steps are described does not represent their execution order, and the described method steps may be executed in any feasible and reasonable order.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A network security threat indicator identification method, comprising the following steps:

acquiring network intelligence; and

identifying the network intelligence for at least two kinds of network security threat indicators by using pre-adapted identification modes, to obtain identification results of the at least two kinds of network security threat indicators,

wherein the at least two kinds of network security threat indicators are divided into at least two groups in advance, different identification modes are adapted to the at least two groups in advance, and

wherein the different identification modes comprise an identification mode based on a machine learning model.

2. The method of claim 1, further comprising, prior to identifying the network intelligence:

classifying the network intelligence as network security threat intelligence or non-network security threat intelligence by using a preconfigured machine learning classification model; and

filtering out the non-network security threat intelligence in the network intelligence.

3. The method of claim 2,

wherein the preconfigured machine learning classification model comprises an embedding layer, a convolutional layer, a max-pooling layer, and a fully connected layer, and

wherein the classifying further comprises:

acquiring a text of the network intelligence and inputting the text into the embedding layer so as to encode the text into a distributed representation;

inputting the distributed representation into the convolutional layer to extract features of the text of the network intelligence;

inputting the features into the max-pooling layer to extract a maximum value corresponding to each feature, and concatenating the extracted maximum values as the output of the max-pooling layer; and

inputting the output of the max-pooling layer into the fully connected layer, and obtaining a result of the classifying based on the output of the fully connected layer.

4. The method of claim 2, further comprising, after said classifying and said filtering:

judging, by using a preconfigured machine learning judgment model, whether the network intelligence classified as network security threat intelligence is valid network security threat intelligence; and

filtering out invalid network security threat intelligence in the network intelligence; wherein the machine learning judgment model comprises an embedding layer and a random forest layer, and

wherein the judging comprises:

inputting text of the network intelligence classified as network security threat intelligence into the embedding layer to encode it into a distributed representation; and

inputting the distributed representation into the random forest layer to judge, according to the output of the random forest layer, whether the network intelligence classified as network security threat intelligence is valid network security threat intelligence.

5. The method of any of claims 1-4, wherein the different identification modes further comprise:

a word-bank-based identification mode, wherein words in the network intelligence are matched against words in a pre-established word bank and the matched words are taken as identification results; and

a rule-based identification mode, wherein the text of the network intelligence is analyzed using preset rules and content conforming to the rules is taken as an identification result.

6. The method of any of claims 1-4, further comprising:

displaying the recognition result through a web page; and

correcting the recognition result when a correction instruction for the recognition result is received.

7. The method according to any one of claims 1 to 4,

wherein a first group of the at least two groups comprises network security threat indicators of the following kinds: affected areas and platforms, and

wherein the identifying comprises: for any kind of network security threat indicator in the first group, identifying the network intelligence by using a word-bank-based identification mode, wherein the word-bank-based identification mode matches words in the network intelligence against words in a pre-established word bank and takes the matched words as identification results.

8. The method according to any one of claims 1 to 4,

wherein a second group of the at least two groups comprises network security threat indicators of the following kinds: basic data files, registries, services, and startup items of programs, and

wherein the identifying comprises: for any kind of network security threat indicator in the second group, identifying the network intelligence by using a rule-based identification mode, wherein the rule-based identification mode analyzes the network intelligence using preset rules and takes content meeting the rules as an identification result.

9. The method according to any one of claims 1 to 4,

wherein a third group of the at least two groups comprises the following kinds of network security threat indicators: Trojan family, threat organization, threat target, threat technique, vulnerability exploitation, file hash, IP address, domain name, file information, uniform resource locator (URL), mutex, and email address; and

wherein said identifying comprises: for each kind of network security threat indicator in the third group, identifying the network intelligence using an identification mode based on a machine learning model.
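Claims 7 to 9 together assign every indicator kind to exactly one identification mode. A minimal dispatch table could look like the following; the group memberships follow the claims, while the snake_case kind identifiers, the `identify` function, and the stand-in recognizers in the usage block are hypothetical.

```python
from typing import Callable, Dict, List

# Indicator kinds per group, following claims 7-9 (identifiers are illustrative).
GROUPS: Dict[str, List[str]] = {
    "lexicon": ["affected_region", "affected_platform"],
    "rule": ["data_file", "registry", "service", "startup_item"],
    "model": ["trojan_family", "threat_organization", "threat_target", "threat_technique",
              "vulnerability_exploitation", "file_hash", "ip_address", "domain_name",
              "file_info", "url", "mutex", "email_address"],
}

Recognizer = Callable[[str, str], List[str]]  # (text, kind) -> matched strings


def identify(text: str, recognizers: Dict[str, Recognizer]) -> Dict[str, List[str]]:
    """Run the recognizer adapted to each group over every kind in that group."""
    results: Dict[str, List[str]] = {}
    for mode, kinds in GROUPS.items():
        recognize = recognizers[mode]
        for kind in kinds:
            hits = recognize(text, kind)
            if hits:
                results[kind] = hits
    return results


if __name__ == "__main__":
    # Stand-in recognizers; real ones would be the lexicon, rule, and model modes.
    stubs = {
        "lexicon": lambda text, kind: ["Windows"] if kind == "affected_platform" and "Windows" in text else [],
        "rule": lambda text, kind: [],
        "model": lambda text, kind: ["1.2.3.4"] if kind == "ip_address" and "1.2.3.4" in text else [],
    }
    print(identify("Beacon to 1.2.3.4 from Windows hosts", stubs))
    # {'affected_platform': ['Windows'], 'ip_address': ['1.2.3.4']}
```

Passing the recognizers in as parameters keeps the grouping fixed while letting each mode be swapped or retrained independently.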

10. The method of any of claims 1-4, further comprising:

statistically analyzing the identification results to obtain correlations between the kinds of network security threat indicators; and/or

outputting a network security threat alert based on the identification results.
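One simple way to obtain correlations between indicator kinds, as in the first branch of claim 10, is to count how often two kinds are identified in the same piece of intelligence and compare that with what chance would predict, using pointwise mutual information (PMI). The choice of PMI is an assumption, since the claim does not name a statistic, and `kind_pmi` is a hypothetical helper.

```python
import math
from collections import Counter
from itertools import combinations


def kind_pmi(docs):
    """docs: one set of identified indicator kinds per piece of intelligence.

    Returns {(kind_a, kind_b): PMI in bits} for every pair that co-occurs at least once.
    Positive PMI means the two kinds appear together more often than chance.
    """
    n = len(docs)
    single = Counter(k for kinds in docs for k in kinds)
    pair = Counter(p for kinds in docs for p in combinations(sorted(kinds), 2))
    return {
        (a, b): math.log2((c / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), c in pair.items()
    }


if __name__ == "__main__":
    docs = [
        {"trojan_family", "mutex"},
        {"trojan_family", "mutex", "ip_address"},
        {"ip_address", "domain_name"},
        {"domain_name"},
    ]
    print(kind_pmi(docs)[("mutex", "trojan_family")])  # 1.0
```

In this toy data, mutexes and Trojan families co-occur twice as often as independence would predict, giving a PMI of 1 bit.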

11. The method of claim 6, further comprising:

in a case where the identification result was obtained using an identification mode based on a machine learning model, further training the machine learning model with the corrected identification result; and/or

in a case where the identification result was obtained using a lexicon-based identification mode, updating the lexicon with the corrected identification result, wherein the lexicon-based identification mode matches words in the network intelligence against words in a pre-established lexicon and takes the matched words as identification results.
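The feedback loop of claim 11 can be sketched as routing each corrected result back to the component that produced it. The wiring below is an assumption: corrections to lexicon-based results update the lexicon immediately, while corrections to model-based results are queued as labeled examples for a later retraining run. `Correction`, `FeedbackStore`, and `apply_correction` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Correction:
    text: str   # the network intelligence the result came from
    kind: str   # indicator kind, e.g. "affected_platform"
    wrong: str  # value originally identified ("" if it was missed)
    right: str  # value after the analyst's correction ("" if it should be dropped)
    mode: str   # "lexicon" or "model"


@dataclass
class FeedbackStore:
    lexicon: Set[str] = field(default_factory=set)
    retrain_queue: List[Tuple[str, str, str]] = field(default_factory=list)

    def apply_correction(self, c: Correction) -> None:
        if c.mode == "lexicon":
            # In this sketch, the wrong entry is removed and the corrected word added.
            self.lexicon.discard(c.wrong.lower())
            if c.right:
                self.lexicon.add(c.right.lower())
        elif c.mode == "model":
            # Keep (text, kind, corrected value) as a new training example.
            self.retrain_queue.append((c.text, c.kind, c.right))


if __name__ == "__main__":
    store = FeedbackStore(lexicon={"windows", "andriod"})
    store.apply_correction(Correction("targets Android phones", "affected_platform",
                                      "andriod", "Android", "lexicon"))
    print(sorted(store.lexicon))  # ['android', 'windows']
```

Batching model corrections rather than retraining on each one is a design choice of this sketch; the claim only requires that the model be further trained with the corrected results.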

12. The method of any of claims 1-4, wherein the machine learning model comprises a first embedding layer, a second embedding layer, a first bidirectional long short-term memory (BiLSTM) layer, a second BiLSTM layer, a feed-forward neural network layer, and an optimization layer; and

wherein identifying the network intelligence using the machine learning model comprises:

inputting the next-level elements of a word of the network intelligence into the first embedding layer to encode them into distributed representations of the next-level elements;

inputting the distributed representations of the next-level elements into the first BiLSTM layer to obtain the output of the first BiLSTM layer;

inputting the word of the network intelligence into the second embedding layer to encode it into a distributed representation of the word;

concatenating the output of the first BiLSTM layer with the distributed representation of the word, and inputting the concatenated result into the second BiLSTM layer to obtain the output of the second BiLSTM layer;

inputting the output of the second BiLSTM layer into a feed-forward neural network layer having a hidden layer to obtain, for the word, the probability of each network security threat indicator; and

inputting the probabilities into the optimization layer to obtain, as output, the network security threat indicators in the network intelligence.
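The claim leaves the "optimization layer" unspecified. In BiLSTM sequence taggers this role is commonly played by a conditional random field (CRF), whose decoding step selects the highest-scoring label sequence from per-word label scores plus label-to-label transition scores. Assuming that reading, and assuming BIO-style labels, the decoding step alone could look like the stdlib sketch below. The label set, the probabilities, the transition scores, and `viterbi` are hypothetical, and learning the transition scores is out of scope.

```python
import math
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the label indices y maximizing
    sum_t emissions[t][y_t] + sum_t transitions[y_{t-1}][y_t].

    emissions:   per-word log-probabilities, shape [num_words][num_labels]
    transitions: log-scores for moving from label i (row) to label j (column)
    """
    num_labels = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_labels):
            best_i = max(range(num_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best-scoring final label.
    best = max(range(num_labels), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


if __name__ == "__main__":
    LABELS = ["O", "B-IP", "I-IP"]
    # Hypothetical feed-forward outputs for three words, converted to log space.
    probs = [[0.7, 0.2, 0.1], [0.1, 0.4, 0.5], [0.8, 0.1, 0.1]]
    emissions = [[math.log(p) for p in row] for row in probs]
    NEG = -1e4  # effectively forbids O -> I-IP, which is invalid in BIO tagging
    transitions = [[0.0, 0.0, NEG], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print([LABELS[i] for i in viterbi(emissions, transitions)])  # ['O', 'B-IP', 'O']
```

Taking the per-word argmax of these probabilities would give `['O', 'I-IP', 'O']`, an invalid sequence; the transition penalty lets the decoder choose the best valid sequence instead, which is the usual motivation for adding such a layer after the feed-forward network.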

13. A network security threat indicator identification apparatus, comprising:

an acquirer configured to acquire network intelligence; and

an identifier configured to identify the network intelligence for at least two network security threat indicators to obtain identification results for the at least two network security threat indicators,

14. A network security threat indicator identification device, comprising:

a processor; and

a memory having stored thereon computer-executable instructions that, when executed by the processor, cause the processor to implement the method of any one of claims 1-12.

15. A computer-readable storage medium having stored therein instructions which, when executed on a computer, cause the computer to implement the method of any one of claims 1-12.