CN114697066A

CN114697066A - Network threat detection method and device

Info

Publication number: CN114697066A
Application number: CN202011643016.6A
Authority: CN
Inventors: 黄朝文; 陈劲; 路文超; 白敏�; 李佳馨; 齐向东; 吴云坤
Original assignee: Qax Technology Group Inc; Secworld Information Technology Beijing Co Ltd
Current assignee: Qax Technology Group Inc; Secworld Information Technology Beijing Co Ltd
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2022-07-01

Abstract

The present disclosure provides a network threat detection method, including: obtaining threat indicator IOC data; performing feature extraction processing on the IOC data to obtain at least one threat indicator feature associated with the IOC data; performing malicious rule matching operation based on the at least one threat index characteristic by using a preset threat detection model to obtain a malicious matching score, wherein the malicious matching score is determined according to the number of matched malicious rules and the type of the malicious rules; and determining the IOC data as malicious IOC data under the condition that the malicious matching score is higher than a preset threshold value. The present disclosure also provides a cyber-threat detection apparatus, an electronic device, and a computer-readable storage medium.

Description

Network threat detection method and device

Technical Field

The present disclosure relates to the field of network security, and in particular, to a network threat detection method, a network threat detection apparatus, an electronic device, a computer-readable storage medium, and a computer program product.

Background

With the diversified development of network attack means and channels, the network threat presents rapid and malignant evolution, which puts higher requirements on the network threat detection capability. IOC (Indicator of threat intelligence) data is threat intelligence data capable of describing threat event information, and network threat detection and external attack prevention can be realized based on the IOC data.

In the process of realizing the technical concept of the present disclosure, the inventor finds that the network threat detection based on the IOC data needs to be performed by strongly relying on time investment and experience accumulation of a security analyst in the related art, which has the problems of high professional requirements, low detection efficiency and unstable detection result.

Disclosure of Invention

One aspect of the present disclosure provides a cyber-threat detection method, including: obtaining threat indicator IOC data; performing feature extraction processing on the IOC data to obtain at least one threat indicator feature associated with the IOC data; and performing malicious rule matching operation based on the at least one threat index characteristic by using a preset threat detection model to obtain a malicious matching score, wherein the malicious matching score is determined according to the number of matched malicious rules and the types of the malicious rules, and the IOC data is determined to be malicious IOC data under the condition that the malicious matching score is higher than a preset threshold value.

Optionally, the performing, by using a preset threat detection model, a malicious rule matching operation based on the at least one threat indicator feature, and the performing includes: vectorizing each threat index feature by using the threat detection model to obtain at least one threat index vector associated with the IOC data; and performing matching operation based on a preset malicious rule according to the at least one threat index vector.

Optionally, the threat indicator feature comprises any one of: malicious sample association characteristics, Uniform Resource Locator (URL) association characteristics, Internet Protocol (IP) characteristics, domain name character characteristics, domain name resolution characteristics and domain name registrant characteristics.

Optionally, after determining the malicious IOC data, the method further includes: and updating a preset malicious rule by using the malicious IOC data, and/or generating threat early warning information based on the malicious IOC data, and sending the threat early warning information to corresponding network element equipment.

Optionally, when the threat detection model is utilized to perform threat detection on a plurality of IOC data, the method further includes: calculating the similarity of different IOC data according to at least one threat index characteristic associated with each IOC data by using the threat detection model to obtain the similarity score of every two IOC data and output the similarity score; and dividing the IOC data with the similarity score higher than a first preset threshold into the same cluster classification so as to realize the division of the IOC data into at least one cluster classification.

Optionally, the method further comprises: for any target IOC data to be analyzed, determining other IOC data with similarity higher than a second preset threshold in the at least one cluster classification according to at least one threat index characteristic associated with the target IOC data, wherein the second preset threshold is not less than the first preset threshold.

Optionally, the method further comprises: classifying the IOC data based on a preset classification rule according to at least one threat indicator characteristic associated with the IOC data, wherein the classification result comprises one of the following: trojan, virus, and backdoor program.

Optionally, the source of the IOC data includes at least one of: enterprise self-production information, third party aggregated information and cloud sharing information.

Optionally, the training method of the threat detection model includes: obtaining a plurality of sample IOC data with security identifiers, wherein the security identifier of any sample IOC data is a malicious identifier or a benign identifier; performing feature extraction processing on each sample IOC data to obtain at least one threat index feature associated with each sample IOC data; performing model training based on the security identification and the at least one threat indicator feature associated with each of the sample IOC data to obtain the threat detection model.

Another aspect of the present disclosure provides a cyber-threat detection apparatus, including: the obtaining module is used for obtaining threat indicator IOC data; the first processing module is used for carrying out feature extraction processing on the IOC data to obtain at least one threat index feature related to the IOC data; and the second processing module is used for performing malicious rule matching operation based on the at least one threat index characteristic by using a preset threat detection model to obtain a malicious matching score, wherein the malicious matching score is determined according to the number of matched malicious rules and the type of the malicious rules, and the IOC data is determined to be malicious IOC data when the malicious matching score is higher than a preset threshold value.

Optionally, the second processing module includes: the first processing submodule is used for carrying out vectorization processing on each threat index characteristic by using the threat detection model to obtain at least one threat index vector associated with the IOC data; and the second processing submodule is used for performing matching operation based on a preset malicious rule according to the at least one threat index vector.

Optionally, the apparatus further includes a third processing module, configured to update a preset malicious rule by using the malicious IOC data after the malicious IOC data is determined by the second processing module, and/or generate threat early warning information based on the malicious IOC data, and send the threat early warning information to a corresponding network element device.

Optionally, the apparatus further includes a fourth processing module, configured to, when the second processing module performs threat detection on multiple pieces of IOC data by using the threat detection model, perform similarity calculation on different pieces of IOC data by using the threat detection model according to at least one threat indicator feature associated with each piece of IOC data, obtain a similarity score of every two pieces of IOC data, and output the similarity score; and dividing the IOC data with the similarity score higher than a first preset threshold into the same cluster classification so as to realize the division of the IOC data into at least one cluster classification.

Optionally, the apparatus further includes a fifth processing module, configured to, for any target IOC data to be analyzed, determine, according to at least one threat indicator feature associated with the target IOC data, other IOC data whose similarity to the target IOC data is higher than a second preset threshold in the at least one cluster classification, where the second preset threshold is not smaller than the first preset threshold.

Optionally, the apparatus further includes a sixth processing module, configured to classify the IOC data based on a preset classification rule according to at least one threat indicator feature associated with the IOC data, where the classification result includes one of: trojan, virus, and backdoor program.

Optionally, the source of IOC data comprises at least one of: enterprise self-production information, third party aggregated information and cloud sharing information.

Another aspect of the present disclosure provides an electronic device including: one or more processors; memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods of embodiments of the present disclosure.

Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the methods of the embodiments of the present disclosure when executed.

Another aspect of the present disclosure provides a computer program product comprising computer readable instructions, wherein the computer readable instructions are configured to perform the method of the embodiments of the present disclosure when executed.

Drawings

For a more complete understanding of the present disclosure and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:

FIG. 1 schematically illustrates a system architecture of a cyber-threat detection method and apparatus according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates a flow chart of a cyber-threat detection method according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a network threat detection flow, in accordance with an embodiment of the present disclosure;

FIG. 4A schematically illustrates a schematic diagram of a process of training and applying a threat detection model, according to an embodiment of the disclosure;

FIG. 4B schematically illustrates a schematic diagram of a threat detection model training and optimization process, according to an embodiment of the disclosure;

FIG. 5 schematically illustrates a flow diagram of a method for web threat detection using a threat detection model, in accordance with an embodiment of the disclosure;

FIG. 6 schematically illustrates a schematic diagram of vectorizing threat indicator features, in accordance with an embodiment of the disclosure;

FIG. 7 schematically illustrates a schematic diagram of cluster classification of IOC data according to an embodiment of the disclosure;

FIG. 8 schematically illustrates a schematic diagram of type partitioning of IOC data, according to an embodiment of the disclosure;

FIG. 9 schematically illustrates a threat detection result diagram according to an embodiment of the disclosure;

FIG. 10 schematically illustrates a block diagram of a cyber-threat detection apparatus according to an embodiment of the present disclosure;

fig. 11 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It is to be understood that such description is merely illustrative and not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).

Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.

Embodiments of the present disclosure provide a cyber-threat detection method and a detection apparatus to which the method can be applied, which may include, for example, the following operations. Firstly, threat index IOC data is obtained, feature extraction processing is carried out on the IOC data, at least one threat index feature related to the IOC data is obtained, then malicious rule matching operation based on the at least one threat index feature is carried out by utilizing a preset threat detection model, and a malicious matching score is obtained, wherein the malicious matching score is determined according to the number of matched malicious rules and the type of the malicious rules; and determining the IOC data as malicious IOC data under the condition that the malicious matching score is higher than a preset threshold value. Fig. 1 schematically illustrates a system architecture of a cyber-threat detection method and apparatus according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.

As shown in fig. 1, the system architecture 100 includes: at least one terminal (a plurality of which are shown in the figure as

terminals

101, 102, 103), a network 104 and a server 105 for cyber-threat detection. The network 104 is used to provide communication links between terminals (e.g.,

terminals

101, 102, 103) and the server 105, and the network 104 may include various connection types, including wireless communication links, wired communication links, fiber optic cables, and so forth, for example. The terminal may include a desktop computer, a portable computer, a smart phone, a tablet computer, a personal digital assistant, a network-side device, and other electronic devices installed with an operating system.

In the system architecture 100, a server 105 acquires threat indicator IOC data from a terminal (such as

terminals

101, 102, 103) or from a cloud server through a network 104, performs feature extraction processing on the IOC data after the IOC data to be analyzed is acquired, obtains at least one threat indicator feature associated with the IOC data, and then performs malicious rule matching operation based on the at least one threat indicator feature by using a preset threat detection model to obtain a malicious matching score, wherein the malicious matching score is determined according to the number of matched malicious rules and the type of the malicious rules, and the IOC data is determined to be malicious IOC data when the malicious matching score is higher than a preset threshold value.

It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure can be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure cannot be applied to other system architectures.

The present disclosure will be described in detail below with reference to the drawings and specific embodiments.

Fig. 2 schematically illustrates a flow chart of a cyber-threat detection method according to an embodiment of the present disclosure, and as shown in fig. 2, the method may include operations S210 to S230.

In operation S210, threat indicator IOC data is obtained.

In the embodiment of the present disclosure, specifically, IOC data of multi-source information is obtained, the multi-source information may include, for example, enterprise self-production information, third-party aggregated information, and cloud-shared information, the enterprise self-production information is data that is available to an enterprise, and may include, for example, honeypot data, enterprise device log data, network security data generated by an enterprise security product, and the like, and the enterprise self-production information has higher data quality than the other two kinds of information. The third party aggregated information specifically comprises cooperative information provider shared information, security manufacturer aggregated information, information industry shared organization aggregated information and the like, and the cloud shared information is open source network information and specifically comprises open source network information data acquired from a cloud server.

After the IOC data from different data sources are acquired, preprocessing operation for the IOC data is performed, where the preprocessing operation may include, for example, normalization processing, deduplication processing, denoising processing, staticizing processing, and the like. The preprocessing of the IOC data is beneficial to realizing the specification of the IOC data, and after the preprocessing operation is completed, the preprocessed IOC data can be enriched, wherein the enrichment is to explain the IOC data to be analyzed by using IOC data of other network information so as to preliminarily evaluate whether the IOC data to be analyzed has a malicious processing mode.

Next, in operation S220, a feature extraction process is performed on the IOC data, resulting in at least one threat indicator feature associated with the IOC data.

In this embodiment, specifically, feature extraction processing is performed on the obtained IOC data to obtain at least one threat indicator feature associated with the IOC data, where the threat indicator feature may include, for example, any one of the following: malicious sample association characteristics, Uniform Resource Locator (URL) association characteristics, Internet Protocol (IP) characteristics, domain name character characteristics, domain name resolution characteristics and domain name registrant characteristics.

The malicious sample association feature describes malicious sample information associated with the IOC data, the URL association feature describes URL information associated with the IOC data, the IP feature describes information such as an IP address of the IOC data and a physical region to which the IP address belongs, the domain name character feature describes domain name character composition information of the IOC data, the domain name resolution quantity feature describes IP quantity information (namely PDNS resolution quantity) resolved by a domain name of the IOC data, and the domain name registrant feature describes domain name registrant information (namely WHOIS information of the domain name) of the IOC data.

Optionally, in addition to obtaining the at least one threat indicator feature, other auxiliary information of the IOC data may be obtained, and the other auxiliary information may include, for example, a file indicator feature, a network indicator feature, and the like of the IOC data. The file index features may include, for example, information such as file hash value, file name, file size, file path, file type, signature certificate, etc., and the network index features may include, for example, information such as domain name record, HTML path, SSL certificate, port address, etc.

Next, in operation S230, a preset threat detection model is used to perform a malicious rule matching operation based on at least one threat index feature, so as to obtain a malicious matching score, where the malicious matching score is determined according to the number of matched malicious rules and the type of the malicious rules, and the IOC data is determined to be malicious IOC data when the malicious matching score is higher than a preset threshold.

In this embodiment, specifically, a preset threat detection model is used to perform a malicious rule matching operation based on at least one threat indicator feature. Specifically, at least one threat index characteristic associated with the IOC data is matched with a preset malicious rule by using a threat detection model to obtain a malicious matching score, the malicious matching score is determined by the number of the matched malicious rules and the types of the malicious rules, and the IOC data is determined to be malicious IOC data under the condition that the malicious matching score is higher than a preset threshold value.

Because the hazard levels of the network threats corresponding to different malicious rule types may be different, different malicious rule types may have different preset weights. The malicious match score is related to not only the number of malicious rules that are matched, but also the type of malicious rules that are matched. A corresponding score can be set for each malicious rule according to the hazard degree of the network threat corresponding to the malicious rule, when the hazard degrees of the network threats represented by the malicious rules are the same, the scores obtained by matching the malicious rules are the same, and at the moment, the more the number of the matched malicious rules is, the higher the malicious matching score is; when the degree of harm of the network threat represented by each malicious rule is possibly different, a corresponding score is set for each malicious rule in advance, and at the moment, a malicious matching score needs to be obtained by calculating the sum of the scores of the matched malicious rules. In another embodiment, a weighted sum score of matching malicious rules may optionally be used as the malicious match score. Illustratively, the threat detection result indicates that the number of malicious rules matched by the IOC data to be analyzed is 10, where the preset weight of 7 malicious rules is 5, and the preset weight of the other 3 malicious rules is 8, and then it is determined that the malicious matching score is 5 × 7+8 × 3 — 59.

Because the final malicious matching score is also related to the type of the matched malicious rule, different weights can be set for different types of malicious rules in advance, so that the higher the preset weight of the type of the matched malicious rule is, the higher the malicious matching score is. That is, the malicious match score depends at least on the number of malicious rules matched and the type of malicious rules.

After determining that the IOC data is malicious IOC data, the IOC data can be used for updating preset malicious rules, specifically, the content of the malicious rules and the threat level of the malicious rules can be updated. The updating of the content of the malicious rules comprises the steps of adding new malicious rules and deleting existing useless malicious rules besides the specific content of the updated malicious rules. And updating the threat level of the malicious rule, namely updating the preset weight for the malicious rule. In addition, threat early warning information may be generated based on the malicious IOC data, and the threat early warning information may be sent to a corresponding network element device, for example, the threat early warning information may be sent to a firewall device, an IPS (Intrusion Prevention System), or the like.

The training method of the threat detection model comprises the following steps: obtaining a plurality of sample IOC data with security identifiers, wherein the security identifier of any sample IOC data is a malicious identifier or a benign identifier; performing feature extraction processing on each sample IOC data to obtain at least one threat index feature associated with each sample IOC data; and performing model training based on the security identification and at least one threat index characteristic associated with each sample IOC data to obtain a threat detection model.

The threat detection model in this embodiment may be trained based on an RNN (Recurrent Neural Network) and a CNN (Convolutional Neural Network) in the AI field, by inputting a large amount of sample IOC data, performing cyclic training operations such as model parameter initialization, calculation of a loss function value, model parameter adjustment, and the like based on training samples, stopping training when the loss function value reaches an optimal solution, and the threat detection model obtained by training has the capability of identifying a Network threat based on the IOC data.

After training of the threat detection model is completed, performing threat detection operation based on IOC data by using the trained threat detection model, inputting at least one threat index characteristic associated with the IOC data into the threat detection model, and obtaining a threat detection result aiming at the IOC data, wherein the threat detection result indicates that the IOC data is malicious IOC data or benign IOC data. And after the threat detection result is obtained, automatically auditing or manually auditing the threat detection result, and judging whether the threat detection result belongs to a pair judgment result or a misjudgment result. IOC data detected in daily work of the threat detection model is used as daily added training data, and iterative training aiming at the threat detection model is carried out by combining historical training data to update and optimize the threat detection model, so that the studying and judging reliability of the threat detection model on unknown IOC data is improved.

According to the embodiment, the IOC data of the threat indicators are obtained; performing feature extraction processing on the IOC data to obtain at least one threat index feature associated with the IOC data; carrying out malicious rule matching operation based on at least one threat index characteristic by using a preset threat detection model to obtain a malicious matching score, wherein the malicious matching score is determined according to the number of matched malicious rules and the type of the malicious rules; and determining the IOC data as malicious IOC data under the condition that the malicious matching score is higher than a preset threshold value. By extracting at least one threat index feature associated with the IOC data and utilizing the threat detection model to carry out malicious detection based on the at least one threat index feature, the method is beneficial to realizing batched and automatic network threat detection based on the IOC data, is beneficial to improving the network threat detection efficiency and improving the stability of a network threat detection result, and is also beneficial to reducing the professional requirement of the network threat detection and reducing the network threat detection cost.

Fig. 3 schematically illustrates a schematic diagram of a cyber-threat detection process according to an embodiment of the present disclosure, and as shown in fig. 3, the cyber-threat detection process includes operations S301 to S306, and the specific content of each operation is as follows.

In operation S301: IOC data for multi-source intelligence is obtained.

In operation S302: the IOC data is preprocessed.

In operation S303: and carrying out data enrichment processing on the preprocessed IOC data so as to realize preliminary evaluation on whether the IOC data is malicious IOC data.

In operation S304: and performing feature extraction processing on the IOC data to obtain at least one threat indicator feature associated with the IOC data.

In operation S305: and carrying out network threat detection based on at least one threat index characteristic to obtain a threat detection result.

In operation S306: and constructing an IOC operation library based on the threat detection result aiming at the IOC data.

Optionally, before performing operation S305, operation S305' may be performed to implement vectorization processing on each threat indicator feature associated with the IOC data to obtain at least one threat indicator vector associated with the IOC data, and then operation S305 may be performed to implement network threat detection based on the at least one threat indicator vector.

Fig. 4A schematically illustrates a training and application process of a threat detection model according to an embodiment of the disclosure, and as shown in fig. 4A, multiple sample IOC data are obtained, and feature extraction processing is performed on each sample IOC data to obtain at least one threat indicator feature associated with each sample IOC data, where the threat indicator feature includes information such as malicious samples and URLs associated with the sample IOC data, PDNS resolution of domain names of the sample IOC data, WHOIS registrants, and character composition.

Illustratively, threat indicator characteristics of certain sample IOC data include the following.

The method is characterized in that: correlate to malignancy 5d3 f.;

and (2) feature: associate to URL xxx/fre.php, xxx/data.bin;

and (3) feature: PDNS resolution 1759;

and (4) feature: WHOIS registrar TOM;

and (5) feature: the domain name characters constitute skgkei.

Each sample IOC data has a preset security identifier, the security identifier is a judgment result of an analyst aiming at the sample IOC data, the judgment result comprises black and non-black, the judgment result is that the black corresponds to a malicious identifier, and the judgment result is that the non-black corresponds to a benign identifier. And performing model training based on the security identification and at least one threat index characteristic associated with each sample IOC data to obtain a threat detection model. When the network threat detection is performed by using the threat detection model, the IOC data to be detected is obtained, feature extraction processing is performed on the IOC data to be detected, and at least one threat index feature associated with the IOC data to be detected is obtained, as shown in fig. 4A, the threat index feature of the IOC data to be detected includes the following contents.

The method is characterized in that: correlates to malignant sample d4f 321;

and (2) feature: php to URL xxx/fre;

and (3) feature: PDNS analytical quantity 984;

and (4) feature: who is not registered;

and (5) feature: the domain name characters make up ksdjle.

And inputting the IOC data to be detected into a threat detection model to obtain a threat detection result aiming at the IOC data to be detected, wherein the threat detection result is black and indicates that the IOC data to be detected is malicious IOC data.

Fig. 4B schematically illustrates a schematic diagram of a training and optimizing process of a threat detection model according to an embodiment of the disclosure, and as shown in fig. 4B, IOC data to be detected is input into the trained threat detection model to obtain a threat detection result for the IOC data to be detected, and whether the threat detection result is a positive judgment result or a false judgment result is determined by an automatic auditing or manual auditing manner. And taking IOC data after misjudgment of the threat detection result as daily added training data, and performing iterative training aiming at the threat detection model by combining historical training data to update and optimize the threat detection model.

Fig. 5 schematically illustrates a flowchart of a method for web threat detection using a threat detection model according to an embodiment of the present disclosure, and as shown in fig. 5, the method may include operations S510 to S520.

In operation S510, a vectorization process is performed on each threat indicator feature by using the threat detection model, so as to obtain at least one threat indicator vector associated with the IOC data.

In this embodiment, specifically, the threat index feature associated with the IOC data has a preset feature structure, the feature structure includes content, content length, and content value, the content includes information such as a malicious sample associated with the IOC data, a URL, an IP address, a physical region to which the IP address belongs, a domain name character composition, a PDNS resolution, and WHOIS information of a domain name, the content length of different contents is usually variable, the content value of different contents may be a numerical value, a word, a symbol mark, and the like, and the content values of different forms may be processed as a string object. When the threat index features of the IOC data are extracted, the defined contents are extracted from the IOC data according to the preset feature structure.

And performing vectorization processing aiming at the threat index characteristics by using a threat detection model to convert the threat index characteristics in the form of data structure objects into threat index vectors in the form of vector objects. Optionally, the threat detection model may be used to encode the threat indicator features, so as to convert the text format threat indicator features into a uniform-dimension threat indicator vector. Or, threat index features in the form of data structure objects can be converted into threat index vectors with the same dimension by using model tools such as Word2vec and FastText. The vectorization processing is carried out on the threat index features, so that the extraction efficiency of effective feature information of the IOC data is improved, and the network threat detection efficiency based on the IOC data is improved.

Fig. 6 schematically illustrates a schematic diagram of vectorizing threat indicator features according to an embodiment of the present disclosure, and as shown in fig. 6, vectorizing each threat indicator feature of the IOC data is performed to convert each threat indicator feature into a threat indicator vector. Illustratively, the threat indicator characteristics of the IOC data include the following.

The method is characterized in that: correlate to malignancy 5d3 f.;

and (2) feature: associate to URL xxx/fre.php, xxx/data.bin;

and (3) characteristic: PDNS resolution 1759;

and (4) feature: WHOIS registrar TOM;

and (5) feature: the domain name characters constitute skgkei.

After vectorization processing is performed on each threat index feature, a plurality of threat index vectors of the IOC data are obtained, as shown in fig. 6, the threat index vectors of the IOC data include the following contents.

Vector 1: 1.04;

vector 2: 0.93;

vector 3: 0.23;

vector 4: 0.001;

vector 5: 5.21.

next, in operation S520, a matching operation based on a preset malicious rule is performed according to at least one threat indicator vector.

In this embodiment, specifically, a threat detection model is used to perform a threat detection operation based on at least one threat index vector, specifically, the threat detection model is used to perform operations such as malicious sample analysis, URL analysis, IP address analysis, and domain name analysis, so as to determine a malicious matching score associated with the IOC data based on a preset malicious rule, where the malicious matching score is determined by the number of matched malicious rules and the type of the malicious rule.

When the threat detection model is used for threat detection aiming at a plurality of IOC data, the method further comprises the following steps: calculating the similarity of different IOC data according to at least one threat index characteristic associated with each IOC data by using a threat detection model to obtain and output the similarity score of every two IOC data; and dividing the IOC data with the similarity score higher than a first preset threshold value into the same cluster classification so as to divide the IOC data into at least one cluster classification. And inputting a plurality of IOC data into a threat detection model, outputting a black and white judgment result aiming at each IOC data by the threat detection model, and outputting a similarity score of every two IOC data.

Fig. 7 schematically illustrates a schematic diagram of cluster classification of IOC data according to an embodiment of the present disclosure, and as shown in fig. 7, multiple pieces of IOC data are divided into 4 cluster classifications (different dotted rectangle boxes indicate one cluster classification), where different cluster classifications include different IOC data, and similarity scores of IOC data in the same cluster classification are higher than a preset threshold.

After at least one cluster classification is obtained based on different IOC data partitions, the association operation of similar IOC data can be performed using the at least one cluster classification. Specifically, for any target IOC data to be analyzed, according to at least one threat indicator characteristic associated with the target IOC data, other IOC data with the similarity higher than a second preset threshold value with the target IOC data are determined in at least one cluster classification, wherein the second preset threshold value is not smaller than the first preset threshold value. And constructing a similarity search system aiming at the IOC data by utilizing the similarity scores among different IOC data, wherein the similarity search system comprises data information of the different IOC data divided into at least one cluster classification. When similarity association operation aiming at the target IOC data is carried out, at least one threat index feature associated with the target IOC data, more specifically at least one threat index vector associated with the target IOC data and output by a threat detection model, is input into a similarity search system to associate other similar IOC data.

It should be noted that the second preset threshold is set to be not less than the first preset threshold, so that all or part of the IOC data can be selected from the cluster classification to which the target IOC data belongs to perform the association operation.

When the correlation operation of the similar IOC data is performed, a similarity distance threshold may be preset, and when the similarity distance between any two IOC data is higher than a preset threshold, it is determined that the two IOC data are similar IOC data. The similarity distance between different IOC data can be determined by the similarity between threat indicator vectors of different IOC data, and also can be determined by the distance between threat indicator vectors of different IOC data. The similarity between the threat index vectors of different IOC data can be determined by parameters such as the cosine of an included angle, correlation coefficients and the like, and the distance between the threat index vectors of different IOC data can be realized by calculating the plain distance, the Euclidean distance and the like.

By providing the similarity correlation method for the IOC data, other IOC data with similarity higher than a preset threshold value with the similarity of the IOC data to be detected are correlated by using a vectorization result of threat index characteristics of the IOC data to be detected, and an intuition similarity judgment result similar to a human is obtained, so that the correlation analysis capability for the IOC data is favorably improved, and the network threat detection capability based on the IOC data is favorably improved.

In addition, the IOC data may be classified based on a preset classification rule according to at least one threat indicator feature associated with the IOC data, and the classification result includes one of: trojan, virus and back door program, the back door program is set in the computer system for special user to control the computer system in special mode, and the back door program may include back door of web page, back door of thread insertion, back door of C/S, etc. And according to the vectorization result of the threat index characteristics of the IOC data, dividing the IOC data into sub-classifications under different types of large classes, and performing detailed division on the sub-classifications to realize continuous updating of threat sample data. Illustratively, the malicious IOC data is divided into different threat subcategories of phishing addresses, trojan addresses, article citations, malicious hosts, scanning hosts, spam, exploits, malware, and so on.

Fig. 8 schematically illustrates a schematic diagram of type division of IOC data according to an embodiment of the present disclosure, and as shown in fig. 8, IOC data is divided into 3 major classes: ml _ cls: 0. ml _ cls: 1. ml _ cls: 2, each major class includes 3 sub-classes, e.g., major class ml _ cls: included in 0 is the subclass ml _ cls: 00. ml _ cls: 01. ml _ cls: 02, each subclass includes different IOC data. And accumulating the IOC data in each sub-class, and taking the corresponding sub-class as an independent class when the accumulated IOC data volume is higher than a preset threshold value. The IOC data are divided into multiple levels of types, so that IOC data analysis capability is improved, and IOC data analysis efficiency is improved.

Fig. 9 schematically illustrates a threat detection result diagram according to an embodiment of the disclosure, and as shown in fig. 9, a threat detection result of the threat detection model for IOC data includes a basic type, a standard brand, a malicious score, a related platform, a decision result, a threat type, a malicious score, a confidence level, a malicious analysis, and the like.

Fig. 10 schematically illustrates a block diagram of a cyber-threat detection apparatus according to an embodiment of the present disclosure, and as shown in fig. 10, the apparatus 1000 may include an obtaining module 1001, a first processing module 1002, and a second processing module 1003.

Specifically, the obtaining module 1001 is configured to obtain threat indicator IOC data; a first processing module 1002, configured to perform feature extraction processing on IOC data to obtain at least one threat indicator feature associated with the IOC data; the second processing module 1003 is configured to perform, by using a preset threat detection model, a malicious rule matching operation based on at least one threat indicator feature to obtain a malicious matching score, where the malicious matching score is determined according to the number of matched malicious rules and the type of the malicious rules, and the IOC data is determined to be malicious IOC data when the malicious matching score is higher than a preset threshold.

As an alternative embodiment, the second processing module comprises: the first processing submodule is used for carrying out vectorization processing on each threat index characteristic by using a threat detection model to obtain at least one threat index vector associated with IOC data; and the second processing submodule is used for performing matching operation based on a preset malicious rule according to at least one threat index vector.

As an alternative embodiment, the threat indicator feature comprises any one of: malicious sample association characteristics, Uniform Resource Locator (URL) association characteristics, Internet Protocol (IP) characteristics, domain name character characteristics, domain name resolution characteristics and domain name registrant characteristics.

As an optional embodiment, the apparatus further includes a third processing module, configured to update the preset malicious rule by using the malicious IOC data after the second processing module determines the malicious IOC data, and/or generate threat early warning information based on the malicious IOC data, and send the threat early warning information to the corresponding network element device.

As an optional embodiment, the apparatus further includes a fourth processing module, configured to, when the second processing module performs threat detection on the plurality of IOC data by using the threat detection model, perform similarity calculation on different IOC data according to at least one threat indicator feature associated with each IOC data by using the threat detection model, obtain a similarity score of every two IOC data, and output the similarity score; and dividing the IOC data with the similarity score higher than a first preset threshold value into the same cluster classification so as to divide the IOC data into at least one cluster classification.

As an optional embodiment, the apparatus further includes a fifth processing module, configured to, for any target IOC data to be analyzed, determine, in at least one cluster classification, other IOC data whose similarity to the target IOC data is higher than a second preset threshold according to at least one threat indicator feature associated with the target IOC data, where the second preset threshold is not less than the first preset threshold.

As an optional embodiment, the apparatus further includes a sixth processing module, configured to classify, according to at least one threat indicator feature associated with the IOC data, the IOC data based on a preset classification rule, where a classification result includes one of: trojan, virus, and back door program.

As an alternative embodiment, the source of IOC data includes at least one of: enterprise self-production information, third party aggregated information and cloud sharing information.

As an alternative embodiment, the apparatus further includes a seventh processing module for training the threat detection model, including: the acquisition submodule is used for acquiring a plurality of sample IOC data with security identifications, wherein the security identification of any sample IOC data is a malicious identification or a benign identification; the third processing submodule is used for performing feature extraction processing on each sample IOC data to obtain at least one threat index feature associated with each sample IOC data; and the fourth processing submodule is used for carrying out model training based on the security identification and at least one threat index characteristic associated with each sample IOC data to obtain a threat detection model.

It should be noted that, in the embodiment of the present disclosure, the embodiment of the apparatus portion is similar to the embodiment of the method portion, and the achieved technical effects are also similar, which are not described herein again.

Any of the modules according to embodiments of the present disclosure, or at least part of the functionality of any of them, may be implemented in one module. Any one or more of the modules according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or in any other reasonable manner of hardware or firmware by which a circuit is integrated or packaged, or in any one of three implementations, or in any suitable combination of any of the several. Or one or more of the modules according to embodiments of the disclosure, may be implemented at least partly as computer program modules which, when executed, may perform corresponding functions.

For example, any plurality of the obtaining module 1001, the first processing module 1002 and the second processing module 1003 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the obtaining module 1001, the first processing module 1002, and the second processing module 1003 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or in a suitable combination of any several of them. Alternatively, at least one of the obtaining module 1001, the first processing module 1002 and the second processing module 1003 may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.

Fig. 11 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure. The electronic device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 11, electronic device 1100 includes a processor 1110, a computer-readable storage medium 1120. The electronic device 1100 may perform a method according to an embodiment of the disclosure.

In particular, processor 1110 may include, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 1110 may also include onboard memory for caching purposes. Processor 1110 may be a single processing module or a plurality of processing modules for performing the different actions of the method flows according to embodiments of the present disclosure.

Computer-readable storage medium 1120, for example, may be a non-volatile computer-readable storage medium, specific examples including, but not limited to: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and so on.

The computer-readable storage medium 1120 may include a computer program 1121, and the computer program 1121 may include code/computer-executable instructions that, when executed by the processor 1110, cause the processor 1110 to perform a method according to an embodiment of the present disclosure, or any variation thereof.

The computer programs 1121 can be configured to have, for example, computer program code including computer program modules. For example, in an example embodiment, code in computer program 1121 may include one or more program modules, including, for example, 1121A, module 1121B. It should be noted that the division and number of modules are not fixed, and those skilled in the art may use suitable program modules or program module combinations according to actual situations, when the program modules are executed by the processor 1110, the processor 1110 may execute the method according to the embodiment of the present disclosure or any variation thereof.

According to an embodiment of the present disclosure, at least one of the obtaining module 1001, the first processing module 1002 and the second processing module 1003 may be implemented as a computer program module described with reference to fig. 11, which, when executed by the processor 1110, may implement the respective operations described above.

The present disclosure also provides a computer-readable storage medium, which may be embodied in the device/apparatus/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.

The present disclosure also provides a computer program product comprising a computer program containing program code for performing the method provided by the embodiments of the present disclosure, when the computer program product is run on an electronic device, the program code is configured to cause the electronic device to implement the cyber-threat detection method provided by the embodiments of the present disclosure.

In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages, and in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming language includes, but is not limited to, programming languages such as Java, C + +, python, the "C" language, or the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

It will be understood by those skilled in the art that while the present disclosure has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the appended claims and their equivalents. Accordingly, the scope of the present disclosure should not be limited to the above-described embodiments, but should be defined not only by the appended claims, but also by equivalents thereof.

Claims

1. A cyber-threat detection method, comprising:

obtaining threat indicator IOC data;

performing feature extraction processing on the IOC data to obtain at least one threat indicator feature associated with the IOC data;

performing malicious rule matching operation based on the at least one threat index characteristic by using a preset threat detection model to obtain a malicious matching score, wherein the malicious matching score is determined according to the number of matched malicious rules and the type of the malicious rules; and (c) a second step of,

and under the condition that the malicious matching score is higher than a preset threshold value, determining that the IOC data are malicious IOC data.

2. The method according to claim 1, wherein the performing malicious rule matching operation based on the at least one threat indicator feature by using a preset threat detection model comprises:

vectorizing each threat index feature by using the threat detection model to obtain at least one threat index vector associated with the IOC data;

and performing matching operation based on a preset malicious rule according to the at least one threat index vector.

3. A method according to claim 1 or 2, wherein the threat indicator signature comprises any at least one of:

malicious sample association characteristics, Uniform Resource Locator (URL) association characteristics, Internet Protocol (IP) characteristics, domain name character characteristics, domain name resolution characteristics and domain name registrant characteristics.

4. The method of claim 1 or 2, wherein after determining the malicious IOC data, further comprising:

updating preset malicious rules by using the malicious IOC data, and/or

And generating threat early warning information based on the malicious IOC data, and sending the threat early warning information to corresponding network element equipment.

5. The method of claim 1, wherein when utilizing the threat detection model for threat detection for a plurality of IOC data, further comprising:

calculating the similarity of different IOC data according to at least one threat index characteristic associated with each IOC data by using the threat detection model to obtain the similarity score of every two IOC data and output the similarity score;

and dividing the IOC data with the similarity score higher than a first preset threshold into the same cluster classification so as to realize the division of the IOC data into at least one cluster classification.

6. The method of claim 5, further comprising:

for any target IOC data to be analyzed, determining other IOC data with similarity higher than a second preset threshold in the at least one cluster classification according to at least one threat index characteristic associated with the target IOC data, wherein the second preset threshold is not less than the first preset threshold.

7. The method of claim 1, further comprising:

classifying the IOC data based on a preset classification rule according to at least one threat indicator characteristic associated with the IOC data, wherein the classification result comprises one of the following: trojan, virus, and backdoor program.

8. The method of claim 1, wherein the source of the IOC data comprises at least one of:

the system comprises white information of enterprises, third party aggregated information and cloud shared information.

9. The method of claim 1, wherein the training of the threat detection model comprises:

obtaining a plurality of sample IOC data with security identifiers, wherein the security identifier of any sample IOC data is a malicious identifier or a benign identifier;

performing feature extraction processing on each sample IOC data to obtain at least one threat index feature associated with each sample IOC data;

performing model training based on the security identification and the at least one threat indicator feature associated with each of the sample IOC data to obtain the threat detection model.

10. A cyber-threat detection apparatus comprising:

the obtaining module is used for obtaining threat indicator IOC data;

the first processing module is used for carrying out feature extraction processing on the IOC data to obtain at least one threat index feature related to the IOC data;

the second processing module is used for performing malicious rule matching operation based on the at least one threat index characteristic by using a preset threat detection model to obtain a malicious matching score, wherein the malicious matching score is determined according to the number of matched malicious rules and the type of the malicious rules; and the number of the first and second groups,

11. An electronic device, comprising:

one or more processors, and memory for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-9.

12. A computer-readable storage medium storing computer-executable instructions for implementing the method of any one of claims 1 to 9 when executed.

13. A computer program product comprising computer readable instructions, wherein the computer readable instructions, when executed, are for performing the method of any of claims 1 to 9.