CN115001791A - Attack resource marking method and device - Google Patents

Info

Publication number
CN115001791A
Authority
CN
China
Prior art keywords
attack
entity set
resource
entities
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210592354.4A
Other languages
Chinese (zh)
Other versions
CN115001791B (en)
Inventor
鲍青波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd
Priority to CN202210592354.4A
Publication of CN115001791A
Application granted
Publication of CN115001791B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides an attack resource marking method and relates to the field of big data and artificial intelligence. The method comprises: acquiring a manually marked attack entity set; performing semi-supervised learning training on the manually marked attack entity set and the current unmarked attack entity set according to a graph neural network algorithm to obtain a recommended attack entity set; obtaining a current marked attack entity set from the manually marked attack entity set and the recommended attack entity set; performing semi-supervised training on the attack resource marking prediction model according to the current marked attack entity set and the current unmarked attack entity set until a preset condition is met, and taking the attack resource marking prediction model that meets the preset condition as the target attack resource marking prediction model; and outputting the marking results of all attack entities according to the target attack resource marking prediction model. The method improves the efficiency of automatically marking attack resources.

Description

Attack resource marking method and device
Technical Field
The present disclosure relates to the field of big data and artificial intelligence technologies, and in particular, to an attack resource tagging method and apparatus, an electronic device, and a readable storage medium.
Background
Attack organizations such as APT (Advanced Persistent Threat) groups constantly devise new attack methods. They conceal and disguise their attacks by technical means to evade conventional detection, and attacks that target critical infrastructure or steal important industry data are especially harmful.
Attacks on large or critical targets often have the characteristics of group attacks. Whether the group attack is an APT attack, a ransomware attack, or another type, analyzing the group behavior requires a full correlation analysis of the attacker's infrastructure, attack samples, IPs, domain names, malicious URLs, and the like, in order to discover similarities or co-occurrences in the attack patterns. These correlation characteristics are then analyzed so that multiple alarm events can be aggregated and annotated to determine the group attack behavior.
In the prior art, at least one feature of a target IP is first obtained, a feature set of the target IP is generated, and the feature set is input into a pre-trained target identification model to identify whether the target IP is an attack IP. However, when an unknown attack organization is analyzed, its characteristics are unknown, so common automatic analysis means such as similarity features and family features cannot be applied, and the attack IP cannot be identified by means of the prior art.
In summary, how to improve the efficiency of automatically labeling attack resources in a human-computer interaction mode is a problem that currently needs to be solved.
Disclosure of Invention
In order to solve the technical problem or at least partially solve the technical problem, the present disclosure provides an attack resource tagging method, which solves the problem of low efficiency of tagging attack resources.
In order to achieve the above object, the embodiments of the present disclosure provide the following technical solutions:
in a first aspect, an embodiment of the present disclosure provides an attack resource labeling method, where the method includes:
acquiring an artificial mark attack entity set;
performing semi-supervised learning training on the artificially marked attack entity set and the current unmarked attack entity set according to a graph neural network algorithm to obtain a recommended attack entity set;
acquiring a current marked attack entity set according to the artificial marked attack entity set and the recommended attack entity set;
performing semi-supervised training on the attack resource marking prediction model according to the current marked attack entity set and the current unmarked attack entity set until a preset condition is met, and taking the attack resource marking prediction model meeting the preset condition as a target attack resource marking prediction model;
and outputting the labeling results of all attack entities according to the target attack resource labeling prediction model.
As an optional implementation manner of the embodiment of the present disclosure, before acquiring the artificially marked attack entity set, the method further includes:
constructing an attack resource association map; the attack entities of the attack resource association map comprise at least one of: an Internet Protocol (IP) address, a domain name, a unique identifier of a malicious sample file, and a uniform resource locator (URL).
As an optional implementation manner of the embodiment of the present disclosure, the acquiring a set of artificially marked attack entities includes:
marking two label types for at least two attack entities in the attack resource association map to obtain a manual marking attack entity set;
wherein the tag type of the attack entity comprises: malicious attacking entities and non-malicious attacking entities.
As an optional implementation manner of the embodiment of the present disclosure, the performing semi-supervised learning training on the artificially labeled attack entity set and the current unmarked attack entity set according to a graph neural network algorithm to obtain a recommended attack entity set includes:
performing semi-supervised learning training on the artificially marked attack entity set and the current unmarked attack entity set according to a graph neural network algorithm to generate an attack resource marking prediction model;
acquiring the label type of the current unmarked entity and the probability corresponding to the label type of the current unmarked entity according to the attack resource marking prediction model;
and sequencing according to the probability of the label type of the current unmarked entity set to obtain a recommended attack entity set.
As an optional implementation manner of the embodiment of the present disclosure, the performing ranking according to the probability of the tag type of the currently unlabeled entity to obtain a recommended attack entity set includes:
and sequentially sequencing from large to small according to the probability corresponding to the malicious label type of the current unmarked entity to obtain a recommended attack entity set.
As an optional implementation manner of the embodiment of the present disclosure, the method further includes:
and training the attack resource labeling prediction model according to the incidence relation among various attack entities and the attribute information of the various attack entities.
As an optional implementation manner of the embodiment of the present disclosure, the attribute information of the attack entity includes:
geographic information of the IP, C-segment information of the IP, sub-domain names and domain name information.
In a second aspect, an embodiment of the present disclosure provides an attack resource labeling apparatus, including:
the acquisition module is used for acquiring an artificial mark attack entity set;
the recommending module is used for performing semi-supervised learning training on the artificial marked attack entity set and the current unmarked attack entity set according to a graph neural network algorithm to obtain a recommended attack entity set;
the determining module is used for acquiring the current marked attack entity set according to the artificial marked attack entity set and the recommended attack entity set;
the analysis module is used for carrying out semi-supervised training on the attack resource marking prediction model according to the current marked attack entity set and the current unmarked attack entity set until a preset condition is met, and taking the attack resource marking prediction model meeting the preset condition as a target attack resource marking prediction model;
and the output module is used for outputting the labeling results of all attack entities according to the target attack resource labeling prediction model.
As an optional implementation manner of the embodiment of the present disclosure, the apparatus further includes: a building module, the building module specifically configured to:
constructing an attack resource association map; the attack entities of the attack resource association map comprise at least one of: an Internet Protocol (IP) address, a domain name, a unique identifier of a malicious sample file, and a uniform resource locator (URL).
As an optional implementation manner of the embodiment of the present disclosure, the obtaining module is specifically configured to:
marking two label types for at least two attack entities in the attack resource association map to obtain a manual marking attack entity set;
wherein the tag type of the attack entity comprises: malicious attacking entities and non-malicious attacking entities.
As an optional implementation manner of the embodiment of the present disclosure, the recommendation module specifically includes:
the generating unit is used for carrying out semi-supervised learning training on the artificial marked attack entity set and the current unmarked attack entity set according to a graph neural network algorithm to generate an attack resource marking prediction model;
the probability unit is used for acquiring the label type of the current unmarked entity and the probability corresponding to the label type of the current unmarked entity according to the attack resource marking prediction model;
and the sequencing unit is used for sequencing according to the probability of the label type of the current unmarked entity set to obtain the recommended attack entity set.
As an optional implementation manner of the embodiment of the present disclosure, the sorting unit is specifically configured to:
and sequentially sequencing from large to small according to the probability corresponding to the malicious label type of the current unmarked entity to obtain a recommended attack entity set.
As an optional implementation manner of the embodiment of the present disclosure, the apparatus further includes a training module, where the training module is configured to:
and training the attack resource labeling prediction model according to the incidence relation among various attack entities and the attribute information of the various attack entities.
As an optional implementation manner of the embodiment of the present disclosure, the attribute information of the attack entity includes: geographic information of the IP, C-segment information of the IP, sub-domain names and domain name information.
In a third aspect, an embodiment of the present disclosure provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the attack resource tagging method according to the first aspect or any implementation manner of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the attack resource tagging method described in the first aspect or any implementation manner of the first aspect.
The attack resource marking method comprises the steps of firstly obtaining an artificial mark attack entity set, then carrying out semi-supervised learning training on the artificial mark entity set and a current unmarked attack entity set according to a graph neural network algorithm to obtain a recommended attack entity set, then obtaining the current mark attack entity set according to the artificial mark entity set and the recommended attack entity set, then carrying out semi-supervised training on an attack resource marking model according to the current marked attack entity set and the current unmarked attack entity set until a preset condition is met, taking an attack resource marking prediction model meeting the preset condition as a target attack resource marking prediction model, and finally outputting marking results of all attack entities according to the target attack resource marking prediction model. According to the method, only a small amount of manually marked attack entities are utilized, and based on a semi-supervised machine learning method, through repeated iteration, man-machine cooperative analysis and marking are performed on other unmarked attack entities, so that the advantages of manual analysis and automatic analysis can be combined in the iteration process of obtaining the marking result, the efficiency of marking attack resources is improved, and the accuracy of the marking result is also improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present disclosure, the drawings used in the embodiments or technical solutions in the prior art description will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a flowchart illustrating an attack resource tagging method according to an embodiment;
FIG. 2 is a diagram of an attack resource association graph of an attack resource tagging method in an embodiment;
FIG. 3 is a flowchart illustrating an attack resource tagging method in another embodiment;
FIG. 4 is a schematic structural diagram of an attack resource tagging apparatus in an embodiment;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
Relational terms such as "first" and "second," and the like, may be used throughout the description and claims of the present disclosure to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
In the embodiments of the present disclosure, the words "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described as "exemplary" or "e.g.," in an embodiment of the present disclosure is not to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion. Further, in the description of the embodiments of the present disclosure, the meaning of "a plurality" means two or more unless otherwise specified.
The overall concept of the embodiment of the disclosure is as follows: when the man-machine interaction analysis is carried out on the attack resource association graph, a small number of manually labeled entities and labels are combined, automatic learning and labeling are carried out on other unlabeled attack resource entities based on a semi-supervised machine learning method, and the efficiency of carrying out automatic labeling on the attack resources is improved.
In one embodiment, as shown in fig. 1, a method for tagging attack resources is provided, which includes the following steps:
S11, acquiring the manually marked attack entity set.
Specifically, before the manually marked attack entity set is acquired, an attack resource association map is constructed. For the attack resources used in a group attack, the goal of the analysis is to label the relevant attack resource entities as "malicious"; the analysis is complete when all attack resource entities of the group attack behavior that warrant the "malicious" label have been found. However, when the attack resource association graph is first formed, no entity on the graph carries a "malicious" or "non-malicious" label, so manual analysis is needed to judge the entities and apply labels.
In some embodiments, the implementation of step S11 (obtaining the set of artificially tagged attack entities) may include:
and marking two label types for at least two attack entities in the attack resource association map to obtain a manual marking attack entity set.
Wherein, the label type of the attack entity comprises: malicious attacking entities and non-malicious attacking entities. For example, 1 represents "malicious", and 0 represents "non-malicious".
Specifically, assuming that there are N attack entities on the attack resource association map, 2 or more of them are randomly selected and manually analyzed. The manual labeling process is complete once the selected entities have been labeled and both the "1" and "0" labels have been applied.
Illustratively, taking the association graph shown in FIG. 2 as an example, there are 7 attack resource entities and their corresponding association relations, and two of the entities have been labeled manually: domain1 is labeled "1", which represents a "malicious attack entity", and IP1 is labeled "0", which represents a "non-malicious attack entity".
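The labeled and unlabeled sets in this example can be sketched as follows. This is a minimal illustration, not the patented implementation: only domain1 and IP1 and their labels come from the text, while the other five entity names and the helper functions `split_entities` and `has_both_label_types` are assumptions made for the sketch.

```python
# Label encoding from the text: 1 = malicious, 0 = non-malicious.
MALICIOUS, NON_MALICIOUS = 1, 0

# Seven attack entities; names other than domain1 and IP1 are assumed.
entities = ["domain1", "domain2", "IP1", "IP2", "IP3", "URL1", "URL2"]

# Manual labels stated in the text for the FIG. 2 example.
manual_labels = {"domain1": MALICIOUS, "IP1": NON_MALICIOUS}


def split_entities(all_entities, labels):
    """Return (labelled, unlabelled) entity lists, preserving input order."""
    labelled = [e for e in all_entities if e in labels]
    unlabelled = [e for e in all_entities if e not in labels]
    return labelled, unlabelled


def has_both_label_types(labels):
    """Check that both label types appear among the manual labels."""
    return {MALICIOUS, NON_MALICIOUS} <= set(labels.values())


labelled, unlabelled = split_entities(entities, manual_labels)
print(labelled, unlabelled, has_both_label_types(manual_labels))
```

The five entities left in `unlabelled` form the current unmarked attack entity set that the later steps operate on.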
S12, performing semi-supervised learning training on the manually marked attack entity set and the current unmarked attack entity set according to a graph neural network algorithm to obtain a recommended attack entity set.
Specifically, semi-supervised learning uses a large number of unmarked attack entities together with the marked attack entities to classify the currently unmarked attack entities by label type. That is, in the embodiment of the present disclosure, an attack resource marking prediction model is constructed and jointly trained on the marked and unmarked attack entities, and recommendations are made according to the classification results for the unmarked attack entities. For example, for the attack entities classified as "malicious", the attribute information of each attack entity is used to derive a corresponding probability value, and the entities are recommended in order of probability value to obtain the recommended attack entity set.
An attack resource association graph usually contains many attack entities, often hundreds, thousands, or more, so labeling all of them manually is impractical and very inefficient. Using the small number of manually labeled entities from the previous step to drive automatic learning and labeling therefore improves the efficiency of attack entity labeling.
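One common way to train a node classifier in a semi-supervised setting is to compute the loss only over the labeled nodes while the model still sees the whole graph. The patent does not state which loss function is used, so the sketch below is an assumption; `masked_cross_entropy` and the sample probabilities are hypothetical.

```python
import math


def masked_cross_entropy(probs, labels):
    """Mean binary cross-entropy over labelled nodes only.

    probs  : dict node -> predicted probability of the "malicious" class
    labels : dict node -> 1 (malicious) or 0 (non-malicious); unlabelled
             nodes are absent and contribute nothing to the loss.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for node, y in labels.items():
        p = min(max(probs[node], eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Hypothetical predictions: IP2 is unlabelled, so it does not affect the loss.
probs = {"domain1": 0.9, "IP1": 0.2, "IP2": 0.5}
labels = {"domain1": 1, "IP1": 0}
loss = masked_cross_entropy(probs, labels)
print(round(loss, 4))
```

Because unlabeled nodes are excluded from the loss but still take part in message passing over the graph, their features and connections can influence the predictions for every node.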
S13, obtaining the current marked attack entity set according to the artificial marked attack entity set and the recommended attack entity set.
Specifically, for example, the manually marked attack entity set includes 2 attack entities, and the recommended attack entity set includes 5 attack entities, so that the current marked attack entity set includes 7 attack entities.
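The arithmetic of this step is a plain set union. The following minimal sketch assumes that each set is stored as a mapping from entity name to label and that a manual label takes precedence if an entity appears in both sets; the function name `merge_marked` and the labels given to the recommended entities are hypothetical.

```python
def merge_marked(manual, recommended):
    """Combine manual labels with expert-confirmed recommendations.

    Both arguments map entity name -> label (1 malicious, 0 non-malicious).
    Manual labels win if an entity appears in both (an assumption).
    """
    merged = dict(recommended)
    merged.update(manual)
    return merged


manual = {"domain1": 1, "IP1": 0}
# Hypothetical confirmed labels for five recommended entities.
recommended = {"IP2": 1, "URL2": 1, "IP3": 1, "domain2": 0, "URL1": 0}
current_marked = merge_marked(manual, recommended)
print(len(current_marked))
```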
S14, performing semi-supervised training on the attack resource labeling prediction model according to the current marked attack entity set and the current unmarked attack entity set until a preset condition is met, and taking the attack resource labeling prediction model meeting the preset condition as a target attack resource labeling prediction model.
In some embodiments, assume that 2 attack entities are manually labeled, one as "malicious" and the other as "non-malicious". Semi-supervised learning training is then performed with the graph neural network on the labeled and unlabeled attack entities to classify the remaining unlabeled entities, and 5 recommended attack entities are obtained by ranking the unlabeled entities by their probability of being "malicious". These 5 attack entities already carry predicted labels, and an expert analyzes them to confirm whether each label is correct. If the labels are correct, the 7 marked attack entities are used to predict the remaining attack entities again; in theory, the more labeled data is input, the more accurate the classification result. Another 5 recommended attack entities are then obtained by probability ranking and manually confirmed, after which the 12 marked attack entities are used to predict the remaining attack entities, and so on, until the expert judges that further iteration would bring no meaningful improvement. At that point the prediction model can be regarded as trained to its best effect, and the iteration stops. In addition, if the prediction for an attack entity in the recommended attack entity set is inaccurate, the expert may manually correct it.
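The iteration described above (2, then 7, then 12 marked entities) can be sketched as a human-in-the-loop loop. This is only an illustration under stated assumptions: `predict` stands in for retraining and scoring with the graph neural network, `confirm` stands in for the expert's review, and the 0.5 threshold, the `max_rounds` cap, and the toy data are all hypothetical. The patent leaves the stopping decision to the expert.

```python
def iterate_labelling(entities, manual, predict, confirm, batch=5, max_rounds=10):
    """Grow the marked set by `batch` expert-confirmed entities per round."""
    marked = dict(manual)
    sizes = [len(marked)]  # size of the marked set after each round
    for _ in range(max_rounds):
        unmarked = [e for e in entities if e not in marked]
        if not unmarked:
            break
        # Stand-in for retraining the model and scoring unmarked entities.
        scores = predict(marked, unmarked)  # entity -> P(malicious)
        ranked = sorted(unmarked, key=lambda e: scores[e], reverse=True)
        for entity in ranked[:batch]:
            predicted = 1 if scores[entity] >= 0.5 else 0  # assumed threshold
            # The expert confirms the predicted label or corrects it.
            marked[entity] = confirm(entity, predicted)
        sizes.append(len(marked))
    return marked, sizes


# Toy ground truth for 12 hypothetical entities: even indices are malicious.
truth = {f"e{i}": 1 if i % 2 == 0 else 0 for i in range(12)}


def toy_predict(marked, unmarked):
    return {e: 0.9 if truth[e] else 0.1 for e in unmarked}


def toy_confirm(entity, predicted):
    return truth[entity]


final, sizes = iterate_labelling(list(truth), {"e0": 1, "e1": 0},
                                 toy_predict, toy_confirm)
print(sizes)
```

In a real deployment the loop would end when the expert decides that further rounds are no longer worthwhile, rather than when a fixed round limit is reached.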
S15, outputting the labeling results of all attack entities according to the target attack resource labeling prediction model.
Specifically, over several rounds of interactive iteration, the attack entities confirmed by manual analysis become progressively clearer and a final analysis result takes shape; that is, the target attack resource labeling prediction model is formed. The labeling results of all attack resources are output and then fed back or stored, and the analysis stage ends.
The attack resource labeling method provided by the embodiment of the disclosure includes the steps of firstly obtaining an artificial mark attack entity set, then performing semi-supervised learning training on the artificial mark entity set and a current unmarked attack entity set according to a graph neural network algorithm to obtain a recommended attack entity set, then obtaining the current mark attack entity set according to the artificial mark entity set and the recommended attack entity set, then performing semi-supervised training on an attack resource labeling model according to the current marked attack entity set and the current unmarked attack entity set until preset conditions are met, taking an attack resource labeling prediction model meeting the preset conditions as a target attack resource labeling prediction model, and finally outputting labeling results of all attack entities according to the target attack resource labeling prediction model. According to the method, only a small amount of manually marked attack entities are utilized, and based on a semi-supervised machine learning method, through repeated iteration, man-machine cooperative analysis and marking are performed on other unmarked attack entities, so that the advantages of manual analysis and automatic analysis can be combined in the iteration process of obtaining the marking result, the efficiency of marking attack resources is improved, and the accuracy of the marking result is also improved.
In some embodiments, referring to fig. 3, on the basis of fig. 1, before performing step S11 (acquiring the artificial mark attack entity set), the following steps may also be performed:
S10, constructing an attack resource association graph.
The attack entities of the attack resource association map comprise at least one of: an Internet Protocol (IP) address, a domain name, a unique identifier of a malicious sample file, and a uniform resource locator (URL).
Specifically, when a full correlation analysis of attack resource entities is carried out, the association relationships are usually displayed as a graph on which the attack resource entities involved in the group attack behavior are shown. The graph is usually built from event alarm data reported by probes and is combined with external intelligence or knowledge data to associate further attack resource entities.
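A minimal sketch of such an association graph, assuming the alarm and intelligence data have already been reduced to (entity, relation, entity) triples. The triple format, the relation names, and the function `build_association_graph` are assumptions for illustration; the patent does not specify a storage format.

```python
from collections import defaultdict


def build_association_graph(relations):
    """Build an undirected adjacency map from (entity, relation, entity) triples."""
    graph = defaultdict(set)
    for src, relation, dst in relations:
        graph[src].add((relation, dst))
        graph[dst].add((relation, src))
    return dict(graph)


# Hypothetical relations between attack resource entities.
relations = [
    ("domain1", "resolves_to", "IP1"),
    ("domain1", "contains", "URL1"),
    ("IP1", "same_c_segment", "IP2"),
]
graph = build_association_graph(relations)
print(sorted(graph))
```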
In addition, the unique identifier of a malicious sample file may be an MD5 (Message-Digest Algorithm 5) value. MD5 is a hash function widely used in the field of computer security to provide integrity protection for messages. The MD5 value can be understood as the ID of a file and is treated as unique to that file. If the file is modified, for example by an embedded virus or Trojan horse, its MD5 value changes. The file may be a video file, an audio file, an image sequence frame file, or any other type of file.
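The behavior described here can be checked with Python's standard `hashlib` module. The helper name `md5_of_bytes` and the sample payloads are assumptions made for the illustration.

```python
import hashlib


def md5_of_bytes(data: bytes) -> str:
    """Return the MD5 digest of `data` as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical file contents before and after tampering.
original = b"sample payload"
modified = b"sample payload with injected code"

print(md5_of_bytes(original))
print(md5_of_bytes(modified))
```

The two digests differ, which is what lets the MD5 value serve as an identifier for one specific version of a sample file.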
In some embodiments, the implementation manner of step S12 (performing semi-supervised learning training on the manually labeled attack entity set and the current unlabeled attack entity set according to the graph neural network algorithm to obtain the recommended attack entity set) may include:
a. Performing semi-supervised learning training on the manually labeled attack entity set and the current unlabeled attack entity set according to a graph neural network algorithm to generate an attack resource labeling prediction model.
b. Acquiring, according to the attack resource labeling prediction model, the label type of each current unlabeled entity and the probability corresponding to that label type.
c. Sorting according to the probabilities of the label types of the current unlabeled entity set to obtain the recommended attack entity set.
Wherein the recommended attack entity set comprises a preset number of recommended attack resource entities.
In some embodiments, the implementation manner of step c (sorting according to the probabilities of the label types of the current unlabeled entity set to obtain the recommended attack entity set) may include:
sorting the current unlabeled entities in descending order of the probability corresponding to the malicious label type to obtain the recommended attack entity set.
Illustratively, the fully-connected layer outputs, through a function transformation, predicted probability values for a plurality of attack entities, for example: IP1 is a malicious attack entity with probability 90%; domain1, 95%; domain2, 40%; IP2, 88%; IP3, 80%; URL1, 10%; URL2, 85%. Sorted in descending order of probability, the result is domain1 > IP1 > IP2 > URL2 > IP3 > domain2 > URL1. Therefore, assuming that the preset number is 5, the recommended attack entity set includes 5 recommended attack entities: domain1, IP1, IP2, URL2, and IP3.
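The ranking in this example can be reproduced with a short sketch. The function name `recommend_entities` and the dictionary layout are assumptions made for illustration; only the entity names, probabilities, and the preset number of 5 come from the example above.

```python
def recommend_entities(malicious_probability, preset_number):
    """Return the `preset_number` entities with the highest malicious probability."""
    ranked = sorted(
        malicious_probability.items(),
        key=lambda item: item[1],
        reverse=True,  # descending order of probability
    )
    return [entity for entity, _ in ranked[:preset_number]]


# Predicted probability that each unlabeled entity is malicious.
predictions = {
    "IP1": 0.90,
    "domain1": 0.95,
    "domain2": 0.40,
    "IP2": 0.88,
    "IP3": 0.80,
    "URL1": 0.10,
    "URL2": 0.85,
}

recommended = recommend_entities(predictions, preset_number=5)
# recommended == ["domain1", "IP1", "IP2", "URL2", "IP3"]
```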
In some embodiments, the following step may also be performed:
training the attack resource labeling prediction model according to the association relationships among the various attack entities and the attribute information of the various attack entities.
Specifically, in the embodiment of the present disclosure, a graph convolutional network (GCN), which is a graph neural network algorithm, is used for learning and labeling. The GCN is a first-order local approximation of spectral graph convolution. It is a multilayer graph convolutional neural network in which each convolution layer processes only first-order neighborhood information, and multi-order neighborhood information can be propagated by stacking several convolution layers. The attack resource association map comprises a plurality of attack resource entities and the association relationships between them, such as the DNS resolution relationship between an IP and a domain, the inclusion relationship between a domain and a URL, and the same-C-segment relationship between two IPs. By running the GCN model with the corresponding embedding layers, the attack entities can be classified; the attribute information used comprises the attack resource type attribute of each node and its topological features on the association map.
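To make the first-order neighborhood behavior concrete, the following is a minimal pure-Python sketch of one graph convolution layer in the commonly used form H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W). It is not the model of the disclosure: the four-node map, the one-hot type features, and the identity weight matrix are assumptions chosen so that the propagation is easy to follow, whereas a trained model would learn its weights.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Return D^(-1/2) (A + I) D^(-1/2) as a dense list-of-lists matrix."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0  # self-loop, so a node keeps its own features
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0  # association relationships treated as undirected
    degree = [sum(row) for row in adj]
    return [
        [adj[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(left, right):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*right)] for row in left]


def gcn_layer(a_hat, features, weights):
    """One graph convolution: aggregate neighbors, transform, apply ReLU."""
    transformed = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, value) for value in row] for row in transformed]


# Hypothetical toy map: 0 = IP1, 1 = domain1, 2 = URL1, 3 = IP2.
edges = [
    (0, 1),  # DNS resolution: IP1 - domain1
    (1, 2),  # inclusion: domain1 - URL1
    (0, 3),  # same C segment: IP1 - IP2
]
a_hat = normalized_adjacency(4, edges)

# One-hot node type features: [is_ip, is_domain, is_url].
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

h1 = gcn_layer(a_hat, features, identity)  # one layer: first-order neighbors only
h2 = gcn_layer(a_hat, h1, identity)        # two layers: second-order neighbors
```

After one layer, URL1 (node 2) has no contribution in the `is_ip` dimension because no IP is directly adjacent to it. After two layers that dimension becomes positive, since IP1 is two hops away through domain1. This is the stacking effect described above.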
In some embodiments, the attribute information of the attacking entity may include:
geographic information of the IP, C-segment information of the IP, sub-domain names and domain name information.
Wherein the geographic information of the IP includes country, region, city, zip code, latitude, longitude, etc. C-segment information of the IP: for example, for the IP 199.87.232.11, 199 is the A-segment number, 87 is the B-segment number, 232 is the C-segment number, and 11 is the D-segment number. The sub-domain name identifies the electronic location of a computer during data transmission. For example, baidu.com is a domain name, www.baidu.com is a sub-domain name corresponding to that domain name, and www is the corresponding host header. The domain name information is obtained through a transmission protocol used to query the IP, owner, and other information of a domain name. The domain name information for different domain name suffixes needs to be queried in different domain name databases, and the domain name information of each domain name or IP is stored by the corresponding management organization. In short, the domain name information comes from a database used to query whether a domain name has been registered and the details of a registered domain name.
Specifically, the geographic information of the IP, the C-segment information of the IP, the sub-domain name, and the domain name information participate in the deep learning process through dedicated embedding layers.
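Two of these attributes can be derived directly from the entity string. The sketch below shows one possible way to extract the C segment of an IPv4 address and the host header of a sub-domain name. The helper names are hypothetical, and treating baidu.com as the registered domain of www.baidu.com is an assumption of the example.

```python
import ipaddress


def c_segment(ip):
    """Return the first three octets ("A.B.C") of an IPv4 address.

    ipaddress.IPv4Address raises ValueError for malformed input.
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(octets[:3])


def same_c_segment(ip_a, ip_b):
    """True when two IPv4 addresses share the same A, B and C segments."""
    return c_segment(ip_a) == c_segment(ip_b)


def host_header(sub_domain, domain):
    """Return the part of `sub_domain` in front of `domain`, or None if unrelated."""
    suffix = "." + domain
    if sub_domain.endswith(suffix):
        return sub_domain[: -len(suffix)]
    return None


segment = c_segment("199.87.232.11")                           # "199.87.232"
related = same_c_segment("199.87.232.11", "199.87.232.200")    # True
header = host_header("www.baidu.com", "baidu.com")             # "www"
```

A same-C-segment check like `same_c_segment` is one way the IP-to-IP edges of the association map could be generated, while `segment` and `header` could serve as categorical inputs to an embedding layer.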
The attack resource labeling method provided by the embodiment of the present disclosure first obtains a manually labeled attack entity set. It then performs semi-supervised learning training on the manually labeled attack entity set and the current unlabeled attack entity set according to a graph neural network algorithm to obtain a recommended attack entity set, and obtains the current labeled attack entity set from the manually labeled attack entity set and the recommended attack entity set. Next, it performs semi-supervised training on the attack resource labeling prediction model according to the current labeled attack entity set and the current unlabeled attack entity set until a preset condition is met, and takes the model that meets the preset condition as the target attack resource labeling prediction model. Finally, it outputs the labeling results of all attack entities according to the target attack resource labeling prediction model. The method uses only a small number of manually labeled attack entities and, based on semi-supervised machine learning, labels the remaining unlabeled attack entities through repeated iterations of man-machine cooperative analysis. The iterative process therefore combines the advantages of manual analysis and automatic analysis, which improves both the efficiency of attack resource labeling and the accuracy of the labeling results.
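The overall iteration can be sketched as follows. This is an illustrative outline only, under several assumptions that are not part of the disclosure: a simple neighbor-averaging label propagation (`propagate`) stands in for the graph neural network, an `analyst_review` callback simulates a human confirming each recommended entity, and the preset condition is taken to be either "no unlabeled entities remain" or a maximum number of rounds.

```python
def propagate(neighbors, labels, sweeps=20):
    """Stand-in scorer: estimate a malicious score per node by neighbor averaging.

    Labeled nodes are clamped to their label (1.0 malicious, 0.0 non-malicious).
    """
    score = {node: labels.get(node, 0.5) for node in neighbors}
    for _ in range(sweeps):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in labels:
                updated[node] = labels[node]
            elif adjacent:
                updated[node] = sum(score[n] for n in adjacent) / len(adjacent)
            else:
                updated[node] = score[node]
        score = updated
    return score


def label_all(neighbors, manual_labels, analyst_review, preset_number=2, max_rounds=5):
    """Iterate: score, recommend top entities, have them reviewed, repeat."""
    labeled = dict(manual_labels)
    for _ in range(max_rounds):
        unlabeled = [n for n in neighbors if n not in labeled]
        if not unlabeled:
            break  # assumed preset condition: nothing left to label
        score = propagate(neighbors, labeled)
        recommended = sorted(unlabeled, key=lambda n: score[n], reverse=True)
        for node in recommended[:preset_number]:
            labeled[node] = analyst_review(node)  # human confirms or corrects
    score = propagate(neighbors, labeled)
    return {n: "malicious" if score[n] >= 0.5 else "non-malicious" for n in neighbors}


# Hypothetical association map with two disconnected groups.
neighbors = {
    "IP1": ["domain1", "IP2"], "domain1": ["IP1", "URL1"], "URL1": ["domain1"],
    "IP2": ["IP1"], "IP3": ["domain2"], "domain2": ["IP3", "URL2"], "URL2": ["domain2"],
}
truth = {"IP1": 1.0, "domain1": 1.0, "URL1": 1.0, "IP2": 1.0,
         "IP3": 0.0, "domain2": 0.0, "URL2": 0.0}
manual = {"IP1": 1.0, "IP3": 0.0}  # the small manually labeled set

results = label_all(neighbors, manual, analyst_review=lambda node: truth[node])
```

Starting from two manual labels, each round adds the reviewed recommendations to the labeled set before the scorer runs again, which mirrors the alternation between automatic analysis and manual confirmation described above.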
In one embodiment, as shown in fig. 4, an attack resource labeling apparatus 400 is provided, including:
an obtaining module 410, configured to obtain a manually labeled attack entity set;
a recommending module 420, configured to perform semi-supervised learning training on the manually labeled attack entity set and the current unlabeled attack entity set according to a graph neural network algorithm to obtain a recommended attack entity set;
a determining module 430, configured to obtain a current labeled attack entity set according to the manually labeled attack entity set and the recommended attack entity set;
an analysis module 440, configured to perform semi-supervised training on the attack resource labeling prediction model according to the current labeled attack entity set and the current unlabeled attack entity set until a preset condition is met, and to use the attack resource labeling prediction model meeting the preset condition as a target attack resource labeling prediction model;
and an output module 450, configured to output the labeling results of all attack entities according to the target attack resource labeling prediction model.
As an optional implementation manner of the embodiment of the present disclosure, the apparatus further includes a building module, which is specifically configured to:
construct an attack resource association map, wherein the attack entities of the attack resource association map comprise at least one of: an Internet Protocol (IP) address, a domain name, a unique identifier of a malicious sample file, and a uniform resource locator (URL).
As an optional implementation manner of the embodiment of the present disclosure, the obtaining module 410 is specifically configured to:
mark at least two attack entities in the attack resource association map with the two label types to obtain the manually labeled attack entity set;
wherein the label types of an attack entity comprise: malicious attack entity and non-malicious attack entity.
As an optional implementation manner of the embodiment of the present disclosure, the recommending module 420 specifically includes:
a generating unit, configured to perform semi-supervised learning training on the manually labeled attack entity set and the current unlabeled attack entity set according to a graph neural network algorithm to generate an attack resource labeling prediction model;
a probability unit, configured to acquire, according to the attack resource labeling prediction model, the label type of each current unlabeled entity and the probability corresponding to that label type;
and a sorting unit, configured to sort according to the probabilities of the label types of the current unlabeled entity set to obtain the recommended attack entity set.
As an optional implementation manner of the embodiment of the present disclosure, the sorting unit is specifically configured to:
sort the current unlabeled entities in descending order of the probability corresponding to the malicious label type to obtain the recommended attack entity set.
As an optional implementation manner of the embodiment of the present disclosure, the apparatus further includes a training module, which is configured to:
train the attack resource labeling prediction model according to the association relationships among the various attack entities and the attribute information of the various attack entities.
As an optional implementation manner of the embodiment of the present disclosure, the attribute information of the attack entity includes: geographic information of the IP, C-segment information of the IP, sub-domain names and domain name information.
The attack resource labeling apparatus provided by the embodiment of the present disclosure first obtains a manually labeled attack entity set. It then performs semi-supervised learning training on the manually labeled attack entity set and the current unlabeled attack entity set according to a graph neural network algorithm to obtain a recommended attack entity set, and obtains the current labeled attack entity set from the manually labeled attack entity set and the recommended attack entity set. Next, it performs semi-supervised training on the attack resource labeling prediction model according to the current labeled attack entity set and the current unlabeled attack entity set until a preset condition is met, and takes the model that meets the preset condition as the target attack resource labeling prediction model. Finally, it outputs the labeling results of all attack entities according to the target attack resource labeling prediction model. Only a small number of manually labeled attack entities are used and, based on semi-supervised machine learning, the remaining unlabeled attack entities are labeled through repeated iterations of man-machine cooperative analysis. The iterative process therefore combines the advantages of manual analysis and automatic analysis, which improves both the efficiency of attack resource labeling and the accuracy of the labeling results.
For the specific limitations of the attack resource labeling apparatus, reference may be made to the above limitations on the attack resource labeling method, which are not repeated here. All or part of the modules in the attack resource labeling apparatus can be implemented by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor of the electronic device in hardware form, or can be stored in a memory of the electronic device in software form, so that the processor can call them and execute the operations corresponding to the modules.
The embodiment of the present disclosure also provides an electronic device, and fig. 5 is a schematic structural diagram of the electronic device. As shown in fig. 5, the electronic device provided in this embodiment includes a memory 51 and a processor 52. The memory 51 is configured to store a computer program, and the processor 52 is configured to execute, when the computer program is called, the steps of the attack resource labeling method provided by any of the above method embodiments. The electronic device comprises a processor, a memory, a communication interface, a display screen, and an input device, which are connected through a system bus. The processor of the electronic device provides computing and control capabilities. The memory of the electronic device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The computer program, when executed by the processor, implements the attack resource labeling method. The display screen of the electronic device may be a liquid crystal display screen or an electronic ink display screen. The input device of the electronic device may be a touch layer covering the display screen; a key, a trackball, or a touch pad arranged on the housing of the electronic device; or an external keyboard, touch pad, mouse, or the like.
Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular electronic devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, the attack resource labeling apparatus provided by the present disclosure may be implemented in the form of a computer program, and the computer program may run on an electronic device as shown in fig. 5. The memory of the electronic device may store the program modules constituting the attack resource labeling apparatus, such as the obtaining module 410, the recommending module 420, the determining module 430, the analysis module 440, and the output module 450 shown in fig. 4. The computer program constituted by these program modules causes the processor to execute the steps of the attack resource labeling method of the embodiments of the present disclosure described in this specification.
The embodiment of the disclosure also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method for tagging attack resources provided by the above method embodiment is implemented.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied in the medium.
The processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable storage media. Storage media may implement information storage by any method or technology, and the information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media) such as modulated data signals and carrier waves.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The previous description is only for the purpose of describing particular embodiments of the present disclosure, so as to enable those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An attack resource labeling method is characterized by comprising the following steps:
acquiring an artificially marked attack entity set;
performing semi-supervised learning training on the artificially marked attack entity set and the current unmarked attack entity set according to a graph neural network algorithm to obtain a recommended attack entity set;
acquiring a current marked attack entity set according to the artificial marked attack entity set and the recommended attack entity set;
performing semi-supervised training on the attack resource marking prediction model according to the current marked attack entity set and the current unmarked attack entity set until a preset condition is met, and taking the attack resource marking prediction model meeting the preset condition as a target attack resource marking prediction model;
and outputting the labeling results of all attack entities according to the target attack resource labeling prediction model.
2. The method of claim 1, wherein prior to obtaining the set of artificially marked attack entities, the method further comprises:
constructing an attack resource association map; wherein the attack entities of the attack resource association map comprise at least one of: an Internet Protocol (IP) address, a domain name, a unique identifier of a malicious sample file, and a uniform resource locator (URL).
3. The method of claim 2, wherein obtaining the set of artificially marked attack entities comprises:
marking at least two attack entities in the attack resource association map with the two label types to obtain the artificially marked attack entity set;
wherein the tag type of the attack entity comprises: malicious attacking entities and non-malicious attacking entities.
4. The method of claim 1, wherein the performing semi-supervised learning training on the set of artificially labeled attack entities and the set of currently unlabeled attack entities according to a graph neural network algorithm to obtain a set of recommended attack entities comprises:
performing semi-supervised learning training on the artificially marked attack entity set and the current unmarked attack entity set according to a graph neural network algorithm to generate an attack resource marking prediction model;
acquiring the label type of the current unmarked entity and the probability corresponding to the label type of the current unmarked entity according to the attack resource marking prediction model;
and sequencing according to the probability of the label type of the current unmarked entity set to obtain a recommended attack entity set.
5. The method of claim 4, wherein the obtaining a set of recommended attack entities by ranking according to the probability of the tag type of the currently unlabeled entity comprises:
sorting in descending order of the probability corresponding to the malicious label type of the current unmarked entities to obtain the recommended attack entity set.
6. The method of claim 1, further comprising:
training the attack resource labeling prediction model according to the association relationships among the various attack entities and the attribute information of the various attack entities.
7. The method of claim 6, wherein the attribute information of the attacking entity comprises:
geographic information of the IP, C-segment information of the IP, sub-domain names and domain name information.
8. An attack resource labeling apparatus, comprising:
an acquisition module, configured to acquire an artificially marked attack entity set;
a recommending module, configured to perform semi-supervised learning training on the artificially marked attack entity set and the current unmarked attack entity set according to a graph neural network algorithm to obtain a recommended attack entity set;
a determining module, configured to acquire a current marked attack entity set according to the artificially marked attack entity set and the recommended attack entity set;
an analysis module, configured to perform semi-supervised training on the attack resource marking prediction model according to the current marked attack entity set and the current unmarked attack entity set until a preset condition is met, and to take the attack resource marking prediction model meeting the preset condition as a target attack resource marking prediction model;
and an output module, configured to output the labeling results of all attack entities according to the target attack resource labeling prediction model.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the attack resource tagging method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the attack resource tagging method of any one of claims 1 to 7.
CN202210592354.4A 2022-05-27 2022-05-27 Attack resource labeling method and device Active CN115001791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210592354.4A CN115001791B (en) 2022-05-27 2022-05-27 Attack resource labeling method and device

Publications (2)

Publication Number Publication Date
CN115001791A true CN115001791A (en) 2022-09-02
CN115001791B CN115001791B (en) 2024-02-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant