CN110166289B - Method and device for identifying target information assets
- Publication number: CN110166289B (application CN201910404921.7A)
- Authority: CN (China)
- Legal status: Active (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
Abstract
The embodiment of the invention provides a method and a device for identifying target information assets. The method comprises: acquiring a flow log of the information assets to be identified; extracting from the flow log the source IP, the destination IP and the number of times the source IP accessed the destination IP, and generating a directed weighted graph; optimizing a preset recognition model according to the directed weighted graph and taking the output of the optimized model as the score of each node, where the score represents how frequently the node accesses, and is accessed by, other nodes; and acquiring the target nodes whose scores are greater than a preset threshold, acquiring the associated target nodes associated with all the target nodes according to the directed weighted graph, and determining the information assets to be identified that correspond to all the target nodes and associated target nodes as the target information assets. The device performs the above method. The method and device provided by the embodiment of the invention can identify target information assets accurately and efficiently.
Description
Technical Field
The invention relates to the technical field of information asset processing, in particular to a method and a device for identifying target information assets.
Background
In recent years, with the rapid development of computer technology, the information assets of an enterprise, such as network devices and other devices that frequently interact with each other over a network, have become important enterprise assets. As enterprises and organizations keep growing their business, information asset management becomes harder: large numbers of ownerless and zombie information assets are easily created, posing serious security risks to enterprises and organizations. In this context, it is important to identify the target information assets of an enterprise in time, i.e., the information assets that access, and are accessed by, other information assets over the network more frequently.
Prior-art methods mainly identify target information assets by manual entry, statistics, or supervised machine learning. In manual entry, information about the target information assets is entered into the system by staff and equipment. Statistical methods compute several indicators of the assets and then make a combined judgment to identify the target information assets. Supervised machine learning methods extract features, train a model, and then use it to identify the target information assets. However, each of these methods has drawbacks: 1) manual entry is accurate and comprehensive but costs enterprises time and labor; 2) statistics-based methods introduce large errors and have low accuracy; 3) supervised machine learning requires large amounts of labeled data, often at excessive cost.
Therefore, how to avoid these drawbacks and identify target information assets accurately and efficiently is a problem to be solved.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention provides a method and a device for identifying target information assets.
The embodiment of the invention provides a method for identifying target information assets, which comprises the following steps:
acquiring a flow log of the information assets to be identified;
extracting, from the flow log, a source IP, a destination IP and the number of times the source IP accessed the destination IP, and generating a directed weighted graph;
optimizing a preset recognition model according to the directed weighted graph, and taking the output of the optimized preset recognition model as the score of each node, the score representing how frequently the node accesses and is accessed by other nodes;
and acquiring the target nodes whose scores are greater than a preset threshold, acquiring the associated target nodes associated with all the target nodes according to the directed weighted graph, and determining the information assets to be identified that correspond to all the target nodes and the associated target nodes as the target information assets.
The embodiment of the invention provides a device for identifying target information assets, which comprises:
the acquisition unit is configured to acquire a flow log of the information assets to be identified;
the generating unit is configured to extract, from the flow log, a source IP, a destination IP and the number of times the source IP accessed the destination IP, and to generate a directed weighted graph;
the optimization unit is configured to optimize a preset recognition model according to the directed weighted graph and to take the output of the optimized preset recognition model as the score of each node, the score representing how frequently the node accesses and is accessed by other nodes;
and the identification unit is configured to acquire the target nodes whose scores are greater than a preset threshold, to acquire the associated target nodes associated with all the target nodes according to the directed weighted graph, and to determine the information assets to be identified that correspond to all the target nodes and the associated target nodes as the target information assets.
An embodiment of the present invention provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein,
the processor, when executing the program, implements the method steps of:
acquiring a flow log of the information assets to be identified;
extracting, from the flow log, a source IP, a destination IP and the number of times the source IP accessed the destination IP, and generating a directed weighted graph;
optimizing a preset recognition model according to the directed weighted graph, and taking the output of the optimized preset recognition model as the score of each node, the score representing how frequently the node accesses and is accessed by other nodes;
and acquiring the target nodes whose scores are greater than a preset threshold, acquiring the associated target nodes associated with all the target nodes according to the directed weighted graph, and determining the information assets to be identified that correspond to all the target nodes and the associated target nodes as the target information assets.
An embodiment of the invention provides a non-transitory computer readable storage medium having a computer program stored thereon, which when executed by a processor implements the following method steps:
acquiring a flow log of the information assets to be identified;
extracting, from the flow log, a source IP, a destination IP and the number of times the source IP accessed the destination IP, and generating a directed weighted graph;
optimizing a preset recognition model according to the directed weighted graph, and taking the output of the optimized preset recognition model as the score of each node, the score representing how frequently the node accesses and is accessed by other nodes;
and acquiring the target nodes whose scores are greater than a preset threshold, acquiring the associated target nodes associated with all the target nodes according to the directed weighted graph, and determining the information assets to be identified that correspond to all the target nodes and the associated target nodes as the target information assets.
In the method and device for identifying target information assets provided by the embodiments of the invention, data are extracted from the flow log to generate a directed weighted graph that reflects the relations among the data; an optimized preset recognition model is obtained from the directed weighted graph and its output is used as the score of each node; the target nodes whose scores are greater than the preset threshold are acquired, the associated target nodes associated with all the target nodes are acquired according to the directed weighted graph, and the information assets to be identified that correspond to all the target nodes and the associated target nodes are determined as the target information assets. Target information assets can thus be identified accurately and efficiently.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of an embodiment of a method of identifying a target information asset of the present invention;
FIG. 2 is a schematic diagram of a directed weighted graph according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of an apparatus for identifying a target information asset according to the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a flowchart of an embodiment of the method for identifying target information assets according to the invention. As shown in FIG. 1, the method provided by the embodiment of the invention comprises the following steps:
S101: Acquire a flow log of the information assets to be identified.
Specifically, the device acquires a flow log of the information assets to be identified. To facilitate implementation, the embodiment may be built in a modular way from four functional modules: a flow data processor, an asset identifier, an important asset discriminator, and an association analyzer.
The flow data processor is described in detail as follows:
1. Flow data processor
The flow data processor mainly comprises two parts: a flow data collector and a flow data preprocessor.
1.1. Flow data collector
Its main function is to collect the flow logs stored on HDFS (Hadoop Distributed File System). Taking the TCP traffic log as an example, a TCP traffic log record usually contains at least the 14 fields shown in Table 1:
TABLE 1 TCP traffic log data fields
1.2. Flow data preprocessor
Its main function is to preprocess the original flow logs on HDFS. Given an original flow log data sample $x_i$, the main preprocessing steps are:
a) Parse the fields of the original flow log sample $x_i$, extract the key fields sip and dip, and form a new data sample $x_j = [\mathrm{sip}, \mathrm{dip}]$.
b) Filter the samples $x_j$, keeping as data samples $x_z$ only those whose sip and dip are both intranet IPs.
c) Count the data samples $x_z$ with the MapReduce programming model, i.e., count the number of times each sip visited each dip, and form new data samples $x_k = [\mathrm{sip}, \mathrm{dip}, \mathrm{count}]$, for example [192.169.12.10, 172.18.30.15, 20].
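Steps a) to c) can be sketched on a single machine in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: it assumes each raw record has already been parsed into a dict with the hypothetical keys "sip" and "dip", treats "intranet IP" as a private address in the sense of Python's `ipaddress` module, and uses `collections.Counter` in place of the MapReduce job. `preprocess` is a hypothetical name.

```python
from collections import Counter
from ipaddress import ip_address


def preprocess(records):
    """Steps a) to c): extract (sip, dip), keep intranet pairs, count visits.

    Assumption: each raw record is a dict with hypothetical keys "sip" and "dip";
    Counter stands in for the MapReduce counting job.
    """
    # a) keep only the key fields: x_j = [sip, dip]
    pairs = ((r["sip"], r["dip"]) for r in records)
    # b) x_z: samples whose sip and dip are both private (intranet) addresses
    intranet = (
        (sip, dip)
        for sip, dip in pairs
        if ip_address(sip).is_private and ip_address(dip).is_private
    )
    # c) x_k = [sip, dip, count]: how many times sip visited dip
    counts = Counter(intranet)
    return [[sip, dip, n] for (sip, dip), n in counts.items()]


logs = [{"sip": "192.168.12.10", "dip": "172.18.30.15"}] * 20
logs.append({"sip": "192.168.12.10", "dip": "8.8.8.8"})  # public dip, dropped in b)
print(preprocess(logs))  # [['192.168.12.10', '172.18.30.15', 20]]
```

On a real cluster the counting in step c) would be distributed as a MapReduce job keyed on (sip, dip); the output triples have the same shape.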
S102: Extract, from the flow log, a source IP, a destination IP and the number of times the source IP accessed the destination IP, and generate a directed weighted graph.
Specifically, the device extracts from the traffic log a source IP, a destination IP and the number of times the source IP accessed the destination IP, and generates a directed weighted graph. Each node of the directed weighted graph represents a source IP or a destination IP; each directed edge points from a source IP to a destination IP, and its weight is the number of times the source IP accessed the destination IP. The directed weighted graph also contains a background node gd.
The asset identifier is described in detail as follows:
2. Asset identifier
The asset identifier mainly comprises two parts, a Graph generator and an asset evaluator, which together identify the information assets in the enterprise intranet.
2.1. Graph generator
The Graph generator generates the directed weighted graph, i.e., the intranet asset graph, from the preprocessed flow log data (namely the source IP, the destination IP and the number of times the source IP accessed the destination IP, which correspond to the sip, dip and count fields respectively). Assume the preprocessed traffic log data are $x = [x_1, x_2, x_3, x_4]$, where $x_1$ to $x_4$ are produced by the flow data processor module, as shown in Table 2 (the sip and dip fields involve enterprise privacy and are therefore denoted by letters):
TABLE 2 Traffic log data fields after preprocessing

| Traffic log data sample | sip | dip | count |
|---|---|---|---|
| x1 | a | b | 30 |
| x2 | a | c | 10 |
| x3 | b | c | 5 |
| x4 | c | a | 15 |
The main steps are as follows:
The count field of each of the samples $x_1$ to $x_4$ is taken as an edge weight, and sip and dip are taken as nodes of the directed weighted graph; the LeaderRank or PageRank algorithm described below may then be used to compute node values from these edge weights. Note that each directed edge carries the access count as its edge weight, and its direction indicates that the source IP accessed the destination IP.
To prevent the graph from being not strongly connected, in which case the iteration could get trapped at an isolated node, a background node gd (ground node) may be introduced and a matrix representing the adjacency relations among the nodes constructed. With reference to Table 2, it can be represented as follows:
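A reconstruction of this matrix that is consistent with Table 2 and with the edge weights and out-degrees used in the worked example of step (b) of the asset evaluator (out-degree 41 for node a, 3 for gd) is shown below. It assumes rows index the source node, columns index the destination node, the node order is a, b, c, gd, and every edge to and from gd has weight 1; each row sum is then that node's out-degree (41, 6, 16, 3).

$$
W = \begin{pmatrix}
0 & 30 & 10 & 1 \\
0 & 0 & 5 & 1 \\
15 & 0 & 0 & 1 \\
1 & 1 & 1 & 0
\end{pmatrix}
$$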
FIG. 2 is a schematic diagram of a directed weighted graph according to an embodiment of the invention; the graph in FIG. 2 is generated from the matrix. Nodes a, b, c and gd in FIG. 2 are the nodes of the matrix, and a directed edge between two nodes indicates a nonzero matrix entry, i.e., an edge weight, between them.
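The Graph generator can be sketched in Python as an adjacency map from each source node to its destinations and edge weights. The sketch assumes the input is a list of [sip, dip, count] triples and, as the worked example in step (b) of the asset evaluator implies, that gd is joined to every other node by weight-1 edges in both directions. The names `build_graph`, `out_degree` and `GD` are hypothetical.

```python
GD = "gd"  # background (ground) node


def build_graph(samples):
    """Return {source: {destination: weight}} from [sip, dip, count] samples.

    Assumption: gd is linked to and from every other node with weight 1.
    """
    graph = {}
    for sip, dip, count in samples:
        graph.setdefault(dip, {})  # a node that is only ever a destination
        edges = graph.setdefault(sip, {})
        edges[dip] = edges.get(dip, 0) + count  # edge weight = access count
    nodes = list(graph)
    for node in nodes:
        graph[node][GD] = 1  # node -> gd
    graph[GD] = {node: 1 for node in nodes}  # gd -> node
    return graph


def out_degree(graph, node):
    """Out-degree as used in the worked example: sum of outgoing edge weights."""
    return sum(graph[node].values())


table2 = [["a", "b", 30], ["a", "c", 10], ["b", "c", 5], ["c", "a", 15]]
g = build_graph(table2)
print(out_degree(g, "a"), out_degree(g, "b"), out_degree(g, GD))  # 41 6 3
```

A dict of dicts keeps both traversal directions cheap enough for small graphs; a sparse matrix would suit larger intranets.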
S103: Optimize a preset recognition model according to the directed weighted graph, and take the output of the optimized preset recognition model as the score of each node; the score represents how frequently the node accesses and is accessed by other nodes.
Specifically, the device optimizes a preset recognition model according to the directed weighted graph and takes the output of the optimized preset recognition model as the score of each node; the score represents how frequently the node accesses and is accessed by other nodes. With reference to the above description, the asset evaluator in the asset identifier is described as follows:
2.2. Asset evaluator
The asset evaluator corresponds to the optimized preset recognition model, which is updated iteratively until a preset termination condition is met. The preset recognition model may be optimized as follows:
(a) Initialization: all nodes in the directed weighted graph are initialized to obtain an initialization score for each node, and an initialization matrix containing all initialization scores is constructed. The background node gd is initialized to 0 and every other node to 1; for example, with node order (a, b, c, gd), the initial scores are (1, 1, 1, 0).
(b) Parameter update: the initialization matrix is updated using, for each destination node (the node a directed edge points to), the weights of its incoming directed edges, the out-degree of each source node (the node a directed edge starts from), and the current score of each source node connected to that destination node. Specifically, the update may follow the formula:

$$\mathrm{score}_i = \sum_{j=1}^{N} \mathrm{score}_j \cdot \frac{\mathrm{weight}_{j,i}}{\mathrm{outdegree}_j}$$
where $\mathrm{score}_i$ is the score of node i, $\mathrm{score}_j$ is the score of node j, $\mathrm{weight}_{j,i}$ is the weight of the edge from node j to node i, $\mathrm{outdegree}_j$ is the out-degree of node j (the sum of the weights of its outgoing edges), and N is the number of nodes in the directed weighted graph (including the background node gd) that point to node i; that is, node i is the destination node of the directed edge and node j is its source node. Taking node b as node i (i.e., b is the destination node) with reference to FIG. 2: the source nodes of the directed edges pointing to b are node a and the background node gd, with edge weights 30 and 1 respectively. The out-degree of node a is 41 (i.e., 30 + 1 + 10), and the out-degree of the background node gd is 3 (i.e., 1 + 1 + 1). In the first calculation the initialization score of node a is 1 and that of gd is 0, so by the formula above:
The score of node b in the first iteration is 1 × (30/41) + 0 × (1/3) = 30/41.
The score of the background node gd in the first iteration is 1 × (1/41) + 1 × (1/6) + 1 × (1/16) = 499/1968, since the out-degrees of a, b and c are 41, 6 (i.e., 5 + 1) and 16 (i.e., 15 + 1) respectively.
The score of node a in the first iteration is 1 × (15/16) + 0 × (1/3) = 15/16.
After each iteration, the scores of all nodes in the matrix are updated once.
The score of node b in the second iteration is (15/16) × (30/41) + (499/1968) × (1/3) = 4549/5904.
The scores of the other nodes in the second iteration, and of all nodes in subsequent iterations, are computed in the same way and are not repeated here.
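The update rule and the hand calculation above can be checked with a short Python sketch that uses exact fractions. The adjacency map `GRAPH` encodes Table 2 plus gd under the weight-1 assumption for gd's edges; `update` is a hypothetical helper that performs one synchronous iteration, computing every node's new score from the previous iteration's scores.

```python
from fractions import Fraction

# {source: {destination: edge weight}} for Table 2 plus the background node gd
# (assumed weight 1 to and from every other node, as in the worked example)
GRAPH = {
    "a": {"b": 30, "c": 10, "gd": 1},
    "b": {"c": 5, "gd": 1},
    "c": {"a": 15, "gd": 1},
    "gd": {"a": 1, "b": 1, "c": 1},
}


def update(graph, scores):
    """One synchronous step: score_i = sum_j score_j * weight_{j,i} / outdegree_j."""
    new = {node: Fraction(0) for node in graph}
    for j, edges in graph.items():
        out_deg = sum(edges.values())
        for i, weight in edges.items():
            new[i] += scores[j] * Fraction(weight, out_deg)
    return new


init = {"a": Fraction(1), "b": Fraction(1), "c": Fraction(1), "gd": Fraction(0)}
first = update(GRAPH, init)
second = update(GRAPH, first)
print(first["b"], first["a"], first["gd"])  # 30/41 15/16 499/1968
print(second["b"])  # 4549/5904
```

Because every node, gd included, passes its whole score along its out-edges, the total score stays at 3 across iterations.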
(c) If the updated matrix is judged to meet the preset termination condition, the final scores of all nodes except the background node gd are normalized. The preset termination condition may include: the sum of the differences between the scores of all nodes in the matrix after the u-th update and those after the (u-1)-th update is smaller than a preset error threshold; or the number of updates reaches a preset count threshold. Both thresholds can be set according to actual conditions. The model error part of the termination condition is described as follows:
Model error: the update stops when the sum of the differences between the scores of all nodes after the u-th update and after the (u-1)-th update is no greater than the preset error threshold, where

$$\mathrm{Diff}_u = \sum_{i=1}^{N} \left| \mathrm{score}'_i - \mathrm{score}_i \right|$$

Here $\mathrm{score}'_i$ and $\mathrm{score}_i$ are the scores of node i before and after the update, and N is the number of nodes in the directed weighted graph (including the background node gd). That is, if the matrix after the u-th update satisfies $\mathrm{Diff}_u \le \mathrm{error}_{\min}$ (the preset error threshold), the update calculation terminates.
Alternatively, with the preset count threshold set to 100, the update calculation terminates once the matrix has been updated 100 times (i.e., at $\mathrm{Diff}_{100}$).
The score normalization part mainly comprises two steps, specifically as follows:
(a) After the model stops updating its parameters, the final scores of all nodes in the directed weighted graph other than gd are calculated as:

$$\mathrm{score}_{\mathrm{final}(i)} = \mathrm{score}_{\mathrm{opt}(i)} + \frac{\mathrm{score}_{gd}}{N-1}, \quad i = 1, \dots, N-1$$

where $\mathrm{score}_{\mathrm{opt}(i)}$ is the score of node i output when the calculation stops, $\mathrm{score}_{\mathrm{final}(i)}$ is its final score, $\mathrm{score}_{gd}$ is the score of the background node gd when the calculation stops, and N is the number of nodes in the directed weighted graph; i runs up to N - 1, i.e., the background node gd is excluded.
(b) The $\mathrm{score}_{\mathrm{final}(i)}$ values from step (a) are normalized so that each node's score lies in [0, 100]; that is, the optimized preset recognition model outputs a score between 0 and 100.
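Initialization, iterative update, termination and normalization can be combined into one Python sketch of the asset evaluator. Three details are assumptions rather than statements of the source: $\mathrm{Diff}_u$ sums absolute score changes, gd's final score is split evenly over the N - 1 ordinary nodes (as in LeaderRank), and the [0, 100] normalization is min-max scaling. `leaderrank_scores` is a hypothetical name.

```python
def leaderrank_scores(graph, gd="gd", eps=1e-9, max_iter=100):
    """Hypothetical asset evaluator: scores in [0, 100] for every node except gd."""
    nodes = list(graph)
    out_deg = {j: sum(edges.values()) for j, edges in graph.items()}
    # (a) initialization: gd starts at 0, every other node at 1
    scores = {n: 0.0 if n == gd else 1.0 for n in nodes}
    for _ in range(max_iter):  # preset count threshold
        # (b) score_i = sum_j score_j * weight_{j,i} / outdegree_j
        new = dict.fromkeys(nodes, 0.0)
        for j, edges in graph.items():
            for i, weight in edges.items():
                new[i] += scores[j] * weight / out_deg[j]
        # (c) Diff_u, assumed to be the sum of absolute score changes
        diff = sum(abs(new[n] - scores[n]) for n in nodes)
        scores = new
        if diff <= eps:  # preset error threshold
            break
    # Final scores: assumed LeaderRank rule, gd's score split over the N - 1 other nodes
    share = scores[gd] / (len(nodes) - 1)
    final = {n: s + share for n, s in scores.items() if n != gd}
    # Normalization to [0, 100], assumed to be min-max scaling
    lo, hi = min(final.values()), max(final.values())
    if hi == lo:
        return {n: 100.0 for n in final}  # all nodes tie; arbitrary choice
    return {n: (s - lo) / (hi - lo) * 100 for n, s in final.items()}


GRAPH = {
    "a": {"b": 30, "c": 10, "gd": 1},
    "b": {"c": 5, "gd": 1},
    "c": {"a": 15, "gd": 1},
    "gd": {"a": 1, "b": 1, "c": 1},
}
print(leaderrank_scores(GRAPH, eps=1e-12, max_iter=1000))
```

Since the gd share is added equally to every node, it does not change a min-max result; it would matter only under a different normalization.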
S104: Acquire the target nodes whose scores are greater than a preset threshold, acquire the associated target nodes associated with all the target nodes according to the directed weighted graph, and determine the information assets to be identified that correspond to all the target nodes and the associated target nodes as the target information assets.
Specifically, the device acquires the target nodes whose scores are greater than a preset threshold, acquires the associated target nodes associated with all the target nodes according to the directed weighted graph, and determines the information assets to be identified that correspond to all the target nodes and the associated target nodes as the target information assets.
The important asset discriminator is described in detail as follows:
3. Important asset discriminator
The important asset discriminator screens and sorts, by threshold, the information assets to be identified that are output by the asset identifier module, and identifies the top_K assets with the highest access and accessed frequencies. Suppose the information assets to be identified (each corresponding to a node of the directed weighted graph other than the background node gd) are $X = [x_1, x_2, \dots, x_{N-1}]$, where each $x_i$ is an intranet IP, i.e., a node of the directed weighted graph, whose score is the output of the optimized preset recognition model, and N is the number of nodes in the directed weighted graph. The main steps are:
Screen the information assets X to be identified against a preset threshold thread: for each $x_i$ ($i = 1, 2, \dots, N-1$), keep $x_i$ if its score is greater than or equal to thread, otherwise delete it; the kept nodes form the target node set Y. The preset threshold thread can be set according to actual conditions.
Sort the $x_i$ in Y by score in descending order and output the enterprise's important assets top_K, i.e., the K target nodes with the highest access and accessed frequencies.
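The screening and sorting steps amount to a filter followed by a descending sort, sketched below in Python. The input is assumed to be a dict from node to normalized score; the example scores and the function name `top_k_assets` are hypothetical, and the comparison uses greater-than-or-equal as in the screening step.

```python
def top_k_assets(scores, thread, k):
    """Keep nodes with score >= thread (target set Y), then return the K highest."""
    y = [(node, s) for node, s in scores.items() if s >= thread]
    y.sort(key=lambda item: item[1], reverse=True)  # descending by score
    return y[:k]


example = {"a": 100.0, "b": 0.0, "c": 94.0}  # hypothetical normalized scores
print(top_k_assets(example, thread=50, k=2))  # [('a', 100.0), ('c', 94.0)]
```

Python's sort is stable, so nodes with equal scores keep their input order; `heapq.nlargest` would avoid a full sort when K is much smaller than Y.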
The association analyzer is described in detail as follows:
4. Association analyzer
The association analyzer associates the K target nodes output by the important asset discriminator with the result of the asset identifier (the directed weighted graph) and identifies the associated target nodes associated with the K target nodes. Suppose $\mathrm{top\_K} = [x'_1, x'_2, \dots, x'_K]$, where each $x'_i$ is a target node (an intranet IP) with its score; the steps are:
(a) Taking each $x'_i$ ($i = 1, 2, \dots, K$) in top_K as a source node, traverse the directed weighted graph in the asset identifier and output the out-degree nodes of that source node together with the weights of the connecting edges, forming the out-degree data outk (this step corresponds to traversing the target node serving as a source node in the directed weighted graph and acquiring its out-degree nodes). With reference to FIG. 2: for target node b as a source node (the start of a directed edge in FIG. 2), the out-degree nodes of b are the background node gd and node c, so the out-degree data outk of b are the directed edge to gd (edge weight 1) and the directed edge to c (edge weight 5) (this step corresponds to acquiring the edge weights of the out-degree nodes).
(b) Taking each $x'_i$ ($i = 1, 2, \dots, K$) in top_K as a destination node, traverse the directed weighted graph in the asset identifier and output the in-degree nodes of that destination node together with the weights of the connecting edges, forming the in-degree data ink (this step corresponds to traversing the target node serving as a destination node in the directed weighted graph and acquiring its in-degree nodes). With reference to FIG. 2: for target node b as a destination node (the node a directed edge in FIG. 2 points to), the in-degree nodes of b are the background node gd and node a, so the in-degree data ink of b are the directed edge starting from gd (edge weight 1) and the directed edge starting from a (edge weight 30) (this step corresponds to acquiring the edge weights of the in-degree nodes). Note that the embodiment does not limit the order of the two traversals, i.e., acquiring the out-degree nodes of the target nodes serving as source nodes and acquiring the in-degree nodes of the target nodes serving as destination nodes. All out-degree nodes and in-degree nodes other than the background node gd are taken as the associated target nodes.
(c) Combine, in order, all the associated target nodes with outk and ink to form the associated data of all associated target nodes (this step corresponds to taking all out-degree and in-degree nodes other than the background node gd, together with their respective edge weights, as the associated data of the associated target nodes).
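Steps (a) to (c) can be sketched in Python over the same {source: {destination: weight}} map used for the graph. In this sketch outk and ink keep the gd edges, matching the FIG. 2 example for node b, and gd is dropped only when the associated data are formed. The function name `associate` and the (target, direction, weight) tuple layout are hypothetical choices.

```python
def associate(graph, top_k, gd="gd"):
    """Return outk, ink and the associated target nodes for the top-K target nodes.

    graph maps each source node to {destination: edge weight}.
    """
    outk, ink = {}, {}
    for t in top_k:
        # (a) t as a source node: out-degree nodes and edge weights
        outk[t] = dict(graph.get(t, {}))
        # (b) t as a destination node: in-degree nodes and edge weights
        ink[t] = {s: edges[t] for s, edges in graph.items() if t in edges}
    # (c) associated data: every out/in neighbour except gd, with its edge weight
    associated = {}
    for t in top_k:
        for direction, data in (("out", outk[t]), ("in", ink[t])):
            for node, weight in data.items():
                if node != gd:
                    associated.setdefault(node, []).append((t, direction, weight))
    return outk, ink, associated


GRAPH = {
    "a": {"b": 30, "c": 10, "gd": 1},
    "b": {"c": 5, "gd": 1},
    "c": {"a": 15, "gd": 1},
    "gd": {"a": 1, "b": 1, "c": 1},
}
outk, ink, associated = associate(GRAPH, ["b"])
print(outk["b"], ink["b"])  # {'c': 5, 'gd': 1} {'a': 30, 'gd': 1}
print(associated)  # {'c': [('b', 'out', 5)], 'a': [('b', 'in', 30)]}
```

The in-degree lookup scans every source node; for large graphs a reverse adjacency map built once would make step (b) as cheap as step (a).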
In the method for identifying target information assets provided by the embodiment of the invention, data are extracted from the flow log to generate a directed weighted graph that reflects the relations among the data; an optimized preset recognition model is obtained from the directed weighted graph and its output is used as the score of each node; the target nodes whose scores are greater than the preset threshold are acquired, the associated target nodes associated with all the target nodes are acquired according to the directed weighted graph, and the information assets to be identified that correspond to all the target nodes and the associated target nodes are determined as the target information assets. Target information assets can thus be identified accurately and efficiently.
On the basis of the above embodiment, the optimizing a preset recognition model according to the directed weighted graph includes:
initializing all nodes in the directed weighted graph to obtain initialization scores of all nodes, and constructing an initialization matrix containing all initialization scores.
Specifically, the device initializes all nodes in the directed weighted graph to obtain an initialization score of each node, and constructs an initialization matrix including all initialization scores. Reference may be made to the above embodiments, which are not described in detail.
And updating the initialization matrix according to the edge weight of the destination node of the directional target serving as the directional edge, the out degree of the source node serving as the starting point target of the directional edge and the initialization score of each source node connected with each destination node.
Specifically, the device updates the initialization matrix based on an edge weight of a destination node as a directed edge of a directed object, an out-degree of a source node as a starting point object of the directed edge, and an initialization score of each source node connected to each destination node. Reference may be made to the above embodiments, which are not described in detail.
And if the updated matrix is judged to meet the preset termination condition, normalizing the final scores of all the nodes except the background node gd.
Specifically, if the device judges that the updated matrix meets the preset termination condition, the device normalizes the final scores of all the nodes except the background node gd. Reference may be made to the above embodiments, which are not described in detail.
In the method provided by this embodiment of the invention, the preset recognition model is optimized through the directed weighted graph, which further improves the accuracy and efficiency of identifying target information assets.
On the basis of the above embodiment, the preset termination condition includes:
the difference between the total score of the nodes in the matrix after the u-th update and that in the matrix after the (u-1)-th update is smaller than a preset error threshold; or the number of updates reaches a preset count threshold.
Specifically, the preset termination condition in the device includes: the difference between the total score of the nodes in the matrix after the u-th update and that in the matrix after the (u-1)-th update is smaller than a preset error threshold; or the number of updates reaches a preset count threshold. For details, refer to the above embodiments; they are not repeated here.
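A check for the two termination conditions can be sketched as below. Note one subtlety. If the update step redistributes each node's score in full along its outgoing edges, the sum of all scores never changes, so a literal difference of two totals would be zero after the first update. The sketch therefore reads the condition as the summed absolute change per node. That is an interpretation, not the patent's stated formula, and `should_stop`, `eps`, and `max_updates` are hypothetical names.

```python
def should_stop(prev_scores, new_scores, update_count, eps=1e-6, max_updates=100):
    """Return True when either preset termination condition holds.

    prev_scores / new_scores: {node: score} after updates u-1 and u.
    update_count: number of updates performed so far (u).
    """
    # Interpretation: the "difference" is the sum of per-node absolute changes.
    delta = sum(abs(new_scores[v] - prev_scores[v]) for v in new_scores)
    return delta < eps or update_count >= max_updates


# Totals are equal (3.0 and 3.0) but individual scores moved, so keep going.
print(should_stop({"a": 1.0, "b": 2.0}, {"a": 1.5, "b": 1.5}, update_count=1))  # False
```

The count threshold acts as a safeguard so the loop ends even if the scores oscillate or converge slowly.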
By setting this preset termination condition appropriately, the method provided by this embodiment of the invention further improves the accuracy and efficiency of identifying target information assets.
On the basis of the above embodiment, obtaining the associated target nodes of all target nodes according to the directed weighted graph includes:
Traversing each target node that serves as a source node in the directed weighted graph to obtain the out-degree nodes of that target node (the nodes it points to); and traversing each target node that serves as a destination node in the directed weighted graph to obtain the in-degree nodes of that target node (the nodes pointing to it).
Specifically, the device traverses each target node that serves as a source node in the directed weighted graph to obtain its out-degree nodes, and traverses each target node that serves as a destination node to obtain its in-degree nodes. For details, refer to the above embodiments; they are not repeated here.
Taking all out-degree nodes and all in-degree nodes, except the background node gd, as the associated target nodes.
Specifically, the device takes all out-degree nodes and all in-degree nodes, except the background node gd, as the associated target nodes. For details, refer to the above embodiments; they are not repeated here.
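The traversal above amounts to collecting the successors and predecessors of every target node. A minimal sketch follows, assuming the graph is stored as a dict mapping each source node to `{destination: weight}` and the background node is labelled `"gd"` (both hypothetical conventions); `associated_nodes` is an illustrative name.

```python
BACKGROUND = "gd"  # hypothetical label for the background node gd


def associated_nodes(graph, targets):
    """Collect the out-degree and in-degree nodes of all target nodes, minus gd.

    graph: {source: {destination: weight}}; targets: iterable of target nodes.
    """
    targets = set(targets)
    related = set()
    for src, edges in graph.items():
        for dst in edges:
            if src in targets:
                related.add(dst)  # out-degree node of a target acting as source
            if dst in targets:
                related.add(src)  # in-degree node of a target acting as destination
    related.discard(BACKGROUND)
    return related


graph = {
    "a": {"b": 2, "gd": 1},
    "b": {"gd": 1},
    "c": {"b": 1, "gd": 1},
    "d": {"c": 1},
    "gd": {"a": 1, "b": 1, "c": 1},
}
print(sorted(associated_nodes(graph, {"b"})))  # ['a', 'c']
```

A target can also show up among the associated nodes if another target links to it. This does not change the final result, because both target and associated nodes are reported as target information assets.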
In the method provided by this embodiment of the invention, all associated target nodes are obtained from the out-degree nodes and in-degree nodes of the target nodes, which further improves the accuracy and efficiency of identifying target information assets.
On the basis of the above embodiment, the method further comprises:
when the out-degree nodes and in-degree nodes are obtained, the weights of their corresponding edges are also obtained, and all out-degree nodes and in-degree nodes except the background node gd, together with the weights of their corresponding edges, are used as the associated data of the associated target nodes.
Specifically, when obtaining the out-degree nodes and in-degree nodes, the device also obtains the weights of their corresponding edges, and uses all out-degree nodes and in-degree nodes except the background node gd, together with those edge weights, as the associated data of the associated target nodes. For details, refer to the above embodiments; they are not repeated here.
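Keeping the edge weights alongside the neighbours can be sketched as below. The tuple layout `(target, neighbour, direction, weight)` and the function name `associated_data` are assumptions made for illustration; the patent does not fix a data format for the associated data.

```python
BACKGROUND = "gd"  # hypothetical label for the background node gd


def associated_data(graph, targets):
    """Return (target, neighbour, direction, edge weight) tuples, skipping gd.

    direction is "out" for an out-degree node and "in" for an in-degree node.
    """
    targets = set(targets)
    data = []
    for src, edges in graph.items():
        for dst, weight in edges.items():
            if BACKGROUND in (src, dst):
                continue  # gd is excluded from the associated data
            if src in targets:
                data.append((src, dst, "out", weight))
            if dst in targets:
                data.append((dst, src, "in", weight))
    return data


graph = {
    "a": {"b": 2, "gd": 1},
    "b": {"gd": 1},
    "c": {"b": 1, "gd": 1},
    "gd": {"a": 1, "b": 1, "c": 1},
}
print(sorted(associated_data(graph, {"b"})))  # [('b', 'a', 'in', 2), ('b', 'c', 'in', 1)]
```

Because the weight is the access count from the traffic log, it indicates how heavily each associated node interacts with the target.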
By acquiring the associated data of all associated target nodes, the method provided by this embodiment of the invention obtains comprehensive information about the target information assets.
Fig. 3 is a schematic structural diagram of an embodiment of the apparatus for identifying target information assets according to the present invention. As shown in Fig. 3, an embodiment of the present invention provides an apparatus for identifying target information assets, which includes an acquiring unit 301, a generating unit 302, an optimizing unit 303, and an identifying unit 304, where:
the acquiring unit 301 is configured to acquire a traffic log of an information asset to be identified; the generating unit 302 is configured to extract, from the traffic log, source IPs, destination IPs, and the number of times each source IP accessed each destination IP, and to generate a directed weighted graph; the optimizing unit 303 is configured to optimize a preset recognition model according to the directed weighted graph and to take the output of the optimized preset recognition model as the score of each node, the score representing how frequently the node visits and is visited by other nodes; and the identifying unit 304 is configured to acquire the target nodes whose scores are greater than a preset threshold, acquire the associated target nodes of all target nodes according to the directed weighted graph, and determine the information assets to be identified that correspond to all target nodes and associated target nodes as target information assets.
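The identifying unit 304 combines thresholding with neighbour expansion. A rough sketch of that flow is shown below, assuming scores arrive as a dict keyed by node and the graph as `{source: {destination: weight}}`. The name `identify_assets` is hypothetical, and the threshold value is application-specific and not given in the patent.

```python
BACKGROUND = "gd"  # hypothetical label for the background node gd


def identify_assets(graph, scores, threshold):
    """Return the target nodes plus their associated target nodes.

    graph: {source: {destination: weight}}; scores: {node: normalized score}.
    """
    # Target nodes: scores strictly greater than the preset threshold.
    targets = {v for v, s in scores.items() if s > threshold and v != BACKGROUND}
    # Associated target nodes: out-degree and in-degree nodes of the targets.
    related = set()
    for src, edges in graph.items():
        for dst in edges:
            if src in targets:
                related.add(dst)
            if dst in targets:
                related.add(src)
    related.discard(BACKGROUND)
    # Information assets for both groups are reported as target assets.
    return targets | related


graph = {
    "a": {"b": 2, "gd": 1},
    "b": {"gd": 1},
    "c": {"b": 1, "gd": 1},
    "d": {"e": 1},
    "e": {},
    "gd": {"a": 1, "b": 1, "c": 1},
}
scores = {"a": 0.2, "b": 0.5, "c": 0.2, "d": 0.05, "e": 0.05}
print(sorted(identify_assets(graph, scores, threshold=0.4)))  # ['a', 'b', 'c']
```

Nodes d and e are excluded because their scores are low and they are not connected to the high-scoring node b.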
The apparatus for identifying target information assets provided by this embodiment of the invention extracts data from the traffic log to generate a directed weighted graph that reflects the relationships among the data, optimizes the preset recognition model according to the graph and takes the model output as the score of each node, selects the target nodes whose scores exceed the preset threshold, obtains their associated target nodes from the graph, and determines the information assets to be identified that correspond to all target nodes and associated target nodes as target information assets. In this way, target information assets can be identified accurately and efficiently.
On the basis of the foregoing embodiment, the optimizing unit 303 is specifically configured to:
initialize all nodes in the directed weighted graph to obtain an initialization score for each node, and construct an initialization matrix containing all of the initialization scores; update the initialization matrix according to the weight of each directed edge pointing to a destination node, the out-degree of the source node at which that directed edge starts, and the initialization score of each source node connected to each destination node; and, if the updated matrix is judged to satisfy the preset termination condition, normalize the final scores of all nodes except the background node gd.
The apparatus provided by this embodiment of the invention optimizes the preset recognition model through the directed weighted graph, which further improves the accuracy and efficiency of identifying target information assets.
On the basis of the above embodiment, the preset termination condition includes:
the difference between the total score of the nodes in the matrix after the u-th update and that in the matrix after the (u-1)-th update is smaller than a preset error threshold; or the number of updates reaches a preset count threshold.
By setting the preset termination condition appropriately, the apparatus provided by this embodiment of the invention further improves the accuracy and efficiency of identifying target information assets.
On the basis of the foregoing embodiment, the identifying unit 304 is specifically configured to:
traverse each target node serving as a source node in the directed weighted graph to obtain its out-degree nodes; traverse each target node serving as a destination node in the directed weighted graph to obtain its in-degree nodes; and take all out-degree nodes and all in-degree nodes, except the background node gd, as the associated target nodes.
The apparatus provided by this embodiment of the invention obtains all associated target nodes from the out-degree nodes and in-degree nodes of the target nodes, which further improves the accuracy and efficiency of identifying target information assets.
On the basis of the above embodiment, the apparatus is further configured to:
when the out-degree nodes and in-degree nodes are obtained, also obtain the weights of their corresponding edges, and use all out-degree nodes and in-degree nodes except the background node gd, together with the weights of their corresponding edges, as the associated data of the associated target nodes.
By acquiring the associated data of all associated target nodes, the apparatus provided by this embodiment of the invention obtains comprehensive information about the target information assets.
The apparatus for identifying target information assets provided by this embodiment of the invention can be used to execute the processing flows of the above method embodiments. Its functions are not repeated here; refer to the detailed description of the method embodiments.
Fig. 4 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 4, the electronic device includes a processor 401, a memory 402, and a bus 403;
the processor 401 and the memory 402 communicate with each other via the bus 403;
the processor 401 is configured to call the program instructions in the memory 402 to execute the methods provided by the above method embodiments, for example, including: acquiring a traffic log of an information asset to be identified; extracting, from the traffic log, source IPs, destination IPs, and the number of times each source IP accessed each destination IP, and generating a directed weighted graph that includes a background node gd; optimizing a preset recognition model according to the directed weighted graph, and taking the output of the optimized preset recognition model as the score of each node, the score representing how frequently the node visits and is visited by other nodes; and acquiring the target nodes whose scores are greater than a preset threshold, acquiring the associated target nodes of all target nodes according to the directed weighted graph, and determining the information assets to be identified that correspond to all target nodes and associated target nodes as target information assets.
This embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program comprises program instructions which, when executed by a computer, enable the computer to perform the methods provided by the above method embodiments, for example, including: acquiring a traffic log of an information asset to be identified; extracting, from the traffic log, source IPs, destination IPs, and the number of times each source IP accessed each destination IP, and generating a directed weighted graph that includes a background node gd; optimizing a preset recognition model according to the directed weighted graph, and taking the output of the optimized preset recognition model as the score of each node, the score representing how frequently the node visits and is visited by other nodes; and acquiring the target nodes whose scores are greater than a preset threshold, acquiring the associated target nodes of all target nodes according to the directed weighted graph, and determining the information assets to be identified that correspond to all target nodes and associated target nodes as target information assets.
This embodiment provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example, including: acquiring a traffic log of an information asset to be identified; extracting, from the traffic log, source IPs, destination IPs, and the number of times each source IP accessed each destination IP, and generating a directed weighted graph that includes a background node gd; optimizing a preset recognition model according to the directed weighted graph, and taking the output of the optimized preset recognition model as the score of each node, the score representing how frequently the node visits and is visited by other nodes; and acquiring the target nodes whose scores are greater than a preset threshold, acquiring the associated target nodes of all target nodes according to the directed weighted graph, and determining the information assets to be identified that correspond to all target nodes and associated target nodes as target information assets.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Based on this understanding, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A method of identifying a target information asset, comprising:
acquiring a traffic log of an information asset to be identified;
extracting, from the traffic log, a source IP, a destination IP, and the number of times the source IP accessed the destination IP, and generating a directed weighted graph;
optimizing a preset recognition model according to the directed weighted graph, and taking the output of the optimized preset recognition model as the score of each node, the score representing how frequently the node visits and is visited by other nodes; wherein optimizing the preset recognition model according to the directed weighted graph comprises: initializing all nodes in the directed weighted graph to obtain an initialization score for each node, and constructing an initialization matrix containing all of the initialization scores; updating the initialization matrix according to the weight of each directed edge pointing to a destination node, the out-degree of the source node at which that directed edge starts, and the initialization score of each source node connected to each destination node; and if the updated matrix is judged to satisfy the preset termination condition, normalizing the final scores of all nodes except the background node gd;
and acquiring the target nodes whose scores are greater than a preset threshold, acquiring the associated target nodes of all target nodes according to the directed weighted graph, and determining the information assets to be identified that correspond to all target nodes and associated target nodes as target information assets.
2. The method of identifying a target information asset as claimed in claim 1, wherein said preset termination condition comprises:
the difference between the total score of the nodes in the matrix after the u-th update and that in the matrix after the (u-1)-th update is smaller than a preset error threshold;
or the number of updates reaches a preset count threshold.
3. The method for identifying a target information asset according to any one of claims 1 to 2, wherein obtaining the associated target nodes of all target nodes according to the directed weighted graph comprises:
traversing each target node serving as a source node in the directed weighted graph to obtain the out-degree nodes of that target node; traversing each target node serving as a destination node in the directed weighted graph to obtain the in-degree nodes of that target node;
and taking all out-degree nodes and all in-degree nodes, except the background node gd, as the associated target nodes.
4. The method of identifying a target information asset as recited in claim 3, further comprising:
when the out-degree nodes and in-degree nodes are obtained, also obtaining the weights of their corresponding edges, and using all out-degree nodes and in-degree nodes except the background node gd, together with the weights of their corresponding edges, as the associated data of the associated target nodes.
5. An apparatus for identifying a target information asset, comprising:
an acquiring unit, configured to acquire a traffic log of an information asset to be identified;
a generating unit, configured to extract, from the traffic log, a source IP, a destination IP, and the number of times the source IP accessed the destination IP, and to generate a directed weighted graph;
an optimizing unit, configured to optimize a preset recognition model according to the directed weighted graph and to take the output of the optimized preset recognition model as the score of each node, the score representing how frequently the node visits and is visited by other nodes; wherein the optimizing unit is specifically configured to: initialize all nodes in the directed weighted graph to obtain an initialization score for each node, and construct an initialization matrix containing all of the initialization scores; update the initialization matrix according to the weight of each directed edge pointing to a destination node, the out-degree of the source node at which that directed edge starts, and the initialization score of each source node connected to each destination node; and, if the updated matrix is judged to satisfy the preset termination condition, normalize the final scores of all nodes except the background node gd;
and an identifying unit, configured to acquire the target nodes whose scores are greater than a preset threshold, acquire the associated target nodes of all target nodes according to the directed weighted graph, and determine the information assets to be identified that correspond to all target nodes and associated target nodes as target information assets.
6. The apparatus for identifying a target information asset as claimed in claim 5, wherein said preset termination condition comprises:
the difference between the total score of the nodes in the matrix after the u-th update and that in the matrix after the (u-1)-th update is smaller than a preset error threshold;
or the number of updates reaches a preset count threshold.
7. The apparatus for identifying a target information asset as claimed in any one of claims 5 to 6, wherein said identifying unit is specifically configured to:
traverse each target node serving as a source node in the directed weighted graph to obtain the out-degree nodes of that target node; traverse each target node serving as a destination node in the directed weighted graph to obtain the in-degree nodes of that target node;
and take all out-degree nodes and all in-degree nodes, except the background node gd, as the associated target nodes.
8. The apparatus for identifying a target information asset as recited in claim 7, wherein said apparatus is further configured to:
when the out-degree nodes and in-degree nodes are obtained, also obtain the weights of their corresponding edges, and use all out-degree nodes and in-degree nodes except the background node gd, together with the weights of their corresponding edges, as the associated data of the associated target nodes.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 4 are implemented when the processor executes the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910404921.7A CN110166289B (en) | 2019-05-15 | 2019-05-15 | Method and device for identifying target information assets |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110166289A CN110166289A (en) | 2019-08-23 |
CN110166289B true CN110166289B (en) | 2022-07-05 |
Family
ID=67634801
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910404921.7A Active CN110166289B (en) | 2019-05-15 | 2019-05-15 | Method and device for identifying target information assets |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110166289B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112583610B (en) * | 2019-09-27 | 2023-04-11 | 顺丰科技有限公司 | System state prediction method, system state prediction device, server and storage medium |
CN113032607B (en) * | 2019-12-09 | 2024-07-02 | 深圳云天励飞技术有限公司 | Critical personnel analysis method, device, electronic equipment and storage medium |
CN112929216A (en) * | 2021-02-05 | 2021-06-08 | 深信服科技股份有限公司 | Asset management method, device, equipment and readable storage medium |
CN113158001B (en) * | 2021-03-25 | 2024-05-14 | 深圳市联软科技股份有限公司 | Network space IP asset attribution and correlation discrimination method and system |
CN113904921B (en) * | 2021-10-21 | 2024-04-30 | 上海观安信息技术股份有限公司 | Dynamic network topology graph generation method, system, processing equipment and storage medium based on log and graph |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8595262B1 (en) * | 2012-03-29 | 2013-11-26 | Amazon Technologies, Inc. | Resource resolution in computing environments using directed graphs |
EP2757505A1 (en) * | 2013-01-22 | 2014-07-23 | Skymedia Srl | Method and corresponding computer implemented system for the retrieval of documents from a management and classification system with relative and absolute importance weighting of the document sources |
CN107977340A (en) * | 2017-12-27 | 2018-05-01 | 邵美 | A kind of importance ranking method of block chain trade network node |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10833954B2 (en) * | 2014-11-19 | 2020-11-10 | Battelle Memorial Institute | Extracting dependencies between network assets using deep learning |
CA2933669A1 (en) * | 2015-06-23 | 2016-12-23 | Above Security Inc. | Method and system for detecting and identifying assets on a computer network |
Also Published As
Publication number | Publication date |
---|---|
CN110166289A (en) | 2019-08-23 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110166289B (en) | Method and device for identifying target information assets | |
EP3588279B1 (en) | Automated extraction of rules embedded in software application code using machine learning | |
WO2019144066A1 (en) | Systems and methods for preparing data for use by machine learning algorithms | |
CN109818961B (en) | Network intrusion detection method, device and equipment | |
CN111726248A (en) | Alarm root cause positioning method and device | |
CN111199474A (en) | Risk prediction method and device based on network diagram data of two parties and electronic equipment | |
CN113505936A (en) | Project approval result prediction method, device, equipment and storage medium | |
CN110798467A (en) | Target object identification method and device, computer equipment and storage medium | |
CN110209929B (en) | Resume recommendation method and device, computer equipment and storage medium | |
CN107368526A (en) | A kind of data processing method and device | |
CN111368096A (en) | Knowledge graph-based information analysis method, device, equipment and storage medium | |
CN111695979A (en) | Method, device and equipment for analyzing relation between raw material and finished product | |
CN113962712A (en) | Method for predicting fraud gangs and related equipment | |
CN114329455B (en) | User abnormal behavior detection method and device based on heterogeneous graph embedding | |
CN111950237B (en) | Sentence rewriting method, sentence rewriting device and electronic equipment | |
CN114139636B (en) | Abnormal operation processing method and device | |
CN111353860A (en) | Product information pushing method and system | |
CN115422000A (en) | Abnormal log processing method and device | |
CN114092057A (en) | Project model construction method and device, terminal equipment and storage medium | |
CN114781517A (en) | Risk identification method and device and terminal equipment | |
CN116432633A (en) | Address error correction method, device, computer equipment and readable medium | |
CN109919811B (en) | Insurance agent culture scheme generation method based on big data and related equipment | |
CN109189833B (en) | Knowledge base mining method and device | |
CN115080732A (en) | Complaint work order processing method and device, electronic equipment and storage medium | |
CN111353871A (en) | Risk prediction method and device based on network diagram data of two parties and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: Room 332, 3/F, Building 102, 28 Xinjiekouwai Street, Xicheng District, Beijing 100088. Applicant after: QAX Technology Group Inc. Address before: 100015 15, 17 floor 1701-26, 3 building, 10 Jiuxianqiao Road, Chaoyang District, Beijing. Applicant before: BEIJING QIANXIN TECHNOLOGY Co.,Ltd. |
GR01 | Patent grant | |