CN116488847B

CN116488847B - Network information tracing method, device, equipment and medium

Info

Publication number: CN116488847B
Application number: CN202310172941.2A
Authority: CN
Inventors: 王震; 侯东鹏; 高超; 李向华; 李学龙
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2023-02-27
Filing date: 2023-02-27
Publication date: 2024-01-30
Anticipated expiration: 2043-02-27
Also published as: CN116488847A

Abstract

The invention discloses a network information tracing method, device, equipment and medium, belonging to the field of cyberspace security. The method addresses the high computational cost of existing propagation source localization algorithms and the low accuracy of localization based on information from a limited number of observation points. The method comprises the following steps: selecting a special node from a graph network, based on the influence of user nodes, as a first observation point of the graph network; selecting one neighbor node from the neighbors of each order, up to order N, and adding the selected neighbor nodes and the first observation point to an observation point set; determining, for each user node in the graph network, the sum of its shortest distances to the receiving observation points; deleting the user nodes whose shortest-distance sum is greater than a distance threshold; determining the remaining user nodes in the graph network as high-probability propagation source candidate points; evaluating the high-probability propagation source candidate points with a penalty-driven optimal source estimator; and determining the user node with the minimum penalty score as the predicted propagation source.

Description

Network information tracing method, device, equipment and medium

Technical Field

The invention belongs to the field of cyberspace security, and particularly relates to a network information tracing method, device, equipment and medium.

Background

Information tracing, also called propagation source localization, refers to tracing the source of information that spreads in the real world, such as rumors, biological diseases and computer viruses, so that the spread can be controlled. As contact between people grows ever closer, risky propagation processes endanger social stability. The world population has reached 8 billion, and there are more than 5 billion network users. In a network environment of this scale the harm of risk propagation is self-evident: it can split communities apart and cause social panic, seriously disrupting people's production and daily life, social stability and national governance.

At present, propagation source localization algorithms based on network science fall into three major categories: methods based on global information, methods based on local information, and methods based on sensor observation (sensors are also called observation points, monitors or sentinels). Global-information methods need the infection information of all nodes in the network and infer the propagation source from it. Local-information methods infer the propagation source from the infection information of some of the nodes in the network. Observation-point methods deploy a certain number of observation points in the network in advance; the observation points monitor the network by capturing infection information, and the propagation source is finally inferred from the information they record.

In 2020, Paluch et al. compared the three types of methods (global-information-based, local-information-based and observation-point-based) and showed that localization based on observation points is the most efficient. In 2012, Pinto et al. first proposed a Gaussian estimator based on a BFS (Breadth First Search) spanning tree to estimate the propagation source. However, this estimator assumes that information propagates along shortest paths; when the infection rate is low, propagation is relatively random rather than along shortest paths, so the method is inapplicable in some scenarios. Yang et al. therefore enhanced the predictive power of the Gaussian estimator by additionally considering the propagation direction information recorded by the observation points, and Paluch et al. considered the effect of multipath diffusion on the basis of the Gaussian estimator. In addition, the influence of community structure on propagation has recently received attention: Wang et al. overcame the influence of community coupling by detecting the community structure of the network, and thereby addressed multi-source localization. However, these methods rely on a multivariate Gaussian distribution and therefore have high complexity, so they cannot be applied flexibly and efficiently to large-scale networks.

The above analysis shows that existing propagation source localization algorithms have a high computational cost, and that their accuracy is low because the localization information is obtained from a limited number of observation points.

Disclosure of Invention

The embodiment of the invention provides a network information tracing method, device, equipment and medium, which can solve the problems of existing propagation source localization algorithms, namely a high computational cost and low accuracy caused by localization information being obtained from a limited number of observation points.

The embodiment of the invention provides a network information tracing method, which comprises the following steps:

selecting a special node from a graph network, based on the influence of user nodes, as a first observation point of the graph network, determining the N-order neighbors corresponding to the first observation point from the graph network, selecting, in turn, an arbitrary neighbor node from the neighbors of each order, and adding the selected neighbor nodes and the first observation point to an observation point set, wherein N is a positive integer greater than 1;

when information propagates, determining each first observation point in the observation point set that receives the propagated information as a receiving observation point;

determining, for each user node in the graph network, the sum of its shortest distances to the receiving observation points, deleting the user nodes whose shortest-distance sum is greater than a distance threshold, determining the remaining user nodes in the graph network as high-probability propagation source candidate points, evaluating the high-probability propagation source candidate points with a penalty-driven optimal source estimator, and determining the user node with the minimum penalty score as the predicted propagation source.

Preferably, determining the N-order neighbors corresponding to the first observation point, selecting, in turn, an arbitrary neighbor node from the neighbors of each order, and adding the selected neighbor nodes and the first observation point to the observation point set specifically includes:

sequentially selecting the first-order neighbors, the second-order neighbors, and so on up to the N-order neighbors of the first observation point in the graph network, arbitrarily selecting a first-order neighbor node from the first-order neighbors, a second-order neighbor node from the second-order neighbors, and an N-order neighbor node from the N-order neighbors, and adding the first-order neighbor node, the second-order neighbor node and the N-order neighbor node to the observation point set as first observation points;

the set of viewpoints is determined by the following formula:

and the set of viewpoints meets the specification RS (G, O):

where e(o) is the eccentricity of the first observation point o included in the observation point set O; V is the set of user nodes in the graph network G; |V| is the number of user nodes; d(v_i, o) denotes the shortest distance from user node v_i to a first observation point o in the observation point set O; Ω(|V|²) denotes a complexity that grows quadratically with the number of user nodes |V|; Φ(O) denotes the constraint on the time and space of the acquisition process of the observation point set O; and s.t. denotes a constraint condition.

Preferably, after determining the remaining user nodes in the graph network as high probability propagation source candidate points, the method further includes:

when the first-order neighbor nodes of the receiving observation point include any first observation point in the observation point set, dividing the first-order neighbor nodes of the receiving observation point into transfer nodes, non-transfer nodes and first observation points, and deleting the edges between the receiving observation point and the non-transfer nodes; or

when the first-order neighbor nodes of the receiving observation point include only user nodes, dividing the first-order neighbor nodes of the receiving observation point into transfer nodes and non-transfer nodes, and deleting the edges between the receiving observation point and the non-transfer nodes; wherein a transfer node is a node that transfers virus information to the receiving observation point.

Preferably, the high probability propagation source candidate point is determined by the following formula:

where V_θ denotes the set of high-probability propagation source candidate points, d(v_i, o) denotes the shortest distance from user node v_i to a first observation point o in the observation point set O, and O_Y denotes the set of receiving observation points.

Preferably, the penalty driven optimal source estimator is as follows:

wherein,

d_{o,v} = d(v_{ψ(o)}, v) + 1

where the left-hand side of the estimator denotes the predicted propagation source; O_Y denotes the set of receiving observation points; O_N denotes the set of non-receiving observation points; the two coefficients of the estimator are a reward coefficient and a penalty coefficient, respectively; Γ(v) denotes the neighbor set of user node v; t_o denotes the normalized observation time of receiving observation point o; v_{ψ(o)} denotes the user node from which receiving observation point o received the propagated information; d_{o,v} denotes the inferred potential true propagation path distance from user node v to receiving observation point o; and rp(v) denotes the reward and penalty mechanism of user node v.

Preferably, the normalized observation time of the reception observation point o is expressed by the following formula:

where the time term in the formula denotes the earliest time at which an observation point recorded the propagation, and ω⁺, ω and ω⁻ denote the penalty terms of the observation time normalization, whose penalty strengths are ordered as ω⁺ > ω > ω⁻.

Preferably, the special node comprises any one of a central node and a leaf node, and the special node is determined by the following formula:

where the quantity being maximized or minimized is the number of neighbors of user node v_i, argmax selects the central node of the graph network G, argmin selects a leaf node of the graph network G, o denotes the special node, and O denotes the observation point set.

The embodiment of the invention provides a network information tracing device, which comprises:

an adding unit, configured to select a special node from a graph network, based on the influence of user nodes, as a first observation point of the graph network, determine the N-order neighbors corresponding to the first observation point from the graph network, select, in turn, an arbitrary neighbor node from the neighbors of each order, and add the selected neighbor nodes and the first observation point to an observation point set, wherein N is a positive integer greater than 1;

a first determining unit, configured to determine, when information propagates, each first observation point in the observation point set that receives the propagated information as a receiving observation point;

and a second determining unit, configured to determine, for each user node in the graph network, the sum of its shortest distances to the receiving observation points, delete the user nodes whose shortest-distance sum is greater than a distance threshold, determine the remaining user nodes in the graph network as high-probability propagation source candidate points, evaluate the high-probability propagation source candidate points with a penalty-driven optimal source estimator, and determine the user node with the minimum penalty score as the predicted propagation source.

The embodiment of the invention provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the network information tracing method according to any one of the above.

An embodiment of the present invention provides a computer readable storage medium storing a computer program, where the computer program when executed by a processor causes the processor to execute any one of the network information tracing methods described above.

The embodiment of the invention provides a network information tracing method, device, equipment and medium. The method comprises the following steps: selecting a special node from a graph network, based on the influence of user nodes, as a first observation point of the graph network; determining the N-order neighbors corresponding to the first observation point from the graph network; selecting, in turn, an arbitrary neighbor node from the neighbors of each order and adding the selected neighbor nodes and the first observation point to an observation point set, wherein N is a positive integer greater than 1; when information propagates, determining each first observation point in the observation point set that receives the propagated information as a receiving observation point; determining, for each user node in the graph network, the sum of its shortest distances to the receiving observation points; deleting the user nodes whose shortest-distance sum is greater than a distance threshold; determining the remaining user nodes in the graph network as high-probability propagation source candidate points; evaluating the high-probability propagation source candidate points with a penalty-driven optimal source estimator; and determining the user node with the minimum penalty score as the predicted propagation source. In this method, observation points are deployed in the graph network through a random full-order neighbor observation point selection strategy built on a greedy strategy, so that an observation point set capable of capturing global information can be deployed at low cost in practice. Further, a small-range propagation source inference strategy based on early response locks onto the early infection area, so that a small area is located quickly and unnecessary overhead is reduced. The method can solve the problem of low localization accuracy caused by a lack of early information, and can achieve flexible localization in networks with millions of nodes.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flow chart of a network information tracing method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a network information tracing method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the complete technical framework of a network information tracing method according to an embodiment of the present invention;

Fig. 4 is a schematic diagram showing the difference between the network information tracing method according to the embodiment of the present invention and other recent comparison methods;

Fig. 5 is a schematic diagram of the accuracy evaluation of the network information tracing method according to the embodiment of the present invention and other comparison methods on six real networks;

Fig. 6 is a schematic diagram of the start-up opportunity evaluation of the network information tracing method according to the embodiment of the present invention and other comparison methods on six real networks;

Fig. 7 is a schematic diagram of the CPU time consumption evaluation of the network information tracing method according to the embodiment of the present invention and other comparison methods on six real networks;

Fig. 8 is a schematic structural diagram of a network information tracing device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The network information tracing method provided by the embodiment of the present invention is described in detail below with reference to Fig. 1, a schematic flow chart of the method, and Fig. 2, a schematic diagram of the method. As shown in Fig. 1, the method includes the following steps:

Step 101, selecting a special node from a graph network, based on the influence of user nodes, as a first observation point of the graph network, determining the N-order neighbors corresponding to the first observation point from the graph network, selecting, in turn, an arbitrary neighbor node from the neighbors of each order, and adding the selected neighbor nodes and the first observation point to an observation point set, wherein N is a positive integer greater than 1;

Step 102, when information propagates, determining each first observation point in the observation point set that receives the propagated information as a receiving observation point;

Step 103, determining, for each user node in the graph network, the sum of its shortest distances to the receiving observation points, deleting the user nodes whose shortest-distance sum is greater than a distance threshold, determining the remaining user nodes in the graph network as high-probability propagation source candidate points, evaluating the high-probability propagation source candidate points with a penalty-driven optimal source estimator, and determining the user node with the minimum penalty score as the predicted propagation source.

The network information tracing method provided by the embodiment of the invention can be applied to at least various propagation source tracing scenarios, such as tracing the source of a rumor or tracing the source of a disease.

It should be noted that, as shown in Fig. 2, the network information tracing method provided in the embodiment of the present invention may be divided into an observation point deployment process and a fast preprocessing process. Specifically, step 101 mainly describes the observation point deployment process, and steps 102 and 103 mainly describe the fast preprocessing process.

Before describing step 101, the construction of the graph network is introduced. Specifically, the user relationships of the target area that needs to be monitored, and in which harmful information needs to be located, are established, as shown in formula (1):

γ = {<v_i, v_j> | i ≠ j, |γ| = z}    (1)

where <v_i, v_j> indicates that two different user nodes v_i and v_j are friends, and z denotes the number of tuples. After the user relationships are input, they are mapped into the graph network G, as shown in formula (2):

G = (V = {v₁, v₂, v₃, ...}, E = {<v_i, v_j> | i ≠ j, |γ| = z})    (2)

where V denotes the user node set, corresponding to the user nodes v₁, v₂, v₃, ... of the target area; E denotes the set of z edges; and an edge <v_i, v_j> indicates that the two user nodes v_i and v_j know each other in the social network.

Further, all user nodes included in the graph network are initialized to the state in which no harmful information has been received, recorded as State(v) = 0. In the embodiment of the present invention, State(·) = 0 indicates that the node given as input has not received harmful information, and State(·) = 1 indicates that the node given as input has received harmful information.
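The construction described by formulas (1) and (2) and the state initialization can be sketched as follows. This is a minimal illustration, assuming an undirected friendship graph stored as an adjacency map; the names `build_graph`, `init_state` and the sample node labels are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def build_graph(relations):
    """Map user relations <v_i, v_j> (i != j) to an undirected adjacency map."""
    adj = defaultdict(set)
    for u, v in relations:
        if u == v:  # formula (1) only admits pairs of different users
            continue
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def init_state(adj):
    """State(v) = 0 for every user node: no harmful information received yet."""
    return {v: 0 for v in adj}


# Hypothetical relation set gamma with z = 4 tuples.
relations = [("v1", "v2"), ("v2", "v3"), ("v3", "v1"), ("v3", "v4")]
G = build_graph(relations)
state = init_state(G)
```

A set-valued adjacency map is used here so that duplicate relations collapse to a single edge; any other graph representation would serve equally well.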

In step 101, after the graph network containing the user relationships is constructed, observation points need to be deployed in the graph network. In the embodiment of the present invention, the role of an observation point is to record the time at which it receives propagated information (a virus, a rumor) when such information spreads, so that source tracing can be performed directly from the times at which the observation points received the information. The observation point set is denoted by O.

Specifically, for each user node in the graph network, the edges incident to it are counted, and the number of these edges is taken as the number of neighbors of the user node; in practical applications this number is also called the degree of the user node. For example, as shown in Fig. 2 (a) I, user node V₅ has 5 neighbors, so the degree of V₅ is 5; user node V₃ has only 1 neighbor, so the degree of V₃ is 1.

Further, the influence of each user node can be determined from its degree, and the user nodes are sorted by influence to obtain the most influential and the least influential user nodes in the graph network. For example, as shown in Fig. 2 (a) I, the degree of user node V₅ is 5, so its influence is also 5; the degree of user node V₃ is 1, so its influence is 1. After the user nodes are sorted by influence, V₅ is the user node with the greatest degree, and therefore the greatest influence, in the network, and V₃ is the node with the least influence.

Further, the special node in the graph network is determined by the following formula (3):

where the quantity being maximized or minimized is the number of neighbors of user node v_i; argmax selects the central node of the graph network G; argmin selects a leaf node of the graph network G; and o denotes the special node. The central node is the user node with the greatest degree in the graph network and, correspondingly, a leaf node is the user node with the least degree. In practical applications the special node may be either the central node or a leaf node; the embodiment of the invention does not limit the selection of the special node.
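The selection of the special node by degree can be sketched as below. This is an illustrative reading of the description (influence equals degree, central node is the argmax, leaf node is the argmin); the function name `select_special_node` and the edge list are hypothetical, chosen only so that V5 has degree 5 and V3 has degree 1 as in the example above.

```python
def select_special_node(adj, kind="center"):
    """Return the node of greatest degree ("center") or least degree ("leaf")."""
    if kind == "center":
        return max(adj, key=lambda v: len(adj[v]))
    if kind == "leaf":
        return min(adj, key=lambda v: len(adj[v]))
    raise ValueError("kind must be 'center' or 'leaf'")


# Hypothetical edges, not the actual edges of Fig. 2.
edges = [("V5", "V3"), ("V5", "V6"), ("V5", "V9"), ("V5", "V4"),
         ("V5", "V10"), ("V6", "V2"), ("V9", "V8")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

center = select_special_node(adj, "center")
leaf = select_special_node(adj, "leaf")
```

When several nodes share the extreme degree, `max` and `min` return the first one encountered; the description does not say how ties should be broken, so this choice is an assumption.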

After the special node is determined, the full-order neighbor observation point set O of the special node o is deployed based on the idea of random full-order neighbor selection. Specifically, the special node determined in the graph network is taken as a first observation point, and the first-order neighbors, second-order neighbors, and so on up to the N-order neighbors of the first observation point are then selected in turn in the graph network.

After the neighbors of each order of the first observation point are determined, one neighbor node is arbitrarily selected from the neighbors of each order and added to the observation point set. Specifically, one neighbor node is arbitrarily selected from the first-order neighbors as the first-order neighbor node, one from the second-order neighbors as the second-order neighbor node, and so on, up to one from the N-order neighbors as the N-order neighbor node; the selected first-order, second-order, ..., N-order neighbor nodes are then added to the observation point set as first observation points. Note that the observation point set therefore contains not only the selected neighbor nodes but also the special node initially selected from the graph network, i.e., the first observation point.

In the embodiment of the invention, the observation point set is selected based on random full-order neighbors and is determined by formula (4):

Further, to ensure that the first observation points in the observation point set can capture the propagation source at an early stage, the embodiment of the invention uses a greedy algorithm to select the deployed observation point set, and the observation point set meets the specification RS(G, O). Because the huge overhead of a global greedy strategy makes greedy deployment of observation points difficult in a large-scale network, the deployed observation point set O is additionally subject to a constraint of the form Φ(O) ≤ Ω(|V|²); GS(G, O) is shown in formula (5):

In formulas (4) and (5), e(o) is the eccentricity of the first observation point o included in the observation point set O, i.e., the maximum of the shortest distances from o to the other user nodes, which is the furthest reachable distance. Thus, starting from the central node or a leaf node, at least one complete observation chain reaches the part of the network furthest from that node. V is the set of user nodes in the graph network G, and |V| is the number of user nodes. The deployed observation point set O is screened under the constraint Φ(O) ≤ Ω(|V|²), where Ω(|V|²) denotes a complexity that grows quadratically with the number of user nodes |V| and Φ(O) denotes the time and space constraint on the acquisition of the observation point set O; the constraint therefore states that the time and space complexity of screening the observation point set O in the graph network G grows no faster than quadratically in |V|. s.t. denotes a constraint condition, and d(v_i, o) denotes the shortest distance from node v_i to a first observation point o in the set O.
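The eccentricity e(o) defined above, the largest shortest distance from o to any other user node, can be computed with a breadth-first search. The sketch below assumes an unweighted graph and measures distance in hops; the helper names `bfs_distances` and `eccentricity` and the path graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def eccentricity(adj, o):
    """e(o): the maximum shortest distance from o to any reachable node."""
    return max(bfs_distances(adj, o).values())


# Hypothetical path graph a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
ecc_a = eccentricity(adj, "a")
ecc_b = eccentricity(adj, "b")
```

In a disconnected graph this sketch only considers the component containing o, which is an assumption the description does not address.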

In practical applications, a first-order neighbor is a node whose shortest path from the first observation point takes one step, a second-order neighbor is a node whose shortest path takes two steps, and an N-order neighbor is a node whose shortest path takes N steps, where N is a positive integer greater than or equal to 1. The full order described in the above embodiment runs from order 1 up to the maximum order, i.e., order e(o). For the random full-order neighbor selection strategy, as shown in Fig. 2 (a) I, if user node V₅ is selected as the special node, the first-order neighbors of V₅ include the user nodes V₃, V₆, V₉, V₄ and V₁₀; similarly, the second-order neighbors include the user nodes V₂, V₈, V₁₁, V₁₂, etc.; and the third-order neighbors include the user nodes V₁, V₇, V₁₄, etc. A first-order, a second-order and a third-order neighbor node are then selected at random: any one node is selected from {V₃, V₆, V₉, V₄, V₁₀} as the first-order neighbor node, which in this embodiment is V₆; any one node is selected from the second-order neighbors {V₂, V₈, V₁₁, V₁₂} as the second-order neighbor node; and any one node is selected from the third-order neighbors {V₁, V₇, V₁₄} as the third-order neighbor node. The selected neighbor nodes are then added to the observation point set as first observation points.

It should be noted that, in the above embodiment, each of the neighbor nodes included in the first-order neighbors {V₃, V₆, V₉, V₄, V₁₀} may be added to the observation point set as the first-order neighbor node and, correspondingly, each of the neighbor nodes included in the second-order neighbors {V₂, V₈, V₁₁, V₁₂} may be added to the observation point set as the second-order neighbor node. Adding a random selection strategy on top of the greedy idea in this way is the key to reducing the complexity of the greedy algorithm.
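One round of random full-order neighbor selection can be sketched as follows: group the nodes by hop count from the special node, then pick one node at random from each group. This is an illustrative sketch, not the patented implementation; `neighbors_by_order` and `deploy_once` are hypothetical names, and the edge list is invented so that the neighbor orders of V5 match the sets listed above (the real edges of Fig. 2 are not given in the text).

```python
import random
from collections import deque


def neighbors_by_order(adj, o):
    """Group nodes by hop count from o: {1: [...], 2: [...], ..., e(o): [...]}."""
    dist = {o: 0}
    levels = {}
    queue = deque([o])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                levels.setdefault(dist[w], []).append(w)
                queue.append(w)
    return levels


def deploy_once(adj, o, rng):
    """Special node o plus one randomly chosen neighbor of every order."""
    observers = {o}
    for nodes in neighbors_by_order(adj, o).values():
        observers.add(rng.choice(nodes))
    return observers


# Hypothetical edges reproducing the neighbor orders of the V5 example.
edges = [("V5", "V3"), ("V5", "V6"), ("V5", "V9"), ("V5", "V4"), ("V5", "V10"),
         ("V6", "V2"), ("V9", "V8"), ("V4", "V11"), ("V10", "V12"),
         ("V2", "V1"), ("V8", "V7"), ("V11", "V14")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

levels = neighbors_by_order(adj, "V5")
observers = deploy_once(adj, "V5", random.Random(0))
```

One breadth-first search costs time linear in the number of nodes and edges, which is well within the quadratic bound the description places on the selection process.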

In the embodiment of the invention, if the number of observation points obtained by one round of observation point deployment from the first observation point does not meet the deployment requirement, that is, the obtained observation points cannot achieve coverage of the whole area and wide information capture, the selection of a special node and the selection of observation points from the full-order neighbors of that special node (the first observation point) are repeated until the number of observation points in the observation point set meets the required deployment number. The deployed observation point set then covers the whole area, captures information widely, and captures complete propagation chains as far as possible. Sufficiently dispersed sensors provide a good precondition for receiving early information, and the random selection strategy guarantees fast sensor selection in large networks.

It should be noted that the proportion of observation points deployed in the graph network may be 5%, 10%, 15%, 20%, 25% or 30%, that is, the number of first observation points in the observation point set may be 5%, 10%, 15%, 20%, 25% or 30% of the number of user nodes in the graph network; the specific proportion is not limited here.
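The repeated deployment until a target proportion is reached might look like the sketch below. The description does not say how the next special node is chosen in later rounds, so this sketch assumes, purely for illustration, that each round starts from the highest-degree node that is not yet an observation point; `deploy`, `bfs_levels` and the ring graph are hypothetical.

```python
import random
from collections import deque


def bfs_levels(adj, o):
    """Group nodes by hop count from o."""
    dist = {o: 0}
    levels = {}
    queue = deque([o])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                levels.setdefault(dist[w], []).append(w)
                queue.append(w)
    return levels


def deploy(adj, percent, rng):
    """Repeat full-order rounds until at least percent% of nodes are observers."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    target = (percent * len(adj) + 99) // 100  # integer ceiling
    observers = set()
    while len(observers) < target:
        # Assumption: restart from the highest-degree node not yet selected.
        candidates = [v for v in adj if v not in observers]
        o = max(candidates, key=lambda v: len(adj[v]))
        observers.add(o)
        for nodes in bfs_levels(adj, o).values():
            observers.add(rng.choice(nodes))
    return observers


# Hypothetical ring network with 20 user nodes.
n = 20
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
observers = deploy(adj, 30, random.Random(1))
```

Because a whole round is added at a time, the final set can exceed the target; whether to trim the surplus is left open here, as the description only requires that the deployment number be met.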

In step 102, when there is a propagation message propagating, a first observation point in the observation point set, where the propagation message is received, may be determined as a receiving observation point, and accordingly, an observation point where the propagation message is not received is determined as an not receiving observation point, further, the receiving observation point is added to the receiving observation point set, and the not receiving observation point is added to the not receiving observation point set.

Specifically, the set of receiving observation points is denoted O_Y and the set of non-receiving observation points is denoted O_N, where State(o) = 1 for o ∈ O_Y and State(o) = 0 for o ∈ O_N. In the embodiment of the invention, the tracing process can be executed once 4 receiving observation points exist.
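As a small illustration of this bookkeeping, the sketch below splits an observation point set into O_Y and O_N and checks the start condition. The function name `split_observers` and the sample node labels are assumptions for the example only.

```python
def split_observers(observers, received, min_receiving=4):
    """Return (O_Y, O_N, State, ready) for a set of observation points."""
    o_y = [o for o in observers if o in received]      # State(o) = 1
    o_n = [o for o in observers if o not in received]  # State(o) = 0
    state = {o: int(o in received) for o in observers}
    ready = len(o_y) >= min_receiving  # embodiment starts tracing at 4
    return o_y, o_n, state, ready


# Hypothetical example: five observation points, four have received the message.
o_y, o_n, state, ready = split_observers(
    ["V5", "V6", "V2", "V7", "V8"], received={"V5", "V6", "V2", "V7"}
)
```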

Further, in the embodiment of the present invention, user nodes that have a low probability of being the propagation source may be pruned, based on the receiving observation points, using a distance threshold. Specifically:

In step 103, for each user node in the graph network, the sum of its shortest distances to the receiving observation points included in the receiving observation point set is determined; the one or more user nodes whose sum of shortest distances is greater than a distance threshold are deleted; and the undeleted user nodes are determined as high-probability propagation source candidate points. In the embodiment of the invention, the high-probability propagation source candidate points can be determined by the following formula (6):

wherein V_θ represents the set of high-probability propagation source candidate points, d(v_i, o) represents the shortest distance from user node v_i to a first observation point o in the observation point set O, and O_Y represents the set of receiving observation points. The distance threshold here may be 8.
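The pruning step can be sketched as below. Formula (6) is not reproduced in this text, so the sketch follows the prose only: it assumes the sum is taken per user node over all receiving observation points and that a node is kept when the sum does not exceed the threshold. The names `bfs_distances` and `prune_candidates` and the path graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, start):
    """Hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def prune_candidates(adj, receiving, threshold):
    """Keep nodes whose summed distance to the receiving observers <= threshold."""
    per_observer = [bfs_distances(adj, o) for o in receiving]
    kept = []
    for v in adj:
        # Unreachable observers count as infinitely far (assumption).
        total = sum(d.get(v, float("inf")) for d in per_observer)
        if total <= threshold:
            kept.append(v)
    return kept


# Hypothetical path graph a-b-c-d-e with receiving observation points a and b.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
candidates = prune_candidates(adj, receiving=["a", "b"], threshold=3)
```

One breadth-first search per receiving observation point is enough, which keeps the preprocessing cheap compared with scoring every node in the estimator.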

In the embodiment of the present invention, after the high-probability propagation source candidate points in the graph network are determined, some of the edges between a receiving observation point and its first-order neighbors may be deleted, from the perspective of how the receiving observation point received the message.

Specifically, because the information received by a receiving observation point must have been propagated through a user node in its first-order neighbors, the first-order neighbors of the receiving observation point can be divided further, and the edges between the receiving observation point and those first-order neighbors that did not transfer information can be deleted. In practical application, however, the first-order neighbors of the receiving observation point may also include first observation points already added to the observation point set, so the division needs to be performed according to the actual situation:

If the first-order neighbors of the receiving observation point include a first observation point, the first-order neighbors are divided into transmitting nodes, non-transmitting nodes and first observation points, and the edges between the receiving observation point and the user nodes classified as non-transmitting nodes are deleted. If the first-order neighbors of the receiving observation point include only user nodes, the first-order neighbors are divided into transmitting nodes and non-transmitting nodes, and the edges between the receiving observation point and the user nodes classified as non-transmitting nodes are deleted.

In the embodiment of the invention, a user node that transmits the message to the receiving observation point is called a transmitting node, and a user node that does not transmit the message to the receiving observation point is called a non-transmitting node. Both transmitting nodes and non-transmitting nodes are user nodes included in the first-order neighbors of the receiving observation point.
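The edge deletion rule can be illustrated with the following sketch. It assumes an undirected graph stored as a dictionary of neighbor sets and assumes that the transmitting nodes of the receiving observation point are already known; the function name `drop_non_transmitting_edges` and the sample labels are hypothetical.

```python
def drop_non_transmitting_edges(adj, observer, transmitters, observers):
    """Return a copy of adj without edges from observer to non-transmitting users."""
    pruned = {v: set(nbrs) for v, nbrs in adj.items()}
    for nbr in adj[observer]:
        # Edges to other observation points and to transmitting nodes are kept.
        if nbr in observers or nbr in transmitters:
            continue
        pruned[observer].discard(nbr)
        pruned[nbr].discard(observer)
    return pruned


# Hypothetical star: receiving observation point "o" with neighbors a, b, s,
# where "a" transmitted the message and "s" is another observation point.
adj = {"o": {"a", "b", "s"}, "a": {"o"}, "b": {"o"}, "s": {"o"}}
pruned = drop_non_transmitting_edges(
    adj, observer="o", transmitters={"a"}, observers={"o", "s"}
)
```

The input graph is left unchanged so that the original topology remains available for other steps.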

Further, after determining the high probability propagation source candidate points, the high probability propagation source candidate points may be evaluated using a penalty driven optimal source estimator, where the penalty driven optimal source estimator is shown in equation (7):

In practical application, d_{o,v} in formula (7) represents the inferred potential true propagation path distance from user node v to receiving observation point o, and is expressed as:

$$d_{o,v} = d(v_{\psi(o)}, v) + 1 \tag{8}$$
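Formula (8) can be evaluated with a single shortest-path query, as in the sketch below. The mapping `transmitter_of`, which stands in for ψ(o), the helper names, and the four-node path are assumptions for illustration.

```python
from collections import deque


def shortest_distance(adj, source, target):
    """Hop distance between two nodes, or infinity if they are disconnected."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return float("inf")


def inferred_path_distance(adj, transmitter_of, observer, v):
    """d_{o,v} = d(v_psi(o), v) + 1, following formula (8)."""
    return shortest_distance(adj, transmitter_of[observer], v) + 1


# Hypothetical path v - x - t - o, where t delivered the message to observer o.
adj = {"v": ["x"], "x": ["v", "t"], "t": ["x", "o"], "o": ["t"]}
transmitter_of = {"o": "t"}
d_ov = inferred_path_distance(adj, transmitter_of, "o", "v")
```

The extra 1 accounts for the final hop from the transmitting node to the receiving observation point.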

The time-distance penalty term in formula (7) reflects the assumption that the actual propagation distance and the normalized observation time are strongly positively correlated: the weaker the positive correlation, the stronger the penalty. Its expression is shown in formula (9):

In formula (7), rp(v) represents the reward-and-punishment mechanism for user node v: user node v is rewarded if it is a neighbor of a receiving observation point in O_Y, and is penalized if it is a neighbor of a non-receiving observation point in O_N. The expression is shown in formula (10):
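A possible reading of this mechanism is sketched below. Formula (10) is not reproduced in this text, so the sketch is an assumption: it uses the multipliers 0.95 and 1.05 from the worked example in step S6 and applies one multiplier per adjacent observation point. The function name `reward_penalty_factor` and the sample graph are hypothetical.

```python
def reward_penalty_factor(v, adj, receiving, non_receiving,
                          reward=0.95, penalty=1.05):
    """Multiplicative score factor for candidate v (lower is better)."""
    factor = 1.0
    for o in receiving:
        if v in adj[o]:
            factor *= reward   # neighbor of a receiving observation point
    for o in non_receiving:
        if v in adj[o]:
            factor *= penalty  # neighbor of a non-receiving observation point
    return factor


# Hypothetical graph: v is adjacent to receiving o1 and non-receiving o3,
# w is adjacent only to non-receiving o3.
adj = {"o1": {"v"}, "o3": {"v", "w"}, "v": {"o1", "o3"}, "w": {"o3"}}
factor_v = reward_penalty_factor("v", adj, receiving=["o1"], non_receiving=["o3"])
factor_w = reward_penalty_factor("w", adj, receiving=["o1"], non_receiving=["o3"])
```

Whether the factor should be applied once per node or once per adjacent observation point is not stated in the text; the per-observation-point choice here is only one option.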

In formula (7), t_o represents the normalized observation time of receiving observation point o, as expressed in formula (11):

In the above formulas (7) to (11), the estimator output represents the predicted propagation source selected from the high-probability propagation source candidate points; O_Y represents the set of receiving observation points; O_N represents the set of non-receiving observation points; t_o represents the normalized observation time of receiving observation point o; v_{ψ(o)} represents the user node corresponding to the propagation source of receiving observation point o; d_{o,v} represents the inferred potential true propagation path distance from user node v to receiving observation point o; rp(v) represents the reward-and-punishment mechanism of user node v; and p and q are weight coefficients, taken as 3/4 and 2/3 respectively in the embodiment of the invention. ω⁺, ω and ω⁻ are penalty terms for the normalized observation time, with penalty strength ordered ω⁺ > ω > ω⁻; in the embodiment of the invention they are 2, 1.5 and 1 respectively. Γ(v) represents the neighbor set of user node v. The reward-and-punishment mechanism uses a reward coefficient and a penalty coefficient; in the worked example of step S6, scores are multiplied by 0.95 as a reward and by 1.05 as a penalty.

in the embodiment of the invention, the penalty-driven optimal source estimator calculates penalty scores of all the high-probability propagation source candidate points, wherein the higher the penalty score is, the smaller the probability that the high-probability propagation source candidate point becomes a propagation source is, and conversely, the lower the penalty score is, the greater the probability that the high-probability propagation source candidate point becomes the propagation source is.

In the embodiment of the invention, if the penalty-driven optimal source estimator is applied when the infection rate is low, the times observed by the sensors are relatively late. For example, when the infection rate is less than 0.1, roughly t_o = 30 time steps are needed before a susceptible node is infected with 95% probability. The figure of 30 time steps can be understood as follows: if a normal node and a receiving (infected) node are adjacent, the probability that the normal node is infected within 1 time unit is 0.1, within 2 time units it is 1 - 0.9 × 0.9, and so on; about 30 time units are therefore needed for the normal node to be infected with 95% confidence.
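The arithmetic behind the figure of about 30 time steps can be checked with a short calculation. The sketch assumes independent infection attempts with a fixed per-step rate, so that the probability of infection within n steps is 1 - (1 - rate)^n; the function names are hypothetical.

```python
import math


def infection_probability(rate, steps):
    """Probability that a neighbor is infected within the given number of steps."""
    return 1 - (1 - rate) ** steps


def steps_for_confidence(rate, confidence):
    """Smallest number of steps whose infection probability reaches confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - rate))


p1 = infection_probability(0.1, 1)        # one time unit
p2 = infection_probability(0.1, 2)        # 1 - 0.9 * 0.9
needed = steps_for_confidence(0.1, 0.95)  # steps for 95% confidence
```

For a rate of exactly 0.1 the calculation gives 29 steps, which matches the rounded figure of about 30; lower infection rates require more steps.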

Based on the above assumption, the relationship between propagation distance and observation time expressed by formula (9) hardly acts as a constraint, because d_{o,v} (the inferred potential true propagation path distance from user node v to receiving observation point o) is much smaller than t_o, the observation time of receiving observation point o that formula (11) normalizes. Consequently, only a single effective term remains after taking the max in formula (9). In practical application, t_o is an observed value that is fixed within one experiment; because d_{o,v} is much smaller than t_o, the larger d_{o,v} is, the smaller the resulting penalty. Candidates far from the sensors that observed the message earlier would therefore receive a smaller penalty than candidates close to those sensors. It can thus be determined that when t_o in formula (7) takes the form of formula (11), the positioning accuracy of formula (7) improves by about 3% when the infection rate is high, and by at least 10% when the infection rate is low.

In summary, the embodiment of the invention provides a network information tracing method that maintains wide sensor coverage at a low computational cost, strengthens the ability of the observation points to capture propagation information globally in the early propagation stage when information is scarce, and thereby supports early positioning. A fast preprocessing step screens out invalid candidates with low or zero probability, which greatly reduces the time complexity of the inference process and further ensures fast positioning of the propagation source. In addition, the inference strategy normalizes the observation time and infers and reproduces the true propagation path, which increases the positioning accuracy. The method performs the propagation source positioning task in the context of cyberspace security and can play a key role in applications such as social media rumor tracing and network virus source tracing; in particular, it helps to uncover and trace the instigators of rumors or viruses, cut off propagation paths, stop losses in time, eliminate malicious information, purify the network environment and maintain social stability. By constraining the overall complexity, the method achieves effective positioning in a large-scale network environment for the first time and solves the problem of accurate and fast positioning in networks at the million-node scale. Furthermore, by inferring the true propagation path, the method achieves accurate tracing in scenarios with complex behavior interaction for the first time and overcomes the technical bias, present in existing tracing schemes, of assuming that information propagates along the shortest path.

In order to describe the network information tracing method provided by the embodiment of the present invention more clearly, the method is described in detail below with reference to fig. 2 and fig. 3. As shown in fig. 3, the method includes 7 steps, where S1 to S3 correspond to the random full-order neighbor observation point selection strategy based on a greedy strategy in fig. 2, and S4 to S7 correspond to the small-range propagation source inference strategy based on early response in fig. 2.

S1, inputting a user relation library of a target area: the user relationship of the target area required for harmful information monitoring and positioning is input, and is specifically shown in a formula (1).

S2, constructing a graph network G = (V, E) for the target area and initializing it: after the user relationships are input, the actual relationships are mapped into the graph network G, where V is the node set corresponding to the users of the target area and E is the edge set, an edge indicating that two users know each other in the social network. All nodes are initialized to the state in which no harmful information has been received, as shown in formula (2).

S3, deploying observation points according to the graph network G: the user nodes in the graph network are sorted by influence according to their degree, and the user node with the largest influence is selected as the special node V₅ (the first observation point), the special node being determined by formula (3). Further, from the first-order, second-order and third-order neighbors of the first observation point V₅, the first-order neighbor node V₆, the second-order neighbor node V₂ and the third-order neighbor node V₇ are selected in turn and added to the observation point set as first observation points; the observation point set is selected based on random full-order neighbors and is determined by formula (4). Here user node V₅ is the first observation point in the observation point set, user node V₆ is the second, user node V₂ is the third, and user node V₇ is the fourth.

S4, the number of observation points that have received the propagation information reaches the required scale: when harmful information (public opinion, rumors, etc.) appears and the number of observation points that have received the information is 4, the tracing process can be executed. The set of receiving observation points is denoted O_Y and the set of non-receiving observation points is denoted O_N.

S5, based on the receiving observation points in O_Y, pruning based on a distance threshold is performed on candidate nodes that have a low probability of being the propagation source; specifically, V₃, which cannot be the source, is screened out as shown in formula (6). Here the distance threshold is taken as 8. The undeleted user nodes are determined to be high-probability propagation source candidate points.

S6, using the penalty-driven optimal source estimator to evaluate the penalty scores of the high-probability propagation source candidate points (V₄, V₉, V₁₂, V₁₁, V₁₀): the penalty-driven optimal source estimator calculates the penalty score of every candidate node, and the higher the penalty score, the smaller the probability of being the propagation source. The definition of the penalty-driven optimal source estimator is shown in formula (7). Taking user node V₁₀ in fig. 2 as an example: 2+1 represents the distances from V₁₀ to the second observation point o2 and to the first observation point o1, respectively. The first observation point o1 and the second observation point o2 reward their immediate neighbors by multiplying their scores by 0.95, while the third observation point o3 and the fourth observation point o4 penalize their immediate neighbors by multiplying their scores by 1.05. The user node with the lowest penalty score, V₁₀, is output as the predicted propagation source.

S7, outputting a node corresponding to the lowest penalty score as a propagation source;

fig. 4 is a schematic diagram of the difference between the network information tracing method and other latest comparison methods according to the embodiment of the present invention: the embodiment of the invention starts to execute the tracing strategy when the propagation time step T=2, and the comparison algorithm starts to execute the tracing strategy when T=6.

The network information tracing method provided by the embodiment of the invention has a plurality of positive effects, and has great advantages compared with the prior art, and the following is described with reference to data, charts and the like of a test process.

Table 1 Scale of the test datasets

Table 1 shows the network configuration information used for the test. G1-G6 are all real social network datasets; in particular, network scales such as those of G5 and G6 have not appeared in the evaluation of state-of-the-art propagation source positioning algorithms.

Fig. 5 is an accuracy evaluation schematic diagram of the network information tracing method and other comparison methods on six real networks according to an embodiment of the present invention: the individual transmission rates in group (a) follow the uniform distribution U(0.1, 0.3), those in group (b) follow the uniform distribution U(0.3, 0.7), and those in group (c) follow the uniform distribution U(0.7, 0.99). The abscissa represents the deployment ratio of the observation points; ratios of 5%, 10%, 15%, 20%, 25% and 30% are used in the embodiments of the present invention. The "F-score" on the ordinate represents the prediction accuracy, averaged over 500 independent experiments. The higher the average accuracy, the better the algorithm predicts the real source. It can be seen from fig. 5 that the greedy coverage-based rapid propagation source positioning method (GRSL) proposed by the present invention is superior to the other methods on all networks, and its average positioning accuracy is about 10% higher than that of the best comparison algorithm.

Fig. 6 is a schematic diagram of start-up time evaluation of the network information tracing method and other comparison methods on six real networks according to an embodiment of the present invention: each method performs positioning at successive time steps, and the time step corresponding to its best prediction accuracy is recorded. The abscissa represents the deployment ratio of the observation points; ratios of 5%, 10%, 15%, 20%, 25% and 30% are used in the embodiments of the present invention. The ordinate represents the start time step; the earlier the time step, the earlier the algorithm can respond and the better the spread of malicious information can be restrained. It can be seen from fig. 6 that the greedy coverage-based rapid propagation source positioning method (GRSL) proposed by the present invention responds as early as the GFNL algorithm and 50%-60% earlier than the other comparison algorithms.

Fig. 7 is a schematic diagram of CPU time consumption evaluation of the network information tracing method and other comparison methods on six real networks according to an embodiment of the present invention. The abscissa represents the deployment ratio of the observation points; ratios of 5%, 10%, 15%, 20%, 25% and 30% are used in the embodiments of the present invention. The ordinate is the logarithm of the time each method needs to complete the source positioning strategy. The shorter the execution time, the lower the time complexity of the algorithm, the faster positioning and result prediction can be achieved, and the greater the practical value. It can be seen from fig. 7 that the greedy coverage-based rapid propagation source positioning method (GRSL) proposed by the present invention is about 100 times faster than the fastest comparison algorithm.

Based on the same inventive concept, the embodiment of the invention provides a network information tracing device, and because the principle of the device for solving the technical problem is similar to that of a network information tracing method, the implementation of the device can refer to the implementation of the method, and the repetition is omitted.

As shown in fig. 8, the apparatus includes an adding unit 201, a first determining unit 202, and a second determining unit 203.

An adding unit 201, configured to select a special node from a graph network based on the influence of user nodes as a first observation point of the graph network, determine the N-order neighbors corresponding to the first observation point from the graph network, and in turn arbitrarily select one neighbor node from each N-order neighbor and add it, together with the first observation point, to an observation point set, where N is a positive integer greater than 1;

a first determining unit 202, configured to determine, when information is propagating, each first observation point in the observation point set that has received the propagation information as a receiving observation point;

a second determining unit 203, configured to determine the shortest distance sum of each receiving observation point and each user node in the graph network, delete the one or more user nodes whose shortest distance sum is greater than a distance threshold, determine the remaining user nodes in the graph network as high-probability propagation source candidate points, evaluate the high-probability propagation source candidate points based on a penalty-driven optimal source estimator, and determine the user node with the minimum penalty score as the predicted propagation source.

Preferably, the adding unit 201 is specifically configured to:

the set of viewpoints is determined by the following formula:

and the set of viewpoints meets the specification RS (G, O):

Preferably, the second determining unit 203 is further configured to:

Preferably, the second determining unit 203 is specifically configured to:

wherein V_θ represents the set of high-probability propagation source candidate points, d(v_i, o) represents the shortest distance from user node v_i to a first observation point o in the observation point set O, and O_Y represents the set of receiving observation points.

Preferably, the second determining unit 203 is specifically configured to:

wherein,

$$d_{o,v} = d(v_{\psi(o)}, v) + 1$$

Preferably, the second determining unit 203 is specifically configured to:

Preferably, the adding unit 201 is specifically configured to:

wherein the quantity being maximized or minimized is the number of neighbors of user node v_i; argmax selects the central node in the graph network G; argmin selects a leaf node in the graph network G; o represents the special node; and O represents the observation point set.

It should be understood that the units included in the network information tracing device are only a logical division according to the functions implemented by the device; in practical application, the units may be combined or split. The functions implemented by the network information tracing device provided by this embodiment correspond one to one with the network information tracing method provided by the above embodiment. The more detailed processing flow implemented by the device has been described in the above method embodiment and is not repeated here.

Another embodiment of the present invention also provides a computer apparatus, including a processor and a memory. The memory is used for storing computer program code, and the computer program code includes computer instructions. When the processor executes the computer instructions, the computer apparatus executes each step of the network information tracing method in the method flow shown in the method embodiment.

In another embodiment of the present invention, a computer readable storage medium is provided, where computer instructions are stored, and when the computer instructions run on a computer device, the computer device is caused to execute the steps of the network information tracing method in the method flow shown in the method embodiment.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. The network information tracing method is characterized by comprising the following steps of:

determining the shortest distance sum of each receiving observation point and each user node in the graph network, deleting one or more user nodes with the shortest distance sum being larger than a distance threshold, determining the rest user nodes in the graph network as high-probability propagation source candidate points, evaluating the high-probability propagation source candidate points based on a penalty-driven optimal source estimator, and determining the user node with the minimum penalty score as a prediction propagation source;

the determining, from the graph network, an N-order neighbor corresponding to the first observation point, and selecting, in sequence, one neighbor node from each N-order neighbor and adding the first observation point to an observation point set, specifically includes:

The set of viewpoints is determined by the following formula:

and the set of viewpoints meets the specification RS (G, O):

wherein O^F(o) = O represents the observation point set; o and o' represent two different observation points in O^F(o); ∀ is the symbol meaning "for any"; ∃ is the symbol meaning "there exists"; dis is an abbreviation of distance and represents a certain distance value; d(o, o') is the distance between observation point o and observation point o'; e(o) is the eccentricity of the first observation point o included in the observation point set O; V is the set of user nodes in the graph network G; |V| is the number of user nodes; d(v_i, o) represents the shortest distance from user node v_i to a first observation point in the observation point set O; Ω(|V|²) indicates that the complexity grows with the square of the number of user nodes in V; Φ(O) represents the time and space constraint on the process of obtaining the observation point set O; and s.t. denotes the constraint conditions;

the penalty driven optimal source estimator is determined by the following formula:

wherein,

the formula yields the predicted propagation source; O_Y represents the set of receiving observation points; O_N represents the set of non-receiving observation points; the formula further contains a reward coefficient and a penalty coefficient; Γ(v) represents the neighbor set of user node v; t_o represents the normalized observation time of receiving observation point o; v_{ψ(o)} represents the user node corresponding to the propagation source of receiving observation point o; d_{o,v} represents the inferred potential true propagation path distance from user node v to receiving observation point o; rp(v) represents the reward-and-punishment mechanism of user node v; p and q represent weight coefficients; and argmin denotes the argument at which the function attains its minimum value;

the special node comprises any one of a central node and a leaf node, and is determined by the following formula:

wherein the quantity being maximized or minimized is the number of neighbors of user node v_i; argmax denotes the argument that maximizes the function, that is, it selects the v_i with the largest number of neighbors, which corresponds to selecting the central node in the graph network G; argmin denotes the argument that minimizes the function, that is, it selects the v_i with the smallest number of neighbors, which corresponds to selecting a leaf node in the graph network G; o represents the special node; and O represents the observation point set.

2. The method of claim 1, wherein after determining remaining user nodes in the graph network as high probability propagation source candidate points, further comprising:

3. The method of claim 1, wherein the high probability propagation source candidate point is determined by the following formula:

4. The method of claim 1, wherein the normalized observation time of the destination observation point o is represented by the following formula:

5. A network information tracing device, characterized by comprising:

a second determining unit, configured to determine a shortest distance sum of each destination observation point and each user node in the graph network, delete one or more of the user nodes whose shortest distance sum is greater than a distance threshold, determine remaining user nodes in the graph network as high probability propagation source candidate points, evaluate the high probability propagation source candidate points based on a penalty-driven optimal source estimator, and determine a user node with a minimum penalty score as a prediction propagation source;

the set of viewpoints is determined by the following formula:

and the set of viewpoints meets the specification RS (G, O):

wherein O^F(o) = O represents the observation point set; o and o' represent two different observation points in O^F(o); ∀ is the symbol meaning "for any"; ∃ is the symbol meaning "there exists"; dis is an abbreviation of distance and represents a certain distance value; d(o, o') is the distance between observation point o and observation point o'; e(o) is the eccentricity of the first observation point o included in the observation point set O; V is the set of user nodes in the graph network G; |V| is the number of user nodes; d(v_i, o) represents the shortest distance from user node v_i to a first observation point in the observation point set O; Ω(|V|²) indicates that the complexity grows with the square of the number of user nodes in V; Φ(O) represents the time and space constraint on the process of obtaining the observation point set O; and s.t. denotes the constraint conditions;

wherein,

wherein the quantity being maximized or minimized is the number of neighbors of user node v_i; argmax denotes the argument that maximizes the function, that is, it selects the v_i with the largest number of neighbors, which corresponds to selecting the central node in the graph network G; argmin denotes the argument that minimizes the function, that is, it selects the v_i with the smallest number of neighbors, which corresponds to selecting a leaf node in the graph network G; o represents the special node; and O represents the observation point set.

6. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the network information tracing method of any one of claims 1-4.

7. A computer readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the network information tracing method of any one of claims 1-4.