CN112669980A - Epidemic propagation network reconstruction method and system based on node similarity - Google Patents

Publication number
CN112669980A
Authority
CN
China
Prior art keywords
node
similarity
network
propagation network
epidemic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011579617.5A
Other languages
Chinese (zh)
Other versions
CN112669980B (en)
Inventor
王晖
李学庆
刘诗炎
李大庆
李建欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Beihang University
Original Assignee
Shandong University
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University and Beihang University
Priority to CN202011579617.5A
Publication of CN112669980A
Application granted
Publication of CN112669980B
Legal status: Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to an epidemic propagation network reconstruction method and system based on node similarity, belonging to the field of epidemic data reconstruction analysis. The method first integrates epidemiological investigation reports to establish a preliminary local epidemic propagation network. It then defines three types of node similarity indexes for the epidemic propagation network based on network structure and node attributes. Next, it reconstructs the epidemic propagation network by calculating the similarity indexes of node pairs and adding possible missing edges to the local epidemic propagation network. Finally, it compares the network reconstruction accuracy under the different indexes. The reconstructed network can be used to analyze the virus propagation speed and to predict the spatio-temporal propagation range of the virus.

Description

Epidemic propagation network reconstruction method and system based on node similarity
Technical Field
The invention relates to a method and a system for reconstructing an epidemic propagation network based on node similarity, and belongs to the technical field of epidemic data reconstruction analysis.
Background
Since ancient times, human society has suffered from various epidemics, some of which have profoundly affected the course of human development. In the face of epidemic spread, the key to stopping it is to seize the window period of virus transmission, control the source of infection, and cut off the transmission paths. However, with the development of modern transportation, more and more urban residents choose public transportation for daily travel because of its low cost and the encouragement of government policy, so frequent close contact between passengers cannot be avoided. In this context, once an epidemic breaks out, it is often characterized by a wide spatial propagation range and by propagation paths that are difficult to trace.
Epidemiological investigation is basic work in epidemic response. By investigating epidemiological information about cases, such as onset and treatment, clinical characteristics, risk factors and exposure history, it provides a basis for identifying close contacts, delimiting the disinfection range and so on. A single-case investigation report or a cluster investigation report can only reflect the propagation relationships among one or a few cases and cannot reveal the propagation characteristics and propagation chains of an epidemic, so a large number of epidemiological investigation reports need to be integrated and the propagation of different cases linked through a network construction method. However, current research that integrates large numbers of epidemiological investigation reports only performs basic statistical analysis at the macro level of the epidemic (such as the sex and age distribution of cases) and does not analyze at the micro level how the epidemic propagates between individuals. Furthermore, because of limits on investigation time and scope, and concealment or forgetting by respondents, the personal contact information in the reports is incomplete, and some contact relationships are missing.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention provides an epidemic propagation network reconstruction method and system based on node similarity. The method first integrates a large number of epidemiological investigation reports to establish a preliminary local epidemic propagation network; it then defines three types of node similarity indexes for the epidemic propagation network based on network structure and node attributes; next, it reconstructs the epidemic propagation network by calculating the similarity indexes of node pairs and adding possible missing edges to the local epidemic propagation network; finally, it compares the network reconstruction accuracy under the different indexes. The resulting network can be used to analyze the virus propagation speed and predict the spatio-temporal propagation range of the virus, which helps in formulating prevention and control strategies and has good practical application value.
Existing epidemic analysis has difficulty mining propagation chains, has difficulty analyzing propagation characteristics accurately, and makes poor use of large numbers of epidemiological investigation reports. The invention therefore aims to provide a method and a system for reconstructing an epidemic propagation network based on node similarity that make effective use of the information in epidemiological investigation reports and combine it with multi-source data to construct an epidemic propagation network that faithfully reflects the virus propagation paths. This offers a new approach to epidemic analysis: based on the constructed network, the epidemic propagation chain can be mined effectively, the virus propagation speed analyzed, and the spatio-temporal propagation range of the virus predicted.
The invention adopts the following technical scheme:
An epidemic propagation network reconstruction method based on node similarity comprises the following steps:
Step 1: establishing a local epidemic propagation network based on epidemiological investigation reports;
Step 2: defining node similarity indexes;
Step 3: reconstructing the epidemic propagation network based on the node similarity indexes.
Preferably, step 1 specifically comprises: integrating the epidemiological investigation reports to draw up a report summary table, extracting a node set and an edge set to construct the local epidemic propagation network, and defining the missing edges in the network, with the following sub-steps:
1.1, integrating the epidemiological investigation reports;
1.2, extracting the node set and the edge set;
1.3, establishing the local epidemic propagation network;
1.4, defining the missing edges of the epidemic propagation network.
Preferably, step 1.1 specifically comprises:
All available epidemiological investigation reports are collected (it is assumed that the available reports contain only information on confirmed cases and their close contacts, not on the contacts of those close contacts). Case information is extracted from the reports, including basic personal information, onset and treatment, clinical characteristics, risk factors, exposure history, close contacts and other features, and a report summary table is drawn up in which the row index is the set of all confirmed cases and the header lists all case features, so that the node set and edge set of the epidemic propagation network can be extracted in step 1.2. In addition, the epidemiological characteristics of the infectious disease are analyzed on the basis of the integrated case information, including the change over time of confirmed and suspected cases, the regional distribution of confirmed cases, the population characteristics of confirmed cases, the number of deaths, the crude mortality rate, the mortality density and so on.
Preferably, step 1.2 specifically comprises:
First, all confirmed cases in the report summary table are extracted as the confirmed-case node set $V_1$ of the epidemic propagation network. Then the "close contacts" feature column of the summary table is extracted, and all close contacts in that column are taken as the close-contact node set $V_2$ of the epidemic propagation network. Next, duplicate nodes between the confirmed-case node set $V_1$ and the close-contact node set $V_2$ are removed: if a node appears in both $V_1$ and $V_2$, it is deleted from $V_2$. On this basis, $V_1$ and $V_2$ are merged to obtain the person node set $V$, that is, $V = V_1 \cup V_2$. Finally, for any two nodes $V_x$, $V_y$ selected from $V$, if the investigation reports record a close-contact relationship between the two persons represented by the nodes, an edge $e_{xy}$ is considered to exist between $V_x$ and $V_y$. Traversing all pairwise combinations of nodes in $V$ yields the edge set $E_0$.
Preferably, step 1.3 specifically comprises:
Combining the person node set $V$ and the edge set $E_0$ obtained in step 1.2 gives the local epidemic propagation network $G_0$ established from the epidemiological investigation reports. This network contains two types of person nodes (confirmed-case nodes and close-contact nodes). Because the reports in step 1.1 are assumed to contain only information on confirmed cases and their close contacts, and not on the contacts of those close contacts, the local epidemic propagation network constructed according to step 1.2 contains only two types of edges: edges between a confirmed case and a close contact, and edges between two confirmed cases.
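Steps 1.2 and 1.3 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the summary table is assumed to be reduced to a dictionary mapping each confirmed-case identifier to its recorded close contacts, and the function name `build_local_network` is hypothetical. Iterating over the recorded contacts gives the same edge set as checking every pair of nodes against the reports.

```python
def build_local_network(summary_table):
    """Build V1, V2, V and E0 from a simplified summary table.

    summary_table: dict mapping a confirmed-case id to the list of
    close-contact ids recorded for that case (assumed input layout).
    """
    v1 = set(summary_table)                 # confirmed-case node set V1
    v2 = set()
    for contacts in summary_table.values():
        v2.update(contacts)
    v2 -= v1                                # a node in both sets is kept in V1 only
    nodes = v1 | v2                         # V = V1 ∪ V2

    edges = set()                           # E0, stored as unordered pairs
    for case, contacts in summary_table.items():
        for contact in contacts:
            if contact != case:             # ignore self-references in the data
                edges.add(frozenset((case, contact)))
    return v1, v2, nodes, edges
```

Storing each edge as a `frozenset` makes the pair unordered, so a contact recorded in both directions counts once.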
Preferably, step 1.4 specifically comprises:
Because of limits on investigation time and scope, and concealment or forgetting by respondents, the local epidemic propagation network $G_0$ established in step 1.3 may contain pairs of nodes that have a close-contact relationship which was not recorded during the investigation. The edge between two such nodes is defined as a missing edge of the epidemic propagation network. According to the types of the nodes involved, missing edges fall into three types: edges between two confirmed cases, edges between a confirmed case and a close contact of another confirmed case, and edges between two close contacts. Step 3 mines the missing edges using the method proposed in step 2.
Preferably, step 2 specifically comprises: in order to mine the missing edges defined in step 1.4, three node similarity indexes for the epidemic propagation network are proposed, taking into account network structure and node attributes, to support the missing-edge mining and network reconstruction of step 3, with the following sub-steps:
2.1, defining a similarity index based on network structure;
2.2, defining a similarity index based on node attributes;
2.3, defining a similarity index based on both network structure and node attributes.
Preferably, step 2.1 specifically comprises:
If two unconnected person nodes (confirmed-case nodes or close-contact nodes) in the local epidemic propagation network $G_0$ have many common neighbor nodes (that is, common close contacts), a missing edge may exist between them. A similarity index based on network structure, $S^{\text{Structure}}$, is therefore defined according to the number of common neighbors of the two nodes, namely:

$$S^{\text{Structure}}_{xy} = \left|\Gamma(x) \cap \Gamma(y)\right|$$

where $S^{\text{Structure}}_{xy}$ denotes the similarity index based on network structure of the node pair $V_x$, $V_y$; the larger its value, the higher the probability that an edge exists between $V_x$ and $V_y$. $\Gamma(x)$ and $\Gamma(y)$ denote the sets of neighbor nodes of node $V_x$ and node $V_y$ in the network, respectively.
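The common-neighbor count can be illustrated with a short Python sketch. The edge-list input and the helper names `adjacency` and `structure_similarity` are assumptions made for this example and do not come from the patent.

```python
def adjacency(edges):
    """Map each node to the set of its neighbors, given (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def structure_similarity(adj, x, y):
    """Number of common neighbors of x and y, i.e. |Γ(x) ∩ Γ(y)|."""
    return len(adj.get(x, set()) & adj.get(y, set()))
```

For example, two confirmed cases that share two recorded close contacts score 2, while a pair with no shared contact scores 0.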
Preferably, step 2.2 specifically comprises:
The node attributes in the epidemic propagation network include a person's occupation, medical history, trajectory and so on; here the trajectory is taken as an example to define a similarity index. The trajectory information of each node in the epidemic propagation network is acquired, comprising the longitude and latitude of the node at each moment. Suppose the moments counted according to actual needs form the set $\{0, 1, 2, \ldots, t, \ldots, n\}$. For a node pair $V_x$, $V_y$, the longitude and latitude of both nodes at each moment are extracted, and the great-circle distance $k^{t}_{xy}$ between them at each moment $t$ is calculated from the longitude and latitude (calculating great-circle distance from longitude and latitude is known technology and is not described here). The set $k_{xy}$ of great-circle distances of the node pair over all moments is thus

$$k_{xy} = \left\{k^{0}_{xy}, k^{1}_{xy}, k^{2}_{xy}, \ldots, k^{t}_{xy}, \ldots, k^{n}_{xy}\right\}$$

A contact space threshold $k_p$ is defined. It can be adjusted according to the infection probability of the disease: the larger the infection probability, the larger $k_p$. From the great-circle distance $k^{t}_{xy}$ between the node pair at moment $t$ and the contact space threshold $k_p$, the spatial effectiveness index $K^{t}_{xy}$ of the node pair at that moment is defined as:

$$K^{t}_{xy} = \begin{cases} 1, & k^{t}_{xy} < k_p \\ 0, & k^{t}_{xy} \ge k_p \end{cases}$$

The set $K_{xy}$ of spatial effectiveness indexes of the node pair over all moments is therefore

$$K_{xy} = \left\{K^{0}_{xy}, K^{1}_{xy}, K^{2}_{xy}, \ldots, K^{t}_{xy}, \ldots, K^{n}_{xy}\right\}$$

The runs of consecutive 1s in $K_{xy}$ are counted. The length of the longest run is the maximum duration for which $k^{t}_{xy}$ remains less than $k_p$, and is defined as $T_{xy}$. From this, the similarity index based on node attributes, $S^{\text{Properties}}$, is defined as:

$$S^{\text{Properties}}_{xy} = \begin{cases} 1, & T_{xy} > T_p \\ 0, & T_{xy} \le T_p \end{cases}$$

where $T_p$ denotes the contact time threshold, which can be adjusted according to the infection probability of the disease: the larger the infection probability, the smaller $T_p$. $S^{\text{Properties}}_{xy}$ denotes the similarity index based on node attributes of the node pair $V_x$, $V_y$: its value is 1 if the maximum duration of close contact between the two nodes exceeds the contact time threshold, and 0 otherwise.
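The trajectory-based index can be sketched in Python as below. The text leaves the great-circle calculation to known technology, so this sketch assumes the haversine formula with a mean Earth radius of 6,371 km. The function names, the representation of a trajectory as a list of (latitude, longitude) tuples in degrees with one entry per moment, and the measurement of duration as a number of consecutive moments are all assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed constant)


def great_circle_distance(p, q):
    """Haversine distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def longest_run_of_ones(flags):
    """Length of the longest run of consecutive 1s in a 0/1 sequence."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag == 1 else 0
        best = max(best, run)
    return best


def attribute_similarity(track_x, track_y, k_p, t_p):
    """Return 1 if the longest spell spent closer than k_p exceeds t_p, else 0."""
    # Spatial effectiveness index at each moment: 1 when the distance is below k_p.
    flags = [1 if great_circle_distance(p, q) < k_p else 0
             for p, q in zip(track_x, track_y)]
    return 1 if longest_run_of_ones(flags) > t_p else 0
```

Here `k_p` is in metres and `t_p` is a count of moments; both would be tuned to the disease in question, as described above.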
Preferably, step 2.3 specifically comprises:
A network structure weight $w^{\text{Structure}}$ and a node attribute weight $w^{\text{Properties}}$ are set, with $w^{\text{Structure}} + w^{\text{Properties}} = 1$. The similarity index based on both network structure and node attributes, $S^{\text{Structure and Properties}}$, is defined as:

$$S^{\text{Structure and Properties}}_{xy} = w^{\text{Structure}}\, S^{\text{Structure}}_{xy} + w^{\text{Properties}}\, S^{\text{Properties}}_{xy}$$
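A weighted combination of the two indexes might look like the following Python sketch. It assumes the combined index is a plain weighted sum with weights summing to 1; the function name and the default weight are hypothetical.

```python
def combined_similarity(s_structure, s_properties, w_structure=0.5):
    """Weighted sum of the structure and attribute indexes (weights sum to 1)."""
    if not 0.0 <= w_structure <= 1.0:
        raise ValueError("w_structure must lie in [0, 1]")
    w_properties = 1.0 - w_structure
    return w_structure * s_structure + w_properties * s_properties
```

Because the structure index is a count while the attribute index is 0 or 1, the choice of weight also determines how strongly a single trajectory match counts relative to each shared neighbor.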
Preferably, step 3 specifically comprises: based on the three similarity indexes proposed in step 2, the similarity of node pairs is calculated to mine the missing edges in the local epidemic propagation network and thereby reconstruct the epidemic propagation network; a reconstruction accuracy is defined, and by comparing the network reconstruction accuracy under the different similarity indexes, the index with the highest reconstruction accuracy and its reconstructed network are selected, with the following sub-steps:
3.1, calculating the similarity of node pairs;
3.2, reconstructing the epidemic propagation network;
3.3, defining the reconstruction accuracy;
3.4, comparing the network reconstruction accuracy under the different similarity indexes.
Preferably, step 3.1 specifically comprises:
All node pairs in the node set $V$ of the local epidemic propagation network $G_0$ constructed in step 1 are traversed, and the three similarity values of each node pair are calculated according to the three similarity indexes proposed in step 2.
Preferably, step 3.2 specifically comprises:
For a given similarity index, all node pairs calculated in step 3.1 are sorted from high to low by similarity value. A missing-edge similarity threshold $f$ is determined for that index; it can be adjusted according to the infection probability of the disease: the larger the infection probability, the smaller $f$. If the similarity value of a node pair exceeds $f$ and no edge between the pair has been established in the local epidemic propagation network $G_0$, a missing edge is considered to exist between the pair. All node pairs are traversed and the missing edges are added, giving the reconstructed epidemic propagation network $G$ under that similarity index.
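The reconstruction step can be sketched in Python as follows. The function name `reconstruct_network` and the idea of passing the similarity index in as a callable are assumptions for illustration; node identifiers are assumed to be sortable (for example, strings).

```python
from itertools import combinations


def reconstruct_network(nodes, edges, similarity, f):
    """Add every unrecorded pair whose similarity exceeds the threshold f.

    nodes: iterable of node ids; edges: iterable of (u, v) pairs in G0;
    similarity: callable (x, y) -> number for the chosen index.
    Returns (edge set of the reconstructed network, list of added edges).
    """
    known = {frozenset(edge) for edge in edges}
    scored = [(similarity(x, y), x, y)
              for x, y in combinations(sorted(nodes), 2)]
    scored.sort(key=lambda item: item[0], reverse=True)  # high to low
    missing = [frozenset((x, y)) for score, x, y in scored
               if score > f and frozenset((x, y)) not in known]
    return known | set(missing), missing
```

Sorting is not strictly needed to apply the threshold, but it mirrors the ranking described above and returns the added edges with the most similar pairs first.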
Preferably, step 3.3 specifically comprises:
For the calculation results under a given similarity index, suppose the edge set $E_0$ of the local epidemic propagation network $G_0$ constructed in step 1 contains $l$ edges, and that the similarity values of $m$ of these node pairs are higher than the missing-edge similarity threshold $f$ defined in step 3.2. The network reconstruction accuracy $P$ under that index is then defined as $P = m/l$; the larger the value of $P$, the higher the network reconstruction accuracy.
Preferably, step 3.4 specifically comprises:
For the calculation results of the three node similarity indexes in step 3.1, the network reconstruction accuracy under each index is calculated according to step 3.3. The node similarity index with the highest network reconstruction accuracy is selected as the final node similarity index, and its reconstructed network is taken as the final epidemic propagation network.
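Steps 3.3 and 3.4 can be illustrated with the Python sketch below. It reads $P = m/l$ as the fraction of recorded edges in $E_0$ whose similarity exceeds $f$; the function names and the dictionary layout used to pass in the candidate indexes are assumptions for this example.

```python
def reconstruction_accuracy(edges, similarity, f):
    """P = m / l: share of the l recorded edges whose similarity exceeds f."""
    edges = list(edges)
    if not edges:
        raise ValueError("the edge set E0 must not be empty")
    m = sum(1 for x, y in edges if similarity(x, y) > f)
    return m / len(edges)


def select_best_index(edges, indexes):
    """Pick the index with the highest accuracy.

    indexes: dict mapping an index name to a (similarity callable, f) pair.
    Returns (name of the best index, dict of accuracy per index).
    """
    scores = {name: reconstruction_accuracy(edges, sim, f)
              for name, (sim, f) in indexes.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

If two indexes tie, `max` returns the one that appears first in the dictionary; a real comparison would need an explicit tie-breaking rule.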
An epidemic propagation network reconstruction system based on node similarity comprises a local epidemic propagation network establishing module, a similarity index defining module and an epidemic propagation network reconstruction module.
The local epidemic propagation network establishing module is used for establishing a local epidemic propagation network based on epidemiological investigation reports, the similarity index defining module is used for defining node similarity indexes, and the epidemic propagation network reconstruction module is used for reconstructing the epidemic propagation network according to the node similarity indexes.
An electronic device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the node similarity-based epidemic propagation network reconstruction method when executing the program.
A computer-readable storage medium stores computer instructions for causing a computer to execute the node similarity-based epidemic propagation network reconstruction method.
It is worth noting that the method is not itself a disease census method. Rather, it is a method for analyzing disease census results based on complex network theory: by reconstructing the epidemic propagation network, propagation chains can be mined effectively and disease propagation characteristics analyzed, which improves the utilization of the census results.
Matters not described in detail in the invention can be carried out using the prior art.
The invention has the beneficial effects that:
The epidemic propagation network reconstruction method based on node similarity addresses the problems in existing epidemic analysis that propagation chains are difficult to mine, propagation characteristics are difficult to analyze accurately, and large numbers of epidemiological investigation reports are poorly utilized. Based on the constructed epidemic propagation network, the epidemic propagation chain can be mined effectively, the virus propagation speed analyzed, and the spatio-temporal propagation range of the virus predicted, which helps in formulating prevention and control strategies and has good practical application value.
The invention has the following characteristics:
1. Systematic: a single-case investigation report or a cluster investigation report can only reflect the propagation relationships among one or a few cases. The epidemic propagation network constructed by the invention follows the approach of systems science: by integrating a large number of epidemiological investigation reports with multi-channel information, it links the propagation of different cases, which helps in mining the propagation characteristics and propagation chains of the epidemic.
2. Highly flexible: the method uses the number of common neighbors as the network structure similarity index and trajectory information as the node attribute similarity index. Depending on research needs, different types of node similarity indexes can be adopted to reconstruct epidemic propagation networks for different propagation modes and different types of disease.
3. Widely applicable: the established epidemic propagation network offers a new approach to epidemic analysis. Based on it, the epidemic propagation chain can be mined effectively, the virus propagation speed analyzed, and the spatio-temporal propagation range of the virus predicted, which has good practical application value.
In conclusion, the epidemic propagation network reconstruction method based on node similarity provides a good solution for propagation chain mining and propagation characteristic analysis in epidemic analysis.
Drawings
FIG. 1 is a flowchart of the method for reconstructing an epidemic propagation network based on node similarity according to the present invention.
Detailed Description
To make the technical problems to be solved and the technical solutions of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments. It is to be understood that the embodiments described herein are for purposes of illustration and explanation only and are not intended to limit the invention.
The invention aims to solve the problems in existing epidemic analysis that propagation chains are difficult to mine, propagation characteristics are difficult to analyze accurately, and large numbers of epidemiological investigation reports are poorly utilized.
The invention provides a method and a system for reconstructing an epidemic propagation network based on node similarity indexes. The method is systematic, highly flexible and widely applicable, and provides a good solution for propagation chain mining and propagation characteristic analysis in epidemic analysis.
The invention is further described below with reference to the accompanying drawings and embodiments.
Example 1:
This embodiment explains the method by taking the reconstruction of the propagation network of a certain epidemic as an example. The task is to reconstruct the propagation network of that epidemic: a local propagation network of the epidemic is constructed by integrating nationwide epidemiological investigation reports, and the propagation network is then reconstructed by combining multi-channel data according to the node similarity method.
To achieve this purpose, the technical solution adopted is an epidemic propagation network construction method based on node similarity. The process is shown in FIG. 1 and comprises:
Step 1: establishing a local epidemic propagation network based on epidemiological investigation reports;
Step 2: defining node similarity indexes;
Step 3: reconstructing the propagation network of the epidemic based on the node similarity indexes.
Step 1 is specifically as follows: the epidemiological investigation reports of the epidemic are integrated to draw up a report summary table, a node set and an edge set are extracted to construct the local propagation network of the epidemic, and the missing edges in the network are defined, with the following sub-steps:
1.1, integrating the epidemiological investigation reports;
1.2, extracting the node set and the edge set;
1.3, establishing the local epidemic propagation network;
1.4, defining the missing edges of the propagation network of the epidemic.
Step 1.1 is specifically as follows:
All available epidemiological investigation reports of the epidemic are collected (it is assumed that the available reports contain only information on confirmed cases and their close contacts, not on the contacts of those close contacts). Case information is extracted from the reports, including basic personal information, onset and treatment, clinical characteristics, risk factors, exposure history, close contacts and other features, and a report summary table is drawn up in which the row index is the set of all confirmed cases and the header lists all case features, so that the node set and edge set of the propagation network of the epidemic can be extracted in step 1.2. In addition, the epidemiological characteristics of the infectious disease are analyzed on the basis of the integrated case information, including the change over time of confirmed and suspected cases, the regional distribution of confirmed cases, the population characteristics of confirmed cases, the number of deaths, the crude mortality rate, the mortality density and so on.
Step 1.2 is specifically as follows:
First, all confirmed cases in the report summary table are extracted as the confirmed-case node set $V_1$ of the propagation network of the epidemic. Then the "close contacts" feature column of the summary table is extracted, and all close contacts in that column are taken as the close-contact node set $V_2$ of the propagation network. Next, duplicate nodes between the confirmed-case node set $V_1$ and the close-contact node set $V_2$ are removed: if a node appears in both $V_1$ and $V_2$, it is deleted from $V_2$. On this basis, $V_1$ and $V_2$ are merged to obtain the person node set $V$, that is, $V = V_1 \cup V_2$. Finally, for any two nodes $V_x$, $V_y$ selected from $V$, if the investigation reports record a close-contact relationship between the two persons represented by the nodes, an edge $e_{xy}$ is considered to exist between $V_x$ and $V_y$. Traversing all pairwise combinations of nodes in $V$ yields the edge set $E_0$.
Step 1.3 is specifically as follows:
Combining the person node set $V$ and the edge set $E_0$ obtained in step 1.2 gives the local epidemic propagation network $G_0$ established from the epidemiological investigation reports. This network contains two types of person nodes (confirmed-case nodes and close-contact nodes). Because the reports in step 1.1 are assumed to contain only information on confirmed cases and their close contacts, and not on the contacts of those close contacts, the local epidemic propagation network constructed according to step 1.2 contains only two types of edges: edges between a confirmed case and a close contact, and edges between two confirmed cases.
Step 1.4 is specifically as follows:
Because of limits on investigation time and scope, and concealment or forgetting by respondents, the local epidemic propagation network $G_0$ established in step 1.3 may contain pairs of nodes that have a close-contact relationship which was not recorded during the investigation. The edge between two such nodes is defined as a missing edge of the propagation network of the epidemic. According to the types of the nodes involved, missing edges fall into three types: edges between two confirmed cases, edges between a confirmed case and a close contact of another confirmed case, and edges between two close contacts. Step 3 mines the missing edges using the method proposed in step 2.
Step 2 is specifically as follows: in order to mine the missing edges defined in step 1.4, three node similarity indexes for the propagation network of the epidemic are proposed, taking into account network structure and node attributes, to support the missing-edge mining and network reconstruction of step 3, with the following sub-steps:
2.1, defining a similarity index based on network structure;
2.2, defining a similarity index based on node attributes;
2.3, defining a similarity index based on both network structure and node attributes.
Step 2.1 is specifically as follows:
If two unconnected person nodes (confirmed-case nodes or close-contact nodes) in the local epidemic propagation network $G_0$ have many common neighbor nodes (that is, common close contacts), a missing edge may exist between them. A similarity index based on network structure, $S^{\text{Structure}}$, is therefore defined according to the number of common neighbors of the two nodes, namely:

$$S^{\text{Structure}}_{xy} = \left|\Gamma(x) \cap \Gamma(y)\right|$$

where $S^{\text{Structure}}_{xy}$ denotes the similarity index based on network structure of the node pair $V_x$, $V_y$; the larger its value, the higher the probability that an edge exists between $V_x$ and $V_y$. $\Gamma(x)$ and $\Gamma(y)$ denote the sets of neighbor nodes of node $V_x$ and node $V_y$ in the network, respectively.
Wherein, the step 2.2 specifically comprises the following steps:
The node attributes in the epidemic propagation network include a person's occupation, medical history, trajectory, and so on; the trajectory is taken here as the example for defining a similarity index. Trajectory information is acquired for each node in the epidemic propagation network, comprising the longitude and latitude of the node at each moment. Suppose that n moments are counted according to actual needs, with the moment set {0, 1, 2, …, t, …, n}. For a node pair $V_x$, $V_y$, the longitude and latitude of both nodes at each moment are extracted, and the great-circle distance $k^{t}_{xy}$ between them at each moment t is calculated from these coordinates (computing a great-circle distance from longitude and latitude is known art and is not described here). This gives the set of great-circle distances of the node pair over all moments, $k_{xy} = \{k^{0}_{xy}, k^{1}_{xy}, \ldots, k^{t}_{xy}, \ldots, k^{n}_{xy}\}$.
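The text treats the great-circle computation as known art. A minimal sketch using the haversine formula is given below; the choice of haversine, the mean Earth radius constant, and the names `great_circle_km` and `distance_series` are assumptions of this sketch, not details taken from the patent.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed value for this sketch)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_series(track_x, track_y):
    """Distances between two tracks sampled at the same moments.

    Each track is a list of (latitude, longitude) tuples, one per moment.
    """
    return [
        great_circle_km(lat_x, lon_x, lat_y, lon_y)
        for (lat_x, lon_x), (lat_y, lon_y) in zip(track_x, track_y)
    ]


quarter = great_circle_km(0.0, 0.0, 0.0, 90.0)  # a quarter of the equator
series = distance_series([(0.0, 0.0), (10.0, 20.0)], [(0.0, 0.0), (10.0, 20.0)])
```

The list returned by `distance_series` plays the role of the distance set $k_{xy}$ for one node pair.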
A contact space threshold $k_p$ is defined; it can be adjusted according to the infection probability of the disease: the larger the infection probability, the larger $k_p$. From the great-circle distance $k^{t}_{xy}$ between the node pair at a moment t and the contact space threshold $k_p$, the spatial validity index $K^{t}_{xy}$ of the node pair at that moment is defined, namely:
$$K^{t}_{xy} = \begin{cases} 1, & k^{t}_{xy} < k_p \\ 0, & \text{otherwise} \end{cases}$$
The set of spatial validity indexes of the node pair over all moments is therefore $K_{xy} = \{K^{0}_{xy}, K^{1}_{xy}, \ldots, K^{t}_{xy}, \ldots, K^{n}_{xy}\}$.
The runs of consecutive 1s in $K_{xy}$ are counted. The longest such run, that is, the longest period over which $k^{t}_{xy}$ stays below $k_p$, is defined as the maximum duration of close contact (written here as $T_{xy}$). From this, a similarity index based on node attributes, $S^{\mathrm{Properties}}$, is defined, namely:
$$S^{\mathrm{Properties}}_{xy} = \begin{cases} 1, & T_{xy} > T_p \\ 0, & \text{otherwise} \end{cases}$$
where $T_p$ denotes the contact time threshold, here taken as 0.5 h; $T_p$ can be adjusted according to the infection probability of the disease: the larger the infection probability, the smaller $T_p$. $S^{\mathrm{Properties}}_{xy}$ denotes the node-attribute-based similarity index of the node pair $V_x$, $V_y$: its value is 1 if the maximum duration of close contact between the pair exceeds the contact time threshold, and 0 otherwise.
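The attribute-based index of step 2.2 can be sketched as below. The sampling interval `step_hours` is an assumption introduced to turn a run length into a duration (the text does not state how moments map to time), and the function names and sample distances are hypothetical.

```python
def longest_run_of_ones(flags):
    """Length of the longest run of consecutive 1s in a 0/1 sequence."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag == 1 else 0
        best = max(best, current)
    return best


def attribute_similarity(distances, k_p, t_p_hours=0.5, step_hours=0.25):
    """Return 1 if the longest close-contact period exceeds t_p_hours, else 0.

    distances  : great-circle distances of one node pair, one per moment
    k_p        : contact space threshold, in the same unit as distances
    step_hours : assumed time between consecutive moments (not in the source)
    """
    flags = [1 if d < k_p else 0 for d in distances]  # spatial validity index
    max_duration = longest_run_of_ones(flags) * step_hours
    return 1 if max_duration > t_p_hours else 0


# Four consecutive samples within 2 m at 15-minute spacing: 1.0 h of contact.
close_pair = attribute_similarity([5.0, 1.0, 1.5, 0.5, 1.0, 8.0, 1.0], k_p=2.0)
# Only isolated close samples: the longest run is 0.25 h.
brief_pair = attribute_similarity([1.0, 5.0, 1.0, 5.0], k_p=2.0)
```

A run lasting exactly as long as the threshold yields 0 in this sketch, because the text requires the duration to exceed $T_p$.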
The step 2.3 is specifically as follows:
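The network structure weight $w_{\mathrm{Structure}} = 0.5$ and the node attribute weight $w_{\mathrm{Properties}} = 0.5$ are set, with $w_{\mathrm{Structure}} + w_{\mathrm{Properties}} = 1$. A similarity index based on both network structure and node attributes, $S^{\mathrm{Structure\ and\ Properties}}$, is defined, namely:
$$S^{\mathrm{Structure\ and\ Properties}}_{xy} = w_{\mathrm{Structure}}\, S^{\mathrm{Structure}}_{xy} + w_{\mathrm{Properties}}\, S^{\mathrm{Properties}}_{xy}$$
that is,
$$S^{\mathrm{Structure\ and\ Properties}}_{xy} = 0.5\, S^{\mathrm{Structure}}_{xy} + 0.5\, S^{\mathrm{Properties}}_{xy}$$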
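A minimal sketch of the combined index of step 2.3 follows, assuming it is the weighted sum of the two indexes with weights that add up to 1. It further assumes that the structural score has already been scaled to the range [0, 1] so the two terms are comparable; that scaling, and the name `combined_similarity`, are assumptions of this sketch.

```python
import math


def combined_similarity(s_structure, s_properties, w_structure=0.5, w_properties=0.5):
    """Weighted sum of the structure-based and attribute-based indexes."""
    if not math.isclose(w_structure + w_properties, 1.0):
        raise ValueError("the two weights must sum to 1")
    return w_structure * s_structure + w_properties * s_properties


# A pair with a moderate structural score and a confirmed long contact.
score = combined_similarity(0.4, 1)
```

With the default weights, a pair whose attribute index is 1 reaches at least 0.5 regardless of its structural score.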
Step 3 is specifically as follows: using the three similarity indexes proposed in step 2, the similarity of node pairs is calculated to mine the missing edges in the local epidemic propagation network and thereby reconstruct the epidemic propagation network; a reconstruction accuracy is defined, and by comparing the network reconstruction accuracy under the different similarity indexes, the index with the highest reconstruction accuracy and its reconstructed network are selected. Step 3 comprises the following sub-steps:
3.1, calculating the similarity of the node pairs;
3.2, reconstructing the epidemic propagation network;
3.3, defining the reconstruction accuracy;
and 3.4, comparing the network reconstruction accuracy under different similarities.
Wherein, the step 3.1 specifically comprises the following steps:
according to the three similarity indexes proposed in step 2, all node pairs in the node set V of the local epidemic propagation network $G_0$ constructed in step 1 are traversed, and the three similarity values of each node pair are calculated;
wherein, the step 3.2 specifically comprises the following steps:
for a given similarity index, all node pairs computed in step 3.1 are sorted from high to low by similarity value. The missing-edge similarity threshold f under that index is determined; in this embodiment, f = 0.5. If the similarity value of a node pair exceeds 0.5 and no edge between the pair has been established in the local epidemic propagation network $G_0$, a missing edge is considered to exist between the pair. All node pairs are traversed and the missing edges are added, thereby reconstructing the epidemic propagation network G under that similarity index;
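Step 3.2 can be sketched as follows: rank the node pairs, keep those whose similarity exceeds f and that are not yet linked in $G_0$, and add them as missing edges. The function name `reconstruct_network`, the dictionary layout, and the similarity values are hypothetical illustrations.

```python
def reconstruct_network(edges_g0, similarity, f=0.5):
    """Return (missing_edges, reconstructed_edges) for one similarity index.

    edges_g0   : iterable of (u, v) edges recorded in the local network G_0
    similarity : dict mapping (u, v) node pairs to similarity values
    f          : missing-edge similarity threshold
    """
    existing = {frozenset(edge) for edge in edges_g0}
    # Rank node pairs from high to low similarity.
    ranked = sorted(similarity.items(), key=lambda item: item[1], reverse=True)
    missing = [
        tuple(sorted(pair))
        for pair, score in ranked
        if score > f and frozenset(pair) not in existing
    ]
    reconstructed = existing | {frozenset(pair) for pair in missing}
    return missing, reconstructed


edges_g0 = [("A", "C1"), ("B", "C1")]
similarity = {("A", "B"): 0.8, ("A", "C1"): 0.9, ("B", "C1"): 0.3, ("A", "D"): 0.5}
missing, reconstructed = reconstruct_network(edges_g0, similarity)
```

Edges are stored as `frozenset` pairs so that (A, B) and (B, A) are treated as the same undirected edge. The pair (A, D) is not added because its similarity equals, but does not exceed, the threshold.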
wherein, the step 3.3 is specifically as follows:
for the results computed under a given similarity index, let the edge set $E_0$ of the local epidemic propagation network $G_0$ constructed in step 1 contain l edges. If the similarity values of m of these node pairs are higher than the missing-edge similarity threshold f defined in step 3.2, the network reconstruction accuracy under that index is defined as P = m/l; the larger the value of P, the higher the network reconstruction accuracy;
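The accuracy of step 3.3 can be sketched as below, under the reading that l is the number of edges recorded in $E_0$ and m is the number of those edges whose node pairs score above f. That reading is inferred from the ratio P = m/l and is an assumption; the names and values are hypothetical.

```python
def reconstruction_accuracy(edges_g0, similarity, f=0.5):
    """P = m / l: share of recorded edges whose similarity exceeds f."""
    scores = {frozenset(pair): value for pair, value in similarity.items()}
    edges = [frozenset(edge) for edge in edges_g0]
    l = len(edges)
    if l == 0:
        raise ValueError("E_0 must contain at least one edge")
    m = sum(1 for edge in edges if scores.get(edge, 0.0) > f)
    return m / l


edges_g0 = [("A", "C1"), ("B", "C1"), ("B", "C2"), ("A", "B")]
similarity = {("A", "C1"): 0.9, ("C1", "B"): 0.7, ("B", "C2"): 0.2, ("A", "B"): 0.6}
p = reconstruction_accuracy(edges_g0, similarity)
```

Three of the four recorded edges score above 0.5 here, so the index recovers 75% of the known edges.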
wherein, the step 3.4 is specifically as follows:
for the three node similarity results computed in step 3.1, the network reconstruction accuracy under each of the three indexes is calculated according to step 3.3; the node similarity index with the highest network reconstruction accuracy is selected as the final node similarity index, and its reconstructed network is taken as the final epidemic propagation network.
Matters not described in detail in this embodiment are well known in the art.
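The selection in step 3.4 amounts to taking the index with the largest accuracy. A minimal sketch is shown below; the index labels and the accuracy numbers are illustrative placeholders, not results from the patent.

```python
def select_final_index(accuracy_by_index):
    """Return the (index name, accuracy) pair with the highest accuracy.

    On ties, max() returns the first entry in the dict's insertion order.
    """
    best = max(accuracy_by_index, key=accuracy_by_index.get)
    return best, accuracy_by_index[best]


# Placeholder accuracies for the three indexes of step 2.
accuracy_by_index = {
    "structure": 0.62,
    "properties": 0.71,
    "structure_and_properties": 0.80,
}
best_index, best_accuracy = select_final_index(accuracy_by_index)
```

The network reconstructed under `best_index` would then be kept as the final epidemic propagation network.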
Example 2:
An epidemic propagation network reconstruction system based on node similarity comprises a local epidemic propagation network establishing module, a similarity index defining module, and an epidemic propagation network reconstruction module;
the local epidemic propagation network establishing module is used for establishing a local epidemic propagation network based on epidemiological investigation reports, the similarity index defining module is used for defining node similarity indexes, and the epidemic propagation network reconstruction module is used for reconstructing the epidemic propagation network according to the node similarity indexes.
Example 3:
An electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when executing the program, the processor implements the node-similarity-based epidemic propagation network reconstruction method of Example 1.
Example 4:
A computer-readable storage medium stores computer instructions for causing a computer to execute the node-similarity-based epidemic propagation network reconstruction method of Example 1.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. An epidemic propagation network reconstruction method based on node similarity, characterized by comprising the following steps:
step 1: establishing a local epidemic propagation network based on epidemiological investigation reports;
step 2: defining node similarity indexes;
step 3: reconstructing the epidemic propagation network based on the node similarity indexes.
2. The method for reconstructing the epidemic propagation network based on the node similarity according to claim 1, wherein the step 1 comprises the following steps:
1.1, integrating the epidemiological investigation reports;
1.2, extracting a node set and an edge set;
1.3, establishing a local epidemic propagation network;
1.4, defining the missing edge of the epidemic propagation network.
3. The method for reconstructing the epidemic propagation network based on the node similarity according to claim 2, wherein the step 1.1 specifically comprises:
all obtained epidemiological investigation reports are summarized, where the available reports are assumed to contain information only on confirmed cases and their close contacts, and not on the close contacts of those close contacts. Case information is extracted from the reports, including basic personal details, onset and medical visits, clinical characteristics, risk factors, exposure history, and close-contact characteristics, and an epidemiological investigation report summary table is drawn up whose rows are indexed by the confirmed cases and whose header lists the case features. Next, the epidemiological characteristics of the infectious disease are analyzed from the integrated case information, including the change over time of confirmed cases, suspected cases, and the like, the regional distribution of confirmed cases, the population characteristics of confirmed cases, the number of deaths, the crude mortality rate, and the mortality density;
the step 1.2 is specifically as follows:
first, all confirmed cases in the epidemiological investigation report summary table are extracted as the confirmed-case node set $V_1$ of the epidemic propagation network; next, the "close contact" feature column of the summary table is extracted, and all close contacts in that column are taken as the close-contact node set $V_2$ of the epidemic propagation network; then, nodes duplicated between $V_1$ and $V_2$ are removed: if a node appears in both $V_1$ and $V_2$, it is deleted from $V_2$. On this basis, $V_1$ and $V_2$ are merged to obtain the person node set V, namely $V = V_1 \cup V_2$. Finally, any two nodes $V_x$, $V_y$ are selected from the person node set V; if the epidemiological investigation reports record a close-contact relationship between the two persons they represent, an edge $e_{xy}$ is considered to exist between $V_x$ and $V_y$. Traversing all pairwise combinations of nodes in V yields the edge set $E_0$;
The step 1.3 is specifically as follows:
the person node set V and the edge set $E_0$ obtained in step 1.2 are combined to obtain the local epidemic propagation network $G_0$ established from the epidemiological investigation reports; this network contains two types of person nodes, namely confirmed-case nodes and close-contact nodes. Because the reports in step 1.1 are assumed to contain information only on confirmed cases and their close contacts, and not on the close contacts of those close contacts, the local epidemic propagation network constructed according to step 1.2 contains only two types of edges: edges between a confirmed case and a close contact, and edges between two confirmed cases;
the step 1.4 is specifically as follows:
in the local epidemic propagation network $G_0$ of step 1.3, two nodes may have a close-contact relationship that was not recorded during the epidemiological investigation; the edge between two such nodes is defined as a missing edge of the epidemic propagation network. By node type, missing edges are divided into three categories: edges between two confirmed cases, edges between a confirmed case and a close contact of another confirmed case, and edges between two close contacts.
4. The method for reconstructing the epidemic propagation network based on node similarity according to claim 1, wherein the step 2 comprises the following steps:
2.1, defining a similarity index based on a network structure;
2.2, defining a similarity index based on the node attribute;
and 2.3, defining a similarity index based on the network structure and the node attribute.
5. The method for reconstructing the epidemic propagation network based on the node similarity according to claim 4, wherein the step 2.1 specifically comprises:
if two unconnected person nodes in the local epidemic propagation network $G_0$ share several common neighbor nodes, namely common close contacts, where the person nodes comprise confirmed-case nodes and close-contact nodes, a missing edge may exist between the two nodes; a similarity index based on network structure, $S^{\mathrm{Structure}}$, is defined from the number of common neighbors of the two nodes, namely:
Figure FDA0002864058070000021
where $S^{\mathrm{Structure}}_{xy}$ denotes the network-structure-based similarity index of the node pair $V_x$, $V_y$; the larger its value, the higher the probability that an edge exists between $V_x$ and $V_y$; $\Gamma(x)$ and $\Gamma(y)$ denote the neighbor sets of node $V_x$ and node $V_y$, respectively;
the step 2.2 specifically comprises the following steps:
a similarity index is defined taking the trajectory as the example. Trajectory information is acquired for each node in the epidemic propagation network, comprising the longitude and latitude of the node at each moment; n moments are assumed to be counted according to actual needs, with the moment set {0, 1, 2, …, t, …, n}. For a node pair $V_x$, $V_y$, the longitude and latitude of both nodes at each moment are extracted, and the great-circle distance $k^{t}_{xy}$ between them at each moment t is calculated from these coordinates, giving the set of great-circle distances of the node pair over all moments, $k_{xy} = \{k^{0}_{xy}, k^{1}_{xy}, \ldots, k^{t}_{xy}, \ldots, k^{n}_{xy}\}$.
A contact space threshold $k_p$ is defined. From the great-circle distance $k^{t}_{xy}$ between the node pair at a moment t and the contact space threshold $k_p$, the spatial validity index $K^{t}_{xy}$ of the node pair at that moment is defined, namely:
$$K^{t}_{xy} = \begin{cases} 1, & k^{t}_{xy} < k_p \\ 0, & \text{otherwise} \end{cases}$$
so that the set of spatial validity indexes of the node pair over all moments is $K_{xy} = \{K^{0}_{xy}, K^{1}_{xy}, \ldots, K^{t}_{xy}, \ldots, K^{n}_{xy}\}$.
The runs of consecutive 1s in $K_{xy}$ are counted; the longest such run, that is, the longest period over which $k^{t}_{xy}$ stays below $k_p$, is defined as the maximum duration of close contact (written here as $T_{xy}$). A similarity index based on node attributes, $S^{\mathrm{Properties}}$, is thereby defined, namely:
$$S^{\mathrm{Properties}}_{xy} = \begin{cases} 1, & T_{xy} > T_p \\ 0, & \text{otherwise} \end{cases}$$
where $T_p$ denotes the contact time threshold, and $S^{\mathrm{Properties}}_{xy}$ denotes the node-attribute-based similarity index of the node pair $V_x$, $V_y$: its value is 1 if the maximum duration of close contact between the node pair exceeds the contact time threshold, and 0 otherwise;
the step 2.3 is specifically as follows:
a network structure weight $w_{\mathrm{Structure}}$ and a node attribute weight $w_{\mathrm{Properties}}$ are set, with $w_{\mathrm{Structure}} + w_{\mathrm{Properties}} = 1$; a similarity index based on both network structure and node attributes, $S^{\mathrm{Structure\ and\ Properties}}$, is defined, namely:
$$S^{\mathrm{Structure\ and\ Properties}}_{xy} = w_{\mathrm{Structure}}\, S^{\mathrm{Structure}}_{xy} + w_{\mathrm{Properties}}\, S^{\mathrm{Properties}}_{xy}$$
6. The method for reconstructing the epidemic propagation network based on node similarity according to claim 1, wherein the step 3 comprises the following steps:
3.1, calculating the similarity of the node pairs;
3.2, reconstructing an epidemic propagation network;
3.3, defining the reconstruction accuracy;
and 3.4, comparing the network reconstruction accuracy under different similarities.
7. The method for reconstructing the epidemic propagation network based on the node similarity according to claim 6, wherein the step 3.1 specifically comprises:
according to the three similarity indexes proposed in step 2, all node pairs in the node set V of the local epidemic propagation network $G_0$ constructed in step 1 are traversed, and the three similarity values of each node pair are calculated;
the step 3.2 is specifically as follows:
for a given similarity index, all node pairs computed in step 3.1 are sorted from high to low by similarity value; the missing-edge similarity threshold f under that index is determined; if the similarity value of a node pair exceeds f and no edge between the pair has been established in the local epidemic propagation network $G_0$, a missing edge is considered to exist between the pair; all node pairs are traversed and the missing edges are added, thereby reconstructing the epidemic propagation network G under that similarity index;
the step 3.3 is specifically as follows:
for the results computed under a given similarity index, let the edge set $E_0$ of the local epidemic propagation network $G_0$ constructed in step 1 contain l edges; if the similarity values of m of these node pairs are higher than the missing-edge similarity threshold f defined in step 3.2, the network reconstruction accuracy under that index is defined as P = m/l; the larger the value of P, the higher the network reconstruction accuracy;
the step 3.4 is specifically as follows:
for the three node similarity results computed in step 3.1, the network reconstruction accuracy under each of the three indexes is calculated according to step 3.3; the node similarity index with the highest network reconstruction accuracy is selected as the final node similarity index, and its reconstructed network is taken as the final epidemic propagation network.
8. An epidemic propagation network reconstruction system based on node similarity, characterized by comprising a local epidemic propagation network establishing module, a similarity index defining module, and an epidemic propagation network reconstruction module;
the local epidemic propagation network establishing module is used for establishing a local epidemic propagation network based on epidemiological investigation reports, the similarity index defining module is used for defining node similarity indexes, and the epidemic propagation network reconstruction module is used for reconstructing the epidemic propagation network according to the node similarity indexes.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the node-similarity-based epidemic propagation network reconstruction method of claim 1.
10. A computer-readable storage medium storing computer instructions for causing a computer to execute the node-similarity-based epidemic propagation network reconstruction method of claim 1.
CN202011579617.5A 2020-12-28 2020-12-28 Epidemic propagation network reconstruction method and system based on node similarity Active CN112669980B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011579617.5A CN112669980B (en) 2020-12-28 2020-12-28 Epidemic propagation network reconstruction method and system based on node similarity

Publications (2)

Publication Number Publication Date
CN112669980A true CN112669980A (en) 2021-04-16
CN112669980B CN112669980B (en) 2022-03-11

Family

ID=75410835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011579617.5A Active CN112669980B (en) 2020-12-28 2020-12-28 Epidemic propagation network reconstruction method and system based on node similarity

Country Status (1)

Country Link
CN (1) CN112669980B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114417239A (en) * 2022-03-29 2022-04-29 北京科技大学 Strategy migration method and device for epidemic situation prevention and control under experience shortage
CN114494643A (en) * 2022-01-11 2022-05-13 西北工业大学 Disease propagation control method based on network division
CN114496299A (en) * 2022-04-14 2022-05-13 八爪鱼人工智能科技(常熟)有限公司 Epidemic prevention information processing method based on deep learning and epidemic prevention service system

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102945310A (en) * 2012-09-27 2013-02-27 吉林大学 Epidemic propagation network modeling and inference of based on autonomic computing
CN103279512A (en) * 2013-05-17 2013-09-04 湖州师范学院 Method for using most influential node in social network to achieve efficient viral marketing
CN110223785A (en) * 2019-05-28 2019-09-10 北京师范大学 A kind of infectious disease transmission network reconstruction method based on deep learning
CN111261301A (en) * 2020-02-13 2020-06-09 姜通渊 Big data infectious disease prevention and control method and system
CN111354472A (en) * 2020-02-20 2020-06-30 戴建荣 Infectious disease transmission monitoring and early warning system and method
CN111354471A (en) * 2020-02-19 2020-06-30 自然资源部第一海洋研究所 Infectious disease transmission rate and epidemic situation evaluation method based on data
CN111370139A (en) * 2020-05-26 2020-07-03 第四范式(北京)技术有限公司 Infectious disease tracing method and device, electronic equipment and storage medium
CN111430041A (en) * 2020-03-26 2020-07-17 北京懿医云科技有限公司 Infectious disease epidemic situation prediction method and device, storage medium and electronic equipment
CN111446006A (en) * 2020-04-08 2020-07-24 陈恬慧 Method for tracking close contact person in epidemic situation of infectious disease
CN111540477A (en) * 2020-04-20 2020-08-14 中国科学院地理科学与资源研究所 Respiratory infectious disease close contact person identification method based on mobile phone data
CN111739657A (en) * 2020-07-20 2020-10-02 北京梦天门科技股份有限公司 Epidemic infected person prediction method and system based on knowledge graph
CN111743522A (en) * 2020-06-15 2020-10-09 武汉理工大学 Intelligent terminal early warning system of epidemic situation prevention and control
CN111863271A (en) * 2020-06-08 2020-10-30 浙江大学 Early warning, prevention and control analysis system for serious infectious disease transmission risk of new coronary pneumonia

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
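Li Weijiang (李卫疆) et al.: "A label propagation algorithm based on node similarity" ("一种基于节点相似度的标签传播算法"), Software Guide (《软件导刊》) *
Wang Zhaoyong (王照永) et al.: "A de-anonymization method for social network graph data based on structure and node feature similarity" ("一种基于结构及节点特征相似度的社交网络图数据去匿名方法"), Research and Development (《研究与开发》) *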

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114494643A (en) * 2022-01-11 2022-05-13 西北工业大学 Disease propagation control method based on network division
CN114494643B (en) * 2022-01-11 2024-02-23 西北工业大学 Disease transmission control method based on network division
CN114417239A (en) * 2022-03-29 2022-04-29 北京科技大学 Strategy migration method and device for epidemic situation prevention and control under experience shortage
CN114496299A (en) * 2022-04-14 2022-05-13 八爪鱼人工智能科技(常熟)有限公司 Epidemic prevention information processing method based on deep learning and epidemic prevention service system
CN114496299B (en) * 2022-04-14 2022-06-21 八爪鱼人工智能科技(常熟)有限公司 Epidemic prevention information processing method based on deep learning and epidemic prevention service system

Also Published As

Publication number Publication date
CN112669980B (en) 2022-03-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant