CN112967818A - Sudden infectious disease early warning method based on abnormal subgraph detection - Google Patents

Publication number
CN112967818A
Authority
CN
China
Prior art keywords
node
abnormal
nodes
network
subgraph detection
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110236120.1A
Other languages
Chinese (zh)
Inventor
袁丽娜
生龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei University of Engineering
Original Assignee
Hebei University of Engineering
Priority date
2021-03-03
Filing date
2021-03-03
Publication date
2021-06-15

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80 ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Remote Sensing (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Alarm Systems (AREA)

Abstract

The invention belongs to the field of spatial epidemiology and public health emergency decision-making, and in particular relates to an early warning method for sudden infectious diseases based on abnormal subgraph detection. The method mainly comprises the following steps: data processing, construction of an initial weighted directed network, conversion into a weighted directed network, and abnormal subgraph detection. The invention uses mobile phone signaling data to construct a complex network, calculates the relative distance between two regions, and predicts infectious diseases with an abnormal subgraph detection algorithm. It thereby combines propagation dynamics with complexity science for infectious disease prediction, and the constructed network can be reused for other infectious diseases.

Description

Sudden infectious disease early warning method based on abnormal subgraph detection
Technical Field
The invention belongs to the field of spatial epidemiology and public health emergency decision-making, and in particular relates to the use of an abnormal subgraph detection algorithm, drawing on propagation dynamics and complexity science, to realize early warning of sudden infectious diseases.
Background
Traditional disease prediction models such as SIR and SEIR rely on known parameters such as infection rate, latency period, and mortality, and iteratively adjust the model parameters against data on the current infectious disease to improve prediction accuracy. When a novel infectious disease breaks out, however, its parameters are unknown. Issuing early warning information in time and taking appropriate measures on the basis of a small amount of early data is therefore of great significance for controlling the spread of the disease.
Historically, infectious diseases followed a simple spatial spreading pattern: regions geographically closer to the source of infection had more cases. With the rapid development of the transportation industry, people's travel modes and population mobility patterns have changed, and the spreading pattern of infectious diseases has changed with them: the number of cases in a region is positively correlated with that region's share of the population flow from the source region. Anomaly detection is already widely used in many fields, including disease detection. By constructing a network from the data, performing abnormal subgraph detection, identifying abnormal nodes, and mining abnormal patterns, the spreading pattern of an infectious disease can be discovered.
The invention uses mobile phone signaling data and aviation data, combines disease prediction from propagation dynamics with the abnormal subgraph detection algorithm from complexity science, predicts sudden infectious diseases, and issues early warnings for the risk regions concerned.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a sudden infectious disease early warning method based on abnormal subgraph detection.
The sudden infectious disease early warning method based on abnormal subgraph detection comprises the following steps:
1) processing the data;
2) constructing an initial weighted directed network;
3) converting it into a weighted directed network and calculating the p-value of each node;
4) performing abnormal subgraph detection.
Further, the specific steps are as follows:
1) A weighted directed network $G = (V, E)$ is constructed, where $V$ is the set of all nodes (e.g., cities, countries) and $E$ is the set of all edges. If a population flow record exists between two nodes, a directed weighted edge exists between them, and the weight of the edge is the relative distance between the two nodes. Given a flow matrix $P$ ($0 \le P_{mn} \le 1$), the relative distance $d_{mn}$ between node $n$ and node $m$ is:
$$d_{mn} = 1 - \log P_{mn} \qquad (1)$$
where $P_{mn}$ is the ratio of the number of people travelling from node $n$ to node $m$ to the total number of people leaving node $n$. If several paths exist between node $n$ and node $m$, the relative distance $D_{mn}$ between any node $n$ and node $m$ is updated to:
$$D_{mn} = \min_{\Gamma} \lambda(\Gamma)$$
$$\lambda(\Gamma) = \sum_{(v_i, v_{i+1}) \in \Gamma} d_{v_{i+1} v_i}$$
where $\Gamma = \{v_1, \ldots, v_n\}$ denotes an ordered path from node $n$ to node $m$, $\lambda(\Gamma)$ is the sum of the edge lengths along the ordered path $\Gamma$, and $D_{mn}$ is the shortest distance between node $n$ and node $m$;
2) The network of step 1) is converted into a weighted directed network $G_A = (V, E, p)$, where $V = \{v_1, \ldots, v_N\}$ is the node set of $G_A$,
$$E \subseteq V \times V$$
is the edge set of $G_A$, and the mapping function $p: V \to [0, 1]$ assigns an empirical p-value to each vertex $v$. The p-value $p(v)$ of node $v$ can be calculated by empirical calibration, comparing the current features of $v$ with the features of $v$ in the historical data:
Figure BDA0002960224690000024
where $w(v)$ is the weight of node $v$ and $\{v_1, \ldots, v_T\}$ denotes the nodes adjacent to node $v$;
3) Abnormal subgraph detection is performed on the network $G_A$ using the non-parametric graph scan (NPGS) method, and an abnormal connected subgraph is mined.
The general form of NPGS is defined as:
$$F(S) = \max_{a \le a_{\max}} \phi(a, N_a(S), N(S))$$
where
$$S \subseteq V$$
is the vertex set of a connected subgraph, $N_a(S) = \sum_{v \in S} \delta(p(v) \le a)$ is the number of vertices in the subgraph whose p-values are significant at level $a$, and $N(S) = \sum_{v \in S} 1$ is the total number of p-values in $S$. $\delta(\cdot) = 1$ if its input is true, and $\delta(\cdot) = 0$ otherwise. Suppose $N_a(S) = \sum_{v \in S} \delta(p(v) > a)$, then
Figure BDA0002960224690000027
The significance level $a$ lies between 0 and a constant $a_{\max}$ (in general, $a_{\max} = 0.15$). For abnormal subgraph detection, $\phi(a, N_a(S), N(S))$ needs to satisfy the following two properties:
(1) $\phi$ is monotonically increasing with respect to $N(S)$;
(2) $\phi$ is monotonically decreasing with respect to
Figure BDA0002960224690000028
These two properties clearly hold.
Both the BJ statistic and the HC statistic, which are non-parametric scan statistics, satisfy the above two properties.
The BJ statistic is defined as:
$$\phi_{BJ}(a, N_a(S), N(S)) = N(S) \times KL\left(\frac{N_a(S)}{N(S)}, a\right)$$
where $KL$ is the Kullback-Leibler divergence between the observed and expected proportions of p-values less than $a$:
$$KL(a, b) = a \log\frac{a}{b} + (1 - a) \log\frac{1 - a}{1 - b}$$
where $a, b \in [0, 1]$; when $b = 1$ or $b = 0$, $KL(a, b) = 0$.
The HC statistic is defined as:
$$\phi_{HC}(a, N_a(S), N(S)) = \frac{N_a(S) - N(S)\,a}{\sqrt{N(S)\,a\,(1 - a)}}$$
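As an illustration of how the BJ and HC statistics might be evaluated for one candidate vertex set, the following Python sketch scores a list of p-values. It is only a sketch under stated assumptions: it uses the forms of the Berk-Jones and Higher Criticism statistics that are standard in the scan-statistics literature, natural logarithms, a one-sided Berk-Jones score, and candidate significance levels restricted to the observed p-values. The function names `kl`, `bj`, `hc`, and `npgs_score` and the example values are hypothetical and do not come from the patent.

```python
import math


def kl(a: float, b: float) -> float:
    """Bernoulli KL divergence KL(a, b), using natural logarithms (assumption)."""
    if b <= 0.0 or b >= 1.0:
        return 0.0  # convention stated in the text for b = 0 or b = 1
    total = 0.0
    if a > 0.0:
        total += a * math.log(a / b)
    if a < 1.0:
        total += (1.0 - a) * math.log((1.0 - a) / (1.0 - b))
    return total


def bj(a: float, n_a: int, n: int) -> float:
    """Berk-Jones form: N(S) * KL(N_a(S) / N(S), a)."""
    if n == 0:
        return 0.0
    frac = n_a / n
    # Assumption: only an excess of significant p-values counts as anomalous.
    return n * kl(frac, a) if frac > a else 0.0


def hc(a: float, n_a: int, n: int) -> float:
    """Higher Criticism form: (N_a(S) - N(S) a) / sqrt(N(S) a (1 - a))."""
    if n == 0:
        return 0.0
    return (n_a - n * a) / math.sqrt(n * a * (1.0 - a))


def npgs_score(p_values, phi=bj, a_max: float = 0.15) -> float:
    """F(S): maximum of phi over significance levels a in (0, a_max].

    Candidate levels are the observed p-values, because N_a(S) only
    changes at those points. The score is floored at 0.
    """
    n = len(p_values)
    best = 0.0
    for a in sorted({p for p in p_values if 0.0 < p <= a_max}):
        n_a = sum(1 for p in p_values if p <= a)
        best = max(best, phi(a, n_a, n))
    return best


# Hypothetical p-values of the vertices in one candidate subgraph S.
example = [0.01, 0.02, 0.50, 0.90]
print(npgs_score(example, phi=bj))
print(npgs_score(example, phi=hc))
```

A set whose p-values all exceed `a_max` scores 0 under this sketch, so only sets containing at least one significant node can be reported as abnormal.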
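Given a non-parametric scan statistic
$$F(S)$$
the abnormal subgraph detection problem can be formulated as the optimization problem:
$$\max_{S \subseteq V} F(S) \quad \text{s.t. } S \text{ is connected}$$
which is equivalent to
$$\max_{a \le a_{\max}} \; \max_{S \subseteq V:\, S \text{ connected}} \phi(a, N_a(S), N(S))$$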
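The search for a high-scoring connected vertex set can be illustrated with a simple greedy heuristic. The sketch below is not the solver used by the invention, which the text does not specify. It assumes a Berk-Jones style score, treats the network as undirected when checking connectivity, and grows a set from each significant seed node by repeatedly adding the neighbouring node that most improves the score. The names `bj_scan` and `greedy_connected_subgraph` and the toy graph are hypothetical.

```python
import math


def kl(a: float, b: float) -> float:
    """Bernoulli KL divergence for 0 < b < 1, natural logarithms (assumption)."""
    total = 0.0
    if a > 0.0:
        total += a * math.log(a / b)
    if a < 1.0:
        total += (1.0 - a) * math.log((1.0 - a) / (1.0 - b))
    return total


def bj_scan(p_values, a_max: float = 0.15) -> float:
    """Max over observed levels a <= a_max of N * KL(N_a / N, a) (assumed form)."""
    n = len(p_values)
    best = 0.0
    for a in sorted({p for p in p_values if 0.0 < p <= a_max}):
        n_a = sum(1 for p in p_values if p <= a)
        if n_a / n > a:
            best = max(best, n * kl(n_a / n, a))
    return best


def greedy_connected_subgraph(adj, p, a_max: float = 0.15):
    """Heuristic search: grow a connected set from every significant seed.

    adj maps each node to the set of its neighbours (undirected view);
    p maps each node to its empirical p-value.
    """
    best_set, best_score = set(), 0.0
    for seed in adj:
        if p[seed] > a_max:
            continue  # only start from nodes that are significant themselves
        current = {seed}
        score = bj_scan([p[v] for v in current], a_max)
        while True:
            frontier = {u for v in current for u in adj[v]} - current
            if not frontier:
                break
            # Score every one-node extension; extensions stay connected.
            scored = [(bj_scan([p[x] for x in current | {u}], a_max), u)
                      for u in frontier]
            new_score, u = max(scored, key=lambda t: t[0])
            if new_score <= score:
                break  # no neighbour improves the score
            current.add(u)
            score = new_score
        if score > best_score:
            best_set, best_score = set(current), score
    return best_set, best_score


# Hypothetical path graph A - B - C - D - E with made-up p-values.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
p = {"A": 0.01, "B": 0.02, "C": 0.03, "D": 0.90, "E": 0.80}
subgraph, score = greedy_connected_subgraph(adj, p)
print(sorted(subgraph), score)
```

A greedy search of this kind gives no optimality guarantee. It is shown only to make the connectivity constraint concrete: every candidate set is built by adding neighbours, so it is connected by construction.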
Compared with the prior art, the technical features and effects of the invention are as follows:
The invention uses mobile phone signaling data and aviation data to construct a complex network, calculates the relative distance between two regions, and predicts infectious diseases with an abnormal subgraph detection algorithm. It has the following characteristics:
1. the procedure is simple and easy to implement;
2. infectious disease prediction combines propagation dynamics with complexity science;
3. the constructed complex network can be reused and remains effective when applied to other infectious diseases.
Drawings
FIG. 1 is a block flow diagram of the method of the present invention.
FIG. 2 shows the daily number of abnormal regions, domestic and international, from 2020.01.06 to 2020.03.10.
FIG. 3 shows the top 15 abnormal Chinese cities on 2020.02.06.
Detailed Description
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings.
The invention relates to a risk early warning method for sudden infectious diseases, which comprises the following steps:
1) Data preprocessing: mobile phone signaling data and aviation data are collected and preprocessed.
2) A weighted directed network $G = (V, E)$ is constructed, where $V$ is the set of all nodes (e.g., cities, countries) and $E$ is the set of all edges. If a population flow record exists between two nodes, a directed weighted edge exists between them, and the weight of the edge is the relative distance between the two nodes. Given a flow matrix $P$ ($0 \le P_{mn} \le 1$), the relative distance $d_{mn}$ between node $n$ and node $m$ is:
$$d_{mn} = 1 - \log P_{mn} \qquad (1)$$
where $P_{mn}$ is the ratio of the number of people travelling from node $n$ to node $m$ to the total number of people leaving node $n$. If several paths exist between node $n$ and node $m$, the relative distance $D_{mn}$ between any node $n$ and node $m$ is updated to:
$$D_{mn} = \min_{\Gamma} \lambda(\Gamma)$$
$$\lambda(\Gamma) = \sum_{(v_i, v_{i+1}) \in \Gamma} d_{v_{i+1} v_i}$$
where $\Gamma = \{v_1, \ldots, v_n\}$ denotes an ordered path from node $n$ to node $m$, $\lambda(\Gamma)$ is the sum of the edge lengths along the ordered path $\Gamma$, and $D_{mn}$ is the shortest distance between node $n$ and node $m$;
3) The network of step 2) is converted into a weighted directed network $G_A = (V, E, p)$, where $V = \{v_1, \ldots, v_N\}$ is the node set of $G_A$,
$$E \subseteq V \times V$$
is the edge set of $G_A$, and the mapping function $p: V \to [0, 1]$ assigns an empirical p-value to each vertex $v$. The p-value $p(v)$ of node $v$ can be calculated by empirical calibration, comparing the current features of $v$ with the features of $v$ in the historical data:
Figure BDA0002960224690000044
where $w(v)$ is the weight of node $v$ and $\{v_1, \ldots, v_T\}$ denotes the nodes adjacent to node $v$;
4) Abnormal subgraph detection is performed on the network $G_A$ using the non-parametric graph scan (NPGS) method, and abnormal subgraphs are mined.
The general form of NPGS is defined as:
$$F(S) = \max_{a \le a_{\max}} \phi(a, N_a(S), N(S))$$
where
$$S \subseteq V$$
is the vertex set of a connected subgraph, $N_a(S) = \sum_{v \in S} \delta(p(v) \le a)$ is the number of vertices in the subgraph whose p-values are significant at level $a$, and $N(S) = \sum_{v \in S} 1$ is the total number of p-values in $S$. $\delta(\cdot) = 1$ if its input is true, and $\delta(\cdot) = 0$ otherwise. Suppose $N_a(S) = \sum_{v \in S} \delta(p(v) > a)$, then
Figure BDA0002960224690000047
The significance level $a$ lies between 0 and a constant $a_{\max}$ (in general, $a_{\max} = 0.15$). For abnormal subgraph detection, $\phi(a, N_a(S), N(S))$ needs to satisfy the following two properties:
(1) $\phi$ is monotonically increasing with respect to $N(S)$;
(2) $\phi$ is monotonically decreasing with respect to
Figure BDA0002960224690000051
These two properties clearly hold. Both the BJ statistic and the HC statistic, which are non-parametric scan statistics, satisfy them.
The BJ statistic is defined as:
$$\phi_{BJ}(a, N_a(S), N(S)) = N(S) \times KL\left(\frac{N_a(S)}{N(S)}, a\right)$$
where $KL$ is the Kullback-Leibler divergence between the observed and expected proportions of p-values less than $a$:
$$KL(a, b) = a \log\frac{a}{b} + (1 - a) \log\frac{1 - a}{1 - b}$$
where $a, b \in [0, 1]$; when $b = 1$ or $b = 0$, $KL(a, b) = 0$.
The HC statistic is defined as:
$$\phi_{HC}(a, N_a(S), N(S)) = \frac{N_a(S) - N(S)\,a}{\sqrt{N(S)\,a\,(1 - a)}}$$
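Given a non-parametric scan statistic
$$F(S)$$
the abnormal subgraph detection problem can be formulated as the optimization problem:
$$\max_{S \subseteq V} F(S) \quad \text{s.t. } S \text{ is connected}$$
which is equivalent to
$$\max_{a \le a_{\max}} \; \max_{S \subseteq V:\, S \text{ connected}} \phi(a, N_a(S), N(S))$$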
5) The abnormal connected subgraph obtained in step 4) is the final result. Taking the aviation network data of February 6 as an example, the final abnormal connected subgraph contains 15 nodes (regions), i.e., the regions that require close observation on February 7. They are:
China, Japan, France, India, Thailand
Korea, Vietnam, Philippines, USA, Germany
Malaysia, United Arab Emirates, Singapore, Australia, Canada
In addition, complex networks constructed with Wuhan, Shijiazhuang, and China as the central source are analyzed, and the top 10 ranked regions for each are listed below. When a case appears in the central source region, cities ranked higher by $d_{mn}$ may be at greater risk.
The following table lists, for each central source, the top 10 regions ranked by $d_{mn}$.
Figure BDA0002960224690000061
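A ranking of this kind could be produced roughly as follows. The Python sketch below converts traveller counts into the relative distance $d_{mn} = 1 - \log P_{mn}$, runs Dijkstra's algorithm from a central source to obtain the shortest distances $D_{mn}$, and returns the closest regions. It rests on several assumptions that are not stated in the patent: the logarithm is natural, a higher rank means a smaller relative distance, and the input is a nested dictionary of traveller counts. The function names and the toy flow data are hypothetical.

```python
import heapq
import math


def relative_distance(p_mn: float) -> float:
    """d_mn = 1 - log(P_mn) for 0 < P_mn <= 1 (natural log assumed)."""
    return 1.0 - math.log(p_mn)


def shortest_distances(flows, source):
    """Dijkstra over edge lengths d_mn, giving D_mn from one source.

    flows[n][m] is the number of people travelling from node n to node m.
    P_mn is that count divided by the total outflow of node n.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, n = heapq.heappop(heap)
        if d > dist.get(n, math.inf):
            continue  # stale queue entry
        out = flows.get(n, {})
        total = sum(out.values())
        for m, count in out.items():
            if count <= 0:
                continue  # no recorded flow, so no edge
            nd = d + relative_distance(count / total)
            if nd < dist.get(m, math.inf):
                dist[m] = nd
                heapq.heappush(heap, (nd, m))
    return dist


def top_k_closest(flows, source, k: int = 10):
    """Regions ranked by increasing shortest relative distance from source."""
    dist = shortest_distances(flows, source)
    ranked = sorted((d, m) for m, d in dist.items() if m != source)
    return [(m, d) for d, m in ranked[:k]]


# Hypothetical traveller counts; "X" plays the role of the central source.
flows = {"X": {"A": 90, "B": 10}, "A": {"B": 100}}
for region, d in top_k_closest(flows, "X", k=10):
    print(region, round(d, 3))
```

Because every $d_{mn}$ is at least 1, all edge lengths are positive and Dijkstra's algorithm applies. In the toy data the indirect route from X to B through A is shorter than the direct edge, which shows why $D_{mn}$ is taken over all paths rather than from the direct flow alone.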
As shown in FIGS. 1 to 3, the invention uses mobile phone signaling data and aviation data to construct a complex network, calculates the relative distance between two regions, and predicts infectious diseases with an abnormal subgraph detection algorithm. It combines propagation dynamics with complexity science for infectious disease prediction, and the constructed network can be reused and remains effective for other infectious diseases.

Claims (2)

1. A sudden infectious disease early warning method based on abnormal subgraph detection, characterized by comprising the following steps:
1) processing the data;
2) constructing an initial weighted directed network;
3) converting it into a weighted directed network and calculating the p-value of each node;
4) performing abnormal subgraph detection.
2. A sudden infectious disease early warning method based on abnormal subgraph detection, characterized by comprising the following steps:
1) constructing a weighted directed network $G = (V, E)$, where $V$ is the set of all nodes and $E$ is the set of all edges;
if a population flow record exists between two nodes, a directed weighted edge exists between them, and the weight of the edge is the relative distance between the two nodes;
given a flow matrix $P$ ($0 \le P_{mn} \le 1$), the relative distance $d_{mn}$ between node $n$ and node $m$ is:
$$d_{mn} = 1 - \log P_{mn} \qquad (1)$$
wherein $P_{mn}$ is the ratio of the number of people travelling from node $n$ to node $m$ to the total number of people leaving node $n$; if several paths exist between node $n$ and node $m$, the relative distance $D_{mn}$ between any node $n$ and node $m$ is updated to:
$$D_{mn} = \min_{\Gamma} \lambda(\Gamma)$$
$$\lambda(\Gamma) = \sum_{(v_i, v_{i+1}) \in \Gamma} d_{v_{i+1} v_i}$$
wherein $\Gamma = \{v_1, \ldots, v_n\}$ denotes an ordered path from node $n$ to node $m$, $\lambda(\Gamma)$ is the sum of the edge lengths along the ordered path $\Gamma$, and $D_{mn}$ is the shortest distance between node $n$ and node $m$;
2) converting the network of step 1) into a weighted directed network $G_A = (V, E, p)$, wherein $V = \{v_1, \ldots, v_N\}$ is the node set of $G_A$,
$$E \subseteq V \times V$$
is the edge set of $G_A$, and the mapping function $p: V \to [0, 1]$ assigns an empirical p-value to each vertex $v$; the p-value $p(v)$ of node $v$ can be calculated by empirical calibration, comparing the current features of $v$ with the features of $v$ in the historical data:
Figure FDA0002960224680000014
wherein $w(v)$ is the weight of node $v$ and $\{v_1, \ldots, v_T\}$ denotes the nodes adjacent to node $v$;
3) performing abnormal subgraph detection on the network $G_A$ using the non-parametric graph scan (NPGS) method and mining an abnormal connected subgraph, wherein the general form of NPGS is defined as:
$$F(S) = \max_{a \le a_{\max}} \phi(a, N_a(S), N(S))$$
wherein
$$S \subseteq V$$
is the vertex set of a connected subgraph, $N_a(S) = \sum_{v \in S} \delta(p(v) \le a)$ is the number of vertices in the subgraph whose p-values are significant at level $a$, and $N(S) = \sum_{v \in S} 1$ is the total number of p-values in $S$;
$\delta(\cdot) = 1$ if its input is true, and $\delta(\cdot) = 0$ otherwise;
suppose $N_a(S) = \sum_{v \in S} \delta(p(v) > a)$, then
Figure FDA0002960224680000022
the significance level $a$ lies between 0 and a constant $a_{\max}$, and in general $a_{\max} = 0.15$;
for abnormal subgraph detection, $\phi(a, N_a(S), N(S))$ needs to satisfy the following two properties:
(1) $\phi$ is monotonically increasing with respect to $N(S)$;
(2) $\phi$ is monotonically decreasing with respect to
Figure FDA0002960224680000023
these two properties clearly hold, and both the BJ statistic and the HC statistic, which are non-parametric scan statistics, satisfy them;
the BJ statistic is defined as:
$$\phi_{BJ}(a, N_a(S), N(S)) = N(S) \times KL\left(\frac{N_a(S)}{N(S)}, a\right)$$
wherein $KL$ is the Kullback-Leibler divergence between the observed and expected proportions of p-values less than $a$:
$$KL(a, b) = a \log\frac{a}{b} + (1 - a) \log\frac{1 - a}{1 - b}$$
wherein $a, b \in [0, 1]$, and when $b = 1$ or $b = 0$, $KL(a, b) = 0$;
the HC statistic is defined as:
$$\phi_{HC}(a, N_a(S), N(S)) = \frac{N_a(S) - N(S)\,a}{\sqrt{N(S)\,a\,(1 - a)}}$$
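given a non-parametric scan statistic
$$F(S)$$
the abnormal subgraph detection problem can be formulated as the optimization problem:
$$\max_{S \subseteq V} F(S) \quad \text{s.t. } S \text{ is connected}$$
which is equivalent to
$$\max_{a \le a_{\max}} \; \max_{S \subseteq V:\, S \text{ connected}} \phi(a, N_a(S), N(S))$$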

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110236120.1A CN112967818A (en) 2021-03-03 2021-03-03 Sudden infectious disease early warning method based on abnormal subgraph detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110236120.1A CN112967818A (en) 2021-03-03 2021-03-03 Sudden infectious disease early warning method based on abnormal subgraph detection

Publications (1)

Publication Number Publication Date
CN112967818A true CN112967818A (en) 2021-06-15

Family

ID=76276864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110236120.1A Withdrawn CN112967818A (en) 2021-03-03 2021-03-03 Sudden infectious disease early warning method based on abnormal subgraph detection

Country Status (1)

Country Link
CN (1) CN112967818A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090080768A1 (en) * 2007-09-20 2009-03-26 Chung Shan Institute Of Science And Technology, Armaments Bureau, M.N.D. Recognition method for images by probing alimentary canals
US20160265070A1 (en) * 2015-01-08 2016-09-15 The Board Of Trustees Of The University Of Illinois Copy number detection and methods
CN112422571A (en) * 2020-11-19 2021-02-26 天津大学 Method for carrying out exception alignment across multiple attribute networks
CN112417303A (en) * 2020-12-09 2021-02-26 天津大学 Evolution algorithm for detecting multiple abnormal subgraphs from dynamic attribute graph

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
F. Chen and D. B. Neill: "Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs", Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining *
Nannan Wu: "A Nonparametric Approach to Uncovering Connected Anomalies by Tree Shaped Priors", IEEE Transactions on Knowledge and Data Engineering *
G. P. Patil, Taillie: "Geographic and network surveillance via scan statistics for critical area detection", Statistical Science *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20210615)