CN112967818A - Sudden infectious disease early warning method based on abnormal subgraph detection - Google Patents
Sudden infectious disease early warning method based on abnormal subgraph detection
- Publication number
- CN112967818A (application number CN202110236120.1A)
- Authority
- CN
- China
- Prior art keywords
- node
- abnormal
- nodes
- network
- subgraph detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/80—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Public Health (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Remote Sensing (AREA)
- Pathology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Alarm Systems (AREA)
Abstract
The invention belongs to the field of spatial epidemiology and public health emergency decision-making, and particularly relates to a sudden infectious disease early warning method based on abnormal subgraph detection. The method mainly comprises the following steps: data processing, construction of an initial weighted directed network, conversion into a weighted directed network, and abnormal subgraph detection. The invention uses mobile phone signaling data to construct a complex network, calculates the relative distance between two regions, and predicts infectious diseases with an abnormal subgraph detection algorithm. It thereby combines propagation dynamics with complexity science, and the constructed network can be reused and applied to other infectious diseases.
Description
Technical Field
The invention belongs to the field of spatial epidemiology and public health emergency decision-making, and particularly relates to the use of an abnormal subgraph detection algorithm, drawing on propagation dynamics and complexity science, to realize early warning of sudden infectious diseases.
Background
Traditional disease prediction models such as SIR and SEIR rely on known parameters such as the infection rate, latent period and mortality, and iteratively adjust these parameters against data on the current outbreak to improve prediction accuracy. When a novel infectious disease breaks out, however, its parameters are unknown. Issuing timely warnings and taking appropriate measures on the basis of a small amount of early data is therefore of great significance for controlling the spread of the disease.
Historically, the transmission of infectious diseases followed a simple diffusion pattern: regions geographically close to the source of infection had more cases. With the rapid development of the transportation industry, the way people travel and the patterns of population mobility have changed, and the way infectious diseases spread has changed with them. The number of cases in a given region is now positively correlated with the share of the population flow it receives from the source region. Anomaly detection has been widely used in many fields, including disease detection. By constructing a network from the data, performing abnormal subgraph detection, finding abnormal nodes and mining abnormal patterns, the spreading pattern of an infectious disease can be discovered.
The invention uses mobile phone signaling data and aviation data and combines disease prediction from propagation dynamics with an abnormal subgraph detection algorithm from complexity science to predict sudden infectious diseases and issue early warnings for the regions at risk.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a sudden infectious disease early warning method based on abnormal subgraph detection.
The invention discloses a sudden infectious disease early warning method based on abnormal subgraph detection, which comprises the following steps:
1) processing data;
2) constructing an initial weighted directed network;
3) converting into a weighted directed network, and calculating the p value of the node;
4) performing abnormal subgraph detection.
Further, the method comprises the following specific steps:
1) A weighted directed network $G = (V, E)$ is constructed, where $V$ is the set of all nodes (e.g., cities, countries) and $E$ is the set of all edges. If a population flow record exists between two nodes, a directed weighted edge exists between them, and the weight of the edge is the relative distance between the two nodes. Given a flow matrix $P$ ($0 \le P_{mn} \le 1$), the relative distance $d_{mn}$ between node $n$ and node $m$ is:
$$d_{mn} = 1 - \log P_{mn} \qquad (1)$$
where $P_{mn}$ is the ratio of the number of people flowing from node $n$ to node $m$ to the total number of people flowing out of node $n$. If several paths exist between node $n$ and node $m$, the relative distance $D_{mn}$ between any node $n$ and node $m$ is updated as:
$$D_{mn} = \min_{\Gamma} \lambda(\Gamma)$$
where $\Gamma = \{v_1, \ldots, v_n\}$ denotes an ordered path from node $n$ to node $m$, $\lambda(\Gamma)$ denotes the sum of the edge lengths along the ordered path $\Gamma$, and $D_{mn}$ is the shortest distance between node $n$ and node $m$;
2) The network in step 1) is converted into a weighted directed network $G_A = (V, E, p)$, where $V = \{v_1, \ldots, v_N\}$ is the set of nodes of $G_A$, $E$ is the set of edges of $G_A$, and the mapping function $p: V \to [0, 1]$ assigns an empirical p-value to each vertex $v$. The p-value $p(v)$ of node $v$ can be calculated by empirical calibration, comparing the current features of $v$ with the features of $v$ in the historical data:
[formula omitted in the source text]
where $w(v)$ denotes the weight of node $v$ and $\{v_1, \ldots, v_T\}$ denotes the nodes adjacent to node $v$;
3) Abnormal subgraph detection is performed on network $G_A$ using the non-parametric graph scan (NPGS) method to mine the abnormal connected subgraph.
The general form of NPGS is defined as:
$$F(S) = \max_{\alpha \le \alpha_{\max}} \phi(\alpha, N_\alpha(S), N(S))$$
where $S \subseteq V$ is the vertex set of a connected subgraph, $N_\alpha(S) = \sum_{v \in S} \delta(p(v) \le \alpha)$ is the number of vertices of the subgraph that are significant at level $\alpha$, and $N(S) = \sum_{v \in S} 1$ is the total number of p-values in $S$. The indicator $\delta(\cdot)$ equals 1 if its argument is true and 0 otherwise. The significance level $\alpha$ ranges between 0 and a constant $\alpha_{\max}$ (typically $\alpha_{\max} = 0.15$). For abnormal subgraph detection, $\phi(\alpha, N_\alpha(S), N(S))$ must satisfy the following two properties:
(1) $\phi$ is monotonically increasing with respect to $N_\alpha(S)$;
(2) $\phi$ is monotonically decreasing with respect to $N(S)$ and $\alpha$.
Both properties are intuitively reasonable.
The non-parametric scan statistics BJ (Berk-Jones) and HC (Higher Criticism) each satisfy the two properties above.
The BJ statistic is defined as:
$$\phi_{BJ}(\alpha, N_\alpha(S), N(S)) = N(S) \cdot KL\!\left(\frac{N_\alpha(S)}{N(S)}, \alpha\right)$$
where $KL$ is the Kullback-Leibler divergence between the observed and expected proportions of p-values below $\alpha$:
$$KL(a, b) = a \log\frac{a}{b} + (1 - a) \log\frac{1 - a}{1 - b}$$
where $a, b \in [0, 1]$, with $KL(a, b) = 0$ when $b = 1$ or $b = 0$.
The HC statistic is defined as:
$$\phi_{HC}(\alpha, N_\alpha(S), N(S)) = \frac{N_\alpha(S) - N(S)\,\alpha}{\sqrt{N(S)\,\alpha\,(1 - \alpha)}}$$
Given a non-parametric scan statistic $F(S)$, the abnormal subgraph detection problem can be formulated as:
$$\max_{S \subseteq V} F(S) \quad \text{s.t. } S \text{ is connected},$$
which is equivalent to
$$\max_{\alpha \le \alpha_{\max}} \; \max_{S \subseteq V:\ S \text{ is connected}} \phi(\alpha, N_\alpha(S), N(S)).$$
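The distance construction in step 1) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names, the nested-dictionary layout of the flow counts, and the use of the natural logarithm are all assumptions, since the text only writes "log". Because $P_{mn} \le 1$, every edge weight is at least 1, so Dijkstra's algorithm applies.

```python
import heapq
import math


def relative_distances(flows):
    """Edge weights d_mn = 1 - log(P_mn).

    `flows[n][m]` is the (hypothetical) number of people moving from n to m.
    P_mn is the share of n's total outflow that goes to m.
    """
    dist = {}
    for n, out in flows.items():
        total = sum(out.values())
        dist[n] = {}
        for m, count in out.items():
            if count > 0 and total > 0:
                p_mn = count / total
                dist[n][m] = 1.0 - math.log(p_mn)  # natural log is an assumption
    return dist


def shortest_distances(dist, source):
    """Dijkstra from `source`: D = minimum over paths of the summed edge weights."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in dist.get(u, {}).items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best


# Toy flow counts (made up for illustration only).
flows = {"A": {"B": 90, "C": 10}, "B": {"C": 100}}
d = relative_distances(flows)
D = shortest_distances(d, "A")
```

In this toy network the indirect route A to B to C is shorter than the direct edge A to C, because most of A's outflow goes to B and all of B's outflow goes to C.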
Compared with the prior art, the invention has the following technical characteristics and effects:
The invention uses mobile phone signaling data and aviation data to construct a complex network, calculates the relative distance between two regions, and predicts infectious diseases with an abnormal subgraph detection algorithm. It has the following characteristics:
1. the procedure is simple and easy to implement;
2. infectious disease prediction combines propagation dynamics with complexity science;
3. the constructed complex network can be reused and remains effective when applied to other infectious diseases.
Drawings
FIG. 1 is a block flow diagram of the method of the present invention.
FIG. 2 shows the daily number of abnormal regions in China and abroad from 2020.01.06 to 2020.03.10.
FIG. 3 shows the top 15 abnormal Chinese cities on 2020.02.06.
Detailed Description
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings.
The invention relates to a risk early warning method for sudden infectious diseases, which comprises the following steps:
1) Data preprocessing: mobile phone signaling data and aviation data are collected and preprocessed.
2) A weighted directed network $G = (V, E)$ is constructed, where $V$ is the set of all nodes (e.g., cities, countries) and $E$ is the set of all edges. If a population flow record exists between two nodes, a directed weighted edge exists between them, and the weight of the edge is the relative distance between the two nodes. Given a flow matrix $P$ ($0 \le P_{mn} \le 1$), the relative distance $d_{mn}$ between node $n$ and node $m$ is:
$$d_{mn} = 1 - \log P_{mn} \qquad (1)$$
where $P_{mn}$ is the ratio of the number of people flowing from node $n$ to node $m$ to the total number of people flowing out of node $n$. If several paths exist between node $n$ and node $m$, the relative distance $D_{mn}$ between any node $n$ and node $m$ is updated as:
$$D_{mn} = \min_{\Gamma} \lambda(\Gamma)$$
where $\Gamma = \{v_1, \ldots, v_n\}$ denotes an ordered path from node $n$ to node $m$, $\lambda(\Gamma)$ denotes the sum of the edge lengths along the ordered path $\Gamma$, and $D_{mn}$ is the shortest distance between node $n$ and node $m$;
3) The network in step 2) is converted into a weighted directed network $G_A = (V, E, p)$, where $V = \{v_1, \ldots, v_N\}$ is the set of nodes of $G_A$, $E$ is the set of edges of $G_A$, and the mapping function $p: V \to [0, 1]$ assigns an empirical p-value to each vertex $v$. The p-value $p(v)$ of node $v$ can be calculated by empirical calibration, comparing the current features of $v$ with the features of $v$ in the historical data:
[formula omitted in the source text]
where $w(v)$ denotes the weight of node $v$ and $\{v_1, \ldots, v_T\}$ denotes the nodes adjacent to node $v$;
4) Abnormal subgraph detection is performed on network $G_A$ using the non-parametric graph scan (NPGS) method to mine abnormal subgraphs.
The general form of NPGS is defined as:
$$F(S) = \max_{\alpha \le \alpha_{\max}} \phi(\alpha, N_\alpha(S), N(S))$$
where $S \subseteq V$ is the vertex set of a connected subgraph, $N_\alpha(S) = \sum_{v \in S} \delta(p(v) \le \alpha)$ is the number of vertices of the subgraph that are significant at level $\alpha$, and $N(S) = \sum_{v \in S} 1$ is the total number of p-values in $S$. The indicator $\delta(\cdot)$ equals 1 if its argument is true and 0 otherwise. The significance level $\alpha$ ranges between 0 and a constant $\alpha_{\max}$ (typically $\alpha_{\max} = 0.15$). For abnormal subgraph detection, $\phi(\alpha, N_\alpha(S), N(S))$ must satisfy the following two properties:
(1) $\phi$ is monotonically increasing with respect to $N_\alpha(S)$;
(2) $\phi$ is monotonically decreasing with respect to $N(S)$ and $\alpha$.
Both properties are intuitively reasonable. The non-parametric scan statistics BJ (Berk-Jones) and HC (Higher Criticism) each satisfy the two properties above.
The BJ statistic is defined as:
$$\phi_{BJ}(\alpha, N_\alpha(S), N(S)) = N(S) \cdot KL\!\left(\frac{N_\alpha(S)}{N(S)}, \alpha\right)$$
where $KL$ is the Kullback-Leibler divergence between the observed and expected proportions of p-values below $\alpha$:
$$KL(a, b) = a \log\frac{a}{b} + (1 - a) \log\frac{1 - a}{1 - b}$$
where $a, b \in [0, 1]$, with $KL(a, b) = 0$ when $b = 1$ or $b = 0$.
The HC statistic is defined as:
$$\phi_{HC}(\alpha, N_\alpha(S), N(S)) = \frac{N_\alpha(S) - N(S)\,\alpha}{\sqrt{N(S)\,\alpha\,(1 - \alpha)}}$$
Given a non-parametric scan statistic $F(S)$, the abnormal subgraph detection problem can be formulated as:
$$\max_{S \subseteq V} F(S) \quad \text{s.t. } S \text{ is connected},$$
which is equivalent to
$$\max_{\alpha \le \alpha_{\max}} \; \max_{S \subseteq V:\ S \text{ is connected}} \phi(\alpha, N_\alpha(S), N(S)).$$
5) The abnormal connected subgraph obtained in step 4) is the final result. Taking the aviation network data of February 6 as an example, the final abnormal connected subgraph contains 15 nodes (regions), which therefore require close monitoring on February 7. They are:
China | Japan | France | India | Thailand |
Korea | Vietnam | Philippines | USA | Germany |
Malaysia | United Arab Emirates | Singapore | Australia | Canada |
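The scoring used in the abnormal subgraph detection step can be illustrated with a short Python sketch. It evaluates the Berk-Jones and Higher Criticism statistics for one given set of node p-values and maximizes over the significance level. The function names are hypothetical, and two choices are assumptions rather than statements from the patent: the BJ score is set to 0 unless the observed share of significant p-values exceeds $\alpha$, and only the distinct p-values in $(0, \alpha_{\max}]$ are tried as candidate levels. The search over connected subsets is not shown.

```python
import math


def kl(a, b):
    """Kullback-Leibler divergence between Bernoulli(a) and Bernoulli(b), 0 < b < 1."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / b)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - b))
    return total


def phi_bj(alpha, n_alpha, n):
    """Berk-Jones: N * KL(N_alpha / N, alpha), one-sided (assumption)."""
    frac = n_alpha / n
    return n * kl(frac, alpha) if frac > alpha else 0.0


def phi_hc(alpha, n_alpha, n):
    """Higher Criticism: (N_alpha - N * alpha) / sqrt(N * alpha * (1 - alpha))."""
    return (n_alpha - n * alpha) / math.sqrt(n * alpha * (1 - alpha))


def npgs_score(p_values, phi, alpha_max=0.15):
    """Maximize phi over candidate significance levels for one node set S."""
    n = len(p_values)
    # N_alpha only changes at observed p-values, so those are the candidates.
    candidates = sorted({p for p in p_values if 0 < p <= alpha_max})
    best = 0.0
    for alpha in candidates:
        n_alpha = sum(1 for p in p_values if p <= alpha)
        best = max(best, phi(alpha, n_alpha, n))
    return best


# Made-up p-values for a four-node subset.
subset = [0.01, 0.02, 0.5, 0.9]
hc = npgs_score(subset, phi_hc)
bj = npgs_score(subset, phi_bj)
```

A subset with no p-value at or below `alpha_max` scores 0 under both statistics in this sketch, which matches the intuition that such a subset shows no evidence of abnormality.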
In addition, the complex networks constructed with Wuhan, Shijiazhuang and China as central sources were analyzed, and the top 10 regions by $d_{mn}$ were determined for each. When a case appears in the central source region, regions ranked higher by $d_{mn}$ may be at greater risk.
The following table lists, for each central source, the top 10 regions ranked by $d_{mn}$.
As shown in FIGS. 1 to 3, the invention uses mobile phone signaling data and aviation data to construct a complex network, calculates the relative distance between two regions, and predicts infectious diseases with an abnormal subgraph detection algorithm. It combines propagation dynamics with complexity science for infectious disease prediction, and the constructed network can be reused and remains effective when applied to other infectious diseases.
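The p-value step of the method compares each node's current feature with its historical values, but the formula itself did not survive in the text. The sketch below shows one common way to compute an empirical p-value and is an assumption, not the patent's formula: the p-value is the fraction of historical observations at least as large as the current one, with a "+1" correction so that it is never exactly 0. All names and the dictionary layout are hypothetical.

```python
def empirical_p_value(current, history):
    """Share of historical values >= current, with a +1 correction (assumption)."""
    exceed = sum(1 for h in history if h >= current)
    return (exceed + 1) / (len(history) + 1)


def node_p_values(current_weights, historical_weights):
    """Map every node to its empirical p-value.

    `current_weights[v]` is the current weight w(v); `historical_weights[v]`
    is a list of earlier weights for the same node (hypothetical layout).
    """
    return {
        v: empirical_p_value(w, historical_weights.get(v, []))
        for v, w in current_weights.items()
    }


# Made-up weights: node "A" is unusually high today, node "B" is ordinary.
current = {"A": 10, "B": 2}
history = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 4]}
p = node_p_values(current, history)
```

A small p-value marks a node whose current weight is unusual relative to its own history. Avoiding a p-value of exactly 0 also keeps the significance level strictly positive when these values are later used as candidate levels in the scan statistic.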
Claims (2)
1. An early warning method for sudden infectious diseases based on abnormal subgraph detection is characterized by comprising the following steps:
1) processing data;
2) constructing an initial weighted directed network;
3) converting into a weighted directed network, and calculating the p value of the node;
4) performing abnormal subgraph detection.
2. An early warning method for sudden infectious diseases based on abnormal subgraph detection is characterized by comprising the following steps:
1) constructing a weighted directed network $G = (V, E)$, where $V$ is the set of all nodes and $E$ is the set of all edges;
if a population flow record exists between two nodes, a directed weighted edge exists between them, and the weight of the edge is the relative distance between the two nodes;
given a flow matrix $P$ ($0 \le P_{mn} \le 1$), the relative distance $d_{mn}$ between node $n$ and node $m$ is:
$$d_{mn} = 1 - \log P_{mn} \qquad (1)$$
wherein $P_{mn}$ is the ratio of the number of people flowing from node $n$ to node $m$ to the total number of people flowing out of node $n$, and if several paths exist between node $n$ and node $m$, the relative distance $D_{mn}$ between any node $n$ and node $m$ is updated as:
$$D_{mn} = \min_{\Gamma} \lambda(\Gamma)$$
wherein $\Gamma = \{v_1, \ldots, v_n\}$ denotes an ordered path from node $n$ to node $m$, $\lambda(\Gamma)$ denotes the sum of the edge lengths along the ordered path $\Gamma$, and $D_{mn}$ is the shortest distance between node $n$ and node $m$;
2) converting the network in step 1) into a weighted directed network $G_A = (V, E, p)$, wherein $V = \{v_1, \ldots, v_N\}$ is the set of nodes of $G_A$, $E$ is the set of edges of $G_A$, and the mapping function $p: V \to [0, 1]$ assigns an empirical p-value to each vertex $v$; the p-value $p(v)$ of node $v$ can be calculated by empirical calibration, comparing the current features of $v$ with the features of $v$ in the historical data:
[formula omitted in the source text]
wherein $w(v)$ denotes the weight of node $v$ and $\{v_1, \ldots, v_T\}$ denotes the nodes adjacent to node $v$;
3) performing abnormal subgraph detection on network $G_A$ using the non-parametric graph scan (NPGS) method to mine the abnormal connected subgraph, wherein the general form of NPGS is defined as:
$$F(S) = \max_{\alpha \le \alpha_{\max}} \phi(\alpha, N_\alpha(S), N(S))$$
wherein $S \subseteq V$ is the vertex set of a connected subgraph, $N_\alpha(S) = \sum_{v \in S} \delta(p(v) \le \alpha)$ is the number of vertices of the subgraph that are significant at level $\alpha$, and $N(S) = \sum_{v \in S} 1$ is the total number of p-values in $S$;
the indicator $\delta(\cdot)$ equals 1 if its argument is true and 0 otherwise;
the significance level $\alpha$ ranges between 0 and a constant $\alpha_{\max}$, typically $\alpha_{\max} = 0.15$;
for abnormal subgraph detection, $\phi(\alpha, N_\alpha(S), N(S))$ must satisfy the following two properties:
(1) $\phi$ is monotonically increasing with respect to $N_\alpha(S)$;
(2) $\phi$ is monotonically decreasing with respect to $N(S)$ and $\alpha$;
both properties are intuitively reasonable, and the non-parametric scan statistics BJ (Berk-Jones) and HC (Higher Criticism) each satisfy them;
the BJ statistic is defined as:
$$\phi_{BJ}(\alpha, N_\alpha(S), N(S)) = N(S) \cdot KL\!\left(\frac{N_\alpha(S)}{N(S)}, \alpha\right)$$
wherein $KL$ is the Kullback-Leibler divergence between the observed and expected proportions of p-values below $\alpha$:
$$KL(a, b) = a \log\frac{a}{b} + (1 - a) \log\frac{1 - a}{1 - b}$$
wherein $a, b \in [0, 1]$, with $KL(a, b) = 0$ when $b = 1$ or $b = 0$;
the HC statistic is defined as:
$$\phi_{HC}(\alpha, N_\alpha(S), N(S)) = \frac{N_\alpha(S) - N(S)\,\alpha}{\sqrt{N(S)\,\alpha\,(1 - \alpha)}}$$
given a non-parametric scan statistic $F(S)$, the abnormal subgraph detection problem can be formulated as:
$$\max_{S \subseteq V} F(S) \quad \text{s.t. } S \text{ is connected},$$
which is equivalent to
$$\max_{\alpha \le \alpha_{\max}} \; \max_{S \subseteq V:\ S \text{ is connected}} \phi(\alpha, N_\alpha(S), N(S)).$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110236120.1A CN112967818A (en) | 2021-03-03 | 2021-03-03 | Sudden infectious disease early warning method based on abnormal subgraph detection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112967818A | 2021-06-15 |
Family
ID=76276864
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110236120.1A Withdrawn CN112967818A (en) | 2021-03-03 | 2021-03-03 | Sudden infectious disease early warning method based on abnormal subgraph detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112967818A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090080768A1 (en) * | 2007-09-20 | 2009-03-26 | Chung Shan Institute Of Science And Technology, Armaments Bureau, M.N.D. | Recognition method for images by probing alimentary canals |
US20160265070A1 (en) * | 2015-01-08 | 2016-09-15 | The Board Of Trustees Of The University Of Illinois | Copy number detection and methods |
CN112417303A (en) * | 2020-12-09 | 2021-02-26 | 天津大学 | Evolution algorithm for detecting multiple abnormal subgraphs from dynamic attribute graph |
CN112422571A (en) * | 2020-11-19 | 2021-02-26 | 天津大学 | Method for carrying out exception alignment across multiple attribute networks |
Non-Patent Citations (3)
Title |
---|
F. Chen and D. B. Neill: "Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs", Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining * |
Nannan Wu: "A Nonparametric Approach to Uncovering Connected Anomalies by Tree Shaped Priors", IEEE Transactions on Knowledge and Data Engineering * |
G. P. Patil, Taillie: "Geographic and network surveillance via scan statistics for critical area detection", Statistical Science * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20210615 |