CN112951442A - Hysteresis analysis method and device for child viral diarrhea onset risk - Google Patents


Info

Publication number
CN112951442A
Authority
CN
China
Prior art keywords
data
factor
meteorological
viral diarrhea
model
Legal status
Granted
Application number
CN202110218435.3A
Other languages
Chinese (zh)
Other versions
CN112951442B (en)
Inventor
艾丹妮
路文高
杨健
宋红
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Application filed by Beijing Institute of Technology BIT
Priority to CN202110218435.3A
Publication of CN112951442A
Application granted
Publication of CN112951442B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The lag analysis method and device for the onset risk of viral diarrhea in children can improve the accuracy of infectious disease surveillance, enable accurate prediction and early warning of infectious disease outbreaks, and help regional health authorities better prevent and control viral diarrhea outbreaks. The method comprises the following steps: (1) performing descriptive statistical analysis of the mean, standard deviation and time series of the daily viral diarrhea cases and of each variable, followed by Pearson correlation tests; (2) reducing the dimensionality of all selected meteorological factors by principal component analysis and extracting principal components as inputs for the regression model; (3) selecting the principal components with the highest contribution rates as new meteorological factor components, and applying the same principal component analysis to the selected air quality factors to obtain new air quality factors; (4) constructing a composite Baidu search index; (5) constructing a distributed lag non-linear model.

Description

Hysteresis analysis method and device for child viral diarrhea onset risk
Technical Field
The invention relates to the technical field of disease prevention and control, and in particular to a lag analysis method and a lag analysis device for the onset risk of viral diarrhea in children.
Background
Acute gastroenteritis is a common disease of the human digestive tract, characterized mainly by vomiting, diarrhea and fever. It has many causes, including bacteria, viruses and parasites. Viral diarrhea is a common digestive tract disease caused by various human enteric viruses, such as rotavirus, norovirus, adenovirus and astrovirus. The most susceptible population is children under 5 years of age; these viruses are considered the main pathogens of severe acute diarrhea in children worldwide and one of the leading causes of child death in developing countries.
Previous studies have shown that norovirus is a typical foodborne virus that is easily transmitted through contaminated water and food, whereas rotavirus can form aerosols with polluting particles in the air and spreads through the fecal-oral route and contact with contaminated surfaces. Because these transmission routes are closely tied to daily life, viral diarrhea occurs worldwide and in every season. Viral diarrhea has been associated with various environmental factors, including meteorological and hydrological ones. In the existing evidence, low temperature and drought favor rotavirus transmission, which shows winter seasonality in temperate climates, while norovirus epidemics are comparatively irregular: their peaks may shift by weeks or months between seasons, showing high seasonal variability. One study found that children born in summer in England and Wales had a higher risk of confirmed rotavirus infection. Studies in the United Kingdom, the Netherlands, Turkey, Australia, Germany, India, Costa Rica, Nepal and elsewhere show that the risk of rotavirus diarrhea is negatively correlated with climatic factors, whereas in some regions, such as Bangladesh, high temperatures can increase the risk of rotavirus outbreaks. Increased river runoff and rising river water levels promote norovirus outbreaks. Studies of seafood aquaculture environments have shown that solar radiation, water temperature, salinity and similar factors can affect seafood that harbors norovirus, and thereby the outbreak of foodborne norovirus in human populations.
In China, associations between environmental factors and common acute gastroenteritis or bacterial diarrhea have been widely reported, but reports on environmental factors and viral diarrhea remain scarce. Wang, P. studied seasonal variation in hospitalizations for norovirus and rotavirus infection in Hong Kong, China, and found that rotavirus tended to break out in winter, whereas norovirus was more strongly associated with summer; extreme precipitation was associated with a higher risk of norovirus infection than trace precipitation, but with a lower risk of rotavirus infection. A survey of ambient temperature and the burden of viral diarrhea in Wuxi, China showed that low temperatures promote viral diarrhea outbreaks, consistent with studies elsewhere in the world. Ye, Q. investigated the relationship between the rotavirus infection rate in children and air temperature and air pollutants in Hangzhou, China. The study not only confirmed the negative correlation between temperature and rotavirus infection rate, but also found that temperature changes markedly affect the rotavirus detection rate. Notably, the authors found that air pollutants such as PM2.5 and PM10 significantly increased the risk of rotavirus infection, with dose-response, lag and cumulative effects observed.
Researchers have reported promising results in tracking and detecting infectious diseases with internet search data. For example, Google search data can report influenza trends in the United States two weeks in advance. Other researchers have used search query data to detect the onset of infectious diseases such as dengue fever, Ebola virus disease and hand, foot and mouth disease. Liu, K. used composite Baidu indices at different time lags and norovirus incidence data, with Spearman correlation, to construct an exponential curve model fitted to data from the 2014 norovirus epidemic in Zhejiang Province, China; the study showed that each one-unit increase in the average composite Baidu index increased the risk of norovirus infection 2.15-fold. Because internet-based surveillance draws on social media, search engine query data and news reports, internet search data can improve the sensitivity and timeliness of detecting health events. However, biases and external interference, such as media coverage, internet usage behavior and regional policy, all affect the predictive accuracy of search engine query data, so using internet search data alone to detect infectious disease outbreaks has limitations.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a lag analysis method for the onset risk of viral diarrhea in children that can improve the accuracy of infectious disease surveillance, enable accurate prediction and early warning of infectious disease outbreaks, and help regional health authorities better prevent and control viral diarrhea outbreaks.
The technical solution of the invention is a lag analysis method for the onset risk of viral diarrhea in children, comprising the following steps:
(1) performing descriptive statistical analysis of the mean, standard deviation and time series of the daily viral diarrhea case counts and of each variable; then assessing the relationship between the number of viral diarrhea infections and each factor with Pearson correlation tests to determine the strength and significance of each correlation, and selecting for further analysis the factors whose correlation coefficient with the case data has an absolute value greater than 0.1 at a significance level below 0.05;
(2) reducing the dimensionality of all selected meteorological factors by principal component analysis and extracting principal components as inputs for the regression model, each principal component being given by formula (1):
z_i = \sum_{j} \alpha_{ij} x_j, \quad i = 1, \dots, n \qquad (1)
where z_i is the i-th principal component of the meteorological factors, x_j is the time series of the j-th meteorological factor, α_ij is the loading of factor j on principal component i obtained from the principal component analysis, and n is the number of selected principal components;
(3) selecting the principal components with the highest contribution rates as new meteorological factor components, such that their cumulative contribution rate exceeds 90%, and applying the same principal component analysis to all selected air quality factors to obtain the principal components with the highest contribution rates as new air quality factors;
(4) combining all selected Baidu search data series into a composite Baidu search index using formula (2):
BDI = \sum_{i=1}^{n} \beta_i x_i \qquad (2)
where BDI is the composite Baidu search index, x_i is the i-th selected Baidu search data series, β_i is the Pearson correlation coefficient between that series and the epidemic (case) series, and n is the number of selected Baidu search data series;
(5) incorporating all processed factors into the model construction, where the distributed lag non-linear model is a regression model based on lag effects, given by formula (3):
\log E(Y_t) = \gamma + \sum_{i} cb(x_i) + \sum_{j} cb(x_j) + cb(BDI) + ns(\mathrm{Date}, df = 7/\mathrm{year}) + \mathrm{DOW} + \mathrm{Season} \qquad (3)
where E(Y_t) is the expected number of viral diarrhea cases on day t, γ is the intercept, cb(x_i) is the cross-basis matrix of each meteorological principal component, cb(x_j) is the cross-basis matrix of each air quality principal component, and cb(BDI) is the cross-basis matrix of the composite Baidu search index. Natural cubic splines are used in the exposure space of each factor, with spline knots placed at the 25%, 50% and 75% quantiles on a logarithmic scale and an initial degree of freedom (df) of 3. A lag period of 21 days is used when building the cross-basis for each factor, with an initial lag df of 3. The confounders in the model comprise a date term, a day-of-week term (DOW) and a season term; ns is a natural cubic spline used to control the long-term trend of the time variable, with an initial df of 7 per year.
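Step (1) screens candidate factors by the strength and significance of their Pearson correlation with the daily case series. A minimal sketch in Python, assuming each factor is an equal-length daily series aligned with the case counts; the function names are hypothetical, and the p-value uses the Fisher z-transform normal approximation instead of the exact t-test, which is close for series of several hundred days or more.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r: float, n: int) -> float:
    """Two-sided p-value from the Fisher z-transform (normal approximation)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def screen_factors(cases: Sequence[float],
                   factors: Dict[str, Sequence[float]],
                   min_abs_r: float = 0.1,
                   alpha: float = 0.05) -> List[str]:
    """Keep factors with |r| > min_abs_r and p < alpha against the daily case series."""
    kept = []
    for name, series in factors.items():
        r = pearson_r(series, cases)
        if abs(r) > min_abs_r and approx_p_value(r, len(cases)) < alpha:
            kept.append(name)
    return kept
```

With `cases = [float(i) for i in range(50)]`, a factor `2i + 1` is kept, while an alternating +1/-1 series (r of about -0.035) is dropped by the |r| > 0.1 rule.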
By combining internet query data with traditional surveillance, and by exploiting the lagged dependence of the onset risk of viral diarrhea in children under 5 in temperate China on selected meteorological factors, air quality factors and internet search data, the invention provides multiple perspectives on that onset risk, such as the external natural environment and social activity. It can improve the accuracy of infectious disease surveillance, help regional health authorities better prevent and control viral diarrhea outbreaks, and enable accurate prediction and early warning of outbreaks.
A lag analysis device for the onset risk of viral diarrhea in children is also provided, comprising:
a data acquisition and selection module configured to perform descriptive statistical analysis of the mean, standard deviation and time series of the daily viral diarrhea case counts and of each variable, assess the relationship between the number of viral diarrhea infections and each factor with Pearson correlation tests to determine the strength and significance of each correlation, and select for further analysis the factors whose correlation coefficient has an absolute value greater than 0.1 at a significance level below 0.05;
a data dimensionality reduction module configured to reduce the dimensionality of all selected meteorological factors by principal component analysis and extract principal components as inputs for the regression model, each principal component being given by formula (1):
z_i = \sum_{j} \alpha_{ij} x_j, \quad i = 1, \dots, n \qquad (1)
where z_i is the i-th principal component of the meteorological factors, x_j is the time series of the j-th meteorological factor, α_ij is the loading of factor j on principal component i obtained from the principal component analysis, and n is the number of selected principal components;
an air quality factor acquisition module configured to select the principal components with the highest contribution rates as new meteorological factor components, such that their cumulative contribution rate exceeds 90%, and to apply the same principal component analysis to all selected air quality factors to obtain the principal components with the highest contribution rates as new air quality factors;
a Baidu search data module configured to combine all selected Baidu search data series into a composite Baidu search index using formula (2):
BDI = \sum_{i=1}^{n} \beta_i x_i \qquad (2)
where BDI is the composite Baidu search index, x_i is the i-th selected Baidu search data series, β_i is the Pearson correlation coefficient between that series and the epidemic (case) series, and n is the number of selected Baidu search data series;
a model building module configured to incorporate all processed factors into the model construction, where the distributed lag non-linear model is a regression model based on lag effects, given by formula (3):
\log E(Y_t) = \gamma + \sum_{i} cb(x_i) + \sum_{j} cb(x_j) + cb(BDI) + ns(\mathrm{Date}, df = 7/\mathrm{year}) + \mathrm{DOW} + \mathrm{Season} \qquad (3)
where E(Y_t) is the expected number of viral diarrhea cases on day t, γ is the intercept, cb(x_i) is the cross-basis matrix of each meteorological principal component, cb(x_j) is the cross-basis matrix of each air quality principal component, and cb(BDI) is the cross-basis matrix of the composite Baidu search index. Natural cubic splines are used in the exposure space of each factor, with spline knots placed at the 25%, 50% and 75% quantiles on a logarithmic scale and an initial degree of freedom (df) of 3. A lag period of 21 days is used when building the cross-basis for each factor, with an initial lag df of 3. The confounders in the model comprise a date term, a day-of-week term (DOW) and a season term; ns is a natural cubic spline used to control the long-term trend of the time variable, with an initial df of 7 per year.
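The dimensionality reduction of formula (1), together with the 90% cumulative-contribution rule applied to both meteorological and air quality factors, can be sketched as below. This is an illustrative sketch, not the patented implementation: it assumes PCA on the correlation matrix of standardized factors (the text does not say whether the covariance or correlation matrix is used), uses a small pure-Python Jacobi eigen-solver in place of a numerical library such as `numpy.linalg.eigh`, and treats the eigenvector entries as the weights α_ij. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def jacobi_eigen(a: List[List[float]], sweeps: int = 100, tol: float = 1e-20):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, v); column k of v is the eigenvector of eigenvalue k.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def standardize(col: Sequence[float]) -> List[float]:
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
    return [(x - m) / sd for x in col]


def pca(factors: Sequence[Sequence[float]]):
    """PCA on the correlation matrix; returns (eigenvalues, loadings, scores), largest first.

    loadings[i][j] plays the role of alpha_ij in formula (1); scores[i] is the series z_i.
    """
    z = [standardize(col) for col in factors]
    k, n = len(z), len(z[0])
    corr = [[sum(z[a][t] * z[b][t] for t in range(n)) / (n - 1) for b in range(k)] for a in range(k)]
    vals, vecs = jacobi_eigen(corr)
    order = sorted(range(k), key=lambda i: vals[i], reverse=True)
    loadings = [[vecs[j][i] for j in range(k)] for i in order]
    scores = [[sum(w[j] * z[j][t] for j in range(k)) for t in range(n)] for w in loadings]
    return [vals[i] for i in order], loadings, scores


def n_components(eigvals: Sequence[float], threshold: float = 0.90) -> int:
    """Smallest number of leading (descending) components whose cumulative contribution exceeds threshold."""
    total, cum = sum(eigvals), 0.0
    for i, lam in enumerate(eigvals, start=1):
        cum += lam
        if cum / total > threshold:
            return i
    return len(eigvals)
```

Usage would be `vals, loadings, scores = pca(met_series)` followed by keeping `scores[:n_components(vals)]`, and likewise for the air quality series. The rule is a strict "exceeds": eigenvalues `[6, 3, 1]` give exactly 90% after two components, so three are kept.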
Drawings
FIG. 1 is a flow chart of the lag analysis method for the onset risk of viral diarrhea in children according to the invention.
Detailed Description
As shown in FIG. 1, the lag analysis method for the onset risk of viral diarrhea in children comprises the following steps:
(1) performing descriptive statistical analysis of the mean, standard deviation and time series of the daily viral diarrhea case counts and of each variable; then assessing the relationship between the number of viral diarrhea infections and each factor with Pearson correlation tests to determine the strength and significance of each correlation, and selecting for further analysis the factors whose correlation coefficient with the case data has an absolute value greater than 0.1 at a significance level below 0.05;
(2) reducing the dimensionality of all selected meteorological factors by principal component analysis and extracting principal components as inputs for the regression model, each principal component being given by formula (1):
z_i = \sum_{j} \alpha_{ij} x_j, \quad i = 1, \dots, n \qquad (1)
where z_i is the i-th principal component of the meteorological factors, x_j is the time series of the j-th meteorological factor, α_ij is the loading of factor j on principal component i obtained from the principal component analysis, and n is the number of selected principal components;
(3) selecting the principal components with the highest contribution rates as new meteorological factor components, such that their cumulative contribution rate exceeds 90%, and applying the same principal component analysis to all selected air quality factors to obtain the principal components with the highest contribution rates as new air quality factors;
(4) combining all selected Baidu search data series into a composite Baidu search index using formula (2):
BDI = \sum_{i=1}^{n} \beta_i x_i \qquad (2)
where BDI is the composite Baidu search index, x_i is the i-th selected Baidu search data series, β_i is the Pearson correlation coefficient between that series and the epidemic (case) series, and n is the number of selected Baidu search data series;
(5) incorporating all processed factors into the model construction, where the distributed lag non-linear model is a regression model based on lag effects, given by formula (3):
\log E(Y_t) = \gamma + \sum_{i} cb(x_i) + \sum_{j} cb(x_j) + cb(BDI) + ns(\mathrm{Date}, df = 7/\mathrm{year}) + \mathrm{DOW} + \mathrm{Season} \qquad (3)
where E(Y_t) is the expected number of viral diarrhea cases on day t, γ is the intercept, cb(x_i) is the cross-basis matrix of each meteorological principal component, cb(x_j) is the cross-basis matrix of each air quality principal component, and cb(BDI) is the cross-basis matrix of the composite Baidu search index. Natural cubic splines are used in the exposure space of each factor, with spline knots placed at the 25%, 50% and 75% quantiles on a logarithmic scale and an initial degree of freedom (df) of 3. A lag period of 21 days is used when building the cross-basis for each factor, with an initial lag df of 3. The confounders in the model comprise a date term, a day-of-week term (DOW) and a season term; ns is a natural cubic spline used to control the long-term trend of the time variable, with an initial df of 7 per year.
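Formula (2) weights each keyword's search series by its Pearson correlation with the case series and sums the weighted series. A minimal sketch, assuming all series are aligned daily lists; the keyword names in the test are placeholders, not the keywords used in the study, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def composite_baidu_index(keywords: Dict[str, Sequence[float]],
                          cases: Sequence[float]) -> List[float]:
    """BDI_t = sum_i beta_i * x_i,t, with beta_i the Pearson r of keyword i vs. cases."""
    betas = {name: pearson_r(series, cases) for name, series in keywords.items()}
    return [sum(betas[name] * keywords[name][t] for name in keywords)
            for t in range(len(cases))]
```

Because β_i is signed, a keyword that is negatively correlated with the case series lowers the index rather than raising it.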
By combining internet query data with traditional surveillance, and by exploiting the lagged dependence of the onset risk of viral diarrhea in children under 5 in temperate China on selected meteorological factors, air quality factors and internet search data, the invention provides multiple perspectives on that onset risk, such as the external natural environment and social activity. It can improve the accuracy of infectious disease surveillance, help regional health authorities better prevent and control viral diarrhea outbreaks, and enable accurate prediction and early warning of outbreaks.
Preferably, in step (5), the model uses a quasi-Poisson distribution family to control overdispersion.
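A quasi-Poisson model keeps the Poisson mean structure (log link) but lets the variance be φ·μ, with the dispersion φ estimated from the Pearson χ² statistic. The coefficient estimates match the Poisson fit; only the standard errors are inflated by √φ. The sketch below fits such a model by iteratively reweighted least squares in pure Python. It assumes column 0 of the design matrix is an intercept, the names are hypothetical, and a production analysis would normally rely on an established GLM routine instead.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fitted_means(X: Matrix, beta: Sequence[float]) -> List[float]:
    """mu = exp(X beta) under the log link."""
    return [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]


def information(X: Matrix, mu: Sequence[float]) -> Matrix:
    """X^T W X with IRLS weights W = diag(mu) for a log-link Poisson model."""
    p = len(X[0])
    return [[sum(m * row[a] * row[c] for row, m in zip(X, mu)) for c in range(p)]
            for a in range(p)]


def fit_quasi_poisson(X: Matrix, y: Sequence[float],
                      iters: int = 25) -> Tuple[List[float], float, List[float]]:
    """Return (beta, phi, quasi-Poisson standard errors)."""
    n, p = len(X), len(X[0])
    beta = [math.log(sum(y) / n)] + [0.0] * (p - 1)  # assumes column 0 is the intercept
    for _ in range(iters):
        mu = fitted_means(X, beta)
        # Working response for the log link: z = eta + (y - mu) / mu.
        z = [math.log(m) + (yi - m) / m for yi, m in zip(y, mu)]
        xtwz = [sum(m * row[a] * zi for row, m, zi in zip(X, mu, z)) for a in range(p)]
        beta = solve(information(X, mu), xtwz)
    mu = fitted_means(X, beta)
    # Dispersion: Pearson chi-square divided by the residual degrees of freedom.
    phi = sum((yi - m) ** 2 / m for yi, m in zip(y, mu)) / (n - p)
    info = information(X, mu)
    # Quasi-Poisson SEs are the Poisson SEs inflated by sqrt(phi).
    se = [math.sqrt(phi * solve(info, [float(k == j) for k in range(p)])[j]) for j in range(p)]
    return beta, phi, se
```

For an intercept-only model with counts `[0, 2, 4, 6, 8]`, the fitted mean is 4 and φ = 2.5 (variance 10 over mean 4), so the data are overdispersed relative to a plain Poisson model.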
Preferably, in step (5), a sensitivity analysis is performed by varying the degrees of freedom of the cross-basis matrices and of the date term in the model and by adding or removing the season term; the models are compared using the Akaike information criterion (AIC) to determine the final df values.
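The text selects the final df values by AIC. Strictly, a quasi-Poisson fit has no likelihood, so DLNM analyses often use a quasi-AIC (Q-AIC) instead, which divides the Poisson log-likelihood by the estimated dispersion φ before adding the 2k penalty. The sketch below computes both and picks the candidate with the lowest score; the candidate labels and the tuple layout (fitted means, number of parameters, φ) are assumptions for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def poisson_loglik(y: Sequence[float], mu: Sequence[float]) -> float:
    """Poisson log-likelihood of counts y given fitted means mu."""
    return sum(yi * math.log(m) - m - math.lgamma(yi + 1.0) for yi, m in zip(y, mu))


def aic(y: Sequence[float], mu: Sequence[float], k: int) -> float:
    return -2.0 * poisson_loglik(y, mu) + 2.0 * k


def qaic(y: Sequence[float], mu: Sequence[float], k: int, phi: float) -> float:
    # Quasi-AIC: log-likelihood scaled by the dispersion, same 2k penalty.
    return -2.0 * poisson_loglik(y, mu) / phi + 2.0 * k


def pick_best(candidates: Dict[str, Tuple[Sequence[float], int, float]],
              y: Sequence[float], quasi: bool = True) -> str:
    """Return the label of the candidate fit with the lowest (Q-)AIC."""
    def score(item):
        _, (mu, k, phi) = item
        return qaic(y, mu, k, phi) if quasi else aic(y, mu, k)
    return min(candidates.items(), key=score)[0]
```

Each candidate would come from refitting the model with one combination of cross-basis df, time df per year and presence or absence of the season term.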
Preferably, in the subgroup analysis, children under 5 years of age are grouped by sex and by age, and the same model is applied to each subgroup.
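Subgroup analysis changes only the outcome series: the case records are split by sex and by age, a daily count series is built for each subgroup, and each series is fitted with the same model. A minimal sketch; the age bands below are hypothetical, since the text does not state the cut points, and the record layout (onset date, sex, age in years) is an assumption.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical age bands in years; the source does not give the cut points.
AGE_BANDS = [(0.0, 1.0, "<1"), (1.0, 3.0, "1-2"), (3.0, 5.0, "3-4")]


def age_band(age_years: float) -> str:
    for lo, hi, label in AGE_BANDS:
        if lo <= age_years < hi:
            return label
    raise ValueError(f"age {age_years} is outside the under-5 range")


def subgroup_daily_counts(cases: Iterable[Tuple[date, str, float]],
                          days: Sequence[date]) -> Dict[str, List[int]]:
    """Daily case counts per subgroup (one key per sex and per age band), aligned to days."""
    counters: Dict[str, Counter] = {}
    for onset, sex, age in cases:
        for key in (f"sex={sex}", f"age={age_band(age)}"):
            counters.setdefault(key, Counter())[onset] += 1
    return {key: [counter[d] for d in days] for key, counter in counters.items()}
```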
Under the Law of the People's Republic of China on the Prevention and Treatment of Infectious Diseases, viral diarrhea is a Class C infectious disease. Since 2003, the Chinese government has operated a national notifiable infectious disease reporting system that requires clinicians to report patients' personal information online, in a standardized format, to the Chinese Center for Disease Control and Prevention within 24 hours of diagnosis. Preferably, in step (1), viral diarrhea case data for Jilin Province from 2014 to 2019 are collected from the Chinese Center for Disease Control and Prevention, each case record containing the patient's sex, age, onset date and causative virus. The meteorological data set is provided by the China Meteorological Data Sharing Service System and comprises evaporation (mm), precipitation (mm), sunshine duration, three surface temperature variables (mean, maximum and minimum surface temperature, °C), three air pressure variables (mean, maximum and minimum air pressure, hPa), two relative humidity variables (mean and minimum relative humidity, %), three air temperature variables (mean, maximum and minimum air temperature, °C) and three wind speed variables (mean, maximum and extreme wind speed, m/s). Each meteorological factor recorded at the 30 monitoring stations in Jilin Province is averaged arithmetically by day, giving time series for 17 meteorological factors for Jilin Province. The air quality data are obtained from the China Air Quality Online Monitoring and Analysis Platform: time series of 7 air quality factors (AQI, and PM2.5, PM10, CO, NO2, SO2 and O3 concentrations) are obtained for the 9 prefecture-level administrative units of Jilin Province, and the values for the 9 monitoring areas are averaged arithmetically by day to give the time series of the 7 air quality factors for Jilin Province. Baidu is the search engine with the largest market share in China; up to 20 keywords related to the symptoms, causes, and prevention and treatment products of viral diarrhea, provided by the National Institute for Viral Disease Control and Prevention of the Chinese Center for Disease Control and Prevention, are selected, and the Baidu search index time series of all these keywords for Jilin Province over the corresponding period are collected.
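The daily arithmetic averaging across the 30 meteorological stations and the 9 air quality monitoring areas can be sketched as follows, assuming long-format records of (day, station, factor, value). Station and factor names are placeholders, and skipping missing values is an assumption the text does not address.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def daily_regional_mean(
    records: Iterable[Tuple[str, str, str, Optional[float]]]
) -> Dict[Tuple[str, str], float]:
    """Average each factor over stations for each day: {(day, factor): mean}.

    Records whose value is None (missing) are skipped; this handling is an assumption.
    """
    sums: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for day, _station, factor, value in records:
        if value is None:
            continue
        sums[(day, factor)] += value
        counts[(day, factor)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```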
Those skilled in the art will understand that all or some of the steps of the method in the above embodiments can be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method of the above embodiments; the storage medium may be a ROM/RAM, magnetic disk, optical disk, memory card or the like. Accordingly, corresponding to the method of the invention, the invention also includes a lag analysis device for the onset risk of viral diarrhea in children, generally expressed as functional modules corresponding to the steps of the method. The device includes:
a data acquisition and selection module configured to perform descriptive statistical analysis of the mean, standard deviation and time series of the daily viral diarrhea case counts and of each variable, assess the relationship between the number of viral diarrhea infections and each factor with Pearson correlation tests to determine the strength and significance of each correlation, and select for further analysis the factors whose correlation coefficient has an absolute value greater than 0.1 at a significance level below 0.05;
a data dimensionality reduction module configured to reduce the dimensionality of all selected meteorological factors by principal component analysis and extract principal components as inputs for the regression model, each principal component being given by formula (1):
z_i = \sum_{j} \alpha_{ij} x_j, \quad i = 1, \dots, n \qquad (1)
where z_i is the i-th principal component of the meteorological factors, x_j is the time series of the j-th meteorological factor, α_ij is the loading of factor j on principal component i obtained from the principal component analysis, and n is the number of selected principal components;
an air quality factor acquisition module configured to select the principal components with the highest contribution rates as new meteorological factor components, such that their cumulative contribution rate exceeds 90%, and to apply the same principal component analysis to all selected air quality factors to obtain the principal components with the highest contribution rates as new air quality factors;
a Baidu search data module configured to combine all selected Baidu search data series into a composite Baidu search index using formula (2):
BDI = \sum_{i=1}^{n} \beta_i x_i \qquad (2)
where BDI is the composite Baidu search index, x_i is the i-th selected Baidu search data series, β_i is the Pearson correlation coefficient between that series and the epidemic (case) series, and n is the number of selected Baidu search data series;
a model building module configured to incorporate all processed factors into the model construction, where the distributed lag non-linear model is a regression model based on lag effects, given by formula (3):
\log E(Y_t) = \gamma + \sum_{i} cb(x_i) + \sum_{j} cb(x_j) + cb(BDI) + ns(\mathrm{Date}, df = 7/\mathrm{year}) + \mathrm{DOW} + \mathrm{Season} \qquad (3)
where E(Y_t) is the expected number of viral diarrhea cases on day t, γ is the intercept, cb(x_i) is the cross-basis matrix of each meteorological principal component, cb(x_j) is the cross-basis matrix of each air quality principal component, and cb(BDI) is the cross-basis matrix of the composite Baidu search index. Natural cubic splines are used in the exposure space of each factor, with spline knots placed at the 25%, 50% and 75% quantiles on a logarithmic scale and an initial degree of freedom (df) of 3. A lag period of 21 days is used when building the cross-basis for each factor, with an initial lag df of 3. The confounders in the model comprise a date term, a day-of-week term (DOW) and a season term; ns is a natural cubic spline used to control the long-term trend of the time variable, with an initial df of 7 per year.
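The cross-basis cb(·) in formula (3) is the core of a distributed lag non-linear model: each column is an exposure-basis function summed over lags 0 to 21 with a lag-basis weight. The sketch below builds one in pure Python under several assumptions: natural cubic splines in their truncated-power form (which span the same space as, but are not numerically identical to, the B-spline-based `ns()` commonly used in R), exposure knots at the 25th, 50th and 75th percentiles, and lag knots equally spaced on a log(lag + 1) scale, which is one reading of the "logarithmic scale" wording. The resulting column counts therefore need not match the df values stated above. Fitting formula (3) itself (adding the time spline, day of week and season to a quasi-Poisson regression) is not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def ns_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Natural cubic spline basis at x (truncated-power form, no intercept).

    knots: sorted, distinct, including both boundary knots; returns len(knots) - 1 values.
    """
    K = len(knots)

    def d(j: int) -> float:
        num = max(x - knots[j], 0.0) ** 3 - max(x - knots[K - 1], 0.0) ** 3
        return num / (knots[K - 1] - knots[j])

    return [x] + [d(j) - d(K - 2) for j in range(K - 2)]


def log_lag_knots(max_lag: int, n_internal: int) -> List[float]:
    """Boundary knots 0 and max_lag plus internal knots equally spaced on log(lag + 1)."""
    step = math.log(max_lag + 1) / (n_internal + 1)
    internal = [math.exp(step * i) - 1 for i in range(1, n_internal + 1)]
    return [0.0] + internal + [float(max_lag)]


def crossbasis(x: Sequence[float], max_lag: int = 21,
               n_lag_internal: int = 1) -> List[List[float]]:
    """Cross-basis rows for days t >= max_lag (earlier days lack a full lag history)."""
    var_knots = [min(x)] + [quantile(x, q) for q in (0.25, 0.50, 0.75)] + [max(x)]
    lag_knots = log_lag_knots(max_lag, n_lag_internal)
    # The lag basis keeps an intercept; without it every lag function is 0 at lag 0.
    lag_basis = [[1.0] + ns_basis(float(l), lag_knots) for l in range(max_lag + 1)]
    rows = []
    for t in range(max_lag, len(x)):
        var_at_lag = [ns_basis(x[t - l], var_knots) for l in range(max_lag + 1)]
        rows.append([
            sum(var_at_lag[l][j] * lag_basis[l][m] for l in range(max_lag + 1))
            for j in range(len(var_knots) - 1)
            for m in range(len(lag_basis[0]))
        ])
    return rows
```

With three internal exposure knots and one internal lag knot, each row has 4 x 3 = 12 columns; one such matrix would be built for every meteorological principal component, air quality principal component and the composite Baidu index.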
Preferably, to control for overdispersion, the model building module uses a quasi-Poisson function as the link function of the model.
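Quasi-Poisson regression keeps the Poisson mean model with a log link but estimates a dispersion parameter phi from the Pearson residuals and inflates the standard errors by sqrt(phi), so overdispersed daily counts do not produce overconfident intervals. The NumPy sketch below fits the coefficients by iteratively reweighted least squares (IRLS) and reports phi. It assumes the first column of the design matrix is an intercept, and `fit_quasi_poisson` is a hypothetical helper, not part of any library.

```python
import numpy as np


def fit_quasi_poisson(X, y, max_iter=100, tol=1e-10):
    """Log-link Poisson fit by IRLS with a quasi-Poisson dispersion estimate.

    Assumes column 0 of X is an intercept. Returns (beta, se, phi), where
    se are the Poisson standard errors scaled by sqrt(phi).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    beta[0] = np.log(y.mean())           # start from the overall mean rate
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        XtW = X.T * mu                   # IRLS weights equal mu for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta)
    phi = np.sum((y - mu) ** 2 / mu) / (n - p)   # Pearson dispersion estimate
    cov = phi * np.linalg.inv((X.T * mu) @ X)
    return beta, np.sqrt(np.diag(cov)), phi
```

The point estimates are identical to an ordinary Poisson fit; only the uncertainty changes. A phi close to 1 indicates little overdispersion, while values well above 1 justify the quasi-Poisson choice.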
Preferably, the model building module performs a sensitivity analysis by varying the cross-basis matrices of the factors and the df of the date term and by adding or removing the season term, and evaluates the resulting models by the Akaike information criterion (AIC) to determine the final df values.
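The AIC comparison can be sketched as follows, assuming each candidate model (for example, a different df for the date term) has already been fitted and yields fitted daily means. Because a quasi-Poisson model has no true likelihood, the sketch uses the Poisson log-likelihood and optionally divides it by a dispersion estimate to give a quasi-AIC (QAIC); the text only states "AIC", so the QAIC option is an assumption. The candidate labels and numbers in any usage are illustrative, not results from the patent.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y under fitted means mu."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))


def information_criterion(y, mu, n_params, phi=1.0):
    """AIC when phi == 1; quasi-AIC (QAIC) when a dispersion phi > 1 is given."""
    return -2.0 * poisson_loglik(y, mu) / phi + 2.0 * n_params


def select_model(y, candidates, phi=1.0):
    """Return the label with the smallest criterion, plus all scores.

    candidates maps a label (e.g. a df setting) to (fitted_means, n_params).
    """
    scores = {label: information_criterion(y, mu, k, phi)
              for label, (mu, k) in candidates.items()}
    return min(scores, key=scores.get), scores
```

The extra parameters of a more flexible model are only worthwhile if they raise the log-likelihood by more than one unit per parameter, which is how the criterion guards against over-fitting the long-term trend.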
Preferably, in the subgroup analysis, children under 5 years old are grouped by sex and by age, and the same model is used to analyze each subgroup.
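Before the same model can be run on each subgroup, the individual case records have to be turned into one daily count series per subgroup. The sketch below does this for sex and age subgroups. The age bands are hypothetical, since the text does not give the cut-points, and each subgroup is formed separately by sex and by age rather than by their cross-classification (an assumption).

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Hypothetical age bands (in years) for children under 5; the patent
# does not state the cut-points it used.
AGE_BANDS = [(0, 1, "<1"), (1, 3, "1-2"), (3, 5, "3-4")]


def age_band(age_years):
    for low, high, label in AGE_BANDS:
        if low <= age_years < high:
            return label
    return None  # 5 years or older: outside the study population


def subgroup_daily_counts(cases, start, end):
    """Daily case counts per sex subgroup and per age subgroup.

    cases: iterable of (sex, age_years, onset_date). Returns a dict mapping
    ("sex", value) or ("age", band) to a list of counts, one per day.
    """
    counts = defaultdict(Counter)
    for sex, age_years, onset in cases:
        band = age_band(age_years)
        if band is None or not (start <= onset <= end):
            continue
        counts[("sex", sex)][onset] += 1
        counts[("age", band)][onset] += 1
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return {key: [c[day] for day in days] for key, c in counts.items()}
```

Days without cases are filled with zeros, so every subgroup series is aligned with the daily exposure series and can be passed unchanged to the model of formula (3).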
Preferably, the data collection and selection module collects viral diarrhea case data for Jilin Province from 2014 to 2019 from the Chinese Center for Disease Control and Prevention, each case record including the patient's sex, age, date of onset and causative virus category. The meteorological data set is provided by the China Meteorological Data Sharing Service System and comprises evaporation, precipitation, sunshine duration, three ground-surface temperature series, three air pressure series, two relative humidity series, three air temperature series and three wind speed series; each meteorological factor monitored at 30 stations in Jilin Province is arithmetically averaged by day to obtain time series of 17 meteorological factors for Jilin Province. The air quality data are obtained from the China air quality online monitoring and analysis platform, which supplies time series of 7 air quality factors (the AQI and the concentrations of PM2.5, PM10, CO, NO2, SO2 and O3) for the 9 city-level administrative units of Jilin Province; the air quality factors of the 9 monitoring areas are arithmetically averaged by day to obtain time series of the 7 air quality factors for Jilin Province. The National Institute for Viral Disease Control and Prevention of the Chinese Center for Disease Control and Prevention provided up to 20 keywords covering symptoms, causes, and prevention and treatment products related to viral diarrhea, and the Baidu search index time series of all these keywords for Jilin Province over the corresponding period are selected.
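The province-level series are described as daily arithmetic means over the 30 meteorological stations and the 9 air quality monitoring areas. A minimal sketch of that aggregation is given below, assuming the raw readings arrive as (station, day, factor, value) tuples; skipping missing readings rather than imputing them is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def province_daily_means(readings):
    """Arithmetic daily mean of each factor over all reporting stations.

    readings: iterable of (station_id, day, factor, value); value may be None
    for a missing observation, which is skipped (assumption).
    """
    pooled = defaultdict(list)
    for _station, day, factor, value in readings:
        if value is not None:
            pooled[(day, factor)].append(value)
    out = defaultdict(dict)
    for (day, factor), values in sorted(pooled.items()):
        out[day][factor] = fmean(values)
    return dict(out)
```

Applied to the station data, this yields one value per day for each of the 17 meteorological factors and, separately, for each of the 7 air quality factors.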
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications, equivalent variations and modifications made to the above embodiment according to the technical spirit of the present invention still belong to the protection scope of the technical solution of the present invention.

Claims (10)

1. A hysteresis analysis method for the onset risk of viral diarrhea in children, characterized by comprising the following steps:
(1) using descriptive analysis, compute the mean, standard deviation and time series of the daily viral diarrhea cases and of each type of variable; then use the Pearson correlation test to evaluate the relationship between the number of viral diarrhea infections and each type of factor, determining the correlation between the case data and each factor and its significance; and select for further analysis the factors whose correlation coefficient with the case data exceeds 0.1 in absolute value at a significance level below 0.05;
(2) for all selected meteorological factors, perform dimensionality reduction by principal component analysis and extract principal components as the elements of the regression model, each principal component being given by formula (1):
$$z_i = \sum_{j=1}^{n} \alpha_j x_j \qquad (1)$$
where $z_i$ is each principal component of the meteorological factors, $x_j$ is each meteorological factor series, $\alpha_j$ is the loading of each factor on the principal component obtained by principal component analysis, and $n$ is the number of selected principal components;
(3) select the several principal components with the highest contribution rates among the meteorological parameters as the new meteorological factor components, such that their cumulative contribution rate exceeds 90%; for all selected air quality factors, apply the same principal component analysis and take the principal components with the highest contribution rates as the new air quality factors;
(4) for all selected Baidu search data columns, compute a composite Baidu search index using formula (2):
$$BDI = \sum_{i=1}^{n} \beta_i x_i \qquad (2)$$
where $BDI$ is the composite Baidu search index, $x_i$ is each selected Baidu search data column, $\beta_i$ is the Pearson correlation coefficient between that data column and the epidemic case series, and $n$ is the number of selected Baidu search data columns;
(5) incorporate the processed factors into the model-building process, wherein the distributed lag non-linear model is a regression model based on the lag effect and is expressed as formula (3):
$$\log[E(Y_t)] = \alpha + \sum_{i} cb(x_i) + \sum_{j} cb(x_j) + cb(BDI) + ns(\mathrm{Date}, df) + \mathrm{DOW} + \mathrm{Season} \qquad (3)$$
where $E(Y_t)$ is the expected number of viral diarrhea cases on day $t$, $\alpha$ is the intercept, $cb(x_i)$ is the cross-basis matrix of each meteorological principal component, $cb(x_j)$ is the cross-basis matrix of each air quality principal component, and $cb(BDI)$ is the cross-basis matrix of the composite Baidu search index. Natural cubic splines are used in each exposure space, with knots at the 25%, 50% and 75% quantiles on a logarithmic scale and an initial degrees of freedom (df) of 3. A lag period of 21 days is used when constructing the cross-basis matrix for each factor, with an initial lag-space df of 3. The other covariates of the model are a date term, a day-of-week term (DOW) and a season term (Season); $ns$ is a natural cubic spline used to control the long-term trend of the time variable, with the df of the date term initially set to 7 per year.
2. The hysteresis analysis method for the onset risk of viral diarrhea in children according to claim 1, wherein in step (5), to control for overdispersion, a quasi-Poisson function is used as the link function of the model.
3. The hysteresis analysis method for the onset risk of viral diarrhea in children according to claim 2, wherein in step (5), a sensitivity analysis is performed on the model by varying the cross-basis matrices of the factors and the df of the date term and by adding or removing the season term, and the models are evaluated by the Akaike information criterion (AIC) to determine the final df values.
4. The hysteresis analysis method for the onset risk of viral diarrhea in children according to claim 3, wherein in the subgroup analysis, children under 5 years old are grouped by sex and by age, and the same model is used to analyze each subgroup.
5. The hysteresis analysis method for the onset risk of viral diarrhea in children according to claim 4, wherein in step (1), viral diarrhea case data for Jilin Province from 2014 to 2019 are collected from the Chinese Center for Disease Control and Prevention, each case record including the patient's sex, age, date of onset and causative virus category; the meteorological data set is provided by the China Meteorological Data Sharing Service System and comprises evaporation, precipitation, sunshine duration, three ground-surface temperature series, three air pressure series, two relative humidity series, three air temperature series and three wind speed series, and each meteorological factor monitored at 30 stations in Jilin Province is arithmetically averaged by day to obtain time series of 17 meteorological factors for Jilin Province; the air quality data are obtained from the China air quality online monitoring and analysis platform, which supplies time series of 7 air quality factors (the AQI and the concentrations of PM2.5, PM10, CO, NO2, SO2 and O3) for the 9 city-level administrative units of Jilin Province, and the air quality factors of the 9 monitoring areas are arithmetically averaged by day to obtain time series of the 7 air quality factors for Jilin Province; the National Institute for Viral Disease Control and Prevention of the Chinese Center for Disease Control and Prevention provided up to 20 keywords covering symptoms, causes, and prevention and treatment products related to viral diarrhea, and the Baidu search index time series of all these keywords for Jilin Province over the corresponding period are selected.
6. A hysteresis analysis device for the onset risk of viral diarrhea in children, characterized by comprising: a data acquisition and selection module configured to use descriptive analysis to compute the mean, standard deviation and time series of the daily viral diarrhea cases and of each type of variable, then use the Pearson correlation test to evaluate the relationship between the number of viral diarrhea infections and each type of factor, determining the correlation between the case data and each factor and its significance, and select for further analysis the factors whose correlation coefficient exceeds 0.1 in absolute value at a significance level below 0.05;
a data dimensionality reduction module configured to perform dimensionality reduction on all selected meteorological factors by principal component analysis and extract principal components as the elements of the regression model, each principal component being given by formula (1):
$$z_i = \sum_{j=1}^{n} \alpha_j x_j \qquad (1)$$
where $z_i$ is each principal component of the meteorological factors, $x_j$ is each meteorological factor series, $\alpha_j$ is the loading of each factor on the principal component obtained by principal component analysis, and $n$ is the number of selected principal components;
an air quality factor acquisition module configured to select the several principal components with the highest contribution rates among the meteorological parameters as the new meteorological factor components, such that the cumulative contribution rate of the selected principal components exceeds 90%, and to apply the same principal component analysis to all selected air quality factors, taking the principal components with the highest contribution rates as the new air quality factors;
a Baidu search data module configured to compute a composite Baidu search index from all selected Baidu search data columns using formula (2):
$$BDI = \sum_{i=1}^{n} \beta_i x_i \qquad (2)$$
where $BDI$ is the composite Baidu search index, $x_i$ is each selected Baidu search data column, $\beta_i$ is the Pearson correlation coefficient between that data column and the epidemic case series, and $n$ is the number of selected Baidu search data columns;
and a model building module configured to incorporate the processed factors into the model-building process, wherein the distributed lag non-linear model is a regression model based on the lag effect and is expressed as formula (3):
$$\log[E(Y_t)] = \alpha + \sum_{i} cb(x_i) + \sum_{j} cb(x_j) + cb(BDI) + ns(\mathrm{Date}, df) + \mathrm{DOW} + \mathrm{Season} \qquad (3)$$
where $E(Y_t)$ is the expected number of viral diarrhea cases on day $t$, $\alpha$ is the intercept, $cb(x_i)$ is the cross-basis matrix of each meteorological principal component, $cb(x_j)$ is the cross-basis matrix of each air quality principal component, and $cb(BDI)$ is the cross-basis matrix of the composite Baidu search index. Natural cubic splines are used in each exposure space, with knots at the 25%, 50% and 75% quantiles on a logarithmic scale and an initial degrees of freedom (df) of 3. A lag period of 21 days is used when constructing the cross-basis matrix for each factor, with an initial lag-space df of 3. The other covariates of the model are a date term, a day-of-week term (DOW) and a season term (Season); $ns$ is a natural cubic spline used to control the long-term trend of the time variable, with the df of the date term initially set to 7 per year.
7. The hysteresis analysis device for the onset risk of viral diarrhea in children according to claim 6, wherein in the model building module, to control for overdispersion, a quasi-Poisson function is used as the link function of the model.
8. The hysteresis analysis device for the onset risk of viral diarrhea in children according to claim 7, wherein the model building module performs a sensitivity analysis on the model by varying the cross-basis matrices of the factors and the df of the date term and by adding or removing the season term, and evaluates the models by the Akaike information criterion (AIC) to determine the final df values.
9. The hysteresis analysis device for the onset risk of viral diarrhea in children according to claim 8, wherein in the subgroup analysis, children under 5 years old are grouped by sex and by age, and the same model is used to analyze each subgroup.
10. The hysteresis analysis device for the onset risk of viral diarrhea in children according to claim 9, wherein the data acquisition and selection module collects viral diarrhea case data for Jilin Province from 2014 to 2019 from the Chinese Center for Disease Control and Prevention, each case record including the patient's sex, age, date of onset and causative virus category; the meteorological data set is provided by the China Meteorological Data Sharing Service System and comprises evaporation, precipitation, sunshine duration, three ground-surface temperature series, three air pressure series, two relative humidity series, three air temperature series and three wind speed series, and each meteorological factor monitored at 30 stations in Jilin Province is arithmetically averaged by day to obtain time series of 17 meteorological factors for Jilin Province; the air quality data are obtained from the China air quality online monitoring and analysis platform, which supplies time series of 7 air quality factors (the AQI and the concentrations of PM2.5, PM10, CO, NO2, SO2 and O3) for the 9 city-level administrative units of Jilin Province, and the air quality factors of the 9 monitoring areas are arithmetically averaged by day to obtain time series of the 7 air quality factors for Jilin Province; the National Institute for Viral Disease Control and Prevention of the Chinese Center for Disease Control and Prevention provided up to 20 keywords covering symptoms, causes, and prevention and treatment products related to viral diarrhea, and the Baidu search index time series of all these keywords for Jilin Province over the corresponding period are selected.
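Steps (1) to (4) of claim 1 are ordinary preprocessing operations, sketched below in NumPy under stated assumptions. The Pearson p-value uses a normal approximation to the t distribution with n - 2 degrees of freedom, which is adequate for multi-year daily series but is not necessarily what the inventors used; the principal component analysis standardizes each factor first (i.e. works on the correlation matrix), which the text does not specify; and the composite index applies formula (2) to the raw search columns. All function names are hypothetical.

```python
import math
import numpy as np


def pearson_screen(cases, factors, r_min=0.1, alpha=0.05):
    """Step (1): keep factors with |r| > r_min and two-sided p < alpha."""
    y = np.asarray(cases, dtype=float)
    n = len(y)
    kept = {}
    for name, series in factors.items():
        r = float(np.corrcoef(np.asarray(series, dtype=float), y)[0, 1])
        t = r * math.sqrt((n - 2) / max(1.0 - r * r, 1e-12))
        p = math.erfc(abs(t) / math.sqrt(2.0))  # normal approximation to t_{n-2}
        if abs(r) > r_min and p < alpha:
            kept[name] = r
    return kept


def pca_by_contribution(X, threshold=0.90):
    """Steps (2)-(3): scores z_i of the fewest principal components whose
    cumulative contribution rate exceeds `threshold`."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize (assumption)
    eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigval)[::-1]                   # largest eigenvalue first
    eigval = np.clip(eigval[order], 0.0, None)         # guard tiny negative values
    eigvec = eigvec[:, order]
    cumulative = np.cumsum(eigval) / eigval.sum()
    k = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    # Columns of eigvec hold the loadings alpha_j of each component.
    return Z @ eigvec[:, :k], cumulative[:k]


def composite_bdi(search_columns, cases):
    """Step (4): BDI_t = sum_i beta_i * x_{i,t}, with beta_i the Pearson
    correlation between search column i and the case series."""
    y = np.asarray(cases, dtype=float)
    X = np.column_stack([np.asarray(c, dtype=float) for c in search_columns])
    betas = np.array([np.corrcoef(X[:, i], y)[0, 1] for i in range(X.shape[1])])
    return X @ betas, betas
```

The 90% rule keeps the smallest number of components whose cumulative share of variance strictly exceeds the threshold, so strongly collinear meteorological factors (for example, several temperature series) collapse into a single component.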
CN202110218435.3A 2021-02-23 2021-02-23 Hysteresis analysis method and device for child viral diarrhea onset risk Active CN112951442B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110218435.3A CN112951442B (en) 2021-02-23 2021-02-23 Hysteresis analysis method and device for child viral diarrhea onset risk

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110218435.3A CN112951442B (en) 2021-02-23 2021-02-23 Hysteresis analysis method and device for child viral diarrhea onset risk

Publications (2)

Publication Number Publication Date
CN112951442A true CN112951442A (en) 2021-06-11
CN112951442B CN112951442B (en) 2022-09-23

Family

ID=76246474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110218435.3A Active CN112951442B (en) 2021-02-23 2021-02-23 Hysteresis analysis method and device for child viral diarrhea onset risk

Country Status (1)

Country Link
CN (1) CN112951442B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114519308A (en) * 2022-02-22 2022-05-20 河南大学 Method for determining river water and underground water interconversion lag response time influenced by river water and sand regulation

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103793621A (en) * 2014-03-06 2014-05-14 上海市浦东新区疾病预防控制中心 Comprehensive dysentery monitoring platform
CN104008164A (en) * 2014-05-29 2014-08-27 华东师范大学 Generalized regression neural network based short-term diarrhea multi-step prediction method
US20150371006A1 (en) * 2013-02-15 2015-12-24 Battelle Memorial Institute Use of web-based symptom checker data to predict incidence of a disease or disorder
CN111415752A (en) * 2020-03-01 2020-07-14 集美大学 Hand-foot-and-mouth disease prediction method integrating meteorological factors and search indexes
CN111430040A (en) * 2020-03-03 2020-07-17 广东省公共卫生研究院 Hand-foot-and-mouth disease epidemic situation prediction method based on case, weather and pathogen monitoring data

Non-Patent Citations (4)

Title
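Liao Hongxiu et al.: "Application of principal component regression analysis to the relationship between bacillary dysentery and meteorological factors", Modern Preventive Medicine *
Zhao Yong et al.: "Analysis of surveillance results of viral diarrhea in children in Jilin City, 2007-2008", Chinese Journal of Laboratory Diagnosis *
Guo Xuehong: "Analysis of the epidemiological and etiological characteristics of 997 cases of other infectious diarrheal diseases", China Health Standard Management *
Tao Yan et al.: "Effects of meteorological factors on other infectious diarrheal diseases", Journal of Lanzhou University (Natural Sciences) *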

Also Published As

Publication number Publication date
CN112951442B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
Shang et al. Land use and climate change effects on surface runoff variations in the upper Heihe River basin
Li et al. Estimating relations of vegetation, climate change, and human activity: A case study in the 400 mm annual precipitation fluctuation zone, China
Weilhoefer A review of indicators of estuarine tidal wetland condition
Chen et al. Assessing water resources vulnerability by using a rough set cloud model: A case study of the Huai River Basin, China
Li et al. Spatiotemporal characteristics of dry-wet abrupt transition based on precipitation in Poyang Lake basin, China
CN112951442B (en) Hysteresis analysis method and device for child viral diarrhea onset risk
Wang et al. Quantitative agricultural flood risk assessment using vulnerability surface and copula functions
Lv et al. Impact of ENSO events on droughts in China
Yang et al. Teleconnections of large-scale climate patterns to regional drought in mid-latitudes: A case study in Xinjiang, China
Wang et al. GIS-based random forest weight for rainfall-induced landslide susceptibility assessment at a humid region in Southern China
Fan et al. Water level fluctuation under the impact of lake regulation and ecological implication in Huayang Lakes, China
Wu et al. Anthropogenic influence on compound dry and hot events in China based on Coupled Model Intercomparison Project Phase 6 models
Liao et al. Quantifying the effects of aging bias in Atlantic striped bass stock assessment
Chattopadhyay et al. Effect of a summer flood on benthic macroinvertebrates in a medium-sized, temperate, lowland river
Zhang et al. Investigation of attributes for identifying homogeneous flood regions for regional flood frequency analysis in Canada
Beecraft et al. Temporal variability in water quality and phytoplankton biomass in a low-inflow estuary (Baffin Bay, TX)
Xu et al. Applicability of a CEEMD–ARIMA combined model for drought forecasting: a case study in the Ningxia Hui Autonomous Region
Wang et al. Analysis of Characteristics of Dry–Wet Events Abrupt Alternation in Northern Shaanxi, China
Zhang et al. Historical trends in air temperature, precipitation, and runoff of a plateau inland river watershed in North China
Deng et al. Patterns and driving factors of diversity in the shrub community in Central and Southern China
Qin et al. Bivariate frequency of meteorological drought in the upper Minjiang River based on copula function
Dallison et al. Influence of historical climate patterns on streamflow and water demand in Wales, UK
Reemts et al. Choosing plant diversity metrics: a tallgrass prairie case study
He et al. Study on determination of excessive emissions of heavy diesel trucks based on OBD data repaired
Fan et al. Monte Carlo optimization for sliding window size in Dixon quality control of environmental monitoring time series data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant