CN115481835B - Atmospheric pollutant hazard assessment method based on continuous exposure generalized exact match - Google Patents

Info

Publication number
CN115481835B
Authority
CN
China
Prior art keywords
layer
atmospheric
coarsening
covariates
concentration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110601307.7A
Other languages
Chinese (zh)
Other versions
CN115481835A (en
Inventor
赵星
许欢
郭冰
周峻民
杨淑娟
肖雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202110601307.7A
Publication of CN115481835A
Application granted
Publication of CN115481835B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Mathematical Physics (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Pure & Applied Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Primary Health Care (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Epidemiology (AREA)
  • Quality & Reliability (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Game Theory and Decision Science (AREA)
  • Algebra (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an atmospheric pollutant hazard assessment method based on continuous-exposure generalized exact matching. The method comprises: obtaining outcome data $Y$ and confounding covariates $X$ from sample data, and obtaining the set $T$ of atmospheric fine particulate matter exposure concentrations from the individuals' addresses; coarsening the confounding covariates $X$ to obtain $X^*$, the combinations of whose values form the set of layers $S(X^*)$; assigning each individual $i$ to the corresponding layer according to the value of its coarsened covariates $X_i^*$; coarsening $T$ to obtain $T^*$; retaining every layer $s$ that simultaneously contains all categories of $T^*$, the retained layers $s'$ forming a set $S'$; restoring the individuals $i$ in the retained layers $s'$ to their pre-coarsening exposure concentration $T_i$ and confounding covariates $X_i$ to obtain a data set $A$; and estimating the chronic health hazard of the pollutant within each layer of $A$ using a linear mixed-effects model and pooling the estimates to obtain the overall average health hazard effect. The method reduces confounding covariate imbalance and the dependence of the assessment result on the model, and ensures the relative accuracy of the assessment result.

Description

Atmospheric pollutant hazard assessment method based on continuous exposure generalized exact match
Technical Field
The invention relates to the technical field of environmental pollutant hazard assessment, in particular to an atmospheric pollutant hazard assessment method based on continuous exposure generalized exact matching.
Background
Air pollution is a complex mixture of gaseous and particulate components, each of which can have a deleterious effect on human health. It is therefore important to assess the hazard posed by atmospheric pollutants. In most studies of the chronic health effects of air pollution, the common approach is to estimate the chronic health hazard by fitting regression models of the outcome on exposure and confounding covariates. These traditional regression methods mix the design and analysis stages of a study and are highly model dependent, which biases the effect estimates and weakens the value of the results as a basis for policy making. In causal inference from observational studies, matching is a non-parametric method that controls confounding in the observed data at the design stage. It mimics a randomized controlled trial, separates the design and analysis stages, is more transparent, and yields estimates that are easier to interpret than those of traditional regression methods. In studies of the chronic health effects of air pollution, exposure to atmospheric pollutants is generally continuous. To date, however, the only causal matching strategy proposed for continuous exposure is matching based on the generalized propensity score (GPS). Because the treatment assignment mechanism is unknown or ambiguous, the ability of GPS-based matching to balance confounders tends to be unstable in any single attempt: balance must be checked, the matching redone, and balance checked again until it improves for all variables, which is a cumbersome process. In addition, GPS-based matching often balances most covariates while leaving some others more imbalanced, and this process cannot be controlled. It is therefore difficult to guarantee the balance of important covariates, and the bias of the effect estimate increases.
Disclosure of Invention
The invention aims to provide an atmospheric pollutant hazard assessment method based on continuous-exposure generalized exact matching, so as to reduce the estimation bias and model dependence of prior-art atmospheric pollutant hazard assessment methods.
The invention solves these problems by the following technical solution:
An atmospheric pollutant hazard assessment method based on continuous-exposure generalized exact matching, comprising:
Step S1: Perform stratified random sampling on the target population to obtain sample data comprising individuals $i \in \{1, 2, \dots, N\}$, where $N$ is the sample size;
Step S2: Conduct a physical examination and questionnaire survey for each individual to obtain the outcome data $Y$ and the confounding covariates $X$, where $Y = (Y_1, Y_2, \dots, Y_i, \dots, Y_N)$, $X = (X_1, X_2, \dots, X_i, \dots, X_q)$, $X_i$ denotes the confounding covariates of individual $i$, $X_i = (X_{1i}, X_{2i}, \dots, X_{qi})$, and $q$ is the number of covariates;
Step S3: Convert the address of individual $i$ into the corresponding longitude and latitude coordinates; then, according to where the coordinates fall on the high-resolution air pollutant grid data, match each coordinate pair to its nearest grid point. The pollutant exposure concentration at that nearest grid point is taken as the individual's atmospheric pollutant concentration, denoted $T_i$. Applying the same method to every individual gives the concentration set $T = (T_1, T_2, \dots, T_i, \dots, T_N)$;
Step S4: Coarsen the confounding covariates $X$ to obtain the coarsened confounding covariates $X^* = (X_1^*, X_2^*, \dots, X_q^*)$. The different combinations of values of all coarsened confounding covariates $X^*$ form a set of layers: $H(X_i^*)$ denotes the set of all values of the coarsened covariate $X_i^*$, and $S(X^*)$ denotes the set of layers formed by the different values of all coarsened covariates, $S(X^*) = H(X_1^*) \times H(X_2^*) \times \dots \times H(X_q^*)$;
Each individual $i$ in the sample is assigned to the corresponding layer according to the value of its coarsened covariates $X_i^*$; individuals in the same layer have the same value of $X^*$;
Step S5: Coarsen the atmospheric pollutant concentration $T$ to obtain the coarsened atmospheric pollutant concentration $T^*$;
Step S6: Determine whether each layer $s$ simultaneously contains all categories of $T^*$; if so, retain the layer and the samples in it; otherwise, delete the layer. The set formed by the remaining layers $s'$ is denoted $S'$. Restore each individual $i$ in the remaining layers $s'$ to its pre-coarsening atmospheric pollutant concentration $T_i$ and confounding covariates $X_i$ to obtain the matched data set, denoted $A$, which comprises all individuals in $S'$;
Step S7: For the matched data set $A$ obtained in the previous step, estimate the chronic health hazard of the pollutant within each layer of $A$ using a linear mixed-effects model, and pool the estimates to obtain the overall average health hazard effect, specifically as follows:
The following linear mixed-effects model is fitted:
$$Y_{s'i} = \beta_{s'} \, t_{s'i} + \varepsilon_{s'i}$$
$$\varepsilon_{s'i} \sim N(0, \sigma^2)$$
$$\beta_{s'} \sim N(\mu, \tau^2)$$
where the subscript $s'i$ denotes individual $i$ in layer $s'$, with $s' \in S'$; $Y_{s'i}$ is the outcome data of individual $i$ in layer $s'$; $t_{s'i}$ is the atmospheric pollutant concentration of individual $i$ in layer $s'$; and $\varepsilon_{s'i}$ is the error term, which follows a normal distribution with mean 0 and variance $\sigma^2$;
Fitting the linear mixed-effects model yields estimates of the parameters $\beta_{s'}$, $\mu$, $\sigma^2$ and $\tau^2$. Here $\beta_{s'}$ is the change in the outcome data for each 1 μg/m³ rise in atmospheric pollutant concentration within layer $s'$, that is, the health hazard effect on blood pressure of long-term exposure to atmospheric pollutants within layer $s'$. $\beta_{s'}$ follows a normal distribution with mean $\mu$ and variance $\tau^2$; $\mu$ is the overall health hazard effect on blood pressure of long-term exposure to atmospheric pollutants, obtained by pooling the health hazard effects of all layers, and is the final result to be calculated.
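To make the roles of $\mu$ and $\tau^2$ in the model above concrete, the layer-specific slope can be written as $\beta_{s'} = \mu + b_{s'}$. The decomposition below is a sketch rather than part of the original text; the symbol $b_{s'}$ and the independence of $b_{s'}$ and $\varepsilon_{s'i}$ are assumptions (the usual mixed-model convention, not stated explicitly in the source).

```latex
% Assumed notation: b_{s'} is the deviation of layer s' from the overall slope mu,
% taken to be independent of the error term epsilon_{s'i}.
\begin{aligned}
Y_{s'i} &= \mu\, t_{s'i} + b_{s'}\, t_{s'i} + \varepsilon_{s'i},
  \qquad b_{s'} \sim N(0, \tau^2), \quad \varepsilon_{s'i} \sim N(0, \sigma^2), \\
\mathrm{E}\left[Y_{s'i} \mid t_{s'i}\right] &= \mu\, t_{s'i}, \\
\mathrm{Var}\left(Y_{s'i} \mid t_{s'i}\right) &= \tau^2 t_{s'i}^2 + \sigma^2, \\
\mathrm{Cov}\left(Y_{s'i}, Y_{s'j} \mid t_{s'i}, t_{s'j}\right) &= \tau^2\, t_{s'i}\, t_{s'j}, \qquad i \neq j .
\end{aligned}
```

Under these assumptions $\mu$ is the fixed (population-average) slope, while $\tau^2$ measures how much the per-layer effects vary around it and induces correlation only between individuals in the same layer.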
The outcome data are blood pressure values, including systolic and diastolic blood pressure, and the corresponding confounding covariates include basic demographic variables, socioeconomic variables, health behavior variables and health status variables. The basic demographic variables include gender and age; the socioeconomic variables include education level and average monthly income; the health behavior variables include smoking status and drinking status; and the health status variables include body mass index and family history. The atmospheric pollutant concentration is the exposure concentration of atmospheric fine particulate matter, sulfur dioxide, nitrogen oxides, ozone or the like.
Compared with the prior art, the invention has the following advantages:
(1) In the design stage, the invention matches continuous pollutant exposure using an improved generalized exact matching method; after the matched data set is obtained, a linear mixed model is used in the analysis stage to estimate the chronic health hazard of long-term exposure to atmospheric pollution in the population. This reduces confounding covariate imbalance and model dependence, ensures that the post-matching confounder imbalance does not exceed a preset level, and ensures the relative accuracy of the assessment result. The method can reduce the imbalance of one covariate independently, without affecting the imbalance level of the other covariates.
(2) The method is simple to operate, fast to run, robust and highly interpretable, and strengthens the value of the research results as a basis for policy making.
(3) The invention provides a generalized exact matching strategy for continuous exposure, which improves the accuracy of atmospheric pollutant health hazard assessment and the interpretability of the results. Being based on causal inference from observational studies, the results obtained by the method provide a stronger basis for policy making than those of conventional regression analysis.
(4) The invention requires no repeated balance checking and is simple to operate; the balance of important confounding covariates can be guaranteed; the pruning of layers avoids extrapolation of the study results; and the matching step reduces model dependence in the analysis stage.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of a matched data structure according to the present invention;
FIG. 3 is a schematic diagram of layer deletion.
Detailed Description
The present invention will be described in further detail with reference to examples, but embodiments of the present invention are not limited thereto.
Example 1:
Referring to FIG. 1, the atmospheric pollutant hazard assessment method based on continuous-exposure generalized exact matching comprises two main stages. The first stage is the generalized exact matching (CEM) stage of pollutant exposure, which includes coarsening the confounding covariates related to the air pollutant and the health outcome, coarsening the pollutant, and carrying out the matching. The second stage is the post-matching analysis stage, in which the chronic health hazard effect caused by atmospheric pollution is estimated using the matched data set obtained in the first stage. The basic idea is as follows. In the matching stage, both the pollutant and the covariates are "coarsened"; "exact matching" is then applied to the coarsened data, and "layers" that do not match are deleted, giving a matched data set. In the post-matching analysis stage, the chronic health hazard of the pollutant is estimated within each layer, and the estimates are pooled to obtain the overall average health hazard effect.
The atmospheric pollutants mainly include atmospheric fine particulate matter (such as PM1, PM2.5 and PM10), sulfur dioxide, nitrogen oxides and ozone. The chronic health hazards of these atmospheric pollutants to the human body affect many aspects and multiple systems. Studies have reported that long-term exposure to atmospheric pollutants can promote the occurrence and development of respiratory diseases such as respiratory tract inflammation, chronic bronchitis, bronchial asthma and emphysema, and of cardiovascular diseases such as coronary heart disease, arteriosclerosis and hypertension. The technical solution of the invention can be applied generally to assess the hazard effect of any atmospheric pollutant on different health outcomes. To describe the steps of the invention more clearly, a specific embodiment is described below, taking the assessment of the chronic health hazard of atmospheric fine particulate matter to blood pressure as an example:
(I) Generalized exact matching (CEM) stage of pollutant exposure
Step one: Perform stratified random sampling on the target population to obtain sample data comprising individuals $i \in \{1, 2, \dots, N\}$, where $N$ is the sample size;
Determine the study sample and collect the variable data. This includes identifying the target population and selecting the study sample, and collecting the pollutant data, outcome data and confounding covariate data. Suppose the aim is to study the chronic health hazard to blood pressure of long-term exposure to atmospheric pollutants, such as atmospheric fine particulate matter, in adults aged 30 to 80. A specific city or prefecture can be selected, and the study sample obtained by stratified random sampling of adults aged 30 to 80. Let $N$ be the sample size of the study sample, and index the individuals by $i \in \{1, 2, \dots, N\}$.
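The sampling in step one can be sketched as follows. This is only an illustration under assumptions: the source does not say which variable defines the strata or what sampling fraction is used, so the age bands, the 10% fraction, the synthetic population and the helper names `stratified_sample` and `age_band` are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw the same fraction of individuals, at random, from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

def age_band(person):
    """Hypothetical stratification variable: three age bands."""
    age = person["age"]
    return "30-49" if age < 50 else "50-64" if age < 65 else "65-80"

# Synthetic population of adults aged 30 to 80, for illustration only.
population = [{"id": i, "age": 30 + i % 51} for i in range(1000)]
sample = stratified_sample(population, age_band, fraction=0.1)
N = len(sample)
```

Sampling within strata keeps each stratum's share of the sample close to its share of the population, which is the usual motivation for stratified over simple random sampling.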
Step two: Conduct a physical examination and questionnaire survey for each individual to obtain the outcome data $Y$ and the confounding covariates $X$, where $Y = (Y_1, Y_2, \dots, Y_i, \dots, Y_N)$, $X = (X_1, X_2, \dots, X_i, \dots, X_q)$, $X_i$ denotes the confounding covariates of individual $i$, $X_i = (X_{1i}, X_{2i}, \dots, X_{qi})$, and $q$ is the number of covariates;
A questionnaire survey and physical examination of each individual $i$ in the study sample provide the outcome data $Y$ (the individual's measured blood pressure values, including systolic and diastolic blood pressure) and the related confounding covariates $X$ required for the study, including basic demographic variables (gender, age), socioeconomic variables (education level, average monthly income), health behavior variables (smoking status, drinking status) and health status variables (body mass index, family history). Let $X_i = (X_{1i}, X_{2i}, \dots, X_{qi})$ be the set of related confounding covariates of individual $i$, where $q$ is the number of covariates.
Step three: the atmospheric fine particulate data is from "high resolution air contaminant grid data". Converting the current address of each individual in the sample into corresponding longitude and latitude coordinates through the Goldmap API, and then falling on the high-resolution air pollutant grid data according to the longitude and latitude coordinatesMatching the nearest neighboring grid point for each longitude and latitude coordinate, wherein the pollutant exposure concentration of the nearest neighboring grid point is the exposure concentration of the individual atmospheric fine particulate matters, and is marked as T i The method comprises the steps of carrying out a first treatment on the surface of the The same method is adopted to obtain an exposure concentration set T, T= (T) of the atmospheric fine particulate matters of each individual 1 ,T 2 ,…,T i ,…,T N ) T is in the range of [ T ] 0 ,t 1 ]Wherein t is 0 Minimum exposure concentration of atmospheric fines, t, for all individuals 1 Is at a maximum value;
Step four: Coarsen all confounding covariates $X$ to be included in the matching to obtain the coarsened confounding covariates $X^* = (X_1^*, X_2^*, \dots, X_q^*)$. The different combinations of values of all coarsened confounding covariates $X^*$ form a set of layers: $H(X_i^*)$ denotes the set of all values of the coarsened covariate $X_i^*$, and $S(X^*)$ denotes the set of layers formed by the different values of all coarsened covariates, $S(X^*) = H(X_1^*) \times H(X_2^*) \times \dots \times H(X_q^*)$;
Each individual $i$ in the sample is assigned to the corresponding layer according to the value of its coarsened covariates $X_i^*$; individuals in the same layer have the same value of $X^*$;
The confounding covariates can be coarsened in a data-driven, automatic way, for example by choosing class intervals according to the empirical formula proposed by Sturges. However, because automatic coarsening determines the groups only from the range, quantiles, distribution and similar information of the sample data, the same covariate may be coarsened differently in different samples. To keep the coarsened variables interpretable, the coarsening cut points of the covariates are selected as follows in this scheme:
For continuous variables, the coarsening cut points may be equally or unequally spaced and are chosen to suit the actual situation. For example, when coarsening annual income, unequally spaced cut points are appropriate if the data set contains people with widely differing incomes. Unordered categorical variables are generally not coarsened, but if there are many categories, some can be merged into broader classes on the basis of professional knowledge and international standards, as with occupational or disease classifications. Ordered multi-category variables can be regarded as continuous variables that have already been coarsened, so they need not be coarsened further; if there are many categories, however, some adjacent categories can be merged. For example, most 7-point Likert scales have a neutral category, so the 7 categories can reasonably be coarsened into the following 3 classes: (completely disagree, strongly disagree, disagree), (neutral), (agree, strongly agree, completely agree).
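The two coarsening rules just described, cut points for a continuous variable and merging of adjacent ordered categories, can be sketched as follows. The income cut points and the helper names are hypothetical; only the 3-class Likert merge is taken from the text above.

```python
from bisect import bisect_right

def coarsen_continuous(value, cut_points):
    """Map a continuous value to the index of its interval.

    With sorted cut points c_0 < c_1 < ..., the result is 0 for value < c_0,
    1 for c_0 <= value < c_1, and so on.
    """
    return bisect_right(cut_points, value)

def coarsen_ordered(category, merge_map):
    """Merge adjacent ordered categories into wider groups via a lookup table."""
    return merge_map[category]

# Hypothetical, unequally spaced income cut points (currency units per year).
income_cuts = [10_000, 30_000, 100_000]

# The 7-point Likert merge described above: 3 groups around the neutral point.
likert_merge = {
    "completely disagree": "disagree",
    "strongly disagree": "disagree",
    "disagree": "disagree",
    "neutral": "neutral",
    "agree": "agree",
    "strongly agree": "agree",
    "completely agree": "agree",
}

income_groups = [coarsen_continuous(v, income_cuts) for v in (8_000, 10_000, 45_000, 250_000)]
likert_group = coarsen_ordered("strongly agree", likert_merge)
```

Fixing the cut points in advance, rather than deriving them from each sample, is what keeps the coarsened categories comparable across samples.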
The confounding covariates $X$ in this embodiment include gender, age, education level, average monthly income, smoking status, drinking status, body mass index and family history of hypertension. The variable types are shown in Table 1 below:
Confounding covariate X | Variable type
Gender $X_1$ | Unordered binary variable (male, female)
Age $X_2$ | Continuous variable
Education level $X_3$ | Ordered multi-category variable (illiterate, primary school, middle school, college, university and above)
Average monthly income $X_4$ | Continuous variable
Smoking status $X_5$ | Unordered binary variable (yes, no)
Drinking status $X_6$ | Unordered binary variable (yes, no)
Body mass index $X_7$ | Continuous variable
Family history of hypertension $X_8$ | Unordered binary variable (yes, no)
Table 1. Confounding covariates and their variable types
Continuous variables are coarsened into categorical variables; for ordered multi-category variables, some adjacent categories are merged; unordered categorical variables are left unchanged by coarsening. The coarsened confounding covariates are denoted $X^*$, and their values are shown in Table 2 below:
Table 2. Coarsened confounding covariates
In this embodiment, the number of combinations of values of all coarsened confounding covariates $X^*$, that is, the product of the numbers of categories of the coarsened covariates, is 864, so $S(X^*)$ contains 864 layers in total. Each individual $i$ in the study sample is assigned to the corresponding layer according to the value of its coarsened covariates $X^*$.
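How the layers arise as a Cartesian product, and how an individual is assigned to one, can be sketched as below. The three covariates and their category sets are hypothetical and much smaller than the embodiment's eight covariates; they are not the values of Table 2.

```python
from itertools import product

# Hypothetical coarsened value sets H(X_j*) for three covariates.
H = {
    "gender": ["male", "female"],
    "age_group": ["30-49", "50-64", "65-80"],
    "smoker": ["yes", "no"],
}

# S(X*) is the Cartesian product of the value sets; each tuple is one layer.
layers = list(product(*H.values()))

def layer_of(individual):
    """Return the layer key (tuple of coarsened values) for one individual."""
    return tuple(individual[name] for name in H)

person = {"gender": "female", "age_group": "50-64", "smoker": "no"}
key = layer_of(person)
n_layers = len(layers)  # 2 * 3 * 2 for this toy example
```

The number of layers is the product of the category counts, which is why it grows quickly as covariates are added or coarsened more finely, and why many layers may end up empty or sparse in a real sample.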
Step five: Coarsen the atmospheric fine particulate matter exposure concentration $T$ to obtain the coarsened exposure concentration $T^*$, which is used for the subsequent matching. The choice of the number of cut points for coarsening the exposure is a trade-off between the overall balance of the data and the sample size remaining after matching. In general, the number of cut points is chosen to be as large as possible while still ensuring an adequate post-matching sample size; to limit the loss of information from a reduced sample size and to improve statistical efficiency and the representativeness of the results, at least 50% of the original sample is generally retained where possible. In this embodiment, the continuous exposure concentration $T$ can be coarsened into three categories, low, medium and high atmospheric fine particulate matter exposure, denoted $T^*$. The coarsening cut points are the tertiles of the exposure concentration.
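Tertile coarsening of the exposure can be sketched with the standard library. The exposure values are made up, and the quantile definition is an assumption: `statistics.quantiles` uses its default "exclusive" method here, whereas the source does not say how the tertiles are computed.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical exposure concentrations T_i (ug/m3), for illustration only.
T = [35.0, 52.5, 41.0, 60.2, 47.3, 38.4, 55.1, 44.6, 58.9]

# n=3 returns the two cut points that split the data into three groups.
cuts = quantiles(T, n=3)

labels = ["low", "medium", "high"]
# Values at or above a cut point fall into the higher category.
T_star = [labels[bisect_right(cuts, t)] for t in T]
```

Using more quantile groups (for example quartiles or quintiles) tightens the exposure balance within each retained layer but makes it less likely that a layer contains every category, so fewer layers survive the pruning in step six.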
Step six: Determine whether each layer $s \in S(X^*)$ simultaneously contains all categories of $T^*$; if so, retain the layer and the samples in it; otherwise, delete the layer. The set of remaining layers $s'$ is denoted $S'$. The individuals in the remaining layers $s'$ are restored to their original (pre-coarsening) atmospheric fine particulate matter exposure concentration $T$ and confounding covariates $X$, giving the matched data set, denoted $A$.
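The pruning rule of step six can be sketched as follows. The record layout (keys `layer`, `T_star`, `T`, `Y`) and the five example records are hypothetical; the original un-coarsened values are simply carried along in each record, so "restoring" them amounts to keeping the retained records.

```python
from collections import defaultdict

def prune_layers(records, exposure_categories):
    """Keep only layers that contain every coarsened exposure category.

    Each record is a dict with "layer" (tuple of coarsened covariates),
    "T_star" (coarsened exposure) and the original, un-coarsened fields.
    Returns the matched data set A as a list of the retained records.
    """
    seen = defaultdict(set)
    for r in records:
        seen[r["layer"]].add(r["T_star"])
    required = set(exposure_categories)
    kept = {layer for layer, cats in seen.items() if cats >= required}
    return [r for r in records if r["layer"] in kept]

# Hypothetical records: the first layer has all three categories, the second lacks "medium".
records = [
    {"id": 1, "layer": ("male", "30-49"), "T_star": "low", "T": 35.0, "Y": 118.0},
    {"id": 2, "layer": ("male", "30-49"), "T_star": "medium", "T": 47.3, "Y": 124.0},
    {"id": 3, "layer": ("male", "30-49"), "T_star": "high", "T": 60.2, "Y": 131.0},
    {"id": 4, "layer": ("female", "30-49"), "T_star": "low", "T": 38.4, "Y": 112.0},
    {"id": 5, "layer": ("female", "30-49"), "T_star": "high", "T": 58.9, "Y": 127.0},
]

A = prune_layers(records, ["low", "medium", "high"])
S_prime = {r["layer"] for r in A}
```

Because a retained layer must span the full range of exposure categories, effect estimates within it never have to be extrapolated to exposure levels that the layer does not contain.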
To aid understanding of the generalized exact matching (CEM) stage of pollutant exposure, consider a study sample of size 10 with only two confounding covariates. The main matching procedure above then simplifies as follows:
1) Obtain a study sample, as shown in Table 3 below:
Table 3. Study sample
2) Coarsen the pollutant and the covariates, as shown in Table 4 below:
Table 4. Coarsening of the pollutant and covariates
3) Delete layers, as shown in FIG. 3: in the set $S(X^*)$, determine whether each layer $s$ simultaneously contains all categories of $T^*$; if so, retain the layer; otherwise, delete the layer and the samples in it. The remaining layers form the set $S'$.
4) Restore the individuals in the remaining layers to their pre-coarsening atmospheric fine particulate matter exposure concentration $T$ and confounding covariates $X$ to obtain the matched data set $A$, as shown in Table 5 below:
Table 5. Matched data
In the matched data set, the deletion of layers reduces covariate imbalance to some extent and avoids extrapolation of the effect estimates in the subsequent analysis, while the balance of the confounding covariates within each layer is guaranteed. The matched data set therefore approximates a randomized block experiment, which reduces confounding covariate imbalance, reduces model dependence in the subsequent analysis stage, and improves the validity and robustness of the results. Compared with the existing continuous-exposure matching strategy based on the generalized propensity score, the method does not require repeated checking of covariate balance and is simple to operate and easy to understand.
(II) analysis after matching
After the matched data set $A$ is obtained, the post-matching analysis follows, in which the health hazard effect of atmospheric pollutants on blood pressure is estimated in the matched data set. To preserve covariate balance in this stage, we propose an analysis strategy of "within-layer estimation" and "overall pooling": the chronic health hazard of the pollutant is estimated within each layer, and the estimates are pooled to obtain the overall average health hazard effect. When the chronic health hazard of the pollutant is estimated within a layer $s'$, the covariate imbalance of the data in $s'$ is no greater than the maximum imbalance level set in advance by the covariate coarsening; in other words, the data within layer $s'$ satisfy covariate balance.
The health hazard effect of atmospheric pollutants on blood pressure is assessed through the following implementation steps:
The matched data set $A$ obtained in the previous step has the hierarchical structure shown in FIG. 2. The chronic health hazard of the pollutant is estimated within each layer of $A$ using an existing (generalized) linear mixed-effects model, and the estimates are pooled to obtain the overall average health hazard effect, specifically as follows:
The following linear mixed-effects model is fitted:
$$Y_{s'i} = \beta_{s'} \, t_{s'i} + \varepsilon_{s'i}$$
$$\varepsilon_{s'i} \sim N(0, \sigma^2)$$
$$\beta_{s'} \sim N(\mu, \tau^2)$$
where the subscript $s'i$ denotes individual $i$ in layer $s'$, with $s' \in S'$; $Y_{s'i}$ is the outcome data of individual $i$ in layer $s'$; $t_{s'i}$ is the atmospheric fine particulate matter exposure concentration of individual $i$ in layer $s'$; and $\varepsilon_{s'i}$ is the error term, which follows a normal distribution with mean 0 and variance $\sigma^2$;
Fitting the linear mixed-effects model yields estimates of the parameters $\beta_{s'}$, $\mu$, $\sigma^2$ and $\tau^2$. Here $\beta_{s'}$ is the change in the outcome data for each 1 μg/m³ rise in atmospheric fine particulate matter exposure concentration within layer $s'$, that is, the health hazard effect on blood pressure of long-term exposure to atmospheric fine particulate matter within layer $s'$. $\beta_{s'}$ follows a normal distribution with mean $\mu$ and variance $\tau^2$; $\mu$ is the overall health hazard effect on blood pressure of long-term exposure to atmospheric fine particulate matter, obtained by pooling the health hazard effects of all layers, and is the final result to be calculated.
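The "within-layer estimation, overall pooling" idea can be illustrated with a simplified two-stage calculation. This is not the likelihood-based mixed-model fit described above but a stand-in that approximates it: a no-intercept least-squares slope is computed in each layer, and the slopes are then pooled with a DerSimonian-Laird moment estimate of $\tau^2$. The function name `pooled_slope` and the data are hypothetical.

```python
def pooled_slope(layers):
    """Two-stage estimate of (mu, tau2, sigma2, per-layer slopes).

    `layers` maps a layer id to a list of (t, y) pairs. Requires at least two
    layers, at least two individuals per layer, and non-zero residual variance.
    """
    slopes, sum_t2 = {}, {}
    rss, dof = 0.0, 0
    for s, pairs in layers.items():
        st2 = sum(t * t for t, _ in pairs)
        b = sum(t * y for t, y in pairs) / st2  # no-intercept least-squares slope
        slopes[s], sum_t2[s] = b, st2
        rss += sum((y - b * t) ** 2 for t, y in pairs)
        dof += len(pairs) - 1  # one slope estimated per layer
    sigma2 = rss / dof  # pooled residual variance

    # Inverse sampling variance of each slope: 1 / (sigma2 / sum of t^2).
    w = {s: sum_t2[s] / sigma2 for s in slopes}
    sw = sum(w.values())
    b_fixed = sum(w[s] * slopes[s] for s in slopes) / sw

    # DerSimonian-Laird moment estimate of the between-layer variance tau^2.
    q = sum(w[s] * (slopes[s] - b_fixed) ** 2 for s in slopes)
    c = sw - sum(v * v for v in w.values()) / sw
    tau2 = max(0.0, (q - (len(slopes) - 1)) / c)

    # Random-effects weights 1 / (v_s + tau^2) give the pooled mean slope mu.
    w_re = {s: 1.0 / (sigma2 / sum_t2[s] + tau2) for s in slopes}
    mu = sum(w_re[s] * slopes[s] for s in slopes) / sum(w_re.values())
    return mu, tau2, sigma2, slopes

# Hypothetical matched data: (exposure t, outcome y) pairs in two retained layers.
layers = {
    "A": [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0)],
    "B": [(1.0, 3.9), (2.0, 8.1), (3.0, 12.0)],
}
mu, tau2, sigma2, slopes = pooled_slope(layers)
```

In practice a dedicated mixed-model routine (maximum likelihood or REML) would normally be used to estimate $\mu$, $\sigma^2$ and $\tau^2$ jointly; the two-stage version is shown only because it makes the per-layer and pooling steps explicit.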
In the above description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the above description is merely a preferred embodiment of the invention, which can be practiced in many other ways other than those described herein, and therefore the invention is not limited to the specific implementations disclosed above. It should be understood that numerous other modifications and embodiments can be devised by those skilled in the art that will fall within the scope and spirit of the principles of this disclosure.

Claims (3)

1. An atmospheric pollutant hazard assessment method based on continuous exposure generalized exact matching, characterized by comprising two stages: the first stage is a generalized exact matching stage for pollutant exposure, comprising coarsening the confounding covariates related to the atmospheric pollutant and the health outcome, coarsening the pollutant concentration, and performing the matching; the second stage is a post-matching analysis stage, in which the chronic health hazard effect caused by atmospheric pollution is estimated using the matched data set obtained in the first stage, wherein the first stage specifically comprises:
step S1: stratified random sampling is performed on the target population to obtain sample data, the sample data comprising individuals $i$, $i \in \{1, 2, \dots, N\}$, where $N$ is the number of samples;
step S2: each individual undergoes a physical examination and a questionnaire survey to obtain outcome data $Y$ and confounding covariates $X$, where $Y = (Y_1, Y_2, \dots, Y_i, \dots, Y_N)$, $X = (X_1, X_2, \dots, X_i, \dots, X_q)$, $X_i$ denotes the confounding covariates of individual $i$, $X_i = (X_{1i}, X_{2i}, \dots, X_{qi})$, and $q$ is the number of covariates;
step S3: the address of individual $i$ is converted into the corresponding longitude and latitude coordinates; each coordinate pair is then matched to its nearest grid point according to where it falls on the high-resolution atmospheric pollutant grid data, and the pollutant exposure concentration at that nearest grid point is taken as the atmospheric pollutant concentration of the individual, denoted $T_i$; applying the same method to every individual gives the concentration set $T$, $T = (T_1, T_2, \dots, T_i, \dots, T_N)$;
step S4: all confounding covariates $X$ to be included in the matching are coarsened: continuous variables are coarsened into categorical variables, some adjacent categories of ordered multi-category variables are merged, and, for unordered categorical variables with many categories, some categories are merged into broader categories, giving the coarsened confounding covariates $X^*$, $X^* = (X_1^*, X_2^*, \dots, X_q^*)$; the different combinations of values of all coarsened confounding covariates $X^*$ define a set of layers; $H(X_i^*)$ denotes the set of all values of the coarsened covariate $X_i^*$, and $S(X^*)$ denotes the set of layers formed by the different values of all coarsened covariates, $S(X^*) = H(X_1^*) \times H(X_2^*) \times \dots \times H(X_q^*)$;
each individual $i$ in the sample is assigned to the corresponding layer according to the values of its coarsened covariates $X_i^*$, so that individuals in the same layer have identical $X^*$ values;
step S5: the atmospheric pollutant concentration $T$ is coarsened, with the tertiles of the atmospheric pollutant exposure concentration as the cut points, giving the coarsened atmospheric pollutant concentration $T^*$;
step S6: for each layer $s \in S(X^*)$, it is determined whether the layer contains all categories of $T^*$ simultaneously; if so, the layer and the samples in it are retained, otherwise the layer is deleted; each retained layer is denoted $s'$, and the set of retained layers is denoted $S'$; for each individual $i$ in a retained layer $s'$, the pre-coarsening atmospheric pollutant concentration $T_i$ and confounding covariates $X_i$ are restored, giving a matched data set with balanced confounding covariates, denoted A, which contains all individuals in $S'$;
the second stage specifically comprises:
step S7: for the matched data set A obtained in the previous step, the chronic health hazard of the pollutant is estimated within each layer of A using a linear mixed-effects model, and the layer-level estimates are combined into an overall average health hazard effect, specifically:
the following linear mixed-effects model is fitted:
$$Y_{s'i} = \beta_{s'} \, t_{s'i} + \varepsilon_{s'i}$$
$$\varepsilon_{s'i} \sim N(0, \sigma^2)$$
$$\beta_{s'} \sim N(\mu, \tau^2)$$
where $s'i$ denotes individual $i$ in layer $s'$, $s' \in S'$; $Y_{s'i}$ is the outcome data of individual $i$ in layer $s'$; $t_{s'i}$ is the atmospheric pollutant concentration of individual $i$ in layer $s'$; and $\varepsilon_{s'i}$ is the error term, which follows a normal distribution with mean 0 and variance $\sigma^2$;
the parameters $\beta_{s'}$, $\mu$, $\sigma^2$ and $\tau^2$ are estimated by fitting the linear mixed-effects model, where $\beta_{s'}$ is the change in the outcome data for each 1 μg/m³ increase in the atmospheric pollutant concentration within layer $s'$, i.e., the health hazard effect on blood pressure of long-term exposure to the atmospheric pollutant within layer $s'$; $\beta_{s'}$ follows a normal distribution with mean $\mu$ and variance $\tau^2$; and $\mu$ is the overall health hazard effect on blood pressure of long-term exposure to the atmospheric pollutant, obtained by pooling the health hazard effects of all layers, and is the final quantity to be calculated.
2. The atmospheric pollutant hazard assessment method based on continuous exposure generalized exact matching of claim 1, wherein the outcome data are blood pressure values, including systolic and diastolic pressure, and the corresponding confounding covariates include basic demographic variables, socioeconomic variables, health behavior variables, and health status variables; the atmospheric pollutant concentration is the exposure concentration of atmospheric fine particulate matter, the sulfur dioxide concentration, the nitrogen oxide concentration, or the ozone concentration.
3. The atmospheric pollutant hazard assessment method based on continuous exposure generalized exact matching of claim 2, wherein the basic demographic variables include gender and age, the socioeconomic variables include education level and average monthly income, the health behavior variables include smoking status and drinking status, and the health status variables include body mass index and family history.
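The first-stage steps S3 to S6 of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the age cut points, and the toy records are hypothetical; the nearest grid point is found with a plain squared difference in degrees, which is only reasonable for a fine grid over a small area; and the tertile cut points use the default convention of `statistics.quantiles`, with boundary values assigned to the lower class, since the claim does not fix either choice.

```python
from collections import defaultdict
from statistics import quantiles


def nearest_grid_value(lat, lon, grid):
    """Step S3: concentration at the grid point nearest to (lat, lon).

    grid is a list of (lat, lon, concentration) tuples.
    """
    return min(grid, key=lambda g: (g[0] - lat) ** 2 + (g[1] - lon) ** 2)[2]


def coarsen_age(age):
    """Step S4 for one continuous covariate (cut points are hypothetical)."""
    if age < 45:
        return "<45"
    if age < 60:
        return "45-59"
    return ">=60"


def tertile_class(value, cuts):
    """Step S5: 0, 1 or 2 depending on how many cut points the value exceeds."""
    return sum(value > c for c in cuts)


def generalized_exact_match(records, coarseners):
    """Steps S4 to S6.

    records: list of dicts holding the covariates and the exposure under "T".
    coarseners: dict mapping a covariate name to its coarsening function.
    Returns (retained layers, tertile cut points); the retained records keep
    their original, uncoarsened values.
    """
    cuts = quantiles([r["T"] for r in records], n=3)  # two cut points

    # Step S4: a layer is one combination of coarsened covariate values.
    layers = defaultdict(list)
    for r in records:
        key = tuple(coarseners[name](r[name]) for name in sorted(coarseners))
        layers[key].append(r)

    # Step S6: keep only layers that contain all three exposure classes.
    matched = {}
    for key, members in layers.items():
        classes = {tertile_class(r["T"], cuts) for r in members}
        if classes == {0, 1, 2}:
            matched[key] = members
    return matched, cuts


# Toy records for illustration only (not from the patent).
toy = [
    {"age": 30, "sex": "female", "T": 10.0},
    {"age": 38, "sex": "female", "T": 40.0},
    {"age": 41, "sex": "female", "T": 70.0},
    {"age": 65, "sex": "male", "T": 20.0},
    {"age": 70, "sex": "male", "T": 30.0},
    {"age": 62, "sex": "female", "T": 50.0},
    {"age": 66, "sex": "female", "T": 60.0},
    {"age": 71, "sex": "female", "T": 80.0},
]
matched, cuts = generalized_exact_match(
    toy, {"age": coarsen_age, "sex": lambda v: v}
)
```

Only layers that span all three exposure classes survive, so every retained layer allows a within-layer comparison across low, middle, and high exposure among individuals with the same coarsened covariates.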
CN202110601307.7A 2021-05-31 2021-05-31 Atmospheric pollutant hazard assessment method based on continuous exposure generalized exact match Active CN115481835B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110601307.7A CN115481835B (en) 2021-05-31 2021-05-31 Atmospheric pollutant hazard assessment method based on continuous exposure generalized exact match

Publications (2)

Publication Number Publication Date
CN115481835A CN115481835A (en) 2022-12-16
CN115481835B true CN115481835B (en) 2024-02-02

Family

ID=84419116

Country Status (1)

Country Link
CN (1) CN115481835B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103353923A (en) * 2013-06-26 2013-10-16 中山大学 Self-adaption spatial interpolation method and system based on spatial feature analysis
CN104636547A (en) * 2015-01-26 2015-05-20 北京师范大学 Life mode exposing modeling method and application of method to risk assessment
CN105678104A (en) * 2016-04-06 2016-06-15 电子科技大学成都研究院 Method for analyzing health data of old people on basis of Cox regression model
CN106446957A (en) * 2016-10-08 2017-02-22 常熟理工学院 Haze image classification method based on random forest
CN107609337A (en) * 2017-10-19 2018-01-19 中国疾病预防控制中心环境与健康相关产品安全所 A kind of air quality health index issue and personalized method for early warning
CN107798425A (en) * 2017-10-16 2018-03-13 中国科学院地理科学与资源研究所 A kind of space-time based on big data obscures degrees of exposure assessment system and method
CN110766257A (en) * 2018-07-28 2020-02-07 华中科技大学 Method for evaluating short-term exposure concentration of air pollutants of crowd

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Long-term effects of ambient PM2.5 on hypertension in multi-ethnic population from Sichuan province, China: a study based on 2013 and 2018 health service surveys"; Jiayue Xu et al.; Environmental Science and Pollution Research; abstract, data, air pollution, covariates, statistical analysis, results *

Also Published As

Publication number Publication date
CN115481835A (en) 2022-12-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant