CN115376638A - Physiological characteristic data analysis method based on multi-source health perception data fusion - Google Patents

Info

Publication number
CN115376638A
Authority
CN
China
Prior art keywords
data
physiological characteristic
characteristic data
medical sensing
health perception
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211005027.0A
Other languages
Chinese (zh)
Inventor
牛耕田
朱峰
李总池
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute filed Critical CETC 28 Research Institute
Priority to CN202211005027.0A priority Critical patent/CN115376638A/en
Publication of CN115376638A publication Critical patent/CN115376638A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention discloses a physiological characteristic data analysis method based on multi-source health perception data fusion, comprising the following steps: acquiring health perception data; preprocessing the data; obtaining a personal physique record data vector; adding binary mask vectors to the preprocessed medical sensing data and concatenating them to obtain a medical sensing data matrix; fusing and learning hidden features of the medical sensing data matrix and the personal physique record data vector with a gated recurrent unit (GRU) network; constructing a physiological characteristic incidence matrix from the physiological characteristic condition data; and converting the hidden features into discrimination probabilities for multiple classes of physiological characteristics through a fully connected network, then calculating the final physiological characteristic classification result. The method performs fusion learning on multi-source heterogeneous health perception data and makes full use of the latent correlations among physiological characteristic data. It achieves a better data analysis effect in computation over massive health perception data.

Description

Physiological characteristic data analysis method based on multi-source health perception data fusion
Technical Field
The invention relates to a data classification method, in particular to a physiological characteristic data analysis method based on multi-source health perception data fusion.
Background
In recent years, sensing technology has developed rapidly in China. Medical sensing instruments have emerged, and new technologies such as biosensing have been put into use, so clinical monitoring equipment keeps moving toward faster and more accurate measurement. Because China has a large population, hundreds of millions of people are admitted to hospital and treated every year, which inevitably generates a large number of Electronic Health Records (EHRs) containing medical sensing data, hospital information, personal physical condition records, physiological characteristic data records and the like. The complexity of electronic health record data, however, makes the data very difficult to process and use. With the continuing progress of artificial intelligence and the growth of computing power, physiological characteristic data analysis and risk assessment using electronic health systems have become possible, which offers a good opportunity for making medical data analysis intelligent.
To make better use of the large volume of electronic health record data, data of different structural types must be processed and analyzed. Most previous work studies only data of similar structure. For example, R Mohammad et al select a large amount of time-series physical sign data and predict data characteristics with logistic regression and recurrent neural network models; Ma L et al propose a model that learns long-term and short-term changes in a patient's physiological characteristic data as clinical features and uses medical sensing data to evaluate the patient's physiological characteristic data at different time stages; Ayon si et al learn multiple kinds of measurement data with deep neural networks and use them for data prediction.
Although existing research contains much work on analyzing physiological characteristic data from health perception data, that work considers only a single type of data and the extraction of its features, and does not fuse multiple kinds of data to promote synergy among them. Moreover, the medical sensing data in electronic health records is time-series data acquired by various devices whose sampling frequencies differ greatly, so the acquired data is heterogeneous in the time dimension; the personal physique record data (such as age and sex) is non-time-series data, so it is also heterogeneous with respect to the medical sensing data. How to fuse and model these heterogeneous data is a difficulty of the physiological characteristic data analysis task.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to solve the technical problem of providing a physiological characteristic data analysis method based on multi-source health perception data fusion aiming at the defects of the prior art.
In order to solve the technical problem, the invention discloses a physiological characteristic data analysis method based on multi-source health perception data fusion, which comprises the following steps:
Step 1, acquiring health perception data, the health perception data including medical sensing data, personal physique record data and physiological characteristic condition data; recording the time series length of data acquisition for the health perception data; preprocessing the health perception data to eliminate noise data and obtain the total preprocessed sample set, the total number of samples being N; and obtaining the personal physique record data vector $X_R$ through the mathematical expression and definition of the personal physique record data;
Step 2, filling the medical sensing data preprocessed in step 1 into regular sequence data at equal time intervals, adding a binary mask vector indicating whether the feature value in each time interval is a real measurement, and concatenating the binary mask vector with the processed medical sensing data to obtain the medical sensing data matrix $X_M$;
Step 3, fusing and learning, based on a gated recurrent unit (GRU) network, the hidden feature $H_T$ of the medical sensing data matrix $X_M$ and the personal physique record data vector $X_R$;
Step 4, based on the physiological characteristic condition data $X_D$, calculating the influence coefficient of any one type of physiological characteristic data j on another type of physiological characteristic data k by a conditional probability method, and constructing the physiological characteristic incidence matrix I;
Step 5, converting the hidden feature $H_T$ through a fully connected network into the discrimination probability C of multi-class physiological characteristic data, and calculating the final physiological characteristic data classification result from the discrimination probability C and the physiological characteristic incidence matrix I.
In step 1 of the invention, the data is preprocessed to eliminate noise data by the following specific process:
step 1-1, removing health perception data samples that lack any one of the medical sensing data, the personal physique record data and the physiological characteristic condition data;
step 1-2, deleting, from the remaining health perception data, data whose data acquisition time series length $T_c$ is less than 24 hours;
step 1-3, normalizing the remaining data, and converting discrete attribute data into one-hot values (One-Hot Encoding of categorical discrete features is a common processing method in the prior art).
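Steps 1-1 to 1-3 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the sample layout (a dict with the keys `sensing`, `record`, `labels` and `hours`), the helper names, and the choice of min-max scaling as the normalization are all assumptions, since the text does not specify them.

```python
from typing import Any, Dict, List

Sample = Dict[str, Any]  # hypothetical layout: sensing, record, labels, hours


def filter_samples(samples: List[Sample], min_hours: float = 24.0) -> List[Sample]:
    """Steps 1-1 and 1-2: drop incomplete samples and short time series."""
    required = ("sensing", "record", "labels")
    kept = []
    for sample in samples:
        if any(sample.get(key) is None for key in required):
            continue  # step 1-1: one of the three data sources is missing
        if sample["hours"] < min_hours:
            continue  # step 1-2: acquisition window shorter than 24 hours
        kept.append(sample)
    return kept


def min_max_normalize(values: List[float]) -> List[float]:
    """Step 1-3, continuous attributes: scale to [0, 1] (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value: str, categories: List[str]) -> List[int]:
    """Step 1-3, discrete attributes: one-hot encode a categorical value."""
    return [1 if value == category else 0 for category in categories]


samples = [
    {"sensing": [[80.0]], "record": {"sex": "F"}, "labels": [1, 0], "hours": 48},
    {"sensing": None, "record": {"sex": "M"}, "labels": [0, 0], "hours": 72},
    {"sensing": [[65.0]], "record": {"sex": "M"}, "labels": [0, 1], "hours": 12},
]
kept = filter_samples(samples)                 # only the first sample survives
ages = min_max_normalize([20.0, 40.0, 60.0])   # continuous attribute
sex_code = one_hot("F", ["F", "M"])            # discrete attribute
```

Concatenating the normalized and one-hot encoded attributes of one individual would give a vector playing the role of $X_R$.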
In step 2 of the invention, the medical sensing data matrix $X_M$ is obtained as follows:
step 2-1, resampling the medical sensing data at intervals of 1 hour, and if several measurements of the same feature fall in the same time interval, using the last measurement;
step 2-2, for medical sensing data with missing values, replacing a missing value with the most recent earlier measurement if one exists, and otherwise with a preset value;
step 2-3, adding a binary mask vector that marks real data and filled data, and concatenating it with the medical sensing data processed in step 2-1 and step 2-2 to obtain the medical sensing data matrix
$$X_M \in \mathbb{R}^{T_d \times 2d_m}$$
where $T_d$ is the length of the time series after data processing, $d_m$ is the feature dimension of the medical sensing data, and $\mathbb{R}^{T_d \times 2d_m}$ represents the vector space of the medical sensing data matrix.
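The construction of the medical sensing data matrix in steps 2-1 to 2-3 can be sketched as below. The event format `(time in hours, feature name, value)`, the function name, the preset values and the column layout (all values first, then all mask bits) are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[float, str, float]  # hypothetical: (time in hours, feature, value)


def build_sensing_matrix(
    events: List[Event],
    features: List[str],
    presets: Dict[str, float],
    n_hours: int,
) -> List[List[float]]:
    """Return a matrix with n_hours rows and 2 * len(features) columns."""
    d_m = len(features)
    column = {name: j for j, name in enumerate(features)}

    # Step 2-1: one bin per hour; a later measurement overwrites an earlier one.
    grid: List[List[Optional[float]]] = [[None] * d_m for _ in range(n_hours)]
    for time_h, name, value in sorted(events, key=lambda e: e[0]):
        t = int(time_h)
        if 0 <= t < n_hours:
            grid[t][column[name]] = value

    rows = []
    last_seen: List[Optional[float]] = [None] * d_m
    for t in range(n_hours):
        values, mask = [], []
        for j, name in enumerate(features):
            if grid[t][j] is not None:
                last_seen[j] = grid[t][j]
                values.append(grid[t][j])
                mask.append(1.0)  # real measurement
            else:
                # Step 2-2: forward fill, or fall back to the preset value.
                fill = last_seen[j] if last_seen[j] is not None else presets[name]
                values.append(fill)
                mask.append(0.0)  # filled value
        rows.append(values + mask)  # step 2-3: concatenate values and mask
    return rows


events = [(0.2, "hr", 80.0), (0.7, "hr", 84.0), (2.5, "hr", 90.0), (1.1, "temp", 36.8)]
X_M = build_sensing_matrix(events, ["hr", "temp"], {"hr": 86.0, "temp": 37.0}, 3)
```

With two features, each row has four columns: the two (possibly filled) values followed by the two mask bits, which matches the $2d_m$ column count.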
The fusion learning method in step 3 of the invention comprises the following steps:
step 3-1, initializing the hidden state $H_0 \in \mathbb{R}^{1 \times h}$ as a zero vector; at each time step t, where t = 1, 2, ..., T and T is the length of the time series of $X_M$, concatenating the vectors $[X_M^{t}, X_R]$ as the input of the gated recurrent unit, denoted
$$X_t = [X_M^{t}, X_R] \in \mathbb{R}^{1 \times d_{in}}$$
where $X_M^{t} \in \mathbb{R}^{1 \times 2d_m}$ represents the medical sensing data at time t, $X_R \in \mathbb{R}^{1 \times d_r}$ represents the personal physique record data, $d_r$ is the feature dimension of the personal physique record data, and the fused dimension of dynamic and static data is $d_{in} = 2d_m + d_r$; with h hidden units, and given the hidden state of the previous time step $H_{t-1} \in \mathbb{R}^{1 \times h}$, the reset gate $R_t \in \mathbb{R}^{1 \times h}$ and the update gate $Z_t \in \mathbb{R}^{1 \times h}$ are calculated as follows:
$$R_t = \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r)$$
$$Z_t = \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z)$$
where $W_{xr}, W_{hr}, W_{xz}, W_{hz}, b_r, b_z$ are learnable parameters and $\sigma$ is the sigmoid function, which maps variables into [0, 1]; each element of the reset gate and the update gate therefore takes a value in [0, 1];
step 3-2, the gated recurrent unit calculates a candidate hidden state to assist the subsequent hidden state calculation; the candidate hidden state $\tilde{H}_t \in \mathbb{R}^{1 \times h}$ at time step t is defined as:
$$\tilde{H}_t = \tanh(X_t W_{xh} + (R_t \odot H_{t-1}) W_{hh} + b_h)$$
where $W_{xh}, W_{hh}, b_h$ are learnable parameters, $\odot$ denotes element-wise multiplication, and the tanh activation function maps the candidate hidden state $\tilde{H}_t$ into [-1, 1];
in this step, the reset gate determines whether to discard the hidden state of the previous time step; if the reset gate reaches the threshold, the previous hidden state is discarded; the reset gate thus discards historical information that is irrelevant to predicting the future;
step 3-3, the hidden state $H_t \in \mathbb{R}^{1 \times h}$ of the current time step t is defined as:
$$H_t = Z_t \odot H_{t-1} + (1 - Z_t) \odot \tilde{H}_t$$
where the products are element-wise and the update gate controls how the hidden state is updated by the candidate hidden state $\tilde{H}_t$, which contains the current time step information;
step 3-4, updating the gated recurrent unit step by step up to time step T, and outputting the hidden state of the last time step $H_T \in \mathbb{R}^{1 \times h}$ as the hidden feature after the medical sensing data and the personal physique record data are fused.
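Steps 3-1 to 3-4 can be sketched in plain Python as below. This is an illustrative sketch, not the patented implementation: the helper names, the random initialization and the toy dimensions are hypothetical, and the hidden-state update uses the common convention $H_t = Z_t \odot H_{t-1} + (1 - Z_t) \odot \tilde{H}_t$, which is an assumption because the original formula image is not available.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def vec_mat(x: Vector, w: Matrix) -> Vector:
    """Multiply a row vector of length d by a d x h matrix."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def affine(x: Vector, h: Vector, w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    """Compute x W_x + h W_h + b."""
    return [a + c + d for a, c, d in zip(vec_mat(x, w_x), vec_mat(h, w_h), b)]


def gru_step(x_t: Vector, h_prev: Vector, p: Dict[str, list]) -> Vector:
    """One time step: reset gate, update gate, candidate state, new state."""
    r_t = [sigmoid(v) for v in affine(x_t, h_prev, p["W_xr"], p["W_hr"], p["b_r"])]
    z_t = [sigmoid(v) for v in affine(x_t, h_prev, p["W_xz"], p["W_hz"], p["b_z"])]
    gated = [r * h for r, h in zip(r_t, h_prev)]  # element-wise R_t * H_{t-1}
    cand = [math.tanh(v) for v in affine(x_t, gated, p["W_xh"], p["W_hh"], p["b_h"])]
    # Assumed convention: keep z of the old state, take (1 - z) of the candidate.
    return [z * h + (1.0 - z) * c for z, h, c in zip(z_t, h_prev, cand)]


def init_params(d_in: int, h: int, seed: int = 0) -> Dict[str, list]:
    """Hypothetical small random initialization, for the demo only."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    params: Dict[str, list] = {}
    for gate in ("r", "z", "h"):
        params["W_x" + gate] = mat(d_in, h)
        params["W_h" + gate] = mat(h, h)
        params["b_" + gate] = [0.0] * h
    return params


def fuse(x_m: Matrix, x_r: Vector, params: Dict[str, list], h: int) -> Vector:
    """Run the recurrence over all rows of x_m and return the last state."""
    h_t = [0.0] * h  # step 3-1: zero initial hidden state
    for row in x_m:
        h_t = gru_step(row + x_r, h_t, params)  # input is [X_M^t, X_R]
    return h_t  # step 3-4: hidden feature H_T


x_m = [[0.5, 0.2, 1.0, 0.0], [0.5, 0.3, 0.0, 1.0], [0.7, 0.3, 1.0, 0.0]]
x_r = [1.0, 0.0, 0.35]
d_in, hidden = len(x_m[0]) + len(x_r), 5
H_T = fuse(x_m, x_r, init_params(d_in, hidden), hidden)
```

The static vector is repeated at every time step, which is how the input dimension $2d_m + d_r$ arises. In practice a deep learning framework's GRU layer would replace these hand-written loops.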
Physiological characteristic condition data X in step 4 of the invention D In the step (1), the first step,
Figure BDA0003808713670000043
n is the number of samples of the physiological characteristic condition data, and K is the number of physiological characteristic data categories in the physiological characteristic condition data; in any type of physiological characteristic data j, j is more than or equal to 1 and less than or equal to K; in the other type of physiological characteristic data K, K is more than or equal to 1 and less than or equal to K.
The method for constructing the physiological characteristic incidence matrix I in step 4 comprises the following steps:
step 4-1, counting the number of positive samples $S_k$ of each type of physiological characteristic data, where k = 1, 2, ..., K, by the formula:
$$S_k = \sum_{n=1}^{N} y_{nk}$$
where $y_{nk}$ is a binary value indicating whether the n-th individual has physiological characteristic data k, 1 meaning that it is present and 0 meaning that it is absent; N is the total number of samples after data preprocessing;
step 4-2, calculating the positive sample rate $P(S_k)$ of each type of physiological characteristic data by the formula:
$$P(S_k) = \frac{S_k}{N}$$
step 4-3, counting the number of individuals having any two types of physiological characteristic data; let $S_{jk}$ represent the number of individuals having both physiological characteristic data j and physiological characteristic data k, then:
$$S_{jk} = \sum_{n=1}^{N} y_{nj} \, y_{nk}$$
$$P(S_{jk}) = \frac{S_{jk}}{N}$$
where $y_{nj}$ and $y_{nk}$ are binary values indicating whether the n-th individual has physiological characteristic data j and physiological characteristic data k, respectively, and $P(S_{jk})$ represents the joint discrimination probability of physiological characteristic data j and physiological characteristic data k;
step 4-4, calculating the physiological characteristic incidence matrix I; because different physiological characteristics differ greatly in their data, and to avoid the problem that specific physiological characteristic data yields small values when correlations are calculated, conditional probability is used to calculate the influence $I_{jk}$ of physiological characteristic data j on physiological characteristic data k, namely:
$$I_{jk} = P(S_k \mid S_j) = \frac{P(S_{jk})}{P(S_j)}$$
$$I = \{ I_{jk} \mid 1 \le j, k \le K \}$$
where $P(S_j)$ represents the probability that physiological characteristic data j occurs in the total sample set preprocessed in step 1, and $P(S_{jk})$ represents the probability that physiological characteristic data j and physiological characteristic data k both occur in that sample set.
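The four sub-steps of step 4 can be sketched as follows, assuming the label matrix is a list of N rows with K binary entries each. The function name is hypothetical, and returning 0 for a class with no positive samples is an assumption, since the text does not say how that case is handled.

```python
from typing import List


def incidence_matrix(labels: List[List[int]]) -> List[List[float]]:
    """Build I with I[j][k] = P(S_jk) / P(S_j) from binary labels y_nk."""
    n = len(labels)
    k_classes = len(labels[0])
    # Step 4-1: positive sample count per class.
    s = [sum(row[j] for row in labels) for j in range(k_classes)]
    matrix = []
    for j in range(k_classes):
        p_j = s[j] / n  # step 4-2: positive sample rate
        row = []
        for k in range(k_classes):
            # Step 4-3: individuals having both class j and class k.
            s_jk = sum(r[j] * r[k] for r in labels)
            p_jk = s_jk / n
            # Step 4-4: conditional probability; 0.0 if class j never occurs
            # (assumed fallback, not stated in the source).
            row.append(p_jk / p_j if s[j] > 0 else 0.0)
        matrix.append(row)
    return matrix


labels = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
]
I = incidence_matrix(labels)
```

In this toy data, class 2 only occurs together with class 1, so the influence of class 2 on class 1 is 1, while the influence of class 1 on class 2 is only 1/3: the matrix is not symmetric.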
The method for obtaining the final physiological characteristic data classification result in step 5 of the invention comprises the following steps:
step 5-1, converting the hidden feature $H_T$ through the fully connected network into the class discrimination probability C of the K classes of physiological characteristic data, by the formula:
$$C = f(W_c H_T + b_c)$$
where f is the fully connected neural network, $W_c$ and $b_c$ are learnable parameters, and the discrimination probability $C \in \mathbb{R}^{K}$;
step 5-2, performing matrix multiplication of the discrimination probability C and the physiological characteristic incidence matrix I so that the dependency relationships among physiological characteristic data are fused into the model, and applying a sigmoid activation function to obtain the final physiological characteristic data classification result $\hat{y}$:
$$\hat{y} = \sigma(C \cdot I)$$
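Step 5 can be sketched as below. Several points are assumptions: the fully connected network f is reduced to a single linear layer with no activation, C is treated as a row vector so that the product with I sums over the source class j, and all names and numbers are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def dense(h_t: Vector, w_c: Matrix, b_c: Vector) -> Vector:
    """Step 5-1 with f taken as the identity: C = W_c H_T + b_c (W_c is K x h)."""
    return [sum(w * x for w, x in zip(row, h_t)) + b for row, b in zip(w_c, b_c)]


def classify(h_t: Vector, w_c: Matrix, b_c: Vector, incidence: Matrix) -> Vector:
    """Step 5-2: multiply C by the incidence matrix, then apply the sigmoid."""
    c = dense(h_t, w_c, b_c)
    k_classes = len(c)
    # (C I)_k = sum over j of C_j * I_jk: class j contributes to class k
    # in proportion to its influence on k.
    fused = [sum(c[j] * incidence[j][k] for j in range(k_classes))
             for k in range(k_classes)]
    return [sigmoid(v) for v in fused]


h_t = [0.5, -0.5]
w_c = [[1.0, 0.0], [0.0, 1.0]]
b_c = [0.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
coupled = [[1.0, 0.5], [0.0, 1.0]]  # hypothetical: class 0 influences class 1

y_plain = classify(h_t, w_c, b_c, identity)
y_coupled = classify(h_t, w_c, b_c, coupled)
```

With the identity matrix the result is just the sigmoid of C. With the coupled matrix, the positive score of class 0 raises the output for class 1, which is the correction effect the incidence matrix is meant to provide.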
The medical sensing data in the invention comprises: capillary refill rate, inspired oxygen concentration, Glasgow Coma Scale eye opening, Glasgow Coma Scale motor response, Glasgow Coma Scale total score, Glasgow Coma Scale verbal response, diastolic pressure, systolic pressure, mean blood pressure, blood glucose, heart rate, blood oxygen saturation, respiratory rate, body temperature, height, weight, and pH.
The personal physique record data comprises: sex, age and race.
Beneficial effects:
1. To address the difficulties that health perception data in electronic health records comes from many sources, has a complex structure, and is correlated across data types, a multi-source health perception data fusion model is adopted: fusion learning is performed on the multi-source health perception data with a gated recurrent unit (GRU) network architecture, hidden features of the multi-source heterogeneous data are mined, and physiological characteristic data analysis is achieved.
2. To address the heterogeneity between multi-source medical sensing data and personal physique record data, data filling and mask operations are used to map the heterogeneous data into the same representation space.
3. To address the correlation and mutual exclusion among physiological characteristic data in the physiological characteristic data analysis task, an incidence matrix is constructed to correct the classification result, which improves the classification accuracy of the model.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of the method of the present invention.
FIG. 3 is a schematic diagram of the fusion learning method for medical sensing data and personal physique record data based on a gated recurrent unit.
FIG. 4 is a comparison chart of algorithm errors in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present application and not to limit it.
In one embodiment, the invention provides a physiological characteristic data analysis method based on multi-source health perception data fusion, as shown in fig. 1, the method comprises the following steps:
step 1, acquiring medical sensing data and personal physique record data, and preprocessing the data to eliminate noise data;
here, the health perception data includes:
medical sensing data, including capillary refill rate, inspired oxygen concentration, Glasgow Coma Scale eye opening, Glasgow Coma Scale motor response, Glasgow Coma Scale total score, Glasgow Coma Scale verbal response, diastolic pressure, systolic pressure, mean blood pressure, blood glucose, heart rate, blood oxygen saturation, respiratory rate, body temperature, height, weight, and pH;
personal physique record data, including sex, age and race;
step 2, filling the medical sensing data into regular sequence data at equal time intervals, adding a binary mask vector indicating whether the feature value in each time interval is a real measurement, and concatenating the binary mask vector with the processed medical sensing data to obtain a medical sensing data matrix $X_M$ of the corresponding size;
step 3, fusing and learning, based on a gated recurrent unit (GRU) network, the hidden feature $H_T$ of the medical sensing data matrix $X_M$ and the personal physique record data vector $X_R$;
step 4, based on the physiological characteristic condition data $X_D$, calculating the influence coefficient of any one type of physiological characteristic data j on another type of physiological characteristic data k by a conditional probability method, and constructing the physiological characteristic incidence matrix I;
step 5, converting the hidden feature $H_T$ through a fully connected network into the discrimination probability C of multi-class physiological characteristic data, and multiplying the discrimination probability C by the physiological characteristic incidence matrix I to obtain the final physiological characteristic data classification result.
Further, in one embodiment, as shown in fig. 2:
in step 1, the data is preprocessed to eliminate noise data, and the specific process comprises:
step 1-1, removing data samples that lack any one of the medical sensing data, the personal physique record data and the physiological characteristic condition data;
step 1-2, deleting outlier data and, from the remaining data, data whose time series length is less than 24 hours;
step 1-3, normalizing the remaining data, and converting discrete attribute data into one-hot values by One-Hot Encoding of the categorical discrete features.
Further, in one embodiment, filling the medical sensing data into regular sequence data at equal time intervals in step 2, and adding a binary mask vector indicating whether the feature value in each time interval is a real measurement, includes:
step 2-1, resampling the medical sensing data at intervals of 1 hour, and if several measurements of the same feature fall in the same time interval, using the last measurement;
step 2-2, for data with missing values, replacing a missing value with the most recent earlier measurement if one exists, and otherwise with a preset value;
step 2-3, adding a binary mask vector that marks real data and filled data, and concatenating it with the processed medical sensing data to obtain the medical sensing data matrix $X_M \in \mathbb{R}^{T \times 2d_m}$, where T is the length of the time series and $d_m$ is the feature dimension of the medical sensing data.
Further, in one embodiment, the hidden features of the medical sensing data matrix and the personal physique record data vector are fused and learned based on the gated recurrent unit network in step 3, as shown in fig. 3, and the specific learning process is as follows:
step 3-1, initializing the hidden state $H_0 \in \mathbb{R}^{1 \times h}$ as a zero vector; at each time step t (t = 1, 2, ..., T), concatenating the vectors $[X_M^{t}, X_R]$ as the input of the gated recurrent unit, denoted
$$X_t = [X_M^{t}, X_R] \in \mathbb{R}^{1 \times d_{in}}$$
where $X_M^{t} \in \mathbb{R}^{1 \times 2d_m}$ represents the medical sensing data at time t, $X_R \in \mathbb{R}^{1 \times d_r}$ represents the personal physique record data, $d_r$ is the feature dimension of the personal physique record data, and $d_{in} = 2d_m + d_r$. Assuming the number of hidden units is h, and given the hidden state of the previous time step $H_{t-1} \in \mathbb{R}^{1 \times h}$, the reset gate $R_t \in \mathbb{R}^{1 \times h}$ and the update gate $Z_t \in \mathbb{R}^{1 \times h}$ are calculated as follows:
$$R_t = \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r)$$
$$Z_t = \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z)$$
where $W_{xr}, W_{hr}, W_{xz}, W_{hz}, b_r, b_z$ are learnable parameters and $\sigma$ is the sigmoid function, which maps variables into [0, 1]. Each element of the reset gate and the update gate therefore takes a value in [0, 1].
Step 3-2, the gated recurrent unit calculates a candidate hidden state to assist the later hidden state calculation. In this step, the reset gate determines whether to discard the hidden state of the previous time step. The candidate hidden state $\tilde{H}_t \in \mathbb{R}^{1 \times h}$ at time step t is defined as follows:
$$\tilde{H}_t = \tanh(X_t W_{xh} + (R_t \odot H_{t-1}) W_{hh} + b_h)$$
where $W_{xh}, W_{hh}, b_h$ are learnable parameters, $\odot$ denotes element-wise multiplication, and the tanh activation function maps the candidate hidden state $\tilde{H}_t$ into [-1, 1]. The reset gate is a vector with elements between 0 and 1 that measures how far each gate is open: if the gate value for an element is 0, that element's information is completely forgotten, and if the reset gate is less than 0.1, the previous hidden state is discarded. The reset gate can therefore discard historical information that is irrelevant to predicting the future.
Step 3-3, the hidden state $H_t \in \mathbb{R}^{1 \times h}$ of the current time step t is defined as:
$$H_t = Z_t \odot H_{t-1} + (1 - Z_t) \odot \tilde{H}_t$$
As can be seen from the formula, the update gate controls how the hidden state is updated by the candidate hidden state $\tilde{H}_t$, which contains the current time step information.
Step 3-4, updating the gated recurrent unit step by step up to time step T, and outputting the hidden state of the last time step $H_T \in \mathbb{R}^{1 \times h}$ as the hidden feature after the sensing data and the record data are fused.
Further, in one embodiment, the physiological characteristic condition data of step 4 is $X_D \in \mathbb{R}^{N \times K}$, where N is the number of data samples and K is the number of physiological characteristic data categories. The influence coefficient of any one type of physiological characteristic data j (1 ≤ j ≤ K) on another type of physiological characteristic data k (1 ≤ k ≤ K) is calculated by a conditional probability method, and the physiological characteristic incidence matrix I is constructed, specifically as follows:
step 4-1, counting the number of positive samples $S_k$ (k = 1, 2, ..., K) of each type of physiological characteristic data by the formula:
$$S_k = \sum_{n=1}^{N} y_{nk}$$
where $y_{nk}$ is a binary value indicating whether the n-th individual has physiological characteristic data k: 1 means that the individual has it, and 0 means that the individual does not.
Step 4-2, calculating the positive sample rate $P(S_k)$ of each type of physiological characteristic data by the formula:
$$P(S_k) = \frac{S_k}{N}$$
Step 4-3, counting the number of individuals having any two types of physiological characteristic data. Let $S_{jk}$ represent the number of individuals having both physiological characteristic data j and physiological characteristic data k, then:
$$S_{jk} = \sum_{n=1}^{N} y_{nj} \, y_{nk}$$
$$P(S_{jk}) = \frac{S_{jk}}{N}$$
where $y_{nj}$ and $y_{nk}$ are binary values indicating whether the n-th individual has physiological characteristic data j and physiological characteristic data k, respectively, and $P(S_{jk})$ represents the joint discrimination probability of physiological characteristic data j and physiological characteristic data k.
Step 4-4, calculating the physiological characteristic incidence matrix I. Because the category features of different physiological characteristic data differ greatly, and to avoid the problem that physiological characteristic data of a specific category yields small values when correlations are calculated, conditional probability is used to calculate the influence $I_{jk}$ of physiological characteristic data j on physiological characteristic data k, namely:
$$I_{jk} = P(S_k \mid S_j) = \frac{P(S_{jk})}{P(S_j)}$$
$$I = \{ I_{jk} \mid 1 \le j, k \le K \}$$
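Substituting the definitions from steps 4-2 and 4-3 shows that the sample count N cancels, so each entry of I is a ratio of two counts. The short derivation below assumes $S_j > 0$; the numbers in the worked example are hypothetical and chosen only to show that the matrix is generally not symmetric.

```latex
\begin{aligned}
I_{jk} &= \frac{P(S_{jk})}{P(S_j)}
        = \frac{S_{jk}/N}{S_j/N}
        = \frac{S_{jk}}{S_j}, \\
I_{jj} &= \frac{S_{jj}}{S_j}
        = \frac{\sum_{n=1}^{N} y_{nj}^{2}}{\sum_{n=1}^{N} y_{nj}}
        = 1 \quad \text{since } y_{nj} \in \{0, 1\}. \\[4pt]
\text{Example: } & S_j = 200,\; S_k = 50,\; S_{jk} = 40
  \;\Rightarrow\; I_{jk} = \tfrac{40}{200} = 0.2,\quad
                  I_{kj} = \tfrac{40}{50} = 0.8 .
\end{aligned}
```

A rare class k that mostly co-occurs with a common class j therefore has a large influence on j's score, while j has only a small influence on k.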
further, in one embodiment, step 5 hides feature H over a fully connected network T Converting the physiological characteristic data into discrimination probability C of multi-class physiological characteristic data, and multiplying the discrimination probability C by a physiological characteristic data association matrix I to obtain a final physiological characteristic data classification result, wherein the specific method comprises the following steps:
step 5-1, byFully connected network will hide feature H T The discrimination probability C of the class converted into the K-class physiological characteristic data is calculated by the following formula:
C=f(W c H T +b c )
wherein f is a fully-connected neural network, W c And b c To learn parameters, determine probabilities
Figure BDA0003808713670000095
Step 5-2, performing matrix multiplication operation on the discrimination probability C and the physiological characteristic incidence matrix I, fusing the physiological characteristic data dependency relationship into a model, and obtaining a final physiological characteristic data classification result by using a sigmoid activation function
Figure BDA0003808713670000096
Figure BDA0003808713670000097
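Steps 5-1 and 5-2 can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the invention: a single linear layer stands in for the fully connected network f, C is treated as a row vector multiplied on the left of I, and the names `classify`, `weights`, `bias` and `incidence` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def classify(hidden, weights, bias, incidence):
    """hidden: H_T (length h); weights: K x h (W_c); bias: length K (b_c); incidence: K x K (I)."""
    k = len(bias)
    # Step 5-1: C = f(W_c H_T + b_c); one linear layer stands in for f (assumption).
    c = [sum(w * x for w, x in zip(weights[i], hidden)) + bias[i] for i in range(k)]
    # Step 5-2: row vector C times matrix I, followed by an element-wise sigmoid.
    fused = [sum(c[j] * incidence[j][m] for j in range(k)) for m in range(k)]
    return [sigmoid(v) for v in fused]


# Two hidden units, two categories; category 0 raises the score of category 1.
result = classify([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                  [[1.0, 0.5], [0.0, 1.0]])
```

In this toy example the off-diagonal entry 0.5 adds half of the score for category 0 to the score for category 1 before the sigmoid, which is how the dependency between categories enters the final result.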
As a specific example, in one of the embodiments, the invention is further described.
In this example, the large clinical database MIMIC-III (v1.4) from the United States was selected to test the model of the present invention. The database contains 60,000 hospitalization records of more than 40,000 adult patients (aged 16 and over) who were admitted to intensive care units between 2001 and 2012, and each record contains multiple kinds of medical sensing data, hospitalization record data and physiological characteristic status data. Statistically, only 881 patients (about 2.10%) had medical sensing data with an ultra-long time series (i.e., time series length > 480). Because truncation has little impact on model accuracy, the experiment truncates the sample length to a reasonable limit (i.e., 480) to reduce the time and space overhead of model training.
In this embodiment, several currently popular physiological characteristic data classification methods are selected as baselines for comparison experiments: logistic regression (LR), attention-based clinical time series analysis (SAnD), long short-term memory recurrent neural networks (LSTM), interpretable clinical health status representation learning based on scale-adaptive feature extraction and recalibration (AdaCare), and transfer learning for clinical time series analysis based on deep neural networks (TimeNet-Eps). The method of the present invention is denoted MHSDF.
FIG. 4 is a graph comparing the algorithms in one embodiment. Physiological characteristic data classification is performed with each method, and Micro AUC-ROC, Macro AUC-ROC and Weighted AUC-ROC are used as evaluation indexes. The horizontal axis represents the different physiological characteristic data classification methods, and the vertical axis represents the evaluation index values (Micro AUC-ROC, Macro AUC-ROC, Weighted AUC-ROC). It can be seen that the MHSDF model proposed by the invention is superior to the other methods.
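For readers unfamiliar with the three evaluation indexes, the sketch below shows one common way to compute them for multi-label predictions. The source does not state how the averages were computed, so the definitions here are assumptions following common practice: micro pools all label/score pairs, macro is the unweighted mean of the per-class values, and weighted is the mean weighted by the number of positive samples per class. The function names are hypothetical, and each class is assumed to contain at least one positive and one negative sample.

```python
def auc(y_true, y_score):
    """Pairwise AUC-ROC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def multilabel_auc(labels, scores):
    """labels, scores: N x K nested lists. Returns (micro, macro, weighted) AUC-ROC."""
    k = len(labels[0])
    cols_y = [[row[j] for row in labels] for j in range(k)]
    cols_s = [[row[j] for row in scores] for j in range(k)]
    per_class = [auc(y, s) for y, s in zip(cols_y, cols_s)]
    support = [sum(y) for y in cols_y]  # number of positive samples per class
    # Micro: flatten every (label, score) pair into one binary problem.
    micro = auc([v for row in labels for v in row], [v for row in scores for v in row])
    # Macro: every class counts equally.
    macro = sum(per_class) / k
    # Weighted: classes count in proportion to their positive samples.
    weighted = sum(a * w for a, w in zip(per_class, support)) / sum(support)
    return micro, macro, weighted


labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.4, 0.5]]
micro, macro, weighted = multilabel_auc(labels, scores)
```

The pairwise formulation is quadratic in the number of samples and is chosen here only for clarity; a rank-based computation would be preferable on a data set of the size described above.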
Among them, LR performs worst, because it relies on features extracted by statistical methods, and such manually extracted features ignore the temporal order of the data. The second worst is the SAnD algorithm, which models clinical time series data with a masked self-attention mechanism and learns the temporal characteristics of the data using positional encoding and a dense interpolation strategy, but is limited in extracting long-term features. The LSTM, AdaCare and TimeNet-Eps methods are based on recurrent neural networks and perform relatively well, but still fall below the MHSDF method of the invention. This is because the LSTM and AdaCare methods do not take into account the influence of individual physical condition on physiological characteristic data or the mutual influence among the various physiological characteristic data, whereas the TimeNet-Eps method cannot distinguish missing values from measured values in time series data when mapping variable-length time series to fixed-dimension feature vectors. This demonstrates that the method of the present invention is effective for classifying physiological characteristic data from multi-source health perception data.
The method performs fusion learning on multi-source heterogeneous health perception data and makes full use of the potential correlations among physiological characteristic data. It achieves a better detection effect in computations on massive health perception data, and the comparison with other related algorithms further verifies that it classifies physiological characteristic data more accurately.
In a specific implementation, the present application provides a computer storage medium and a corresponding data processing unit. The computer storage medium can store a computer program which, when executed by the data processing unit, can carry out the inventive content of the physiological characteristic data analysis method based on multi-source health perception data fusion provided by the present invention, as well as some or all of the steps in each embodiment. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
It is obvious to those skilled in the art that the technical solutions in the embodiments of the present invention can be implemented by means of a computer program and its corresponding general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a computer program, that is, a software product, which may be stored in a storage medium and include several instructions for enabling a device (which may be a personal computer, a server, a single chip microcomputer, an MUU, or a network device, etc.) including a data processing unit to execute the method according to each embodiment or some portions of the embodiments of the present invention.
The invention provides an idea and a method for physiological characteristic data analysis based on multi-source health perception data fusion, and there are many ways to implement this technical solution. All components not specified in this embodiment can be implemented using the prior art.

Claims (10)

1. A physiological characteristic data analysis method based on multi-source health perception data fusion is characterized by comprising the following steps:
step 1, obtaining health perception data; the health perception data includes: medical sensing data, personal physique record data and physiological characteristic condition data; recording the time series length of data acquisition of the health perception data; preprocessing the health perception data and eliminating noise data to obtain a total preprocessed sample, wherein the total number of samples is N; mathematically expressing and defining the personal physique record data to obtain a personal physique record data vector $X_R$;
step 2, filling the medical sensing data preprocessed in step 1 into regular sequence data at the same time interval, adding a binary mask vector indicating whether the feature value corresponding to each time interval is a real measured value, and splicing the binary mask vector with the processed medical sensing data to obtain a medical sensing data matrix $X_M$;
step 3, learning, based on a gated recurrent unit network, the hidden feature $H_T$ that fuses the medical sensing data matrix $X_M$ and the personal physique record data vector $X_R$;
step 4, based on the physiological characteristic condition data $X_D$, calculating the influence coefficient of any type of physiological characteristic data j on another type of physiological characteristic data k by a conditional probability method, and constructing a physiological characteristic incidence matrix I;
step 5, converting the hidden feature $H_T$ into the discrimination probability C of the multi-class physiological characteristic data through a fully connected network, and performing a calculation on the discrimination probability C and the physiological characteristic incidence matrix I to obtain a final physiological characteristic data classification result.
2. The method for analyzing physiological feature data based on multi-source health perception data fusion according to claim 1, wherein the preprocessing of the data in step 1 is performed to eliminate noise data, and the specific process includes:
step 1-1, removing health perception data samples which lack any one of medical sensing data, personal physique record data and physiological characteristic condition data in the health perception data;
step 1-2, deleting, from the remaining health perception data, the data whose acquisition time series length $T_c$ is less than 24 hours;
step 1-3, normalizing the remaining data and encoding the discrete attribute data as one-hot data.
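The preprocessing of steps 1-1 to 1-3 can be sketched as follows. The record layout (a dict with the keys `sensing`, `record`, `condition` and `hours`) and the function names are hypothetical, and min-max scaling is assumed for the normalization, since the claim does not name a specific normalization method.

```python
def filter_samples(samples):
    """samples: list of dicts with keys 'sensing', 'record', 'condition', 'hours' (hypothetical)."""
    # Step 1-1: drop samples lacking any of the three kinds of data.
    kept = [s for s in samples
            if all(s.get(key) for key in ("sensing", "record", "condition"))]
    # Step 1-2: drop samples whose acquisition length T_c is under 24 hours.
    return [s for s in kept if s["hours"] >= 24]


def min_max(values):
    """Step 1-3 (assumed min-max form): scale one continuous feature to [0, 1]."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def one_hot(value, categories):
    """Step 1-3: encode one discrete attribute value as a one-hot vector."""
    return [1 if value == c else 0 for c in categories]


samples = [
    {"sensing": [[80.0]], "record": {"age": 60}, "condition": [0, 1], "hours": 30},
    {"sensing": [[75.0]], "record": {}, "condition": [0, 1], "hours": 30},       # no record data
    {"sensing": [[90.0]], "record": {"age": 50}, "condition": [1, 0], "hours": 12},  # too short
]
kept = filter_samples(samples)
```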
3. The method for analyzing physiological feature data based on multi-source health perception data fusion according to claim 2, wherein the method for obtaining the medical sensing data matrix $X_M$ in step 2 comprises:
step 2-1, resampling the medical sensing data at 1-hour intervals, and if multiple measured values of the same feature exist within the same time interval, using the last measured value;
step 2-2, for medical sensing data with missing values, if a measured value exists at an earlier time, replacing the missing value with the most recent earlier measured value; otherwise, using a preset value;
step 2-3, adding a binary mask vector to distinguish real data from filled data, and splicing the binary mask vector with the medical sensing data processed in step 2-1 and step 2-2 to obtain a medical sensing data matrix
Figure FDA0003808713660000021
Figure FDA0003808713660000022
wherein $T_d$ is the length of the time series after data processing, $d_m$ is the feature dimension of the medical sensing data, and
Figure FDA0003808713660000023
represents the vector space of the medical sensing data matrix.
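Steps 2-1 to 2-3 can be illustrated with a short sketch. The input layout (a list of `(hour, feature_index, value)` tuples), the function name `build_sensing_matrix` and the `defaults` argument holding the preset values are assumptions made for the example. Each output row has length 2 * d_m: the filled feature values followed by the binary mask.

```python
def build_sensing_matrix(events, num_hours, num_features, defaults):
    """events: list of (hour, feature_index, value); returns T_d rows of length 2 * d_m."""
    values = [[None] * num_features for _ in range(num_hours)]
    # Step 2-1: bucket measurements into 1-hour intervals; the last one in an interval wins.
    for hour, feat, val in sorted(events, key=lambda e: e[0]):
        slot = int(hour)
        if 0 <= slot < num_hours:
            values[slot][feat] = val
    matrix = []
    last = list(defaults)  # preset values, used until a first measurement is seen
    for row in values:
        # Mask: 1 marks a real measurement, 0 marks a filled value.
        mask = [0 if v is None else 1 for v in row]
        # Step 2-2: forward-fill from the most recent earlier measurement.
        last = [prev if v is None else v for prev, v in zip(last, row)]
        # Step 2-3: concatenate the filled values with the binary mask.
        matrix.append(list(last) + mask)
    return matrix


# Two features (e.g. heart rate, body temperature) over three hours.
events = [(0.2, 0, 80.0), (0.7, 0, 82.0), (2.5, 1, 36.6)]
x_m = build_sensing_matrix(events, num_hours=3, num_features=2, defaults=[75.0, 37.0])
```

Keeping the mask alongside the filled values lets a downstream model tell a carried-forward value apart from a fresh measurement.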
4. The physiological feature data analysis method based on multi-source health perception data fusion according to claim 3, wherein the fusion learning method in step 3 comprises:
step 3-1, initializing the hidden state
Figure FDA0003808713660000024
as a zero vector; at each time step t, where t = 1, 2, ..., T and T is the time series length of $X_M$, the vector
Figure FDA0003808713660000025
is taken as the input of the gated recurrent unit and is denoted
Figure FDA0003808713660000026
wherein
Figure FDA0003808713660000027
represents the medical sensing data at time t,
Figure FDA0003808713660000028
represents the personal physique record data, $d_r$ is the feature dimension of the personal physique record data, the fused dimension of the dynamic and static data is $d_{in} = 2d_m + d_r$, and the number of hidden units is h; given the hidden state of the previous time step
Figure FDA0003808713660000029
the reset gate
Figure FDA00038087136600000210
and the update gate
Figure FDA00038087136600000211
Figure FDA00038087136600000212
are calculated as follows:
$$R_t = \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r)$$
$$Z_t = \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z)$$
wherein $W_{xr}$, $W_{hr}$, $W_{xz}$, $W_{hz}$, $b_r$, $b_z$ are learnable parameters, and σ is the sigmoid function, which maps its input into [0, 1]; each element of the reset gate and the update gate therefore takes a value in [0, 1];
step 3-2, calculating, by the gated recurrent unit, a candidate hidden state that assists the subsequent hidden state calculation; the candidate hidden state at time step t
Figure FDA00038087136600000213
is defined as:
Figure FDA00038087136600000214
wherein $W_{xh}$, $W_{hh}$, $b_h$ are learnable parameters, and the tanh activation function maps the values of the candidate hidden state
Figure FDA00038087136600000215
into [-1, 1];
in this step, the reset gate determines whether to discard the hidden state of the previous time step; if the reset gate reaches the threshold, the previous hidden state is discarded; the reset gate thus discards historical information that is irrelevant to predicting the future;
step 3-3, the hidden state of the current time step t
Figure FDA00038087136600000216
is defined as:
Figure FDA0003808713660000031
wherein the update gate controls how the hidden state is updated by the candidate hidden state containing the current time step information;
step 3-4, through continuous updating over the time steps of the gated recurrent unit, the hidden state of the last time step T
Figure FDA0003808713660000032
is output as the hidden feature after fusion of the medical sensing data and the personal physique record data.
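The fusion procedure of steps 3-1 to 3-4 can be sketched in plain Python. The reset and update gates follow the formulas given in the claim; the candidate hidden state and the final blending are assumed to take the standard gated recurrent unit form, tanh(X_t W_xh + (R_t * H_{t-1}) W_hh + b_h) and Z_t * H_{t-1} + (1 - Z_t) * candidate, because the corresponding formulas are not reproduced in the text. The function names and the `params` dict are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(x, w_x, h, w_h, b):
    """Compute x @ w_x + h @ w_h + b for row vectors x (length d_in) and h (hidden size)."""
    return [sum(xi * w_x[i][j] for i, xi in enumerate(x))
            + sum(hi * w_h[i][j] for i, hi in enumerate(h)) + b[j]
            for j in range(len(b))]


def gru_fuse(x_m, x_r, p):
    """x_m: T rows of length 2*d_m; x_r: static vector of length d_r; returns H_T."""
    h = [0.0] * len(p["b_r"])          # step 3-1: H_0 is the zero vector
    for row in x_m:
        x = list(row) + list(x_r)      # X_t = [X_t^M, X_R], dimension 2*d_m + d_r
        r = [sigmoid(v) for v in affine(x, p["w_xr"], h, p["w_hr"], p["b_r"])]
        z = [sigmoid(v) for v in affine(x, p["w_xz"], h, p["w_hz"], p["b_z"])]
        gated = [ri * hi for ri, hi in zip(r, h)]
        # Step 3-2: candidate hidden state (standard GRU form, assumed here).
        cand = [math.tanh(v) for v in affine(x, p["w_xh"], gated, p["w_hh"], p["b_h"])]
        # Step 3-3: the update gate blends the previous state with the candidate.
        h = [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]
    return h                           # step 3-4: fused hidden feature H_T


def zero_params(d_in, size):
    """All-zero parameters, useful only for checking shapes and data flow."""
    p = {}
    for gate in ("r", "z", "h"):
        p["w_x" + gate] = [[0.0] * size for _ in range(d_in)]
        p["w_h" + gate] = [[0.0] * size for _ in range(size)]
        p["b_" + gate] = [0.0] * size
    return p
```

In practice the static vector is concatenated with every time step, as in the loop above, so the same personal physique information conditions each gate computation.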
5. The method for analyzing physiological characteristic data based on multi-source health perception data fusion according to claim 4, wherein, for the physiological characteristic condition data $X_D$ in step 4,
Figure FDA0003808713660000033
N is the number of samples of the physiological characteristic condition data, and K is the number of categories of physiological characteristic data in the physiological characteristic condition data; for any type of physiological characteristic data j, 1 ≤ j ≤ K; for the other type of physiological characteristic data k, 1 ≤ k ≤ K.
6. The method for analyzing physiological characteristic data based on multi-source health perception data fusion according to claim 5, wherein the method for constructing the physiological characteristic incidence matrix I in the step 4 comprises:
step 4-1, counting the number $S_k$ of positive samples of each type of physiological characteristic data, where k = 1, 2, ..., K, calculated by the following formula:
$$S_k = \sum_{n=1}^{N} y_{nk}$$
wherein $y_{nk}$ is a binary value indicating whether the n-th individual has physiological characteristic data k, 1 indicating that the individual has the physiological characteristic data and 0 indicating that the individual does not; N represents the total number of samples after data preprocessing;
step 4-2, calculating the positive sample rate $P(S_k)$ of each type of physiological characteristic data, calculated by the following formula:
$$P(S_k) = \frac{S_k}{N}$$
step 4-3, counting the number of individuals who have any two types of physiological characteristic data at the same time; letting $S_{jk}$ denote the number of individuals having both physiological characteristic data j and physiological characteristic data k, then:
$$S_{jk} = \sum_{n=1}^{N} y_{nj} \, y_{nk}$$
$$P(S_{jk}) = \frac{S_{jk}}{N}$$
wherein $y_{nj}$ and $y_{nk}$ are binary values indicating whether the n-th individual has physiological characteristic data j and physiological characteristic data k, respectively, and $P(S_{jk})$ denotes the joint probability of physiological characteristic data j and physiological characteristic data k;
step 4-4, calculating the physiological characteristic incidence matrix I; the influence $I_{jk}$ of physiological characteristic data j on physiological characteristic data k is calculated using conditional probability, namely:
$$I_{jk} = \frac{P(S_{jk})}{P(S_j)}$$
$$I = \{ I_{jk} \mid 1 \le j, k \le K \}$$
wherein $P(S_j)$ denotes the probability that physiological characteristic data j occurs in the total sample preprocessed in step 1, and $P(S_{jk})$ denotes the probability that physiological characteristic data j and physiological characteristic data k occur together in the total sample preprocessed in step 1.
7. The method for analyzing physiological characteristic data based on multi-source health perception data fusion according to claim 6, wherein the method for obtaining the final physiological characteristic data classification result in step 5 comprises:
step 5-1, converting the hidden feature $H_T$ into the discrimination probability C of the K classes of physiological characteristic data through the fully connected network, calculated by the following formula:
$$C = f(W_c H_T + b_c)$$
wherein f is a fully connected neural network, $W_c$ and $b_c$ are learnable parameters, and the discrimination probability is
Figure FDA0003808713660000042
step 5-2, multiplying the discrimination probability C by the physiological characteristic incidence matrix I, thereby fusing the dependency relationships among the physiological characteristic data into the model, and applying a sigmoid activation function to obtain the final physiological characteristic data classification result
Figure FDA0003808713660000043
Figure FDA0003808713660000044
8. The method for analyzing physiological feature data based on multi-source health perception data fusion according to claim 7, wherein the medical sensing data comprises: capillary refill rate, fraction of inspired oxygen, Glasgow Coma Scale eye opening, Glasgow Coma Scale motor response, Glasgow Coma Scale total score, Glasgow Coma Scale verbal response, diastolic blood pressure, systolic blood pressure, mean blood pressure, blood glucose, heart rate, blood oxygen saturation, respiratory rate, body temperature, height, weight, and pH.
9. The method for analyzing physiological characteristic data based on multi-source health perception data fusion according to claim 8, wherein the personal physique record data includes: sex, age and race.
10. The method for analyzing physiological characteristic data based on multi-source health perception data fusion according to claim 9, wherein the final physiological characteristic data classification result obtained in step 5-2 is the result of the physiological characteristic data analysis based on multi-source health perception data fusion.
CN202211005027.0A 2022-08-22 2022-08-22 Physiological characteristic data analysis method based on multi-source health perception data fusion Pending CN115376638A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211005027.0A CN115376638A (en) 2022-08-22 2022-08-22 Physiological characteristic data analysis method based on multi-source health perception data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211005027.0A CN115376638A (en) 2022-08-22 2022-08-22 Physiological characteristic data analysis method based on multi-source health perception data fusion

Publications (1)

Publication Number Publication Date
CN115376638A true CN115376638A (en) 2022-11-22

Family

ID=84067848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211005027.0A Pending CN115376638A (en) 2022-08-22 2022-08-22 Physiological characteristic data analysis method based on multi-source health perception data fusion

Country Status (1)

Country Link
CN (1) CN115376638A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116911354A (en) * 2023-09-14 2023-10-20 首都信息发展股份有限公司 Encoder neural network model construction method and data processing method

Similar Documents

Publication Publication Date Title
JP6522161B2 (en) Medical data analysis method based on deep learning and intelligent analyzer thereof
CN109036553B (en) Disease prediction method based on automatic extraction of medical expert knowledge
CN110020623B (en) Human body activity recognition system and method based on conditional variation self-encoder
CN113040711B (en) Cerebral apoplexy incidence risk prediction system, equipment and storage medium
CN106934235A (en) Patient's similarity measurement migratory system between a kind of disease areas based on transfer learning
CN111180068A (en) Chronic disease prediction system based on multi-task learning model
CN113096818B (en) Method for evaluating occurrence probability of acute diseases based on ODE and GRUD
CN108399434B (en) Analysis and prediction method of high-dimensional time series data based on feature extraction
CN110659677A (en) Human body falling detection method based on movable sensor combination equipment
CN114512239B (en) Cerebral apoplexy risk prediction method and system based on transfer learning
CN110767279A (en) Electronic health record missing data completion method and system based on LSTM
Wang et al. Diabetes Risk Analysis Based on Machine Learning LASSO Regression Model
CN110491506A (en) Auricular fibrillation prediction model and its forecasting system
CN115376638A (en) Physiological characteristic data analysis method based on multi-source health perception data fusion
CN114504298B (en) Physiological characteristic discriminating method and system based on multisource health perception data fusion
CN114191665A (en) Method and device for classifying man-machine asynchronous phenomena in mechanical ventilation process
CN112259232B (en) VTE risk automatic evaluation system based on deep learning
Li et al. Predicting Parkinson's Disease with Multimodal Irregularly Collected Longitudinal Smartphone Data
Siddiqa et al. Robust Length of Stay Prediction Model for Indoor Patients.
CN115147768B (en) Fall risk assessment method and system
CN115171896A (en) System and method for predicting long-term death risk of critically ill patient
CN114613497A (en) Intelligent medical auxiliary diagnosis method of patient sample based on GBDT sample level
CN111466877B (en) LSTM network-based oxygen reduction state prediction method
CN111243697A (en) Method and system for judging target object data based on neural network
CN115512833B (en) Establishment of long-term cost effectiveness prediction system for lung cancer patient based on deep learning Markov framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination