CN115376638A - Physiological characteristic data analysis method based on multi-source health perception data fusion - Google Patents
- Publication number
- CN115376638A (application number CN202211005027.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- physiological characteristic
- characteristic data
- medical sensing
- health perception
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Artificial Intelligence (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Pathology (AREA)
- Databases & Information Systems (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The invention discloses a physiological characteristic data analysis method based on multi-source health perception data fusion, which comprises the following steps: acquiring health perception data; preprocessing the data; obtaining a personal physique record data vector; adding binary mask vectors to the preprocessed medical sensing data and splicing them to obtain a medical sensing data matrix; fusing and learning hidden features of the medical sensing data matrix and the personal physique record data vector on the basis of a gated recurrent unit (GRU) network; constructing a physiological characteristic incidence matrix based on the physiological characteristic condition data; and converting the hidden features into discrimination probabilities of multi-class physiological characteristics through a fully connected network, and calculating the final physiological characteristic classification result. The method can perform fusion learning on multi-source heterogeneous health perception data and makes full use of the potential correlation among physiological characteristic data. The method achieves a better data analysis effect in computations based on massive health perception data.
Description
Technical Field
The invention relates to a data classification method, in particular to a physiological characteristic data analysis method based on multi-source health perception data fusion.
Background
In recent years, sensing technology has developed rapidly in China. Medical sensing instruments have emerged, and new technologies such as biosensing have been put into use, so clinical monitoring equipment continues to move toward faster and more accurate measurement. Because China has a large population, hundreds of millions of people are admitted to hospital and treated every year, which inevitably generates a large number of Electronic Health Records (EHRs) containing medical sensing data, hospital information, personal physical condition records, physiological characteristic data records and the like. The complexity of electronic health record data, however, makes the data difficult to process and use. With the continuing progress of artificial intelligence technology and the growth of computing power, physiological characteristic data analysis and risk assessment based on electronic health record data have become possible, which provides a good opportunity for intelligent medical data analysis.
To make better use of the large volume of electronic health record data, data with different structures must be processed and analyzed. Most previous work studies only data with similar structures. For example, R Mohammad et al. select a large amount of time-series vital sign data and use logistic regression and recurrent neural network models to predict data characteristics; Ma L et al. propose a model that learns long-term and short-term changes in a patient's physiological characteristic data as clinical features, and evaluate the patient's physiological characteristic data at different time stages using medical sensing data; Ayon si et al. learn from multiple measurement data using deep neural networks and use the result for data prediction.
Although much existing research analyzes physiological characteristic data from health perception data, this work only considers a single type of data and the extraction of its features, and does not fuse multiple types of data to exploit the synergy among them. Moreover, the medical sensing data in electronic health records are time-series data acquired by various devices whose sampling frequencies differ greatly, so the acquired data are heterogeneous in the time dimension. Personal physique record data (such as age and sex) are non-time-series data, so they are also heterogeneous with respect to the medical sensing data. How to fuse and model these heterogeneous data is a difficult point of the physiological characteristic data analysis task.
Disclosure of Invention
Object of the invention: the technical problem to be solved by the invention is to provide, in view of the shortcomings of the prior art, a physiological characteristic data analysis method based on multi-source health perception data fusion.
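In order to solve the technical problem, the invention discloses a physiological characteristic data analysis method based on multi-source health perception data fusion, which comprises the following steps:
Step 1, obtaining health perception data; the health perception data includes medical sensing data, personal physique record data and physiological characteristic condition data; recording the time series length of data acquisition of the health perception data; preprocessing the health perception data and eliminating noise data to obtain a total preprocessed sample, wherein the total number of samples is N; obtaining a personal physique record data vector $X_R$ by mathematically expressing and defining the personal physique record data;
Step 2, filling the medical sensing data preprocessed in step 1 into regular sequence data at the same time interval, adding a binary mask vector indicating whether the characteristic value corresponding to each time interval is a real measurement value, and splicing the binary mask vector with the processed medical sensing data to obtain a medical sensing data matrix $X_M$;
Step 3, fusing and learning the hidden feature $H_T$ of the medical sensing data matrix $X_M$ and the personal physique record data vector $X_R$ based on a gated recurrent unit network;
Step 4, based on the physiological characteristic condition data $X_D$, calculating the influence coefficient of any one type of physiological characteristic data j on another type of physiological characteristic data k by a conditional probability method, and constructing a physiological characteristic incidence matrix I;
Step 5, converting the hidden feature $H_T$ into the discrimination probability C of multi-class physiological characteristic data through a fully connected network, and calculating the final physiological characteristic data classification result from the discrimination probability C and the physiological characteristic incidence matrix I.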
The method for preprocessing the data to eliminate the noise data in the step 1 comprises the following specific processes:
step 1-1, removing health perception data samples which lack any one of medical sensing data, personal physique record data and physiological characteristic condition data in the health perception data;
Step 1-2, deleting, from the remaining health perception data, data whose acquisition time series length $T_c$ is less than 24 hours;
Step 1-3, normalizing the remaining data and converting the discrete attribute data into one-hot encoded data (converting categorical discrete features into one-hot codes is a common processing method in the prior art).
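Step 1-3 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not say which normalization is used, so min-max scaling is an assumption, and the function names and toy values are hypothetical.

```python
import numpy as np

def min_max_normalize(values):
    """Scale a 1-D array of continuous values to [0, 1] (assumed normalization)."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:                      # constant column: avoid division by zero
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

def one_hot(labels, categories):
    """Encode discrete labels as one-hot rows, in the order given by `categories`."""
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(labels), len(categories)))
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1.0
    return encoded

# Toy personal physique records for three individuals (hypothetical values).
ages = min_max_normalize([20, 50, 80])
sexes = one_hot(["F", "M", "F"], categories=["F", "M"])

# One row per individual: normalized age followed by the one-hot sex code.
X_R = np.hstack([ages[:, None], sexes])
```

Each row of `X_R` then plays the role of one individual's personal physique record data vector.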
The method for obtaining the medical sensing data matrix $X_M$ in step 2 of the invention comprises the following steps:
step 2-1, resampling the medical sensing data at intervals of 1 hour, and if a plurality of measurement values of the same characteristic exist in the same time interval, using the last measurement value;
step 2-2, for the medical sensing data with missing values, if the missing values have measured values in the previous time, replacing the missing values with the previous latest measured values, otherwise, using the preset values;
Step 2-3, adding a binary mask vector to mark real data and filled data, and splicing the binary mask vector with the medical sensing data processed in step 2-1 and step 2-2 to obtain the medical sensing data matrix $X_M \in \mathbb{R}^{T_d \times 2d_m}$, wherein $T_d$ is the length of the time series after data processing, $d_m$ is the dimension of the medical sensing data features, and $\mathbb{R}^{T_d \times 2d_m}$ represents the medical sensing data matrix vector space.
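Steps 2-1 to 2-3 can be sketched as follows for a single individual. This is a hedged sketch: the function name, the use of NaN to mark "not measured", and the toy values are assumptions made for illustration only.

```python
import numpy as np

def build_sensing_matrix(times_h, values, n_hours, default):
    """Resample irregular measurements onto an hourly grid and append a mask.

    times_h : measurement times in hours since the start of the record, shape (n,)
    values  : measured features, shape (n, d_m); NaN marks "not measured"
    default : preset fill value per feature, shape (d_m,)
    Returns an array of shape (n_hours, 2 * d_m): filled values, then the mask.
    """
    values = np.asarray(values, dtype=float)
    d_m = values.shape[1]
    grid = np.full((n_hours, d_m), np.nan)

    # Step 2-1: walk the measurements in time order, so the last value
    # measured inside an hourly interval overwrites earlier ones.
    for i in np.argsort(times_h, kind="stable"):
        hour = int(times_h[i])
        if 0 <= hour < n_hours:
            observed = ~np.isnan(values[i])
            grid[hour, observed] = values[i, observed]

    # Step 2-3 (mask part): 1 marks a real measurement, 0 marks a filled value.
    mask = (~np.isnan(grid)).astype(float)

    # Step 2-2: carry the latest earlier measurement forward, or use the
    # preset value when nothing has been measured yet.
    last = np.asarray(default, dtype=float).copy()
    for t in range(n_hours):
        observed = ~np.isnan(grid[t])
        last[observed] = grid[t, observed]
        grid[t] = last

    # Step 2-3 (splicing part): values and mask side by side.
    return np.hstack([grid, mask])

# Two features (say heart rate and body temperature) over four hours.
times_h = [0.2, 0.7, 2.5]
values = [[80.0, np.nan], [82.0, np.nan], [np.nan, 37.0]]
X_M = build_sensing_matrix(times_h, values, n_hours=4, default=[75.0, 36.6])
```

Splicing the mask next to the values doubles the feature width, which is why the fused input dimension in step 3 contains the term 2·d_m.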
The fusion learning method in step 3 of the invention comprises the following steps:
Step 3-1, initializing the hidden state $H_0 \in \mathbb{R}^{1 \times h}$ as a zero vector; at each time step t, where t = 1, 2, ..., T and T is the length of the time series of $X_M$, the vector pair $(X_M^t, X_R)$ is taken as the input of the gated recurrent unit and denoted $X_t = [X_M^t, X_R] \in \mathbb{R}^{1 \times d_{in}}$, wherein $X_M^t \in \mathbb{R}^{1 \times 2d_m}$ represents the medical sensing data at time t, $X_R \in \mathbb{R}^{1 \times d_r}$ represents the personal physique record data, $d_r$ is the feature dimension of the personal physique record data, and the dynamic and static data fusion dimension is $d_{in} = 2d_m + d_r$; the number of hidden units is h; given the hidden state of the previous time step $H_{t-1} \in \mathbb{R}^{1 \times h}$, the reset gate $R_t \in \mathbb{R}^{1 \times h}$ and the update gate $Z_t \in \mathbb{R}^{1 \times h}$ are calculated as follows:
$R_t = \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r)$
$Z_t = \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z)$
wherein $W_{xr}, W_{hr}, W_{xz}, W_{hz}, b_r, b_z$ are learnable parameters and $\sigma$ is the sigmoid function, which maps variables into [0, 1]; the value of each element in the reset gate and the update gate therefore lies in [0, 1];
Step 3-2, the gated recurrent unit calculates a candidate hidden state to assist the subsequent hidden state calculation; the candidate hidden state $\tilde{H}_t \in \mathbb{R}^{1 \times h}$ at time step t is defined as:
$\tilde{H}_t = \tanh(X_t W_{xh} + (R_t \odot H_{t-1}) W_{hh} + b_h)$
wherein $W_{xh}, W_{hh}, b_h$ are learnable parameters, $\odot$ denotes element-wise multiplication, and the tanh activation function maps the candidate hidden state $\tilde{H}_t$ into [-1, 1];
in this step, the reset gate determines whether to discard the hidden state of the previous time step; if the reset gate reaches the threshold, the previous hidden state is discarded; the reset gate thereby discards historical information that is irrelevant to predicting the future;
Step 3-3, the update gate controls the hidden state, which is updated with the candidate hidden state containing the current time step information; in the standard gated recurrent unit form this update is:
$H_t = Z_t \odot H_{t-1} + (1 - Z_t) \odot \tilde{H}_t$
Step 3-4, iterating the gated recurrent unit over all T time steps and outputting the hidden state of the last time step, $H_T$, as the hidden feature after the medical sensing data and the personal physique record data are fused.
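The recurrence of steps 3-1 to 3-4 can be sketched in NumPy as below. This is an illustrative forward pass only, under stated assumptions: the weights are random rather than learned, the function names and dimensions are hypothetical, and the hidden-state update uses the common GRU form H_t = Z_t ⊙ H_{t-1} + (1 − Z_t) ⊙ H̃_t, which is an assumption about step 3-3.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(d_in, h, rng):
    """Random (untrained) parameters, ordered W_x*, W_h*, b_* for the
    reset gate, the update gate and the candidate hidden state."""
    params = []
    for _ in range(3):
        params += [rng.normal(scale=0.1, size=(d_in, h)),
                   rng.normal(scale=0.1, size=(h, h)),
                   np.zeros((1, h))]
    return params

def gru_fuse(X_M, X_R, params):
    """Run a GRU over X_M (T x 2*d_m), appending the static vector X_R (d_r,)
    to the input at every time step; returns the last hidden state (1 x h)."""
    W_xr, W_hr, b_r, W_xz, W_hz, b_z, W_xh, W_hh, b_h = params
    H = np.zeros((1, W_hr.shape[0]))                   # step 3-1: H_0 = 0
    for t in range(X_M.shape[0]):
        X_t = np.concatenate([X_M[t], X_R])[None, :]   # fused input, (1, d_in)
        R = sigmoid(X_t @ W_xr + H @ W_hr + b_r)       # reset gate
        Z = sigmoid(X_t @ W_xz + H @ W_hz + b_z)       # update gate
        H_cand = np.tanh(X_t @ W_xh + (R * H) @ W_hh + b_h)  # step 3-2
        H = Z * H + (1.0 - Z) * H_cand                 # step 3-3 (assumed form)
    return H                                           # step 3-4: H_T

# Hypothetical sizes: 48 hourly steps, 17 sensing features, 3 static features.
rng = np.random.default_rng(0)
T, d_m, d_r, h = 48, 17, 3, 8
X_M = rng.random((T, 2 * d_m))
X_R = rng.random(d_r)
H_T = gru_fuse(X_M, X_R, init_params(2 * d_m + d_r, h, rng))
```

Repeating the static vector at every step is one simple way to let the same gates see both the time-varying and the non-time-series inputs, which matches the fused dimension d_in = 2·d_m + d_r.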
Physiological characteristic condition data X in step 4 of the invention D In the step (1), the first step,n is the number of samples of the physiological characteristic condition data, and K is the number of physiological characteristic data categories in the physiological characteristic condition data; in any type of physiological characteristic data j, j is more than or equal to 1 and less than or equal to K; in the other type of physiological characteristic data K, K is more than or equal to 1 and less than or equal to K.
The method for constructing the physiological characteristic incidence matrix I in the step 4 comprises the following steps:
step 4-1, counting the number S of positive samples of various physiological characteristic data k Wherein K =1,2, \8230andk, the calculation formula is:
wherein, y nk Whether the nth individual has the binary value of the physiological characteristic data k, 1 represents that the physiological characteristic data k exists, and 0 represents that the physiological characteristic data k does not exist; n represents the total number of samples after data preprocessing;
step 4-2, calculating the positive sample rate P (S) of various physiological characteristic data k ) The calculation formula is as follows:
4-3, counting the number of people with any two types of physiological characteristic data; order S jk Representing the number of individuals having both physiological characteristic data j and physiological characteristic data k, then:
wherein, y nj And y nk Binary values, P (S), of whether the nth individual has physiological characteristic data j and physiological characteristic data k, respectively jk ) Representing the common discrimination probability of the physiological characteristic data j and the physiological characteristic data k;
4-4, calculating a physiological characteristic incidence matrix I; because the data difference of different physiological characteristics is large, in order to avoid the problem that the numerical value of specific physiological characteristic data is small when calculating the correlation relationship, the conditional probability is adopted to calculate the influence I of the physiological characteristic data j on the physiological characteristic data k jk Namely:
I={I jk |1≤j,k≤K}
wherein, P (S) j ) Represents the probability of j physiological characteristic data in the total sample after the pretreatment in the step 1, P (S) jk ) Representing the probability of j physiological characteristic data and k physiological characteristic data in the total sample after the pretreatment in the step 1.
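Steps 4-1 to 4-4 reduce to a few array operations. The sketch below assumes the label matrix is stored as an N x K binary array; the function name and the toy labels are hypothetical, and setting a row to zero when a category has no positive samples is an assumption the source does not address.

```python
import numpy as np

def incidence_matrix(Y):
    """Conditional-probability matrix I with I[j, k] = P(S_jk) / P(S_j)."""
    Y = np.asarray(Y, dtype=float)
    N = Y.shape[0]
    S = Y.sum(axis=0)                 # step 4-1: positive samples per category
    P = S / N                         # step 4-2: positive sample rate P(S_k)
    P_joint = (Y.T @ Y) / N           # step 4-3: P(S_jk) from co-occurrence counts
    # Step 4-4: divide row j by P(S_j); rows with P(S_j) = 0 stay at zero.
    return np.divide(P_joint, P[:, None],
                     out=np.zeros_like(P_joint), where=P[:, None] > 0)

# Four individuals, three categories (toy labels).
Y = [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [1, 1, 0]]
I = incidence_matrix(Y)
```

The matrix is generally not symmetric: in this toy data every individual with category 3 also has category 2, but only one in three individuals with category 2 has category 3.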
The method for obtaining the final physiological characteristic data classification result in step 5 of the invention comprises the following steps:
Step 5-1, converting the hidden feature $H_T$ into the class discrimination probability C of the K classes of physiological characteristic data through the fully connected network, by the formula:
$C = f(W_c H_T + b_c)$
wherein f is a fully connected neural network, $W_c$ and $b_c$ are learnable parameters, and the discrimination probability $C \in \mathbb{R}^{K}$;
Step 5-2, performing a matrix multiplication of the discrimination probability C and the physiological characteristic incidence matrix I, thereby fusing the dependency relationships among physiological characteristic data into the model, and obtaining the final physiological characteristic data classification result with a sigmoid activation function: $\hat{Y} = \sigma(C I)$
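Step 5 can be sketched as below. Several choices here are assumptions rather than the patent's specification: f is taken to be an element-wise sigmoid over a single linear layer, vectors are stored as rows (so the layer is written H_T W_c + b_c), and all names and sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classify(H_T, W_c, b_c, I):
    """Map the hidden feature to K class scores, then correct them with I.

    H_T : (1, h) hidden feature from the recurrent network
    W_c : (h, K) weights, b_c : (1, K) bias  (row-vector layout, assumed)
    I   : (K, K) physiological characteristic incidence matrix
    """
    C = sigmoid(H_T @ W_c + b_c)      # step 5-1: discrimination probability
    return sigmoid(C @ I)             # step 5-2: fuse dependencies, then sigmoid

# Toy check with an all-zero hidden feature and an identity incidence matrix.
h, K = 4, 3
y_hat = classify(np.zeros((1, h)), np.ones((h, K)), np.zeros((1, K)), np.eye(K))
```

With an identity incidence matrix each class keeps only its own score; off-diagonal entries of I let a confident prediction for one class raise the score of the classes that co-occur with it.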
The medical sensing data in the invention comprises: capillary refill rate, inspired oxygen concentration, Glasgow coma scale eye opening, Glasgow coma scale motor response, Glasgow coma scale total score, Glasgow coma scale verbal response, diastolic pressure, systolic pressure, mean blood pressure, blood glucose, heart rate, blood oxygen saturation, respiratory rate, body temperature, height, weight, and pH.
The personal physique record data comprises: sex, age and race.
Advantageous effects:
1. To address the difficulties that health perception data in electronic health records come from many sources, have complex data structures, and are correlated across data types, a multi-source health perception data fusion model is adopted. Fusion learning is performed on the multi-source health perception data on the basis of a gated recurrent unit network architecture, hidden features of the multi-source heterogeneous data are mined, and physiological characteristic data analysis is achieved.
2. To solve the heterogeneity between multi-source medical sensing data and personal physique record data, data filling and mask operations are used to map the heterogeneous data into the same representation space.
3. To handle the correlation and mutual exclusion among physiological characteristic data in the physiological characteristic data analysis task, an incidence matrix is constructed to correct the classification result, which improves the classification accuracy of the model.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of the method of the present invention.
FIG. 3 is a schematic diagram of the fusion learning method for medical sensing data and personal physique record data based on a gated recurrent unit.
FIG. 4 is a graph comparing algorithm errors in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad application.
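In one embodiment, the invention provides a physiological characteristic data analysis method based on multi-source health perception data fusion; as shown in FIG. 1, the method comprises the following steps:
Step 1, obtaining health perception data; the health perception data includes medical sensing data, personal physique record data and physiological characteristic condition data; recording the time series length of data acquisition of the health perception data; preprocessing the health perception data and eliminating noise data to obtain a total preprocessed sample, wherein the total number of samples is N; obtaining a personal physique record data vector $X_R$ by mathematically expressing and defining the personal physique record data;
Here, the health perception data includes:
medical sensing data, including capillary refill rate, inspired oxygen concentration, Glasgow coma scale eye opening, Glasgow coma scale motor response, Glasgow coma scale total score, Glasgow coma scale verbal response, diastolic pressure, systolic pressure, mean blood pressure, blood glucose, heart rate, blood oxygen saturation, respiratory rate, body temperature, height, weight and pH;
personal physique record data, including gender, age and race;
Step 2, filling the medical sensing data preprocessed in step 1 into regular sequence data at the same time interval, adding a binary mask vector indicating whether the characteristic value corresponding to each time interval is a real measurement value, and splicing the binary mask vector with the processed medical sensing data to obtain a medical sensing data matrix $X_M$;
Step 3, fusing and learning the hidden feature $H_T$ of the medical sensing data matrix $X_M$ and the personal physique record data vector $X_R$ based on a gated recurrent unit network;
Step 4, based on the physiological characteristic condition data $X_D$, calculating the influence coefficient of any one type of physiological characteristic data j on another type of physiological characteristic data k by a conditional probability method, and constructing a physiological characteristic incidence matrix I;
Step 5, converting the hidden feature $H_T$ into the discrimination probability C of multi-class physiological characteristic data through a fully connected network, and multiplying the discrimination probability C by the physiological characteristic incidence matrix I to obtain the final physiological characteristic data classification result.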
Further, in one embodiment, as shown in fig. 2:
in step 1, the data is preprocessed to eliminate noise data, and the specific process comprises the following steps:
step 1-1, removing a data sample lacking any one of medical sensing data, personal physique record data and physiological characteristic condition data;
Step 1-2, deleting outlier data and, from the remaining data, data whose time series length is less than 24 hours;
Step 1-3, normalizing the remaining data and converting the discrete attribute data into one-hot encoded data by converting categorical discrete features into one-hot codes.
Further, in one embodiment, the populating of the medical sensing data into the regular sequence data at the same time interval in step 2, and adding a binary mask vector indicating whether the feature value corresponding to each time interval is a true measurement value, includes:
step 2-1, resampling the medical sensing data at intervals of 1 hour, and if a plurality of measurement values of the same characteristic exist in the same time interval, using the last measurement value;
step 2-2, for the data with missing values, if the missing values have measured values in the previous time, replacing the missing values with the previous latest measured values, otherwise, using the preset values;
Step 2-3, adding binary mask vectors to mark real data and filled data, and splicing them with the processed medical sensing data to obtain the medical sensing data matrix $X_M \in \mathbb{R}^{T \times 2d_m}$, wherein T is the length of the time series and $d_m$ is the medical sensing data feature dimension.
Further, in one embodiment, the hidden features of the medical sensing data matrix and the personal physique record data vector are fused and learned based on the gated recurrent unit network in step 3, as shown in FIG. 3, and the specific learning process is as follows:
Step 3-1, initializing the hidden state $H_0 \in \mathbb{R}^{1 \times h}$ as a zero vector; at each time step t (t = 1, 2, ..., T), the vector pair $(X_M^t, X_R)$ is taken as the input of the gated recurrent unit and denoted $X_t = [X_M^t, X_R] \in \mathbb{R}^{1 \times d_{in}}$, wherein $X_M^t \in \mathbb{R}^{1 \times 2d_m}$ represents the medical sensing data at time t, $X_R \in \mathbb{R}^{1 \times d_r}$ represents the personal physique record data, $d_r$ is the feature dimension of the personal physique record data, and $d_{in} = 2d_m + d_r$. Assuming the number of hidden units is h, and given the hidden state of the previous time step $H_{t-1} \in \mathbb{R}^{1 \times h}$, the reset gate $R_t \in \mathbb{R}^{1 \times h}$ and the update gate $Z_t \in \mathbb{R}^{1 \times h}$ are calculated as follows:
$R_t = \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r)$
$Z_t = \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z)$
wherein $W_{xr}, W_{hr}, W_{xz}, W_{hz}, b_r, b_z$ are learnable parameters and $\sigma$ is the sigmoid function, which maps variables into [0, 1]. Thus, the value of each element in the reset gate and the update gate lies in [0, 1].
Step 3-2, the gated recurrent unit computes a candidate hidden state to assist the later hidden state computation. In this step, the reset gate determines whether to discard the hidden state of the previous time step. The candidate hidden state $\tilde{H}_t \in \mathbb{R}^{1 \times h}$ at time step t is defined as follows:
$\tilde{H}_t = \tanh(X_t W_{xh} + (R_t \odot H_{t-1}) W_{hh} + b_h)$
wherein $W_{xh}, W_{hh}, b_h$ are learnable parameters, $\odot$ denotes element-wise multiplication, and the tanh activation function maps the candidate hidden state $\tilde{H}_t$ into [-1, 1]. The reset gate is a vector of values between 0 and 1 that measures how far each gate is open: if the gating value for an element is 0, the information of that element is completely forgotten, and if the reset gate is less than 0.1, the previous hidden state is discarded. Thus, the reset gate can discard historical information that is irrelevant to predicting the future.
Step 3-3, the update gate controls how the hidden state is updated by the candidate hidden state containing the current time step information; in the standard gated recurrent unit form this update is:
$H_t = Z_t \odot H_{t-1} + (1 - Z_t) \odot \tilde{H}_t$
As can be seen from the formula, the update gate controls how much of the previous hidden state is kept and how much is replaced by the candidate hidden state.
Step 3-4, iterating the gated recurrent unit over all T time steps and outputting the hidden state of the last time step, $H_T$, as the hidden feature after the sensing data and the record data are fused.
Further, in one embodiment, for the physiological characteristic condition data $X_D \in \{0, 1\}^{N \times K}$ of step 4, N is the number of data samples and K is the number of physiological characteristic data categories; the influence coefficient of any one type of physiological characteristic data j (1 ≤ j ≤ K) on another type of physiological characteristic data k (1 ≤ k ≤ K) is calculated by a conditional probability method, and a physiological characteristic incidence matrix I is constructed, specifically as follows:
Step 4-1, counting the number of positive samples $S_k$ (k = 1, 2, ..., K) of each type of physiological characteristic data by the formula:
$S_k = \sum_{n=1}^{N} y_{nk}$
wherein $y_{nk}$ is the binary value indicating whether the nth individual has physiological characteristic data k: 1 indicates that the nth individual has physiological characteristic data k, and 0 indicates that the individual does not.
Step 4-2, calculating the positive sample rate $P(S_k)$ of each type of physiological characteristic data by the formula:
$P(S_k) = \frac{S_k}{N}$
Step 4-3, counting the number of individuals having any two types of physiological characteristic data at the same time. Let $S_{jk}$ represent the number of individuals having both physiological characteristic data j and physiological characteristic data k, then:
$S_{jk} = \sum_{n=1}^{N} y_{nj} y_{nk}$
$P(S_{jk}) = \frac{S_{jk}}{N}$
wherein $y_{nj}$ and $y_{nk}$ are the binary values indicating whether the nth individual has physiological characteristic data j and physiological characteristic data k, respectively, and $P(S_{jk})$ represents the joint discrimination probability of physiological characteristic data j and physiological characteristic data k.
Step 4-4, calculating the physiological characteristic incidence matrix I. Because the categories of physiological characteristic data differ greatly, and in order to avoid the values for a specific category becoming too small when the correlation is calculated, the conditional probability is used to calculate the influence $I_{jk}$ of physiological characteristic data j on physiological characteristic data k, namely:
$I_{jk} = \frac{P(S_{jk})}{P(S_j)}$
$I = \{ I_{jk} \mid 1 \le j, k \le K \}$
Further, in one embodiment, step 5 converts the hidden feature $H_T$ into the discrimination probability C of multi-class physiological characteristic data through a fully connected network, and multiplies the discrimination probability C by the physiological characteristic incidence matrix I to obtain the final physiological characteristic data classification result, specifically as follows:
Step 5-1, converting the hidden feature $H_T$ into the class discrimination probability C of the K classes of physiological characteristic data through the fully connected network, by the formula:
$C = f(W_c H_T + b_c)$
wherein f is a fully connected neural network, $W_c$ and $b_c$ are learnable parameters, and the discrimination probability $C \in \mathbb{R}^{K}$.
Step 5-2, performing a matrix multiplication of the discrimination probability C and the physiological characteristic incidence matrix I, thereby fusing the dependency relationships among physiological characteristic data into the model, and obtaining the final physiological characteristic data classification result with a sigmoid activation function: $\hat{Y} = \sigma(C I)$
As a specific example, in one of the embodiments, the invention is further described.
In this example, the large clinical database MIMIC-III (V1.4) from the United States was selected for the experiment to test the model of the invention. The database contains 60,000 hospitalization records of over 40,000 adult patients (aged 16 and over) who entered the intensive care unit between 2001 and 2012, and each record contains a variety of medical sensing data, hospitalization record data and physiological characteristic condition data. Statistically, only 881 patients (about 2.10%) had medical sensing data with an ultra-long time series (i.e., time series length > 480). Since this has little impact on model accuracy, the experiment truncates the sample length to a reasonable limit (i.e., 480) to reduce the time and space overhead of model training.
In this embodiment, several currently popular physiological characteristic data classification methods are selected as comparison methods for the comparison experiments: logistic regression (LR), attention-based clinical time series analysis (SAnD), long short-term memory recurrent neural networks (LSTM), interpretable clinical health status representation learning based on scale-adaptive feature extraction and recalibration (AdaCare), and transfer learning for clinical time series analysis based on deep neural networks (TimeNet-Eps). The method of the present invention is denoted MHSDF.
FIG. 4 is a graph showing comparison between algorithm errors in one embodiment, in which Micro AUC-ROC, macro AUC-ROC, and Weighted AUC-ROC are used as evaluation indexes, and physiological characteristic data classification is performed using different methods, respectively, the horizontal axis represents different physiological characteristic data classification methods, and the vertical axis represents evaluation index values (Micro AUC-ROC, macro AUC-ROC, weighted AUC-ROC). It can be seen that: the MHSDF model proposed by the invention is superior to other methods.
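For readers unfamiliar with the three evaluation indexes, the sketch below shows how micro, macro and weighted AUC-ROC differ for multi-label predictions. It follows the usual definitions (micro pools all label decisions, macro averages per-class AUC equally, weighted averages per-class AUC by the number of positive samples); the function names and toy scores are hypothetical and are not the experiment's data.

```python
import numpy as np

def auc_roc(y_true, y_score):
    """AUC as the probability that a random positive is scored above a
    random negative; ties count one half."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def multilabel_auc(Y_true, Y_score):
    Y_true = np.asarray(Y_true)
    Y_score = np.asarray(Y_score, dtype=float)
    per_class = np.array([auc_roc(Y_true[:, k], Y_score[:, k])
                          for k in range(Y_true.shape[1])])
    support = Y_true.sum(axis=0)      # number of positive samples per class
    return {
        "micro": auc_roc(Y_true.ravel(), Y_score.ravel()),
        "macro": per_class.mean(),
        "weighted": np.average(per_class, weights=support),
    }

# Four individuals, two classes (toy labels and scores).
Y_true = [[1, 0], [0, 1], [1, 0], [0, 0]]
Y_score = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.85], [0.4, 0.1]]
scores = multilabel_auc(Y_true, Y_score)
```

Because rare classes count as much as common ones in the macro average but not in the micro or weighted ones, the three indexes can rank methods differently on imbalanced label sets.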
Among them, LR performs worst, because it is only suited to feature extraction by statistical methods, and manual feature extraction ignores the ordering of the time dimension. The second worst is the SAnD algorithm, which models clinical time series data with a masked self-attention mechanism and learns the temporal characteristics of the data using position encoding and a dense interpolation strategy, but is limited in extracting long-term features. The LSTM, AdaCare and TimeNet-Eps methods are based on recurrent neural networks and perform relatively well, but still fall below the MHSDF method of the invention. This is because the LSTM and AdaCare methods do not take into account the influence of individual physical condition on physiological characteristic data or the mutual influence among the various physiological characteristic data, whereas the TimeNet-Eps method cannot distinguish missing values from measured values in time series data when mapping variable-length time series to fixed-dimension feature vectors. This demonstrates that the method of the present invention is effective for physiological characteristic data classification from multi-source health perception data.
The method can perform fusion learning on multi-source heterogeneous health perception data and makes full use of the potential correlation among physiological characteristic data. The method achieves a better detection effect in computations based on massive health perception data, and the comparison with other related algorithms further verifies that the method classifies physiological characteristic data more accurately.
In a specific implementation, the present application provides a computer storage medium and a corresponding data processing unit, where the computer storage medium is capable of storing a computer program, and the computer program, when executed by the data processing unit, may execute the inventive content of the physiological characteristic data analysis method based on multi-source health awareness data fusion provided by the present invention and some or all of the steps in each embodiment. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like.
It is obvious to those skilled in the art that the technical solutions in the embodiments of the present invention can be implemented by means of a computer program and its corresponding general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a computer program, that is, a software product, which may be stored in a storage medium and include several instructions for enabling a device (which may be a personal computer, a server, a single chip microcomputer, an MUU, or a network device, etc.) including a data processing unit to execute the method according to each embodiment or some portions of the embodiments of the present invention.
The invention provides an idea and a method for physiological characteristic data analysis based on multi-source health perception data fusion, and there are many ways to implement this technical scheme. All the components not specified in this embodiment can be implemented by the prior art.
Claims (10)
1. A physiological characteristic data analysis method based on multi-source health perception data fusion is characterized by comprising the following steps:
step 1, obtaining health perception data; the health perception data includes: medical sensing data, personal physique record data and physiological characteristic condition data; recording the time series length of data acquisition of the health perception data; preprocessing the health perception data and eliminating noise data to obtain the total preprocessed samples, the total number of samples being $N$; obtaining a personal physique record data vector $X_R$ by mathematically expressing and defining the personal physique record data;
step 2, filling the medical sensing data preprocessed in step 1 into regular sequence data at equal time intervals, adding a binary mask vector indicating whether the characteristic value corresponding to each time interval is a real measurement value, and splicing the binary mask vector with the processed medical sensing data to obtain a medical sensing data matrix $X_M$;
step 3, learning, based on a gated recurrent unit network, the fused hidden feature $H_T$ of the medical sensing data matrix $X_M$ and the personal physique record data vector $X_R$;
step 4, based on the physiological characteristic condition data $X_D$, calculating the influence coefficient of any type of physiological characteristic data $j$ on another type of physiological characteristic data $k$ by a conditional probability method, and constructing a physiological characteristic incidence matrix $I$;
step 5, converting the hidden feature $H_T$ into the discrimination probability $C$ of the multiple classes of physiological characteristic data through a fully connected network, and combining the discrimination probability $C$ with the physiological characteristic incidence matrix $I$ to obtain the final physiological characteristic data classification result.
2. The method for analyzing physiological feature data based on multi-source health perception data fusion according to claim 1, wherein the preprocessing of the data in step 1 is performed to eliminate noise data, and the specific process includes:
step 1-1, removing health perception data samples which lack any one of medical sensing data, personal physique record data and physiological characteristic condition data in the health perception data;
step 1-2, deleting, from the remaining health perception data, the data whose data acquisition time series length $T_c$ is less than 24 hours;
step 1-3, performing normalization processing on the remaining data, and processing the discrete attribute data into one-hot encoded data.
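The preprocessing of steps 1-1 to 1-3 can be sketched as follows. This is a minimal illustration and not the patented implementation: the dictionary keys (`sensing`, `record`, `labels`, `hours`), the helper names, and the choice of min-max scaling are all assumptions, since the claim does not specify a data layout or a normalization formula.

```python
import numpy as np

REQUIRED_KEYS = ("sensing", "record", "labels")  # hypothetical field names


def filter_samples(samples, min_hours=24):
    """Steps 1-1 and 1-2: drop incomplete samples and short recordings."""
    kept = []
    for s in samples:
        # Step 1-1: every modality must be present.
        if any(s.get(key) is None for key in REQUIRED_KEYS):
            continue
        # Step 1-2: acquisition length T_c below 24 hours is discarded.
        if s["hours"] < min_hours:
            continue
        kept.append(s)
    return kept


def min_max_normalize(x):
    """Step 1-3 (assumed min-max scaling): scale each column to [0, 1]."""
    x = np.asarray(x, dtype=float)
    low = x.min(axis=0)
    span = x.max(axis=0) - low
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by 0
    return (x - low) / span


def one_hot(values, categories):
    """Step 1-3: encode a discrete attribute as one-hot rows."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out
```

A sample with exactly 24 hours of data is kept, because the claim only removes data of less than 24 hours.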
3. The method for analyzing physiological feature data based on multi-source health perception data fusion according to claim 2, wherein obtaining the medical sensing data matrix $X_M$ in step 2 comprises the following steps:
step 2-1, resampling the medical sensing data at 1 hour intervals, and if a plurality of measurement values with the same characteristic exist in the same time interval, using the last measurement value;
step 2-2, for the medical sensing data with missing values, if the missing values have measured values in the previous time, replacing the missing values with the previous latest measured values, otherwise, using the preset values;
step 2-3, adding a binary mask vector to mark real data and filled data, and splicing the binary mask vector with the medical sensing data processed in step 2-1 and step 2-2 to obtain a medical sensing data matrix $X_M \in \mathbb{R}^{T_d \times 2d_m}$, wherein $T_d$ is the length of the time series after data processing, $d_m$ is the feature dimension of the medical sensing data, and $\mathbb{R}^{T_d \times 2d_m}$ represents the vector space of the medical sensing data matrix.
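Steps 2-1 to 2-3 can be sketched as follows. This is an illustrative sketch only: the event tuple layout `(time_in_hours, feature_index, value)`, the function name `build_sensing_matrix`, and the `defaults` argument holding the preset values are assumptions that do not appear in the claims.

```python
import numpy as np


def build_sensing_matrix(events, n_hours, n_features, defaults):
    """Return X_M of shape (n_hours, 2 * n_features): filled values, then mask.

    events: iterable of (time_in_hours, feature_index, value) measurements.
    defaults: preset value per feature, used when no earlier value exists.
    """
    values = np.full((n_hours, n_features), np.nan)
    # Step 2-1: hourly bins; a later measurement in a bin overwrites an earlier one.
    for t, j, v in sorted(events, key=lambda e: e[0]):
        hour = int(t)
        if 0 <= hour < n_hours:
            values[hour, j] = v
    # Step 2-3 (mask part): 1 marks a real measurement, 0 marks a filled value.
    mask = (~np.isnan(values)).astype(float)
    # Step 2-2: carry the latest measurement forward, else use the preset value.
    last = np.asarray(defaults, dtype=float).copy()
    for hour in range(n_hours):
        observed = mask[hour] == 1
        last[observed] = values[hour, observed]
        values[hour] = last
    # Step 2-3 (splicing part): concatenate values and mask along the feature axis.
    return np.concatenate([values, mask], axis=1)
```

The output has $2d_m$ columns, which matches the fusion dimension $d_{in} = 2d_m + d_r$ used in step 3.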
4. The physiological feature data analysis method based on multi-source health perception data fusion according to claim 3, wherein the fusion learning method in step 3 comprises:
step 3-1, initializing the hidden state $H_0$ as a zero vector; at each time step $t$, where $t = 1, 2, \dots, T_d$ indexes the time series of $X_M$, concatenating the medical sensing data at time $t$ with the personal physique record data $X_R$ into the vector $X_t$, which serves as the input to the gated recurrent unit; wherein $d_r$ is the feature dimension of the personal physique record data, and the dynamic and static data fusion dimension is $d_{in} = 2d_m + d_r$; the number of hidden units is $h$; given the hidden state $H_{t-1}$ of the previous time step, the reset gate $R_t$ and the update gate $Z_t$ are calculated as follows:
$R_t = \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r)$
$Z_t = \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z)$
wherein $W_{xr}, W_{hr}, W_{xz}, W_{hz}, b_r, b_z$ are learnable parameters, and $\sigma$ is the sigmoid function, which maps variables to the interval $[0, 1]$; each element of the reset gate and the update gate therefore lies in $[0, 1]$;
step 3-2, calculating the candidate hidden state by the gated recurrent unit to assist the subsequent hidden state calculation; the candidate hidden state $\tilde{H}_t$ at time step $t$ is defined in the form:
$\tilde{H}_t = \tanh(X_t W_{xh} + (R_t \odot H_{t-1}) W_{hh} + b_h)$
wherein $W_{xh}, W_{hh}, b_h$ are learnable parameters, $\odot$ denotes element-wise multiplication, and the tanh activation function maps the candidate hidden state $\tilde{H}_t$ to $[-1, 1]$;
in this step, the reset gate determines whether to discard the hidden state of the previous time step; if the reset gate reaches the threshold, the last hidden state will be discarded; the reset gate discards historical information that is not relevant to the predicted future;
the update gate controls the hidden state, determining the extent to which the hidden state is updated to the candidate hidden state containing the current time step information;
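The fusion learning of step 3 can be sketched as a forward pass of a gated recurrent unit over the rows of $X_M$, each concatenated with the static vector $X_R$. This is a sketch under stated assumptions rather than the patented implementation: the random weight initialization and the names `init_params` and `gru_fuse` are hypothetical, and the final update rule $H_t = Z_t \odot H_{t-1} + (1 - Z_t) \odot \tilde{H}_t$ is the common GRU convention, assumed here because the text does not spell out the update formula.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d_in, h, seed=0):
    """Hypothetical initializer: small random weights, zero biases."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("r", "z", "h"):
        p["W_x" + gate] = rng.normal(0.0, 0.1, (d_in, h))
        p["W_h" + gate] = rng.normal(0.0, 0.1, (h, h))
        p["b_" + gate] = np.zeros(h)
    return p


def gru_fuse(X_M, X_R, p):
    """Return the last hidden state H_T for one sample.

    X_M: array (T_d, 2 * d_m), dynamic data with mask columns.
    X_R: array (d_r,), static personal physique record vector.
    """
    H = np.zeros(p["b_r"].shape[0])  # H_0 is the zero vector
    for x_t in X_M:
        X_t = np.concatenate([x_t, X_R])  # dynamic and static fusion, d_in entries
        R = sigmoid(X_t @ p["W_xr"] + H @ p["W_hr"] + p["b_r"])  # reset gate
        Z = sigmoid(X_t @ p["W_xz"] + H @ p["W_hz"] + p["b_z"])  # update gate
        H_cand = np.tanh(X_t @ p["W_xh"] + (R * H) @ p["W_hh"] + p["b_h"])
        H = Z * H + (1.0 - Z) * H_cand  # assumed update convention
    return H
```

For example, with $d_m = 3$ and $d_r = 2$ the input dimension is $d_{in} = 2 \cdot 3 + 2 = 8$, so `init_params(8, h)` produces weights of the matching shape.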
5. The method for analyzing physiological characteristic data based on multi-source health perception data fusion according to claim 4, wherein, for the physiological characteristic condition data $X_D$ in step 4, $N$ is the number of samples of the physiological characteristic condition data, and $K$ is the number of categories of physiological characteristic data in the physiological characteristic condition data; for any type of physiological characteristic data $j$, $1 \le j \le K$; for the other type of physiological characteristic data $k$, $1 \le k \le K$.
6. The method for analyzing physiological characteristic data based on multi-source health perception data fusion according to claim 5, wherein the method for constructing the physiological characteristic incidence matrix I in the step 4 comprises:
step 4-1, counting the number $S_k$ of positive samples of each type of physiological characteristic data, where $k = 1, 2, \dots, K$, the calculation formula being:
$S_k = \sum_{n=1}^{N} y_{nk}$
wherein $y_{nk}$ is a binary value indicating whether the $n$-th individual has physiological characteristic data $k$, 1 representing that the individual has the physiological characteristic data and 0 representing that the individual does not; $N$ represents the total number of samples after data preprocessing;
step 4-2, calculating the positive sample rate $P(S_k)$ of each type of physiological characteristic data, the calculation formula being:
$P(S_k) = \dfrac{S_k}{N}$
step 4-3, counting the number of individuals having any two types of physiological characteristic data; letting $S_{jk}$ represent the number of individuals having both physiological characteristic data $j$ and physiological characteristic data $k$, then:
$S_{jk} = \sum_{n=1}^{N} y_{nj}\, y_{nk}, \qquad P(S_{jk}) = \dfrac{S_{jk}}{N}$
wherein $y_{nj}$ and $y_{nk}$ are binary values indicating whether the $n$-th individual has physiological characteristic data $j$ and physiological characteristic data $k$, respectively, and $P(S_{jk})$ represents the joint probability of physiological characteristic data $j$ and physiological characteristic data $k$;
step 4-4, calculating the physiological characteristic incidence matrix $I$; the influence $I_{jk}$ of physiological characteristic data $j$ on physiological characteristic data $k$ is calculated by conditional probability, namely:
$I_{jk} = \dfrac{P(S_{jk})}{P(S_j)}$
$I = \{ I_{jk} \mid 1 \le j, k \le K \}$
wherein $P(S_j)$ represents the probability of physiological characteristic data $j$ occurring in the total samples preprocessed in step 1, and $P(S_{jk})$ represents the probability of physiological characteristic data $j$ and physiological characteristic data $k$ occurring together in the total samples preprocessed in step 1.
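Steps 4-1 to 4-4 amount to estimating the conditional probability of category $k$ given category $j$ from a binary label matrix. A minimal sketch follows, assuming the labels are stored as an $N \times K$ array `Y` with entries $y_{nk}$; the function name `association_matrix` and the handling of categories with no positive samples (their row is left at zero) are assumptions not stated in the claims.

```python
import numpy as np


def association_matrix(Y):
    """Return I with I[j, k] = P(S_jk) / P(S_j) from a binary (N, K) matrix."""
    Y = np.asarray(Y, dtype=float)
    N = Y.shape[0]
    S = Y.sum(axis=0)            # step 4-1: positive sample counts S_k
    P = S / N                    # step 4-2: positive sample rates P(S_k)
    P_joint = (Y.T @ Y) / N      # step 4-3: joint rates P(S_jk)
    I = np.zeros_like(P_joint)
    # Step 4-4: divide row j by P(S_j); rows with P(S_j) == 0 stay zero (assumption).
    np.divide(P_joint, P[:, None], out=I, where=P[:, None] > 0)
    return I
```

Each diagonal entry $I_{jj}$ equals 1 whenever category $j$ has at least one positive sample, since $S_{jj} = S_j$ for binary labels.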
7. The method for analyzing physiological characteristic data based on multi-source health perception data fusion according to claim 6, wherein the method for obtaining the final physiological characteristic data classification result in step 5 comprises:
step 5-1, converting the hidden feature $H_T$ into the discrimination probability $C$ of the $K$ categories of physiological characteristic data through the fully connected network, the calculation formula being:
$C = f(W_c H_T + b_c)$
wherein $f$ is a fully connected neural network, $W_c$ and $b_c$ are learnable parameters, and $C$ is the discrimination probability;
step 5-2, performing a matrix multiplication of the discrimination probability $C$ and the physiological characteristic incidence matrix $I$, thereby fusing the dependency relationships among physiological characteristic data into the model, and obtaining the final physiological characteristic data classification result by applying a sigmoid activation function to the product.
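Steps 5-1 and 5-2 can be sketched as follows. This is only an illustration under assumptions: $f$ is taken to be a single linear layer with no extra activation, $W_c$ is assumed to have shape $(K, h)$, and the function name `classify` is hypothetical; the claims do not fix the depth of the fully connected network or the shapes of its parameters.

```python
import numpy as np


def classify(H_T, W_c, b_c, I):
    """Return per-category scores in (0, 1) for one sample.

    H_T: hidden feature, shape (h,).
    W_c: weights, shape (K, h); b_c: bias, shape (K,) (assumed shapes).
    I:   incidence matrix, shape (K, K), I[j, k] = influence of j on k.
    """
    # Step 5-1: discrimination scores C, with f assumed to be one linear layer.
    C = W_c @ H_T + b_c
    # Step 5-2: entry k of the product sums C[j] * I[j, k] over all j,
    # so each category receives contributions from correlated categories.
    fused = C @ I
    return 1.0 / (1.0 + np.exp(-fused))  # sigmoid activation
```

With an identity incidence matrix the categories are treated as independent and the result reduces to the sigmoid of $C$.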
8. The method for analyzing physiological feature data based on multi-source health perception data fusion according to claim 7, wherein the medical sensing data comprises: capillary refill rate, fraction of inspired oxygen, Glasgow Coma Scale eye opening, Glasgow Coma Scale motor response, Glasgow Coma Scale total score, Glasgow Coma Scale verbal response, diastolic blood pressure, systolic blood pressure, mean blood pressure, blood glucose, heart rate, blood oxygen saturation, respiratory rate, body temperature, height, weight, and pH.
9. The method for analyzing physiological characteristic data based on multi-source health perception data fusion according to claim 8, wherein the personal physique record data includes: sex, age and race.
10. The method for analyzing physiological characteristic data based on multi-source health perception data fusion according to claim 9, wherein the final physiological characteristic data classification result obtained in step 5-2 is the result of the physiological characteristic data analysis based on multi-source health perception data fusion.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211005027.0A CN115376638A (en) | 2022-08-22 | 2022-08-22 | Physiological characteristic data analysis method based on multi-source health perception data fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115376638A (en) | 2022-11-22 |
Family
ID=84067848
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211005027.0A Pending CN115376638A (en) | 2022-08-22 | 2022-08-22 | Physiological characteristic data analysis method based on multi-source health perception data fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115376638A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116911354A (en) * | 2023-09-14 | 2023-10-20 | 首都信息发展股份有限公司 | Encoder neural network model construction method and data processing method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6522161B2 (en) | Medical data analysis method based on deep learning and intelligent analyzer thereof | |
CN109036553B (en) | Disease prediction method based on automatic extraction of medical expert knowledge | |
CN110020623B (en) | Human body activity recognition system and method based on conditional variation self-encoder | |
CN113040711B (en) | Cerebral apoplexy incidence risk prediction system, equipment and storage medium | |
CN106934235A (en) | Patient's similarity measurement migratory system between a kind of disease areas based on transfer learning | |
CN111180068A (en) | Chronic disease prediction system based on multi-task learning model | |
CN113096818B (en) | Method for evaluating occurrence probability of acute diseases based on ODE and GRUD | |
CN108399434B (en) | Analysis and prediction method of high-dimensional time series data based on feature extraction | |
CN110659677A (en) | Human body falling detection method based on movable sensor combination equipment | |
CN114512239B (en) | Cerebral apoplexy risk prediction method and system based on transfer learning | |
CN110767279A (en) | Electronic health record missing data completion method and system based on LSTM | |
Wang et al. | Diabetes Risk Analysis Based on Machine Learning LASSO Regression Model | |
CN110491506A (en) | Auricular fibrillation prediction model and its forecasting system | |
CN115376638A (en) | Physiological characteristic data analysis method based on multi-source health perception data fusion | |
CN114504298B (en) | Physiological characteristic discriminating method and system based on multisource health perception data fusion | |
CN114191665A (en) | Method and device for classifying man-machine asynchronous phenomena in mechanical ventilation process | |
CN112259232B (en) | VTE risk automatic evaluation system based on deep learning | |
Li et al. | Predicting Parkinson's Disease with Multimodal Irregularly Collected Longitudinal Smartphone Data | |
Siddiqa et al. | Robust Length of Stay Prediction Model for Indoor Patients. | |
CN115147768B (en) | Fall risk assessment method and system | |
CN115171896A (en) | System and method for predicting long-term death risk of critically ill patient | |
CN114613497A (en) | Intelligent medical auxiliary diagnosis method of patient sample based on GBDT sample level | |
CN111466877B (en) | LSTM network-based oxygen reduction state prediction method | |
CN111243697A (en) | Method and system for judging target object data based on neural network | |
CN115512833B (en) | Establishment of long-term cost effectiveness prediction system for lung cancer patient based on deep learning Markov framework |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||