CN107578822B - Pretreatment and feature extraction method for medical multi-modal big data - Google Patents

Pretreatment and feature extraction method for medical multi-modal big data

Info

Publication number
CN107578822B
CN107578822B CN201710612240.0A CN201710612240A CN107578822B CN 107578822 B CN107578822 B CN 107578822B CN 201710612240 A CN201710612240 A CN 201710612240A CN 107578822 B CN107578822 B CN 107578822B
Authority
CN
China
Prior art keywords
data
big data
window
model
modal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710612240.0A
Other languages
Chinese (zh)
Other versions
CN107578822A (en)
Inventor
鲁仁全
张金涛
吴元清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201710612240.0A
Publication of CN107578822A
Application granted
Publication of CN107578822B
Legal status: Active
Anticipated expiration

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention provides a method for analyzing multi-modal big data for medical institutions, aimed mainly at the multi-modal big data of patients held in hospital databases. The method considers information from multiple modalities together, avoids the transmission-network bottlenecks that arise in traditional data analysis, and ensures real-time feedback of user information. The multidimensional partial least squares model is combined with a convolutional neural network, which reduces information loss, yields a stable prediction model, and gives the hospital a more detailed and accurate analysis report.

Description

Pretreatment and feature extraction method for medical multi-modal big data
Technical Field
The invention relates to the field of medical big data, in particular to preprocessing and feature extraction of multi-modal big data of a hospital.
Background
As society develops, medical technology keeps improving. Nearly all domestic hospitals have established their own data warehouses, which continuously accumulate disease information and large volumes of historical records and have reached a considerable scale. These warehouses are an important information resource for each hospital: they give practitioners access to disease information and play an important role in observing how diseases evolve and develop over the years. However, every medical institution now faces the problem of how to analyze multi-modal disease big data, improve the utilization of disease information, locate the needed information accurately, and make high-level decisions.
Disclosure of Invention
To address the preprocessing and feature extraction of medical multi-modal big data, the invention provides a method for analyzing multi-modal big data in a hospital. A multi-density quantizer is designed, and predictive analysis is carried out using techniques such as a genetic algorithm and the GA-BP (genetic algorithm and back-propagation) method.
A preprocessing and feature extraction method for medical multi-modal big data, as shown in FIG. 1, comprises the following steps:
Step 1, preprocess the multi-modal data of the hospital using the S-G smoothing method. A section of data before and after the point to be processed is selected; an odd number of consecutive points forms a single window, the points in the window are sorted, and the middle value is taken as the smoothed value.
Step 2, take the processed data, acquire the multi-modal big data of the medical institution using an information quantization method based on the characteristics of the multi-modal data, and design a multi-density quantizer that takes the load capacity of the transmission network into account.
Step 3, extract valuable information from the patient's historical data using a local regression method based on correlation coefficient analysis: construct a data model with the multidimensional partial least squares algorithm, apply the GA-BP modeling method, and combine it with a convolutional neural network.
Step 4, derive a novel information extraction algorithm for disease data to obtain the dynamic evolution of the patient's disease, define performance evaluation indices for the disease, and provide a rolling optimization scheme for the patient.
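The window procedure described in step 1 (sort an odd number of consecutive points and take the middle value) behaves like a sliding median. The following Python sketch illustrates that reading only; the function name `median_smooth`, the window length of 3, and the decision to leave edge points unchanged are assumptions, not details given in the patent.

```python
from statistics import median

def median_smooth(values, window=3):
    """Replace each interior point by the median of an odd-length window around it."""
    if window % 2 == 0:
        raise ValueError("window must contain an odd number of points")
    half = window // 2
    out = list(values)  # assumption: edge points without a full window stay as they are
    for i in range(half, len(values) - half):
        # Data before and after the point to be processed form one window.
        out[i] = median(values[i - half : i + half + 1])
    return out

# Hypothetical readings with two spikes (100 and 50).
readings = [1, 100, 2, 3, 50, 4, 5]
smoothed = median_smooth(readings)
```

In this example both spikes are removed while the slowly varying values are kept, which is the usual reason for choosing a median window over a plain moving average.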
Drawings
Fig. 1 is a schematic diagram of a preprocessing and feature extraction method for medical multi-modal big data.
Detailed Description
The S-G (Savitzky-Golay) smoothing method first selects a suitable window, smooths the data in each window by polynomial fitting, and replaces the corresponding window data with the computed smoothed value. The window is then shifted by one data point in the direction of increasing time to form a new window, until all data points have been traversed.
Specifically, a matrix-shaped smoothing window is selected in the three-dimensional fluorescence spectrum so that the window contains (2p+1) × (2q+1) data points. The data points of the window can be represented as:
$$
\begin{matrix}
(a_{-p}, b_{-q}, x(a_{-p}, b_{-q})) & \cdots & (a_{-p}, b_{0}, x(a_{-p}, b_{0})) & \cdots & (a_{-p}, b_{q}, x(a_{-p}, b_{q})) \\
\vdots & & \vdots & & \vdots \\
(a_{0}, b_{-q}, x(a_{0}, b_{-q})) & \cdots & (a_{0}, b_{0}, x(a_{0}, b_{0})) & \cdots & (a_{0}, b_{q}, x(a_{0}, b_{q})) \\
\vdots & & \vdots & & \vdots \\
(a_{p}, b_{-q}, x(a_{p}, b_{-q})) & \cdots & (a_{p}, b_{0}, x(a_{p}, b_{0})) & \cdots & (a_{p}, b_{q}, x(a_{p}, b_{q}))
\end{matrix}
$$
where $a_m$ ($m = -p, \dots, p$) is the m-th emission wavelength, $b_n$ ($n = -q, \dots, q$) is the n-th excitation wavelength, and $x(a_m, b_n)$ is the fluorescence intensity at the data point $(a_m, b_n)$.
The smoothed value of each point in the window is calculated as follows:
[Formula image not reproduced in the source text.]
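The smoothing formula itself is not available in this text, so the following Python sketch only illustrates the general idea of polynomial-fit (Savitzky-Golay) smoothing in one dimension, not the two-dimensional window of the patent. It uses the standard convolution weights for a 5-point window with a quadratic fit; the function name `sg_smooth` and the choice to leave the two points at each edge unsmoothed are assumptions.

```python
# Standard Savitzky-Golay weights for a 5-point window and a quadratic fit.
SG_5_QUADRATIC = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]

def sg_smooth(values, weights=SG_5_QUADRATIC):
    """Slide the window one point at a time and replace the centre by the fitted value."""
    half = len(weights) // 2
    out = list(values)  # assumption: edge points without a full window are kept
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]
        # The weighted sum equals the least-squares quadratic evaluated at the centre.
        out[i] = sum(w * v for w, v in zip(weights, window))
    return out

# Hypothetical noisy samples of a roughly quadratic signal.
noisy = [0.0, 1.2, 3.8, 9.3, 15.9, 25.1, 36.0]
smoothed = sg_smooth(noisy)
```

Because the weights come from a quadratic least-squares fit, data that already lie on a quadratic curve pass through unchanged, while point-to-point noise is damped.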
The multi-density quantizer adjusts its setting dynamically according to the condition of the transmission network. Because the actual network condition changes over time, the multi-density quantizer quantizes the data as efficiently as the network allows, so that the multi-modal big data are transmitted efficiently. The quantized data are written as the output value plus a Gaussian noise term, i.e.:
[Formula image not reproduced in the source text.]
The load at the corresponding moment is then calculated, and the multi-density quantizer is designed according to the window value of the statistical changes in the historical big data and the precision and load required by the data warehouse.
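The quantizer formula is not available in this text, so the following Python sketch is only one possible reading of a quantizer whose setting follows the network condition: a uniform quantizer whose step size grows with the measured load. Everything specific here is an assumption for illustration, including the names `choose_step` and `quantize`, the linear load-to-step rule, and the step bounds of 0.01 and 0.1. The Gaussian noise term is left out because its form is not given.

```python
def choose_step(load, fine=0.01, coarse=0.1):
    """Hypothetical rule: map network load in [0, 1] to a quantization step.

    A lightly loaded network gets a fine step (high precision, more bits);
    a heavily loaded network gets a coarse step (lower precision, fewer bits).
    """
    load = min(max(load, 0.0), 1.0)
    return fine + (coarse - fine) * load

def quantize(x, step):
    """Uniform quantizer: snap x to the nearest multiple of step."""
    return round(x / step) * step

# Hypothetical measurement sent under light and heavy load.
value = 0.1234
light = quantize(value, choose_step(0.1))
heavy = quantize(value, choose_step(0.9))
```

With a uniform quantizer the error is bounded by half the step, so the load-dependent step makes the trade-off between precision and transmitted volume explicit.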
The multidimensional partial least squares algorithm is used to construct the data model. Multidimensional partial least squares is a multi-way data model: while the regression model is built, loading vectors directly related to each dimension are obtained, so each dimension of the model can be interpreted independently. The resulting regression model can be expressed as:
$$X = T\,(W^{K} \odot W^{J})^{\mathsf{T}} + E$$
where X is the matrix generated from the processed multi-modal big data, F is the number of components, T is the score matrix of size I × F, and $W^{J}$ and $W^{K}$ are the weight matrices in the J and K directions, of size J × F and K × F respectively.
For prediction, the multi-modal data array $X_w$ (I × J × K) is unfolded into a two-dimensional matrix $X_w$ (I × JK), from which the value of the predicted variable $Y_{new}$ is solved.
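The structure of the model X = T (W^K ⊙ W^J)^T + E and the unfolding from I × J × K to I × JK can be illustrated with a small Python sketch. It reads ⊙ as the column-wise Kronecker (Khatri-Rao) product, which is the usual convention for this kind of model but is an assumption here. The function names and the toy numbers are hypothetical, and the residual E is omitted.

```python
def khatri_rao(wk, wj):
    """Column-wise Kronecker product of a K x F and a J x F matrix -> (K*J) x F."""
    n_comp = len(wk[0])
    return [[wk_row[f] * wj_row[f] for f in range(n_comp)]
            for wk_row in wk for wj_row in wj]

def reconstruct(t, wj, wk):
    """Return T (W^K kr W^J)^T as an I x (J*K) list of lists."""
    z = khatri_rao(wk, wj)
    return [[sum(t_row[f] * z_row[f] for f in range(len(t_row))) for z_row in z]
            for t_row in t]

def unfold(x3):
    """Unfold an I x J x K nested list to I x (J*K), with index j varying fastest."""
    return [[x3[i][j][k] for k in range(len(x3[i][0])) for j in range(len(x3[i]))]
            for i in range(len(x3))]

# Toy example with I = 2 samples, J = 2, K = 3 and F = 1 component.
T = [[1.0], [2.0]]
WJ = [[1.0], [2.0]]
WK = [[1.0], [0.0], [-1.0]]
x_hat = reconstruct(T, WJ, WK)

# The same data built as a three-way array and then unfolded.
x3 = [[[T[i][0] * WJ[j][0] * WK[k][0] for k in range(3)] for j in range(2)]
      for i in range(2)]
x_unfolded = unfold(x3)
```

The unfolded three-way array and the product T (W^K ⊙ W^J)^T agree entry by entry, which is why the prediction step can work on the two-dimensional I × JK matrix.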
The GA-BP modeling method trains the obtained regression model by alternating a genetic algorithm (GA) and the back-propagation (BP) algorithm. Valuable data packets are selected according to the disease-related indices and fed into the genetic algorithm model for modeling until the network converges.
For BP network learning, a three-layer BP network topology is chosen. The input-layer neurons take the quantized multi-modal data, and the normalized sample data are fed into the network. Based on the simulation results on the prediction samples, training is stopped early once the root mean square error of the predicted values reaches a set threshold, and the trained BP network model is output.
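The combination of a genetic search with gradient-based training and an RMSE stopping rule can be sketched on a deliberately tiny problem. In the Python sketch below a single-weight linear model stands in for the three-layer BP network, so it shows the training pattern only, not the patented model. The function names, population size, mutation scale, learning rate, and RMSE target are all assumptions.

```python
import random

def rmse(w, xs, ys):
    """Root mean square error of the one-weight model y = w * x."""
    return (sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

def ga_search(xs, ys, rng, pop_size=20, generations=10, sigma=0.5):
    """Genetic search: keep the fitter half of the population and mutate it."""
    pop = [rng.uniform(-5.0, 5.0) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: rmse(w, xs, ys))
        parents = pop[: pop_size // 2]
        children = [w + rng.gauss(0.0, sigma) for w in parents]
        pop = parents + children
    return min(pop, key=lambda w: rmse(w, xs, ys))

def bp_refine(w, xs, ys, lr=0.05, target_rmse=1e-6, max_epochs=1000):
    """Gradient descent on the mean squared error, stopped early at the RMSE target."""
    for _ in range(max_epochs):
        if rmse(w, xs, ys) <= target_rmse:
            break  # early stop once the error index is reached
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

# Hypothetical training samples generated by y = 2 * x.
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
rng = random.Random(0)
w_ga = ga_search(xs, ys, rng)   # coarse global search
w = bp_refine(w_ga, xs, ys)     # local refinement by gradient descent
```

The genetic stage supplies a starting point that does not depend on a single random initialization, and the gradient stage then converges quickly from there.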
The convolutional neural network method uses forward propagation to compute output values and back-propagation to adjust the weights and biases. Neural units in adjacent layers are only partially connected, so each neuron perceives only part of the neurons in the layer above. In this way deep knowledge can be extracted from the medical multi-modal big data.
First, convolutional layers are established to discover the local features of the data, and each feature map in the network shares one convolution kernel. Each map is composed of a number of neural units.
Then the feature data are fully connected to the output layer, and the weights and biases are adjusted by back-propagation. The network can be solved by gradient descent, which often gives satisfactory results in practice.
The convolution kernel is in effect a set of weights: in the actual computation, a weight matrix of fixed size is slid over the image for matching, rather than computing separate weights for each position. This weight-sharing strategy reduces the number of parameters to train, so the trained model generalizes better.
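Weight sharing can be made concrete with a small Python sketch in which one fixed-size kernel is slid over an image to produce a feature map. As in most CNN libraries, the operation is a cross-correlation without kernel flipping. The function name `conv2d_valid`, the 3 × 3 image, and the 2 × 2 kernel are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[u][v] * image[i + u][j + v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]  # the same four weights are reused at every position

feature_map = conv2d_valid(image, kernel)

# Parameter count with weight sharing versus a fully connected layer
# mapping the same 9 inputs to the same 4 outputs.
shared_params = len(kernel) * len(kernel[0])
dense_params = (len(image) * len(image[0])) * (len(feature_map) * len(feature_map[0]))
```

Here the shared kernel needs 4 weights where a fully connected mapping of the same size would need 36, which illustrates why weight sharing reduces the number of parameters to train.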
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims (4)

1. A preprocessing and feature extraction method for medical multi-modal big data is characterized by comprising the following steps:
step 1, preprocessing multi-modal data of a hospital by using an S-G smoothing method, selecting a section of data before and after a point to be processed, forming a single window by using continuous odd points, sequencing the single window, and taking a middle value as a smoothing value;
step 2, acquiring the processed data, acquiring multi-modal big data of the hospital by using an information quantification method of the characteristics of the multi-modal data, and designing a multi-density quantizer by combining the load capacity of network transmission;
step 3, a local regression method based on correlation coefficient analysis utilizes a multidimensional partial least square algorithm to construct a data model, adopts a GA-BP modeling method and combines a convolutional neural network method to extract valuable information in historical data of a patient;
step 4, deducing a novel information extraction algorithm of disease data to obtain a dynamic evolution rule of the disease of the patient, making a performance evaluation index on the disease, and providing a rolling optimization scheme for the patient;
the multi-density quantizer dynamically adjusts its setting according to the condition of the transmission network; because the actual network condition is dynamic, the multi-density quantizer quantizes the data with maximum efficiency, achieving efficient transmission of the multi-modal big data; the quantized data are written as the output value plus a Gaussian noise term, i.e.:
[Formula image not reproduced in the source text.]
then, the load degree of the corresponding moment is calculated, and a multi-density quantizer is designed according to the window value of the historical big data statistical data change and the precision and the load required by the data warehouse;
the multidimensional partial least square algorithm constructs a data model, the multidimensional partial least square is a multidimensional data model, in the process of establishing the regression model, load vectors directly related to all dimensions are obtained, independent explanation is carried out on all dimensions of the model, and the regression model is obtained and can be expressed as follows:
$$X = T\,(W^{K} \odot W^{J})^{\mathsf{T}} + E$$
wherein X is the matrix generated from the processed multi-modal big data, F is the number of components, T is the score matrix of size I × F, and $W^{J}$ and $W^{K}$ are the weight matrices in the J and K directions, of size J × F and K × F respectively.
2. The preprocessing and feature extraction method for medical multimodal big data as claimed in claim 1, wherein: the S-G smoothing method comprises the steps of firstly selecting a proper window, then smoothing data in each window according to a polynomial fitting method, replacing corresponding window data with a calculated smoothing value, and then sequentially moving a data point in a time increasing direction to form a new window until all data points are traversed.
3. The preprocessing and feature extraction method for medical multimodal big data as claimed in claim 1, wherein: the GA-BP modeling method adopts a genetic algorithm and a BP algorithm to train the obtained regression model in turn, selects valuable data packets according to the disease-related index requirements, and substitutes the valuable data packets into the genetic algorithm model for entry modeling until the network converges.
4. The preprocessing and feature extraction method for medical multimodal big data as claimed in claim 1, wherein: the method of the convolutional neural network adopts output values transmitted back and forth, back propagation weight and bias, and the adjacent neural units in the internal neural network are partially connected, so that part of neurons on the upper layer are sensed by the neurons in the neural network.
CN201710612240.0A 2017-07-25 2017-07-25 Pretreatment and feature extraction method for medical multi-modal big data Active CN107578822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710612240.0A CN107578822B (en) 2017-07-25 2017-07-25 Pretreatment and feature extraction method for medical multi-modal big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710612240.0A CN107578822B (en) 2017-07-25 2017-07-25 Pretreatment and feature extraction method for medical multi-modal big data

Publications (2)

Publication Number Publication Date
CN107578822A CN107578822A (en) 2018-01-12
CN107578822B CN107578822B (en) 2020-12-15

Family

ID=61034174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710612240.0A Active CN107578822B (en) 2017-07-25 2017-07-25 Pretreatment and feature extraction method for medical multi-modal big data

Country Status (1)

Country Link
CN (1) CN107578822B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241041B (en) * 2018-06-26 2021-05-11 广东工业大学 Preprocessing method and device for big data of injection molding equipment
CN109448855A (en) * 2018-09-17 2019-03-08 大连大学 A kind of diabetes glucose prediction technique based on CNN and Model Fusion
CN112001228A (en) * 2020-07-08 2020-11-27 上海品览数据科技有限公司 Video monitoring warehouse in-out counting system and method based on deep learning
CN112712895B (en) * 2021-02-04 2024-01-26 广州中医药大学第一附属医院 Data analysis method of multi-modal big data aiming at type 2 diabetes complications

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130117257A1 (en) * 2011-11-03 2013-05-09 Microsoft Corporation Query result estimation
CN105393252A (en) * 2013-04-18 2016-03-09 数字标记公司 Physiologic data acquisition and analysis
CN106339591A (en) * 2016-08-25 2017-01-18 汤平 Breast cancer prevention self-service health cloud service system based on deep convolutional neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Multi-Channel Multi-Mode Physiological Signals Acquisition and Analysis Platform;Sheng-Cheng Lee等;《 2013 IEEE International Symposium on Circuits and Systems (ISCAS)》;20130523;第397-400页 *
MH-ARM: a Multi-mode and High-value Association Rule Mining Technique for Healthcare Data Analysis;Libao Yang等;《2016 International Conference on Computational Science and Computational Intelligence》;20160320;第122-127页 *

Also Published As

Publication number Publication date
CN107578822A (en) 2018-01-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant