CN107578822B - Pretreatment and feature extraction method for medical multi-modal big data - Google Patents
- Publication number
- CN107578822B (application CN201710612240.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- big data
- window
- model
- modal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Medical Treatment And Welfare Office Work (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The invention provides a method for analyzing multi-modal big data for medical institutions, aimed mainly at the multi-modal big data of patients held in hospital databases. The method considers information from several modalities together, avoids the transmission-network bottlenecks that arise in traditional data analysis, and ensures real-time feedback of user information. Combining the established multidimensional partial least squares model with a convolutional neural network reduces information loss, yields a stable prediction model, and provides hospitals with more detailed and accurate analysis reports.
Description
Technical Field
The invention relates to the field of medical big data, and in particular to the preprocessing and feature extraction of multi-modal hospital big data.
Background
As society develops, medical technology improves with it. Almost all domestic hospitals have established their own data warehouses, which continuously accumulate data on various diseases along with large volumes of historical records, and whose contents have reached a considerable scale. These warehouses are an important information resource for every hospital: they give practitioners access to disease information and play a very important role in observing how diseases evolve and develop over the years. However, medical institutions today face the problem of how to analyze multi-modal disease big data, improve the utilization of disease information, locate the information they need accurately, and make high-level decisions.
Disclosure of Invention
To solve the problems of preprocessing and feature extraction for medical multi-modal big data, the invention provides a method for analyzing multi-modal big data in a hospital, designs a multi-density quantizer, and performs predictive analysis using techniques such as a genetic algorithm combined with a BP (back-propagation) algorithm (GA-BP).
A preprocessing and feature extraction method for medical multi-modal big data, shown in FIG. 1, comprises the following steps:
Step 1, preprocess the hospital's multi-modal data using the S-G smoothing method. Select a section of data before and after the point to be processed; an odd number of consecutive points forms a single window, the points in the window are sorted, and the median is taken as the smoothed value.
Step 2, take the preprocessed data, acquire the medical institution's multi-modal big data using an information quantization method based on the characteristics of the multi-modal data, and design a multi-density quantizer that takes the load capacity of the transmission network into account.
Step 3, extract valuable information from the patient's historical data using a local regression method based on correlation coefficient analysis: construct a data model with the multidimensional partial least squares algorithm, and apply the GA-BP (genetic algorithm and BP) modeling method in combination with a convolutional neural network.
Step 4, derive a new information extraction algorithm for disease data to obtain the dynamic evolution of the patient's disease, formulate performance evaluation indices for the disease, and provide a rolling optimization scheme for the patient.
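The window operation named in Step 1 (an odd number of consecutive points, sorted, with the middle value kept) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `median_smooth`, the `half_width` parameter, and the choice to leave the edge points unchanged are all assumptions.

```python
def median_smooth(values, half_width=1):
    """Replace each interior point by the median of its odd-length window."""
    window = 2 * half_width + 1  # always an odd number of points
    if len(values) < window:
        raise ValueError("need at least one full window of data")
    smoothed = list(values)  # assumption: edge points are kept unchanged
    for i in range(half_width, len(values) - half_width):
        ordered = sorted(values[i - half_width:i + half_width + 1])
        smoothed[i] = ordered[half_width]  # middle element of the sorted window
    return smoothed


# A single outlier in otherwise flat data is removed.
print(median_smooth([1, 1, 9, 1, 1]))  # -> [1, 1, 1, 1, 1]
```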
Drawings
Fig. 1 is a schematic diagram of a preprocessing and feature extraction method for medical multi-modal big data.
Detailed Description
The S-G smoothing method first selects a suitable window, then smooths the data in each window by polynomial fitting and replaces the corresponding window data with the calculated smoothed value; the window is then moved forward by one data point in the direction of increasing time to form a new window, until all data points have been traversed.
Specifically, a matrix-shaped smoothing window is selected in the three-dimensional fluorescence spectrum so that the window contains $(2p+1) \times (2q+1)$ data points. The data points of the window can be represented as:

$$
\begin{matrix}
(a_{-p}, b_{-q}, x(a_{-p}, b_{-q})) & \cdots & (a_{-p}, b_{0}, x(a_{-p}, b_{0})) & \cdots & (a_{-p}, b_{q}, x(a_{-p}, b_{q})) \\
\vdots & & \vdots & & \vdots \\
(a_{0}, b_{-q}, x(a_{0}, b_{-q})) & \cdots & (a_{0}, b_{0}, x(a_{0}, b_{0})) & \cdots & (a_{0}, b_{q}, x(a_{0}, b_{q})) \\
\vdots & & \vdots & & \vdots \\
(a_{p}, b_{-q}, x(a_{p}, b_{-q})) & \cdots & (a_{p}, b_{0}, x(a_{p}, b_{0})) & \cdots & (a_{p}, b_{q}, x(a_{p}, b_{q}))
\end{matrix}
$$

where $a_m$ ($m = -p, \dots, p$) denotes the $m$-th emission spectrum wavelength, $b_n$ ($n = -q, \dots, q$) denotes the $n$-th excitation spectrum wavelength, and $x(a_m, b_n)$ ($m = -p, \dots, p$; $n = -q, \dots, q$) is the fluorescence intensity at the data point $(a_m, b_n)$.
The smoothed value of each point in the window is calculated by the following formula:
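As a hedged illustration of the window-by-window polynomial fit described above (not the patent's own formula), the sketch below smooths a one-dimensional series: it fits a low-order polynomial to each sliding window and keeps the fitted value at the window centre. The name `sg_smooth`, the parameters `half_width` and `poly_order`, the restriction to one dimension, and the decision to leave the edge points unsmoothed are assumptions made for the example.

```python
import numpy as np


def sg_smooth(x, half_width=2, poly_order=2):
    """Savitzky-Golay style smoothing of a 1-D series.

    Each interior point is replaced by the value, at the window centre, of a
    least-squares polynomial fitted to the 2*half_width + 1 surrounding points.
    """
    x = np.asarray(x, dtype=float)
    window = 2 * half_width + 1
    if x.size < window:
        raise ValueError("series is shorter than one window")
    if poly_order >= window:
        raise ValueError("poly_order must be smaller than the window length")
    offsets = np.arange(-half_width, half_width + 1)  # centre sits at offset 0
    smoothed = x.copy()  # assumption: edge points are left unsmoothed
    for i in range(half_width, x.size - half_width):
        segment = x[i - half_width:i + half_width + 1]
        coeffs = np.polyfit(offsets, segment, poly_order)
        smoothed[i] = np.polyval(coeffs, 0.0)  # fitted value at the centre
    return smoothed


# A quadratic signal is reproduced exactly by a quadratic fit.
t = np.arange(10.0)
print(np.allclose(sg_smooth(t ** 2), t ** 2))
```

With a five-point window and a quadratic fit, this reduces to the familiar convolution weights (-3, 12, 17, 12, -3)/35 for interior points.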
The multi-density quantizer can dynamically adjust its set value according to the condition of the transmission network. Because the actual condition of the transmission network changes over time, the multi-density quantizer keeps quantization as efficient as possible and so achieves efficient transmission of the multi-modal big data. The quantized data are written as the output value plus Gaussian noise, i.e.:
The load at the corresponding moment is then calculated, and the multi-density quantizer is designed according to the window value of the change in historical big-data statistics and the precision and load required by the data warehouse.
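One way to picture a quantizer whose setting follows the network condition is a uniform quantizer that switches between several step sizes (densities) as the load changes. The sketch below is only an illustration under stated assumptions: the load is taken to be a number in [0, 1], and the step values in `steps`, the linear mapping from load to step, and the function names `choose_step` and `quantize` are hypothetical and do not come from the patent.

```python
import numpy as np


def choose_step(load, steps=(0.01, 0.05, 0.2)):
    """Pick a quantization step from the network load (assumed in [0, 1]).

    A heavier load selects a coarser step, so fewer distinct levels are sent.
    """
    load = min(max(load, 0.0), 1.0)
    index = min(int(load * len(steps)), len(steps) - 1)
    return steps[index]


def quantize(x, load):
    """Uniformly quantize x with the step chosen for the current load."""
    step = choose_step(load)
    return np.round(np.asarray(x, dtype=float) / step) * step


signal = np.linspace(-1.0, 1.0, 101)
fine = quantize(signal, load=0.1)    # light load, dense levels
coarse = quantize(signal, load=0.9)  # heavy load, sparse levels
print(len(np.unique(fine)), len(np.unique(coarse)))
```

The rounding error of a uniform quantizer is bounded by half the step, so the chosen step directly trades precision against the amount of data to transmit.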
Multidimensional partial least squares is a multidimensional data model, used here to construct the data model. While the regression model is being built, load vectors directly related to each dimension are obtained, so each dimension of the model can be interpreted independently. The resulting regression model can be expressed as:

$$X = T\,(W^{K} \odot W^{J})^{\mathsf{T}} + E$$

where X is the matrix generated after processing the multi-modal big data, F is the number of components, T is the score matrix of size I rows by F columns, $W^{J}$ and $W^{K}$ are the weight matrices in the J and K directions, of size J rows by F columns and K rows by F columns respectively, and E is the residual matrix.
For prediction, the new multi-modal data array $X_w$ (I × J × K) is reduced to a two-dimensional matrix $X_w$ (I × JK), from which the value of the predicted variable $Y_{new}$ is solved.
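The structure of this model can be checked numerically. The sketch below reads the product of the two weight matrices as a column-wise Kronecker (Khatri-Rao) product, which is an assumption, though it is the reading that makes the dimensions agree: T is I × F, so the product must be JK × F. The helper names `khatri_rao` and `unfold` are hypothetical, and the code only verifies the decomposition and the I × J × K to I × JK unfolding; it does not fit a partial least squares model.

```python
import numpy as np


def khatri_rao(WK, WJ):
    """Column-wise Kronecker product: (K x F) and (J x F) -> (K*J x F)."""
    K, F = WK.shape
    J, F2 = WJ.shape
    if F != F2:
        raise ValueError("both weight matrices need the same number of columns")
    # Row k*J + j of the result holds WK[k, f] * WJ[j, f] for every column f.
    return (WK[:, None, :] * WJ[None, :, :]).reshape(K * J, F)


def unfold(X3):
    """Unfold an I x J x K array to I x (K*J), matching khatri_rao's row order."""
    I, J, K = X3.shape
    return X3.transpose(0, 2, 1).reshape(I, K * J)


rng = np.random.default_rng(0)
I, J, K, F = 6, 4, 3, 2
T = rng.normal(size=(I, F))    # score matrix
WJ = rng.normal(size=(J, F))   # weights in the J direction
WK = rng.normal(size=(K, F))   # weights in the K direction

# Build an exactly trilinear array, then recover the two-dimensional form.
X3 = np.einsum("if,jf,kf->ijk", T, WJ, WK)
E = unfold(X3) - T @ khatri_rao(WK, WJ).T
print(np.allclose(E, 0.0))
```

For real data the residual E is not zero; the check above only confirms that the unfolding order and the product order are consistent with each other.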
The GA-BP modeling method trains the obtained regression model with a genetic algorithm and the BP algorithm in turn. Valuable data packets are selected according to the indices relevant to the disease and fed into the genetic algorithm model for modeling, until the network converges.
In the BP network learning process, a 3-layer BP network topology is selected, and the quantized multi-modal data are used as the input-layer neurons. The normalized sample data are then fed into the network; when the root mean square error of the predicted values on the prediction samples reaches a set threshold, training is stopped early and the trained BP network model is output directly.
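A common way to combine the two algorithms is to let the genetic algorithm search for good starting weights and then refine them by back-propagation. The sketch below follows that pattern for a 3-layer network with early stopping on the root mean square error. It is an illustration under assumptions, not the patented procedure: the population size, mutation scale, learning rate, `tanh` hidden layer, synthetic data, and all function names are hypothetical, and the stopping check uses the training samples rather than a separate prediction set.

```python
import numpy as np


def unpack(w, n_in, n_hid):
    """Split a flat weight vector into the parts of a 3-layer network."""
    a = n_in * n_hid
    return w[:a].reshape(n_in, n_hid), w[a:a + n_hid], w[a + n_hid:a + 2 * n_hid], w[-1]


def rmse(w, X, y, n_hid):
    W1, b1, W2, b2 = unpack(w, X.shape[1], n_hid)
    pred = np.tanh(X @ W1 + b1) @ W2 + b2
    return float(np.sqrt(np.mean((pred - y) ** 2)))


def ga_search(X, y, n_hid, rng, pop=20, gens=15):
    """Genetic search over weights: elitism, uniform crossover, Gaussian mutation."""
    dim = X.shape[1] * n_hid + 2 * n_hid + 1
    P = rng.normal(0.0, 1.0, (pop, dim))
    for _ in range(gens):
        fitness = np.array([rmse(p, X, y, n_hid) for p in P])
        elite = P[np.argsort(fitness)][: pop // 2]  # best half survives unchanged
        n_child = pop - len(elite)
        pa = elite[rng.integers(0, len(elite), n_child)]
        pb = elite[rng.integers(0, len(elite), n_child)]
        mask = rng.random(pa.shape) < 0.5
        children = np.where(mask, pa, pb) + rng.normal(0.0, 0.1, pa.shape)
        P = np.vstack([elite, children])
    fitness = np.array([rmse(p, X, y, n_hid) for p in P])
    return P[np.argmin(fitness)]


def bp_train(w, X, y, n_hid, lr=0.05, epochs=500, target_rmse=0.05):
    """Full-batch back-propagation; stops early once the RMSE target is reached."""
    w = w.copy()
    best_w, best = w.copy(), rmse(w, X, y, n_hid)
    n = len(y)
    for _ in range(epochs):
        if best <= target_rmse:
            break  # early stop: error index reached
        W1, b1, W2, b2 = unpack(w, X.shape[1], n_hid)
        h = np.tanh(X @ W1 + b1)
        err = h @ W2 + b2 - y
        dh = np.outer(err, W2) * (1.0 - h ** 2)  # error propagated to hidden layer
        grad = np.concatenate([(X.T @ dh / n).ravel(), dh.mean(axis=0),
                               h.T @ err / n, [err.mean()]])
        w -= lr * grad
        current = rmse(w, X, y, n_hid)
        if current < best:
            best_w, best = w.copy(), current
    return best_w, best


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (60, 3))   # stand-in for normalized samples
y = np.sin(X.sum(axis=1))
w0 = ga_search(X, y, n_hid=4, rng=rng)
w1, final_rmse = bp_train(w0, X, y, n_hid=4)
print(rmse(w0, X, y, 4), final_rmse)
```

Because `bp_train` keeps the best weights it has seen, the returned error is never worse than that of the starting point supplied by the genetic search.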
The convolutional neural network method propagates output values forward and propagates adjustments to the weights and biases backward. Adjacent neural units inside the network are only partially connected, so each neuron senses only part of the neurons in the layer above. In this way deep knowledge can be extracted from the medical multi-modal big data, and deep knowledge about that data is built.
First, a convolutional layer is established, whose function is to discover local features of the data; each map in the network then shares a single convolution kernel. Each map is composed of a number of neural units.
Next, the feature data are fully connected to the output layer, and the weights and biases are adjusted by back-propagation. The network can be solved by gradient descent, which often gives satisfactory results in practical applications.
The convolution kernel is in effect a set of weights: in the actual calculation, a weight matrix of fixed size is slid over the image for matching, rather than computing separate weights for each position. This weight-sharing strategy reduces the number of parameters that need to be trained, so the trained model has stronger generalization capability.
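The weight-sharing idea can be made concrete with a single feature map. The sketch below slides one fixed-size kernel over a small image and compares the number of trainable weights with a fully connected layer producing an output of the same size. The function name `conv2d_valid`, the "valid" (no padding) boundary handling, and the 4 × 4 example are assumptions for illustration; as in most neural network libraries, the kernel is applied without flipping.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # The same weights are reused at every position: local connection
            # plus weight sharing.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((2, 2))
feature_map = conv2d_valid(image, kernel)

shared_weights = kernel.size                   # weights in the shared kernel
dense_weights = image.size * feature_map.size  # fully connected equivalent
print(feature_map.shape, shared_weights, dense_weights)  # -> (3, 3) 4 144
```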
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.
Claims (4)
1. A preprocessing and feature extraction method for medical multi-modal big data is characterized by comprising the following steps:
step 1, preprocessing multi-modal data of a hospital by using an S-G smoothing method: selecting a section of data before and after the point to be processed, forming a single window from an odd number of consecutive points, sorting the points in the window, and taking the median as the smoothed value;
step 2, acquiring the processed data, acquiring the multi-modal big data of the hospital by using an information quantization method based on the characteristics of the multi-modal data, and designing a multi-density quantizer in combination with the load capacity of network transmission;
step 3, using a local regression method based on correlation coefficient analysis, constructing a data model with a multidimensional partial least squares algorithm, adopting a GA-BP modeling method, and combining a convolutional neural network method, to extract valuable information from the historical data of a patient;
step 4, deriving a novel information extraction algorithm for disease data to obtain the dynamic evolution rule of the patient's disease, formulating performance evaluation indices for the disease, and providing a rolling optimization scheme for the patient;
wherein the multi-density quantizer dynamically adjusts its set value according to the condition of the transmission network; because the actual condition of the transmission network is dynamic, the multi-density quantizer ensures that the data are quantized with maximum efficiency, achieving efficient transmission of the multi-modal big data; the quantized data are written as the output value plus Gaussian noise, i.e.:
the load at the corresponding moment is then calculated, and the multi-density quantizer is designed according to the window value of the change in historical big-data statistics and the precision and load required by the data warehouse;
the multidimensional partial least squares algorithm constructs the data model, multidimensional partial least squares being a multidimensional data model; in the process of establishing the regression model, load vectors directly related to each dimension are obtained and each dimension of the model is interpreted independently, giving a regression model that can be expressed as:

$$X = T\,(W^{K} \odot W^{J})^{\mathsf{T}} + E$$

wherein X is the matrix generated after processing the multi-modal big data, F is the number of components, T is the score matrix of size I rows by F columns, and $W^{J}$ and $W^{K}$ are the weight matrices in the J direction and the K direction, of size J rows by F columns and K rows by F columns respectively.
2. The preprocessing and feature extraction method for medical multi-modal big data as claimed in claim 1, wherein: the S-G smoothing method first selects a suitable window, then smooths the data in each window by polynomial fitting, replaces the corresponding window data with the calculated smoothed value, and then moves the window by one data point at a time in the direction of increasing time to form a new window, until all data points have been traversed.
3. The preprocessing and feature extraction method for medical multi-modal big data as claimed in claim 1, wherein: the GA-BP modeling method trains the obtained regression model with a genetic algorithm and a BP algorithm in turn, selects valuable data packets according to the disease-related index requirements, and feeds them into the genetic algorithm model for modeling until the network converges.
4. The preprocessing and feature extraction method for medical multi-modal big data as claimed in claim 1, wherein: the convolutional neural network method uses forward propagation of output values and back-propagation of weights and biases, and adjacent neural units inside the neural network are partially connected, so that each neuron in the network senses only part of the neurons in the layer above.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710612240.0A CN107578822B (en) | 2017-07-25 | 2017-07-25 | Pretreatment and feature extraction method for medical multi-modal big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107578822A (en) | 2018-01-12 |
CN107578822B (en) | 2020-12-15 |
Family
ID=61034174
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710612240.0A Active CN107578822B (en) | 2017-07-25 | 2017-07-25 | Pretreatment and feature extraction method for medical multi-modal big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107578822B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109241041B (en) * | 2018-06-26 | 2021-05-11 | 广东工业大学 | Preprocessing method and device for big data of injection molding equipment |
CN109448855A (en) * | 2018-09-17 | 2019-03-08 | 大连大学 | A kind of diabetes glucose prediction technique based on CNN and Model Fusion |
CN112001228A (en) * | 2020-07-08 | 2020-11-27 | 上海品览数据科技有限公司 | Video monitoring warehouse in-out counting system and method based on deep learning |
CN112712895B (en) * | 2021-02-04 | 2024-01-26 | 广州中医药大学第一附属医院 | Data analysis method of multi-modal big data aiming at type 2 diabetes complications |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130117257A1 (en) * | 2011-11-03 | 2013-05-09 | Microsoft Corporation | Query result estimation |
CN105393252A (en) * | 2013-04-18 | 2016-03-09 | 数字标记公司 | Physiologic data acquisition and analysis |
CN106339591A (en) * | 2016-08-25 | 2017-01-18 | 汤平 | Breast cancer prevention self-service health cloud service system based on deep convolutional neural network |
Non-Patent Citations (2)
Title |
---|
A Multi-Channel Multi-Mode Physiological Signals Acquisition and Analysis Platform;Sheng-Cheng Lee等;《 2013 IEEE International Symposium on Circuits and Systems (ISCAS)》;20130523;第397-400页 * |
MH-ARM: a Multi-mode and High-value Association Rule Mining Technique for Healthcare Data Analysis;Libao Yang等;《2016 International Conference on Computational Science and Computational Intelligence》;20160320;第122-127页 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||