CN107578822B

CN107578822B - Pretreatment and feature extraction method for medical multi-modal big data

Info

Publication number: CN107578822B
Application number: CN201710612240.0A
Authority: CN
Inventors: 鲁仁全; 张金涛; 吴元清
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2017-07-25
Filing date: 2017-07-25
Publication date: 2020-12-15
Anticipated expiration: 2037-07-25
Also published as: CN107578822A

Abstract

The invention provides a method for analyzing multi-modal big data for medical institutions, mainly the multi-modal big data of patients held in hospital databases. The method considers the information from multiple modalities together, avoids the transmission-network limitations that arise in traditional data analysis, and ensures real-time feedback of user information. Combining the established multidimensional partial least squares model with a convolutional neural network reduces information loss, yields a stable prediction model, and gives the hospital a more detailed and accurate analysis report.

Description

Pretreatment and feature extraction method for medical multi-modal big data

Technical Field

The invention relates to the field of medical big data, in particular to preprocessing and feature extraction of multi-modal big data of a hospital.

Background

As society develops, medical technology improves with it. Almost every domestic hospital has established its own data warehouse, which continuously accumulates data on various diseases along with large volumes of historical records, and the contents of these warehouses have reached a considerable scale. This is an important information resource for each hospital. It gives practitioners information on diseases and plays a very important role in observing how diseases evolve and develop over the years. However, every medical institution now faces the problem of how to analyze multi-modal disease big data, improve the utilization of disease information, accurately find the information needed, and make high-level decisions.

Disclosure of Invention

In order to solve the problems of preprocessing and feature extraction of medical multi-modal big data, the invention provides a method for analyzing multi-modal big data in a hospital, in which a multi-density quantizer is designed and predictive analysis is carried out using techniques such as a genetic algorithm and the GA-BP (genetic algorithm and back-propagation) method.

A preprocessing and feature extraction method for medical multi-modal big data is disclosed, as shown in FIG. 1, and comprises the following steps:

Step 1, preprocessing the multi-modal data of the hospital by using an S-G smoothing method: a section of data before and after the point to be processed is selected, an odd number of consecutive points forms a single window, the points in the window are sorted, and the median value is taken as the smoothed value.

Step 2, acquiring the processed data, acquiring multi-modal big data of the medical institution by using an information quantization method based on the characteristics of the multi-modal data, and designing a multi-density quantizer in combination with the load capacity of the transmission network.

Step 3, extracting valuable information from the patient's historical data by using a local regression method based on correlation coefficient analysis, constructing a data model with a multidimensional partial least squares algorithm, and adopting a GA-BP (genetic algorithm and back-propagation) modeling method combined with a convolutional neural network method.

Step 4, deriving a novel information extraction algorithm for disease data to obtain the dynamic evolution law of the patient's disease, formulating performance evaluation indexes for the disease, and providing a rolling optimization scheme for the patient.
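The window operation worded in step 1 (sort an odd-length window and take the middle value) can be sketched as a sliding median. This is only an illustration: the function name `median_smooth`, the `half_width` parameter, the sample readings, and the choice to leave boundary points unchanged are assumptions, not part of the disclosure.

```python
from statistics import median


def median_smooth(values, half_width=1):
    """Replace each interior point by the median of its odd-length window.

    The window holds 2 * half_width + 1 consecutive points centred on the
    point being processed. Points too close to either end to have a full
    window are copied unchanged (an assumption; the text does not say how
    boundaries are treated).
    """
    n = len(values)
    smoothed = list(values)
    for i in range(half_width, n - half_width):
        window = values[i - half_width:i + half_width + 1]
        smoothed[i] = median(window)  # median() sorts the window internally
    return smoothed


# Made-up readings with a single spike at index 2.
readings = [36.6, 36.7, 41.9, 36.8, 36.7]
print(median_smooth(readings, half_width=1))
```

A larger `half_width` suppresses wider spikes at the cost of flattening genuine short-lived changes.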

Drawings

Fig. 1 is a schematic diagram of a preprocessing and feature extraction method for medical multi-modal big data.

Detailed Description

The S-G smoothing method first selects a suitable window, then smooths the data in each window by polynomial fitting and replaces the corresponding window data with the calculated smoothed value; the window is then moved forward by one data point in the direction of increasing time to form a new window, until all data points have been traversed.

Specifically, a matrix smoothing window is selected in the three-dimensional fluorescence spectrum so that the window contains (2p + 1) × (2q + 1) data points; the data points of the window can be represented as:

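$$
\begin{matrix}
(a_{-p}, b_{-q}, x(a_{-p}, b_{-q})) & \cdots & (a_{-p}, b_{0}, x(a_{-p}, b_{0})) & \cdots & (a_{-p}, b_{q}, x(a_{-p}, b_{q})) \\
\vdots & & \vdots & & \vdots \\
(a_{0}, b_{-q}, x(a_{0}, b_{-q})) & \cdots & (a_{0}, b_{0}, x(a_{0}, b_{0})) & \cdots & (a_{0}, b_{q}, x(a_{0}, b_{q})) \\
\vdots & & \vdots & & \vdots \\
(a_{p}, b_{-q}, x(a_{p}, b_{-q})) & \cdots & (a_{p}, b_{0}, x(a_{p}, b_{0})) & \cdots & (a_{p}, b_{q}, x(a_{p}, b_{q}))
\end{matrix}
$$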

where $a_m$ ($m = -p, \ldots, p$) is the $m$-th emission spectrum wavelength, $b_n$ ($n = -q, \ldots, q$) is the $n$-th excitation spectrum wavelength, and $x(a_m, b_n)$ ($m = -p, \ldots, p$; $n = -q, \ldots, q$) is the fluorescence intensity at the data point $(a_m, b_n)$.
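The window-by-window polynomial fitting over a (2p + 1) × (2q + 1) window can be sketched as follows. The sketch assumes a quadratic surface fitted by least squares and leaves border points unchanged; the polynomial order, the boundary handling, and the names `solve`, `sg_weights` and `sg_smooth_2d` are illustrative assumptions and are not taken from the disclosure.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def sg_weights(p, q):
    """Weights h[m + p][n + q] giving the fitted value at the window centre.

    A surface c0 + c1*m + c2*n + c3*m^2 + c4*m*n + c5*n^2 is fitted by least
    squares over offsets m in [-p, p], n in [-q, q] (requires p, q >= 1).
    The centre value c0 is a fixed linear combination of the window data.
    """
    offsets = [(m, n) for m in range(-p, p + 1) for n in range(-q, q + 1)]
    design = [[1.0, m, n, m * m, m * n, n * n] for m, n in offsets]
    k = 6
    normal = [[sum(row[i] * row[j] for row in design) for j in range(k)]
              for i in range(k)]
    z = solve(normal, [1.0] + [0.0] * (k - 1))  # first column of (A^T A)^-1
    flat = [sum(row[i] * z[i] for i in range(k)) for row in design]
    width = 2 * q + 1
    return [flat[r * width:(r + 1) * width] for r in range(2 * p + 1)]


def sg_smooth_2d(x, p=1, q=1):
    """Smooth a 2-D grid; points without a full window are left unchanged."""
    rows, cols = len(x), len(x[0])
    h = sg_weights(p, q)
    out = [row[:] for row in x]
    for i in range(p, rows - p):
        for j in range(q, cols - q):
            out[i][j] = sum(h[m + p][n + q] * x[i + m][j + n]
                            for m in range(-p, p + 1)
                            for n in range(-q, q + 1))
    return out
```

Because the design matrix is the same for every window position, the weights are computed once and reused, so the smoothing itself reduces to a weighted sum over each window.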

The smoothed value of each point in the window is calculated as follows:

The multi-density quantizer is characterized in that its set value can be dynamically adjusted according to the condition of the transmission network. Because the actual network condition is dynamic, the multi-density quantizer quantizes the data with maximum efficiency, achieving efficient transmission of the multi-modal big data. The quantized data are written as the output value plus a Gaussian noise term, i.e.:

Then the load degree at the corresponding moment is calculated, and the multi-density quantizer is designed according to the window value of the statistical changes in the historical big data and the precision and load required by the data warehouse.
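As a rough illustration of a quantizer whose setting follows the network condition, the sketch below switches between a few step sizes according to a load value in [0, 1]. The step sizes in `DENSITY_LEVELS`, the load scale, the uniform rounding rule, and all function names are hypothetical. The Gaussian noise term is not modelled, since its parameters are not given in the text.

```python
# Hypothetical quantization steps, ordered from fine (light load) to coarse.
DENSITY_LEVELS = [0.01, 0.05, 0.25]


def choose_step(load, levels=DENSITY_LEVELS):
    """Pick a quantization step for the current network load in [0, 1].

    A heavier load selects a coarser step, so fewer distinct values need to
    be transmitted.
    """
    load = min(max(load, 0.0), 1.0)
    index = min(int(load * len(levels)), len(levels) - 1)
    return levels[index]


def quantize(value, step):
    """Uniform quantizer: round the value to the nearest multiple of step."""
    return step * round(value / step)


def transmit(samples, loads):
    """Quantize each sample with the step chosen for the load at that moment."""
    return [quantize(v, choose_step(load)) for v, load in zip(samples, loads)]


# The same made-up sample under light and heavy load.
print(transmit([1.234, 1.234], [0.0, 1.0]))
```

In a real design the levels would presumably be derived from the historical data statistics and the precision the data warehouse requires, as the text describes.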

The multidimensional partial least squares algorithm is used to construct the data model. Multidimensional partial least squares is a multidimensional data model: in the process of establishing the regression model, loading vectors directly related to each dimension are obtained, so that each dimension of the model can be interpreted independently. The resulting regression model can be expressed as:

$$X = T\,(W^{K} \odot W^{J})^{T} + E$$

where $X$ is the matrix generated after the multi-modal big data are processed, $F$ is the number of components, $T$ is the score matrix of size $I$ rows by $F$ columns, $W^{J}$ and $W^{K}$ are the weight matrices in the $J$ and $K$ directions, of size $J$ rows by $F$ columns and $K$ rows by $F$ columns respectively, and $E$ is the residual matrix.

When performing prediction, the multi-modal data matrix $X$ is first arranged as a three-way array $X^{w}$ of size $I \times J \times K$. To obtain the prediction result, $X^{w}$ is reduced to (unfolded into) a two-dimensional matrix of size $I \times JK$, from which the value of the predicted variable $Y_{new}$ is solved.
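The unfolding of an I × J × K array into an I × JK matrix, and the product of the two weight matrices that appears in the model, can be sketched as below. The sketch assumes the product of W^K and W^J is the column-wise Kronecker (Khatri-Rao) product and that the unfolding lets the J index vary fastest; the function names and the toy numbers are illustrative. It covers only the model structure, not how the scores or Y_new are estimated.

```python
def unfold(x3):
    """Unfold an I x J x K nested list into I x (J*K); column index is k*J + j."""
    return [[row[j][k] for k in range(len(row[0])) for j in range(len(row))]
            for row in x3]


def khatri_rao(wk, wj):
    """Column-wise Kronecker product of a K x F and a J x F matrix (K*J x F)."""
    f_count = len(wk[0])
    return [[wk_row[f] * wj_row[f] for f in range(f_count)]
            for wk_row in wk for wj_row in wj]


def reconstruct(t, wk, wj):
    """Model part T (W^K (.) W^J)^T as an I x (J*K) matrix."""
    w = khatri_rao(wk, wj)
    return [[sum(t_row[f] * w_row[f] for f in range(len(t_row))) for w_row in w]
            for t_row in t]


def residual(x2, t, wk, wj):
    """E = X - T (W^K (.) W^J)^T for an already unfolded X."""
    model = reconstruct(t, wk, wj)
    return [[x - m for x, m in zip(x_row, m_row)]
            for x_row, m_row in zip(x2, model)]


# Toy factors with I = 2, J = 2, K = 3, F = 2 (made-up numbers).
t = [[1, 2], [3, 4]]
wj = [[1, 0], [2, 1]]
wk = [[1, 1], [0, 2], [3, 1]]
x3 = [[[sum(t[i][f] * wj[j][f] * wk[k][f] for f in range(2))
        for k in range(3)] for j in range(2)] for i in range(2)]
print(residual(unfold(x3), t, wk, wj))  # all zeros: the toy data fit exactly
```

The row order of the Khatri-Rao product and the column order of the unfolding must agree, which is why both use the index k*J + j here.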

The GA-BP modeling method is characterized in that a genetic algorithm and a BP algorithm (GA-BP) are used in turn to train the obtained regression model; valuable data packets are selected according to the relevant disease indexes and substituted into the genetic algorithm model for modeling until the network converges.

In the BP network learning process, a 3-layer BP network topology is selected. The input-layer neurons take the quantized multi-modal data, and the normalized sample data are then fed into the network. Based on the simulation results on the prediction samples, training is stopped early once the root mean square error of the predicted values reaches a set threshold, and the trained BP network model is output directly.
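A minimal sketch of the GA-BP idea is given below, under simplifying assumptions: the genetic algorithm is run once to search for good starting weights, and batch back-propagation then refines them with early stopping on the root mean square error. The text describes the two algorithms as taking turns, which this sketch does not reproduce. The network size `HIDDEN`, the toy data, the rates, and all function names are hypothetical.

```python
import math
import random

HIDDEN = 3  # hypothetical hidden-layer size (1 input, 1 output)


def forward(w, x):
    """3-layer network: input -> HIDDEN sigmoid units -> linear output."""
    hidden = [1.0 / (1.0 + math.exp(-(w[i] * x + w[HIDDEN + i])))
              for i in range(HIDDEN)]
    out = sum(w[2 * HIDDEN + i] * hidden[i] for i in range(HIDDEN)) + w[3 * HIDDEN]
    return out, hidden


def rmse(w, data):
    return math.sqrt(sum((forward(w, x)[0] - y) ** 2 for x, y in data) / len(data))


def ga_search(data, rng, pop_size=20, generations=30):
    """Genetic algorithm: keep the fitter half, refill by crossover and mutation."""
    n = 3 * HIDDEN + 1
    pop = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: rmse(w, data))
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = [g + rng.gauss(0, 0.1) if rng.random() < 0.2 else g
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda w: rmse(w, data))


def bp_train(w, data, lr=0.1, max_epochs=2000, target_rmse=0.02):
    """Batch back-propagation; stops early once RMSE reaches target_rmse."""
    w = w[:]
    best, best_err = w[:], rmse(w, data)
    for _ in range(max_epochs):
        if best_err <= target_rmse:
            break
        grad = [0.0] * len(w)
        for x, y in data:
            out, hidden = forward(w, x)
            delta = (out - y) / len(data)
            for i in range(HIDDEN):
                d_hidden = delta * w[2 * HIDDEN + i] * hidden[i] * (1 - hidden[i])
                grad[i] += d_hidden * x
                grad[HIDDEN + i] += d_hidden
                grad[2 * HIDDEN + i] += delta * hidden[i]
            grad[3 * HIDDEN] += delta
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        err = rmse(w, data)
        if err < best_err:
            best, best_err = w[:], err
    return best, best_err


data = [(i / 10, (i / 10) ** 2) for i in range(11)]  # made-up normalized samples
rng = random.Random(0)
start = ga_search(data, rng)
trained, err = bp_train(start, data)
print(round(rmse(start, data), 4), "->", round(err, 4))
```

`bp_train` returns the best weights seen, so its error is never worse than that of the GA starting point.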

The convolutional neural network method is characterized in that output values are propagated forward while weights and biases are adjusted by back propagation, and adjacent neural units in the network are only partially connected, so that each neuron perceives only part of the neurons in the previous layer. In this way deep knowledge can be extracted from the medical multi-modal big data and built up for that data.

First, a convolutional neural network is established whose function is to discover local features of the data; the convolution kernel is then shared within each map of the network. Each map is composed of a plurality of neural units.

Then the feature data are fully connected to the output layer, and the weights and biases are adjusted by back propagation. The neural network can be solved by gradient descent, which often gives satisfactory results in practical applications.

The convolution kernel is in effect the weights: in the actual computation, a weight matrix of fixed size is matched across the image instead of computing separate weights at each position. This weight-sharing strategy reduces the number of parameters to be trained, so the trained model generalizes better.
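Weight sharing can be illustrated with a single kernel slid over a small grid. The sketch assumes no padding and a stride of one; the function names, the example grid, and the 28 × 28 input with a 5 × 5 kernel are illustrative choices only.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1).

    Every output unit uses the same kernel weights and sees only the local
    patch of the input beneath the kernel.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def parameter_counts(image_h, image_w, kh, kw):
    """Weights per map with a shared kernel versus one kernel per output unit."""
    out_units = (image_h - kh + 1) * (image_w - kw + 1)
    shared = kh * kw
    unshared = out_units * kh * kw
    return shared, unshared


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d_valid(image, kernel))       # [[-4, -4], [-4, -4]]
print(parameter_counts(28, 28, 5, 5))    # (25, 14400)
```

With sharing, the number of trainable weights per map depends only on the kernel size, not on the size of the input.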

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims

1. A preprocessing and feature extraction method for medical multi-modal big data is characterized by comprising the following steps:

step 1, preprocessing multi-modal data of a hospital by using an S-G smoothing method, selecting a section of data before and after a point to be processed, forming a single window from an odd number of consecutive points, sorting the points in the window, and taking the median value as the smoothed value;

step 2, acquiring the processed data, acquiring multi-modal big data of the hospital by using an information quantization method based on the characteristics of the multi-modal data, and designing a multi-density quantizer in combination with the load capacity of the transmission network;

step 3, using a local regression method based on correlation coefficient analysis, constructing a data model with a multidimensional partial least squares algorithm, and adopting a GA-BP modeling method combined with a convolutional neural network method to extract valuable information from the historical data of a patient;

step 4, deriving a novel information extraction algorithm for disease data to obtain the dynamic evolution law of the patient's disease, formulating performance evaluation indexes for the disease, and providing a rolling optimization scheme for the patient;

the multi-density quantizer can dynamically adjust its set value according to the condition of the transmission network; because the actual network condition is dynamic, the multi-density quantizer quantizes the data with maximum efficiency, achieving efficient transmission of the multi-modal big data; the quantized data are written as the output value plus a Gaussian noise term, i.e.:

then the load degree at the corresponding moment is calculated, and the multi-density quantizer is designed according to the window value of the statistical changes in the historical big data and the precision and load required by the data warehouse;

the multidimensional partial least squares algorithm constructs the data model; multidimensional partial least squares is a multidimensional data model in which, in the process of establishing the regression model, loading vectors directly related to each dimension are obtained and each dimension of the model is interpreted independently, and the resulting regression model can be expressed as:

$$X = T\,(W^{K} \odot W^{J})^{T} + E$$

2. The preprocessing and feature extraction method for medical multi-modal big data as claimed in claim 1, wherein: the S-G smoothing method first selects a suitable window, then smooths the data in each window by polynomial fitting, replaces the corresponding window data with the calculated smoothed value, and then moves the window forward by one data point at a time in the direction of increasing time to form a new window until all data points have been traversed.

3. The preprocessing and feature extraction method for medical multi-modal big data as claimed in claim 1, wherein: the GA-BP modeling method uses a genetic algorithm and a BP algorithm in turn to train the obtained regression model, selects valuable data packets according to the disease-related index requirements, and substitutes them into the genetic algorithm model for modeling until the network converges.

4. The preprocessing and feature extraction method for medical multi-modal big data as claimed in claim 1, wherein: the convolutional neural network method propagates output values forward and adjusts weights and biases by back propagation, and adjacent neural units in the network are only partially connected, so that each neuron perceives only part of the neurons in the previous layer.