CN112288027A

CN112288027A - Heterogeneous multi-modal image genetics data feature analysis method

Info

Publication number: CN112288027A
Application number: CN202011223328.1A
Authority: CN
Inventors: 郝小可; 王如雪; 师硕; 阎刚; 肖云佳; 李想; 谭麒豪; 安琦瑾
Original assignee: Hebei University of Technology
Current assignee: Hebei University of Technology
Priority date: 2020-11-05
Filing date: 2020-11-05
Publication date: 2021-01-29
Anticipated expiration: 2040-11-05
Also published as: CN112288027B

Abstract

The heterogeneous multi-modal imaging genetics data feature analysis method considers both the structural relationships among samples and the 'difficulty' of each sample during training, and analyzes brain imaging data and genetic data through sample weighting and structured sparsity. The method adopts a self-paced learning mechanism so that training automatically progresses from simple to complex samples, which reduces the influence of noise on the model. In addition, locality preserving projection is introduced into the self-paced learning framework to preserve the neighborhood structure of the sample points in sample space, while an L1-norm constraint on the projection matrix serves as a regularization term to perform feature selection. Finally, the selected features are fused and classified with a multi-kernel support vector machine, which improves diagnostic accuracy. The disclosed method can effectively select and classify features.

Description

Heterogeneous multi-modal image genetics data feature analysis method

Technical Field

The technical scheme of the invention relates to pattern recognition, and in particular to a feature analysis method for heterogeneous multi-modal imaging genetics data.

Background

Alzheimer's disease, also known as senile dementia, is a common degenerative brain disease whose symptoms include memory impairment, impaired reasoning and cognition, and language and motor dysfunction. It is currently one of the major diseases threatening the health of the elderly, and its course is slow and irreversible. According to cognitive performance and the extent of functional impairment, the course of Alzheimer's disease can be divided into three stages: normal control, mild cognitive impairment, and Alzheimer's disease. Given its pathogenesis, early detection and effective treatment can delay the progression of the disease. Numerous studies have shown that Alzheimer's disease is associated with structural atrophy, metabolic alterations, and pathological amyloid deposition in the brain. Commonly used brain imaging modalities include structural magnetic resonance imaging, functional magnetic resonance imaging, diffusion tensor imaging, and positron emission tomography. Meanwhile, with the development of genetic techniques, researchers can search for genetic markers associated with neurological and psychiatric diseases at a finer molecular level (e.g., single nucleotide polymorphisms).

In recent years, with continuing technological innovation, more and more research has focused on the early diagnosis of Alzheimer's disease. Because the brain has a highly complex structure and function, data acquired from a single modality cannot provide enough feature information for diagnosis. In imaging genetics, different modalities provide complementary information. For example, structural magnetic resonance imaging provides information about brain tissue types, while positron emission tomography measures the cerebral glucose metabolic rate. Fusing multi-modal data makes it possible to discover information that cannot be found in any single modality. With the development of neuroimaging and genetic technologies, multi-modal data can be collected for each subject during examinations, which provides a data source for the diagnosis of Alzheimer's disease.

Heterogeneous multi-modal imaging genetics data are high-dimensional and information-rich, but not all features help detect and analyze Alzheimer's disease. It is therefore important to remove redundant or weakly relevant features from the many features provided by brain images and genetic data, and to select the features relevant to the classification and prediction task. CN109770932A discloses a method for processing multi-modal brain neuroimaging features, which performs feature analysis on multi-modal data with a multi-modal feature selection method based on sample weighting and low-rank constraints. That method does not consider the 'difficulty' of the data. It treats simple, general knowledge and complex, specialized knowledge alike, adds all data (including noise points or outliers) to training indiscriminately, and cannot effectively eliminate the influence of noisy samples on the model. CN111462116A discloses a multi-modal parametric model optimization and fusion method based on radiomics features, which obtains low-dimensional radiomics features by gradient dimensionality reduction of high-dimensional radiomics features and ignores the internal structure of the data during dimensionality reduction.

In summary, existing feature selection methods for Alzheimer's disease diagnosis and classification do not adequately account for the relationships between samples. They are prone to misclassification, and their accuracy needs further improvement.

Disclosure of Invention

The technical task of the invention is to address the above shortcomings by providing a heterogeneous multi-modal imaging genetics data feature analysis method that simultaneously considers the structural relationships among samples and the 'difficulty' of each sample during training. Brain imaging data and genetic data are analyzed through sample weighting and structured sparsity. The feature weight matrix serves as the projection matrix in the dimensionality reduction process, and sparsity constraints are imposed on the feature weight matrix and the projection matrix at the same time. The method adopts a self-paced learning mechanism so that training automatically progresses from simple to complex samples, which reduces the influence of noise on the model. In addition, locality preserving projection is introduced into the self-paced learning framework to preserve the neighborhood structure of the sample points in sample space, while an L1-norm constraint on the projection matrix serves as a regularization term to perform feature selection. Finally, the selected features are fused and classified with a multi-kernel support vector machine, which improves diagnostic accuracy. The disclosed method can effectively select and classify features.

"Self-paced learning" is abbreviated SPL, "locality preserving projection" is abbreviated LPP, and "structured sparsity" is abbreviated SS. The heterogeneous multi-modal imaging genetics feature selection method of the present invention is therefore referred to hereinafter as "SPLPS" for short.

The technical scheme adopted by the invention for solving the technical problem is as follows:

A feature analysis method for heterogeneous multi-modal imaging genetics data comprises the following steps:

acquiring the preprocessed heterogeneous multi-modal data of samples of a given type of brain disease, the data comprising genetic data and imaging data of different modalities, so that the data of each sample in each modality are obtained;

performing multi-modal joint feature selection on the preprocessed heterogeneous multi-modal data, where the feature selection objective function is formula (1):

in formula (1), n is the number of samples, M is the number of modalities,

denotes the feature column vector of the m-th modality of the i-th sample, and the training set of the m-th modality is given as

d denotes the feature dimension, y_i denotes the class label of the i-th sample, Y = [y_1, …, y_i, …, y_n]^T ∈ R^n is the label vector of the n samples, w_m is the weight vector of the m-th modality, and v_m ∈ R^n is the self-paced sample weight vector of the m-th modality. λ is the regularization parameter that constrains feature sparsity, and μ is the regularization parameter that constrains the multi-modal association of the samples;

wherein

k' is an auxiliary parameter with k' > k > 0, and vⁱ is the self-paced sample weight vector of the i-th sample. K_m is the weight matrix describing the neighborhood relations of the sample points, and each element of the weight matrix

represents the neighborhood relation between the samples of the m-th modality. Locality preserving projection is used to preserve the neighborhood structure of the sample points in sample space:

if the i-th and j-th samples satisfy the k-nearest-neighbor condition, a k-neighborhood relation exists between them; otherwise, no k-neighborhood relation exists between them;
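Formula (1) was an image and cannot be recovered verbatim. The form below is only a plausible sketch. It assumes the terms the text names: a squared loss weighted by the self-paced weights, an ℓ1 penalty weighted by λ, a locality-preserving neighborhood term weighted by μ, and a self-paced regularizer f with thresholds 0 < k < k'. It also writes x_i^m ∈ R^d for the feature vector of the i-th sample in the m-th modality and X_m for the stacked n × d training matrix. The mixture-weighting choice of f, the factor 1/2, and these symbol names are assumptions and are not taken from the patent.

```latex
\min_{\{w_m\},\{v_m\}}\ \sum_{m=1}^{M}\Big[\sum_{i=1}^{n} v_i^{m}\big(y_i - w_m^{\top}x_i^{m}\big)^2
  + \lambda\,\lVert w_m\rVert_1
  + \frac{\mu}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} K_m(i,j)\big(w_m^{\top}x_i^{m} - w_m^{\top}x_j^{m}\big)^2
  + f(v_m; k, k')\Big]

f(v; k, k') = -\zeta\sum_{i=1}^{n}\ln\!\Big(v_i + \frac{\zeta}{k'}\Big), \qquad
\zeta = \frac{k\,k'}{k' - k}, \qquad v \in [0,1]^n

\frac{1}{2}\sum_{i,j} K_m(i,j)\big(w_m^{\top}x_i^{m} - w_m^{\top}x_j^{m}\big)^2
  = w_m^{\top}X_m^{\top}L_m X_m w_m, \qquad L_m = D_m - K_m,\quad D_m(i,i) = \sum_{j} K_m(i,j)
```

The last identity holds for any symmetric K_m. It is the reason a Laplacian matrix appears once the neighborhood term is rewritten in step 3.1. Under this form, fixing v_m leaves a weighted least-squares problem in w_m, and fixing w_m leaves a separate one-dimensional problem for each sample weight.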

alternately computing the variables w_m and v_m to optimize the objective function;

and selecting, from the obtained solution, the features whose weights in w_m are non-zero, thereby determining the locations of the affected brain regions and the related risk genes and completing the feature analysis of the heterogeneous multi-modal imaging genetics data.

The heterogeneous multi-modal imaging genetics data feature analysis method is characterized in that the SPLPS heterogeneous multi-modal imaging genetics feature selection method is used to mine biomarkers and a multi-kernel support vector machine is then used for fusion classification. The method specifically comprises the following steps:

firstly, preprocessing the heterogeneous multi-modal imaging genetics data:

step 1.1, preprocessing neuroimaging data:

the preprocessed multi-modal imaging data are VBM-processed magnetic resonance images (voxel-based morphometry), FDG-PET images (fluorodeoxyglucose positron emission tomography), and F-18 amyloid positron emission tomography images, which can effectively display the in vivo content of neuritic plaques. These data are aligned to the scan of the same visit and then resampled to 2 × 2 × 2 mm³ voxels in the standard Montreal Neurological Institute (MNI) space. Normalized gray matter density maps are created from the magnetic resonance image data, and the FDG-PET and F-18 amyloid PET images are registered to the same space with the Statistical Parametric Mapping (SPM) software package. Measurements are then taken over 116 regions of interest (ROIs), from which the FDG-PET glucose metabolic rate, the gray matter density of the VBM-processed magnetic resonance images, and the amyloid deposition measured by F-18 amyloid PET are extracted. After the cerebellum is removed, the imaging measurements of the remaining 90 ROIs of each imaging modality are used as features;
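The last part of step 1.1 can be illustrated roughly as follows. The sketch turns one co-registered image into ROI features by averaging the voxel values inside each atlas label and then dropping the cerebellar labels. It assumes two things the text does not state. First, the atlas has 116 labels, of which labels 91 to 116 are cerebellar, as in the AAL layout. This is consistent with 90 ROIs remaining, but the atlas is not named. Second, the "imaging measurement" of an ROI is its mean voxel value. The function name `roi_features` is hypothetical.

```python
def roi_features(voxels, labels, n_rois=116, excluded=range(91, 117)):
    """Mean voxel value per ROI label 1..n_rois, skipping excluded (assumed cerebellar) labels.

    voxels and labels are flat sequences of equal length; label 0 is treated as background.
    """
    sums = [0.0] * (n_rois + 1)
    counts = [0] * (n_rois + 1)
    for value, label in zip(voxels, labels):
        if 1 <= label <= n_rois:
            sums[label] += value
            counts[label] += 1
    skip = set(excluded)
    features = []
    for roi in range(1, n_rois + 1):
        if roi in skip:
            continue
        # An ROI with no voxels gets 0.0 here; a real pipeline would treat this as an error.
        features.append(sums[roi] / counts[roi] if counts[roi] else 0.0)
    return features


# Toy example: two voxels in ROI 1, one in ROI 2, one in cerebellar ROI 91, one background voxel.
feats = roi_features([1.0, 3.0, 5.0, 7.0, 9.0], [1, 1, 2, 91, 0])
print(len(feats), feats[:3])  # 90 [2.0, 5.0, 0.0]
```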

step 1.2, gene data preprocessing:

for the preprocessed genetic data (single nucleotide polymorphisms, SNPs) from the ADNI database, APOE (located on chromosome 19) is taken as the risk gene because it is associated with neuronal development, brain plasticity, and repair. The SNPs within ±20 kbp of the APOE gene boundaries, 85 SNP loci in total, are identified with ANNOVAR annotation information. Each SNP is coded additively by its number of minor alleles (0, 1, or 2);
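A minimal sketch of the additive coding at the end of step 1.2 follows. It assumes that each genotype is given as a two-character string of alleles (for example "AG") and that the minor allele is the less frequent allele in the cohort. Both the input representation and the helper names are hypothetical.

```python
from collections import Counter


def minor_allele(genotypes):
    """Return the less frequent allele across two-character genotype strings (ties broken arbitrarily)."""
    counts = Counter(allele for g in genotypes for allele in g)
    return min(counts, key=counts.get)


def additive_code(genotype, minor):
    """Additive coding: number of copies (0, 1 or 2) of the minor allele in one genotype."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return sum(1 for allele in genotype if allele == minor)


cohort = ["AA", "AG", "AA", "GG"]
m = minor_allele(cohort)                       # 'G' (3 copies vs. 5 for 'A')
codes = [additive_code(g, m) for g in cohort]  # [0, 1, 0, 2]
```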

thus completing the preprocessing of the heterogeneous multi-modal imaging genetics data;

secondly, performing feature analysis by using an SPLPS heterogeneous multi-modal feature selection method:

the data of each modality of each sample obtained in the first step are taken as input, and multi-modal joint feature selection is performed; the feature selection objective is:

in formula (1), n is the number of samples, M is the number of modalities,

d denotes the feature dimension, y_i denotes the class label of the i-th sample, Y = [y_1, …, y_i, …, y_n]^T ∈ R^n is the label vector of the n samples, w_m is the weight vector of the m-th modality, and v_m ∈ R^n is the self-paced sample weight vector; each element of the matrix

represents the neighborhood relation between the samples of the m-th modality, and locality preserving projection is used to preserve the neighborhood structure of the sample points in sample space:

if the i-th and j-th samples satisfy the k-nearest-neighbor condition, a k-neighborhood relation exists between them; otherwise, no k-neighborhood relation exists between them. This is described by the following formula:

in formula (2), the parameter σ can be set to 1 without loss of generality, and K_m is the weight matrix characterizing the neighborhoods of the sample points;

λ is the regularization parameter that constrains feature sparsity, and μ is the regularization parameter that constrains the multi-modal association of the samples. This completes the feature analysis with the SPLPS heterogeneous multi-modal feature selection method;
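Formula (2) did not survive extraction. The sketch below assumes the usual locality-preserving-projection heat kernel, K_m(i, j) = exp(−‖x_i − x_j‖² / σ), for pairs in which either sample is among the k nearest Euclidean neighbors of the other, and 0 otherwise. It also builds the Laplacian L_m = D_m − K_m, where D_m is diagonal with row sums of K_m, which is the matrix that appears once the neighborhood term is rewritten. The exact scaling of the exponent is an assumption, and the function names are hypothetical.

```python
import math


def knn_heat_kernel(X, k=5, sigma=1.0):
    """Symmetric k-NN heat-kernel weight matrix for samples X (list of feature vectors)."""
    n = len(X)
    d2 = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    # k nearest neighbours of each sample by Euclidean distance, excluding the sample itself
    neighbours = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: d2[i][j])
        neighbours.append(set(others[:k]))
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # linked if either sample is among the other's k nearest neighbours
            if i != j and (j in neighbours[i] or i in neighbours[j]):
                K[i][j] = math.exp(-d2[i][j] / sigma)
    return K


def laplacian(K):
    """Graph Laplacian L = D - K, where D is diagonal with D[i][i] = sum_j K[i][j]."""
    n = len(K)
    return [[(sum(K[i]) if i == j else 0.0) - K[i][j] for j in range(n)] for i in range(n)]


X = [[0.0], [1.0], [3.0]]
K = knn_heat_kernel(X, k=1)  # pairs (0,1) and (1,2) are linked, (0,2) is not
L = laplacian(K)
```

Using the "either is a neighbor of the other" rule keeps K_m symmetric, which the Laplacian identity relies on. The embodiment later uses k = 5 and σ = 1, which match the defaults here.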

thirdly, optimizing the objective function and solving for w_m and v_m: the objective function of formula (1) in the second step is optimized by alternately updating the variables,

step 3.1, fix v_m and optimize w_m: the objective function is now:

the first term of equation (3) is transformed as follows:

the first term of equation (3) can be converted into:

for the third term of formula (3), let

then formula (3) can be converted into

In formula (5),

is the Laplacian matrix of the m-th modality,

and the objective becomes:

define a matrix P_m,

is the diagonal element of P_m:

is the i-th row of w_m, i.e., the weight of the i-th feature of the m-th modality, which gives

Since "2" and "λ" are both coefficients, the factor "2" can be absorbed into "λ", and the objective function becomes:

Taking the derivative of formula (9) and setting it to zero gives
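Formulas (3) to (10) are also missing. The sketch assumes the subproblem with v_m fixed is Σ_i v_i (y_i − w^T x_i)² + λ‖w‖₁ + μ w^T X^T L X w, with the factor 2 of the pairwise neighborhood sum absorbed into μ. Under the reweighting the text describes, P = diag(1 / (2|w_i|)) is evaluated at the previous iterate, and the gradient of λ‖w‖₁ is replaced by 2λPw. Setting the gradient to zero then gives (X^T V X + λP + μ X^T L X) w = X^T V y with V = diag(v). The sketch performs one such update in plain Python. The small `eps` that keeps P finite when a weight reaches zero, and the helper names, are assumptions.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A square and non-singular)."""
    n = len(A)
    M = [[float(a) for a in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def update_w(X, y, v, L, w_prev, lam, mu, eps=1e-8):
    """One reweighted update: solve (X^T V X + lam*P + mu*X^T L X) w = X^T V y.

    X: n x d samples of one modality, y: labels, v: self-paced weights,
    L: n x n Laplacian, w_prev: previous weights used to build P = diag(1 / (2(|w_i| + eps))).
    """
    n, d = len(X), len(X[0])
    XtVX = [[sum(v[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(d)] for a in range(d)]
    XtVy = [sum(v[i] * X[i][a] * y[i] for i in range(n)) for a in range(d)]
    LX = [[sum(L[i][j] * X[j][b] for j in range(n)) for b in range(d)] for i in range(n)]
    XtLX = [[sum(X[i][a] * LX[i][b] for i in range(n)) for b in range(d)] for a in range(d)]
    A = [
        [
            XtVX[a][b] + mu * XtLX[a][b]
            + (lam / (2.0 * (abs(w_prev[a]) + eps)) if a == b else 0.0)
            for b in range(d)
        ]
        for a in range(d)
    ]
    return solve_linear(A, XtVy)


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
L0 = [[0.0] * 3 for _ in range(3)]
w = update_w(X, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], L0, w_prev=[1.0, 1.0], lam=0.0, mu=0.0)
# With lam = mu = 0 this reduces to ordinary least squares; here w == [1.0, 2.0]
```

In practice this update would be repeated, rebuilding P from the latest w each time, and alternated with the v_m update of step 3.2.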

Step 3.2, fix w_m and optimize v_m:

The objective function is now:

In formula (11),

wherein

k' is an auxiliary parameter, k' > k > 0,

Taking the derivative of formula (11) with respect to vⁱ gives

In formula (12), l_im denotes the loss function:

where i denotes the i-th sample and m denotes the m-th modality; from the above, vⁱ is solved as

This completes the alternating computation of the variables w_m and v_m;
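The closed form for vⁱ is also lost. The sketch assumes two things. First, the per-sample loss is the squared residual l_im = (y_i − w_m^T x_i^m)². Second, the self-paced regularizer has a mixture-weighting form with two thresholds k < k'. Minimizing v·l + f(v; k, k') over v ∈ [0, 1] under that form gives v = 1 if l ≤ k and v = 0 if l ≥ k'. In between, v = ζ(1/l − 1/k') with ζ = kk'/(k' − k), which falls continuously from 1 to 0. Both the loss and the regularizer are assumptions. The sketch only shows how 'easy' (low-loss) samples receive full weight while 'hard' ones are down-weighted or excluded.

```python
def squared_losses(X, y, w):
    """Per-sample loss l_i = (y_i - w^T x_i)^2 for one modality (assumed loss)."""
    return [(yi - sum(a * b for a, b in zip(xi, w))) ** 2 for xi, yi in zip(X, y)]


def self_paced_weights(losses, k, k_prime):
    """Mixture-weighting self-paced weights with thresholds 0 < k < k_prime (assumed form)."""
    if not 0 < k < k_prime:
        raise ValueError("require 0 < k < k_prime")
    zeta = k * k_prime / (k_prime - k)
    weights = []
    for loss in losses:
        if loss <= k:            # easy sample: full weight
            weights.append(1.0)
        elif loss >= k_prime:    # too hard (possible noise/outlier): excluded this round
            weights.append(0.0)
        else:                    # in between: weight decays smoothly from 1 to 0
            weights.append(zeta * (1.0 / loss - 1.0 / k_prime))
    return weights


losses = squared_losses([[1.0], [2.0], [3.0]], [1.0, 3.2, 5.0], [1.0])  # about [0.0, 1.44, 4.0]
v = self_paced_weights(losses, k=1.0, k_prime=2.0)                       # [1.0, ~0.389, 0.0]
```

Self-paced schemes commonly raise the thresholds between rounds so that harder samples are admitted gradually. This sketch leaves that schedule to the caller.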

fourthly, feature selection:

the objective function is solved, and the features with non-zero weights are selected;
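With the iteratively reweighted ℓ1 update described in step 3.1, weights shrink towards zero but rarely become exactly 0.0 in floating point. An implementation would therefore typically treat weights below a small tolerance as zero. In the minimal sketch below, the tolerance value and the function name are assumptions.

```python
def select_features(w, tol=1e-6):
    """Indices of features whose learned weight is non-zero (|w_i| > tol)."""
    return [idx for idx, wi in enumerate(w) if abs(wi) > tol]


selected = select_features([0.0, 0.31, -2e-9, -1.7])  # [1, 3]
```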

fifthly, multi-kernel support vector machine fusion:

step 5.1, compute the kernel matrix of each modality separately; the linear kernel function of the m-th modality is

step 5.2, grid-search the fusion coefficient of each modality over the range [0, 1], and use ten-fold cross-validation to find the fusion coefficients ρ_m that give the best classification performance;
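Steps 5.1 and 5.2 can be sketched as follows. The sketch assumes that the fusion coefficients are constrained to sum to 1, a common convention for this kind of multi-kernel SVM that the text does not state, and that they are searched on a grid with step 0.1. `cv_score` is a hypothetical callback. It should run ten-fold cross-validation of an SVM on a precomputed kernel matrix and return its accuracy. For example, scikit-learn's `SVC(kernel="precomputed")` accepts such matrices.

```python
import itertools


def linear_kernel(X):
    """Linear kernel matrix K[i][j] = <x_i, x_j> for one modality."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def fuse_kernels(kernels, rho):
    """Weighted sum of per-modality kernel matrices: sum_m rho_m * K_m."""
    n = len(kernels[0])
    return [[sum(r * Km[i][j] for r, Km in zip(rho, kernels)) for j in range(n)] for i in range(n)]


def simplex_grid(n_modalities, step=0.1):
    """All weight vectors with entries in {0, step, ..., 1} that sum to 1."""
    units = round(1 / step)
    for combo in itertools.product(range(units + 1), repeat=n_modalities):
        if sum(combo) == units:
            yield [c / units for c in combo]


def best_fusion(kernels, cv_score, step=0.1):
    """Grid-search rho; cv_score(fused_kernel) is a caller-supplied (hypothetical) CV accuracy."""
    best_rho, best = None, float("-inf")
    for rho in simplex_grid(len(kernels), step):
        score = cv_score(fuse_kernels(kernels, rho))
        if score > best:
            best_rho, best = rho, score
    return best_rho, best


kernels = [linear_kernel(Xm) for Xm in ([[1.0], [2.0]], [[0.5], [-1.0]])]
rho, score = best_fusion(kernels, cv_score=lambda K: -abs(K[0][1]))  # toy score, not a CV accuracy
# rho == [0.2, 0.8] for this toy score
```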

step 5.3, after the multi-modal kernel functions are fused, the fused kernel is obtained as

from which the dual form of the multi-kernel support vector machine is obtained:

α_i ≥ 0, i = 1, 2, …, n (14); in formula (14), α_i is the Lagrange multiplier of the i-th sample. This completes the fusion training of the multi-kernel support vector machine;
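The fused kernel and the dual problem are missing from the extracted text. The sketch below assumes the standard multi-kernel SVM form with labels y_i ∈ {−1, +1}, fusion weights ρ_m ≥ 0 that sum to 1, and the per-modality linear kernels of step 5.1. Under those assumptions they would read:

```latex
k(x_i, x_j) = \sum_{m=1}^{M} \rho_m\, k_m(x_i^m, x_j^m), \qquad k_m(x_i^m, x_j^m) = (x_i^m)^{\top} x_j^m

\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
  - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j \sum_{m=1}^{M}\rho_m k_m(x_i^m, x_j^m)
\quad \text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\qquad \alpha_i \ge 0,\ i = 1, 2, \dots, n
```

A soft-margin variant would add the bound α_i ≤ C. The text shows only α_i ≥ 0.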

sixthly, classifying and predicting:

the parameters α_i obtained by training in the fifth step are substituted into formula (15) below. For a given new test sample x⁰, the decision function that determines its label is defined as formula (15),

in formula (15), sign(·) is the sign function, b is the bias, and f(x⁰) is the predicted result for the new test sample x⁰;
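Formula (15) itself is not in the extracted text. The sketch assumes it is the usual SVM decision rule on the fused kernel, f(x⁰) = sign(Σ_i α_i y_i Σ_m ρ_m k_m(x_i^m, x⁰_m) + b). Under that assumption, prediction can be sketched as below. The helper names, and the choice to map a zero score to +1, are assumptions.

```python
def fused_kernel_row(train, test, rho):
    """k(x_i, x0) = sum_m rho_m <x_i^m, x0^m> for every training sample i.

    train[m][i] is the m-th modality feature vector of training sample i;
    test[m] is the m-th modality feature vector of the new sample x0.
    """
    n = len(train[0])
    return [
        sum(r * sum(a * b for a, b in zip(train[m][i], test[m])) for m, r in enumerate(rho))
        for i in range(n)
    ]


def decide(alpha, y, k_row, b):
    """Decision f(x0) = sign(sum_i alpha_i y_i k(x_i, x0) + b); a zero score maps to +1."""
    score = sum(a * yi * kv for a, yi, kv in zip(alpha, y, k_row)) + b
    return 1 if score >= 0 else -1


train = [[[1.0], [2.0]], [[0.0], [1.0]]]  # M = 2 modalities, n = 2 training samples
test = [[3.0], [1.0]]
row = fused_kernel_row(train, test, rho=[0.5, 0.5])             # [1.5, 3.5]
label = decide(alpha=[1.0, 1.0], y=[1, -1], k_row=row, b=0.0)   # 1.5 - 3.5 < 0, so -1
```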

thus, feature selection is performed with the SPLPS heterogeneous multi-modal imaging genetics data feature analysis method, and the heterogeneous multi-modal imaging genetics features are classified with the multi-kernel support vector machine.

Compared with the prior art, the prominent substantive features and notable progress of the invention are as follows:

(1) The method of the invention simultaneously considers the structural relationships among samples and the 'difficulty' of each sample during training. It performs feature selection on multi-modal data with a heterogeneous multi-modal imaging genetics feature selection method based on sample weighting and structured sparsity. First, the features are constrained with an L1 norm. At the same time, locality preserving projection is adopted, with the feature weight matrix serving as the projection matrix in the dimensionality reduction process, so that the neighborhood structure of the sample points in sample space is preserved. Then a self-paced learning mechanism takes the 'difficulty' of each sample into account during training, so that the training set grows automatically. The SPLPS feature selection method can thus account for both the neighborhood structure of the sample points and the differences (difficulty) between samples during feature selection. Whether a sample enters the next iteration is decided by its difficulty (confidence): simple, high-confidence samples are selected first, and difficult samples are added gradually. This training scheme, together with the L1 regularization term, prevents noise points and outliers from affecting the model, selects highly discriminative features, and achieves better classification and prediction.

(2) Compared with other feature selection methods, the SPLPS method of the invention describes the high-order relationships among samples by constructing the k-nearest-neighbor relationships of the sample points. It makes full use of the prior distribution knowledge among the samples and of the internal information of each modality, and it preserves the original neighborhood relationships among the samples. This helps select more discriminative features and improves the accuracy of classification prediction.

(3) The method of the invention considers the 'difficulty' of the sample data during training and adopts a self-paced learning strategy, so that samples are selected from 'simple' to 'complex' and the training set grows automatically.

(4) The method not only reduces the influence of noise points and outliers on the model through a regularization term, but also excludes some noisy samples through the sample confidence, which improves the robustness of the model.

(5) CN109770932A discloses a method for processing multi-modal brain neuroimaging features, which performs feature analysis on multi-modal data with a multi-modal feature selection method based on sample weighting and low-rank constraints. That method does not consider the 'difficulty' of the data. It treats simple, general knowledge and complex, specialized knowledge alike, adds all data (including noise points or outliers) to training indiscriminately, and cannot effectively eliminate the influence of noisy samples on the model. Compared with CN109770932A, the method of the invention decides whether a sample enters the next iteration according to its confidence: 'simple', high-confidence samples are selected first, and 'difficult' samples are added gradually. The self-growing training scheme and the regularization term prevent noise points and outliers from affecting the model and make the whole algorithm more robust.

(6) CN111462116A discloses a multi-modal parametric model optimization and fusion method based on radiomics features, which obtains low-dimensional radiomics features by gradient dimensionality reduction of high-dimensional radiomics features and ignores the internal structure of the data during dimensionality reduction. Compared with CN111462116A, the method of the invention uses locality preserving projection to construct the near and far affinity relationships between every pair of samples in the space and preserves these relationships in the projection. The local neighborhood relationships of the samples are therefore retained while the dimensionality is reduced, which provides richer information.

Drawings

The invention is further illustrated with reference to the following figures and examples.

FIG. 1 is a schematic diagram of the processing flow applied by the method of the present invention to multi-modal imaging genetics data, based on the SPLPS method and the multi-kernel support vector machine.

Detailed Description

As shown in FIG. 1, the processing flow of the heterogeneous multi-modal imaging genetics data feature analysis based on the SPLPS feature selection method and the multi-kernel support vector machine is: preprocessing the heterogeneous multi-modal imaging genetics data → feature analysis with the SPLPS heterogeneous multi-modal feature selection method → optimizing the objective function and solving for w_m and v_m → feature selection → multi-kernel support vector machine fusion → classification and prediction.

Examples

The heterogeneous multi-modal imaging genetics data feature analysis method of this embodiment uses the SPLPS heterogeneous multi-modal imaging genetics feature selection method to mine biomarkers and then uses a multi-kernel support vector machine for fusion classification. The specific steps are as follows:

firstly, preprocessing the heterogeneous multi-modal imaging genetics data:

step 1.1, preprocessing neuroimaging data:

the preprocessed multi-modal imaging data are VBM-processed magnetic resonance images (voxel-based morphometry), FDG-PET images (fluorodeoxyglucose positron emission tomography), and F-18 amyloid positron emission tomography images, which can effectively display the in vivo content of neuritic plaques. These data are aligned to the scan of the same visit and then resampled to 2 × 2 × 2 mm³ voxels in the standard Montreal Neurological Institute (MNI) space. Normalized gray matter density maps are created from the magnetic resonance image data, and the FDG-PET and F-18 amyloid PET images are registered to the same space with the Statistical Parametric Mapping (SPM) software package. Measurements are then taken over 116 regions of interest (ROIs), from which the FDG-PET glucose metabolic rate, the gray matter density of the VBM-processed magnetic resonance images, and the amyloid deposition measured by F-18 amyloid PET are extracted. After the cerebellum is removed, the imaging measurements of the remaining 90 ROIs of each imaging modality are used as features;

step 1.2, gene data preprocessing:

in formula (1), n is the number of samples, M is the number of modalities,

d denotes the feature dimension, y_i denotes the class label of the i-th sample, Y = [y_1, …, y_i, …, y_n]^T ∈ R^n is the label vector of the n samples, w_m is the weight vector of the m-th modality, and v_m ∈ R^n is the self-paced sample weight vector; each element of the matrix

represents the neighborhood relationship between the samples of the m-th modality, and locality preserving projection is used to preserve the neighborhood structure of the sample points in sample space:

if the i-th and j-th samples satisfy the k-nearest-neighbor condition, a k-neighborhood relation exists between them; otherwise, none exists. The k-neighborhood describes the structural relationship between sample points in the feature space: k is a constant, and for each sample point the k sample points closest to it in Euclidean distance are found. This is described by the following formula:

λ is the regularization parameter that constrains feature sparsity, and μ is the regularization parameter that constrains the multi-modal association of the samples. This completes the feature analysis with the SPLPS heterogeneous multi-modal feature selection method;

step 3.1, fix v_m and optimize w_m: the objective function is now:

the first term of formula (3) is transformed as follows:

the first term of formula (3) can then be written as:

for the third term of formula (3), let

then formula (3) can be converted into

In formula (5),

is the element in the i-th row and i-th column of D_m,

is the Laplacian matrix of the m-th modality,

and the objective becomes:

define a matrix P_m,

is the diagonal element of P_m:

is the i-th row of w_m; then

Here "2" and "λ" are both coefficients and can be merged by absorbing the factor "2" into "λ"; the objective function then becomes:

Step 3.2, fix w_m and optimize v_m:

The objective function is now:

In formula (11),

wherein

k' is an auxiliary parameter, k' > k > 0,

Taking the derivative of formula (11) with respect to vⁱ gives

In formula (12), l is the loss function matrix, and l_im denotes the loss function:

where i denotes the i-th sample and m denotes the m-th modality; from the above formula, vⁱ is solved as

This completes the alternating computation of the variables w_m and v_m. In this example, M = 4 and n = 371. In the neuroimaging modalities, w_m is initialized as a random vector of size 116 × 1, and in the genetic modality as a random vector of size 85 × 1. The number of nearest neighbors k is set to 5 and σ = 1. After tuning, λ = 10^-1 and μ = 10;

Fourthly, feature selection:

fifthly, fusing a multi-core support vector machine:

Step 5.3, after the multi-modal kernel functions are fused, the fused kernel is obtained as

α_i ≥ 0, i = 1, 2, …, n (14),

in formula (14), α_i is the Lagrange multiplier of the i-th sample, which completes the fusion training of the multi-kernel support vector machine;

sixthly, classifying and predicting:

In this embodiment, the choice of k for the k-nearest neighbors is important when the weight matrix of sample-point neighborhoods is constructed. If k is too small, the structural relationships among the sample points are not represented. If k is too large, the k-neighborhood may contain samples of different classes, which degrades the result. In this embodiment, the classification accuracy exceeds 95%.

The invention fully analyzes the importance of each sample to the classification model and how the number of neighbors per sample affects the model. Samples are weighted through self-paced learning, i.e., a self-paced sample weight vector v is introduced, and are ranked by confidence during the iterations. Simple, high-confidence samples, i.e., those with small loss values, are selected first, and difficult samples are added gradually, while the sample weights v_m are solved during sample selection. In addition, the k value of the k-nearest neighbors in locality preserving projection is chosen by experimentally comparing different values. Selecting the best k markedly improves the accuracy with which the affected brain regions and related risk genes are mined, as well as the accuracy of classification prediction.

Matters not described in this specification are applicable to the prior art.

Claims

1. A feature analysis method for heterogeneous multi-modal imaging genetics data, characterized by comprising the following steps:

in formula (1), n is the number of samples, M is the number of modalities,

d denotes the feature dimension, y_i denotes the class label of the i-th sample, Y = [y_1, …, y_i, …, y_n]^T ∈ R^n is the label vector of the n samples, w_m is the weight vector of the m-th modality, and v_m ∈ R^n is the self-paced sample weight vector of the m-th modality; λ is the regularization parameter that constrains feature sparsity, and μ is the regularization parameter that constrains the multi-modal association of the samples;

wherein

k' is an auxiliary parameter, k' > k > 0, and vⁱ is the self-paced sample weight vector of the i-th sample; K_m is the weight matrix describing the neighborhood relations of the sample points, and each element of the weight matrix

alternately computing the variables w_m and v_m to optimize the objective function;

selecting, from the obtained solution, the features whose weights in w_m are non-zero, and determining the locations of the affected brain regions and the related risk genes from these features, so as to complete the feature analysis of the heterogeneous multi-modal imaging genetics data.

2. The analysis method of claim 1, wherein the multi-modal imaging data comprise voxel-based morphometry-processed magnetic resonance images, fluorodeoxyglucose positron emission tomography images, and F-18 amyloid positron emission tomography images, and the genetic data comprise single nucleotide polymorphism data from the ADNI database, including APOE.

3. A diagnostic method for brain diseases, characterized in that the analysis method of claim 1 is used to mine biomarkers and obtain feature vectors, and the sample labels and the heterogeneous multi-modal feature vectors obtained after feature selection are input into a multi-kernel support vector machine for classification prediction.

4. A heterogeneous multi-modal imaging genetics data feature analysis method, characterized in that the SPLPS heterogeneous multi-modal imaging genetics feature selection method is used to mine biomarkers and a multi-kernel support vector machine is then used for fusion classification, the method comprising the following specific steps:

firstly, preprocessing the heterogeneous multi-modal imaging genetics data:

step 1.1, preprocessing neuroimaging data:

the preprocessed multi-modal imaging data, namely voxel-based morphometry-processed magnetic resonance images, fluorodeoxyglucose positron emission tomography images, and F-18 amyloid positron emission tomography images, are aligned to the scan of the same visit and then resampled to 2 × 2 × 2 mm³ voxels in the standard Montreal Neurological Institute (MNI) space. Normalized gray matter density maps are created from the magnetic resonance image data, and the fluorodeoxyglucose and F-18 amyloid positron emission tomography images are registered to the same space with the Statistical Parametric Mapping (SPM) software package. Measurements are then taken over 116 regions of interest, from which the fluorodeoxyglucose positron emission tomography glucose metabolic rate, the gray matter density of the voxel-based morphometry-processed magnetic resonance images, and the amyloid deposition measured by F-18 amyloid positron emission tomography are extracted. After the cerebellum is removed, the imaging measurements of the remaining 90 regions of interest of each imaging modality are used as features;

step 1.2, gene data preprocessing:

for the single nucleotide polymorphism (SNP) gene data from the ADNI database to be preprocessed: APOE, located on chromosome 19, is regarded as a risk gene and is associated with neuronal development, brain plasticity and repair; using ANNOVAR annotation information, the SNPs within ±20 kbp of the APOE gene boundary are studied, comprising 85 SNP loci; each SNP value is the number of minor alleles, coded additively as 0, 1 or 2;
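The additive 0/1/2 coding can be illustrated with a short sketch. It assumes genotypes are available as two-letter strings (for example "AG") together with the minor allele of each SNP; that input format and the helper names are hypothetical, and missing genotypes are not handled.

```python
import numpy as np


def additive_code(genotype: str, minor_allele: str) -> int:
    """Number of minor-allele copies in a biallelic genotype, e.g. 'AG' -> 1."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-letter genotype, got {genotype!r}")
    return sum(allele == minor_allele for allele in genotype)


def encode_snps(genotypes, minor_alleles) -> np.ndarray:
    """Encode a samples x SNPs table of genotype strings as a 0/1/2 matrix.

    `genotypes[i][j]` is sample i's genotype at SNP j, and `minor_alleles[j]`
    is the minor allele of SNP j (for APOE this table would have 85 columns).
    """
    return np.array(
        [[additive_code(g, a) for g, a in zip(row, minor_alleles)] for row in genotypes],
        dtype=int,
    )
```

For example, with minor allele "A", the genotypes "GG", "AG" and "AA" are coded as 0, 1 and 2.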

secondly, multi-modal joint feature selection:

taking the data of each modality of each sample obtained in the first step as input, multi-modal joint feature selection is performed; the feature selection objective function is:

in formula (1), $n$ is the number of samples, $M$ is the number of modalities,

$d$ is the feature dimension, $y^i$ is the class label of the $i$-th sample, $Y = [y^1, \ldots, y^i, \ldots, y^n]^T \in \mathbb{R}^n$ is the label vector of the $n$ samples, $w_m$ is the weight vector of the $m$-th modality, and $v_m \in \mathbb{R}^n$ is the self-paced sample weight vector;

$\lambda$ is the regularization parameter of the sparsity constraint, and $\mu$ is the regularization parameter of the term that jointly constrains the samples across modalities;

each element of the weight matrix is defined as

in formula (2), $\sigma$ is a constant and $K_m$ is the weight matrix characterizing the neighborhood of the sample points;
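Formula (2) itself is not reproduced in this text, so the following is only a sketch of a common locality-preserving construction, assumed here: a symmetric k-nearest-neighbor graph with heat-kernel weights $\exp(-\|x_i - x_j\|^2/\sigma)$, its degree matrix $D$ and Laplacian $L = D - K$. The patent calls its matrix a hypergraph Laplacian; the ordinary graph Laplacian below is a stand-in, and `n_neighbors` (default chosen arbitrarily) is a hypothetical parameter.

```python
import numpy as np


def heat_kernel_graph(X: np.ndarray, n_neighbors: int = 5, sigma: float = 1.0):
    """k-NN heat-kernel weights K, degree matrix D and Laplacian L = D - K.

    X holds one sample per row. Assumed weight form:
        K_ij = exp(-||x_i - x_j||^2 / sigma)  if i, j are k-nearest neighbors
        K_ij = 0                              otherwise
    """
    n = X.shape[0]
    if not 0 < n_neighbors < n:
        raise ValueError("n_neighbors must lie in [1, n - 1]")
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    masked = dist2.copy()
    np.fill_diagonal(masked, np.inf)  # a sample is not its own neighbor
    nearest = np.argsort(masked, axis=1)[:, :n_neighbors]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = True
    adj = adj | adj.T  # keep an edge if either endpoint lists the other
    K = np.where(adj, np.exp(-dist2 / sigma), 0.0)
    D = np.diag(K.sum(axis=1))
    return K, D, D - K
```

Symmetrizing the neighbor relation keeps $K$ symmetric, so $L$ is positive semidefinite and a penalty of the form $w^T X^T L X w$ stays convex.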

which completes the feature analysis with the SPLPS heterogeneous multi-modal feature selection method;

thirdly, optimizing the objective function and solving for $w_m$ and $v_m$: the objective function of formula (1) in the second step is optimized by alternating optimization, updating one variable while the other is held fixed;

step 3.1, fix $v_m$ and optimize $w_m$; the objective function then becomes:

for the first term of equation (3), define:

the first term of equation (3) translates to:

for the third term of formula (3), let

The third term of formula (3) is converted into

In formula (5),

is the element in the $i$-th row and $i$-th column of $D_m$, and

is the hypergraph Laplacian matrix of the $m$-th modality;

for the second term of formula (3), a matrix $P_m$ is defined,

is the $i$-th diagonal element of $P_m$:

is the $i$-th row of $w_m$.
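The diagonal elements of $P_m$ are not reproduced in this text. A sketch of the standard iterative-reweighting construction, assumed here, sets $P_{ii} = 1/(2\|w^i\|_2)$ at the current iterate; with that choice $2\,\mathrm{tr}(W^T P W) = \sum_i \|w^i\|_2$, which is the L1 norm when $w_m$ is a vector. This also matches the remark that the factor 2 can be absorbed into λ. The `eps` guard and function names are hypothetical.

```python
import numpy as np


def row_norms(W) -> np.ndarray:
    """Euclidean norm of each row; for a 1-D weight vector this is |w_i|."""
    W = np.asarray(W, dtype=float)
    return np.linalg.norm(W.reshape(W.shape[0], -1), axis=1)


def reweight_matrix(W, eps: float = 1e-8) -> np.ndarray:
    """Diagonal P with P_ii = 1 / (2 ||w^i||_2), computed at the current W.

    With this P, 2 * tr(W^T P W) equals sum_i ||w^i||_2 (the L2,1 norm, or the
    L1 norm when W is a vector), so the non-smooth penalty is replaced by a
    quadratic one; eps guards rows that are already zero.
    """
    return np.diag(1.0 / (2.0 * np.maximum(row_norms(W), eps)))
```

Because $P_m$ depends on $w_m$, it is recomputed after every $w_m$ update, which is what makes the quadratic surrogate track the sparse penalty.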

since "2" and "λ" are both coefficients, the factor 2 is absorbed into λ, and the objective function of formula (3) to be optimized is expressed as formula (9):

taking the derivative of formula (9) with respect to $w_m$ and setting it to zero yields the closed-form update of $w_m$.
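Since formulas (9) and (10) are not reproduced here, the following sketch assumes a stand-in objective for one modality: a self-paced weighted squared loss, the quadratic sparsity surrogate $\lambda\, w^T P w$ and the locality term $\mu\, w^T X^T L X w$. Setting its gradient to zero gives a linear system. `X` (n × d_m modality data), `L` (n × n Laplacian) and `update_w` are assumed names, not the patent's notation.

```python
import numpy as np


def update_w(X, y, v, P, L, lam, mu):
    """Closed-form update of one modality's weight vector with v and P fixed.

    Assumed objective (a stand-in for formula (9)):
        sum_i v_i (y_i - x_i^T w)^2 + lam * w^T P w + mu * w^T X^T L X w
    Setting the gradient to zero (the common factor 2 cancels) gives
        (X^T V X + lam * P + mu * X^T L X) w = X^T V y.
    """
    V = np.diag(v)  # self-paced sample weights on the diagonal
    A = X.T @ V @ X + lam * P + mu * X.T @ L @ X
    b = X.T @ V @ y
    return np.linalg.solve(A, b)  # solve rather than invert, for stability
```

With `lam = mu = 0` and all `v_i = 1` this reduces to ordinary least squares; samples with `v_i = 0` simply drop out of the fit.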

Step 3.2, fix $w_m$ and optimize $v_m$:

The objective function at this time is:

in formula (11),

wherein

$k'$ is an auxiliary parameter with $k' > k > 0$;

taking the derivative of formula (11) with respect to $v^i$ gives

In formula (12), $l_{im}$ denotes the loss function:

where $i$ denotes the $i$-th sample and $m$ the $m$-th modality; the solution for $v^i$ obtained from the above formula is

which completes the alternating solution of the variables $w_m$ and $v_m$;
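The closed-form $v^i$ is not reproduced in this text. The condition $k' > k > 0$ suggests two loss thresholds, $1/k' < 1/k$, and the sketch below assumes the mixture-type self-paced regularizer $-\zeta \sum_i \ln(v^i + \zeta k)$ with $\zeta = 1/(k' - k)$. Minimizing it over $v^i \in [0, 1]$ gives full weight to easy samples, zero weight to hard ones and a soft weight in between. Summing the per-modality losses $l_{im}$ over $m$ is also an assumption.

```python
import numpy as np


def self_paced_weights(losses, k: float, k_prime: float) -> np.ndarray:
    """Closed-form soft self-paced weights under an assumed mixture regularizer.

    For each sample's loss l (per-modality losses l_im summed over m if a
    2-D n x M array is passed):
        v = 1                        if l <= 1/k'   (easy: fully used)
        v = 0                        if l >= 1/k    (hard: ignored for now)
        v = (1/l - k) / (k' - k)     otherwise      (soft weight in (0, 1))
    """
    if not k_prime > k > 0:
        raise ValueError("require k' > k > 0")
    l = np.asarray(losses, dtype=float)
    if l.ndim == 2:
        l = l.sum(axis=1)
    with np.errstate(divide="ignore"):
        v = (1.0 / l - k) / (k_prime - k)  # a zero loss gives +inf here
    return np.clip(v, 0.0, 1.0)  # clipping realizes the two hard thresholds
```

For example, with k = 1 and k' = 2, losses of 0.2, 0.8 and 3.0 receive weights 1, 0.25 and 0. To let harder samples enter gradually, the thresholds 1/k' and 1/k would be raised over the iterations; the schedule is not specified in this section.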

fourthly, feature selection:
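One simple way to read off the selected features once $w_m$ has converged, assuming features are kept when their learned weight is non-negligible and ranked by weight magnitude, is sketched below. The threshold `tol` and the optional `top` cut-off are hypothetical parameters.

```python
import numpy as np


def select_features(w, tol: float = 1e-6, top=None) -> list:
    """Indices of features with non-negligible weight, strongest first.

    `w` is one modality's learned weight vector (or a matrix whose rows are
    features); a feature's score is the Euclidean norm of its row.
    """
    w = np.asarray(w, dtype=float)
    scores = np.linalg.norm(w.reshape(w.shape[0], -1), axis=1)
    ranked = np.argsort(-scores, kind="stable")
    keep = [int(i) for i in ranked if scores[i] > tol]
    return keep if top is None else keep[:top]
```

Applying it per modality yields the reduced feature sets that are passed on to the kernel fusion stage.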

fifthly, multi-kernel support vector machine fusion:

Step 5.2, grid-search the fusion coefficient of each modality in the range [0, 1], and use ten-fold cross-validation to find the fusion coefficients $\rho_m$ that give the best classification performance;
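The grid search can be sketched as follows, under stated assumptions: the coefficients are constrained to sum to 1, the grid step is 0.1, folds are drawn by a plain random split (not stratified), and the SVM itself is left to a caller-supplied `cv_score` hook (hypothetical), which could for instance wrap scikit-learn's `SVC(kernel="precomputed")`.

```python
import itertools
import numpy as np


def simplex_grid(n_modalities: int, step: float = 0.1):
    """Yield weight vectors on a regular grid in [0, 1] whose entries sum to 1."""
    ticks = int(round(1.0 / step))
    for combo in itertools.product(range(ticks + 1), repeat=n_modalities):
        if sum(combo) == ticks:
            yield np.array(combo, dtype=float) / ticks


def fuse_kernels(kernels, rho) -> np.ndarray:
    """Fused kernel K = sum_m rho_m * K_m over per-modality kernel matrices."""
    return sum(r * K for r, K in zip(rho, kernels))


def grid_search_rho(kernels, y, cv_score, step=0.1, n_folds=10, seed=0):
    """Return (best rho, best mean score) over the grid with n-fold CV.

    cv_score(K_train, y_train, K_test, y_test) -> float is supplied by the
    caller, e.g. a precomputed-kernel SVM's test accuracy (hypothetical hook).
    """
    y = np.asarray(y)
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    best_rho, best_score = None, -np.inf
    for rho in simplex_grid(len(kernels), step):
        K = fuse_kernels(kernels, rho)
        scores = []
        for test in folds:
            train = np.setdiff1d(np.arange(n), test)
            # Train-vs-train block for fitting, test-vs-train block for prediction.
            scores.append(cv_score(K[np.ix_(train, train)], y[train],
                                   K[np.ix_(test, train)], y[test]))
        if np.mean(scores) > best_score:
            best_rho, best_score = rho, float(np.mean(scores))
    return best_rho, best_score
```

With four modalities and a step of 0.1 the sum-to-one grid has 286 candidates, so the search costs 2,860 SVM fits under ten-fold cross-validation.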

Step 5.3, fusing the kernel functions of all modalities gives

which yields the dual form of the multi-kernel support vector machine;

$\alpha_i \ge 0, \quad i = 1, 2, \ldots, n$ (14);
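For orientation, the textbook dual of a multi-kernel SVM whose kernel is the weighted sum of the modality kernels, which is consistent with constraint (14), is given below. It is an assumed standard form rather than the patent's own formula: $k_m$ (kernel of modality $m$), $x_i^{(m)}$ (features of sample $i$ in modality $m$) and labels $y^i \in \{-1, +1\}$ are assumed notation, and soft-margin variants additionally bound $\alpha_i \le C$.

```latex
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y^i y^j
    \sum_{m=1}^{M} \rho_m \, k_m\big(x_i^{(m)}, x_j^{(m)}\big)
\quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y^i = 0, \qquad \alpha_i \ge 0,\; i = 1, \ldots, n
```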

sixthly, classifying and predicting:
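In the standard multi-kernel SVM, a new subject with modality features $x^{(m)}$ would be classified with the decision function below, using the optimal $\alpha_i$ from the dual, the fusion coefficients $\rho_m$ and a bias $b$. This is the textbook form, assumed here rather than taken from the patent.

```latex
f(x) = \operatorname{sign}\Big( \sum_{i=1}^{n} \alpha_i \, y^i \sum_{m=1}^{M} \rho_m \, k_m\big(x_i^{(m)}, x^{(m)}\big) + b \Big)
```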

5. The feature analysis method according to claim 4, characterized in that $k = 5$, $\sigma = 1$ and $M = 4$, and the optimized parameters are $\lambda = 10^{-1}$ and $\mu = 10$.