CN112288027A - Heterogeneous multi-modal image genetics data feature analysis method - Google Patents
- Publication number
- CN112288027A (application CN202011223328.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2136—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/40—Population genetics; Linkage disequilibrium
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B35/00—ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Abstract
The heterogeneous multi-modal imaging genetics data feature analysis method considers both the structural relationships among samples and the "difficulty" of each sample during training, and analyzes the features of brain imaging data and genetic data by means of sample weighting and structured sparsity. The method adopts a self-paced learning mechanism, so that samples are added automatically from simple to complex during training and the influence of noise on the model is reduced. In addition, locality preserving projection is introduced within the self-paced learning framework to preserve the inherent neighborhood structure of the sample points in the sample space, while an L1-norm constraint on the projection matrix serves as the regularization term that performs feature selection. Finally, the selected features are fused and classified with a multi-kernel support vector machine, which improves the accuracy of disease diagnosis. The method can effectively select and classify features.
Description
Technical Field
The technical scheme of the invention relates to pattern recognition methods, in particular to a heterogeneous multi-modal imaging genetics data feature analysis method.
Background
Alzheimer's disease, also known as senile dementia, is a common degenerative disease of the brain whose symptoms include memory impairment, impaired reasoning and cognition, and language and motor impairment. It is currently one of the major diseases endangering the health of the elderly, and its course is slow and irreversible. According to the pattern of cognitive decline and the degree of functional impairment, its progression can be divided into three stages: normal control, mild cognitive impairment and Alzheimer's disease. Given the pathogenesis of Alzheimer's disease, early detection and effective treatment can delay its progression. Numerous studies have shown that Alzheimer's disease is associated with structural atrophy, metabolic changes and pathological amyloid deposition in the brain. Commonly used brain imaging techniques include structural magnetic resonance imaging, functional magnetic resonance imaging, diffusion tensor imaging and positron emission tomography. Meanwhile, with the development of genetic techniques, researchers can search for genetic markers associated with neurological and psychiatric diseases at a finer molecular level (for example, single nucleotide polymorphisms).
In recent years, with continuing technological progress, more and more research has focused on the early diagnosis of Alzheimer's disease. Because the brain has a very complex structure and function, data acquired from a single modality cannot provide enough feature information for diagnosis. In imaging genetics, different modalities provide complementary information: for example, structural magnetic resonance imaging provides information about brain tissue type, while positron emission tomography measures the cerebral metabolic rate of glucose. Fusing multi-modal data makes it possible to discover information that cannot be found in a single modality. With the development of neuroimaging and genetic technology, multi-modal data can now be collected for each subject during examination, which provides a data source for the diagnosis of Alzheimer's disease.
Heterogeneous multi-modal imaging genetics data are high-dimensional and contain a large amount of information, and not all features are helpful for detecting and analyzing Alzheimer's disease. It is therefore important to remove redundant or weakly relevant features from the large number of features provided by brain images and genetic data, and to select the features relevant to the classification and prediction task. CN109770932A discloses a method for processing multi-modal brain neuroimaging features, which analyzes multi-modal data with a multi-modal feature selection method based on sample weighting and a low-rank constraint. That method does not consider the "difficulty" of the data and treats simple, general knowledge and complex, specialized knowledge alike; it adds all data (including noise points or outliers) to training indiscriminately and cannot effectively eliminate the influence of noisy samples on the model. CN111462116A discloses a multi-modal parameter model optimization and fusion method based on radiomics features, which obtains low-dimensional radiomics features from high-dimensional ones by gradient dimensionality reduction but ignores the internal structure of the data during the reduction.
In summary, the feature selection methods used in existing Alzheimer's disease diagnosis and classification techniques do not adequately consider the relationships between samples, so diagnostic classification is error-prone and its accuracy needs further improvement.
Disclosure of Invention
The technical task of the invention is to provide, in view of the above defects, a heterogeneous multi-modal imaging genetics data feature analysis method that considers both the structural relationships among samples and the "difficulty" of each sample during training. Brain imaging data and genetic data are analyzed by sample weighting and structured sparsity; the feature weight matrix serves as the projection matrix in the dimensionality reduction, and a sparsity constraint is applied to the feature weight matrix and the projection matrix at the same time. The method adopts a self-paced learning mechanism, so that samples are added automatically from simple to complex during training and the influence of noise on the model is reduced. In addition, locality preserving projection is introduced within the self-paced learning framework to preserve the inherent neighborhood structure of the sample points in the sample space, while an L1-norm constraint on the projection matrix serves as the regularization term that performs feature selection. Finally, the selected features are fused and classified with a multi-kernel support vector machine, which improves the accuracy of disease diagnosis. The method can effectively select and classify features.
Since the English term for self-paced learning is "Self-Paced Learning" (SPL), the English term for locality preserving projection is "Locality Preserving Projections" (LPP), and the English term for structured sparsity is "Structured Sparsity" (SS), the heterogeneous multi-modal imaging genetics feature selection method of the invention is hereinafter referred to as "SPLPS" for short.
The technical scheme adopted by the invention to solve the technical problem is as follows:
A heterogeneous multi-modal imaging genetics data feature analysis method comprises the following steps:
acquiring preprocessed heterogeneous multi-modal data of samples of a given brain disease, the data comprising genetic data and imaging data of different modalities, so that the data of each sample in each modality are obtained;
performing multi-modal joint feature selection on the preprocessed heterogeneous multi-modal data, wherein the feature selection objective function is formula (1):
in formula (1), n is the number of samples and M is the number of modalities; x_i^m is the feature column vector of the i-th sample in the m-th modality, X^m is the training set of the m-th modality, and d is the feature dimension; y_i is the class label of the i-th sample, and Y = [y_1, …, y_i, …, y_n]^T ∈ R^n is the label vector of the n samples; w_m is the weight vector of the m-th modality, and v_m ∈ R^n is the self-paced sample weight vector of the m-th modality; λ is the regularization parameter that constrains feature sparsity, and μ is the regularization parameter that constrains the multi-modal association of the samples; k′ is an auxiliary parameter with k′ > k > 0, and v_i is the self-paced sample weight vector of the i-th sample; K^m is the weight matrix describing the neighborhood relationships of the sample points, and each element K_ij^m represents the neighborhood relationship between samples i and j in the m-th modality. Locality preserving projection is used to preserve the neighborhood structure of the sample points in the sample space: a non-zero K_ij^m indicates that the i-th and j-th samples are k-neighbors, and a zero value indicates that they are not;
alternately computing the variables w_m and v_m to optimize the objective function;
and selecting, from the obtained solution, the features whose entries in the weight vector w_m are non-zero, thereby determining the locations of the lesioned brain regions and the related pathogenic genes, and completing the feature analysis of the heterogeneous multi-modal imaging genetics data.
In the heterogeneous multi-modal imaging genetics data feature analysis method, biomarkers are mined with the SPLPS heterogeneous multi-modal imaging genetics feature selection method, and fusion classification is then performed with a multi-kernel support vector machine. The specific steps are as follows:
First step, preprocessing the heterogeneous multi-modal imaging genetics data:
Step 1.1, preprocessing the neuroimaging data:
The preprocessed homogeneous multi-modal imaging data (magnetic resonance images processed by voxel-based morphometry, fluorodeoxyglucose positron emission tomography images, and F-18 florbetapir positron emission tomography amyloid images, which can effectively show neuritic plaque content in vivo) are aligned to the scans of the same visit and resampled to 2 × 2 × 2 mm³ voxels in the standard Montreal Neurological Institute (MNI) space. Normalized gray matter density maps are created from the magnetic resonance image data, and the fluorodeoxyglucose and F-18 florbetapir positron emission tomography images are registered to the same space with the Statistical Parametric Mapping (SPM) software package. 116 regions of interest are then measured, from which the glucose metabolic rate of the fluorodeoxyglucose positron emission tomography images, the gray matter density of the voxel-based morphometry magnetic resonance images, and the amyloid deposition of the F-18 florbetapir positron emission tomography images are extracted. After the cerebellum is removed, the imaging measurements of the 90 remaining regions of interest of each homogeneous imaging modality are used as features;
Step 1.2, preprocessing the genetic data:
For the preprocessed genetic data (single nucleotide polymorphisms) from the ADNI database, APOE (located on chromosome 19) is taken as the risk gene; it is associated with neuronal development and with brain plasticity and repair. Using ANNOVAR annotation information, the single nucleotide polymorphisms within ±20 kbp of the APOE gene boundary are studied, comprising 85 single nucleotide polymorphism loci. Each single nucleotide polymorphism is valued by additive coding as the number of minor alleles: 0, 1 or 2;
The preprocessing of the heterogeneous multi-modal imaging genetics data is thus completed;
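The additive coding of step 1.2 can be illustrated with a minimal Python sketch. The function name `additive_code`, the two-letter genotype format and the example alleles are hypothetical and are not taken from the ADNI data; the sketch only shows how a genotype maps to a minor-allele count of 0, 1 or 2.

```python
def additive_code(genotype: str, minor_allele: str) -> int:
    """Return the number of minor alleles (0, 1 or 2) in a two-letter genotype.

    The genotype format (e.g. "AG") and the choice of minor allele are
    illustrative assumptions; real SNP files may use another encoding.
    """
    if len(genotype) != 2:
        raise ValueError("expected a diploid genotype such as 'AG'")
    return sum(1 for allele in genotype if allele == minor_allele)


# One hypothetical SNP locus with minor allele "G" across four subjects.
codes = [additive_code(g, "G") for g in ["AA", "AG", "GA", "GG"]]
print(codes)  # [0, 1, 1, 2]
```

Applying such a function to each of the 85 loci would give one 85-dimensional feature vector per subject for the genetic modality.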
Second step, performing feature analysis with the SPLPS heterogeneous multi-modal feature selection method:
The data of each sample in each modality obtained in the first step are taken as input for multi-modal joint feature selection; the feature selection objective is:
in formula (1), n is the number of samples and M is the number of modalities; x_i^m is the feature column vector of the i-th sample in the m-th modality, X^m is the training set of the m-th modality, and d is the feature dimension; y_i is the class label of the i-th sample, and Y = [y_1, …, y_i, …, y_n]^T ∈ R^n is the label vector of the n samples; w_m is the weight vector of the m-th modality, and v_m ∈ R^n is the self-paced sample weight vector; each element K_ij^m of the matrix K^m represents the neighborhood relationship between samples i and j in the m-th modality. Locality preserving projection is used to preserve the neighborhood structure of the sample points in the sample space: a non-zero K_ij^m indicates that the i-th and j-th samples are k-neighbors, a zero value indicates that they are not, and K_ij^m is given by the following formula:
in formula (2), the parameter σ may be set to 1 without loss of generality; K^m is the weight matrix describing the neighborhoods of the sample points, λ is the regularization parameter that constrains feature sparsity, and μ is the regularization parameter that constrains the multi-modal association of the samples. Feature analysis with the SPLPS heterogeneous multi-modal feature selection method is thus completed;
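A neighborhood weight matrix of this kind can be sketched in plain Python as follows. Formula (2) is not reproduced in this text, so the heat-kernel form exp(−‖x_i − x_j‖² / σ), the symmetrization rule and the function name `knn_heat_kernel` are assumptions borrowed from common locality preserving projection practice, not the patent's exact definition.

```python
import math


def knn_heat_kernel(samples, k=5, sigma=1.0):
    """Build a symmetric k-nearest-neighbor affinity matrix.

    K[i][j] = exp(-||x_i - x_j||^2 / sigma) when j is one of the k nearest
    neighbors of i (or i one of those of j), and 0 otherwise. The kernel
    form and the symmetrization are illustrative assumptions.
    """
    n = len(samples)

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank the other samples by Euclidean distance and keep the k closest.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sq_dist(samples[i], samples[j]))
        for j in others[:k]:
            weight = math.exp(-sq_dist(samples[i], samples[j]) / sigma)
            K[i][j] = weight
            K[j][i] = weight  # keep the matrix symmetric
    return K


# Toy data: two tight pairs that lie far apart, with k = 1 neighbor.
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
K = knn_heat_kernel(X, k=1, sigma=1.0)
```

In this toy case only the entries linking samples 0 and 1, and samples 2 and 3, are non-zero, which is the "non-zero means k-neighbors" reading described above.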
Third step, optimizing the objective function and solving for w_m and v_m: the objective function of formula (1) in the second step is optimized by alternately solving for the variables,
Step 3.1, fixing v_m and optimizing w_m: the objective function is now:
The first term of formula (3) is transformed as follows:
so the first term of formula (3) can be converted into:
For the third term of formula (3), we have
Formula (3) can then be converted into
In formula (5), the Laplacian term is the hypergraph Laplacian matrix of the m-th modality,
and the objective now becomes:
where w_m^i denotes the i-th row of w_m, from which the following can be obtained. Here "2" and "λ" are both coefficients, so the two can be merged by absorbing the coefficient 2 into λ, and the objective function becomes:
Taking the derivative of formula (9) and setting it to 0 gives
Step 3.2, fixing w_m and optimizing v_m:
The objective function is now:
Differentiating formula (11) with respect to v_i gives
In formula (12), l_im denotes the loss function, where i indexes the i-th sample and m the m-th modality; from the above formula, the solution for v_i is obtained as
The alternating computation of the variables w_m and v_m is thus completed;
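The effect of the sample-weight update in step 3.2 can be illustrated with a minimal Python sketch. The closed-form solution for v_i is not reproduced in this text, so the sketch swaps in the standard hard-threshold rule from the self-paced learning literature (weight 1 when the loss is below a threshold, otherwise 0); the function name `self_paced_weights` and the loss values are hypothetical.

```python
def self_paced_weights(losses, threshold):
    """Hard self-paced weighting: keep samples whose loss is below threshold."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]


# Hypothetical per-sample losses; the last one mimics a noisy outlier.
losses = [0.05, 0.20, 0.60, 3.00]

# Raising the threshold admits samples from "simple" to "difficult".
schedule = {t: self_paced_weights(losses, t) for t in (0.1, 0.5, 1.0)}
for threshold, v in schedule.items():
    print(threshold, v)
```

With these values the outlier never enters training, while the other three samples are admitted one at a time as the threshold grows.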
Fourth step, feature selection:
After the objective function is solved, the features whose corresponding weights are non-zero are selected;
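The selection rule can be sketched in a few lines of Python. The function name `select_features`, the tolerance `tol` and the feature names are hypothetical; a small tolerance is used instead of an exact comparison with zero because numerical solvers rarely return exact zeros.

```python
def select_features(weights, names, tol=1e-8):
    """Return the names of features whose weight magnitude exceeds tol."""
    return [name for w, name in zip(weights, names) if abs(w) > tol]


# Hypothetical sparse weight vector for four regions of interest.
w = [0.0, 0.73, 0.0, -0.12]
rois = ["ROI_1", "ROI_2", "ROI_3", "ROI_4"]
selected = select_features(w, rois)
print(selected)  # ['ROI_2', 'ROI_4']
```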
Fifth step, multi-kernel support vector machine fusion:
Step 5.1, the kernel matrix of each modality is computed separately, wherein the linear kernel function of the m-th modality is
Step 5.2, the fusion coefficient of each modality is searched on a grid in the range [0, 1], and ten-fold cross-validation is used to find the fusion coefficients ρ_m with the best classification performance;
Step 5.3, after the multi-modal kernel functions are fused, the fused kernel is obtained, from which the dual form of the multi-kernel support vector machine follows;
α_i ≥ 0, i = 1, 2, …, n (14),
in formula (14), α_i is the Lagrange multiplier of the i-th sample; the fusion training of the multi-kernel support vector machine is thus completed;
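Steps 5.1 and 5.3 can be illustrated with a minimal Python sketch. The kernel formulas are not reproduced in this text, so the sketch assumes the usual definitions: a linear kernel equal to the inner product of two samples, and a fused kernel equal to the ρ-weighted sum of the per-modality kernels. The function names and the toy data are hypothetical.

```python
def linear_kernel(X):
    """Gram matrix of inner products between the rows (samples) of X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def fuse_kernels(kernels, rho):
    """Weighted sum of per-modality kernel matrices with coefficients rho."""
    n = len(kernels[0])
    return [[sum(r * Km[i][j] for r, Km in zip(rho, kernels))
             for j in range(n)] for i in range(n)]


# Two hypothetical modalities with two samples each.
X_img = [[1.0, 0.0], [0.0, 2.0]]   # imaging features
X_snp = [[1.0], [2.0]]             # genetic features
K_fused = fuse_kernels([linear_kernel(X_img), linear_kernel(X_snp)],
                       rho=[0.5, 0.5])
print(K_fused)  # [[1.0, 1.0], [1.0, 4.0]]
```

Because the fusion happens at the kernel level, modalities with different feature dimensions (here 2 and 1) can be combined without concatenating their features.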
Sixth step, classification and prediction:
The parameters α_i obtained by the training of the fifth step are substituted into formula (15) below; for a given new test sample x_0, the decision function that determines the sample label is defined as shown in formula (15),
in formula (15), sign() is the sign function, b is the bias, and f(x_0) is the predicted result for the new test sample x_0;
Feature selection is thus performed with the SPLPS heterogeneous multi-modal imaging genetics data feature analysis method, and the heterogeneous multi-modal imaging genetics features are classified with the multi-kernel support vector machine method.
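The prediction step can be sketched in Python as follows. Formula (15) is not reproduced in this text, so the sketch assumes the standard support vector machine decision function sign(Σ_i α_i y_i k(x_i, x_0) + b), with k the fused kernel; the function name `predict`, the numeric values and the convention of mapping a score of exactly 0 to +1 are hypothetical.

```python
def predict(alpha, labels, kernel_column, b):
    """Return the sign of sum_i alpha_i * y_i * k(x_i, x_0) + b.

    kernel_column[i] holds the fused kernel value between training
    sample i and the new test sample x_0.
    """
    score = sum(a * y * k for a, y, k in zip(alpha, labels, kernel_column)) + b
    return 1 if score >= 0 else -1


# Two hypothetical support vectors with labels +1 and -1.
alpha = [0.5, 0.5]
labels = [1, -1]
print(predict(alpha, labels, kernel_column=[2.0, 0.5], b=0.0))  # 1
print(predict(alpha, labels, kernel_column=[0.5, 2.0], b=0.0))  # -1
```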
Compared with the prior art, the prominent substantive features and notable progress of the invention are as follows:
(1) The method of the invention considers both the structural relationships among samples and the "difficulty" of each sample during training; that is, a heterogeneous multi-modal imaging genetics feature selection method based on sample weighting and low-rank constraint is used to select features from multi-modal data. First, the L1 norm is used to constrain the features; at the same time, locality preserving projection is adopted, with the feature weight matrix serving as the projection matrix in the dimensionality reduction, so that the neighborhood structure of the sample points in the sample space is preserved. A self-paced learning mechanism is then adopted, which takes the "difficulty" of the samples into account during training and realizes automatic growth of the sample set. The SPLPS feature selection method based on sample weighting and low-rank constraint can thus consider both the neighborhood structure of the sample points and the differences (difficulty) among samples during feature selection. Whether a sample joins the next iteration is decided by its difficulty (confidence): "simple" samples with high confidence are selected first, and "difficult" samples are then added gradually. This training scheme, together with the L1 regularization term, avoids the influence of noise points or outliers on the model, selects strongly discriminative features, and achieves a better classification and prediction result.
(2) Compared with other feature selection methods, the SPLPS method adopted by the invention describes the high-order relationships among samples by constructing the k-neighborhood relationships of the sample points. It makes full use of the prior distribution knowledge among samples and of the internal information of each modality, and retains the original neighborhood relationships among samples, which helps to select more discriminative features and improves the accuracy of classification and prediction.
(3) The method of the invention considers the "difficulty" of the sample data during training and adopts a self-paced learning strategy to select samples from "simple" to "complex", so that the sample set grows automatically.
(4) The method not only reduces the influence of noise points or outliers on the model through a regularization term, but also eliminates some noisy samples by introducing sample confidence, which improves the robustness of the model.
(5) CN109770932A discloses a method for processing multi-modal brain neuroimaging features, which analyzes multi-modal data with a multi-modal feature selection method based on sample weighting and a low-rank constraint. That method does not consider the "difficulty" of the data and treats simple, general knowledge and complex, specialized knowledge alike; it adds all data (including noise points or outliers) to training indiscriminately and cannot effectively eliminate the influence of noisy samples on the model. Compared with CN109770932A, the method of the invention decides whether a sample joins the next iteration according to its confidence: "simple" samples with high confidence are selected first, and "difficult" samples are then added gradually. The self-growing training scheme and the regularization term avoid the influence of noise points or outliers on the model and make the whole algorithm more robust.
(6) CN111462116A discloses a multi-modal parameter model optimization and fusion method based on radiomics features, which obtains low-dimensional radiomics features from high-dimensional ones by gradient dimensionality reduction but ignores the internal structure of the data during the reduction. Compared with CN111462116A, the method of the invention uses locality preserving projection to construct the near-and-far affinity relationships between sample pairs in the space and keeps these relationships in the projection, so that the local neighborhood relationships of the samples are preserved while the dimensionality is reduced, which provides richer information.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a schematic diagram of the processing flow applied to multi-modal imaging genetics data by the method of the invention, based on the SPLPS method and the multi-kernel support vector machine.
Detailed Description
The embodiment shown in FIG. 1 illustrates the processing flow of heterogeneous multi-modal imaging genetics data feature analysis based on the SPLPS feature selection method and the multi-kernel support vector machine: preprocessing the heterogeneous multi-modal imaging genetics data → performing feature analysis with the SPLPS heterogeneous multi-modal feature selection method → optimizing the objective function and solving for w_m and v_m → feature selection → multi-kernel support vector machine fusion → classification and prediction.
Examples
In the heterogeneous multi-modal imaging genetics data feature analysis method of this embodiment, biomarkers are mined with the SPLPS heterogeneous multi-modal imaging genetics feature selection method, and fusion classification is then performed with a multi-kernel support vector machine. The specific steps are as follows:
First step, preprocessing the heterogeneous multi-modal imaging genetics data:
Step 1.1, preprocessing the neuroimaging data:
The preprocessed homogeneous multi-modal imaging data (magnetic resonance images processed by voxel-based morphometry, fluorodeoxyglucose positron emission tomography images, and F-18 florbetapir positron emission tomography amyloid images, which can effectively show neuritic plaque content in vivo) are aligned to the scans of the same visit and resampled to 2 × 2 × 2 mm³ voxels in the standard Montreal Neurological Institute (MNI) space. Normalized gray matter density maps are created from the magnetic resonance image data, and the fluorodeoxyglucose and F-18 florbetapir positron emission tomography images are registered to the same space with the Statistical Parametric Mapping (SPM) software package. 116 regions of interest are then measured, from which the glucose metabolic rate of the fluorodeoxyglucose positron emission tomography images, the gray matter density of the voxel-based morphometry magnetic resonance images, and the amyloid deposition of the F-18 florbetapir positron emission tomography images are extracted. After the cerebellum is removed, the imaging measurements of the 90 remaining regions of interest of each homogeneous imaging modality are used as features;
Step 1.2, preprocessing the genetic data:
For the preprocessed genetic data (single nucleotide polymorphisms) from the ADNI database, APOE (located on chromosome 19) is taken as the risk gene; it is associated with neuronal development and with brain plasticity and repair. Using ANNOVAR annotation information, the single nucleotide polymorphisms within ±20 kbp of the APOE gene boundary are studied, comprising 85 single nucleotide polymorphism loci. Each single nucleotide polymorphism is valued by additive coding as the number of minor alleles: 0, 1 or 2;
The preprocessing of the heterogeneous multi-modal imaging genetics data is thus completed;
Second step, performing feature analysis with the SPLPS heterogeneous multi-modal feature selection method:
The data of each sample in each modality obtained in the first step are taken as input for multi-modal joint feature selection; the feature selection objective is:
in formula (1), n is the number of samples and M is the number of modalities; x_i^m is the feature column vector of the i-th sample in the m-th modality, X^m is the training set of the m-th modality, and d is the feature dimension; y_i is the class label of the i-th sample, and Y = [y_1, …, y_i, …, y_n]^T ∈ R^n is the label vector of the n samples; w_m is the weight vector of the m-th modality, and v_m ∈ R^n is the self-paced sample weight vector; each element K_ij^m of the matrix K^m represents the neighborhood relationship between samples i and j in the m-th modality. Locality preserving projection is used to preserve the neighborhood structure of the sample points in the sample space: a non-zero K_ij^m indicates that the i-th and j-th samples are k-neighbors, and a zero value indicates that they are not (the k-neighborhood describes the structural relationships between sample points in the feature space, where k is a constant and the k-neighbors of a sample point are the k sample points closest to it in Euclidean distance). K_ij^m is given by the following formula:
in formula (2), the parameter σ may be set to 1 without loss of generality; K^m is the weight matrix describing the neighborhoods of the sample points, λ is the regularization parameter that constrains feature sparsity, and μ is the regularization parameter that constrains the multi-modal association of the samples. Feature analysis with the SPLPS heterogeneous multi-modal feature selection method is thus completed,
Third step, optimizing the objective function and solving for w_m and v_m: the objective function of formula (1) in the second step is optimized by alternately solving for the variables,
Step 3.1, fixing v_m and optimizing w_m: the objective function is now:
The first term of formula (3) is transformed as follows:
so the first term of formula (3) can be converted into:
For the third term of formula (3), we have
Formula (3) can then be converted into
In formula (5), D_ii^m is the element in row i, column i of D^m, and the Laplacian term is the hypergraph Laplacian matrix of the m-th modality,
and the objective now becomes:
where w_m^i denotes the i-th row of w_m, from which the following can be obtained. Here "2" and "λ" are both coefficients, so the two can be merged by absorbing the coefficient 2 into λ, and the objective function becomes:
Taking the derivative of formula (9) and setting it to 0 gives
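The rewriting of the neighborhood term in step 3.1 rests on a standard identity: for a symmetric weight matrix K with degree matrix D (D_ii equal to the i-th row sum of K) and Laplacian L = D − K, the pairwise sum Σ_ij K_ij (a_i − a_j)² equals 2 aᵀ L a. The sketch below checks this numerically in Python. It uses the ordinary graph Laplacian as an assumption, since the intermediate formulas and the exact construction of the hypergraph Laplacian are not reproduced in this text; all function names and values are hypothetical.

```python
def laplacian(K):
    """Graph Laplacian L = D - K, where D_ii is the i-th row sum of K."""
    n = len(K)
    return [[(sum(K[i]) if i == j else 0.0) - K[i][j] for j in range(n)]
            for i in range(n)]


def pairwise_penalty(K, a):
    """Sum over all pairs of K_ij * (a_i - a_j)^2."""
    n = len(K)
    return sum(K[i][j] * (a[i] - a[j]) ** 2
               for i in range(n) for j in range(n))


def quadratic_form(L, a):
    """Compute a^T L a."""
    n = len(L)
    return sum(a[i] * L[i][j] * a[j] for i in range(n) for j in range(n))


# Hypothetical symmetric neighborhood weights and projected sample values.
K = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
a = [1.0, 3.0, 2.0]

lhs = pairwise_penalty(K, a)
rhs = 2 * quadratic_form(laplacian(K), a)
print(lhs, rhs)  # 9.0 9.0
```

Here a_i plays the role of the projection of sample i onto w_m, so a penalty on neighboring samples being projected far apart becomes a quadratic form in w_m that can be differentiated directly.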
Step 3.2, fixing w_m and optimizing v_m:
The objective function is now:
Differentiating formula (11) with respect to v_i gives
In formula (12), l is the loss function matrix and l_im denotes the loss function, where i indexes the i-th sample and m the m-th modality; from the above formula, the solution for v_i is obtained as
The alternating computation of the variables w_m and v_m is thus completed. In this embodiment, M = 4 and n = 371; in the neuroimaging modalities w_m is initialized to a random vector of size 116 × 1, and in the genetic modality w_m is initialized to a random vector of size 85 × 1; the number of neighbors k in the k-neighborhood is set to 5 and σ to 1; after optimization, λ = 10^-1 and μ = 10;
Fourth step, feature selection:
After the objective function is solved, the features whose corresponding weights are non-zero are selected;
Fifth, multi-kernel support vector machine fusion:
Step 5.1, compute the kernel matrix of each modality separately, where the linear kernel function of the m-th modality is
Step 5.2, search for the fusion coefficient of each modality by grid search over the range [0, 1], and use ten-fold cross-validation to find the fusion coefficients ρ_m that give the best classification performance;
Step 5.3, fuse the multi-modal kernel functions to obtain the fused kernel; the dual form of the multi-kernel support vector machine then follows:
α_i ≥ 0, i = 1, 2, …, n (14),
In formula (14), α_i is the Lagrange multiplier of the i-th sample. This completes the fusion training of the multi-kernel support vector machine.
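Steps 5.1 to 5.3 can be sketched with plain NumPy. The kernel formulas are not legible in this text, so the code assumes the usual linear kernel (an inner product) and a weighted sum of per-modality kernels. The constraint that the coefficients sum to 1 is also an assumption, and the function names are hypothetical.

```python
import itertools
import numpy as np

def linear_kernel(Xa, Xb):
    """Linear kernel between two sample sets of one modality."""
    return Xa @ Xb.T

def fused_kernel(kernels, rho):
    """Weighted sum of per-modality kernel matrices."""
    return sum(r * Km for r, Km in zip(rho, kernels))

def coefficient_grid(n_modalities, step=0.1):
    """Yield coefficient tuples on a grid in [0, 1] that sum to 1 (assumed)."""
    ticks = int(round(1.0 / step))
    for combo in itertools.product(range(ticks + 1), repeat=n_modalities - 1):
        if sum(combo) <= ticks:
            last = ticks - sum(combo)
            yield tuple(c / ticks for c in combo) + (last / ticks,)
```

Each candidate tuple from `coefficient_grid` yields one fused kernel matrix, which could be scored by ten-fold cross-validation with any SVM that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.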
Sixth, classification and prediction:
The parameters α_i obtained from training in the fifth step are substituted into equation (15). For a given new test sample x_0, the decision function that determines the sample label is defined as shown in equation (15):
In equation (15), sign(·) is the sign function, b is the bias, and f(x_0) is the predicted result for the new test sample x_0;
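The decision step can be sketched as follows. Equation (15) is not legible in this text, so the code assumes the standard kernel SVM decision function sign(Σ_i α_i y_i k(x_i, x_0) + b), evaluated on fused kernel values. The name `predict_label` is hypothetical, and returning +1 when the score is exactly zero is a tie-breaking assumption.

```python
import numpy as np

def predict_label(alpha, y, kernel_to_test, b):
    """Predict +1 or -1 for one test sample from the SVM dual solution.

    kernel_to_test[i] is the fused kernel value between training
    sample i and the test sample x_0.
    """
    alpha = np.asarray(alpha, dtype=float)
    y = np.asarray(y, dtype=float)
    score = float(np.dot(alpha * y, kernel_to_test) + b)
    return 1 if score >= 0 else -1
```

Only training samples with α_i > 0 (the support vectors) contribute to the score, so the kernel values for the other samples do not affect the prediction.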
Thus, feature selection is performed with the SPLPS heterogeneous multi-modal image genetics data feature analysis method, and the heterogeneous multi-modal image genetics features are classified with the multi-kernel support vector machine method.
In this embodiment, the choice of k for the k-nearest neighbors is important when constructing the weight matrix of the sample-point neighborhood: if k is too small, the structural relationship among the sample points is not captured; if k is too large, the k-neighborhood may contain samples of different classes, which affects the result. In this embodiment, the classification accuracy exceeds 95%.
The invention thoroughly analyzes the importance of each sample to the classification model and balances the relationship between the number of sample neighbors and the classification model. Samples are weighted through self-paced learning (that is, a self-paced sample weight vector v is introduced) and ranked by confidence during iteration: simple samples with high confidence, i.e., samples with small loss values, are selected first, and difficult samples are then added gradually, with the sample weights v_m solved while the samples are selected. Through experimental verification with different values of k for the k-nearest neighbors in the locality preserving projection, the optimal k is selected, which markedly improves the accuracy of locating characteristic diseased brain regions and mining the related pathogenic genes, and improves the accuracy of classification prediction.
Matters not described in this specification follow the prior art.
Claims (5)
1. A feature analysis method for heterogeneous multi-modal image genetics data, characterized by comprising the following steps:
acquiring preprocessed heterogeneous multi-modal data of samples of a given type of brain disease, the data comprising gene data and image data of different modalities, so that data of each sample in each modality are obtained;
performing multi-modal joint feature selection on the preprocessed heterogeneous multi-modal data, wherein the feature selection objective function is formula (1):
in formula (1), n is the number of samples and M is the number of modalities; the feature column vector is that of the m-th modality of the i-th sample, given the training set of the m-th modality, and d denotes the feature dimension; y_i denotes the class label of the i-th sample, and Y = [y_1, …, y_i, …, y_n]^T ∈ R^n is the label vector of the n samples; w_m is the weight vector of the m-th modality, and v_m ∈ R^n is the self-paced sample weight vector of the m-th modality; λ is the regularization parameter constraining feature sparsity, and μ is the regularization parameter constraining the multi-modal association of samples; k' is an auxiliary parameter with k' > k > 0, and v_i is the self-paced sample weight of the i-th sample; K_m is the weight matrix describing the neighborhood relationship of the sample points, and each of its elements represents the neighborhood relationship between samples of the m-th modality; locality preserving projection is used to preserve the neighborhood structure of the sample points in the sample space, where a non-zero element indicates that the i-th sample and the j-th sample are k-nearest neighbors and a zero element indicates that they are not;
optimizing and solving the objective function by computing the variables w_m and v_m alternately;
selecting, from the obtained solution, the features whose weights in the weight vector w_m are non-zero, and determining the locations of the diseased brain regions and the related pathogenic genes from those features, thereby completing the feature analysis of the heterogeneous multi-modal image genetics data.
2. The analysis method of claim 1, wherein the multi-modal image data comprise voxel-based morphometry processed magnetic resonance images, fluorodeoxyglucose-positron emission tomography images, and F-18 fluorescence amyloid-positron emission tomography images; the gene data comprise single nucleotide polymorphism data from the ADNI database and APOE.
3. A diagnostic method for brain diseases, characterized in that the analysis method according to claim 1 is used to mine biomarkers and obtain feature vectors, and the sample labels together with the heterogeneous multi-modal feature vectors obtained after feature selection are input to a multi-kernel support vector machine for classification prediction.
4. A heterogeneous multi-modal image genetics data feature analysis method, characterized in that the SPLPS heterogeneous multi-modal image genetics feature selection method is used to mine biomarkers, and a multi-kernel support vector machine is then used for fusion classification, with the following specific steps:
First, preprocess the heterogeneous multi-modal image genetics data:
Step 1.1, neuroimaging data preprocessing:
The homogeneous multi-modal imaging data to be preprocessed (voxel-based morphometry processed magnetic resonance imaging, fluorodeoxyglucose-positron emission tomography, and F-18 fluorescence amyloid-positron emission tomography) are aligned to the same visit scan and then resampled to 2 × 2 × 2 mm³ voxels in standard Montreal Neurological Institute (MNI) space. Normalized gray matter density maps are created from the magnetic resonance image data, and the fluorodeoxyglucose-positron emission tomography and F-18 fluorescence amyloid-positron emission tomography scans are registered to the same space with the Statistical Parametric Mapping (SPM) software package. 116 regions of interest are then measured, and the glucose metabolic rate from fluorodeoxyglucose-positron emission tomography, the gray matter density from the voxel-based morphometry processed magnetic resonance images, and the amyloid deposition from F-18 fluorescence amyloid-positron emission tomography are extracted. After the cerebellum is removed, the imaging measurements of 90 regions of interest are used as the features of each homogeneous imaging modality;
Step 1.2, gene data preprocessing:
For the gene data (single nucleotide polymorphisms, SNPs) from the ADNI database to be preprocessed: APOE, located on chromosome 19, is regarded as a risk gene and is associated with neuronal development, brain plasticity, and repair. Using ANNOVAR annotation information, the SNPs within ±20 kbp of the APOE gene boundary are studied, comprising 85 SNP loci. Each SNP value is encoded additively as the number of minor alleles: 0, 1, or 2;
This completes the preprocessing of the heterogeneous multi-modal image genetics data;
Second, perform feature analysis with the SPLPS heterogeneous multi-modal feature selection method:
The data of each modality of each sample obtained in the first step are taken as input, and multi-modal joint feature selection is performed; the feature selection objective function is:
In formula (1), n is the number of samples and M is the number of modalities; the feature column vector is that of the m-th modality of the i-th sample, given the training set of the m-th modality, and d denotes the feature dimension; y_i denotes the class label of the i-th sample, and Y = [y_1, …, y_i, …, y_n]^T ∈ R^n is the label vector of the n samples; w_m is the weight vector of the m-th modality, and v_m ∈ R^n is the self-paced sample weight vector; λ is the regularization parameter constraining feature sparsity, and μ is the regularization parameter constraining the multi-modal association of samples;
Each element of the weight matrix represents the neighborhood relationship between samples of the m-th modality. Locality preserving projection is used to preserve the neighborhood structure of the sample points in the sample space: a non-zero element indicates that the i-th sample and the j-th sample are k-nearest neighbors, and a zero element indicates that they are not, as described by the following formula:
In formula (2), σ is a constant and K_m is the weight matrix characterizing the sample-point neighborhood relationship. This completes the feature analysis using the SPLPS heterogeneous multi-modal feature selection method;
Third, optimize the objective function and solve for w_m and v_m: the objective function of formula (1) in the second step is optimized by solving for the variables alternately.
Step 3.1, fix v_m and optimize w_m. The objective function at this point is:
For the first term of equation (3), define:
The first term of equation (3) then becomes:
For the third term of formula (3), let
The third term of formula (3) is converted into
In formula (5), the first defined quantity is the entry in the i-th row and i-th column of D_m, and the second is the hypergraph Laplacian matrix of the m-th modality.
Here the indexed row is the i-th row of w_m. "2" and "λ" are both coefficients, so the two are merged, with the coefficient 2 absorbed into λ, and the objective function optimized from equation (3) is expressed by equation (9):
Taking the derivative of equation (9) and setting it to 0 gives
Step 3.2, fix w_m and optimize v_m:
The objective function at this point is:
Differentiating formula (11) with respect to v_i gives
In formula (12), l_im denotes the loss function, where i indexes the sample and m indexes the modality. From the above formula, the solution for v_i is
This completes the alternating solution for the variables w_m and v_m;
Fourth, feature selection:
After solving the objective function, select the features whose corresponding weights are non-zero;
Fifth, multi-kernel support vector machine fusion:
Step 5.1, compute the kernel matrix of each modality separately, where the linear kernel function of the m-th modality is
Step 5.2, search for the fusion coefficient of each modality by grid search over the range [0, 1], and use ten-fold cross-validation to find the fusion coefficients ρ_m that give the best classification performance;
Step 5.3, fuse the multi-modal kernel functions to obtain the fused kernel; the dual form of the multi-kernel support vector machine then follows:
α_i ≥ 0, i = 1, 2, …, n (14),
In formula (14), α_i is the Lagrange multiplier of the i-th sample. This completes the fusion training of the multi-kernel support vector machine;
Sixth, classification and prediction:
The parameters α_i obtained from training in the fifth step are substituted into equation (15). For a given new test sample x_0, the decision function that determines the sample label is defined as shown in equation (15):
In equation (15), sign(·) is the sign function, b is the bias, and f(x_0) is the predicted result for the new test sample x_0;
Thus, feature selection is performed with the SPLPS heterogeneous multi-modal image genetics data feature analysis method, and the heterogeneous multi-modal image genetics features are classified with the multi-kernel support vector machine method.
5. The feature analysis method of claim 4, characterized in that k = 5, σ = 1, and M = 4; after optimization, λ = 10^-1 and μ = 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011223328.1A CN112288027B (en) | 2020-11-05 | 2020-11-05 | Heterogeneous multi-modal image genetics data feature analysis method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112288027A (en) | 2021-01-29 |
CN112288027B (en) | 2022-05-03 |
Family
ID=74350529
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011223328.1A Active CN112288027B (en) | 2020-11-05 | 2020-11-05 | Heterogeneous multi-modal image genetics data feature analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112288027B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113724863A (en) * | 2021-09-08 | 2021-11-30 | 山东建筑大学 | Automatic discrimination system, storage medium and equipment for autism spectrum disorder |
CN114580497A (en) * | 2022-01-26 | 2022-06-03 | 南京航空航天大学 | Method for analyzing influence of genes on multi-modal brain image phenotype |
CN114820460A (en) * | 2022-04-02 | 2022-07-29 | 南京航空航天大学 | Method and device for analyzing correlation of single gene locus and time sequence brain image |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105957047A (en) * | 2016-05-06 | 2016-09-21 | 中国科学院自动化研究所 | Supervised multimodal brain image fusion method |
US20170249547A1 (en) * | 2016-02-26 | 2017-08-31 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and Methods for Holistic Extraction of Features from Neural Networks |
WO2017190337A1 (en) * | 2016-05-06 | 2017-11-09 | 中国科学院自动化研究所 | Supervised multi-modality brain image fusion method |
CN109770932A (en) * | 2019-02-21 | 2019-05-21 | 河北工业大学 | The processing method of multi-modal brain neuroblastoma image feature |
CN106250914B (en) * | 2016-07-22 | 2019-07-09 | 华侨大学 | Multi-modal data Feature Selection and classification method based on the sparse Multiple Kernel Learning of structure |
CN110009049A (en) * | 2019-04-10 | 2019-07-12 | 江南大学 | It is a kind of based on from step tied mechanism can supervision image classification method |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170249547A1 (en) * | 2016-02-26 | 2017-08-31 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and Methods for Holistic Extraction of Features from Neural Networks |
CN105957047A (en) * | 2016-05-06 | 2016-09-21 | 中国科学院自动化研究所 | Supervised multimodal brain image fusion method |
WO2017190337A1 (en) * | 2016-05-06 | 2017-11-09 | 中国科学院自动化研究所 | Supervised multi-modality brain image fusion method |
CN106250914B (en) * | 2016-07-22 | 2019-07-09 | 华侨大学 | Multi-modal data Feature Selection and classification method based on the sparse Multiple Kernel Learning of structure |
CN109770932A (en) * | 2019-02-21 | 2019-05-21 | 河北工业大学 | The processing method of multi-modal brain neuroblastoma image feature |
CN110009049A (en) * | 2019-04-10 | 2019-07-12 | 江南大学 | It is a kind of based on from step tied mechanism can supervision image classification method |
Non-Patent Citations (2)
Title |
---|
HONGCHENG LIU ET AL: "Folded concave penalized learning in identifying multimodal MRI", 《JOURNAL OF NEUROSCIENCE METHODS》 * |
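PENG Yao et al.: "Hypergraph-based multi-modal feature selection algorithm and its application", 《Journal of Frontiers of Computer Science and Technology》 * |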
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113724863A (en) * | 2021-09-08 | 2021-11-30 | 山东建筑大学 | Automatic discrimination system, storage medium and equipment for autism spectrum disorder |
CN114580497A (en) * | 2022-01-26 | 2022-06-03 | 南京航空航天大学 | Method for analyzing influence of genes on multi-modal brain image phenotype |
CN114580497B (en) * | 2022-01-26 | 2023-07-11 | 南京航空航天大学 | Method for analyzing influence of genes on multimodal brain image phenotype |
CN114820460A (en) * | 2022-04-02 | 2022-07-29 | 南京航空航天大学 | Method and device for analyzing correlation of single gene locus and time sequence brain image |
CN114820460B (en) * | 2022-04-02 | 2023-09-29 | 南京航空航天大学 | Method and device for correlation analysis of single gene locus and time sequence brain image |
Also Published As
Publication number | Publication date |
---|---|
CN112288027B (en) | 2022-05-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112288027B (en) | Heterogeneous multi-modal image genetics data feature analysis method | |
Weiner et al. | 2014 Update of the Alzheimer's Disease Neuroimaging Initiative: a review of papers published since its inception | |
CN111488914B (en) | Alzheimer disease classification and prediction system based on multitask learning | |
Iqbal et al. | Developing a brain atlas through deep learning | |
CA3125883C (en) | Grading of structures for state determination | |
Kostro et al. | Correction of inter-scanner and within-subject variance in structural MRI based automated diagnosing | |
CN111063442B (en) | Brain disease process prediction method and system based on weak supervision multitask matrix completion | |
Platero et al. | Longitudinal neuroimaging hippocampal markers for diagnosing Alzheimer’s disease | |
Rahaman et al. | Multi-modal deep learning of functional and structural neuroimaging and genomic data to predict mental illness | |
Wang et al. | Applications of generative adversarial networks in neuroimaging and clinical neuroscience | |
Singh et al. | Genetic, structural and functional imaging biomarkers for early detection of conversion from MCI to AD | |
CN114359642A (en) | Multi-modal medical image multi-organ positioning method based on one-to-one target query Transformer | |
Alkabawi et al. | Computer-aided classification of multi-types of dementia via convolutional neural networks | |
Liu et al. | Volumetric segmentation of white matter tracts with label embedding | |
Du et al. | Fast multi-task SCCA learning with feature selection for multi-modal brain imaging genetics | |
Yang et al. | Diagnosis of Parkinson’s disease based on 3D ResNet: The frontal lobe is crucial | |
Ong et al. | Detection of subtle white matter lesions in MRI through texture feature extraction and boundary delineation using an embedded clustering strategy | |
Wang et al. | Joint learning framework of cross-modal synthesis and diagnosis for alzheimer’s disease by mining underlying shared modality information | |
CN114202075A (en) | Guided multi-mode image genetics data feature analysis method | |
Xu et al. | Role of hippocampal subfields in neurodegenerative disease progression analyzed with a multi-scale attention-based network | |
Filipovych et al. | A composite multivariate polygenic and neuroimaging score for prediction of conversion to Alzheimer's disease | |
Gu et al. | Autism spectrum disorder diagnosis using the relational graph attention network | |
Wang et al. | Identifying biomarkers of Alzheimer’s disease via a novel structured sparse canonical correlation analysis approach | |
CN114187962A (en) | Nonlinear correlation analysis method based on joint structure constraint and incomplete multi-modal data | |
Hett et al. | Patch-Based abnormality maps for improved deep learning-based classification of Huntington’s disease |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |