CN112288027A - Heterogeneous multi-modal image genetics data feature analysis method - Google Patents

Heterogeneous multi-modal image genetics data feature analysis method

Publication number
CN112288027A
Authority
CN
China
Prior art keywords
sample
data
modal
mode
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011223328.1A
Other languages
Chinese (zh)
Other versions
CN112288027B (en
Inventor
郝小可
王如雪
师硕
阎刚
肖云佳
李想
谭麒豪
安琦瑾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei University of Technology
Original Assignee
Hebei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei University of Technology
Priority to CN202011223328.1A
Publication of CN112288027A
Application granted
Publication of CN112288027B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2136 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/40 Population genetics; Linkage disequilibrium
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 Recognition of patterns in medical or anatomical images

Abstract

The disclosed method for analyzing features of heterogeneous multi-modal image genetics data takes into account both the structural relationships among the samples and the 'difficulty' of each sample during training, and performs feature analysis on brain imaging data and genetic data through sample weighting and structured sparsity. The method adopts a self-paced learning mechanism, so that samples are added automatically from simple to complex during training and the influence of noise on the model is reduced. In addition, locality preserving projection is introduced under the self-paced learning framework to preserve the inherent neighborhood structure of the sample points in the sample space, while an L1-norm constraint on the projection matrix serves as a regularization term that performs feature selection. Finally, a multi-kernel support vector machine performs fusion classification on the selected features, thereby improving the accuracy of disease diagnosis. The method can select and classify features effectively.

Description

Heterogeneous multi-modal image genetics data feature analysis method
Technical Field
The technical scheme of the invention relates to methods for recognizing patterns, and in particular to a heterogeneous multi-modal image genetics data feature analysis method.
Background
Alzheimer's disease, also known as senile dementia, is a common degenerative disease of the brain. Its symptoms include memory impairment, impaired reasoning and cognition, and language and motor impairment. It is currently one of the major diseases endangering the health of the elderly, and its course is slow and irreversible. Depending on cognitive status and the extent of functional impairment, the progression of Alzheimer's disease can be divided into three stages: normal control, mild cognitive impairment, and Alzheimer's disease. Given the pathogenesis of Alzheimer's disease, early detection and effective treatment can delay its progression. Numerous studies have shown that Alzheimer's disease is associated with structural atrophy, metabolic alterations, and pathological amyloid deposition in the brain. Commonly used brain imaging techniques include structural magnetic resonance imaging, functional magnetic resonance imaging, diffusion tensor imaging, and positron emission tomography. Meanwhile, with the development of genetic techniques, researchers can search for genetic markers associated with neurological and psychiatric diseases at a finer molecular level (e.g., single nucleotide polymorphisms).
In recent years, as technology has advanced, more and more research has focused on the early diagnosis of Alzheimer's disease. Because the structure and function of the brain are highly complex, data acquired from a single modality cannot provide enough feature information for diagnosis. In image genetics, different modalities provide complementary information: for example, structural magnetic resonance imaging provides information about brain tissue types, while positron emission tomography measures the cerebral metabolic rate of glucose. Fusing multi-modal data makes it possible to discover information that cannot be found in any single modality. With the development of neuroimaging and genetic technologies, multi-modal data can now be collected for each subject during examination, which provides a data source for the diagnosis of Alzheimer's disease.
Heterogeneous multi-modal image genetics data are high-dimensional and contain a large amount of information, and not all features are helpful for detecting and analyzing Alzheimer's disease. It is therefore important to remove redundant or weakly relevant features from the large number of features provided by brain images and genetic data, and to select the features relevant to the classification and prediction task. CN109770932A discloses a method for processing multi-modal brain neuroimaging features, which performs feature analysis on multi-modal data using a multi-modal feature selection method based on sample weights and a low-rank constraint. That method does not consider the 'difficulty' of the data: it treats simple, general knowledge and complex, specialized knowledge alike, adds all data (including noise points and outliers) to training indiscriminately, and cannot effectively eliminate the influence of noisy samples on the model. CN111462116A discloses a multi-modal parameter model optimization fusion method based on radiomics features, which obtains low-dimensional radiomics features by gradient dimensionality reduction of high-dimensional radiomics features and ignores the internal structure of the data during dimensionality reduction.
In summary, the feature selection methods used in existing Alzheimer's disease diagnosis and classification techniques do not adequately consider the relationships among samples, so diagnostic classification is prone to error and accuracy needs further improvement.
Disclosure of Invention
The technical task of the invention is to address the above deficiencies by providing a heterogeneous multi-modal image genetics data feature analysis method that considers both the structural relationships among the samples and the 'difficulty' of each sample during training. Feature analysis is performed on brain imaging data and genetic data using sample weighting and structured sparsity; the feature weight matrix serves as the projection matrix in the dimensionality reduction process, so the sparsity constraint is imposed on the feature weight matrix and the projection matrix simultaneously. The method adopts a self-paced learning mechanism, so that samples are added automatically from simple to complex during training and the influence of noise on the model is reduced. In addition, locality preserving projection is introduced under the self-paced learning framework to preserve the inherent neighborhood structure of the sample points in the sample space, while an L1-norm constraint on the projection matrix serves as a regularization term that performs feature selection. Finally, a multi-kernel support vector machine performs fusion classification on the selected features, thereby improving the accuracy of disease diagnosis. The method can select and classify features effectively.
Since "self-paced learning" is abbreviated SPL, "locality preserving projection" is abbreviated LPP, and "structured sparsity" is abbreviated SS, the heterogeneous multi-modal image genetics feature selection method of the invention is hereinafter referred to as "SPLPS" for short.
The technical scheme adopted by the invention to solve the technical problem is as follows:
a feature analysis method for heterogeneous multi-modal image genetics data comprises the following steps:
acquiring preprocessed heterogeneous multi-modal data for samples of a given brain disease, the data comprising genetic data and imaging data of different modalities, so as to obtain the data of each sample in each modality;
performing multi-modal joint feature selection on the preprocessed heterogeneous multi-modal data, wherein the feature selection objective function is formula (1):
Figure BDA0002762811500000021
Figure BDA0002762811500000022
in formula (1), n is the number of samples, M is the number of modalities,
Figure BDA0002762811500000023
represents the feature column vector of the i-th sample in the m-th modality; given the training set of the m-th modality
Figure BDA0002762811500000024
d is the feature dimension, y_i is the class label of the i-th sample, Y = [y_1, …, y_i, …, y_n]^T ∈ R^n is the label vector of the n samples, w^m is the weight vector of the m-th modality, and v^m ∈ R^n is the self-paced sample weight vector of the m-th modality; λ is the regularization parameter that constrains feature sparsity, and μ is the regularization parameter that constrains the multi-modal association of the samples;
Figure BDA0002762811500000025
wherein
Figure BDA0002762811500000026
k' is an auxiliary parameter, k' > k > 0, and v_i is the self-paced sample weight vector of the i-th sample; K^m is the weight matrix describing the neighborhood relation of the sample points, and each element of the weight matrix
Figure BDA0002762811500000027
represents the neighborhood relation between samples of the m-th modality; locality preserving projection is adopted to preserve the neighborhood structure of the sample points in the sample space: if
Figure BDA0002762811500000028
is non-zero, the i-th sample and the j-th sample are k-neighbors; otherwise they are not;
the variables w^m and v^m are computed alternately to optimize and solve the objective function;
the features whose weights in w^m are non-zero are then selected from the solution, from which the locations of the affected brain regions and the related risk genes are determined, completing the feature analysis of the heterogeneous multi-modal image genetics data.
In the heterogeneous multi-modal image genetics data feature analysis method, biomarkers are mined with the SPLPS heterogeneous multi-modal image genetics feature selection method, and fusion classification is then performed with a multi-kernel support vector machine; the specific steps are as follows:
firstly, preprocessing heterogeneous multi-modal image genetics data:
step 1.1, preprocessing neuroimaging data:
the preprocessed homogeneous multi-modal imaging data (magnetic resonance images processed by voxel-based morphometry, fluorodeoxyglucose positron emission tomography images, and F-18 fluorescence amyloid positron emission tomography images, the last of which can effectively show the neuritic plaque content in vivo) are aligned to the scan from the same visit; normalized gray matter density maps are then created from the magnetic resonance image data as 2 × 2 × 2 mm³ voxels in the standard Montreal Neurological Institute (MNI) space, and the fluorodeoxyglucose positron emission tomography and F-18 fluorescence amyloid positron emission tomography images are registered to the same space with the Statistical Parametric Mapping (SPM) software package; 116 regions of interest are then measured, from which the glucose metabolic rate of the fluorodeoxyglucose positron emission tomography images, the gray matter density of the magnetic resonance images processed by voxel-based morphometry, and the amyloid deposition features of the F-18 fluorescence amyloid positron emission tomography images are extracted; after the cerebellum is removed, the imaging measurements of the 90 remaining regions of interest of each homogeneous imaging modality are used as features;
step 1.2, gene data preprocessing:
for the preprocessed genetic data (single nucleotide polymorphisms) from the ADNI database: APOE (located on chromosome 19) is a risk gene related to neuronal development and to brain plasticity and repair; using ANNOVAR annotation information, the single nucleotide polymorphisms within ±20 kbp of the APOE gene boundary are studied, which comprise 85 single nucleotide polymorphism loci; the value of each single nucleotide polymorphism is encoded additively as the number of minor alleles: 0, 1 or 2;
thus finishing the preprocessing of the heterogeneous multi-modal image genetics data;
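The additive coding described in step 1.2 can be illustrated with a minimal Python sketch. The two-letter genotype strings and the function names `additive_code` and `encode_snps` are assumptions made for illustration; the patent does not specify the input format of the genotype data.

```python
def additive_code(genotype: str, minor_allele: str) -> int:
    """Return the number of minor alleles (0, 1 or 2) in a two-letter genotype.

    The string representation (e.g. "AG") is a hypothetical input format.
    """
    if len(genotype) != 2:
        raise ValueError("expected a two-letter genotype such as 'AG'")
    return sum(1 for allele in genotype if allele == minor_allele)


def encode_snps(genotypes, minor_alleles):
    """Encode one subject's genotypes, one entry per SNP locus."""
    return [additive_code(g, m) for g, m in zip(genotypes, minor_alleles)]


if __name__ == "__main__":
    # Three hypothetical loci whose minor allele is "G".
    print(encode_snps(["AA", "AG", "GG"], ["G", "G", "G"]))
```

Applied to the 85 loci mentioned above, each subject would yield an 85-dimensional integer vector that serves as the genetic modality.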
secondly, performing feature analysis by using an SPLPS heterogeneous multi-modal feature selection method:
taking the data of each modality of each sample obtained in the first step as input, multi-modal joint feature selection is performed; the feature selection objective is:
Figure BDA0002762811500000031
Figure BDA0002762811500000032
in formula (1), n is the number of samples, M is the number of modalities,
Figure BDA0002762811500000033
represents the feature column vector of the i-th sample in the m-th modality; given the training set of the m-th modality
Figure BDA0002762811500000034
d is the feature dimension, y_i is the class label of the i-th sample, Y = [y_1, …, y_i, …, y_n]^T ∈ R^n is the label vector of the n samples, w^m is the weight vector of the m-th modality, and v^m ∈ R^n is the self-paced sample weight vector; each element of the matrix
Figure BDA0002762811500000035
represents the neighborhood relation between samples of the m-th modality; locality preserving projection is adopted to preserve the neighborhood structure of the sample points in the sample space: if
Figure BDA0002762811500000036
is non-zero, the i-th sample and the j-th sample are k-neighbors; otherwise they are not; the relation is described by the following formula:
Figure BDA0002762811500000037
in formula (2), the parameter σ can be set to 1 without loss of generality, and K^m is the weight matrix characterizing the neighborhood relation of the sample points,
Figure BDA0002762811500000038
Figure BDA0002762811500000039
λ is the regularization parameter that constrains feature sparsity, and μ is the regularization parameter that constrains the multi-modal association of the samples; this completes the feature analysis with the SPLPS heterogeneous multi-modal feature selection method;
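The k-neighborhood weight matrix of formula (2) can be sketched as follows. Formula (2) itself is not recoverable from this text, so the sketch assumes the heat-kernel weighting that is commonly paired with locality preserving projection, exp(-||x_i - x_j||² / σ) for k-neighbors and 0 otherwise; that form, the symmetric "either is a neighbor of the other" rule, and the function names are assumptions.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_heat_kernel_weights(samples, k, sigma=1.0):
    """Build a symmetric neighborhood weight matrix for one modality.

    Entry (i, j) is exp(-||x_i - x_j||^2 / sigma) when either sample is
    among the k nearest neighbors of the other, and 0 otherwise.
    The heat-kernel form is an assumed stand-in for formula (2).
    """
    n = len(samples)
    dist = [[squared_distance(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]
    neighbors = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        neighbors.append(set(order[:k]))
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (j in neighbors[i] or i in neighbors[j]):
                weights[i][j] = math.exp(-dist[i][j] / sigma)
    return weights


if __name__ == "__main__":
    toy = [[0.0], [1.0], [5.0]]  # three one-dimensional samples
    for row in knn_heat_kernel_weights(toy, k=1):
        print(row)
```

With σ = 1, as the text permits, nearby neighbors receive weights close to 1 and distant neighbors receive weights close to 0, while non-neighbors are exactly 0.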
thirdly, optimizing the objective function and solving for w^m and v^m: the objective function of formula (1) in the second step is optimized by solving for the variables alternately,
step 3.1, fix v^m and optimize w^m: the objective function at this time is:
Figure BDA0002762811500000041
the first term of equation (3) is transformed as follows:
Figure BDA0002762811500000042
Figure BDA0002762811500000043
the first term of equation (3) can be converted into:
Figure BDA0002762811500000044
for the third term of formula (3), let
Figure BDA0002762811500000045
then formula (3) can be converted to
Figure BDA0002762811500000046
in formula (5),
Figure BDA0002762811500000051
is the hypergraph Laplacian matrix of the m-th modality,
Figure BDA0002762811500000052
at this point the objective becomes:
Figure BDA0002762811500000053
define a matrix P^m, where
Figure BDA0002762811500000054
are the diagonal elements of the matrix P^m:
Figure BDA0002762811500000055
Figure BDA0002762811500000056
is the i-th element of w^m, the weight vector of the m-th modality, which gives
Figure BDA0002762811500000057
Figure BDA0002762811500000058
since "2" and "λ" are both constant coefficients, the coefficient "2" is absorbed into the coefficient "λ", and the objective function becomes:
Figure BDA0002762811500000059
taking the derivative of formula (9) and setting it to 0 gives
Figure BDA00027628115000000510
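Step 3.1 relies on a Laplacian matrix built from the neighborhood weight matrix K^m. The formula images are not recoverable here, so the following sketch assumes the standard construction used with locality preserving projection, L = D − K with D diagonal and D_ii = Σ_j K_ij; the function names are hypothetical. It also checks the identity z^T L z = ½ Σ_ij K_ij (z_i − z_j)², which is what makes the Laplacian term penalize projections that separate neighboring samples.

```python
def graph_laplacian(weights):
    """Return L = D - K, where D is the diagonal matrix of row sums of K.

    This is the standard graph Laplacian; it is assumed, not confirmed,
    to match the matrix used in formula (5).
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    return [[(degrees[i] if i == j else 0.0) - weights[i][j]
             for j in range(n)] for i in range(n)]


def smoothness(laplacian, projections):
    """Compute z^T L z for a list of one-dimensional projections z."""
    n = len(projections)
    return sum(projections[i] * laplacian[i][j] * projections[j]
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    K = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 2.0],
         [0.0, 2.0, 0.0]]
    L = graph_laplacian(K)
    z = [1.0, 2.0, 4.0]
    # Equals 0.5 * sum_ij K_ij * (z_i - z_j)^2 = 1*1 + 2*4.
    print(smoothness(L, z))
```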
step 3.2, fix w^m and optimize v^m:
the objective function at this time is:
Figure BDA00027628115000000511
Figure BDA00027628115000000512
in formula (11),
Figure BDA00027628115000000513
wherein
Figure BDA00027628115000000514
k' is an auxiliary parameter, k' > k > 0;
taking the derivative of formula (11) with respect to v_i gives
Figure BDA00027628115000000515
in formula (12), l_i^m denotes the loss function:
Figure BDA00027628115000000516
where i indexes the i-th sample and m the m-th modality; from the above, the solution for v_i is
Figure BDA00027628115000000517
this completes the alternating computation of the variables w^m and v^m;
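The closed-form sample weights of step 3.2 depend on a regularizer with parameters k and k' whose formula is not recoverable from this text. To illustrate the self-paced idea only, the sketch below substitutes the classic hard-threshold variant of self-paced learning, in which a sample receives weight 1 when its loss is below an "age" threshold and 0 otherwise. This is a swapped-in simplification, not the weighting defined by the patent; the names `hard_self_paced_weights` and `age` are hypothetical.

```python
def hard_self_paced_weights(losses, age):
    """Hard-threshold self-paced weights: 1.0 for easy samples, else 0.0.

    A sample counts as "easy" when its current loss is below `age`.
    This replaces the soft weighting of the original method.
    """
    return [1.0 if loss < age else 0.0 for loss in losses]


if __name__ == "__main__":
    losses = [0.1, 0.5, 2.0]  # hypothetical per-sample losses
    # Raising the threshold over iterations admits harder samples.
    for age in (0.2, 0.6, 3.0):
        print(age, hard_self_paced_weights(losses, age))
```

Raising the threshold between alternating iterations reproduces the "simple first, difficult later" schedule, and samples whose loss stays very large (likely noise or outliers) are admitted last or not at all.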
fourthly, feature selection:
after the objective function is solved, the features whose corresponding weights are non-zero are selected;
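The selection rule of the fourth step amounts to keeping the features whose learned weight is non-zero. A minimal sketch follows; the numerical tolerance and the feature names are assumptions added for illustration, since an iterative solver rarely returns exact zeros.

```python
def select_features(weights, feature_names, tolerance=1e-8):
    """Return the names of features whose weight magnitude exceeds tolerance.

    `tolerance` is an assumed numerical cutoff standing in for "non-zero".
    """
    return [name for name, weight in zip(feature_names, weights)
            if abs(weight) > tolerance]


if __name__ == "__main__":
    # Hypothetical weights for four region-of-interest features.
    names = ["roi_1", "roi_2", "roi_3", "roi_4"]
    weights = [0.0, -0.42, 1e-12, 0.07]
    print(select_features(weights, names))
```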
fifthly, multi-kernel support vector machine fusion:
step 5.1, the kernel matrix of each modality is computed separately, where the linear kernel function of the m-th modality is
Figure BDA0002762811500000061
Figure BDA0002762811500000062
step 5.2, the fusion coefficient of each modality is searched over a grid in the range [0, 1], and ten-fold cross-validation is used to find the fusion coefficients ρ_m that give the best classification performance;
step 5.3, fusing the multi-modal kernel functions gives
Figure BDA0002762811500000063
from which the dual form of the multi-kernel support vector machine is obtained:
Figure BDA0002762811500000064
Figure BDA0002762811500000065
α_i ≥ 0, i = 1, 2, …, n    (14)
in formula (14), α_i is the Lagrange multiplier of the i-th sample; this completes the fusion training of the multi-kernel support vector machine;
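Steps 5.1 to 5.3 can be sketched as follows: a linear kernel matrix per modality, a weighted sum of those matrices, and a grid of candidate fusion coefficients in [0, 1]. The constraint that the coefficients sum to 1 and the grid step of 0.1 are assumptions that are common in multi-kernel fusion but are not stated in the text; the function names are hypothetical.

```python
def linear_kernel(samples):
    """Linear kernel (Gram) matrix for one modality: K[i][j] = <x_i, x_j>."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in samples]
            for xi in samples]


def fuse_kernels(kernels, coefficients):
    """Weighted sum of per-modality kernel matrices."""
    n = len(kernels[0])
    return [[sum(c * k[i][j] for c, k in zip(coefficients, kernels))
             for j in range(n)] for i in range(n)]


def coefficient_grid(num_modalities, steps=10):
    """Yield coefficient tuples on a grid in [0, 1] that sum to 1.

    The sum-to-one constraint is an assumption, not taken from the text.
    """
    def recurse(remaining, left):
        if left == 1:
            yield (remaining / steps,)
            return
        for s in range(remaining + 1):
            for rest in recurse(remaining - s, left - 1):
                yield (s / steps,) + rest

    yield from recurse(steps, num_modalities)


if __name__ == "__main__":
    k_imaging = linear_kernel([[1.0, 0.0], [0.0, 2.0]])  # toy imaging features
    k_genetic = linear_kernel([[1.0], [3.0]])            # toy SNP features
    print(fuse_kernels([k_imaging, k_genetic], [0.5, 0.5]))
    print(len(list(coefficient_grid(3))))                # candidate count
```

Each candidate tuple would then be scored by ten-fold cross-validation with a support vector machine that accepts a precomputed kernel, and the best-scoring tuple kept as ρ_m.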
sixthly, classification and prediction:
the parameters α_i obtained by training in the fifth step are substituted into formula (15) below; for a given new test sample x_0, the decision function that determines the sample label is defined as shown in formula (15),
Figure BDA0002762811500000066
in formula (15), sign() is the sign function, b is the offset, and f(x_0) is the prediction for the new test sample x_0;
thus, feature selection is carried out with the SPLPS heterogeneous multi-modal image genetics data feature analysis method, and the heterogeneous multi-modal image genetics features are classified with the multi-kernel support vector machine.
Compared with the prior art, the prominent substantive features and notable progress of the invention are as follows:
(1) The method considers both the structural relationships among the samples and the 'difficulty' of each sample during training; that is, a heterogeneous multi-modal image genetics feature selection method based on sample weighting and a low-rank constraint is used to select features from the multi-modal data. First, the L1 norm is used to constrain the features; at the same time, locality preserving projection is adopted, with the feature weight matrix serving as the projection matrix in the dimensionality reduction, so that the neighborhood structure of the sample points in the sample space is preserved. A self-paced learning mechanism is then adopted, which takes the 'difficulty' of the samples into account during training and lets the training set grow automatically. The SPLPS feature selection method based on sample weights and a low-rank constraint thus considers both the neighborhood structure of the sample points and the differences between samples (their difficulty) during feature selection. Whether a sample joins the next iteration is decided by its difficulty (confidence): 'simple' samples with high confidence are selected first, and 'difficult' samples are added gradually. This training scheme, together with the L1 regularization term, avoids the influence of noise points or outliers on the model, selects strongly discriminative features, and achieves better classification and prediction performance.
(2) Compared with other feature selection methods, the SPLPS method describes the high-order relationships among samples by constructing the k-neighborhood relation of the sample points; it makes full use of prior knowledge about the distribution of the samples and of the internal information of each modality, and it preserves the original neighborhood relations among the samples, which helps to select more discriminative features and improves the accuracy of classification and prediction.
(3) The method considers the 'difficulty' of the sample data during training and adopts a self-paced learning strategy so that samples are selected from 'simple' to 'complex', allowing the training set to grow automatically.
(4) The method not only reduces the influence of noise points or outliers on the model through a regularization term, but also eliminates some noisy samples by incorporating sample confidence, thereby improving the robustness of the model.
(5) CN109770932A discloses a method for processing multi-modal brain neuroimaging features, which performs feature analysis on multi-modal data using a multi-modal feature selection method based on sample weights and a low-rank constraint. That method does not consider the 'difficulty' of the data: it treats simple, general knowledge and complex, specialized knowledge alike, adds all data (including noise points and outliers) to training indiscriminately, and cannot effectively eliminate the influence of noisy samples on the model. In contrast, the method of the invention decides whether a sample joins the next iteration according to its confidence: 'simple' samples with high confidence are selected first, and 'difficult' samples are added gradually. This self-growing training scheme, together with the regularization term, avoids the influence of noise points or outliers on the model and makes the whole algorithm more robust.
(6) CN111462116A discloses a multi-modal parameter model optimization fusion method based on radiomics features, which obtains low-dimensional radiomics features by gradient dimensionality reduction of high-dimensional radiomics features and ignores the internal structure of the data during dimensionality reduction. In contrast, the method of the invention uses locality preserving projection to construct the affinity (near or far) between each pair of samples in the space and preserves it in the projection, so that the local neighborhood relations of the samples are retained while the dimension is reduced, which provides richer information.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a schematic flow diagram of the processing of multi-modal image genetics data by the method of the invention, based on the SPLPS method and a multi-kernel support vector machine.
Detailed Description
The embodiment shown in FIG. 1 shows that the processing flow of the heterogeneous multi-modal image genetics data feature analysis method of the invention, based on the SPLPS feature selection method and a multi-kernel support vector machine, is as follows: preprocessing the heterogeneous multi-modal image genetics data → performing feature analysis with the SPLPS heterogeneous multi-modal feature selection method → optimizing the objective function and solving for w^m and v^m → feature selection → multi-kernel support vector machine fusion → classification and prediction.
Examples
In the heterogeneous multi-modal image genetics data feature analysis method of this embodiment, biomarkers are mined with the SPLPS heterogeneous multi-modal image genetics feature selection method, and fusion classification is then performed with a multi-kernel support vector machine; the specific steps are as follows:
firstly, preprocessing heterogeneous multi-modal image genetics data:
step 1.1, preprocessing neuroimaging data:
the preprocessed homogeneous multi-modal imaging data (magnetic resonance images processed by voxel-based morphometry, fluorodeoxyglucose positron emission tomography images, and F-18 fluorescence amyloid positron emission tomography images, the last of which can effectively show the neuritic plaque content in vivo) are aligned to the scan from the same visit; normalized gray matter density maps are then created from the magnetic resonance image data as 2 × 2 × 2 mm³ voxels in the standard Montreal Neurological Institute (MNI) space, and the fluorodeoxyglucose positron emission tomography and F-18 fluorescence amyloid positron emission tomography images are registered to the same space with the Statistical Parametric Mapping (SPM) software package; 116 regions of interest are then measured, from which the glucose metabolic rate of the fluorodeoxyglucose positron emission tomography images, the gray matter density of the magnetic resonance images processed by voxel-based morphometry, and the amyloid deposition features of the F-18 fluorescence amyloid positron emission tomography images are extracted; after the cerebellum is removed, the imaging measurements of the 90 remaining regions of interest of each homogeneous imaging modality are used as features;
step 1.2, gene data preprocessing:
for the preprocessed genetic data (single nucleotide polymorphisms) from the ADNI database: APOE (located on chromosome 19) is a risk gene related to neuronal development and to brain plasticity and repair; using ANNOVAR annotation information, the single nucleotide polymorphisms within ±20 kbp of the APOE gene boundary are studied, which comprise 85 single nucleotide polymorphism loci; the value of each single nucleotide polymorphism is encoded additively as the number of minor alleles: 0, 1 or 2;
thus finishing the preprocessing of the heterogeneous multi-modal image genetics data;
secondly, performing feature analysis by using an SPLPS heterogeneous multi-modal feature selection method:
taking the data of each modality of each sample obtained in the first step as input, multi-modal joint feature selection is performed; the feature selection objective is:
Figure BDA0002762811500000081
Figure BDA0002762811500000082
in formula (1), n is the number of samples and M is the number of modes;
Figure BDA0002762811500000083
denotes the feature column vector of the ith sample in the mth mode, and the training set of the mth mode is given as
Figure BDA0002762811500000084
where d is the feature dimension, y_i is the class label of the ith sample, Y = [y_1, …, y_i, …, y_n]^T ∈ R^n is the label vector of the n samples, w_m is the weight vector of the mth mode, and v_m ∈ R^n is the self-paced sample weight vector. Each element
Figure BDA0002762811500000085
of the weight matrix represents the neighborhood relationship among the samples of the mth mode; the locality preserving projection method is adopted so that the neighborhood structure of the sample points in the sample space is effectively preserved. If
Figure BDA0002762811500000086
is nonzero, a k-neighborhood relationship exists between the ith sample and the jth sample; otherwise no such relationship exists (the k-neighborhood describes the structural relationship between sample points in the feature space, where k is a constant: for each sample point, the k sample points closest to it in Euclidean distance are found). It is described by the following formula:
Figure BDA0002762811500000087
in formula (2), the parameter σ can be set to 1 without loss of generality, and K_m is the weight matrix characterizing the neighborhood relationship of the sample points,
Figure BDA0002762811500000088
Figure BDA0002762811500000089
λ is the regularization parameter that constrains feature sparsity, and μ is the regularization parameter that constrains the multimodal association of the samples; this completes the feature analysis using the SPLPS heterogeneous multi-modal feature selection method;
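The k-neighborhood weight matrix described in this step can be sketched as follows. Because formula (2) is only available as an image, the heat-kernel weight exp(-||x_i - x_j||² / σ) and the symmetrization rule used below are assumptions borrowed from standard locality preserving projection; the function name `knn_heat_kernel` is hypothetical.

```python
import numpy as np


def knn_heat_kernel(X: np.ndarray, k: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Build a symmetric k-nearest-neighbor weight matrix for one modality.

    X has shape (n_samples, n_features). The heat-kernel weight and the
    "either side is a neighbor" symmetrization are assumptions.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(sq_dist[i])
        neighbors = [j for j in order if j != i][:k]
        K[i, neighbors] = np.exp(-sq_dist[i, neighbors] / sigma)
    # Keep an edge if either sample lists the other as a neighbor.
    return np.maximum(K, K.T)


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 6))  # toy data, not the ADNI features
K_demo = knn_heat_kernel(X_demo, k=5, sigma=1.0)
```

A nonzero entry of `K_demo` marks a pair of samples in a k-neighborhood relationship; zero entries mark pairs without one.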
thirdly, optimizing the objective function and solving for w_m and v_m: the objective function of formula (1) in the second step is optimized by solving for the variables alternately,
step 3.1, fix v_m and optimize w_m: the objective function at this time is:
Figure BDA0002762811500000091
the first term of equation (3) is transformed as follows:
Figure BDA0002762811500000092
Figure BDA0002762811500000093
the first term of equation (3) can be converted into:
Figure BDA0002762811500000094
for the third term of formula (3), let
Figure BDA0002762811500000095
Then equation (3) can be converted to
Figure BDA0002762811500000096
In formula (5),
Figure BDA0002762811500000101
is the element in the ith row and ith column of D_m,
Figure BDA0002762811500000102
is the hypergraph Laplacian matrix of the mth mode,
Figure BDA0002762811500000103
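As a small numerical illustration of why a Laplacian matrix appears in the neighborhood term, the sketch below uses the ordinary graph Laplacian L = D − K with D_ii = Σ_j K_ij. That specific form is an assumption, since the figures defining D_m and the Laplacian are not reproduced in the text; the toy matrix and projections are made up.

```python
import numpy as np

# Toy symmetric neighborhood weight matrix for three samples (illustrative).
K = np.array([[0.0, 0.8, 0.0],
              [0.8, 0.0, 0.5],
              [0.0, 0.5, 0.0]])

D = np.diag(K.sum(axis=1))   # degree matrix: D_ii = sum_j K_ij
L = D - K                    # ordinary graph Laplacian (assumed form)

# Projections a_i = x_i^T w of the three samples onto a weight vector.
a = np.array([1.0, -2.0, 0.5])

# Weighted sum of squared differences between projected neighbors ...
pairwise = sum(K[i, j] * (a[i] - a[j]) ** 2
               for i in range(3) for j in range(3))
# ... equals twice the Laplacian quadratic form.
quadratic = 2.0 * a @ L @ a
```

The two quantities agree, which is the identity that lets a pairwise neighborhood penalty be written compactly as a quadratic form in the weight vector.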
at this point the target formula turns into:
Figure BDA0002762811500000104
a matrix P_m is defined, where
Figure BDA0002762811500000105
is a diagonal element of the matrix P_m:
Figure BDA0002762811500000106
Figure BDA0002762811500000107
is the ith row of w_m, which gives
Figure BDA0002762811500000108
Here "2" and "λ" are both coefficients and can be merged, with the coefficient "2" absorbed into the coefficient "λ"; the objective function is then transformed into:
Figure BDA0002762811500000109
taking the derivative of equation (9) with respect to w_m and setting it to 0 gives
Figure BDA00027628115000001010
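The closed-form update obtained by setting the derivative to zero can be sketched as an iteratively reweighted least-squares step. The objective written in the docstring, the diagonal form P_ii = 1 / (2|w_i| + eps), and the function name `update_w` are all assumptions made for illustration; equations (9) and (10) are only available as images and are not reproduced here.

```python
import numpy as np


def update_w(X, y, v, L, w_prev, lam=0.1, mu=10.0, eps=1e-8):
    """One reweighted least-squares update of w for a single modality.

    Assumed objective (not copied from the patent figures):
        sum_i v_i (y_i - x_i^T w)^2 + mu * w^T X^T L X w + lam * w^T P w,
    where P is diagonal with P_ii = 1 / (2 |w_prev_i| + eps).
    """
    V = np.diag(v)
    P = np.diag(1.0 / (2.0 * np.abs(w_prev) + eps))
    A = X.T @ V @ X + mu * X.T @ L @ X + lam * P
    return np.linalg.solve(A, X.T @ V @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))                  # toy data
w_true = np.array([1.5, 0.0, -2.0, 0.0, 0.0])
y = X @ w_true
v = np.ones(30)                               # all samples fully weighted
L = np.zeros((30, 30))                        # Laplacian term switched off here
w = rng.normal(size=5)                        # random initialization
for _ in range(50):
    w = update_w(X, y, v, L, w, lam=0.1, mu=10.0)
```

Because P depends on the previous estimate, the update is repeated until w stops changing; in a full alternating scheme it would be interleaved with the sample-weight update of step 3.2.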
step 3.2, fix w_m and optimize v_m:
the objective function at this time is:
Figure BDA00027628115000001011
Figure BDA00027628115000001012
in formula (11),
Figure BDA00027628115000001013
wherein
Figure BDA00027628115000001014
k′ is an auxiliary parameter with k′ > k > 0;
taking the derivative of formula (11) with respect to v_i gives
Figure BDA00027628115000001015
In formula (12), l is the loss function matrix and l_im denotes the loss function:
Figure BDA00027628115000001016
where i denotes the ith sample and m denotes the mth mode; from the above formula, the solution for v_i is
Figure BDA00027628115000001017
This completes the solution of the variables w_m and v_m by alternating computation. In this embodiment, M = 4 and n = 371; w_m is initialized to a random vector of size 116 × 1 for each neuroimaging modality and to a random vector of size 85 × 1 for the gene modality; the number of nearest neighbors is set to k = 5 and σ = 1; after optimization, λ = 10^-1 and μ = 10;
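The easy-to-hard behavior of self-paced sample weighting can be illustrated with the classic binary rule, in which a sample is admitted once its loss falls below a growing threshold. This is a deliberately simpler stand-in: the solution in formula (13), with its parameters k′ > k > 0, is not reproduced, and the threshold schedule and function name below are assumptions.

```python
def hard_self_paced_weights(losses, threshold):
    """Give weight 1 to samples whose loss is below the threshold, else 0.

    Classic binary self-paced rule, used only as a stand-in for the
    weighting in formula (13), which is not reproduced here.
    """
    return [1.0 if loss < threshold else 0.0 for loss in losses]


losses = [0.05, 0.40, 0.10, 1.20, 0.75]   # toy per-sample losses
schedule = [0.2, 0.5, 1.0, 2.0]           # growing thresholds (assumed)
selected_per_round = [
    sum(hard_self_paced_weights(losses, t)) for t in schedule
]
```

With these toy numbers the number of admitted samples grows from round to round, so low-loss samples shape the model first and high-loss samples join later.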
Fourthly, feature selection:
after the objective function is solved, the features whose corresponding weights are non-zero are selected;
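Selecting the features with non-zero weights amounts to a simple filter. The sketch below assumes a small numerical tolerance, since solvers rarely return exact zeros; the tolerance, the function name, and the feature labels are illustrative assumptions.

```python
def select_features(weights, feature_names, tol=1e-6):
    """Return the names of features whose |weight| exceeds a small tolerance.

    The tolerance is an assumption: "non-zero" is read here as
    "larger than tol in magnitude".
    """
    return [name for w, name in zip(weights, feature_names) if abs(w) > tol]


w_demo = [0.0, 0.31, -1e-9, -0.07]
names_demo = ["roi_1", "roi_2", "roi_3", "roi_4"]   # hypothetical labels
kept = select_features(w_demo, names_demo)
```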
fifthly, multi-kernel support vector machine fusion:
step 5.1, respectively calculating a kernel matrix of each mode, wherein the linear kernel function of the mth mode is
Figure BDA0002762811500000111
Figure BDA0002762811500000112
step 5.2, the fusion coefficient of each mode is searched over a grid in the range [0, 1], and the fusion coefficient ρ_m with the best classification performance is found by ten-fold cross-validation;
Step 5.3, after the multi-modal kernel function is fused, obtaining
Figure BDA0002762811500000113
the dual form of the multi-kernel support vector machine is therefore obtained:
Figure BDA0002762811500000114
Figure BDA0002762811500000115
α_i ≥ 0, i = 1, 2, …, n (14),
in formula (14), α_i is the Lagrange multiplier of the ith sample; this completes the fusion training of the multi-kernel support vector machine;
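Steps 5.1 to 5.3 can be sketched as follows: one linear kernel matrix per modality, a grid of candidate fusion coefficients, and a weighted sum of the kernels. The sum-to-one constraint on the coefficients is an assumption commonly used with multi-kernel support vector machines (the text only states a grid search in [0, 1]), and all function names and toy data are hypothetical.

```python
import itertools
import numpy as np


def linear_kernel(X: np.ndarray) -> np.ndarray:
    """Linear kernel matrix of one modality: k(x_i, x_j) = x_i^T x_j."""
    return X @ X.T


def fuse_kernels(kernels, rho):
    """Weighted sum of per-modality kernel matrices."""
    return sum(r * K for r, K in zip(rho, kernels))


def coefficient_grid(n_modalities: int, step: float = 0.1):
    """Yield coefficient tuples on a grid in [0, 1] that sum to 1.

    The sum-to-one constraint is an assumption, not stated in the text.
    """
    levels = int(round(1.0 / step))
    for combo in itertools.product(range(levels + 1), repeat=n_modalities):
        if sum(combo) == levels:
            yield tuple(c / levels for c in combo)


rng = np.random.default_rng(0)
modalities = [rng.normal(size=(8, 4)), rng.normal(size=(8, 3))]  # toy data
kernels = [linear_kernel(X) for X in modalities]
grid = list(coefficient_grid(len(kernels), step=0.25))
K_fused = fuse_kernels(kernels, grid[2])
```

In practice each fused kernel would be handed to a support vector machine that accepts precomputed kernels and scored by ten-fold cross-validation, keeping the coefficients with the best score.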
sixthly, classifying and predicting:
the parameters α_i obtained from the training in the fifth step are substituted into equation (15) below; for a given new test sample x_0, the decision function that determines the sample label is defined as shown in equation (15),
Figure BDA0002762811500000116
in equation (15), sign(·) is the sign function, b is the bias, and f(x_0) is the predicted result for the new test sample x_0;
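The decision step can be sketched under the assumption that equation (15) has the standard multi-kernel form sign(Σ_i α_i y_i Σ_m ρ_m k_m(x_i, x_0) + b); the figure itself is not reproduced, so this form, the argument layout, and the toy numbers are illustrative assumptions.

```python
def decision(alphas, labels, rho, kernel_values, b):
    """Evaluate sign(sum_i alpha_i * y_i * sum_m rho_m * k_m(x_i, x_0) + b).

    kernel_values[m][i] holds k_m(x_i, x_0) for modality m and training
    sample i. A score of exactly zero is mapped to +1 (a convention
    chosen here, not taken from the text).
    """
    score = b
    for i, (alpha, y) in enumerate(zip(alphas, labels)):
        fused = sum(r * k_m[i] for r, k_m in zip(rho, kernel_values))
        score += alpha * y * fused
    return 1 if score >= 0 else -1


alphas = [0.5, 0.0, 0.5]           # toy Lagrange multipliers
labels = [1, -1, -1]
rho = [0.6, 0.4]                   # toy fusion coefficients
kernel_values = [[2.0, 0.1, 0.5],  # modality 1: k_1(x_i, x_0)
                 [1.0, 0.3, 0.2]]  # modality 2: k_2(x_i, x_0)
prediction = decision(alphas, labels, rho, kernel_values, b=-0.1)
```

Only samples with non-zero α_i (the support vectors) contribute to the score, which is why the second toy sample has no influence on the prediction.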
thus, feature selection is carried out with the SPLPS heterogeneous multi-modal image genetics data feature analysis method, and the heterogeneous multi-modal image genetics features are classified with the multi-kernel support vector machine method.
In this embodiment, the choice of k for the k-neighborhood is important when the neighborhood weight matrix of the sample points is constructed: if k is too small, the structural relationship among the sample points is not captured; if k is too large, the k-neighborhood may contain samples of different classes, which degrades the result. In this embodiment, the classification accuracy exceeds 95%.
The invention fully analyzes the importance of each sample to the classification model and balances the relationship between the number of sample neighbors and the classification model. The samples are weighted through self-paced learning (that is, a self-paced sample weight vector v is introduced) and ranked by confidence during the iterations: simple samples with high confidence, i.e. samples with small loss function values, are selected first, and difficult samples are then added gradually; the sample weights v_m are solved while the samples are selected. In addition, through experimental verification of different k values for the k-nearest neighbors in locality preserving projection, the optimal k value is selected, which significantly improves the precision of locating characteristic diseased brain regions and of mining the related pathogenic genes, and improves the precision of classification prediction.
Matters not described in this specification are applicable to the prior art.

Claims (5)

1. A characteristic analysis method for heterogeneous multi-modal image genetics data is characterized by comprising the following steps:
acquiring data after heterogeneous multi-modal preprocessing of a certain type of brain disease sample, wherein the data comprises gene data and image data of different modalities, and acquiring data of each sample in each modality;
performing multi-modal combined feature selection on the data after the heterogeneous multi-modal preprocessing, wherein a feature selection target function is a formula (1):
Figure FDA0002762811490000011
Figure FDA0002762811490000012
in the formula (1), n is the number of samples, M is the number of modes,
Figure FDA0002762811490000013
representing the characteristic column vector corresponding to the mth mode of the ith sample, and giving a training set of the mth mode
Figure FDA0002762811490000014
where d is the feature dimension, y_i is the class label of the ith sample, Y = [y_1, …, y_i, …, y_n]^T ∈ R^n is the label vector of the n samples, w_m is the weight vector of the mth mode, and v_m ∈ R^n is the self-paced sample weight vector of the mth mode; λ is the regularization parameter that constrains feature sparsity, and μ is the regularization parameter that constrains the multimodal association of the samples;
Figure FDA0002762811490000015
wherein
Figure FDA0002762811490000016
k′ is an auxiliary parameter with k′ > k > 0, and v_i is the self-paced sample weight vector of the ith sample; K_m is the weight matrix describing the neighborhood relationship of the sample points, and each element
Figure FDA0002762811490000017
of the weight matrix represents the neighborhood relationship among the samples of the mth mode; the locality preserving projection method is adopted so that the neighborhood structure of the sample points in the sample space is effectively preserved. If
Figure FDA0002762811490000018
is nonzero, a k-neighborhood relationship exists between the ith sample and the jth sample; otherwise no such relationship exists;
the objective function is optimized and solved by computing the variables w_m and v_m alternately;
the features whose weights in the obtained weight vector w_m are non-zero are selected, and the positions of the diseased brain regions and the related pathogenic genes are determined from the corresponding features, completing the feature analysis of the heterogeneous multi-modal image genetics data.
2. The analytical method of claim 1, wherein the multi-modality image data comprises voxel-based morphometry processed magnetic resonance images, fluorodeoxyglucose-positron emission tomography images, F-18 fluorescence amyloid-positron emission tomography images; the gene data included gene data from the ADNI database (single nucleotide polymorphisms) and APOE.
3. A diagnostic method for brain diseases, characterized in that the analysis method according to claim 1 is used to mine biomarkers and obtain feature vectors, and the sample labels and the heterogeneous multi-modal feature vectors obtained after feature selection are input to a multi-kernel support vector machine for classification prediction.
4. A heterogeneous multi-modal image genetics data feature analysis method, characterized in that the SPLPS heterogeneous multi-modal image genetics feature selection method is used to mine biomarkers, and a multi-kernel support vector machine is then used for fusion classification, the method comprising the following specific steps:
firstly, preprocessing heterogeneous multi-modal image genetics data:
step 1.1, preprocessing neuroimaging data:
the preprocessed homogeneous multimodal imaging data (voxel-based morphometry processed magnetic resonance images, fluorodeoxyglucose-positron emission tomography images, and F-18 fluorescence amyloid-positron emission tomography images) are aligned to the scan from the same visit and represented as 2 × 2 × 2 mm³ voxels in standard Montreal Neurological Institute (MNI) space; normalized gray matter density maps are created from the magnetic resonance image data, and the fluorodeoxyglucose-positron emission tomography and F-18 fluorescence amyloid-positron emission tomography images are registered to the same space with the Statistical Parametric Mapping (SPM) software package; 116 regions of interest are then measured, and the glucose metabolic rate from fluorodeoxyglucose-positron emission tomography, the gray matter density from the voxel-based morphometry processed magnetic resonance images, and the amyloid deposition features from F-18 fluorescence amyloid-positron emission tomography are extracted; after the cerebellum is removed, the imaging measurements of 90 regions of interest of each homogeneous imaging modality are used as features;
step 1.2, gene data preprocessing:
for the preprocessed gene data (single nucleotide polymorphisms, SNPs) from the ADNI database, APOE (located on chromosome 19) is taken as the risk gene, which is associated with neuronal development, brain plasticity, and repair; using ANNOVAR annotation information, the SNPs within ±20 kbp of the APOE gene boundary are studied, comprising 85 SNP loci; each SNP value uses additive coding, that is, the number of minor alleles (0, 1, or 2);
thus finishing the preprocessing of the heterogeneous multi-modal image genetics data;
secondly, performing feature analysis by using an SPLPS heterogeneous multi-modal feature selection method:
taking the data of each mode of each sample obtained in the first step as input, and performing multi-mode combined feature selection; the feature selection objective function formula is:
Figure FDA0002762811490000021
Figure FDA0002762811490000022
in the formula (1), n is the number of samples, M is the number of modes,
Figure FDA0002762811490000023
representing the characteristic column vector corresponding to the mth mode of the ith sample, and giving a training set of the mth mode
Figure FDA0002762811490000024
where d is the feature dimension, y_i is the class label of the ith sample, Y = [y_1, …, y_i, …, y_n]^T ∈ R^n is the label vector of the n samples, w_m is the weight vector of the mth mode, and v_m ∈ R^n is the self-paced sample weight vector,
Figure FDA0002762811490000025
Figure FDA0002762811490000026
λ is the regularization parameter that constrains feature sparsity, and μ is the regularization parameter that constrains the multimodal association of the samples;
each element
Figure FDA0002762811490000027
of the weight matrix represents the neighborhood relationship among the samples of the mth mode; the locality preserving projection method is adopted so that the neighborhood structure of the sample points in the sample space is effectively preserved. If
Figure FDA0002762811490000028
is nonzero, a k-neighborhood relationship exists between the ith sample and the jth sample; otherwise no such relationship exists. It is described by the following formula:
Figure FDA0002762811490000029
in formula (2), σ is a constant, and K_m is the weight matrix characterizing the neighborhood relationship of the sample points,
Figure FDA00027628114900000210
Figure FDA00027628114900000211
thereby completing the feature analysis by using the heterogeneous multi-modal feature selection method of the SPLPS;
thirdly, optimizing the objective function and solving for w_m and v_m: the objective function of formula (1) in the second step is optimized by solving for the variables alternately,
step 3.1, fix v_m and optimize w_m: the objective function at this time is:
Figure FDA0002762811490000031
for the first term of equation (3), define:
Figure FDA0002762811490000032
Figure FDA0002762811490000033
the first term of equation (3) translates to:
Figure FDA0002762811490000034
for the third term of formula (3), let
Figure FDA0002762811490000035
The third term of formula (3) is converted into
Figure FDA0002762811490000036
In formula (5),
Figure FDA0002762811490000037
is the element in the ith row and ith column of D_m,
Figure FDA0002762811490000038
is the hypergraph Laplacian matrix of the mth mode,
Figure FDA0002762811490000039
for the second term of equation (3), a matrix P_m is defined, where
Figure FDA00027628114900000310
is a diagonal element of the matrix P_m:
Figure FDA00027628114900000311
Figure FDA0002762811490000041
is the ith row of w_m, which gives
Figure FDA0002762811490000042
"2" and "λ" are both coefficients; when the two are merged, with the coefficient "2" absorbed into the coefficient "λ", the objective function to be optimized in equation (3) is expressed by equation (9):
Figure FDA0002762811490000043
taking the derivative of equation (9) with respect to w_m and setting it to 0 gives
Figure FDA0002762811490000044
step 3.2, fix w_m and optimize v_m:
the objective function at this time is:
Figure FDA0002762811490000045
Figure FDA0002762811490000046
in formula (11),
Figure FDA0002762811490000047
wherein
Figure FDA0002762811490000048
k′ is an auxiliary parameter with k′ > k > 0;
taking the derivative of formula (11) with respect to v_i gives
Figure FDA0002762811490000049
In formula (12), l_im denotes the loss function:
Figure FDA00027628114900000410
where i denotes the ith sample and m denotes the mth mode; from the above formula, the solution for v_i is
Figure FDA00027628114900000411
This completes the solution of the variables w_m and v_m by alternating computation;
fourthly, feature selection:
after the objective function is solved, the features whose corresponding weights are non-zero are selected;
fifthly, multi-kernel support vector machine fusion:
step 5.1, respectively calculating a kernel matrix of each mode, wherein the linear kernel function of the mth mode is
Figure FDA00027628114900000412
Figure FDA00027628114900000413
step 5.2, the fusion coefficient of each mode is searched over a grid in the range [0, 1], and the fusion coefficient ρ_m with the best classification performance is found by ten-fold cross-validation;
Step 5.3, after the multi-modal kernel function is fused, obtaining
Figure FDA00027628114900000414
the dual form of the multi-kernel support vector machine is thus obtained:
Figure FDA0002762811490000051
Figure FDA0002762811490000052
α_i ≥ 0, i = 1, 2, …, n (14),
in formula (14), α_i is the Lagrange multiplier of the ith sample; this completes the fusion training of the multi-kernel support vector machine;
sixthly, classifying and predicting:
the parameters α_i obtained from the training in the fifth step are substituted into equation (15) below; for a given new test sample x_0, the decision function that determines the sample label is defined as shown in equation (15),
Figure FDA0002762811490000053
in equation (15), sign(·) is the sign function, b is the bias, and f(x_0) is the predicted result for the new test sample x_0;
thus, feature selection is carried out with the SPLPS heterogeneous multi-modal image genetics data feature analysis method, and the heterogeneous multi-modal image genetics features are classified with the multi-kernel support vector machine method.
5. The feature analysis method according to claim 4, characterized in that k = 5, σ = 1, and M = 4; after optimization, λ = 10^-1 and μ = 10.
CN202011223328.1A 2020-11-05 2020-11-05 Heterogeneous multi-modal image genetics data feature analysis method Active CN112288027B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011223328.1A CN112288027B (en) 2020-11-05 2020-11-05 Heterogeneous multi-modal image genetics data feature analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011223328.1A CN112288027B (en) 2020-11-05 2020-11-05 Heterogeneous multi-modal image genetics data feature analysis method

Publications (2)

Publication Number Publication Date
CN112288027A true CN112288027A (en) 2021-01-29
CN112288027B CN112288027B (en) 2022-05-03

Family

ID=74350529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011223328.1A Active CN112288027B (en) 2020-11-05 2020-11-05 Heterogeneous multi-modal image genetics data feature analysis method

Country Status (1)

Country Link
CN (1) CN112288027B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170249547A1 (en) * 2016-02-26 2017-08-31 The Board Of Trustees Of The Leland Stanford Junior University Systems and Methods for Holistic Extraction of Features from Neural Networks
CN105957047A (en) * 2016-05-06 2016-09-21 中国科学院自动化研究所 Supervised multimodal brain image fusion method
WO2017190337A1 (en) * 2016-05-06 2017-11-09 中国科学院自动化研究所 Supervised multi-modality brain image fusion method
CN106250914B (en) * 2016-07-22 2019-07-09 华侨大学 Multi-modal data Feature Selection and classification method based on the sparse Multiple Kernel Learning of structure
CN109770932A (en) * 2019-02-21 2019-05-21 河北工业大学 The processing method of multi-modal brain neuroblastoma image feature
CN110009049A (en) * 2019-04-10 2019-07-12 江南大学 It is a kind of based on from step tied mechanism can supervision image classification method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HONGCHENG LIU ET AL: "Folded concave penalized learning in identifying multimodal MRI", 《JOURNAL OF NEUROSCIENCE METHODS》 *
彭瑶 等: "基于超图的多模态特征选择算法及其应用", 《计算机科学与探索》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113724863A (en) * 2021-09-08 2021-11-30 山东建筑大学 Automatic discrimination system, storage medium and equipment for autism spectrum disorder
CN114580497A (en) * 2022-01-26 2022-06-03 南京航空航天大学 Method for analyzing influence of genes on multi-modal brain image phenotype
CN114580497B (en) * 2022-01-26 2023-07-11 南京航空航天大学 Method for analyzing influence of genes on multimodal brain image phenotype
CN114820460A (en) * 2022-04-02 2022-07-29 南京航空航天大学 Method and device for analyzing correlation of single gene locus and time sequence brain image
CN114820460B (en) * 2022-04-02 2023-09-29 南京航空航天大学 Method and device for correlation analysis of single gene locus and time sequence brain image

Also Published As

Publication number Publication date
CN112288027B (en) 2022-05-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant