CN112884754A - Multi-modal Alzheimer's disease medical image recognition and classification method and system - Google Patents

Multi-modal Alzheimer's disease medical image recognition and classification method and system

Info

Publication number
CN112884754A
Authority
CN
China
Prior art keywords
data
snp
classifier
alzheimer
classifiers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110265610.4A
Other languages
Chinese (zh)
Inventor
曾安
陈国斌
潘丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202110265610.4A priority Critical patent/CN112884754A/en
Publication of CN112884754A publication Critical patent/CN112884754A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10088Magnetic resonance imaging [MRI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Computing Systems (AREA)
  • Genetics & Genomics (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a multi-modal method and system for recognizing and classifying medical images of Alzheimer's disease. Two data modalities, medical images and genomics, are combined: image data are read and joined with genome-wide association analysis, so that image and gene data together make the diagnosis of Alzheimer's disease more accurate and reliable. This addresses the technical problem that existing multi-modal fusion of image data and genetic data in the medical diagnosis of Alzheimer's disease performs poorly, which impairs the accuracy of recognizing and classifying people at different stages of the disease.

Description

Multi-modal Alzheimer's disease medical image recognition and classification method and system
Technical Field
The application relates to the technical field of medical image analysis, in particular to a method and a system for recognizing and classifying multi-modal medical images of Alzheimer's disease.
Background
Alzheimer's disease (AD) is a progressive neurodegenerative disease with an insidious onset. Clinically, it is characterized by generalized dementia, including memory impairment, aphasia, apraxia, agnosia, impaired visuospatial skills, executive dysfunction, and personality and behavioral changes. Onset before the age of 65 is called presenile dementia; onset after the age of 65 is called senile dementia.
Classifying people at different stages of Alzheimer's disease makes it possible to identify those at an early stage and to obtain informative gene information, which assists the prevention and diagnosis of early-stage Alzheimer's disease. Conventional multi-modal fusion of image data and genetic data in the medical diagnosis of Alzheimer's disease performs poorly, mainly in two respects. First, many of the features extracted from preprocessed medical images contribute nothing to classifying the population, which lowers classification accuracy. Second, when SNP (single nucleotide polymorphism) data are used for Alzheimer's disease diagnosis, the SNPs of disease-related genes are usually selected manually; manual selection can miss SNPs, many disease-related SNPs are not recorded at all, and the computational complexity of SNP data is high. Improving the multi-modal fusion of image data and genetic data in the medical diagnosis of Alzheimer's disease, and thereby the accuracy of recognizing and classifying people at different stages of the disease, therefore remains a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The application provides a method and a system for recognizing and classifying multi-modal medical images of Alzheimer's disease, which are used for solving the technical problems that the multi-modal fusion effect of image data and genetic data in the existing medical diagnosis of Alzheimer's disease is poor, and the recognition and classification accuracy of people in different stages of Alzheimer's disease is influenced.
In view of the above, a first aspect of the present application provides a method for recognizing and classifying multi-modal Alzheimer's disease medical images, including:
constructing medical databases of different populations of Alzheimer's disease, wherein the medical databases comprise coronal MRI image data and gene SNP data;
after image preprocessing is carried out on the MRI image data, a CNN (convolutional neural network) is used for constructing classifiers, and at least three optimal classifiers are selected as high-quality MRI-based classifiers;
preprocessing the gene SNP data by using a GWAS whole genome association analysis method to obtain a coded SNP locus data set;
constructing classifiers by using a decision tree as a base classifier and using three integration strategies of a random forest classifier, a Bagging classifier and an XGBoost classifier to obtain three SNP base classifiers;
performing ensemble learning on all the high-quality MRI-based classifiers and the SNP-based classifier based on an improved probability weight ensemble learning mode to obtain a final enhanced version classifier;
and performing multi-modal Alzheimer's disease medical image recognition classification by using the enhanced classifier.
Optionally, the preprocessing the gene SNP data using GWAS genome-wide association analysis to obtain an encoded SNP locus data set, includes:
performing GWAS whole genome association analysis on the gene SNP data by using PLINK software, wherein the GWAS whole genome association analysis comprises the following steps: screening gene SNP data according to site deletion rate, screening gene SNP data according to site information deletion rate, screening gene SNP data according to Hardy-Weinberg equilibrium, screening gene SNP data according to linkage disequilibrium, screening gene SNP data according to individual independence, analyzing by using a Logistic regression model to obtain the significance p value of the association between each SNP and the phenotype, and selecting SNPs with high relevance according to the p value and encoding them to form an encoded SNP site data set.
Optionally, the image pre-processing the MRI image data comprises:
performing skull removal and registration processing on the MRI image data;
smoothing the MRI image data;
performing gray scale normalization on the MRI image data;
two-dimensional slicing is performed on the MRI image data.
Optionally, the MRI image data is image pre-processed using SPM12 software.
Optionally, the ensemble learning mode based on the improved probability weights is:
$$p(x)=\mathrm{sigmoid}(w_1)\,p(x\mid h_1)+\mathrm{sigmoid}(w_2)\,p(x\mid h_2)+\cdots+\mathrm{sigmoid}(w_n)\,p(x\mid h_n)$$
wherein n is the number of classifiers, sigmoid () is an activation function, w is a performance index of the classifier, p is the probability of the current classifier, and h is the number of network layers.
The second aspect of the present application provides a multi-modal Alzheimer's disease medical image recognition and classification system, comprising:
the data module is used for constructing medical databases of different populations of Alzheimer's disease, and the medical databases comprise coronal MRI image data and gene SNP data;
the MRI image processing module is used for preprocessing the MRI image data, constructing classifiers by using CNN (convolutional neural network), and selecting at least three optimal classifiers as high-quality MRI-based classifiers;
the first gene data processing module is used for preprocessing the gene SNP data by using a GWAS whole genome association analysis method to obtain an encoded SNP locus data set;
the second gene data processing module is used for constructing classifiers by using a decision tree as a base classifier and using three integration strategies of a random forest classifier, a Bagging classifier and an XGBoost classifier to obtain three SNP base classifiers;
the ensemble learning reinforcement module is used for carrying out ensemble learning on all the high-quality MRI-based classifiers and the SNP-based classifiers based on an improved probability weight ensemble learning mode to obtain a final reinforcement version classifier;
and the recognition and classification module is used for performing multi-modal Alzheimer disease medical image recognition and classification by using the enhanced classifier.
Optionally, the first genetic data processing module is specifically configured to:
performing GWAS whole genome association analysis on the gene SNP data by using PLINK software, wherein the GWAS whole genome association analysis comprises the following steps: screening gene SNP data according to site deletion rate, screening gene SNP data according to site information deletion rate, screening gene SNP data according to Hardy-Weinberg equilibrium, screening gene SNP data according to linkage disequilibrium, screening gene SNP data according to individual independence, analyzing by using a Logistic regression model to obtain the significance p value of the association between each SNP and the phenotype, and selecting SNPs with high relevance according to the p value and encoding them to form an encoded SNP site data set.
Optionally, the image pre-processing the MRI image data comprises:
performing skull removal and registration processing on the MRI image data;
smoothing the MRI image data;
performing gray scale normalization on the MRI image data;
two-dimensional slicing is performed on the MRI image data.
Optionally, the MRI image data is image pre-processed using SPM12 software.
Optionally, the ensemble learning mode based on the improved probability weights is:
$$p(x)=\mathrm{sigmoid}(w_1)\,p(x\mid h_1)+\mathrm{sigmoid}(w_2)\,p(x\mid h_2)+\cdots+\mathrm{sigmoid}(w_n)\,p(x\mid h_n)$$
wherein n is the number of classifiers, sigmoid () is an activation function, w is a performance index of the classifier, p is the probability of the current classifier, and h is the number of network layers.
According to the technical scheme, the embodiment of the application has the following advantages:
The application provides a multi-modal Alzheimer's disease medical image recognition and classification method, which comprises the following steps: constructing medical databases of different populations of Alzheimer's disease, wherein the medical databases comprise coronal MRI image data and gene SNP data; after image preprocessing is carried out on the MRI image data, using a CNN (convolutional neural network) to construct classifiers and selecting at least three optimal classifiers as high-quality MRI-based classifiers; preprocessing the gene SNP data by using a GWAS whole genome association analysis method to obtain a coded SNP locus data set; constructing classifiers by using a decision tree as a base classifier and using three integration strategies of a random forest classifier, a Bagging classifier and an XGBoost classifier to obtain three SNP base classifiers; performing ensemble learning on all the high-quality MRI-based classifiers and the SNP-based classifiers based on an improved probability weight ensemble learning mode to obtain a final enhanced version classifier; and performing multi-modal Alzheimer's disease medical image recognition and classification by using the enhanced classifier.
In the application, a deep convolutional neural network is used to train a group of base classifiers on the two-dimensional slices of the MRI images. At least three slice classifiers that classify the disease groups well are then selected as the image base classifiers for integration. This ensures that the selected slices agree to a certain extent with the clinical manifestations of the disease while keeping the image classifiers diverse during integration. Because the performance of ensemble learning depends on both the performance and the diversity of the classifiers, integrating several image classifiers works better than integrating a single image classifier.
In the application, GWAS (genome-wide association analysis) is used to preprocess the genome data. GWAS analyzes the association between SNP sites and a phenotype, so that SNPs related to the phenotype are screened out. The phenotype can be chosen in two ways: the first is a linear phenotype, such as height, weight or intelligence; the second is a binary phenotype, such as diseased and unaffected, known as case and control, with 0 for diseased and 1 for unaffected. Using GWAS to analyze the SNP data and reduce its dimensionality greatly lowers the computational complexity, reduces the recognition error caused by redundant information, and improves recognition accuracy. In addition, to improve the performance of SNP data classification, the invention constructs the SNP classifiers with several integration strategies, which improves both the classification performance and the diversity of the SNP classifiers.
The application combines two data modalities, medical images and genomics: image data are read and joined with genome-wide association analysis, so that image and gene data together make the diagnosis of Alzheimer's disease more accurate and reliable. This solves the technical problem that existing multi-modal fusion of image data and genetic data in the medical diagnosis of Alzheimer's disease performs poorly, which impairs the accuracy of recognizing and classifying people at different stages of the disease.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other related drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a multi-modal Alzheimer's disease medical image recognition and classification method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the MRI image data preprocessing process in an embodiment of the present application;
Fig. 3 is a schematic flowchart of constructing a classifier using a CNN in an embodiment of the present application;
Fig. 4 is a schematic diagram of the gene data preprocessing process in an embodiment of the present application;
Fig. 5 is a schematic diagram of the process by which the SNP classifier model constructs classifiers in an embodiment of the present application;
Fig. 6 is a schematic diagram of the ensemble learning process in an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Example 1
For ease of understanding, referring to Fig. 1, the present application provides an embodiment of a method for recognizing and classifying multi-modal Alzheimer's disease medical images, comprising:
Step 101, constructing medical databases of different populations of Alzheimer's disease, wherein the medical databases comprise coronal MRI image data and gene SNP data.
The invention relates to multi-modal ensemble learning, which needs to combine two data modalities, medical images and genomics. A stable and reliable database of medical images and SNP data for different populations of Alzheimer's disease, containing coronal MRI image data and gene SNP data, therefore needs to be constructed in advance.
Step 102, after image preprocessing is carried out on the MRI image data, a CNN (convolutional neural network) is used for constructing classifiers, and at least three optimal classifiers are selected as high-quality MRI-based classifiers.
Coronal MRI image data are acquired from the medical database and preprocessed as shown in Fig. 2. The preprocessing may be performed with SPM12 software and is intended to normalize the original images and suitably reduce noise, which facilitates the subsequent medical image classification. It includes:
1. First, noise and the influence of non-brain tissue structures are removed through operations such as head motion correction and skull stripping. The structural images of all subjects are then spatially normalized, registering the MRI images of different subjects to a common coordinate space to eliminate differences among individuals.
2. The resulting images are then Gaussian smoothed to remove the influence of noise, so that the data more closely follow a normal distribution and parametric tests become more valid.
3. Gray-level normalization is performed on the images.
4. Two-dimensional slicing is performed.
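The last two preprocessing steps can be illustrated with a minimal, dependency-free sketch. It is not the SPM12 procedure: min-max scaling is only one possible gray-level normalization, and the axis layout `volume[x][y][z]` with y as the anterior-posterior axis is an assumption made here for illustration. The function names `normalize_gray` and `coronal_slices` are hypothetical.

```python
def normalize_gray(volume):
    """Min-max scale every voxel of a 3-D volume (nested lists) to [0, 1]."""
    voxels = [v for plane in volume for row in plane for v in row]
    lo, hi = min(voxels), max(voxels)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant volume
    return [[[(v - lo) / span for v in row] for row in plane] for plane in volume]


def coronal_slices(volume):
    """Split a volume indexed as volume[x][y][z] into 2-D slices along y.

    Assumption: axis 1 (y) is the anterior-posterior axis, so fixing y
    yields a coronal plane with rows indexed by x and columns by z.
    """
    n_y = len(volume[0])
    return [[list(volume[x][y]) for x in range(len(volume))] for y in range(n_y)]


# Toy 2 x 3 x 2 volume standing in for a registered, smoothed MRI scan.
toy = [[[0, 10], [20, 30], [40, 50]],
       [[60, 70], [80, 90], [100, 110]]]
normalized = normalize_gray(toy)
slices = coronal_slices(normalized)
print(len(slices), "coronal slices of size", len(slices[0]), "x", len(slices[0][0]))
```

In a real pipeline the volume would typically be held in an array library rather than nested lists; the nested-list form is used here only to keep the sketch self-contained.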
After image preprocessing is performed on the MRI image data, classifiers are constructed with a CNN (convolutional neural network), and at least three optimal classifiers are selected as high-quality MRI-based classifiers. As shown in Fig. 3, the adopted CNN model consists of 6 convolutional layers (conv in Fig. 3), 3 pooling layers (pool in Fig. 3) and 3 fully connected layers (FC in Fig. 3). The last fully connected layer has only two nodes, and a softmax function is used to realize binary classification. Each CNN base classifier is trained for 40 epochs; testing showed that 40 epochs are enough for the base classifier to converge, with its classification accuracy on the original slices of the training set reaching 100%. ReLU is used as the activation function of all convolutional layers, Adam is used as the gradient update algorithm, the learning rate is set to 0.0001, and the number of input slices per batch (batch size) is set to 200.
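The layer counts and training settings above can be laid out as a small shape-tracing sketch. Only the counts (6 conv, 3 pool, 3 FC, two output nodes), the softmax output and the hyperparameters come from the text. The input size, channel widths, kernel and pooling sizes, and the placement of a pooling layer after every second convolution are assumptions chosen for illustration, and `trace_cnn` is a hypothetical helper, not a deep-learning API.

```python
import math

# Hyperparameters stated in the text.
TRAINING = {"epochs": 40, "optimizer": "Adam", "learning_rate": 1e-4, "batch_size": 200}


def trace_cnn(input_hw, conv_channels, pool_after, fc_widths):
    """Return (layer name, output shape) pairs for a conv/pool/FC stack."""
    h, w = input_hw
    plan, pools = [], 0
    for i, channels in enumerate(conv_channels, start=1):
        # Assumed 3x3 convolution with padding 1: spatial size is unchanged.
        plan.append((f"conv{i}+ReLU", (channels, h, w)))
        if i in pool_after:
            pools += 1
            h, w = h // 2, w // 2  # assumed 2x2 pooling halves each side
            plan.append((f"pool{pools}", (channels, h, w)))
    flat = conv_channels[-1] * h * w  # features fed to the first FC layer
    for j, width in enumerate(fc_widths, start=1):
        plan.append((f"FC{j}", (width,)))
    return plan, flat


def softmax(logits):
    """Turn the two output-node activations into class probabilities."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


plan, flat = trace_cnn(
    input_hw=(128, 128),                      # hypothetical slice size
    conv_channels=[16, 16, 32, 32, 64, 64],   # hypothetical channel widths
    pool_after={2, 4, 6},                     # hypothetical pool placement
    fc_widths=[256, 64, 2],                   # last layer has two nodes
)
for name, shape in plan:
    print(name, shape)
```

Tracing shapes this way makes it easy to check that a chosen input size survives three rounds of pooling before the architecture is written in an actual deep-learning framework.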
Step 103, preprocessing gene SNP data by using a GWAS whole genome association analysis method to obtain a coded SNP locus data set.
Fig. 4 shows the preprocessing process of the gene data. GWAS whole genome association analysis can be performed with PLINK software, and the obtained SNP site data are encoded as 0, 1 and 2 (AA-0, Aa-1, aa-2). The process is as follows:
(1) Screening according to heterozygosity
In the genotype data, every two characters represent the genotype of one SNP. For example, GGGCAATA contains the genotypes of four SNPs, namely GG, GC, AA and TA, of which GG and AA are homozygous and GC and TA are heterozygous. According to the laws of genetics, the heterozygous genotype frequencies of different samples in a natural population are similar. Abnormal subjects that do not conform to this rule can therefore be eliminated.
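The heterozygosity check can be sketched as follows. The two-characters-per-SNP layout and the GGGCAATA example come from the text; the rule for flagging an abnormal subject (a rate more than `n_sd` standard deviations from the cohort mean) is an assumption made here, since no threshold is given, and the function names are hypothetical.

```python
def split_genotypes(genotype_string):
    """Cut a genotype string into two-character SNP genotypes."""
    return [genotype_string[i:i + 2] for i in range(0, len(genotype_string), 2)]


def heterozygosity_rate(genotype_string):
    """Fraction of SNPs whose two alleles differ (heterozygous calls)."""
    calls = split_genotypes(genotype_string)
    hetero = sum(1 for g in calls if g[0] != g[1])
    return hetero / len(calls)


def flag_outliers(rates, n_sd=3.0):
    """Return sample ids whose rate lies more than n_sd SDs from the mean.

    The n_sd cut-off is an assumed convention, not a value from the text.
    """
    mean = sum(rates.values()) / len(rates)
    var = sum((r - mean) ** 2 for r in rates.values()) / len(rates)
    sd = var ** 0.5
    return [s for s, r in rates.items() if sd > 0 and abs(r - mean) > n_sd * sd]


print(split_genotypes("GGGCAATA"))      # GG and AA homozygous, GC and TA heterozygous
print(heterozygosity_rate("GGGCAATA"))
```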
(2) Screening according to site deletion rate
The SNP deletion (missing) rate of a sample is an important index of the quality of that sample's genotype data. If the site deletion rate of a sample is too high, the data quality of the sample is poor, and the sample needs to be removed so that it does not affect the subsequent analysis.
(3) Screening according to site information deletion rate
The site information deletion rate is the rate at which the information of a given SNP is missing across all subjects. If the information deletion rate of an SNP is too high, the data quality of that SNP is poor and it is not suitable for subsequent analysis, so the SNP needs to be deleted.
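Screens (2) and (3) are the row-wise and column-wise versions of the same missing-rate computation. The sketch below assumes a genotype matrix with one row per sample and one column per SNP, with `None` marking a missing call; both thresholds are hypothetical, since the text does not state any.

```python
def sample_missing_rates(matrix):
    """Per-sample rate: share of SNPs with no genotype call in each row."""
    return [sum(g is None for g in row) / len(row) for row in matrix]


def snp_missing_rates(matrix):
    """Per-SNP rate: share of samples with no call in each column."""
    n = len(matrix)
    return [sum(row[j] is None for row in matrix) / n for j in range(len(matrix[0]))]


def filter_matrix(matrix, max_sample_missing, max_snp_missing):
    """Drop poor samples first, then poor SNPs among the remaining samples."""
    keep_rows = [i for i, r in enumerate(sample_missing_rates(matrix))
                 if r <= max_sample_missing]
    kept = [matrix[i] for i in keep_rows]
    keep_cols = [j for j, r in enumerate(snp_missing_rates(kept))
                 if r <= max_snp_missing]
    return [[row[j] for j in keep_cols] for row in kept], keep_rows, keep_cols


# Toy data: 4 samples x 4 SNPs, None = missing call.
genotypes = [
    [0, 1, None, 2],
    [None, None, None, 1],
    [1, 0, None, 0],
    [2, 1, 0, 1],
]
# Both 0.5 thresholds are illustrative assumptions.
clean, rows, cols = filter_matrix(genotypes, max_sample_missing=0.5, max_snp_missing=0.5)
print(rows, cols, clean)
```

Filtering samples before SNPs follows the order in which the two screens are listed; the order matters because removing a poor sample changes the per-SNP rates.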
(4) Screening according to Hardy-Weinberg equilibrium
The Hardy-Weinberg law of equilibrium, also known as the law of genetic equilibrium, is an important law of population genetics. It was demonstrated independently in 1908 by the English mathematician G. H. Hardy (Godfrey Harold Hardy) and the German physician Wilhelm Weinberg. Its main content is as follows: in an ideal population (free of specific disturbing factors such as non-random mating, natural selection, migration, mutation or limited population size), the gene frequencies and genotype frequencies remain constant over generations and stay in a stable equilibrium.
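A minimal sketch of a Hardy-Weinberg check for one SNP: under equilibrium, allele frequencies p and q = 1 - p imply genotype frequencies p^2, 2pq and q^2, and observed counts can be compared with these expectations by a chi-square statistic. This goodness-of-fit form is a textbook simplification and is not claimed to be the test PLINK applies; 3.841 is the standard 5% critical value for one degree of freedom, and the function names are hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (p, expected counts, chi-square) for one biallelic SNP.

    n_aa, n_ab, n_bb are the observed counts of the two homozygous
    genotypes and the heterozygous genotype.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of the first allele
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, expected, chi2


def passes_hwe(chi2, critical=3.841):
    """Keep the SNP when the deviation is below the chosen critical value."""
    return chi2 < critical


_, _, chi2_balanced = hwe_chi_square(25, 50, 25)   # counts match p^2, 2pq, q^2
_, _, chi2_skewed = hwe_chi_square(40, 20, 40)     # too few heterozygotes
print(passes_hwe(chi2_balanced), passes_hwe(chi2_skewed))
```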
(5) Screening according to linkage imbalance
Linkage disequilibrium (LD) refers to the non-random association of alleles at two or more loci in a population. Put simply, if two genes are not inherited completely independently, linkage disequilibrium exists between them. In practice, r^2 is commonly used to indicate the strength of linkage disequilibrium between SNPs: the larger r^2, the stronger the linkage disequilibrium and the weaker the independence of the SNPs involved. Since the SNPs that a GWAS analysis ultimately seeks are highly independent, SNPs in strong linkage are removed by linkage disequilibrium screening (typically, one SNP is kept for each set of linked SNPs).
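For two biallelic SNPs with alleles A/a and B/b, r^2 can be computed from haplotype frequencies as D^2 / (p_A (1 - p_A) p_B (1 - p_B)) with D = p_AB - p_A p_B. The sketch below applies this definition to haplotype counts and then keeps one SNP per linked group with a simple greedy pass. The greedy rule, the 0.5 threshold and the SNP identifiers are illustrative assumptions; windowed pruning as done by PLINK differs in detail.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic SNPs from the four haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B                       # disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    return d * d / denom if denom > 0 else 0.0


def prune(snps, r2, threshold):
    """Greedy pruning: keep a SNP only if it is weakly linked to all kept ones."""
    kept = []
    for s in snps:
        if all(r2.get(frozenset((s, k)), 0.0) <= threshold for k in kept):
            kept.append(s)
    return kept


print(ld_r_squared(50, 0, 0, 50))     # alleles always travel together
print(ld_r_squared(25, 25, 25, 25))   # alleles combine at random

# Hypothetical pairwise r^2 values for three hypothetical SNP ids.
pairwise = {
    frozenset(("rs_1", "rs_2")): 0.9,
    frozenset(("rs_2", "rs_3")): 0.1,
    frozenset(("rs_1", "rs_3")): 0.05,
}
print(prune(["rs_1", "rs_2", "rs_3"], pairwise, threshold=0.5))
```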
(6) Screening according to Individual independence
The data should be kept as independent as possible. If samples are closely related, or data from the same sample are used several times during data acquisition, the SNP distribution is no longer in its natural state and the analysis result is biased. The genetic relationship coefficient, also called the kinship coefficient, expresses as a numerical value the degree of similarity in genetic composition between individuals in a population, and reflects how closely two individuals are related.
(7) Association analysis
In GWAS, there are two ways to select a phenotype. The first is a linear phenotype, such as height, weight or intelligence; the second is a binary phenotype, such as diseased and unaffected, known as case and control, with 0 for diseased and 1 for unaffected. When the phenotype to be analyzed is a binary trait, the analysis is typically performed with a Logistic regression model; when it is a linear trait, an ordinary linear regression model is typically used. The method uses a Logistic regression model to obtain the significance p value of the association between each SNP and the phenotype, selects the SNPs with high relevance according to the p value, and encodes them as 0, 1 and 2 to form a data set.
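The selection and encoding step can be sketched as below, taking the per-SNP p values as given (for example, as reported by the Logistic regression analysis). Encoding a genotype as the number of copies of one chosen allele is a common way to obtain 0, 1 and 2 and is assumed here; the SNP identifiers, p values, counted alleles and the significance threshold are all hypothetical.

```python
def encode_additive(genotype, counted_allele):
    """Encode a two-character genotype as 0, 1 or 2 copies of one allele."""
    return sum(1 for allele in genotype if allele == counted_allele)


def select_snps(p_values, alpha):
    """Keep SNPs whose association p value is below alpha, most significant first."""
    return sorted((s for s, p in p_values.items() if p < alpha), key=p_values.get)


def build_dataset(samples, counted, selected):
    """One row per sample, one 0/1/2 column per selected SNP."""
    return [[encode_additive(sample[s], counted[s]) for s in selected]
            for sample in samples]


# Hypothetical association results and genotype calls.
p_values = {"rs_a": 1e-6, "rs_b": 0.2, "rs_c": 1e-4}
counted = {"rs_a": "C", "rs_b": "T", "rs_c": "A"}
samples = [
    {"rs_a": "GG", "rs_b": "TT", "rs_c": "AG"},
    {"rs_a": "GC", "rs_b": "CT", "rs_c": "AA"},
    {"rs_a": "CC", "rs_b": "CC", "rs_c": "GG"},
]
selected = select_snps(p_values, alpha=1e-3)   # threshold is an assumption
dataset = build_dataset(samples, counted, selected)
print(selected, dataset)
```

The resulting rows are the kind of encoded SNP site data set that the SNP classifiers take as input.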
Step 104, constructing classifiers by using a decision tree as a base classifier and using three integration strategies of a random forest classifier, a Bagging classifier and an XGBoost classifier to obtain three SNP base classifiers.
Fig. 5 shows the SNP classifier model, in which the classifiers are constructed by using a decision tree as a base classifier and three integration modes, namely a random forest classifier, a Bagging classifier and an XGBoost classifier.
Step 105, performing ensemble learning on all high-quality MRI-based classifiers and the SNP-based classifiers based on the improved probability weight ensemble learning mode to obtain a final enhanced classifier.
Step 106, performing multi-modal Alzheimer's disease medical image recognition and classification by using the enhanced classifier.
As shown in Fig. 6, after the MRI classifiers and the SNP classifiers are constructed, the ensemble learning mode based on improved probability weights is applied to obtain the final enhanced classifier. The integration method based on improved probability weights is:
$$p(x)=\mathrm{sigmoid}(w_1)\,p(x\mid h_1)+\mathrm{sigmoid}(w_2)\,p(x\mid h_2)+\cdots+\mathrm{sigmoid}(w_n)\,p(x\mid h_n)$$
wherein n is the number of classifiers, sigmoid() is an activation function, w is a performance index of the classifier and is derived from the probabilities on a validation set, p is the probability of the current classifier, and h is the number of network layers. The method effectively mitigates the imbalance of weights among the classifiers, so that an efficient enhanced classifier is formed. The enhanced classifier is then used for multi-modal Alzheimer's disease medical image recognition and classification.
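The weighted sum can be sketched directly: each classifier's class probabilities are scaled by the sigmoid of its performance index and added up, and the class with the largest total is predicted. Treating p(x|h_i) as the probability that the i-th classifier assigns to a class is an interpretation made for this sketch, and the performance indices and probabilities below are made-up illustrative values, not results from the application.

```python
import math


def sigmoid(w):
    """Logistic function used to squash each performance index."""
    return 1.0 / (1.0 + math.exp(-w))


def ensemble_scores(weights, class_probs):
    """Sum of sigmoid(w_i) * p(class | classifier i) for every class.

    class_probs[i][c] is the probability classifier i assigns to class c.
    """
    n_classes = len(class_probs[0])
    return [sum(sigmoid(w) * probs[c] for w, probs in zip(weights, class_probs))
            for c in range(n_classes)]


def predict(weights, class_probs):
    """Index of the class with the highest combined score."""
    scores = ensemble_scores(weights, class_probs)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical inputs: three MRI slice classifiers followed by three SNP classifiers.
weights = [0.90, 0.85, 0.80, 0.70, 0.65, 0.60]
class_probs = [
    [0.20, 0.80], [0.30, 0.70], [0.60, 0.40],   # MRI base classifiers
    [0.40, 0.60], [0.55, 0.45], [0.35, 0.65],   # SNP base classifiers
]
print(ensemble_scores(weights, class_probs), predict(weights, class_probs))
```

Because the sigmoid maps every index into (0, 1), no single classifier can dominate the sum through an extreme weight. The combined scores are not normalized to sum to 1, which does not affect which class is largest.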
The result of ensemble learning depends not only on the performance of the individual classifiers but also on the diversity among the integrated classifiers. For MRI, the base classifiers to be integrated are selected according to the performance of each slice classifier, which ensures that the selected slices agree to a certain extent with the clinical manifestations of the disease while keeping the image classifiers diverse during integration. Practice with convolutional networks has shown that convolutional neural networks help reduce the risk of over-fitting while learning deep features of the image.
The GWAS is used for analyzing and reducing the dimensionality of the SNP data, so that the complexity of calculation is greatly reduced, the identification error caused by redundant information is reduced, and the identification precision is improved; the SNP classifier takes a decision tree as a base classifier and constructs the classifier in various integrated modes, so that the performance of the SNP classifier is improved on one hand, and the diversity of the SNP classifier is also improved on the other hand.
The application combines two data modalities, medical images and genomics: image data are read and joined with genome-wide association analysis, so that image and gene data together make the diagnosis of Alzheimer's disease more accurate and reliable. This solves the technical problem that existing multi-modal fusion of image data and genetic data in the medical diagnosis of Alzheimer's disease performs poorly, which impairs the accuracy of recognizing and classifying people at different stages of the disease.
The application also provides an embodiment of a multi-modal Alzheimer's disease medical image recognition and classification system, which comprises:
a data module for constructing medical databases of different Alzheimer's disease populations, the medical databases comprising coronal MRI image data and gene SNP data;
an MRI image processing module for preprocessing the MRI image data, constructing classifiers using a CNN (convolutional neural network), and selecting at least three of the best-performing classifiers as high-quality MRI base classifiers;
a first gene data processing module for preprocessing the gene SNP data using a GWAS whole-genome association analysis method to obtain an encoded SNP locus data set;
a second gene data processing module for constructing classifiers with a decision tree as the base classifier, using three ensemble strategies (a random forest classifier, a Bagging classifier and an XGBoost classifier) to obtain three SNP base classifiers;
an ensemble learning enhancement module for performing ensemble learning on all high-quality MRI base classifiers and SNP base classifiers based on the improved probability-weight ensemble learning mode to obtain a final enhanced classifier;
and a recognition and classification module for performing multi-modal Alzheimer's disease medical image recognition and classification using the enhanced classifier.
The first gene data processing module is specifically configured to:
perform GWAS whole-genome association analysis on the gene SNP data using PLINK software, including: screening the gene SNP data by site missing rate, by site information missing rate, by Hardy-Weinberg equilibrium, by linkage disequilibrium, and by individual independence; analyzing with a logistic regression model to obtain the significance p value of the association between each SNP and the phenotype; and selecting the SNPs with high relevance according to the p value and encoding them to form an encoded SNP site data set.
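The last two steps, selecting SNPs by p value and encoding them, can be sketched as follows. This is not PLINK code and does not reproduce the screening steps. The p-value threshold, the additive 0/1/2 encoding (count of minor alleles), the SNP identifiers and all values are assumptions made for illustration; the text does not state which encoding is used.

```python
def select_snps(p_values, threshold=1e-4):
    """Keep SNPs whose association p value is below the (assumed)
    threshold, ordered from most to least significant."""
    kept = [snp for snp, p in p_values.items() if p < threshold]
    return sorted(kept, key=lambda snp: p_values[snp])


def encode_genotype(genotype, minor_allele):
    """Additive encoding (an assumption): number of minor alleles, 0 to 2."""
    return sum(1 for allele in genotype if allele == minor_allele)


def encode_subject(genotypes, selected, minor_alleles):
    """Build one feature vector for a subject over the selected SNPs."""
    return [encode_genotype(genotypes[s], minor_alleles[s]) for s in selected]


# Placeholder association results and one hypothetical subject.
p_values = {"snp_1": 3e-6, "snp_2": 0.2, "snp_3": 5e-5}
minor_alleles = {"snp_1": "G", "snp_2": "T", "snp_3": "T"}
subject = {"snp_1": "AG", "snp_2": "CC", "snp_3": "TT"}

selected = select_snps(p_values)
print(selected, encode_subject(subject, selected, minor_alleles))
```

Missing genotype calls are not handled here; a real pipeline would need a rule for them.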
Image pre-processing of the MRI image data includes:
performing skull removal and registration processing on the MRI image data;
smoothing the MRI image data;
performing gray-level normalization on the MRI image data;
taking two-dimensional slices of the MRI image data.
Image pre-processing of the MRI image data is performed using SPM12 software.
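The last two pre-processing steps can be illustrated with a small standard-library sketch. It does not call SPM12, and skull removal, registration and smoothing are not shown. It assumes min-max scaling to [0, 1] as the gray-level normalization and a volume indexed as volume[x][y][z] with the coronal direction along the second axis; both are assumptions, since the text does not specify them.

```python
def normalize_gray(volume):
    """Min-max scale all voxel intensities of a 3-D volume to [0, 1]."""
    flat = [v for plane in volume for row in plane for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:  # constant image: map every voxel to 0.0
        return [[[0.0 for _ in row] for row in plane] for plane in volume]
    return [[[(v - lo) / span for v in row] for row in plane] for plane in volume]


def coronal_slice(volume, y):
    """Extract the 2-D slice at index y along the second axis
    (assumed here to be the coronal direction)."""
    return [plane[y] for plane in volume]


# Tiny invented 2 x 2 x 2 volume standing in for an MRI scan.
volume = [
    [[0, 50], [100, 150]],
    [[200, 250], [300, 400]],
]
normalized = normalize_gray(volume)
print(coronal_slice(normalized, 1))
```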
The ensemble learning mode based on improved probability weights is:
p(x) = sigmoid(w_1)·p(x|h_1) + sigmoid(w_2)·p(x|h_2) + ··· + sigmoid(w_n)·p(x|h_n)
wherein n is the number of classifiers, sigmoid () is an activation function, w is a performance index of the classifier, p is the probability of the current classifier, and h is the number of network layers.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A multi-modal Alzheimer's disease medical image recognition and classification method is characterized by comprising the following steps:
constructing medical databases of different populations of Alzheimer's disease, wherein the medical databases comprise coronal MRI image data and gene SNP data;
after image preprocessing is performed on the MRI image data, constructing classifiers using a CNN (convolutional neural network), and selecting at least three of the best-performing classifiers as high-quality MRI base classifiers;
preprocessing the gene SNP data using a GWAS whole-genome association analysis method to obtain an encoded SNP locus data set;
constructing classifiers with a decision tree as the base classifier, using three ensemble strategies of a random forest classifier, a Bagging classifier and an XGBoost classifier, to obtain three SNP base classifiers;
performing ensemble learning on all the high-quality MRI base classifiers and the SNP base classifiers based on an improved probability-weight ensemble learning mode to obtain a final enhanced classifier;
and performing multi-modal Alzheimer's disease medical image recognition classification by using the enhanced classifier.
2. The multi-modal Alzheimer's disease medical image recognition and classification method according to claim 1, wherein preprocessing the gene SNP data using the GWAS whole-genome association analysis method to obtain the encoded SNP locus data set comprises:
performing GWAS whole-genome association analysis on the gene SNP data using PLINK software, wherein the GWAS whole-genome association analysis comprises: screening the gene SNP data by site missing rate, by site information missing rate, by Hardy-Weinberg equilibrium, by linkage disequilibrium, and by individual independence; analyzing with a logistic regression model to obtain the significance p value of the association between each SNP and the phenotype; and selecting the SNPs with high relevance according to the p value and encoding them to form the encoded SNP locus data set.
3. The multi-modal Alzheimer's disease medical image recognition and classification method according to claim 1, wherein the image preprocessing of the MRI image data comprises:
performing skull removal and registration processing on the MRI image data;
smoothing the MRI image data;
performing gray-level normalization on the MRI image data;
taking two-dimensional slices of the MRI image data.
4. The multi-modal Alzheimer's disease medical image recognition and classification method according to claim 3, wherein the image preprocessing of the MRI image data is performed using SPM12 software.
5. The multi-modal Alzheimer's disease medical image recognition and classification method according to claim 1, wherein the ensemble learning mode based on improved probability weights is:
p(x) = sigmoid(w_1)·p(x|h_1) + sigmoid(w_2)·p(x|h_2) + ··· + sigmoid(w_n)·p(x|h_n)
wherein n is the number of classifiers, sigmoid () is an activation function, w is a performance index of the classifier, p is the probability of the current classifier, and h is the number of network layers.
6. A multi-modal Alzheimer's disease medical image recognition and classification system is characterized by comprising:
the data module is used for constructing medical databases of different populations of Alzheimer's disease, and the medical databases comprise coronal MRI image data and gene SNP data;
the MRI image processing module is used for preprocessing the MRI image data, constructing classifiers using a CNN (convolutional neural network), and selecting at least three of the best-performing classifiers as high-quality MRI base classifiers;
the first gene data processing module is used for preprocessing the gene SNP data using a GWAS whole-genome association analysis method to obtain an encoded SNP locus data set;
the second gene data processing module is used for constructing classifiers with a decision tree as the base classifier, using three ensemble strategies of a random forest classifier, a Bagging classifier and an XGBoost classifier, to obtain three SNP base classifiers;
the ensemble learning enhancement module is used for performing ensemble learning on all the high-quality MRI base classifiers and the SNP base classifiers based on an improved probability-weight ensemble learning mode to obtain a final enhanced classifier;
and the recognition and classification module is used for performing multi-modal Alzheimer's disease medical image recognition and classification using the enhanced classifier.
7. The multi-modal Alzheimer's disease medical image recognition and classification system according to claim 6, wherein the first gene data processing module is specifically configured to:
perform GWAS whole-genome association analysis on the gene SNP data using PLINK software, wherein the GWAS whole-genome association analysis comprises: screening the gene SNP data by site missing rate, by site information missing rate, by Hardy-Weinberg equilibrium, by linkage disequilibrium, and by individual independence; analyzing with a logistic regression model to obtain the significance p value of the association between each SNP and the phenotype; and selecting the SNPs with high relevance according to the p value and encoding them to form the encoded SNP locus data set.
8. The multi-modal Alzheimer's disease medical image recognition and classification system according to claim 6, wherein the image preprocessing of the MRI image data comprises:
performing skull removal and registration processing on the MRI image data;
smoothing the MRI image data;
performing gray-level normalization on the MRI image data;
taking two-dimensional slices of the MRI image data.
9. The multi-modal Alzheimer's disease medical image recognition and classification system according to claim 8, wherein the image preprocessing of the MRI image data is performed using SPM12 software.
10. The multi-modal Alzheimer's disease medical image recognition and classification system according to claim 6, wherein the ensemble learning mode based on improved probability weights is:
p(x) = sigmoid(w_1)·p(x|h_1) + sigmoid(w_2)·p(x|h_2) + ··· + sigmoid(w_n)·p(x|h_n)
wherein n is the number of classifiers, sigmoid () is an activation function, w is a performance index of the classifier, p is the probability of the current classifier, and h is the number of network layers.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110265610.4A CN112884754A (en) 2021-03-11 2021-03-11 Multi-modal Alzheimer's disease medical image recognition and classification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110265610.4A CN112884754A (en) 2021-03-11 2021-03-11 Multi-modal Alzheimer's disease medical image recognition and classification method and system

Publications (1)

Publication Number Publication Date
CN112884754A true CN112884754A (en) 2021-06-01

Family

ID=76041325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110265610.4A Pending CN112884754A (en) 2021-03-11 2021-03-11 Multi-modal Alzheimer's disease medical image recognition and classification method and system

Country Status (1)

Country Link
CN (1) CN112884754A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109589092A (en) * 2018-10-08 2019-04-09 广州市本真网络科技有限公司 Method and system are determined based on the Alzheimer's disease of integrated study
CN110097128A (en) * 2019-05-07 2019-08-06 广东工业大学 Medical Images Classification apparatus and system
CN110232679A (en) * 2019-05-24 2019-09-13 潘丹 A kind of Alzheimer's disease genetic biomarkers object determines method and system
CN110516758A (en) * 2019-09-02 2019-11-29 广东工业大学 A kind of alzheimer's disease classification prediction technique and system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113380379A (en) * 2021-06-08 2021-09-10 上海健康医学院 Imaging phenotype-based whole genome association analysis method, medium and equipment
CN114372497A (en) * 2021-08-18 2022-04-19 中电长城网际系统应用有限公司 Multi-modal security data classification method and classification system
CN113724863A (en) * 2021-09-08 2021-11-30 山东建筑大学 Automatic discrimination system, storage medium and equipment for autism spectrum disorder
CN114202524A (en) * 2021-12-10 2022-03-18 中国人民解放军陆军特色医学中心 Performance evaluation method and system of multi-modal medical image
CN117349714A (en) * 2023-12-06 2024-01-05 中南大学 Classification method, system, equipment and medium for medical image of Alzheimer disease
CN117349714B (en) * 2023-12-06 2024-02-13 中南大学 Classification method, system, equipment and medium for medical image of Alzheimer disease

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210601
