CN113222915B - Method for establishing a PD (Parkinson's disease) diagnosis model based on multi-modal magnetic resonance imaging omics

Publication number
CN113222915B
Authority
CN
China
Prior art keywords
brain
subcortical
omics
image
qsm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110468956.4A
Other languages
Chinese (zh)
Other versions
CN113222915A (en)
Inventor
张敏鸣
管晓军
徐晓俊
黄沛钰
郭涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202110468956.4A
Publication of CN113222915A
Application granted
Publication of CN113222915B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30016 Brain

Abstract

The invention discloses a method for establishing a PD (Parkinson's disease) diagnosis model based on multi-modal magnetic resonance imaging omics. By converting multi-modal brain magnetic resonance images into a large number of high-order brain features of potential value and building a diagnosis model by machine learning, the method is expected to reduce the current over-reliance of PD diagnosis on subjective evaluation. Through image processing, data segmentation, feature extraction, feature screening, model construction and external independent validation, the method obtains a set of brain imaging omics features dominated by substantia nigra iron distribution, and it retains good diagnostic accuracy in external independent validation. In addition, the random forest classifier built on these brain imaging omics features performs well in diagnosing PD in different clinical states (early-stage patients, mid-to-late-stage patients, PD patients without and with drug treatment, tremor-dominant PD patients and non-tremor-dominant PD patients).

Description

Method for establishing a PD (Parkinson's disease) diagnosis model based on multi-modal magnetic resonance imaging omics
Technical Field
The invention belongs to the technical field of neuroimaging, and particularly relates to a method for establishing a PD (Parkinson's disease) diagnosis model based on multi-modal magnetic resonance imaging omics.
Background
Parkinson's disease (PD) is a common neurodegenerative disease whose main clinical manifestations include bradykinesia, rigidity and resting tremor. A large body of pathological evidence indicates that the death of dopaminergic neurons in the substantia nigra is a hallmark of clinical PD. With the development of modern neuroimaging, researchers have come to recognize that PD involves dysfunction in multiple brain regions and that this dysfunction is closely associated with the clinical heterogeneity of the disease. Because reliable and stable biomarkers are lacking, clinical diagnosis of PD remains difficult: it has not improved significantly over the past 20 years, and the misdiagnosis rate is higher still for early-stage or drug-naive PD patients. It is therefore of great importance to take the clinical heterogeneity of PD into account when searching for biomarkers that can mark PD-related brain changes.
Multi-modal magnetic resonance imaging (MRI) offers several kinds of tissue specificity and provides a non-invasive, highly reproducible means of studying neurodegenerative diseases. In PD studies, significant iron deposition in the substantia nigra has been detected consistently by quantitative susceptibility mapping (QSM) and transverse relaxation rate (R2*) imaging, while high-resolution T1-weighted imaging can stably depict whole-brain gray matter structure. Previous studies using T1-weighted image datasets from the international multicenter PPMI (Parkinson's Progression Markers Initiative) database found that gray matter volume has some value in diagnosing early PD. However, despite the progress made in constructing imaging biomarkers for PD, the limitations of single-modality data, small sample sizes and the lack of validation on independent data sets mean that these studies cannot fully characterize the PD brain, and clinical translation remains challenging.
In addition, medical images contain high-order brain features that lie beyond traditional analysis, and research suggests that such features have the advantage of reflecting underlying brain pathophysiology. In PD studies, however, they have often been ignored, so the diagnostic value of T1-weighted images, QSM and R2* maps has been underestimated.
Disclosure of Invention
The invention aims to address the shortcomings of imaging biomarker research in PD by providing a method for establishing a PD diagnosis model based on multi-modal magnetic resonance imaging omics. The method converts multi-modal brain magnetic resonance images (quantitative susceptibility mapping, R2* imaging and high-resolution T1-weighted imaging) into a large number of high-order brain features of potential value, combines these features in a diagnosis model built by machine learning to improve the diagnostic accuracy of PD, and verifies the generalization ability of the model.
The purpose of the invention is achieved by the following technical solution: a method for establishing a PD diagnosis model based on multi-modal magnetic resonance imaging omics, comprising the following steps:
(1) Acquiring T1-weighted images and enhanced susceptibility-weighted angiography (ESWAN) images of PD patients and normal controls.
(2) Obtaining QSM and R2* maps from the ESWAN images acquired in step (1), and segmenting the QSM and R2* maps to obtain 10 subcortical nuclei; segmenting the subcortical nuclei of the T1-weighted images acquired in step (1) to obtain 8 subcortical nuclei, and segmenting the cortical gray matter of the T1-weighted images to obtain 62 cortical regions.
(3) Extracting brain imaging omics features for each subcortical nucleus and cortical region segmented in step (2).
(4) Selecting the brain imaging omics features with higher contribution.
(5) Training a classification model, used to diagnose the PD patient type, with the higher-contribution brain imaging omics features from step (3) as selected in step (4).
Further, in step (1), the T1-weighted images are acquired with a fast spoiled gradient-echo sequence, and the enhanced susceptibility-weighted angiography (ESWAN) images are acquired with a gradient-echo sequence.
Further, in step (2), the STAR-QSM algorithm is used to reconstruct the ESWAN images acquired in step (1) into QSM and R2* maps, and a registration-based segmentation method is then used to segment 10 subcortical nuclei in the individual-space QSM and R2* maps, specifically:
(2.1.1) Co-registering all individual-space QSM images to a QSM template to obtain the QSM images in template space together with the transformation matrix.
(2.1.2) Constructing subcortical nuclei masks of the caudate nucleus, putamen, globus pallidus, red nucleus and substantia nigra in QSM template space.
(2.1.3) Warping the template-space subcortical nuclei masks obtained in step (2.1.2) to individual space using the inverse of the transformation matrix obtained in step (2.1.1), to obtain the individual-space subcortical nuclei masks.
(2.1.4) Segmenting the R2* map with the individual-space subcortical nuclei masks obtained in step (2.1.3) to obtain 10 subcortical nuclei.
Further:
In step (2.1.1), the registration algorithm is the SyN registration algorithm in the ANTs software, and the QSM template is age-specific (age ≥ 55 years).
In step (2.1.3), the individual-space subcortical nuclei masks are adjusted manually to correct the misplacement of surrounding tissue voxels caused by registration deviation.
In step (2), the subcortical nuclei of the T1-weighted images are segmented with the FIRST software. For cortical segmentation of the T1-weighted images, the ANTs software is used to perform field inhomogeneity correction, brain tissue extraction, SyN registration, tissue segmentation and cortical thickness calculation, after which 62 cortical regions are segmented based on the DKT mask; the DKT mask is fine-tuned according to the cortical thickness mask to remove extracerebral parts.
Further, step (3) is specifically:
(3.1) For the QSM and R2* maps, 28 histogram features and 20 texture features are calculated within each subcortical nucleus. The histogram features are the mean signal intensity, standard deviation, kurtosis and skewness, each calculated over 7 signal value ranges: all signal values within the nucleus and the highest 5%, 10%, 20%, 30%, 40% and 50% of them. The texture features are the gray level co-occurrence matrix features of each subcortical nucleus, averaged over 26 directions.
(3.2) Extracting the brain imaging omics features from the T1-weighted images, specifically: for the subcortical nuclei of the T1-weighted images, the normalized nucleus volume, 4 histogram features and 20 texture features are calculated; for the cortical regions of the T1-weighted images, cortical volume, mass, thickness and surface area are calculated.
Further, in step (3.2), for the subcortical nuclei of the T1-weighted images, the normalized nucleus volume is obtained by multiplying the number of voxels in each nucleus by the voxel resolution and dividing by the total intracranial volume. The nucleus signal is divided by the mean whole-brain white matter intensity to normalize the signal intensity distribution of the nucleus, after which the histogram and texture features are obtained. For the cortical regions of the T1-weighted images, cortical volume and surface area are normalized by dividing the cortical volume of each region by the total intracranial brain tissue volume and the cortical surface area of each region by the total intracranial brain tissue surface area.
Further, before step (4), the brain imaging omics features extracted in step (3) are preprocessed, specifically: a general linear model is constructed from the brain imaging omics features of the normal controls and their corresponding ages and sexes:
$$Y \sim \beta + \alpha_1 X_1 + \alpha_2 X_2$$
where $Y$ is a brain imaging omics feature, $X_1$ is age, $X_2$ is sex, and $\beta$, $\alpha_1$, $\alpha_2$ are the parameters of the general linear model. The constructed general linear model is applied to the PD patients and normal controls of step (1) to obtain estimated brain imaging omics features; the estimated features are subtracted from the features extracted in step (3) to obtain brain imaging omics features with the influence of age and sex removed; these features are then normalized to [-1, 1].
Further, the PD diagnosis model is used as follows: steps (2) to (3) are applied in sequence to the T1-weighted image and the enhanced susceptibility-weighted angiography image to be diagnosed, the influence of age and sex is removed with the general linear model, and the higher-contribution brain imaging omics features determined in step (4) are input into the classification model trained in step (5) to obtain the corresponding PD patient type.
Further, in step (4), the features with higher contribution are selected according to the current brain imaging omics features and their classification labels, specifically: a plurality of random databases are created by permuting the label column of the current brain imaging omics feature data set; the R package 'caret' is used in combination with a random forest classifier, with cross-validation giving an unbiased estimate of the classification error, to obtain the importance of each current brain imaging omics feature; and the features with higher importance are selected.
Further, in step (5), a random forest classifier is used as the classification model, with two hyper-parameters: the number of decision trees and the number of nodes on the decision trees, the latter given by
(formula image GDA0003776686430000031, not reproduced)
where F is the number of brain imaging omics features currently used.
The invention has the beneficial effects that:
(1) A set of 36 important brain imaging omics features, dominated by substantia nigra iron distribution, is obtained, with a diagnostic accuracy of 81.1% ± 8.0% in internal testing. To guard against overfitting, the model of the invention was also validated on an external independent data set, where its accuracy was 78.5% ± 2.1%. A diagnosis model built on this set of 36 brain imaging omics features therefore has good stability and accuracy, and potential clinical value.
(2) The random forest classifier based on brain imaging omics features performs well in diagnosing PD patients in different clinical states (FIG. 4). Using the 36 imaging omics features in combination, the inventors observed diagnostic accuracies of 80.3% ± 7.1% and 79.1% ± 6.5% for patients with early-stage PD, and of 79.4% ± 6.3% and 82.0% ± 5.8% for mid-to-late-stage patients receiving drug treatment, respectively. Furthermore, existing MRI studies focus on differences between PD patients with different motor symptoms, and no imaging biomarker independent of motor heterogeneity has been studied. The diagnostic performance of the 36 brain imaging omics features was therefore also verified separately in tremor-dominant (PD-TD) and non-tremor-dominant (PD-nonTD) patients, with accuracies of 79.8% ± 6.9% and 79.1% ± 6.5%, respectively. In conclusion, the invention shows good performance in diagnosing PD patients independent of disease stage, drug effect and motor subtype, and its clinical value is foreseeable.
Drawings
FIG. 1 is a flow chart of the semi-automatic, registration-based segmentation of subcortical nuclei; wherein A is an individual-space QSM image, B is the age-specific QSM template, C is the subcortical nuclei mask, D is the QSM image in template space, E is the subcortical nuclei mask in individual space, F is the manually corrected subcortical nuclei mask in individual space, and G is the subcortical nuclei in the R2* map;
FIG. 2 is a flow chart of the brain imaging omics analysis framework; wherein A shows the QSM, R2* and T1-weighted images used and the features extracted from them; B is the general linear model, constructed from 121 normal controls, that removes as far as possible the effects of age and sex on the extracted raw brain features; C is the data-driven feature selection that yields the effective brain imaging omics features; D is the machine learning model, including model training, testing and external validation, with parallel tests on PD patients in different clinical states;
FIG. 3 is a schematic diagram of feature selection and classifier construction based on brain imaging omics; wherein A shows the 50 most important informative features selected with the R package 'caret' according to feature contribution, with the feature matrix (1408 × 244) in the upper right corner; B shows a random forest classifier built from the brain imaging omics features of the training database and run for 1000 iterations to obtain the model performance, with the feature matrix of database-244 (50 × 244) at the top; C shows external independent validation of the constructed random forest classifier on a previously unused database, with the feature matrix of database-106 (50 × 106) at the top; the marked features represent the final brain imaging omics feature set that achieves the best accuracy in external independent validation (database-106);
FIG. 4 is a graphical representation of the performance of the 36 defined brain imaging omics features in diagnosing PD in different clinical states; wherein A is the accuracy in distinguishing PD patients in different clinical states from the normal control group, and B is the contribution of each brain imaging omics feature in each machine learning test, estimated by the mean decrease in Gini coefficient.
Detailed Description
In the method of the invention for establishing a PD diagnosis model based on multi-modal magnetic resonance imaging omics, brain imaging omics comprehensively extracts brain features of diagnostic value for PD from multi-modal MRI images (T1-weighted images, QSM and R2* maps) and combines these features by machine learning. This is expected to improve the diagnostic accuracy of PD without depending heavily on the clinical state of the patient (such as disease stage, medication status and motor subtype). In addition, the model of the invention is externally validated on a completely independent data set, with its generalization ability taken as an important performance index. The research framework defined by the invention comprises image preprocessing, data segmentation, feature extraction, feature screening, and model testing and validation.
The study in this embodiment was approved by the Medical Ethics Committee of the Second Affiliated Hospital of Zhejiang University School of Medicine, and all PD patients and normal controls (NC) signed informed consent.
The clinical and imaging data of this embodiment were collected from August 2014 to May 2018. PD was diagnosed by two experienced neurologists according to the UK Parkinson's Disease Society Brain Bank criteria. Initially, a total of 293 subjects received MRI scans and clinical assessments. For PD patients taking anti-parkinsonian drugs, MRI scans and clinical assessments were performed in the morning after overnight withdrawal (at least 12 hours) of anti-parkinsonian medication ("withdrawal state"). Of the 293 subjects, 49 were excluded because of head motion, registration errors, marked brain atrophy/ventricular enlargement, multiple microbleeds, missing clinical data, or other neurological/psychiatric disorders. The remaining 244 subjects, comprising 121 normal controls and 123 PD patients, were enrolled in database-244.
(1) MRI acquisition. All subjects were scanned on a 3.0 T MRI scanner (GE Discovery 750) equipped with an 8-channel head coil. During scanning, the head was immobilized with foam padding and earplugs were provided to reduce noise. The scan parameters of each MRI sequence were as follows:
(1.1) High-resolution 3D T1-weighted images were acquired with a fast spoiled gradient-echo sequence: repetition time = 7.336 ms; echo time = 3.036 ms; inversion time = 450 ms; flip angle = 11°; field of view = 260 × 260 mm²; matrix = 256 × 256; slice thickness = 1.2 mm; number of slices = 196.
(1.2) Enhanced susceptibility-weighted angiography (ESWAN) images were acquired with a gradient-echo sequence: repetition time = 33.7 ms; first echo time/echo spacing/eighth echo time = 4.556 ms/3.648 ms/30.092 ms; flip angle = 20°; field of view = 240 × 240 mm²; matrix = 416 × 384; slice thickness = 2 mm; slice gap = 0 mm; number of slices = 64.
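The echo timing quoted in (1.2) can be checked with a short calculation. The sketch below assumes equally spaced echoes, so that the n-th echo time is the first echo time plus (n - 1) times the echo spacing; the function name is illustrative and not part of the original text.

```python
def echo_times_ms(first_te, spacing, n_echoes):
    """Return the echo times (ms) of a multi-echo sequence with equal spacing."""
    return [first_te + k * spacing for k in range(n_echoes)]

# Values quoted in step (1.2); eight echoes are implied by "eighth echo time".
tes = echo_times_ms(4.556, 3.648, 8)
print([round(te, 3) for te in tes])
```

Under this assumption the eighth echo time, 4.556 + 7 × 3.648 ms, agrees with the quoted 30.092 ms.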
(2) QSM and R2* maps are obtained from the ESWAN images, and the QSM maps, R2* maps and T1-weighted images are segmented.
(2.1) Subcortical nuclei segmentation in the individual-space QSM and R2* maps. The STAR-QSM algorithm in the STI Suite V3.0 software package (https://people.eecs.berkeley.edu/~chunlei.liu/software.html) is used to reconstruct the enhanced susceptibility-weighted angiography (ESWAN) data acquired in step (1) into QSM and R2* maps. To obtain the original magnetic susceptibility of each nucleus in individual space, a registration-based segmentation method is used to segment the 10 subcortical nuclei in the individual-space QSM and R2* maps, as shown in FIG. 1:
(2.1.1) All individual-space QSM images A are co-registered to an age-specific (age ≥ 55 years) QSM template B with the SyN registration algorithm in the ANTs package, giving the QSM images D in template space and the transformation matrix of each subject.
(2.1.2) A subcortical nuclei mask C is constructed manually in the space of QSM template B, comprising the caudate nucleus (CN), putamen, globus pallidus (GP), red nucleus (RN) and substantia nigra (SN), five in each hemisphere.
(2.1.3) The template-space subcortical nuclei mask C is warped to individual space using the inverse of the transformation matrix obtained in step (2.1.1), giving the individual-space subcortical nuclei mask E.
(2.1.4) To reduce the misplacement of surrounding tissue voxels caused by registration deviation, the individual-space subcortical nuclei mask E is corrected manually where appropriate, giving the corrected individual-space subcortical nuclei mask F.
(2.1.5) Because the R2* map and the QSM image share the same spatial information, the individual-space subcortical nuclei mask F obtained in step (2.1.4) is applied to the R2* map, and the 10 subcortical nuclei in the R2* map are segmented accordingly, giving the 10 individual-space subcortical nuclei G in the R2* map.
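Applying the corrected mask to the R2* map, as in step (2.1.5), amounts to grouping voxel values by mask label. The following is a minimal sketch under stated assumptions: both images are represented as flat sequences over the same voxel grid, label 0 denotes background, and the function name and toy values are hypothetical. It does not reproduce the ANTs registration itself.

```python
from collections import defaultdict

def values_by_label(label_mask, image):
    """Group voxel values of `image` by the non-zero labels in `label_mask`.

    Both inputs are flat sequences over the same voxel grid, which is the
    property step (2.1.5) relies on (QSM and R2* share spatial information).
    """
    if len(label_mask) != len(image):
        raise ValueError("mask and image must cover the same voxels")
    grouped = defaultdict(list)
    for label, value in zip(label_mask, image):
        if label != 0:  # assumption: 0 marks background voxels
            grouped[label].append(value)
    return dict(grouped)

# Toy example: labels 1 and 2 stand in for two of the 10 nuclei.
mask = [0, 1, 1, 0, 2, 2, 2, 0]
r2star = [5.0, 30.1, 28.4, 4.2, 41.0, 39.5, 40.2, 6.1]
nuclei = values_by_label(mask, r2star)
```

The same call with the QSM image in place of the R2* map would yield the susceptibility values of the identical voxels.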
(2.2) The subcortical nuclei and cortical gray matter of the original individual T1-weighted images acquired in step (1) are segmented with open-source software such as ANTs and FIRST. For quality control, the subcortical and cortical segmentation of each subject is inspected visually to guard against registration errors.
(2.2.1) The subcortical nuclei of the individual-space T1-weighted images are segmented with a fully automated segmentation method (FIRST), giving 8 nuclei (A in FIG. 2): bilateral CN, putamen, GP and thalamus.
(2.2.2) Cortical segmentation uses "antsCorticalThickness" from ANTs, which includes field inhomogeneity correction (N4 bias field correction), brain tissue extraction, SyN registration, tissue segmentation and cortical thickness calculation. The 62 cortical regions (31 per hemisphere) in each subject's individual space are then segmented with the "antsJointLabelFusion" algorithm and the Desikan-Killiany-Tourville (DKT) mask in the ANTs package (A in FIG. 2). To eliminate the risk of zero regions in each original segmentation, the invention fine-tunes the DKT mask according to the cortical thickness mask, removing extracerebral parts.
(3) Brain feature extraction.
(3.1) For the QSM and R2* maps, the invention calculates, within each nucleus, 28 histogram features and 20 texture features based on the gray level co-occurrence matrix (GLCM) algorithm. In total, 480 ((4 × 7 + 20) × 10) brain imaging omics features are extracted from the QSM images and another 480 from the R2* maps.
(3.1.1) Given the quantitative nature of the QSM and R2* maps, when calculating the signal histogram features of each nucleus the invention extracts data for 7 signal value ranges, namely all signal values within the nucleus and the highest 5%, 10%, 20%, 30%, 40% and 50% of them, and calculates the mean signal intensity, standard deviation, kurtosis and skewness (4 histogram features) for each of the 7 ranges.
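The 28 histogram features of step (3.1.1) can be sketched as follows. This is an illustrative reading, not the inventors' Matlab code: population moments are used, kurtosis is the non-excess fourth standardized moment, and the number of voxels in each "highest p%" range is rounded up. All three choices, and the function names, are assumptions.

```python
import math

def moments(values):
    """Mean, standard deviation, skewness and kurtosis (population moments)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    if std == 0.0:
        return mean, 0.0, 0.0, 0.0  # assumption: degenerate range gives zeros
    skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    return mean, std, skew, kurt

def histogram_features(values, top_percents=(5, 10, 20, 30, 40, 50)):
    """Return 4 statistics for all values and for each highest-signal subset."""
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    ranges = {"all": ordered}
    for p in top_percents:
        k = max(1, (p * n + 99) // 100)  # integer ceiling, at least one voxel
        ranges[f"top{p}"] = ordered[:k]
    features = {}
    for name, subset in ranges.items():
        for stat, value in zip(("mean", "std", "skew", "kurt"), moments(subset)):
            features[f"{name}_{stat}"] = value
    return features

voxels = [float(v) for v in range(1, 101)]  # toy signal values 1..100
feats = histogram_features(voxels)
```

With 7 ranges and 4 statistics per range, the dictionary holds the 28 histogram features per nucleus described in the text.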
(3.1.2) Next, the invention extracts 3D texture features with a gray level co-occurrence matrix (GLCM) algorithm written in Matlab 2018a. To ensure the robustness of the quantitative features, the GLCM features of each nucleus are averaged over the 26 (3³ - 1) directions.
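The 26 directions are the offsets from a voxel to its neighbours in a 3 × 3 × 3 cube, excluding the centre. The sketch below enumerates them and averages one GLCM feature (contrast) over all directions. It is a simplified illustration: the volume is assumed to be already quantised to integer gray levels, no nucleus mask is applied, and contrast is only one example feature, since the text does not list which 20 texture features were used.

```python
from itertools import product

# All 26 neighbour offsets of a voxel in 3D: 3**3 - 1 (the centre is excluded).
DIRECTIONS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

def glcm(volume, offset, levels):
    """Normalised co-occurrence matrix of a 3D integer volume for one offset."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    dz, dy, dx = offset
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for z, y, x in product(range(nz), range(ny), range(nx)):
        z2, y2, x2 = z + dz, y + dy, x + dx
        if 0 <= z2 < nz and 0 <= y2 < ny and 0 <= x2 < nx:
            counts[volume[z][y][x]][volume[z2][y2][x2]] += 1
            total += 1
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    return [[c / total for c in row] for row in counts]

def contrast(p):
    """GLCM contrast: sum over (i, j) of p(i, j) * (i - j)**2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

def mean_contrast(volume, levels):
    """Average the contrast feature over all 26 directions."""
    values = [contrast(glcm(volume, d, levels)) for d in DIRECTIONS]
    return sum(values) / len(values)

# Toy 2x2x2 volume already quantised to 2 gray levels.
toy = [[[0, 1], [1, 0]], [[1, 0], [0, 1]]]
print(len(DIRECTIONS), mean_contrast(toy, levels=2))
```

Averaging over directions makes the feature insensitive to the orientation of the nucleus in the image grid, which is the robustness argument given in the text.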
(3.2) High-resolution T1-weighted images.
(3.2.1) For the subcortical nuclei segmented from the T1-weighted images in step (2), the invention calculates three types of features: the normalized nucleus volume, 4 histogram features and 20 texture features. First, the number of voxels within each nucleus is counted, multiplied by the voxel resolution and divided by the total intracranial volume to obtain the normalized nucleus volume. Then a relative quantitative nucleus signal (the nucleus signal divided by the mean whole-brain white matter intensity) is calculated to normalize the signal intensity distribution of the nucleus, and the 4 histogram features and 20 texture features are obtained from it. As before, the GLCM features of each nucleus are averaged over the 26 directions, finally giving 200 ((1 + 4 + 20) × 8) FIRST subcortical nucleus features.
(3.2.2) For the 62 cortical regions segmented by the ANTs algorithm, the invention calculates cortical volume, mass, thickness and surface area by applying the "LabelGeometryMeasures" and "ImageMath" algorithms of ANTs. The cortical volume and surface area of each cortical region are normalized by the total intracranial brain tissue volume (TIV) and total surface area, respectively (cortical volume/TIV, cortical surface area/total intracranial brain tissue surface area). Finally, 248 (62 × 4) cortex-based brain imaging omics features (ANTs features) are obtained.
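The feature counts given in steps (3.1) to (3.2.2) can be tallied to confirm the total of 1408 features used later. The variable names below are illustrative only.

```python
N_NUCLEI_QSM = 10   # subcortical nuclei segmented on the QSM / R2* maps
N_NUCLEI_T1 = 8     # subcortical nuclei segmented by FIRST on T1
N_CORTICAL = 62     # DKT cortical regions

per_nucleus_quant = 4 * 7 + 20                       # 28 histogram + 20 texture
qsm_features = per_nucleus_quant * N_NUCLEI_QSM      # 480
r2star_features = per_nucleus_quant * N_NUCLEI_QSM   # 480
t1_nuclei_features = (1 + 4 + 20) * N_NUCLEI_T1      # 200
cortical_features = 4 * N_CORTICAL                   # 248

total = qsm_features + r2star_features + t1_nuclei_features + cortical_features
print(total)  # 1408
```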
(4) Brain feature preprocessing: general linear model (B in FIG. 2).
Considering that age and sex are potential confounding factors, the invention constructs a general linear model on the data of the 121 normal controls to estimate the influence of age and sex on each brain feature:
$$Y \sim \beta + \alpha_1 X_1 + \alpha_2 X_2$$
where the dependent variable $Y$ is a calculated brain feature (1408 in total), the independent variable $X_1$ is age (continuous variable), and the independent variable $X_2$ is sex (categorical variable); $\beta$, $\alpha_1$ and $\alpha_2$ are the coefficients of the general linear model.
For each brain feature $Y_i$ (121 × 1; $i$ = 1 to 1408), one set of $\beta$, $\alpha_1$, $\alpha_2$ is calculated, giving 1408 general linear models. The ages and sexes of the subjects in the PD group and the normal control group of database-244 are substituted into the general linear models of all brain features to calculate the estimated brain imaging omics features (244 × 1408). The estimated features are then subtracted from the actual brain imaging omics features (244 × 1408) extracted in step (3), giving the brain imaging omics features with confounding factors removed (244 × 1408), that is, the part of each actual brain feature that cannot be explained by age and sex.
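The confound removal described above (fit on normal controls, then subtract the prediction for every subject) can be sketched for a single feature as follows. This is a minimal illustration assuming ordinary least squares and sex coded as 0/1; the helper names and toy numbers are hypothetical, and the original work does not state which fitting routine was used.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_glm(y, age, sex):
    """Least-squares estimate of (beta, alpha1, alpha2) in Y ~ beta + a1*age + a2*sex."""
    rows = [(1.0, a, s) for a, s in zip(age, sex)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)

def residualise(y, age, sex, coef):
    """Subtract the value predicted from age and sex from each observed feature."""
    b, a1, a2 = coef
    return [v - (b + a1 * a + a2 * s) for v, a, s in zip(y, age, sex)]

# Toy normal controls whose feature is exactly 2 + 0.1*age + 0.5*sex.
nc_age = [55.0, 60.0, 65.0, 70.0, 58.0, 72.0]
nc_sex = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]  # assumption: sex coded 0/1
nc_y = [2 + 0.1 * a + 0.5 * s for a, s in zip(nc_age, nc_sex)]
coef = fit_glm(nc_y, nc_age, nc_sex)

# Apply the control-derived model to one further subject (patient or control).
adjusted = residualise([9.9], [63.0], [1.0], coef)
```

Fitting on controls only means that disease-related variance is not absorbed into the age and sex coefficients; the same coefficients are later reused unchanged for the external validation set.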
Then, the invention uses mapminmax in Matlab 2018a to normalize all confound-corrected brain imaging omics features to [-1, 1].
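The [-1, 1] normalization is a linear rescaling in which the minimum of a feature maps to -1 and the maximum to 1. A minimal sketch of that mapping for one feature vector follows; the handling of a constant feature (all values mapped to 0) is an assumption of this sketch and is not claimed to match the Matlab routine.

```python
def minmax_scale(values, lo=-1.0, hi=1.0):
    """Linearly map `values` so that min(values) -> lo and max(values) -> hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # Assumption: a constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    scale = (hi - lo) / (vmax - vmin)
    return [(v - vmin) * scale + lo for v in values]

scaled = minmax_scale([2.0, 4.0, 6.0])
print(scaled)
```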
(5) Feature selection and model construction: the R package 'caret' is used in combination with a random forest classifier to select features, and a classification model is then built, specifically as follows:
(5.1) 1000 new random databases are created by permuting the label column of database-244 as preprocessed in step (4). The R package 'caret' is used in combination with a random forest classifier, with 10-fold cross-validation giving an unbiased estimate of the classification error, to rank the 1408 brain imaging omics features processed in step (4) by importance (C in FIG. 2). The 50 most important brain imaging omics features (A in FIG. 3) are selected as the features of the training data set input to the subsequent classification model.
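The label permutation in step (5.1) can be illustrated as below. The text does not specify how the permuted databases are combined with the caret importance scores, so this sketch covers only the permutation step; the function name, the seed and the 1/0 label coding are assumptions.

```python
import random

def permuted_label_sets(labels, n_permutations, seed=0):
    """Return `n_permutations` shuffled copies of the label column."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_permutations):
        shuffled = list(labels)  # copy so the original order is preserved
        rng.shuffle(shuffled)
        out.append(shuffled)
    return out

labels = [1] * 123 + [0] * 121  # 123 PD patients, 121 normal controls
null_sets = permuted_label_sets(labels, n_permutations=1000)
```

Each permuted copy keeps the class balance of database-244 while breaking any link between features and diagnosis.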
(5.2) The classification model also uses a random forest classifier. The invention first determines two hyper-parameters: the number of decision trees (default 500) and the number of nodes on the decision trees, the latter given by a formula in F (formula image GDA0003776686430000081, not reproduced here), where F is the number of features used in the random forest classifier.
(5.3) Next, as shown in D of FIG. 2, the invention divides the database-244 preprocessed in step (4) into a training data set (for training the classification model) and a testing data set (for internal validation) at a ratio of 9:1, and resamples the training data set 1000 times to train the classification model, so that a stable prediction result is obtained when the testing data set is input into the classification model (B in FIG. 3). The input of the classification model is the 50 features selected in step (5.1) for the corresponding data set.
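One possible reading of the 9:1 division with 1000 repetitions is a repeated random hold-out, sketched below. The text does not specify whether the resampling is a re-split or a bootstrap of the training set, so this interpretation is an assumption, as is the function name `repeated_splits`; model fitting is omitted.

```python
import random

def repeated_splits(n_samples, n_repeats, test_fraction=0.1, seed=0):
    """Yield (train_indices, test_indices) pairs for repeated random hold-out."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # First n_test shuffled indices form the test set, the rest the training set.
        yield order[n_test:], order[:n_test]

splits = list(repeated_splits(n_samples=244, n_repeats=1000))
```

Averaging the predictions or metrics over the repetitions is what makes the reported result less dependent on any single split.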
(6) Clinical validation
To further verify the generalization ability of the model, the invention performs external independent validation of the constructed diagnosis model using database-106, which consists of 48 PD patients and 58 normal controls recruited from May to August 2018. All processing steps were performed independently on this database, except that the previously estimated general linear model was used to eliminate the effects of age and gender.
In addition, the invention constructs a data-driven algorithm to handle redundant brain imaging omics features: steps (5.2) to (5.3) are repeated to construct a diagnosis model, with one fewer input feature in each cycle, so that F decreases by 1 per cycle from 50 to 2 (F = 50:-1:2). Taking the highest accuracy reached in external validation as the criterion (C in FIG. 3), the number of brain imaging omics features was set to the value given in formula image GDA0003776686430000082, i.e., the 36 most important features (the corrugated part of A in FIG. 3).
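The feature-count search described above can be sketched as a simple loop. The callback `evaluate`, which would train the model on the given features and return its external-validation accuracy, is hypothetical, and the toy accuracy curve in the example is invented purely to exercise the loop; it is not a result from the invention.

```python
def select_feature_count(ranked_features, evaluate, f_max=50, f_min=2):
    """Try the top-F features for F = f_max down to f_min and keep the best F.

    ranked_features: feature identifiers sorted by decreasing importance.
    evaluate: callable taking a feature list and returning an accuracy.
    """
    best_count, best_accuracy = None, float("-inf")
    for count in range(f_max, f_min - 1, -1):
        accuracy = evaluate(ranked_features[:count])
        if accuracy > best_accuracy:  # ties keep the larger feature set
            best_count, best_accuracy = count, accuracy
    return best_count, best_accuracy

# Toy stand-in for model training and validation (assumption, not real data).
ranked = list(range(50))
toy_evaluate = lambda feats: 1.0 - abs(len(feats) - 36) / 100
best_f, best_acc = select_feature_count(ranked, toy_evaluate)
```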
In conclusion, the invention establishes a model construction process that integrates training, internal testing and external validation, and records the average diagnostic accuracy, sensitivity and specificity over 1000 iterations. In addition, to determine the optimal set of brain imaging omics features, the invention performs data-driven feature screening and finally identifies 36 brain imaging omics features of important value.
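The three recorded metrics follow the standard definitions, which can be sketched as below: accuracy is the fraction of correct predictions, sensitivity the fraction of PD patients correctly identified, and specificity the fraction of normal controls correctly identified. The label strings "PD" and "NC" and the function name are illustrative assumptions.

```python
def diagnostic_metrics(true_labels, predicted_labels, positive="PD"):
    """Compute accuracy, sensitivity and specificity for a binary diagnosis."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }

metrics = diagnostic_metrics(
    ["PD", "PD", "PD", "NC", "NC"],
    ["PD", "PD", "NC", "NC", "PD"],
)
```

Averaging each entry of this dictionary over the iterations gives the mean values referred to above.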
In view of the heterogeneity of the clinical manifestations of PD patients, the invention tested the ability of the above 36 brain imaging omics features to diagnose PD patients in different clinical states within the training database (database-244) (A in FIG. 4). For each test, the invention calculated the mean decrease in Gini (Mean Decrease Gini) for the 36 features, which quantifies how much each imaging omics feature contributes to the classification (B in FIG. 4). PD patients with different clinical states were grouped as follows:
(A) According to disease stage, patients with Hoehn-Yahr stage 1 or 1.5 were defined as early-stage PD (EPD) (45 cases), and patients with Hoehn-Yahr stage greater than 1.5 were defined as middle- and late-stage PD (M-LPD) (78 cases); PD patients at different disease stages were diagnosed using the above set of 36 brain imaging omics features.
(B) According to the motor subtype ratio (ratio of tremor symptoms to dyskinesia/rigidity score), patients with a ratio > 1.0 were defined as tremor-dominant (PD-TD) patients (49 cases) and the rest as non-tremor-dominant (PD-nodd) patients (74 cases). The ability of the 36 features to recognize the two motor subtypes of PD was also verified.
(C) According to medication status, PD patients were divided into those who had not taken anti-Parkinson's disease medication (drug-naive) (41 cases) and those who were taking medication (drug management) (82 cases), and corresponding diagnosis models were also constructed.
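The importance measure used in these subgroup tests is built from Gini impurity decreases at decision-tree splits. The sketch below computes the decrease for a single split; in a random forest these decreases are accumulated for each feature over all splits that use it and averaged over the trees. Implementations differ in how they weight nodes, so this is an illustration of the principle rather than the exact quantity reported by any particular package, and the function names are hypothetical.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted child impurities."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted

parent = ["PD"] * 4 + ["NC"] * 4
perfect = gini_decrease(parent, ["PD"] * 4, ["NC"] * 4)
partial = gini_decrease(parent, ["PD"] * 3 + ["NC"], ["PD"] + ["NC"] * 3)
```

A feature whose splits separate PD patients from controls more cleanly yields larger decreases and therefore ranks higher.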
The invention can provide an objective basis for PD diagnosis and help improve diagnostic accuracy, and therefore has good value for clinical translation. The machine learning process comprises image preprocessing, data segmentation, feature extraction, feature screening, and model construction and validation. The distinguishing feature and innovation of the invention is that it obtains a set of 36 core brain imaging omics features dominated by the distribution of substantia nigra iron, thereby establishing the important role of abnormal substantia nigra iron metabolism in characterizing PD, and verifying the validity of combining structural features with features related to substantia nigra iron metabolism as imaging biomarkers of PD. Model testing consists of an internal test (database-244) followed by external validation in a previously unused database (database-106), in order to verify the robustness of the diagnostic model. The internal test examines the ability of a random forest model based on multi-modal brain imaging omics features to distinguish PD patients in different clinical states (such as disease stage, medication status and motor subtype) from the normal control group in the training database.

Claims (6)

1. A method for establishing a PD diagnosis model based on multi-modal magnetic resonance imaging omics is characterized by comprising the following steps:
(1) acquiring T1 weighted images and enhanced sensitivity weighted angiography images of PD patients and normal control groups;
(2) obtaining QSM and R2 maps from the enhanced sensitivity weighted angiography image obtained in step (1), and segmenting the QSM and R2 maps to obtain 10 subcortical nuclei; segmenting the subcortical nuclei of the T1 weighted image obtained in step (1) to obtain 8 subcortical nuclei, and segmenting the cortical gray matter of the T1 weighted image to obtain 62 cortical regions; wherein the subcortical nucleus segmentation of the T1 weighted image uses FIRST software; the cortical segmentation of the T1 weighted image uses ANTs software to perform field non-uniformity correction, brain tissue extraction, SyN registration, tissue segmentation and cortical thickness calculation, after which 62 cortical regions are obtained by segmentation based on a DKT mask; the DKT mask is fine-tuned according to the cortical thickness mask to remove the parts outside the brain;
specifically, a STAR-QSM algorithm is used to reconstruct the enhanced sensitivity weighted angiography image acquired in step (1) to obtain QSM and R2 maps, and a segmentation method based on a registration algorithm is used to segment 10 subcortical nuclei of the individual space QSM and R2 maps, specifically:
(2.1.1) co-registering all the individual space QSM images to a QSM template to obtain QSM images in the template space and obtain a transformation matrix; wherein, the registration algorithm is a SyN registration algorithm in ANTs software; the age of the QSM template is more than or equal to 55 years old;
(2.1.2) constructing subcortical nucleus masks of caudate nucleus, putamen, globus pallidus, red nucleus and substantia nigra in QSM template space;
(2.1.3) warping the subcortical nucleus masks in the template space obtained in step (2.1.2) to the individual space using the inverse of the transformation matrix obtained in step (2.1.1), so as to obtain the subcortical nucleus masks in the individual space; wherein the subcortical nucleus masks in the individual space are manually adjusted to correct the misalignment with surrounding tissue voxels caused by registration deviation;
(2.1.4) segmenting the R2 map based on the subcortical nuclei mask of the individual space obtained in step (2.1.3) to obtain 10 subcortical nuclei;
(3) extracting brain image omics characteristics aiming at each subcortical nucleus and cortical region segmented in the step (2), and eliminating the influence of age and gender based on a general linear model; the method specifically comprises the following steps:
(3.1) calculating 28 histogram features and 20 texture features within each subcortical nucleus for the QSM and R2 maps, respectively; the histogram features comprise the average signal intensity, standard deviation, kurtosis and skewness calculated over 7 signal value ranges in each subcortical nucleus, namely all signal values and the highest 5%, 10%, 20%, 30%, 40% and 50% of signal values; the texture features are the average values of the gray-level co-occurrence matrix features over 26 directions for each subcortical nucleus;
(3.2) extracting the brain imaging omics characteristics in the T1 weighted image, which specifically comprises the following steps: for the T1 weighted image subcortical nuclei, the nuclei normalized volume, 4 histogram features and 20 texture features were calculated; for cortical regions of the T1 weighted image, cortical volume, mass, thickness, and surface area were calculated;
(3.3) preprocessing the brain imaging omics characteristics extracted in the step (3.2), specifically comprising the following steps: according to the brain imaging omics characteristics and the corresponding ages and sexes of normal people, a general linear model is constructed:
$$Y \sim \beta + \alpha_1 X_1 + \alpha_2 X_2$$
wherein Y is a brain imaging omics feature, $X_1$ is age, $X_2$ is sex, and $\beta$, $\alpha_1$ and $\alpha_2$ are the parameters of the general linear model; applying the constructed general linear model to the PD patients and the normal control group in step (1) to obtain estimated brain imaging omics features; subtracting the estimated brain imaging omics features from the brain imaging omics features extracted in step (3.2) to obtain brain imaging omics features free of the influence of age and sex; and then normalizing these features to [-1, 1];
(4) selecting brain image omics characteristics with higher contribution;
(5) training a classification model using the brain imaging omics features of step (3) that were selected as having higher contribution in step (4), for diagnosing the PD patient type.
2. The method for building the PD diagnostic model based on multi-modality magnetic resonance imaging omics as set forth in claim 1, wherein in step (1), the T1 weighted image is acquired using a fast spoiled gradient echo sequence, and the enhanced sensitivity weighted angiography image is acquired using a gradient echo sequence.
3. The method for building a PD diagnostic model based on multi-modal magnetic resonance imaging omics as defined in claim 1, wherein in step (3.2), for the subcortical nuclei of the T1 weighted image, the normalized volume of the nuclei is obtained by multiplying the voxel resolution by the number of voxels in each nuclei and then dividing by the total intracranial volume; dividing the nuclear mass signal by the average value of the whole brain white matter intensity to standardize the nuclear mass signal intensity distribution and then acquiring histogram features and texture features; for the cortical regions of the T1 weighted image, the characteristics of cortical volume and surface area were normalized by dividing the cortical volume of a single cortical region by the total volume of intracranial brain tissue, and dividing the cortical surface area of a single cortical region by the total surface area of intracranial brain tissue.
4. The method for establishing the PD diagnostic model based on the multi-modality magnetic resonance imaging omics as set forth in claim 1, wherein the PD diagnostic model is used as follows: steps (2) to (3) are carried out in sequence on the T1 weighted image and the enhanced sensitivity weighted angiography image to be diagnosed, the influence of age and gender is eliminated based on the general linear model, and the extracted brain imaging omics features with higher contribution, as determined in step (4), are input into the classification model trained in step (5) to obtain the corresponding PD patient type.
5. The method for establishing a PD diagnostic model based on multi-modality magnetic resonance imaging omics as set forth in claim 1, wherein in step (4), the features with higher contribution are selected according to the current brain imaging omics features and the corresponding classification labels, specifically: creating a plurality of random databases by permuting the label column of the current brain imaging omics feature data set, using the R language package 'caret' in combination with a random forest classifier, obtaining an unbiased estimate of the classification error by cross-validation to obtain importance parameters of the current brain imaging omics features, and selecting the brain imaging omics features with higher importance.
6. The method for building a PD diagnostic model based on multi-modality magnetic resonance imaging omics as defined in claim 1, wherein in step (5), the classification model uses a random forest classifier with two hyper-parameters: the number of decision trees, and the number of nodes on the decision trees as given by a formula in F (formula image FDA0003787549550000031, not reproduced here), wherein F is the number of brain imaging omics features currently used.
CN202110468956.4A 2021-04-28 2021-04-28 Method for establishing PD (potential of Hydrogen) diagnosis model based on multi-modal magnetic resonance imaging omics Active CN113222915B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110468956.4A CN113222915B (en) 2021-04-28 2021-04-28 Method for establishing PD (potential of Hydrogen) diagnosis model based on multi-modal magnetic resonance imaging omics

Publications (2)

Publication Number Publication Date
CN113222915A CN113222915A (en) 2021-08-06
CN113222915B true CN113222915B (en) 2022-09-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant