CN115337000B

CN115337000B - Machine learning method for evaluating brain aging caused by diseases based on brain structure images

Info

Publication number: CN115337000B
Application number: CN202211276691.9A
Authority: CN
Inventors: 张瑜; 王凯凯; 孙超良; 张欢; 王志超; 钱浩天
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2022-10-19
Filing date: 2022-10-19
Publication date: 2022-12-20
Anticipated expiration: 2042-10-19
Also published as: CN115337000A

Abstract

The invention discloses a machine learning method for evaluating disease-induced brain aging based on brain structure images. The method extracts structural features of different brain regions, such as cortical thickness and volume, from structural magnetic resonance images of the brain. Because not all features help model prediction, the features are screened, and the retained features, which generalize across different training subsets and are more concise and effective, are used to build a brain age prediction model based on ridge regression. K-fold cross-validation is then used to find the features repeatedly identified across the k models, thereby locating the brain-region structural features most relevant to brain age prediction. Finally, the trained model is applied to patient data to evaluate the degree to which the disease affects brain aging.

Description

Machine learning method for evaluating brain aging caused by diseases based on brain structure images

Technical Field

The invention relates to the technical field of neuro-image data analysis, in particular to a machine learning method for evaluating brain aging caused by diseases based on brain structure images.

Background

Brain aging is a natural process, but individuals differ significantly in how brain volume, cortical thickness and similar measures change during it. During normal aging the brain follows a characteristic pattern of change, which means that a person's age can be predicted from that pattern. Brain age prediction has both scientific significance and broad clinical value. Studies have shown that various neurological and metabolic disorders, such as schizophrenia, diabetes and cardiac decline, are associated with brain aging. The brain age produced by a brain age prediction model can be used to evaluate the association between these diseases and brain aging. If a subject's predicted brain age is greater than the subject's actual chronological age, the subject may have an older brain, that is, a higher degree of brain aging that deviates from the normal aging trajectory and carries a higher risk of the associated disease. Conversely, if the predicted brain age is less than the actual age, the subject may have a younger brain. The difference between the predicted brain age and the actual age of a given individual is referred to as the "brain age difference". This value is believed to reflect diffuse, multivariate morphological changes throughout the brain and may be a marker of overall brain health. The degree to which an individual's brain aging trajectory deviates from the average trajectory of a healthy brain can reflect the subject's risk of developing neurodegenerative disease in the future. Building a model from the brain aging patterns contained in neuroimaging data and tracking the aging trajectory of an individual brain can therefore provide a new way to study how the brain changes with age and how brain diseases affect normal brain aging.

Structural magnetic resonance imaging (sMRI) can image brain structure non-invasively, and its image data show different structural features in healthy subjects and in patients. These features can be used to predict a subject's brain age and thereby estimate the biological age of the brain. Different brain atlases parcellate the whole brain on different bases, so they perform differently in brain age prediction.

In recent years, with the development of artificial intelligence, machine-learning-based brain age prediction for individuals from imaging data has become a key direction of clinical research; it can improve the diagnosis rate of diseases and provide useful guidance for planning treatment. In imaging studies of brain age prediction, various regression models such as support vector regression (SVR), ridge regression and random forest (RF) are conventionally used to predict brain age. Ridge regression is a biased-estimation regression method designed for collinear data. It is essentially an improved least-squares estimator: by giving up the unbiasedness of ordinary least squares, at the cost of losing some information and precision, it obtains regression coefficients that are more practical and reliable. These studies confirm that brain age prediction algorithms based on MR brain structure information are of great help for clinical diagnosis. However, the features related to brain age that are extracted from structural magnetic resonance images, such as cortical thickness, cortical area and curvature, form a high-dimensional feature set when combined, so feature selection is required. At present, imaging studies of brain age commonly use principal component analysis, partial least squares and similar methods for dimensionality reduction, but these methods are ambiguous and poorly interpretable, so the choice of feature selection algorithm is important for brain age prediction. In addition, current machine-learning-based brain age prediction models rarely locate the brain-region features related to brain age, even though locating the key brain regions that affect brain age matters for model interpretability and clinical diagnosis. Meanwhile, measuring the degree of brain aging by the difference between brain age and actual age contributes to the early detection and identification of diseases. Therefore, a new prediction method is needed to solve the above problems.
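
For reference, the ridge regression estimator described in this background is commonly written as the penalized least-squares problem below. The symbols are standard textbook notation rather than notation taken from this document: X is the subjects-by-features matrix, y the vector of ages, β the regression coefficients and α ≥ 0 the regularization strength (the "alpha parameter" tuned later in the method).

```latex
\hat{\beta}_{\mathrm{ridge}}
  = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_2^2
  = \left( X^{\top} X + \alpha I \right)^{-1} X^{\top} y
```

Setting α = 0 recovers ordinary least squares; larger α shrinks the coefficients, which is the trade of unbiasedness for stability mentioned above.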

Disclosure of Invention

The invention aims to provide a machine learning method for evaluating disease-induced brain aging based on brain structure images that addresses the shortcomings of the prior art. The method screens out features that generalize across different training subsets and are more concise and effective, and it locates the brain-region structural features that contribute most to brain age prediction. Finally, the trained model is applied to patient data to evaluate the degree to which the disease affects brain aging.

The invention is realized by the following technical scheme: a machine learning method for evaluating brain aging caused by diseases based on brain structure images comprises the following steps:

S1: preprocessing the brain structure magnetic resonance images and extracting features from them;

S2: screening the brain structure features extracted in step S1 using a Bootstrap resampling method;

S3: constructing a ridge regression brain age prediction model from the screening result of S2;

S4: locating the brain regions that contribute most to the predicted brain age using k-fold cross-validation;

S5: training and testing the constructed brain age prediction model;

S6: testing on an independent patient dataset to assess the degree to which the disease affects brain aging.

Preferably, the step S1 includes the following substeps:

S1.1: dividing the acquired clinical data into type 0 patients, type 1 patients and healthy subjects according to clinical manifestations, and removing non-brain structures from the brain structure magnetic resonance images;

S1.2: extracting the magnetic resonance image of the target tissue according to the anatomical structure of the brain;

S1.3: using an image segmentation algorithm to segment the magnetic resonance image of the target tissue into 3 tissue types: gray matter, white matter and cerebrospinal fluid;

S1.4: performing cortical reconstruction on the segmented tissue with the FreeSurfer software package, quantifying the functional, connectional and structural properties of the human brain, reconstructing the structural image in three dimensions to generate a flattened or unfolded image, and obtaining the anatomical parameters of cortical thickness, curvature, area and gray matter volume for different brain regions using different brain atlases.

Preferably, the step S2 comprises the following substeps:

S2.1: for the healthy-group dataset, setting the proportion of the dataset that each sampling subset takes;

S2.2: setting the number of samplings and the sampling mode, that is, how many rounds of sampling with replacement are executed;

S2.3: sampling according to the proportion set in S2.1 and the number and mode set in S2.2; for each sampled subset, calculating the Pearson correlation between features such as cortical thickness and cortical surface area and the brain age, retaining the structural features with a significant correlation as candidate features, and counting how often each feature appears across the samplings;

S2.4: setting a frequency threshold for features extracted from the sampling subsets, and taking the features that meet this threshold as the final features of the model.
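
The screening in substeps S2.1 to S2.4 can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the inventors' implementation: the function name `bootstrap_screen`, its parameter names and all numeric defaults are hypothetical, and `replace` is left as a parameter because the text describes the sampling mode differently in different places.

```python
import numpy as np
from scipy.stats import pearsonr


def bootstrap_screen(X, age, ratio=0.8, n_rounds=50, min_count=45,
                     p_threshold=0.05, replace=True, seed=0):
    """Count, per feature, in how many resampled subsets the feature is
    significantly correlated with age; keep features reaching min_count."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    subset_size = int(round(ratio * n_samples))       # S2.1: subset proportion
    counts = np.zeros(n_features, dtype=int)
    for _ in range(n_rounds):                         # S2.2: number of samplings
        idx = rng.choice(n_samples, size=subset_size, replace=replace)
        for j in range(n_features):                   # S2.3: Pearson test per feature
            _, p_value = pearsonr(X[idx, j], age[idx])
            if p_value < p_threshold:
                counts[j] += 1
    selected = np.flatnonzero(counts >= min_count)    # S2.4: frequency threshold
    return selected, counts


# Synthetic demonstration: feature 0 tracks age, the others are pure noise.
demo_rng = np.random.default_rng(42)
age = demo_rng.uniform(20, 80, size=100)
X = demo_rng.normal(size=(100, 5))
X[:, 0] = -0.05 * age + demo_rng.normal(scale=0.5, size=100)

selected, counts = bootstrap_screen(X, age)
print("selected feature indices:", selected.tolist())
print("selection counts per feature:", counts.tolist())
```

Requiring a feature to be significant in most resampled subsets, rather than once on the full dataset, is what makes the retained set less sensitive to any single split of the data.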

Preferably, the step S3 comprises the following substeps:

S3.1: randomly dividing the healthy-group dataset into a training set and a test set in a given proportion;

S3.2: standardizing the feature data of the divided training and test sets, which places data from different sources in a common reference frame for easier comparison and speeds up convergence when the program runs, since most models converge faster after normalization;

S3.3: defining the value range of the alpha parameter of the ridge regression model;

S3.4: defining R² as the cross-validation evaluation index, and using k-fold cross-validation to search the alpha value range of the ridge regression model for the optimal parameter, that is, the parameter giving the highest model accuracy;

S3.5: taking the optimal parameter as the model parameter under the brain template.
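
Substeps S3.1 to S3.5 map naturally onto a grid search over the ridge penalty. The sketch below uses scikit-learn on synthetic data and rests on several assumptions: the 80/20 split, the fold count and the data are hypothetical, and the alpha grid only approximates the range given in the text, whose intermediate values are not specified. The scaler is fitted on the training set only and then reused for the test set, which is a design choice of this sketch to avoid leaking test information.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the screened structural features of healthy subjects.
rng = np.random.default_rng(0)
n_subjects, n_features = 120, 10
X = rng.normal(size=(n_subjects, n_features))
age = 50 + 8 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=3, size=n_subjects)

# S3.1: random train/test split (the 80/20 proportion is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, age, test_size=0.2, random_state=0)

# S3.2: standardize to zero mean and unit standard deviation.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# S3.3: candidate alpha values (hypothetical grid between 0.01 and 60).
alphas = [0.01, 0.1, 1, 3, 10, 30, 60]

# S3.4: k-fold cross-validation scored by R^2 (k = 5 is an assumption).
search = GridSearchCV(Ridge(), {"alpha": alphas}, scoring="r2", cv=5)
search.fit(X_train_std, y_train)

# S3.5: keep the best alpha; best_estimator_ is refitted on the full training set.
best_alpha = search.best_params_["alpha"]
model = search.best_estimator_
test_r2 = model.score(X_test_std, y_test)
print("best alpha:", best_alpha)
print("held-out R^2:", round(test_r2, 3))
```

Running the same search once per brain atlas would give one tuned model per atlas, which is what the later model selection step compares.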

Preferably, the step S4 includes the following substeps:

S4.1: performing k-fold cross-validation on the training set, with R² as the cross-validation evaluation index;

S4.2: obtaining the feature weights of the k models from the coef_ attribute of the ridge regression models, sorting the weights of each of the k models from largest to smallest, and taking the structural features corresponding to the first h feature weights of each model;

S4.3: identifying the features that recur across these k models, thereby locating the brain-region structural features that contribute most to the predicted brain age.
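
One possible reading of substeps S4.1 to S4.3 is sketched below with scikit-learn, whose `Ridge` estimator exposes the fitted weights as `coef_`. Two interpretive assumptions are made and should be treated as such: weights are ranked by absolute value, and a feature "recurs" only if it is in the top h of all k models. The function name and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold


def recurring_top_features(X, y, alpha=1.0, k=5, h=2, seed=0):
    """Fit one ridge model per fold and return the feature indices that rank
    among the h largest absolute weights in every one of the k models."""
    top_sets = []
    folds = KFold(n_splits=k, shuffle=True, random_state=seed)
    for train_idx, _ in folds.split(X):
        fold_model = Ridge(alpha=alpha).fit(X[train_idx], y[train_idx])
        order = np.argsort(-np.abs(fold_model.coef_))   # largest weight first
        top_sets.append(set(order[:h].tolist()))        # top-h features of this fold
    return sorted(set.intersection(*top_sets))          # features present in all folds


# Synthetic demonstration: only features 2 and 5 carry age information.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = 50 + 10 * X[:, 2] - 7 * X[:, 5] + rng.normal(scale=1.0, size=100)

recurring = recurring_top_features(X, y)
print("features in the top-h of every fold:", recurring)
```

In practice each index would be mapped back to its brain region and measure (for example a cortical thickness value in one atlas parcel) to report which regions drive the prediction.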

Preferably, the step S5 includes the following substeps:

S5.1: selecting the model with the best test score in the k-fold cross-validation under the n brain templates, and retraining it on the whole training set;

S5.2: testing the model on the test set to obtain the brain age predicted by the model;

S5.3: calculating the MAE, R², Pearson correlation coefficient and average error between the real age and the predicted brain age, using them as the evaluation indexes of the model, and finally selecting the brain age prediction model built on one brain atlas as the optimal model.
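
The four evaluation indexes named in S5.3 can be computed as in the following dependency-free sketch. The ages are made-up numbers for illustration, the function name `evaluate` is hypothetical, and "average error" is taken here to mean the mean signed difference (predicted minus real age), which is an assumption since the text does not define it.

```python
import math


def evaluate(true_age, predicted_age):
    """Return MAE, R^2, Pearson r and mean signed error for paired ages."""
    n = len(true_age)
    errors = [p - t for p, t in zip(predicted_age, true_age)]
    mean_true = sum(true_age) / n
    mean_pred = sum(predicted_age) / n
    ss_res = sum(e * e for e in errors)                       # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in true_age)      # total sum of squares
    cov = sum((t - mean_true) * (p - mean_pred)
              for t, p in zip(true_age, predicted_age))
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted_age)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
        "pearson_r": cov / math.sqrt(ss_tot * ss_pred),
        "mean_error": sum(errors) / n,
    }


# Illustrative values only; not data from the document.
true_age = [30, 40, 50, 60, 70]
predicted_age = [34, 41, 49, 58, 68]
metrics = evaluate(true_age, predicted_age)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Computing the same dictionary for the model built on each atlas gives a like-for-like table from which the best-performing atlas can be chosen.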

Preferably, the step S6 includes the following substeps:

S6.1: normalizing the independent patient-group data, loading the trained brain age prediction model, and testing the patient test set with the trained optimal model to obtain the brain age predicted by the model;

S6.2: calculating the average error between the real age and the predicted brain age and comparing it with the average error obtained on the healthy test set, wherein an average error on the patient test set that is higher than that of the healthy group indicates that the disease may cause brain aging in the patients;

S6.3: generating fit lines between the real values and the predicted values for the healthy-group dataset and for the patient-group dataset, and comparing the slopes of the two fit lines to show that the patients' brains deviate from the aging trajectory of the healthy brain.
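
The group comparison in S6.2 and S6.3 reduces to two numbers per group: the average error and the slope of the fit line of predicted brain age against real age. The sketch below uses made-up ages and hypothetical helper names; the fit line is assumed to be an ordinary least-squares line, and the average error is again assumed to be the mean signed difference.

```python
def mean_error(true_age, predicted_age):
    """Mean signed difference between predicted brain age and real age."""
    return sum(p - t for p, t in zip(predicted_age, true_age)) / len(true_age)


def fit_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Illustrative values only; not data from the document.
healthy_true = [30, 40, 50, 60, 70]
healthy_pred = [31, 39, 51, 59, 70]
patient_true = [30, 40, 50, 60, 70]
patient_pred = [32, 44, 56, 68, 80]

healthy_gap = mean_error(healthy_true, healthy_pred)
patient_gap = mean_error(patient_true, patient_pred)
healthy_slope = fit_slope(healthy_true, healthy_pred)
patient_slope = fit_slope(patient_true, patient_pred)

print("mean error  healthy:", healthy_gap, " patients:", patient_gap)
print("fit slope   healthy:", round(healthy_slope, 3),
      " patients:", round(patient_slope, 3))
```

In this toy example the patient group shows both a larger average error and a steeper fit line, which is the pattern the method reads as deviation from the healthy aging trajectory.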

Preferably, the brain atlases used in step S1.4 include AAL, DKT, Destrieux and Brainnetome.

The invention provides a machine learning method for evaluating disease-induced brain aging based on brain structure images. The method extracts structural features of different brain regions, such as cortical thickness and volume, from structural magnetic resonance images of the brain. Because not all features help model prediction, the features are screened, and the retained features, which generalize across different training subsets and are more concise and effective, are used to build a brain age prediction model based on ridge regression. K-fold cross-validation is then used to find the features repeatedly identified across the k models, thereby locating the brain-region structural features most relevant to brain age prediction. Finally, the trained model is applied to patient data to evaluate the degree to which the disease affects brain aging. The method can thus both locate the brain-region structural features that contribute most to brain age prediction and assess the degree to which a disease affects brain aging.

Drawings

FIG. 1 is a flow chart of a machine learning method for assessing disease-induced brain aging based on brain structure images;

FIG. 2 is a schematic diagram of Bootstrap feature screening;

FIG. 3 shows the test-set results of the brain age prediction model.

Detailed Description

To make the technical solutions of the present application easier to understand, the invention is further described below with reference to the accompanying drawings. The embodiments described are only some of the embodiments of the present application, not all of them. Other embodiments that those skilled in the art can derive from the specific embodiments described herein without inventive effort fall within the scope of the present inventive concept.

The invention relates to a machine learning method for evaluating brain aging caused by diseases based on brain structure images, which comprises the following steps:

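S1: preprocessing the brain structure magnetic resonance images and extracting features from them;

S2: screening the brain structure features extracted in step S1 using a Bootstrap resampling method;

S3: constructing a ridge regression brain age prediction model from the screening result of S2;

S4: locating the brain regions that contribute most to the predicted brain age using k-fold cross-validation;

S5: training and testing the constructed brain age prediction model;

S6: testing on an independent patient-group dataset to assess the degree to which the disease affects brain aging.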

In step S1, the preprocessing and feature extraction of the brain structure magnetic resonance images proceed as follows. First, the original structural image of the magnetic resonance data contains some non-brain structures, such as the skull. Because the skull signal is not used in the subsequent analysis and the signal-to-noise ratio at the image edges is poor, non-brain structures such as the skull must be removed during image preprocessing. Next, magnetic resonance image processing is sometimes concerned only with the state of certain specific regions, which requires extracting the tissue of the target region according to the anatomical structure of the brain. During preprocessing the brain image is segmented into 3 tissue types (gray matter, white matter and cerebrospinal fluid) because these tissues play different roles in the brain, so an image segmentation algorithm is required at this step. Finally, cortical reconstruction is performed with the FreeSurfer software package, the functional, connectional and structural properties of the human brain are quantified, and the structural image is reconstructed in three dimensions to generate a flattened or unfolded image, from which anatomical parameters such as cortical thickness, curvature, area and gray matter volume are obtained. Four commonly used brain atlases are used to obtain the structural features of the brain: AAL, DKT, Destrieux and Brainnetome. The step specifically comprises the following substeps:

S1.1: dividing the acquired clinical data into type 0 patients, type 1 patients and healthy subjects according to clinical manifestations, and removing non-brain structures from the brain structure magnetic resonance images;

S1.2: extracting the magnetic resonance image of the target tissue according to the anatomical structure of the brain;

S1.3: using an image segmentation algorithm to segment the magnetic resonance image of the target tissue into 3 tissue types: gray matter, white matter and cerebrospinal fluid;

S1.4: performing cortical reconstruction on the segmented tissue with the FreeSurfer software package, quantifying the functional, connectional and structural properties of the human brain, reconstructing the structural image in three dimensions to generate a flattened or unfolded image, and obtaining the anatomical parameters of cortical thickness, curvature, area and gray matter volume for different brain regions using different brain atlases.

In step S2, feature screening with the Bootstrap resampling method proceeds as follows. First, for the healthy-group dataset, the proportion of the dataset taken by each sampling subset is set to r%, that is, r% of the samples are used to build each sampling set. Second, the number of samplings and the sampling mode are set, that is, s rounds of sampling without replacement are executed. Then the Pearson correlation between each feature and the brain age is calculated, the structural features with a significant correlation (p < 0.05) are retained as candidate features, and the number of times each feature appears in the s samplings is counted. Finally, the frequency threshold for features extracted from the sampling subsets is set to t, that is, the features selected t times in the s samplings are used as the final features of the model. The final feature set has size m × n, where m is the number of samples and n is the dimension of the structural features of each sample. The features screened out in this way generalize across different training subsets and are more concise and effective. The step specifically comprises the following substeps:

S2.3: sampling according to the proportion set in S2.1 and the number and mode set in S2.2; for each sampled subset, calculating the Pearson correlation between features such as cortical thickness and cortical surface area and the brain age, retaining the structural features with a significant correlation as candidate features, and counting how often each feature appears across the samplings;

In step S3, the ridge regression brain age prediction model is constructed as follows. First, the healthy-group dataset is randomly divided into a training set and a test set in the ratio a:b. Second, the feature data of the divided training and test sets are standardized so that the standardized data have a mean of 0 and a standard deviation of 1. Then the value range of the alpha parameter of the ridge regression model is defined as (0.01, 0.1, 1, 3, ..., 60), R² is defined as the cross-validation evaluation index, and k-fold cross-validation is used to search this range for the optimal parameter. Finally, the optimal parameters are taken as the model parameters under the 4 brain templates. The step specifically comprises the following substeps:

S3.3: defining the value range of the alpha parameter of the ridge regression model;

S3.4: defining R² as the cross-validation evaluation index, and using k-fold cross-validation to search the alpha value range of the ridge regression model for the optimal parameter, that is, the parameter giving the highest model accuracy;

In step S4, the brain regions that contribute most to brain age prediction are located with k-fold cross-validation as follows. First, k-fold cross-validation is performed on the training set, with R² as the cross-validation evaluation index. Then the feature weights of the k models are obtained from the coef_ attribute of the ridge regression models, the weights of each of the k models are sorted from largest to smallest, and the structural features corresponding to the first h feature weights of each model are taken. Finally, the features that recur across the k models are identified, thereby locating the brain-region structural features most relevant to brain age prediction. The step specifically comprises the following substeps:

S4.2: obtaining the feature weights of the k models from the coef_ attribute of the ridge regression models, sorting the weights of each of the k models from largest to smallest, and taking the structural features corresponding to the first h feature weights of each model;

In step S5, the constructed brain age prediction model is trained and tested as follows. First, the model with the best test score in the k-fold cross-validation under the n brain templates is selected and retrained on the whole training set. Second, the model is tested on the test set to obtain the brain age predicted by the model. Finally, the MAE, R², Pearson correlation coefficient and average error between the real age and the predicted brain age are calculated and used as the evaluation indexes of the model, and the brain age prediction model built on one brain atlas is selected as the optimal model. The step specifically comprises the following substeps:

S5.1: selecting the model with the best test score in the k-fold cross-validation under the n brain templates, and retraining it on the whole training set;

In step S6, testing on the independent patient-group dataset to assess the degree to which the disease affects brain aging proceeds as follows. First, the patient-group dataset has size p × n, where p is the number of patient samples and n is the structural feature dimension of each sample; the patient data are normalized, the trained brain age prediction model is loaded, and the patient test set is tested with the trained optimal model to obtain the brain age predicted by the model. Then the average error between the real age and the predicted brain age is calculated and compared with the average error obtained on the healthy test set; an average error on the patient test set that is higher than that of the healthy group indicates that the disease may cause brain aging in the patients. Finally, to further verify that the brains of the patient group show aging, fit lines between the real values and the predicted values are generated for the healthy group and for the patient group, and the slopes of the two fit lines are compared to show that the patients' brains deviate from the aging trajectory of the healthy brain. The step specifically comprises the following substeps:

S6.1: normalizing the independent patient-group data, loading the trained brain age prediction model, and testing the patient test set with the trained optimal model to obtain the brain age predicted by the model;

S6.3: generating fit lines between the real values and the predicted values for the healthy-group dataset and for the patient-group dataset, and comparing the slopes of the two fit lines to show that the patients' brains deviate from the aging trajectory of the healthy brain.

In summary, the present invention provides a machine learning method for evaluating disease-induced brain aging based on brain structure images. The method can locate the brain-region structural features that contribute most to brain age prediction and, by applying the trained model to patient data, can evaluate the degree to which a disease affects brain aging. As shown in FIG. 1, the overall flow is as follows. First, the structural magnetic resonance data are preprocessed and features are extracted: the skull and other non-brain tissue are removed from the brain image, the image is segmented into gray matter, white matter and cerebrospinal fluid, and cortical reconstruction is performed on the structural image to generate a flattened image. After preprocessing, structural features such as cortical thickness, area, curvature and volume are computed for the different brain regions and combined into brain-region feature sets. Second, the combined features of the different brain regions are screened with the Bootstrap resampling method, that is, the features selected t times in s samplings are used as the final features of the model; these features generalize across different training subsets and are more concise and effective. Then the ridge regression brain age prediction model is constructed, and the optimal ridge regression alpha parameter is obtained by k-fold cross-validation. Next, k-fold cross-validation is used to identify the features that recur across the k models, locating the brain-region structural features most relevant to brain age prediction. The model is then trained and tested on the healthy-group dataset. Finally, the independent patient-group dataset is tested with the trained brain age prediction model to show that the patients' brains exhibit signs of aging.

Example 1

This example uses clinical data collected at hospitals. The subjects are divided into type 0 patients, type 1 patients and healthy subjects according to clinical manifestations. After data collation and quality control, the final cohort comprises 138 type 0 patients, 94 type 1 patients and 109 healthy subjects.

The specific implementation process of the method comprises the following steps:

(1) Preprocessing the brain structure magnetic resonance images and extracting features: first, the original structural image of the magnetic resonance data contains some non-brain structures, such as the skull. Because the skull signal is not used in the subsequent analysis and the signal-to-noise ratio at the image edges is poor, non-brain structures such as the skull must be removed during image preprocessing. Next, magnetic resonance image processing is sometimes concerned only with the state of certain specific regions, which requires extracting the tissue of the target region according to the anatomical structure of the brain. During preprocessing the brain image is segmented into 3 tissue types (gray matter, white matter and cerebrospinal fluid) because these tissues play different roles in the brain, so an image segmentation algorithm is required at this step. Finally, cortical reconstruction is performed with the FreeSurfer software package, the functional, connectional and structural properties of the human brain are quantified, and the structural image is reconstructed in three dimensions to generate a flattened or unfolded image, from which anatomical parameters such as cortical thickness, curvature, area and gray matter volume are obtained. Four commonly used brain atlases are used to obtain the structural features of the brain: AAL, DKT, Destrieux and Brainnetome.

(2) As shown in FIG. 2, feature screening is performed with the Bootstrap resampling method: first, for the healthy-group dataset, the proportion of the dataset taken by each sampling subset is set to 80%, that is, 80% of the samples are used to build each sampling set; second, the number of samplings and the sampling mode are set, that is, s rounds of sampling without replacement are executed; then the Pearson correlation between each feature and the brain age is calculated, the structural features with a significant correlation (p < 0.05) are retained as candidate features, and the number of times each feature appears in the s samplings is counted; finally, the frequency threshold for features extracted from the sampling subsets is set to t, that is, the features selected t times in the s samplings are used as the final features of the model. The final feature set has size m × n, where m is the number of samples and n is the dimension of the structural features of each sample. The features screened out in this way generalize across different training subsets and are more concise and effective.

(3) Constructing the ridge regression brain age prediction model: first, the healthy-group dataset is randomly divided into a training set and a test set in the ratio a:b; second, the feature data of the divided training and test sets are standardized so that the standardized data have a mean of 0 and a standard deviation of 1; then the value range of the alpha parameter of the ridge regression model is defined as (0.01, 0.1, 1, 3, ..., 60), R² is defined as the cross-validation evaluation index, and k-fold cross-validation is used to search this range for the optimal parameter; finally, the optimal parameters are taken as the model parameters under the 4 brain templates. The results of the ridge regression brain age prediction are shown in FIG. 3.

(4) Locating the brain regions that contribute most to the predicted brain age with k-fold cross-validation: first, k-fold cross-validation is performed on the training set, that is, the training set is divided into k parts, and in turn k-1 parts are used as training data and 1 part as test data, with R² as the cross-validation evaluation index; then the feature weights of the k models are obtained from the coef_ attribute of the ridge regression models, the weights of each of the k models are sorted from largest to smallest, and the structural features of the different brain regions corresponding to the first h feature weights of each model are taken; finally, the features that recur across the k models are identified, locating the brain-region structural features that contribute strongly to brain age prediction, where a larger feature weight means a larger contribution of that brain-region structural feature to the prediction.

(5) Training and testing the constructed brain age prediction model: first, the model with the best test score in the k-fold cross-validation under the 4 brain templates is selected and retrained on the whole training set; second, the model is tested on the test set to obtain the brain age predicted by the model; finally, the MAE, R², Pearson correlation coefficient and average error between the real age and the predicted brain age are calculated and used as the evaluation indexes of the model, and the brain age prediction model built on one brain atlas is selected as the optimal model.

(6) Testing with an independent patient data set to assess the extent to which the disease affects brain aging: first, the patient group data set has size p × n, where p is the number of patient samples and n is the structural feature dimension of each sample; the patient data are normalized, the trained brain age prediction model is loaded, and the trained optimal model is applied to the patient test set to obtain the predicted brain ages; then, the mean error between the real age and the predicted brain age is calculated and compared with the mean error obtained on the healthy test set; if the mean error of the patient test set is higher than that of the healthy group, this indicates that the disease may cause brain aging in the patients; finally, to further verify that the brains of the patient group show aging, fitted lines between the actual and predicted values are generated for the healthy group and the patient group, and the slopes of the two fitted lines are compared to show that the patients' brains deviate from the aging trajectory of healthy brains.
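The two comparisons in step (6), mean error and fitted-line slope, can be sketched as follows. All ages are made-up values chosen only to illustrate the calculation and do not represent results of the invention; the function names are hypothetical, the mean error is taken as predicted minus real age (an assumption), and the fitted line is an ordinary least-squares line of predicted brain age against real age.

```python
def mean_error(real, pred):
    """Average of (predicted brain age - real age); sign convention assumed."""
    return sum(p - r for r, p in zip(real, pred)) / len(real)

def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up ages: healthy test set and independent patient set.
healthy_real = [30, 40, 50, 60, 70]
healthy_pred = [31, 40, 51, 60, 71]
patient_real = [30, 40, 50, 60, 70]
patient_pred = [33, 45, 57, 69, 81]

healthy_gap = mean_error(healthy_real, healthy_pred)
patient_gap = mean_error(patient_real, patient_pred)
healthy_slope, _ = fit_line(healthy_real, healthy_pred)
patient_slope, _ = fit_line(patient_real, patient_pred)

# A larger gap and a steeper slope in patients suggest deviation from the healthy trajectory.
print(f"mean error: healthy {healthy_gap:.2f}, patient {patient_gap:.2f}")
print(f"slope:      healthy {healthy_slope:.2f}, patient {patient_slope:.2f}")
```

A real analysis would normally add a statistical test on the group difference, which the text does not describe and which is therefore left out here.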

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made without departing from the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A machine learning method for evaluating brain aging caused by diseases based on brain structure images, characterized by comprising the following steps:

S1: preprocessing the brain structure magnetic resonance image and extracting features from it;

S2: screening the brain structural features extracted in step S1 using the bootstrap method;

S5: training and testing the constructed brain age prediction model;

S6: testing using an independent patient group dataset to assess the extent to which disease affects brain aging;

said step S2 comprises the following sub-steps:

S2.2: setting the number of sampling rounds and the sampling mode;

S2.3: sampling according to the proportion set in S2.1 and the number of rounds and mode set in S2.2, calculating the Pearson correlation between the cortical thickness and cortical surface area features of each sampled subset and the brain age, retaining the structural features with significant correlation as candidate features, and counting how often each feature appears across the samplings;

S2.4: setting a frequency threshold for the features extracted from the sampled subsets, and taking the features screened according to this threshold as the final features of the model;

the step S3 comprises the following substeps:

S3.2: standardizing the feature data of the divided training set and test set;

S3.5: taking the optimal parameters as model parameters under a brain template;

the step S4 comprises the following substeps:

S4.3: identifying the features that appear repeatedly in the k models, and locating the brain-region structural features that contribute most to the prediction of the brain age;

said step S5 comprises the following sub-steps:

2. The machine learning method for evaluating brain aging caused by diseases based on brain structure images as claimed in claim 1, wherein the step S1 comprises the following sub-steps:

3. The machine learning method for evaluating brain aging caused by diseases based on brain structure images as claimed in claim 1, wherein the step S6 comprises the following sub-steps:

4. The machine learning method for evaluating brain aging caused by diseases based on brain structure images as claimed in claim 2, wherein the brain atlases adopted in step S1.4 include AAL, DKT, Destrieux and Brainnetome.