CN115984142B

CN115984142B - Multi-center data correction method for MRI (magnetic resonance imaging) image

Info

Publication number: CN115984142B
Application number: CN202310083221.9A
Authority: CN
Inventors: 张锡哲; 王菲; 董帅
Original assignee: Nanjing Medical University
Current assignee: Nanjing Medical University
Priority date: 2023-02-08
Filing date: 2023-02-08
Publication date: 2023-09-22
Anticipated expiration: 2043-02-08
Also published as: CN115984142A

Abstract

The invention discloses a multi-center data correction method for MRI images. First, a batch of mobile subjects with the same health state is additionally recruited and scanned at each of a plurality of centers, and the independent scan data of each center are integrated with the mobile subject data. Second, the multi-center data are preprocessed and registered to standard Montreal space to align the brain tissue characteristics of the subjects. Third, the scanning parameters are corrected, which eliminates part of the data differences and unifies the data specification. Finally, the data are corrected by a multi-center data correction algorithm based on mobile samples, which better eliminates the influence on the data of center effects such as different scanning equipment and scanning parameters. The invention starts from the registered brain image data, avoids data differences caused by different preprocessing flows, corrects the data at the source, and requires only one correction, which then applies to all feature data derived in subsequent processing.

Description

Multi-center data correction method for MRI (magnetic resonance imaging) image

Technical Field

The invention relates to the technical field of computer application, in particular to a multi-center data correction method for MRI images.

Background

fMRI (functional magnetic resonance imaging) technology has developed considerably since its emergence in the early 1990s and has become an indispensable tool for clinical and academic research in the mental sciences. By measuring the blood oxygenation level dependent (BOLD) signal of the human brain, fMRI brain images can quantify regional and temporal changes in brain metabolism, thereby reflecting the health status of the brain. Resting-state fMRI (rs-fMRI) reflects spontaneous activity of the human brain in a quiet state and has been widely used to study neuropsychiatric diseases such as schizophrenia, autism, Alzheimer's disease and major depressive disorder.

Although more and more biomedical research is based on fMRI data, the amount of subject data at most centers tends to be small because acquiring fMRI brain image data is costly. For example, Rest-meta-MDD, the largest international multi-center data set for major depressive disorder, contains data of 2438 subjects from 25 centers, of which only 7 centers have 100 or more subjects, and the smallest center has only 24. The study by Turner et al. showed that a small sample size reduces the reproducibility of task-based fMRI studies, and the authors advocated the use of larger samples. The study by Button et al. likewise revealed the problems of biomedical research based on small samples: the authors pointed out that, in the biomedical field, studies on small samples have high false positive rates, low reproducibility and low statistical power.

Studies have shown that brain image data from different centers differ, which severely hampers cross-center integrated analysis of brain image data and greatly reduces the reproducibility and statistical power of the results. The differences between multi-center brain image data can be attributed mainly to the following two points:

1) Center effects (site effects), that is, differences in the functional magnetic resonance image data of each center caused by differences in the type of magnetic resonance equipment, the scanning parameters and the like. The paper published by Fortin et al. in 2018 shows in detail the scanner effects of 11 magnetic resonance scanners in two multi-center studies and states that the magnetic field strength of the scanner, its manufacturer, the position of the subject during scanning and the like all affect the measurements of brain images.

2) Differences in the distribution of the patient population are another important cause of differences in the functional magnetic resonance data of the centers. These differences arise from the different patient enrollment criteria of the centers, regional differences and the like. For example, in the major depressive disorder data collected at Center2 herein, the patients are mainly adolescents under 18 years of age, and previous studies have found that age is closely related to changes in fMRI data. In addition, the brain image data of different kinds of mental disease show different patterns. As demonstrated by Yang et al. on fALFF (fractional amplitude of low-frequency fluctuation) data, the group-level changes in patients with schizophrenia occur predominantly in the ventrolateral prefrontal cortex, striatum and thalamus, whereas the group-level changes in patients with major depressive disorder occur predominantly in the left motor cortex and parietal lobe.

Disclosure of Invention

In order to overcome the defects in the prior art, the invention provides a multi-center data correction method for MRI images. The method starts from the registered brain image data, avoids data differences caused by different preprocessing flows, corrects the data at the source, and requires only one correction, which then applies to all feature data derived in subsequent processing. By means of the mobile subject data, the multi-center data correction algorithm based on mobile samples reduces the differences between multi-center resting-state magnetic resonance data as a whole.

The technical scheme adopted by the invention is as follows: a multi-center data correction method for MRI images comprises the following steps:

step 1, data collection:

collecting the subject data independently scanned by each center, additionally recruiting mobile subjects with the same health state, and scanning the mobile subjects at each center;

step 2, data preprocessing:

after the data are collected, first carrying out unified preprocessing on the data of each center;

step 3, scanning parameter correction:

after the registered brain image data are obtained, correcting the scanning parameters to obtain multi-center brain image data with a uniform specification;

step 4, data correction by a multi-center data correction algorithm based on mobile samples:

the algorithm is divided into two steps: first, establishing a regression equation for the features and estimating the mean, variance, biological variable coefficients and the like of the features by using all subject data; second, estimating the center effect factors in the regression equation by using the mobile subject data alone.

Further, in step 1, in addition to collecting the subject data of each center, a batch of mobile subjects with the same health state is recruited and scanned at each center in order to obtain a better correction effect.

Further, in step 2, the collected raw data suffer from problems such as scan noise, misordered scan layers, head movement of the subjects and inconsistent brain sizes. Therefore, unified preprocessing of the data of each center is required before data correction. The preprocessing mainly comprises the following 4 steps:

(1) Removal of the first 10 time points: when a subject has just entered the scanner, the subject needs to adapt to the scanning environment and the gradient magnetic field needs to stabilize, so by convention the first 10 time points, whose images contain more noise, are removed and the later, more stable time points are retained for subsequent analysis;

(2) Adjustment of the scan layer order: magnetic resonance scanning generally uses interleaved (alternate-layer) scanning to avoid crosstalk caused by partial overlap of the excitation spectra of adjacent layers. The order of the scan layers therefore needs to be adjusted so that the layer indices are arranged in ascending order;

(3) Head movement correction: movement of the subject's head during scanning is unavoidable. If it is left untreated, the images are offset and the data quality suffers. Head movement correction is therefore required, in which an algorithm corrects the head back to its original orientation;

(4) Registration to Montreal space: the brains of different subjects differ in size and shape, so the same brain tissue is located at different positions in different subjects. To solve this problem, all subject brain images are registered to standard Montreal space so that the same brain structures lie at the same spatial location.
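Preprocessing steps (1) and (2) above can be sketched in a few lines. The sketch below is only illustrative and uses the Python standard library; the function names `drop_initial_volumes` and `sort_slices_by_index` and the toy acquisition order are assumptions made for this example, and step (2) is shown exactly as the text describes it (rearranging layers into ascending index order), not as a full slice-timing interpolation.

```python
from typing import List, Sequence


def drop_initial_volumes(volumes: Sequence, n_drop: int = 10) -> List:
    """Step (1): discard the first n_drop time points of a scan."""
    if n_drop < 0:
        raise ValueError("n_drop must be non-negative")
    return list(volumes[n_drop:])


def sort_slices_by_index(slices: Sequence, acquisition_order: Sequence[int]) -> List:
    """Step (2): put interleaved layers back into ascending layer index.

    slices[k] is the layer acquired k-th and acquisition_order[k] is its
    layer index, e.g. [1, 3, 5, 2, 4] for an interleaved scan of 5 layers.
    """
    if len(slices) != len(acquisition_order):
        raise ValueError("slices and acquisition_order must have equal length")
    paired = sorted(zip(acquisition_order, slices), key=lambda pair: pair[0])
    return [layer for _, layer in paired]


# Toy data (hypothetical): 200 time points, labelled 0..199.
kept = drop_initial_volumes(list(range(200)))

# Toy interleaved acquisition of five layers (hypothetical labels).
ordered = sort_slices_by_index(["s1", "s3", "s5", "s2", "s4"], [1, 3, 5, 2, 4])
```

In practice these operations are performed by the preprocessing software on whole image volumes; the sketch only makes the bookkeeping explicit.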

Further, in step 3, after the registered brain image data are obtained, differences in the scanning parameters of the equipment at different centers are reflected in the brain image data. Several important parameters are therefore selected for correction, so that part of the differences is reduced and multi-center brain image data with a uniform specification are obtained. Among the scanning parameters, the radio frequency pulse excitation interval TR is the time required for one whole brain scan, the number of scan layers Slices is the number of layers into which one whole brain scan is divided, and the number of scanning time points TP is the number of whole brain scans performed. The invention denotes the radio frequency pulse excitation intervals of the centers as $TR_1, TR_2, \dots, TR_M$ (where $M$ is the number of centers), the numbers of scan layers as $Slices_1, Slices_2, \dots, Slices_M$, and the numbers of scanning time points as $TP_1, TP_2, \dots, TP_M$. When the magnetic resonance scanner operates, one whole brain scan of Slices layers is performed within each TR, and this process is repeated until whole brain scan data for TP time points are obtained.

Further, the data of a single subject numbered $n_m$ from center $m$ are first defined in more detail: the data comprise $Slices_m \times TP_m$ layers of scan data, where $x_{i,j}$ denotes the $i$-th layer scan data at the $j$-th scan time point. Second, for the parameter TR, the least common multiple $TR_{LCM}$ of the TR values of all centers is selected as the corrected TR value. The multiple of $TR_{LCM}$ relative to the original TR value of each center is denoted $B_1, B_2, \dots, B_M$, and the brain image data of each center are averaged over every $B_m$ consecutive time points, as shown in formula 3.7:

$$\bar{x}_{i,k} = \frac{1}{B_m} \sum_{j=(k-1)B_m+1}^{kB_m} x_{i,j} \qquad (3.7)$$

where $\bar{x}_{i,k}$ denotes the data of the $i$-th scan layer in the $k$-th new interval of length $TR_{LCM}$, and the number of scanning time points of each center is updated to $TP'_m = \lfloor TP_m / B_m \rfloor$. Then the minimum of the new time point counts of the centers, $TP_{min} = \min_m TP'_m$, is taken as the corrected TP value, and the first $TP_{min}$ time points of each center are kept in order to unify TP. Since each time point is an independent whole brain scan, the remaining $TP_{min}$ time points are sufficient to indicate the health status of the subject. In addition, since the brain image data have already been resampled during registration to Montreal space (preprocessing step (4)), the differences in the number of scan layers Slices between centers have already been corrected and no separate correction is required.

Taking two centers as an example, scanning parameter correction for the two centers Center1 and Center2 used in the experiment proceeds as follows. The Center1 data comprise a total of 190 time points with a radio frequency pulse excitation interval of 2000 ms, that is, one whole brain scan every 2000 ms, while the Center2 data comprise 920 time points with one whole brain scan every 500 ms. The scanning parameters are corrected as described above. First, taking the whole brain scan time of Center1 as the reference, the time of one whole brain scan at Center2 is adjusted to 2000 ms. Within 2000 ms Center2 performs 4 whole brain scans, so the data of these four scans are averaged and taken as the scan result of one 2000 ms interval, and the total number of scanning time points of Center2 becomes 230. Then, based on the total of 190 time points of Center1, the first 190 time points of Center2 are taken. Through these two steps, the number of scanning time points and the radio frequency pulse excitation interval of the two centers are unified. The purpose is to unify the data specifications of the two centers to facilitate the subsequent data correction.
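The two-step parameter correction described above (averaging to the least common multiple of the TR values, then truncating to the smallest number of time points) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it operates on the time series of a single voxel per center, the function name `correct_scan_parameters` and the toy values are hypothetical, and an incomplete trailing block is simply dropped, which the text does not specify.

```python
import math
from functools import reduce
from typing import Dict, List


def correct_scan_parameters(series_by_center: Dict[str, List[float]],
                            tr_ms_by_center: Dict[str, int]) -> Dict[str, List[float]]:
    """Unify TR and the number of time points across centers."""
    # Corrected TR: least common multiple of all center TR values.
    tr_lcm = reduce(lambda a, b: a * b // math.gcd(a, b), tr_ms_by_center.values())

    averaged: Dict[str, List[float]] = {}
    for center, series in series_by_center.items():
        b = tr_lcm // tr_ms_by_center[center]   # multiple B for this center
        n_new = len(series) // b                # assumption: drop incomplete block
        averaged[center] = [sum(series[k * b:(k + 1) * b]) / b for k in range(n_new)]

    # Corrected TP: the smallest new time point count; keep the first TP_min points.
    tp_min = min(len(s) for s in averaged.values())
    return {center: s[:tp_min] for center, s in averaged.items()}


# Toy example (hypothetical values): center "A" scans every 2000 ms,
# center "B" every 500 ms, so B's points are averaged in blocks of 4.
unified = correct_scan_parameters(
    {"A": [1.0, 2.0, 3.0], "B": [float(t) for t in range(16)]},
    {"A": 2000, "B": 500},
)
```

For whole 4D volumes the same averaging would be applied along the time axis of every voxel, typically with an array library rather than Python lists.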

Furthermore, in step 4, a novel multi-center data correction method for magnetic resonance brain images is provided, and the correction is performed on the data after registration and parameter correction have been completed. It is assumed that the brain image data obtained by scanning the same subject at different centers should be identical. Based on this assumption, multi-center data correction is performed using the additionally recruited mobile subject data together with the independent scan data already available at each center. The algorithm is mainly divided into two steps. The first step establishes a regression equation for the features and estimates the mean, variance, biological variable coefficients and the like of the features using all subject data. The second step estimates the center effect factors in the regression equation using the mobile subject data alone.

Further, the correction algorithm establishes a regression formula for the feature value of each brain voxel, where $y_{ijv}$ denotes the value of feature $v$ of subject $j$ from center $i$, and $\alpha_v$ denotes the mean of feature $v$ over all subject data in all centers. $X_{ij}^{T}\beta_v$ as a whole is the biological variable term: $X_{ij}$ denotes biological variables such as age and sex, and $\beta_v$ is the coefficient of $X_{ij}$. As explained above, the differences between center data arise mainly from two sources, the center effect and differences in population distribution. If only the center effect were considered in the regression formula, the population distribution differences present in each center would be treated as part of the center effect and removed with it. However, differences in biological variables cause data differences that reflect important characteristics of the subjects, and they must therefore not be removed by the correction. Introducing the biological variable term $X_{ij}^{T}\beta_v$ into the regression formula prevents population distribution differences from being corrected away together with the center differences. The specific correction algorithm is shown in formula 3.8:

$$y_{ijv} = \alpha_v + X_{ij}^{T}\beta_v + Z_{ij}^{T}\theta_v + \delta_{iv}\,\varepsilon_{ijv} \qquad (3.8)$$

The center effect factor is divided into two parts, an additive factor and a multiplicative factor. First, $Z_{ij}^{T}\theta_v$ as a whole is the additive factor term of the center effect, where $Z_{ij}$ indicates the number of the center $i$ in which subject $j$ is located and $\theta_v$ is the regression coefficient for the center number. Denoting the center effect $Z_{ij}^{T}\theta_v$ as $\gamma_{iv}$, the formula is rewritten as follows:

$$y_{ijv} = \alpha_v + X_{ij}^{T}\beta_v + \gamma_{iv} + \delta_{iv}\,\varepsilon_{ijv} \qquad (3.9)$$

Second, $\delta_{iv}$ denotes the multiplicative factor and $\varepsilon_{ijv}$ is the error of feature $v$ of subject $j$ from center $i$. Using both an additive and a multiplicative factor allows the center effect to be estimated more completely.

The number of mobile subjects is small and cannot represent the overall distribution of the data, so estimating the feature mean $\alpha_v$, the biological variable coefficient $\beta_v$ and the like from the mobile subject data alone could introduce bias. The algorithm therefore calculates the parameters of the regression equation and the center effect factors in two steps. The first step calculates part of the coefficients of the regression equation using all subject data (both the subject data scanned independently at each center and the mobile subject data). First, the estimate $\hat{\beta}_v$ of the biological variable coefficient $\beta_v$ is obtained by the least squares method; then the estimate $\hat{\alpha}_v$ of the overall feature mean $\alpha_v$ and the standard deviation estimate $\hat{\sigma}_v$ of feature $v$ are calculated. The second step calculates the center effect factors of the regression formula using the mobile subject data alone. First, the feature values are standardized with the feature mean estimate $\hat{\alpha}_v$, the standard deviation estimate $\hat{\sigma}_v$ and the biological variable coefficient estimate $\hat{\beta}_v$ obtained in the first step, so that differences in the value ranges of the features do not distort the Bayesian estimation. The standardized feature value $S_{ijv}$ is given by formula 3.10:

$$S_{ijv} = \frac{y_{ijv} - \hat{\alpha}_v - X_{ij}^{T}\hat{\beta}_v}{\hat{\sigma}_v} \qquad (3.10)$$

Then the additive and multiplicative factors $\gamma_{iv}$ and $\delta_{iv}$ are estimated by an empirical Bayes algorithm, giving $\gamma_{iv}^{*}$ and $\delta_{iv}^{*}$. The final corrected feature value $y_{ijv}^{*}$ is calculated by formula 3.11:

$$y_{ijv}^{*} = \frac{\hat{\sigma}_v}{\delta_{iv}^{*}}\left(S_{ijv} - \gamma_{iv}^{*}\right) + \hat{\alpha}_v + X_{ij}^{T}\hat{\beta}_v \qquad (3.11)$$
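The two-step structure of the algorithm can be illustrated with a deliberately simplified sketch for one feature and one biological covariate. It is an assumption-laden illustration, not the patented algorithm: the empirical Bayes estimation of the center effect factors is replaced here by plain per-center mean and standard deviation of the standardized mobile subject values, and the names `Scan`, `fit_shared_terms`, `estimate_center_effects` and `correct`, as well as all toy values, are hypothetical.

```python
from statistics import fmean, pstdev
from typing import Dict, List, NamedTuple, Optional, Tuple


class Scan(NamedTuple):
    center: str                      # center label i
    age: float                       # single biological covariate (assumption)
    y: float                         # value of one feature v
    mobile_id: Optional[str] = None  # set only for mobile subjects


def fit_shared_terms(scans: List[Scan]) -> Tuple[float, float, float]:
    """Step 1: estimate alpha, beta and sigma from ALL scans."""
    centers = {s.center for s in scans}
    x_bar = {c: fmean(s.age for s in scans if s.center == c) for c in centers}
    y_bar = {c: fmean(s.y for s in scans if s.center == c) for c in centers}
    # Least squares slope with one intercept per center (within-center deviations).
    sxy = sum((s.age - x_bar[s.center]) * (s.y - y_bar[s.center]) for s in scans)
    sxx = sum((s.age - x_bar[s.center]) ** 2 for s in scans)
    beta = sxy / sxx
    alpha = fmean(s.y for s in scans) - beta * fmean(s.age for s in scans)
    resid = [(s.y - y_bar[s.center]) - beta * (s.age - x_bar[s.center]) for s in scans]
    sigma = (sum(r * r for r in resid) / len(scans)) ** 0.5
    return alpha, beta, sigma


def standardize(s: Scan, alpha: float, beta: float, sigma: float) -> float:
    """Standardized feature value, as in formula 3.10."""
    return (s.y - alpha - beta * s.age) / sigma


def estimate_center_effects(scans: List[Scan], alpha: float, beta: float,
                            sigma: float) -> Dict[str, Tuple[float, float]]:
    """Step 2: estimate (gamma_i, delta_i) from the MOBILE scans only.

    Moment estimates stand in for the empirical Bayes step (assumption).
    """
    effects = {}
    for c in {s.center for s in scans}:
        z = [standardize(s, alpha, beta, sigma)
             for s in scans if s.center == c and s.mobile_id is not None]
        effects[c] = (fmean(z), pstdev(z))
    return effects


def correct(s: Scan, alpha: float, beta: float, sigma: float,
            effects: Dict[str, Tuple[float, float]]) -> float:
    """Corrected feature value, as in formula 3.11."""
    gamma, delta = effects[s.center]
    return sigma / delta * (standardize(s, alpha, beta, sigma) - gamma) + alpha + beta * s.age


# Toy data (hypothetical): independent subjects at two centers ...
scans = [Scan("A", 20, 9.0), Scan("A", 30, 13.0), Scan("A", 40, 14.0),
         Scan("B", 22, 20.0), Scan("B", 35, 24.0), Scan("B", 50, 31.0)]
# ... plus three mobile subjects; center B distorts values as 1.5 * t + 4.
for mid, t in {"m1": 10.0, "m2": 12.0, "m3": 15.0}.items():
    scans.append(Scan("A", 25, t, mid))
    scans.append(Scan("B", 25, 1.5 * t + 4.0, mid))

alpha, beta, sigma = fit_shared_terms(scans)
effects = estimate_center_effects(scans, alpha, beta, sigma)
corrected = {(s.center, s.mobile_id): correct(s, alpha, beta, sigma, effects)
             for s in scans if s.mobile_id is not None}
```

In this toy setting the distortion at center B is purely additive and multiplicative, so after correction each mobile subject has the same value at both centers, which matches the assumption that the same subject should yield identical data at different centers.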

Unlike previous methods: when ComBat is used to correct multi-center magnetic resonance image data, previous researchers have generally used only the subject data scanned independently at each center to estimate the regression variables, and have considered only quantifiable biological variables such as age and sex, neglecting other factors, such as mental and physiological health state, that can also influence the estimates of the regression variables. Ideally, the regression variables would be estimated entirely from mobile subjects with a consistent health state who are scanned at multiple centers, but such experiments are costly and difficult to carry out. The data correction algorithm herein therefore combines the existing subject data scanned independently at each center with the mobile subject data to estimate the variables of the regression equation. Each time point of each subject is filtered with the AAL90 (Automated Anatomical Labeling) template so that only the voxels of cerebral regions are kept and cerebellar and non-brain voxels are removed. This reduces the amount of computation during correction and does not affect the results, because the subsequent analyses and feature data also use only cerebral region voxels. Finally, the corrected cerebral region voxels of each subject are combined with the uncorrected cerebellar and non-brain voxels to restore the BOLD signal data.

Compared with the prior art, the invention has the following beneficial effects: because conventional data such as ALFF, ReHo and FC can show completely different results under different preprocessing flows, the invention starts from the registered brain image data, avoids data differences caused by different preprocessing flows, corrects the data at the source, and requires only one correction, which then applies to all feature data derived in subsequent processing. By means of the mobile subject data, the multi-center data correction algorithm based on mobile samples reduces the differences between multi-center data as a whole and achieves the best correction result on the mobile subject data.

Drawings

FIG. 1 is a graph of the sex and age data of the two centers in an embodiment of the present invention;

FIG. 2 is a numerical distribution graph of the corrected BOLD signal data in an embodiment of the present invention;

FIG. 3 is a numerical distribution graph of the ALFF and ReHo data before and after correction in an embodiment of the present invention;

FIG. 4 is a graph of the Wasserstein distance ratio of the numerical distributions before and after correction in an embodiment of the present invention;

FIG. 5 is a scatter plot of the corrected BOLD signal data in an embodiment of the present invention;

FIG. 6 is a scatter plot of the ALFF and ReHo data before and after correction in an embodiment of the present invention;

FIG. 7 is a graph of the distance measurement results in an embodiment of the present invention;

FIG. 8 is a graph of the t test results before and after correction in an embodiment of the present invention;

FIG. 9 is a graph of the effect size results in an embodiment of the present invention.

Detailed Description

The following describes the embodiments of the present invention in further detail with reference to examples.

The embodiment provides a multi-center data correction method for MRI images. First, a batch of mobile subjects with the same health state is additionally recruited and scanned at each of a plurality of centers, and the independent scan data of each center are integrated with the mobile subject data. Second, the multi-center data are preprocessed and registered to standard Montreal (Montreal Neurological Institute, MNI) space to align the brain tissue characteristics of the subjects. Third, the scanning parameters are corrected, which eliminates part of the data differences and unifies the data specification. Finally, the data are corrected by a multi-center data correction algorithm based on mobile samples, which better eliminates the influence on the data of center effects such as different scanning equipment and scanning parameters. The implementation of the invention is described in detail below:

1. Introduction to the data set

In total, 309 resting-state functional magnetic resonance data sets were collected from Center1 and Center2. Center1 contributed 206 data sets, from 100 patients with schizophrenia and 106 healthy controls, and Center2 contributed 103 data sets, from 50 patients with major depressive disorder and 53 healthy controls. The youngest subject was 8 years old and the oldest 51 years old, and the data comprise 125 male and 184 female subjects. Of these, 6 healthy subjects were scanned at both Center1 and Center2, with less than 20 days between the two scans. The magnetic resonance scanners of the two centers were a GE HDxT 3T and a Siemens Magnetom Prisma 3.0T, respectively, and the detailed scanning parameters are shown in Table 1. Among these parameters, Scanner is the brand and model of the scanner; TR (ms) is the interval between two radio frequency pulse excitations; TE (ms), the echo time, is the time from radio frequency pulse excitation until the maximum echo is formed; FOV (mm²), the field of view, is the imaging range of the acquired MR images; Slice is the number of scan layers; Thickness (mm) is the thickness of a scan layer; Gap (mm) is the spacing between two scan layers; and Time Points is the total number of scanning time points.

As can be seen from FIG. 1, in terms of age, the patients of the two centers differ greatly and the independent two-sample t test for age is significant (P < 0.001), whereas the healthy subjects do not differ appreciably and the independent two-sample t test is not significant (P > 0.05). In terms of sex, neither the patients nor the healthy subjects of the two centers differ much, and neither independent two-sample t test for sex is significant (P > 0.05). In order to avoid, as far as possible, introducing population distribution differences and disease type differences caused by different patient enrollment criteria, data correction and analysis are limited to the healthy control data of each center.

Table 1. Scanner models and main scanning parameters of the two centers

2. Data preprocessing and feature computation

DPABI (version v4.3.200401), data preprocessing software commonly used in the field of magnetic resonance brain imaging, is used for data processing. Before data correction, a 4-step preprocessing operation is applied to the data to obtain the registered BOLD signal data, and the ALFF and ReHo feature data are calculated before and after correction. The preprocessing mainly comprises the following 4 steps:

(1) Removal of the first 10 time points. When a subject has just entered the scanner, the subject needs to adapt to the scanning environment and the gradient magnetic field needs to stabilize, so by convention the first 10 time points, whose images contain more noise, are removed and the later, more stable time points are retained for subsequent analysis.

(2) Adjustment of the scan layer order. Magnetic resonance scanning generally uses interleaved (alternate-layer) scanning to avoid crosstalk caused by partial overlap of the excitation spectra of adjacent layers. The order of the scan layers therefore needs to be adjusted so that the layer indices are arranged in ascending order.

(3) Head movement correction. Movement of the subject's head during scanning is unavoidable. If it is left untreated, the images are offset and the data quality suffers. Head movement correction is therefore required, in which an algorithm corrects the head back to its original orientation.

(4) Registration to Montreal space. The brains of different subjects differ in size and shape, so the same brain tissue is located at different positions in different subjects. To solve this problem, all subject brain images are registered to standard Montreal space so that the same brain structures lie at the same spatial location.

3. Scan parameter correction

After the registered brain image data are obtained, differences in the scanning parameters of the equipment at different centers are reflected in the brain image data. Several important parameters are therefore selected for correction, so that part of the differences is reduced and multi-center brain image data with a uniform specification are obtained. Among the scanning parameters, the radio frequency pulse excitation interval TR is the time required for one whole brain scan, the number of scan layers Slices is the number of layers into which one whole brain scan is divided, and the number of scanning time points TP is the number of whole brain scans performed. The radio frequency pulse excitation intervals of the centers are denoted herein as $TR_1, TR_2, \dots, TR_M$ (where $M$ is the number of centers), the numbers of scan layers as $Slices_1, Slices_2, \dots, Slices_M$, and the numbers of scanning time points as $TP_1, TP_2, \dots, TP_M$. When the magnetic resonance scanner operates, one whole brain scan of Slices layers is performed within each TR, and this process is repeated until whole brain scan data for TP time points are obtained.

First, the data of a single subject numbered $n_m$ from center $m$ are defined in more detail: the data comprise $Slices_m \times TP_m$ layers of scan data, where $x_{i,j}$ denotes the $i$-th layer scan data at the $j$-th scan time point. Second, for the parameter TR, the least common multiple $TR_{LCM}$ of the TR values of all centers is selected as the corrected TR value. The multiple of $TR_{LCM}$ relative to the original TR value of each center is denoted $B_1, B_2, \dots, B_M$, and the brain image data of each center are averaged over every $B_m$ consecutive time points, as shown in formula 3.7:

$$\bar{x}_{i,k} = \frac{1}{B_m} \sum_{j=(k-1)B_m+1}^{kB_m} x_{i,j} \qquad (3.7)$$

where $\bar{x}_{i,k}$ denotes the data of the $i$-th scan layer in the $k$-th new interval of length $TR_{LCM}$, and the number of scanning time points of each center is updated to $TP'_m = \lfloor TP_m / B_m \rfloor$. Then the minimum of the new time point counts of the centers, $TP_{min} = \min_m TP'_m$, is taken as the corrected TP value, and the first $TP_{min}$ time points of each center are kept in order to unify TP. Since each time point is an independent whole brain scan, the remaining $TP_{min}$ time points are sufficient to indicate the health status of the subject. In addition, since the brain image data have already been resampled during registration to Montreal space (preprocessing step (4)), the differences in the number of scan layers Slices between centers have already been corrected and no separate correction is required.

4. Data correction by the multi-center data correction algorithm based on mobile samples

The algorithm is mainly divided into two steps. The first step establishes a regression equation for the features and estimates the mean, variance, biological variable coefficients and the like of the features using all subject data. The second step estimates the center effect factors in the regression equation using the mobile subject data alone.

The correction algorithm establishes a regression formula for the feature value of each brain voxel, where $y_{ijv}$ denotes the value of feature $v$ of subject $j$ from center $i$, and $\alpha_v$ denotes the mean of feature $v$ over all subject data in all centers. $X_{ij}^{T}\beta_v$ as a whole is the biological variable term: $X_{ij}$ denotes biological variables such as age and sex, and $\beta_v$ is the coefficient of $X_{ij}$. As explained above, the differences between the data of different centers arise mainly from two sources, the center effect and differences in population distribution. If only the center effect were considered in the regression formula, the population distribution differences present in each center would be treated as part of the center effect and removed with it. However, differences in biological variables cause data differences that reflect important characteristics of the subjects, and they must therefore not be removed by the correction. Introducing the biological variable term $X_{ij}^{T}\beta_v$ into the regression formula prevents population distribution differences from being corrected away together with the center differences. The specific correction algorithm is shown in formula 3.8:

$$y_{ijv} = \alpha_v + X_{ij}^{T}\beta_v + Z_{ij}^{T}\theta_v + \delta_{iv}\,\varepsilon_{ijv} \qquad (3.8)$$

where $Z_{ij}^{T}\theta_v$ is the additive center effect term (denoted $\gamma_{iv}$), $\delta_{iv}$ is the multiplicative center effect factor, and $\varepsilon_{ijv}$ is the error of feature $v$ of subject $j$ from center $i$.

The number of mobile subjects is small and cannot represent the overall distribution of the data, so estimating the feature mean $\alpha_v$, the biological variable coefficient $\beta_v$ and the like from the mobile subject data alone could introduce bias. The algorithm therefore calculates the parameters of the regression equation and the center effect factors in two steps. The first step calculates part of the coefficients of the regression equation using all subject data (both the subject data scanned independently at each center and the mobile subject data). First, the estimate $\hat{\beta}_v$ of the biological variable coefficient $\beta_v$ is obtained by the least squares method; then the estimate $\hat{\alpha}_v$ of the overall feature mean $\alpha_v$ and the standard deviation estimate $\hat{\sigma}_v$ of feature $v$ are calculated. The second step calculates the center effect factors of the regression formula using the mobile subject data alone. First, the feature values are standardized with the feature mean estimate $\hat{\alpha}_v$, the standard deviation estimate $\hat{\sigma}_v$ and the biological variable coefficient estimate $\hat{\beta}_v$ obtained in the first step, so that differences in the value ranges of the features do not distort the Bayesian estimation. The standardized feature value $S_{ijv}$ is given by formula 3.10:

$$S_{ijv} = \frac{y_{ijv} - \hat{\alpha}_v - X_{ij}^{T}\hat{\beta}_v}{\hat{\sigma}_v} \qquad (3.10)$$

The additive and multiplicative factors $\gamma_{iv}$ and $\delta_{iv}$ are then estimated by an empirical Bayes algorithm, and the final corrected feature value can be calculated from the resulting estimates by equation 3.11.
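The two-step procedure can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the text: the function name `two_step_correct` and its arguments are hypothetical, plain per-center moment estimates stand in for the empirical Bayes estimation, a single pooled residual standard deviation per feature is used (the text describes a standard deviation estimate for feature v at center i, which may differ), and center indicators are left out of the step-one design for brevity.

```python
import numpy as np


def two_step_correct(y, center, X, is_mobile):
    """Two-step location-scale correction (moment estimates, no empirical Bayes).

    y         : (n_scans, n_features) feature matrix
    center    : (n_scans,) center label of each scan
    X         : (n_scans, n_covariates) biological variables (age, sex, ...)
    is_mobile : (n_scans,) bool, True for scans of mobile subjects
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    center = np.asarray(center)
    is_mobile = np.asarray(is_mobile, dtype=bool)

    # Step 1 (all data): least-squares fit of intercept and covariate coefficients.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    alpha_hat, beta_hat = coef[0], coef[1:]
    fitted = alpha_hat + X @ beta_hat

    # Assumption: one pooled residual standard deviation per feature.
    sigma_hat = (y - fitted).std(axis=0, ddof=1)
    sigma_hat[sigma_hat == 0] = 1.0

    # Standardized feature values S_ijv.
    s = (y - fitted) / sigma_hat

    # Step 2 (mobile subjects only): additive and multiplicative factor per center.
    corrected = np.empty_like(y)
    for c in np.unique(center):
        in_c = center == c
        ref = s[in_c & is_mobile]            # needs >= 2 mobile scans per center
        gamma_hat = ref.mean(axis=0)         # additive center factor
        delta_hat = ref.std(axis=0, ddof=1)  # multiplicative center factor
        delta_hat[delta_hat == 0] = 1.0
        corrected[in_c] = sigma_hat * (s[in_c] - gamma_hat) / delta_hat + fitted[in_c]
    return corrected
```

Because the factors are estimated only from the mobile subjects and then applied to every scan of that center, the mobile subjects' corrected means coincide across centers whenever their biological variables are the same at each center.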

5. Verifying the correction result

In analyzing the correction results, this embodiment refers to the approach that uses the ComBat algorithm in the correction step as "method one", and to the approach that uses the multi-center data correction algorithm based on mobile samples in the correction step as "method two". The results are verified in two parts. The first part verifies the correction effect on the data as a whole, using numerical distribution plots, scatter plots, and independent two-sample t-tests on the values. The second part verifies the correction effect at the individual level, using the 6 mobile subjects.

(1) Data dimension processing

In the experiment, results are presented mainly for three data modalities: the BOLD signal data in standard Montreal space, the preprocessed ALFF data, and the preprocessed ReHo data. For the numerical distribution plots, the data scatter plots, and the t-test analysis, the dimensionality of the BOLD signal data is too large. Without affecting the overall result, it is reduced by averaging over the scanning time point dimension, so that each subject's data matrix shrinks from 61 x 73 x 61 x 190 to 61 x 73 x 61, which simplifies subsequent analysis.
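The dimension reduction described above amounts to a mean over the time axis. The following minimal sketch assumes the scanning time points are stored on the last axis of the array; the function name `mean_over_time` is hypothetical, and a small array stands in for the real 61 x 73 x 61 x 190 volume to keep the example light.

```python
import numpy as np


def mean_over_time(bold_4d):
    """Average a 4-D BOLD volume (x, y, z, time) over its last axis."""
    bold_4d = np.asarray(bold_4d)
    if bold_4d.ndim != 4:
        raise ValueError("expected a 4-D array (x, y, z, time)")
    return bold_4d.mean(axis=-1)


# Small stand-in volume; the data in the text would have shape (61, 73, 61, 190).
demo = np.ones((6, 7, 6, 19), dtype=np.float32)
reduced = mean_over_time(demo)  # shape (6, 7, 6)
```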

(2) Numerical distribution map

The numerical distribution plot gives an intuitive overall view of how the numerical distribution of each center changes; the abscissa is the voxel value and the ordinate is the density of that voxel value. Numerical distribution plots are drawn for the BOLD signal data and for the ALFF and ReHo data before and after correction, to observe how the distributions change. In addition, the Wasserstein distance between the numerical distributions of the two centers is calculated before and after correction to measure the similarity of the two distributions: the smaller the Wasserstein distance, the more similar the distributions and the smaller the data difference, and vice versa. For each modality, the pre-correction Wasserstein distance between the two centers' distributions is taken as the baseline, and the ratio of the post-correction Wasserstein distance to this baseline is calculated.
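As an illustration of this ratio, the sketch below computes the one-dimensional Wasserstein-1 distance between two samples and divides the post-correction distance by the pre-correction distance. It assumes both samples have the same number of values, in which case the distance reduces to the mean absolute difference between the sorted samples; the function names are hypothetical.

```python
def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance between two equal-sized samples."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-sized samples the optimal transport plan pairs sorted values.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def distance_ratio(before, after):
    """Ratio of the post-correction distance to the pre-correction baseline.

    before, after : (center_1_values, center_2_values) tuples
    """
    baseline = wasserstein_1d(*before)
    if baseline == 0:
        raise ValueError("pre-correction distance is zero; ratio undefined")
    return wasserstein_1d(*after) / baseline
```

A ratio below 1 means the two centers' distributions are closer after correction than before.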

As is clear from Fig. 2, before correction the numerical distributions of the BOLD signal data of the two centers differ considerably; after the data are corrected, the differences between the two centers in peak value and in value range are significantly reduced. After correction by method one, the difference in distribution peaks is reduced most markedly, and the value ranges of the two centers also become consistent. Similarly, after correction by method two, the difference in distribution peaks is greatly reduced, and the difference in value range is also smaller than before correction. Fig. 4 likewise shows that the Wasserstein distance between the two centers' distributions is greatly reduced after correction, so the distribution difference is clearly smaller. From the standpoint of numerical distribution, both method one and method two therefore correct the data well.

In addition, since the main purpose of data correction is to serve the feature data, the two kinds of feature data obtained by preprocessing, ALFF and ReHo, are also compared before and after correction. Fig. 3 shows that before correction the peaks and shapes of the distributions of the two modalities differ greatly between centers, which is a barrier to fusing multi-center data and applying models across centers. After correction, method one successfully reduces the differences in distribution peak and shape between the two centers for both modalities, and method two likewise reduces them. Fig. 4 also shows that, after correction, the Wasserstein distance between the two centers' distributions is greatly reduced for both ALFF and ReHo, and the distribution difference is smaller.

Taken together, the numerical distribution plots show that after method one or method two corrects the BOLD signal data, the difference between the numerical distributions of the two centers is significantly reduced, and the distribution differences of the ALFF and ReHo data derived by preprocessing the corrected data are also greatly reduced. As for why the corrected data of the two centers still do not follow exactly the same distribution, this embodiment attributes it to individual differences among healthy people: healthy people also differ in physiological and psychological state, and these differences are reflected in the brain image data, so the corrected numerical distributions are not completely identical.

(3) Data scatter plot and mobile subject distance metric

The scatter plot shows the distribution of the multi-center data as a whole and also the relationships among subjects at the individual level. In this embodiment, the classical dimensionality reduction and visualization algorithm Isomap is used to reduce the high-dimensional BOLD signal data and the ALFF and ReHo data to 2 dimensions. Isomap seeks to preserve the relative distances between samples in the low-dimensional space, so that the relationships among the original high-dimensional data remain interpretable after reduction.

Meanwhile, the data points of the mobile subjects are enlarged and specially labeled, where i = 1, 2 denotes the center number and j = 1, 2, 3, ..., 6 denotes the mobile subject number, so that changes in the mobile subjects' data points can be observed before and after correction. In addition, to keep the scatter data comparable before and after correction, this embodiment trains the Isomap dimensionality reduction model on the pre-correction BOLD signal data, ALFF data, and ReHo data respectively, and then applies the trained model to reduce the data corrected by each of the two methods. To show the data changes more clearly, data of the same modality are displayed on the same coordinate scale.

As can be seen from Fig. 5, before correction the BOLD signal data points of the different centers are distributed very differently: the data points of different centers lie far apart and the boundary between centers is obvious. After correction by method one and method two, the difference between the two centers' data is significantly reduced and the data fall in essentially the same region; the scatter plot for method two in particular shows the 6 mobile subjects' data points from the two centers lying at nearby positions. Likewise, the ALFF and ReHo scatter plots before and after correction in Fig. 6 show that the distance between the two centers' data points is significantly reduced after correction by method one and method two, meaning that both methods successfully reduce the difference between the two centers. The distance between the data points of the 6 mobile subjects is also significantly reduced, most clearly in the results of method two.

For the dimension-reduced scatter data before and after correction in Figs. 5 and 6, the Euclidean distance between the corresponding points of each of the 6 mobile subjects at the two centers is calculated for the BOLD signal, ALFF, and ReHo data, to measure each subject's between-center difference. Because distances differ by orders of magnitude between modalities, the pre-correction average distance of the 6 subjects is taken as the baseline for each modality, and the ratio of the average distance of the 6 subjects to this baseline is calculated. As shown in Fig. 7, the average distance ratio is significantly reduced after correction by both methods, with a marked reduction for method two. For the BOLD signal data, however, method one increases the average distance between the two centers' data, whereas method two significantly reduces the average distance for the 6 mobile subjects. This indicates that method two, by using the mobile subject data in addition to the independently scanned data, achieves a better correction effect.
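The average distance ratio can be sketched as follows. The example assumes each mobile subject contributes one 2-D point per center and that the points of the two centers are listed in the same subject order; the function names are hypothetical.

```python
import math


def mean_pair_distance(points_c1, points_c2):
    """Mean Euclidean distance between paired points of the same subjects."""
    if len(points_c1) != len(points_c2) or len(points_c1) == 0:
        raise ValueError("need the same non-zero number of points per center")
    total = sum(math.dist(p, q) for p, q in zip(points_c1, points_c2))
    return total / len(points_c1)


def mean_distance_ratio(before, after):
    """Post-correction mean distance divided by the pre-correction baseline.

    before, after : (center_1_points, center_2_points) tuples
    """
    baseline = mean_pair_distance(*before)
    if baseline == 0:
        raise ValueError("pre-correction distance is zero; ratio undefined")
    return mean_pair_distance(*after) / baseline
```

Normalizing by the pre-correction baseline makes the ratios comparable across modalities whose raw distances differ by orders of magnitude.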

(4) Independent two-sample t-test of feature values

An independent two-sample t-test between the healthy subjects of the two centers is performed at each voxel of the brain data to obtain a matrix of voxels that differ between centers, from which the change in the proportion of difference voxels before and after correction is assessed. In this embodiment, the DPABI software is used to run the t-tests on the BOLD signal data, the ALFF data, and the ReHo data, with the significance threshold set to p = 0.001, and GRF (Gaussian random field) multiple-comparison correction is applied to the results to obtain a corrected t-value matrix, that is, a t-value matrix containing only significantly different data. Finally, the proportion of difference voxels in the corrected t-value matrix relative to all brain voxels is calculated, to reflect how the t-test differences change before and after correction.

The results in Fig. 8 show that, for the BOLD signal, ALFF, and ReHo data alike, the proportion of difference voxels among whole-brain voxels is highest before correction. After correction by either method one or method two, the proportion of t-test difference voxels decreases for all three modalities, and method one gives the largest reduction in the difference voxel proportion for the BOLD signal data.

(5) Change in effect size

For each data correction method, the data of the 6 mobile subjects were used for individual-level verification. For each data modality under each correction method, the Cohen's d effect size between the two centers' data of each mobile subject was calculated to quantify that individual's difference between the two centers. Cohen's d is defined as:

$$d = \frac{M_1 - M_2}{S_p}$$

where $M_1$ and $M_2$ are the feature means of the same subject at the two centers, and $S_p$ is given by:

$$S_p = \sqrt{\frac{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2}{N_1 + N_2 - 2}}$$

where $N_1$ and $N_2$ are the numbers of feature values in the same subject's data at the two centers, and $S_1$ and $S_2$ are the standard deviations of the feature values at the two centers. Cohen's d allows the difference between the two centers' data of the same subject to be compared directly: the larger the effect size, the more pronounced the difference between the subject's two data sets. Finally, the mean effect size of the 6 subjects was calculated for each correction method.
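A minimal sketch of this calculation for one mobile subject is given below. It assumes the pooled standard deviation uses sample variances with N - 1 in the denominator; the function name `cohens_d` is hypothetical.

```python
import math
import statistics


def cohens_d(values_1, values_2):
    """Cohen's d between one subject's feature values at two centers."""
    n1, n2 = len(values_1), len(values_2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each center needs at least two feature values")
    m1, m2 = statistics.fmean(values_1), statistics.fmean(values_2)
    # Sample variances (N - 1 in the denominator).
    v1, v2 = statistics.variance(values_1), statistics.variance(values_2)
    s_p = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    if s_p == 0:
        raise ValueError("pooled standard deviation is zero; d undefined")
    return (m1 - m2) / s_p
```

The per-method summary described in the text would then be the mean of this value over the 6 mobile subjects.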

As the effect size results in Fig. 9 show, except for the ALFF data, the mean effect size of the 6 subjects is greatly reduced in the corrected data. Method two in particular is the most successful for the 6 subjects: its mean Cohen's d is the smallest in two modalities, which indicates that method two achieves a better correction result by using the mobile subjects as the "gold standard" for correction. For the ALFF data, the mean effect size increases after correction by both method one and method two; this is attributed here to data differences being further amplified when the different modalities are calculated.

6. Analysis of results

Taking the above indicators together, the multi-center data correction method of this embodiment, which operates on registered brain image data, can effectively correct the data before other feature data are generated, and thus provides reliable support for downstream features and analysis tasks with a single correction. The results above show that the multi-center data correction algorithm based on mobile samples developed in this patent achieves an overall effect similar to that of the ComBat correction algorithm, while its correction effect on the mobile subjects is more pronounced. The results also suggest that population distribution differences may exist among healthy individuals owing to differences in physiological or psychological condition, so that correction may not achieve a fully ideal effect.

The foregoing has shown and described the basic principles, principal features and advantages of the invention. It should be understood by those skilled in the art that the above embodiments do not limit the scope of the present invention in any way, and all technical solutions obtained by equivalent substitution and the like fall within the scope of the present invention.

Parts not described in the present invention are the same as the prior art or can be implemented using the prior art.

Claims

1. The multi-center data correction method for the MRI image is characterized by comprising the following steps of:

step 1, data collection is carried out,

step 2, the data are preprocessed, and the data are processed,

step 3, correcting the scanning parameters,

in step 3, among the scanning parameters, the radio-frequency pulse excitation interval TR is the time required for one whole-brain scan, the number of scanning layers Slices is the number of layers into which one whole-brain scan is divided, and the number of scanning time points TP is the number of whole-brain scans performed; the radio-frequency pulse excitation intervals of the centers are denoted $TR_1, TR_2, \ldots, TR_M$, where $M$ is the number of centers, the numbers of scanning layers are denoted $Slices_1, Slices_2, \ldots, Slices_M$, and the numbers of scanning time points are denoted $TP_1, TP_2, \ldots, TP_M$; in operation, the magnetic resonance scanner performs one whole-brain scan of Slices layers within each TR, and repeats this scanning process to finally obtain whole-brain scan data at TP time points;

first, the data of a single subject numbered $n_m$ from center $m$ are defined in more detail as layer-wise scan data, in which each element denotes the scan data of the $i$-th layer at the $j$-th scanning time point; next, for the TR parameters of the centers, their least common multiple $TR_{LCM}$ is selected as the corrected TR value, the multiples of $TR_{LCM}$ relative to the original TR values of the centers are denoted $B_1, B_2, \ldots, B_M$, and the brain image data of each center are averaged over every corresponding multiple $B$ of time points, the averaged data representing one scanning layer within the new $TR_{LCM}$ interval, and the number of scanning time points of each center is updated accordingly; then, the minimum of the updated numbers of time points over all centers is taken as the corrected TP value, and the data of that many leading time points are retained for each center to unify TP, the retained time points being sufficient to indicate the health status of the subject;

the algorithm is divided into two steps: first, a regression equation is established for the features, and the mean, variance, and biological variable coefficients of the features are estimated using all subject data; second, the center effect factors in the regression equation are estimated using the mobile subject data alone;

in step 4, the correction algorithm establishes a regression formula for the feature value of each brain voxel, where $y_{ijv}$ denotes the value of feature $v$ for subject $j$ from center $i$, $\alpha_v$ denotes the mean of feature $v$ over all subjects in the multiple centers, $X_{ij}^{\top}\beta_v$ as a whole is the biological variable term, $X_{ij}$ denotes the biological variables, and $\beta_v$ is the coefficient of the biological variables $X_{ij}$; the specific correction algorithm is shown in the following formula:

the center effect factor is divided into two parts, an additive factor and a multiplicative factor; first, the additive factor term represents the center effect as a whole and is formed from the number of the center $i$ in which subject $j$ is located and the regression coefficient $\theta_v$ for the center number; denoting this center effect term as $\gamma_{iv}$, the formula is rewritten as follows:

second, $\delta_{iv}$ denotes the multiplicative factor and $\varepsilon_{ijv}$ denotes the error value of feature $v$ for subject $j$ at center $i$; together, the additive and multiplicative factors allow the center effect to be estimated more comprehensively;

the algorithm calculates the parameters and factors of the regression formula in two steps: first, part of the coefficients in the regression formula are calculated using all subject data, the estimate $\hat{\beta}_v$ of the biological variable coefficient $\beta_v$ being first obtained by the least squares method, and the estimate $\hat{\alpha}_v$ of the overall feature mean $\alpha_v$ and the standard deviation estimate $\hat{\sigma}_{iv}$ of feature $v$ at center $i$ then being calculated; second, the center effect factors in the regression formula are calculated using the mobile subject data alone, the feature values first being standardized using the feature mean estimate $\hat{\alpha}_v$, the standard deviation estimate $\hat{\sigma}_{iv}$, and the biological variable coefficient estimate $\hat{\beta}_v$ obtained in the first step, so that differences in the range of the feature values do not cause errors in the Bayesian estimation, the standardized feature value $S_{ijv}$ being given by the following formula:

then estimating the sum and multiplication factor gamma by an empirical Bayesian algorithm _iv And delta _iv ObtainingAnd->Final corrected eigenvalue +.>Can be calculated by the following formula:

2. The MRI image-oriented multi-center data correction method as set forth in claim 1, wherein the preprocessing in step 2 is divided into four steps:

(1) Removing the data of the first 10 time points;

(2) Adjusting the order of the scanning layers: because an interleaved scanning mode is used during magnetic resonance scanning, the order of the scanning layers is adjusted so that the layer labels are arranged in ascending order;

(3) Head motion correction: head motion that occurs during scanning is corrected, restoring the head to its original angle;

(4) Registration to Montreal space: the brain images of all subjects are registered to the standard Montreal space so that the same brain structures occupy the same spatial position.

3. The MRI image-oriented multi-center data correction method according to claim 1, wherein the averaging of each center's brain image data over every corresponding multiple B of time points is represented by the following formula: