CN114359642A - Multi-modal medical image multi-organ positioning method based on one-to-one target query Transformer - Google Patents

Multi-modal medical image multi-organ positioning method based on one-to-one target query Transformer Download PDF

Info

Publication number
CN114359642A
CN114359642A (application CN202210030228.XA)
Authority
CN
China
Prior art keywords
image
organ
projection
model
modal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210030228.XA
Other languages
Chinese (zh)
Inventor
王洪凯 (Wang Hongkai)
刘林琳 (Liu Linlin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202210030228.XA priority Critical patent/CN114359642A/en
Publication of CN114359642A publication Critical patent/CN114359642A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/06Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10088Magnetic resonance imaging [MRI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10104Positron emission tomography [PET]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-modal medical image multi-organ localization method based on a one-to-one target query Transformer, belonging to the technical field of medical image processing. The invention uses a conditional Gaussian model together with the self-attention mechanism of a Transformer to model the correlations of position and size between organs. The one-to-one target query architecture enforces a unique target query for each target organ, and the order of the queries encodes the predicted category, so no classification is needed; this simplifies the network structure, reduces redundant computation, and speeds learning convergence. Before organ detection is performed, the 3D multi-modal images are projected onto two orthogonal 2D planes, complementary information from the multi-modal images is combined through a multi-modal fusion method, and finally the resulting 2D bounding boxes are back-projected to obtain a 3D bounding box, reducing the computational burden and yielding a more stable organ localization result.

Description

Multi-modal medical image multi-organ positioning method based on one-to-one target query Transformer
Technical Field
The invention belongs to the technical field of medical image processing, and particularly relates to a multi-modal medical image multi-organ localization method based on a one-to-one target query Transformer.
Background
Multi-modal medical images are imaging modalities commonly used in clinical medicine, such as positron emission tomography / X-ray computed tomography (PET/CT), the T1, T2, and proton-density images of magnetic resonance scanning, the multiple spectral images of spectral CT, and the like. Taking PET/CT as an example, the PET image is a functional image: it reflects the distribution of a radioactive tracer in the body and can be used to diagnose benign and malignant tumors and to quantify tissue metabolism. The CT image reflects the degree to which human organs and tissues absorb X-rays and clearly displays the human anatomy. With the popularization of multi-modal medical imaging, a large amount of image data is acquired every day, and the diagnostic burden on doctors keeps increasing. Automatic organ localization can help reduce image reading time and provide specific organ regions for subsequent computer-aided diagnosis. Quick and accurate automatic localization of multiple organs has therefore become an indispensable step in multi-modal medical image analysis.
Organ localization in an image refers to determining the three-dimensional bounding box of an organ, i.e., the upper and lower bounds of its x, y, and z coordinates: (x_min, x_max, y_min, y_max, z_min, z_max).
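For concreteness, the relation between the two box representations used throughout this document — the corner bounds above and the center/size form used later for shape vectors — can be sketched as follows (a minimal illustration, not part of the patent):

```python
import numpy as np

def bounds_to_center_size(b):
    """(x_min, x_max, y_min, y_max, z_min, z_max) -> (x, y, z, lx, ly, lz)."""
    b = np.asarray(b, dtype=float).reshape(3, 2)   # rows: x, y, z bounds
    center = b.mean(axis=1)                         # midpoints
    size = b[:, 1] - b[:, 0]                        # extents
    return np.concatenate([center, size])

def center_size_to_bounds(s):
    """(x, y, z, lx, ly, lz) -> (x_min, x_max, y_min, y_max, z_min, z_max)."""
    center, size = np.asarray(s[:3], float), np.asarray(s[3:], float)
    lo, hi = center - size / 2, center + size / 2
    return np.stack([lo, hi], axis=1).ravel()
```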
In recent years, deep convolutional neural networks (CNNs) have excelled at object detection in natural and medical images. Although CNNs have been applied to organ localization, their performance remains limited by a lack of understanding of the geometric relationships between organs in an image.
Recently, the development of Transformer networks has shown the potential of long-range target correlation for image classification and segmentation, and Transformers are increasingly used to model correlations between targets, improving the performance of medical image analysis. However, the Transformer network has a complex structure and requires more training time. It also needs a large amount of training data to perform well, yet manual annotation of large numbers of three-dimensional medical images by doctors is time-consuming and laborious. Existing three-dimensional Transformers balance accuracy against learnability simply by reducing network depth. Furthermore, no current Transformer model uses information from multi-modal medical images to improve the accuracy of organ localization.
Disclosure of Invention
To solve the above problems, the present invention provides a multi-modal medical image multi-organ localization method based on a one-to-one target query Transformer. The method fuses the two-dimensional projection views (coronal and sagittal) of the three-dimensional multi-modal image for organ detection; models the correlations between organs within the human body region with a conditional Gaussian model (CGM); extracts features from the multi-modal fused image with a convolutional neural network (CNN); feeds the resulting image feature sequence and the CGM prediction into the Transformer, whose one-to-one target queries jointly yield the positions of the multiple organs, the order of the target queries corresponding to the organ categories; and finally back-projects the detection results of the two projection views into 3D space to obtain the 3D bounding boxes, completing automatic multi-organ localization.
The technical scheme of the invention is as follows:
a multi-modal medical image multi-organ positioning method based on one-to-one object query Transformer comprises the following steps:
S1, two-dimensional projection and image fusion of three-dimensional multi-modal images
S11, preprocessing data
Carrying out normalization preprocessing on the spatial resolution and the image size of the data set, which specifically comprises the following steps:
First, resample the multi-modal images to a uniform voxel size according to the finest spatial resolution among the modalities; then choose an image size and center-crop the resampled images, removing surrounding background pixels while preserving the human body region.
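A minimal sketch of this preprocessing is given below; the target crop size and the use of scipy's zoom as the resampler are assumptions for illustration (the patent does not fix these):

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess(volumes, spacings, target_shape=(256, 256, 256)):
    """Resample every modality to the finest voxel spacing, then center-crop."""
    finest = np.min(np.stack(spacings), axis=0)            # finest spacing per axis
    out = []
    for vol, sp in zip(volumes, spacings):
        vol = zoom(vol, np.asarray(sp) / finest, order=1)  # to uniform voxel size
        # pad first if the volume is smaller than the target, then center crop
        pad = [(max(t - s, 0) // 2, max(t - s, 0) - max(t - s, 0) // 2)
               for s, t in zip(vol.shape, target_shape)]
        vol = np.pad(vol, pad, mode="constant")
        start = [(s - t) // 2 for s, t in zip(vol.shape, target_shape)]
        vol = vol[tuple(slice(st, st + t) for st, t in zip(start, target_shape))]
        out.append(vol)
    return out
```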
S12, two-dimensional projection and gray level normalization of multi-modal image
S121, three-dimensional medical images are computationally heavy, and their projection images can represent the main information of the volume to a considerable degree, greatly reducing computation and lowering the difficulty of localization. Therefore, each modality image is projected in a manner that preserves its image characteristics, yielding a two-dimensional coronal projection and a sagittal projection. The specifics are as follows:
for functional modality images, such as PET images, hypermetabolic organ tissue is highlighted in the images, and in order to highlight hypermetabolic organ tissue, Maximum Intensity Projection (MIP) is selected, thereby ensuring good contrast of highly metabolic organs.
For anatomical mode images, such as CT and mri, anatomical structures can be displayed, and most of the clearly displayed organ tissues can be observed by human eyes. Since MIP can highlight high contrast tissue and weaken low contrast tissue, an Average Intensity Projection (AIP) is chosen to maintain tissue contrast.
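The two projection modes reduce to a maximum or mean along the axis perpendicular to the viewing plane. A minimal sketch, with assumed axis conventions (axis 0 = left-right, axis 1 = anterior-posterior, axis 2 = superior-inferior):

```python
import numpy as np

def project(volume, plane, mode):
    """MIP (max) for functional images, AIP (mean) for anatomical images."""
    axis = {"coronal": 1, "sagittal": 0}[plane]   # collapse y or x
    if mode == "MIP":
        return volume.max(axis=axis)
    if mode == "AIP":
        return volume.mean(axis=axis)
    raise ValueError(mode)

# e.g. coronal MIP of a PET volume and coronal AIP of the matching CT volume:
# pet_cor = project(pet, "coronal", "MIP"); ct_cor = project(ct, "coronal", "AIP")
```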
S122, for the projection image of each modality, select an appropriate threshold range and normalize the image gray levels to the range [0, 255]. Specifically, the threshold range uses a normalization window [g(p1), g(p2)], where g(p1) and g(p2) are gray thresholds that exclude the lowest p1 percent and the highest p2 percent of pixels in the image.
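A minimal sketch of this normalization window follows; the percentile values p1 and p2 are placeholders, since the patent's concrete values appear only in its figures:

```python
import numpy as np

def normalize_gray(img, p1=1.0, p2=99.0):
    """Clip to the [g(p1), g(p2)] gray thresholds and rescale to [0, 255]."""
    g1, g2 = np.percentile(img, [p1, p2])
    img = np.clip(img, g1, g2)
    return (255.0 * (img - g1) / max(g2 - g1, 1e-8)).astype(np.uint8)
```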
S13 image fusion of multi-modal projection
Fuse the projection views together to generate two fused images, one for the coronal view and one for the sagittal view. The fused image is computed as:

I_fused = Σ_u α_u·I_u    (1)

where α_u is a weight factor with Σ_u α_u = 1, and I_u is the projection view of the image of the u-th modality.
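A minimal sketch of equation (1); the equal-weight PET/CT example matches the embodiment described later:

```python
import numpy as np

def fuse(projections, weights):
    """I_fused = sum_u alpha_u * I_u, with the alpha_u summing to 1."""
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0), "weight factors must sum to 1"
    return sum(a * p.astype(float) for a, p in zip(weights, projections))

# e.g. equal-weight PET/CT fusion as in the embodiment: fuse([pet, ct], [0.5, 0.5])
```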
S2, constructing a coarse prediction model of position correlation between organs by CGM
S21, constructing shape vector of organ
Detecting the positions of anatomical structures in multi-modal images is a key step in automatic image diagnosis. For organ localization, the organ need not be delineated precisely with contour lines; only the upper and lower coordinate bounds of its three-dimensional bounding box in the three-dimensional multi-modal image need to be determined.
The training set is formed from 6 feature values of each training sample: the center-point coordinates of the three-dimensional organ bounding box and its length, width, and height. For each training sample i, construct a shape vector s_i = (x, y, z, l_x, l_y, l_z).
S22 Generalized Procrustes transform
Before statistical modeling with the shape vectors representing organ positions and sizes, in order to eliminate the influence of other factors, the shape vector of each training sample is normalized from image space to model space through a generalized Procrustes transform; subsequent operations are then performed in model space.
S23 Principal Component Analysis (PCA)
PCA comes from linear algebra: a linear transformation ranks the variables of a data set by the proportion of variance their principal components explain, so that a selected subset represents the characteristics of the whole data set and its dimensionality is reduced. This lowers computation while retaining the main information and removing redundancy and noise in complex data analysis. Note that PCA is performed in model space.
Taking all s_i after the generalized Procrustes transform as the training set, model the variation of organ position and size with a statistical shape model:

s = s̄ + Φ_s b_s    (2)

where s̄ is the mean shape of the training set, i.e., the mean of the features of the selected training set; Φ_s is the eigenvector matrix obtained by principal component analysis (PCA) of the training set (the subscript s denotes shape), whose k eigenvectors represent k modes of variation of organ position and size learned from the training set; b_s is the shape parameter, whose k elements are the weights of the k deformation modes of Φ_s superimposed on the mean shape of the training set; and s is the resulting shape model. Adjusting the value of b_s controls the model deformation.

The shape parameter of each training sample is recovered by inverting equation (2):

b_s = Φ_s^T (s_i − s̄)    (3)

where Φ_s^T is the transpose of the eigenvector matrix obtained by PCA of the training set, s_i is the shape vector of the i-th training sample, s̄ is the mean shape of the training set, and b_s is the shape parameter of the training sample.
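A minimal sketch of equations (2) and (3), assuming the shape vectors have already been Procrustes-normalized into model space; the 90% variance threshold anticipates the embodiment below:

```python
import numpy as np

def fit_shape_model(S, variance_kept=0.9):
    """S is an (m, n) matrix of normalized shape vectors, one row per sample."""
    s_mean = S.mean(axis=0)
    X = S - s_mean
    # eigenvectors of the covariance matrix via SVD of the centered data
    _, sing, Vt = np.linalg.svd(X, full_matrices=False)
    var = sing ** 2
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), variance_kept) + 1)
    Phi = Vt[:k].T                      # (n, k) eigenvector matrix Phi_s
    return s_mean, Phi

def shape_params(s_i, s_mean, Phi):
    return Phi.T @ (s_i - s_mean)       # equation (3): b_s = Phi^T (s_i - s_mean)

def reconstruct(b_s, s_mean, Phi):
    return s_mean + Phi @ b_s           # equation (2): s = s_mean + Phi_s b_s
```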
S24 construction of conditional Gaussian model
Human torso organs are highly correlated in position and shape, so the correlations between organs can be modeled; the method used is a conditional Gaussian model. Taking the modeling of inter-organ shape correlation (the center-point coordinates and the length, width, and height of the three-dimensional organ bounding box) as an example, let b_s^A and b_s^B denote the shape parameters of two adjacent organs A and B, respectively. When b_s^A is known, the conditional Gaussian model CGM of b_s^B is described by equations (4) to (6):

p(b_s^B | b_s^A) = N(μ_{B|A}, Σ_{B|A})    (4)

μ_{B|A} = μ_B + Σ_{BA} Σ_{AA}^{-1} (b_s^A − μ_A)    (5)

Σ_{B|A} = Σ_{BB} − Σ_{BA} Σ_{AA}^{-1} Σ_{AB}    (6)

where μ_{B|A} is the mean (k × 1) of the shape parameters of the organ to be predicted, obtained from the conditional Gaussian model; Σ_{B|A} is the covariance matrix (k × k) of the shape parameters of the organ to be predicted; b_s^A and b_s^B are the shape parameters (k × 1) of each training sample, with training-set means μ_A and μ_B; Σ_{AB} and Σ_{BA} are the cross-covariance matrices (k × k) between b_s^A and b_s^B; Σ_{AA} and Σ_{BB} are the covariance matrices (k × k) of b_s^A and b_s^B over the training set; p(b_s^B | b_s^A) is the distribution of b_s^B when b_s^A is known; and N denotes a Gaussian distribution (parenthesized terms give matrix dimensions).
The conditional Gaussian model is thus built from the conditional probability relationships between the positions, lengths, widths, and heights of adjacent organs. When the bounding-box center, length, width, and height of organ A are known, it predicts the mean and standard deviation of those of the adjacent organ B, giving the approximate range of organ B.
In use, the shape parameters of the organ to be predicted are obtained from the shape parameters of the known organ through the CGM and then substituted into the shape model s = s̄ + Φ_s b_s to obtain the result in model space. Because this result lies in the normalized model space, it must also be inverse-normalized into the real physical space of the image to obtain the final prediction.
Multiple CGMs must be constructed, each modeling the position and size correlation between the torso region and one organ. Using these CGMs, the position information of the multiple organs in the torso region is coarsely predicted from the torso as the known region, and serves as the constraint for the one-to-one target queries in the subsequent Transformer network.
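A minimal sketch of equations (4)–(6) follows, fitting the covariance blocks from paired training shape parameters and predicting the conditional mean and covariance; variable names are illustrative:

```python
import numpy as np

def fit_cgm(B_A, B_B):
    """B_A, B_B: (m, k) shape parameters of organs A and B over m samples."""
    mu_A, mu_B = B_A.mean(axis=0), B_B.mean(axis=0)
    A0, B0 = B_A - mu_A, B_B - mu_B
    m = len(B_A)
    S_AA = A0.T @ A0 / (m - 1)          # Sigma_AA, (k, k)
    S_BB = B0.T @ B0 / (m - 1)          # Sigma_BB, (k, k)
    S_BA = B0.T @ A0 / (m - 1)          # cross covariance Sigma_BA, (k, k)
    return mu_A, mu_B, S_AA, S_BB, S_BA

def cgm_predict(b_A, mu_A, mu_B, S_AA, S_BB, S_BA):
    """Conditional mean and covariance of b_B given b_A, equations (5)-(6)."""
    K = S_BA @ np.linalg.pinv(S_AA)     # regression matrix Sigma_BA Sigma_AA^-1
    mu = mu_B + K @ (b_A - mu_A)
    cov = S_BB - K @ S_BA.T             # Sigma_BB - Sigma_BA Sigma_AA^-1 Sigma_AB
    return mu, cov
```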
S3 organ positioning method based on one-to-one target query Transformer network
The Transformer network is a powerful architecture originally developed for natural language processing. Thanks to its built-in self-attention mechanism, the Transformer is well suited to learning correlations between targets, improving performance in image recognition, object detection, segmentation, and related fields. The invention uses the self-attention mechanism of the Transformer network to model the correlations of position and size among different organs.
The Transformer model is constructed by improving the state-of-the-art 2D Detection Transformer (DETR) network. DETR, recently proposed as the first fully end-to-end 2D image object detector, uses a Transformer to convert CNN-extracted image features directly into localization results. However, because DETR is designed for object detection in natural images, it must handle objects that may be absent from the current image: although the number of object classes in a natural-image dataset is large, the classes and the number of objects actually present vary greatly from image to image. Given the N targets defined in the DETR architecture, the actual number of targets in an image is typically far less than N. DETR must therefore use a bipartite-graph matching strategy and identify the class labels of detected targets through a target classification network.
In contrast, the present invention uses a one-to-one target query architecture, forcing a unique target query for each target organ, thus eliminating the need for a bipartite-graph matching module and for target classification branches. Furthermore, since the positions and sizes of different organs are closely related, the invention uses learnable spatial encodings of pixel coordinates to characterize target locations. In this way, modeling the geometric correlations between organs with the self-attention mechanism of the Transformer network ensures robust detection of all target structures from the 2D fused projection images. The specific steps are as follows:
S31, feature extraction of 2D fused projection image
Send each fused image in the coronal and sagittal directions to a CNN for image feature extraction; after dimension-reduction mapping into a sequence, input the features into the Transformer together with the prediction result of the CGM.
S32, constructing a Transformer model for one-to-one target query
Send the feature sequence extracted by the CNN, together with its corresponding spatial position encoding, to the encoder module of the Transformer for global correlation computation. For the decoder module, N target queries are used for the N organs to be detected. Both the N target queries and the encoder output serve as decoder inputs, and under the combined effect of the encoder and the CGM prediction, each projection image yields N sets of 2D bounding boxes.
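A minimal sketch of this architecture under assumed hyperparameters is given below; it follows the DETR pattern with the bipartite matching and classification head removed, and omits the CGM constraint channel for brevity:

```python
import torch
import torch.nn as nn
import torchvision

class OneToOneQueryDetector(nn.Module):
    def __init__(self, n_organs, d_model=256):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # conv features
        self.input_proj = nn.Conv2d(2048, d_model, kernel_size=1)       # reduce dims
        self.pos_embed = nn.Parameter(torch.randn(1, d_model, 16, 16))  # learnable 2D positions
        self.transformer = nn.Transformer(d_model, nhead=8,
                                          num_encoder_layers=6, num_decoder_layers=6,
                                          batch_first=True)
        self.queries = nn.Embedding(n_organs, d_model)   # one query per organ, fixed order
        self.box_head = nn.Linear(d_model, 4)            # (cx, cy, w, h), normalized

    def forward(self, fused_image):
        # fused_image: (B, 3, H, W) fused projection, grayscale replicated to 3 channels
        f = self.input_proj(self.backbone(fused_image))  # (B, d, h, w)
        pos = nn.functional.interpolate(self.pos_embed, size=f.shape[-2:])
        src = (f + pos).flatten(2).transpose(1, 2)       # (B, h*w, d) feature sequence
        tgt = self.queries.weight.unsqueeze(0).expand(len(f), -1, -1)
        out = self.transformer(src, tgt)                 # (B, N, d)
        return self.box_head(out).sigmoid()              # N 2D boxes, one per organ
```

Because the i-th output always answers the i-th query, the query index itself names the organ, which is what removes the matching and classification machinery.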
S4, three-dimensional back projection of two-dimensional projection image detection result
Using the improved Transformer model, steps S2 and S3 are performed on the coronal and sagittal projection fusion images respectively; the output is the 2D bounding box of every target organ in each projection view, expressed as follows:
Let x, y, and z be the pixel coordinates of the three spatial dimensions. The bounding box from the coronal projection is (x_min, x_max, z_min, z_max), and the bounding box from the sagittal projection is (y_min, y_max, z_min, z_max). Finally, the coronal and sagittal bounding boxes are back-projected to obtain the three-dimensional organ bounding box (x_min, x_max, y_min, y_max, z_min, z_max).
In conclusion, the method is suitable for automatic localization and detection of multiple organs in multi-modal medical images; it mainly exploits the geometric correlations between organs, jointly modeled by the conditional Gaussian model and the Transformer, and the information complementarity of the multi-modal images to localize organs efficiently and accurately.
The beneficial effects of the invention are as follows: the method uses a conditional Gaussian model and the self-attention mechanism of a Transformer network to model the correlations of organ position and size. The one-to-one target query architecture enforces a unique target query for each target organ, with no bipartite-graph matching; the order of the queries encodes the predicted category, so no classification is needed, simplifying the network structure, reducing redundant computation, and speeding learning convergence. Before organ detection is performed, the 3D multi-modal images are projected onto two orthogonal 2D planes, complementary information from the multi-modal images is combined through a multi-modal fusion method, and finally the resulting 2D bounding boxes are back-projected to obtain the 3D bounding box, reducing the computational burden and yielding a more stable organ localization result.
Drawings
Fig. 1 is a flow chart of the positioning method of the present invention.
FIG. 2 is a schematic diagram illustrating a three-dimensional bounding box of a multi-organ, using a bimodal PET/CT image as an example; wherein (a) is a coronal plane, (b) is a sagittal plane, and (c) is a transverse plane.
FIG. 3 is a diagram illustrating a two-dimensional projection and image fusion process using a bimodal PET/CT image as an example.
FIG. 4 is a flow chart illustrating the multi-organ positioning method according to the present invention by taking a bimodal PET/CT image as an example.
Detailed Description
The multi-modal medical image multi-organ localization method based on a one-to-one target query Transformer is shown in fig. 1; its purpose is to determine the three-dimensional bounding boxes of multiple human organs as shown in fig. 2. The invention is further explained below with reference to a specific embodiment.
S1, two-dimensional projection and image fusion of three-dimensional multi-modal images
S11, preprocessing data
First, select the finest voxel size among the modality images and resample all modalities to it; then center-crop the resampled images so that spatial resolution and image size are consistent across the data set.
S12, two-dimensional projection and gray level normalization of multi-modal image
To reduce computation and lower the difficulty of localization, each modality is projected in a manner that preserves its image characteristics, yielding a two-dimensional coronal projection and a sagittal projection. Specifically:
For functional modality images (e.g., PET) among the multi-modal images, maximum intensity projection is selected to keep highly metabolic organ tissue prominent. However, a projection obtained directly by MIP is too dark overall because of strongly highlighted organs (such as the bladder); for accurate detection it must be gray-level normalized, specifically with a normalization window [g(p1), g(p2)], where g(p1) and g(p2) are gray thresholds that exclude the lowest p1 percent and the highest p2 percent of pixels in the image. Values of p1 and p2 found through experiment yield robust performance.
For anatomical modality images (e.g., CT images) in multi-modality images, a grayscale normalization window with Hounsfield Units (HU) of [ -1000,1000] is used. Experiments show that the algorithm is not sensitive to the CT gray scale window.
S13 image fusion of multi-modal projection
Normalize the projection image of each modality to the gray-level range [0, 255], then perform multi-modal information fusion according to the fused-image formula:

I_fused = Σ_u α_u·I_u    (1)

where α_u is a weight factor with Σ_u α_u = 1, and I_u is the projection view of the image of the u-th modality.
Taking a bimodal PET/CT image as an example, experiments show that the weight factor for PET yields consistent performance over the range [0.3, 0.7]; the projection views are therefore fused with equal weights of 0.5, generating fused images for the coronal and sagittal views respectively, as shown in fig. 3.
S2, constructing a coarse prediction model of position correlation between organs by CGM
A key step in organ localization in multi-modal images is statistical modeling of the correlations of position and shape between organs.
First, a data set of 3D organ bounding boxes must be acquired. The 3D bounding boxes are manually marked in the multi-modal images by a doctor; the upper and lower bounds of their x, y, and z coordinates are recorded and then converted into the center-point coordinates and the length, width, and height of each 3D bounding box, (x, y, z, l_x, l_y, l_z). The data characterizing organ position and size in image space are then normalized to model space by the generalized Procrustes transform. Principal component analysis is next performed in model space, selecting the principal components that account for more than 90% of the total variation of the data set, which completes the PCA-based statistical modeling of the organ bounding boxes.
The steps of the invention are specifically described by taking the prediction of an unknown organ from the known torso region as an example. The 6 feature values of each training sample, namely the center-point coordinates and the length, width, and height of the three-dimensional bounding box, form the training set (m × n, where m is the number of training samples and n = 6).
When predicting organs from the torso region, normalize the torso training set to obtain the normalized training set X (m × n) and perform PCA on it to obtain the eigenvector matrix Φ_s (n × k). Then, according to b_s = Φ_s^T (s_i − s̄), substitute the eigenvector matrix Φ_s obtained by PCA on the normalized training set and the normalized training set X to obtain the shape parameter b_s (k × 1) of each training sample. Next, normalize the organ bounding-box training set (for each training sample, subtract the mean of the corresponding torso region from the feature values and divide by the standard deviation of the torso region) and, as in the previous step, obtain the shape parameter b_s (k × 1) of each training sample's organ bounding box.
Substitute the shape parameters of the torso region and the organ boxes into the CGM equations (4)–(6), obtaining the shape parameter b_s of the organ bounding box predicted from the torso region. According to the shape model s = s̄ + Φ_s b_s, substitute the normalized mean and the eigenvector matrix of the organ bounding-box training set to obtain s, then inverse-normalize to physical space using the mean and standard deviation of the torso region to obtain the final prediction.
The test set does not overlap the training set, but the test data are processed with the same standardization as the training data. The torso region of each test case is used to predict the three-dimensional bounding boxes of the unknown organs, and two-dimensional projection finally yields coarse predictions of the 2D bounding boxes in the coronal and sagittal projections.
S3 organ positioning method based on one-to-one target query Transformer network
The Transformer model in the invention is constructed by improving the DETR network.
Unlike natural object detection, organs detected in medical images obey strong anatomical constraints. In most cases, all anatomical structures of interest are present in each target image, and there is only one target object per structure. We exploit this anatomical constraint to simplify the network architecture: given N organs to be detected, only N target queries are defined and each query is assigned to one target organ, so no bipartite-graph matching strategy is needed. The order of the queries naturally corresponds to the organ categories, so no additional classification is required. The specific detection process is as follows:
the fused image is firstly sent into a CNN network for image feature extraction, a ResNet-50 structure is selected, and in experiments of CNN layers with different numbers, the best performance is generated in 50 layers. The image features are subjected to dimension reduction mapping to be a sequence and then are added with corresponding position codes to be used as input of an encoder module in a Transformer part, and in addition, the organ coarse prediction result of the CGM can also be used as a channel input to be used for subsequent target position and size constraint.
The self-attention layers in the encoder aggregate information from every element of the input sequence and update each element, enabling global correlation computation; this suits long sequences and plays an important role.
In addition, at the input part of the decoder module, only N target queries are defined, each query is assigned to correspond to one target organ, the sequence of each query corresponds to the category of each organ, and under the comprehensive action of the decoder and the CGM prediction result, N outputs are obtained, corresponding to the positions of N organs.
Modeling the geometric correlation between organs jointly by means of the attention mechanism of the Transformer network and CGM ensures robust detection of all target structures from the 2D fused projection images.
S4, three-dimensional back projection of two-dimensional projection image detection result
The same detection procedure is performed on the coronal and sagittal projection fusion images; the outputs of the Transformer model are the 2D bounding boxes of all target organs in the two fused projections. The coronal projection yields the length and height of the three-dimensional bounding box, and the sagittal projection yields its width and height. The maximum extent of the heights from the two projection planes is then taken, and the detection results of the two views are back-projected to obtain the final three-dimensional organ bounding box (x_min, x_max, y_min, y_max, z_min, z_max).
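A minimal sketch of this back-projection step; the box layouts follow the coronal (x, z) and sagittal (y, z) conventions stated above:

```python
def backproject(cor_box, sag_box):
    """cor_box = (x_min, x_max, zc_min, zc_max); sag_box = (y_min, y_max, zs_min, zs_max)."""
    x_min, x_max, zc_min, zc_max = cor_box
    y_min, y_max, zs_min, zs_max = sag_box
    # take the maximum z extent seen in either projection plane
    z_min, z_max = min(zc_min, zs_min), max(zc_max, zs_max)
    return (x_min, x_max, y_min, y_max, z_min, z_max)
```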
FIG. 4 is a flowchart of a method for multi-organ localization based on a one-to-one object queried Transformer network, which is illustrated by taking a bimodal PET/CT image as an example.

Claims (5)

1. A multi-modal medical image multi-organ localization method based on one-to-one target query Transformer, characterized by comprising the following steps:
S1, two-dimensional projection and image fusion of three-dimensional multi-modal images
S11, preprocessing data
Carrying out normalization preprocessing on the spatial resolution and the image size of the data set;
S12, two-dimensional projection and gray level normalization of multi-modal image
S121, projecting each mode image in a projection mode of keeping image characteristics to obtain a two-dimensional coronal projection and a two-dimensional sagittal projection;
S122, selecting an appropriate threshold range for the projection image of each modality to normalize the image gray levels to the range [0, 255];
S13 image fusion of multi-modal projection
Fusing each projection view together to generate two fused images of coronal and sagittal views, respectively; the fused image calculation formula is as follows:
I_fused = Σ_u α_u·I_u    (1)

where α_u is a weight factor with Σ_u α_u = 1, and I_u is the projection view of the image of the u-th modality;
S2, constructing a coarse prediction model of position correlation between organs by CGM
S21, constructing shape vector of organ
The center-point coordinates and the length, width, and height of the three-dimensional organ bounding boxes form the training set; for each training sample i in the training set, construct a shape vector s_i = (x, y, z, l_x, l_y, l_z);
S22 Generalized Procrustes transform
Normalizing the shape vector of each training sample from image space to model space through a generalized Procrustes transform, and performing subsequent operations in model space;
S23 principal component analysis
Taking all s_i after the generalized Procrustes transform as the training set, model the variation of organ position and size with a statistical shape model:

s = s̄ + Φ_s b_s    (2)

where s̄ is the mean shape of the training set, i.e., the mean of the features of the selected training set; Φ_s is the eigenvector matrix obtained by principal component analysis of the training set (the subscript s denotes shape), whose k eigenvectors represent k modes of variation of organ position and size learned from the training set; b_s is the shape parameter, whose k elements are the weights of the k deformation modes of Φ_s superimposed on the mean shape of the training set; s is the resulting shape model, and adjusting the value of b_s controls the model deformation;

the shape parameter of each training sample is recovered by inverting equation (2):

b_s = Φ_s^T (s_i − s̄)    (3)

where Φ_s^T is the transpose of the eigenvector matrix obtained by principal component analysis of the training set;
s24 construction of conditional Gaussian model
Let b_s^A and b_s^B denote the shape parameters of two adjacent organs A and B, respectively; when b_s^A is known, the conditional Gaussian model CGM of b_s^B is described by equations (4) to (6):

p(b_s^B | b_s^A) = N(μ_{B|A}, Σ_{B|A})    (4)

μ_{B|A} = μ_B + Σ_{BA} Σ_{AA}^{-1} (b_s^A − μ_A)    (5)

Σ_{B|A} = Σ_{BB} − Σ_{BA} Σ_{AA}^{-1} Σ_{AB}    (6)

where μ_{B|A} is the mean of the shape parameters of the organ to be predicted, obtained from the conditional Gaussian model, with dimension k × 1; Σ_{B|A} is the covariance matrix of the shape parameters of the organ to be predicted, with dimension k × k; b_s^A and b_s^B are the shape parameters of each training sample, with dimension k × 1 and training-set means μ_A and μ_B; Σ_{AB} and Σ_{BA} are the cross-covariance matrices between b_s^A and b_s^B, with dimension k × k; Σ_{AA} and Σ_{BB} are the covariance matrices of b_s^A and b_s^B in the training set, with dimension k × k; p(b_s^B | b_s^A) is the distribution of b_s^B when b_s^A is known; and N is a Gaussian distribution;
first substituting the shape parameters of the organ to be predicted obtained by the CGM into the shape model s = s̄ + Φ_s b_s to obtain the result in model space; then performing inverse normalization to the real physical space of the image to obtain the final prediction result;
S3 organ positioning method based on one-to-one target query Transformer network
S31, feature extraction of 2D fused projection image
Sending each fusion image in the coronal and sagittal directions to a CNN network for image feature extraction, and inputting the fusion images into a Transformer together with a prediction result of CGM after dimension reduction mapping into a sequence;
S32, constructing a Transformer model for one-to-one target query
Sending the characteristic sequence extracted by the CNN and the space position code corresponding to the characteristic sequence to an encoder module in a Transformer for global correlation calculation; for the decoder module, N target queries are respectively used for N organs to be detected; the N target queries and the encoder output are used as the input of a decoder, and under the comprehensive action of the encoder and CGM prediction, each projection image obtains N groups of 2D bounding boxes;
S4, three-dimensional back projection of two-dimensional projection image detection result
The steps of S2 and S3 are performed on the coronal and sagittal projection fusion images respectively, and the output of the Transformer model is the 2D bounding box of every target organ in each projection view, expressed as follows:
let x, y, and z be the pixel coordinates of the three spatial dimensions; the bounding box of the coronal projection is (x_min, x_max, z_min, z_max), and the bounding box of the sagittal projection is (y_min, y_max, z_min, z_max); finally, the coronal and sagittal bounding boxes are back-projected to obtain the three-dimensional organ bounding box (x_min, x_max, y_min, y_max, z_min, z_max).
2. The multi-modal medical image multi-organ localization method based on one-to-one target query Transformer according to claim 1, wherein the data preprocessing in step S11 is performed as follows:
first, resample the multi-modal images to a uniform voxel size according to the finest spatial resolution among the modalities; then choose an image size and center-crop the resampled images, removing surrounding background pixels while preserving the human body region.
3. The multi-modal medical image multi-organ localization method based on one-to-one target query Transformer according to claim 1 or 2, wherein in step S121 the projection manner for the modality images is specifically as follows: for functional modality images, maximum intensity projection is selected to ensure good contrast of highly metabolic organs; for anatomical modality images, average intensity projection is selected to maintain tissue contrast.
4. The multi-modal medical image multi-organ localization method based on one-to-one target query Transformer according to claim 1 or 2, wherein in step S122 the threshold range specifically uses a normalization window [g(p1), g(p2)], where g(p1) and g(p2) are gray thresholds that exclude the lowest p1 percent and the highest p2 percent of pixels in the image.
5. The multi-modal medical image multi-organ localization method based on one-to-one target query Transformer according to claim 3, wherein in step S122 the threshold range specifically uses a normalization window [g(p1), g(p2)], where g(p1) and g(p2) are gray thresholds that exclude the lowest p1 percent and the highest p2 percent of pixels in the image.
CN202210030228.XA 2022-01-12 2022-01-12 Multi-modal medical image multi-organ positioning method based on one-to-one target query Transformer Pending CN114359642A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210030228.XA CN114359642A (en) 2022-01-12 2022-01-12 Multi-modal medical image multi-organ positioning method based on one-to-one target query Transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210030228.XA CN114359642A (en) 2022-01-12 2022-01-12 Multi-modal medical image multi-organ positioning method based on one-to-one target query Transformer

Publications (1)

Publication Number Publication Date
CN114359642A true CN114359642A (en) 2022-04-15

Family

ID=81108548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210030228.XA Pending CN114359642A (en) 2022-01-12 2022-01-12 Multi-modal medical image multi-organ positioning method based on one-to-one target query Transformer

Country Status (1)

Country Link
CN (1) CN114359642A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114792315A (en) * 2022-06-22 2022-07-26 浙江太美医疗科技股份有限公司 Medical image visual model training method and device, electronic equipment and storage medium
CN114792315B (en) * 2022-06-22 2022-10-11 浙江太美医疗科技股份有限公司 Medical image visual model training method and device, electronic equipment and storage medium
CN114862881A (en) * 2022-07-11 2022-08-05 四川大学 Cross-modal attention tumor segmentation method and system based on PET-CT
CN115311258A (en) * 2022-09-15 2022-11-08 佛山读图科技有限公司 Method and system for automatically segmenting organs in SPECT (single photon emission computed tomography) plane image
CN115861303A (en) * 2023-02-16 2023-03-28 四川大学 EGFR gene mutation detection method and system based on lung CT image
CN115861303B (en) * 2023-02-16 2023-04-28 四川大学 EGFR gene mutation detection method and system based on lung CT image

Similar Documents

Publication Publication Date Title
US10346986B2 (en) System and methods for image segmentation using convolutional neural network
CN111798462B (en) Automatic delineation method of nasopharyngeal carcinoma radiotherapy target area based on CT image
CN114503159A (en) Three-dimensional object segmentation of medical images localized by object detection
US9947102B2 (en) Image segmentation using neural network method
EP3365869B1 (en) System and method for image registration in medical imaging system
CN114359642A (en) Multi-modal medical image multi-organ positioning method based on one-to-one target query Transformer
US9760983B2 (en) System and method for image registration in medical imaging system
US7876938B2 (en) System and method for whole body landmark detection, segmentation and change quantification in digital images
US7916919B2 (en) System and method for segmenting chambers of a heart in a three dimensional image
CN113516659B (en) Medical image automatic segmentation method based on deep learning
CN111640120A (en) Pancreas CT automatic segmentation method based on significance dense connection expansion convolution network
JP2019114262A (en) Medical image processing apparatus, medical image processing program, learning apparatus and learning program
KR20230059799A (en) A Connected Machine Learning Model Using Collaborative Training for Lesion Detection
CN112634265B (en) Method and system for constructing and segmenting fully-automatic pancreas segmentation model based on DNN (deep neural network)
CN114693933A (en) Medical image segmentation device based on generation of confrontation network and multi-scale feature fusion
Sokooti et al. Hierarchical prediction of registration misalignment using a convolutional LSTM: Application to chest CT scans
CN115830016A (en) Medical image registration model training method and equipment
CN114693671A (en) Lung nodule semi-automatic segmentation method, device, equipment and medium based on deep learning
Gleason et al. A new deformable model for analysis of X-ray CT images in preclinical studies of mice for polycystic kidney disease
Zhou et al. Learning stochastic object models from medical imaging measurements using Progressively-Growing AmbientGANs
CN115861464A (en) Pseudo CT (computed tomography) synthesis method based on multimode MRI (magnetic resonance imaging) synchronous generation
Erdt et al. Computer aided segmentation of kidneys using locally shape constrained deformable models on CT images
Zhou et al. Learning stochastic object models from medical imaging measurements by use of advanced ambientgans
Chourak et al. Voxel-wise analysis for spatial characterisation of Pseudo-CT errors in MRI-only radiotherapy planning
CN115409837B (en) Endometrial cancer CTV automatic delineation method based on multi-modal CT image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination