CN112150524B - Two-dimensional and three-dimensional medical image registration method and system based on deep learning

Info

Publication number
CN112150524B (application CN202011048036.9A)
Authority
CN
China
Prior art keywords: dimensional, array, convolution, layer, image
Legal status: Active
Application number: CN202011048036.9A
Other languages: Chinese (zh)
Other versions: CN112150524A
Inventors: 王磊 (Wang Lei), 杨瑞 (Yang Rui), 李彦泽 (Li Yanze), 张烨 (Zhang Ye), 陈志远 (Chen Zhiyuan), 刘修恒 (Liu Xiuheng)
Current Assignee: Wuhan University (WHU)
Original Assignee: Wuhan University (WHU)
Application filed by Wuhan University (WHU)
Priority to CN202011048036.9A
Publication of CN112150524A
Application granted; publication of CN112150524B

Classifications

    • G06T7/337 Determination of transform parameters for the alignment of images, i.e. image registration, using feature-based methods involving reference images or patches
    • G06T7/10 Segmentation; Edge detection
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G06T2207/10088 Magnetic resonance imaging [MRI]
    • G06T2207/10104 Positron emission tomography [PET]
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30096 Tumor; Lesion


Abstract

The invention discloses a two-dimensional and three-dimensional medical image registration method and system based on deep learning. A deep learning network segments the target lesion in a three-dimensional medical image, and the segmented result is reconstructed into a three-dimensional model. Guided by clinical prior information, edge information of the lesion is extracted from the reconstructed three-dimensional model to obtain a set of binary images. The region the doctor needs to observe is captured with a camera, the edge of the target lesion is computed with a deep learning network, and a further binary image is obtained. The binary images from the two sources are registered to obtain a transformation matrix between the two types of images, and this matrix is applied to the original image to complete the registration. The method gives the doctor a pair of real-time "see-through eyes" during diagnosis and treatment, accomplishes the medical image registration task more effectively, makes fuller use of the patient's imaging examination information, and assists clinical treatment.

Description

Two-dimensional and three-dimensional medical image registration method and system based on deep learning
Technical Field
The invention belongs to the field of medical image registration. It mainly applies deep learning technology, combined with clinical prior knowledge, to perform multi-modal, multi-dimensional registration of medical images.
Background
In the medical field, imaging examinations provide information about lesions inside the body, such as their morphology, size, presence or absence of a capsule, and blood supply. By reading the images and combining the information they provide with his or her own knowledge, the doctor forms a diagnosis and treatment plan for the patient (for example, an operation). During an operation or treatment, however, the doctor cannot see deep structures through the superficial tissues and can only approach the lesion layer by layer, which demands solid theoretical knowledge and surgical experience. If the imaging data containing the three-dimensional information of the patient's lesion could be processed and projected into the field of view the doctor sees during diagnosis and treatment, the information provided by imaging could be fully exploited to help the doctor "see through" to the lesion in real time, identify variant tissues and blood vessels more accurately, and judge the distance to the lesion; this would better assist the doctor in performing the operation and treatment.
Medical image registration technology can help doctors read medical images of different modalities more effectively and make fuller use of image information, and is therefore of great clinical significance for improving the diagnosis and treatment of disease; it has accordingly received extensive attention. Patent 202010205018.0 discloses a medical image registration method, medical device and storage medium that focuses on the registration of blood vessels in medical images. Patent 201811010818.6 discloses a method, system, storage medium and registration device for registering two-dimensional and three-dimensional medical images, which performs registration by determining sampling points in the two-dimensional and three-dimensional images and calculating a registration matrix. Patent 201710844174.X discloses an image similarity measure and an image registration method based on shape information: it first presents a new shape-based similarity measure that is better suited to graph-based image set registration, and on this basis the authors develop a new registration method. However, the existing work has the following shortcomings: 1. clinical prior information is not fully utilized; 2. clinical workflow is not fully taken into account. Medicine is highly specialized, and a large amount of clinical prior information can guide the construction of models and the optimization of methods, yet none of the above inventions makes use of clinical prior knowledge. Patent 201811010818.6 also requires sampling points to be determined when performing registration, which undoubtedly increases the difficulty of clinical application.

Deep learning has achieved remarkable results in image classification, segmentation and detection, and its application in the medical field is growing; during the recent COVID-19 epidemic, deep-learning-based systems for automatic detection of COVID-19 pneumonia were deployed in many hospitals. The invention patent application 201910676893.4 discloses a deep-learning-based magnetic resonance image segmentation system and method for hepatocellular carcinoma, application 201910237004.4 discloses a deep-learning-based automatic segmentation system for pancreatic neuroendocrine tumors, and application 201910787974.4 applies a deep-learning-based liver tumor segmentation method for CT images. These approaches, however, suffer from low registration precision and either cannot project an image containing the three-dimensional information of the lesion onto a two-dimensional image or do so poorly. Differing from the above inventions and from prior similar inventions, the present invention extracts image features with deep learning, uses clinical prior knowledge to reduce the amount of computation, and discloses a method dedicated to registering three-dimensional and two-dimensional medical images.
Disclosure of Invention
The present invention addresses a new clinical problem and specifies a concrete way of solving it. Specifically, a deep learning network is used to segment the target lesion in a three-dimensional medical image, and the segmented result is then reconstructed. Guided by clinical prior information, edge information of the lesion is extracted from the reconstructed three-dimensional model to obtain a set of binary images. The region the doctor needs to observe is captured with a camera, the edge of the target lesion is computed with a deep learning network, and a further binary image is obtained. The binary images from the two sources are registered to obtain a transformation matrix between the two types of images, and the transformation matrix is applied to the original image, completing the image registration. The method gives the doctor a pair of real-time "see-through eyes" during diagnosis and treatment; the invention accomplishes the medical image registration task more effectively, makes fuller use of the patient's imaging examination information, and assists clinical treatment.
Step 1, constructing a deep-learning-based three-dimensional image segmentation network to obtain segmentation results of the lesion site and the lesion tumor on the three-dimensional image. The network comprises 5 segmentation modules of identical structure, each consisting of a three-dimensional encoder and a three-dimensional decoder. The three-dimensional encoder proceeds as follows: the input three-dimensional matrix undergoes two convolution operations and a first pooling operation to obtain array one, then two further convolutions and a second pooling operation to obtain array two, and finally two more convolutions to produce array three. The three-dimensional decoder proceeds as follows: array three produced by the encoder first undergoes a convolution and a first deconvolution to obtain array four; array four is merged with the corresponding array two from the encoder, followed by two convolutions and a second deconvolution to obtain array five; array five is merged with the corresponding array one from the encoder and passed through two further convolutions, after which the dimension of size 1 is removed, yielding an array of the same shape as the input. A Relu operation follows every convolution operation;
Step 2, based on the segmentation results of the lesion site and the lesion tumor from step 1, reconstructing a three-dimensional model of the lesion site and the lesion tumor, displaying them in different colors with a certain transparency, acquiring two-dimensional images (image1) containing the relative positions of the lesion site and lesion tumor at different distances and from different angles, and binarizing image1 to obtain the corresponding binary image, called image1a;
Step 3, constructing a deep-learning-based two-dimensional image segmentation network to obtain segmentation results of the lesion site and the lesion tumor on the two-dimensional image. The network consists of a two-dimensional encoder with 8 convolution layers and a two-dimensional decoder with 8 deconvolution layers, connected by skip connections; a Relu operation follows every convolution operation;
Step 4, denoting the segmentation result produced in step 3 as image2 and its binary image as image2a: register each image1a to image2a, compute the Dice coefficient and the transformation matrix F for every registered pair, take the transformation matrix F that yields the maximum Dice, apply F to transform image1, adjust the transparency of the transformed image and superimpose it on image2, thereby visually displaying the relative positions of the lesion site and the lesion tumor on the two-dimensional image.
Further, the three-dimensional image includes CT, MRI and PET/CT imaging containing three-dimensional tomographic information of the lesion.
Further, the two-dimensional image includes endoscopic images or images captured by an ordinary camera.
Furthermore, the convolution kernels in the convolution layers of the three-dimensional encoder are all of size 3 × 3 with step size 1, and the pooling-like operation is realized by a convolution with kernel size 2 × 2 and step size 2; the deconvolution kernels in the three-dimensional decoder are of size 2 × 2 with step size 2;
the specific processing procedure of each segmentation module in the step 1 is as follows;
when a three-dimensional matrix with the size of (255, 255 and 24) is input, performing convolution operation with the convolution kernel size of (3 × 3), the step length of 1 and the number of convolution kernels of 16 for one time, complementing 0 to the surrounding pixel number to ensure that the size of the array after convolution is consistent with the size before convolution, and performing Relu operation;
performing convolution operation with convolution kernel size of (3 × 3), step size of 1 and convolution kernel number of 32 to generate array size of (255, 255, 24, 32), and performing Relu operation;
then, performing a first class pooling operation, wherein the size of the convolution kernel is (2 x 2), the step size is 2, and the size of the generated array one is (128, 128, 12, 32);
performing Relu operation on the generated 32 three-dimensional data arrays through a convolution layer with the convolution kernel size of (3 x 3), the step size of 1 and the convolution kernel size of 64;
performing Relu operation on the convolution layer with the convolution kernel size of (3 x 3), the step size of 1 and the convolution kernel size of 128; generating an array of (128, 128, 12, 128);
then, after a second class pooling operation, still adopting the convolution kernel with the size of (2 x 2) and the step size of 2 to generate an array two with the shape of (64, 64, 6, 128);
then, the 128 three-dimensional arrays pass through the convolution layer with the convolution kernel size of (3 x 3), the step size of 1 and the convolution kernel of 128 to carry out Relu operation;
performing Relu operation on the convolution layer with the convolution kernel size of (3 x 3), the step size of 1 and the convolution kernel of 256;
the resulting array three is a (64, 64, 6, 256) shape, at which point the three-dimensional encoder portion is completed, followed by the three-dimensional decoder portion;
performing Relu operation on the convolution layer with the size of the convolution kernel of the array three inputs being (3 × 3), the step size being 1 and the convolution kernel being 128 to generate a three-dimensional array shape being (64, 64, 6, 128);
after one deconvolution, the size of a deconvolution kernel is (2 x 2), the step size is 2, Relu operation is carried out, and array four is generated to be (128, 128, 12, 128);
merging the generated array four and the corresponding array two in the encoder to generate an array shape of (128, 128, 12, 256), and then sequentially carrying out Relu operation on the merged array through a convolution layer with a convolution kernel of (3 x 3), a step size of 1 and a convolution kernel of 62;
performing Relu operation on the convolution layer with convolution kernel of (3 × 3), step size of 1 and convolution kernel of 32 to obtain an array shape of (128, 128, 12, 32);
after the second deconvolution operation, the size of the adopted deconvolution kernel is (2 x 2), the step size is 2, Relu operation is carried out, and an array five formed as (256, 256, 24, 32) is generated;
combining the array five with the corresponding array one in the encoder to generate an array shape (256, 256, 24, 64); performing Relu operation on the convolution layer with convolution kernel of (3 x 3), step length of 1 and convolution kernel of 16;
and (3 × 3), the step size is 1, the convolution layer with the convolution kernel of 1 is subjected to Relu operation, an array of (256, 256, 24, 1) is generated, and the dimension of 1 is removed, so that the array of (256, 256, 24) with the same shape and input is obtained.
Further, the loss function of the deep-learning-based three-dimensional image segmentation network is constructed as follows,
L = λ1*Dice1 + λ2*Dice2 + λ3*Dice3 + λ4*Dice4 + λ5*Dice5
where λi denotes the hyperparameter weighting each loss term and Dicei denotes the Dice correlation coefficient between the segmentation result of the corresponding segmentation module and the ground-truth segmentation; the Dice correlation coefficient is a set similarity measure used to compute the overlap of two sets, with value range [0, 1] — the closer to 1 the greater the overlap, the closer to 0 the smaller the overlap. Dice is defined as follows:
Dice = 2|X∩Y| / (|X| + |Y|)
where |X∩Y| is the intersection of X and Y, and |X| and |Y| denote the number of elements of X and Y, respectively.
Further, λ1 = 1, λ2 = 2, λ3 = 4, λ4 = 8, λ5 = 10.
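As an illustration only, this deeply supervised loss could be written as below. TensorFlow is an editorial assumption, the weights follow the values just given, and, since the text does not spell out a sign convention, the sketch minimises the weighted sum of 1 − Dice, a common way of turning a Dice score into a loss.

```python
# Hedged sketch of the deeply supervised Dice loss with the weights given above.
import tensorflow as tf

LAMBDAS = [1.0, 2.0, 4.0, 8.0, 10.0]          # lambda_1 .. lambda_5 for segmentation modules 1..5

def dice_coefficient(y_true, y_pred, eps=1e-6):
    """Dice = 2|X ∩ Y| / (|X| + |Y|), computed on flattened soft masks."""
    y_true = tf.reshape(y_true, [-1])
    y_pred = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true * y_pred)
    return (2.0 * intersection + eps) / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + eps)

def deep_supervision_loss(y_true, module_outputs):
    """module_outputs: the five segmentation maps produced by the cascade, shallow to deep."""
    return sum(lam * (1.0 - dice_coefficient(y_true, y_pred))
               for lam, y_pred in zip(LAMBDAS, module_outputs))
```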
Further, in the two-dimensional encoder, large 5 × 5 convolution kernels are used in the first two convolution layers to extract common features over a larger receptive field, and small 3 × 3 convolution kernels are used in the following 6 convolution layers to refine the extracted features and ensure a high-precision segmentation result;
The structure of the two-dimensional encoder in step 3 is as follows:
the image input layer has size (256, 256, 3); after the first convolution layer (5 x 5, 1, 8) and a Relu layer, the array shape is (252, 252, 8);
after the second convolution layer (5 x 5, 1, 16) and a Relu layer, the array shape is (248, 248, 16);
after the third convolution layer (3 x 3, 1, 32) and a Relu layer, the array shape is (246, 246, 32);
after the fourth convolution layer (3 x 3, 1, 64) and a Relu layer, the array shape is (244, 244, 64);
after the fifth convolution layer (3 x 3, 1, 128) and a Relu layer, the array shape is (242, 242, 128);
after the sixth convolution layer (3 x 3, 1, 256) and a Relu layer, the array shape is (240, 240, 256);
after the seventh convolution layer (3 x 3, 1, 512) and a Relu layer, the array shape is (238, 238, 512);
after the eighth convolution layer (3 x 3, 1, 1024) and a Relu layer, the array shape is (236, 236, 1024);
the decoder part follows, with the structure below:
after the first deconvolution layer (3 x 3, 1, 512) and a Relu layer, the array shape is (238, 238, 512); this array is concatenated with the array (238, 238, 512) produced by the seventh convolution layer, giving an array of shape (238, 238, 1024);
the array then passes through the second deconvolution layer (3 x 3, 1, 256) and a Relu layer, and the output is concatenated with the array (240, 240, 256) produced by the sixth convolution layer, giving an array of shape (240, 240, 512);
the array passes through the third deconvolution layer (3 x 3, 1, 128) and a Relu layer, and the output is concatenated with the array (242, 242, 128) produced by the fifth convolution layer, giving an array of shape (242, 242, 256);
the array passes through the fourth deconvolution layer (3 x 3, 1, 64) and a Relu layer, and the output is concatenated with the array (244, 244, 64) produced by the fourth convolution layer, giving an array of shape (244, 244, 128);
the array passes through the fifth deconvolution layer (3 x 3, 1, 32) and a Relu layer, and the output is concatenated with the array (246, 246, 32) produced by the third convolution layer, giving an array of shape (246, 246, 64);
the array passes through the sixth deconvolution layer (3 x 3, 1, 16) and a Relu layer, and the output is concatenated with the array (248, 248, 16) produced by the second convolution layer, giving an array of shape (248, 248, 32);
the array passes through the seventh deconvolution layer (5 x 5, 1, 8) and a Relu layer, and the output is concatenated with the array (252, 252, 8) produced by the first convolution layer, giving an array of shape (252, 252, 16);
finally, the array passes through the eighth deconvolution layer (5 x 5, 1, 3), the output is added to the input array (256, 256, 3), and the result passes through a sigmoid layer to give the output.
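The 2D encoder-decoder above could be wired up as in the following sketch. tf.keras, 'valid' (no-padding) convolutions and the final residual add against the (256, 256, 3) input are editorial assumptions consistent with the text; kernel sizes, filter counts and the mirror-image skip concatenations follow the layer list above.

```python
# Hedged sketch of the 2D encoder-decoder described above.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_2d_segmentation_network(input_shape=(256, 256, 3)):
    inp = layers.Input(shape=input_shape)

    # Encoder: two 5x5 layers for a large receptive field, then six 3x3 layers.
    enc_specs = [(8, 5), (16, 5), (32, 3), (64, 3), (128, 3), (256, 3), (512, 3), (1024, 3)]
    skips, x = [], inp
    for filters, k in enc_specs:
        x = layers.Conv2D(filters, k, padding="valid", activation="relu")(x)
        skips.append(x)                                   # spatial sizes 252, 248, 246, ..., 236

    # Decoder: transposed convolutions restore the size; skips fuse shallow and deep features.
    dec_specs = [(512, 3), (256, 3), (128, 3), (64, 3), (32, 3), (16, 3), (8, 5)]
    for (filters, k), skip in zip(dec_specs, reversed(skips[:-1])):
        x = layers.Conv2DTranspose(filters, k, padding="valid", activation="relu")(x)
        x = layers.concatenate([x, skip])                 # mirror-image skip connection

    x = layers.Conv2DTranspose(3, 5, padding="valid")(x)  # back to (256, 256, 3)
    x = layers.add([x, inp])                              # add operation with the input image
    return Model(inp, layers.Activation("sigmoid")(x))
```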
Furthermore, the loss function of the deep-learning-based two-dimensional image segmentation network in step 3 is given by the following formula,
L = Dicef
where Dicef is the Dice correlation coefficient between the segmentation result of the two-dimensional image segmentation network and the ground-truth segmentation.
The invention also provides a two-dimensional and three-dimensional medical image registration system based on deep learning, which comprises the following modules:
the three-dimensional segmentation network construction module, used to construct a deep-learning-based three-dimensional image segmentation network that obtains segmentation results of the lesion site and the lesion tumor on the three-dimensional image; the network comprises 5 segmentation modules of identical structure, each consisting of a three-dimensional encoder and a three-dimensional decoder; the three-dimensional encoder proceeds as follows: the input three-dimensional matrix undergoes two convolution operations and a first pooling operation to obtain array one, then two further convolutions and a second pooling operation to obtain array two, and finally two more convolutions to produce array three; the three-dimensional decoder proceeds as follows: array three produced by the encoder first undergoes a convolution and a first deconvolution to obtain array four; array four is merged with the corresponding array two from the encoder, followed by two convolutions and a second deconvolution to obtain array five; array five is merged with the corresponding array one from the encoder and passed through two further convolutions, after which the dimension of size 1 is removed, yielding an array of the same shape as the input; a Relu operation follows every convolution operation;
the reconstruction module, used to reconstruct a three-dimensional model of the lesion site and the lesion tumor from their segmentation results, display them in different colors with a certain transparency, acquire two-dimensional images (image1) containing the relative positions of the lesion site and lesion tumor at different distances and from different angles, and binarize image1 to obtain the corresponding binary image, called image1a;
the two-dimensional segmentation network construction module, used to construct a deep-learning-based two-dimensional image segmentation network that obtains segmentation results of the lesion site and the lesion tumor on the two-dimensional image; the network consists of a two-dimensional encoder with 8 convolution layers and a two-dimensional decoder with 8 deconvolution layers, connected by skip connections; a Relu operation follows every convolution operation;
and the registration module, which denotes the segmentation result produced by the two-dimensional segmentation network construction module as image2 and its binary image as image2a, registers each image1a to image2a, computes the Dice coefficient and transformation matrix F for every registered pair, takes the transformation matrix F that yields the maximum Dice, applies F to transform image1, adjusts the transparency of the transformed image and superimposes it on image2, thereby visually displaying the relative positions of the lesion site and the lesion tumor on the two-dimensional image.
The invention also provides an application of the above two-dimensional and three-dimensional medical image registration method, characterized in that the registration method is used to register three-dimensional images with two-dimensional images acquired under gastroscopes, enteroscopes, gynecological hysteroscopes, laparoscopes, cystoscopes, ureteroscopes and thoracoscopes.
This patent is proposed, on the basis of a deep understanding of clinical medicine, deep learning technology and image registration technology, to address the current situation in which clinical imaging information is insufficiently exploited and doctors cannot make full use of imaging data during operations or procedures. Its scope of application is wide and its application prospects are broad: any procedure, treatment or operation that relies on preoperative imaging examinations can use this patent. It can help doctors intuitively grasp, during the operation, the morphological characteristics of the lesion, the distribution of surrounding blood vessels, and the relationship between the lesion and important surrounding structures, thereby improving surgical safety, shortening operation time, reducing intraoperative blood loss and raising the overall level of care.
Drawings
FIG. 1 is an overall flow chart of the present invention.
Fig. 2 is a diagram of a network for segmenting a CT image of a kidney according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the acquisition of binary images of the kidney and kidney tumor in an embodiment of the present invention. A shows the segmentation of the kidney and the kidney tumor on each CT image; B, the three-dimensional model of the kidney and kidney tumor constructed from the segmentation results; C, pictures of the three-dimensional model obtained from different angles; D, the binary images of the pictures in C.
FIG. 4 is a diagram illustrating a network for renal laparoscope image segmentation according to an embodiment of the present invention.
FIG. 5 illustrates laparoscopic kidney segmentation and binary-image acquisition in an embodiment of the present invention: A, laparoscopic picture of the kidney; B, segmented kidney picture; C, binary image of the kidney picture.
Fig. 6 is the final effect diagram of the embodiment of the present invention, i.e. the positional information of the kidney and the tumor from the two modalities displayed in an overlapping manner; the kidney tumor is the circled area in the figure.
Detailed Description
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
As shown in FIG. 1, the present invention can be divided into an experimental part and an application part. In the experimental part, a large amount of imaging data for a specific disease is collected and labeled, a convolutional neural network three-dimensional segmentation model for the specific organ is built, and its weights are obtained. A large amount of planar (two-dimensional) data as seen by the physician during procedures or surgery is likewise collected, a convolutional neural network two-dimensional segmentation model for the same organ is built, and its weights are obtained. In the application part, for a specific case and a specific disease, the three-dimensional and two-dimensional images after lesion segmentation are obtained with these models and weights. Then, according to clinical priors, binary projections of the three-dimensional image in specific directions and the binary image of the two-dimensional image are acquired; the two kinds of images are registered and a transformation matrix is obtained. Applying the transformation matrix to the original image achieves the goal of projecting the image containing the three-dimensional information of the lesion onto the two-dimensional image. It is worth noting that implementing the method depends on hardware, although not on any particular hardware. The experiments require a computer with a graphics card, a central processing unit, memory, a hard disk and a display, and the method also depends on a separate image acquisition device, which may differ between medical specialties. For example, all procedures or operations that use an endoscope are suitable for this project, including but not limited to digestive endoscopes (gastroscopes, enteroscopes), gynecological endoscopes (hysteroscopes, laparoscopes), urological endoscopes (cystoscopes, ureteroscopes, laparoscopes) and thoracic surgical endoscopes (thoracoscopes). For clinical procedures currently performed without an endoscope, such as excision of superficial dermatological tumors, real-time display of tumor depth information can be obtained by combining a camera and display assembly with this project.
The experimental part is detailed below:
Three-dimensional image segmentation network based on deep learning
First, the images here are imaging examinations that contain three-dimensional tomographic information of the lesion, such as CT, magnetic resonance imaging (MRI) and positron emission tomography (PET/CT); they do not include chest radiographs or other plain X-ray films. The main purpose of this step is to design a convolutional neural network for extracting the lesion or the region of interest. Previous research has shown that Unet-based convolutional neural networks segment medical images well, so a Unet-based convolutional neural network is adopted to segment the medical images.
This part can be divided into the following steps:
1. Acquisition and labeling of a large number of imaging images
Building a deep learning model requires a large amount of imaging data and annotation. For a specific disease or target organ, a large number of relevant images must be collected, and the data are then cleaned to remove images that do not meet the standard or are too blurred. The target organ or tissue is then annotated with labeling software, i.e. its position and area are outlined on the image, and the annotation information is saved as a separate file for later use.
2. Construction of three-dimensional deep learning image segmentation network
Unet was originally proposed for cell segmentation; because it makes full use of information at different depths of the network, it achieves a good segmentation effect. On this basis, researchers have developed different modified Unets for different specific tasks. In the proposed method, researchers can optimize existing models for the specific problem (see the embodiment). The optimization process includes, but is not limited to, optimization of the network structure, optimization of the loss function, and the introduction of new activation functions and convolution calculations. It should be noted that, since a three-dimensional organ or lesion is being segmented, 3D-Unet extracts the three-dimensional information of the image better, and models optimized on the basis of 3D-Unet often achieve a better segmentation effect.
3. Training of networks and derivation of models
After the data and the model are prepared, the model can be trained. Generally, the model is trained on a training set, validated on a validation set while its hyper-parameters are adjusted, and its final segmentation performance is tested on a test set. Once training is finished, every time an imaging image is input the model can automatically segment the target lesion and output the segmented image. The weights of the network are retained for later calculations.
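A minimal sketch of this train/validate/export loop is shown below, assuming tf.keras and reusing the module builder and Dice function from the earlier sketches; the shapes, file name and random placeholder arrays are stand-ins for the labelled volumes of the database.

```python
# Hedged sketch of training and exporting the 3D segmentation model.
import numpy as np
import tensorflow as tf

model = build_3d_segmentation_module()                     # builder from the earlier sketch
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss=lambda yt, yp: 1.0 - dice_coefficient(yt, yp))

# Placeholder data; in practice these are the labelled volumes of the training/validation sets.
x_train = np.random.rand(2, 256, 256, 24, 1).astype("float32")
y_train = (np.random.rand(2, 256, 256, 24, 1) > 0.5).astype("float32")
x_val, y_val = x_train[:1], y_train[:1]

model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=1, batch_size=1)
model.save_weights("ct_segmentation_weights.h5")           # weights kept for later calculations
```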
Two-dimensional image segmentation network based on deep learning
1. Acquisition of a large amount of two-dimensional image data. As described above, the two-dimensional images here include various kinds of endoscopic images or images captured by an ordinary camera. Images are screened according to whether they contain the target organ and are sufficiently clear, and the target organ or target lesion is then annotated.
2. Construction of two-dimensional deep learning image segmentation network
Given Unet's achievements in the field of medical image segmentation, Unet-based variants can be considered for the specific image segmentation task. Likewise, optimization for the specific target problem is needed, including but not limited to optimization of the network structure, optimization of the loss function, and the introduction of new activation functions and convolution calculations.
3. Training of models and derivation of weights
After training, the weights for this segmentation problem are obtained. As before, the model is trained on the training set, validated on the validation set while its hyper-parameters are adjusted, and its final performance is tested on the test set. The weights of the network are retained for later calculations.
The application part is detailed as follows:
One, acquiring two-dimensional information from the three-dimensional image based on clinical priors
When a specific clinical patient presents, the patient's imaging examination data are first obtained; the imaging data are then processed with the model and weights obtained in part one of the experimental section, and the target organ or tissue is segmented to obtain its three-dimensional image. The three-dimensional image is then imaged according to clinical priors, for example from the angle and distance at which the organ or tissue is approached during the clinical procedure or operation. The imaged pictures are then binarized: the target tissue or organ is represented by white pixels and the background by black pixels.
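As an illustrative sketch of this step, the segmented volume could be turned into prior-informed binary projections as below. The rotation axis, angle list and simple maximum-intensity projection are editorial assumptions standing in for whichever renderer is actually used to image the three-dimensional model.

```python
# Hedged sketch: binary projections of the segmented 3D volume at prior-informed angles.
import numpy as np
from scipy import ndimage
from PIL import Image

def binary_projections(mask_volume, angles_deg):
    """mask_volume: (H, W, D) binary segmentation of the target organ or tissue."""
    projections = []
    for angle in angles_deg:
        rotated = ndimage.rotate(mask_volume.astype(float), angle, axes=(0, 2),
                                 reshape=False, order=1)
        proj = rotated.max(axis=2)                                   # view the volume along one direction
        projections.append((proj > 0.5).astype(np.uint8) * 255)      # white target, black background
    return projections

# Example: a handful of oblique views suggested by the clinical prior.
views = binary_projections(np.zeros((256, 256, 24)), [-30, -15, 0, 15, 30])
for i, proj in enumerate(views):
    Image.fromarray(proj).save(f"projection_{i}.png")
```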
Two, acquiring a binary image of the two-dimensional image
The target organ or tissue in the two-dimensional image of the specific case is segmented using the network and weights obtained in part two of the experimental section. This segmentation result is likewise binarized and saved as an image.
Three, obtaining the transformation matrix
The target organ or tissue and its edges do not change between the different imaging modes; for different modes, the images of the same organ or tissue segmented by the networks differ only in size or viewing distance. There must therefore be one binary image from the three-dimensional segmentation and one from the two-dimensional segmentation that show the greatest similarity after registration. Based on this, all the binary images obtained in step one of the application part are registered to the binary image from step two, and the Dice similarity coefficient of each pair of images is calculated. The transformation matrix of the pair with the highest similarity coefficient is saved for later use.
Four, registration of original images
The image obtained from the imaging segmentation is registered with the two-dimensional image using the transformation matrix saved in the previous step. The transparency of the two types of images is adjusted and they are displayed overlapping one another, which completes the whole process.
The first embodiment is as follows: a laparoscopic renal cancer surgery assistance system
Renal cancer is currently treated mainly by endoscopic partial nephrectomy. When excising a renal tumor, the surgeon must choose the cutting site and cutting depth by reading the imaging and drawing on experience: if too little is cut, tumor remains and the surgical margin is positive; if too much is cut, renal bleeding may result and kidney function may be affected. For endogenous kidney tumors in particular, there is often no clear landmark under the endoscope to indicate where to cut. An embodiment based on this patent can address this clinical problem. As a specific example, the imaging data in this embodiment are CT data and the two-dimensional data are laparoscopic images from a renal tumor operation. The overall implementation steps are as follows:
one, CT image segmentation network
The construction of the CT image segmentation network relates to the construction of a database, the design of a network structure, the training and the verification of the network.
(1) The construction of the database relies mainly on previously collected CT data of renal cancer patients. The dicom data are converted into png-format images, the specific positions of the kidney and the kidney tumor are drawn manually with the labelme software, and the results are saved and consolidated to build the database for the CT image segmentation network. In general, each patient has z images containing the kidney, each of size (x, y) (x and y being the transverse and longitudinal resolution of the image), so the imaging data of each case can be saved as a three-dimensional matrix of shape (x, y, z). Likewise, the patient's segmentation information is saved as a three-dimensional matrix of shape (a, b, c). The long axis of the human kidney is typically 100-120 mm, so with a CT slice thickness of 5 mm at least 18 CT images containing the kidney are produced; with an image resolution of 255 × 255, this yields a three-dimensional matrix of a shape such as (255, 255, 24). Similarly, the segmentation information is stored in another three-dimensional matrix of shape (255, 255, 24).
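A small sketch of how one case could be assembled into such matrices follows; the file layout, the PNG export of the labelme masks and the use of PIL/NumPy are editorial assumptions.

```python
# Hedged sketch: stack a patient's CT slices and label masks into (255, 255, 24) matrices.
import glob
import numpy as np
from PIL import Image

def load_case(image_dir, mask_dir, target_slices=24, size=(255, 255)):
    imgs = sorted(glob.glob(f"{image_dir}/*.png"))[:target_slices]
    masks = sorted(glob.glob(f"{mask_dir}/*.png"))[:target_slices]
    volume = np.stack([np.array(Image.open(p).convert("L").resize(size)) for p in imgs], axis=-1)
    labels = np.stack([np.array(Image.open(p).convert("L").resize(size)) > 0 for p in masks], axis=-1)
    return volume.astype(np.float32) / 255.0, labels.astype(np.float32)   # each of shape (255, 255, z)
```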
(2) In order to make full use of the information between CT slices of different layers, a network structure that accepts 3-dimensional input data is adopted. To segment the kidney and kidney tumor more accurately, a cascade segmentation network is used, as shown in fig. 2:
The inputs to the 3D cascade segmentation network are consecutive kidney CT images, i.e. the three-dimensional matrix of imaging data and the three-dimensional matrix of segmentation information described in the previous step. The final segmentation result is output after processing by 5 segmentation modules of identical structure. Each segmentation module is composed of an encoder part and a decoder part, and the segmentation becomes more and more accurate as the segmentation modules go deeper.
In each segmentation module, the left half is the encoder and the right half the decoder. In the encoder, convolution layers with kernel size 3 x 3 and step size 1 extract features from the consecutive CT slices, and a convolution layer with kernel size 2 x 2 and step size 2 is used in place of a pooling layer so as to preserve more image detail while reducing the image resolution. In the decoder, convolution layers with kernel size 3 x 3 and step size 1 likewise extract features, and a deconvolution layer with kernel size 2 x 2 and step size 2 is used, in place of a simple up-sampling (filling) layer, to recover detail while restoring the image size. Skip connections combine shallow and deep features, which effectively prevents vanishing gradients from hampering network training and improves segmentation precision. All convolutional layers in the encoder and decoder use the rectified linear unit (ReLU) as the activation function, allowing the network to fit the kidney and kidney-tumor segmentation problem, as shown in fig. 2. In the figure, 3D Conv k3s1 denotes a 3D convolution, i.e. a conventional convolution operation; 3D Conv k2s2 denotes 3D down-sampling, i.e. the conventional pooling operation; 3D Deconv k2s2 denotes a deconvolution, i.e. an up-sampling operation.
As a concrete case of segmenting the kidney and the kidney tumor, this particular network is exemplified as follows. When a three-dimensional matrix of size (255, 255, 24) is input,
it first passes through a convolution with kernel size (3 × 3), step size 1 and 16 convolution kernels, with zero padding around the borders so that the array size after convolution matches the size before convolution (all convolution operations in the network are zero-padded), followed by a Relu operation;
then a convolution with kernel size (3 × 3), step size 1 and 32 kernels, producing an array of size (255, 255, 24, 32), followed by a Relu operation;
the network then performs a first pooling-like operation with kernel size (2 × 2) and step size 2, producing array one of size (128, 128, 12, 32);
the resulting 32 three-dimensional feature arrays pass through a convolution layer with kernel size (3 × 3), step size 1 and 64 kernels, followed by a Relu operation;
then through a convolution layer with kernel size (3 × 3), step size 1 and 128 kernels, followed by a Relu operation, producing an array of size (128, 128, 12, 128);
a second pooling-like operation, again with kernel size (2 × 2) and step size 2, produces array two of shape (64, 64, 6, 128);
the 128 three-dimensional feature arrays then pass through a convolution layer with kernel size (3 × 3), step size 1 and 128 kernels, followed by a Relu operation;
then through a convolution layer with kernel size (3 × 3), step size 1 and 256 kernels, followed by a Relu operation;
the resulting array three has shape (64, 64, 6, 256); at this point the encoder part of the network is complete, and the decoder part follows.
Array three is fed into a convolution layer with kernel size (3 × 3), step size 1 and 128 kernels, followed by a Relu operation, generating a three-dimensional array of shape (64, 64, 6, 128);
after one deconvolution with kernel size (2 × 2) and step size 2 and a Relu operation, array four of size (128, 128, 12, 128) is generated;
array four is merged with the corresponding array two in the encoder (i.e. a concatenate operation), giving an array of shape (128, 128, 12, 256); the merged array then passes through a convolution layer with kernel size (3 × 3), step size 1 and 62 kernels, followed by a Relu operation;
then through a convolution layer with kernel size (3 × 3), step size 1 and 32 kernels, followed by a Relu operation, giving an array of shape (128, 128, 12, 32);
after a second deconvolution with kernel size (2 × 2) and step size 2 and a Relu operation, array five of shape (256, 256, 24, 32) is generated;
array five is merged with the corresponding array one in the encoder, giving an array of shape (256, 256, 24, 64), which then passes through a convolution layer with kernel size (3 × 3), step size 1 and 16 kernels, followed by a Relu operation;
finally, a convolution layer with kernel size (3 × 3), step size 1 and a single kernel, followed by a Relu operation, generates an array of shape (256, 256, 24, 1); removing the dimension of size 1 yields an array of shape (256, 256, 24), the same shape as the input. This array is passed to the next-stage segmentation module for computation. After segmentation module 5, the final output and the gold-standard labels from the earlier annotation are used to compute the loss function, and the result is used to optimize the network.
In this network structure, each segmentation module outputs a segmentation of the kidney and the kidney tumor, refining it from coarse to fine so that accurate segmentation is finally achieved, as sketched below. Note that this network is applied separately to segment the kidney and the kidney tumor.
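A hedged sketch of that cascade wiring, reusing the single-module builder and the deeply supervised loss from the earlier sketches (tf.keras remains an editorial assumption):

```python
# Hedged sketch: chain five identically structured modules and collect all five
# outputs for the deeply supervised loss.
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(256, 256, 24, 1))
outputs, x = [], inp
for _ in range(5):
    x = build_3d_segmentation_module()(x)     # each module refines the previous output
    outputs.append(x)
cascade = Model(inp, outputs)                  # train against deep_supervision_loss over the 5 outputs
```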
(3) Network training and validation
The most critical part of network training is the construction of the loss function; a multi-precision segmentation loss function is constructed in a deeply supervised manner, as shown in formula 1.
L = λ1*Dice1 + λ2*Dice2 + λ3*Dice3 + λ4*Dice4 + λ5*Dice5 (1)
where λi denotes the hyperparameter weighting each loss term and Dicei denotes the Dice correlation coefficient between the segmentation result of the corresponding segmentation module and the ground-truth segmentation. Deeply supervised learning guides the network to learn more meaningful features in a progressive way, further improving the segmentation performance and the generalization of the network. Considering that, the deeper the segmentation module, the closer the segmentation result is to the truth and the more difficult the training becomes, the deeper modules are given larger weights; here we set λ1 = 1, λ2 = 2, λ3 = 4, λ4 = 8, λ5 = 10. The Dice correlation coefficient is a set similarity measure used to compute the overlap of two sets, with value range [0, 1]; the closer to 1 the greater the overlap, the closer to 0 the smaller the overlap. Dice is defined as follows:
Dice = 2|X∩Y| / (|X| + |Y|)
where |X∩Y| is the intersection of X and Y, and |X| and |Y| denote the number of elements of X and Y, respectively. In the network, X is the array output by the network and Y is the result of the earlier annotation.
In the validation stage, data from patients not used during training are adopted so as to avoid data leakage. The optimal model obtained after validation is stored in h5 format and can then be used to segment new kidneys and kidney tumors.
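Reusing the saved model on a new case could look like the following sketch; the file names and the NumPy volume are placeholders, and the builder comes from the earlier sketch.

```python
# Hedged sketch: load the stored weights and segment a new case.
import numpy as np

model = build_3d_segmentation_module()                        # builder from the earlier sketch
model.load_weights("ct_segmentation_weights.h5")              # weights exported after training
new_volume = np.load("new_case_volume.npy")[None, ..., None]  # placeholder (1, 256, 256, 24, 1) input
prediction = model.predict(new_volume)                        # soft kidney / kidney-tumor segmentation
```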
Two, multi-angle two-dimensional image acquisition based on the three-dimensional model of the kidney and kidney tumor
First, a three-dimensional model is reconstructed from the kidney and kidney-tumor segmentation results obtained in step one. Because the patient's imaging data (whether CT or MRI) are acquired in a fixed order and with a fixed slice thickness, the images obtained from the segmentation step can be imported in sequence into the 3D Slicer software, which automatically projects or smooths adjacent layers and thus constructs three-dimensional models of the kidney and the kidney tumor. For ease of viewing, the kidney and the tumor are shown in different colors and given a degree of transparency. The three-dimensional model is then imaged, and two-dimensional pictures containing the relative positions of the two structures are obtained at different distances and from different angles. In particular, because of the limited positions the laparoscope lens can take in kidney surgery, most views of the kidney seen through the laparoscope look obliquely upward from below the kidney; acquiring a suitable number of pictures from this direction facilitates the next step and reduces the later computational load. As shown in fig. 3.
Three, kidney segmentation network based on laparoscopic images and deep learning
(1) Building of database
An important step in renal tumor surgery is mobilization (dissociation) of the kidney. Video of the kidney after it has been mobilized during laparoscopic renal tumor surgery is captured and converted frame by frame into png-format pictures of size 255 × 255 × 3, where 255 is the length and width of the picture and 3 indicates a color RGB (red, green, blue) picture with three channels. As before, the kidney in each picture is annotated and stored with the labelme software, building a database of laparoscopic kidney images.
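The frame-extraction step could look like the following sketch; OpenCV, the output file names and the subsampling interval are editorial assumptions.

```python
# Hedged sketch: grab frames from a laparoscopic video, resize to 255 x 255 RGB and save as PNG.
import cv2

def extract_frames(video_path, out_dir, size=(255, 255), every_n=25):
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()                 # BGR frame of shape (H, W, 3)
        if not ok:
            break
        if idx % every_n == 0:                 # subsample so neighbouring frames differ
            cv2.imwrite(f"{out_dir}/frame_{saved:05d}.png", cv2.resize(frame, size))
            saved += 1
        idx += 1
    cap.release()
    return saved
```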
(2) Design of network structure
In the structural design of the endoscope image segmentation network, the deep neural network shown in figure 4 was designed by combining the properties of the images with clinical requirements. Compared with a CT image, the kidney occupies a larger part of the laparoscopic image, so the segmentation is relatively easier, and a convolutional neural network comprising 16 convolutional layers is designed. Unlike the CT image segmentation network, the endoscope image segmentation network takes a single image as input, so all the convolutions in the network are 2D convolutions. In the shallow feature-extraction stage of the network, a large 5 x 5 convolution kernel is used to extract common features over a larger receptive field; small 3 x 3 convolution kernels are then used to refine the extracted features and ensure a high-accuracy segmentation result. To save computation, no padding is used in the convolutions, so the size of the feature maps shrinks gradually, and the resolution is then restored, with details recovered, through deconvolution operations. The output of each convolution layer is combined with the mirror-image deconvolution layer through skip connections, fusing shallow and deep features, improving gradient propagation through the network and preventing vanishing gradients.
To better segment the laparoscopic kidney image given its characteristics, the network has the following features. First, it uses no pooling; only convolution and deconvolution operations are used. Second, convolution kernels of different sizes are used. Third, skip connections are used. The specific parameters of the network are given below, with each layer described as (m x m, s, n), where m is the side length of the (square) convolution kernel, s is the step size and n is the number of convolution kernels.
The encoder structure is as follows:
the size of the image input layer is (256, 256, 3), passing through the first convolution layer (5 x 5, 1, 8) to obtain the array shape (252, 252, 8), passing through the Relu layer;
passing through the second convolution layer (5 x 5, 1, 16) to obtain a plurality of groups of shapes (248, 248, 16) passing through the Relu layer;
passing through a third convolution layer (3 x 3, 1, 32) to obtain a plurality of groups of shapes (246, 246, 32) passing through the Relu layer;
passing through the fourth convolution layer (3 x 3, 1, 64) to obtain a plurality of groups of shapes (244, 244, 64) passing through the Relu layer;
passing through the fifth convolution layer (3 x 3, 1, 128) to obtain a plurality of groups of shapes (242, 242, 128) passing through the Relu layer;
passing through the sixth convolution layer (3 x 3, 1, 256) to obtain a plurality of groups of shapes (240, 240, 256) passing through the Relu layer;
passing through a seventh convolution layer (3 x 3, 1, 512) to obtain a plurality of groups of shapes (238, 238, 512) passing through the Relu layer;
passing through the eighth convolutional layer (3 x 3, 1, 1024) to obtain groups of shapes (236, 236, 1024) passing through the Relu layer;
and subsequently to the decoder portion, the structure is as follows,
after passing through the first deconvolution layer (3 x 3, 1, 512), and through the Relu layer, the array shape (238, 238, 512) is obtained, and then the array shape (238, 238, 512) processed by the seventh convolution layer is processed by the concatemate operation to obtain the array shape (238, 238, 2014),
subjecting the array to a second deconvolution layer (3 x 3, 1, 256), subjecting the output and the array (240, 240, 256) processed by the sixth convolution layer to a locate operation to obtain an array shape (240, 240, 512),
the array is passed through the third deconvolution layer (3 x 3, 1, 128) and a Relu layer, and the output is concatenated with the array (242, 242, 128) produced by the fifth convolution layer to obtain an array of shape (242, 242, 256),
the array is passed through the fourth deconvolution layer (3 x 3, 1, 64) and a Relu layer, and the output is concatenated with the array (244, 244, 64) produced by the fourth convolution layer to obtain an array of shape (244, 244, 128),
the array is passed through the fifth deconvolution layer (3 x 3, 1, 32) and a Relu layer, and the output is concatenated with the array (246, 246, 32) produced by the third convolution layer to obtain an array of shape (246, 246, 64),
the array is passed through the sixth deconvolution layer (3 x 3, 1, 16) and a Relu layer, and the output is concatenated with the array (248, 248, 16) produced by the second convolution layer to obtain an array of shape (248, 248, 32),
the array is passed through the seventh deconvolution layer (5 x 5, 1, 8) and a Relu layer, and the output is concatenated with the array (252, 252, 8) produced by the first convolution layer to obtain an array of shape (252, 252, 16),
the array is passed through the eighth deconvolution layer (5 x 5, 1, 3), the output is added element-wise to the input array (256, 256, 3), and the result passes through a sigmoid layer to give the final output. The entire network is shown in fig. 4.
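For concreteness, a minimal sketch of this encoder-decoder is given below. The patent does not name a deep-learning framework, so the use of PyTorch and identifiers such as EndoSegNet are illustrative assumptions; the sketch reproduces the kernel sizes, strides, channel counts and skip connections listed above, with no padding in any convolution, and adds the (256, 256, 3) input back to the last deconvolution output before the sigmoid.

import torch
import torch.nn as nn

class EndoSegNet(nn.Module):
    # Illustrative sketch of the 16-layer 2D encoder-decoder described above.
    def __init__(self):
        super().__init__()
        # Encoder: (in_channels, out_channels, kernel) -- stride 1, no padding throughout.
        enc_specs = [(3, 8, 5), (8, 16, 5), (16, 32, 3), (32, 64, 3),
                     (64, 128, 3), (128, 256, 3), (256, 512, 3), (512, 1024, 3)]
        self.enc = nn.ModuleList([nn.Conv2d(ci, co, k) for ci, co, k in enc_specs])
        # Decoder: transposed convolutions; input channels double after each skip concatenation.
        dec_specs = [(1024, 512, 3), (1024, 256, 3), (512, 128, 3), (256, 64, 3),
                     (128, 32, 3), (64, 16, 3), (32, 8, 5), (16, 3, 5)]
        self.dec = nn.ModuleList([nn.ConvTranspose2d(ci, co, k) for ci, co, k in dec_specs])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                                # x: (N, 3, 256, 256)
        skips, h = [], x
        for conv in self.enc:
            h = self.relu(conv(h))
            skips.append(h)                              # keep every encoder output for the skips
        for i, deconv in enumerate(self.dec[:-1]):
            h = self.relu(deconv(h))
            h = torch.cat([h, skips[-(i + 2)]], dim=1)   # concatenate with the mirrored encoder output
        h = self.dec[-1](h)                              # back to (N, 3, 256, 256)
        return torch.sigmoid(h + x)                      # add the input image, then sigmoid

net = EndoSegNet()
print(net(torch.randn(1, 3, 256, 256)).shape)            # torch.Size([1, 3, 256, 256])

Running the forward pass once, as in the last two lines, is a quick way to confirm that the intermediate shapes match those listed above.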
(3) Network training and validation
The most critical part of network training is the construction of the loss function; the loss function of the endoscopic kidney image segmentation network is given by the following formula:
L = Dice_f
where Dice_f is the Dice correlation coefficient between the segmentation result of the kidney segmentation network and the real segmentation result.
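For concreteness, a minimal computation of this Dice term might look as follows; the smoothing constant eps is an added assumption for numerical stability, and in practice the quantity minimised is usually 1 - Dice, a choice the sketch leaves to the training loop.

import torch

def dice_coefficient(pred, target, eps=1e-6):
    # Dice overlap between the predicted mask and the ground-truth mask;
    # eps is an assumed smoothing term that avoids division by zero on empty masks.
    pred = pred.reshape(-1)
    target = target.reshape(-1)
    inter = (pred * target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# in training, the coefficient is usually turned into a loss as:
# loss = 1.0 - dice_coefficient(net(images), masks)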
After the network is built, the network can be trained, and a binary image is generated according to the segmentation result, as shown in fig. 5.
Four, dual modality image registration fusion
The two-dimensional image showing the positional relationship between the kidney and the kidney tumor, generated in step two (as shown in fig. 5C), is referred to as image1, and its corresponding binary image is referred to as image1a. The segmentation map generated in step three (as shown in fig. 5B) is image2, and its corresponding binary map is referred to as image2a. Since image1 and image2 image the same kidney from different directions and angles, and image1a and image2a are the binary kidney masks of the two, there must exist an affine transformation f after which the Dice coefficient between image1a and image2a is close to 1. This affine transformation f is applied to image1, whose transparency is then adjusted before it is superimposed on image2. In this way the relative position of the kidney and the tumor provided by the CT image is overlaid on the laparoscopic image, so that the relative positional relationship of the kidney and the kidney tumor is displayed visually on the laparoscopic image, and the operating surgeon can accurately excise the kidney tumor according to this information.
In a specific implementation, we use the publicly available Python package ants (ANTs) to perform the computation. ants supports various forms of registration; its Affine registration method applies rigid rotation, scaling and translation to the two images. We use this method to register image1a to image2a, computing the Dice coefficient and transformation matrix f after each registration of an image1a candidate to image2a. The transformation matrix F with the largest Dice coefficient is taken, F is applied to image1, and the transformed image, with its transparency adjusted, is superimposed on image2, so that the kidney and tumor positions in the superimposed image are registered to the endoscope image, as shown in fig. 6. The operating surgeon can then clearly grasp the tumor boundary within the kidney, making the operation more convenient and accurate.
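A rough sketch of this registration step with the ANTsPy package is given below. The helper name fuse_ct_onto_endoscope, the inputs candidate_views/candidate_masks (the image1/image1a renderings at different distances and angles), image2/image2a as NumPy arrays, and the blending weight alpha are assumptions for illustration, not values fixed by the patent.

import ants
import numpy as np

def dice(a, b):
    # Dice overlap between two binary masks.
    a, b = a > 0, b > 0
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + 1e-6)

def fuse_ct_onto_endoscope(candidate_views, candidate_masks, image2, image2a, alpha=0.4):
    # Register each rendered CT view (image1/image1a pair) to the endoscopic mask image2a,
    # keep the transform F with the highest Dice, and overlay the transformed view on image2.
    fixed_mask = ants.from_numpy(image2a.astype("float32"))
    best = None
    for view, view_mask in zip(candidate_views, candidate_masks):
        moving_mask = ants.from_numpy(view_mask.astype("float32"))
        reg = ants.registration(fixed=fixed_mask, moving=moving_mask,
                                type_of_transform="Affine")
        score = dice(reg["warpedmovout"].numpy(), image2a)
        if best is None or score > best[0]:
            best = (score, reg["fwdtransforms"], view)
    _, transforms, view = best
    channels = []
    for c in range(view.shape[-1]):                      # apply F to each colour channel of image1
        ch = ants.from_numpy(view[..., c].astype("float32"))
        channels.append(ants.apply_transforms(fixed=fixed_mask, moving=ch,
                                              transformlist=transforms).numpy())
    warped = np.stack(channels, axis=-1)
    return (1.0 - alpha) * image2 + alpha * warped       # transparency-adjusted overlay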
The embodiment of the invention also provides a two-dimensional and three-dimensional medical image registration system based on deep learning, which comprises the following modules:
the three-dimensional segmentation network construction module is used for constructing a three-dimensional image picture segmentation network based on deep learning, used for acquiring segmentation results of focus parts and focus tumors on the three-dimensional image; the three-dimensional image picture segmentation network based on deep learning comprises 5 segmentation modules with the same structure, each segmentation module consists of a three-dimensional encoder and a three-dimensional decoder, and the processing process of the three-dimensional encoder is as follows: performing convolution operation twice on an input three-dimensional matrix, performing a first pooling operation to obtain array one, performing convolution operation twice, performing a second pooling operation to obtain array two, and performing convolution operation twice to finally generate array three; the three-dimensional decoder processing procedure is as follows: first performing a convolution operation and a first deconvolution operation on array three generated by the three-dimensional encoder to obtain array four, then combining array four generated by the three-dimensional decoder with the corresponding array two in the three-dimensional encoder, then performing convolution operation twice, then performing a second deconvolution operation to obtain array five, combining array five generated by the three-dimensional decoder with the corresponding array one in the three-dimensional encoder, performing convolution operation twice, and deleting the dimension of size 1 from the array to obtain an array with the same shape as the input; a Relu operation is performed after each convolution operation (a code sketch of one such segmentation module is given below, after the module descriptions);
the reconstruction module is used for reconstructing a three-dimensional image of the focus part and the focus tumor based on the segmentation results of the focus part and the focus tumor, displaying the focus part and the focus tumor in different colors, setting a certain transparency, acquiring two-dimensional images image1 containing the relative positions of the focus part and the focus tumor at different distances and different angles, and performing binarization processing on image1 to obtain the corresponding binary images, called image1a;
the two-dimensional segmentation network construction module is used for constructing a two-dimensional image segmentation network based on deep learning, used for obtaining segmentation results of focus parts and focus tumors on the two-dimensional image; the two-dimensional image segmentation network based on deep learning is composed of a two-dimensional encoder and a two-dimensional decoder, the two-dimensional encoder comprises 8 convolution layers, the two-dimensional decoder comprises 8 deconvolution layers, and the two-dimensional encoder is connected to the two-dimensional decoder by skip connections; a Relu operation is performed after each convolution operation;
and the registration module is used for recording the segmentation result generated in the two-dimensional segmentation network construction module as image2, with the corresponding binary image called image2a; image1a is first registered to image2a, the Dice coefficient and transformation matrix F are calculated after each registration of image1a to image2a, the transformation matrix F with the maximum Dice is taken, F is applied to transform image1, and the transformed image, with its transparency adjusted, is superimposed on image2, so that the relative positional relationship between the lesion part and the lesion tumor is displayed visually on the two-dimensional image.
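As referenced above, below is a minimal sketch of one of the five identical three-dimensional segmentation modules, written in PyTorch; the framework choice and the name CTSegModule are assumptions, and the skip connections are taken from the feature maps just before each pooling-like convolution so that the concatenated shapes agree with the sizes listed in claim 4.

import torch
import torch.nn as nn

def conv3(cin, cout):
    # 3D convolution, stride 1, zero-padded so the spatial size is preserved, followed by Relu.
    return nn.Sequential(nn.Conv3d(cin, cout, kernel_size=3, padding=1), nn.ReLU(inplace=True))

class CTSegModule(nn.Module):
    # Illustrative sketch of one of the five identical 3D segmentation modules.
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(conv3(1, 16), conv3(16, 32))
        self.pool1 = nn.Conv3d(32, 32, kernel_size=2, stride=2)          # pooling-like convolution
        self.enc2 = nn.Sequential(conv3(32, 64), conv3(64, 128))
        self.pool2 = nn.Conv3d(128, 128, kernel_size=2, stride=2)
        self.enc3 = nn.Sequential(conv3(128, 128), conv3(128, 256))      # produces "array three"
        self.dec1 = conv3(256, 128)
        self.up1 = nn.ConvTranspose3d(128, 128, kernel_size=2, stride=2) # produces "array four"
        self.dec2 = nn.Sequential(conv3(256, 64), conv3(64, 32))
        self.up2 = nn.ConvTranspose3d(32, 32, kernel_size=2, stride=2)   # produces "array five"
        self.dec3 = nn.Sequential(conv3(64, 16), conv3(16, 1))

    def forward(self, x):                            # x: (N, 1, 24, 256, 256)
        s1 = self.enc1(x)                            # (N, 32, 24, 256, 256)
        s2 = self.enc2(self.pool1(s1))               # (N, 128, 12, 128, 128)
        h = self.enc3(self.pool2(s2))                # (N, 256, 6, 64, 64)
        h = self.up1(self.dec1(h))                   # (N, 128, 12, 128, 128)
        h = self.dec2(torch.cat([h, s2], dim=1))     # skip connection -> (N, 32, 12, 128, 128)
        h = self.up2(h)                              # (N, 32, 24, 256, 256)
        h = self.dec3(torch.cat([h, s1], dim=1))     # (N, 1, 24, 256, 256)
        return h.squeeze(1)                          # drop the singleton dimension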
The specific implementation of each module corresponds to the respective step of the method and is not repeated here.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (10)

1. A two-dimensional and three-dimensional medical image registration method based on deep learning is characterized by comprising the following steps:
step 1, constructing a three-dimensional image picture segmentation network based on deep learning, and obtaining segmentation results of focus parts and focus tumors on a three-dimensional image;
the three-dimensional image picture segmentation network based on deep learning comprises 5 segmentation modules with the same structure, each segmentation module consists of a three-dimensional encoder and a three-dimensional decoder, and the processing process of the three-dimensional encoder is as follows: performing convolution operation twice on an input three-dimensional matrix, performing a first pooling operation to obtain array one, performing convolution operation twice, performing a second pooling operation to obtain array two, and performing convolution operation twice to finally generate array three; the three-dimensional decoder processing procedure is as follows: first performing a convolution operation and a first deconvolution operation on array three generated by the three-dimensional encoder to obtain array four, then combining array four generated by the three-dimensional decoder with the corresponding array two in the three-dimensional encoder, then performing convolution operation twice, then performing a second deconvolution operation to obtain array five, combining array five generated by the three-dimensional decoder with the corresponding array one in the three-dimensional encoder, performing convolution operation twice, and deleting the dimension of size 1 from the array to obtain an array with the same shape as the input; a Relu operation is performed after each convolution operation;
step 2, reconstructing a three-dimensional image of the focus part and the focus tumor based on the segmentation result of the focus part and the focus tumor in the step 1, displaying the focus part and the focus tumor in different colors, setting certain transparency, acquiring two-dimensional images image1 containing the relative positions of the focus part and the focus tumor under different distances and different angles, and performing binarization processing on the image1 to obtain a corresponding binary image called image1 a;
step 3, constructing a two-dimensional image segmentation network based on deep learning, used for acquiring segmentation results of focus parts and focus tumors on the two-dimensional image, wherein the two-dimensional image segmentation network based on deep learning is composed of a two-dimensional encoder and a two-dimensional decoder, the two-dimensional encoder comprises 8 convolution layers, the two-dimensional decoder comprises 8 deconvolution layers, and the two-dimensional encoder is connected to the two-dimensional decoder by skip connections; a Relu operation is performed after each convolution operation;
and step 4, recording the segmentation result generated in step 3 as image2, with the corresponding binary image called image2a; image1a is first registered to image2a, the Dice coefficient and transformation matrix F are calculated after each registration of image1a to image2a, the transformation matrix F with the maximum Dice is taken, F is applied to transform image1, and the transformed image, with its transparency adjusted, is superimposed on image2, so that the relative positional relationship between the lesion part and the lesion tumor is displayed visually on the two-dimensional image.
2. The two-dimensional and three-dimensional medical image registration method based on deep learning of claim 1, characterized in that: the three-dimensional images include CT, MRI, PET/CT imaging pictures containing three-dimensional tomographic information of the focus.
3. The two-dimensional and three-dimensional medical image registration method based on deep learning of claim 1, characterized in that: the two-dimensional image comprises an endoscope image or an image shot by a common camera.
4. The two-dimensional and three-dimensional medical image registration method based on deep learning of claim 1, characterized in that: the convolution kernels in the convolution layers of the three-dimensional encoder are all of size 3 x 3 with a step size of 1, and the pooling-like operation is implemented by a convolution with a kernel of size 2 x 2 and a step size of 2; the deconvolution kernels in the three-dimensional decoder are of size 2 x 2 with a step size of 2;
the specific processing procedure of each segmentation module in the step 1 is as follows;
when a three-dimensional matrix of size (256, 256, 24) is input, a convolution operation with convolution kernel size (3 x 3), step size 1 and 16 convolution kernels is performed once, with the borders padded with zeros so that the array size after convolution is consistent with the size before convolution, followed by a Relu operation;
a convolution operation with convolution kernel size (3 x 3), step size 1 and 32 convolution kernels is then performed, generating an array of size (256, 256, 24, 32), followed by a Relu operation;
then, a first pooling-like operation is performed, in which the convolution kernel size is (2 x 2) and the step size is 2, and the generated array one has size (128, 128, 12, 32);
the generated 32 three-dimensional data arrays are passed through a convolution layer with convolution kernel size (3 x 3), step size 1 and 64 convolution kernels, followed by a Relu operation;
then through a convolution layer with convolution kernel size (3 x 3), step size 1 and 128 convolution kernels, followed by a Relu operation, generating an array of (128, 128, 12, 128);
then, after a second pooling-like operation, again using a convolution kernel of size (2 x 2) with step size 2, an array two with shape (64, 64, 6, 128) is generated;
then, the 128 three-dimensional arrays pass through a convolution layer with convolution kernel size (3 x 3), step size 1 and 128 convolution kernels, followed by a Relu operation;
and then through a convolution layer with convolution kernel size (3 x 3), step size 1 and 256 convolution kernels, followed by a Relu operation;
the resulting array three is a (64, 64, 6, 256) shape, at which point the three-dimensional encoder portion is completed, followed by the three-dimensional decoder portion;
array three is input into a convolution layer with convolution kernel size (3 x 3), step size 1 and 128 convolution kernels, followed by a Relu operation, generating a three-dimensional array of shape (64, 64, 6, 128);
after one deconvolution with deconvolution kernel size (2 x 2) and step size 2, followed by a Relu operation, array four of shape (128, 128, 12, 128) is generated;
the generated array four is merged with the corresponding array two in the encoder to generate an array of shape (128, 128, 12, 256); the merged array then passes through a convolution layer with convolution kernel size (3 x 3), step size 1 and 64 convolution kernels, followed by a Relu operation;
and then through a convolution layer with convolution kernel size (3 x 3), step size 1 and 32 convolution kernels, followed by a Relu operation, to obtain an array of shape (128, 128, 12, 32);
after the second deconvolution operation, with deconvolution kernel size (2 x 2) and step size 2, followed by a Relu operation, array five of shape (256, 256, 24, 32) is generated;
array five is combined with the corresponding array one in the encoder to generate an array of shape (256, 256, 24, 64), which passes through a convolution layer with convolution kernel size (3 x 3), step size 1 and 16 convolution kernels, followed by a Relu operation;
and finally through a convolution layer with convolution kernel size (3 x 3), step size 1 and 1 convolution kernel, followed by a Relu operation, an array of (256, 256, 24, 1) is generated; the dimension of size 1 is removed, giving an array of (256, 256, 24) with the same shape as the input.
5. The two-dimensional and three-dimensional medical image registration method based on deep learning of claim 1, characterized in that: the loss function of the three-dimensional image picture segmentation network based on deep learning is constructed as follows,
L = λ1*Dice1 + λ2*Dice2 + λ3*Dice3 + λ4*Dice4 + λ5*Dice5
where λ1, ..., λ5 are the hyperparameters of the loss terms, and Dice1, ..., Dice5 are the Dice correlation coefficients between the segmentation result of the corresponding segmentation module and the real segmentation result; the Dice correlation coefficient is a set-similarity measure used to calculate the overlap of two sets, with value range [0, 1], where values closer to 1 indicate more overlap and values closer to 0 indicate less overlap; Dice is defined as follows:
Dice = 2|X ∩ Y| / (|X| + |Y|)
where |X ∩ Y| is the number of elements in the intersection of X and Y, and |X| and |Y| represent the numbers of elements of X and Y, respectively.
6. The two-dimensional and three-dimensional medical image registration method based on deep learning of claim 5, characterized in that: λ1 = 1, λ2 = 2, λ3 = 4, λ4 = 8, λ5 = 10.
7. The two-dimensional and three-dimensional medical image registration method based on deep learning of claim 1, characterized in that: in the two-dimensional encoder, a large-size convolution kernel of 5 x 5 is used in the first two convolution layers so as to extract more common features by utilizing a larger receptive field, and a small-size convolution kernel of 3 x 3 is used in the next 6 convolution layers so as to refine the details of the extracted features so as to ensure a high-precision segmentation result;
the structure of the two-dimensional encoder in step 3 is as follows:
the size of the image input layer is (256, 256, 3); after the first convolution layer (5 x 5, 1, 8) an array of shape (252, 252, 8) is obtained, which then passes through a Relu layer;
after the second convolution layer (5 x 5, 1, 16) an array of shape (248, 248, 16) is obtained, which then passes through a Relu layer;
after the third convolution layer (3 x 3, 1, 32) an array of shape (246, 246, 32) is obtained, which then passes through a Relu layer;
after the fourth convolution layer (3 x 3, 1, 64) an array of shape (244, 244, 64) is obtained, which then passes through a Relu layer;
after the fifth convolution layer (3 x 3, 1, 128) an array of shape (242, 242, 128) is obtained, which then passes through a Relu layer;
after the sixth convolution layer (3 x 3, 1, 256) an array of shape (240, 240, 256) is obtained, which then passes through a Relu layer;
after the seventh convolution layer (3 x 3, 1, 512) an array of shape (238, 238, 512) is obtained, which then passes through a Relu layer;
after the eighth convolution layer (3 x 3, 1, 1024) an array of shape (236, 236, 1024) is obtained, which then passes through a Relu layer;
the decoder portion follows; its structure is as follows:
after the first deconvolution layer (3 x 3, 1, 512) and a Relu layer, an array of shape (238, 238, 512) is obtained, which is concatenated with the array of shape (238, 238, 512) produced by the seventh convolution layer to obtain an array of shape (238, 238, 1024),
the array is passed through the second deconvolution layer (3 x 3, 1, 256), and the output is concatenated with the array (240, 240, 256) produced by the sixth convolution layer to obtain an array of shape (240, 240, 512),
the array is passed through the third deconvolution layer (3 x 3, 1, 128) and a Relu layer, and the output is concatenated with the array (242, 242, 128) produced by the fifth convolution layer to obtain an array of shape (242, 242, 256),
the array is passed through the fourth deconvolution layer (3 x 3, 1, 64), and the output is concatenated with the array (244, 244, 64) produced by the fourth convolution layer to obtain an array of shape (244, 244, 128),
the array is passed through the fifth deconvolution layer (3 x 3, 1, 32) and a Relu layer, and the output is concatenated with the array (246, 246, 32) produced by the third convolution layer to obtain an array of shape (246, 246, 64),
the array is passed through the sixth deconvolution layer (3 x 3, 1, 16) and a Relu layer, and the output is concatenated with the array (248, 248, 16) produced by the second convolution layer to obtain an array of shape (248, 248, 32),
the array is passed through the seventh deconvolution layer (5 x 5, 1, 8) and a Relu layer, and the output is concatenated with the array (252, 252, 8) produced by the first convolution layer to obtain an array of shape (252, 252, 16),
and the array is passed through the eighth deconvolution layer (5 x 5, 1, 3), the output is added element-wise to the input array (256, 256, 3), and the result passes through a sigmoid layer to give the output.
8. The two-dimensional and three-dimensional medical image registration method based on deep learning of claim 1, characterized in that: the loss function of the two-dimensional image segmentation network based on the deep learning in the step 3 is shown as the following formula,
L = Dice_f
where Dice_f is the Dice correlation coefficient between the segmentation result of the two-dimensional image segmentation network and the real segmentation result.
9. A two-dimensional and three-dimensional medical image registration system based on deep learning, characterized by comprising the following modules:
the three-dimensional segmentation network construction module is used for constructing a three-dimensional image picture segmentation network based on deep learning, used for acquiring segmentation results of focus parts and focus tumors on the three-dimensional image; the three-dimensional image picture segmentation network based on deep learning comprises 5 segmentation modules with the same structure, each segmentation module consists of a three-dimensional encoder and a three-dimensional decoder, and the processing process of the three-dimensional encoder is as follows: performing convolution operation twice on an input three-dimensional matrix, performing a first pooling operation to obtain array one, performing convolution operation twice, performing a second pooling operation to obtain array two, and performing convolution operation twice to finally generate array three; the three-dimensional decoder processing procedure is as follows: first performing a convolution operation and a first deconvolution operation on array three generated by the three-dimensional encoder to obtain array four, then combining array four generated by the three-dimensional decoder with the corresponding array two in the three-dimensional encoder, then performing convolution operation twice, then performing a second deconvolution operation to obtain array five, combining array five generated by the three-dimensional decoder with the corresponding array one in the three-dimensional encoder, performing convolution operation twice, and deleting the dimension of size 1 from the array to obtain an array with the same shape as the input; a Relu operation is performed after each convolution operation;
the reconstruction module is used for reconstructing a three-dimensional image of the focus part and the focus tumor based on the segmentation results of the focus part and the focus tumor, displaying the focus part and the focus tumor in different colors, setting a certain transparency, acquiring two-dimensional images image1 containing the relative positions of the focus part and the focus tumor at different distances and different angles, and performing binarization processing on image1 to obtain the corresponding binary images, called image1a;
the two-dimensional segmentation network construction module is used for constructing a two-dimensional image segmentation network based on deep learning, used for obtaining segmentation results of focus parts and focus tumors on the two-dimensional image; the two-dimensional image segmentation network based on deep learning is composed of a two-dimensional encoder and a two-dimensional decoder, the two-dimensional encoder comprises 8 convolution layers, the two-dimensional decoder comprises 8 deconvolution layers, and the two-dimensional encoder is connected to the two-dimensional decoder by skip connections; a Relu operation is performed after each convolution operation;
and the registration module is used for recording the segmentation result generated in the two-dimensional segmentation network construction module as image2, with the corresponding binary image called image2a; image1a is first registered to image2a, the Dice coefficient and transformation matrix F are calculated after each registration of image1a to image2a, the transformation matrix F with the maximum Dice is taken, F is applied to transform image1, and the transformed image, with its transparency adjusted, is superimposed on image2, so that the relative positional relationship between the lesion part and the lesion tumor is displayed visually on the two-dimensional image.
10. Use of the deep learning based two-dimensional and three-dimensional medical image registration method as claimed in claim 1, characterized in that: the registration method is used to realize the registration of three-dimensional images and two-dimensional images under gastroscopes, enteroscopes, gynecological hysteroscopes, laparoscopes, cystoscopes, ureteroscopes and thoracoscopes.
CN202011048036.9A 2020-09-29 2020-09-29 Two-dimensional and three-dimensional medical image registration method and system based on deep learning Active CN112150524B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011048036.9A CN112150524B (en) 2020-09-29 2020-09-29 Two-dimensional and three-dimensional medical image registration method and system based on deep learning

Publications (2)

Publication Number Publication Date
CN112150524A CN112150524A (en) 2020-12-29
CN112150524B true CN112150524B (en) 2022-03-11

Family

ID=73895102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011048036.9A Active CN112150524B (en) 2020-09-29 2020-09-29 Two-dimensional and three-dimensional medical image registration method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN112150524B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767299B (en) * 2021-04-07 2021-07-06 成都真实维度科技有限公司 Multi-mode three-dimensional image registration and fusion method
CN113538451B (en) * 2021-05-14 2024-03-26 深圳市青云智图医疗科技有限公司 Method and device for segmenting magnetic resonance image of deep vein thrombosis, electronic equipment and storage medium
CN113506334B (en) * 2021-06-07 2023-12-15 刘星宇 Multi-mode medical image fusion method and system based on deep learning
CN113808179B (en) * 2021-08-31 2023-03-31 数坤(北京)网络科技股份有限公司 Image registration method and device and readable storage medium
CN114387320B (en) * 2022-03-25 2022-07-19 武汉楚精灵医疗科技有限公司 Medical image registration method, device, terminal and computer-readable storage medium
CN115762722B (en) * 2022-11-22 2023-05-09 南方医科大学珠江医院 Image review system based on artificial intelligence
CN117351215B (en) * 2023-12-06 2024-02-23 上海交通大学宁波人工智能研究院 Artificial shoulder joint prosthesis design system and method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934761A (en) * 2017-02-15 2017-07-07 苏州大学 A kind of method for registering of three-dimensional non-rigid optical coherence tomographic image
CN109872332A (en) * 2019-01-31 2019-06-11 广州瑞多思医疗科技有限公司 A kind of 3 d medical images method for registering based on U-NET neural network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170337682A1 (en) * 2016-05-18 2017-11-23 Siemens Healthcare Gmbh Method and System for Image Registration Using an Intelligent Artificial Agent
CN108257134B (en) * 2017-12-21 2022-08-23 深圳大学 Nasopharyngeal carcinoma focus automatic segmentation method and system based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fast Registration for Liver Motion Compensation in Ultrasound-Guided Navigation;Wei Wei et al.;《2019 IEEE 16th International Symposium on Biomedical Imaging》;20190711;第1132-1136页 *
Research on 3D Registration Algorithms for CT/MR Images Based on Contour Information; Chu Qin; China Master's Theses Full-text Database, Medicine and Health Sciences; 20200215; vol. 2020, no. 2; pp. E060-247 *

Also Published As

Publication number Publication date
CN112150524A (en) 2020-12-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant