CN116013475A - Method and device for delineating a multimodal medical image, storage medium and electronic device - Google Patents

Method and device for delineating a multimodal medical image, storage medium and electronic device

Info

Publication number
CN116013475A
CN116013475A (application CN202310297602.7A)
Authority
CN
China
Prior art keywords
target
medical image
feature map
layer
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310297602.7A
Other languages
Chinese (zh)
Other versions
CN116013475B (en)
Inventor
周琦超
马永康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Manteia Data Technology Co ltd In Xiamen Area Of Fujian Pilot Free Trade Zone
Original Assignee
Manteia Data Technology Co ltd In Xiamen Area Of Fujian Pilot Free Trade Zone
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Manteia Data Technology Co ltd In Xiamen Area Of Fujian Pilot Free Trade Zone
Priority to CN202310297602.7A
Publication of CN116013475A
Application granted
Publication of CN116013475B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application discloses a method and device for delineating a multimodal medical image, a storage medium, and an electronic device, and relates to the technical field of image processing. The method comprises: acquiring a medical image to be delineated; inputting the medical image to be delineated into a target neural network model, and outputting, through the target neural network model, a predicted delineation result for the medical image of a first modality at a target layer, wherein the target neural network model is trained on a training sample set comprising at least: first-modality sample images and their corresponding delineation results, second-modality sample images, and third-modality sample images. The application thereby solves the problem in the related art that the delineation accuracy of network models on multimodal medical images is low.

Description

Method and device for delineating a multimodal medical image, storage medium and electronic device
Technical Field
The present application relates to the field of image processing technologies, and in particular to a method and device for delineating a multimodal medical image, a storage medium, and an electronic device.
Background
A multimodal medical image refers to a set of medical images captured using different medical imaging techniques, such as X-ray, computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET). These images provide complementary information and thus more comprehensive diagnostic information for the physician. In computing, image segmentation refers to partitioning an image into mutually disjoint regions, each satisfying a region-specific consistency criterion; its purpose is to extract the region of interest and thereby provide a basis for quantitative and qualitative analysis. Medical image segmentation is a subfield of image segmentation with many applications to medical images. Automatic segmentation can help physicians determine the boundaries of organs of interest and lesions or tumors, so that diagnosis and treatment can be planned from the associated statistics and treatment efficacy can be quantitatively evaluated before and after treatment.
Compared with a single-modality image, a multimodal medical image can provide more accurate diagnostic information for medical image segmentation. However, existing network models can only delineate a single-modality image and ignore the information in the other modalities, which severely reduces the delineation accuracy of the medical image.
No effective solution has yet been proposed for the problem in the related art that the delineation accuracy of network models on multimodal medical images is low.
Disclosure of Invention
The main purpose of the application is to provide a method and device for delineating a multimodal medical image, a storage medium, and an electronic device, so as to solve the problem in the related art that the delineation accuracy of network models on multimodal medical images is low.
To achieve the above object, according to one aspect of the present application, a method for delineating a multimodal medical image is provided. The method comprises: acquiring a medical image to be delineated, wherein the medical image to be delineated at least comprises multiple layers of medical images of a first modality, multiple layers of medical images of a second modality, and multiple layers of medical images of a third modality; inputting the medical image to be delineated into a target neural network model, and outputting, through the target neural network model, a predicted delineation result for the medical image of the first modality at a target layer, wherein the target neural network model is trained on a training sample set comprising at least: first-modality sample images and their corresponding delineation results, second-modality sample images, and third-modality sample images.
Further, the target neural network model at least comprises a plurality of encoders, an attention layer, and a decoder, wherein the plurality of encoders share weights, and the attention layer is a non-local attention layer or a multi-head cross attention layer.
Further, obtaining, through the target neural network model, the delineation result for the medical image of the first modality at the target layer comprises: performing feature extraction on the first-, second-, and third-modality medical images through the plurality of encoders to obtain a first feature map corresponding to the first-modality medical image at the target layer, a second feature map corresponding to the second-modality medical image at the target layer, and a third feature map corresponding to the third-modality medical image at the target layer; performing feature fusion on the first, second, and third feature maps through the attention layer to obtain a target feature map; and decoding the target feature map through the decoder to obtain the delineation result for the medical image of the first modality at the target layer.
Further, if the attention layer is a non-local attention layer, performing feature fusion on the first, second, and third feature maps through the attention layer to obtain the target feature map comprises: concatenating the first, second, and third feature maps to obtain a first initial feature map; and calculating a first similarity between the regions of the first initial feature map and obtaining the target feature map from the first initial feature map according to the first similarity, wherein the first similarity is calculated from the feature weight of the current region and the feature weights of the other regions.
Further, if the attention layer is a multi-head cross attention layer, performing feature fusion on the first, second, and third feature maps through the attention layer to obtain the target feature map comprises: concatenating the second and third feature maps to obtain a second initial feature map; and calculating a second similarity between the first feature map and the second initial feature map and obtaining the target feature map from the first feature map and the second initial feature map according to the second similarity.
Further, performing feature extraction on the first-, second-, and third-modality medical images through the plurality of encoders to obtain the first feature map corresponding to the first-modality medical image at the target layer, the second feature map corresponding to the second-modality medical image at the target layer, and the third feature map corresponding to the third-modality medical image at the target layer comprises: performing, through one of the plurality of encoders, feature extraction on the multiple layers of medical images of a target modality to obtain an initial feature map for each layer's medical image of the target modality, wherein the target modality is one of the first, second, and third modalities; and performing feature fusion on the initial feature maps of the layers to obtain the feature map corresponding to the medical image of the target modality at the target layer.
Further, before the medical image to be delineated is input into the target neural network model, the method further comprises: rigidly registering the multiple layers of first-modality, second-modality, and third-modality medical images to obtain a registered medical image to be delineated; and preprocessing the registered medical image to be delineated to obtain a processed medical image to be delineated. Inputting the medical image to be delineated into the target neural network model and obtaining the delineation result for the medical image of the first modality at the target layer through the target neural network model then comprises: inputting the processed medical image to be delineated into the target neural network model and obtaining, through the target neural network model, the delineation result for the medical image of the first modality at the target layer.
To achieve the above object, according to another aspect of the present application, a device for delineating a multimodal medical image is provided. The device comprises: an acquisition unit, configured to acquire a medical image to be delineated, wherein the medical image to be delineated at least comprises multiple layers of medical images of a first modality, multiple layers of medical images of a second modality, and multiple layers of medical images of a third modality; and an output unit, configured to input the medical image to be delineated into a target neural network model and to output, through the target neural network model, a predicted delineation result for the medical image of the first modality at a target layer, wherein the target neural network model is trained on a training sample set comprising at least: first-modality sample images and their corresponding delineation results, second-modality sample images, and third-modality sample images.
Further, the target neural network model at least comprises a plurality of encoders, an attention layer, and a decoder, wherein the plurality of encoders share weights, and the attention layer is a non-local attention layer or a multi-head cross attention layer.
Further, the output unit comprises: an extraction module, configured to perform feature extraction on the first-, second-, and third-modality medical images through the plurality of encoders to obtain a first feature map corresponding to the first-modality medical image at the target layer, a second feature map corresponding to the second-modality medical image at the target layer, and a third feature map corresponding to the third-modality medical image at the target layer; a fusion module, configured to perform feature fusion on the first, second, and third feature maps through the attention layer to obtain a target feature map; and a decoding module, configured to decode the target feature map through the decoder to obtain the delineation result for the medical image of the first modality at the target layer.
Further, if the attention layer is a non-local attention layer, the fusion module comprises: a first concatenation submodule, configured to concatenate the first, second, and third feature maps to obtain a first initial feature map; and a first calculation submodule, configured to calculate a first similarity between the regions of the first initial feature map and to obtain the target feature map from the first initial feature map according to the first similarity, wherein the first similarity is calculated from the feature weight of the current region and the feature weights of the other regions.
Further, if the attention layer is a multi-head cross attention layer, the fusion module comprises: a second concatenation submodule, configured to concatenate the second and third feature maps to obtain a second initial feature map; and a second calculation submodule, configured to calculate a second similarity between the first feature map and the second initial feature map and to obtain the target feature map from the first feature map and the second initial feature map according to the second similarity.
Further, the extraction module comprises: an extraction submodule, configured to perform, through one of the plurality of encoders, feature extraction on the multiple layers of medical images of a target modality to obtain an initial feature map for each layer's medical image of the target modality, wherein the target modality is one of the first, second, and third modalities; and a fusion submodule, configured to perform feature fusion on the initial feature maps of the layers to obtain the feature map corresponding to the medical image of the target modality at the target layer.
Further, the device further comprises: a registration unit, configured to rigidly register the multiple layers of first-modality, second-modality, and third-modality medical images before the medical image to be delineated is input into the target neural network model, to obtain a registered medical image to be delineated; and a preprocessing unit, configured to preprocess the registered medical image to be delineated to obtain a processed medical image to be delineated. The output unit is further configured to input the processed medical image to be delineated into the target neural network model and to obtain, through the target neural network model, the delineation result for the medical image of the first modality at the target layer.
To achieve the above object, according to another aspect of the present application, there is also provided a computer-readable storage medium storing a program, wherein the program, when run, controls the device on which the storage medium resides to perform the method for delineating a multimodal medical image according to any one of the above.
To achieve the above object, according to a further aspect of the present application, there is provided an electronic device comprising one or more processors and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for delineating a multimodal medical image according to any one of the above.
Through the application, the following steps are adopted: acquiring a medical image to be delineated, wherein the medical image to be delineated at least comprises multiple layers of medical images of a first modality, multiple layers of medical images of a second modality, and multiple layers of medical images of a third modality; and inputting the medical image to be delineated into a target neural network model and outputting, through the target neural network model, a predicted delineation result for the medical image of the first modality at a target layer, wherein the target neural network model is trained on a training sample set comprising at least first-modality sample images with their corresponding delineation results, second-modality sample images, and third-modality sample images. This solves the problem in the related art that the delineation accuracy of network models on multimodal medical images is low. In this scheme, the first-, second-, and third-modality medical images are input directly into the target neural network model, which fuses features across the modalities and then produces the delineation result for the first modality. The target neural network model can extract and fuse the key features of the images of the multiple modalities, effectively improving the delineation accuracy for the first-modality medical image. Moreover, in the training sample set the model uses, only the first-modality sample images carry ground-truth delineations, which reduces the manual annotation cost and effectively improves the efficiency of medical image delineation.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a flowchart of a method for delineating a multimodal medical image provided according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a target neural network model provided according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a non-local attention layer provided according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a multi-head cross attention layer provided according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a device for delineating a multimodal medical image provided according to an embodiment of the application;
FIG. 6 is a schematic diagram of an electronic device provided according to an embodiment of the present application.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In order to make the present solution better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the present application described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, related information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by a user or sufficiently authorized by each party. For example, an interface is provided between the system and the relevant user or institution, before acquiring the relevant information, the system needs to send an acquisition request to the user or institution through the interface, and acquire the relevant information after receiving the consent information fed back by the user or institution.
The invention will now be described with reference to preferred embodiments. FIG. 1 is a flowchart of a method for delineating a multimodal medical image according to an embodiment of the present application. As shown in FIG. 1, the method comprises the following steps:
Step S101: acquire a medical image to be delineated, where the medical image to be delineated at least comprises multiple layers of medical images of a first modality, multiple layers of medical images of a second modality, and multiple layers of medical images of a third modality.
Specifically, a medical image to be delineated is acquired, and it must include multiple layers of first-modality, second-modality, and third-modality medical images. For example, a CT image acquired by computed tomography serves as the first-modality medical image, an MR image acquired by magnetic resonance imaging serves as the second-modality medical image, and a PET-CT image serves as the third-modality medical image, where PET refers to positron emission tomography (positron emission computed tomography).
The first-modality, second-modality, and third-modality medical images are all multi-layer (multi-slice) images. The slices above and below the target layer are fed into the target neural network model together with the target layer itself; using adjacent slices as input effectively reduces errors in the medical image and improves the delineation accuracy.
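As a minimal illustration of this adjacent-slice input, the following sketch stacks a target slice with its neighbours into the channel dimension (PyTorch assumed; the function name, shapes, and clamping at the boundaries are illustrative assumptions, not the patent's implementation):

```python
import torch

def stack_adjacent_slices(volume: torch.Tensor, target_idx: int, context: int = 1) -> torch.Tensor:
    """Stack the target slice with `context` neighbours on each side into the
    channel dimension, clamping indices at the volume boundaries.

    volume: (D, H, W) single-modality volume -> returns (2*context + 1, H, W).
    """
    depth = volume.shape[0]
    idxs = [min(max(target_idx + off, 0), depth - 1) for off in range(-context, context + 1)]
    return volume[idxs]  # e.g. slices (i-1, i, i+1) become a 3-channel input
```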
Step S102: input the medical image to be delineated into a target neural network model, and output, through the target neural network model, a predicted delineation result for the medical image of the first modality at the target layer, wherein the target neural network model is trained on a training sample set comprising at least: first-modality sample images and their corresponding delineation results, second-modality sample images, and third-modality sample images.
Specifically, after the multiple layers of first-modality, second-modality, and third-modality medical images are obtained, the medical image to be delineated is input into the target neural network model, the model fuses the features of the first-, second-, and third-modality medical images, and finally outputs the delineation result corresponding to the first-modality medical image at the target layer.
The training sample set used to train the target neural network model comprises first-modality sample images, the delineation results corresponding to the first-modality sample images, second-modality sample images, and third-modality sample images. In this scheme, the data of the different modalities need not all be manually delineated, which reduces the manual annotation cost.
It should be noted that the delineation result corresponding to a first-modality sample image may be a delineation manually annotated by a clinician.
It should also be noted that the model may instead be trained on first-modality sample images, second-modality sample images with their corresponding delineation results, and third-modality sample images; the trained target neural network model then yields the delineation result corresponding to the medical image of the second modality.
In summary, the first-, second-, and third-modality medical images are input directly into the target neural network model, which fuses features across the modalities and then produces the delineation result for the first modality. The model can extract and fuse the key features of the images of the multiple modalities, effectively improving the delineation accuracy for the first-modality medical image, and only the first-modality sample images need ground-truth delineations, which reduces the manual annotation cost.
To improve the computational efficiency of the model and reduce its computational complexity, the target neural network model at least comprises a plurality of encoders, an attention layer, and a decoder, wherein the plurality of encoders share weights, and the attention layer is a non-local attention layer or a multi-head cross attention layer.
Specifically, as shown in FIG. 2, the target neural network model comprises a plurality of encoders, an attention layer, and a decoder. Because there are registration errors between the multimodal images, a single-encoder model would reduce the delineation accuracy, even below that obtained by training on CT images alone, while directly adopting a multi-encoder, multi-decoder model would greatly increase the computational complexity and parameter count of the network. Adopting a multi-encoder, single-decoder model reduces the computational complexity while also relaxing the model's requirement on registration accuracy. The encoders in the target neural network model are multiple, but they share weights, so the parameter count of the target neural network model matches that of a single-modality delineation network.
It should be noted that, as shown in FIG. 2, the target neural network model may further include a sigmoid function, which maps the output of the decoder to the range 0 to 1 to obtain the delineation result for the medical image of the first modality at the target layer.
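For concreteness, a minimal PyTorch sketch of this weight-sharing layout follows; the module boundaries and the final sigmoid mirror FIG. 2, while all names and shapes are illustrative assumptions rather than the patent's actual network:

```python
import torch
import torch.nn as nn

class MultiModalDelineator(nn.Module):
    """One encoder applied to every modality (weight sharing), an attention
    fusion layer, and a single decoder, per the multi-encoder, single-decoder
    design described above."""

    def __init__(self, encoder: nn.Module, attention: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder      # reused for all modalities, so the parameter
        self.attention = attention  # count matches a single-modality network
        self.decoder = decoder

    def forward(self, ct: torch.Tensor, pt: torch.Tensor, mr: torch.Tensor) -> torch.Tensor:
        f_ref = self.encoder(ct)    # feature map of the first (main) modality
        f_seq = self.encoder(pt)    # feature map of the second modality
        f_oth = self.encoder(mr)    # feature map of the third modality
        fused = self.attention(f_ref, f_seq, f_oth)
        return torch.sigmoid(self.decoder(fused))  # delineation map in (0, 1)
```

Reusing a single `encoder` instance is what "multiple encoders sharing weights" amounts to in practice: three encoding passes, one set of parameters.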
How the delineation is obtained through the target neural network model is crucial. Obtaining, through the target neural network model, the delineation result for the medical image of the first modality at the target layer comprises: performing feature extraction on the first-, second-, and third-modality medical images through the plurality of encoders to obtain a first feature map corresponding to the first-modality medical image at the target layer, a second feature map corresponding to the second-modality medical image at the target layer, and a third feature map corresponding to the third-modality medical image at the target layer; performing feature fusion on the first, second, and third feature maps through the attention layer to obtain a target feature map; and decoding the target feature map through the decoder to obtain the delineation result for the medical image of the first modality at the target layer.
Specifically, the first feature map corresponding to the first-modality medical image at the target layer is obtained by encoding the multiple layers of first-modality medical images through a first encoder of the plurality of encoders, the second feature map corresponding to the second-modality medical image at the target layer is obtained by encoding the multiple layers of second-modality medical images through a second encoder, and the third feature map corresponding to the third-modality medical image at the target layer is obtained by encoding the multiple layers of third-modality medical images through a third encoder. That is, each encoder performs feature extraction on the medical images of one modality, yielding one feature map per modality.
After the feature maps corresponding to the medical images of each modality are obtained, the attention layer in the target neural network model fuses the first, second, and third feature maps, realizing the fusion of the multimodal medical images and producing the fused target feature map; finally, the decoder decodes the target feature map to obtain the final delineation result.
In an alternative embodiment, the delineation of the first-modality medical image may be obtained using equation (1):
$$
\begin{aligned}
F_{\mathrm{Ref}} &= \mathrm{Encoder}(CT), \quad F_{\mathrm{Seq}} = \mathrm{Encoder}'(PT), \quad F_{\ldots} = \mathrm{Encoder}''(\ldots), \\
F_{\mathrm{fusion}} &= \mathrm{Concat}(F_{\mathrm{Ref}}, F_{\mathrm{Seq}}, F_{\ldots}), \\
F_{\mathrm{fusion}}' &= \mathrm{Att}(F_{\mathrm{fusion}}), \\
O &= S(\mathrm{Decoder}(F_{\mathrm{fusion}}')) \qquad (1)
\end{aligned}
$$

wherein $F_{\mathrm{Ref}}$ is the feature map corresponding to the CT medical image; $F_{\mathrm{Seq}}$ is the feature map corresponding to the PT medical image; $F_{\ldots}$ denotes the feature maps corresponding to the other-modality medical images; $\mathrm{Encoder}$, $\mathrm{Encoder}'$, $\mathrm{Encoder}''$ denote the encoding processes; $F_{\mathrm{fusion}}$ is the concatenated feature map; $F_{\mathrm{fusion}}'$ is the feature map after the attention layer; $O$ is the final delineation result; $S(\mathrm{Decoder}(\cdot))$ denotes the decoding process (with $S$ the output mapping, i.e., the sigmoid of FIG. 2); $CT$ denotes the selected main-sequence image (for example, the first-modality medical image); $PT$ denotes a secondary-sequence image (for example, the second-modality medical image); $\ldots$ denotes the remaining sequence images (for example, the third-modality medical image); $\mathrm{Concat}$ denotes concatenation of the feature maps along the channel dimension; and $\mathrm{Att}$ denotes the attention layer.
It should be noted that a largest-connected-component post-processing step may be applied to the delineation result output by the target neural network model, i.e., only the largest connected component of the delineation result on the image is retained.
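A minimal sketch of this largest-connected-component post-processing, assuming a binary mask and using SciPy's labeling (the library choice is an assumption; the patent does not specify an implementation):

```python
import numpy as np
from scipy import ndimage

def keep_largest_component(mask: np.ndarray) -> np.ndarray:
    """Retain only the largest connected component of a binary delineation mask."""
    labels, n_components = ndimage.label(mask)
    if n_components == 0:
        return mask  # nothing delineated
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n_components + 1))
    largest = int(np.argmax(sizes)) + 1  # component labels start at 1
    return (labels == largest).astype(mask.dtype)
```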
To improve the accuracy of the multimodal feature fusion: if the attention layer is a non-local attention layer, performing feature fusion on the first, second, and third feature maps through the attention layer to obtain the target feature map comprises: concatenating the first, second, and third feature maps to obtain a first initial feature map; and calculating a first similarity between the regions of the first initial feature map and obtaining the target feature map from the first initial feature map according to the first similarity, wherein the first similarity is calculated from the feature weight of the current region and the feature weights of the other regions.
If the attention layer is a multi-head cross attention layer, performing feature fusion on the first, second, and third feature maps through the attention layer to obtain the target feature map comprises: concatenating the second and third feature maps to obtain a second initial feature map; and calculating a second similarity between the first feature map and the second initial feature map and obtaining the target feature map from the first feature map and the second initial feature map according to the second similarity.
Specifically, the attention layer may be a non-local attention layer (Non-Local) or a multi-head cross attention layer (multi-modal self-attention). When the attention layer is non-local, as shown in FIG. 3, the first, second, and third feature maps are concatenated to obtain the first initial feature map; the similarity between each region of the first initial feature map and the remaining regions is then calculated, the first similarity being computed from the feature weight of the current region and the feature weights of the other regions; and the final target feature map is obtained from the first initial feature map using the calculated similarities. In FIG. 3, 1x1x1 denotes the size of the convolution kernels, and the target feature map is obtained by successive convolution operations on the first initial feature map.
In an alternative embodiment, if the attention layer is a non-local attention layer (Non-Local), equation (2) may be used to obtain the final target feature map:
$$
y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j) \qquad (2)
$$

wherein $y_i$ is the target feature map; $x$ is the input sequence (for example, the feature map); $i$ and $j$ are element indices, which may denote spatial positions, temporal positions, or spatio-temporal positions, and the sum runs over all positions $j$; $f(x_i, x_j)$ is a scalar function acting on the elements at positions $i$ and $j$ of the input sequence, i.e., the weight coefficient; $g(x_j)$ is a transformation function acting on the element at position $j$ of the input sequence; and $C(x)$ is a normalization function.
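A sketch of a non-local block implementing equation (2), with $f$ realized in the common embedded-Gaussian form (softmax of dot products) and the 1x1x1 convolutions of FIG. 3; the channel sizes and the residual connection are illustrative assumptions:

```python
import torch
import torch.nn as nn

class NonLocalBlock3d(nn.Module):
    """y_i = (1/C(x)) * sum_j f(x_i, x_j) g(x_j) over all positions of a
    3D feature map, with f/C realized as a softmax over dot products."""

    def __init__(self, channels: int):
        super().__init__()
        inter = max(channels // 2, 1)
        self.theta = nn.Conv3d(channels, inter, kernel_size=1)  # embeds x_i
        self.phi = nn.Conv3d(channels, inter, kernel_size=1)    # embeds x_j
        self.g = nn.Conv3d(channels, inter, kernel_size=1)      # transforms x_j
        self.out = nn.Conv3d(inter, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, d, h, w = x.shape
        n = d * h * w
        q = self.theta(x).view(b, -1, n).transpose(1, 2)  # (b, n, c')
        k = self.phi(x).view(b, -1, n)                    # (b, c', n)
        v = self.g(x).view(b, -1, n).transpose(1, 2)      # (b, n, c')
        attn = torch.softmax(q @ k, dim=-1)               # f(x_i, x_j) / C(x)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, d, h, w)
        return x + self.out(y)                            # residual fusion
```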
When the attention layer is multi-modal self-attention, as shown in FIG. 4, the second and third feature maps are concatenated, for example the PET feature map and the T1 feature map, to obtain the second initial feature map; here T1 is the first sequence of the MR medical image. After the second initial feature map and the first feature map are obtained, they are input into the multi-head cross attention layer, the second similarity between the first feature map and the second initial feature map is calculated, and the target feature map is obtained from the first feature map and the second initial feature map according to the calculated second similarity. As shown in FIG. 4, the attention layer further includes an addition-and-normalization layer (Add & Norm), which smooths and integrates the output of the multi-modal self-attention.
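A sketch of this multi-head cross attention with Add & Norm, using PyTorch's nn.MultiheadAttention over flattened feature maps; the head count and token layout are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MultiModalCrossAttention(nn.Module):
    """The first-modality feature map queries the concatenated second/third
    feature maps; an Add & Norm step integrates the attended output."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, f_first: torch.Tensor, f_second: torch.Tensor,
                f_third: torch.Tensor) -> torch.Tensor:
        # each f_*: (batch, tokens, dim), i.e. a flattened feature map
        ctx = torch.cat([f_second, f_third], dim=1)   # second initial feature map
        fused, _ = self.attn(query=f_first, key=ctx, value=ctx)
        return self.norm(f_first + fused)             # Add & Norm
```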
In summary, the attention layer fuses the features of the multimodal medical images more accurately, thereby improving the accuracy of the medical image delineation.
How the first, second, and third feature maps are obtained is also crucial. In the method for delineating a multimodal medical image provided in the embodiments of the present application, performing feature extraction on the multiple layers of first-, second-, and third-modality medical images through the plurality of encoders to obtain the first feature map corresponding to the first-modality medical image at the target layer, the second feature map corresponding to the second-modality medical image at the target layer, and the third feature map corresponding to the third-modality medical image at the target layer comprises: performing, through one of the plurality of encoders, feature extraction on the multiple layers of medical images of a target modality to obtain an initial feature map for each layer's medical image of the target modality, wherein the target modality is one of the first, second, and third modalities; and performing feature fusion on the initial feature maps of the layers to obtain the feature map corresponding to the medical image of the target modality at the target layer.
Specifically, taking the first modality as an example, the first encoder of the plurality of encoders extracts the feature information of each layer's first-modality medical image to obtain an initial feature map per layer, and the initial feature maps of the layers adjacent to the target layer are then fused to obtain the first feature map of the first-modality medical image at the target layer.
It should be noted that the second feature map corresponding to the second-modality medical image and the third feature map corresponding to the third-modality medical image are obtained in the same way and are not described again here.
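A sketch of this per-layer extraction and adjacent-layer fusion; the mean fusion here is an illustrative assumption, since the text only states that the per-layer initial feature maps are fused:

```python
import torch
import torch.nn as nn

def target_layer_feature_map(encoder: nn.Module, slices: torch.Tensor) -> torch.Tensor:
    """Encode each slice of one modality separately, then fuse the initial
    feature maps of the target layer and its adjacent layers.

    slices: (L, C, H, W), the target slice plus its adjacent layers.
    """
    initial_maps = torch.stack([encoder(s.unsqueeze(0)) for s in slices])  # (L, 1, ...)
    return initial_maps.mean(dim=0)  # fused feature map for the target layer
```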
To further improve the accuracy of the medical image delineation, in the method for delineating a multimodal medical image provided in the embodiments of the present application, before the medical image to be delineated is input into the target neural network model, the method further comprises: rigidly registering the multiple layers of first-modality, second-modality, and third-modality medical images to obtain a registered medical image to be delineated; and preprocessing the registered medical image to be delineated to obtain a processed medical image to be delineated. Inputting the medical image to be delineated into the target neural network model and obtaining the delineation result for the medical image of the first modality at the target layer then comprises: inputting the processed medical image to be delineated into the target neural network model and obtaining, through the target neural network model, the delineation result for the medical image of the first modality at the target layer.
Specifically, to further improve the delineation accuracy, the multiple layers of first-modality, second-modality, and third-modality medical images may be rigidly registered to obtain the registered medical image to be delineated.
After registration, the registered medical image to be delineated may be preprocessed, for example by cropping, filtering, and normalization, to obtain the processed medical image to be delineated.
Finally, the processed medical image to be delineated is input into the target neural network model to obtain the delineation result for the medical image of the first modality at the target layer.
In an alternative embodiment, different registration approaches are used for medical images of different modalities. If the first-modality medical image is a CT image, the second-modality medical image a PET-CT image, and the third-modality medical image an MR image, then when rigidly registering the CT image and the PET-CT image, the spatial rotation matrix between the two is obtained and applied to the PET-CT image, yielding the rigidly registered PET-CT image.
When rigidly registering the CT medical image and the MR medical image, mutual-information rigid registration is performed to obtain a registered MR image, and hence the registered medical image to be delineated.
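A sketch of mutual-information rigid registration of the MR image onto the CT image using SimpleITK; the library choice, the optimizer, and all parameter values are assumptions, since the patent does not name an implementation:

```python
import SimpleITK as sitk

def rigid_register_mi(fixed_ct: sitk.Image, moving_mr: sitk.Image) -> sitk.Image:
    """Rigidly register moving_mr onto fixed_ct by maximizing mutual information."""
    fixed = sitk.Cast(fixed_ct, sitk.sitkFloat32)    # registration needs float pixels
    moving = sitk.Cast(moving_mr, sitk.sitkFloat32)

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetOptimizerAsRegularStepGradientDescent(
        learningRate=1.0, minStep=1e-4, numberOfIterations=200)
    reg.SetInterpolator(sitk.sitkLinear)
    reg.SetInitialTransform(
        sitk.CenteredTransformInitializer(
            fixed, moving, sitk.Euler3DTransform(),
            sitk.CenteredTransformInitializerFilter.GEOMETRY),
        inPlace=False)

    transform = reg.Execute(fixed, moving)
    # Resample the MR image into the CT image's grid with the found rigid transform.
    return sitk.Resample(moving_mr, fixed_ct, transform, sitk.sitkLinear, 0.0)
```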
A note on mutual-information rigid registration: the mutual information measures the degree of dependence between two different-modality images X and Y, i.e., the distance between their joint probability distribution and the product distribution they would have if completely independent. The specific calculation is given in equation (3):
$$
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} \qquad (3)
$$

wherein $I(X;Y)$ is the degree of dependence between $X$ and $Y$, $p(x,y)$ is the joint probability distribution function of $X$ and $Y$, and $p(x)$ and $p(y)$ are the marginal probability distribution functions of $X$ and $Y$, respectively.
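A minimal NumPy sketch of equation (3), estimated from a joint intensity histogram of two aligned images (the bin count is an arbitrary assumption):

```python
import numpy as np

def mutual_information(x: np.ndarray, y: np.ndarray, bins: int = 32) -> float:
    """Estimate I(X;Y) of two aligned images from their joint histogram."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    p_xy = joint / joint.sum()               # joint distribution p(x, y)
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal p(y)
    nonzero = p_xy > 0                       # skip empty bins to avoid log(0)
    return float(np.sum(p_xy[nonzero] * np.log(p_xy[nonzero] / (p_x * p_y)[nonzero])))
```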
It should be noted that using different registration approaches for medical images of different modalities effectively improves the accuracy of the rigid registration.
In an alternative embodiment, preprocessing of the registered medical image to be delineated may be performed as follows: different window widths and window levels are selected for the medical images of different modalities (for example, CT medical image: 400-400; MR medical image: -1000); the corresponding image content is extracted from the registered medical image to be delineated according to the selected window width and window level; and finally the pixel values of the medical image to be delineated are normalized (for example, by Z-score normalization) to obtain the final processed medical image to be delineated.
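A sketch of this windowing-plus-normalization step; the window bounds are placeholders for the modality-dependent values mentioned above, and Z-score normalization follows the example in the text:

```python
import numpy as np

def window_and_normalize(image: np.ndarray, low: float, high: float) -> np.ndarray:
    """Clip intensities to a modality-specific window, then Z-score normalize."""
    windowed = np.clip(image, low, high)
    return (windowed - windowed.mean()) / (windowed.std() + 1e-8)
```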
In an alternative embodiment, the target neural network model may be trained with the following steps. First, a training sample set for training the target neural network model is obtained, comprising the multimodal medical images and the label for one of the modalities, i.e., the first-modality sample images with their ground-truth delineation results, together with the corresponding second-modality and third-modality sample images.
A modality is then chosen as the main-sequence modality and the others as secondary-sequence modalities; for example, the first-modality sample image serves as the main-sequence sample image, the second- and third-modality sample images serve as secondary-sequence sample images, and the main-sequence sample image is rigidly registered with the secondary-sequence sample images.
After rigid registration, the registered sample images are preprocessed, and the processed sample images are finally input into an initial neural network model, which is trained to obtain the trained target neural network model.
It should be noted that, to increase the diversity of the training samples, image enhancement may be applied to the sample images, for example rotation, cropping, flipping, zooming in and out, histogram shifting, Gaussian blurring, Gaussian sharpening, and resampling. Rotation, cropping, flipping, and zooming are applied simultaneously to all input images, while the other data enhancement methods are applied randomly to a single-modality image; such image enhancement effectively improves the robustness of the target neural network model.
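A sketch of this augmentation policy, with flips standing in for the geometric transforms applied to all modalities at once and Gaussian noise standing in for the single-modality intensity transforms (both substitutions are illustrative, not the patent's transform set):

```python
import random
import torch

def augment(volumes: list) -> list:
    """Apply one shared geometric transform to every modality volume and one
    random intensity transform to a single randomly chosen modality."""
    if random.random() < 0.5:                       # same flip for all inputs
        volumes = [torch.flip(v, dims=[-1]) for v in volumes]
    i = random.randrange(len(volumes))              # single-modality perturbation
    volumes[i] = volumes[i] + 0.01 * torch.randn_like(volumes[i])
    return volumes
```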
According to the method for delineating a multimodal medical image provided in the embodiments of the present application, a medical image to be delineated is acquired, the medical image to be delineated at least comprising multiple layers of medical images of a first modality, multiple layers of medical images of a second modality, and multiple layers of medical images of a third modality; the medical image to be delineated is input into a target neural network model, and a predicted delineation result for the medical image of the first modality at the target layer is output through the target neural network model, the target neural network model being trained on a training sample set comprising at least first-modality sample images with their corresponding delineation results, second-modality sample images, and third-modality sample images. This solves the problem in the related art that the delineation accuracy of network models on multimodal medical images is low. In this scheme, the first-, second-, and third-modality medical images are input directly into the target neural network model, which fuses features across the modalities and then produces the delineation result for the first modality; the model can extract and fuse the key features of the images of the multiple modalities, effectively improving the delineation accuracy for the first-modality medical image; and since only the first-modality sample images in the training sample set carry ground-truth delineations, the manual annotation cost is reduced and the efficiency of medical image delineation is effectively improved.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one shown here.
The embodiments of the present application also provide a device for delineating a multimodal medical image. It should be noted that this device may be used to perform the method for delineating a multimodal medical image provided in the embodiments of the present application. The device is described below.
FIG. 5 is a schematic diagram of a device for delineating a multimodal medical image according to an embodiment of the application. As shown in FIG. 5, the device comprises an acquisition unit 501 and an output unit 502.
The acquisition unit 501 is configured to acquire a medical image to be delineated, wherein the medical image to be delineated at least comprises multiple layers of medical images of a first modality, multiple layers of medical images of a second modality, and multiple layers of medical images of a third modality.
The output unit 502 is configured to input the medical image to be delineated into a target neural network model and to output, through the target neural network model, a predicted delineation result for the medical image of the first modality at a target layer, wherein the target neural network model is trained on a training sample set comprising at least: first-modality sample images and their corresponding delineation results, second-modality sample images, and third-modality sample images.
According to the device for delineating a multimodal medical image provided in the embodiments of the present application, the acquisition unit 501 acquires the medical image to be delineated, and the output unit 502 inputs it into the target neural network model and outputs the predicted delineation result for the medical image of the first modality at the target layer. This solves the problem in the related art that the delineation accuracy of network models on multimodal medical images is low: the first-, second-, and third-modality medical images are input directly into the target neural network model, which fuses features across the modalities and produces the delineation result for the first modality; the model extracts and fuses the key features of the images of the multiple modalities, effectively improving the delineation accuracy for the first-modality medical image; and since only the first-modality sample images carry ground-truth delineations, the manual annotation cost is reduced and the efficiency of medical image delineation is effectively improved.
Optionally, in the device for delineating a multimodal medical image provided in the embodiments of the present application, the target neural network model at least comprises a plurality of encoders, an attention layer, and a decoder, wherein the plurality of encoders share weights, and the attention layer is a non-local attention layer or a multi-head cross attention layer.
Optionally, the output unit comprises: an extraction module, configured to perform feature extraction on the first-, second-, and third-modality medical images through the plurality of encoders to obtain a first feature map corresponding to the first-modality medical image at the target layer, a second feature map corresponding to the second-modality medical image at the target layer, and a third feature map corresponding to the third-modality medical image at the target layer; a fusion module, configured to perform feature fusion on the first, second, and third feature maps through the attention layer to obtain a target feature map; and a decoding module, configured to decode the target feature map through the decoder to obtain the delineation result for the medical image of the first modality at the target layer.
Optionally, if the attention layer is a non-local attention layer, the fusion module comprises: a first concatenation submodule, configured to concatenate the first, second, and third feature maps to obtain a first initial feature map; and a first calculation submodule, configured to calculate a first similarity between the regions of the first initial feature map and to obtain the target feature map from the first initial feature map according to the first similarity, wherein the first similarity is calculated from the feature weight of the current region and the feature weights of the other regions.
Optionally, if the attention layer is a multi-head cross attention layer, the fusion module comprises: a second concatenation submodule, configured to concatenate the second and third feature maps to obtain a second initial feature map; and a second calculation submodule, configured to calculate a second similarity between the first feature map and the second initial feature map and to obtain the target feature map from the first feature map and the second initial feature map according to the second similarity.
Optionally, the extraction module comprises: an extraction submodule, configured to perform, through one of the plurality of encoders, feature extraction on the multiple layers of medical images of a target modality to obtain an initial feature map for each layer's medical image of the target modality, wherein the target modality is one of the first, second, and third modalities; and a fusion submodule, configured to perform feature fusion on the initial feature maps of the layers to obtain the feature map corresponding to the medical image of the target modality at the target layer.
Optionally, the device further comprises: a registration unit, configured to rigidly register the multiple layers of first-modality, second-modality, and third-modality medical images before the medical image to be delineated is input into the target neural network model, to obtain a registered medical image to be delineated; and a preprocessing unit, configured to preprocess the registered medical image to be delineated to obtain a processed medical image to be delineated. The output unit is further configured to input the processed medical image to be delineated into the target neural network model and to obtain, through the target neural network model, the delineation result for the medical image of the first modality at the target layer.
The device for delineating a multimodal medical image comprises a processor and a memory; the acquisition unit 501, the output unit 502, and the other units are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.
The processor includes a kernel, which fetches the corresponding program units from the memory. One or more kernels may be provided, and the delineation of the multimodal medical image is realized by adjusting kernel parameters.
The memory may include volatile memory, random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM), among other forms in computer-readable media; the memory includes at least one memory chip.
Embodiments of the present invention provide a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the method for delineating a multimodal medical image.
Embodiments of the present invention provide a processor for running a program, wherein the program, when run, performs the method for delineating a multimodal medical image.
As shown in FIG. 6, an embodiment of the present invention provides an electronic device comprising a processor, a memory, and a program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the following steps: acquiring a medical image to be delineated, wherein the medical image to be delineated at least comprises multiple layers of medical images of a first modality, multiple layers of medical images of a second modality, and multiple layers of medical images of a third modality; and inputting the medical image to be delineated into a target neural network model and outputting, through the target neural network model, a predicted delineation result for the medical image of the first modality at the target layer, wherein the target neural network model is trained on a training sample set comprising at least: first-modality sample images and their corresponding delineation results, second-modality sample images, and third-modality sample images.
Optionally, the target neural network model includes at least: a plurality of encoders, an attention layer, and a decoder, wherein the plurality of encoders share weights, and the attention layer is a non-local attention layer or a multi-head cross attention layer.
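To make the weight-sharing arrangement concrete, the following is a minimal sketch in PyTorch: a single encoder module is instantiated once and applied to every modality, so all three branches necessarily share weights. The name SharedEncoder and the layer sizes are illustrative assumptions, not the disclosed implementation.

```python
import torch.nn as nn

class SharedEncoder(nn.Module):
    """A single encoder applied to each modality in turn, so that the
    three encoding branches share one set of weights (sizes are illustrative)."""

    def __init__(self, in_channels: int = 1, features: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, features, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(features, features, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # x: (batch, in_channels, H, W) -- one slice of one modality
        return self.net(x)
```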
Optionally, obtaining, through the target neural network model, the sketching result of the medical image of the first modality of the target layer includes: extracting features of the multiple layers of medical images of the first modality, the multiple layers of medical images of the second modality, and the multiple layers of medical images of the third modality through the plurality of encoders to obtain a first feature map corresponding to the medical image of the first modality of the target layer, a second feature map corresponding to the medical image of the second modality of the target layer, and a third feature map corresponding to the medical image of the third modality of the target layer; performing feature fusion on the first feature map, the second feature map, and the third feature map through the attention layer to obtain a target feature map; and decoding the target feature map through the decoder to obtain the sketching result of the medical image of the first modality of the target layer.
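The encode-fuse-decode flow just described can be sketched end to end as follows, again in PyTorch. MultiModalDelineator, the m1/m2/m3 argument names, and the two-layer decoder head are hypothetical; the attention module can be either of the two variants sketched below.

```python
import torch
import torch.nn as nn

class MultiModalDelineator(nn.Module):
    """Sketch of the pipeline: shared encoder per modality, attention
    fusion into a target feature map, decoder producing the delineation."""

    def __init__(self, encoder: nn.Module, attention: nn.Module,
                 fused_channels: int, num_classes: int = 2):
        super().__init__()
        self.encoder = encoder      # one instance -> weights shared
        self.attention = attention  # non-local or multi-head cross attention
        self.decoder = nn.Sequential(
            nn.Conv2d(fused_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, num_classes, kernel_size=1),  # per-pixel logits
        )

    def forward(self, m1, m2, m3):
        # One feature map per modality, all from the same shared encoder.
        f1, f2, f3 = self.encoder(m1), self.encoder(m2), self.encoder(m3)
        fused = self.attention(f1, f2, f3)  # the target feature map
        return self.decoder(fused)          # delineation logits for m1's layer
```

An argmax over the output logits would then yield the delineation mask for the first-modality slice.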
Optionally, if the attention layer is a non-local attention layer, performing feature fusion on the first feature map, the second feature map, and the third feature map through the attention layer to obtain the target feature map includes: splicing the first feature map, the second feature map, and the third feature map to obtain a first initial feature map; and calculating a first similarity between the regions in the first initial feature map, and obtaining the target feature map from the first initial feature map according to the first similarity, wherein the first similarity is calculated from the feature weight of the current region and the feature weights of the other regions.
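One common reading of this non-local step, sketched below in PyTorch, is a self-attention block over the spliced map in which the "first similarity" is the softmax-normalised dot product between every pair of spatial regions; the projection width is an illustrative choice, not the patented configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalFusion(nn.Module):
    """Non-local attention over the concatenated per-modality feature maps."""

    def __init__(self, channels: int):  # channels = sum over the three maps
        super().__init__()
        inner = channels // 2
        self.theta = nn.Conv2d(channels, inner, 1)  # query projection
        self.phi = nn.Conv2d(channels, inner, 1)    # key projection
        self.g = nn.Conv2d(channels, inner, 1)      # value projection
        self.out = nn.Conv2d(inner, channels, 1)

    def forward(self, f1, f2, f3):
        x = torch.cat([f1, f2, f3], dim=1)            # first initial feature map
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, h*w, inner)
        k = self.phi(x).flatten(2)                    # (b, inner, h*w)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, h*w, inner)
        # Pairwise similarity between every region and every other region.
        sim = F.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)
        y = (sim @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                        # target feature map
```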
Optionally, if the attention layer is a multi-head cross attention layer, performing feature fusion on the first feature map, the second feature map, and the third feature map through the attention layer to obtain the target feature map includes: splicing the second feature map and the third feature map to obtain a second initial feature map; and calculating a second similarity between the first feature map and the second initial feature map, and obtaining the target feature map from the first feature map and the second initial feature map according to the second similarity.
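A corresponding sketch for the multi-head variant uses PyTorch's nn.MultiheadAttention with queries from the first feature map and keys/values from the spliced second/third maps; the name CrossModalFusion, the embedding width, and the head count are assumptions.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Multi-head cross attention: f1 queries the spliced (f2, f3) map."""

    def __init__(self, c1: int, c23: int, dim: int = 128, heads: int = 4):
        super().__init__()
        self.q_proj = nn.Conv2d(c1, dim, 1)
        self.kv_proj = nn.Conv2d(c23, dim, 1)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Conv2d(dim, c1, 1)

    def forward(self, f1, f2, f3):
        kv = self.kv_proj(torch.cat([f2, f3], dim=1))  # second initial feature map
        q = self.q_proj(f1)
        b, d, h, w = q.shape
        q_seq = q.flatten(2).transpose(1, 2)    # (b, h*w, dim)
        kv_seq = kv.flatten(2).transpose(1, 2)  # (b, h*w, dim)
        # Second similarity: scaled dot product between f1 and the spliced map.
        fused, _ = self.attn(q_seq, kv_seq, kv_seq)
        fused = fused.transpose(1, 2).reshape(b, d, h, w)
        return f1 + self.out(fused)             # target feature map
```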
Optionally, extracting features of the multiple layers of medical images of the first modality, the multiple layers of medical images of the second modality, and the multiple layers of medical images of the third modality through the plurality of encoders to obtain the first feature map corresponding to the medical image of the first modality of the target layer, the second feature map corresponding to the medical image of the second modality of the target layer, and the third feature map corresponding to the medical image of the third modality of the target layer includes: extracting features of multiple layers of medical images of a target modality through one encoder of the plurality of encoders to obtain an initial feature map corresponding to each layer of the medical image of the target modality, wherein the target modality is one of the following: the first modality, the second modality, and the third modality; and performing feature fusion on the initial feature maps corresponding to the layers of medical images of the target modality to obtain the feature map corresponding to the medical image of the target modality of the target layer.
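As a small sketch of this per-layer step, the shared encoder can be run over each slice of one modality independently and the resulting initial feature maps then fused; a plain average is used below as the fusion rule, which is an assumption since the patent does not fix the exact operation.

```python
import torch
import torch.nn as nn

def fuse_modality_layers(encoder: nn.Module, slices: torch.Tensor) -> torch.Tensor:
    """Encode each layer (slice) of one modality, then fuse the initial maps.

    slices: (num_layers, channels, H, W) -- adjacent slices around the
    target layer. Returns the fused feature map for the target layer.
    """
    initial_maps = torch.stack([encoder(s.unsqueeze(0)) for s in slices], dim=0)
    return initial_maps.mean(dim=0)  # simple average as the fusion rule
```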
Optionally, before inputting the medical image to be sketched into the target neural network model, the method further includes: performing rigid registration on the multiple layers of medical images of the first modality, the multiple layers of medical images of the second modality, and the multiple layers of medical images of the third modality to obtain a registered medical image to be sketched; and preprocessing the registered medical image to be sketched to obtain a processed medical image to be sketched. In this case, inputting the medical image to be sketched into the target neural network model and obtaining the sketching result of the medical image of the first modality of the target layer through the target neural network model includes: inputting the processed medical image to be sketched into the target neural network model, and obtaining the sketching result of the medical image of the first modality of the target layer through the target neural network model.
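The registration and preprocessing stage could be prototyped with SimpleITK, for example mutual-information-driven rigid (Euler) registration of each secondary-modality volume onto the first, followed by resampling onto the first modality's grid and a z-score normalisation; the metric, optimizer, and iteration settings below are illustrative defaults, not the patented configuration.

```python
import SimpleITK as sitk

def rigid_register(fixed: sitk.Image, moving: sitk.Image) -> sitk.Image:
    """Rigidly register `moving` onto `fixed` and resample to its grid."""
    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetOptimizerAsRegularStepGradientDescent(
        learningRate=1.0, minStep=1e-4, numberOfIterations=200)
    reg.SetInitialTransform(sitk.CenteredTransformInitializer(
        fixed, moving, sitk.Euler3DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY))
    reg.SetInterpolator(sitk.sitkLinear)
    tx = reg.Execute(sitk.Cast(fixed, sitk.sitkFloat32),
                     sitk.Cast(moving, sitk.sitkFloat32))
    return sitk.Resample(moving, fixed, tx, sitk.sitkLinear, 0.0)

def normalize(image: sitk.Image) -> sitk.Image:
    """Z-score normalisation as one possible preprocessing step."""
    return sitk.Normalize(image)
```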
The device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: obtaining a medical image to be sketched, wherein the medical image to be sketched at least includes: multiple layers of medical images of a first modality, multiple layers of medical images of a second modality, and multiple layers of medical images of a third modality; and inputting the medical image to be sketched into a target neural network model, and outputting, through the target neural network model, a predicted sketching result of the medical image of the first modality of the target layer, wherein the target neural network model is trained with a training sample set, and the training sample set at least includes: a first-modality sample image and the sketching result corresponding to the first-modality sample image, a second-modality sample image, and a third-modality sample image.
Optionally, the target neural network model includes at least: a plurality of encoders, an attention layer, and a decoder, wherein the plurality of encoders share weights, and the attention layer is a non-local attention layer or a multi-head cross attention layer.
Optionally, obtaining, through the target neural network model, the sketching result of the medical image of the first modality of the target layer includes: extracting features of the multiple layers of medical images of the first modality, the multiple layers of medical images of the second modality, and the multiple layers of medical images of the third modality through the plurality of encoders to obtain a first feature map corresponding to the medical image of the first modality of the target layer, a second feature map corresponding to the medical image of the second modality of the target layer, and a third feature map corresponding to the medical image of the third modality of the target layer; performing feature fusion on the first feature map, the second feature map, and the third feature map through the attention layer to obtain a target feature map; and decoding the target feature map through the decoder to obtain the sketching result of the medical image of the first modality of the target layer.
Optionally, if the attention layer is a non-local attention layer, performing feature fusion on the first feature map, the second feature map, and the third feature map through the attention layer to obtain the target feature map includes: splicing the first feature map, the second feature map, and the third feature map to obtain a first initial feature map; and calculating a first similarity between the regions in the first initial feature map, and obtaining the target feature map from the first initial feature map according to the first similarity, wherein the first similarity is calculated from the feature weight of the current region and the feature weights of the other regions.
Optionally, if the attention layer is a multi-head cross attention layer, performing feature fusion on the first feature map, the second feature map, and the third feature map through the attention layer to obtain the target feature map includes: splicing the second feature map and the third feature map to obtain a second initial feature map; and calculating a second similarity between the first feature map and the second initial feature map, and obtaining the target feature map from the first feature map and the second initial feature map according to the second similarity.
Optionally, extracting features of the multiple layers of medical images of the first modality, the multiple layers of medical images of the second modality, and the multiple layers of medical images of the third modality through the plurality of encoders to obtain the first feature map corresponding to the medical image of the first modality of the target layer, the second feature map corresponding to the medical image of the second modality of the target layer, and the third feature map corresponding to the medical image of the third modality of the target layer includes: extracting features of multiple layers of medical images of a target modality through one encoder of the plurality of encoders to obtain an initial feature map corresponding to each layer of the medical image of the target modality, wherein the target modality is one of the following: the first modality, the second modality, and the third modality; and performing feature fusion on the initial feature maps corresponding to the layers of medical images of the target modality to obtain the feature map corresponding to the medical image of the target modality of the target layer.
Optionally, before inputting the medical image to be sketched into the target neural network model, the method further includes: performing rigid registration on the multiple layers of medical images of the first modality, the multiple layers of medical images of the second modality, and the multiple layers of medical images of the third modality to obtain a registered medical image to be sketched; and preprocessing the registered medical image to be sketched to obtain a processed medical image to be sketched. In this case, inputting the medical image to be sketched into the target neural network model and obtaining the sketching result of the medical image of the first modality of the target layer through the target neural network model includes: inputting the processed medical image to be sketched into the target neural network model, and obtaining the sketching result of the medical image of the first modality of the target layer through the target neural network model.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms in computer-readable media such as volatile memory, random access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (10)

1. A method of delineating a multi-modal medical image, comprising:
obtaining a medical image to be sketched, wherein the medical image to be sketched at least comprises: a plurality of layers of medical images of a first modality, a plurality of layers of medical images of a second modality, and a plurality of layers of medical images of a third modality;
inputting the medical image to be sketched into a target neural network model, and outputting, through the target neural network model, a predicted sketching result of the medical image of the first modality of the target layer, wherein the target neural network model is obtained by training with a training sample set, and the training sample set at least comprises: a first-modality sample image and a sketching result corresponding to the first-modality sample image, a second-modality sample image, and a third-modality sample image.
2. The method according to claim 1, wherein the target neural network model comprises at least: a plurality of encoders, an attention layer, and a decoder, wherein the plurality of encoders share weights, and the attention layer is a non-local attention layer or a multi-head cross attention layer.
3. The method according to claim 2, wherein obtaining, through the target neural network model, the sketching result of the medical image of the first modality of the target layer comprises:
extracting features of the multiple layers of medical images of the first modality, the multiple layers of medical images of the second modality, and the multiple layers of medical images of the third modality through the plurality of encoders to obtain a first feature map corresponding to the medical image of the first modality of the target layer, a second feature map corresponding to the medical image of the second modality of the target layer, and a third feature map corresponding to the medical image of the third modality of the target layer;
performing feature fusion on the first feature map, the second feature map and the third feature map through the attention layer to obtain a target feature map;
and decoding the target feature map through the decoder to obtain a sketching result of the medical image of the first modality of the target layer.
4. The method of claim 3, wherein if the attention layer is a non-local attention layer, performing feature fusion on the first feature map, the second feature map, and the third feature map by the attention layer to obtain a target feature map comprises:
splicing the first feature map, the second feature map and the third feature map to obtain a first initial feature map;
and calculating a first similarity between the regions in the first initial feature map, and obtaining the target feature map from the first initial feature map according to the first similarity, wherein the first similarity is calculated from the feature weight of the current region and the feature weights of the other regions.
5. The method of claim 3, wherein if the attention layer is a multi-head cross attention layer, performing feature fusion on the first feature map, the second feature map, and the third feature map by the attention layer to obtain a target feature map comprises:
splicing the second feature map and the third feature map to obtain a second initial feature map;
and calculating a second similarity between the first feature map and the second initial feature map, and obtaining the target feature map from the first feature map and the second initial feature map according to the second similarity.
6. The method according to claim 3, wherein extracting features of the multiple layers of medical images of the first modality, the multiple layers of medical images of the second modality, and the multiple layers of medical images of the third modality through the plurality of encoders to obtain the first feature map corresponding to the medical image of the first modality of the target layer, the second feature map corresponding to the medical image of the second modality of the target layer, and the third feature map corresponding to the medical image of the third modality of the target layer comprises:
extracting features of multiple layers of medical images of a target modality through one encoder of the plurality of encoders to obtain an initial feature map corresponding to each layer of the medical image of the target modality, wherein the target modality is one of the following: the first modality, the second modality, and the third modality;
and performing feature fusion on the initial feature maps corresponding to the layers of medical images of the target modality to obtain the feature map corresponding to the medical image of the target modality of the target layer.
7. The method according to claim 1, wherein before inputting the medical image to be sketched into the target neural network model, the method further comprises:
performing rigid registration on the plurality of layers of medical images of the first modality, the plurality of layers of medical images of the second modality, and the plurality of layers of medical images of the third modality to obtain a registered medical image to be sketched;
preprocessing the registered medical image to be sketched to obtain a processed medical image to be sketched;
inputting the medical image to be sketched into a target neural network model, and obtaining the sketching result of the medical image of the first modality of the target layer through the target neural network model comprises the following steps:
and inputting the processed medical image to be sketched into the target neural network model, and obtaining the sketching result of the medical image of the first modality of the target layer through the target neural network model.
8. A multi-modal medical image delineating apparatus, comprising:
an acquisition unit, configured to acquire a medical image to be sketched, wherein the medical image to be sketched at least comprises: a plurality of layers of medical images of a first modality, a plurality of layers of medical images of a second modality, and a plurality of layers of medical images of a third modality; and
an output unit, configured to input the medical image to be sketched into a target neural network model, and output, through the target neural network model, a predicted sketching result of the medical image of the first modality of the target layer, wherein the target neural network model is obtained by training with a training sample set, and the training sample set at least comprises: a first-modality sample image and a sketching result corresponding to the first-modality sample image, a second-modality sample image, and a third-modality sample image.
9. A computer-readable storage medium, characterized in that the storage medium stores a program, wherein the program performs the delineating method of a multimodal medical image as claimed in any one of claims 1 to 7.
10. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of delineating multimodal medical images of any of claims 1-7.
CN202310297602.7A 2023-03-24 2023-03-24 Method and device for sketching multi-mode medical image, storage medium and electronic equipment Active CN116013475B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310297602.7A CN116013475B (en) 2023-03-24 2023-03-24 Method and device for sketching multi-mode medical image, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310297602.7A CN116013475B (en) 2023-03-24 2023-03-24 Method and device for sketching multi-mode medical image, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN116013475A true CN116013475A (en) 2023-04-25
CN116013475B CN116013475B (en) 2023-06-27

Family

ID=86033932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310297602.7A Active CN116013475B (en) 2023-03-24 2023-03-24 Method and device for sketching multi-mode medical image, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116013475B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210326624A1 (en) * 2019-07-03 2021-10-21 Institute Of Automation, Chinese Academy Of Sciences Method, system and device for difference automatic calibration in cross modal target detection
CN112614144A (en) * 2020-12-30 2021-04-06 深圳市联影高端医疗装备创新研究院 Image segmentation method, device, equipment and storage medium
US20230092027A1 (en) * 2021-03-25 2023-03-23 Tencent Technology (Shenzhen) Company Limited Method and apparatus for training medical image report generation model, and image report generation method and apparatus
CN113284105A (en) * 2021-05-24 2021-08-20 中山大学附属第三医院(中山大学肝脏病医院) Method for evaluating spinal cord injury degree based on MRI (magnetic resonance imaging) multi-mode neuroimaging
CN113689938A (en) * 2021-07-14 2021-11-23 福建自贸试验区厦门片区Manteia数据科技有限公司 Medical image delineation method and device, storage medium and processor
CN114897756A (en) * 2022-05-31 2022-08-12 中加健康工程研究院(合肥)有限公司 Model training method, medical image fusion method, device, equipment and medium
CN115100185A (en) * 2022-07-22 2022-09-23 深圳市联影高端医疗装备创新研究院 Image processing method, image processing device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Yuanyuan et al.: "Research on PET/CT multi-modality image recognition based on convolutional neural networks", Video Engineering, vol. 41, no. 3, pages 88-92 *

Also Published As

Publication number Publication date
CN116013475B (en) 2023-06-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant