WO2022007957A1 - Network architecture for automatically processing images, program carrier, and workstation - Google Patents

Network architecture for automatically processing images, program carrier, and workstation Download PDF

Info

Publication number
WO2022007957A1
Authority
WO
WIPO (PCT)
Prior art keywords
images
path
image
network architecture
encoding
Prior art date
Application number
PCT/CN2021/105538
Other languages
French (fr)
Chinese (zh)
Inventor
王少彬
陈颀
陈宇
Original Assignee
北京医智影科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京医智影科技有限公司 filed Critical 北京医智影科技有限公司
Publication of WO2022007957A1 publication Critical patent/WO2022007957A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion

Definitions

  • the invention relates to a network architecture for automatic processing of images, a corresponding computer-readable program carrier and a corresponding workstation.
  • image processing, such as image segmentation, is used to identify specific objects in an image or to automatically outline specific objects in the image.
  • IMRT Intensity-modulated radiation therapy
  • CTV Clinical target volume
  • OAR organ-at-risk
  • the object of the present invention is to provide an improved network architecture for the automatic processing of images, a corresponding computer-readable program carrier and a corresponding workstation.
  • a network architecture for automatic processing of images, comprising: an input module for inputting an image to be processed; an encoding path configured to use a dual-path network to perform feature extraction on the input image; a decoding path configured to establish a connection with the encoding path; a central module configured for the transition from the encoding path to the decoding path, so as to refine high-dimensional image features; and an output module configured to output an image processing result from the decoding path; wherein the decoding path is configured to use the decoding path of the Unet architecture to decode the corresponding encoding result of the encoding path and the image-feature refinement result of the central module.
  • the dual-path network comprises a series of concatenated micro-blocks, the micro-blocks being correspondingly embedded in the decoders of the decoding paths.
  • the encoding path includes 5 encoders
  • the decoding path includes 4 decoders
  • each decoder uses a corresponding microblock in the encoding path.
  • the central module is configured to implement: Conv(3*3)+BN+ReLU; and the decoder is configured to implement: micro-block + bilinear upsampling, where Conv(3*3) is a 3*3 convolution operation, BN is a batch normalization operation, and ReLU is the rectified linear unit function (the upsampling symbol is rendered as an inline image in the original).
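The patent provides no source code; as a framework-free illustration of what the Conv(3*3)+BN+ReLU unit named above computes, the following NumPy sketch may help (all function names are our own, and using single-example statistics for BN is an illustrative simplification):

```python
import numpy as np

def conv3x3(x, w):
    """Naive 3*3 convolution, stride 1, padding 1.
    x: (C_in, H, W) feature map; w: (C_out, C_in, 3, 3) kernels."""
    c_out = w.shape[0]
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(x.shape[0]):
            for dy in range(3):
                for dx in range(3):
                    out[o] += w[o, i, dy, dx] * xp[i, dy:dy + h, dx:dx + wd]
    return out

def batch_norm(x, eps=1e-5):
    """Normalize each channel to zero mean / unit variance
    (statistics taken from this single example, for illustration)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def conv_bn_relu(x, w):
    """The Conv(3*3)+BN+ReLU unit used by the central module."""
    return relu(batch_norm(conv3x3(x, w)))
```

Note that padding 1 keeps the spatial size unchanged, which is consistent with the central module acting as a transition rather than a resampling step.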
  • the input module receives an image or images
  • the output module outputs the image or images
  • the output module is configured to implement: upsampling + Conv(3*3)+BN+ReLU + sigmoid, where the final symbol (rendered as an inline image in the original) represents the sigmoid activation function.
  • the image is a computed tomography image
  • the multiple images are multiple adjacent computed tomography images
  • the network architecture is configured to perform automatic segmentation processing on the images.
  • the input module receives an image or images
  • the output module outputs a classification result
  • the output module is configured to implement: upsampling + Conv(3*3)+BN+ReLU + a fully connected layer, where the final symbol (rendered as an inline image in the original) represents the fully connected layer.
  • a loss function is configured at the end point of the network architecture, and the loss function is a cross-entropy loss function or a weight adaptive loss function:
  • p is the output of the sigmoid activation function; y is the label map, taking the value 0 or 1; I is the input image; N is the total number of segmentation targets; p and y both have shape I*N; and P_n is the computed ratio of the volume of each category's target region to the volume of the overall image.
  • formula (2) is used as the loss function when the network architecture is configured to output multiple segmented images.
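Since formulas (1) and (2) appear only as images in this record, here is a hedged NumPy sketch: (1) is the standard binary cross-entropy over the sigmoid outputs, while for (2) we assume one plausible weight-adaptive form in which each class's cross-entropy term is scaled by the inverse of its volume fraction P_n; the exact weighting used in the patent may differ.

```python
import numpy as np

def cross_entropy_loss(p, y, eps=1e-7):
    """Formula (1), standard form: pixel-wise binary cross-entropy
    between sigmoid outputs p and labels y (both shape I*N, y in {0,1})."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def weight_adaptive_loss(p, y, eps=1e-7):
    """One plausible reading of formula (2): each class n is re-weighted
    by 1/P_n, the fraction of the image volume its target region
    occupies, so small organs are not dominated by large ones.
    This weighting is an assumption, not taken from the patent text."""
    p = np.clip(p, eps, 1 - eps)
    n_classes = y.shape[-1]
    loss = 0.0
    for n in range(n_classes):
        p_n = max(y[..., n].mean(), eps)   # volume fraction of class n
        ce_n = -np.mean(y[..., n] * np.log(p[..., n])
                        + (1 - y[..., n]) * np.log(1 - p[..., n]))
        loss += ce_n / p_n
    return loss / n_classes
```

Either function takes the network's sigmoid output and the gold-standard label map and returns a scalar loss suitable for back-propagation.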
  • the network architecture is configured for automatic segmentation or classification of images for radiotherapy of tumors, especially images for radiotherapy of cervical cancer.
  • a method for processing an image using the network architecture, comprising the steps of: inputting the image through the input module; processing the image through the encoding path, the central module, and the decoding path; and outputting the image processing result through the output module.
  • a computer readable program carrier storing program instructions for implementing the method when executed by a processor.
  • a workstation configured to include the computer-readable program carrier.
  • the workstation is configured as a doctor's workstation for automatic segmentation or classification of medical images.
  • Figure 1 shows a simplified block diagram of a network architecture for automatic segmentation or classification of images according to an exemplary embodiment of the present invention.
  • Figure 2 schematically shows a simplified block diagram of a network architecture according to an exemplary embodiment of the present invention.
  • FIG. 3 schematically shows a simplified block diagram of a network architecture for automatic image segmentation according to another exemplary embodiment of the present invention.
  • FIG. 4 schematically shows a simplified block diagram of a network architecture for automatic segmentation of images according to yet another exemplary embodiment of the present invention.
  • FIG. 5 schematically shows a simplified block diagram for identifying which types of regions of interest are in an image, according to an exemplary embodiment of the present invention.
  • Figure 1 shows a simplified block diagram of a network architecture for automatic segmentation or classification of images according to an exemplary embodiment of the present invention.
  • the network architecture 1 may mainly include an input module 11, an encoding path 12, a central module 13, a decoding path 14 and an output module 15, wherein the input module 11 is used to receive images, such as CT images, the encoding path 12 is configured to use a dual-path network (DPN) to perform feature extraction on the input image, the decoding path 14 is associated with the encoding path 12 to decode the encoding results of the encoding path 12 accordingly, the central module 13 is used for the transition from the encoding path 12 to the decoding path 14 so as to refine high-dimensional image features, and the output module 15 is used to output results that can be used for automatic contouring.
  • the decoding path 14 is configured to decode using the decoding path of the Unet architecture.
  • the dual-path network comprises a residual network (ResNet) path and a dense convolutional network (DenseNet) path.
  • ResNet supports the reuse of features
  • DenseNet supports the exploration of new features, so the dual-path network combines the advantages of both.
  • the dual path network consists of a series of micro-blocks connected in series. Microblocks are at the heart of DPN.
  • microblocks are embedded in the decoder part to replace the standard convolution operation.
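To make the dual-path idea concrete, here is a schematic NumPy sketch of a micro-block's channel bookkeeping; the internal convolutions are abstracted into a caller-supplied `transform`, and the names and channel split are illustrative assumptions, not the patent's:

```python
import numpy as np

def micro_block(x, transform, res_ch, growth):
    """Schematic dual-path micro-block. The first `res_ch` output
    channels are added back to the input (ResNet-style feature reuse);
    the remaining `growth` channels are concatenated (DenseNet-style
    exploration of new features)."""
    y = transform(x)                       # stands in for the block's convs
    assert y.shape[0] == res_ch + growth
    residual = x[:res_ch] + y[:res_ch]     # reuse existing features
    dense = np.concatenate([x[res_ch:], y[res_ch:]])  # grow new features
    return np.concatenate([residual, dense])
```

Each micro-block therefore leaves the residual channels fixed in number while the dense channels grow by `growth`, which is the property that lets DPN combine the advantages of both path types.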
  • Figure 2 schematically shows a simplified block diagram of a network architecture 1 according to an exemplary embodiment of the present invention.
  • the single CT image 16 input to the encoding path 12 usually has 512*512 pixels.
  • 1*512*512 represents a CT image with 512*512 pixels.
  • the CT image 16 undergoes a series of encoder processes in the encoding path 12 to extract features of the image.
  • the final encoded result of the encoding path 12 is input to the central module 13 , and then processed by the central module 13 and then output to the decoding path 14 , so as to realize the transition from the encoding path 12 to the decoding path 14 .
  • the decoding path 14 is also associated with the encoding path 12 as schematically shown by arrow 17 .
  • the encoding path 12 encodes a series of images 16 through 5 encoders/micro-blocks 18, while the decoding path 14 includes 4 decoders 19, each of which uses a corresponding micro-block/encoder 18 in the encoding path 12.
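The 5-encoder / 4-decoder pairing can be sketched as pure shape bookkeeping; the halving/doubling schedule below is an assumption consistent with typical Unet-style designs rather than something the patent states explicitly:

```python
def trace_shapes(h=512, w=512, depth=5):
    """Trace spatial sizes through 5 encoders and 4 decoders.
    Encoder i yields a skip feature at (h/2^i, w/2^i); each decoder
    doubles the resolution (bilinear upsampling) and must match the
    corresponding encoder's skip feature."""
    enc = [(h >> i, w >> i) for i in range(depth)]  # 5 encoder levels
    size = enc[-1]                                  # central module level
    dec = []
    for skip in reversed(enc[:-1]):                 # 4 decoders
        size = (size[0] * 2, size[1] * 2)           # bilinear upsampling
        assert size == skip, "skip sizes must match"
        dec.append(size)
    return enc, dec
```

Running `trace_shapes()` on a 512*512 CT image shows why 5 encoders pair naturally with 4 decoders: four doublings exactly undo four halvings, restoring the input resolution.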
  • exemplary operations within some modules are also schematically represented by symbols.
  • the darker arrows represent Conv(3*3)+BN+ReLU, where Conv(3*3) is a 3*3 convolution operation, BN (Batch Normalization) is a batch normalization operation, and ReLU (Rectified Linear Unit) is a linear rectification function.
  • lighter-colored arrows indicate micro-blocks; further symbols (rendered as inline images in the original) represent the skip-connection relationship, bilinear upsampling, and the sigmoid activation function, respectively.
  • the central module 13 may be configured as: Conv(3*3)+BN+ReLU.
  • after processing by the decoding path 14, the result is passed to the output module 15, which outputs an automatically segmented image 20 of 512*512 pixels.
  • the output module 15 may be configured as: upsampling + Conv(3*3)+BN+ReLU + sigmoid, where the sigmoid activation function is used to output the probability that each pixel belongs to the region of interest (ROI).
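As a small illustration of this final step, the sketch below applies a sigmoid to per-pixel logits and thresholds at 0.5 to obtain a binary mask; the 0.5 threshold is our assumption, since the patent only states that a probability per pixel is output:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def segment(logits, threshold=0.5):
    """Final activation of the output module: turn the last feature
    map's logits into a per-pixel probability of belonging to the ROI,
    then threshold to obtain a binary segmentation mask (assumed
    post-processing step)."""
    prob = sigmoid(logits)
    return prob, (prob >= threshold).astype(np.uint8)
```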
  • the decoder 19 may be configured as: micro-block + bilinear upsampling.
  • the present exemplary embodiment can be used to assist tumor radiation therapy, especially radiation therapy for tumors located among complex organs, such as cervical cancer radiotherapy planning, and more specifically for the automatic contouring of clinical target volumes and organs at risk, to generate intermediate results usable by radiation oncologists.
  • FIG. 3 schematically shows a simplified block diagram of a network architecture for automatic image segmentation according to another exemplary embodiment of the present invention.
  • the main difference between FIG. 3 and FIG. 2 is that the input module 11 inputs three adjacent CT images 16, so the neural network obtains information from the preceding and following slices, forming a quasi-3D input. In this way, the segmentation results of adjacent layers are smoother, avoiding abrupt changes in the segmentation results.
  • the input CT images are not necessarily three images, but can be any suitable number of multiple images.
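A hedged sketch of how such a quasi-3D input might be assembled from a CT volume (clamping at the edge slices is our assumption; the patent does not specify a boundary policy):

```python
import numpy as np

def quasi_3d_input(volume, idx, n_adjacent=1):
    """Build the multi-slice input of Fig. 3: the slice of interest plus
    its neighbours, stacked as channels. volume: (S, H, W) CT stack.
    Edge slices are clamped to the volume boundary."""
    n_slices = volume.shape[0]
    picks = [min(max(idx + d, 0), n_slices - 1)
             for d in range(-n_adjacent, n_adjacent + 1)]
    return volume[picks]           # (2*n_adjacent + 1, H, W)
```

With `n_adjacent=1` this yields the three-slice 3*512*512 input shown in the figure; larger values give any other suitable number of adjacent slices.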
  • the output is also an automatically segmented image 20 .
  • Others are similar to FIG. 2, and are not repeated here for the sake of clarity.
  • FIG. 4 schematically shows a simplified block diagram of a network architecture for automatic segmentation of images according to yet another exemplary embodiment of the present invention.
  • the output is a multi-channel output.
  • the multi-channel output indicates that the network can not only perform 2-class image segmentation but also has a multi-class capability.
  • 8 kinds of ROI segmentation results can be output at the same time.
  • the 8 ROIs include 7 organs at risk and 1 clinical target volume, namely: 1. bladder, 2. bone marrow, 3. left femoral head, 4. right femoral head, 5. rectum, 6. small intestine, 7. spinal cord, and 8. the clinical target volume of cervical cancer.
  • multi-channel output for other tumors, and of course other numbers of channels, are likewise supported.
  • Multi-channel output means outputting multiple segmented images, each segmented image corresponds to a region of interest.
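Conceptually, the multi-channel output is just one sigmoid probability map per region of interest; the following sketch (ROI ordering taken from the list above, a 0.5 threshold assumed) splits such an output into named binary masks:

```python
import numpy as np

ROI_NAMES = ["bladder", "bone marrow", "left femoral head",
             "right femoral head", "rectum", "small intestine",
             "spinal cord", "cervical cancer CTV"]

def split_channels(prob_maps, threshold=0.5):
    """prob_maps: (8, H, W) sigmoid outputs, one channel per ROI.
    Returns one binary segmentation mask per region of interest."""
    assert prob_maps.shape[0] == len(ROI_NAMES)
    return {name: (prob_maps[i] >= threshold).astype(np.uint8)
            for i, name in enumerate(ROI_NAMES)}
```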
  • other aspects are similar to FIG. 3 and are not repeated here for the sake of clarity.
  • Figure 5 schematically shows a simplified block diagram for identifying which types of regions of interest are in an image, according to an exemplary embodiment of the present invention.
  • the main difference between FIG. 5 and FIG. 4 is that the last output stage of the output module 15 is replaced with a fully connected layer (FC), so that the segmentation network becomes a classification network.
  • FC: fully connected layer
  • the network can judge which types of ROIs are included in the input CT image, but does not judge the positions of various ROIs, that is, does not do specific segmentation.
  • the sigmoid activation function shown in FIGS. 2-4 is replaced with a fully connected layer; other identical parts are not repeated.
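A minimal sketch of the resulting classification head, with global average pooling assumed as the bridge from feature maps to the fully connected layer (the patent does not state the pooling step or the dimensions):

```python
import numpy as np

def classification_head(features, weights, bias, threshold=0.5):
    """Fig. 5 variant: pooled features pass through a fully connected
    layer and a sigmoid, giving one presence probability per ROI type
    with no localisation. features: (C, H, W); weights: (n_roi, C)."""
    pooled = features.mean(axis=(1, 2))      # global average pooling (assumed)
    logits = weights @ pooled + bias         # the fully connected layer
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs, probs >= threshold         # which ROI types appear
```

The boolean vector answers only "which types of ROIs are in this CT slice", matching the text: the classification network judges presence, not position.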
  • a loss function is usually provided at the end of the network; with the predicted value of the network and the prepared gold standard as input, the loss (bias or error) is calculated through the loss function and then back-propagated through the corresponding, e.g. all, layers of the network to update the corresponding, e.g. all, weight values, so that the network's predictions get closer and closer to the gold standard. When the loss drops to an acceptable, e.g. predetermined, level, the weights have been updated to their optimal values, the network is finally determined, and training can be ended.
  • the loss function not only affects the training process of the network, but also directly affects the final performance of the network.
  • the loss function is a cross-entropy loss function (1) or a weight adaptive loss function (2):
  • p is the output of the sigmoid activation function
  • y is the ground truth map, which takes the value 0 or 1
  • I is the input image
  • N is the total number of segmentation targets. Both p and y have the shape I*N
  • P n is the ratio of the size of the target area of each category, such as the size of the organ at risk, to the size of the overall image volume.
  • formula (2) is used as the loss function when multiple segmented images are output.
  • the multi-channel segmentation network needs to segment multiple regions of interest, such as multiple organs at risk and clinical target areas. If the volume sizes of multiple regions of interest are very different, it is difficult to obtain a better segmentation effect using the cross-entropy loss function.
  • the weight adaptive loss function can achieve very good segmentation results.
  • Equation (1) can be used as the loss function.
  • the description mainly takes CT images as an example, but the images can obviously also be X-ray images, magnetic resonance imaging (MRI) images, single photon emission computed tomography (SPECT) images, positron emission tomography (PET) images, ultrasound images, etc.
  • MRI magnetic resonance imaging
  • SPECT single photon emission computed tomography
  • PET positron emission tomography
  • the present invention provides a neural network architecture for automatic image processing, such as automatic segmentation or classification, which can be used in the medical field, especially for assisting the segmentation or classification of CT images.
  • the neural network architecture of the present invention is suitable for assisting the operations involved in the automatic delineation of clinical target volumes and organs at risk for the radiotherapy of tumors (especially cervical cancer); physicians can then perform follow-up operations, such as audits, on the automatic delineation results and modify them if necessary for subsequent treatment purposes.
  • when automatically processing CT images, the network architecture of the present invention is not used directly for treatment purposes but generates intermediate results; its main purpose is to improve doctors' working efficiency.
  • the present invention also provides a computer-readable program carrier storing program instructions which, when executed by a processor, implement the functions of the above-mentioned network architecture to perform image processing.
  • the computer-readable program carrier may be constructed as a flash memory or the like.
  • the invention further provides a workstation configured to include the computer-readable program carrier.
  • the workstation can be configured as a doctor's workstation for use by a doctor.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Image Processing (AREA)

Abstract

Disclosed is a network architecture (1) for automatically processing images, the architecture comprising: an input module (11) for inputting an image to be processed; an encoding path (12), which is configured to use a dual-path network to perform feature extraction on the input image; a decoding path (14), which is configured to establish a connection with the encoding path (12); a central module (13), which is configured as a transition from the encoding path (12) to the decoding path (14) so as to refine high-dimensional image features; and an output module (15), which is configured to output an image processing result from the decoding path (14). The decoding path (14) is configured to use a decoding path of a Unet architecture to decode a corresponding encoding result of the encoding path (12) as well as the image-feature extraction result of the central module (13). Also disclosed are a corresponding method, a corresponding program carrier, and a corresponding workstation. According to the present invention, better image segmentation and classification can be performed.

Description

Network architecture, program carrier and workstation for automatically processing images

Technical Field

The invention relates to a network architecture for the automatic processing of images, a corresponding computer-readable program carrier and a corresponding workstation.

Background Art

There are many occasions where image processing is required, for example segmenting an image to identify specific objects in it or to automatically outline specific objects in it.

With the development of modern medicine, more and more diseases need to be diagnosed and treated with the help of medical images, for example tumor radiation therapy.

Cervical cancer is now the second most common cancer in women aged 15 to 44 worldwide. Intensity-modulated radiation therapy (IMRT) has become the radiation therapy of choice for cervical cancer treatment. The effectiveness of IMRT depends on the accuracy of clinical target volume (CTV) and organ-at-risk (OAR) delineation. Outlines of clinical target volumes and organs at risk are currently produced by radiation oncologists through laborious, tedious manual delineation. Contouring is time-consuming and relies heavily on the experience of the radiation oncologist. Despite the availability of standard guidelines, differences in the experience of radiation oncologists remain one of the main challenges in planning radiation therapy. Therefore, if the anatomy could be segmented automatically in a reasonable time, the manual workload of radiation oncologists could be greatly reduced.

Traditional automatic segmentation methods are derived from statistical models or atlas-based models. However, both methods have many limitations. In particular, the organs in the region where cervical cancer occurs are complex, and the boundaries in CT images are not very clear. Therefore, current automatic segmentation methods perform poorly in CTV contouring: various complex tissues and organs may be confused with CTV boundaries, and tumor tissue or the potential spread of subclinical disease within the CTV may not be detectable in CT images.

Therefore, there is an urgent need for improvement, especially for cancers such as cervical cancer for which automatic contouring is still difficult to perform well.
SUMMARY OF THE INVENTION

The object of the present invention is to provide an improved network architecture for the automatic processing of images, a corresponding computer-readable program carrier and a corresponding workstation.

According to a first aspect of the present invention, a network architecture for the automatic processing of images is provided, the network architecture comprising: an input module for inputting an image to be processed; an encoding path configured to use a dual-path network to perform feature extraction on the input image; a decoding path configured to establish a connection with the encoding path; a central module configured for the transition from the encoding path to the decoding path, so as to refine high-dimensional image features; and an output module configured to output an image processing result from the decoding path; wherein the decoding path is configured to use the decoding path of the Unet architecture to decode the corresponding encoding result of the encoding path and the image-feature refinement result of the central module.

According to an optional embodiment of the present invention, the dual-path network comprises a series of micro-blocks connected in series, the micro-blocks being correspondingly embedded in the decoders of the decoding path.

According to an optional embodiment of the present invention, the encoding path includes 5 encoders, the decoding path includes 4 decoders, and each decoder uses a corresponding micro-block in the encoding path.
According to an optional embodiment of the present invention, the central module is configured to implement: Conv(3*3)+BN+ReLU; and the decoder is configured to implement: micro-block + bilinear upsampling, where Conv(3*3) is a 3*3 convolution operation, BN is a batch normalization operation, ReLU is the rectified linear unit function, and the upsampling symbol (rendered as an inline image in the original) denotes bilinear upsampling.
According to an optional embodiment of the present invention, the input module receives one image or multiple images, and the output module outputs one image or multiple images; the output module is configured to implement: upsampling + Conv(3*3)+BN+ReLU + sigmoid, where the trailing symbol (rendered as an inline image in the original) represents the sigmoid activation function.
According to an optional embodiment of the present invention, the image is a computed tomography image and the multiple images are multiple adjacent computed tomography images; and/or the network architecture is configured to perform automatic segmentation processing on the images.
According to an optional embodiment of the present invention, the input module receives one image or multiple images, and the output module outputs a classification result; the output module is configured to implement: upsampling + Conv(3*3)+BN+ReLU + a fully connected layer, where the trailing symbol (rendered as an inline image in the original) represents the fully connected layer.
According to an optional embodiment of the present invention, a loss function is configured at the end point of the network architecture; the loss function is a cross-entropy loss function (1) or a weight-adaptive loss function (2) (both formulas are rendered as inline images in the original), where p is the output of the sigmoid activation function; y is the label map, taking the value 0 or 1; I is the input image; N is the total number of segmentation targets; p and y both have shape I*N; and P_n is the computed ratio of the volume of each category's target region to the volume of the overall image.
According to an optional embodiment of the present invention, when the network architecture is configured to output multiple segmented images, formula (2) is used as the loss function.

According to an optional embodiment of the present invention, the network architecture is configured for the automatic segmentation or classification of images for the radiotherapy of tumors, especially images for the radiotherapy of cervical cancer.

According to a second aspect of the present invention, a method for processing an image using the network architecture is provided, comprising the steps of: inputting the image through the input module; processing the image through the encoding path, the central module and the decoding path; and outputting the image processing result through the output module.

According to a third aspect of the present invention, a computer-readable program carrier is provided, which stores program instructions that, when executed by a processor, implement the method.

According to a fourth aspect of the present invention, a workstation is provided, the workstation being configured to include the computer-readable program carrier.

According to an optional embodiment of the present invention, the workstation is configured as a doctor's workstation for the automatic segmentation or classification of medical images.

According to the present invention, not only can the performance of image processing be improved, but the network can also be built more quickly.
Description of Drawings

The principles, features and advantages of the present invention may be better understood by describing the present invention in more detail below with reference to the accompanying drawings, which include:

Figure 1 shows a simplified block diagram of a network architecture for the automatic segmentation or classification of images according to an exemplary embodiment of the present invention.

Figure 2 schematically shows a simplified block diagram of a network architecture according to an exemplary embodiment of the present invention.

Figure 3 schematically shows a simplified block diagram of a network architecture for automatic image segmentation according to another exemplary embodiment of the present invention.

Figure 4 schematically shows a simplified block diagram of a network architecture for the automatic segmentation of images according to yet another exemplary embodiment of the present invention.

Figure 5 schematically shows a simplified block diagram for identifying which types of regions of interest are present in an image, according to an exemplary embodiment of the present invention.
具体实施方式detailed description
为了使本发明所要解决的技术问题、技术方案以及有益的技术效果更加清楚明白,以下将结合附图以及多个示例性实施例对本发明进行进一步详细说明。应当理解,此处所描述的具体实施例仅用于解释本发明,而不是用于限制本发明的保护范围。In order to make the technical problems, technical solutions and beneficial technical effects to be solved by the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and multiple exemplary embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, rather than to limit the protection scope of the present invention.
FIG. 1 shows a simplified block diagram of a network architecture for automatically segmenting or classifying images according to an exemplary embodiment of the present invention.
As shown in FIG. 1, the network architecture 1 may mainly comprise an input module 11, an encoding path 12, a central module 13, a decoding path 14, and an output module 15. The input module 11 receives images, for example CT images. The encoding path 12 is configured to use a dual-path network (DPN) to extract features from the input image. The decoding path 14 is associated with the encoding path 12 so as to decode the corresponding encoding results of the encoding path 12, and is configured to decode using the decoding path of the Unet architecture. The central module 13 handles the transition from the encoding path 12 to the decoding path 14 so as to refine high-dimensional image features. The output module 15 outputs results usable for automatic contour delineation.
The dual-path network in turn comprises a residual network (ResNet) path and a densely connected convolutional network (DenseNet) path. ResNet supports the re-use of features, while DenseNet supports the exploration of new features; the dual-path network therefore combines the advantages of both.
The dual-path network consists of a series of micro-blocks connected in series; the micro-block is the core of the DPN. By using a dual-path network as the encoder part of the network architecture of the present invention, better feature-extraction capability can be achieved.
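The dual-path combination inside a micro-block can be sketched as follows. This is a minimal illustrative sketch, not the patent's exact block: feature maps are modeled as flat lists of channel values, and the toy transform, channel widths, and `res_width` parameter are assumptions.

```python
# Illustrative sketch of the dual-path idea behind a DPN micro-block:
# a transform's output is split into a residual part (added back to the
# input, as in ResNet) and a dense part (concatenated, as in DenseNet).

def micro_block(x, transform, res_width):
    """x: list of channel values; transform: callable whose output has
    res_width residual channels followed by new dense channels."""
    f = transform(x)
    res_part, dense_part = f[:res_width], f[res_width:]
    # Residual path: element-wise addition over the first res_width channels.
    residual = [a + b for a, b in zip(x[:res_width], res_part)]
    # Dense path: keep the remaining input channels and append the new ones.
    return residual + x[res_width:] + dense_part

# Toy transform (an assumption): negate each channel and append one channel.
toy = lambda x: [-v for v in x] + [1.0]
out = micro_block([1.0, 2.0, 3.0], toy, res_width=2)
# Channel count grows by len(f) - res_width per block (the dense increment).
print(out)  # [0.0, 0.0, 3.0, -3.0, 1.0]
```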
According to an exemplary embodiment of the present invention, in order to give the decoder part the same ability to recover abstract features, micro-blocks are embedded in the decoder part to replace the standard convolution operations.
FIG. 2 schematically shows a simplified block diagram of a network architecture 1 according to an exemplary embodiment of the present invention.
As shown in FIG. 2, a single CT image 16 input to the encoding path 12 typically has 512*512 pixels; here, 1*512*512 denotes one CT image of 512*512 pixels. In the encoding path 12, the CT image 16 passes through a series of encoders that extract image features. The final encoding result of the encoding path 12 is input to the central module 13 and, after processing there, is output to the decoding path 14, realizing the transition from the encoding path 12 to the decoding path 14. As can also be seen in FIG. 2, the decoding path 14 is additionally associated with the encoding path 12, as indicated schematically by arrow 17.
In the exemplary embodiment shown in FIG. 2, the encoding path 12 encodes the image 16 through a series of 5 encoders/micro-blocks 18, while the decoding path 14 comprises 4 decoders 19, each of which uses a corresponding micro-block/encoder 18 of the encoding path 12.
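Assuming factor-of-2 downsampling between encoder stages (the patent does not fix the strides), the resolution flow through the 5 encoders and 4 decoders, and the pairing established by the skip connections, might look like this:

```python
# Schematic of how spatial resolution might flow through the 5-encoder /
# 4-decoder arrangement of FIG. 2. The factor-of-2 downsampling per stage
# is an assumption made for illustration only.

size = 512
enc_sizes = []
for stage in range(5):            # 5 encoders / micro-blocks
    enc_sizes.append(size)
    if stage < 4:                 # downsample between stages
        size //= 2

dec_sizes = []
for stage in range(4):            # 4 decoders, each upsampling by 2
    size *= 2
    dec_sizes.append(size)

# Each decoder is paired (via the association indicated by arrow 17)
# with the encoder stage of matching resolution.
pairs = list(zip(dec_sizes, reversed(enc_sizes[:4])))
print(enc_sizes)   # [512, 256, 128, 64, 32]
print(pairs)       # [(64, 64), (128, 128), (256, 256), (512, 512)]
```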
In FIG. 2, exemplary operations within some of the modules are also represented schematically by symbols. Specifically, the darker arrows denote Conv(3*3) + BN + ReLU, where Conv(3*3) is a 3*3 convolution operation, BN (Batch Normalization) is a batch-normalization operation, and ReLU (Rectified Linear Unit) is the rectified linear function. The lighter arrows denote micro-blocks. Three further symbols denote, respectively, a connection (concatenation) relationship, bilinear upsampling, and the Sigmoid activation function.
According to an exemplary embodiment of the present invention, as shown in FIG. 2, the central module 13 may be configured as: Conv(3*3) + BN + ReLU.
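The Conv(3*3) + BN + ReLU pipeline of the central module can be illustrated with a minimal single-channel sketch; real implementations operate on multi-channel tensors with learned BN parameters, and the 'same' zero padding and toy identity kernel here are assumptions.

```python
import math

# Minimal single-channel sketch of Conv(3*3) + BN + ReLU.
# Assumptions: zero 'same' padding, a toy identity kernel, and BN in
# inference form with gamma = 1, beta = 0 (real BN parameters are learned).

def conv3x3(img, kernel):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding
                        s += img[ii][jj] * kernel[di + 1][dj + 1]
            out[i][j] = s
    return out

def batch_norm(img, eps=1e-5):
    vals = [v for row in img for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return [[(v - mean) / math.sqrt(var + eps) for v in row] for row in img]

def relu(img):
    return [[max(0.0, v) for v in row] for row in img]

img = [[1.0, 2.0], [3.0, 4.0]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
out = relu(batch_norm(conv3x3(img, identity)))
# Values below the batch mean are clipped to zero by ReLU.
print(out)
```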
In the exemplary embodiment shown in FIG. 2, the result processed by the decoding path 14 is output to the output module 15, which in turn outputs a single automatically segmented 512*512 image 20.
According to an exemplary embodiment of the present invention, as shown in FIG. 2, the output module 15 may be configured as: bilinear upsampling + Conv(3*3) + BN + ReLU + Sigmoid, where the Sigmoid activation function outputs, for each pixel, the probability that the pixel belongs to a region of interest (ROI).
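How the final Sigmoid converts per-pixel values into ROI-membership probabilities can be sketched as follows; the 0.5 threshold used to derive a binary mask is an illustrative assumption (the module itself outputs probabilities).

```python
import math

# Per-pixel Sigmoid at the end of the output module: logits become ROI
# membership probabilities; thresholding (an assumption) yields a mask.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

logits = [[-2.0, 0.0], [1.5, 4.0]]                  # toy 2*2 "image"
probs = [[sigmoid(v) for v in row] for row in logits]
mask = [[1 if p > 0.5 else 0 for p in row] for row in probs]
print(mask)  # [[0, 0], [1, 1]]
```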
According to an exemplary embodiment of the present invention, as shown in FIG. 2, the decoder 19 may be configured as: micro-block + bilinear upsampling.
This exemplary embodiment can be used to assist in planning radiation therapy for tumors, in particular radiation therapy in which the organs around the tumor site are relatively complex, for example radiation therapy for cervical cancer; more particularly, it can be used to automatically delineate the contours of the clinical target volume and the organs at risk, generating intermediate results usable by the radiation oncologist.
FIG. 3 schematically shows a simplified block diagram of a network architecture for automatically segmenting images according to another exemplary embodiment of the present invention.
The main difference between FIG. 3 and FIG. 2 is that the input module 11 receives three adjacent CT images 16, so the neural network obtains image information from the preceding and following slices, forming a quasi-3D image. In this way, the segmentation results of adjacent slices are smoother, and abrupt changes between them are avoided.
Of course, those skilled in the art will understand that the input need not be exactly three CT images; any suitable number of images may be used.
The output is again a single automatically segmented image 20. The remainder is similar to FIG. 2 and, for the sake of clarity, is not described again.
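The quasi-3D input of FIG. 3 can be sketched as stacking each slice with its neighbours; the clamping at volume boundaries is an assumption (the patent does not specify edge handling).

```python
# Sketch of the quasi-3D input: the slice to segment is stacked with its
# neighbours so the network sees through-plane context. The slice count (3)
# follows the text; clamping at the boundaries is an assumption.

def stack_neighbours(volume, k, n=3):
    """volume: list of 2D slices; returns n adjacent slices centred on
    index k, clamping indices at the volume boundaries."""
    half = n // 2
    idx = [min(max(k + d, 0), len(volume) - 1) for d in range(-half, half + 1)]
    return [volume[i] for i in idx]

vol = [[[s]] for s in range(5)]          # five 1*1 "slices" labelled 0..4
print([s[0][0] for s in stack_neighbours(vol, 2)])  # [1, 2, 3]
print([s[0][0] for s in stack_neighbours(vol, 0)])  # [0, 0, 1]
```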
FIG. 4 schematically shows a simplified block diagram of a network architecture for automatically segmenting images according to yet another exemplary embodiment of the present invention.
The main difference between FIG. 4 and FIG. 3 is that the output is multi-channel. A multi-channel output means that the network can perform not only binary image segmentation but also multi-class segmentation. Taking cervical cancer as an example, segmentation results for 8 ROIs can be output simultaneously. The 8 ROIs comprise 7 organs at risk and 1 clinical target volume, namely: 1. bladder; 2. bone marrow; 3. left femoral head; 4. right femoral head; 5. rectum; 6. small intestine; 7. spinal cord; 8. the clinical target volume for cervical cancer. Multi-channel output for other tumors is also supported, as are other numbers of channels.
A multi-channel output means that multiple segmented images are output, each corresponding to one region of interest.
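A sketch of this multi-channel output for the 8 cervical-cancer ROIs listed above; the tiny 2*2 probability maps and the 0.5 presence threshold are illustrative assumptions.

```python
# Sketch of the multi-channel output of FIG. 4: one probability map per ROI.
# Channel order follows the list in the text; the toy 2*2 maps and the 0.5
# threshold are assumptions made for illustration.

ROI_NAMES = ["bladder", "bone marrow", "left femoral head",
             "right femoral head", "rectum", "small intestine",
             "spinal cord", "cervical CTV"]

def rois_present(channels, thr=0.5):
    """channels: list of 8 probability maps; returns the names of ROIs whose
    map contains at least one pixel above the threshold."""
    present = []
    for name, ch in zip(ROI_NAMES, channels):
        if any(p > thr for row in ch for p in row):
            present.append(name)
    return present

maps = [[[0.1, 0.2], [0.1, 0.0]] for _ in range(8)]
maps[0] = [[0.9, 0.2], [0.1, 0.0]]       # bladder channel fires
maps[4] = [[0.1, 0.7], [0.1, 0.0]]       # rectum channel fires
print(rois_present(maps))  # ['bladder', 'rectum']
```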
The remainder is similar to FIG. 3 and, for the sake of clarity, is likewise not described again.
FIG. 5 schematically shows a simplified block diagram for identifying which kinds of regions of interest are present in an image, according to an exemplary embodiment of the present invention.
The main difference between FIG. 5 and FIG. 4 is that the final output part of the output module 15 is replaced by a fully connected layer (FC), which turns the segmentation network into a classification network. This network can determine which kinds of regions of interest (ROIs) an input CT image contains, but it does not determine the positions of those ROIs, i.e., it performs no actual segmentation. Specifically, the Sigmoid activation function shown in FIGS. 2-4 is replaced by a fully connected layer. Other identical parts are not described again.
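The effect of swapping the Sigmoid map for a fully connected layer can be sketched as a multi-label presence classifier; the pooled feature vector, weights, and per-class Sigmoid used here are assumptions.

```python
import math

# Sketch of the classification head of FIG. 5: the final Sigmoid map is
# replaced by a fully connected (FC) layer, so the network reports WHICH
# ROIs appear, not WHERE. Feature size and weights are toy assumptions.

def fc_layer(features, weights, biases):
    return [sum(w * f for w, f in zip(ws, features)) + b
            for ws, b in zip(weights, biases)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

features = [0.5, -1.0, 2.0]                 # pooled decoder features (toy)
weights = [[1.0, 0.0, 0.0],                 # one row per ROI class
           [0.0, 0.0, 1.0]]
biases = [0.0, -1.0]
scores = [sigmoid(z) for z in fc_layer(features, weights, biases)]
labels = [int(s > 0.5) for s in scores]     # per-class presence decision
print(labels)  # [1, 1]
```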
In order better to evaluate, train, and update the network, a loss function is usually provided at the end of the network. It takes the network's prediction and a prepared gold standard as input, computes the loss (the deviation or error), and back-propagates it to the corresponding layers of the network, for example all layers, to update the corresponding weight values, for example all weight values, so that the network's predictions come ever closer to the gold standard. When the loss falls to an acceptable, for example predetermined, level, the optimal weight values have been reached, the network is finalized, and training can end.
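The evaluate, compute-loss, back-propagate, update cycle described above can be sketched with a one-parameter toy model; the learning rate, tolerance, and squared-error loss are illustrative assumptions.

```python
# The training cycle reduced to a one-parameter toy model: predict, compare
# with the gold standard via a loss, propagate the gradient back, update the
# weight, and stop once the loss reaches an acceptable level. Learning rate,
# tolerance, and the squared-error loss are assumptions for illustration.

def train(xs, gold, w=0.0, lr=0.1, tol=1e-4, max_steps=1000):
    for step in range(max_steps):
        preds = [w * x for x in xs]                           # forward pass
        loss = sum((p - g) ** 2 for p, g in zip(preds, gold)) / len(xs)
        if loss < tol:                                        # acceptable level
            return w, loss, step
        grad = sum(2 * (p - g) * x
                   for p, g, x in zip(preds, gold, xs)) / len(xs)
        w -= lr * grad                                        # weight update
    return w, loss, max_steps

w, loss, steps = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w, 2))  # 2.0
```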
The loss function therefore affects not only the training process of the network but also its final performance.
According to an exemplary embodiment of the present invention, the loss function is a cross-entropy loss function (1) or a weight-adaptive loss function (2):

[Equation (1): cross-entropy loss]

[Equation (2): weight-adaptive loss]

where p is the output of the Sigmoid activation function; y is the label map (ground-truth map), taking the value 0 or 1; I is the input image; and N is the total number of segmentation targets. Both p and y have the shape I*N. P_n is the measured proportion of the overall image volume occupied by the target region of class n, for example an organ at risk.
Preferably, for the multi-channel segmentation network shown by way of example in FIG. 4, formula (2) is used as the loss function. A multi-channel segmentation network must segment multiple regions of interest, for example multiple organs at risk and a clinical target volume. If those regions differ greatly in volume, the cross-entropy loss function makes it difficult to obtain a good segmentation result, whereas the weight-adaptive loss function yields very good segmentation results.
For the other segmentation networks, formula (1) can be used as the loss function.
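A sketch of the two loss choices in code. Formula (1) is standard pixel-wise binary cross-entropy; for formula (2) the patent's exact weighting expression is not reproduced here, so a normalized inverse-volume weight w_n = 1/P_n is used as an illustrative assumption (smaller structures receive larger weight).

```python
import math

# Sketch of the two loss choices. bce corresponds to the standard pixel-wise
# binary cross-entropy of formula (1). weighted_bce illustrates the idea of
# formula (2); the inverse-volume weighting w_n = 1/P_n (normalized) is an
# ASSUMPTION standing in for the patent's unstated expression.

def bce(p, y, eps=1e-7):
    return -sum(yi * math.log(pi + eps) + (1 - yi) * math.log(1 - pi + eps)
                for pi, yi in zip(p, y)) / len(p)

def weighted_bce(p, y, cls, vol_frac, eps=1e-7):
    """cls[i]: class index of pixel i; vol_frac[n]: P_n, the fraction of
    the image volume occupied by the target region of class n."""
    w = [1.0 / v for v in vol_frac]
    s = sum(w)
    w = [wi / s for wi in w]                              # normalize weights
    terms = [-(yi * math.log(pi + eps) + (1 - yi) * math.log(1 - pi + eps))
             for pi, yi in zip(p, y)]
    return sum(w[cls[i]] * t for i, t in enumerate(terms)) / len(p)

p = [0.9, 0.8, 0.2, 0.6]
y = [1, 1, 0, 1]
cls = [0, 0, 0, 1]                 # last pixel belongs to a small structure
loss_plain = bce(p, y)
loss_weighted = weighted_bce(p, y, cls, vol_frac=[0.9, 0.1])
# The weighting shifts emphasis toward the rare class's pixels.
print(round(loss_plain, 4), round(loss_weighted, 4))
```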
It will be understood that although the description above mainly takes CT images as an example, the images may obviously also be X-ray images, magnetic resonance imaging (MRI) images, single-photon emission computed tomography (SPECT) images, positron emission tomography (PET) images, ultrasound images, and so on.
Those skilled in the art will understand that, overall, the present invention provides a neural network architecture for automatically processing images, for example automatically segmenting or classifying them, which can be used in the medical field, in particular to assist in segmenting or classifying CT images. In particular, the neural network architecture of the present invention is suited to assisting the operations involved in automatically delineating the contours of the clinical target volume and the organs at risk in radiation therapy for tumors (especially cervical cancer); the physician can then perform follow-up operations on the processing results, for example reviewing the automatic delineations and, where necessary, modifying them for subsequent treatment purposes.
Accordingly, when automatically processing CT images, the network architecture of the present invention is not used directly for treatment purposes; it generates intermediate results whose main purpose is to improve the physician's working efficiency.
Moreover, those skilled in the art will also understand that the present invention further provides a computer-readable program carrier storing program instructions which, when executed by a processor, implement the functions of the above network architecture so as to perform image processing. The computer-readable program carrier may, for example, be constructed as a flash memory or the like.
The present invention further provides a workstation constructed to include the computer-readable program carrier. The workstation may in particular be constructed as a physician workstation for use by physicians.
Although specific embodiments of the present invention are described in detail herein, they are given for explanatory purposes only and should not be regarded as limiting the scope of the invention. Various substitutions, alterations, and modifications may be conceived without departing from the spirit and scope of the present invention.

Claims (14)

  1. A network architecture (1) for automatically processing images, the network architecture (1) comprising:
    an input module (11) for inputting an image to be processed;
    an encoding path (12) configured to use a dual-path network to extract features from the input image;
    a decoding path (14) configured to establish connections with the encoding path (12);
    a central module (13) configured for the transition from the encoding path (12) to the decoding path (14) so as to refine high-dimensional image features; and
    an output module (15) configured to output the image processing result from the decoding path (14);
    wherein the decoding path (14) is configured to use the decoding path of the Unet architecture to decode the corresponding encoding results of the encoding path (12) and the image-feature refinement result of the central module (13).
  2. The network architecture (1) according to claim 1, wherein
    the dual-path network comprises a series of micro-blocks (18) connected in series, the micro-blocks (18) being correspondingly embedded in the decoders of the decoding path (14).
  3. The network architecture (1) according to claim 1 or 2, wherein
    the encoding path (12) comprises 5 encoders and the decoding path (14) comprises 4 decoders (19), each decoder (19) using a corresponding micro-block of the encoding path (12).
  4. The network architecture (1) according to claim 3, wherein
    the central module (13) is configured to implement: Conv(3*3) + BN + ReLU; and
    the decoder (19) is configured to implement: micro-block + bilinear upsampling;
    where Conv(3*3) is a 3*3 convolution operation, BN is a batch-normalization operation, and ReLU is the rectified linear function.
  5. The network architecture (1) according to claim 4, wherein
    the input module (11) receives one image or multiple images, and the output module (15) outputs one image or multiple images; and
    the output module (15) is configured to implement: bilinear upsampling + Conv(3*3) + BN + ReLU + Sigmoid, where Sigmoid denotes the Sigmoid activation function.
  6. The network architecture (1) according to claim 5, wherein
    the images are computed tomography images, X-ray images, magnetic resonance imaging images, single-photon emission computed tomography images, positron emission tomography images, or ultrasound images, and the multiple images are adjacent images; and/or
    the network architecture (1) is configured to perform automatic segmentation of the images.
  7. The network architecture (1) according to claim 4, wherein
    the input module (11) receives one image or multiple images, and the output module (15) outputs a classification result; and
    the output module (15) is configured to implement: bilinear upsampling + Conv(3*3) + BN + ReLU + FC, where FC denotes a fully connected layer.
  8. The network architecture (1) according to claim 6, wherein
    a loss function is provided at the end point of the network architecture (1), the loss function being a cross-entropy loss function (1) or a weight-adaptive loss function (2):
    [Equation (1): cross-entropy loss]
    [Equation (2): weight-adaptive loss]
    where p is the output of the Sigmoid activation function; y is the label map, taking the value 0 or 1; I is the input image; N is the total number of segmentation targets; p and y both have the shape I*N; and P_n is the measured proportion of the overall image volume occupied by the target region of each class.
  9. The network architecture (1) according to claim 8, wherein,
    when the network architecture (1) is configured to output multiple segmented images, formula (2) is used as the loss function.
  10. The network architecture (1) according to any one of claims 1-9, wherein
    the network architecture (1) is configured to automatically segment or classify images for radiation therapy of tumors, in particular images for radiation therapy of cervical cancer.
  11. A method for processing an image using the network architecture (1) according to any one of claims 1-10, comprising the following steps:
    inputting the image through the input module (11);
    processing the image through the encoding path (12), the central module (13), and the decoding path (14); and
    outputting the image processing result through the output module (15).
  12. A computer-readable program carrier storing program instructions which, when executed by a processor, implement the method according to claim 11.
  13. A workstation configured to include the computer-readable program carrier according to claim 12.
  14. The workstation according to claim 13, wherein the workstation is configured as a physician workstation for automatically segmenting or classifying medical images.
PCT/CN2021/105538 2020-07-10 2021-07-09 Network architecture for automatically processing images, program carrier, and workstation WO2022007957A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010663227.X 2020-07-10
CN202010663227.XA CN111784682B (en) 2020-07-10 2020-07-10 Network architecture system, program carrier and workstation for automatic processing of images

Publications (1)

Publication Number Publication Date
WO2022007957A1 true WO2022007957A1 (en) 2022-01-13




Also Published As

Publication number Publication date
CN111784682A (en) 2020-10-16
CN111784682B (en) 2024-05-28

