CN111784682A - Network architecture, program carrier and workstation for automatic processing of images - Google Patents

Publication number
CN111784682A
CN111784682A (application number CN202010663227.XA)
Authority
CN
China
Prior art keywords
images
image
path
network architecture
encoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010663227.XA
Other languages
Chinese (zh)
Inventor
王少彬
陈颀
陈宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yizhiying Technology Co ltd
Original Assignee
Beijing Yizhiying Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yizhiying Technology Co ltd filed Critical Beijing Yizhiying Technology Co ltd
Priority to CN202010663227.XA priority Critical patent/CN111784682A/en
Publication of CN111784682A publication Critical patent/CN111784682A/en
Priority to PCT/CN2021/105538 priority patent/WO2022007957A1/en
Pending legal-status Critical Current

Classifications

    • G06T 7/0012 — Biomedical image inspection (G06T — Image data processing or generation, in general; G06T 7/00 — Image analysis; G06T 7/0002 — Inspection of images, e.g. flaw detection)
    • G06N 3/045 — Combinations of networks (G06N — Computing arrangements based on specific computational models; G06N 3/00 — Computing arrangements based on biological models; G06N 3/02 — Neural networks; G06N 3/04 — Architecture, e.g. interconnection topology)
    • G06T 7/11 — Region-based segmentation (G06T 7/10 — Segmentation; Edge detection)
    • G06T 2207/10081 — Computed x-ray tomography [CT] (G06T 2207/10 — Image acquisition modality; G06T 2207/10072 — Tomographic images)
    • G06T 2207/20104 — Interactive definition of region of interest [ROI] (G06T 2207/20 — Special algorithmic details; G06T 2207/20092 — Interactive image processing based on input by user)
    • G06T 2207/30096 — Tumor; Lesion (G06T 2207/30 — Subject of image; Context of image processing; G06T 2207/30004 — Biomedical image processing)

Abstract

A network architecture (1) for automatic processing of images is disclosed, comprising: an input module (11) for inputting an image to be processed; an encoding path (12) configured to perform feature extraction on an input image using a dual path network; a decoding path (14) configured to establish a connection with the encoding path (12); a central module (13) configured for a transition from an encoding path (12) to a decoding path (14) to refine high-dimensional image features; and an output module (15) configured to output the image processing result from the decoding path (14); wherein the decoding path (14) is configured to perform a decoding operation on the respective encoding results of the encoding path (12) and the image feature refinement results of the central module (13) using a decoding path of the Unet architecture. A corresponding method, a corresponding program carrier and a corresponding workstation are also disclosed. According to the invention, better image segmentation and classification can be carried out.

Description

Network architecture, program carrier and workstation for automatic processing of images
Technical Field
The invention relates to a network architecture for automatic processing of images, a corresponding computer-readable program carrier and a corresponding workstation.
Background
There are many occasions when it is necessary to process an image, for example by segmenting the image to identify or automatically contour a particular object in the image.
With the development of modern medicine, more and more diseases are diagnosed and treated by means of medical images, such as tumor radiotherapy.
Cervical cancer is now the second most common cancer in women between the ages of 15 and 44 worldwide. Intensity-modulated radiation therapy (IMRT) has become the radiation therapy of choice for the treatment of cervical cancer. The effectiveness of IMRT depends on the accuracy with which the clinical target volume (CTV) and the organs at risk (OARs) are delineated. The contouring of clinical target volumes and organs at risk is currently accomplished by radiation oncologists through laborious and cumbersome manual delineation. Contouring is very time-consuming and depends heavily on the experience of the radiation oncologist. Despite the availability of standard guidelines, differences in the experience of radiation oncologists remain one of the major challenges in planning radiation therapy. Thus, if the anatomy can be automatically segmented in a reasonable time, the manual workload of radiation oncologists can be greatly reduced.
Traditional automatic segmentation methods are derived from statistical models or atlas-based models. However, both of these approaches have limitations. In particular, the organs surrounding a cervical tumor are complex, and their boundaries in CT images are not clear. Current automatic segmentation methods therefore perform poorly when contouring CTVs: various complex tissues and organs may be confused with the boundaries of the CTV, and the potential spread of tumor tissue or subclinical disease within the CTV may not be detectable in CT images.
Thus, there is a great need for improvement, particularly for cancers such as cervical cancer, for which automatic contouring is currently difficult to achieve well.
Disclosure of Invention
The object of the invention is achieved by an improved network architecture for automated processing of images, a corresponding computer-readable program carrier and a corresponding workstation.
According to a first aspect of the present invention, there is provided a network architecture for automated processing of images, the network architecture comprising: the input module is used for inputting an image to be processed; an encoding path configured to perform feature extraction on an input image using a dual path network; a decoding path configured to establish a connection with the encoding path; a central module configured for a transition from an encoding path to a decoding path to refine high-dimensional image features; and an output module configured to output the image processing result from the decoding path; wherein the decoding path is configured to perform a decoding operation on the respective encoding results of the encoding paths and the image feature refinement result of the central module using a decoding path of the Unet architecture.
According to an alternative embodiment of the invention, the dual path network comprises a series of concatenated micro-blocks, which are accordingly embedded in the decoder of the decoding path.
According to an alternative embodiment of the invention, the encoding path comprises 5 encoders and the decoding path comprises 4 decoders, each decoder using a respective micro-block in the encoding path.
According to an alternative embodiment of the invention, the central module is configured to implement: Conv(3 × 3) + BN + ReLU; and the decoder is configured to implement: micro-block + ↑ (bilinear upsampling), wherein Conv(3 × 3) is a 3 × 3 convolution operation, BN is a batch normalization operation, ReLU is a linear rectification function, and ↑ represents bilinear upsampling.
According to an alternative embodiment of the present invention, the input module receives one image or a plurality of images, and the output module outputs one image or a plurality of images; and the output module is configured to implement σ, wherein σ represents a Sigmoid activation function.
According to an optional embodiment of the present invention, the image is a computed tomography image, and the plurality of images are a plurality of computed tomography images adjacent to each other in front and back; and/or the network architecture is configured for automatic segmentation processing of images.
According to an optional embodiment of the invention, the input module receives one image or a plurality of images, and the output module outputs a classification result; and the output module is configured to implement FC, wherein FC represents a fully connected layer.
According to an optional embodiment of the present invention, a loss function is configured at an end point of the network architecture, the loss function being a cross-entropy loss function (1) or a weight-adaptive loss function (2):

Loss_CE = −(1/(I·N)) · Σ_{i=1..I} Σ_{n=1..N} [ y_{i,n}·log(p_{i,n}) + (1 − y_{i,n})·log(1 − p_{i,n}) ]  (1)

[formula (2), the weight-adaptive loss that additionally weights the cross-entropy term of each segmentation target n by a factor derived from the volume ratio P_n, together with its two auxiliary definitions, appears as equation images in the original]

wherein p is the Sigmoid activation function output; y is the label map, taking the value 0 or 1; I is the input image; N is the total number of segmentation targets; p and y both have the shape I × N; and P_n is the ratio of the counted volume of the target region of each category to the volume of the whole image.
According to an alternative embodiment of the invention, when the network architecture is configured to output a plurality of segmented images, formula (2) is used as the loss function.
According to an alternative embodiment of the invention, the network architecture is configured for automatic segmentation or classification of images for tumor radiotherapy, in particular for cervical cancer radiotherapy.
According to a second aspect of the invention, there is provided a method of processing an image using the network architecture, comprising the steps of: inputting an image through an input module; processing the image through the encoding path, the central module and the decoding path; and outputting the image processing result through the output module.
According to a third aspect of the invention, there is provided a computer readable program carrier storing program instructions for implementing the method when executed by a processor.
According to a fourth aspect of the invention, there is provided a workstation configured to comprise the computer readable program carrier.
According to an alternative embodiment of the invention, the workstation is configured as a physician workstation for automatic segmentation or classification of medical images.
According to the invention, not only can the performance of image processing be improved, but also the network can be established more quickly.
Drawings
The principles, features and advantages of the present invention may be better understood by describing the invention in more detail below with reference to the accompanying drawings. The drawings comprise:
FIG. 1 shows a simplified block diagram of a network architecture for automatic segmentation or classification of images according to an exemplary embodiment of the present invention.
Fig. 2 schematically shows a simplified block diagram of a network architecture according to an exemplary embodiment of the present invention.
Fig. 3 schematically shows a simplified block diagram of a network architecture for automatic segmentation of images according to another exemplary embodiment of the present invention.
Fig. 4 schematically shows a simplified block diagram of a network architecture for automatic segmentation of images according to yet another exemplary embodiment of the present invention.
Fig. 5 schematically shows a simplified block diagram for identifying which regions of interest are in an image according to an exemplary embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and exemplary embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the scope of the invention.
FIG. 1 shows a simplified block diagram of a network architecture for automatic segmentation or classification of images according to an exemplary embodiment of the present invention.
As shown in fig. 1, the network architecture 1 may mainly include an input module 11, an encoding path 12, a central module 13, a decoding path 14 and an output module 15. The input module 11 is configured to receive an image, for example a CT image. The encoding path 12 is configured to perform feature extraction on the input image using a dual path network (DPN). The decoding path 14 is associated with the encoding path 12 so as to correspondingly decode the encoding results of the encoding path 12. The central module 13 is configured for the transition from the encoding path 12 to the decoding path 14 in order to refine high-dimensional image features. The output module 15 is configured to output a result that can be used for automatic contour delineation. The decoding path 14 is configured to decode using a decoding path of the Unet architecture.
A dual path network comprises a residual network (ResNet) path, which supports the reuse of features, and a densely connected convolutional network (DenseNet) path, which supports the exploration of new features, and thus combines the advantages of both.
The dual path network includes a series of Micro-blocks (Micro-blocks) connected in series. The micro-tiles are the core of the DPN. By using a dual path network as the encoder portion of the network architecture of the present invention, better feature extraction capabilities can be achieved.
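To illustrate how the two paths combine, the following NumPy sketch models one micro-block on a toy (H, W, C) feature map. The 1 × 1 per-pixel transforms, the branch widths and the ReLU placement are simplifying assumptions for illustration, not the patent's exact micro-block design.

```python
import numpy as np

def micro_block(x, w_res, w_dense):
    """Sketch of a DPN-style micro-block on a (H, W, C) feature map.
    A residual branch is added back to the input (ResNet-style feature
    reuse); a dense branch is concatenated to it (DenseNet-style
    exploration of new features)."""
    h = np.maximum(x @ w_res, 0.0)            # residual branch, keeps width C
    d = np.maximum(x @ w_dense, 0.0)          # dense branch, k new channels
    res = x + h                               # ResNet path: element-wise sum
    return np.concatenate([res, d], axis=-1)  # DenseNet path: channel concat

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))           # toy 8x8 map with 16 channels
w_res = rng.standard_normal((16, 16))
w_dense = rng.standard_normal((16, 4))        # 4 newly explored channels
y = micro_block(x, w_res, w_dense)
print(y.shape)                                # (8, 8, 20): 16 reused + 4 new
```

The output width grows by the dense branch's channels while the first C channels remain a residual-refined copy of the input, which is the feature-reuse/feature-exploration combination described above.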
According to an exemplary embodiment of the present invention, in order to make the decoder section have the same performance of recovering the abstract features, the micro-blocks are embedded in the decoder section to replace the standard convolution operation.
Fig. 2 schematically shows a simplified block diagram of a network architecture 1 according to an exemplary embodiment of the present invention.
As shown in fig. 2, the single CT image 16 input to the encoding path 12 has 512 × 512 pixels; 1 × 512 × 512 denotes one CT image of 512 × 512 pixels. The CT image 16 is processed by a series of encoders in the encoding path 12 to extract features of the image. The final encoding result of the encoding path 12 is input to the central module 13, processed there, and output to the decoding path 14, thereby realizing the transition from the encoding path 12 to the decoding path 14. It can also be seen from fig. 2 that the decoding path 14 is additionally connected to the encoding path 12, as schematically indicated by arrow 17.
In the exemplary embodiment shown in fig. 2, the encoding path 12 encodes the image 16 in a series by 5 encoders/micro-blocks 18, while the decoding path 14 includes 4 decoders 19, each decoder 19 using a respective micro-block/encoder 18 in the encoding path 12.
In fig. 2, exemplary operations within some of the modules are also represented schematically by symbols. Specifically, darker arrows indicate Conv(3 × 3) + BN + ReLU, where Conv(3 × 3) is a 3 × 3 convolution operation, BN (batch normalization) is a batch normalization operation, and ReLU (rectified linear unit) is a linear rectification function. Lighter arrows indicate micro-blocks. ⊕ represents the connection (concatenation) relationship, ↑ represents bilinear upsampling, and σ represents a Sigmoid activation function.
According to an exemplary embodiment of the present invention, as shown in fig. 2, the central module 13 may be configured to implement: Conv(3 × 3) + BN + ReLU.
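The Conv(3 × 3) + BN + ReLU operation of the central module can be sketched as follows in NumPy. The loop-based "same"-padded convolution and the single-image normalization statistics are simplifications for illustration, not a performant implementation.

```python
import numpy as np

def conv3x3(x, w):
    """Naive 'same'-padded 3x3 convolution on a (H, W, Cin) map with
    weights of shape (3, 3, Cin, Cout). Loop-based for clarity."""
    H, W, _ = x.shape
    Cout = w.shape[-1]
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((H, W, Cout))
    for i in range(3):
        for j in range(3):
            out += xp[i:i + H, j:j + W] @ w[i, j]
    return out

def bn_relu(x, eps=1e-5):
    """Per-channel normalization (over spatial positions, since the toy
    batch here is a single map) followed by ReLU."""
    mu = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return np.maximum((x - mu) / np.sqrt(var + eps), 0.0)

rng = np.random.default_rng(1)
x = rng.standard_normal((16, 16, 8))
w = rng.standard_normal((3, 3, 8, 8)) * 0.1
y = bn_relu(conv3x3(x, w))
print(y.shape)   # (16, 16, 8): same spatial size, ReLU keeps values >= 0
```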
In the exemplary embodiment shown in fig. 2, the image processed by the decoding path 14 is output to the output module 15, and the output module 15 also outputs an automatically segmented 512 × 512 image 20.
According to an exemplary embodiment of the present invention, as shown in fig. 2, the output module 15 may be configured to implement a Sigmoid activation function σ, which outputs the probability that each pixel belongs to a region of interest (ROI).
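A minimal sketch of this output step: the Sigmoid maps each logit of the last decoder stage to a per-pixel ROI probability, which can then be thresholded into a segmentation mask (the toy logit values and the 0.5 threshold are illustrative choices).

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real logit into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy 2x2 map of logits standing in for the last decoder output.
logits = np.array([[3.0, -2.0],
                   [0.0,  5.0]])
prob = sigmoid(logits)   # per-pixel probability of belonging to the ROI
mask = prob > 0.5        # binarize to obtain the segmented region
print(mask)              # pixels with positive logits fall inside the ROI
```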
According to an exemplary embodiment of the present invention, as shown in fig. 2, the decoder 19 may be configured to implement: micro-block + ↑ (bilinear upsampling).
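The upsampling half of this decoder step can be sketched as a plain bilinear 2× interpolation. The align-corners convention used below is an assumption for simplicity; the patent does not specify which variant is used.

```python
import numpy as np

def upsample_bilinear_2x(x):
    """Bilinear 2x upsampling of a (H, W) map, align-corners style:
    output sample positions are spread linearly over the input grid and
    each output value blends its four surrounding input pixels."""
    H, W = x.shape
    ys = np.linspace(0, H - 1, 2 * H)
    xs = np.linspace(0, W - 1, 2 * W)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    return ((1 - wy) * (1 - wx) * x[np.ix_(y0, x0)]
            + (1 - wy) * wx * x[np.ix_(y0, x1)]
            + wy * (1 - wx) * x[np.ix_(y1, x0)]
            + wy * wx * x[np.ix_(y1, x1)])

x = np.array([[0.0, 1.0],
              [2.0, 3.0]])
y = upsample_bilinear_2x(x)
print(y.shape)   # (4, 4), corners preserved, values interpolated between
```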
the present exemplary embodiment can be used for collaborative planning of tumor radiotherapy, particularly radiotherapy in which the tumor site organs are complex, such as cervical cancer, and more particularly for automatic contouring of clinical target areas and organs-at-risk to generate intermediate results usable by tumor radiotherapy physicians.
Fig. 3 schematically shows a simplified block diagram of a network architecture for automatic segmentation of images according to another exemplary embodiment of the present invention.
The main difference between fig. 3 and fig. 2 is that three CT images 16 that are adjacent to each other front and back are input through the input module 11, so that the neural network obtains inter-slice image information, forming a quasi-3D input. As a result, the segmentation results of adjacent layers are smoother, and abrupt changes between the segmentation results of successive slices are avoided.
Of course, it will be understood by those skilled in the art that the input CT images need not be three images, but may be any suitable number of images.
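The stacking of front-and-back adjacent slices can be sketched as follows; clamping at the edges of the volume is an illustrative choice for handling the first and last slices.

```python
import numpy as np

# A toy CT volume of 5 slices, each 4x4 (real slices would be 512x512).
volume = np.arange(5 * 4 * 4, dtype=float).reshape(5, 4, 4)

def quasi_3d_input(vol, k):
    """Stack slice k with its front and back neighbours into a 3-channel
    input, giving a 2D network inter-slice context. Edge slices are
    handled by clamping the neighbour index (an illustrative choice)."""
    lo = max(k - 1, 0)
    hi = min(k + 1, len(vol) - 1)
    return np.stack([vol[lo], vol[k], vol[hi]], axis=0)

x = quasi_3d_input(volume, 2)
print(x.shape)   # (3, 4, 4): slices 1, 2 and 3 stacked as channels
```

The same helper generalizes to any odd number of neighbouring slices, matching the remark that the input need not be exactly three images.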
The output is also an automatically segmented image 20. The rest is similar to fig. 2 and will not be described again here for the sake of clarity.
Fig. 4 schematically shows a simplified block diagram of a network architecture for automatic segmentation of images according to yet another exemplary embodiment of the present invention.
Fig. 4 differs from fig. 3 mainly in that the output is a multi-channel output. The multi-channel output shows that the network can not only perform binary (2-class) image segmentation but also multi-class classification. Here, taking cervical cancer as an example, 8 ROI segmentation results can be output simultaneously. The 8 ROIs comprise 7 organs at risk and 1 clinical target volume, specifically: 1. bladder, 2. bone marrow, 3. left femoral head, 4. right femoral head, 5. rectum, 6. small intestine, 7. spinal cord, 8. cervical cancer clinical target volume. However, multi-channel outputs for other tumors are also supported, as are other numbers of channels.
The multi-channel output representation outputs a plurality of segmented images, each segmented image corresponding to a region of interest.
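A sketch of reading the multi-channel output: each of the 8 channels is thresholded into one binary mask per region of interest. The ROI ordering follows the list above; the threshold value is illustrative.

```python
import numpy as np

ROI_NAMES = ["bladder", "bone marrow", "left femoral head",
             "right femoral head", "rectum", "small intestine",
             "spinal cord", "cervical CTV"]

def split_channels(prob_maps, threshold=0.5):
    """Turn an (8, H, W) stack of per-ROI probability maps into one
    binary mask per region of interest."""
    return {name: prob_maps[c] > threshold
            for c, name in enumerate(ROI_NAMES)}

rng = np.random.default_rng(2)
prob_maps = rng.random((8, 4, 4))   # stand-in for the 8-channel output
masks = split_channels(prob_maps)
print(len(masks), masks["bladder"].shape)   # 8 (4, 4)
```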
The others are similar to fig. 3 and are not repeated here for clarity.
Fig. 5 schematically shows a simplified block diagram for identifying which regions of interest are in an image according to an exemplary embodiment of the present invention.
The main difference between fig. 5 and fig. 4 is that the last output part of the output module 15 is replaced by a fully connected layer (FC), so that the segmentation network becomes a classification network. This network can determine which ROIs are included in the input CT image, but does not localize the individual ROIs, i.e., it performs no specific segmentation. Specifically, the Sigmoid activation function σ shown in figs. 2-4 is here replaced by a fully connected layer FC.
Other similar parts are not described in detail.
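A sketch of this classification variant: the per-pixel output is replaced by a global pooling step and a fully connected layer that scores each ROI's presence. The pooling step and the zero-threshold decision rule are assumptions for illustration; the patent only specifies the fully connected layer.

```python
import numpy as np

def classification_head(features, w, b):
    """Replace the per-pixel output with a fully connected layer: pool
    the final feature map globally, then map it to one score per ROI.
    A score > 0 is read here as 'this ROI appears in the image'
    (hypothetical decision rule)."""
    pooled = features.mean(axis=(0, 1))   # global average pooling -> (C,)
    scores = pooled @ w + b               # fully connected layer -> (N_roi,)
    return scores > 0.0

rng = np.random.default_rng(3)
features = rng.standard_normal((8, 8, 16))   # toy decoder output
w = rng.standard_normal((16, 8))             # 8 ROI classes
b = np.zeros(8)
present = classification_head(features, w, b)
print(present.shape)   # (8,): one presence flag per ROI, no localization
```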
For better evaluation, training and updating of the network, a loss function is usually provided at the end point of the network. It takes the predicted values of the network and the prepared gold standard as input, computes the loss (deviation or error), and backpropagates it to the corresponding layers of the network, for example all layers, in order to update the corresponding weight values, for example all weight values, so that the network predictions move closer to the gold standard. The network is only finalized when, for example, the loss has fallen to an acceptable level, for example a predetermined level, and the optimal weight values have been updated; at this point the training can be ended.
Thus, the loss function not only affects the training process of the network, but also directly affects the final performance of the network.
According to an exemplary embodiment of the present invention, the loss function is a cross-entropy loss function (1) or a weight-adaptive loss function (2):

Loss_CE = −(1/(I·N)) · Σ_{i=1..I} Σ_{n=1..N} [ y_{i,n}·log(p_{i,n}) + (1 − y_{i,n})·log(1 − p_{i,n}) ]  (1)

[formula (2), the weight-adaptive loss that additionally weights the cross-entropy term of each segmentation target n by a factor derived from the volume ratio P_n, together with its two auxiliary definitions, appears as equation images in the original]

wherein p is the Sigmoid activation function output; y is the label map (ground truth map), taking the value 0 or 1; I is the input image; N is the total number of segmentation targets; p and y both have the shape I × N; and P_n is the ratio of the counted volume of each category's target region, e.g., an organ at risk, to the volume of the overall image.
Preferably, for the multi-channel segmentation network exemplarily shown in fig. 4, equation (2) is used as the loss function. The multi-channel segmentation network must segment a plurality of regions of interest, such as several organs at risk and clinical target volumes. If the volumes of these regions of interest differ greatly in size, the cross-entropy loss function makes it difficult to obtain a good segmentation result, whereas the weight-adaptive loss function yields a very good one.
For other segmentation networks, equation (1) may be used as the loss function.
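The two loss choices can be sketched as follows. The cross-entropy term follows the standard binary form over the (I, N) shape defined above; the weight-adaptive variant shown here, which weights each target inversely to its volume ratio P_n, is an assumed concrete form, since the patent renders the exact formula as an image.

```python
import numpy as np

def cross_entropy_loss(p, y, eps=1e-7):
    """Mean binary cross-entropy over all pixels I and targets N
    (formula (1); p and y have shape (I, N))."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def weight_adaptive_loss(p, y, P, eps=1e-7):
    """Sketch of the weight-adaptive idea: each target's cross-entropy
    term is weighted inversely to its volume ratio P_n so that small
    organs are not drowned out by large ones. The 1/P_n weighting,
    normalized to sum to 1, is an assumption made here."""
    p = np.clip(p, eps, 1 - eps)
    w = 1.0 / np.asarray(P, dtype=float)
    w = w / w.sum()                                       # shape (N,)
    per_target = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p), axis=0)
    return float(per_target @ w)

rng = np.random.default_rng(4)
y = (rng.random((64, 3)) < 0.2).astype(float)             # toy labels
p = np.clip(y * 0.9 + 0.05, 0.0, 1.0)                     # near-correct preds
P = [0.5, 0.1, 0.01]                                      # toy volume ratios
print(cross_entropy_loss(p, y), weight_adaptive_loss(p, y, P))
```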
It will be appreciated that the above description has been made primarily with respect to CT images, but the images may evidently also be X-ray images, magnetic resonance imaging (MRI) images, single photon emission computed tomography (SPECT) images, positron emission tomography (PET) images, ultrasound images, and the like.
It will be appreciated by those skilled in the art that the present invention provides, in general, a neural network architecture for automated processing, e.g., automated segmentation or classification, of images, which may be used in the medical field, in particular to assist in segmenting or classifying CT images. In particular, the neural network architecture of the present invention is adapted to facilitate operations involved in automated contouring of clinical targets and organs-at-risk for radiation therapy of tumors (especially cervical cancer), and the physician can then perform subsequent operations, such as review, based on the processing results, such as automated contouring results, and can modify them as necessary for subsequent therapeutic purposes.
Therefore, the network architecture of the present invention is not directly used for therapeutic purposes when automatically processing CT images, but generates intermediate results, mainly aiming at improving the working efficiency of doctors.
Furthermore, it will be appreciated by those skilled in the art that the present invention also provides a computer readable program carrier storing program instructions for implementing the functions of the network architecture described above for image processing when executed by a processor. For example, the computer readable program carrier may be configured as a flash memory or the like.
The invention further provides a workstation configured to include the computer readable program carrier. The workstation may be configured in particular as a doctor workstation for use by a doctor.
Although specific embodiments of the invention have been described herein in detail, they have been presented for purposes of illustration only and are not to be construed as limiting the scope of the invention. Various substitutions, alterations, and modifications may be devised without departing from the spirit and scope of the present invention.

Claims (14)

1. A network architecture (1) for automatically processing images, the network architecture (1) comprising:
an input module (11) for inputting an image to be processed;
an encoding path (12) configured to perform feature extraction on an input image using a dual path network;
a decoding path (14) configured to establish a connection with the encoding path (12);
a central module (13) configured for a transition from an encoding path (12) to a decoding path (14) to refine high-dimensional image features; and
an output module (15) configured to output image processing results from the decoding path (14);
wherein the decoding path (14) is configured to perform a decoding operation on the respective encoding results of the encoding path (12) and the image feature refinement results of the central module (13) using a decoding path of the Unet architecture.
2. The network architecture (1) according to claim 1,
the dual path network comprises a series of concatenated micro-blocks (18), the micro-blocks (18) being embedded in a decoder of the decoding path (14) accordingly.
3. Network architecture (1) according to claim 1 or 2,
the encoding path (12) comprises 5 encoders and the decoding path (14) comprises 4 decoders (19), each decoder (19) using a respective micro-block in the encoding path (12).
4. The network architecture (1) of claim 3,
the central module (13) is configured to implement: Conv(3 × 3) + BN + ReLU; and
the decoder (19) is configured to implement: micro-block + ↑ (bilinear upsampling),
wherein Conv(3 × 3) is a 3 × 3 convolution operation, BN is a batch normalization operation, ReLU is a linear rectification function, and ↑ represents bilinear upsampling.
5. The network architecture (1) of claim 4,
the input module (11) receives one image or a plurality of images, and the output module (15) outputs one image or a plurality of images; and
the output module (15) is configured to implement σ, wherein σ represents a Sigmoid activation function.
6. The network architecture (1) of claim 5,
the images are computed tomography images, X-ray images, magnetic resonance imaging images, single photon emission computed tomography images, positron emission tomography images or ultrasound images, and the plurality of images are images adjacent to each other front and back; and/or
The network architecture (1) is configured for automatic segmentation processing of images.
7. The network architecture (1) of claim 4,
the input module (11) receives one image or a plurality of images, and the output module (15) outputs a classification result; and
the output module (15) is configured to implement FC, wherein FC represents a fully connected layer.
8. The network architecture (1) of claim 6,
configuring a loss function at an end point of the network architecture (1), wherein the loss function is a cross-entropy loss function (1) or a weight-adaptive loss function (2):

Loss_CE = −(1/(I·N)) · Σ_{i=1..I} Σ_{n=1..N} [ y_{i,n}·log(p_{i,n}) + (1 − y_{i,n})·log(1 − p_{i,n}) ]  (1)

[formula (2) and its two auxiliary definitions appear as equation images in the original]

wherein p is the Sigmoid activation function output; y is the label map, taking the value 0 or 1; I is the input image; N is the total number of segmentation targets; p and y both have the shape I × N; and P_n is the ratio of the counted volume of the target region of each category to the volume of the whole image.
9. The network architecture (1) of claim 8,
when the network architecture (1) is configured to output a plurality of divided images, equation (2) is used as a loss function.
10. The network architecture (1) according to any one of claims 1-9,
the network architecture (1) is configured for automatic segmentation or classification of images for tumor radiotherapy, in particular for cervical cancer radiotherapy.
11. A method of processing an image using the network architecture (1) of any of claims 1-10, comprising the steps of:
inputting an image through an input module (11);
processing the image by means of an encoding path (12), a central module (13) and a decoding path (14); and
the image processing result is output through an output module (15).
12. A computer readable program carrier storing program instructions which, when executed by a processor, implement the method according to claim 11.
13. A workstation configured to include the computer readable program carrier of claim 12.
14. The workstation according to claim 13, wherein the workstation is configured as a physician workstation for automatic segmentation or classification of medical images.
CN202010663227.XA 2020-07-10 2020-07-10 Network architecture, program carrier and workstation for automatic processing of images Pending CN111784682A (en)

Publications (1)

Publication Number Publication Date
CN111784682A (zh) — 2020-10-16


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112734762A (en) * 2020-12-31 2021-04-30 西华师范大学 Dual-path UNet network tumor segmentation method based on covariance self-attention mechanism
CN113378933A (en) * 2021-06-11 2021-09-10 合肥合滨智能机器人有限公司 Thyroid ultrasound image classification and segmentation network, training method, device and medium
WO2022007957A1 (en) * 2020-07-10 2022-01-13 北京医智影科技有限公司 Network architecture for automatically processing images, program carrier, and workstation
CN117197156A (en) * 2022-10-21 2023-12-08 南华大学 Lesion segmentation method and system based on double decoders UNet and Transformer

Citations (2)

Publication number Priority date Publication date Assignee Title
CN110059772A (en) * 2019-05-14 2019-07-26 温州大学 Remote sensing images semantic segmentation method based on migration VGG network
US20190355102A1 (en) * 2018-05-15 2019-11-21 Adobe Inc. Digital Image Completion by Learning Generation and Patch Matching Jointly

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US9668699B2 (en) * 2013-10-17 2017-06-06 Siemens Healthcare Gmbh Method and system for anatomical object detection using marginal space deep neural networks
CN109598727B (en) * 2018-11-28 2021-09-14 北京工业大学 CT image lung parenchyma three-dimensional semantic segmentation method based on deep neural network
CN111242288B (en) * 2020-01-16 2023-06-27 浙江工业大学 Multi-scale parallel deep neural network model construction method for lesion image segmentation
CN111369574B (en) * 2020-03-11 2023-05-16 合肥凯碧尔高新技术有限公司 Thoracic organ segmentation method and device
CN111784682A (en) * 2020-07-10 2020-10-16 北京医智影科技有限公司 Network architecture, program carrier and workstation for automatic processing of images


Non-Patent Citations (1)

Title
Cui Hao: "Sea-land segmentation method for high-resolution remote sensing images based on deep learning", Software Guide (软件导刊), no. 03 *

Cited By (5)

Publication number Priority date Publication date Assignee Title
WO2022007957A1 (en) * 2020-07-10 2022-01-13 北京医智影科技有限公司 Network architecture for automatically processing images, program carrier, and workstation
CN112734762A (en) * 2020-12-31 2021-04-30 西华师范大学 Dual-path UNet network tumor segmentation method based on covariance self-attention mechanism
CN113378933A (en) * 2021-06-11 2021-09-10 合肥合滨智能机器人有限公司 Thyroid ultrasound image classification and segmentation network, training method, device and medium
CN117197156A (en) * 2022-10-21 2023-12-08 南华大学 Lesion segmentation method and system based on double decoders UNet and Transformer
CN117197156B (en) * 2022-10-21 2024-04-02 南华大学 Lesion segmentation method and system based on double decoders UNet and Transformer

Also Published As

Publication number Publication date
WO2022007957A1 (en) 2022-01-13

Similar Documents

Publication Publication Date Title
Fu et al. A review of deep learning based methods for medical image multi-organ segmentation
CN111784682A (en) Network architecture, program carrier and workstation for automatic processing of images
Lambert et al. Segthor: Segmentation of thoracic organs at risk in ct images
Dong et al. Automatic multiorgan segmentation in thorax CT images using U‐net‐GAN
Men et al. Deep deconvolutional neural network for target segmentation of nasopharyngeal cancer in planning computed tomography images
CN114037726A (en) Atlas-based segmentation using deep learning
Zhang et al. Deep learning initialized and gradient enhanced level-set based segmentation for liver tumor from CT images
Lei et al. Deep learning in multi-organ segmentation
CN112770811A (en) Method and system for radiation therapy treatment planning using a deep learning engine
US9098912B2 (en) Method, system and computer readable medium for automatic segmentation of a medical image
CN110619639A (en) Method for segmenting radiotherapy image by combining deep neural network and probability map model
Shi et al. Deep learning empowered volume delineation of whole-body organs-at-risk for accelerated radiotherapy
JP2014502531A (en) System and method for automatically generating an initial radiation treatment plan
Kumar et al. Dual feature extraction based convolutional neural network classifier for magnetic resonance imaging tumor detection using U-Net and three-dimensional convolutional neural network
Schreier et al. A full-image deep segmenter for CT images in breast cancer radiotherapy treatment
CN113436173B (en) Abdominal multi-organ segmentation modeling and segmentation method and system based on edge perception
US20230086070A1 (en) Image processing methods and systems
US20230290480A1 (en) Systems and methods for clinical target contouring in radiotherapy
Luan et al. Adaptive attention convolutional neural network for liver tumor segmentation
Vandewinckele et al. Segmentation of head-and-neck organs-at-risk in longitudinal CT scans combining deformable registrations and convolutional neural networks
Francis et al. ThoraxNet: a 3D U-Net based two-stage framework for OAR segmentation on thoracic CT images
Kawahara et al. Stepwise deep neural network (stepwise-net) for head and neck auto-segmentation on CT images
Lei et al. Deep learning architecture design for multi-organ segmentation
Singhrao et al. A generative adversarial network‐based (GAN‐based) architecture for automatic fiducial marker detection in prostate MRI‐only radiotherapy simulation images
Chen et al. A multiple organ segmentation system for CT image series using Attention-LSTM fused U-Net

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination