CN109583576B - Medical image processing device and method

Medical image processing device and method

Info

Publication number
CN109583576B
Authority
CN
China
Prior art keywords
neural network
convolution
layer
medical image
image
Prior art date
Legal status
Active
Application number
CN201811544139.7A
Other languages
Chinese (zh)
Other versions
CN109583576A (en)
Inventor
韩妙飞
张宇
高耀宗
Current Assignee
Shanghai United Imaging Intelligent Healthcare Co Ltd
Original Assignee
Shanghai United Imaging Intelligent Healthcare Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai United Imaging Intelligent Healthcare Co Ltd filed Critical Shanghai United Imaging Intelligent Healthcare Co Ltd
Priority to CN201811544139.7A
Publication of CN109583576A
Priority to PCT/CN2019/128679 (WO2020125806A1)
Priority to US16/870,905 (US11341734B2)
Application granted
Publication of CN109583576B
Priority to US17/664,422 (US11836925B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images


Abstract

The invention provides a medical image processing method, comprising: acquiring a medical image to be processed; and processing the medical image with a trained neural network model. The neural network model comprises a plurality of neural network structure units, each comprising a first convolution layer, a second convolution layer, and a third convolution layer connected in sequence, where the convolution kernels of the first and third convolution layers are unit convolution kernels. The number of output channels of the first convolution layer is reduced relative to its number of input channels, the second convolution layer performs spatial convolution on the feature image dimension-reduced by the first convolution layer, and the number of output channels of the third convolution layer is raised back to the number of input channels of the first convolution layer.

Description

Medical image processing device and method
Technical Field
The present invention relates to the field of medical images, and in particular, to a medical image processing apparatus and method.
Background
In recent years, image processing methods based on deep learning have been used on a large scale in advanced applications of medical imaging. In fields such as medical image detection, medical image segmentation, medical image reconstruction, and medical image registration, deep learning methods far outperform traditional image processing methods in accuracy, robustness, computation speed, and other respects.
The most successful deep learning architecture in the image field is the convolutional neural network (CNN). Among convolutional neural networks, network models that can be used for end-to-end segmentation include the fully convolutional network (FCN), DeepLab, and the like.
In three-dimensional medical image processing, taking image segmentation as an example, the model files generated by existing neural network training are generally large compared with those for two-dimensional images; a single model file is typically about 250 MB. If multiple neural network models are adopted in a software product, the disk footprint of the final product grows substantially, which causes many adverse effects for product deployment. In addition, existing neural network deployment methods and open-source software libraries allocate new memory for the input and output of every convolution layer, which consumes a large amount of memory and imposes very high hardware requirements and costs on product deployment.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a medical image processing apparatus and method that save storage space, reduce memory consumption, and improve the computational efficiency of a neural network by reducing the size of the network model file and optimizing memory use while the neural network runs.
In order to solve the above technical problem, one aspect of the present invention provides a medical image processing method, comprising: acquiring a medical image to be processed; and processing the medical image with a trained neural network model. The neural network model comprises a plurality of neural network structure units, each comprising a first convolution layer, a second convolution layer, and a third convolution layer connected in sequence, where the convolution kernels of the first and third convolution layers are unit convolution kernels. The number of output channels of the first convolution layer is reduced relative to its number of input channels, the second convolution layer performs spatial convolution on the feature image dimension-reduced by the first convolution layer, and the number of output channels of the third convolution layer is raised back to the number of input channels of the first convolution layer.
In an embodiment of the present invention, the neural network structure unit further includes a batch normalization layer and an activation layer.
In an embodiment of the invention, the neural network model is a convolutional neural network model.
In an embodiment of the invention, the convolutional neural network model is a U-shaped or V-shaped neural network model.
In an embodiment of the present invention, the maximum memory required by the inputs and outputs of all convolution layers is determined, an input memory and an output memory of this maximum size are allocated accordingly, and during processing the input memory and the output memory are swapped with each other.
In an embodiment of the present invention, the neural network model further includes cross-layer-connected convolution layers, and the inputs and outputs of the cross-layer-connected convolution layers are embedded in the allocated input memory and output memory.
In an embodiment of the invention, the convolution kernel size of the second convolution layer is 3 to 5.
Another aspect of the present invention provides a medical image processing apparatus, comprising: a medical image acquisition module for acquiring a medical image to be processed; and a medical image processing module for processing the medical image with a trained neural network model. The neural network model comprises a plurality of neural network structure units, each comprising a first convolution layer, a second convolution layer, and a third convolution layer connected in sequence, where the convolution kernels of the first and third convolution layers are unit convolution kernels. The number of output channels of the first convolution layer is reduced relative to its number of input channels, the second convolution layer performs spatial convolution on the feature image dimension-reduced by the first convolution layer, and the number of output channels of the third convolution layer is raised back to the number of input channels of the first convolution layer.
In an embodiment of the present invention, the maximum memory required by the inputs and outputs of all convolution layers is determined, an input memory and an output memory of this maximum size are allocated accordingly, and during processing the input memory and the output memory are swapped with each other.
Yet another aspect of the present invention provides a computer readable storage medium having stored thereon computer instructions, wherein the computer instructions, when executed by a processor, perform the method as described above.
Compared with the prior art, the invention has the following advantages. The neural network model comprises a plurality of neural network structure units with a bottleneck structure: the number of output channels of the first convolution layer is reduced relative to its number of input channels, the second convolution layer performs spatial convolution on the feature image dimension-reduced by the first convolution layer, and the third convolution layer raises the number of output channels back to the number of input channels of the first convolution layer. The convolution kernels of the first and third convolution layers are unit convolution kernels, used respectively to reduce and raise the channel dimension. Because the spatial convolution is performed on a dimension-reduced feature image, the parameter count of the model file is significantly reduced and the computational efficiency is significantly increased; meanwhile, because the overall numbers of input and output channels remain unchanged, the quality of the neural network's image processing is preserved. For U-shaped or V-shaped neural network models, the invention computes the maximum memory required by the inputs and outputs of all convolution layers and allocates two blocks of this maximum size as input and output, respectively. The image processing quality is thus guaranteed at lower memory consumption, and the parameter count of the model file is markedly reduced.
Drawings
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below, wherein:
FIG. 1 is a schematic block diagram of a computer device according to some embodiments of the invention.
FIG. 2 is a block diagram depicting an exemplary processing engine according to some embodiments of the invention.
FIG. 3 is a block diagram depicting an exemplary neural network determination module, in accordance with some embodiments of the present invention.
Fig. 4 is a flowchart of a medical image processing method according to an embodiment of the present invention.
Fig. 5A-5H are schematic diagrams of exemplary processes of a medical image processing method according to an embodiment of the invention.
Fig. 6 is a schematic diagram of a prior art convolution-batch normalization-activation building block.
FIG. 7 is a diagram of a neural network building block according to an embodiment of the present invention.
FIG. 8A is a diagram illustrating memory allocation according to an embodiment of the invention.
FIG. 8B is a diagram illustrating memory allocation according to another embodiment of the present invention.
FIG. 9 is a schematic diagram of an exemplary neural network model described in accordance with an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the invention, from which it is possible for a person skilled in the art, without inventive effort, to apply the invention also in other similar contexts. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
As used in this disclosure and in the claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Although the present invention makes various references to certain modules in a system according to embodiments of the present invention, any number of different modules may be used and run on a computing device and/or processor. The modules are merely illustrative and different aspects of the systems and methods may use different modules.
It will be understood that when an element or module is referred to as being "connected," "coupled" to other elements, modules or blocks, it can be directly connected or coupled or in communication with the other elements, modules or blocks or intervening elements, modules or blocks may be present unless the context clearly dictates otherwise. As used herein, the term "and/or" can include any and all combinations of one or more of the associated listed items.
Flow charts are used in the present invention to illustrate the operations performed by a system according to embodiments of the present invention. It should be understood that the preceding or following operations are not necessarily performed in the exact order shown. Rather, various steps may be processed in reverse order or simultaneously, and other operations may be added to or removed from these processes.
Some descriptions of the present invention are provided in connection with computed tomography (CT) images. It will be understood that this is for illustrative purposes and is not intended to limit the scope of the invention. The apparatus and methods of the invention may be used to process images or image data from other imaging modalities. The imaging system may be a single-modality imaging system, such as an emission computed tomography (ECT) system, an ultrasound imaging system, an X-ray optical imaging system, a positron emission tomography (PET) system, and the like. The imaging system may also be a multi-modality imaging system, such as a computed tomography-magnetic resonance imaging (CT-MRI) system, a positron emission tomography-magnetic resonance imaging (PET-MRI) system, a single photon emission computed tomography-computed tomography (SPECT-CT) system, a digital subtraction angiography-computed tomography (DSA-CT) system, or the like.
FIG. 1 is a schematic block diagram of a computer device according to some embodiments of the invention. Computer 100 may be used to implement particular methods and apparatus disclosed in some embodiments of the invention. The specific apparatus in this embodiment is illustrated by a functional block diagram of a hardware platform that includes a display module. In some embodiments, computer 100 may implement implementations of some embodiments of the invention by its hardware devices, software programs, firmware, and combinations thereof. In some embodiments, the computer 100 may be a general purpose computer, or a special purpose computer.
Referring to FIG. 1, a computer 100 may include an internal communication bus 101, a processor 102, a read-only memory (ROM) 103, a random access memory (RAM) 104, a communication port 105, input/output components 106, a hard disk 107, and a user interface 108. The internal communication bus 101 may enable data communication among the components of the computer 100. The processor 102 may make determinations and issue prompts. In some embodiments, the processor 102 may consist of one or more processors. The communication port 105 may enable data communication between the computer 100 and other components (not shown), such as external devices, image acquisition devices, databases, external storage, and image processing workstations. In some embodiments, the computer 100 may send and receive information and data from a network through the communication port 105. The input/output components 106 support input/output data flow between the computer 100 and other components. The user interface 108 may enable interaction and information exchange between the computer 100 and a user. The computer 100 may also include various forms of program storage units and data storage units, such as the hard disk 107, the read-only memory (ROM) 103, and the random access memory (RAM) 104, capable of storing various data files used in computer processing and/or communication, as well as program instructions possibly executed by the processor 102.
By way of example, the input/output components 106 may include one or more of the following components: a mouse, a trackball, a keyboard, a touch-sensitive component, a sound receiver, etc.
The processor 102 in the present invention may be configured as a processing engine. FIG. 2 is a block diagram of a processing engine according to some embodiments of the invention. Referring to fig. 2, the processing engine 200 may include an acquisition module 210, a control module 220, a neural network determination module 230, an image data processing module 240, and a storage module 250. The processing engine 200 may be implemented on various components (e.g., the processor 102 of the computer 100 shown in FIG. 1).
The acquisition module 210 may receive image data. The acquisition module 210 may acquire image data from an imaging system or a storage device (e.g., hard disk 107, ROM 103, or RAM 104). The image data may include scan data, reconstructed images, and the like. In some embodiments, the acquisition module 210 may send the acquired image data to other modules or units of the processing engine 200 for further processing. For example, the acquired image data may be sent to the storage module 250 for storage. As another example, the acquisition module 210 may send image data (e.g., scan data) to the image data processing module 240 to segment the image.
The control module 220 may control the operation of the acquisition module 210, the neural network determination module 230, the image data processing module 240, and/or the storage module 250 by, for example, generating one or more control parameters. For example, the control module 220 may control the acquisition module 210 to acquire image data. As another example, the control module 220 may control the image data processing module 240 to process the image data acquired by the acquisition module 210. As yet another example, the control module 220 may control the neural network determination module 230 to train the neural network model. In some embodiments, the control module 220 may receive real-time commands or retrieve predetermined commands provided by, for example, a user (e.g., a physician) or the computer 100 to control one or more operations of the acquisition module 210, the neural network determination module 230, and/or the image data processing module 240. For example, the control module 220 can adjust the image data processing module 240 to generate the image of the object according to real-time instructions and/or predetermined instructions. In some embodiments, control module 220 may communicate with one or more other modules of processing engine 200 to exchange information and/or data.
The neural network determination module 230 may determine one or more neural network models. For example, the neural network determination module 230 may determine a neural network model configured to segment the image. In some embodiments, the neural network determination module 230 may send the determined neural network model to one or more other modules for further processing or application. For example, the neural network determination module 230 may send the neural network model to the storage module 250 for storage. As another example, the neural network determination module 230 may send the neural network model to the image data processing module 240 for image processing.
The image data processing module 240 may process information provided by various modules of the processing engine 200. The image data processing module 240 may process image data acquired by the acquisition module 210, image data retrieved from the storage module 250, and the like. In some embodiments, the image data processing module 240 may segment an image, generate a report including one or more images and/or other relevant information, and/or perform any other function for image segmentation in accordance with various embodiments of the present invention.
The storage module 250 may store image data, models, control parameters, processed image data, or a combination thereof. In some embodiments, the memory module 250 may store one or more programs and/or instructions executable by the processor of the processing engine 200 to perform the exemplary methods described in this disclosure. For example, storage module 250 may store programs and/or instructions executed by a processor of processing engine 200 to acquire image data, segment images based on image data, train neural network models, and/or display any intermediate results or resulting images.
In some embodiments, the neural network determination module 230 may be provided independently of the processing engine 200. One or more neural network models determined by another device may be stored in the computer 100 (e.g., on the hard disk 107, in ROM 103, RAM 104, etc.) or on an external device accessible to the processing engine 200 via, for example, a network. In some embodiments, such devices may include parts that are the same as or similar to the neural network determination module 230. In some embodiments, the neural network determination module 230 may store one or more neural network models determined by another device and accessible by one or more components of the computer 100 (e.g., the processor 102). In some embodiments, a neural network model applicable in the present invention may be determined by the computer 100 (or a part thereof, such as the processing engine 200) or by an external device accessible to the computer 100 (or a part thereof, such as the processing engine 200).
Fig. 3 is a block diagram depicting an exemplary neural network determination module 230, in accordance with some embodiments of the present invention. Referring to fig. 3, the neural network determining module 230 may include an image reconstructing unit 320, a neural network training unit 340, and a storage unit 360. The neural network determination module 230 may be implemented on various components (e.g., the processor 102 of a computer as shown in fig. 1).
The image reconstruction unit 320 may reconstruct one or more images based on one or more reconstruction techniques. In some embodiments, the image reconstruction unit 320 may send the reconstructed image to other units or blocks of the neural network determination module 230 for further processing. For example, the image reconstruction unit 320 may send the reconstructed images to the neural network training unit 340 to train the neural network model. As another example, the image reconstruction unit 320 may transmit the reconstructed image to the storage unit 360 to be stored.
The neural network training unit 340 may train the neural network model. In some embodiments, the neural network training unit 340 may train a neural network model configured to generate predicted images from the undersampled images. Such a neural network model may be obtained using some images and region of interest boxes for those images.
In some embodiments, the neural network training unit 340 may also include a parameter determination block 342, an extraction block 344, a calculation block 346, and a decision block 348. The parameter determination block 342 may initialize the neural network model. For example, the parameter determination block 342 may construct an initial neural network model. As another example, the parameter determination block 342 may initialize one or more parameter values of the initial neural network model. The extraction block 344 may extract information from one or more training images. For example, the extraction block 344 may extract features for one or more regions from the training images. The calculation block 346 may perform calculation functions, for example, in the course of training the neural network model. For example, the calculation block 346 may calculate one or more parameter values of the neural network model that are updated during the iterative training process. The decision block 348 may perform decision functions, for example, in training the neural network model. For example, the decision block 348 may determine whether a condition is satisfied during training of the neural network model.
The storage unit 360 may store information about, for example, training a neural network model. In some embodiments, the information related to training the neural network model may include images used to train the model, algorithms used to train the model, parameters of the model, and the like. For example, the storage unit 360 may store training images according to a certain standard. The training images may be stored or uploaded into the storage unit 360 based on their dimensions. For purposes of illustration, a two-dimensional (2D) or three-dimensional (3D) image may be stored as a 2D or 3D matrix including a plurality of elements (e.g., pixels or voxels). Elements of a 2D matrix are arranged in the storage unit 360 such that each row of elements, corresponding to the length of the 2D image, is stored sequentially; elements in the same row are therefore adjacent to each other in the storage unit 360. Elements of a 3D matrix are arranged such that the 2D matrices composing the 3D matrix are stored sequentially in the storage unit 360, with the rows and/or columns of each 2D matrix stored sequentially in turn. The storage unit 360 may be a memory that stores data to be processed by a processing device such as a CPU or GPU. In some embodiments, the storage unit 360 may be a memory accessed by one or more GPUs, or a memory accessed only by a particular GPU.
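As an illustration of this row-major layout, here is a minimal NumPy sketch (NumPy and the concrete shape are illustrative assumptions, not named in the text): a C-ordered 3D array stores each row contiguously and each 2D slice after the previous one, which the strides make visible.

```python
import numpy as np

# A 3-D image stored as a 3-D matrix in C (row-major) order,
# shaped (slices, rows, columns).
vol = np.zeros((160, 112, 112), dtype=np.float32)

# Strides in bytes: stepping along a row moves 4 bytes (one float32),
# stepping to the next row moves 112 * 4 bytes, and stepping to the
# next 2-D slice moves 112 * 112 * 4 bytes, i.e. rows are adjacent
# and the 2-D slices are stored one after another.
print(vol.strides)  # (50176, 448, 4)
```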
It should be noted that the above description of the neural network determination module 230 is provided for illustrative purposes only, and is not intended to limit the scope of the present invention. Many variations and modifications may be made to the teachings of the present invention by those of ordinary skill in the art. Such variations and modifications do not depart from the scope of the invention.
Fig. 4 is a flowchart of a medical image processing method according to an embodiment of the present invention. The medical image processing method in the embodiment of the invention can be used for image segmentation, image detection, image registration and the like. For convenience of explanation, fig. 4 illustrates a medical image processing method in an embodiment of the present invention, taking image segmentation as an example. Referring to fig. 4, the medical image processing method includes the steps of:
step 401: a medical image to be processed is acquired.
Step 401 may also include two specific steps: (1) acquiring medical image data; (2) preprocessing the acquired medical image data.
The medical image data to be acquired may be a CT image, an X-ray image, an MRI image, a PET image, or a CT-PET image, a CT-MRI image, etc. The object of the image may include a substance, tissue, organ, specimen, body, or the like, or any combination thereof. Particular objects in medical images of a body may include the head, chest, lung, heart, liver, spleen, pleura, mediastinum, abdomen, large intestine, small intestine, bladder, gall bladder, pelvic cavity, diaphysis, extremities, skeleton, blood vessels, or the like, or any combination thereof.
The step of preprocessing the acquired medical image data includes down-sampling, i.e., down-sampling the medical image to the specified resolution of the trained neural network model. The preprocessing step may further include normalization, i.e., normalizing the down-sampled medical image data according to the normalization method adopted in the neural network model's training stage.
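A minimal sketch of this preprocessing in NumPy/SciPy follows; the target spacing and the intensity window used for normalization are illustrative assumptions, not values fixed by the invention.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess(image: np.ndarray, spacing, target_spacing=(6.0, 6.0, 6.0),
               vmin=-1024.0, vmax=3071.0) -> np.ndarray:
    """Down-sample a 3-D volume to the model's specified resolution and
    normalize its gray values to [-1, 1].

    `spacing` is the source voxel size in mm; `target_spacing`, `vmin`,
    and `vmax` are example values (6 mm matches the coarse model here).
    """
    factors = [s / t for s, t in zip(spacing, target_spacing)]
    resampled = zoom(image, factors, order=1)  # linear interpolation
    clipped = np.clip(resampled, vmin, vmax)
    return (2.0 * (clipped - vmin) / (vmax - vmin) - 1.0).astype(np.float32)
```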
Step 402: segmenting the medical image according to the trained neural network model. Specifically, the neural network model can be implemented as a cascade of a coarse-segmentation neural network and a fine-segmentation neural network. The coarse-segmentation neural network is used for fast organ localization, and the fine-segmentation network is used for fine segmentation of the organ. Segmenting the medical image according to the trained neural network model may comprise coarse segmentation of the medical image in step 4021 and fine segmentation of the medical image in step 4022.
Step 4021: coarse segmentation of the medical image according to the trained coarse-segmentation neural network model.
For medical image data of the human body, in this step the preprocessed medical image is input into the coarse-segmentation neural network for forward-propagation computation, yielding a probability distribution map of the organ in the human body.
In an embodiment, after the coarse segmentation of the medical image in step 4021, the method further includes post-processing the probability distribution map of the medical image. This post-processing mainly includes binarization and extraction of the largest connected region. After post-processing, the segmentation result of the organ at the coarse resolution is obtained.
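A minimal sketch of this post-processing, assuming SciPy's connected-component labeling (the 0.5 threshold is an illustrative assumption):

```python
import numpy as np
from scipy.ndimage import label

def postprocess(prob_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize a probability map and keep only the largest connected region."""
    binary = prob_map > threshold
    labeled, num_regions = label(binary)  # default (face) connectivity
    if num_regions == 0:
        return binary.astype(np.uint8)
    sizes = np.bincount(labeled.ravel())
    sizes[0] = 0  # ignore the background label
    return (labeled == sizes.argmax()).astype(np.uint8)
```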
Step 4022: fine segmentation of the coarsely segmented medical image according to the trained fine-segmentation neural network model.
A region of interest (localization box) is determined from the coarse segmentation result obtained in step 4021, and the image inside the localization box is cropped out as the initial input image of the fine-segmentation neural network module.
In this step, the following processing may be performed on the initial input image: (1) preprocessing, including down-sampling the initial image to the specified resolution of the trained fine-segmentation neural network model, and normalization; (2) inputting the preprocessed image into the fine-segmentation neural network for forward-propagation computation to obtain a probability distribution map of the organ; (3) post-processing the probability distribution map similarly to step 4021 to obtain the segmentation result of the organ.
It should be noted that the medical image to be processed in the present embodiment is a three-dimensional image, and the present invention is also applicable to two-dimensional medical image data. Accordingly, the neural network used for two-dimensional image segmentation is a two-dimensional neural network.
In addition, the coarse-segmentation neural network model in step 4021 and the fine-segmentation neural network model in step 4022 are trained in advance. Taking a three-dimensional medical image as an example, training the neural network model means training a convolutional neural network on a large number of three-dimensional medical images and their labeled organ images to obtain a segmentation model file. The main training process is as follows:
(1) Image preprocessing, which comprises first resampling the image to a specified resolution, then randomly cropping image blocks (crops) from the complete image, and finally normalizing the image blocks. Images are resampled to the same resolution because different medical images have different spatial resolutions; unifying the resolution in the training stage aids the convergence of model training. The images are likewise normalized so that their gray-level distribution is controlled within a specified range, such as [-1, 1], which accelerates model convergence. Training on image blocks rather than on the whole original image is mainly a concession to memory (video memory) limits; training on partial images can also be regarded as a regularization that improves model performance.
(2) Model training: the image blocks are fed into the convolutional neural network for training with a batch size of 6, and after multiple iterations, once the training loss function value (Loss) is low, the trained model file is saved.
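A minimal PyTorch sketch of such a training loop; the optimizer, loss function, epoch count, and dataset interface are illustrative assumptions the text does not fix, apart from the batch size of 6.

```python
import torch
from torch.utils.data import DataLoader

def train(model, dataset, epochs=100, lr=1e-3, device="cuda"):
    """Train on image blocks with batch size 6 and save the model file."""
    loader = DataLoader(dataset, batch_size=6, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()  # placeholder segmentation loss
    model.to(device).train()
    for _ in range(epochs):
        for patches, labels in loader:  # dataset yields (block, label map)
            optimizer.zero_grad()
            loss = criterion(model(patches.to(device)), labels.to(device))
            loss.backward()
            optimizer.step()
    torch.save(model.state_dict(), "segmentation_model.pth")  # the model file
```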
The training processes of the coarse-segmentation and fine-segmentation neural network models are the same; the only difference is the resolution to which the images are resampled. The coarse-segmentation model uses a coarse resolution, such as 6 mm; the fine-segmentation model uses a fine resolution, such as 1 mm. Training yields a coarse-segmentation model file and a fine-segmentation model file, respectively. The coarse-segmentation model file is used for the coarse segmentation of the organ in the medical image in step 4021 above. The fine-segmentation model file is used for the fine segmentation of the organ in step 4022.
The process of model training and the subsequent organ segmentation of the medical image based on the trained neural network model form a complete organ segmentation solution.
Fig. 5A-5H are schematic diagrams of exemplary processes of a medical image processing method according to an embodiment of the invention. The process takes a CT image of a human body as an example, and performs liver segmentation on the image.
Fig. 5A is the medical image to be processed acquired in step 401 of Fig. 4; in this example, the image size is [512,512,594] and the image resolution is [1.172mm,1.172mm,1.5mm].
Fig. 5B is the image after preprocessing the medical image of Fig. 5A. The preprocessing includes down-sampling and normalization. The image size after preprocessing is [112,112,160], and the image resolution is [6mm,6mm,6mm]. Compared with the image before preprocessing, the preprocessed image is smaller and its resolution is lower.
Fig. 5C is the probability distribution map obtained after the image of Fig. 5B passes through the trained coarse-segmentation neural network. The probability distribution map has been post-processed by binarization and extraction of the largest connected region. The resulting image size is [112,112,160] and the image resolution is [6mm,6mm,6mm]. The probability of the liver region is close to 1 (shown as white in the figure), while the probability of other regions is very small, essentially less than 10⁻⁶ (shown as black in the figure).
Fig. 5D shows the localization box of the region of interest computed from the coarsely segmented image. The image inside the localization box is used as the initial image for the fine segmentation processing and is resampled during preprocessing, i.e., resampled to the specified resolution of the trained fine-segmentation neural network model. The resulting image size is [256,256,192] and the image resolution is [1mm,1mm,1mm].
Fig. 5E shows the image of Fig. 5D after continuing the preprocessing with normalization, the pixel values being normalized to [-1,1]. The resulting image size is [256,256,192] and the image resolution is [1mm,1mm,1mm].
Fig. 5F is the image resulting from post-processing, i.e., binarizing the image in Fig. 5E and extracting the largest connected region. The image size is still [256,256,192], and the image resolution is still [1mm,1mm,1mm].
Fig. 5G is a three-dimensional schematic diagram of the liver after returning the segmentation result of the liver image obtained through the previous steps to the original image resolution.
Fig. 5H shows the liver segmentation result obtained through the above steps mapped back onto the human CT image. With reference to Fig. 5H, the liver can be identified in the human CT image.
Fig. 6 is a schematic structural diagram of a basic structural unit in the prior art. Referring to Fig. 6, the size of the input tensor (Input Tensor) is x × y × z × c; when the input is an image, x, y, and z represent the spatial size of the image and c is the number of channels of the input image. The input tensor is convolved (Convolution) in a convolution layer 601 with a convolution kernel of size 5 × 5 × 5 × c × c, where 5 × 5 × 5 is the spatial size of the convolution kernel, the first c is the number of input channels, and the second c is the number of output channels. The convolution result is further processed by batch normalization (Batch Normalization) and activation (ReLU). Together, the convolution layer 601, batch normalization layer 602, and activation layer 603 form a convolution-batch normalization-activation (Conv-BN-ReLU) structure. After the input tensor is processed by the Conv-BN-ReLU structure, the final output tensor (Output Tensor) is obtained. The output tensor has the same size and number of channels as the input tensor, namely x × y × z × c.
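For reference, a minimal PyTorch sketch of this prior-art unit; k = 5 follows the figure, and the padding is an assumption made so the output keeps the input's spatial size.

```python
import torch.nn as nn

def conv_bn_relu(c: int, k: int = 5) -> nn.Sequential:
    """Prior-art Conv-BN-ReLU unit of Fig. 6: a single k x k x k convolution
    keeping c channels, followed by batch normalization and ReLU."""
    return nn.Sequential(
        nn.Conv3d(c, c, kernel_size=k, padding=k // 2),  # 5 x 5 x 5 x c x c
        nn.BatchNorm3d(c),
        nn.ReLU(inplace=True),
    )
```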
U-Net is a deep neural network specialized for image segmentation, and this embodiment processes images based on the U-Net neural network structure. In practical applications, however, the stored network model files are usually large; a single model file is typically about 250 MB. If multiple neural network models are adopted in a software product, the final product becomes very large, which brings adverse effects such as parameter redundancy, wasted storage space, and reduced computational efficiency, and greatly hinders the use and popularization of the product.
FIG. 7 is a diagram of a neural network building block according to an embodiment of the present invention. Referring to Fig. 7, the neural network structure unit of this embodiment replaces the Conv-BN-ReLU structure shown in Fig. 6 with a Bottleneck structure. In the Bottleneck structure, the convolution layer with the 5 × 5 × 5 × c × c convolution kernel in the Conv-BN-ReLU structure is split into three convolution layers: a first convolution layer 701, a second convolution layer 702, and a third convolution layer 703. The first convolution layer 701 contains a unit convolution kernel of size 1 × 1 × 1, with c input channels and c/n output channels; the second convolution layer 702 contains convolution kernels of size 3 × 3 × 3, with both c/n input channels and c/n output channels; the third convolution layer 703, like the first convolution layer 701, also contains a unit convolution kernel of size 1 × 1 × 1, but with c/n input channels and c output channels.
As shown in Fig. 7, each convolution layer is followed by a corresponding batch normalization layer 602 and activation layer 603.
With the neural network structure unit of this embodiment, the input tensor passes through the first convolution layer 701, the second convolution layer 702, and the third convolution layer 703 in sequence, being convolved with each of the three convolution kernels. After passing through the first convolution layer 701, the number of channels of the network is reduced from c to c/n, where n is referred to as the dimension-reduction coefficient. After passing through the third convolution layer 703, the number of output channels of the network is raised from c/n back to c, producing an output with the same number of channels as the original input; here n is referred to as the dimension-raising coefficient. The dimension-reduction and dimension-raising coefficient n is a positive integer greater than or equal to 1, and the larger n is, the larger the compression ratio. Preferably, n is a multiple of 2. The Bottleneck structure is named for its shape: the convolution layers at the two ends have many channels while the middle convolution layer has few.
By way of example: assume that in a Bottleneck structure the number of input channels of the input tensor is 256 and n is 4. The convolution kernel of the first convolution layer 701 can then be denoted 1 × 1 × 1 × 256 × 64, that of the second convolution layer 702 can be denoted 3 × 3 × 3 × 64 × 64, and that of the third convolution layer 703 can be denoted 1 × 1 × 1 × 64 × 256. The final number of output channels is 256, the same as the number of input channels.
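A minimal PyTorch sketch of this structure unit, with batch normalization and ReLU after each convolution as in Fig. 7; the padding is an assumption made to preserve the spatial size.

```python
import torch
import torch.nn as nn

class Bottleneck3d(nn.Module):
    """Bottleneck unit of Fig. 7: 1x1x1 dimension reduction, 3x3x3 spatial
    convolution, 1x1x1 dimension raising, each followed by BN and ReLU."""

    def __init__(self, channels: int, n: int = 4):
        super().__init__()
        mid = channels // n  # reduced channel count c/n
        self.block = nn.Sequential(
            nn.Conv3d(channels, mid, kernel_size=1),        # first layer: c -> c/n
            nn.BatchNorm3d(mid), nn.ReLU(inplace=True),
            nn.Conv3d(mid, mid, kernel_size=3, padding=1),  # spatial conv at c/n
            nn.BatchNorm3d(mid), nn.ReLU(inplace=True),
            nn.Conv3d(mid, channels, kernel_size=1),        # third layer: c/n -> c
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# The example above: 256 input channels, n = 4, giving kernels
# 1x1x1x256x64, 3x3x3x64x64, and 1x1x1x64x256.
unit = Bottleneck3d(256, n=4)
out = unit(torch.randn(1, 256, 8, 8, 8))  # output keeps 256 channels
```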
In other embodiments, the size of the convolution kernel applied in the neural network structure, the number of input channels, the number of output channels, and the like are not limited to the values adopted in the above example, and the three dimensions of the convolution kernel are not necessarily completely equal. For example, the size of a convolution kernel is x y z, where x, y, z may or may not be equal.
This embodiment is implemented based on the U-Net network structure; it can be understood that the Bottleneck neural network structure adopted in this embodiment is also applicable to other U-shaped or V-shaped neural network models besides U-Net.
Replacing the convolution-batch normalization-activation module in U-Net with the Bottleneck neural network structure of this embodiment greatly compresses the U-Net model size without reducing U-Net performance.
By way of example: without the Bottleneck structure, the parameter count of a convolution layer in the original neural network model is k × k × k × c × c = k³c², where k is the spatial size of the convolution kernel and c is the number of input channels and the number of output channels.
After using the Bottleneck structure, the parameter quantities of the neural network model become:
c × (c/n) + k³ × (c/n) × (c/n) + (c/n) × c = (2/n + k³/n²) × c²
From the above formula, adopting the Bottleneck structure reduces the parameter count of the original neural network model by a factor of k³c² / ((2/n + k³/n²)c²) = k³n² / (2n + k³), and this reduction factor is independent of the number of input and output channels.
In current neural network structures, k is generally 3. In that case, if n is 2, the model parameter count is reduced by a factor of 3.48; if n is 4, by a factor of 12.34; and if n is 8, by a factor of 40.19.
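A few lines of Python confirm these factors; the channel count c cancels out of the ratio.

```python
def reduction_factor(k: int, n: int) -> float:
    """Ratio of original to Bottleneck parameter counts:
    k^3 * c^2 / ((2/n + k^3/n^2) * c^2) = k^3 * n^2 / (2n + k^3)."""
    return k**3 * n**2 / (2 * n + k**3)

for n in (2, 4, 8):
    print(f"n={n}: {reduction_factor(3, n):.2f}x")  # 3.48x, 12.34x, 40.19x
```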
In the example above with n = 4, after applying the neural network model structure of this embodiment, the final effect can be to compress an original model file of 250 MB down to 8.8 MB, a reduction of more than 28-fold.
FIG. 8A is a diagram illustrating memory allocation according to an embodiment of the invention. At present, running U-Net in open-source software (such as PyTorch) consumes a large amount of memory, which cannot meet the demands of commercialization. The invention develops a neural network forward-propagation framework specifically for U-Net, which reuses video memory rationally and greatly reduces memory usage. In embodiments of the present invention, the memory space may be video memory space. The memory allocation method is as follows: determine the maximum memory required by the inputs and outputs of all convolution layers, and allocate a corresponding input memory and output memory of that size; during processing, the input memory and the output memory are swapped with each other. For example, if the maximum memory required by the inputs and outputs of all convolution layers is 800 MB, one 800 MB input memory and one 800 MB output memory are allocated and swapped during processing. Referring to Fig. 8A, for each forward pass, such as four consecutive convolutions, only two memory blocks are allocated, storing the input (Input) and the output (Output) respectively; during the intermediate computation the two are exchanged, the input memory serving as the output memory and the output memory as the input memory. This avoids allocating memory for every forward operation and reuses the currently available video memory to the greatest extent.
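A conceptual sketch of this ping-pong scheme follows. The layer interface that writes into a caller-supplied buffer is hypothetical; in a framework such as PyTorch, truly avoiding intermediate allocations would require custom kernels or out-parameters.

```python
import numpy as np

def forward_with_two_buffers(layers, x: np.ndarray, max_elems: int) -> np.ndarray:
    """Run a chain of layers using only two preallocated buffers that are
    swapped after every layer, as in Fig. 8A.

    Each `layer(src, n, dst)` is a hypothetical callable that reads `n`
    valid elements from `src`, writes its result into `dst`, and returns
    the number of valid output elements.
    """
    buf_in = np.empty(max_elems, dtype=np.float32)   # shared input buffer
    buf_out = np.empty(max_elems, dtype=np.float32)  # shared output buffer
    n = x.size
    buf_in[:n] = x.ravel()
    for layer in layers:
        n = layer(buf_in, n, buf_out)
        buf_in, buf_out = buf_out, buf_in  # swap roles for the next layer
    return buf_in[:n]
```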
FIG. 8B is a diagram illustrating memory allocation according to yet another embodiment of the present invention. In this alternative embodiment, as shown in Fig. 8B, additional memory may be required in some special cases to temporarily hold intermediate results. The additional memory space needed to store such inputs is called a workspace. When forward-propagation operations such as convolution are executed, a certain amount of algorithm memory is also needed, and it is likewise added to the workspace request. For a large network such as U-Net, the algorithm computes in advance the maximum memory required by the input tensors and output tensors over all convolution layers, and allocates two blocks of this maximum size as input and output. Since all tensors used in the computation of the other layers of the whole network are smaller than this maximum video memory, the two memory blocks can be reused. For U-shaped or V-shaped neural network models, such as U-Net and V-Net, the skip connections in the model are not allocated separate video memory; instead, they are embedded into the allocated maximum-size memory blocks.
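A small sketch of this advance computation, assuming the input/output shape of every layer is known ahead of time:

```python
import math

def max_buffer_elems(layer_shapes) -> int:
    """Given (input_shape, output_shape) pairs for every layer, return the
    largest element count; each of the two shared buffers is sized to it."""
    return max(
        math.prod(shape)
        for in_shape, out_shape in layer_shapes
        for shape in (in_shape, out_shape)
    )

# Two layers of a toy network: the 32-channel full-resolution tensor dominates.
shapes = [((1, 1, 96, 96, 96), (1, 32, 96, 96, 96)),
          ((1, 32, 96, 96, 96), (1, 64, 48, 48, 48))]
print(max_buffer_elems(shapes))  # 28311552
```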
FIG. 9 is a schematic diagram of an exemplary neural network model described in accordance with some embodiments of the present invention.
The CNN model may include an input layer 920, a plurality of hidden layers 940, and an output layer 960. The plurality of hidden layers 940 may include one or more convolutional layers, one or more rectified linear unit layers (ReLU layers), one or more pooling layers, one or more fully connected layers, or the like, or a combination thereof.
For illustrative purposes, a number of exemplary hidden layers 940 of the CNN model are shown, including convolutional layer 940-1, pooling layer 940-2, and fully-connected layer 940-N. As described in connection with the steps of fig. 4, the neural network training unit 340 may obtain image information as an input to the CNN model. The image information may be represented as a two-dimensional (2D) or three-dimensional (3D) matrix comprising a plurality of elements, e.g. pixels or voxels. Each of a plurality of elements in the matrix may have a value representing a characteristic of the element.
Convolutional layer 940-1 may include multiple kernels (e.g., A, B, C, and D). The plurality of kernels may be used to extract features from the image information. In some embodiments, each of the plurality of kernels may filter a portion (e.g., a region) of the image information to produce a specific feature corresponding to that portion. The features may include low-level features (e.g., edge features, texture features), high-level features, or complex features computed from the kernels.
Pooling layer 940-2 may take as input the output of convolutional layer 940-1. Pooling layer 940-2 may include a plurality of pooling nodes (e.g., E, F, G, and H). The output of the convolutional layer 940-1 may be sampled using the plurality of pooled nodes, and thus the computational burden of data processing of the computer 100 may be reduced and the data processing speed may be increased. In some embodiments, the neural network training unit 340 may reduce the size of the matrix corresponding to the image information in the pooling layer 940-2.
Fully connected layer 940-N may include a plurality of neurons (e.g., O, P, M, and N). The plurality of neurons may be connected to a plurality of nodes from a previous layer, such as a pooling layer. In the fully connected layer 940-N, the neural network training unit 340 may determine a plurality of vectors corresponding to the plurality of neurons based on the features of the image information and further weight the plurality of vectors with a plurality of weighting coefficients.
In the output layer 960, the neural network training unit 340 may determine an output, e.g., second image information, based on the plurality of vectors and weighting coefficients obtained by the fully connected layer 940-N.
In some embodiments, the neural network training unit 340 may access multiple processing units, such as GPUs, in the computer 100. Multiple processing units may perform parallel processing in certain layers of the CNN model. Parallel processing may be performed in such a way that computations of different nodes in a layer of the CNN model may be distributed to two or more processing units. For example, one GPU may run computations corresponding to kernels a and B, and the other GPU(s) may run computations corresponding to kernels C and D in convolutional layer 940-1. Similarly, computations corresponding to different nodes in other types of layers in the CNN model may be performed in parallel by multiple GPUs.
The invention also provides a medical image processing apparatus. The medical image processing apparatus includes a medical image acquisition module and a medical image processing module. The medical image acquisition module is used for acquiring a medical image to be processed. The medical image processing module is used for processing the medical image according to the trained neural network model. The neural network model comprises a plurality of neural network structure units. Each neural network structure unit comprises a first convolution layer, a second convolution layer, and a third convolution layer connected in sequence. The convolution kernels of the first and third convolution layers are unit convolution kernels. The number of output channels of the first convolution layer is reduced relative to the number of input channels of the first convolution layer. The second convolution layer performs spatial convolution on the feature image dimension-reduced by the first convolution layer. The number of output channels of the third convolution layer is raised back to the number of input channels of the first convolution layer. For the apparatus's processing of medical images, refer to the description above; it is not repeated here.
In some other embodiments of the present invention, the maximum memory required by the inputs and outputs of all convolution layers may be determined, a corresponding input memory and output memory allocated according to that maximum, and the input memory and output memory swapped with each other during processing.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing disclosure is only illustrative and not limiting of the invention. Various modifications, improvements and adaptations of the present invention may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed within the present invention and are intended to be within the spirit and scope of the exemplary embodiments of the present invention.
Also, the present invention has been described using specific terms to describe embodiments of the invention. Such as "one embodiment," "an embodiment," and/or "some embodiments" means a feature, structure, or characteristic described in connection with at least one embodiment of the invention. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some of the features, structures, or characteristics of one or more embodiments of the present invention may be combined as suitable.
Moreover, those skilled in the art will appreciate that aspects of the invention may be illustrated and described in any of a number of patentable forms, including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present invention may be implemented entirely in hardware, entirely in software (including firmware, resident software, microcode, etc.), or in a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present invention may take the form of a computer product, embodied in one or more computer-readable media, comprising computer-readable program code.
A computer readable signal medium may comprise a propagated data signal with computer program code embodied therein, for example, on a baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, and the like, or any suitable combination. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code on a computer readable signal medium may be propagated over any suitable medium, including radio, electrical cable, fiber optic cable, radio frequency signals, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of the present invention may be written in any one or more programming languages, including object oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, and VB.NET; conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic programming languages such as Python, Ruby, and Groovy; or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, such as a local area network (LAN) or a wide area network (WAN); alternatively, the connection may be made to an external computer (for example, through the Internet), in a cloud computing environment, or as a service such as software as a service (SaaS).
Additionally, unless otherwise indicated by the claims, the order in which elements and sequences of the process are described, and the use of letters or other designations herein, are not intended to limit the order of the processes and methods of the invention. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it should be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments of the invention. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that, in the preceding description of embodiments of the invention, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not to be interpreted as suggesting that the claimed subject matter requires more features than are expressly recited in the claims. Indeed, claimed embodiments may have fewer than all of the features of a single embodiment disclosed above.
Numerals describing quantities of components, attributes, and the like are used in some embodiments. It should be understood that such numerals used in the description of the embodiments are modified in some instances by the terms "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the stated number allows a variation of ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, numerical parameters should be construed in light of the number of reported significant digits and by applying ordinary techniques for retaining significant digits. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as practicable.
Although the present invention has been described with reference to specific embodiments, those skilled in the art will appreciate that the above embodiments are merely illustrative of the present invention and that various equivalent changes and substitutions may be made without departing from the spirit of the invention. Accordingly, all changes and modifications to the above embodiments that fall within the true spirit of the invention are intended to fall within the scope of the claims of the present invention.

Claims (7)

1. A medical image processing method, characterized in that the processing method comprises:
acquiring a medical image to be processed, wherein the medical image is a three-dimensional image;
processing the medical image according to the trained neural network model;
determining maximum memories required for input and for output among all the convolution layers, allocating corresponding input memories and output memories according to the maximum memories, and exchanging the input memories and the output memories during the processing, wherein, if the neural network model comprises convolution layers connected in a cross-layer manner, inputs and outputs of the cross-layer-connected convolution layers are embedded into the input memories and the output memories;
the neural network model comprises a plurality of neural network structure units, wherein each neural network structure unit comprises a first convolution layer, a second convolution layer, and a third convolution layer connected in sequence, and convolution kernels of the first convolution layer and the third convolution layer are unit convolution kernels; the number of output channels of the first convolution layer is reduced relative to the number of input channels of the first convolution layer, the second convolution layer performs spatial convolution on the feature image dimension-reduced by the first convolution layer, and the number of output channels of the third convolution layer is increased to the number of input channels of the first convolution layer.
2. The medical image processing method according to claim 1, wherein the neural network structure unit further includes a batch normalization layer and an activation layer.
3. The medical image processing method according to claim 1, wherein the neural network model is a convolutional neural network model.
4. A medical image processing method according to claim 3, wherein the convolutional neural network model is a U-shaped or V-shaped neural network model.
5. The medical image processing method according to claim 1, wherein the convolution kernel size of the second convolution layer is 3 to 5.
6. A medical image processing apparatus, characterized in that the processing apparatus comprises:
the medical image acquisition module is used for acquiring a medical image to be processed, wherein the medical image is a three-dimensional image;
the medical image processing module is used for processing the medical image according to the trained neural network model, determining maximum memories required for input and for output among all the convolution layers, and allocating corresponding input memories and output memories according to the maximum memories;
the neural network model comprises a plurality of neural network structure units, wherein each neural network structure unit comprises a first convolution layer, a second convolution layer, and a third convolution layer connected in sequence, and convolution kernels of the first convolution layer and the third convolution layer are unit convolution kernels; the number of output channels of the first convolution layer is reduced relative to the number of input channels of the first convolution layer, the second convolution layer performs spatial convolution on the feature image dimension-reduced by the first convolution layer, and the number of output channels of the third convolution layer is increased to the number of input channels of the first convolution layer.
7. A computer readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, perform the method of any of claims 1-5.
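Claim 1 additionally recites that, when the model contains convolution layers connected in a cross-layer manner, their inputs and outputs are embedded into the same input and output memories. Purely as an assumption about how such an embedding might look, the sketch below extends the earlier double-buffer example by reserving a tail region of each memory for one skip tensor; the residual-style addition at the destination layer and the indices skip_src and skip_dst are hypothetical and are not specified by this disclosure.

import numpy as np

def run_with_skip(layers, image: np.ndarray, skip_src: int, skip_dst: int) -> np.ndarray:
    # layers: objects with in_elems, out_elems, and run(src, dst), as in the
    # earlier sketch (a hypothetical runtime API, not defined by the patent).
    work = image.size
    for layer in layers:
        work = max(work, layer.in_elems, layer.out_elems)
    skip = layers[skip_src].out_elems
    # Each memory holds a working region [0:work] plus a reserved region for the
    # skip tensor, so the cross-layer data lives inside the same two memories
    # that are being exchanged, with no third allocation.
    buf_in = np.empty(work + skip, dtype=np.float32)
    buf_out = np.empty(work + skip, dtype=np.float32)
    buf_in[:image.size] = image.ravel()
    for i, layer in enumerate(layers):
        layer.run(buf_in[:work], buf_out[:work])
        if i == skip_src:
            # Embed the skip tensor in the reserved region of both memories
            # so it survives every subsequent exchange.
            buf_out[work:] = buf_out[:skip]
            buf_in[work:] = buf_out[work:]
        if i == skip_dst:
            # Consume the embedded skip tensor, here as a residual addition;
            # this assumes the destination output and the skip tensor match in size.
            buf_out[:skip] += buf_out[work:]
        buf_in, buf_out = buf_out, buf_in
    return buf_in[:layers[-1].out_elems]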
CN201811544139.7A 2018-12-17 2018-12-17 Medical image processing device and method Active CN109583576B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201811544139.7A CN109583576B (en) 2018-12-17 2018-12-17 Medical image processing device and method
PCT/CN2019/128679 WO2020125806A1 (en) 2018-12-17 2019-12-26 Systems and methods for image segmentation
US16/870,905 US11341734B2 (en) 2018-12-17 2020-05-09 Systems and methods for image segmentation
US17/664,422 US11836925B2 (en) 2018-12-17 2022-05-22 Systems and methods for image segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811544139.7A CN109583576B (en) 2018-12-17 2018-12-17 Medical image processing device and method

Publications (2)

Publication Number Publication Date
CN109583576A CN109583576A (en) 2019-04-05
CN109583576B true CN109583576B (en) 2020-11-06

Family

ID=65929812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811544139.7A Active CN109583576B (en) 2018-12-17 2018-12-17 Medical image processing device and method

Country Status (1)

Country Link
CN (1) CN109583576B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020125806A1 (en) 2018-12-17 2020-06-25 Shanghai United Imaging Intelligence Co., Ltd. Systems and methods for image segmentation
CN110211130A (en) * 2019-05-20 2019-09-06 上海联影智能医疗科技有限公司 Image partition method, computer equipment and storage medium
CN110246137B (en) * 2019-06-19 2021-12-03 东软医疗系统股份有限公司 Imaging method, imaging device and storage medium
CN110400317A (en) * 2019-07-03 2019-11-01 上海联影智能医疗科技有限公司 More structural images dividing methods, computer equipment and storage medium
CN110634129B (en) * 2019-08-23 2022-08-23 首都医科大学宣武医院 Positioning method and system based on DSA image
CN112561778A (en) * 2019-09-26 2021-03-26 北京字节跳动网络技术有限公司 Image stylization processing method, device, equipment and storage medium
CN110704197B (en) 2019-10-17 2022-12-09 北京小米移动软件有限公司 Method, apparatus and medium for processing memory access overhead
CN111008986B (en) * 2019-11-20 2023-09-05 天津大学 Remote sensing image segmentation method based on multitasking semi-convolution
CN112508969B (en) * 2020-02-18 2021-12-07 广州柏视医疗科技有限公司 Tubular structure segmentation graph fracture repair system of three-dimensional image based on deep learning network
CN111368941B (en) * 2020-04-10 2023-09-01 浙江大华技术股份有限公司 Image processing method, device and computer storage medium
CN111524141B (en) * 2020-04-30 2024-05-24 上海东软医疗科技有限公司 Image segmentation method, device, CT equipment and CT system
CN111821021B (en) * 2020-06-19 2021-10-26 湖州市中心医院 Enteroscope optimal path calculation method and system based on artificial intelligence
CN112116989A (en) * 2020-09-11 2020-12-22 海创时代(深圳)医疗科技有限公司 Multi-organ sketching method and device
CN115273060A (en) * 2022-08-18 2022-11-01 杭州朗阳科技有限公司 Neural network model suitable for edge equipment, image recognition method and device
CN116757284A (en) * 2022-09-26 2023-09-15 荣耀终端有限公司 Model reasoning method, device, storage medium and program product
CN116919593B (en) * 2023-08-04 2024-02-06 溧阳市中医医院 Gallbladder extractor for cholecystectomy

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102642853B1 (en) * 2017-01-05 2024-03-05 한국전자통신연구원 Convolution circuit, application processor having the same, and operating methoe thereof
KR102415508B1 (en) * 2017-03-28 2022-07-01 삼성전자주식회사 Convolutional neural network processing method and apparatus
CN107766894B (en) * 2017-11-03 2021-01-22 吉林大学 Remote sensing image natural language generation method based on attention mechanism and deep learning
CN108830288A (en) * 2018-04-25 2018-11-16 北京市商汤科技开发有限公司 Image processing method, the training method of neural network, device, equipment and medium
CN108764471B (en) * 2018-05-17 2020-04-14 西安电子科技大学 Neural network cross-layer pruning method based on feature redundancy analysis

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709441A (en) * 2016-12-16 2017-05-24 北京工业大学 Convolution theorem based face verification accelerating method
CN107155110A (en) * 2017-06-14 2017-09-12 福建帝视信息科技有限公司 A kind of picture compression method based on super-resolution technique
CN107578054A (en) * 2017-09-27 2018-01-12 北京小米移动软件有限公司 Image processing method and device
CN108564116A (en) * 2018-04-02 2018-09-21 深圳市安软慧视科技有限公司 A kind of ingredient intelligent analysis method of camera scene image
CN108830211A (en) * 2018-06-11 2018-11-16 厦门中控智慧信息技术有限公司 Face identification method and Related product based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Application of Image Classification Methods Based on Deep Convolutional Neural Networks; Gao Zhenyu; China Doctoral Dissertations Full-text Database, Information Science and Technology Series; 2018-06-15 (No. 06); Section 3.2.2, pp. 30-31; Section 4.4.2, pp. 55-56 *

Also Published As

Publication number Publication date
CN109583576A (en) 2019-04-05

Similar Documents

Publication Publication Date Title
CN109583576B (en) Medical image processing device and method
CN109325985B (en) Magnetic resonance image reconstruction method, apparatus and computer readable storage medium
CN109754394B (en) Three-dimensional medical image processing device and method
CN108537794B (en) Medical image data processing method, apparatus and computer readable storage medium
US11158069B2 (en) Unsupervised deformable registration for multi-modal images
CN112424835B (en) System and method for image reconstruction
CN109978037B (en) Image processing method, model training method, device and storage medium
US9984498B2 (en) Sparse GPU voxelization for 3D surface reconstruction
CN112368738B (en) System and method for image optimization
CN114581662B (en) Brain tumor image segmentation method, system, device and storage medium
CN111311704A (en) Image reconstruction method and device, computer equipment and storage medium
CN112183541B (en) Contour extraction method and device, electronic equipment and storage medium
CN110570394B (en) Medical image segmentation method, device, equipment and storage medium
US20240185484A1 (en) System and method for image reconstruction
CN112634149A (en) Point cloud denoising method based on graph convolution network
Valero-Lara Multi-GPU acceleration of DARTEL (early detection of Alzheimer)
CN112070752A (en) Method, device and storage medium for segmenting auricle of medical image
US11455755B2 (en) Methods and apparatus for neural network based image reconstruction
Zhao et al. Real-time edge-aware weighted median filtering on the GPU
CN111209946B (en) Three-dimensional image processing method, image processing model training method and medium
US20230169732A1 (en) A Method and System for Enforcing Smoothness Constraints on Surface Meshes from a Graph Convolutional Neural Network
WO2023284275A1 (en) Real-time volumetric rendering
WO2022163402A1 (en) Learned model generation method, machine learning system, program, and medical image processing device
JP2023502813A (en) Model training method, image processing and registration method, apparatus, device, medium
Teßmann et al. GPU accelerated normalized mutual information and B-Spline transformation.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant