WO2021017006A1 - Image processing method and apparatus, neural network and training method, and storage medium - Google Patents

Image processing method and apparatus, neural network and training method, and storage medium

Info

Publication number
WO2021017006A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
sub
decoding
encoding
output
Prior art date
Application number
PCT/CN2019/098928
Other languages
English (en)
French (fr)
Inventor
胡馨月 (Hu Xinyue)
Original Assignee
BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co., Ltd.
Priority to US 16/970,131 (published as US11816870B2)
Priority to CN 201980001232.XA (published as CN112602114A)
Priority to PCT/CN2019/098928 (published as WO2021017006A1)
Publication of WO2021017006A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Definitions

  • the embodiments of the present disclosure relate to an image processing method, an image processing device, a neural network, a neural network training method, and a storage medium.
  • CNN (Convolutional Neural Network)
  • At least one embodiment of the present disclosure provides an image processing method, including: acquiring an input image; and processing the input image using a neural network to obtain a first segmented image and a second segmented image; where the neural network includes two encoding and decoding networks, the two encoding and decoding networks include a first encoding and decoding network and a second encoding and decoding network, and the input of the first encoding and decoding network includes the input image; using the neural network to process the input image to obtain the first segmented image and the second segmented image includes: performing segmentation processing on the input image using the first encoding and decoding network to obtain a first output feature map and the first segmented image; combining the first output feature map with at least one of the input image and the first segmented image to obtain the input of the second encoding and decoding network; and using the second encoding and decoding network to perform segmentation processing on the input of the second encoding and decoding network to obtain the second segmented image.
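For orientation, the two-stage cascade described in the embodiment above can be sketched as follows. This is a minimal illustrative PyTorch sketch, not the disclosure's reference implementation; the callables `un1` and `un2`, and the choice to concatenate all three tensors, are assumptions made for the example.

```python
import torch

def cascade_forward(un1, un2, input_image):
    # First encoding and decoding network: segment the input image.
    f01, seg1 = un1(input_image)   # first output feature map, first segmented image
    # Joint layer: combine the first output feature map with the input image
    # and/or the first segmented image along the channel axis (all same H x W).
    x2 = torch.cat([f01, input_image, seg1], dim=1)
    # Second encoding and decoding network: segment the combined input.
    seg2 = un2(x2)                 # second segmented image
    return seg1, seg2
```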
  • each of the two encoding and decoding networks includes: an encoding element network and a decoding element network;
  • the segmentation processing of the first encoding and decoding network includes: using the encoding element network of the first encoding and decoding network to encode the input image to obtain a first encoding feature map; and using the decoding element network of the first encoding and decoding network to decode the first encoding feature map to obtain the output of the first encoding and decoding network, where the output of the first encoding and decoding network includes the first segmented image;
  • the segmentation processing of the second encoding and decoding network includes: using the encoding element network of the second encoding and decoding network to encode the input of the second encoding and decoding network to obtain a second encoding feature map; and using the decoding element network of the second encoding and decoding network to decode the second encoding feature map to obtain the output of the second encoding and decoding network, where the output of the second encoding and decoding network includes the second segmented image.
  • the encoding element network includes N encoding sub-networks and N-1 down-sampling layers; the N encoding sub-networks are connected in sequence, and each down-sampling layer is used to connect two adjacent encoding sub-networks, where N is an integer and N ≥ 2;
  • the encoding processing of the encoding element network includes: using the i-th encoding sub-network among the N encoding sub-networks to process the input of the i-th encoding sub-network to obtain the output of the i-th encoding sub-network; using the down-sampling layer connecting the i-th encoding sub-network and the (i+1)-th encoding sub-network to down-sample the output of the i-th encoding sub-network to obtain the down-sampled output of the i-th encoding sub-network; and using the (i+1)-th encoding sub-network to process the down-sampled output of the i-th encoding sub-network to obtain the output of the (i+1)-th encoding sub-network, where i is an integer and 1 ≤ i ≤ N-1;
  • when N > 2, the decoding element network includes N-1 decoding sub-networks and N-1 up-sampling layers; the N-1 decoding sub-networks are connected in sequence; the N-1 up-sampling layers include a first up-sampling layer and N-2 second up-sampling layers, the first up-sampling layer is used to connect the first decoding sub-network among the N-1 decoding sub-networks and the N-th encoding sub-network among the N encoding sub-networks, and each second up-sampling layer is used to connect two adjacent decoding sub-networks;
  • the decoding processing of the decoding element network includes: obtaining the input of the j-th decoding sub-network among the N-1 decoding sub-networks; and using the j-th decoding sub-network to process the input of the j-th decoding sub-network to obtain the output of the j-th decoding sub-network, where j is an integer and 1 ≤ j ≤ N-1;
  • the size of the up-sampling input of the j-th decoding sub-network is the same as the size of the output of the (N-j)-th encoding sub-network, where 1 ≤ j ≤ N-1.
  • when N = 2, the encoding element network includes a first encoding sub-network and further includes a second encoding sub-network, and the decoding element network includes the first decoding sub-network;
  • the decoding processing of the decoding element network includes: using the first up-sampling layer connecting the first decoding sub-network and the second encoding sub-network to up-sample the output of the second encoding sub-network to obtain the up-sampling input of the first decoding sub-network; and combining the up-sampling input of the first decoding sub-network with the output of the first encoding sub-network as the input of the first decoding sub-network, where the size of the up-sampling input of the first decoding sub-network is the same as the size of the output of the first encoding sub-network.
  • each of the N encoding sub-networks and the N-1 decoding sub-networks includes: a first convolution module and a residual module; the processing of each sub-network includes: using the first convolution module to process the input of the sub-network corresponding to the first convolution module to obtain a first intermediate output; and using the residual module to perform residual processing on the first intermediate output to obtain the output of the sub-network.
  • the residual module includes a plurality of second convolution modules; using the residual module to perform residual processing on the first intermediate output to obtain the output of the sub-network includes: processing the first intermediate output using the plurality of second convolution modules to obtain a second intermediate output; and performing residual connection and addition processing on the first intermediate output and the second intermediate output to obtain the output of the sub-network.
  • the processing of each of the first convolution module and the plurality of second convolution modules includes: convolution processing, activation processing, and batch normalization processing.
  • the input and output sizes of each decoding sub-network in the decoding element network are the same, and the input and output sizes of each encoding sub-network in the encoding element network are the same.
  • each encoding and decoding network further includes a fusion module; the fusion module in the first encoding and decoding network is used to process the first output feature map to obtain the first segmented image; using the second encoding and decoding network to perform segmentation processing on the input of the second encoding and decoding network to obtain the second segmented image includes: using the second encoding and decoding network to perform segmentation processing on the input of the second encoding and decoding network to obtain a second output feature map; and using the fusion module in the second encoding and decoding network to process the second output feature map to obtain the second segmented image.
  • the first segmented image corresponds to a first area of the input image, the second segmented image corresponds to a second area of the input image, and the first area of the input image surrounds the second area of the input image.
  • At least one embodiment of the present disclosure further provides a neural network training method, including: obtaining a training input image; and using the training input image to train a neural network to be trained, so as to obtain the neural network in the image processing method provided by any embodiment of the present disclosure.
  • using the training input image to train the neural network to be trained includes: using the neural network to be trained to process the training input image to obtain a first training segmented image and a second training segmented image; calculating the system loss value of the neural network to be trained through a system loss function, based on a first reference segmented image and a second reference segmented image of the training input image as well as the first training segmented image and the second training segmented image; and correcting the parameters of the neural network to be trained based on the system loss value; where the first training segmented image corresponds to the first reference segmented image, and the second training segmented image corresponds to the second reference segmented image.
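Concretely, one training iteration as described above might look like the following hedged PyTorch sketch; the optimizer choice is an illustrative assumption, and `system_loss` stands for the system loss function defined in the bullets below.

```python
def train_step(model, optimizer, train_image, ref1, ref2):
    # Forward pass: obtain the first and second training segmented images.
    seg1, seg2 = model(train_image)
    # System loss value from the reference segmented images (defined below).
    loss = system_loss(seg1, ref1, seg2, ref2)
    # Correct the parameters of the neural network based on the system loss value.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```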
  • the system loss function includes a first segmentation loss function and a second segmentation loss function; each of the first segmentation loss function and the second segmentation loss function includes: a cross loss function and a similarity loss function.
  • the first segmentation loss function is expressed as:

    L01 = λ11 · L11 + λ12 · L21

    where L01 represents the first segmentation loss function, L11 represents the cross loss function in the first segmentation loss function, λ11 represents the weight of the cross loss function in the first segmentation loss function, L21 represents the similarity loss function in the first segmentation loss function, and λ12 represents the weight of the similarity loss function in the first segmentation loss function.
  • the cross loss function L11 in the first segmentation loss function may, for example, take the standard per-pixel cross-entropy form:

    L11 = −Σ_{m1,n1} [ y_{m1n1} · log(x_{m1n1}) + (1 − y_{m1n1}) · log(1 − x_{m1n1}) ]

  • the similarity loss function L21 in the first segmentation loss function may, for example, take a Dice-coefficient-based form:

    L21 = 1 − ( 2 · Σ_{m1,n1} x_{m1n1} · y_{m1n1} ) / ( Σ_{m1,n1} x_{m1n1} + Σ_{m1,n1} y_{m1n1} )

    where x_{m1n1} represents the value of the pixel located in row m1 and column n1 of the first training segmented image, and y_{m1n1} represents the value of the pixel located in row m1 and column n1 of the first reference segmented image.
  • the second segmentation loss function is expressed as:

    L02 = λ21 · L12 + λ22 · L22

    where L02 represents the second segmentation loss function, L12 represents the cross loss function in the second segmentation loss function, λ21 represents the weight of the cross loss function in the second segmentation loss function, L22 represents the similarity loss function in the second segmentation loss function, and λ22 represents the weight of the similarity loss function in the second segmentation loss function.
  • the cross loss function L12 in the second segmentation loss function may, for example, take the standard per-pixel cross-entropy form:

    L12 = −Σ_{m2,n2} [ y_{m2n2} · log(x_{m2n2}) + (1 − y_{m2n2}) · log(1 − x_{m2n2}) ]

  • the similarity loss function L22 in the second segmentation loss function may, for example, take a Dice-coefficient-based form:

    L22 = 1 − ( 2 · Σ_{m2,n2} x_{m2n2} · y_{m2n2} ) / ( Σ_{m2,n2} x_{m2n2} + Σ_{m2,n2} y_{m2n2} )

    where x_{m2n2} represents the value of the pixel located in row m2 and column n2 of the second training segmented image, and y_{m2n2} represents the value of the pixel located in row m2 and column n2 of the second reference segmented image.
  • the system loss function is expressed as:

    L = λ01 · L01 + λ02 · L02

    where L01 and L02 represent the first segmentation loss function and the second segmentation loss function, respectively, and λ01 and λ02 represent the weights of the first segmentation loss function and the second segmentation loss function, respectively, in the system loss function.
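Written out as code, the weighted loss above can be sketched as follows, assuming the cross loss is per-pixel binary cross-entropy and the similarity loss is Dice-based (both assumptions, as hedged above); all lambda weights are hyperparameters.

```python
import torch

def segmentation_loss(x, y, lam_ce=1.0, lam_sim=1.0, eps=1e-7):
    """One segmentation loss: weighted sum of a cross loss and a similarity loss.
    x: training segmented image with values in (0, 1); y: reference segmented image.
    """
    x = x.clamp(eps, 1 - eps)
    cross = -(y * torch.log(x) + (1 - y) * torch.log(1 - x)).sum()   # cross loss
    dice = 1 - 2 * (x * y).sum() / (x.sum() + y.sum() + eps)         # similarity loss
    return lam_ce * cross + lam_sim * dice

def system_loss(seg1, ref1, seg2, ref2, lam01=1.0, lam02=1.0):
    # L = lam01 * L01 + lam02 * L02
    return lam01 * segmentation_loss(seg1, ref1) + lam02 * segmentation_loss(seg2, ref2)
```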
  • obtaining the training input image includes: obtaining an original training input image; and performing preprocessing and data enhancement processing on the original training input image to obtain the training input image.
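By way of illustration, a preprocessing and data-enhancement pipeline could be assembled as below; the specific transforms (resizing, flips, small rotations) are assumptions, since the disclosure does not fix a particular set, and for segmentation the same geometric transforms would have to be applied to the reference segmented images as well.

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.Resize((256, 256)),          # preprocessing: unify the input size
    transforms.RandomHorizontalFlip(),      # data enhancement
    transforms.RandomRotation(degrees=10),  # data enhancement
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),
])
```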
  • At least one embodiment of the present disclosure further provides an image processing device, including: a memory for storing non-transitory computer-readable instructions; and a processor for running the computer-readable instructions, where the computer-readable instructions, when run by the processor, execute the image processing method provided by any embodiment of the present disclosure or the training method provided by any embodiment of the present disclosure.
  • At least one embodiment of the present disclosure further provides a storage medium for non-transitory storage of computer-readable instructions; when executed by a computer, the stored instructions can execute the image processing method provided by any embodiment of the present disclosure or the training method provided by any embodiment of the present disclosure.
  • At least one embodiment of the present disclosure also provides a neural network, including: two encoding and decoding networks and a joint layer, the two encoding and decoding networks including a first encoding and decoding network and a second encoding and decoding network; where the first encoding and decoding network is configured to perform segmentation processing on an input image to obtain a first output feature map and a first segmented image; the joint layer is configured to combine the first output feature map with at least one of the input image and the first segmented image to obtain the input of the second encoding and decoding network; and the second encoding and decoding network is configured to perform segmentation processing on the input of the second encoding and decoding network to obtain a second segmented image.
  • each of the two encoding and decoding networks includes an encoding element network and a decoding element network; the encoding element network of the first encoding and decoding network is configured to encode the input image to obtain a first encoding feature map; the decoding element network of the first encoding and decoding network is configured to decode the first encoding feature map to obtain the output of the first encoding and decoding network, where the output of the first encoding and decoding network includes the first segmented image; the encoding element network of the second encoding and decoding network is configured to encode the input of the second encoding and decoding network to obtain a second encoding feature map; and the decoding element network of the second encoding and decoding network is configured to decode the second encoding feature map to obtain the output of the second encoding and decoding network, where the output of the second encoding and decoding network includes the second segmented image.
  • the encoding element network includes N encoding sub-networks and N-1 down-sampling layers; the N encoding sub-networks are connected in sequence, each down-sampling layer is used to connect two adjacent encoding sub-networks, and N is an integer and N ≥ 2; the i-th encoding sub-network among the N encoding sub-networks is configured to process the input of the i-th encoding sub-network to obtain the output of the i-th encoding sub-network; the down-sampling layer connecting the i-th encoding sub-network and the (i+1)-th encoding sub-network is configured to down-sample the output of the i-th encoding sub-network to obtain the down-sampled output of the i-th encoding sub-network; and the (i+1)-th encoding sub-network is configured to process the down-sampled output of the i-th encoding sub-network to obtain the output of the (i+1)-th encoding sub-network; where i is an integer and 1 ≤ i ≤ N-1, the input of the first encoding sub-network among the N encoding sub-networks includes the input of the first encoding and decoding network or the second encoding and decoding network, the input of the (i+1)-th encoding sub-network includes the down-sampled output of the i-th encoding sub-network, and the first encoding feature map or the second encoding feature map includes the outputs of the N encoding sub-networks.
  • when N > 2, the decoding element network includes N-1 decoding sub-networks and N-1 up-sampling layers, and the N-1 decoding sub-networks are connected in sequence; the N-1 up-sampling layers include a first up-sampling layer and N-2 second up-sampling layers; the first up-sampling layer is used to connect the first decoding sub-network among the N-1 decoding sub-networks and the N-th encoding sub-network among the N encoding sub-networks, and each second up-sampling layer is used to connect two adjacent decoding sub-networks; each encoding and decoding network also includes N-1 sub-joint layers corresponding to the N-1 decoding sub-networks of the decoding element network; the j-th decoding sub-network among the N-1 decoding sub-networks is configured to process the input of the j-th decoding sub-network to obtain the output of the j-th decoding sub-network; and the size of the up-sampling input of the j-th decoding sub-network is the same as the size of the output of the (N-j)-th encoding sub-network, where 1 ≤ j ≤ N-1.
  • when N = 2, the encoding element network further includes a second encoding sub-network, and the decoding element network includes a first decoding sub-network and a first up-sampling layer connecting the first decoding sub-network and the second encoding sub-network; each encoding and decoding network further includes a first sub-joint layer corresponding to the first decoding sub-network of the decoding element network; the first up-sampling layer connecting the first decoding sub-network and the second encoding sub-network is configured to up-sample the output of the second encoding sub-network to obtain the up-sampling input of the first decoding sub-network; and the first sub-joint layer is configured to combine the up-sampling input of the first decoding sub-network with the output of the first encoding sub-network as the input of the first decoding sub-network, where the size of the up-sampling input of the first decoding sub-network is the same as the size of the output of the first encoding sub-network.
  • each of the N coding sub-networks and the N-1 decoding sub-networks includes: a first convolution module and a residual module;
  • the first convolution module is configured to process the input of the sub-network corresponding to the first convolution module to obtain a first intermediate output;
  • the residual module is configured to perform residual processing on the first intermediate output to obtain the output of the sub-network.
  • the residual module includes a plurality of second convolution modules and a residual addition layer; the plurality of second convolution modules are configured to process the first intermediate output to obtain a second intermediate output; and the residual addition layer is configured to perform residual connection and addition processing on the first intermediate output and the second intermediate output to obtain the output of the sub-network.
  • each of the first convolution module and the plurality of second convolution modules includes: a convolution layer, an activation layer, and a batch normalization layer;
  • the convolution layer is configured to perform convolution processing
  • the activation layer is configured to perform activation processing
  • the batch normalization layer is configured to perform batch normalization processing.
  • the sizes of the input and output of each decoding sub-network in the decoding element network are the same, and the sizes of the input and output of each encoding sub-network in the encoding element network are the same.
  • each encoding and decoding network further includes a fusion module; the fusion module in the first encoding and decoding network is configured to process the first output feature map to obtain the first segmented image; the second encoding and decoding network being configured to perform segmentation processing on the input of the second encoding and decoding network to obtain the second segmented image includes: the second encoding and decoding network is configured to perform segmentation processing on the input of the second encoding and decoding network to obtain a second output feature map; and the fusion module in the second encoding and decoding network is configured to process the second output feature map to obtain the second segmented image.
  • FIG. 1 is a flowchart of an image processing method provided by some embodiments of the present disclosure
  • FIG. 2 is a schematic structural block diagram of a neural network corresponding to the image processing method shown in FIG. 1 provided by some embodiments of the present disclosure
  • FIG. 3 is a schematic structural block diagram of another neural network corresponding to the image processing method shown in FIG. 1 provided by some embodiments of the present disclosure
  • FIG. 4 is an exemplary flowchart corresponding to step S200 in the image processing method shown in FIG. 1 according to some embodiments of the present disclosure
  • FIG. 5 is a schematic diagram of a first area and a second area in an input image provided by some embodiments of the present disclosure
  • Fig. 6 is a flowchart of a neural network training method provided by some embodiments of the present disclosure.
  • FIG. 7 is an exemplary flowchart corresponding to step S400 in the training method shown in FIG. 6 according to some embodiments of the present disclosure
  • FIG. 8 is a schematic block diagram of an image processing apparatus provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a storage medium provided by an embodiment of the disclosure.
  • Image segmentation is a research hotspot in the field of image processing.
  • Image segmentation is a technique that divides an image into several specific regions with unique properties and extracts objects of interest.
  • Medical image segmentation is an important application field of image segmentation. Medical image segmentation refers to extracting the region or boundary of the tissue of interest from the medical image, so that the extracted tissue can be clearly distinguished from other tissues. Medical image segmentation is of great significance to the quantitative analysis of tissues, the formulation of surgical plans and computer-aided diagnosis.
  • deep learning neural networks can be used for medical image segmentation, which can improve the accuracy of image segmentation, reduce the time to extract features, and improve computational efficiency. Medical image segmentation can be used to extract regions of interest to facilitate the analysis and recognition of medical images.
  • in the following description, the convolutional layer, down-sampling layer, and up-sampling layer each also refer to the corresponding processing operation, that is, convolution processing, down-sampling processing, up-sampling processing, etc.;
  • likewise, the modules, sub-networks, etc. also refer to the corresponding processing operations, and the description will not be repeated below.
  • At least one embodiment of the present disclosure provides an image processing method.
  • the image processing method includes: acquiring an input image; and using a neural network to process the input image to obtain a first segmented image and a second segmented image.
  • the neural network includes two encoding and decoding networks.
  • the two encoding and decoding networks include a first encoding and decoding network and a second encoding and decoding network.
  • the input of the first encoding and decoding network includes an input image.
  • Using a neural network to process the input image to obtain the first segmented image and the second segmented image includes: using the first codec network to segment the input image to obtain the first output feature map and the first segmented image; The first output feature map is combined with at least one of the input image and the first segmented image to obtain the input of the second encoding and decoding network; the second encoding and decoding network is used to segment the input of the second encoding and decoding network to obtain the first Two split image.
  • Some embodiments of the present disclosure also provide image processing devices, neural networks, neural network training methods, and storage media corresponding to the above-mentioned image processing methods.
  • the image processing method provided by the embodiments of the present disclosure obtains the first segmented image first and then obtains the second segmented image based on the first segmented image, which can improve robustness and achieve higher generalization and accuracy, producing more stable segmentation results for images acquired under different lighting environments and by different imaging devices; at the same time, the end-to-end convolutional neural network model can reduce manual operations.
  • Fig. 1 is a flowchart of an image processing method provided by some embodiments of the present disclosure.
  • the image processing method includes step S100 and step S200.
  • Step S100 Obtain an input image
  • Step S200 Use a neural network to process the input image to obtain a first segmented image and a second segmented image.
  • the input image may be various types of images, for example, including but not limited to medical images.
  • medical images may include ultrasound images, X-ray computed tomography (CT) images, magnetic resonance imaging (MRI) images, digital subtraction angiography (DSA) images, positron emission tomography (PET) images, etc.
  • medical images may include brain tissue MRI images, spinal cord MRI images, fundus images, blood vessel images, pancreas CT images, lung CT images, and so on.
  • the input image can be acquired by an image acquisition device.
  • the image acquisition device may include, for example, ultrasound equipment, X-ray equipment, nuclear magnetic resonance equipment, nuclear medicine equipment, medical optical equipment, and thermal imaging equipment, which are not limited in the embodiments of the present disclosure.
  • the input image can also be a person image, an image of animals and plants or a landscape image, etc.
  • the input image can also be acquired through image acquisition devices such as a smartphone camera, a tablet computer camera, a personal computer camera, a digital camera lens, a surveillance camera, or a webcam.
  • the input image can be a grayscale image or a color image.
  • the size of the input image can be set according to implementation needs, which is not limited in the embodiment of the present disclosure.
  • the input image may be an original image directly collected by an image collecting device, or an image obtained after preprocessing the original image.
  • the image processing method provided by the embodiment of the present disclosure may further include an operation of preprocessing the input image. Preprocessing can eliminate irrelevant information or noise information in the input image, so as to better segment the input image.
  • a neural network is used to segment the input image, that is, the shape of an object (for example, an organ or tissue) is segmented from the input image to obtain a corresponding segmented image.
  • the first segmented image may correspond to a first region of the input image, for example, to an organ or tissue in a medical image (for example, the optic disc in a fundus image, or the lung in a lung CT image); the second segmented image may correspond to a second region of the input image, where the first region of the input image surrounds the second region, for example, the second segmented image corresponds to a structure or lesion within the aforementioned organ or tissue (for example, the optic cup in a fundus image, or a lung nodule in a lung CT image).
  • the first segmented image and the second segmented image can be used for medical diagnosis, for example, for the diagnosis of glaucoma (based on segmentation of the optic disc and optic cup), early lung cancer (based on segmentation of the lung and lung nodules), etc.
  • FIG. 2 is a schematic structural block diagram of a neural network corresponding to the image processing method shown in FIG. 1 provided by some embodiments of the present disclosure
  • FIG. 3 is a schematic architecture block diagram of another neural network corresponding to the image processing method shown in FIG. 1 provided by some embodiments of the present disclosure; FIG. 4 is an exemplary flowchart corresponding to step S200 in the image processing method shown in FIG. 1 provided by some embodiments of the present disclosure.
  • step S200 in the image processing method shown in FIG. 1 will be described in detail with reference to FIGS. 2, 3 and 4.
  • the neural network in the image processing method provided by the embodiments of the present disclosure may include two encoding and decoding networks, and the two encoding and decoding networks include a first encoding and decoding network UN1 and a second encoding and decoding network UN2; for example, both the first encoding and decoding network UN1 and the second encoding and decoding network UN2 may be U-Nets, which is not limited in the embodiments of the present disclosure.
  • the input of the first codec network UN1 includes an input image.
  • a neural network is used to process the input image to obtain the first segmented image and the second segmented image, that is, step S200 includes step S210 to step S230.
  • Step S210 Use the first codec network to perform segmentation processing on the input image to obtain a first output feature map and a first segmented image.
  • the first encoding and decoding network UN1 includes an encoding element network LN1 and a decoding element network RN1.
  • the segmentation processing of the first encoding and decoding network UN1 includes: using the encoding element network LN1 of the first encoding and decoding network UN1 to encode the input image (that is, the input of the first encoding and decoding network) to obtain the first encoding feature map F1; and using the decoding element network RN1 of the first encoding and decoding network UN1 to decode the first encoding feature map F1 to obtain the output of the first encoding and decoding network UN1.
  • the output of the first encoding and decoding network UN1 includes the first segmented image; for example, as shown in Figures 2 and 3, the output of the first encoding and decoding network UN1 may also include the first output feature map F01, and the first output feature map F01 can be used in the processing of the second encoding and decoding network UN2.
  • the encoding element network LN1 may include N encoding sub-networks SLN1 and N-1 down-sampling layers DS, where N is an integer and N ≥ 2.
  • the N encoding sub-networks SLN1 are connected in sequence, and each down-sampling layer DS is used to connect two adjacent encoding sub-networks SLN1; that is, any two adjacent encoding sub-networks SLN1 are connected through a corresponding down-sampling layer DS.
  • for example, as shown in FIG. 2, in the encoding element network LN1 of the first encoding and decoding network UN1, from top to bottom, the encoding element network LN1 includes a first encoding sub-network, a second encoding sub-network, a third encoding sub-network, and a fourth encoding sub-network in sequence; as shown in FIG. 3, in the encoding element network LN1 of the first encoding and decoding network UN1, from top to bottom, the encoding element network LN1 includes a first encoding sub-network and a second encoding sub-network in sequence.
  • the down-sampling layer is used for down-sampling processing.
  • the down-sampling layer can be used to reduce the scale of the input image, simplify the calculation complexity, and reduce over-fitting to a certain extent; on the other hand, the down-sampling layer can also perform feature compression to extract the input image Main features.
  • the down-sampling layer can reduce the size of feature images, but does not change the number of feature images. For example, downsampling is used to reduce the size of the feature image, thereby reducing the data volume of the feature map.
  • the down-sampling layer can implement down-sampling processing using methods such as max pooling, average pooling, strided convolution, decimation (for example, selecting fixed pixels), and demultiplexing output (demuxout, which splits the input image into multiple smaller images).
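For instance, a down-sampling layer that halves the feature-map size could be realized in any of these ways (a PyTorch sketch; the 2×2 factor and the channel count of 64 are illustrative):

```python
import torch.nn as nn

# Three interchangeable 2x2 down-sampling layers (size halved, count unchanged):
down_maxpool = nn.MaxPool2d(kernel_size=2)          # max pooling
down_avgpool = nn.AvgPool2d(kernel_size=2)          # average pooling
down_strided = nn.Conv2d(64, 64, kernel_size=3,
                         stride=2, padding=1)       # strided convolution
```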
  • the encoding processing of the encoding element network LN1 includes: using the i-th encoding sub-network among the N encoding sub-networks SLN1 to process the input of the i-th encoding sub-network to obtain the output of the i-th encoding sub-network; using the down-sampling layer DS connecting the i-th encoding sub-network and the (i+1)-th encoding sub-network among the N encoding sub-networks SLN1 to down-sample the output of the i-th encoding sub-network to obtain the down-sampled output of the i-th encoding sub-network; and using the (i+1)-th encoding sub-network to process the down-sampled output of the i-th encoding sub-network to obtain the output of the (i+1)-th encoding sub-network; where i is an integer and 1 ≤ i ≤ N-1, and the input of the first encoding sub-network among the N encoding sub-networks SLN1 includes the input of the first encoding and decoding network UN1.
  • for example, the input and output sizes of each encoding sub-network SLN1 are the same.
  • for example, as shown in FIG. 2, the decoding element network RN1 includes N-1 decoding sub-networks SRN1 and N-1 up-sampling layers; in the decoding element network RN1 of the first encoding and decoding network UN1, from bottom to top, the decoding element network RN1 includes a first decoding sub-network, a second decoding sub-network, and a third decoding sub-network in sequence; as shown in FIG. 3, the decoding element network RN1 of the first encoding and decoding network UN1 includes only the first decoding sub-network.
  • the up-sampling layer is used for up-sampling processing.
  • the up-sampling process is used to increase the size of the feature image, thereby increasing the data volume of the feature map.
  • the up-sampling layer can adopt up-sampling methods such as strided transposed convolution and interpolation algorithms to implement up-sampling processing.
  • the interpolation algorithm may include, for example, nearest-neighbor interpolation, bilinear interpolation, and bicubic interpolation.
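As an illustration, the two up-sampling approaches mentioned above look like this in a PyTorch sketch (the 2×2 factor and the channel count of 64 are illustrative):

```python
import torch.nn as nn

# 2x2 up-sampling by interpolation (no learned parameters):
up_interp = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
# 2x2 up-sampling by strided transposed convolution (learned):
up_deconv = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)
```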
  • for example, as shown in FIG. 2, the N-1 decoding sub-networks SRN1 are connected in sequence; the N-1 up-sampling layers include a first up-sampling layer US1 and N-2 second up-sampling layers US2; the first up-sampling layer US1 is used to connect the first decoding sub-network among the N-1 decoding sub-networks SRN1 and the N-th encoding sub-network among the N encoding sub-networks SLN1, and each second up-sampling layer US2 is used to connect two adjacent decoding sub-networks; that is, any two adjacent decoding sub-networks SRN1 are connected through a corresponding second up-sampling layer US2.
  • the decoding processing of the decoding element network RN1 includes: obtaining the input of the j-th decoding sub-network among the N-1 decoding sub-networks SRN1; and using the j-th decoding sub-network to process the input of the j-th decoding sub-network to obtain the output of the j-th decoding sub-network; where j is an integer and 1 ≤ j ≤ N-1, and the output of the first encoding and decoding network UN1 includes the output of the (N-1)-th decoding sub-network among the N-1 decoding sub-networks SRN1; that is, the output of the (N-1)-th decoding sub-network (the third decoding sub-network in the example shown in FIG. 2) is the first output feature map F01.
  • for example, when j ≥ 2, obtaining the input of the j-th decoding sub-network among the N-1 decoding sub-networks includes: using the second up-sampling layer US2 connecting the j-th decoding sub-network and the (j-1)-th decoding sub-network to up-sample the output of the (j-1)-th decoding sub-network to obtain the up-sampling input of the j-th decoding sub-network (when j = 1, the first up-sampling layer US1 up-samples the output of the N-th encoding sub-network, as described above); and combining the up-sampling input of the j-th decoding sub-network with the output of the (N-j)-th encoding sub-network among the N encoding sub-networks SLN1 as the input of the j-th decoding sub-network.
  • for example, the size of the up-sampling input of the j-th decoding sub-network is the same as the size of the output of the (N-j)-th encoding sub-network among the N encoding sub-networks SLN1, where 1 ≤ j ≤ N-1.
  • for example, taking the case where the up-sampling input of the j-th decoding sub-network and the output of the (N-j)-th encoding sub-network among the N encoding sub-networks SLN1 each comprise feature maps with H rows and W columns as an example: if the number of feature maps included in the up-sampling input of the j-th decoding sub-network is C1, and the number of feature maps included in the output of the (N-j)-th encoding sub-network is C2, then their feature map models are (C1, H, W) and (C2, H, W), respectively; after the up-sampling input of the j-th decoding sub-network is combined with the output of the (N-j)-th encoding sub-network, the feature map model of the input of the j-th decoding sub-network is (C1+C2, H, W), that is, the number of feature maps included in the input of the j-th decoding sub-network is C1+C2; the present disclosure does not limit the arrangement order of the feature maps in the input feature map model of the j-th decoding sub-network. It should be noted that the embodiments of the present disclosure include but are not limited to this.
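The channel-wise combination just described is a simple concatenation, as the following sketch shows (C1=64, C2=32, H=W=128 are arbitrary illustrative values):

```python
import torch

up_input = torch.randn(1, 64, 128, 128)  # (C1, H, W): up-sampling input
skip = torch.randn(1, 32, 128, 128)      # (C2, H, W): (N-j)-th encoder output
decoder_input = torch.cat([up_input, skip], dim=1)
print(decoder_input.shape)               # torch.Size([1, 96, 128, 128]) = (C1+C2, H, W)
```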
  • it should be noted that 'connection' may mean that two functional objects (for example, a sub-network, a down-sampling layer, an up-sampling layer, etc.) are connected in the direction of signal (for example, feature map) transmission; that is, the output of the preceding functional object is used as the input of the following functional object.
  • for example, as shown in FIG. 3, the encoding element network LN1 includes a first encoding sub-network, a second encoding sub-network, and a down-sampling layer DS connecting the first encoding sub-network and the second encoding sub-network, and the decoding element network RN1 includes a first decoding sub-network and a first up-sampling layer US1 connecting the first decoding sub-network and the second encoding sub-network. Therefore, as shown in FIG. 3, the decoding processing of the decoding element network RN1 includes: using the first up-sampling layer US1 connecting the first decoding sub-network and the second encoding sub-network to up-sample the output of the second encoding sub-network to obtain the up-sampling input of the first decoding sub-network; combining the up-sampling input of the first decoding sub-network with the output of the first encoding sub-network as the input of the first decoding sub-network, where the size of the up-sampling input of the first decoding sub-network is the same as the size of the output of the first encoding sub-network; and using the first decoding sub-network to process the input of the first decoding sub-network to obtain the output of the first decoding sub-network; where the output of the first encoding and decoding network UN1 includes the output of the first decoding sub-network.
  • the number of downsampling layers in the coding element network LN1 is equal to the number of upsampling layers in the decoding element network RN1.
  • for example, the first down-sampling layer in the encoding element network LN1 and the last up-sampling layer in the decoding element network RN1 are located at the same level, the second down-sampling layer in the encoding element network LN1 and the penultimate up-sampling layer in the decoding element network RN1 are located at the same level, ..., and the last down-sampling layer in the encoding element network LN1 and the first up-sampling layer in the decoding element network RN1 are located at the same level.
  • for example, in the example shown in FIG. 2, the down-sampling layer connecting the first encoding sub-network and the second encoding sub-network is at the same level as the up-sampling layer connecting the second decoding sub-network and the third decoding sub-network; the down-sampling layer connecting the second encoding sub-network and the third encoding sub-network is at the same level as the up-sampling layer connecting the first decoding sub-network and the second decoding sub-network; and the down-sampling layer connecting the third encoding sub-network and the fourth encoding sub-network is at the same level as the first up-sampling layer connecting the first decoding sub-network and the fourth encoding sub-network.
  • for example, if the down-sampling factor of a down-sampling layer is 1/(2×2), then, correspondingly, the up-sampling factor of the up-sampling layer at the same level is 2×2; more generally, if the down-sampling factor of the down-sampling layer is 1/y, then the up-sampling factor of the up-sampling layer at the same level is y, where y is a positive integer and y is usually greater than or equal to 2.
  • in this way, the size of the up-sampling input of the j-th decoding sub-network can be made the same as the size of the output of the (N-j)-th encoding sub-network among the N encoding sub-networks SLN1, where N is an integer and N ≥ 2, and j is an integer and 1 ≤ j ≤ N-1.
  • for example, each of the N encoding sub-networks SLN1 in the encoding element network LN1 and the N-1 decoding sub-networks SRN1 in the decoding element network RN1 may include a first convolution module CN1 and a residual module RES.
  • for example, the processing of each sub-network includes: using the first convolution module CN1 to process the input of the sub-network corresponding to the first convolution module CN1 to obtain the first intermediate output; and using the residual module RES to perform residual processing on the first intermediate output to obtain the output of the sub-network.
  • for example, the residual module RES may include a plurality of second convolution modules CN2; for example, the number of second convolution modules CN2 included in each residual module RES may be 2, but the present disclosure is not limited to this.
  • for example, using the residual module RES to perform residual processing on the first intermediate output to obtain the output of the sub-network includes: using the plurality of second convolution modules CN2 to process the first intermediate output to obtain the second intermediate output; and performing residual connection and addition processing on the first intermediate output and the second intermediate output (as shown by ADD in the figure) to obtain the output of the residual module RES, that is, the output of the sub-network.
  • for example, the first encoding feature map F1 includes the output of each encoding sub-network.
  • for example, the size of the first intermediate output is the same as the size of the second intermediate output, so that after the residual connection and addition, the size of the output of the residual module RES (that is, the output of the corresponding sub-network) is the same as the size of the input of the residual module RES (that is, the corresponding first intermediate output).
  • for example, each of the aforementioned first convolution module CN1 and second convolution modules CN2 may include a convolution layer, an activation layer, and a batch normalization layer, so that the processing of each convolution module can include: convolution processing, activation processing, and batch normalization processing.
  • the convolutional layer is the core layer of the convolutional neural network.
  • the convolutional layer can apply several convolution kernels (also called filters) to its input (for example, input image) to extract multiple types of features of the input.
  • for example, the convolutional layer may include a 3×3 convolution kernel.
  • the convolutional layer can include multiple convolution kernels, and each convolution kernel can extract one type of feature.
  • the convolution kernel is generally initialized in the form of a matrix of small random values; during the training process of the convolutional neural network, the convolution kernel learns reasonable weights.
  • the result obtained after applying a convolution kernel to the input image is called a feature map, and the number of feature maps is equal to the number of convolution kernels.
  • Each feature map is composed of some neurons arranged in a rectangle.
  • the neurons of the same feature map share weights, and the shared weights here are the convolution kernels.
  • the feature image output by the convolutional layer of one level can be input to the convolutional layer of the next adjacent level and processed again to obtain a new feature image.
  • the activation layer includes an activation function, and the activation function is used to introduce nonlinear factors to the convolutional neural network, so that the convolutional neural network can better solve more complex problems.
  • the activation function may include a rectified linear unit (ReLU) function, a sigmoid function (Sigmoid function), or a hyperbolic tangent function (tanh function).
  • the ReLU function is an unsaturated nonlinear function, and the Sigmoid function and tanh function are saturated nonlinear functions.
  • the activation layer can be used as a layer of the convolutional neural network alone, or the activation layer can also be included in the convolutional layer.
  • the batch normalization layer is used to perform batch normalization processing on the feature image, so that the gray value of the pixel of the feature image changes within a predetermined range, thereby reducing the difficulty of calculation and improving the contrast.
  • the predetermined range may be [-1, 1].
  • the processing of the batch normalization layer can refer to the common batch normalization process, which will not be repeated here.
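Putting the pieces above together, one plausible realization of a sub-network (first convolution module plus residual module) is sketched below. This is an illustrative sketch, not the disclosure's reference implementation: the 3×3 kernel follows the example above, the convolution-activation-normalization order follows the order stated in this disclosure, and two second convolution modules per residual module follow the earlier example.

```python
import torch.nn as nn

class ConvModule(nn.Module):
    """Convolution module: convolution, activation, batch normalization."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return self.body(x)

class SubNetwork(nn.Module):
    """Encoding/decoding sub-network: first convolution module + residual module."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.first = ConvModule(in_ch, out_ch)   # -> first intermediate output
        self.residual = nn.Sequential(           # two second convolution modules
            ConvModule(out_ch, out_ch),
            ConvModule(out_ch, out_ch),
        )

    def forward(self, x):
        mid1 = self.first(x)        # first intermediate output
        mid2 = self.residual(mid1)  # second intermediate output
        return mid1 + mid2          # residual connection and addition (ADD)
```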
  • for example, the input and output sizes of the first convolution module CN1 are the same, so that the input and output sizes of each encoding sub-network in the encoding element network LN1 are the same, and the input and output sizes of each decoding sub-network in the decoding element network RN1 are the same.
  • the first codec network UN1 may also include a fusion module MG.
  • the fusion module MG in the first codec network UN1 is used to process the first output feature map F01 to obtain the first segmented image.
  • for example, the fusion module MG in the first encoding and decoding network UN1 may use a 1×1 convolution kernel to process the first output feature map F01 to obtain the first segmented image; it should be noted that the embodiments of the present disclosure include but are not limited to this.
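Such a fusion module could, for example, be a single 1×1 convolution that collapses the output feature map into a one-channel segmented image (a hedged sketch; the sigmoid and the channel count of 64 are illustrative assumptions):

```python
import torch.nn as nn

# Fusion module MG: 1x1 convolution collapsing 64 feature maps into one
# segmented image, with a sigmoid giving per-pixel probabilities.
fusion = nn.Sequential(
    nn.Conv2d(64, 1, kernel_size=1),
    nn.Sigmoid(),
)
```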
  • Step S220 Combine the first output feature map with at least one of the input image and the first segmented image to obtain the input of the second codec network.
  • the size of the first output feature map F01 is the same as the size of the input image.
  • for example, the manner of this combination can refer to the foregoing description of combining the up-sampling input of the j-th decoding sub-network with the output of the (N-j)-th encoding sub-network among the N encoding sub-networks SLN1, which will not be repeated here.
  • Step S230 Use the second codec network to perform segmentation processing on the input of the second codec network to obtain a second segmented image.
  • the second encoding and decoding network UN2 includes an encoding element network LN2 and a decoding element network RN2.
  • the segmentation processing of the second encoding and decoding network UN2 includes: using the encoding element network LN2 of the second encoding and decoding network UN2 to encode the input of the second encoding and decoding network to obtain the second encoding feature map F2; and using the decoding element network RN2 of the second encoding and decoding network UN2 to decode the second encoding feature map F2 to obtain the output of the second encoding and decoding network UN2.
  • the second encoding feature map F2 includes the outputs of the N encoding sub-networks in the encoding element network LN2.
  • the output of the second encoding and decoding network UN2 may include the second segmented image.
  • the structure and processing procedure of the encoding element network LN2 and the decoding element network RN2 of the second encoding and decoding network UN2 can refer to the foregoing descriptions of the structure and processing procedure of the encoding element network LN1 and the decoding element network RN1 of the first encoding and decoding network UN1, respectively, and will not be repeated here.
  • it should be noted that although Figures 2 and 3 both show the second encoding and decoding network UN2 and the first encoding and decoding network UN1 having the same structure (that is, including the same number of encoding sub-networks and the same number of decoding sub-networks), the embodiments of the present disclosure are not limited to this; that is, the second encoding and decoding network UN2 may also have a structure similar to that of the first encoding and decoding network UN1, but the number of encoding sub-networks included in the second encoding and decoding network UN2 may differ from the number included in the first encoding and decoding network UN1.
  • for example, as shown in Figures 2 and 3, the second encoding and decoding network UN2 may also include a fusion module MG. In this case, using the second encoding and decoding network UN2 to perform segmentation processing on the input of the second encoding and decoding network UN2 to obtain the second segmented image includes: using the second encoding and decoding network UN2 to perform segmentation processing on the input of the second encoding and decoding network UN2 to obtain the second output feature map F02; and using the fusion module MG in the second encoding and decoding network UN2 to process the second output feature map F02 to obtain the second segmented image. For example, the fusion module MG in the second encoding and decoding network UN2 may use a 1×1 convolution kernel to process the second output feature map F02 to obtain the second segmented image; it should be noted that the embodiments of the present disclosure include but are not limited to this.
  • for example, the first segmented image corresponds to a first area of the input image, and the second segmented image corresponds to a second area of the input image.
  • FIG. 5 is a schematic diagram of a first area and a second area in an input image provided by some embodiments of the present disclosure.
  • the first region R1 of the input image surrounds the second region R2 of the input image, that is, the second region R2 is located in the first region R1.
  • for example, the first segmented image and the second segmented image can be used for medical diagnosis, for example, for the screening and diagnosis of glaucoma (based on the segmentation of the optic disc and the optic cup, where the first area corresponds to the optic disc and the second area corresponds to the optic cup) and early lung cancer (based on the segmentation of the lungs and lung nodules, where the first area corresponds to the lung and the second area corresponds to the lung nodules), etc. For example, in glaucoma screening, the area ratio of the optic cup to the optic disc (i.e., the cup-to-disc ratio) can be calculated based on the first segmented image and the second segmented image, and screening and diagnosis can be performed according to the relative size of this area ratio and a preset threshold, which will not be repeated here. It should be noted that the embodiments of the present disclosure include but are not limited to this.
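As a worked illustration of the screening step just described (a sketch; the 0.6 threshold is only an illustrative assumption, not a value taken from this disclosure):

```python
import numpy as np

def cup_to_disc_ratio(disc_mask: np.ndarray, cup_mask: np.ndarray) -> float:
    """Area-based cup-to-disc ratio from binary segmentation masks."""
    return float(cup_mask.sum()) / max(float(disc_mask.sum()), 1.0)

# First segmented image -> optic disc (first area); second -> optic cup (second area).
disc = np.zeros((256, 256), dtype=np.uint8); disc[64:192, 64:192] = 1
cup = np.zeros((256, 256), dtype=np.uint8); cup[96:160, 96:160] = 1
ratio = cup_to_disc_ratio(disc, cup)
print(ratio, ratio > 0.6)  # 0.25 False
```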
  • it should be noted that the first region R1 and the second region R2 in the input image shown in FIG. 5 are illustrative, and the embodiments of the present disclosure do not limit this.
  • for example, the first area in the input image may include one connected area (as shown in FIG. 5), and the second area in the input image may include one connected area (as shown in FIG. 5); the first area in the input image may also include multiple separate first sub-areas, and in this case the second area in the input image may include one connected area (located within one first sub-area) or multiple separate second sub-areas (located in one first sub-area or in several separate first sub-areas).
  • it should be noted that 'the second area is located in the first area' may include the case where the edge of the second area does not overlap with the edge of the first area, as well as the case where at least part of the edge of the second area overlaps with the edge of the first area; the embodiments of the present disclosure do not limit this.
  • it should be noted that the same or similar functional objects may have the same or similar structures or processing procedures, but the parameters corresponding to the same or similar functional objects may be the same or different; the embodiments of the present disclosure do not limit this.
  • in this way, robustness can be improved and higher generalization and accuracy can be achieved, with more stable segmentation results for images acquired under different lighting environments and by different imaging devices; at the same time, the end-to-end convolutional neural network model can reduce manual operations.
  • At least one embodiment of the present disclosure also provides a neural network, which can be used to execute the image processing method provided in the foregoing embodiment.
  • the structure of the neural network can refer to the structure of the neural network shown in FIG. 2 or FIG. 3.
  • for example, as shown in Figures 2 and 3, the neural network provided by the embodiments of the present disclosure includes two encoding and decoding networks, the two encoding and decoding networks include a first encoding and decoding network UN1 and a second encoding and decoding network UN2, and the neural network also includes a joint layer (shown as CONCAT in Figures 2 and 3, connecting the first encoding and decoding network UN1 and the second encoding and decoding network UN2).
  • for example, both the first encoding and decoding network UN1 and the second encoding and decoding network UN2 may be U-Nets, which is not limited in the embodiments of the present disclosure.
  • the input of the first codec network UN1 includes an input image.
  • the neural network is configured to process the input image to obtain the first segmented image and the second segmented image.
  • the first encoding and decoding network UN1 is configured to perform segmentation processing on the input image to obtain the first output feature map F01 and the first segmented image.
  • the first encoding and decoding network UN1 includes an encoding element network LN1 and a decoding element network RN1.
  • the encoding element network LN1 of the first encoding and decoding network UN1 is configured to perform encoding processing on the input image (that is, the input of the first encoding and decoding network) to obtain the first encoding feature map F1;
  • the decoding element network RN1 of the first encoding and decoding network UN1 is configured to perform decoding processing on the first encoding feature map F1 to obtain the output of the first encoding and decoding network UN1.
  • the output of the first encoding and decoding network UN1 includes the first segmented image; for example, as shown in FIGS. 2 and 3, the output of the first encoding and decoding network UN1 may also include the first output feature map F01, which can be used in the processing of the second encoding and decoding network UN2.
  • the coding element network LN1 may include N coding sub-networks SLN1 and N-1 down-sampling layers DS, where N is an integer and N ⁇ 2.
  • the N coding sub-networks SLN1 are connected in sequence, and each down-sampling layer DS is used to connect two adjacent coding sub-networks SLN1; that is, any two adjacent coding sub-networks SLN1 are connected through a corresponding down-sampling layer DS.
  • as shown in FIG. 2, in the coding element network LN1 of the first codec network UN1, from top to bottom, the coding element network LN1 includes the first coding sub-network, the second coding sub-network, the third coding sub-network, and the fourth coding sub-network in sequence; as shown in FIG. 3, in the coding element network LN1 of the first coding and decoding network UN1, from top to bottom, the coding element network LN1 includes the first coding sub-network and the second coding sub-network in sequence.
  • the i-th coding sub-network of the N coding sub-networks SLN1 is configured to process the input of the i-th coding sub-network to obtain the output of the i-th coding sub-network ;
  • the down-sampling layer DS connecting the i-th coding sub-network and the i+1-th coding sub-network in the N coding sub-networks SLN1 is configured to down-sample the output of the i-th coding sub-network to obtain the The down-sampled output of the i coding sub-network;
  • the i+1-th coding sub-network is configured to process the down-sampled output of the i-th coding sub-network to obtain the output of the i+1-th coding sub-network;
  • i is an integer and 1 ≤ i ≤ N-1; the input of the first coding sub-network in the N coding sub-networks SLN1 includes the input of the first codec network UN1; except for the first coding sub-network, the input of the (i+1)-th coding sub-network includes the down-sampled output of the i-th coding sub-network; and the first coding feature map F1 includes the outputs of the N coding sub-networks SLN1.
  • the input and output sizes of each coding sub-network SLN1 are the same (a sketch of the coding element network is given below).
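  • Under the assumptions above, the coding element network can be sketched as follows: each of the N coding sub-networks preserves spatial size, a 2×2 max-pooling layer stands in for the down-sampling layer DS (one possible choice among the down-sampling methods named in this disclosure), and the outputs of all N sub-networks are collected, since together they form the first coding feature map F1.

    import torch.nn as nn

    class EncodingElementNetwork(nn.Module):
        def __init__(self, subnetworks):  # list of the N coding sub-networks SLN1
            super().__init__()
            self.subnets = nn.ModuleList(subnetworks)
            self.down = nn.MaxPool2d(kernel_size=2)  # down-sampling layer DS (factor 1/2)

        def forward(self, x):
            outputs = []  # F1 = the outputs of all N coding sub-networks
            for i, subnet in enumerate(self.subnets):
                x = subnet(x)          # output of the i-th coding sub-network
                outputs.append(x)
                if i < len(self.subnets) - 1:
                    x = self.down(x)   # down-sampled output, fed to sub-network i+1
            return outputs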
  • the decoding element network RN1 includes N-1 decoding sub-networks SRN1 and N-1 upsampling layers.
  • as shown in FIG. 2, in the decoding element network RN1 of the first encoding and decoding network UN1, from bottom to top, the decoding element network RN1 includes the first decoding sub-network, the second decoding sub-network, and the third decoding sub-network in sequence.
  • as shown in FIG. 3, the decoding element network RN1 of the first encoding and decoding network UN1 includes the first decoding sub-network.
  • N-1 decoding sub-networks SRN1 are connected in sequence
  • the N-1 up-sampling layers include a first up-sampling layer US1 and N-2 second up-sampling layers US2; the first up-sampling layer US1 is used to connect the first decoding sub-network in the N-1 decoding sub-networks SRN1 and the N-th coding sub-network in the N coding sub-networks SLN1, and each second up-sampling layer US2 is used to connect two adjacent decoding sub-networks; that is, any two adjacent decoding sub-networks SRN1 are connected through a corresponding second up-sampling layer US2.
  • the first encoding and decoding network UN1 also includes N-1 sub-joint layers corresponding to the N-1 decoding sub-networks SRN1 of the decoding element network RN1 (as shown by the CONCAT in the decoding element network RN1 in FIG. 2).
  • the j-th decoding sub-network in the N-1 decoding sub-networks SRN1 is configured to process the input of the j-th decoding sub-network to obtain the output of the j-th decoding sub-network, where j is an integer and 1 ≤ j ≤ N-1; the output of the first encoding and decoding network UN1 includes the output of the (N-1)-th decoding sub-network in the N-1 decoding sub-networks SRN1.
  • the output of the (N-1)-th decoding sub-network in the N-1 decoding sub-networks SRN1 (the third decoding sub-network in the example shown in FIG. 2) is the first output feature map F01.
  • the first up-sampling layer US1 is configured to up-sample the output of the N-th encoding sub-network to obtain the up-sampled input of the first decoding sub-network; the second up-sampling layer US2 connecting the j-th decoding sub-network and the (j-1)-th decoding sub-network in the N-1 decoding sub-networks SRN1 is configured to perform up-sampling processing on the output of the (j-1)-th decoding sub-network to obtain the up-sampled input of the j-th decoding sub-network, where j is an integer and 1 < j ≤ N-1.
  • the j-th sub-joint layer in the N-1 sub-joint layers is configured to combine the up-sampled input of the j-th decoding sub-network with the output of the (N-j)-th coding sub-network in the N coding sub-networks SLN1 as the input of the j-th decoding sub-network, where j is an integer and 1 ≤ j ≤ N-1.
  • the size of the up-sampled input of the j-th decoding sub-network is the same as the size of the output of the (N-j)-th coding sub-network in the N coding sub-networks SLN1, where 1 ≤ j ≤ N-1 (a sketch of the decoding element network is given below).
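  • Correspondingly, a sketch of the decoding element network for N > 2: the first up-sampling layer US1 acts on the output of the N-th coding sub-network, the j-th sub-joint layer concatenates each up-sampled input with the output of the (N-j)-th coding sub-network, and the output of the (N-1)-th decoding sub-network is the first output feature map F01. Bilinear up-sampling by a factor of 2 is an assumption chosen to match the 1/2 down-sampling factor used above.

    import torch
    import torch.nn as nn

    class DecodingElementNetwork(nn.Module):
        def __init__(self, subnetworks):  # list of the N-1 decoding sub-networks SRN1
            super().__init__()
            self.subnets = nn.ModuleList(subnetworks)
            self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

        def forward(self, encoder_outputs):  # the N outputs collected by the coding element network
            n = len(encoder_outputs)
            x = encoder_outputs[-1]  # output of the N-th coding sub-network
            for j, subnet in enumerate(self.subnets, start=1):
                up = self.up(x)                           # up-sampled input of the j-th decoding sub-network
                skip = encoder_outputs[n - j - 1]         # output of the (N-j)-th coding sub-network
                x = subnet(torch.cat([up, skip], dim=1))  # sub-joint layer, then the sub-network
            return x  # first output feature map F01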
  • in the case of N=2, the coding element network LN1 includes the first coding sub-network, the second coding sub-network, and a down-sampling layer DS connecting the first coding sub-network and the second coding sub-network; the decoding element network RN1 includes the first decoding sub-network and a first up-sampling layer US1 connecting the first decoding sub-network and the second encoding sub-network.
  • the first encoding and decoding network UN1 also includes a first sub-joint layer corresponding to the first decoding sub-network SRN1 of the decoding element network RN1 (as shown by the CONCAT in the decoding element network RN1 in FIG. 3).
  • the first up-sampling layer US1 connecting the first decoding sub-network and the second encoding sub-network is configured to perform up-sampling processing on the output of the second encoding sub-network to obtain the up-sampled input of the first decoding sub-network.
  • the first sub-joint layer is configured to combine the up-sampled input of the first decoding sub-network with the output of the first encoding sub-network as the input of the first decoding sub-network, where the size of the up-sampled input of the first decoding sub-network is the same as the size of the output of the first encoding sub-network.
  • the first decoding sub-network is configured to process the input of the first decoding sub-network to obtain the output of the first decoding sub-network; the output of the first encoding and decoding network UN1 includes the output of the first decoding sub-network.
  • in the case of N=2, the output of the first decoding sub-network is the first output feature map F01.
  • the number of downsampling layers in the coding element network LN1 is equal to the number of upsampling layers in the decoding element network RN1.
  • the first down-sampling layer in the encoding element network LN1 and the last up-sampling layer in the decoding element network RN1 are located at the same level, the second down-sampling layer in the encoding element network LN1 and the penultimate up-sampling layer in the decoding element network RN1 are located at the same level, ..., and the last down-sampling layer in the encoding element network LN1 and the first up-sampling layer in the decoding element network RN1 are located at the same level.
  • in the example shown in FIG. 2, the down-sampling layer used to connect the first encoding sub-network and the second encoding sub-network is located at the same level as the up-sampling layer used to connect the second decoding sub-network and the third decoding sub-network; the down-sampling layer used to connect the second encoding sub-network and the third encoding sub-network is located at the same level as the up-sampling layer used to connect the first decoding sub-network and the second decoding sub-network; and the down-sampling layer used to connect the third encoding sub-network and the fourth encoding sub-network is located at the same level as the up-sampling layer (the first up-sampling layer US1) used to connect the first decoding sub-network and the fourth encoding sub-network.
  • for a down-sampling layer and an up-sampling layer located at the same level, the down-sampling factor of the down-sampling layer (for example, a 2×2 down-sampling factor) corresponds to the up-sampling factor of the up-sampling layer (for example, correspondingly, a 2×2 up-sampling factor); that is, when the down-sampling factor of the down-sampling layer is 1/y, the up-sampling factor of the up-sampling layer is y, where y is a positive integer, and y is usually greater than or equal to 2.
  • in this way, the size of the up-sampled input of the j-th decoding sub-network can be made the same as the size of the output of the (N-j)-th coding sub-network in the N coding sub-networks SLN1, where N is an integer and N ≥ 2, and j is an integer and 1 ≤ j ≤ N-1.
  • each of the N coding sub-networks SLN1 in the coding element network LN1 and the N-1 decoding sub-networks SRN1 in the decoding element network RN1 may include a first convolution module CN1 and a residual module RES.
  • the first convolution module CN1 is configured to process the input of the sub-network corresponding to the first convolution module CN1 to obtain the first intermediate output;
  • the residual module RES is configured to perform residual processing on the first intermediate output to obtain the output of the sub-network.
  • the residual module RES may include multiple second convolution modules CN2 and a residual addition layer (as shown by ADD in FIGS. 2 and 3); for example, the number of second convolution modules CN2 included in each residual module RES may be 2, but the present disclosure is not limited thereto.
  • the multiple second convolution modules CN2 are configured to process the first intermediate output to obtain the second intermediate output; the residual addition layer is configured to perform residual connection and addition processing on the first intermediate output and the second intermediate output to obtain the output of the residual module RES, that is, the output of the sub-network (see the sketch below).
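  • A sketch of one such sub-network (the first convolution module CN1 followed by a residual module RES with two second convolution modules CN2 and the residual addition layer ADD); ConvModule is sketched after the convolution-module bullets below, and all names are illustrative.

    import torch.nn as nn

    class SubNetwork(nn.Module):
        # one coding/decoding sub-network: CN1 followed by a residual module RES
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.cn1 = ConvModule(in_ch, out_ch)   # first convolution module CN1
            self.res = nn.Sequential(              # two second convolution modules CN2
                ConvModule(out_ch, out_ch),
                ConvModule(out_ch, out_ch),
            )

        def forward(self, x):
            mid = self.cn1(x)     # first intermediate output
            out = self.res(mid)   # second intermediate output (same size as mid)
            return mid + out      # residual addition layer ADD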
  • the output of each coding sub-network is the first coding feature map F1.
  • the size of the first intermediate output is the same as the size of the second intermediate output.
  • the size of the output of the residual module RES (that is, the output of the corresponding sub-network) is the same as the size of the input of the residual module RES (i.e., the corresponding first intermediate output).
  • each of the foregoing first convolution module CN1 and second convolution module CN2 may include a convolution layer, an activation layer, and a batch normalization layer (Batch Normalization Layer).
  • the convolutional layer is configured to perform convolution processing
  • the activation layer is configured to perform activation processing
  • the batch normalization layer is configured to perform batch normalization processing.
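  • A matching sketch of the convolution module used in the sub-network above; the disclosure enumerates a convolution layer, an activation layer, and a batch normalization layer without fixing their order, so the conv → batch-norm → ReLU ordering below is an assumption, as is the 3×3 kernel with padding 1 (chosen so that input and output sizes match, as the next bullet requires).

    import torch.nn as nn

    class ConvModule(nn.Module):
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)  # size-preserving convolution
            self.bn = nn.BatchNorm2d(out_ch)   # batch normalization layer
            self.act = nn.ReLU(inplace=True)   # activation layer

        def forward(self, x):
            return self.act(self.bn(self.conv(x)))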
  • the input and output sizes of the first convolution module CN1 are the same, so that the input and output sizes of each encoding sub-network in the encoding element network LN1 are the same, and the input and output sizes of each decoding sub-network in the decoding element network RN1 are the same.
  • the first codec network UN1 may also include a fusion module MG.
  • the fusion module MG in the first codec network UN1 is configured to process the first output feature map F01 to obtain the first segmented image.
  • the fusion module MG in the first encoding and decoding network UN1 may use a 1×1 convolution kernel to process the first output feature map F01 to obtain the first segmented image (see the sketch below); it should be noted that the embodiments of the present disclosure include but are not limited to this.
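  • The fusion module can be sketched as a single 1×1 convolution mapping the first output feature map F01 to the segmented image; the sigmoid at the end is an assumption (a common way to produce a per-pixel probability map) and is not stated in the disclosure.

    import torch
    import torch.nn as nn

    class FusionModule(nn.Module):
        def __init__(self, in_ch, n_classes=1):
            super().__init__()
            self.conv1x1 = nn.Conv2d(in_ch, n_classes, kernel_size=1)  # 1x1 convolution kernel

        def forward(self, f01):
            return torch.sigmoid(self.conv1x1(f01))  # e.g. the first segmented image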
  • the joint layer is configured to combine the first output feature map F01 with at least one of the input image and the first segmented image to obtain the input of the second codec network.
  • the size of the first output feature map F01 is the same as the size of the input image.
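  • Because the spatial sizes match, the joint layer reduces to channel-wise concatenation: feature maps of shapes (C1, H, W) and (C2, H, W) combine into (C1+C2, H, W). A tiny check with illustrative sizes:

    import torch

    f01 = torch.randn(1, 64, 256, 256)    # first output feature map, (batch, C1, H, W)
    img = torch.randn(1, 3, 256, 256)     # input image, (batch, C2, H, W)
    joint = torch.cat([f01, img], dim=1)  # (batch, C1 + C2, H, W)
    assert joint.shape == (1, 67, 256, 256)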
  • the second encoding and decoding network UN2 is configured to perform segmentation processing on the input of the second encoding and decoding network to obtain the second segmented image.
  • the second encoding and decoding network UN2 includes an encoding element network LN2 and a decoding element network RN2.
  • the encoding element network LN2 of the second encoding and decoding network UN2 is configured to perform encoding processing on the input of the second encoding and decoding network to obtain the second encoding feature map F2; the decoding element network RN2 of the second encoding and decoding network UN2 is configured to The second encoding feature map F2 is decoded to obtain the output of the second encoding and decoding network UN2.
  • the second coding feature map F2 includes the output of the N coding sub-networks SLN1 in the coding element network LN2.
  • the output of the second encoding and decoding network UN2 may include the second segmented image.
  • for the structure and function of the encoding element network LN2 and the decoding element network RN2 of the second encoding and decoding network UN2, reference can be made to the foregoing descriptions of the structure and function of the encoding element network LN1 and the decoding element network RN1 of the first encoding and decoding network UN1, respectively, which will not be repeated here.
  • although FIGS. 2 and 3 both show the case where the second codec network UN2 and the first codec network UN1 have the same structure (that is, they include the same number of coding sub-networks and the same number of decoding sub-networks), the embodiments of the present disclosure are not limited to this; that is, the second encoding and decoding network UN2 may also have a structure similar to that of the first encoding and decoding network UN1, but the number of encoding sub-networks included in the second encoding and decoding network UN2 may be different from the number of encoding sub-networks included in the first encoding and decoding network UN1.
  • the second codec network UN2 may also include a fusion module MG.
  • the second codec network UN2 being configured to perform segmentation processing on the input of the second codec network UN2 to obtain the second segmented image includes: the second codec network UN2 is configured to perform segmentation processing on the input of the second codec network UN2 to obtain the second output feature map F02; the fusion module MG in the second encoding and decoding network UN2 is configured to process the second output feature map F02 to obtain the second segmented image.
  • the fusion module MG in the second encoding and decoding network UN2 may use a 1×1 convolution kernel to process the second output feature map F02 to obtain the second segmented image; it should be noted that the embodiments of the present disclosure include but are not limited to this.
  • FIG. 6 is a flowchart of a neural network training method provided by some embodiments of the present disclosure.
  • the training method includes step S300 and step S400.
  • Step S300 Obtain training input images.
  • the training input image may also be various types of images, including, but not limited to, medical images, for example.
  • the training input image can be acquired by an image acquisition device.
  • the image acquisition device may include, for example, ultrasound equipment, X-ray equipment, nuclear magnetic resonance equipment, nuclear medicine equipment, medical optical equipment, and thermal imaging equipment, which are not limited in the embodiments of the present disclosure.
  • training input images can also be images of people, plants and animals, or landscape images, etc.
  • the training input images can also be acquired by an image acquisition device such as the camera of a smartphone, the camera of a tablet computer, the camera of a personal computer, the lens of a digital camera, a surveillance camera, or a webcam.
  • the training input image may also be a sample image in a sample set prepared in advance.
  • the sample set also includes a standard segmentation map (ie, ground truth) of the sample image.
  • the training input image can be a grayscale image or a color image.
  • obtaining the training input image may include: obtaining the original training input image; and performing preprocessing and data enhancement processing on the original training input image to obtain the training input image.
  • the original training input image is generally an image directly collected by an image collection device.
  • the original training input image can be preprocessed and data augmented.
  • preprocessing can eliminate irrelevant information or noise information in the original training input image, so as to better segment the training input image.
  • the preprocessing may include, for example, image scaling on the original training input image. Image scaling includes scaling and cropping the original training input image to a preset size to facilitate subsequent image segmentation processing.
  • preprocessing can also include gamma correction, image de-redundancy (cutting out redundant parts of the image), image enhancement (image adaptive color equalization, image alignment, color correction, etc.), or noise-reduction filtering; for these, reference can be made to common processing methods, which will not be repeated here.
  • data enhancement processing includes expanding the training input image data through methods such as random cropping, rotation, flipping, skewing, and affine transformation, which increases the diversity of the training input images, reduces overfitting during image processing, and improves the robustness and generalization of the convolutional neural network model (a sketch is given below).
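  • A sketch of such a pre-processing and data-enhancement pipeline, using torchvision transforms as one possible realization; the target size and all transform parameters are illustrative assumptions.

    from torchvision import transforms

    augment = transforms.Compose([
        transforms.Resize(288),                        # proportional scaling (illustrative size)
        transforms.RandomCrop(256),                    # random cropping to a preset size
        transforms.RandomRotation(15),                 # rotation
        transforms.RandomHorizontalFlip(),             # flipping
        transforms.RandomAffine(degrees=0, shear=10),  # skew / affine transformation
        transforms.ToTensor(),
    ])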
  • Step S400 Use the training input image to train the neural network to be trained to obtain the neural network in the image processing method provided in any embodiment of the present disclosure.
  • the structure of the neural network to be trained may be the same as the neural network shown in FIG. 2 or the neural network shown in FIG. 3, and the embodiments of the present disclosure include but are not limited to this.
  • the neural network to be trained can execute the image processing method provided by any of the above embodiments of the present disclosure after being trained by the training method; that is, the neural network obtained by using the training method can execute the image processing method provided by any of the above embodiments of the present disclosure.
  • FIG. 7 is an exemplary flowchart corresponding to step S400 in the training method shown in FIG. 6 provided by some embodiments of the present disclosure.
  • the neural network to be trained is trained using the training input image, that is, step S400 includes step S410 to step S430.
  • Step S410 Use the neural network to be trained to process the training input image to obtain a first training segmentation image and a second training segmentation image.
  • for step S410, reference can be made to the related description of the foregoing step S200, where the neural network to be trained, the training input image, the first training segmentation image, and the second training segmentation image in step S410 correspond respectively to the neural network, the input image, the first segmented image, and the second segmented image in step S200; the details are not repeated here.
  • the initial parameters of the neural network to be trained may be random numbers, for example, the random numbers conform to a Gaussian distribution. It should be noted that the embodiments of the present disclosure do not limit this.
  • Step S420 Based on the first reference segmentation image and the second reference segmentation image of the training input image, and the first training segmentation image and the second training segmentation image, calculate the system loss value of the neural network to be trained through the system loss function, where, The first training segmentation image corresponds to the first reference segmentation image, and the second training segmentation image corresponds to the second reference segmentation image.
  • the training input image is a sample image in a sample set prepared in advance.
  • the first reference segmentation image and the second reference segmentation image are respectively the first standard segmentation map and the second standard segmentation map corresponding to the sample image included in the sample set.
  • the first training segmentation image corresponding to the first reference segmentation image means that the first training segmentation image and the first reference segmentation image correspond to the same area (for example, the first area) of the training input image; the second training segmentation image corresponding to the second reference segmentation image means that the second training segmentation image and the second reference segmentation image correspond to the same area (for example, the second area) of the training input image.
  • the first area of the training input image surrounds the second area of the training input image, that is, the second area of the training input image is located within the first area of the training input image.
  • the system loss function may include a first segmentation loss function and a second segmentation loss function.
  • the system loss function can be expressed as: L = λ01·L01 + λ02·L02
  • L01 and L02 respectively represent the first segmentation loss function and the second segmentation loss function
  • λ01 and λ02 respectively represent the weights of the first segmentation loss function and the second segmentation loss function in the system loss function.
  • the first segmentation loss function may include a binary (cross-entropy) loss function and a similarity (softdice) loss function.
  • the first segmentation loss function can be expressed as: L01 = λ11·L11 + λ12·L21
  • L01 represents the first segmentation loss function
  • L11 represents the cross-entropy loss function in the first segmentation loss function
  • λ11 represents the weight of the cross-entropy loss function in the first segmentation loss function
  • L21 represents the similarity loss function in the first segmentation loss function
  • λ12 represents the weight of the similarity loss function in the first segmentation loss function.
  • the cross-entropy loss function L11 in the first segmentation loss function can be expressed as:
  • the similarity loss function L21 in the first segmentation loss function can be expressed as:
  • x_m1n1 represents the value of the pixel located at row m1 and column n1 in the first training segmented image
  • y_m1n1 represents the value of the pixel located at row m1 and column n1 in the first reference segmented image
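  • The exact expressions for L11 and L21 are reproduced only as formula images in the published application; under the usual definitions of a binary cross-entropy loss and a soft-dice similarity loss over the pixel values x_m1n1 and y_m1n1, they would take forms such as the following (an assumption consistent with the "binary (cross-entropy)" and "similarity (softdice)" labels above, not a quotation of the patent):

    L_{11} = -\sum_{m1,n1}\left[ y_{m1n1}\log x_{m1n1} + (1-y_{m1n1})\log(1-x_{m1n1}) \right]

    L_{21} = 1 - \frac{2\sum_{m1,n1} x_{m1n1}\, y_{m1n1}}{\sum_{m1,n1} x_{m1n1}^{2} + \sum_{m1,n1} y_{m1n1}^{2}}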
  • the training goal is to minimize the system loss value. Therefore, in the training process of the neural network to be trained, minimizing the system loss value includes minimizing the first segmentation loss function value.
  • the second segmentation loss function may also include a binary (cross-entropy) loss function and a similarity (softdice) loss function.
  • the second segmentation loss function can be expressed as: L02 = λ21·L12 + λ22·L22
  • L02 represents the second segmentation loss function
  • L12 represents the cross-entropy loss function in the second segmentation loss function
  • λ21 represents the weight of the cross-entropy loss function in the second segmentation loss function
  • L22 represents the similarity loss function in the second segmentation loss function
  • λ22 represents the weight of the similarity loss function in the second segmentation loss function.
  • the cross-entropy loss function L12 in the second segmentation loss function can be expressed as:
  • the similarity loss function L22 in the second segmentation loss function can be expressed as:
  • x_m2n2 represents the value of the pixel located at row m2 and column n2 in the second training segmented image
  • y_m2n2 represents the value of the pixel located at row m2 and column n2 in the second reference segmented image
  • minimizing the system loss value also includes minimizing the second segmentation loss function value.
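  • Putting the pieces together, a Python sketch of the system loss L = λ01·L01 + λ02·L02, with each segmentation loss a weighted sum of a binary cross-entropy term and a soft-dice term; all weight values are placeholders, and the soft-dice form follows the assumption noted above. Predictions are assumed to be per-pixel probabilities in [0, 1].

    import torch
    import torch.nn.functional as F

    def soft_dice_loss(pred, target, eps=1e-6):
        inter = (pred * target).sum()
        return 1 - 2 * inter / (pred.pow(2).sum() + target.pow(2).sum() + eps)

    def segmentation_loss(pred, target, w_ce=1.0, w_dice=1.0):
        return w_ce * F.binary_cross_entropy(pred, target) + w_dice * soft_dice_loss(pred, target)

    def system_loss(seg1, ref1, seg2, ref2, lam01=1.0, lam02=1.0):
        loss1 = segmentation_loss(seg1, ref1)  # first segmentation loss L01
        loss2 = segmentation_loss(seg2, ref2)  # second segmentation loss L02
        return lam01 * loss1 + lam02 * loss2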
  • Step S430 Correct the parameters of the neural network to be trained based on the system loss value.
  • the training process of the neural network to be trained may also include an optimization function; the optimization function can calculate error values of the parameters of the neural network to be trained according to the system loss value calculated by the system loss function, and correct the parameters of the neural network to be trained according to the error values.
  • the optimization function may use a stochastic gradient descent (SGD) algorithm, a batch gradient descent (BGD) algorithm, etc., to calculate the error value of the parameters of the neural network to be trained.
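  • One training iteration of steps S410 to S430 can be sketched as follows with stochastic gradient descent (torch.optim.SGD); model is assumed to be a two-stage network as sketched earlier, system_loss is the sketch above, and the learning rate is illustrative.

    import torch

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    def train_step(model, image, ref1, ref2):
        optimizer.zero_grad()
        seg1, seg2 = model(image)                   # step S410: forward pass
        loss = system_loss(seg1, ref1, seg2, ref2)  # step S420: system loss value
        loss.backward()                             # error values via backpropagation
        optimizer.step()                            # step S430: correct the parameters
        return loss.item()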
  • the above-mentioned training method may further include: judging whether the training of the neural network to be trained meets a predetermined condition; if the predetermined condition is not met, the above training process (i.e., step S410 to step S430) is repeated; if the predetermined condition is met, the above training process is stopped and a trained neural network is obtained.
  • the foregoing predetermined condition is that the system loss value corresponding to two consecutive (or more) training input images no longer significantly decreases.
  • the foregoing predetermined condition is that the number of training times or training periods of the neural network to be trained reaches a predetermined number. The embodiment of the present disclosure does not limit this.
  • the first training segmentation image and the second training segmentation image output by the trained neural network can be respectively close to the first reference segmentation image and the second reference segmentation image; that is, the trained neural network can perform standard image segmentation on the input image.
  • the above training process/method can be implemented by corresponding software, firmware, hardware, etc.; the above-mentioned embodiments are only illustrative of the training process of the neural network to be trained.
  • those skilled in the art should know that, in the training phase, a large number of sample images need to be used to train the neural network; at the same time, the training process for each sample image may include multiple iterations to correct the parameters of the neural network to be trained.
  • the training phase also includes fine-tuning the parameters of the neural network to be trained to obtain more optimized parameters.
  • the neural network training method provided by the embodiment of the present disclosure can train the neural network used in the image processing method of the embodiment of the present disclosure.
  • the neural network trained by the training method first obtains the first segmented image and then obtains the second segmented image based on the first segmented image, which can improve robustness, achieve higher generalization and accuracy, and produce more stable segmentation results for images acquired under different light environments and by different imaging devices; at the same time, the end-to-end convolutional neural network model can reduce manual operations.
  • FIG. 8 is a schematic block diagram of an image processing device provided by an embodiment of the present disclosure.
  • the image processing apparatus 500 includes a memory 510 and a processor 520.
  • the memory 510 is used for non-transitory storage of computer readable instructions
  • the processor 520 is used for running the computer readable instructions; when the computer readable instructions are run by the processor 520, the image processing method and/or the neural network training method provided by any embodiment of the present disclosure are executed.
  • the memory 510 and the processor 520 may directly or indirectly communicate with each other.
  • components such as the memory 510 and the processor 520 may communicate through a network connection.
  • the network may include a wireless network, a wired network, and/or any combination of a wireless network and a wired network.
  • the network may include a local area network, the Internet, a telecommunication network, the Internet of Things (Internet of Things) based on the Internet and/or a telecommunication network, and/or any combination of the above networks, etc.
  • the wired network may, for example, use twisted pair, coaxial cable, or optical fiber transmission for communication, and the wireless network may use, for example, a 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi.
  • the present disclosure does not limit the types and functions of the network here.
  • the processor 520 may control other components in the image processing apparatus to perform desired functions.
  • the processor 520 may be a central processing unit (CPU), a tensor processing unit (TPU), or a graphics processing unit (GPU), or another device with data processing capabilities and/or program execution capabilities.
  • the central processing unit (CPU) can be an X86 or ARM architecture.
  • the GPU can be directly integrated on the motherboard alone or built into the north bridge chip of the motherboard.
  • the GPU can also be built into the central processing unit (CPU).
  • the memory 510 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • Volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
  • the non-volatile memory may include, for example, read only memory (ROM), hard disk, erasable programmable read only memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, flash memory, etc.
  • one or more computer instructions may be stored in the memory 510, and the processor 520 may execute the computer instructions to implement various functions.
  • the computer-readable storage medium may also store various application programs and various data, such as training input images, first reference segmented images, second reference segmented images, and various data used and/or generated by the application programs.
  • when some computer instructions are executed by the processor 520, one or more steps in the image processing method described above may be executed.
  • when some other computer instructions are executed by the processor 520, one or more steps in the neural network training method described above may be executed.
  • the image processing device provided by the embodiments of the present disclosure is exemplary rather than restrictive; according to actual application requirements, the image processing device may also include other conventional components or structures. For example, to realize the necessary functions of the image processing device, those skilled in the art may set other conventional components or structures according to specific application scenarios, which are not limited in the embodiments of the present disclosure.
  • FIG. 9 is a schematic diagram of a storage medium provided by an embodiment of the disclosure.
  • the storage medium 600 non-transitorily stores computer-readable instructions 601.
  • when the non-transitory computer-readable instructions 601 are executed by a computer, the instructions of the image processing method or the instructions of the neural network training method provided by any embodiment of the present disclosure can be executed.
  • one or more computer instructions may be stored on the storage medium 600.
  • Some computer instructions stored on the storage medium 600 may be, for example, instructions for implementing one or more steps in the above-mentioned image processing method.
  • the other computer instructions stored on the storage medium may be, for example, instructions for implementing one or more steps in the above-mentioned neural network training method.
  • the storage medium may include the storage components of a tablet computer, the hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), or flash memory, or any combination of the above storage media, and may also be other suitable storage media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An image processing method, an image processing device, a neural network, a neural network training method, and a storage medium. The image processing method includes: acquiring an input image; and processing the input image using a neural network to obtain a first segmented image and a second segmented image. The neural network includes a first encoding and decoding network and a second encoding and decoding network, and the input of the first encoding and decoding network includes the input image. Processing the input image using the neural network to obtain the first segmented image and the second segmented image includes: performing segmentation processing on the input image using the first encoding and decoding network to obtain a first output feature map and the first segmented image; combining the first output feature map with at least one of the input image and the first segmented image to obtain the input of the second encoding and decoding network; and performing segmentation processing on the input of the second encoding and decoding network using the second encoding and decoding network to obtain the second segmented image.

Description

图像处理方法及装置、神经网络及训练方法、存储介质
技术领域
本公开的实施例涉及一种图像处理方法、图像处理装置、神经网络、神经网络的训练方法以及存储介质。
背景技术
当前,基于人工神经网络的深度学习技术已经在诸如图像分类、图像捕获和搜索、面部识别、年龄和语音识别等领域取得了巨大进展。深度学习的优势在于可以利用通用的结构以相对类似的系统解决非常不同的技术问题。卷积神经网络(Convolutional Neural Network,CNN)是近年发展起来并引起广泛重视的一种人工神经网络,CNN是一种特殊的图像识别方式,属于非常有效的带有前向反馈的网络。现在,CNN的应用范围已经不仅仅限于图像识别领域,也可以应用在人脸识别、文字识别、图像处理等应用方向。
发明内容
本公开至少一个实施例提供一种图像处理方法,包括:获取输入图像;以及使用神经网络对所述输入图像进行处理,以得到第一分割图像和第二分割图像;其中,所述神经网络包括两个编码解码网络,所述两个编码解码网络包括第一编码解码网络和第二编码解码网络,所述第一编码解码网络的输入包括所述输入图像;使用所述神经网络对所述输入图像进行处理,以得到第一分割图像和第二分割图像,包括:使用所述第一编码解码网络对所述输入图像进行分割处理,以得到第一输出特征图和所述第一分割图像;将所述第一输出特征图与所述输入图像和所述第一分割图像至少之一进行联合,以得到所述第二编码解码网络的输入;使用所述第二编码解码网络对所述第二编码解码网络的输入进行分割处理,以得到所述第二分割图像。
例如,在本公开一些实施例提供的图像处理方法中,所述两个编码解码网络中的每个编码解码网络包括:编码元网络和解码元网络;所述第一 编码解码网络的分割处理包括:使用所述第一编码解码网络的编码元网络对所述输入图像进行编码处理,以得到第一编码特征图;使用所述第一编码解码网络的解码元网络对所述第一编码特征图进行解码处理,以得到所述第一编码解码网络的输出,所述第一编码解码网络的输出包括所述第一分割图像;所述第二编码解码网络的分割处理包括:使用所述第二编码解码网络的编码元网络对所述第二编码解码网络的输入进行编码处理,以得到第二编码特征图;使用所述第二编码解码网络的解码元网络对所述第二编码特征图进行解码处理,以得到所述第二编码解码网络的输出,所述第二编码解码网络的输出包括所述第二分割图像。
例如,在本公开一些实施例提供的图像处理方法中,所述编码元网络包括N个编码子网络和N-1个下采样层,所述N个编码子网络依次连接,每个下采样层用于连接相邻的两个编码子网络,N为整数且N≥2;所述编码元网络的编码处理包括:使用所述N个编码子网络中的第i个编码子网络对所述第i个编码子网络的输入进行处理,以得到所述第i个编码子网络的输出;使用连接所述第i个编码子网络和所述N个编码子网络中的第i+1个编码子网络的下采样层对所述第i个编码子网络的输出进行下采样处理,以得到所述第i个编码子网络的下采样输出;使用所述第i+1个编码子网络对所述第i个编码子网络的下采样输出进行处理,以得到所述第i+1个编码子网络的输出;其中,i为整数且1≤i≤N-1,所述N个编码子网络中的第一个编码子网络的输入包括所述第一编码解码网络或所述第二编码解码网络的输入,除了所述第一个编码子网络之外,所述第i+1个编码子网络的输入包括所述第i个编码子网络的下采样输出,所述第一编码特征图或所述第二编码特征图包括所述N个编码子网络的输出。
例如,在本公开一些实施例提供的图像处理方法中,在N>2的情况下,所述解码元网络包括N-1个解码子网络、N-1个上采样层,所述N-1个解码子网络依次连接,所述N-1个上采样层包括第一上采样层和N-2个第二上采样层,所述第一上采样层用于连接所述N-1个解码子网络中的第1个解码子网络和所述N个编码子网络中的第N个编码子网络,每个第二上采样层用于连接相邻的两个解码子网络;所述解码元网络的解码处理包括:获取所述N-1个解码子网络中的第j个解码子网络的输入;使用所述第j个解码子网络对所述第j个解码子网络的输入进行处理,以得到所述 第j个解码子网络的输出;其中,j为整数且1≤j≤N-1,所述第一编码解码网络或所述第二编码解码网络的输出包括所述N-1个解码子网络中的第N-1个解码子网络的输出;当j=1时,获取所述N-1个解码子网络中的第j个解码子网络的输入包括:利用所述第一上采样层对所述第N个编码子网络的输出进行上采样处理,以得到所述第j个解码子网络的上采样输入;将所述第j个解码子网络的上采样输入与所述N个编码子网络中的第N-j个编码子网络的输出进行联合,作为所述第j个解码子网络的输入;当1<j≤N-1时,获取所述N-1个解码子网络中的第j个解码子网络的输入包括:利用连接所述N-1个解码子网络中的第j个解码子网络和第j-1个解码子网络的第二上采样层对所述第j-1个解码子网络的输出进行上采样处理,以得到所述第j个解码子网络的上采样输入;将所述第j个解码子网络的上采样输入与所述N个编码子网络中的第N-j个编码子网络的输出进行联合,作为所述第j个解码子网络的输入。
例如,在本公开一些实施例提供的图像处理方法中,所述第j个解码子网络的上采样输入的尺寸与所述第N-j个编码子网络的输出的尺寸相同,其中,1≤j≤N-1。
例如,在本公开一些实施例提供的图像处理方法中,在N=2的情况下,所述编码元网络还包括第二个编码子网络,所述解码元网络包括第一个解码子网络、连接所述第一个解码子网络和所述第二个编码子网络的第一上采样层,所述解码元网络的解码处理包括:使用连接所述第一个解码子网络和所述第二个编码子网络的所述第一上采样层对所述第二个编码子网络的输出进行上采样处理,以得到所述第一个解码子网络的上采样输入;将所述第一个解码子网络的上采样输入与所述第一个编码子网络的输出进行联合,作为所述第一个解码子网络的输入,其中,所述第一解码子网络的上采样输入的尺寸与所述第一个编码子网络的输出的尺寸相同;使用所述第一个解码子网络对所述第一个解码子网络的输入进行处理,以得到所述第一个解码子网络的输出;其中,所述编码解码网络的输出包括所述第一个解码子网络的输出。
例如,在本公开一些实施例提供的图像处理方法中,所述N个编码子网络和所述N-1个解码子网络中的每个子网络包括:第一卷积模块和残差模块;每个子网络的处理包括:使用所述第一卷积模块对与所述第一卷积 模块对应的子网络的输入进行处理,以得到第一中间输出;使用所述残差模块对所述第一中间输出进行残差处理,以得到所述子网络的输出。
例如,在本公开一些实施例提供的图像处理方法中,所述残差模块包括多个第二卷积模块;使用所述残差模块对所述第一中间输出进行残差处理,以得到所述子网络的输出,包括:使用所述多个第二卷积模块对所述第一中间输出进行处理,以得到第二中间输出;以及将所述第一中间输出和所述第二中间输出进行残差连接相加处理,以得到所述子网络的输出。
例如,在本公开一些实施例提供的图像处理方法中,所述第一卷积模块和所述多个第二卷积模块中的每一个的处理包括:卷积处理、激活处理和批量标准化处理。
例如,在本公开一些实施例提供的图像处理方法中,所述解码元网络中的每个解码子网络的输入和输出的尺寸相同,所述编码元网络中的每个编码子网络的输入和输出的尺寸相同。
例如,在本公开一些实施例提供的图像处理方法中,每个编码解码网络还包括融合模块;所述第一编码解码网络中的融合模块用于对所述第一输出特征图进行处理,以得到所述第一分割图像;使用所述第二编码解码网络对所述第二编码解码网络的输入进行分割处理,以得到所述第二分割图像包括:使用所述第二编码解码网络对所述第二编码解码网络的输入进行分割处理,以得到第二输出特征图;使用所述第二编码解码网络中的融合模块对所述第二输出特征图进行处理,以得到所述第二分割图像。
例如,在本公开一些实施例提供的图像处理方法中,所述第一分割图像对应所述输入图像的第一区域,所述第二分割图像对应所述输入图像的第二区域,所述输入图像的所述第一区域包围所述输入图像的所述第二区域。
本公开至少一个实施例还提供一种神经网络的训练方法,包括:获取训练输入图像;利用所述训练输入图像对待训练的神经网络进行训练,以得到本公开任一实施例提供的图像处理方法中的所述神经网络。
例如,在本公开一些实施例提供的训练方法中,利用所述训练输入图像对待训练的神经网络进行训练,包括:使用所述待训练的神经网络对所述训练输入图像进行处理,以得到第一训练分割图像和第二训练分割图像;基于所述训练输入图像的第一参考分割图像和第二参考分割图像、以及所 述第一训练分割图像和所述第二训练分割图像,通过系统损失函数计算所述待训练的神经网络的系统损失值;以及基于所述系统损失值对所述待训练的神经网络的参数进行修正;其中,所述第一训练分割图像与所述第一参考分割图像对应,所述第二训练分割图像与所述第二参考分割图像对应。
例如,在本公开一些实施例提供的训练方法中,所述系统损失函数包括第一分割损失函数和第二分割损失函数;所述第一分割损失函数和所述第二分割损失函数中的每个分割损失函数包括:交叉损失函数和相似性损失函数。
例如,在本公开一些实施例提供的训练方法中,所述第一分割损失函数表示为:
L01 = λ11·L11 + λ12·L21
其中，L01表示所述第一分割损失函数，L11表示在所述第一分割损失函数中的交叉损失函数，λ11表示在所述第一分割损失函数中的交叉损失函数的权重，L21表示在所述第一分割损失函数中的相似性损失函数，λ12表示在所述第一分割损失函数中的相似性损失函数的权重；
所述第一分割损失函数中的交叉损失函数L11表示为：
Figure PCTCN2019098928-appb-000001
所述第一分割损失函数中的相似性损失函数L21表示为：
Figure PCTCN2019098928-appb-000002
其中，x_m1n1表示所述第一训练分割图像中位于m1行n1列的像素的值，y_m1n1表示所述第一参考分割图像中位于m1行n1列的像素的值；
所述第二分割损失函数表示为：
L02 = λ21·L12 + λ22·L22
其中，L02表示所述第二分割损失函数，L12表示在所述第二分割损失函数中的交叉损失函数，λ21表示在所述第二分割损失函数中所述交叉损失函数的权重，L22表示在所述第二分割损失函数中的相似性损失函数，λ22表示在所述第二分割损失函数中所述相似性损失函数的权重，
所述第二分割损失函数中的交叉损失函数L12表示为：
Figure PCTCN2019098928-appb-000003
所述第二分割损失函数中的相似性损失函数L22表示为：
Figure PCTCN2019098928-appb-000004
其中，x_m2n2表示所述第二训练分割图像中位于m2行n2列的像素的值，y_m2n2表示所述第二参考分割图像中位于m2行n2列的像素的值。
例如，在本公开一些实施例提供的训练方法中，所述系统损失函数表示为：
L = λ01·L01 + λ02·L02
其中，L01和L02分别表示所述第一分割损失函数和所述第二分割损失函数，λ01和λ02分别表示在所述系统损失函数中所述第一分割损失函数和所述第二分割损失函数的权重。
例如,在本公开一些实施例提供的训练方法中,获取所述训练输入图像,包括:获取原始训练输入图像;以及,对所述原始训练输入图像进行预处理和数据增强处理,以得到所述训练输入图像。
本公开至少一个实施例还提供一种图像处理装置,包括:存储器,用于存储非暂时性计算机可读指令;以及处理器,用于运行所述计算机可读指令,所述计算机可读指令被所述处理器运行时执行本公开任一实施例提供的图像处理方法或执行本公开任一实施例提供的训练方法。
本公开至少一个实施例还提供一种存储介质,非暂时性地存储计算机可读指令,当所述非暂时性计算机可读指令由计算机执行时可以执行本公开任一实施例提供的图像处理方法的指令或可以执行本公开任一实施例提供的训练方法的指令。
本公开至少一个实施例还提供一种神经网络,包括:两个编码解码网络和联合层,所述两个编码解码网络包括第一编码解码网络和第二编码解码网络;其中,所述第一编码网络被配置为对输入图像进行分割处理,以得到第一输出特征图和第一分割图像;所述联合层被配置为将所述第一输出特征图与所述输入图像和所述第一分割图像至少之一进行联合,以得到所述第二编码解码网络的输入;所述第二编码解码网络被配置为对所述第二编码解码网络的输入进行分割处理,以得到所述第二分割图像。
例如,在本公开一些实施例提供的神经网络中,所述两个编码解码网络中的每个编码解码网络包括编码元网络和解码元网络;所述第一编码解码网络的编码元网络被配置为对所述输入图像进行编码处理,以得到第一编码特征图;所述第一编码解码网络的解码元网络被配置为对所述第一编码特征图进行解码处理,以得到所述第一编码解码网络的输出,所述第一编码解码网络的输出包括所述第一分割图像;所述第二编码解码网络的编码元网络被配置为对所述第二编码解码网络的输入进行编码处理,以得到第二编码特征图;所述第二编码解码网络的解码元网络被配置为对所述第二编码特征图进行解码处理,以得到所述第二编码解码网络的输出,所述第二编码解码网络的输出包括所述第二分割图像。
例如,在本公开一些实施例提供的神经网络中,所述编码元网络包括N个编码子网络和N-1个下采样层,所述N个编码子网络依次连接,每个下采样层用于连接相邻的两个编码子网络,N为整数且N≥2;所述N个编码子网络中的第i个编码子网络被配置为对所述第i个编码子网络的输入进行处理,以得到所述第i个编码子网络的输出;连接所述第i个编码子网络和所述N个编码子网络中的第i+1个编码子网络的下采样层被配置为对所述第i个编码子网络的输出进行下采样处理,以得到所述第i个编码子网络的下采样输出;所述第i+1个编码子网络被配置为对所述第i个编码子网络的下采样输出进行处理,以得到所述第i+1个编码子网络的输出;其中,i为整数且1≤i≤N-1,所述N个编码子网络中的第一个编码子网络的输入包括所述第一编码解码网络或所述第二编码解码网络的输入,除了所述第一个编码子网络之外,所述第i+1个编码子网络的输入包括所述第i个编码子网络的下采样输出,所述第一编码特征图或所述第二编码特征图包括所述N个编码子网络的输出。
例如,在本公开一些实施例提供的神经网络中,在N>2的情况下,所述解码元网络包括N-1个解码子网络、N-1个上采样层,所述N-1个解码子网络依次连接,所述N-1个上采样层包括第一上采样层和N-2个第二上采样层,所述第一上采样层用于连接所述N-1个解码子网络中的第1个解码子网络和所述N个编码子网络中的第N个编码子网络,每个第二上采样层用于连接相邻的两个解码子网络;每个编码解码网络还包括与所述解码元网络的N-1个解码子网络对应的N-1个子联合层;所述N-1个解码子 网络中的第j个解码子网络被配置为对所述第j个解码子网络的输入进行处理,以得到所述第j个解码子网络的输出,其中,j为整数且1≤j≤N-1,所述第一编码解码网络或所述第二编码解码网络的输出包括所述N-1个解码子网络中的第N-1个解码子网络的输出;所述第一上采样层被配置为对所述第N个编码子网络的输出进行上采样处理,以得到所述第一个解码子网络的上采样输入;连接所述N-1个解码子网络中的第j个解码子网络和第j-1个解码子网络的第二上采样层被配置为对所述第j-1个解码子网络的输出进行上采样处理,以得到所述第j个解码子网络的上采样输入,其中,j为整数且1<j≤N-1;所述N-1个子联合层中的第j个子联合层被配置为将所述第j个解码子网络的上采样输入与所述N个编码子网络中的第N-j个编码子网络的输出进行联合,作为所述第j个解码子网络的输入,其中,j为整数且1≤j≤N-1。
例如,在本公开一些实施例提供的神经网络中,所述第j个解码子网络的上采样输入的尺寸与所述第N-j个编码子网络的输出的尺寸相同,其中,1≤j≤N-1。
例如,在本公开一些实施例提供的神经网络中,在N=2的情况下,所述编码元网络还包括第二个编码子网络,所述解码元网络包括第一个解码子网络、连接所述第一个解码子网络和所述第二个编码子网络的第一上采样层,每个编码解码网络还包括与所述解码元网络的第一个解码子网络对应的第一个子联合层;连接所述第一个解码子网络和所述第二个编码子网络的所述第一上采样层被配置为对所述第二个编码子网络的输出进行上采样处理,以得到所述第一个解码子网络的上采样输入;所述第一个子联合层被配置为将所述第一个解码子网络的上采样输入与所述第一个编码子网络的输出进行联合,作为所述第一个解码子网络的输入,其中,所述第一解码子网络的上采样输入的尺寸与所述第一个编码子网络的输出的尺寸相同;所述第一个解码子网络被配置为对所述第一个解码子网络的输入进行处理,以得到所述第一个解码子网络的输出;其中,所述第一编码解码网络或所述第二编码解码网络的输出包括所述第一个解码子网络的输出。
例如,在本公开一些实施例提供的神经网络中,所述N个编码子网络和所述N-1个解码子网络中的每个子网络包括:第一卷积模块和残差模块;所述第一卷积模块被配置为对与所述第一卷积模块对应的子网络的输入进 行处理,以得到第一中间输出;所述残差模块被配置为对所述第一中间输出进行残差处理,以得到所述子网络的输出。
例如,在本公开一些实施例提供的神经网络中,所述残差模块包括多个第二卷积模块和残差相加层;所述多个第二卷积模块被配置为对所述第一中间输出进行处理,以得到第二中间输出;所述残差相加层被配置为将所述第一中间输出和所述第二中间输出进行残差连接相加处理,以得到所述子网络的输出。
例如,在本公开一些实施例提供的神经网络中,所述第一卷积模块和所述多个第二卷积模块中的每一个包括:卷积层、激活层和批量标准化层;所述卷积层被配置为进行卷积处理,所述激活层被配置为进行激活处理,所述批量标准化层被配置为进行批量标准化处理。
例如,在本公开一些实施例提供的神经网络中,所述解码元网络中的每个解码子网络的输入和输出的尺寸相同,所述编码元网络中的每个编码子网络的输入和输出的尺寸相同。
例如,在本公开一些实施例提供的神经网络中,每个编码解码网络还包括融合模块;所述第一编码解码网络中的融合模块被配置为对所述第一输出特征图进行处理,以得到所述第一分割图像;所述第二编码解码网络被配置为对所述第二编码解码网络的输入进行分割处理,以得到所述第二分割图像,包括:所述第二编码解码网络被配置为对所述第二编码解码网络的输入进行分割处理,以得到第二输出特征图;所述第二编码解码网络中的融合模块被配置为对所述第二输出特征图进行处理,以得到所述第二分割图像。
附图说明
为了更清楚地说明本公开实施例的技术方案,下面将对实施例的附图作简单地介绍,显而易见地,下面描述中的附图仅仅涉及本公开的一些实施例,而非对本公开的限制。
图1为本公开一些实施例提供的一种图像处理方法的流程图;
图2为本公开一些实施例提供的一种对应于图1所示的图像处理方法中的神经网络的示意性架构框图;
图3为本公开一些实施例提供的另一种对应于图1所示的图像处理方 法中的神经网络的示意性架构框图;
图4为本公开一些实施例提供的一种对应于图1所示的图像处理方法中的步骤S200的示例性流程图;
图5为本公开一些实施例提供的一种输入图像中的第一区域和第二区域的示意图;
图6为本公开一些实施例提供的一种神经网络的训练方法的流程图;
图7为本公开一些实施例提供的一种对应于图6中所示的训练方法中的步骤S400的示例性流程图;
图8为本公开一实施例提供的一种图像处理装置的示意性框图;以及
图9为本公开一实施例提供的一种存储介质的示意图。
具体实施方式
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例的附图,对本公开实施例的技术方案进行清楚、完整地描述。显然,所描述的实施例是本公开的一部分实施例,而不是全部的实施例。基于所描述的本公开的实施例,本领域普通技术人员在无需创造性劳动的前提下所获得的所有其他实施例,都属于本公开保护的范围。
除非另外定义,本公开使用的技术术语或者科学术语应当为本公开所属领域内具有一般技能的人士所理解的通常意义。本公开中使用的“第一”、“第二”以及类似的词语并不表示任何顺序、数量或者重要性,而只是用来区分不同的组成部分。“包括”或者“包含”等类似的词语意指出现该词前面的元件或者物件涵盖出现在该词后面列举的元件或者物件及其等同,而不排除其他元件或者物件。“连接”或者“相连”等类似的词语并非限定于物理的或者机械的连接,而是可以包括电性的连接,不管是直接的还是间接的。“上”、“下”、“左”、“右”等仅用于表示相对位置关系,当被描述对象的绝对位置改变后,则该相对位置关系也可能相应地改变。
下面通过几个具体的实施例对本公开进行说明。为了保持本公开实施例的以下说明清楚且简明,本公开省略了已知功能和已知部件的详细说明。当本公开实施例的任一部件在一个以上的附图中出现时,该部件在每个附图中由相同或类似的参考标号表示。
图像分割是图像处理领域的研究热点。图像分割是一种将图像分成若 干个特定的、具有独特性质的区域并提取感兴趣目标的技术。医学图像分割是图像分割的一个重要应用领域。医学图像分割是指从医学图像中提取感兴趣组织的区域或边界,使所提取的组织能够与其他组织明显地区别开来。医学图像分割对组织定量分析、制定手术计划和计算机辅助诊断等具有重要的意义。在医学领域,深度学习神经网络可以用于医学图像分割,其可以提升图像分割的准确性,减少抽取特征的时间,提高计算效率。医学图像分割可以用于提取感兴趣区域,以便于对医学图像进行分析和识别。
需要说明的是,本公开以医学图像为例进行的示意性说明,其它涉及到图像分割需求的领域依旧可以适用本公开实施例提供的技术方案。
需要说明的是,在本公开中,卷积层、下采样层和上采样层等这些层每个都指代对应的处理操作,即卷积处理、下采样处理、上采样处理等,所描述的模块、子网络等也都指代对应的处理操作,以下不再重复说明。
本公开至少一个实施例提供一种图像处理方法。该图像处理方法包括:获取输入图像;以及使用神经网络对输入图像进行处理,以得到第一分割图像和第二分割图像。该神经网络包括两个编码解码网络,该两个编码解码网络包括第一编码解码网络和第二编码解码网络,第一编码解码网络的输入包括输入图像。使用神经网络对输入图像进行处理,以得到第一分割图像和第二分割图像,包括:使用第一编码解码网络对输入图像进行分割处理,以得到第一输出特征图和第一分割图像;将第一输出特征图与输入图像和第一分割图像至少之一进行联合,以得到第二编码解码网络的输入;使用第二编码解码网络对第二编码解码网络的输入进行分割处理,以得到第二分割图像。
本公开的一些实施例还提供对应于上述图像处理方法的图像处理装置、神经网络、神经网络的训练方法以及存储介质。
本公开的实施例提供的图像处理方法通过先得到第一分割图像,再基于第一分割图像得到第二分割图像,可以提高鲁棒性、具有较高的泛化性和精度,且对于不同光环境和成像设备获取的图像具有更稳定的分割结果;同时,采用端到端的卷积神经网络模型,可以减少手工操作。
下面结合附图对本公开的实施例及其示例进行详细说明。
图1为本公开一些实施例提供的一种图像处理方法的流程图。例如,如图1所示,该图像处理方法包括步骤S100和步骤S200。
步骤S100:获取输入图像;
步骤S200:使用神经网络对输入图像进行处理,以得到第一分割图像和第二分割图像。
例如,在步骤S100中,输入图像可以为各种类型的图像,例如包括但不限于医学图像。例如,按照获取医学图像的设备划分,医学图像可以包括超声图像、X射线计算机断层摄影(Computed Tomography,CT)、核磁共振(Magnetic Resonance Imaging,MRI)图像、数字血管剪影(Digital Subtraction Angiography,DSA)和正电子发射断层摄影(Positron Emission Computed Tomography PET)等。按照医学图像的内容划分,医学图像可以包括脑组织核磁共振图像、脊髓核磁共振图像、眼底图像、血管图像、胰腺CT图像和肺部CT图像等。
例如,输入图像可以通过图像采集装置获取。当输入图像为医学图像时,图像采集装置例如可以包括超声设备、X线设备、核磁共振设备、核医学设备、医用光学设备以及热成像设备等,本公开的实施例对此不作限制。
需要说明的是,输入图像也可以为人物图像、动植物图像或风景图像等,输入图像也可以通过智能手机的摄像头、平板电脑的摄像头、个人计算机的摄像头、数码照相机的镜头、监控摄像头或者网络摄像头等图像采集装置获取。
例如,输入图像可以为灰度图像,也可以为彩色图像。例如,输入图像的尺寸可以根据实施需要进行设置,本公开的实施例对此不作限制。
例如,输入图像可以是图像采集装置直接采集到的原始图像,也可以是对原始图像进行预处理之后获得的图像。例如,为了避免输入图像的数据质量、数据不均衡等对于图像分割精度的影响,在步骤S100之前,本公开实施例提供的图像处理方法还可以包括对输入图像进行预处理的操作。预处理可以消除输入图像中的无关信息或噪声信息,以便于更好地对输入图像进行分割。
例如,在步骤S200中,使用神经网络对输入图像进行分割处理,即从输入图像中分割出一种物体(例如,器官或组织)的形状,以得到对应的分割图像。例如,在本公开的一些实施例中,以输入图像包括医学图像(例如,眼底图像、肺部CT图像等)为例,第一分割图像可以对应输入图像的 第一区域,例如第一分割图像对应医学图像中的一种器官或组织(例如,眼底图像中的视盘、肺部CT图像中的肺等);第二分割图像可以对应输入图像的第二区域,例如,输入图像的第一区域包围输入图像的第二区域,例如第二分割图像对应前述器官或组织中的一种结构或病灶等(例如,眼底图像中的视杯、肺部CT图像中的肺结节等)。例如,第一分割图像和第二分割图像可以用于医疗诊断,例如,可以用于青光眼(基于视盘和视杯的分割)、早期肺癌(基于肺和肺结节的分割)等的筛查和诊断。
图2为本公开一些实施例提供的一种对应于图1所示的图像处理方法中的神经网络的示意性架构框图,图3为本公开一些实施例提供的另一种对应于图1所示的图像处理方法中的神经网络的示意性架构框图,图4为本公开一些实施例提供的一种对应于图1所示的图像处理方法中的步骤S200的示例性流程图。以下,结合图2、图3和图4,对图1所示的图像处理方法中的步骤S200进行详细说明。
结合图2、图3和图4所示,本公开的实施例提供的图像处理方法中的神经网络可以包括两个编码解码网络,该两个编码解码网络包括第一编码解码网络UN1和第二编码解码网络UN2。例如,如图2和图3所示,第一编码解码网络UN1和第二编码解码网络UN2均可以为U型网络(U-net),本公开的实施例对此不作限制。例如,第一编码解码网络UN1的输入包括输入图像。例如,如图4所示,使用神经网络对输入图像进行处理,以得到第一分割图像和第二分割图像,即步骤S200,包括步骤S210至步骤S230。
步骤S210:使用第一编码解码网络对输入图像进行分割处理,以得到第一输出特征图和第一分割图像。
例如,如图2和图3所示,第一编码解码网络UN1包括编码元网络LN1和解码元网络RN1。对应地,第一编码解码网络UN1的分割处理包括:使用第一编码解码网络UN1的编码元网络LN1对输入图像(即第一编码解码网络的输入)进行编码处理,以得到第一编码特征图F1;使用第一编码解码网络UN1的解码元网络RN1对第一编码特征图F1进行解码处理,以得到第一编码解码网络UN1的输出。例如,如图2和图3所示,第一编码解码网络UN1的输出包括第一分割图像;例如,如图2和图3所示,第一编码解码网络UN1的输出还可以包括第一输出特征图 F01,第一输出特征图F01可以用于第二编码解码网络UN2的处理。
例如,如图2和图3所示,编码元网络LN1可以包括N个编码子网络SLN1和N-1个下采样层DS,其中,N为整数且N≥2。该N个编码子网络SLN1依次连接,且每个下采样层DS用于连接相邻的两个编码子网络SLN1,也就是说,任意相邻的两个编码子网络SLN1通过一个对应的下采样层DS连接。例如,图2中示出了N>2的情形,而图3中示出了N=2的情形。需要说明的是,图2中示出的是N=4的情形,但不应视为对本公开的限制。如图2所示,在第一编码解码网络UN1的编码元网络LN1中,从上至下(即从靠近输入图像的一侧至远离输入图像的一侧),编码元网络LN1依次包括第一个编码子网络、第二个编码子网络、第三个编码子网络和第四个编码子网络;如图3所示,在第一编码解码网络UN1的编码元网络LN1中,从上至下,编码元网络LN1依次包括第一个编码子网络和第二个编码子网络。
下采样层用于进行下采样处理。一方面,下采样层可以用于缩减输入图像的规模,简化计算的复杂度,在一定程度上减小过拟合的现象;另一方面,下采样层也可以进行特征压缩,提取输入图像的主要特征。下采样层能够减少特征图像的尺寸,但不改变特征图像的数量。例如,下采样处理用于减小特征图像的尺寸,从而减少特征图的数据量。例如,下采样层可以采用最大值合并(max pooling)、平均值合并(average pooling)、跨度卷积(strided convolution)、欠采样(decimation,例如选择固定的像素)、解复用输出(demuxout,将输入图像拆分为多个更小的图像)等下采样方法实现下采样处理。
例如,如图2和图3所示,编码元网络LN1的编码处理包括:使用N个编码子网络SLN1中的第i个编码子网络对第i个编码子网络的输入进行处理,以得到第i个编码子网络的输出;使用连接第i个编码子网络和N个编码子网络SLN1中的第i+1个编码子网络的下采样层DS对第i个编码子网络的输出进行下采样处理,以得到第i个编码子网络的下采样输出;使用第i+1个编码子网络对第i个编码子网络的下采样输出进行处理,以得到第i+1个编码子网络的输出;其中,i为整数且1≤i≤N-1,N个编码子网络SLN1中的第一个编码子网络的输入包括第一编码解码网络UN1的输入,除了第一个编码子网络之外,第i+1个编码子网络的输入包括第i 个编码子网络SLN1的下采样输出,第一编码特征图F1包括编码元网络LN1中的N个编码子网络SLN1的输出,也就是说,第一编码特征图F1包括第一个编码子网络的输出、第二个编码子网络的输出、第三个编码子网络的输出和第四个编码子网络的输出。
例如,在一些示例中,每个编码子网络SLN1的输入和输出的尺寸相同。
例如,如图2和图3所示,与编码元网络LN1的结构对应,解码元网络RN1包括N-1个解码子网络SRN1和N-1个上采样层。如图2所示,在第一编码解码网络UN1的解码元网络RN1中,从下至上,解码元网络RN1依次包括第一个解码子网络、第二个解码子网络和第三个解码子网络;如图3所示,在第一编码解码网络UN1的解码元网络RN1中,编码元网络RN1包括第一个解码子网络。
上采样层用于进行上采样处理。例如,上采样处理用于增大特征图像的尺寸,从而增加特征图的数据量。例如,上采样层可以采用跨度转置卷积(strided transposed convolution)、插值算法等上采样方法实现上采样处理。插值算法例如可以包括内插值、双线性插值、两次立方插值(Bicubic Interprolation)等算法。
例如,如图2所示,在N>2的情况下,N-1个解码子网络SRN1依次连接,N-1个上采样层包括第一上采样层US1和N-2个第二上采样层US2,第一上采样层US1用于连接N-1个解码子网络SRN1中的第1个解码子网络和N个编码子网络SLN1中的第N个编码子网络,每个第二上采样层US2用于连接相邻的两个解码子网络,也就是说,任意相邻的两个解码子网络SRN1通过一个对应的第二上采样层US2连接。从而,如图2所示,在N>2的情况下,解码元网络RN1的解码处理包括:获取N-1个解码子网络SRN1中的第j个解码子网络的输入;使用第j个解码子网络对第j个解码子网络的输入进行处理,以得到第j个解码子网络的输出;其中,j为整数且1≤j≤N-1,第一编码解码网络UN1的输出包括N-1个解码子网络SRN1中的第N-1个解码子网络的输出。例如,如图2所示,在N>2的情况下,N-1个解码子网络SRN1中的第N-1个解码子网络(图2所示的示例中为第三个解码子网络)的输出即为第一输出特征图F01。
例如,如图2所示,在N>2的情况下,当j=1时,获取N-1个解码 子网络SRN1中的第j个解码子网络(即第一个解码子网络)的输入包括:利用第一上采样层US1对第N个编码子网络(图2所示的示例中为第四个解码子网络)的输出进行上采样处理,以得到第j个解码子网络的上采样输入;将第j个解码子网络的上采样输入与N个编码子网络SLN1中的第N-j个编码子网络(图2所示的示例中为第三个解码子网络)的输出进行联合(concatenate,如图中CONCAT所示),作为第j个解码子网络的输入。当1<j≤N-1时,获取N-1个解码子网络中的第j个解码子网络的输入包括:利用连接N-1个解码子网络SRN1中的第j个解码子网络和第j-1个解码子网络的第二上采样层US2对第j-1个解码子网络的输出进行上采样处理,以得到第j个解码子网络的上采样输入;将第j个解码子网络的上采样输入与N个编码子网络SRN1中的第N-j个编码子网络的输出进行联合,作为第j个解码子网络的输入。
例如,第j个解码子网络的上采样输入的尺寸与N个编码子网络SLN1中的第N-j个编码子网络的输出的尺寸相同,其中,1≤j≤N-1。例如,以第j个解码子网络的上采样输入和N个编码子网络SLN1中的第N-j个编码子网络的输出包括的特征图均为H行W列的矩阵为例,第j个解码子网络的上采样输入包括的特征图的数量为C1,N个编码子网络SLN1中的第N-j个编码子网络的输出包括的特征图的数量为C2,则第j个解码子网络的上采样输入和N个编码子网络SLN1中的第N-j个编码子网络的输出的特征图模型分别为(C1,H,W)、和(C2,H,W)。从而,将第j个解码子网络的上采样输入与N个编码子网络SRN1中的第N-j个编码子网络的输出进行联合,得到的第j个解码子网络的输入的特征图模型为(C1+C2,H,W)。第j个解码子网络的输入包括的特征图的数量为C1+C2,本公开对第j个解码子网络的输入的特征图模型中各个特征图的排列顺序不作限制。需要说明的是,本公开的实施例包括但不限于此。
需要说明的是,在本公开的实施例中,“连接”可以表示在信号(例如,特征图)传输的方向上将两个功能对象(例如,子网络、下采样层、上采样层等)中的在前的一个功能对象的输出作为在后的另一个功能对象的输入。
例如,如图3所示,在N=2的情况下,编码元网络LN1包括第一个编码子网络、第二个编码子网络以及连接第一个编码子网络和第二个编码 子网络的下采样层DS,解码元网络RN1包括第一个解码子网络、连接第一个解码子网络和第二个编码子网络的第一上采样层US1。从而,如图3所示,在N=2的情况下,解码元网络RN1的解码处理包括:使用连接第一个解码子网络和第二个编码子网络的第一上采样层US1对第二个编码子网络的输出进行上采样处理,以得到第一个解码子网络的上采样输入;将第一个解码子网络的上采样输入与第一个编码子网络的输出进行联合,作为第一个解码子网络的输入,其中,第一解码子网络的上采样输入的尺寸与第一个编码子网络的输出的尺寸相同;使用第一个解码子网络对第一个解码子网络的输入进行处理,以得到第一个解码子网络的输出;其中,第一编码解码网络UN1的输出包括第一个解码子网络的输出。例如,如图3所示,在N=2的情况下,第一个解码子网络的输出即为第一输出特征图F01。
需要说明的是,在本公开的实施例中,编码元网络LN1中的下采样层的数目和解码元网络RN1中的上采样层的数目相等。例如,可以认为:编码元网络LN1中的第一个下采样层和解码元网络RN1中的倒数第一个上采样层位于同一层级,编码元网络LN1中的第二个下采样层和解码元网络RN1中的倒数第二个上采样层位于同一层级,……,以此类推,编码元网络LN1中的最后一个下采样层和解码元网络RN1中的第一个上采样层位于同一层级。例如,在图2所示的示例中,用于连接第一个编码子网络和第二个编码子网络的下采样层与用于连接第二个解码子网络和第三个解码子网络的上采样层位于同一层级,用于连接第二个编码子网络和第三个编码子网络的下采样层与用于连接第一个解码子网络和第二个解码子网络的上采样层位于同一层级,用于连接第三个编码子网络和第四个编码子网络的下采样层与用于连接第一个解码子网络和第四个编码子网络的上采样层位于同一层级。则对于位于同一层级的下采样层和上采样层,该下采样层的下采样因子(例如,相应地,2×2的下采样因子)与该上采样层的上采样因子(例如,相应地,2×2的上采样因子)对应,即:当该下采样层的下采样因子为1/y时,则该上采样层的上采样因子为y,其中y为正整数,且y通常大于等于2。从而,可以使第j个解码子网络的上采样输入的尺寸与N个编码子网络SLN1中的第N-j个编码子网络的输出的尺寸相同,其中,N为整数且N≥2,j为整数且1≤j≤N-1。
例如,如图2和3所示,编码元网络LN1的N个编码子网络SLN1和解码元网络RN1中的N-1个解码子网络SRN1中的每个子网络可以包括第一卷积模块CN1和残差模块RES。从而,如图2和3所示,每个子网络的处理包括:使用第一卷积模块CN1对与第一卷积模块CN1对应的子网络的输入进行处理,以得到第一中间输出;使用残差模块RES对第一中间输出进行残差处理,以得到该子网络的输出。
例如,如图2和3所示,残差模块RES可以包括多个第二卷积模块CN2,例如每个残差模块RES包括的第二卷积模块CN2的数量可以为2,但本公开不限于此。从而,如图2和图3所示,使用残差模块RES对第一中间输出进行残差处理,以得到该子网络的输出,包括:使用多个第二卷积模块CN2对第一中间输出进行处理,以得到第二中间输出;以及将第一中间输出和第二中间输出进行残差连接相加处理(如图中ADD所示),以得到该残差模块RES的输出,即该子网络的输出。例如,如图2和图3所示,每个编码子网络的输出为第一编码特征图F1。
例如,第一中间输出的尺寸与第二中间输出的尺寸相同,从而经过残差连接相加后,该残差模块RES的输出(即对应的子网络的输出)的尺寸与该残差模块RES的输入(即对应的第一中间输出)的尺寸相同。
例如,在一些示例中,上述第一卷积模块CN1和第二卷积模块CN2中的每一个卷积模块均可以包括卷积层,激活层和批量标准化层(Batch Normalization Layer),从而,每一个卷积模块的处理可以包括:卷积处理、激活处理和批量标准化处理。
卷积层是卷积神经网络的核心层。卷积层可以对其输入(例如,输入图像)应用若干个卷积核(也称为滤波器),以提取该输入的多种类型的特征。例如,卷积层可以包括3×3的卷积核。卷积层可以包括多个卷积核,每个卷积核可以提取一种类型的特征。卷积核一般以随机小数矩阵的形式初始化,在卷积神经网络的训练过程中卷积核将通过学习以得到合理的权值。对输入图像应用一个卷积核之后得到的结果被称为特征图(feature map),特征图的数目与卷积核的数目相等。每个特征图由一些矩形排列的神经元组成,同一特征图的神经元共享权值,这里共享的权值就是卷积核。一个层级的卷积层输出的特征图像可以被输入到相邻的下一个层级的卷积层并再次处理以得到新的特征图像。
例如,激活层包括激活函数,激活函数用于给卷积神经网络引入非线性因素,以使卷积神经网络可以更好地解决较为复杂的问题。激活函数可以包括线性修正单元(ReLU)函数、S型函数(Sigmoid函数)或双曲正切函数(tanh函数)等。ReLU函数为非饱和非线性函数,Sigmoid函数和tanh函数为饱和非线性函数。例如,激活层可以单独作为卷积神经网络的一层,或者激活层也可以被包含在卷积层中。
例如,批量标准化层用于对特征图进行批量标准化处理,以使特征图像的像素的灰度值在预定范围内变化,从而降低计算难度,提高对比度。例如,预定范围可以为[-1,1]。例如,批量标准化层的处理方式可以参考常见的批量标准化处理的过程,在此不再赘述。
例如,在一些示例中,第一卷积模块CN1的输入和输出的尺寸相同,从而,编码元网络LN1中的每个编码子网络的输入和输出的尺寸相同,解码元网络RN1中的每个解码子网络的输入和输出的尺寸相同。
例如,如图2和图3所示,第一编码解码网络UN1还可以包括融合模块MG。第一编码解码网络UN1中的融合模块MG用于对第一输出特征图F01进行处理,以得到第一分割图像。例如,在一些示例中,第一编码解码网络UN1中的融合模块MG可以采用1×1卷积核对第一输出特征图F01进行处理,以得到第一分割图像;需要说明的是,本公开的实施例包括但不限于此。
步骤S220:将第一输出特征图与输入图像和第一分割图像至少之一进行联合,以得到第二编码解码网络的输入。
例如,第一输出特征图F01的尺寸与输入图像的尺寸相同。例如,对第一输出特征图F01与输入图像或/和第一分割图像(即输入图像和第一分割图像至少之一)进行联合操作的过程,可以参考前述将第j个解码子网络的上采样输入与N个编码子网络SRN1中的第N-j个编码子网络的输出进行联合的相关描述,在此不再重复赘述。
步骤S230:使用第二编码解码网络对第二编码解码网络的输入进行分割处理,以得到第二分割图像。
例如,如图2和图3所示,第二编码解码网络UN2包括编码元网络LN2和解码元网络RN2。对应地,第二编码解码网络UN2的分割处理包括:使用第二编码解码网络UN2的编码元网络LN2对第二编码解码网络 的输入进行编码处理,以得到第二编码特征图F2;使用第二编码解码网络UN2的解码元网络RN2对第二编码特征图F2进行解码处理,以得到第二编码解码网络UN2的输出。第二编码特征图F2包括编码元网络LN2中的N个编码子网络SLN1的输出。例如,如图2和图3所示,第二编码解码网络UN2的输出可以包括第二分割图像。
例如,如图2和图3所示,第二编码解码网络UN2的编码元网络LN2和解码元网络RN2的结构和处理过程可以分别对应参考前述关于第一编码解码网络UN1的编码元网络LN1和解码元网络RN1的结构和处理过程的相关描述,在此不再重复赘述。
需要说明的是,虽然图2和图3中示出的均是第二编码解码网络UN2和第一编码解码网络UN1具有相同结构(即包括相同数量的编码子网络和相同数量的解码子网络)的情形,但是本公开的实施例不限于此。也就是说,第二编码解码网络UN2也可以和第一编码解码网络UN1具有相似结构,但第二编码解码网络UN2包括的编码子网络的数量与第一编码解码网络UN1包括的编码子网络的数量可以不同。
例如,如图2和图3所示,第二编码解码网络UN2还可以包括融合模块MG。例如,使用第二编码解码网络UN2对第二编码解码网络UN2的输入进行分割处理,以得到第二分割图像,包括:使用第二编码解码网络UN2对第二编码解码网络UN2的输入进行分割处理,以得到第二输出特征图F02;使用第二编码解码网络UN2中的融合模块MG对第二输出特征图F02进行处理,以得到第二分割图像。
例如,如图2和图3所示,第二编码解码网络UN2中的融合模块MG用于对第二输出特征图F02进行处理,以得到第二分割图像。例如,在一些示例中,第二编码解码网络UN2中的融合模块MG可以采用1×1卷积核对第二输出特征图F02进行处理,以得到第二分割图像;需要说明的是,本公开的实施例包括但不限于此。
例如,在一些示例中,第一分割图像对应输入图像的第一区域,第二分割图像对应输入图像的第二区域。图5为本公开一些实施例提供的一种输入图像中的第一区域和第二区域的示意图。例如,如图5所示,输入图像的第一区域R1包围输入图像的第二区域R2,也就是说,第二区域R2位于第一区域R1内。例如,在此情况下,第一分割图像和第二分割图像 可以用于医疗诊断等,例如,可以用于青光眼(基于视盘和视杯的分割,其中,第一区域对应视盘,第二区域对应视杯)、早期肺癌(基于肺和肺结节的分割,其中,第一区域对应肺,第二区域对应肺结节)等的筛查和诊断。例如,用于青光眼的筛查和诊断时,可以基于视盘和视杯的分割,计算视杯/视盘的面积比(即杯盘比),并根据该面积比与预设阈值的相对大小进行筛查和诊断,在此不再赘述。需要说明的是,本公开的实施例包括但不限于此。
需要说明的是,图5所示的输入图像中的第一区域R1和第二区域R2的形状和大小均是示意性的,本公开的实施例对此不作限制。另外,应当理解的是,输入图像中的第一区域可以包括一个连通的区域(如图5所示),此时,输入图像中的第二区域可以包括一个连通的区域(如图5所示),也可以包括几个分立的区域;输入图像中的第一区域也可以包括多个分立的第一子区域,此时,输入图像中的第二区域可以包括一个连通的区域(位于一个第一子区域内),也可以包括多个分立的第二子区域(位于一个第一子区域内或位于几个分立的第一子区域内)。还需要说明的是,第二区域位于第一区域内,可以包括第二区域的边缘与第一区域的边缘没有交叠的情形,也可以包括第二区域的边缘与第一区域的边缘至少部分交叠的情形,本公开的实施例对此不作限制。
需要说明的是,在本公开的实施例(不限于本实施例)中,相同或相似的功能对象可以具有相同或相似的结构或处理过程,但是相同或相似的功能对象对应的参数可以相同,也可以不同。本公开的实施例对此不作限制。
本公开的实施例提供的图像处理方法,通过先得到第一分割图像,再基于第一分割图像得到第二分割图像,可以提高鲁棒性、具有较高的泛化性和精度,且对于不同光环境和成像设备获取的图像具有更稳定的分割结果;同时,采用端到端的卷积神经网络模型,可以减少手工操作。
本公开至少一实施例还提供一种神经网络,该神经网络可以用于执行上述实施例提供的图像处理方法。例如,该神经网络的结构可以参考图2或图3所示的神经网络的架构。如图2和图3所示,本公开的实施例提供的神经网络包括两个编码解码网络,该两个编码解码网络包括第一编码解码网络UN1和第二编码解码网络UN2;该神经网络还包括联合层(如图2 和图3中用于连接第一编码解码网络UN1和第二编码解码网络UN2的CONCAT所示)。例如,如图2和图3所示,第一编码解码网络UN1和第二编码解码网络UN2均可以为U型网络(U-net),本公开的实施例对此不作限制。例如,第一编码解码网络UN1的输入包括输入图像。例如,该神经网络被配置为对输入图像进行处理,以得到第一分割图像和第二分割图像。
例如,如图2和图3所示,第一编码网络UN1被配置为对输入图像进行分割处理,以得到第一输出特征图F01和第一分割图像。
例如,如图2和图3所示,第一编码解码网络UN1包括编码元网络LN1和解码元网络RN1。第一编码解码网络UN1的编码元网络LN1被配置为对输入图像(即第一编码解码网络的输入)进行编码处理,以得到第一编码特征图F1;第一编码解码网络UN1的解码元网络RN1被配置为对第一编码特征图F1进行解码处理,以得到第一编码解码网络UN1的输出。例如,如图2和图3所示,第一编码解码网络UN1的输出包括第一分割图像;例如,如图2和图3所示,第一编码解码网络UN1的输出还可以包括第一输出特征图F01,第一输出特征图F01可以用于第二编码解码网络UN2的处理。
例如,如图2和图3所示,编码元网络LN1可以包括N个编码子网络SLN1和N-1个下采样层DS,其中,N为整数且N≥2。该N个编码子网络SLN1依次连接,且每个下采样层DS用于连接相邻的两个编码子网络SLN1,也就是说,任意相邻的两个编码子网络SLN1通过一个对应的下采样层DS连接。例如,图2中示出了N>2的情形,而图3中示出了N=2的情形。需要说明的是,图2中示出的是N=4的情形,但不应视为对本公开的限制。如图2所示,在第一编码解码网络UN1的编码元网络LN1中,从上至下(即从靠近输入图像的一侧至远离输入图像的一侧),编码元网络LN1依次包括第一个编码子网络、第二个编码子网络、第三个编码子网络和第四个编码子网络;如图3所示,在第一编码解码网络UN1的编码元网络LN1中,从上至下,编码元网络LN1依次包括第一个编码子网络和第二个编码子网络。
例如,如图2和图3所示,N个编码子网络SLN1中的第i个编码子网络被配置为对第i个编码子网络的输入进行处理,以得到第i个编码子 网络的输出;连接第i个编码子网络和N个编码子网络SLN1中的第i+1个编码子网络的下采样层DS被配置为对第i个编码子网络的输出进行下采样处理,以得到第i个编码子网络的下采样输出;第i+1个编码子网络被配置为对第i个编码子网络的下采样输出进行处理,以得到第i+1个编码子网络的输出;其中,i为整数且1≤i≤N-1,N个编码子网络SLN1中的第一个编码子网络的输入包括第一编码解码网络UN1的输入,除了第一个编码子网络之外,第i+1个编码子网络的输入包括第i个编码子网络SLN1的下采样输出,第一编码特征图F1包括编码元网络LN1中的N个编码子网络SLN1的输出,也就是说,第一编码特征图F1包括第一个编码子网络的输出、第二个编码子网络的输出、第三个编码子网络的输出和第四个编码子网络的输出。
例如,在一些示例中,每个编码子网络SLN1的输入和输出的尺寸相同。
例如,如图2和图3所示,与编码元网络LN1的结构对应,解码元网络RN1包括N-1个解码子网络SRN1和N-1个上采样层。如图2所示,在第一编码解码网络UN1的解码元网络RN1中,从下至上,解码元网络RN1依次包括第一个解码子网络、第二个解码子网络和第三个解码子网络;如图3所示,在第一编码解码网络UN1的解码元网络RN1中,编码元网络RN1包括第一个解码子网络。
例如,如图2所示,在N>2的情况下,N-1个解码子网络SRN1依次连接,N-1个上采样层包括第一上采样层US1和N-2个第二上采样层US2,第一上采样层US1用于连接N-1个解码子网络SRN1中的第1个解码子网络和N个编码子网络SLN1中的第N个编码子网络,每个第二上采样层US2用于连接相邻的两个解码子网络,也就是说,任意相邻的两个解码子网络SRN1通过一个对应的第二上采样层US2连接。例如,在此情况下,第一编码解码网络UN1还包括与解码元网络RN1的N-1个解码子网络SRN1对应的N-1个子联合层(如图2中的解码元网络RN1中的CONCAT所示)。
例如,如图2所示,N-1个解码子网络SRN1中的第j个解码子网络被配置为对所述第j个解码子网络的输入进行处理,以得到所述第j个解码子网络的输出,其中,j为整数且1≤j≤N-1,第一编码解码网络UN1 输出包括所述N-1个解码子网络SRN1中的第N-1个解码子网络的输出。例如,如图2所示,在N>2的情况下,N-1个解码子网络SRN1中的第N-1个解码子网络(图2所示的示例中为第三个解码子网络)的输出即为第一输出特征图F01。
例如,如图2所示,第一上采样层US1被配置为对第N个编码子网络的输出进行上采样处理,以得到第一个解码子网络的上采样输入;连接N-1个解码子网络SRN1中的第j个解码子网络和第j-1个解码子网络的第二上采样层US2被配置为对第j-1个解码子网络的输出进行上采样处理,以得到所述第j个解码子网络的上采样输入,其中,j为整数且1<j≤N-1。
例如,如图2所示,N-1个子联合层中的第j个子联合层被配置为将第j个解码子网络的上采样输入与N个编码子网络LN1中的第N-j个编码子网络的输出进行联合,作为第j个解码子网络的输入,其中,j为整数且1≤j≤N-1。
例如,第j个解码子网络的上采样输入的尺寸与N个编码子网络SLN1中的第N-j个编码子网络的输出的尺寸相同,其中,1≤j≤N-1。
例如,如图3所示,在N=2的情况下,编码元网络LN1包括第一个编码子网络、第二个编码子网络以及连接第一个编码子网络和第二个编码子网络的下采样层DS,解码元网络RN1包括第一个解码子网络、连接第一个解码子网络和第二个编码子网络的第一上采样层US1。例如,在此情况下,第一编码解码网络UN1还包括与解码元网络RN1的第一个解码子网络SRN1对应的第一个子联合层(如图3中的解码元网络RN1中的CONCAT所示)。
例如,如图3所示,在N=2的情况下,连接第一个解码子网络和第二个编码子网络的第一上采样层US1被配置为对第二个编码子网络的输出进行上采样处理,以得到第一个解码子网络的上采样输入;第一个子联合层被配置为将第一个解码子网络的上采样输入与第一个编码子网络的输出进行联合,作为第一个解码子网络的输入,其中,第一解码子网络的上采样输入的尺寸与第一个编码子网络的输出的尺寸相同;第一个解码子网络被配置为对第一个解码子网络的输入进行处理,以得到第一个解码子网络的输出;其中,第一编码解码网络UN1的输出包括第一个解码子网络的输出。例如,如图3所示,在N=2的情况下,第一个解码子网络的输出 即为第一输出特征图F01。
需要说明的是,在本公开的实施例中,编码元网络LN1中的下采样层的数目和解码元网络RN1中的上采样层的数目相等。例如,可以认为:编码元网络LN1中的第一个下采样层和解码元网络RN1中的倒数第一个上采样层位于同一层级,编码元网络LN1中的第二个下采样层和解码元网络RN1中的倒数第二个上采样层位于同一层级,……,以此类推,编码元网络LN1中的最后一个下采样层和解码元网络RN1中的第一个上采样层位于同一层级。例如,在图2所示的示例中,用于连接第一个编码子网络和第二个编码子网络的下采样层与用于连接第二个解码子网络和第三个解码子网络的上采样层位于同一层级,用于连接第二个编码子网络和第三个编码子网络的下采样层与用于连接第一个解码子网络和第二个解码子网络的上采样层位于同一层级,用于连接第三个编码子网络和第四个编码子网络的下采样层与用于连接第一个解码子网络和第四个编码子网络的上采样层位于同一层级。则对于位于同一层级的下采样层和上采样层,该下采样层的下采样因子(例如,相应地,2×2的下采样因子)与该上采样层的上采样因子(例如,相应地,2×2的上采样因子)对应,即:当该下采样层的下采样因子为1/y时,则该上采样层的上采样因子为y,其中y为正整数,且y通常大于等于2。从而,可以使第j个解码子网络的上采样输入的尺寸与N个编码子网络SLN1中的第N-j个编码子网络的输出的尺寸相同,其中,N为整数且N≥2,j为整数且1≤j≤N-1。
For example, as shown in FIG. 2 and FIG. 3, each of the N encoding sub-networks SLN1 of the encoding meta-network LN1 and the N-1 decoding sub-networks SRN1 of the decoding meta-network RN1 may include a first convolution module CN1 and a residual module RES. For example, as shown in FIG. 2 and FIG. 3, the first convolution module CN1 is configured to process the input of the sub-network corresponding to the first convolution module CN1 to obtain a first intermediate output; the residual module RES is configured to perform residual processing on the first intermediate output to obtain the output of the sub-network.
For example, as shown in FIG. 2 and FIG. 3, the residual module RES may include a plurality of second convolution modules CN2 and a residual addition layer (denoted by ADD in FIG. 2 and FIG. 3); for example, the number of second convolution modules CN2 included in each residual module RES may be 2, but the present disclosure is not limited thereto. For example, as shown in FIG. 2 and FIG. 3, the plurality of second convolution modules CN2 are configured to process the first intermediate output to obtain a second intermediate output; the residual addition layer is configured to perform residual-connection addition processing on the first intermediate output and the second intermediate output to obtain the output of the residual module RES, i.e., the output of the sub-network. For example, as shown in FIG. 2 and FIG. 3, the outputs of the encoding sub-networks together constitute the first encoded feature map F1.
For example, the size of the first intermediate output is the same as the size of the second intermediate output, so that after the processing of the residual addition layer, the size of the output of the residual module RES (i.e., the output of the corresponding sub-network) is the same as the size of the input of the residual module RES (i.e., the corresponding first intermediate output).
For example, in some examples, each of the first convolution module CN1 and the second convolution modules CN2 may include a convolution layer, an activation layer, and a batch normalization layer. The convolution layer is configured to perform convolution processing, the activation layer is configured to perform activation processing, and the batch normalization layer is configured to perform batch normalization processing; for example, reference may be made to the foregoing related descriptions, which will not be repeated here.
For example, in some examples, the input and the output of the first convolution module CN1 have the same size, so that the input and the output of each encoding sub-network in the encoding meta-network LN1 have the same size, and the input and the output of each decoding sub-network in the decoding meta-network RN1 have the same size.
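A sketch of one such sub-network under the same assumptions follows; 3×3 convolutions with padding and ReLU activations are illustrative choices, while the composition of convolution, activation, and batch normalization and the residual pattern follow the description.

class ConvModule(nn.Module):
    # Convolution module: convolution layer + activation layer + batch normalization layer.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)  # padding keeps sizes equal
        self.act = nn.ReLU(inplace=True)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return self.bn(self.act(self.conv(x)))

class SubNet(nn.Module):
    # Encoding/decoding sub-network: first convolution module CN1 + residual module RES.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.cn1 = ConvModule(in_ch, out_ch)
        # Residual module RES: two second convolution modules CN2 followed by ADD.
        self.cn2 = nn.Sequential(ConvModule(out_ch, out_ch), ConvModule(out_ch, out_ch))

    def forward(self, x):
        first = self.cn1(x)        # first intermediate output
        second = self.cn2(first)   # second intermediate output (same size as the first)
        return first + second      # ADD: residual-connection addition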
For example, as shown in FIG. 2 and FIG. 3, the first encoding and decoding network UN1 may further include a merge module MG. The merge module MG in the first encoding and decoding network UN1 is configured to process the first output feature map F01 to obtain the first segmented image. For example, in some examples, the merge module MG in the first encoding and decoding network UN1 may process the first output feature map F01 with a 1×1 convolution kernel to obtain the first segmented image; it should be noted that the embodiments of the present disclosure include but are not limited to this.
For example, as shown in FIG. 2 and FIG. 3, the concatenation layer is configured to combine the first output feature map F01 with at least one of the input image and the first segmented image to obtain the input of the second encoding and decoding network. For example, the size of the first output feature map F01 is the same as the size of the input image.
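The merge module MG and the concatenation layer can be sketched as follows; the single-channel sigmoid output of the 1×1 convolution and the choice to concatenate all three tensors are assumptions (the disclosure only requires combining F01 with at least one of the input image and the first segmented image).

class MergeModule(nn.Module):
    # Merge module MG: a 1x1 convolution mapping an output feature map to a segmented image.
    def __init__(self, in_ch, num_classes=1):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, num_classes, kernel_size=1)

    def forward(self, f):
        return torch.sigmoid(self.conv1x1(f))  # per-pixel segmentation probabilities

def concat_layer(f01, input_image, first_seg):
    # CONCAT: channel-wise combination forming the input of the second encoding and decoding network.
    return torch.cat([f01, input_image, first_seg], dim=1)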
For example, as shown in FIG. 2 and FIG. 3, the second encoding and decoding network UN2 is configured to perform segmentation processing on the input of the second encoding and decoding network to obtain the second segmented image.
For example, as shown in FIG. 2 and FIG. 3, the second encoding and decoding network UN2 includes an encoding meta-network LN2 and a decoding meta-network RN2. The encoding meta-network LN2 of the second encoding and decoding network UN2 is configured to perform encoding processing on the input of the second encoding and decoding network to obtain a second encoded feature map F2; the decoding meta-network RN2 of the second encoding and decoding network UN2 is configured to perform decoding processing on the second encoded feature map F2 to obtain the output of the second encoding and decoding network UN2. The second encoded feature map F2 includes the outputs of the N encoding sub-networks of the encoding meta-network LN2. For example, as shown in FIG. 2 and FIG. 3, the output of the second encoding and decoding network UN2 may include the second segmented image.
For example, as shown in FIG. 2 and FIG. 3, for the structures and functions of the encoding meta-network LN2 and the decoding meta-network RN2 of the second encoding and decoding network UN2, reference may be made to the foregoing descriptions of the structures and functions of the encoding meta-network LN1 and the decoding meta-network RN1 of the first encoding and decoding network UN1, respectively, which will not be repeated here.
It should be noted that, although FIG. 2 and FIG. 3 both show the case where the second encoding and decoding network UN2 and the first encoding and decoding network UN1 have the same structure (i.e., include the same number of encoding sub-networks and the same number of decoding sub-networks), the embodiments of the present disclosure are not limited thereto. That is, the second encoding and decoding network UN2 may also have a structure similar to that of the first encoding and decoding network UN1, while the number of encoding sub-networks included in the second encoding and decoding network UN2 may differ from the number of encoding sub-networks included in the first encoding and decoding network UN1.
For example, as shown in FIG. 2 and FIG. 3, the second encoding and decoding network UN2 may further include a merge module MG. For example, that the second encoding and decoding network UN2 is configured to perform segmentation processing on the input of the second encoding and decoding network UN2 to obtain the second segmented image includes: the second encoding and decoding network UN2 is configured to perform segmentation processing on the input of the second encoding and decoding network UN2 to obtain a second output feature map F02; and the merge module MG in the second encoding and decoding network UN2 is configured to process the second output feature map F02 to obtain the second segmented image. For example, in some examples, the merge module MG in the second encoding and decoding network UN2 may process the second output feature map F02 with a 1×1 convolution kernel to obtain the second segmented image; it should be noted that the embodiments of the present disclosure include but are not limited to this.
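Putting the hypothetical helpers sketched above together, the full two-stage network could look like the following; the channel counts (base_ch and the doubling per level) and the default depth are illustrative assumptions.

class CodecNet(nn.Module):
    # One encoding and decoding network (U-net style): encoder, decoder, merge module.
    def __init__(self, in_ch, base_ch=64, depth=4):
        super().__init__()
        chs = [base_ch * 2 ** k for k in range(depth)]  # e.g. 64, 128, 256, 512
        enc = [SubNet(in_ch, chs[0])] + [SubNet(chs[k - 1], chs[k]) for k in range(1, depth)]
        # Decoder input channels = up-sampled channels + skip channels from the (N-j)-th encoder.
        dec = [SubNet(chs[depth - j] + chs[depth - j - 1], chs[depth - j - 1])
               for j in range(1, depth)]
        self.encoder = EncoderMetaNet(enc)
        self.decoder = DecoderMetaNet(dec)
        self.merge = MergeModule(chs[0])

    def forward(self, x):
        f_out = self.decoder(self.encoder(x))  # output feature map (F01 or F02)
        return f_out, self.merge(f_out)        # feature map and segmented image

class TwoStageSegNet(nn.Module):
    # Two encoding and decoding networks UN1 and UN2 connected by a CONCAT layer.
    def __init__(self, in_ch=1, base_ch=64):
        super().__init__()
        self.un1 = CodecNet(in_ch, base_ch)
        self.un2 = CodecNet(base_ch + in_ch + 1, base_ch)  # F01 + input image + first segmented image

    def forward(self, image):
        f01, seg1 = self.un1(image)
        _, seg2 = self.un2(concat_layer(f01, image, seg1))
        return seg1, seg2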
For the technical effects of the neural network provided by the embodiments of the present disclosure, reference may be made to the corresponding descriptions of the image processing method in the above embodiments, which will not be repeated here.
At least one embodiment of the present disclosure further provides a training method of a neural network. FIG. 6 is a flowchart of a training method of a neural network provided by some embodiments of the present disclosure. For example, as shown in FIG. 6, the training method includes step S300 and step S400.
Step S300: acquiring a training input image.
For example, similar to the input image in the foregoing step S100, the training input image may also be various types of images, including but not limited to medical images. For example, the training input image may be acquired by an image acquisition device. When the input image is a medical image, the image acquisition device may include, for example, an ultrasound device, an X-ray device, a magnetic resonance imaging device, a nuclear medicine device, a medical optical device, a thermal imaging device, etc., and the embodiments of the present disclosure are not limited in this aspect. For example, the training input image may also be a person image, an animal or plant image, a landscape image, etc., and may also be acquired by an image acquisition device such as a camera of a smartphone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, a surveillance camera, or a webcam. For example, the training input image may also be a sample image in a sample set prepared in advance; for example, the sample set further includes standard segmentation maps (i.e., ground truth) of the sample images. For example, the training input image may be a grayscale image or a color image.
For example, in some examples, acquiring the training input image, i.e., step S300, may include: acquiring an original training input image; and performing preprocessing and data augmentation processing on the original training input image to obtain the training input image. For example, the original training input image is generally an image directly acquired by the image acquisition device. In order to avoid the influence of the data quality, data imbalance, etc. of the original training input image on the training process, the original training input image may be subjected to preprocessing and data augmentation processing. For example, the preprocessing can eliminate irrelevant information or noise information in the original training input image, so as to better segment the training input image. The preprocessing may include, for example, image scaling of the original training input image. Image scaling includes proportionally scaling the original training input image and cropping it to a preset size, so as to facilitate subsequent image segmentation processing. It should be noted that the preprocessing may further include gamma correction, image de-redundancy (cropping out redundant parts of the image), image enhancement (adaptive color equalization, image alignment, color correction, etc.), or noise-reduction filtering, for which reference may be made to common processing methods, not repeated here. The data augmentation processing includes expanding the data of the training input image by, for example, random cropping, rotation, flipping, skewing, affine transformation, etc., which increases the diversity of the training input images, reduces overfitting during image processing, and increases the robustness and generalization of the convolutional neural network model.
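As an illustration of such a pipeline, a hedged sketch using torchvision transforms follows; the specific transforms, probabilities, and the 256×256 preset size are assumptions rather than values given by the disclosure, and for segmentation training the same random transform would have to be applied to the reference segmentation maps as well.

from torchvision import transforms

# Preprocessing: proportional scaling and cropping to a preset size (256x256 assumed).
preprocess = transforms.Compose([
    transforms.Resize(256),      # scale the shorter side, preserving the aspect ratio
    transforms.CenterCrop(256),  # crop to the preset size
])

# Data augmentation: random cropping, rotation, flipping, and affine (skew) transforms.
augment = transforms.Compose([
    transforms.RandomResizedCrop(256, scale=(0.8, 1.0)),
    transforms.RandomRotation(degrees=15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomAffine(degrees=0, shear=10),
    transforms.ToTensor(),
])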
Step S400: training a neural network to be trained by using the training input image, to obtain the neural network in the image processing method provided by any embodiment of the present disclosure.
For example, the structure of the neural network to be trained may be the same as the neural network shown in FIG. 2 or the neural network shown in FIG. 3, and the embodiments of the present disclosure include but are not limited to this. For example, after being trained by the training method, the neural network to be trained can perform the image processing method provided by any of the above embodiments of the present disclosure; that is, the neural network obtained by the training method can perform the image processing method provided by any of the above embodiments of the present disclosure.
FIG. 7 is an exemplary flowchart corresponding to step S400 of the training method shown in FIG. 6, provided by some embodiments of the present disclosure. For example, as shown in FIG. 7, training the neural network to be trained by using the training input image, i.e., step S400, includes steps S410 to S430.
Step S410: processing the training input image by using the neural network to be trained, to obtain a first training segmented image and a second training segmented image.
For example, for the specific process of step S410, reference may be made to the foregoing related description of step S200, where the neural network to be trained, the training input image, the first training segmented image, and the second training segmented image in step S410 correspond to the neural network, the input image, the first segmented image, and the second segmented image in step S200, respectively; the details will not be repeated here.
For example, during the training process, the initial parameters of the neural network to be trained may be random numbers, for example, random numbers conforming to a Gaussian distribution. It should be noted that the embodiments of the present disclosure are not limited in this aspect.
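A brief sketch of such a Gaussian initialization in PyTorch (the zero mean and the 0.02 standard deviation are assumptions for illustration):

def init_gaussian(module, std=0.02):
    # Initialize convolution weights with Gaussian-distributed random numbers.
    if isinstance(module, nn.Conv2d):
        nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

net = TwoStageSegNet()    # the hypothetical two-stage network sketched above
net.apply(init_gaussian)  # applies the initializer recursively to every sub-module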
Step S420: calculating a system loss value of the neural network to be trained through a system loss function, based on a first reference segmented image and a second reference segmented image of the training input image as well as the first training segmented image and the second training segmented image, where the first training segmented image corresponds to the first reference segmented image and the second training segmented image corresponds to the second reference segmented image.
For example, in some examples, the training input image is a sample image in a sample set prepared in advance; for example, the first reference segmented image and the second reference segmented image are respectively a first standard segmentation map and a second standard segmentation map corresponding to the sample image included in the sample set.
For example, that the first training segmented image corresponds to the first reference segmented image means that the first training segmented image and the first reference segmented image correspond to the same region (for example, a first region) of the training input image; that the second training segmented image corresponds to the second reference segmented image means that the second training segmented image and the second reference segmented image correspond to the same region (for example, a second region) of the training input image. For example, in some examples, the first region of the training input image surrounds the second region of the training input image; that is, the second region of the training input image is located within the first region of the training input image.
For example, in some examples, the system loss function may include a first segmentation loss function and a second segmentation loss function. For example, the system loss function may be expressed as:
$L = \lambda_{01} \cdot L_{01} + \lambda_{02} \cdot L_{02}$
where $L_{01}$ and $L_{02}$ respectively denote the first segmentation loss function and the second segmentation loss function, and $\lambda_{01}$ and $\lambda_{02}$ respectively denote the weights of the first segmentation loss function and the second segmentation loss function in the system loss function.
For example, in some examples, the first segmentation loss function may include a binary cross-entropy loss function and a similarity (soft Dice) loss function. For example, the first segmentation loss function may be expressed as:
$L_{01} = \lambda_{11} \cdot L_{11} + \lambda_{12} \cdot L_{21}$
where $L_{01}$ denotes the first segmentation loss function, $L_{11}$ denotes the binary cross-entropy loss function in the first segmentation loss function, $\lambda_{11}$ denotes the weight of the binary cross-entropy loss function in the first segmentation loss function, $L_{21}$ denotes the similarity loss function in the first segmentation loss function, and $\lambda_{12}$ denotes the weight of the similarity loss function in the first segmentation loss function.
For example, the binary cross-entropy loss function $L_{11}$ in the first segmentation loss function may be expressed as:
$L_{11} = -\sum_{m1}\sum_{n1}\left[ y_{m1n1}\log x_{m1n1} + (1 - y_{m1n1})\log(1 - x_{m1n1}) \right]$
and the similarity loss function $L_{21}$ in the first segmentation loss function may be expressed as:
$L_{21} = 1 - \dfrac{2\sum_{m1}\sum_{n1} x_{m1n1}\, y_{m1n1}}{\sum_{m1}\sum_{n1} x_{m1n1} + \sum_{m1}\sum_{n1} y_{m1n1}}$
where $x_{m1n1}$ denotes the value of the pixel in row m1 and column n1 of the first training segmented image, and $y_{m1n1}$ denotes the value of the pixel in row m1 and column n1 of the first reference segmented image.
For example, a larger value of $L_{11}$ or $L_{21}$, i.e., a larger value of $L_{01}$, indicates a larger difference between the first training segmented image and the first reference segmented image; a smaller value of $L_{11}$ or $L_{21}$, i.e., a smaller value of $L_{01}$, indicates a smaller difference between the first training segmented image and the first reference segmented image. During training, the training goal is to minimize the system loss value; therefore, in the training process of the neural network to be trained, minimizing the system loss value includes minimizing the value of the first segmentation loss function.
For example, in some examples, the second segmentation loss function may also include a binary cross-entropy loss function and a similarity (soft Dice) loss function. For example, the second segmentation loss function may be expressed as:
$L_{02} = \lambda_{21} \cdot L_{12} + \lambda_{22} \cdot L_{22}$
where $L_{02}$ denotes the second segmentation loss function, $L_{12}$ denotes the binary cross-entropy loss function in the second segmentation loss function, $\lambda_{21}$ denotes the weight of the binary cross-entropy loss function in the second segmentation loss function, $L_{22}$ denotes the similarity loss function in the second segmentation loss function, and $\lambda_{22}$ denotes the weight of the similarity loss function in the second segmentation loss function.
For example, the binary cross-entropy loss function $L_{12}$ in the second segmentation loss function may be expressed as:
$L_{12} = -\sum_{m2}\sum_{n2}\left[ y_{m2n2}\log x_{m2n2} + (1 - y_{m2n2})\log(1 - x_{m2n2}) \right]$
and the similarity loss function $L_{22}$ in the second segmentation loss function may be expressed as:
$L_{22} = 1 - \dfrac{2\sum_{m2}\sum_{n2} x_{m2n2}\, y_{m2n2}}{\sum_{m2}\sum_{n2} x_{m2n2} + \sum_{m2}\sum_{n2} y_{m2n2}}$
where $x_{m2n2}$ denotes the value of the pixel in row m2 and column n2 of the second training segmented image, and $y_{m2n2}$ denotes the value of the pixel in row m2 and column n2 of the second reference segmented image.
For example, a larger value of $L_{12}$ or $L_{22}$, i.e., a larger value of $L_{02}$, indicates a larger difference between the second training segmented image and the second reference segmented image; a smaller value of $L_{12}$ or $L_{22}$, i.e., a smaller value of $L_{02}$, indicates a smaller difference between the second training segmented image and the second reference segmented image. Therefore, in the training process of the neural network to be trained, minimizing the system loss value further includes minimizing the value of the second segmentation loss function.
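A hedged PyTorch sketch of this system loss, written against the standard forms of the binary cross-entropy and soft Dice losses reconstructed above (the default weights and the smoothing term eps are illustrative assumptions):

import torch.nn.functional as F

def soft_dice_loss(x, y, eps=1e-6):
    # x: predicted segmentation probabilities; y: reference segmentation; both in [0, 1].
    inter = (x * y).sum(dim=(-2, -1))
    return (1.0 - (2.0 * inter + eps)
            / (x.sum(dim=(-2, -1)) + y.sum(dim=(-2, -1)) + eps)).mean()

def segmentation_loss(x, y, w_bce=1.0, w_dice=1.0):
    # One segmentation loss: weighted binary cross-entropy plus soft Dice (L11/L21 or L12/L22).
    return w_bce * F.binary_cross_entropy(x, y) + w_dice * soft_dice_loss(x, y)

def system_loss(seg1, ref1, seg2, ref2, w1=1.0, w2=1.0):
    # System loss L = λ01·L01 + λ02·L02 over the two training segmented images.
    return w1 * segmentation_loss(seg1, ref1) + w2 * segmentation_loss(seg2, ref2)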
Step S430: modifying the parameters of the neural network to be trained based on the system loss value.
For example, the training process of the neural network to be trained may further include an optimization function. The optimization function may calculate error values of the parameters of the neural network to be trained according to the system loss value calculated by the system loss function, and modify the parameters of the neural network to be trained according to the error values. For example, the optimization function may use a stochastic gradient descent (SGD) algorithm, a batch gradient descent (BGD) algorithm, etc. to calculate the error values of the parameters of the neural network to be trained.
For example, the training method may further include: judging whether the training of the neural network to be trained satisfies a predetermined condition; if the predetermined condition is not satisfied, repeating the above training process (i.e., steps S410 to S430); and if the predetermined condition is satisfied, stopping the above training process to obtain the trained neural network. For example, in one example, the predetermined condition is that the system loss values corresponding to two (or more) consecutive training input images no longer decrease significantly. For example, in another example, the predetermined condition is that the number of training iterations or training epochs of the neural network to be trained reaches a predetermined number. The embodiments of the present disclosure are not limited in this aspect.
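One training iteration under these conventions could be sketched as follows, assuming PyTorch's SGD optimizer and the hypothetical names defined in the sketches above (the learning rate and momentum are illustrative):

optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

def train_step(image, ref1, ref2):
    optimizer.zero_grad()
    seg1, seg2 = net(image)                     # step S410: forward pass of the network
    loss = system_loss(seg1, ref1, seg2, ref2)  # step S420: system loss value
    loss.backward()                             # error values of the parameters
    optimizer.step()                            # step S430: modify the parameters
    return loss.item()

# Steps S410 to S430 are repeated until the predetermined condition is met,
# e.g. until the loss no longer decreases significantly or a preset number of epochs is reached.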
For example, the first training segmented image and the second training segmented image output by the trained neural network can be close to the first reference segmented image and the second reference segmented image, respectively; that is, the trained neural network can perform relatively standard image segmentation on the training input image.
It should be noted that, in the present disclosure, for example, the neural network to be trained and the various layers or modules it includes (such as the convolution modules, the up-sampling layers, the down-sampling layers, etc.) each correspond to a program/method that executes the corresponding processing procedure, and may be implemented, for example, by corresponding software, firmware, hardware, etc.; moreover, the above embodiments only schematically illustrate the training process of the neural network to be trained. Those skilled in the art should understand that, in the training stage, a large number of sample images need to be used to train the neural network; meanwhile, the training process of each sample image may include multiple iterations to modify the parameters of the neural network to be trained. For another example, the training stage may further include fine-tuning the parameters of the neural network to be trained to obtain more optimized parameters.
The training method of the neural network provided by the embodiments of the present disclosure can train the neural network used in the image processing method of the embodiments of the present disclosure. The neural network trained by this training method first obtains the first segmented image and then obtains the second segmented image based on the first segmented image, which improves robustness, achieves relatively high generalization and accuracy, and yields more stable segmentation results for images acquired under different lighting environments and by different imaging devices; meanwhile, the end-to-end convolutional neural network model can reduce manual operations.
At least one embodiment of the present disclosure further provides an image processing device. FIG. 8 is a schematic block diagram of an image processing device provided by an embodiment of the present disclosure. For example, as shown in FIG. 8, the image processing device 500 includes a memory 510 and a processor 520. For example, the memory 510 is used for non-transitory storage of computer-readable instructions, and the processor 520 is used to run the computer-readable instructions; when the computer-readable instructions are run by the processor 520, the image processing method and/or the training method of the neural network provided by any embodiment of the present disclosure are executed.
For example, the memory 510 and the processor 520 may communicate with each other directly or indirectly. For example, components such as the memory 510 and the processor 520 may communicate through a network connection. The network may include a wireless network, a wired network, and/or any combination of wireless and wired networks. The network may include a local area network, the Internet, a telecommunication network, an Internet of Things based on the Internet and/or a telecommunication network, and/or any combination of the above networks. The wired network may communicate by means of, for example, twisted pair, coaxial cable, or optical fiber transmission; the wireless network may communicate by means of, for example, a 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi. The present disclosure does not limit the type and function of the network here.
For example, the processor 520 may control other components in the image processing device to perform desired functions. The processor 520 may be a device with data processing capability and/or program execution capability, such as a central processing unit (CPU), a tensor processing unit (TPU), or a graphics processing unit (GPU). The central processing unit (CPU) may be of the X86 or ARM architecture, etc. The GPU may be directly integrated onto the motherboard alone, or built into the north bridge chip of the motherboard. The GPU may also be built into the central processing unit (CPU).
For example, the memory 510 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, etc.
For example, one or more computer instructions may be stored in the memory 510, and the processor 520 may run the computer instructions to implement various functions. Various applications and various data may also be stored in the computer-readable storage medium, such as the training input images, the first reference segmented images, the second reference segmented images, and various data used and/or generated by the applications.
For example, when some computer instructions stored in the memory 510 are executed by the processor 520, one or more steps of the image processing method described above may be performed. For another example, when other computer instructions stored in the memory 510 are executed by the processor 520, one or more steps of the training method of the neural network described above may be performed.
For example, for a detailed description of the processing procedure of the image processing method, reference may be made to the related descriptions in the above embodiments of the image processing method; for a detailed description of the processing procedure of the training method of the neural network, reference may be made to the related descriptions in the above embodiments of the training method of the neural network; repeated parts will not be described again.
It should be noted that the image processing device provided by the embodiments of the present disclosure is illustrative rather than limitative; according to actual application needs, the image processing device may further include other conventional components or structures. For example, to realize the necessary functions of the image processing device, those skilled in the art may provide other conventional components or structures according to specific application scenarios, and the embodiments of the present disclosure are not limited in this aspect.
For the technical effects of the image processing device provided by the embodiments of the present disclosure, reference may be made to the corresponding descriptions of the image processing method and the training method of the neural network in the above embodiments, which will not be repeated here.
At least one embodiment of the present disclosure further provides a storage medium. FIG. 9 is a schematic diagram of a storage medium provided by an embodiment of the present disclosure. For example, as shown in FIG. 9, the storage medium 600 non-transitorily stores computer-readable instructions 601; when the non-transitory computer-readable instructions 601 are executed by a computer (including a processor), the instructions of the image processing method provided by any embodiment of the present disclosure or the instructions of the training method of the neural network provided by any embodiment of the present disclosure can be executed.
For example, one or more computer instructions may be stored on the storage medium 600. Some computer instructions stored on the storage medium 600 may be, for example, instructions for implementing one or more steps of the above image processing method. Other computer instructions stored on the storage medium may be, for example, instructions for implementing one or more steps of the above training method of the neural network.
For example, the storage medium may include a storage component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), flash memory, or any combination of the above storage media, and may also be other applicable storage media.
For the technical effects of the storage medium provided by the embodiments of the present disclosure, reference may be made to the corresponding descriptions of the image processing method and the training method of the neural network in the above embodiments, which will not be repeated here.
For the present disclosure, the following points need to be explained:
(1) The drawings of the embodiments of the present disclosure only involve the structures related to the embodiments of the present disclosure; for other structures, reference may be made to common designs.
(2) Without conflict, the features of the same embodiment and of different embodiments of the present disclosure may be combined with each other.
The above are merely specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person skilled in the art could readily conceive of changes or substitutions within the technical scope disclosed by the present disclosure, which should all be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure should be subject to the protection scope of the claims.

Claims (31)

  1. An image processing method, comprising:
    acquiring an input image; and
    processing the input image by using a neural network to obtain a first segmented image and a second segmented image; wherein
    the neural network comprises two encoding and decoding networks, the two encoding and decoding networks comprise a first encoding and decoding network and a second encoding and decoding network, and an input of the first encoding and decoding network comprises the input image;
    processing the input image by using the neural network to obtain the first segmented image and the second segmented image comprises:
    performing segmentation processing on the input image by using the first encoding and decoding network to obtain a first output feature map and the first segmented image;
    combining the first output feature map with at least one of the input image and the first segmented image to obtain an input of the second encoding and decoding network; and
    performing segmentation processing on the input of the second encoding and decoding network by using the second encoding and decoding network to obtain the second segmented image.
  2. The image processing method according to claim 1, wherein each of the two encoding and decoding networks comprises an encoding meta-network and a decoding meta-network;
    the segmentation processing of the first encoding and decoding network comprises:
    performing encoding processing on the input image by using the encoding meta-network of the first encoding and decoding network to obtain a first encoded feature map;
    performing decoding processing on the first encoded feature map by using the decoding meta-network of the first encoding and decoding network to obtain an output of the first encoding and decoding network, the output of the first encoding and decoding network comprising the first segmented image;
    the segmentation processing of the second encoding and decoding network comprises:
    performing encoding processing on the input of the second encoding and decoding network by using the encoding meta-network of the second encoding and decoding network to obtain a second encoded feature map;
    performing decoding processing on the second encoded feature map by using the decoding meta-network of the second encoding and decoding network to obtain an output of the second encoding and decoding network, the output of the second encoding and decoding network comprising the second segmented image.
  3. The image processing method according to claim 2, wherein the encoding meta-network comprises N encoding sub-networks and N-1 down-sampling layers, the N encoding sub-networks are connected in sequence, each down-sampling layer is used to connect two adjacent encoding sub-networks, and N is an integer and N≥2;
    the encoding processing of the encoding meta-network comprises:
    processing an input of an i-th encoding sub-network of the N encoding sub-networks by using the i-th encoding sub-network to obtain an output of the i-th encoding sub-network;
    performing down-sampling processing on the output of the i-th encoding sub-network by using the down-sampling layer connecting the i-th encoding sub-network and an (i+1)-th encoding sub-network of the N encoding sub-networks to obtain a down-sampled output of the i-th encoding sub-network;
    processing the down-sampled output of the i-th encoding sub-network by using the (i+1)-th encoding sub-network to obtain an output of the (i+1)-th encoding sub-network;
    wherein i is an integer and 1≤i≤N-1, an input of a first encoding sub-network of the N encoding sub-networks comprises the input of the first encoding and decoding network or the input of the second encoding and decoding network, except for the first encoding sub-network, the input of the (i+1)-th encoding sub-network comprises the down-sampled output of the i-th encoding sub-network, and the first encoded feature map or the second encoded feature map comprises the outputs of the N encoding sub-networks.
  4. The image processing method according to claim 3, wherein, in a case where N>2,
    the decoding meta-network comprises N-1 decoding sub-networks and N-1 up-sampling layers, the N-1 decoding sub-networks are connected in sequence, the N-1 up-sampling layers comprise a first up-sampling layer and N-2 second up-sampling layers, the first up-sampling layer is used to connect a first decoding sub-network of the N-1 decoding sub-networks and an N-th encoding sub-network of the N encoding sub-networks, and each second up-sampling layer is used to connect two adjacent decoding sub-networks;
    the decoding processing of the decoding meta-network comprises:
    acquiring an input of a j-th decoding sub-network of the N-1 decoding sub-networks;
    processing the input of the j-th decoding sub-network by using the j-th decoding sub-network to obtain an output of the j-th decoding sub-network;
    wherein j is an integer and 1≤j≤N-1, and the output of the first encoding and decoding network or the output of the second encoding and decoding network comprises an output of an (N-1)-th decoding sub-network of the N-1 decoding sub-networks;
    in a case where j=1, acquiring the input of the j-th decoding sub-network of the N-1 decoding sub-networks comprises:
    performing up-sampling processing on the output of the N-th encoding sub-network by using the first up-sampling layer to obtain an up-sampled input of the j-th decoding sub-network;
    combining the up-sampled input of the j-th decoding sub-network with the output of an (N-j)-th encoding sub-network of the N encoding sub-networks to serve as the input of the j-th decoding sub-network;
    in a case where 1<j≤N-1, acquiring the input of the j-th decoding sub-network of the N-1 decoding sub-networks comprises:
    performing up-sampling processing on the output of a (j-1)-th decoding sub-network by using the second up-sampling layer connecting the j-th decoding sub-network and the (j-1)-th decoding sub-network of the N-1 decoding sub-networks to obtain the up-sampled input of the j-th decoding sub-network;
    combining the up-sampled input of the j-th decoding sub-network with the output of the (N-j)-th encoding sub-network of the N encoding sub-networks to serve as the input of the j-th decoding sub-network.
  5. The image processing method according to claim 4, wherein a size of the up-sampled input of the j-th decoding sub-network is the same as a size of the output of the (N-j)-th encoding sub-network, where 1≤j≤N-1.
  6. The image processing method according to claim 3, wherein, in a case where N=2,
    the encoding meta-network further comprises a second encoding sub-network, and the decoding meta-network comprises a first decoding sub-network and a first up-sampling layer connecting the first decoding sub-network and the second encoding sub-network;
    the decoding processing of the decoding meta-network comprises:
    performing up-sampling processing on the output of the second encoding sub-network by using the first up-sampling layer connecting the first decoding sub-network and the second encoding sub-network to obtain an up-sampled input of the first decoding sub-network;
    combining the up-sampled input of the first decoding sub-network with the output of the first encoding sub-network to serve as an input of the first decoding sub-network, wherein a size of the up-sampled input of the first decoding sub-network is the same as a size of the output of the first encoding sub-network;
    processing the input of the first decoding sub-network by using the first decoding sub-network to obtain an output of the first decoding sub-network;
    wherein the output of the first encoding and decoding network or the output of the second encoding and decoding network comprises the output of the first decoding sub-network.
  7. The image processing method according to any one of claims 4-6, wherein
    each of the N encoding sub-networks and the N-1 decoding sub-networks comprises a first convolution module and a residual module;
    the processing of each sub-network comprises:
    processing an input of the sub-network corresponding to the first convolution module by using the first convolution module to obtain a first intermediate output;
    performing residual processing on the first intermediate output by using the residual module to obtain an output of the sub-network.
  8. The image processing method according to claim 7, wherein the residual module comprises a plurality of second convolution modules;
    performing residual processing on the first intermediate output by using the residual module to obtain the output of the sub-network comprises:
    processing the first intermediate output by using the plurality of second convolution modules to obtain a second intermediate output; and
    performing residual-connection addition processing on the first intermediate output and the second intermediate output to obtain the output of the sub-network.
  9. The image processing method according to claim 8, wherein the processing of each of the first convolution module and the plurality of second convolution modules comprises: convolution processing, activation processing, and batch normalization processing.
  10. The image processing method according to any one of claims 4-9, wherein the input and the output of each decoding sub-network in the decoding meta-network have the same size, and
    the input and the output of each encoding sub-network in the encoding meta-network have the same size.
  11. The image processing method according to any one of claims 2-10, wherein each encoding and decoding network further comprises a merge module;
    the merge module in the first encoding and decoding network is used to process the first output feature map to obtain the first segmented image;
    performing segmentation processing on the input of the second encoding and decoding network by using the second encoding and decoding network to obtain the second segmented image comprises:
    performing segmentation processing on the input of the second encoding and decoding network by using the second encoding and decoding network to obtain a second output feature map;
    processing the second output feature map by using the merge module in the second encoding and decoding network to obtain the second segmented image.
  12. The image processing method according to any one of claims 1-11, wherein the first segmented image corresponds to a first region of the input image, the second segmented image corresponds to a second region of the input image, and
    the first region of the input image surrounds the second region of the input image.
  13. A training method of a neural network, comprising:
    acquiring a training input image; and
    training a neural network to be trained by using the training input image to obtain the neural network in the image processing method according to any one of claims 1-12.
  14. The training method according to claim 13, wherein training the neural network to be trained by using the training input image comprises:
    processing the training input image by using the neural network to be trained to obtain a first training segmented image and a second training segmented image;
    calculating a system loss value of the neural network to be trained through a system loss function, based on a first reference segmented image and a second reference segmented image of the training input image as well as the first training segmented image and the second training segmented image; and
    modifying parameters of the neural network to be trained based on the system loss value;
    wherein the first training segmented image corresponds to the first reference segmented image, and the second training segmented image corresponds to the second reference segmented image.
  15. The training method according to claim 14, wherein the system loss function comprises a first segmentation loss function and a second segmentation loss function;
    each of the first segmentation loss function and the second segmentation loss function comprises a binary cross-entropy loss function and a similarity loss function.
  16. The training method according to claim 15, wherein the first segmentation loss function is expressed as:
    $L_{01} = \lambda_{11} \cdot L_{11} + \lambda_{12} \cdot L_{21}$
    where $L_{01}$ denotes the first segmentation loss function, $L_{11}$ denotes the binary cross-entropy loss function in the first segmentation loss function, $\lambda_{11}$ denotes the weight of the binary cross-entropy loss function in the first segmentation loss function, $L_{21}$ denotes the similarity loss function in the first segmentation loss function, and $\lambda_{12}$ denotes the weight of the similarity loss function in the first segmentation loss function;
    the binary cross-entropy loss function $L_{11}$ in the first segmentation loss function is expressed as:
    $L_{11} = -\sum_{m1}\sum_{n1}\left[ y_{m1n1}\log x_{m1n1} + (1 - y_{m1n1})\log(1 - x_{m1n1}) \right]$
    the similarity loss function $L_{21}$ in the first segmentation loss function is expressed as:
    $L_{21} = 1 - \dfrac{2\sum_{m1}\sum_{n1} x_{m1n1}\, y_{m1n1}}{\sum_{m1}\sum_{n1} x_{m1n1} + \sum_{m1}\sum_{n1} y_{m1n1}}$
    where $x_{m1n1}$ denotes the value of the pixel in row m1 and column n1 of the first training segmented image, and $y_{m1n1}$ denotes the value of the pixel in row m1 and column n1 of the first reference segmented image;
    the second segmentation loss function is expressed as:
    $L_{02} = \lambda_{21} \cdot L_{12} + \lambda_{22} \cdot L_{22}$
    where $L_{02}$ denotes the second segmentation loss function, $L_{12}$ denotes the binary cross-entropy loss function in the second segmentation loss function, $\lambda_{21}$ denotes the weight of the binary cross-entropy loss function in the second segmentation loss function, $L_{22}$ denotes the similarity loss function in the second segmentation loss function, and $\lambda_{22}$ denotes the weight of the similarity loss function in the second segmentation loss function;
    the binary cross-entropy loss function $L_{12}$ in the second segmentation loss function is expressed as:
    $L_{12} = -\sum_{m2}\sum_{n2}\left[ y_{m2n2}\log x_{m2n2} + (1 - y_{m2n2})\log(1 - x_{m2n2}) \right]$
    the similarity loss function $L_{22}$ in the second segmentation loss function is expressed as:
    $L_{22} = 1 - \dfrac{2\sum_{m2}\sum_{n2} x_{m2n2}\, y_{m2n2}}{\sum_{m2}\sum_{n2} x_{m2n2} + \sum_{m2}\sum_{n2} y_{m2n2}}$
    where $x_{m2n2}$ denotes the value of the pixel in row m2 and column n2 of the second training segmented image, and $y_{m2n2}$ denotes the value of the pixel in row m2 and column n2 of the second reference segmented image.
  17. The training method according to claim 15 or 16, wherein the system loss function is expressed as:
    $L = \lambda_{01} \cdot L_{01} + \lambda_{02} \cdot L_{02}$
    where $L_{01}$ and $L_{02}$ respectively denote the first segmentation loss function and the second segmentation loss function, and $\lambda_{01}$ and $\lambda_{02}$ respectively denote the weights of the first segmentation loss function and the second segmentation loss function in the system loss function.
  18. The training method according to any one of claims 13-17, wherein acquiring the training input image comprises:
    acquiring an original training input image; and
    performing preprocessing and data augmentation processing on the original training input image to obtain the training input image.
  19. An image processing device, comprising:
    a memory for storing non-transitory computer-readable instructions; and
    a processor for running the computer-readable instructions, wherein when the computer-readable instructions are run by the processor, the image processing method according to any one of claims 1-12 or the training method according to any one of claims 13-18 is executed.
  20. A storage medium, non-transitorily storing computer-readable instructions, wherein when the non-transitory computer-readable instructions are executed by a computer, the instructions of the image processing method according to any one of claims 1-12 or the instructions of the training method according to any one of claims 13-18 can be executed.
  21. A neural network, comprising two encoding and decoding networks and a concatenation layer, the two encoding and decoding networks comprising a first encoding and decoding network and a second encoding and decoding network; wherein
    the first encoding and decoding network is configured to perform segmentation processing on an input image to obtain a first output feature map and a first segmented image;
    the concatenation layer is configured to combine the first output feature map with at least one of the input image and the first segmented image to obtain an input of the second encoding and decoding network; and
    the second encoding and decoding network is configured to perform segmentation processing on the input of the second encoding and decoding network to obtain a second segmented image.
  22. The neural network according to claim 21, wherein each of the two encoding and decoding networks comprises an encoding meta-network and a decoding meta-network;
    the encoding meta-network of the first encoding and decoding network is configured to perform encoding processing on the input image to obtain a first encoded feature map;
    the decoding meta-network of the first encoding and decoding network is configured to perform decoding processing on the first encoded feature map to obtain an output of the first encoding and decoding network, the output of the first encoding and decoding network comprising the first segmented image;
    the encoding meta-network of the second encoding and decoding network is configured to perform encoding processing on the input of the second encoding and decoding network to obtain a second encoded feature map;
    the decoding meta-network of the second encoding and decoding network is configured to perform decoding processing on the second encoded feature map to obtain an output of the second encoding and decoding network, the output of the second encoding and decoding network comprising the second segmented image.
  23. The neural network according to claim 22, wherein the encoding meta-network comprises N encoding sub-networks and N-1 down-sampling layers, the N encoding sub-networks are connected in sequence, each down-sampling layer is used to connect two adjacent encoding sub-networks, and N is an integer and N≥2;
    an i-th encoding sub-network of the N encoding sub-networks is configured to process an input of the i-th encoding sub-network to obtain an output of the i-th encoding sub-network;
    the down-sampling layer connecting the i-th encoding sub-network and an (i+1)-th encoding sub-network of the N encoding sub-networks is configured to perform down-sampling processing on the output of the i-th encoding sub-network to obtain a down-sampled output of the i-th encoding sub-network;
    the (i+1)-th encoding sub-network is configured to process the down-sampled output of the i-th encoding sub-network to obtain an output of the (i+1)-th encoding sub-network;
    wherein i is an integer and 1≤i≤N-1, an input of a first encoding sub-network of the N encoding sub-networks comprises the input of the first encoding and decoding network or the input of the second encoding and decoding network, except for the first encoding sub-network, the input of the (i+1)-th encoding sub-network comprises the down-sampled output of the i-th encoding sub-network, and the first encoded feature map or the second encoded feature map comprises the outputs of the N encoding sub-networks.
  24. The neural network according to claim 23, wherein, in a case where N>2,
    the decoding meta-network comprises N-1 decoding sub-networks and N-1 up-sampling layers, the N-1 decoding sub-networks are connected in sequence, the N-1 up-sampling layers comprise a first up-sampling layer and N-2 second up-sampling layers, the first up-sampling layer is used to connect a first decoding sub-network of the N-1 decoding sub-networks and an N-th encoding sub-network of the N encoding sub-networks, and each second up-sampling layer is used to connect two adjacent decoding sub-networks;
    each encoding and decoding network further comprises N-1 sub-concatenation layers corresponding to the N-1 decoding sub-networks of the decoding meta-network;
    a j-th decoding sub-network of the N-1 decoding sub-networks is configured to process an input of the j-th decoding sub-network to obtain an output of the j-th decoding sub-network, wherein j is an integer and 1≤j≤N-1, and the output of the first encoding and decoding network or the output of the second encoding and decoding network comprises an output of an (N-1)-th decoding sub-network of the N-1 decoding sub-networks;
    the first up-sampling layer is configured to perform up-sampling processing on the output of the N-th encoding sub-network to obtain an up-sampled input of the first decoding sub-network;
    the second up-sampling layer connecting the j-th decoding sub-network and a (j-1)-th decoding sub-network of the N-1 decoding sub-networks is configured to perform up-sampling processing on the output of the (j-1)-th decoding sub-network to obtain the up-sampled input of the j-th decoding sub-network, wherein j is an integer and 1<j≤N-1;
    a j-th sub-concatenation layer of the N-1 sub-concatenation layers is configured to combine the up-sampled input of the j-th decoding sub-network with the output of an (N-j)-th encoding sub-network of the N encoding sub-networks to serve as the input of the j-th decoding sub-network, wherein j is an integer and 1≤j≤N-1.
  25. The neural network according to claim 24, wherein a size of the up-sampled input of the j-th decoding sub-network is the same as a size of the output of the (N-j)-th encoding sub-network, where 1≤j≤N-1.
  26. The neural network according to claim 23, wherein, in a case where N=2,
    the encoding meta-network further comprises a second encoding sub-network, and the decoding meta-network comprises a first decoding sub-network and a first up-sampling layer connecting the first decoding sub-network and the second encoding sub-network;
    each encoding and decoding network further comprises a first sub-concatenation layer corresponding to the first decoding sub-network of the decoding meta-network;
    the first up-sampling layer connecting the first decoding sub-network and the second encoding sub-network is configured to perform up-sampling processing on the output of the second encoding sub-network to obtain an up-sampled input of the first decoding sub-network;
    the first sub-concatenation layer is configured to combine the up-sampled input of the first decoding sub-network with the output of the first encoding sub-network to serve as an input of the first decoding sub-network, wherein a size of the up-sampled input of the first decoding sub-network is the same as a size of the output of the first encoding sub-network;
    the first decoding sub-network is configured to process the input of the first decoding sub-network to obtain an output of the first decoding sub-network;
    wherein the output of the first encoding and decoding network or the output of the second encoding and decoding network comprises the output of the first decoding sub-network.
  27. The neural network according to any one of claims 24-26, wherein
    each of the N encoding sub-networks and the N-1 decoding sub-networks comprises a first convolution module and a residual module;
    the first convolution module is configured to process an input of the sub-network corresponding to the first convolution module to obtain a first intermediate output;
    the residual module is configured to perform residual processing on the first intermediate output to obtain an output of the sub-network.
  28. The neural network according to claim 27, wherein the residual module comprises a plurality of second convolution modules and a residual addition layer;
    the plurality of second convolution modules are configured to process the first intermediate output to obtain a second intermediate output;
    the residual addition layer is configured to perform residual-connection addition processing on the first intermediate output and the second intermediate output to obtain the output of the sub-network.
  29. The neural network according to claim 28, wherein each of the first convolution module and the plurality of second convolution modules comprises a convolution layer, an activation layer, and a batch normalization layer;
    the convolution layer is configured to perform convolution processing, the activation layer is configured to perform activation processing, and the batch normalization layer is configured to perform batch normalization processing.
  30. The neural network according to any one of claims 24-29, wherein the input and the output of each decoding sub-network in the decoding meta-network have the same size, and
    the input and the output of each encoding sub-network in the encoding meta-network have the same size.
  31. The neural network according to any one of claims 22-30, wherein each encoding and decoding network further comprises a merge module;
    the merge module in the first encoding and decoding network is configured to process the first output feature map to obtain the first segmented image;
    that the second encoding and decoding network is configured to perform segmentation processing on the input of the second encoding and decoding network to obtain the second segmented image comprises:
    the second encoding and decoding network is configured to perform segmentation processing on the input of the second encoding and decoding network to obtain a second output feature map; and
    the merge module in the second encoding and decoding network is configured to process the second output feature map to obtain the second segmented image.
PCT/CN2019/098928 2019-08-01 2019-08-01 Image processing method and device, neural network and training method, storage medium WO2021017006A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US16/970,131 US11816870B2 (en) 2019-08-01 2019-08-01 Image processing method and device, neural network and training method thereof, storage medium
CN201980001232.XA CN112602114A (zh) 2019-08-01 2019-08-01 Image processing method and device, neural network and training method, storage medium
PCT/CN2019/098928 WO2021017006A1 (zh) 2019-08-01 2019-08-01 Image processing method and device, neural network and training method, storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/098928 WO2021017006A1 (zh) 2019-08-01 2019-08-01 Image processing method and device, neural network and training method, storage medium

Publications (1)

Publication Number Publication Date
WO2021017006A1 (zh)

Family

ID=74228505

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/098928 WO2021017006A1 (zh) 2019-08-01 2019-08-01 图像处理方法及装置、神经网络及训练方法、存储介质

Country Status (3)

Country Link
US (1) US11816870B2 (zh)
CN (1) CN112602114A (zh)
WO (1) WO2021017006A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256670A (zh) * 2021-05-24 2021-08-13 Infervision Medical Technology Co., Ltd. Image processing method and device, and network model training method and device

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112785575B (zh) * 2021-01-25 2022-11-18 清华大学 一种图像处理的方法、装置和存储介质
CN113658165B (zh) * 2021-08-25 2023-06-20 平安科技(深圳)有限公司 杯盘比确定方法、装置、设备及存储介质
TWI784688B (zh) * 2021-08-26 2022-11-21 宏碁股份有限公司 眼睛狀態評估方法及電子裝置
CN114708973B (zh) * 2022-06-06 2022-09-13 首都医科大学附属北京友谊医院 一种用于对人体健康进行评估的设备和存储介质
CN116612146B (zh) * 2023-07-11 2023-11-17 淘宝(中国)软件有限公司 图像处理方法、装置、电子设备以及计算机存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180240235A1 (en) * 2017-02-23 2018-08-23 Zebra Medical Vision Ltd. Convolutional neural network for segmentation of medical anatomical images
CN109598728A (zh) * 2018-11-30 2019-04-09 Tencent Technology (Shenzhen) Co., Ltd. Image segmentation method and apparatus, diagnosis system, and storage medium
US20190114774A1 (en) * 2017-10-16 2019-04-18 Adobe Systems Incorporated Generating Image Segmentation Data Using a Multi-Branch Neural Network
CN109859210A (zh) * 2018-12-25 2019-06-07 Shanghai United Imaging Intelligence Co., Ltd. Medical data processing apparatus and method
CN109993726A (zh) * 2019-02-21 2019-07-09 Shanghai United Imaging Intelligence Co., Ltd. Medical image detection method, apparatus, device, and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IE87469B1 (en) * 2016-10-06 2024-01-03 Google Llc Image processing neural networks with separable convolutional layers
CN110838124B (zh) * 2017-09-12 2021-06-18 深圳科亚医疗科技有限公司 用于分割具有稀疏分布的对象的图像的方法、系统和介质
CN110009598B (zh) * 2018-11-26 2023-09-05 腾讯科技(深圳)有限公司 用于图像分割的方法和图像分割设备
JP7250489B2 (ja) * 2018-11-26 2023-04-03 キヤノン株式会社 画像処理装置およびその制御方法、プログラム
US11328430B2 (en) * 2019-05-28 2022-05-10 Arizona Board Of Regents On Behalf Of Arizona State University Methods, systems, and media for segmenting images
WO2023019363A1 (en) * 2021-08-20 2023-02-23 Sonic Incytes Medical Corp. Systems and methods for detecting tissue and shear waves within the tissue


Also Published As

Publication number Publication date
CN112602114A (zh) 2021-04-02
US20220398783A1 (en) 2022-12-15
US11816870B2 (en) 2023-11-14

Similar Documents

Publication Publication Date Title
WO2021017006A1 (zh) Image processing method and device, neural network and training method, storage medium
WO2021164429A1 (zh) Image processing method, image processing apparatus, and device
US10706333B2 (en) Medical image analysis method, medical image analysis system and storage medium
CN109754402B (zh) Image processing method, image processing apparatus, and storage medium
WO2020177651A1 (zh) Image segmentation method and image processing apparatus
US11488021B2 (en) Systems and methods for image segmentation
CN113706526B (zh) Training method and apparatus for an endoscopic image feature learning model and a classification model
CN111784671B (zh) Method for detecting lesion regions in pathological images based on multi-scale deep learning
CN110163260B (zh) Residual-network-based image recognition method, apparatus, device, and storage medium
WO2021073493A1 (zh) Image processing method and device, neural network training method, image processing method of merged neural network model, construction method of merged neural network model, neural network processor, and storage medium
KR102058884B1 (ko) Method of analyzing iris images with artificial intelligence to diagnose dementia
WO2020108562A1 (zh) Method and system for automatic tumor segmentation in CT images
EP3923233A1 (en) Image denoising method and apparatus
JP2021513697A (ja) System for segmentation of anatomical structures in cardiac CTA using fully convolutional neural networks
WO2023070447A1 (zh) Model training method, image processing method, computing and processing device, and non-transitory computer-readable medium
CN110310280B (zh) Image recognition method, system, device, and storage medium for hepatobiliary ducts and calculi
CN112396605B (zh) Network training method and apparatus, image recognition method, and electronic device
WO2024011835A1 (zh) Image processing method, apparatus, device, and readable storage medium
WO2021017168A1 (zh) Image segmentation method, apparatus, device, and storage medium
WO2021168920A1 (zh) Low-dose image enhancement method and system based on multiple dose levels, computer device, and storage medium
CN115471470A (zh) Esophageal cancer CT image segmentation method
WO2020187029A1 (zh) Image processing method and device, neural network training method, storage medium
CN112419283A (zh) Neural network for estimating thickness and method thereof
Patel et al. Deep Learning in Medical Image Super-Resolution: A Survey
US20220392059A1 (en) Method and system for representation learning with sparse convolution

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19939210; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19939210; Country of ref document: EP; Kind code of ref document: A1)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10/02/2023))