CN114359296A - Image element and lower alveolar nerve segmentation method and device based on deep learning - Google Patents

Image element and lower alveolar nerve segmentation method and device based on deep learning

Info

Publication number
CN114359296A
CN114359296A (application CN202210012675.2A)
Authority
CN
China
Prior art keywords
image
convolution
discriminator
generator
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210012675.2A
Other languages
Chinese (zh)
Inventor
钱坤
黄志俊
刘金勇
吴燏迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lancet Robotics Co Ltd
Original Assignee
Lancet Robotics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lancet Robotics Co Ltd filed Critical Lancet Robotics Co Ltd
Priority to CN202210012675.2A priority Critical patent/CN114359296A/en
Publication of CN114359296A publication Critical patent/CN114359296A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

A training method of a deep-learning-based image element segmentation model is disclosed. A GAN network is constructed in which a UNet network serves as the generator G and a two-class CNN network serves as the discriminator D. The method comprises the following steps: an image img1 in JPG format is input into the generator G, image features are extracted and the image is reconstructed to obtain a generated image g_img; the generated image g_img and the annotated image are used as negative and positive samples respectively to train the discriminator D and obtain its output; the parameters of the discriminator D are updated k times (k being a hyper-parameter) with the adam optimization function; an image img2 with the same data distribution as img1 is input into the generator G and the parameters of the generator G are updated once; training of the UNet network is completed through repeated iteration and the image element segmentation model is determined. Inputting an original image into the trained UNet network for segmentation completes its automatic segmentation. Applied to oral CT images, the method can automatically and rapidly segment the lower alveolar nerve.

Description

Image element and lower alveolar nerve segmentation method and device based on deep learning
Technical Field
The invention relates to the technical field of image data processing, in particular to a method and a device for segmenting image elements based on deep learning, and more particularly to a method and a device for segmenting the lower alveolar nerve based on UNet + GAN.
Background
In order to assist precise surgical operation and reduce injury, the common practice at present is to perform three-dimensional reconstruction of CT, MRI and other images of the patient before surgery. This process can be completed with certain software, but most of the procedures depend on conventional image processing techniques and are performed manually by the doctor, who needs a certain amount of experience with the software and knowledge of image processing.
In order to address this situation, automatic segmentation and recognition of medical images is needed to assist accurate identification of the surgical target region or target image, so as to give doctors a better basis for diagnosis.
In addition, with the aging of society and the growing public attention to dental health, more and more people require dental implants or tooth extraction, yet operations on the teeth can easily damage the dental nerves. Moreover, there are currently few studies on automatic image segmentation of dental nerves.
Disclosure of Invention
In order to address the above situation, it is necessary to accurately identify and segment dental nerves, in particular the lower alveolar (inferior alveolar) nerve, which is the largest branch of the mandibular nerve, and to provide support for the doctor's diagnosis.
More specifically, the invention provides an automatic image recognition method and system based on deep learning, and in particular a UNet + GAN-based lower alveolar nerve segmentation method.
According to an aspect of the present invention, there is provided a training method of a deep-learning-based image element segmentation model, using a UNet network as the generator G of a GAN network and a two-class CNN network as its discriminator D, the method comprising the following steps: inputting a first image img1 in JPG format into the generator G, extracting image features and reconstructing the image to obtain a generated image g_img; training the discriminator D with the annotated image and the generated image g_img as positive and negative samples respectively, to obtain the output of the discriminator D; updating the parameters of the discriminator D k times with the adam optimization function, where k is a hyper-parameter; inputting a second image img2 with the same data distribution as the first image img1 into the generator G and updating the parameters of the generator G once; and completing training of the UNet network through repeated iteration, thereby determining the image element segmentation model.
Preferably, the parameter update periods of the discriminator D and the generator G are the same; and/or the numbers of first images img1 and second images img2 used by the discriminator D and the generator G are the same.
Preferably, the first image img1 and the second image img2 are the same image.
Preferably, extracting the image features comprises: performing two 3 × 3 convolutions on a batch of first images img1 as input, with 64 convolution kernels, followed by max pooling with a window size of 2 and activation with the relu activation function, to obtain conv1; then applying the same processing with conv1 as input, doubling the number of convolution kernels each time, to obtain conv2, conv3 and conv4 in sequence.
Preferably, reconstructing the image comprises: performing two 3 × 3 convolutions with conv4 as input, with 1024 convolution kernels, then performing a deconvolution with kernel size 2 to obtain tconv4; feature-concatenating (concat) tconv4 and conv4, and repeating the operation with the concatenated features as input, halving the number of convolution kernels each time, to obtain tconv3, tconv2 and tconv1 in sequence; and performing two 3 × 3 convolutions with tconv1 as input, with 64 convolution kernels, followed by one further 1 × 1 convolution with a single kernel on the result, to obtain the generated image g_img.
Preferably, taking the annotation image and the generated image g_img as positive and negative samples respectively, the training of the discriminator D includes: the label of the annotation image uses a random value in the interval [0.8, 1.2], and the label of the generated image g_img uses a random value in the interval [0, 0.3].
Preferably, training the discriminator D comprises: inputting a batch of data; after a convolution with kernel size 3 and 64 channels, performing batch normalization BN and activating with the LeakyReLU function; applying the same processing with the output of the activation function as the new input, doubling the number of convolution kernels each time while keeping the stride of 2 unchanged, and repeating this 4 times; flattening the resulting feature map, feeding it into a fully connected layer Dense, and obtaining the output of the discriminator D through a Sigmoid activation function.
According to another aspect of the present invention, there is provided a deep-learning-based image element segmentation method that completes the automatic segmentation of an original image by inputting the original image into a UNet network trained with any one of the above training methods of an image element segmentation model and segmenting the original image, wherein the original image input into the UNet network is a JPG file.
Preferably, the method further comprises: acquiring the image data of an oral CT image as the original image; and a data preprocessing step for the original image, in which the position of the dental nerve is annotated on the CT image to obtain an annotated image, and the cross-section information of the DICOM-format oral CT file is used as the experimental image and converted into a JPG file.
According to still another aspect of the present invention, there is provided a deep-learning-based image element segmentation apparatus, which, by causing a computer to execute a program, constructs a neural network based on a generative adversarial network GAN in which a UNet network is used as the generator G and a two-class CNN network is used as the discriminator D. The UNet network is configured to: perform two 3 × 3 convolutions on a batch of first images img1 as input, with 64 convolution kernels, followed by max pooling with a window size of 2 and relu activation, to obtain conv1, and apply the same processing with conv1 as input, doubling the number of convolution kernels each time, to obtain conv2, conv3 and conv4 in sequence; perform two 3 × 3 convolutions with conv4 as input, with 1024 convolution kernels, then a deconvolution with kernel size 2 to obtain tconv4, feature-concatenate (concat) tconv4 and conv4, and repeat the operation with the concatenated features as input, halving the number of convolution kernels each time, to obtain tconv3, tconv2 and tconv1 in sequence; and perform two 3 × 3 convolutions with tconv1 as input, with 64 convolution kernels, and one further 1 × 1 convolution with a single kernel on the result, to obtain the generated image g_img. The discriminator D is configured to: receive a batch of data; after a convolution with kernel size 3 and 64 channels, perform batch normalization BN and activate with the LeakyReLU function; apply the same processing with the output of the activation function as the new input, doubling the number of convolution kernels each time while keeping the stride of 2 unchanged, and repeat this 4 times; flatten the resulting feature map, feed it into a fully connected layer Dense, and obtain the output of the discriminator D through a Sigmoid activation function.
According to a further aspect of the present invention, there is provided a computer-readable storage medium, characterized in that the storage medium stores a computer program for performing the steps of any of the above-mentioned methods.
According to still another aspect of the present invention, there is provided an electronic apparatus, characterized in that the electronic apparatus includes: a processor; and a memory for storing processor-executable instructions; wherein the processor reads the executable instructions from the memory and executes them to implement the steps of any of the above methods.
Therefore, the invention provides a deep-learning-based lower alveolar nerve segmentation method that can segment the lower alveolar nerve automatically and quickly, thereby improving surgical safety and saving time.
Drawings
Fig. 1 is a flowchart of an automatic image recognition method based on deep learning according to an exemplary embodiment of the present invention.
Fig. 2 is a block diagram of a UNet + GAN network provided according to an exemplary embodiment of the present invention.
Fig. 3 is a diagram illustrating a segmentation effect obtained by segmenting according to a method or apparatus provided in an exemplary embodiment of the present invention.
Fig. 4 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described in detail below with reference to the accompanying drawings. The exemplary embodiments described below and illustrated in the figures are intended to teach the principles of the present invention and enable one skilled in the art to implement and use the invention in several different environments and for several different applications. The scope of the invention is, therefore, indicated by the appended claims, and the exemplary embodiments are not intended to, and should not be considered as, limiting the scope of the invention.
Although CT (computed tomography) and related technologies are widely used in clinical diagnosis, the manual three-dimensional reconstruction of medical images is complicated and places certain demands on the operator. The inventors therefore process and analyse image data such as CT data with artificial intelligence techniques, which can provide powerful assistance for medical diagnosis.
Embodiments in accordance with the present invention are operational with numerous other general-purpose or special-purpose computing system environments or configurations and with numerous electronic devices, such as terminal devices, computer systems and servers. Examples of well-known terminal devices, computing systems, environments and/or configurations that may be suitable for use with electronic devices such as terminal devices, computer systems and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
As the programming language and development environment, for example, the programming language Python 3.7 and the integrated framework TensorFlow 2.5 are used, together with the components OpenCV 4.5 and VTK 8.1.1, and the operating system is Windows.
Exemplary method
Fig. 1 shows a flow diagram of a method 100 for automatic image recognition based on deep learning according to an embodiment of the invention. The method 100 begins at step 101.
In step 101, a group of images to be processed associated with a target object is acquired and subjected to data preprocessing to obtain a preprocessed image group. The image group to be processed comprises a plurality of image files, each of which has an image area that contains the target object.
According to one embodiment, each image file is format-converted to obtain an image file in a JPG format.
In step 102, a plurality of image files are subjected to image processing by using a UNet + GAN based neural network to obtain a plurality of segmented target image areas.
According to one embodiment, the method for image segmentation with the UNet + GAN-based neural network uses the framework of a GAN (Generative Adversarial Network), which comprises a two-part network structure of a generator G and a discriminator D. The generator G is used to fit the data distribution of the target image, and the discriminator D is used to distinguish whether a sample comes from the real data or from the generator G.
According to one embodiment, each image file corresponds to a target image area. The neural network is trained prior to image processing the plurality of image files using the neural network.
According to one embodiment, training the neural network comprises: step S01: data preprocessing; step S02: neural network construction; step S03: neural network training; step S04: verification; step S05: testing.
According to one embodiment, before the data preprocessing of step S01, the method further comprises S00: acquiring an original image as the image data to be processed.
According to one embodiment, in the data preprocessing of step S01, the data format of the original image may be converted into a JPG image file as needed.
In the neural network construction of step S02, UNet is used as the generator G, a two-class CNN network is used as the discriminator D, and the UNet network is trained using the GAN framework; the network structure is shown in fig. 2.
After the original image is preprocessed as described above, an annotation image indicating the target position is created by annotating the target region, and the original image and the annotation image are used as the data required for model training.
The JPG file of the original image serves as the input data of the generator G, while the input data of the discriminator D differ depending on whether the parameters of the discriminator D or those of the generator G are being updated.
When updating the parameters of the discriminator D, the images generated by the generator G and randomly selected annotation images are input, structured as ((mask_img, g_img), (true_label, false_label)), where the ratio of the number of mask_img to the number of g_img is preferably 1:k, k being a preset hyper-parameter.
When updating the parameters of the generator G, the input should be the images generated by the generator G, as (g_img, true_label). At this time, the annotation image needs to be resized so that its size is the same as that of the generated image.
According to one embodiment, the data are batched using the Dataset module of TensorFlow, with each batch of size batch_size; a shuffle operation shuffles the data, and data augmentation such as random small-angle tilting of the images is used to reduce overfitting, as in the pipeline sketch below.
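A minimal sketch of such a tf.data pipeline is given below. It is an illustration only: the 512 × 512 image size, the batch size of 8 and the ±5° tilt range are assumptions rather than values taken from this description, and OpenCV (one of the components listed above) is used here to read and rotate the slices.

```python
import cv2
import numpy as np
import tensorflow as tf

IMG_SIZE = 512     # assumed resolution of the JPG slices
BATCH_SIZE = 8     # "batch_size" in the text; the actual value is not specified
MAX_ANGLE = 5.0    # assumed bound (degrees) for the random small-angle tilt

def load_pair(img_path, mask_path):
    # Read one original JPG slice and its annotation image, apply the same
    # random small-angle rotation to both, and scale the pixels to [0, 1].
    img = cv2.resize(cv2.imread(img_path.decode(), cv2.IMREAD_GRAYSCALE),
                     (IMG_SIZE, IMG_SIZE))
    mask = cv2.resize(cv2.imread(mask_path.decode(), cv2.IMREAD_GRAYSCALE),
                      (IMG_SIZE, IMG_SIZE))
    angle = np.random.uniform(-MAX_ANGLE, MAX_ANGLE)
    rot = cv2.getRotationMatrix2D((IMG_SIZE / 2, IMG_SIZE / 2), angle, 1.0)
    img = cv2.warpAffine(img, rot, (IMG_SIZE, IMG_SIZE))
    mask = cv2.warpAffine(mask, rot, (IMG_SIZE, IMG_SIZE))
    img = img.astype(np.float32)[..., None] / 255.0
    mask = mask.astype(np.float32)[..., None] / 255.0
    return img, mask

def make_dataset(img_paths, mask_paths):
    # Batch the (img, mask_img) pairs with the Dataset module and shuffle them.
    ds = tf.data.Dataset.from_tensor_slices((img_paths, mask_paths))
    ds = ds.shuffle(len(img_paths))
    ds = ds.map(lambda i, m: tf.numpy_function(
        load_pair, [i, m], (tf.float32, tf.float32)))
    return ds.batch(BATCH_SIZE, drop_remainder=True).prefetch(tf.data.AUTOTUNE)
```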
In the neural network training of step S03, the GAN structure is introduced to train the UNet network in place of the gradient descent process of the original UNet network: using the two-player game concept of GAN, the parameter updates of UNet are no longer derived directly from the data samples of the original images, but from the gradient back-propagated through the discriminator D.
In this way, a UNet + GAN-based neural network segmentation model is obtained by training the GAN with a training set and verifying the trained GAN with a test set until the objective function is satisfied.
The objective function is given by equation (1):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{mask}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \qquad (1)$$

where the symbols denote the following:
V: the objective function; E: expectation; G: UNet; D: the two-class network; x: real data; z: the image to be segmented; mask: the annotation data; p: a data distribution.
Since the form of the objective function is unchanged, ideally:
if G is fixed, equation (2)

$$D^{*}_{G}(x) = \frac{p_{mask}(x)}{p_{mask}(x) + p_g(x)} \qquad (2)$$

gives the optimal D; and when equation (3)

$$p_g(x) = p_{mask}(x) \qquad (3)$$

holds, the output of D is 0.5 and G is optimal, where $p_{mask}$ denotes the data distribution of the annotations and $p_g$ denotes the data distribution of the generator G.
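For completeness, the standard GAN argument linking equation (2) and equation (3) (not stated in the original text, but following directly from equation (1)) is: for a fixed G, the objective can be written as

$$V(D) = \int_x \left[ p_{mask}(x)\,\log D(x) + p_g(x)\,\log\bigl(1 - D(x)\bigr) \right] dx,$$

and maximizing the integrand pointwise in D(x) gives the optimum of equation (2); substituting $p_g(x) = p_{mask}(x)$ from equation (3) into equation (2) yields $D^{*}_{G}(x) = 1/2$, which is why a discriminator output of 0.5 indicates that G is optimal.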
As shown in fig. 2, a mini-batch stochastic gradient descent scheme is adopted to train the model during the training process, comprising: step S31, first updating the parameters of D k times; and step S32, then updating the parameters of the generator G once.
In step S31, k is a hyper-parameter and requires tuning. An original image is first processed by the generator G to obtain a generated image g_img, and the specific steps are as follows:
S310: extracting image features. A batch of images img1 is subjected to two 3 × 3 convolutions (conv) with 64 convolution kernels and one max pooling (maxpool) with a window (step) size of 2, and is activated with the relu activation function to obtain conv1. With conv1 as input, the same processing is applied with the number of convolution kernels doubled each time, and conv2, conv3 and conv4 are obtained in sequence.
S311: reconstructing the image. With conv4 as input, two 3 × 3 convolutions with 1024 convolution kernels are performed, followed by a deconvolution (transconv) with kernel size 2 to obtain tconv4; tconv4 and conv4 are feature-concatenated (concat), and the operation is repeated with the concatenated features as input, halving the number of convolution kernels each time, to obtain tconv3, tconv2 and tconv1 in sequence. With tconv1 as input, two 3 × 3 convolutions with 64 convolution kernels are performed, followed by one further 1 × 1 convolution with a single kernel, to obtain the generated image g_img, as in the generator sketch below.
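A minimal Keras sketch of a generator built along these lines is shown below. It is an illustration rather than the exact network of the embodiment: the 512 × 512 single-channel input, the "same" padding, the sigmoid on the final 1 × 1 convolution, and the choice to concatenate each decoder stage with the encoder features of matching spatial resolution are assumptions made to keep the sketch runnable.

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(x, filters):
    # Two 3x3 convolutions with relu, then 2x2 max pooling (step S310).
    c = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    c = layers.Conv2D(filters, 3, padding="same", activation="relu")(c)
    return c, layers.MaxPooling2D(2)(c)

def decoder_block(x, skip, filters):
    # Deconvolution (kernel/stride 2), feature concatenation, two 3x3 convolutions (step S311).
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
    x = layers.Concatenate()([x, skip])
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_generator(img_size=512):
    inp = layers.Input((img_size, img_size, 1))
    c1, p1 = encoder_block(inp, 64)     # conv1, 64 kernels
    c2, p2 = encoder_block(p1, 128)     # conv2
    c3, p3 = encoder_block(p2, 256)     # conv3
    c4, p4 = encoder_block(p3, 512)     # conv4
    b = layers.Conv2D(1024, 3, padding="same", activation="relu")(p4)  # two 3x3 convs,
    b = layers.Conv2D(1024, 3, padding="same", activation="relu")(b)   # 1024 kernels
    t4 = decoder_block(b, c4, 512)      # tconv4
    t3 = decoder_block(t4, c3, 256)     # tconv3
    t2 = decoder_block(t3, c2, 128)     # tconv2
    t1 = decoder_block(t2, c1, 64)      # tconv1
    g_img = layers.Conv2D(1, 1, activation="sigmoid")(t1)  # final 1x1 convolution, 1 kernel
    return tf.keras.Model(inp, g_img, name="generator_unet")
```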
S312: training the discriminator D, with the parameters of the generator G fixed. The annotation image and the generated image g_img are used as positive and negative samples respectively. The corresponding labels do not directly use 1 and 0: the label of the annotation image uses a random value (true_label) in the interval [0.8, 1.2] and the label of the generated image uses a random value (false_label) in the interval [0, 0.3], giving the data set structure ((mask_img, g_img), (true_label, false_label)). Although true_label should in theory be 1, randomly drawing a number from [0.8, 1.2] as true_label improves the generalization of the network. After a batch of data is input, a convolution with kernel size 3 and 64 channels is performed, followed by batch normalization (BN) and activation with the LeakyReLU function. The output of the activation function is taken as the new input for the same processing, the number of convolution kernels is doubled each time, the stride of 2 is kept unchanged, and the process is repeated 4 times. The resulting feature map is flattened (flatten), fed into a fully connected layer Dense, and passed through a Sigmoid activation function to obtain the output of the discriminator D. The error function is binary_crossentropy and the optimization function is adam; the parameters of the discriminator D are updated while the parameters of the generator G remain unchanged. A sketch of such a discriminator and its labels follows.
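A corresponding sketch of the two-class discriminator and of the smoothed labels is given below, re-using the imports of the generator sketch above. Reading "repeated 4 times" as four Conv–BN–LeakyReLU blocks in total, and using stride-2 convolutions for the stated stride of 2, are assumptions.

```python
def build_discriminator(img_size=512):
    inp = layers.Input((img_size, img_size, 1))
    x = inp
    filters = 64
    for _ in range(4):
        # Conv (kernel 3, stride 2) -> batch normalization -> LeakyReLU,
        # with the number of kernels doubled in each block.
        x = layers.Conv2D(filters, 3, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU()(x)
        filters *= 2
    x = layers.Flatten()(x)                          # flatten the feature map
    out = layers.Dense(1, activation="sigmoid")(x)   # Dense layer + Sigmoid output
    return tf.keras.Model(inp, out, name="discriminator")

def smooth_labels(batch_size):
    # Smoothed labels from S312: true_label in [0.8, 1.2], false_label in [0, 0.3].
    true_label = tf.random.uniform((batch_size, 1), 0.8, 1.2)
    false_label = tf.random.uniform((batch_size, 1), 0.0, 0.3)
    return true_label, false_label
```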
The whole process is repeated k times, and data of one batch is input each time, so that the parameters of the discriminator D are updated k times.
In step S32, the parameters of the generator G are updated once: one batch of the data set (img2, true_label) is input to the generator G, where img2 is an original image.
Preferably, img2 is chosen to have the same data distribution as the images in img1, because the discriminator D, having just been trained on img1, works better on the same data distribution; this also conforms to the adversarial idea of GAN.
Preferably, the parameter update periods of the discriminator D and the generator G, and the numbers of images of img1 and img2 that they use, are also the same, so that the data can be fully utilized. In addition, the batch sizes input to G and D may also be made different.
In summary, img2 can be the same image as img1, which can be denoted img2 = img1, so that:
num(mask_img) = num(img1) = num(img2) = batch_size, (equation 4)
k × batch_size = num(mask_img) + num(g_img), (equation 5)
num(g_img) = (k − 1) × batch_size, (equation 6)
where num(·) denotes a number of images, num(mask_img) represents the number of annotation images, and batch_size represents the batch size.
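For example, taking batch_size = 4 and k = 3 in equations (4) to (6): num(mask_img) = num(img1) = num(img2) = 4, the k discriminator updates consume k × batch_size = 12 images in total, of which num(g_img) = (3 − 1) × 4 = 8 are generated images, and the single generator update that follows uses one batch of 4 images as img2.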
The generated image obtained from the original image img2 and the annotation image file are input to the discriminator D; after a convolution with kernel size 3 and 64 channels, BN is performed and the LeakyReLU function is used for activation. The output of the activation function is taken as the new input for the same processing, with the number of convolution kernels doubled each time, repeated 4 times. The resulting feature map is flattened (flatten), fed into the fully connected layer Dense, and passed through a Sigmoid activation function to obtain the output of the discriminator D. The error function is binary_crossentropy and the optimization function is adam; the parameters of the generator G are updated while the parameters of the discriminator D remain unchanged. A sketch of one such alternating update is given below.
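The alternating update can be sketched as below, building on the generator, discriminator, label and dataset sketches above. It is a simplified illustration: each discriminator step here uses one batch of annotation images and one batch of generated images rather than the exact 1:k ratio described earlier; the error function binary_crossentropy and the optimizer adam are those named in the text.

```python
bce = tf.keras.losses.BinaryCrossentropy()   # binary_crossentropy error function
d_opt = tf.keras.optimizers.Adam()           # adam optimizer for the discriminator D
g_opt = tf.keras.optimizers.Adam()           # adam optimizer for the generator G

def discriminator_step(generator, discriminator, img1, mask_img):
    # Step S31: update D with G fixed, mask_img as positive and g_img as negative samples.
    g_img = generator(img1, training=False)
    true_label, false_label = smooth_labels(BATCH_SIZE)
    with tf.GradientTape() as tape:
        d_real = discriminator(mask_img, training=True)
        d_fake = discriminator(g_img, training=True)
        d_loss = bce(true_label, d_real) + bce(false_label, d_fake)
    grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))

def generator_step(generator, discriminator, img2):
    # Step S32: update G once with D fixed; the gradient reaches G by
    # back-propagation through the discriminator D.
    true_label, _ = smooth_labels(BATCH_SIZE)
    with tf.GradientTape() as tape:
        g_img = generator(img2, training=True)
        d_out = discriminator(g_img, training=False)
        g_loss = bce(true_label, d_out)
    grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))

def train(generator, discriminator, dataset, epochs, k):
    for _ in range(epochs):
        it = iter(dataset)
        try:
            while True:
                for _ in range(k):                  # D parameters updated k times
                    img1, mask_img = next(it)
                    discriminator_step(generator, discriminator, img1, mask_img)
                img2, _ = next(it)                  # G parameters updated once
                generator_step(generator, discriminator, img2)
        except StopIteration:
            pass
```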
The epoch training is thus completed; the model training is finished through repeated iteration, giving the trained UNet network.
According to the invention, when the parameters of the generator G are updated, the data sets of each batch input to the generator G before and after the update keep the same data distribution, which not only simplifies the training of the deep network but also makes the training work better.
According to the present invention, by making the parameter update period of the discriminator D and of the generator G, and the numbers of images of img1 and img2 that they use, the same, the data can be utilized more fully.
The JPG file of the obtained original image is input into the UNet network trained within the GAN framework for segmentation, and the original image can then be segmented automatically.
That is, the overall framework for automatically segmenting an original image is: obtain an original image, convert it into JPG format as needed, input it into the neural network, optimize the UNet network with the GAN framework, and store the trained model; at segmentation time, input an original image, convert it into JPG format as needed, feed it to the previously stored network model, and the segmentation output is produced, as in the inference sketch below.
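A minimal inference sketch along these lines follows; the model path argument, the 512 × 512 input size and the 0.5 binarisation threshold are assumptions made for the illustration.

```python
import cv2
import tensorflow as tf

def segment_image(model_path, jpg_path, img_size=512, threshold=0.5):
    # Load the stored UNet model and segment one original JPG image.
    model = tf.keras.models.load_model(model_path)
    img = cv2.resize(cv2.imread(jpg_path, cv2.IMREAD_GRAYSCALE), (img_size, img_size))
    x = img.astype("float32")[None, ..., None] / 255.0
    prob = model.predict(x)[0, ..., 0]               # probability map from the generator
    return (prob > threshold).astype("uint8") * 255  # binarised segmentation mask
```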
< example: lower alveolar nerve segmentation scheme >
The embodiment of the invention provides a UNet + GAN-based lower alveolar nerve segmentation method that uses the GAN framework and comprises a generator G and a discriminator D. The generator G is used to fit the data distribution of the target image, and the discriminator D is used to distinguish whether a sample comes from the real data or from the generator G.
This embodiment uses UNet as the generator and a two-class CNN network as the discriminator, and trains the UNet network within the GAN framework; the network structure is shown in fig. 2.
< training procedure >
The training process used in this embodiment is divided into the following steps: step S11: data preprocessing; step S12: neural network construction; step S13: neural network training; step S14: verification; step S15: testing.
According to one embodiment, before the data preprocessing of step S11, the method further comprises S10: acquiring an oral CT image as the original image.
According to one embodiment, in the data preprocessing of step S11, the data format of the original image may be converted into a JPG image file as needed.
According to one embodiment, in the data preprocessing, the data required for model training include the oral CT image and an annotation image in which the position of the dental nerve is annotated on the CT image.
The input data of the generator G are oral CT images. The oral CT files used in the experiments are DICOM (Digital Imaging and Communications in Medicine) format files containing the three-dimensional information of the cross-sectional (axial), coronal and sagittal planes; the cross-section is selected as the experimental image and converted into a JPG file, as in the conversion sketch below.
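A sketch of such a conversion is shown below. The text only names the DICOM format and the OpenCV component; the use of the pydicom package as the DICOM reader, the assumption of one file per axial slice, and the min–max normalisation to 8-bit grey levels are choices made for this illustration.

```python
import os
import cv2
import numpy as np
import pydicom   # assumed DICOM reader; the text does not name a specific library

def dicom_slices_to_jpg(dicom_dir, out_dir):
    # Convert each axial (cross-sectional) DICOM slice of the oral CT into a JPG file.
    os.makedirs(out_dir, exist_ok=True)
    for name in sorted(os.listdir(dicom_dir)):
        ds = pydicom.dcmread(os.path.join(dicom_dir, name))
        pixels = ds.pixel_array.astype(np.float32)
        # Simple min-max scaling to 8-bit grey levels; CT windowing could be used instead.
        pixels = (pixels - pixels.min()) / max(float(pixels.max() - pixels.min()), 1e-6) * 255.0
        out_name = os.path.splitext(name)[0] + ".jpg"
        cv2.imwrite(os.path.join(out_dir, out_name), pixels.astype(np.uint8))
```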
The input data of the discriminator D differ depending on whether the parameters of the discriminator D or those of the generator G are being updated.
When updating the parameters of the discriminator D, the images generated by the generator G and randomly selected annotation images are input, structured as ((mask_img, g_img), (true_label, false_label)), where the ratio of the number of mask_img to the number of g_img is preferably 1:k, k being a hyper-parameter; when updating the parameters of the generator G, the input should be the images generated by the generator G, as (g_img, true_label). The annotation image needs to be resized so that it has the same size as the generated image.
According to one embodiment, the data are batched using the Dataset module of TensorFlow, with each batch of size batch_size; a shuffle operation shuffles the data, and data augmentation such as random small-angle tilting of the images is used to reduce overfitting.
In the neural network training of step S13, the GAN structure is introduced to train the UNet network in place of the gradient descent process of the original UNet network: using the two-player game concept of GAN, the parameter updates of UNet are not derived directly from the data samples but from the gradient back-propagated through the discriminator D.
In this way, a UNet + GAN-based neural network segmentation model is obtained by training the GAN with a training set and verifying the trained GAN with a test set until the objective function expressed in equation (1) above is satisfied.
As shown in fig. 2, mini-batch stochastic gradient descent is used to train the model during the training process. The method comprises: step S31', first updating the parameters of D k times; and step S32', then updating the parameters of the generator G once.
In step S31', k is a hyper-parameter and requires tuning. The CT image is first processed by the generator G to obtain a generated image g_img, and the specific steps are:
S310': extracting image features. A batch of CT images img1 is subjected to two 3 × 3 convolutions with 64 convolution kernels and one max pooling with a window size of 2, and activated with the relu activation function, to obtain conv1. With conv1 as input, the same processing is applied with the number of convolution kernels doubled each time, and conv2, conv3 and conv4 are obtained in sequence.
S311': reconstructing the image. With conv4 as input, two 3 × 3 convolutions with 1024 convolution kernels are performed, followed by a deconvolution with kernel size 2 to obtain tconv4; tconv4 and conv4 are feature-concatenated (concat), and the operation is repeated with the concatenated features as input, halving the number of convolution kernels each time, to obtain tconv3, tconv2 and tconv1 in sequence. With tconv1 as input, two 3 × 3 convolutions with 64 convolution kernels are performed, followed by one further 1 × 1 convolution with a single kernel, to obtain the generated image.
S312': training the discriminator D, with the parameters of the generator G fixed. The annotation image and the generated image are used as positive and negative samples; the corresponding labels do not directly use 1 and 0, the label of the annotation image being a random value (true_label) in the interval [0.8, 1.2] and the label of the generated image a random value (false_label) in the interval [0, 0.3], giving the data set structure ((mask_img, g_img), (true_label, false_label)). One batch of data is input; after a convolution with kernel size 3 and 64 channels, BN is performed and activation uses the LeakyReLU function. The output of the activation function is taken as the new input for the same processing, with the number of convolution kernels doubled each time, repeated 4 times. The resulting feature map is flattened (flatten), fed into the fully connected layer Dense, and the output of D is obtained through a Sigmoid activation function. The error function is binary_crossentropy and the optimization function is adam; the parameters of the discriminator D are updated while the parameters of the generator G remain unchanged.
The entire process is repeated k times, and data of one batch is input each time, thereby updating the parameters of the discriminator D k times.
In step S32', the parameters of the generator G are updated once. One batch of the data set (img2, true_label) is input into the generator G, img2 being a CT image.
Similarly, img2 should be selected to have the same data distribution as the images in img1; the discriminator D, trained on img1, works better on the same data distribution, which also fits the adversarial idea of GAN.
So that the data are fully utilized, the parameter update periods of the discriminator D and the generator G, and the numbers of images of img1 and img2 that they use, are preferably the same. In summary, img2 may be the same image as img1, in which case img1 = img2 and (equation 4) to (equation 6) above apply.
After the input data are convolved with kernel size 3 and 64 channels, BN is performed and the LeakyReLU function is used for activation. The output of the activation function is taken as the new input for the same processing, with the number of convolution kernels doubled each time, repeated 4 times. The result is fed into the fully connected layer, and the output of the discriminator D is obtained through a Sigmoid activation function. The error function is binary_crossentropy and the optimization function is adam; the parameters of the generator G are updated while the parameters of the discriminator D remain unchanged.
The epoch training is thus completed; the model training is finished through repeated iteration, giving the trained UNet network.
The dental nerves in the CT images are segmented using the trained UNet network, and partial experimental results are shown in fig. 3.
Exemplary electronic device
Fig. 4 shows the structure of an electronic device according to an exemplary embodiment of the present invention. The electronic device may be either or both of the first device and the second device, or a stand-alone device separate from them that can communicate with the first device and the second device to receive the acquired input signals from them. As shown in fig. 4, the electronic device 50 includes one or more processors 51 and a memory 52.
The processor 51 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
The memory 52 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disks, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 51 to implement the deep-learning-based image element segmentation methods of the various embodiments of the present disclosure described above and/or other desired functions. In one example, the electronic device may further include an input device 53 and an output device 54, which are interconnected by a bus system and/or another form of connection mechanism (not shown).
The input device 53 may also include, for example, a keyboard, a mouse, and the like.
The output device 54 may output various information to the outside, and may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto.
Of course, for simplicity, only some of the components of the electronic device relevant to the present disclosure are shown in fig. 4, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the deep learning based image element segmentation method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the method for deep learning based image element segmentation according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
For the device or system embodiments in this specification, since they correspond basically to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points.
The words "or" and "as used in this disclosure mean, and are used interchangeably with, the word" and/or, "unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure. The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
It will be understood by those skilled in the art that the terms "first", "second", "step Sn", etc. in the embodiments of the present invention are used only for distinguishing different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily indicate a logical order therebetween.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise. Unless expressly stated or indicated to the contrary, the numerical parameters set forth in this specification and attached claims are approximations that can vary depending upon the desired properties of the dimensions, such as body shape, of the subject matter to be treated by the present disclosure or the specific application, and any numerical range recited herein is intended to include all sub-ranges subsumed therein.
While the invention has been described with reference to various specific embodiments, it should be understood that changes can be made within the spirit and scope of the inventive concepts described. Accordingly, it is intended that the invention not be limited to the described embodiments, but that it will have the full scope defined by the language of the following claims.

Claims (12)

1. A training method of an image element segmentation model based on deep learning, characterized in that a UNet network is used as the generator G of a GAN network and a two-class CNN network is used as the discriminator D of the GAN network, the method comprising the following steps:
inputting a first image img1 into the generator G, extracting image features and reconstructing the image to obtain a generated image g_img,
training the discriminator D with the annotated image and the generated image g_img as positive and negative samples respectively, to obtain the output of the discriminator D,
updating the parameters of the discriminator D k times using the adam optimization function, where k is a hyper-parameter,
inputting a second image img2 having the same data distribution as the first image img1 into the generator G, and updating the parameters of the generator G once,
and completing training of the UNet network through repeated iteration, thereby determining the image element segmentation model.
2. The method of training an image element segmentation model according to claim 1,
the parameter updating periods of the discriminator D and the generator G are the same; and/or
The numbers of images of the first image img1 and the second image img2 used by the discriminator D and the generator G are the same.
3. The method of training an image element segmentation model according to claim 1,
the first image img1 and the second image img2 are the same image.
4. The method for training an image element segmentation model according to claim 1, wherein extracting the image features comprises:
performing two 3 × 3 convolutions on a batch of the first images img1 as input, with 64 convolution kernels, followed by max pooling with a window size of 2 and activation with the relu activation function, to obtain conv1, and performing the same processing with conv1 as input, doubling the number of convolution kernels each time, to obtain conv2, conv3 and conv4 in sequence.
5. The method of claim 4, wherein reconstructing the image comprises:
performing two 3 × 3 convolutions with conv4 as input, with 1024 convolution kernels, then performing a deconvolution with kernel size 2 to obtain tconv4, feature-concatenating (concat) tconv4 and conv4, and repeating the operation with the concatenated features as input, halving the number of convolution kernels each time, to obtain tconv3, tconv2 and tconv1 in sequence; and
performing two 3 × 3 convolutions with tconv1 as input, with 64 convolution kernels, and performing one further 1 × 1 convolution with a single kernel on the result, to obtain the generated image g_img.
6. The method for training an image element segmentation model according to claim 5, wherein training the discriminator D with the annotation image and the generated image g_img as positive and negative samples comprises:
the label of the annotation image uses a random value in the interval [0.8, 1.2], and the label of the generated image g_img uses a random value in the interval [0, 0.3].
7. The method according to claim 6, wherein training the discriminator D comprises:
inputting another batch of data, performing a convolution with kernel size 3 and 64 channels followed by batch normalization BN, activating with the LeakyReLU function, performing the same processing with the output of the activation function as the new input, doubling the number of convolution kernels each time while keeping the stride of 2 unchanged, repeating the processing 4 times, flattening (flatten) the resulting feature map and feeding it into a fully connected layer Dense, and obtaining the output of the discriminator D through a Sigmoid activation function.
8. An image element segmentation method based on deep learning, which is used for completing automatic segmentation of an original image by inputting the original image into a UNet network for segmentation,
wherein the UNet network is trained by using the training method of the image element segmentation model according to any one of claims 1 to 7.
9. The image element segmentation method according to claim 8, further comprising:
acquiring image data of an oral cavity CT image as an original image; and
a data preprocessing step for the original image, in which an annotated image is obtained by annotating the position of the dental nerve on the CT image,
wherein the cross-section information of the DICOM-format oral CT file is used as the image and converted into a JPG file.
10. An image element segmentation apparatus based on deep learning, which, by causing a computer to execute a program, constructs a neural network based on a generative adversarial network GAN in which a UNet network is used as the generator G and a two-class CNN network is used as the discriminator D,
the UNet network is configured to:
performing two 3 × 3 convolutions on a batch of first images img1 as input, with 64 convolution kernels, followed by max pooling with a window size of 2 and activation with the relu activation function, to obtain conv1, and performing the same processing with conv1 as input, doubling the number of convolution kernels each time, to obtain conv2, conv3 and conv4 in sequence;
performing two 3 × 3 convolutions with conv4 as input, with 1024 convolution kernels, then performing a deconvolution with kernel size 2 to obtain tconv4, feature-concatenating (concat) tconv4 and conv4, and repeating the operation with the concatenated features as input, halving the number of convolution kernels each time, to obtain tconv3, tconv2 and tconv1 in sequence; and
performing two 3 × 3 convolutions with tconv1 as input, with 64 convolution kernels, and performing one further 1 × 1 convolution with a single kernel on the result, to obtain the generated image g_img,
the discriminator D is configured to:
inputting another batch of data, performing a convolution with kernel size 3 and 64 channels followed by batch normalization BN, activating with the LeakyReLU function, performing the same processing with the output of the activation function as the new input, doubling the number of convolution kernels each time while keeping the stride of 2 unchanged, repeating the processing 4 times, flattening (flatten) the resulting feature map and feeding it into a fully connected layer Dense, and obtaining the output of the discriminator D through a Sigmoid activation function.
11. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the method of any of the preceding claims 1 to 9.
12. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing the processor-executable instructions;
the processor is used for reading the executable instructions from the memory and executing the instructions to realize the method of any one of the claims 1 to 9.
CN202210012675.2A 2022-01-06 2022-01-06 Image element and lower alveolar nerve segmentation method and device based on deep learning Pending CN114359296A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210012675.2A CN114359296A (en) 2022-01-06 2022-01-06 Image element and lower alveolar nerve segmentation method and device based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210012675.2A CN114359296A (en) 2022-01-06 2022-01-06 Image element and lower alveolar nerve segmentation method and device based on deep learning

Publications (1)

Publication Number Publication Date
CN114359296A true CN114359296A (en) 2022-04-15

Family

ID=81107298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210012675.2A Pending CN114359296A (en) 2022-01-06 2022-01-06 Image element and lower alveolar nerve segmentation method and device based on deep learning

Country Status (1)

Country Link
CN (1) CN114359296A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471663A (en) * 2022-11-15 2022-12-13 上海领健信息技术有限公司 Three-stage dental crown segmentation method, device, terminal and medium based on deep learning


Similar Documents

Publication Publication Date Title
US11568533B2 (en) Automated classification and taxonomy of 3D teeth data using deep learning methods
US20190088359A1 (en) System and Method for Automated Analysis in Medical Imaging Applications
US20210342947A1 (en) Computer vision-based assessment of insurance claims
Yaji et al. Artificial intelligence in dento-maxillofacial radiology
US11819347B2 (en) Dental imaging system utilizing artificial intelligence
JP2024019441A (en) Training method for specializing artificial intelligence model in deployed institution, apparatus performing the same
CN112365980A (en) Brain tumor multi-target point auxiliary diagnosis and prospective treatment evolution visualization method and system
AU2021408159A1 (en) Automatic annotation of condition features in medical images
Yuce et al. Detection of pulpal calcifications on bite-wing radiographs using deep learning
CN114359296A (en) Image element and lower alveolar nerve segmentation method and device based on deep learning
US20220415021A1 (en) Data augmentation for a machine learning method
CN111226287A (en) Method for analyzing a medical imaging dataset, system for analyzing a medical imaging dataset, computer program product and computer readable medium
Brahmi et al. Exploring the role of Convolutional Neural Networks (CNN) in dental radiography segmentation: A comprehensive Systematic Literature Review
Karacan et al. A deep learning model with attention mechanism for dental image segmentation
JP7443929B2 (en) Medical diagnosis support device, medical diagnosis support program, and medical diagnosis support method
AbuSalim et al. Analysis of Deep Learning Techniques for Dental Informatics: A Systematic Literature Review. Healthcare 2022, 10, 1892
RU2806982C1 (en) Device and method for analysis of medical images
RU2813938C1 (en) Device and method for determining boundaries of pathology on medical image
Musleh Machine learning framework for simulation of artifacts in paranasal sinuses diagnosis using CT images
US20230342928A1 (en) Detecting ischemic stroke mimic using deep learning-based analysis of medical images
US20240078668A1 (en) Dental imaging system utilizing artificial intelligence
US20220284542A1 (en) Semantically Altering Medical Images
Gancheva et al. Medical X-ray Image Classification Method Based on Convolutional Neural Networks
Kolsanov et al. Development of a software complex for the diagnosis of dentoalveolar anomalies using neural networks
Sanz Development of an Automated CBCT Nerve and Teeth Segmentation Tool Based on Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination