Background Art
Image engineering is broadly divided into three levels: image processing, image analysis, and image understanding. The goal of image segmentation is to classify each pixel in an image, i.e., to partition the picture into multiple small regions by exploiting either the discontinuity between the pixels of a target object and the surrounding pixels or the similarity of pixels within the target. Image segmentation is the most basic and also one of the most difficult parts of image engineering, and it is an important bridge between image processing and image analysis. Image segmentation mainly includes binary segmentation, semantic segmentation, and instance segmentation. Traditional image processing algorithms mostly study binary segmentation, which divides the pixels of a picture into two classes, foreground and background. Traditional segmentation algorithms mainly include edge detection (e.g., Canny), threshold segmentation (e.g., OTSU), region growing, clustering (e.g., k-means), and graph models (e.g., GrabCut).
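As an illustration of the threshold segmentation mentioned above, the following is a minimal sketch of OTSU thresholding used for binary (foreground/background) segmentation. It assumes an 8-bit grayscale image held in a NumPy array; the function name otsu_threshold and the toy image are hypothetical and are not taken from the application.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level t (0..255) that maximizes the between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of the "<= t" class
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                # thresholds that leave one class empty
    return int(np.argmax(sigma_b))

# Toy image (assumption): dark background (20) with a bright 4x4 square (200).
image = np.full((8, 8), 20, dtype=np.uint8)
image[2:6, 2:6] = 200

t = otsu_threshold(image)
mask = image > t   # True marks foreground pixels
```

On real images the histogram is rarely this cleanly bimodal, which is the difficulty in finding a suitable threshold automatically that is discussed among the deficiencies below.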
Data is now increasingly abundant, and many large-scale segmentation datasets have been made public (PASCAL VOC, Cityscapes, Places, DAVIS, etc.). The computing power of computers has further improved, and the capability of GPUs for image computation has drawn attention. The best-performing methods are currently those based on deep learning, such as FCN (Fully Convolutional Networks), DeepLab (a semantic segmentation network), U-Net (Convolutional Networks for Biomedical Image Segmentation, a network used to segment cell images), DRN (Deep Residual Network), and PSPNet (Pyramid Scene Parsing Network, a semantic segmentation network). Compared with traditional methods, the advantage of deep learning is that image features can be extracted automatically; for both semantic segmentation and instance segmentation an end-to-end network structure can be built, and the semantic information in a picture can be extracted to a certain extent. Moreover, because more training pictures are available, the trained models also have better generalization ability.
The article [Olaf Ronneberger, Philipp Fischer, Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015.] proposes the U-Net network structure, which divides the image segmentation network into two parts. As shown in Figure 1, the U-Net structure consists of a contracting path (left side) and an expanding path (right side). The contracting path follows the typical architecture of a convolutional network: it consists of the repeated application of two 3*3 unpadded convolutions followed by max pooling. Each down-sampling step doubles the number of channels. Each step in the expanding path up-samples the feature map, applies a 2*2 convolution that halves the number of feature channels, and stacks the result with the correspondingly cropped feature map from the contracting path. The input picture passes from left to right through successive convolution and pooling layers, so the extracted features carry increasingly abstract semantic information, and the down-sampling process gradually obtains the global semantic information of the whole picture. However, the resolution of the feature maps keeps decreasing, so the feature maps further to the right, although more abstract semantically, have lost much spatial detail information, which leads to unclear boundaries in the segmentation result. Up-sampling by itself can restore some spatial detail information, but this is far from enough, so during up-sampling U-Net also adds in feature maps from the left side that have higher resolution (higher resolution usually means more spatial detail information). The article copies these feature maps directly and crops them to the same size as the up-sampled feature map. Because the extracted features become more effective and more abstract as the number of convolutions increases, the up-sampled feature map, having undergone many convolutions, is relatively effective and abstract; it is then stacked with the less abstract but higher-resolution feature map from the left side, so that the feature map after up-sampling contains not only abstract semantic information but also more spatial detail information.
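The copy, crop, and stack operation described above can be sketched as follows. This is a minimal NumPy illustration rather than the original U-Net code: the helper names center_crop and crop_and_stack, the [C, H, W] layout, and the example shapes are assumptions chosen for the example.

```python
import numpy as np

def center_crop(feature, target_h, target_w):
    """Crop a [C, H, W] feature map around its center to [C, target_h, target_w]."""
    _, h, w = feature.shape
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return feature[:, top:top + target_h, left:left + target_w]

def crop_and_stack(encoder_feature, upsampled_feature):
    """Crop the contracting-path map to the up-sampled size, then stack along channels."""
    _, h, w = upsampled_feature.shape
    cropped = center_crop(encoder_feature, h, w)
    return np.concatenate([cropped, upsampled_feature], axis=0)

# Illustrative shapes only: the unpadded convolutions make the up-sampled map
# smaller than the contracting-path map it is stacked with.
encoder_feature = np.random.rand(64, 64, 64)
upsampled_feature = np.random.rand(64, 56, 56)

stacked = crop_and_stack(encoder_feature, upsampled_feature)
# stacked has 64 + 64 = 128 channels at the 56 x 56 resolution
```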
In conclusion deficiency existing for existing image partition method is:
1), traditional images partitioning algorithm
Edge detection is realized simply, but the edge found is not exclusively the edge of target object.Threshold segmentation is generally used
Do binary segmentation, but either whole figure or part all poorly automatically find a suitable threshold value.Region is raw
The general quality for relatively depending on initial point of the quality of long segmentation result.Cluster the line for removing to use image that can be relatively good
Reason, color and spatial information, but need one kind that can indicate image texture, the method for color and spatial information well.
The feature of hand-designed is not particularly suited for whole pictures.
2) Image segmentation based on deep learning
Although deep neural networks have far surpassed traditional segmentation methods, image segmentation with CNNs (Convolutional Neural Networks) still has some problems, such as reduced feature resolution, objects appearing at multiple scales, and poor modeling of global context information. Although U-Net uses a down-sampling/up-sampling approach, the picture loses too much spatial detail information during down-sampling. DRN uses dilated convolution to ensure that the resolution of the smallest feature map is not too small, but this only alleviates the problem of reduced feature resolution; when the resolution is not reduced, GPU memory cannot hold the feature maps of many images, and the diversity of the extracted image features decreases. PSPNet uses pyramid pooling to obtain global context information with good results, but the edges of the segmented image are still relatively coarse. Image segmentation therefore remains a very open field worthy of research.
Summary of the invention
The present application provides an image segmentation method, system, and electronic device based on deep learning, intended to solve, at least to some extent, one of the above technical problems in the prior art.
To solve the above problems, the present application provides the following technical solutions:
An image segmentation method based on deep learning, comprising the following steps:
Step a: normalizing an original image;
Step b: inputting the normalized image into a ResUNet network model, wherein the ResUNet network model extracts feature maps containing global semantic information from the input image and performs up-sampling and feature map stacking on the feature maps to obtain a final feature map;
Step c: classifying the up-sampled and stacked feature map pixel by pixel, and outputting an image segmentation result.
The technical solution adopted in the embodiments of the present application further includes: in step a, normalizing the original image specifically comprises normalizing the original image using the deep learning normalization method GN (Group Normalization).
The technical solution adopted in the embodiments of the present application further includes: in step b, the ResUNet network model is a network model combining ResNet and U-Net and includes 2N+1 Resblocks, N average pooling layers, N up-sampling layers, and 2(N+1) convolutional layers; the ResUNet network model is a fully convolutional network, and the size of the extracted feature maps changes from large to small to large.
The technical solution adopted in the embodiments of the present application further includes: the ResUNet network model includes a bottom-up down-sampling module and a top-down up-sampling module; during down-sampling, the feature maps extracted by the down-sampling module go from large to small, obtaining the global semantic information of the input image; the up-sampling module uses up-sampling to make the feature maps go from small to large and performs feature map stacking after each up-sampling, obtaining feature maps that combine high-resolution features with abstract low-resolution features.
Another technical solution adopted in the embodiments of the present application is an image segmentation system based on deep learning, comprising:
A normalization module, configured to normalize an original image;
A feature map extraction module, configured to input the normalized image into a ResUNet network model, wherein the ResUNet network model extracts feature maps containing global semantic information from the input image and performs up-sampling and feature map stacking on the feature maps to obtain a final feature map;
A result output module, configured to classify the up-sampled and stacked feature map pixel by pixel and output an image segmentation result.
The technical solution adopted in the embodiments of the present application further includes: the normalization module normalizing the original image specifically comprises normalizing the original image using the deep learning normalization method GN (Group Normalization).
The technical solution adopted in the embodiments of the present application further includes: the ResUNet network model is a network model combining ResNet and U-Net and includes 2N+1 Resblocks, N average pooling layers, N up-sampling layers, and 2(N+1) convolutional layers; the ResUNet network model is a fully convolutional network, and the size of the extracted feature maps changes from large to small to large.
The technical solution adopted in the embodiments of the present application further includes: the ResUNet network model includes a bottom-up down-sampling module and a top-down up-sampling module; during down-sampling, the feature maps extracted by the down-sampling module go from large to small, obtaining the global semantic information of the input image; the up-sampling module uses up-sampling to make the feature maps go from small to large and performs feature map stacking after each up-sampling, obtaining feature maps that combine high-resolution features with abstract low-resolution features.
A further technical solution adopted in the embodiments of the present application is an electronic device, comprising:
At least one processor; and
A memory communicatively connected to the at least one processor; wherein
The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the following operations of the above image segmentation method based on deep learning:
Step a: normalizing an original image;
Step b: inputting the normalized image into a ResUNet network model, wherein the ResUNet network model extracts feature maps containing global semantic information from the input image and performs up-sampling and feature map stacking on the feature maps to obtain a final feature map;
Step c: classifying the up-sampled and stacked feature map pixel by pixel, and outputting an image segmentation result.
Compared with the prior art, the beneficial effects of the embodiments of the present application are as follows: the image segmentation method, system, and electronic device based on deep learning of the embodiments of the present application perform image segmentation using a network model that combines ResNet and U-Net, which can retain the spatial detail information of the original image and can segment objects of different scales and shapes. Throughout the network structure, the present application uses Resblocks in place of ordinary convolutional layers, which solves the vanishing gradient problem common in convolutional neural networks and also makes the network easier to train, so that it converges to a better segmentation result. The present application uses Group Normalization in place of Batch Normalization, which effectively solves the problem that, when the batch size is very small because of GPU memory limits, the mean and variance required for normalization are computed inaccurately, and which allows the network to converge to a good result even with a very small batch size.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present application and are not intended to limit it.
Referring to Fig. 2, which is a flowchart of the image segmentation method based on deep learning of an embodiment of the present application. The image segmentation method based on deep learning of the embodiment of the present application comprises the following steps:
Step 100: normalizing the original image using the deep learning normalization method GN (Group Normalization);
In step 100, Batch Normalization (abbreviated BN) is a commonly used normalization method, but its performance is affected by the batch size. Limited by hardware, the batch size generally cannot be set very large, and too small a batch size makes the mean and variance required for normalization inaccurate, which degrades performance. Because BN normalizes along the batch dimension, when the batch size is very small the computed mean and variance cannot reflect the true distribution of the data, which leads to inconsistent parameters across the training, validation, and test phases. GN instead first reshapes the feature map from [N, C, H, W] to [N, G, C//G, H, W] and then normalizes over the [C//G, H, W] dimensions (N is the batch size, C is the number of channels, H is the feature map height, W is the feature map width, and G is the number of groups). Therefore, the normalization performance of GN is not affected even when the batch size is small, which makes the network easier to train.
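The reshape-then-normalize procedure of GN described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the function name group_norm and the value of eps are illustrative, and the learnable per-channel scale and shift that a full GN layer normally applies afterwards are omitted.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Group-normalize a feature map of shape [N, C, H, W].

    The map is reshaped to [N, G, C//G, H, W]; mean and variance are computed
    over the last three axes (C//G, H, W), separately for each sample and group.
    """
    n, c, h, w = x.shape
    assert c % num_groups == 0, "C must be divisible by the number of groups"
    grouped = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = grouped.mean(axis=(2, 3, 4), keepdims=True)
    var = grouped.var(axis=(2, 3, 4), keepdims=True)
    normalized = (grouped - mean) / np.sqrt(var + eps)
    return normalized.reshape(n, c, h, w)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(2, 8, 4, 4))  # N=2, C=8, H=W=4

y = group_norm(x, num_groups=4)
# Normalizing the first sample alone gives the same result as normalizing it
# inside the batch, because no statistic is computed across the N axis.
y_single = group_norm(x[:1], num_groups=4)
```

Because the statistics never span the batch axis, the output for a given image is the same whether the batch holds one image or many, which is why a small batch size does not degrade the normalization.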
Step 200: inputting the normalized image into the ResUNet network model, wherein the ResUNet network model uses the bottom-up down-sampling module to extract feature maps, going from large to small, that contain the global semantic information of the image, and uses the top-down up-sampling module to perform up-sampling and feature map stacking on the feature maps to obtain the final feature map;
In step 200, the present application uses a ResUNet network model that combines ResNet and U-Net. The ResUNet network model consists of 2N+1 Resblocks, N average pooling layers, N up-sampling layers, and 2(N+1) convolutional layers, and the number of ResNetblocks can be adjusted according to the specific dataset. Fig. 3 shows the structure of the preferred ResUNet network model of the embodiment of the present application, which includes 7 Resblocks, 3 average pooling layers, 3 up-sampling layers, and 8 convolutional layers. The whole network is a fully convolutional network and does not use fully connected layers to classify pixels, which both reduces the number of network parameters and reduces the time of the network's forward and backward passes. The feature maps of the whole network change from large to small to large: while the feature maps go from large to small, they contain increasingly abstract semantic information; while the feature maps go from small to large, they not only contain rich semantic information but, by means of up-sampling and stacking with lower-level feature maps of higher resolution, also contain sufficient spatial detail information, so that the image segmentation result can be restored to the same resolution as the input image.
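The layer counts stated above follow directly from N. The short sketch below evaluates the formulas for the preferred model with N = 3; the function name resunet_layer_counts and the dictionary keys are hypothetical and serve only to check the arithmetic.

```python
def resunet_layer_counts(n):
    """Layer counts of the ResUNet described in the text, as a function of N."""
    return {
        "resblocks": 2 * n + 1,
        "avg_pool_layers": n,
        "upsample_layers": n,
        "conv_layers": 2 * (n + 1),
    }

counts = resunet_layer_counts(3)
# N = 3 gives 7 Resblocks, 3 average pooling layers,
# 3 up-sampling layers, and 8 convolutional layers, matching Fig. 3.
```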
As shown in Fig. 3, the whole network is divided into two parts, a bottom-up down-sampling module and a top-down up-sampling module. The part of the network from the input to ResNetblock 4 is the bottom-up down-sampling module; during down-sampling, the feature maps go from large to small, so that they capture more global semantic information. Assuming the input image has a size of 512*512, after three successive average pooling operations the feature map input to ResNetblock 2 is 256*256, the feature map input to ResNetblock 3 is 128*128, and the feature map input to ResNetblock 4 is 64*64. The three average pooling operations make the feature maps smaller and smaller, while the successive convolutions make the number of channels larger and larger. The reduction in resolution makes the feature maps more effective and abstract, but it also loses much spatial detail information.
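The size progression in the down-sampling module can be traced with the sketch below. It assumes each average pooling layer uses a 2*2 window with stride 2, which is consistent with the halving from 512 to 256 to 128 to 64 described above but is not stated explicitly in the text; the function name avg_pool_2x2 is hypothetical, and a single channel is used to keep the example small.

```python
import numpy as np

def avg_pool_2x2(feature):
    """Average-pool a [C, H, W] feature map with a 2x2 window and stride 2."""
    c, h, w = feature.shape
    # Split H and W into (H/2, 2) and (W/2, 2), then average over each 2x2 window.
    return feature.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

feature = np.ones((1, 512, 512))
sizes = [feature.shape[1:]]
for _ in range(3):                      # three average pooling layers (N = 3)
    feature = avg_pool_2x2(feature)
    sizes.append(feature.shape[1:])
# sizes traces 512x512 -> 256x256 -> 128x128 -> 64x64
```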
The part of the network from ResNetblock 5 to the output is the top-down up-sampling module. During up-sampling, the feature maps go from small to large: after three up-sampling operations, the feature maps go from 64*64 → 128*128 → 256*256 → 512*512, and a feature map stacking is performed after each up-sampling. The detailed process is shown in Fig. 4, which is a schematic diagram of the up-sampling and feature map stacking of the embodiment of the present application. For example, the feature map after the last up-sampling is 512*512 and is stacked with the feature map output by ResNetblock 1; the resulting feature map contains both high-resolution features and abstract low-resolution features. The final feature map thus includes feature maps of different sizes from every level, retains sufficient spatial detail information, and helps segment objects of different scales, making the prediction result more accurate.
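One up-sampling and stacking step of the kind described above can be sketched as follows. The text does not specify the up-sampling method, so nearest-neighbor up-sampling by a factor of 2 is an assumption here, as are the helper names and the channel counts (32 and 16); the example shows the 64*64 to 128*128 step.

```python
import numpy as np

def upsample_nearest_2x(feature):
    """Double the height and width of a [C, H, W] map by repeating each pixel."""
    return feature.repeat(2, axis=1).repeat(2, axis=2)

def upsample_and_stack(low_res, skip):
    """Up-sample the deeper map, then stack it with the same-size shallower map."""
    up = upsample_nearest_2x(low_res)
    assert up.shape[1:] == skip.shape[1:], "spatial sizes must match before stacking"
    return np.concatenate([skip, up], axis=0)

# Hypothetical channel counts; spatial sizes follow the 64 -> 128 step.
deep = np.random.rand(32, 64, 64)     # abstract, low-resolution features
skip = np.random.rand(16, 128, 128)   # higher-resolution features from the down-sampling module

stacked = upsample_and_stack(deep, skip)
# stacked has 16 + 32 = 48 channels at 128 x 128
```

Repeating this step three times, each time stacking with the down-sampling-module output of matching size, brings the feature map back to the 512*512 input resolution.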
The present application uses ResNet blocks in place of ordinary convolutional layers, which solves the vanishing gradient problem common in convolutional neural networks. To guarantee the expressive power of the output features, the present application makes the number of channels of the ResNetblocks increase progressively from layer to layer, which allows the network width to be reduced, overcomes the vanishing gradient problem, and keeps the number of parameters under control, so that the network is easier to train and converges to a better segmentation result. Owing to the skip connections of the ResNetblock, the model does not tend to saturate even when the network is relatively deep, which allows ResNet to reach a lower converged loss.
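The skip connection that the paragraph above relies on can be illustrated with a simplified residual block. This is a sketch only: the internal layout of the Resblock is not given in the text, so the two 1*1 convolutions, the ReLU activations, and all names below are assumptions made to keep the example short.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1x1(x, weight):
    """1x1 convolution: mix channels at every pixel. x: [C_in, H, W], weight: [C_out, C_in]."""
    return np.einsum("oc,chw->ohw", weight, x)

def residual_block(x, w1, w2):
    """y = relu(x + F(x)), where F is conv -> relu -> conv (simplified)."""
    residual = conv1x1(relu(conv1x1(x, w1)), w2)
    return relu(x + residual)   # the skip connection adds the input back

x = np.random.rand(4, 8, 8)     # non-negative input, 4 channels
zero = np.zeros((4, 4))

# With all-zero weights the residual branch contributes nothing and the block
# passes its input through unchanged, so the identity path stays open however
# many blocks are chained.
y = residual_block(x, zero, zero)
```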
Step 300: classifying the up-sampled and stacked feature map pixel by pixel, and outputting the image segmentation result.
In step 300, unlike the classification approach of classical convolutional neural networks, the ResUNet network model of the embodiment of the present application does not use fully connected layers to obtain a fixed-length feature vector for classification. It can accept an input image of any size and uses up-sampling to restore the feature map to the same size as the input image, so that a prediction can be produced for each pixel while the spatial detail information of the original input image is retained.
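Pixel-by-pixel classification without a fully connected layer can be sketched as below. Using a 1*1 convolution followed by an arg-max over classes is an assumption made for the example, since the text does not name the final classifier; classify_pixels and the toy weights are likewise hypothetical.

```python
import numpy as np

def classify_pixels(feature, weight, bias):
    """Classify every pixel of a [C, H, W] feature map into one of K classes.

    weight: [K, C], bias: [K]. The same weights are applied at every pixel,
    so the function works for any H and W and returns an [H, W] label map.
    """
    logits = np.einsum("kc,chw->khw", weight, feature) + bias[:, None, None]
    return logits.argmax(axis=0)

# Toy two-class example: channel 0 votes for background, channel 1 for foreground.
feature = np.zeros((2, 4, 6))
feature[0] = 1.0
feature[1, 1:3, 2:5] = 2.0
weight = np.eye(2)
bias = np.zeros(2)

labels = classify_pixels(feature, weight, bias)   # shape (4, 6), one label per pixel
```

Because the classifier is applied independently at each spatial position, the label map always has the same height and width as the feature map it is given, whatever the input size.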
Referring to Fig. 5, which is a schematic structural diagram of the image segmentation system based on deep learning of an embodiment of the present application. The image segmentation system based on deep learning of the embodiment of the present application includes a normalization module, a feature map extraction module, and a result output module.
Normalization module: configured to normalize the original image using the deep learning normalization method GN. Batch Normalization (abbreviated BN) is a commonly used normalization method, but its performance is affected by the batch size. Limited by hardware, the batch size generally cannot be set very large, and too small a batch size makes the mean and variance required for normalization inaccurate, which degrades performance. Because BN normalizes along the batch dimension, when the batch size is very small the computed mean and variance cannot reflect the true distribution of the data, which leads to inconsistent parameters across the training, validation, and test phases. GN instead first reshapes the feature map from [N, C, H, W] to [N, G, C//G, H, W] and then normalizes over the [C//G, H, W] dimensions (N is the batch size, C is the number of channels, H is the feature map height, W is the feature map width, and G is the number of groups). Therefore, the normalization performance of GN is not affected even when the batch size is small, which makes the network easier to train.
Feature map extraction module: configured to input the normalized image into the ResUNet network model, wherein the ResUNet network model uses the bottom-up down-sampling module to extract feature maps, going from large to small, that contain the global semantic information of the image, and uses the top-down up-sampling module to perform up-sampling and feature map stacking on the feature maps to obtain the final feature map. The present application uses a ResUNet network model that combines ResNet and U-Net; the ResUNet network model consists of 2N+1 Resblocks, N average pooling layers, N up-sampling layers, and 2(N+1) convolutional layers, and the number of ResNetblocks can be adjusted according to the specific dataset.
The preferred ResUNet network model of the embodiment of the present application includes 7 Resblocks, 3 average pooling layers, 3 up-sampling layers, and 8 convolutional layers. The whole network is a fully convolutional network and does not use fully connected layers to classify pixels, which both reduces the number of network parameters and reduces the time of the network's forward and backward passes. The feature maps of the whole network change from large to small to large: while the feature maps go from large to small, they contain increasingly abstract semantic information; while the feature maps go from small to large, they not only contain rich semantic information but, by means of up-sampling and stacking with lower-level feature maps of higher resolution, also contain sufficient spatial detail information, so that the image segmentation result can be restored to the same resolution as the input image.
The whole network is divided into two parts, a bottom-up down-sampling module and a top-down up-sampling module. The part of the network from the input to ResNetblock 4 is the bottom-up down-sampling module; during down-sampling, the feature maps go from large to small, so that they capture more global semantic information. Assuming the input image has a size of 512*512, after three successive average pooling operations the feature map input to ResNetblock 2 is 256*256, the feature map input to ResNetblock 3 is 128*128, and the feature map input to ResNetblock 4 is 64*64. The three average pooling operations make the feature maps smaller and smaller, while the successive convolutions make the number of channels larger and larger. The reduction in resolution makes the feature maps more effective and abstract, but it also loses much spatial detail information.
The part of the network from ResNetblock 5 to the output is the top-down up-sampling module. During up-sampling, the feature maps go from small to large: after three up-sampling operations, the feature maps go from 64*64 → 128*128 → 256*256 → 512*512, and a feature map stacking is performed after each up-sampling. For example, the feature map after the last up-sampling is 512*512 and is stacked with the feature map output by ResNetblock 1; the resulting feature map contains both high-resolution features and abstract low-resolution features. The final feature map thus includes feature maps of different sizes from every level, retains sufficient spatial detail information, and helps segment objects of different scales, making the prediction result more accurate.
The present application uses ResNet blocks in place of ordinary convolutional layers, which solves the vanishing gradient problem common in convolutional neural networks. To guarantee the expressive power of the output features, the present application makes the number of channels of the ResNetblocks increase progressively from layer to layer, which allows the network width to be reduced, overcomes the vanishing gradient problem, and keeps the number of parameters under control, so that the network is easier to train and converges to a better segmentation result. Owing to the skip connections of the ResNetblock, the model does not tend to saturate even when the network is relatively deep, which allows ResNet to reach a lower converged loss.
Result output module: configured to classify the up-sampled and stacked feature map pixel by pixel and output the image segmentation result. The ResUNet network model of the embodiment of the present application does not use fully connected layers to obtain a fixed-length feature vector for classification. It can accept an input image of any size and uses up-sampling to restore the feature map to the same size as the input image, so that a prediction can be produced for each pixel while the spatial detail information of the original input image is retained.
Fig. 6 is a schematic structural diagram of the hardware device for the image segmentation method based on deep learning provided by an embodiment of the present application. As shown in Fig. 6, the device includes one or more processors and a memory. Taking one processor as an example, the device may further include an input system and an output system.
The processor, memory, input system, and output system may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 6.
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules. By running the non-transitory software programs, instructions, and modules stored in the memory, the processor executes the various functional applications and data processing of the electronic device, i.e., implements the processing method of the above method embodiments.
The memory may include a program storage area and a data storage area, wherein the program storage area can store an operating system and an application program required by at least one function, and the data storage area can store data and the like. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory can be connected to the processing system through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The input system can receive input numeric or character information and generate signal input. The output system may include a display device such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following operations of any of the above method embodiments:
Step a: normalizing an original image;
Step b: inputting the normalized image into a ResUNet network model, wherein the ResUNet network model extracts feature maps containing global semantic information from the input image and performs up-sampling and feature map stacking on the feature maps to obtain a final feature map;
Step c: classifying the up-sampled and stacked feature map pixel by pixel, and outputting an image segmentation result.
The above product can perform the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
An embodiment of the present application provides a non-transitory (non-volatile) computer storage medium storing computer-executable instructions, and the computer-executable instructions can perform the following operations:
Step a: normalizing an original image;
Step b: inputting the normalized image into a ResUNet network model, wherein the ResUNet network model extracts feature maps containing global semantic information from the input image and performs up-sampling and feature map stacking on the feature maps to obtain a final feature map;
Step c: classifying the up-sampled and stacked feature map pixel by pixel, and outputting an image segmentation result.
An embodiment of the present application provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the following operations:
Step a: normalizing an original image;
Step b: inputting the normalized image into a ResUNet network model, wherein the ResUNet network model extracts feature maps containing global semantic information from the input image and performs up-sampling and feature map stacking on the feature maps to obtain a final feature map;
Step c: classifying the up-sampled and stacked feature map pixel by pixel, and outputting an image segmentation result.
The image segmentation method, system, and electronic device based on deep learning of the embodiments of the present application perform image segmentation using a network model that combines ResNet and U-Net, which can retain the spatial detail information of the original image and can segment objects of different scales and shapes. Throughout the network structure, the present application uses Resblocks in place of ordinary convolutional layers, which solves the vanishing gradient problem common in convolutional neural networks and also makes the network easier to train, so that it converges to a better segmentation result. The present application uses Group Normalization in place of Batch Normalization, which effectively solves the problem that, when the batch size is very small because of GPU memory limits, the mean and variance required for normalization are computed inaccurately, and which allows the network to converge to a good result even with a very small batch size.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.