CN108765425A - Image segmentation method, apparatus, computer device and storage medium - Google Patents
Image segmentation method, apparatus, computer device and storage medium Download PDFInfo
- Publication number
- CN108765425A (Application No. CN201810463609.0A)
- Authority
- CN
- China
- Prior art keywords
- superpixel
- context
- feature map
- pixel region
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
This application relates to an image segmentation method, apparatus, computer device and storage medium. The method includes: obtaining an image to be segmented; inputting the image to be segmented into the input of a fully convolutional neural network and outputting a convolutional feature map; inputting the convolutional feature map into the input of a context-switchable neural network and outputting context representation information; and generating an intermediate feature map from the convolutional feature map and the context representation information, the intermediate feature map being used for image segmentation. By combining the convolutional neural network's downscaling and stacking of the image to compute image-semantic features, and using the context representation information generated by the context-switchable neural network for image segmentation, the accuracy of image segmentation can be improved.
Description
Technical field
This application relates to the field of image processing technology, and in particular to an image segmentation method, apparatus, computer device and storage medium.
Background technology
With the development of image processing technology, image segmentation has become an important part of the field, and machine learning plays an important role in it. Depth data can provide geometric information about an image and contains a large amount of information that is useful for segmentation. By encoding a depth image as an image with three different channels, and then using the color image and the encoded image as inputs to train a convolutional neural network that computes segmentation features, segmentation of the image can be achieved.
However, in this current approach of realizing image segmentation with convolutional neural networks, the output of the network loses most of the segmentation-relevant information contained in the depth data, so segmentation accuracy is poor.
Summary of the invention
In view of the above technical problems, it is necessary to provide an image segmentation method, apparatus, computer device and storage medium that can improve the accuracy of image segmentation.
An image segmentation method, the method comprising:
obtaining an image to be segmented;
inputting the image to be segmented into the input of a fully convolutional neural network, and outputting a convolutional feature map;
inputting the convolutional feature map into the input of a context-switchable neural network, and outputting context representation information;
generating an intermediate feature map from the convolutional feature map and the context representation information, the intermediate feature map being used for image segmentation.
In one embodiment, inputting the convolutional feature map into the input of the context-switchable neural network and outputting context representation information includes:
dividing the convolutional feature map into superpixel regions, each superpixel region being a subregion of the convolutional feature map;
generating a local feature map from the superpixel regions.
In one embodiment, inputting the convolutional feature map into the input of the context-switchable neural network and outputting context representation information further includes:
calculating the average depth value of the superpixel region;
generating the context representation information corresponding to the superpixel region according to the average depth value.
In one embodiment, generating the context representation information corresponding to the superpixel region according to the average depth value includes:
comparing the average depth value with a condition depth value;
compressing the superpixel region when the average depth value is less than the condition depth value;
extending the superpixel region when the average depth value is greater than or equal to the condition depth value.
In one embodiment, compressing the superpixel region when the average depth value is less than the condition depth value includes:
inputting the local feature map corresponding to the superpixel region into three preset convolutional neural networks for processing, to obtain the compressed superpixel region;
wherein the three convolutional neural networks comprise two networks whose convolution kernel is 1 and one network whose convolution kernel is 3.
In one embodiment, extending the superpixel region when the average depth value is greater than or equal to the condition depth value includes:
inputting the local feature map corresponding to the superpixel region into three preset convolutional neural networks for processing, to obtain the extended superpixel region;
wherein the three convolutional neural networks comprise two networks whose convolution kernel is 7 and one network whose convolution kernel is 1.
In one embodiment, the context-switchable neural network is trained in the following way:
an input-layer node sequence is obtained from the convolutional feature map and the classes of the convolutional feature map, and the input-layer node sequence is projected to obtain the hidden-node sequence of the first hidden layer, which is taken as the current hidden layer;
the hidden-node sequence of the next hidden layer is obtained by a nonlinear mapping of the current hidden layer's hidden-node sequence using the weights and biases of the current hidden layer's neuron nodes, and the next hidden layer is taken as the current hidden layer; this step of obtaining the next hidden layer's hidden-node sequence by nonlinear mapping is repeated until the output layer is reached, yielding the context-representation-information probability matrix, corresponding to the classes of the convolutional feature map, output by the output layer.
An image segmentation apparatus, the apparatus comprising:
an image acquisition module for obtaining an image to be segmented;
a feature map output module for inputting the image to be segmented into the input of a fully convolutional neural network and outputting a convolutional feature map;
an information output module for inputting the convolutional feature map into the input of a context-switchable neural network and outputting context representation information;
a feature map generation module for generating an intermediate feature map from the convolutional feature map and the context representation information, the intermediate feature map being used for image segmentation.
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the following steps:
obtaining an image to be segmented;
inputting the image to be segmented into the input of a fully convolutional neural network, and outputting a convolutional feature map;
inputting the convolutional feature map into the input of a context-switchable neural network, and outputting context representation information;
generating an intermediate feature map from the convolutional feature map and the context representation information, the intermediate feature map being used for image segmentation.
A computer-readable storage medium on which a computer program is stored, the computer program implementing the following steps when executed by a processor:
obtaining an image to be segmented;
inputting the image to be segmented into the input of a fully convolutional neural network, and outputting a convolutional feature map;
inputting the convolutional feature map into the input of a context-switchable neural network, and outputting context representation information;
generating an intermediate feature map from the convolutional feature map and the context representation information, the intermediate feature map being used for image segmentation.
With the above image segmentation method, apparatus, computer device and storage medium, an image to be segmented is obtained and input into a fully convolutional neural network, which outputs a convolutional feature map; the convolutional feature map is input into a context-switchable neural network, which outputs context representation information; and an intermediate feature map, used for image segmentation, is generated from the convolutional feature map and the context representation information. By combining the convolutional neural network's downscaling and stacking of the image to compute image-semantic features, and using the context representation information generated by the context-switchable neural network for segmentation, the accuracy of image segmentation can be improved.
Description of the drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is an internal structure diagram of a computer device in one embodiment;
Fig. 2 is a flow diagram of the image segmentation method in one embodiment;
Fig. 3 is a flow diagram of the method for generating a local feature map in one embodiment;
Fig. 4 is a flow diagram of the method for generating context representation information in one embodiment;
Fig. 5 is a flow diagram of the method for processing a superpixel region in one embodiment;
Fig. 6 is a schematic diagram of the compression architecture in one embodiment;
Fig. 7 is a schematic diagram of the extension architecture in one embodiment;
Fig. 8 is a system architecture diagram of the context-switchable neural network in one embodiment;
Fig. 9 is a diagram of the local feature maps corresponding to the average depth values of superpixel regions in one embodiment;
Fig. 10 is a structural block diagram of the image segmentation apparatus in one embodiment;
Fig. 11 is a structural block diagram of the information output module in one embodiment;
Fig. 12 is a schematic comparison of different segmentation methods on images from the NYUDv2 dataset in experiments;
Fig. 13 is a schematic comparison of different segmentation methods on images from the SUN-RGBD dataset in experiments.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present application clearer, the application is further elaborated below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the application, not to limit it.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the application. The terms used in the description are only for the purpose of describing specific embodiments and are not intended to limit the invention. The technical features of the above embodiments can be combined arbitrarily; for brevity, not all possible combinations of these features are described, but as long as a combination of technical features contains no contradiction, it is considered to be within the scope recorded by this specification.
As shown in Fig. 1, which is a schematic diagram of the internal structure of a computer device in one embodiment, the computer device may be a terminal or a server. The terminal may be an electronic device with a communication function, such as a smartphone, tablet computer, laptop, desktop computer, personal digital assistant, wearable device or in-vehicle device; the server may be a stand-alone server or a server cluster. Referring to Fig. 1, the computer device includes a processor, a non-volatile storage medium, an internal memory and a network interface connected by a system bus. The non-volatile storage medium of the computer device can store an operating system and a computer program which, when executed, causes the processor to perform an image processing method. The processor of the computer device provides computing and control capability and supports the operation of the entire device. The internal memory can store an operating system, a computer program and a database; when this computer program is executed by the processor, it causes the processor to perform an image processing method. The network interface of the computer device is used for network communication.
Those skilled in the art will understand that the structure shown in Fig. 1 is only a block diagram of the parts relevant to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, as shown in Fig. 2, an image segmentation method is provided. Taking its application to the computer device in Fig. 1 as an example, the method includes the following steps.
Step 202: obtain an image to be segmented.
An image is a depiction of an objective object and a common information carrier; it contains relevant information about the object it describes. For example, an image may be a photo containing different objects captured by a camera, or an information carrier containing information about different objects synthesized by computer software.
Image segmentation refers to dividing an image into multiple specific regions with unique properties. The image to be segmented may be acquired in real time or stored in advance; for example, it may be acquired in real time by a camera, or stored in a database beforehand and later retrieved from the database.
Step 204: input the image to be segmented into the input of a fully convolutional neural network, and output a convolutional feature map.
A fully convolutional network (FCN) is a pre-trained neural network model that can be used for image segmentation. An FCN for image segmentation can recover, from abstract features, the class to which each pixel belongs, extending classification from the image level to the pixel level. The convolutional network model may include convolutional layers and pooling layers. A convolutional layer contains convolution kernels, which are weight matrices for extracting features; setting the stride of the convolution operation can reduce the number of weights. A pooling layer, also called a down-sampling layer, can reduce the dimensionality of the matrices.
The image to be segmented serves as the input data of the pre-trained fully convolutional network. It is input to the convolutional layers, where the convolution kernels scan the input image from front to back with the corresponding stride and kernel size and perform the convolution operation. The convolution processing is followed by pooling, which effectively reduces the dimensionality. The fully convolutional network outputs the convolutional feature map obtained after the convolutional and pooling layers.
Step 206: input the convolutional feature map into the input of a context-switchable neural network, and output context representation information.
The context-switchable neural network is trained in advance on image structure and depth data. It is a kind of fully convolutional network and likewise contains convolutional and pooling layers. Context representation information refers to some or all of the information that influences the objects in an image.
After the convolutional feature map is input into the context-switchable neural network, the network's convolutional and pooling layers process the feature map to obtain its context representation information.
Step 208: generate an intermediate feature map from the convolutional feature map and the context representation information; the intermediate feature map is used for image segmentation.
The intermediate feature maps may be feature maps of multiple different resolutions, ordered from low resolution to high resolution. An intermediate feature map can be generated from the convolutional feature map and the context representation information; the specific formula can be expressed as F_{l+1} = M_{l+1} + D_{l→(l+1)}, l = 0, …, L−1, where L denotes the number of intermediate feature maps, F_{l+1} denotes the generated intermediate feature map, F_1 denotes the intermediate feature map with the lowest resolution and F_L the one with the highest resolution, M denotes the convolutional feature map output by the fully convolutional network, and D_{l→(l+1)} denotes the context representation information corresponding to the convolutional feature map M. The generated intermediate feature maps are used for image segmentation.
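The fusion F_{l+1} = M_{l+1} + D_{l→(l+1)} can be illustrated with NumPy. The nearest-neighbour upsampling used here to carry the coarser context up one resolution level is an assumption of this sketch, not a detail stated in the embodiment:

```python
import numpy as np

def nearest_upsample(x, factor=2):
    """Nearest-neighbour upsampling of a 2-D map."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def fuse(conv_maps, context_low):
    """F_{l+1} = M_{l+1} + D_{l->(l+1)}: add the context information
    propagated from the coarser level to the next conv feature map.
    The propagation is a hypothetical nearest-neighbour upsample."""
    fused = [conv_maps[0] + context_low]
    for m in conv_maps[1:]:
        fused.append(m + nearest_upsample(fused[-1]))
    return fused

M = [np.ones((2, 2)), np.ones((4, 4)), np.ones((8, 8))]  # low -> high resolution
F = fuse(M, np.zeros((2, 2)))
```

The resulting list runs from the lowest-resolution map F_1 to the highest-resolution map F_L, each level adding the convolutional feature map to the context carried up from below.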
In this way, the image to be segmented is obtained and input into the fully convolutional neural network, which outputs a convolutional feature map; the convolutional feature map is input into the context-switchable neural network, which outputs context representation information; and an intermediate feature map used for image segmentation is generated from the convolutional feature map and the context representation information. By combining the convolutional network's downscaling and stacking of the image to compute image-semantic features, and using the context representation information generated by the context-switchable network for segmentation, the accuracy of image segmentation can be improved.
As shown in Fig. 3, in one embodiment, the image segmentation method may also include a process of generating a local feature map, with the following steps.
Step 302: divide the convolutional feature map into superpixel regions; a superpixel region is a subregion of the convolutional feature map.
Superpixel segmentation turns a pixel-level image into a district-level one; a superpixel algorithm can be used to divide the image into superpixel regions. After superpixel segmentation, many regions of varying size are obtained, and these regions contain effective information such as color histograms and texture information. For example, if there is a person in the image, we can apply superpixel segmentation to the image of this person, then extract features from each small region to identify which part of the body (head, shoulder, leg) each region is, and thereby build up a jointed model of the human body.
After the convolutional feature map is divided into superpixel regions, multiple superpixel regions are obtained. These regions do not overlap, and each is a subregion of the convolutional feature map.
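A minimal stand-in for the superpixel division can be sketched as a regular grid labelling. A real implementation would use a superpixel algorithm such as SLIC; the grid here only demonstrates the non-overlapping subregions that step 302 produces:

```python
import numpy as np

def grid_superpixels(h, w, grid=2):
    """Assign each pixel a superpixel label on a regular grid x grid layout.
    (A hypothetical stand-in for a real superpixel algorithm such as SLIC.)"""
    rows = np.minimum(np.arange(h) * grid // h, grid - 1)
    cols = np.minimum(np.arange(w) * grid // w, grid - 1)
    return rows[:, None] * grid + cols[None, :]

labels = grid_superpixels(4, 4, grid=2)  # 4 non-overlapping regions, labels 0..3
```

Every pixel receives exactly one label, so the regions partition the feature map without overlap, as the embodiment requires.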
Step 304: generate a local feature map from the superpixel regions.
Each superpixel region corresponds to a local feature map. Denote a superpixel region by S_n, and let region r_i denote a receptive field in the image to be segmented. The receptive field is the size of the visually perceived area: in a convolutional neural network, the receptive field of a position on an output feature map of a layer is the area of the original image that this position maps back to. Let φ(S_n) denote the set of receptive-field centres falling in the superpixel region, and H(·) the local feature map; the generating formula of the local feature map can then be expressed as H(S_n) = {D_l(r_i) | r_i ∈ φ(S_n)}, i.e. the local feature map collects, for every receptive field r_i whose centre lies in S_n, the feature of the intermediate feature map at r_i. It can be seen from the formula that for a region r_i the generated local feature map contains the features of the intermediate feature map, so the generated local feature map retains the content of the original region r_i.
By dividing the convolutional feature map into superpixel regions, each a subregion of the feature map, and then generating local feature maps from the superpixel regions, the content of the original regions is preserved, making the image segmentation more accurate.
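Gathering the features whose receptive-field centres fall inside a superpixel region S_n can be sketched as a boolean-mask lookup. The 4×4 map and the label layout below are illustrative, not taken from the embodiment:

```python
import numpy as np

def local_feature_map(feat, labels, n):
    """H(S_n): gather the feature vectors of the positions (receptive-field
    centres) belonging to superpixel region n; the region's content is kept."""
    mask = labels == n
    return feat[mask]  # shape (num_positions, channels)

feat = np.arange(32, dtype=float).reshape(4, 4, 2)  # 4x4 map, 2 channels
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [2, 2, 3, 3]])
H0 = local_feature_map(feat, labels, 0)  # features of region S_0
```

The lookup copies the original feature vectors unchanged, which is the sense in which the local feature map "retains the content of the original region".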
In one embodiment, as shown in Fig. 4, the image segmentation method may also include a process of generating context representation information, with the following steps.
Step 402: calculate the average depth value of the superpixel region.
The gray value of each pixel in a depth image characterizes the distance between a point in the scene and the camera, and the depth value is exactly that distance. Multiple objects may coexist in a superpixel region; by obtaining the depth value of each object in the region, the average depth value of the entire superpixel region can be calculated from the depth values of the objects.
Step 404: generate the context representation information corresponding to the superpixel region according to the average depth value.
The depth value is important data for generating context representation information: the context representation information corresponding to each superpixel region is generated from that region's average depth value.
By calculating the average depth value of a superpixel region and generating the corresponding context representation information from it, the generated context representation information becomes more accurate, which improves the accuracy of image segmentation.
As shown in Fig. 5, in one embodiment, the image segmentation method may also include a process of handling the superpixel regions, with the following steps.
Step 502: compare the average depth value with a condition depth value.
The condition depth value may be a preset specific value. After calculating the average depth value, the computer device compares it with the condition depth value.
Step 504: compress the superpixel region when the average depth value is less than the condition depth value.
When the average depth value is less than the condition depth value, the superpixel region carries a large amount of information, and the compression architecture is needed to refine the information in the region and reduce its overly diverse information. The compression architecture learns to reweight the corresponding superpixel region; the formula for compressing the superpixel region is D̂_l(r_j) = c(D_l(r_j)) ⊗ D_l(r_j), where r_j denotes a superpixel region whose average depth value is less than the condition depth value, c denotes the compression architecture, D_l denotes the structure feature map, ⊗ denotes elementwise multiplication of matrices, and D̂_l(r_j) denotes the superpixel region after compression by the compression architecture.
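The reweighting D̂_l(r_j) = c(D_l(r_j)) ⊗ D_l(r_j) can be sketched with a hypothetical sigmoid gate standing in for the learned compression architecture c; the gate weights and feature values are illustrative, not the embodiment's trained parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def compress(feat, w):
    """Compression sketch: D_hat = c(D) (x) D. A per-position sigmoid gate
    (hypothetical choice for c) reweights the feature map elementwise,
    suppressing overly diverse information."""
    gate = sigmoid(feat @ w)       # one scalar weight per spatial position
    return feat * gate[..., None]  # elementwise reweighting of each channel

feat = np.ones((2, 2, 3))  # illustrative local feature map
w = np.zeros(3)            # gate = sigmoid(0) = 0.5 everywhere
out = compress(feat, w)
```

With zero gate weights every position is scaled by 0.5, making the elementwise nature of the ⊗ operation easy to verify.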
Step 506: extend the superpixel region when the average depth value is greater than or equal to the condition depth value.
When the average depth value is greater than or equal to the condition depth value, the superpixel region carries little information, and the extension architecture is needed to enrich the information in the region. The formula for extending the superpixel region is D̂_l(r_j) = ε(D_l(r_j)), where ε denotes the extension architecture and D̂_l(r_j) denotes the superpixel region after extension by the extension architecture.
By comparing the average depth value with the condition depth value, the superpixel region is compressed when the average depth value is less than the condition depth value and extended when the average depth value is greater than or equal to it. Selecting the compression or extension architecture to process the superpixel region according to the size of the average depth value can improve the accuracy of image segmentation.
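The depth-based switching of steps 502–506 can be sketched as follows; the `compress` and `extend` bodies here are placeholders for the two architectures described above, not their actual implementations:

```python
import numpy as np

def compress(feat):
    # Stand-in for the compression architecture (refines the region).
    return feat * 0.5

def extend(feat):
    # Stand-in for the extension architecture (enriches the region).
    return feat * 2.0

def switch(avg_depth, cond_depth, feat):
    """Indicator-style switch: compress when the region's average depth
    is below the condition depth value, otherwise extend."""
    return compress(feat) if avg_depth < cond_depth else extend(feat)

out_near = switch(1.0, 2.0, np.ones((2, 2)))  # near region -> compressed
out_far = switch(3.0, 2.0, np.ones((2, 2)))   # far region  -> extended
```

The comparison plays the role of the indicator function: only one of the two paths runs for any given region.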
In one embodiment, the context representation information generated by the context-switchable neural network can be expressed by a formula that, for adjacent superpixel regions S_n and S_m, transmits the top-down information of a receptive-field region r_j in S_m to a receptive-field region r_i, where ε denotes the extension architecture, c denotes the compression architecture, and d(S_n) denotes the average depth of superpixel region S_n. An indicator function 1[d(S_n) < d(S_m)] switches between the extension architecture and the compression architecture: when d(S_n) < d(S_m), the compression architecture is selected to refine the information of receptive-field region r_i; when d(S_n) > d(S_m), the extension architecture is selected to enrich the information of receptive-field region r_i.
In one embodiment, as shown in Fig. 6, the image segmentation method may also include a process of compressing the superpixel region, specifically: the local feature map corresponding to the superpixel region is input into three preset convolutional neural networks for processing to obtain the compressed superpixel region, the three networks comprising two whose convolution kernel is 1 and one whose convolution kernel is 3.
As shown in Fig. 6, the local feature map 610 is the input to the compression architecture, which consists of a first 1×1 convolutional layer 620, a 3×3 convolutional layer 630 and a second 1×1 convolutional layer 640. The first 1×1 convolutional layer 620 and the second 1×1 convolutional layer 640 are convolutional layers with kernel 1, and the 3×3 convolutional layer 630 is a convolutional layer with kernel 3.
After the local feature map 610 enters the compression architecture, it is first processed by the first 1×1 convolutional layer 620, which halves the dimensionality of the local feature map 610. This halving filters out useless information in the local feature map 610 while retaining its useful information. After the dimensionality reduction, the 3×3 convolutional layer 630 restores the dimensionality, rebuilding the original dimension; the second 1×1 convolutional layer then generates the reweighting vector c(D_l(r_j)), from which the compressed superpixel region is generated.
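The 1×1 → 3×3 → 1×1 shape bookkeeping of the compression architecture (halve the channels, process spatially, restore the channels) can be sketched as follows; the weights are arbitrary illustrative constants and the 3×3 kernel is shared across channels as a simplification:

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 convolution is a per-position matrix multiply over channels."""
    return x @ w

def conv3x3_same(x, k):
    """Naive 3x3 'same' convolution with one spatial kernel shared across
    channels (a simplification of a real 3x3 conv layer)."""
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.tensordot(xp[i:i+3, j:j+3], k, axes=([0, 1], [0, 1]))
    return out

x = np.ones((5, 5, 8))           # hypothetical local feature map, C = 8
w1 = np.full((8, 4), 0.125)      # first 1x1 conv: halve the channels
k = np.full((3, 3), 1.0 / 9.0)   # 3x3 conv processing the reduced map
w3 = np.full((4, 8), 0.25)       # second 1x1 conv: restore the channels
y = conv1x1(conv3x3_same(conv1x1(x, w1), k), w3)
```

The channel count follows the text: 8 → 4 after the first 1×1 layer, spatial processing at 4 channels, then 4 → 8 so the output matches the input dimension.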
As shown in Fig. 7, in one embodiment, the image segmentation method may also include a process of extending the superpixel region, specifically: the local feature map corresponding to the superpixel region is input into three preset convolutional neural networks for processing to obtain the extended superpixel region, the three networks comprising two whose convolution kernel is 7 and one whose convolution kernel is 1.
As shown in Fig. 7, the extension architecture consists of a first 7×7 convolutional layer 720, a 1×1 convolutional layer 730 and a second 7×7 convolutional layer 740. After the local feature map 710 enters the extension architecture, the first 7×7 convolutional layer 720 uses its larger kernel to enlarge the receptive field and learn the relevant context representation information. The 1×1 convolutional layer 730 halves the dimensionality, removing the redundancy introduced by the large kernel of the first 7×7 convolutional layer 720. The second 7×7 convolutional layer 740 restores the dimensionality, so that ε(D_l(r_j)) matches the dimension of D_l(r_j).
In one embodiment, the context-switchable neural network is trained in the following way: an input layer node sequence is obtained according to the convolution feature map and the category of the convolution feature map; the input layer node sequence is projected to obtain the hidden layer node sequence corresponding to the first hidden layer, and the first hidden layer is taken as the currently processed hidden layer.
According to the hidden layer node sequence corresponding to the currently processed hidden layer and the weights and biases of the neuron nodes corresponding to the currently processed hidden layer, the hidden layer node sequence of the next hidden layer is obtained using a nonlinear mapping; the next hidden layer is then taken as the currently processed hidden layer, and this step of obtaining the next hidden layer's node sequence from the current hidden layer's node sequence, weights and biases using a nonlinear mapping is repeated until the output layer is reached, whereupon the context expression information probability matrix corresponding to the category of the convolution feature map is obtained from the output of the output layer.
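The layer-by-layer forward pass described above can be sketched as follows; a minimal NumPy sketch in which the layer sizes are assumptions and the nonlinear mapping is taken to be a ReLU, ending in a softmax probability matrix over categories.

```python
import numpy as np

def forward(x, layers):
    # x: input layer node sequences; layers: list of (weights, bias) pairs
    h = x
    for W, b in layers[:-1]:
        # nonlinear mapping from the current hidden layer to the next one
        h = np.maximum(0.0, h @ W + b)
    W, b = layers[-1]
    logits = h @ W + b                              # output layer
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)        # probability matrix

rng = np.random.default_rng(2)
x = rng.normal(size=(6, 16))                        # 6 input node sequences of width 16
layers = [
    (rng.normal(size=(16, 32)), np.zeros(32)),      # first hidden layer
    (rng.normal(size=(32, 32)), np.zeros(32)),      # next hidden layer
    (rng.normal(size=(32, 10)), np.zeros(10)),      # output layer: 10 categories
]
probs = forward(x, layers)
```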
After the convolution feature map is input into the context-switchable neural network, local feature maps of different resolutions are produced, and the generated local feature maps are sent to a pixel-wise classifier for semantic segmentation. The pixel-wise classifier outputs a group of class labels for the pixels of the local feature maps, and the sequence of class labels can be expressed as Y = f(Fl), where the function f(·) is a softmax regressor that generates pixel-wise classifications. Y = f(Fl) can be used to predict class labels pixel by pixel. The objective function for training the context-switchable neural network can be formulated in terms of the function L(·), the softmax loss value; for a receptive field region ri, y(ri) can be used to indicate the predicted class label of receptive field region ri. The computer device can compare the result output by the context-switchable neural network with the predicted class labels, so as to realize the training of the context-switchable neural network.
The processing procedure of the context-switchable neural network is as follows: the convolution feature map generated by the fully convolutional neural network and the class label of the convolution feature map are taken as input; the input convolution feature map is divided into super-pixel regions, and local feature maps are generated according to the super-pixel regions. The average depth value of each super-pixel region is calculated, and the super-pixel region is processed according to the size of the average depth value. When the average depth value is less than the condition depth value, the super-pixel region is processed using the compression architecture, obtaining the context expression information of the compressed super-pixel region; when the average depth value is greater than or equal to the condition depth value, the super-pixel region is processed using the extension architecture, obtaining the context expression information of the extended super-pixel region. The obtained context expression information is then output.
In one embodiment, the weight parameters in the context-switchable neural network are adjusted using the gradient descent method.
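A generic gradient descent weight adjustment, of the kind referred to here, looks like the following on a toy objective (the quadratic J(w) = 0.5·||w||², chosen only for illustration; its gradient is w itself):

```python
import numpy as np

def gradient_descent_step(w, grad, lr=0.1):
    # one weight-parameter adjustment: move against the gradient of the objective J
    return w - lr * grad

w = np.array([4.0, -2.0])
for _ in range(200):
    # for J(w) = 0.5 * ||w||^2 the gradient at w is simply w
    w = gradient_descent_step(w, grad=w, lr=0.1)
# after repeated steps, w approaches the minimizer w = 0
```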
In the gradient calculation formula, Sn and Sm are two adjacent super-pixel regions, and ri, rj and rk respectively denote receptive fields in the image to be segmented; J denotes the objective function for training the context-switchable neural network model. The gradient calculation formula contains an update signal. When the weight parameters in the context-switchable neural network are adjusted using the gradient calculation formula, the intermediate feature map can be optimized. Receptive field region ri receives an update signal from receptive field region rk in super-pixel region Sn; this update signal adjusts the features located in the same super-pixel region, so that these regions exhibit the property of object coexistence. Receptive field region ri also receives an update signal from receptive field region rj of Fl+1 in the neighbouring super-pixel region Sm. In the gradient calculation formula, this term denotes the update signal from receptive field region rj. When the update signal is transmitted from receptive field region rj to receptive field region ri, the update signal is weighted according to a signal so that it extends receptive field region rj. Meanwhile, the parameters λc and λe form a switch determined by the average depths of super-pixel regions Sn and Sm: that is, according to the average depths of Sn and Sm, it can be determined whether to compress using the parameter λc or to extend using the parameter λe.
When d(Sn) < d(Sm), i.e. when the average depth value of super-pixel region Sn is less than the average depth value of super-pixel region Sm, the signal weights the back-propagated gradient of the signal transmitted from receptive field region rj. The compression architecture C(·) can be optimized through back-propagation, where back-propagation refers to transmitting the objective function to the compression architecture C(·). The reweighting vector c(Dl(rk)) also participates in the update signal. When training the context-switchable neural network, the vector c(Dl(rk)) is used to select the information in the local feature map Dl(rk) that is useful for segmentation, so as to construct the intermediate feature map Fl+1(rj). Together with the reweighting vector c(Dl(rk)), the information in receptive field region rj that is useful for segmentation can better guide the update of receptive field region ri.
When d(Sn) ≥ d(Sm), i.e. when the average depth value of super-pixel region Sn is greater than or equal to the average depth value of super-pixel region Sm, the signal can influence the update signal. A skip connection between receptive field regions rj and ri can be formed through the factor 1, where a skip connection means that information propagates between receptive field regions ri and rj without passing through any neural network structure. When this information propagates between different regions, it is weighted by the factor 1, so the signal undergoes no change. The extension architecture obtains context expression information by widening the receptive field, but during training the large convolution kernels of the extension architecture may disperse the back-propagated signal from receptive field region rj to receptive field region ri; using a skip connection allows the back-propagated signal to be transmitted directly from receptive field region rj to receptive field region ri.
Optimizing the weight parameters of the context-switchable neural network through the gradient descent algorithm is more conducive to image segmentation.
In one embodiment, the architecture of the context-switchable neural network is as shown in Figure 8. After the image to be segmented is input into the fully convolutional neural network, multiple convolution feature maps can be output: a first convolution feature map 810, a second convolution feature map 820, a third convolution feature map 830, a fourth convolution feature map 840, and so on. Taking the fourth convolution feature map 840 as an example, it is input into the context-switchable neural network, which divides the fourth convolution feature map 840 into super-pixel regions and generates local feature map 844 according to the super-pixel regions. The context-switchable neural network calculates the average depth value of each super-pixel region, selects the compression or extension architecture according to the average depth value, and generates context expression information 846. The context-switchable neural network then generates intermediate feature map 842 according to local feature map 844 and context expression information 846, and intermediate feature map 842 is used for image segmentation.
In one embodiment, the partial structures corresponding to the average depth values of the super-pixel regions are as shown in Figure 9. The context-switchable neural network calculates the average depth value of each super-pixel region and compares the calculated average depth value with the condition depth value, so as to determine whether the super-pixel region is processed using the compression architecture or the extension architecture. For example, the average depth value of the first super-pixel region 910 is 6.8, that of the second super-pixel region 920 is 7.5, that of the third super-pixel region 930 is 7.3, that of the fourth super-pixel region 940 is 3.6, that of the fifth super-pixel region 950 is 4.3, and that of the sixth super-pixel region 960 is 3.1. When the preset condition depth value is 5.0, the fourth super-pixel region 940, the fifth super-pixel region 950 and the sixth super-pixel region 960, whose average depth values are below 5.0, should be processed using the compression architecture, while the first super-pixel region 910, the second super-pixel region 920 and the third super-pixel region 930 should be processed using the extension architecture.
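Applying the comparison rule (an average depth value below the condition depth value routes a region to the compression architecture, otherwise to the extension architecture) to depth values like those above can be sketched as:

```python
def choose_architecture(avg_depths, condition_depth=5.0):
    # route each super-pixel region to the compression or extension architecture
    return {region: ('compress' if d < condition_depth else 'extend')
            for region, d in avg_depths.items()}

# average depth values per super-pixel region, as in the example above
avg_depths = {910: 6.8, 920: 7.5, 930: 7.3, 940: 3.6, 950: 4.3, 960: 3.1}
choice = choose_architecture(avg_depths)
```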
In one embodiment, an image processing method is provided; the method is described as applied to the computer equipment shown in Figure 1.
First, the computer device can acquire the image to be segmented. Image segmentation refers to dividing an image into multiple specific regions with particular properties. The image to be segmented can be acquired in real time or pre-stored; for example, the image to be segmented can be collected from a user in real time through a camera, or it can be stored in a database in advance and then retrieved from the database.
Then, the computer device can input the image to be segmented into the input variable of the fully convolutional neural network and output a convolution feature map. The image to be segmented serves as the input data of the pre-trained fully convolutional neural network. The image to be segmented is input into a convolutional layer, where the convolution kernel scans the input image from front to back according to its corresponding step length, i.e. according to the size of the convolution kernel, and performs the convolution operation. After the convolution processing, a pooling layer processes the result; the pooling layer can effectively lower the dimensionality. The fully convolutional neural network outputs the convolution feature map obtained after the convolutional layer and pooling layer processing.
Then, the computer device can input the convolution feature map into the input variable of the context-switchable neural network and output context expression information. The context-switchable neural network is pre-trained according to image structure and depth data; it is a kind of fully convolutional neural network and also contains convolutional layers and pooling layers. Context expression information refers to some or all of the information that affects the objects in an image. After the convolution feature map is passed into the input variable of the context-switchable neural network, the convolutional layers and pooling layers in the context-switchable neural network process the convolution feature map and obtain its context expression information.
The computer device can also divide the convolution feature map into super-pixel regions, where a super-pixel region is a sub-region of the convolution feature map. After super-pixel division is performed on an image, many regions of different sizes are obtained, and these regions contain effective information such as colour histograms and texture information. For example, if there is a person in an image, super-pixel segmentation can be performed on the image of that person; then, through feature extraction on each small region, it can be determined which part of the human body each region belongs to (head, shoulder, leg), so as to establish a joint image of the human body. After super-pixel region division is performed on the convolution feature map, multiple super-pixel regions can be obtained; the obtained super-pixel regions do not overlap, and each is a sub-region of the convolution feature map. The computer device can also generate local feature maps according to the super-pixel regions.
The computer device can also calculate the average depth value of each super-pixel region. In a depth image, the grey value of each pixel can be used to characterize the distance of a certain point in the scene from the camera; the depth value is exactly that distance. Multiple objects may coexist in a super-pixel region. By obtaining the depth value of each object in the super-pixel region, the average depth value of the entire super-pixel region can be calculated according to the depth values of those objects. The computer device can then generate the context expression information corresponding to the super-pixel region according to the average depth value.
The computer device can also compare the average depth value with the condition depth value. When the average depth value is less than the condition depth value, the super-pixel region is compressed; when the average depth value is greater than or equal to the condition depth value, the super-pixel region is extended. When compressing a super-pixel region, the local feature map corresponding to the super-pixel region is input into three preset convolutional neural networks for processing, obtaining the compressed super-pixel region; the three convolutional neural networks include two neural networks whose convolution kernel is 1 and one neural network whose convolution kernel is 3. When extending a super-pixel region, the local feature map corresponding to the super-pixel region is input into three preset convolutional neural networks for processing, obtaining the extended super-pixel region; the three convolutional neural networks include two neural networks whose convolution kernel is 7 and one neural network whose convolution kernel is 1.
Then, the context-switchable neural network is trained in the following way: an input layer node sequence is obtained according to the convolution feature map and the category of the convolution feature map; the input layer node sequence is projected to obtain the hidden layer node sequence corresponding to the first hidden layer, and the first hidden layer is taken as the currently processed hidden layer. According to the hidden layer node sequence corresponding to the currently processed hidden layer and the weights and biases of the neuron nodes corresponding to the currently processed hidden layer, the hidden layer node sequence of the next hidden layer is obtained using a nonlinear mapping; the next hidden layer is then taken as the currently processed hidden layer, and this step is repeated until the output layer is reached, whereupon the context expression information probability matrix corresponding to the category of the convolution feature map is obtained from the output of the output layer.
It should be understood that although the steps in the above flow chart are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the sequence indicated by the arrows. Unless explicitly stated otherwise in this application, there is no strict order restriction on the execution of these steps, and these steps may be executed in other orders. Moreover, at least some of the steps in the above flow chart may include multiple sub-steps or stages; these sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and their execution order is not necessarily sequential: they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in Figure 10, an image processing apparatus is provided, including an image acquisition module 1010, a feature map output module 1020, an information output module 1030 and a feature map generation module 1040, wherein:
The image acquisition module 1010 is used to acquire the image to be segmented.
The feature map output module 1020 is used to input the image to be segmented into the input variable of the fully convolutional neural network and output a convolution feature map.
The information output module 1030 is used to input the convolution feature map into the input variable of the context-switchable neural network and output context expression information.
The feature map generation module 1040 is used to generate an intermediate feature map according to the convolution feature map and the context expression information; the intermediate feature map is used for image segmentation.
In one embodiment, the information output module 1030 can also be used to divide the convolution feature map into super-pixel regions, where a super-pixel region is a sub-region of the convolution feature map, and to generate local feature maps according to the super-pixel regions.
In one embodiment, the information output module 1030 can also be used to calculate the average depth value of a super-pixel region and generate the context expression information corresponding to the super-pixel region according to the average depth value.
In one embodiment, as shown in Figure 11, the information output module 1030 includes a comparison module 1032, a compression module 1034 and an extension module 1036, wherein:
The comparison module 1032 is used to compare the average depth value with the condition depth value.
The compression module 1034 is used to compress the super-pixel region when the average depth value is less than the condition depth value.
The extension module 1036 is used to extend the super-pixel region when the average depth value is greater than or equal to the condition depth value.
In one embodiment, the compression module 1034 can also be used to input the local feature map corresponding to the super-pixel region into three preset convolutional neural networks for processing, obtaining the compressed super-pixel region; the three convolutional neural networks include two neural networks whose convolution kernel is 1 and one neural network whose convolution kernel is 3.
In one embodiment, the extension module 1036 can also be used to input the local feature map corresponding to the super-pixel region into three preset convolutional neural networks for processing, obtaining the extended super-pixel region; the three convolutional neural networks include two neural networks whose convolution kernel is 7 and one neural network whose convolution kernel is 1.
In one embodiment of the provided image segmentation apparatus, the context-switchable neural network is trained in the following way: an input layer node sequence is obtained according to the convolution feature map and the category of the convolution feature map; the input layer node sequence is projected to obtain the hidden layer node sequence corresponding to the first hidden layer, and the first hidden layer is taken as the currently processed hidden layer. According to the hidden layer node sequence corresponding to the currently processed hidden layer and the weights and biases of the neuron nodes corresponding to the currently processed hidden layer, the hidden layer node sequence of the next hidden layer is obtained using a nonlinear mapping; the next hidden layer is then taken as the currently processed hidden layer, and this step is repeated until the output layer is reached, whereupon the context expression information probability matrix corresponding to the category of the convolution feature map is obtained from the output of the output layer.
For the specific limitations of the image segmentation apparatus, reference may be made to the limitations of the image segmentation method above, which are not repeated here. Each module in the above image segmentation apparatus can be realized fully or partially through software, hardware or a combination thereof. The above modules can be embedded in hardware form in, or be independent of, the processor in the computer equipment, or can be stored in software form in the memory of the computer equipment, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, including a memory and a processor, with a computer program stored in the memory; when the processor executes the computer program, the following steps are realized:
Acquire the image to be segmented; input the image to be segmented into the input variable of the fully convolutional neural network and output a convolution feature map; input the convolution feature map into the input variable of the context-switchable neural network and output context expression information; generate an intermediate feature map according to the convolution feature map and the context expression information, the intermediate feature map being used for image segmentation.
In one embodiment, when the processor executes the computer program, the following steps are also realized: divide the convolution feature map into super-pixel regions, where a super-pixel region is a sub-region of the convolution feature map; generate local feature maps according to the super-pixel regions.
In one embodiment, when the processor executes the computer program, the following steps are also realized: calculate the average depth value of the super-pixel region; generate the context expression information corresponding to the super-pixel region according to the average depth value.
In one embodiment, when the processor executes the computer program, the following steps are also realized: compare the average depth value with the condition depth value; when the average depth value is less than the condition depth value, compress the super-pixel region; when the average depth value is greater than or equal to the condition depth value, extend the super-pixel region.
In one embodiment, when the processor executes the computer program, the following steps are also realized: input the local feature map corresponding to the super-pixel region into three preset convolutional neural networks for processing, obtaining the compressed super-pixel region; the three convolutional neural networks include two neural networks whose convolution kernel is 1 and one neural network whose convolution kernel is 3.
In one embodiment, when the processor executes the computer program, the following steps are also realized: input the local feature map corresponding to the super-pixel region into three preset convolutional neural networks for processing, obtaining the extended super-pixel region; the three convolutional neural networks include two neural networks whose convolution kernel is 7 and one neural network whose convolution kernel is 1.
In one embodiment, the context-switchable neural network is trained in the following way: obtain an input layer node sequence according to the convolution feature map and the category of the convolution feature map; project the input layer node sequence to obtain the hidden layer node sequence corresponding to the first hidden layer, and take the first hidden layer as the currently processed hidden layer; according to the hidden layer node sequence corresponding to the currently processed hidden layer and the weights and biases of the neuron nodes corresponding to the currently processed hidden layer, obtain the hidden layer node sequence of the next hidden layer using a nonlinear mapping; take the next hidden layer as the currently processed hidden layer, and repeat this step until the output layer is reached; obtain the context expression information probability matrix corresponding to the category of the convolution feature map output by the output layer.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the following steps are realized:
Acquire the image to be segmented; input the image to be segmented into the input variable of the fully convolutional neural network and output a convolution feature map; input the convolution feature map into the input variable of the context-switchable neural network and output context expression information; generate an intermediate feature map according to the convolution feature map and the context expression information, the intermediate feature map being used for image segmentation.
In one embodiment, when the computer program is executed by the processor, the following steps are also realized: divide the convolution feature map into super-pixel regions, where a super-pixel region is a sub-region of the convolution feature map; generate local feature maps according to the super-pixel regions.
In one embodiment, when the computer program is executed by the processor, the following steps are also realized: calculate the average depth value of the super-pixel region; generate the context expression information corresponding to the super-pixel region according to the average depth value.
In one embodiment, when the computer program is executed by the processor, the following steps are also realized: compare the average depth value with the condition depth value; when the average depth value is less than the condition depth value, compress the super-pixel region; when the average depth value is greater than or equal to the condition depth value, extend the super-pixel region.
In one embodiment, when the computer program is executed by the processor, the following steps are also realized: input the local feature map corresponding to the super-pixel region into three preset convolutional neural networks for processing, obtaining the compressed super-pixel region; the three convolutional neural networks include two neural networks whose convolution kernel is 1 and one neural network whose convolution kernel is 3.
In one embodiment, when the computer program is executed by the processor, the following steps are also realized: input the local feature map corresponding to the super-pixel region into three preset convolutional neural networks for processing, obtaining the extended super-pixel region; the three convolutional neural networks include two neural networks whose convolution kernel is 7 and one neural network whose convolution kernel is 1.
In one embodiment, the context-switchable neural network is trained in the following way: obtain an input layer node sequence according to the convolution feature map and the category of the convolution feature map; project the input layer node sequence to obtain the hidden layer node sequence corresponding to the first hidden layer, and take the first hidden layer as the currently processed hidden layer; according to the hidden layer node sequence corresponding to the currently processed hidden layer and the weights and biases of the neuron nodes corresponding to the currently processed hidden layer, obtain the hidden layer node sequence of the next hidden layer using a nonlinear mapping; take the next hidden layer as the currently processed hidden layer, and repeat this step until the output layer is reached; obtain the context expression information probability matrix corresponding to the category of the convolution feature map output by the output layer.
The technical solution of this application has been proved feasible; the specific experimental process is as follows:
In the experiments, the context-switchable neural network of the present invention is tested on two common benchmarks for the semantic segmentation of RGB-D (Red, Green, Blue, Depth Map) depth images: the NYUDv2 dataset and the SUN-RGBD dataset. The NYUDv2 dataset is widely used to assess segmentation performance and contains 1449 RGB-D images; in this dataset, 795 images are used for training and 654 images for testing. First, 414 images can be selected from the original training set as the validation set. The categories of the images are annotated with pixel-wise labels, and all pixels are labelled with 40 categories. The NYUDv2 dataset is used to assess the above method, and the SUN-RGBD dataset is further used for comparison with state-of-the-art methods.
Then, the segmentation results are computed using multi-scale testing. That is, four ratios (0.6, 0.8, 1.0 and 1.1) are used to rescale the size of each test image before the test image is supplied to the network. For the post-processing with the conditional random field (CRF) algorithm, the output segmentation scores of the rescaled images are averaged.
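The multi-scale test described here can be sketched as follows; the nearest-neighbour resize and the stand-in predictor are illustrative assumptions, not the method's actual network.

```python
import numpy as np

def resize_nn(x, h, w):
    # nearest-neighbour resize over the first two axes
    H, W = x.shape[:2]
    return x[(np.arange(h) * H // h)][:, (np.arange(w) * W // w)]

def multiscale_scores(img, predict, scales=(0.6, 0.8, 1.0, 1.1)):
    # rescale the test image at each ratio, collect per-pixel class scores,
    # resize the scores back to the original size, and average them
    H, W = img.shape[:2]
    acc = np.zeros(predict(img).shape)
    for s in scales:
        scaled = resize_nn(img, max(1, int(round(H * s))), max(1, int(round(W * s))))
        acc += resize_nn(predict(scaled), H, W)
    return acc / len(scales)

rng = np.random.default_rng(6)
img = rng.uniform(size=(10, 10))
# stand-in "network": two-class scores derived directly from pixel intensity
predict = lambda x: np.stack([x, 1.0 - x], axis=-1)
scores = multiscale_scores(img, predict)
```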
When testing on the NYUDv2 dataset, the sensitivity to the number of super-pixels must first be measured. In the context-switchable neural network, the control of the context expression information depends on the size of the super-pixels. The size of the super-pixels is adjusted with a tool, and different scales are selected empirically: 500, 1000, 2000, 4000, 8000 and 12000. For each scale, a context-switchable neural network can be trained based on the ResNet-101 model. Among the inputs of the context-switchable neural network, the depth image is used for switching features and the RGB image is used for segmenting the image. The segmentation accuracy on the NYUDv2 validation set is shown in Table 1:
Super-pixel scale | 500 | 1000 | 2000 | 4000 | 8000 | 12000 |
Segmentation accuracy (%) | 42.7 | 43.5 | 45.6 | 43.6 | 44.2 | 42.9 |
Table 1
As shown in Table 1, the segmentation accuracy corresponding to each scale is reported as mean intersection-over-union (%). When the scale is set to 500, segmentation accuracy is the lowest; this happens because the super-pixels are too small, so the contextual information they contain is too little. As the scale increases, segmentation performance improves. When the scale is set to 2000, the context-switchable neural network achieves the best segmentation accuracy. Super-pixels that are too large reduce performance, because an overly large super-pixel may contain extra objects, which limits the preservation of the super-pixel's properties. In subsequent experiments, the scale of 2000 continues to be used to build the context-switchable neural network.
Next, the experiments also evaluate the strategy of local structure information transfer. Local structure information is transferred to generate features that have a stronger relationship with their regions. The analysis is shown in Table 2, where the local structure information transfer is replaced by strategies using other structural information. The first experiment measures the performance of the method: using the full version of the context-switchable neural network, a segmentation score of 45.6 is achieved on the NYUDv2 validation set. The context-switchable neural network is then retrained without transferring the local structure information of the super-pixels; in this setting, all intermediate features are processed by a globally identical mapping, reaching an accuracy of 40.3. In addition, new features are generated using interpolation and deconvolution, where each region contains broader but regular receptive field information; however, these methods produce structure-insensitive features, and their scores are lower than that of the context-switchable neural network.
Table 2
As shown in Table 2, several methods can propagate the local structure information of superpixels. One propagates information by averaging the features of the regions within the same superpixel, which means the local structure mapping is realized by an identical kernel; this achieves a segmentation score of 43.8. Because the identical kernel contains no learnable parameters, it lacks the flexibility to select useful information. Different convolution kernels, for example of sizes 3 × 3 and 5 × 5, are used to capture the finer structure of the superpixels; compared with the 1 × 1 kernel, the larger kernels produce worse results.
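The "identical kernel" averaging described above can be sketched as follows; this is an illustrative NumPy version, not the patented implementation, and the function name is an assumption:

```python
import numpy as np

def propagate_local_structure(features, labels):
    """Replace each feature vector by the mean over its superpixel.

    features: (C, H, W) feature map; labels: (H, W) superpixel id map.
    Averaging within a superpixel realizes the parameter-free
    'identical kernel' mapping compared in Table 2.
    """
    c, h, w = features.shape
    flat = features.reshape(c, -1)
    ids = labels.ravel()
    n = ids.max() + 1
    counts = np.bincount(ids, minlength=n)
    out = np.empty_like(flat)
    for ch in range(c):
        sums = np.bincount(ids, weights=flat[ch], minlength=n)
        out[ch] = (sums / counts)[ids]  # broadcast region means back to pixels
    return out.reshape(c, h, w)
```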
Next, the experiments evaluate the top-down switchable propagation. Given the local structure features, top-down switchable information propagation can be applied to generate the context expression, and the generated context expression is guided by the superpixels and the depth. The specific process is as follows:
As shown in Table 3, the top-down propagation can be measured with different data. Without superpixels and depth, constructing the context expression with deconvolution and interpolation alone yields a segmentation precision lower than that of the context-switchable neural network.
Table 3
In the next test, only the guidance of the superpixels is disabled, while the top-down information propagation is kept. Without superpixels, the switchable propagation is performed on the compressed and expanded feature maps, where the information propagation is defined by conventional kernels. Compared with this setting, the complete context-switchable neural network performs better. Besides the fact that superpixels provide more natural information propagation, the mean depth computed within each superpixel achieves more stable feature switching by avoiding noisy depth values in isolated regions.
Furthermore, the experiments investigate the case where depth is not used in the top-down switchable information propagation. In this case, the compressed and expanded feature maps are used directly as the context expression. As shown in Table 3, the individual compressed/expanded feature maps lack the flexibility to identify appropriate segmentation features, and their performance is lower than that of the depth-driven switchable construction of the context expression.
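The depth-guided switching decision, which the method description and claim 4 base on the mean depth of each superpixel, might be sketched as follows; this is an assumed NumPy illustration, and `threshold` plays the role of the conditional depth value:

```python
import numpy as np

def mean_depth_per_region(depth, labels):
    """Mean depth within each superpixel of a (H, W) depth map."""
    ids = labels.ravel()
    n = ids.max() + 1
    sums = np.bincount(ids, weights=depth.ravel(), minlength=n)
    counts = np.bincount(ids, minlength=n)
    return sums / counts

def switch_decisions(depth, labels, threshold):
    """Per-superpixel switch: compress when the mean depth is below the
    conditional depth value, otherwise expand (as in claim 4)."""
    means = mean_depth_per_region(depth, labels)
    return np.where(means < threshold, "compress", "expand")
```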
Next, the experiments examine how compact features adjust the contextual information. The top-down switchable information propagation consists of a compression structure and an expansion structure, which provide different contextual information. These structures use compact features to generate the context expression. The experiments compare the compression structure and the expansion structure, and show that they can effectively adjust contextual information with compact features.
Table 4
In Table 4, the experiments compare different designs of the compression structure. A simple way to compress information is to learn compact features with a 1×1 convolution, followed by another 1×1 convolution that restores the feature dimensionality; this yields lower accuracy than the compression structure. Compared with this simple alternative of two consecutive 1×1 convolutions, the compression structure inserts a 3×3 convolution between the two 1×1 convolutions. To a certain extent, the 3×3 convolution captures wider contextual information, compensating for the information that may be lost by the dimensionality reduction that produces the compact features, while the features obtained by the 3×3 convolution of the compression structure remain compact. When the last 1×1 convolution used to restore the feature dimensionality is removed and the 3×3 convolution directly generates relatively high-dimensional features, the performance is lower than that of the compression structure. This demonstrates the importance of the compact features generated around the 3×3 convolution.
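The compression structure of Table 4 (1×1 reduce, 3×3 context, 1×1 restore, matching claim 5's two kernel-1 networks and one kernel-3 network) can be sketched as follows; the NumPy convolutions and the ReLU between layers are assumptions for illustration:

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: (Cin, H, W) with weights (Cout, Cin)."""
    return np.einsum('oc,chw->ohw', w, x)

def conv3x3(x, w):
    """3x3 convolution with zero padding; weights (Cout, Cin, 3, 3)."""
    c, h, wdt = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wdt))
    for i in range(3):
        for j in range(3):
            out += np.einsum('oc,chw->ohw', w[:, :, i, j], xp[:, i:i + h, j:j + wdt])
    return out

def compression_block(x, w_reduce, w_mid, w_restore):
    """1x1 reduce -> 3x3 context -> 1x1 restore, with assumed ReLUs."""
    h1 = np.maximum(conv1x1(x, w_reduce), 0)
    h2 = np.maximum(conv3x3(h1, w_mid), 0)
    return conv1x1(h2, w_restore)
```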
Table 5 studies the expansion structure and compares it with different ways of expanding information. Again, using a single convolutional layer with a 7×7 kernel to enlarge the receptive field yields a segmentation score of 43.8. It is assumed that stacking additional convolutional layers can further improve performance; indeed, using two 7×7 convolutional layers obtains a higher score of 44.2. The segmentation scores produced by the above convolutions are still lower than those of the expansion structure, which additionally uses a 1×1 convolutional layer to compute compact features.
Table 5
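The expansion structure compared in Table 5 (two 7×7 convolutions to enlarge the receptive field, plus a 1×1 convolution for compact features, matching claim 6) might be sketched as follows; the NumPy convolution and the ReLU placement are assumptions:

```python
import numpy as np

def convkxk(x, w):
    """k x k convolution with 'same' zero padding; weights (Cout, Cin, k, k), k odd."""
    k = w.shape[-1]
    p = k // 2
    c, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            out += np.einsum('oc,chw->ohw', w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

def expansion_block(x, w7a, w7b, w1):
    """Two 7x7 convolutions enlarge the receptive field; a final 1x1
    convolution keeps the resulting feature compact (as in Table 5)."""
    h1 = np.maximum(convkxk(x, w7a), 0)
    h2 = np.maximum(convkxk(h1, w7b), 0)
    return convkxk(h2, w1)
```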
Next, the context-switchable neural network is compared with state-of-the-art methods; all methods are divided into two groups and evaluated on the NYUDv2 test set. The first group contains methods that segment using RGB images only, and their performance is listed under RGB input. Deep networks with top-down information propagation generate high-quality segmentation features; the multi-path refinement network has the highest accuracy in this group, as shown in Table 6:
Table 6
Then the context-switchable neural network is compared with the second group of methods, which take RGB-D images as input. Each depth image is encoded into a 3-channel HHA image to retain richer geometric information. The HHA images replace the RGB images to train an independent segmentation network; the trained network is tested on the HHA images to obtain segmentation score maps, which are combined with the score maps computed by the network trained on RGB images. Using this combination strategy, the best method is the cascaded feature network, with a result of 47.7. Compared with single-input networks, using RGB and HHA images together improves segmentation precision.
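The RGB/HHA score-map combination described above might be sketched as follows; the text does not specify the combination operator, so simple averaging is assumed here:

```python
import numpy as np

def fuse_scores(rgb_scores, hha_scores):
    """Combine (C, H, W) score maps from the RGB network and the HHA
    network; element-wise averaging is an assumed combination strategy."""
    return (rgb_scores + hha_scores) / 2.0

def predict_labels(fused):
    """Per-pixel class label = argmax over the channel axis."""
    return np.argmax(fused, axis=0)
```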
Furthermore, RGB and HHA images can also be used together as training and test data. Based on ResNet-101, the context-switchable neural network reaches 48.3 points. Building the context-switchable neural network on the deeper ResNet-152 structure further raises the segmentation score to 49.6. This result is about 2% higher than the state-of-the-art methods.
As shown in figure 12, the images segmented by the context-switchable neural network are compared with those segmented by the advanced methods, where the pictures are collected from the NYUDv2 data set. The context-switchable neural network improves image segmentation precision.
The context-switchable neural network is then also tested on the SUN-RGBD data set, which contains 10,335 images labelled with 37 classes. Compared with the NYUDv2 data set, the SUN-RGBD data set has more complex scenes and depth conditions. From this data set, 5,285 images are selected for training and the remainder are used for testing. In this experiment, the context-switchable neural network is again compared with the methods that use RGB and HHA images as input. The previous best performance on the SUN-RGBD data set was produced by the cascaded feature network method, whose model is based on the ResNet-152 structure. Owing to the rational modelling of information propagation, a better result can be obtained here with the simpler ResNet-101 structure. With the deeper ResNet-152, the segmentation precision obtained is 50.7, better than all compared methods.
As shown in figure 13, the images segmented by the context-switchable neural network are compared with those segmented by the advanced methods, where the pictures are collected from the SUN-RGBD data set. The context-switchable neural network improves image segmentation precision.
One of ordinary skill in the art will appreciate that all or part of the flows in the above-described embodiment methods can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, the computer program may include the flows of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM), etc.
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, they should all be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present application, and these all belong to the protection scope of the present application. Therefore, the protection scope of the present patent application shall be determined by the appended claims.
Claims (10)
1. An image segmentation method, the method comprising:
obtaining an image to be segmented;
inputting the image to be segmented into an input variable of a fully convolutional neural network, and outputting a convolutional feature map;
inputting the convolutional feature map into an input variable of a context-switchable neural network, and outputting context expressing information;
generating an intermediate feature map according to the convolutional feature map and the context expressing information, the intermediate feature map being used for performing image segmentation.
2. The method according to claim 1, characterized in that inputting the convolutional feature map into the input variable of the context-switchable neural network and outputting the context expressing information comprises:
dividing the convolutional feature map into superpixel regions, a superpixel region being a sub-region of the convolutional feature map;
generating a local feature map according to the superpixel regions.
3. The method according to claim 2, characterized in that inputting the convolutional feature map into the input variable of the context-switchable neural network and outputting the context expressing information further comprises:
calculating an average depth value of the superpixel region;
generating context expressing information corresponding to the superpixel region according to the average depth value.
4. The method according to claim 3, characterized in that generating the context expressing information corresponding to the superpixel region according to the average depth value comprises:
comparing the average depth value with a conditional depth value;
compressing the superpixel region when the average depth value is less than the conditional depth value;
expanding the superpixel region when the average depth value is greater than or equal to the conditional depth value.
5. The method according to claim 4, characterized in that compressing the superpixel region when the average depth value is less than the conditional depth value comprises:
inputting the local feature map corresponding to the superpixel region into three preset convolutional neural networks for processing, to obtain the compressed superpixel region;
wherein the three convolutional neural networks comprise two neural networks with a convolution kernel of 1 and one neural network with a convolution kernel of 3.
6. The method according to claim 4, characterized in that expanding the superpixel region when the average depth value is greater than or equal to the conditional depth value comprises:
inputting the local feature map corresponding to the superpixel region into three preset convolutional neural networks for processing, to obtain the expanded superpixel region;
wherein the three convolutional neural networks comprise two neural networks with a convolution kernel of 7 and one neural network with a convolution kernel of 1.
7. The method according to any one of claims 1 to 6, characterized in that the context-switchable neural network is trained as follows:
obtaining an input-layer node sequence according to the convolutional feature map and the category of the convolutional feature map, projecting the input-layer node sequence to obtain a hidden-node sequence corresponding to a first hidden layer, and taking the first hidden layer as the currently processed hidden layer;
obtaining the hidden-node sequence of the next hidden layer by non-linear mapping, according to the hidden-node sequence corresponding to the currently processed hidden layer and the weights and biases of the neuron nodes corresponding to the currently processed hidden layer; taking the next hidden layer as the currently processed hidden layer; and repeating the step of obtaining the hidden-node sequence of the next hidden layer by non-linear mapping according to the hidden-node sequence corresponding to the currently processed hidden layer and the weights and biases of the neuron nodes corresponding to the currently processed hidden layer, until the output layer is reached, to obtain the context expressing information output by the output layer, which is a probability matrix corresponding to the category of the convolutional feature map.
8. An image segmentation apparatus, characterized in that the apparatus comprises:
an image obtaining module, configured to obtain an image to be segmented;
a feature map output module, configured to input the image to be segmented into an input variable of a fully convolutional neural network and output a convolutional feature map;
an information output module, configured to input the convolutional feature map into an input variable of a context-switchable neural network and output context expressing information;
a feature map generation module, configured to generate an intermediate feature map according to the convolutional feature map and the context expressing information, the intermediate feature map being used for performing image segmentation.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
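As an illustrative sketch of the layer-wise forward pass described in claim 7 (not part of the claims; the tanh hidden non-linearity and the softmax output are assumptions, since the patent only states that a non-linear mapping is used):

```python
import numpy as np

def forward_hidden_layers(x, weights, biases):
    """Propagate an input-layer node sequence through successive hidden
    layers by a non-linear mapping, then produce the output layer's
    probability matrix over categories (softmax assumed)."""
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(w @ h + b)  # assumed non-linear mapping per hidden layer
    z = weights[-1] @ h + biases[-1]
    e = np.exp(z - z.max(axis=0, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=0, keepdims=True)
```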
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810463609.0A CN108765425B (en) | 2018-05-15 | 2018-05-15 | Image segmentation method and device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810463609.0A CN108765425B (en) | 2018-05-15 | 2018-05-15 | Image segmentation method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108765425A true CN108765425A (en) | 2018-11-06 |
CN108765425B CN108765425B (en) | 2022-04-22 |
Family
ID=64007824
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810463609.0A Active CN108765425B (en) | 2018-05-15 | 2018-05-15 | Image segmentation method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108765425B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109785336A (en) * | 2018-12-18 | 2019-05-21 | 深圳先进技术研究院 | Image segmentation method and device based on multi-path convolutional neural network model
CN109886990A (en) * | 2019-01-29 | 2019-06-14 | 理光软件研究所(北京)有限公司 | Image segmentation system based on deep learning
CN110047047A (en) * | 2019-04-17 | 2019-07-23 | 广东工业大学 | Method, apparatus, device and storage medium for interpreting three-dimensional appearance image information
WO2019218136A1 (en) * | 2018-05-15 | 2019-11-21 | 深圳大学 | Image segmentation method, computer device, and storage medium
CN110490876A (en) * | 2019-03-12 | 2019-11-22 | 珠海上工医信科技有限公司 | Lightweight neural network for image segmentation
CN110689020A (en) * | 2019-10-10 | 2020-01-14 | 湖南师范大学 | Segmentation method of mineral flotation froth images and electronic equipment
CN110689514A (en) * | 2019-10-11 | 2020-01-14 | 深圳大学 | Training method and computer equipment for a novel-view synthesis model of transparent objects
CN110852394A (en) * | 2019-11-13 | 2020-02-28 | 联想(北京)有限公司 | Data processing method and device, computer system and readable storage medium
CN111739025A (en) * | 2020-05-08 | 2020-10-02 | 北京迈格威科技有限公司 | Image processing method, device, terminal and storage medium
CN112215243A (en) * | 2020-10-30 | 2021-01-12 | 百度(中国)有限公司 | Image feature extraction method, device, equipment and storage medium
CN113421276A (en) * | 2021-07-02 | 2021-09-21 | 深圳大学 | Image processing method, device and storage medium
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106127725A (en) * | 2016-05-16 | 2016-11-16 | 北京工业大学 | Millimeter-wave radar cloud image segmentation method based on multi-resolution CNN
US20160358024A1 (en) * | 2015-06-03 | 2016-12-08 | Hyperverge Inc. | Systems and methods for image processing
CN106530320A (en) * | 2016-09-30 | 2017-03-22 | 深圳大学 | End-to-end image segmentation processing method and system
CN107169974A (en) * | 2017-05-26 | 2017-09-15 | 中国科学技术大学 | Image segmentation method based on multi-supervised fully convolutional neural networks
CN107403430A (en) * | 2017-06-15 | 2017-11-28 | 中山大学 | RGB-D image semantic segmentation method
CN107784654A (en) * | 2016-08-26 | 2018-03-09 | 杭州海康威视数字技术股份有限公司 | Image segmentation method, device and fully convolutional network system
-
2018
- 2018-05-15 CN CN201810463609.0A patent/CN108765425B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160358024A1 (en) * | 2015-06-03 | 2016-12-08 | Hyperverge Inc. | Systems and methods for image processing
CN106127725A (en) * | 2016-05-16 | 2016-11-16 | 北京工业大学 | Millimeter-wave radar cloud image segmentation method based on multi-resolution CNN
CN107784654A (en) * | 2016-08-26 | 2018-03-09 | 杭州海康威视数字技术股份有限公司 | Image segmentation method, device and fully convolutional network system
CN106530320A (en) * | 2016-09-30 | 2017-03-22 | 深圳大学 | End-to-end image segmentation processing method and system
CN107169974A (en) * | 2017-05-26 | 2017-09-15 | 中国科学技术大学 | Image segmentation method based on multi-supervised fully convolutional neural networks
CN107403430A (en) * | 2017-06-15 | 2017-11-28 | 中山大学 | RGB-D image semantic segmentation method
Non-Patent Citations (1)
Title |
---|
DI LIN et al.: "Cascaded Feature Network for Semantic Segmentation of RGB-D Images", 2017 IEEE International Conference on Computer Vision (ICCV) *
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019218136A1 (en) * | 2018-05-15 | 2019-11-21 | 深圳大学 | Image segmentation method, computer device, and storage medium |
US11409994B2 (en) | 2018-05-15 | 2022-08-09 | Shenzhen University | Methods for image segmentation, computer devices, and storage mediums |
CN109785336A (en) * | 2018-12-18 | 2019-05-21 | 深圳先进技术研究院 | Image segmentation method and device based on multi-path convolutional neural network model |
CN109785336B (en) * | 2018-12-18 | 2020-11-27 | 深圳先进技术研究院 | Image segmentation method and device based on multipath convolutional neural network model |
CN109886990A (en) * | 2019-01-29 | 2019-06-14 | 理光软件研究所(北京)有限公司 | Image segmentation system based on deep learning |
CN110490876A (en) * | 2019-03-12 | 2019-11-22 | 珠海上工医信科技有限公司 | Lightweight neural network for image segmentation |
CN110490876B (en) * | 2019-03-12 | 2022-09-16 | 珠海全一科技有限公司 | Image segmentation method based on lightweight neural network |
CN110047047A (en) * | 2019-04-17 | 2019-07-23 | 广东工业大学 | Method, apparatus, device and storage medium for interpreting three-dimensional appearance image information |
CN110689020A (en) * | 2019-10-10 | 2020-01-14 | 湖南师范大学 | Segmentation method of mineral flotation froth image and electronic equipment |
CN110689514A (en) * | 2019-10-11 | 2020-01-14 | 深圳大学 | Training method and computer equipment for a novel-view synthesis model of transparent objects |
CN110689514B (en) * | 2019-10-11 | 2022-11-11 | 深圳大学 | Training method and computer equipment for a novel-view synthesis model of transparent objects |
CN110852394B (en) * | 2019-11-13 | 2022-03-25 | 联想(北京)有限公司 | Data processing method and device, computer system and readable storage medium |
CN110852394A (en) * | 2019-11-13 | 2020-02-28 | 联想(北京)有限公司 | Data processing method and device, computer system and readable storage medium |
CN111739025A (en) * | 2020-05-08 | 2020-10-02 | 北京迈格威科技有限公司 | Image processing method, device, terminal and storage medium |
CN111739025B (en) * | 2020-05-08 | 2024-03-19 | 北京迈格威科技有限公司 | Image processing method, device, terminal and storage medium |
CN112215243A (en) * | 2020-10-30 | 2021-01-12 | 百度(中国)有限公司 | Image feature extraction method, device, equipment and storage medium |
CN113421276A (en) * | 2021-07-02 | 2021-09-21 | 深圳大学 | Image processing method, device and storage medium |
CN113421276B (en) * | 2021-07-02 | 2023-07-21 | 深圳大学 | Image processing method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108765425B (en) | 2022-04-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108765425A (en) | Image segmentation method, device, computer equipment and storage medium | |
CN108510485B (en) | No-reference image quality evaluation method based on convolutional neural networks | |
US11409994B2 (en) | Methods for image segmentation, computer devices, and storage mediums | |
CN109410239A (en) | Text image super-resolution reconstruction method based on conditional generative adversarial networks | |
CN110738207A (en) | Character detection method fusing character-region edge information in character images | |
CN108765344A (en) | Single-image rain streak removal method based on deep convolutional neural networks | |
CN103824272B (en) | Face super-resolution reconstruction method based on k-nearest-neighbor re-identification | |
CN107680077A (en) | No-reference image quality assessment method based on multi-stage gradient features | |
CN111523521A (en) | Remote sensing image classification method using a dual-branch fused multi-scale attention neural network | |
CN109087375A (en) | Image hole filling method based on deep learning | |
CN111582230A (en) | Video behavior classification method based on spatio-temporal features | |
TWI719512B (en) | Method and system for algorithm using pixel-channel shuffle convolution neural network | |
CN111275057A (en) | Image processing method, device and equipment | |
CN112200724A (en) | Single-image super-resolution reconstruction system and method based on a feedback mechanism | |
CN110781912A (en) | Image classification method based on channel-expanded deconvolution neural network | |
CN112950640A (en) | Video portrait segmentation method and device, electronic equipment and storage medium | |
CN109492610A (en) | Pedestrian re-identification method, device and readable storage medium | |
Cao et al. | Adversarial and adaptive tone mapping operator for high dynamic range images | |
CN110084181B (en) | Remote sensing image ship target detection method based on sparse MobileNetV2 network | |
Hu et al. | Hierarchical discrepancy learning for image restoration quality assessment | |
CN109978074 (en) | Joint image aesthetics and emotion classification method and system based on deep multi-task learning | |
CN113689382A (en) | Tumor postoperative lifespan prediction method and system based on medical and pathological images | |
CN110992320B (en) | Medical image segmentation network based on dual interleaving | |
CN113361589A (en) | Rare or endangered plant leaf identification method based on transfer learning and knowledge distillation | |
CN107180419A (en) | Median filtering detection method based on PCA networks | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |