CN108765425A - Image segmentation method, apparatus, computer device and storage medium - Google Patents

Image segmentation method, apparatus, computer device and storage medium

Info

Publication number
CN108765425A
Authority
CN
China
Prior art keywords
superpixel
context
feature map
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810463609.0A
Other languages
Chinese (zh)
Other versions
CN108765425B (granted version)
Inventor
Lin Di (林迪)
Huang Hui (黄惠)
Current Assignee
Shenzhen University
Original Assignee
Shenzhen University
Priority date
Filing date
Publication date
Application filed by Shenzhen University
Priority to CN201810463609.0A
Publication of CN108765425A
Application granted
Publication of CN108765425B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

This application relates to an image segmentation method, apparatus, computer device and storage medium. The method includes: obtaining an image to be segmented; feeding the image into the input of a fully convolutional network and outputting a convolutional feature map; feeding the convolutional feature map into the input of a context-switchable neural network and outputting context representation information; and generating an intermediate feature map from the convolutional feature map and the context representation information, the intermediate feature map being used for image segmentation. By combining a convolutional neural network, which downscales and stacks images to compute image-semantic features, with the context representation information produced by the context-switchable neural network, the accuracy of image segmentation can be improved.

Description

Image segmentation method, apparatus, computer device and storage medium
Technical field
This application relates to the field of image processing, and in particular to an image segmentation method, apparatus, computer device and storage medium.
Background
With the development of image processing techniques, image segmentation has become an important part of the image processing field, in which machine learning plays an important role. Depth data can provide the geometric information in an image and contain a large amount of information useful for image segmentation. By encoding the depth image as a three-channel image, then using the colour image together with the encoded image as input to train a convolutional neural network, segmentation features can be computed and the image segmented.
However, in current approaches that realise image segmentation with convolutional neural networks, the output of the network loses most of the segmentation-relevant information contained in the depth data, so image segmentation accuracy is poor.
Summary of the invention
Accordingly, in view of the above technical problems, it is necessary to provide an image segmentation method, apparatus, computer device and storage medium that can improve segmentation accuracy.
An image segmentation method, the method comprising:
obtaining an image to be segmented;
feeding the image to be segmented into the input of a fully convolutional network, and outputting a convolutional feature map;
feeding the convolutional feature map into the input of a context-switchable neural network, and outputting context representation information;
generating an intermediate feature map from the convolutional feature map and the context representation information, the intermediate feature map being used for image segmentation.
In one embodiment, feeding the convolutional feature map into the input of the context-switchable neural network and outputting context representation information includes:
dividing the convolutional feature map into superpixel regions, each superpixel region being a subregion of the convolutional feature map;
generating a local feature map from each superpixel region.
In one embodiment, feeding the convolutional feature map into the input of the context-switchable neural network and outputting context representation information further includes:
computing the average depth value of each superpixel region;
generating the context representation information corresponding to the superpixel region from its average depth value.
In one embodiment, generating the context representation information corresponding to the superpixel region from the average depth value includes:
comparing the average depth value with a threshold depth value;
compressing the superpixel region when the average depth value is less than the threshold depth value;
extending the superpixel region when the average depth value is greater than or equal to the threshold depth value.
In one embodiment, compressing the superpixel region when the average depth value is less than the threshold depth value includes:
feeding the local feature map corresponding to the superpixel region into three preset convolutional layers for processing, obtaining the compressed superpixel region;
wherein the three convolutional layers comprise two layers with 1x1 kernels and one layer with a 3x3 kernel.
In one embodiment, extending the superpixel region when the average depth value is greater than or equal to the threshold depth value includes:
feeding the local feature map corresponding to the superpixel region into three preset convolutional layers for processing, obtaining the extended superpixel region;
wherein the three convolutional layers comprise two layers with 7x7 kernels and one layer with a 1x1 kernel.
In one embodiment, the context-switchable neural network is trained as follows:
an input-layer node sequence is obtained from the convolutional feature map and its classes; the input-layer node sequence is projected to obtain the hidden-node sequence of the first hidden layer, and the first hidden layer is taken as the current hidden layer;
the hidden-node sequence of the next hidden layer is obtained by a nonlinear mapping from the current hidden layer's hidden-node sequence and the weights and biases of its neuron nodes; the next hidden layer is taken as the current hidden layer, and the nonlinear-mapping step is repeated until the output layer is reached, obtaining from the output layer a probability matrix relating the context representation information to the classes of the convolutional feature map.
An image segmentation apparatus, the apparatus comprising:
an image acquisition module for obtaining an image to be segmented;
a feature map output module for feeding the image to be segmented into the input of a fully convolutional network and outputting a convolutional feature map;
an information output module for feeding the convolutional feature map into the input of a context-switchable neural network and outputting context representation information;
a feature map generation module for generating an intermediate feature map from the convolutional feature map and the context representation information, the intermediate feature map being used for image segmentation.
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the following steps:
obtaining an image to be segmented;
feeding the image to be segmented into the input of a fully convolutional network, and outputting a convolutional feature map;
feeding the convolutional feature map into the input of a context-switchable neural network, and outputting context representation information;
generating an intermediate feature map from the convolutional feature map and the context representation information, the intermediate feature map being used for image segmentation.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
obtaining an image to be segmented;
feeding the image to be segmented into the input of a fully convolutional network, and outputting a convolutional feature map;
feeding the convolutional feature map into the input of a context-switchable neural network, and outputting context representation information;
generating an intermediate feature map from the convolutional feature map and the context representation information, the intermediate feature map being used for image segmentation.
With the above image segmentation method, apparatus, computer device and storage medium, an image to be segmented is obtained and fed into the input of a fully convolutional network, which outputs a convolutional feature map; the convolutional feature map is fed into the input of a context-switchable neural network, which outputs context representation information; and an intermediate feature map used for image segmentation is generated from the convolutional feature map and the context representation information. By combining a convolutional neural network that downscales and stacks images to compute image-semantic features with the context representation information produced by the context-switchable neural network, the accuracy of image segmentation can be improved.
Description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the application; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort.
Fig. 1 is an internal structure diagram of the computer device in one embodiment;
Fig. 2 is a flow diagram of the image segmentation method in one embodiment;
Fig. 3 is a flow diagram of the method for generating a local feature map in one embodiment;
Fig. 4 is a flow diagram of the method for generating context representation information in one embodiment;
Fig. 5 is a flow diagram of the method for processing superpixel regions in one embodiment;
Fig. 6 is a schematic diagram of the compression architecture in one embodiment;
Fig. 7 is a schematic diagram of the extension architecture in one embodiment;
Fig. 8 is a system architecture diagram of the context-switchable neural network in one embodiment;
Fig. 9 shows the local structures corresponding to the average depth values of superpixel regions in one embodiment;
Fig. 10 is a structural block diagram of the image segmentation apparatus in one embodiment;
Fig. 11 is a structural block diagram of the information output module in one embodiment;
Fig. 12 compares different segmentation methods on images from the NYUDv2 dataset in the experiments;
Fig. 13 compares different segmentation methods on images from the SUN-RGBD dataset in the experiments.
Detailed description
To make the objectives, technical solutions and advantages of the application clearer, the application is further elaborated below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application and are not intended to limit it.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the application. The terms used in this description are only for describing specific embodiments and are not intended to limit the invention. The technical features of the embodiments can be combined arbitrarily; for brevity, not every possible combination is described, but as long as a combination contains no contradiction it should be considered within the scope of this specification.
As shown in Fig. 1, which is a schematic diagram of the internal structure of the computer device in one embodiment, the computer device can be a terminal or a server. The terminal can be an electronic device with communication functions such as a smartphone, tablet computer, laptop, desktop computer, personal digital assistant, wearable device or in-vehicle device; the server can be a standalone server or a server cluster. Referring to Fig. 1, the computer device includes a processor, a non-volatile storage medium, an internal memory and a network interface connected through a system bus. The non-volatile storage medium can store an operating system and a computer program which, when executed, causes the processor to perform an image segmentation method. The processor provides the computing and control capability that supports the operation of the whole device. The internal memory can store an operating system, a computer program and a database; when this computer program is executed by the processor, it likewise causes the processor to perform an image segmentation method. The network interface is used for network communication.
Those skilled in the art will understand that the structure shown in Fig. 1 is only a block diagram of the parts relevant to the present solution and does not limit the computer devices to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In one embodiment, as shown in Fig. 2, an image segmentation method is provided. The method is described as applied to the computer device in Fig. 1 and includes the following steps:
Step 202: obtain an image to be segmented.
An image is a description or depiction of an objective object and a common information carrier; it contains information about the object it describes. For example, an image can be a photo containing different objects captured by a camera, or an information carrier containing information about different objects synthesised by computer software.
Image segmentation refers to dividing an image into multiple specific regions with unique properties. The image to be segmented can be obtained in real time or pre-stored: for example, it can be collected in real time from a user by a camera, or stored in a database in advance and then retrieved from the database.
Step 204: feed the image to be segmented into the input of a fully convolutional network, and output a convolutional feature map.
A fully convolutional network (FCN) is a pre-trained neural network model that can be used for image segmentation. An FCN for image segmentation can recover, from abstract features, the class to which each pixel belongs, extending classification from the image level to the pixel level. The model may include convolutional layers and pooling layers. A convolutional layer holds convolution kernels, i.e. weight matrices for extracting features; setting the stride of the convolution operation can reduce the number of weights. A pooling layer, also called a downsampling layer, reduces the dimensionality of the matrices.
The image to be segmented serves as the input to the pre-trained fully convolutional network. It is fed to a convolutional layer, where each kernel scans the input from front to back at its corresponding stride and performs the convolution operation. The convolution is followed by pooling, which effectively reduces the dimensionality. The fully convolutional network outputs the convolutional feature map obtained after the convolutional and pooling layers.
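The convolution-then-pooling pipeline above can be sketched in pure Python. This is a toy single-channel version with illustrative values, not the patent's trained network:

```python
def conv2d(image, kernel, stride=1):
    """Valid 2-D convolution (no padding) of a single-channel image."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(image) - kh + 1, stride):
        row = []
        for j in range(0, len(image[0]) - kw + 1, stride):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def max_pool(feature, size=2):
    """Non-overlapping max pooling: downsamples the feature map."""
    out = []
    for i in range(0, len(feature) - size + 1, size):
        out.append([max(feature[i + di][j + dj]
                        for di in range(size) for dj in range(size))
                    for j in range(0, len(feature[0]) - size + 1, size)])
    return out

image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, 1]]      # a single 2x2 kernel, stride 1
fmap = conv2d(image, kernel)   # 4x4 input -> 3x3 feature map
pooled = max_pool(fmap)        # 3x3 -> 1x1 after 2x2 pooling
```

The shrinking shapes (4x4 to 3x3 to 1x1) illustrate how convolution plus pooling reduce dimensionality, as the text describes.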
Step 206: feed the convolutional feature map into the input of the context-switchable neural network, and output context representation information.
The context-switchable neural network is trained in advance on image structure and depth data. It is itself a fully convolutional network, also containing convolutional and pooling layers. Context representation information refers to some or all of the information that influences the objects in an image.
After the convolutional feature map is fed into the input of the context-switchable neural network, the network's convolutional and pooling layers process the map to obtain its context representation information.
Step 208: generate an intermediate feature map from the convolutional feature map and the context representation information; the intermediate feature map is used for image segmentation.
The intermediate feature maps can be feature maps of several different resolutions, ordered from low resolution to high. An intermediate feature map is generated from the convolutional feature map and the context representation information; the formula can be expressed as F_{l+1} = M_{l+1} + D_{l->(l+1)}, l = 0, ..., L-1, where L is the number of intermediate feature maps, F_{l+1} is the generated intermediate feature map, F_1 is the intermediate feature map with the lowest resolution and F_L the one with the highest resolution, M is the convolutional feature map output by the fully convolutional network, and D_{l->(l+1)} is the context representation information corresponding to M. The generated intermediate feature maps are used for image segmentation.
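The combination step F_{l+1} = M_{l+1} + D_{l->(l+1)} is an elementwise sum of two same-shaped maps. A minimal sketch with illustrative integer values:

```python
def combine(conv_feature, context_info):
    """F_{l+1} = M_{l+1} + D_{l->(l+1)}: elementwise sum of the convolutional
    feature map and the context representation at matching resolution."""
    assert len(conv_feature) == len(context_info)
    return [[m + d for m, d in zip(m_row, d_row)]
            for m_row, d_row in zip(conv_feature, context_info)]

M = [[1, 2], [3, 4]]      # convolutional feature map M_{l+1} (toy values)
D = [[10, 20], [30, 40]]  # context information D_{l->(l+1)} (toy values)
F = combine(M, D)         # intermediate feature map F_{l+1}
```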
In this way, an image to be segmented is obtained and fed into the input of a fully convolutional network, which outputs a convolutional feature map; the convolutional feature map is fed into the input of the context-switchable neural network, which outputs context representation information; and an intermediate feature map used for image segmentation is generated from both. By combining a convolutional neural network that downscales and stacks images to compute image-semantic features with the context representation information produced by the context-switchable neural network, the accuracy of image segmentation can be improved.
As shown in Fig. 3, in one embodiment the image segmentation method can also include a process of generating local feature maps, with the following steps:
Step 302: divide the convolutional feature map into superpixel regions; a superpixel region is a subregion of the convolutional feature map.
Superpixel segmentation turns an originally pixel-level image into a district-level one; a superpixel algorithm can be used to partition the image into superpixel regions. After superpixel segmentation, many regions of varying size are obtained, each containing effective information such as colour histograms and texture information. For example, if there is a person in the image, we can apply superpixel segmentation to the image of the person, extract features from each small region, recognise which body part (head, shoulder, leg) each region is, and then build an articulated model of the human body.
After the convolutional feature map is divided into superpixels, multiple superpixel regions are obtained; these regions do not overlap, and each is a subregion of the convolutional feature map.
Step 304: generate a local feature map from each superpixel region.
Each superpixel region corresponds to a local feature map, generated by gathering the features at the receptive-field centres that fall inside the region; this can be written as H(S_n) = {D_l(r_i) | r_i ∈ φ(S_n)}, where S_n is the superpixel region and r_i is a receptive field in the image to be segmented. A receptive field is the size of the visual response region; in a convolutional neural network, the receptive field is defined as the area of the original image mapped to by a pixel of the feature map output at each layer. φ(S_n) is the set of receptive-field centres within the superpixel region, and H(·) denotes the local feature map. The formula shows that for a region r_i, the generated local feature map contains the features of the intermediate feature map, so the generated local feature map retains the content of the original region r_i.
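The gathering step H(S_n) = {D_l(r_i) | r_i ∈ φ(S_n)} can be sketched as follows, under the hypothetical simplification that every cell of the feature map is a receptive-field centre and a label grid marks which superpixel each centre belongs to:

```python
def local_feature_map(feature_map, superpixel_labels, n):
    """Collect the features whose receptive-field centres fall inside
    superpixel region S_n (one centre per grid cell, for illustration)."""
    return [feature_map[i][j]
            for i in range(len(superpixel_labels))
            for j in range(len(superpixel_labels[0]))
            if superpixel_labels[i][j] == n]

labels = [[0, 0, 1], [0, 1, 1]]                  # two superpixel regions
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # toy per-centre features
h_s1 = local_feature_map(features, labels, 1)    # local feature map of S_1
```

Because the gathered values are taken directly from the feature map, the local feature map retains the content of the original region, as the text notes.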
By dividing the convolutional feature map into superpixel regions, each a subregion of the convolutional feature map, and then generating a local feature map from each superpixel region, the content of the original region is retained, making the image segmentation more accurate.
In one embodiment, as shown in Fig. 4, the image segmentation method can also include a process of generating context representation information, with the following steps:
Step 402: compute the average depth value of a superpixel region.
The grey value of each pixel in a depth image characterises the distance of a point in the scene from the camera; the depth value is exactly that distance. Several objects may coexist in a superpixel region. By obtaining the depth value of each object in the superpixel region, the average depth value of the whole region can be computed from the depth values of the objects.
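The per-region average d(S_n) is a plain mean over the depth values inside the region. A minimal sketch with illustrative values:

```python
def mean_depth(depth_map, superpixel_labels, n):
    """Average depth value d(S_n) over all pixels in superpixel region S_n."""
    vals = [depth_map[i][j]
            for i in range(len(depth_map))
            for j in range(len(depth_map[0]))
            if superpixel_labels[i][j] == n]
    return sum(vals) / len(vals)

depth = [[1.0, 2.0], [3.0, 6.0]]   # toy depth image (distance from camera)
labels = [[0, 0], [1, 1]]          # two superpixel regions
d0 = mean_depth(depth, labels, 0)  # 1.5
d1 = mean_depth(depth, labels, 1)  # 4.5
```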
Step 404: generate the context representation information corresponding to the superpixel region from the average depth value.
Depth values are significant data for generating context representation information. The context representation information corresponding to each superpixel region is generated from that region's average depth value.
By computing the average depth value of a superpixel region and generating the region's context representation information from it, the generated context representation information becomes more accurate, which in turn improves the accuracy of image segmentation.
As shown in Fig. 5, in one embodiment the image segmentation method can also include a process of handling superpixel regions, with the following steps:
Step 502: compare the average depth value with the threshold depth value.
The threshold depth value can be a preset specific value. After computing the average depth value, the computer device compares it in magnitude with the threshold depth value.
Step 504: when the average depth value is less than the threshold depth value, compress the superpixel region.
An average depth value below the threshold indicates that the superpixel region carries a large amount of information; the compression architecture is needed to refine the information in the region and reduce its excessively diverse content. The compression architecture relearns a weighting for the corresponding superpixel region; the compression formula is D̂_l(r_j) = c(D_l(r_j)) ⊙ D_l(r_j), where r_j is a superpixel region whose average depth value is below the threshold, c denotes the compression architecture, D_l is the structure feature map, ⊙ denotes elementwise multiplication, and D̂_l(r_j) is the superpixel region after compression by the compression architecture.
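The compression output c(D_l(r_j)) ⊙ D_l(r_j) is an elementwise product of a learned reweighting vector with the local feature. A sketch with a stand-in reweighting vector (in the patent it is produced by the convolutional stack of Fig. 6, not hand-chosen):

```python
def compress_region(feature_vec, reweight_vec):
    """Compressed superpixel feature: c(D_l(r_j)) ⊙ D_l(r_j), the
    elementwise product of the reweighting vector with the local feature."""
    return [c * d for c, d in zip(reweight_vec, feature_vec)]

feature = [2.0, 4.0, 6.0]       # D_l(r_j), toy values
reweight = [0.5, 0.25, 1.0]     # c(D_l(r_j)), stand-in for learned weights
compressed = compress_region(feature, reweight)
```

Entries with small weights are suppressed, which is the "refining" effect the text describes.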
Step 506: when the average depth value is greater than or equal to the threshold depth value, extend the superpixel region.
An average depth value at or above the threshold indicates that the superpixel region carries little information; the extension architecture is needed to enrich the information in the region. The extension formula is D̂_l(r_j) = ε(D_l(r_j)), where ε denotes the extension architecture and D̂_l(r_j) is the superpixel region after extension.
By comparing the average depth value with the threshold depth value, compressing the superpixel region when the average is below the threshold and extending it when the average is at or above the threshold, the compression or extension architecture is selected according to the magnitude of the average depth value, which can improve the accuracy of image segmentation.
In one embodiment, the context representation information generated by the context-switchable neural network can be expressed by a formula that switches between the two architectures according to the depths of adjacent regions.
Here superpixel region S_n is adjacent to superpixel region S_m, and the formula transmits the top-down information of receptive field r_j in S_m to receptive field r_i; ε denotes the extension architecture, c the compression architecture, and d(S_n) the mean depth of superpixel region S_n. An indicator function switches between the extension and compression architectures: when d(S_n) < d(S_m), the compression architecture is switched in to refine the information of receptive field r_i, and when d(S_n) > d(S_m), the extension architecture is switched in to enrich the information of receptive field r_i.
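The indicator-function switch can be sketched as a plain branch on the depth comparison. The two lambdas are toy stand-ins for the 1x1/3x3 compression stack and the 7x7/1x1 extension stack, not the real learned architectures:

```python
def switch_context(local_feature, d_n, d_m, compress, extend):
    """Switchable context: compress when d(S_n) < d(S_m) (refine the
    information of r_i), otherwise extend (enrich it)."""
    if d_n < d_m:
        return compress(local_feature)
    return extend(local_feature)

compress = lambda f: [x * 0.5 for x in f]   # stand-in for the compression architecture
extend = lambda f: [x * 2.0 for x in f]     # stand-in for the extension architecture

refined = switch_context([1.0, 2.0], d_n=1.0, d_m=3.0,
                         compress=compress, extend=extend)
enriched = switch_context([1.0, 2.0], d_n=3.0, d_m=1.0,
                          compress=compress, extend=extend)
```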
In one embodiment, as shown in Fig. 6, the image segmentation method can also include a process of compressing a superpixel region, as follows: the local feature map corresponding to the superpixel region is fed into three preset convolutional layers for processing to obtain the compressed superpixel region, where the three layers comprise two with 1x1 kernels and one with a 3x3 kernel.
As shown in Fig. 6, the local feature map 610 is taken as input to the compression architecture, which consists of a first 1x1 convolutional layer 620, a 3x3 convolutional layer 630 and a second 1x1 convolutional layer 640. The first 1x1 layer 620 and the second 1x1 layer 640 are convolutional layers with 1x1 kernels, and the 3x3 layer 630 is a convolutional layer with a 3x3 kernel.
After the local feature map 610 enters the compression architecture, it is first processed by the 1x1 convolutional layer 620, which halves its dimensionality; this halving filters out useless information in the local feature map 610 while retaining the useful information. After the dimensionality reduction, the 3x3 convolutional layer 630 restores the dimensionality, rebuilding it back to the original; the second 1x1 convolutional layer then generates the reweighting vector c(D_l(r_j)), from which the compressed superpixel region is generated.
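The bottleneck shape bookkeeping of the compression branch (halve, restore, emit weights) can be made explicit. This tracks channel counts only; weights and the actual convolutions are omitted, and the layer names are illustrative:

```python
def compression_shapes(channels):
    """Channel sizes through the compression branch of Fig. 6: a 1x1 conv
    halves the channel dimension, a 3x3 conv restores it, and a second
    1x1 conv emits the reweighting vector at the original width."""
    half = channels // 2
    return [("conv1x1", channels, half),   # halve: filter out useless info
            ("conv3x3", half, channels),   # restore the original dimension
            ("conv1x1", channels, channels)]  # produce c(D_l(r_j))

shapes = compression_shapes(64)
```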
As shown in fig. 7, in one embodiment, a kind of image partition method provided can also include to super-pixel region The process being extended, specifically includes:The corresponding local feature figure in super-pixel region is input to preset three convolutional Neurals Network is handled, the super-pixel region after being expanded.Wherein, it is 7 that three convolutional neural networks, which include two convolution kernels, The neural network that neural network and a convolution kernel are 1.
As shown in FIG. 7, the expansion architecture consists of a first 7*7 convolutional layer 720, a 1*1 convolutional layer 730 and a second 7*7 convolutional layer 740. After the local feature map 710 is input into the expansion architecture, the first 7*7 convolutional layer 720 uses a larger kernel to enlarge the receptive field and learn the relevant context representation information. The 1*1 convolutional layer 730 halves the dimension, removing the redundancy introduced by the large kernel of the first 7*7 convolutional layer 720. The second 7*7 convolutional layer 740 restores the dimension, so that ε(D_l(r_j)) and D_l(r_j) match in dimension.
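The difference between the two architectures can be made concrete with the standard receptive-field formula for a stack of stride-1 convolutions, where each k×k layer grows the field by k − 1. The layer stacks below match FIGS. 6 and 7; the formula itself is standard, not specific to this patent:

```python
def receptive_field(kernel_sizes):
    # receptive field of a stack of stride-1 convolutions: 1 + sum(k - 1)
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

print(receptive_field([1, 3, 1]))  # compression architecture: 3
print(receptive_field([7, 1, 7]))  # expansion architecture: 13
```

The expansion stack sees a 13 × 13 neighborhood versus 3 × 3 for the compression stack, which is why the 7*7 layers widen the receptive field.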
In one embodiment, the context-switchable neural network is trained as follows: an input layer node sequence is obtained according to the convolution feature map and the class of the convolution feature map, the input layer node sequence is projected to obtain the hidden node sequence corresponding to the first hidden layer, and the first hidden layer is taken as the currently processed hidden layer.
According to the hidden node sequence corresponding to the currently processed hidden layer and the weights and biases of the neuron nodes of the currently processed hidden layer, the hidden node sequence of the next hidden layer is obtained using a nonlinear mapping; the next hidden layer is then taken as the currently processed hidden layer, and the step of obtaining the hidden node sequence of the next hidden layer from the weights, biases and nonlinear mapping is repeated until the output layer is reached, which outputs the context representation information probability matrix corresponding to the class of the convolution feature map.
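The layer-by-layer recursion above (each hidden node sequence obtained from the previous one via weights, biases and a nonlinear mapping) can be sketched as follows; tanh is only an assumed choice of nonlinearity, since the text does not name one:

```python
import math

def forward(x, layers):
    # layers: list of (W, b) per layer; W is n_out x n_in, b has n_out entries
    h = x  # input layer node sequence
    for W, b in layers:
        # nonlinear mapping: h_next = tanh(W @ h + b)
        h = [math.tanh(sum(wi * hi for wi, hi in zip(row, h)) + bi)
             for row, bi in zip(W, b)]
    return h  # node sequence reaching the output layer
```

Each iteration consumes the current hidden layer's node sequence and produces the next layer's, exactly the repeated step described above.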
After the convolution feature map is input into the context-switchable neural network, local feature maps of different resolutions are produced, and the generated local feature maps are fed into a pixel-wise classifier for semantic segmentation. The pixel-wise classifier outputs a set of class labels for the pixels of the local feature map; the sequence of class labels can be expressed as Y = f(F_l), where f(·) is a softmax regressor that produces pixel-wise classifications. Y = f(F_l) can be used to predict the per-pixel class labels. The objective function for training the context-switchable neural network can be formulated as J = Σ_{r_i} L(y(r_i)), where L(·) is the softmax loss, and for a receptive field region r_i, y(r_i) denotes the predicted class label of region r_i. The computer device can compare the output of the context-switchable neural network with the predicted class labels, so as to train the context-switchable neural network.
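The pixel-wise softmax classifier f(·) and the loss L(·) can be sketched as below. This is the generic softmax/negative-log-likelihood formulation, under the assumption that L is the usual cross-entropy on softmax scores; the score values in the test are made up.

```python
import math

def softmax(logits):
    # numerically stable softmax over one pixel's class scores
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def pixel_labels(score_map):
    # score_map: H x W x num_classes -> H x W predicted class indices (Y = f(F_l))
    return [[max(range(len(px)), key=lambda c: px[c]) for px in row]
            for row in score_map]

def softmax_loss(score_map, labels):
    # mean negative log-likelihood over all pixels (the loss L)
    total, n = 0.0, 0
    for row_s, row_l in zip(score_map, labels):
        for px, y in zip(row_s, row_l):
            total -= math.log(softmax(px)[y])
            n += 1
    return total / n
```

Training drives `softmax_loss` down so that `pixel_labels` matches the annotated per-pixel classes.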
The processing procedure of the context-switchable neural network is as follows: the convolution feature map generated by the fully convolutional neural network and the class labels of the convolution feature map are taken as input; the input convolution feature map is divided into super-pixel regions, and local feature maps are generated according to the super-pixel regions. The average depth value of each super-pixel region is calculated, and the super-pixel region is processed according to the magnitude of the average depth value. When the average depth value is less than the condition depth value, the super-pixel region is processed using the compression architecture, and the context representation information of the compressed super-pixel region is obtained; when the average depth value is greater than or equal to the condition depth value, the super-pixel region is processed using the expansion architecture, and the context representation information of the expanded super-pixel region is obtained. The obtained context representation information is then output.
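The compress-or-expand decision described in this paragraph can be sketched as a simple dispatch, following the rule stated here (compress when the region's average depth is below the condition depth value); the region names and depth values are illustrative.

```python
def choose_branch(mean_depth, condition_depth):
    # rule from this embodiment: below-threshold regions are compressed,
    # the rest are expanded
    return "compress" if mean_depth < condition_depth else "expand"

def dispatch(region_depths, condition_depth):
    # region_depths: mapping from super-pixel region name to its average depth
    return {name: choose_branch(d, condition_depth)
            for name, d in region_depths.items()}
```

Each super-pixel region is routed independently, so one feature map can mix compressed and expanded regions.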
In one embodiment, the weight parameters in the context-switchable neural network can be adjusted using gradient descent.
In the gradient calculation, S_n and S_m are two adjacent super-pixel regions, r_i, r_j and r_k denote receptive field regions in the image to be segmented, and J denotes the objective function for training the context-switchable neural network model. In this calculation, ∂J/∂F_{l+1}(·) denotes an update signal. When the weight parameters in the context-switchable neural network are adjusted using this gradient, the intermediate feature map can be optimized. Receptive field region r_i receives an update signal ∂J/∂F_{l+1}(r_k) from receptive field region r_k within the super-pixel region S_n; this update signal adjusts the features located in the same super-pixel region, so that these regions exhibit the object co-existence property. Receptive field region r_i also receives an update signal from receptive field region r_j of F_{l+1} in the neighboring super-pixel region S_m.
In the gradient calculation, ∂J/∂F_{l+1}(r_j) denotes the update signal from receptive field region r_j. When the update signal is propagated from receptive field region r_j to receptive field region r_i, it is weighted by a signal that extends receptive field region r_j. Meanwhile, the parameters λ_c and λ_e act as a switch determined by the average depths of super-pixel region S_n and super-pixel region S_m; that is, according to the average depths of S_n and S_m it can be determined whether compression with parameter λ_c or expansion with parameter λ_e is used, and the propagated signals are weighted accordingly.
When d(S_n) < d(S_m), i.e. the average depth value of super-pixel region S_n is less than that of super-pixel region S_m, the signal weights the back-propagated gradient transmitted from receptive field region r_j. The compression architecture C(·) can be optimized by back-propagation, where back-propagation means that the gradient of the objective function is transmitted back to the compression architecture C(·). The re-weighting vector c(D_l(r_k)) also participates in the update signal: when training the context-switchable neural network, the vector c(D_l(r_k)) is used to select the information in the local feature map D_l(r_k) that is useful for segmentation and to construct the intermediate feature map F_{l+1}(r_j). Together with the re-weighting vector c(D_l(r_k)), the segmentation-relevant information in receptive field region r_j can better guide the update of receptive field region r_i.
When d(S_n) ≥ d(S_m), i.e. the average depth value of super-pixel region S_n is greater than or equal to that of super-pixel region S_m, the signal influences the back-propagated signal. A skip connection between receptive field region r_j and receptive field region r_i can be formed by the factor 1; a skip connection means that information propagates between receptive field regions r_i and r_j without passing through any neural network structure. When such information propagates between different regions, it is weighted by the factor 1, so the signal undergoes no change. The expansion architecture obtains context representation information by widening the receptive field, but the large convolution kernels of the expansion architecture may disperse the back-propagated signal from receptive field region r_j to receptive field region r_i during training; using a skip connection allows the back-propagated signal to be transmitted from r_j directly to r_i.
Optimizing the weight parameters of the context-switchable neural network by the gradient descent algorithm is thus more beneficial to image segmentation.
In one embodiment, the architecture of the context-switchable neural network is shown in FIG. 8. After the image to be segmented is input into the fully convolutional neural network, multiple convolution feature maps can be output: a first convolution feature map 810, a second convolution feature map 820, a third convolution feature map 830, a fourth convolution feature map 840, and so on. Taking the fourth convolution feature map 840 as an example, the fourth convolution feature map 840 is input into the context-switchable neural network, which divides it into super-pixel regions and generates a local feature map 844 according to the super-pixel regions. The context-switchable neural network calculates the average depth value of each super-pixel region, selects the compression or expansion architecture according to the average depth value, and generates context representation information 846. The context-switchable neural network then generates an intermediate feature map 842 from the local feature map 844 and the context representation information 846; the intermediate feature map 842 is used for image segmentation.
In one embodiment, the local structure corresponding to the average depth values of the super-pixel regions is shown in FIG. 9. The context-switchable neural network can calculate an average depth value for each super-pixel region and compare the calculated average depth value with the condition depth value, so as to determine whether the super-pixel region should be processed with the compression architecture or with the expansion architecture. For example, the average depth value of the first super-pixel region 910 is 6.8, that of the second super-pixel region 920 is 7.5, that of the third super-pixel region 930 is 7.3, that of the fourth super-pixel region 940 is 3.6, that of the fifth super-pixel region 950 is 4.3, and that of the sixth super-pixel region 960 is 3.1. When the preset condition depth value is 5.0, the first super-pixel region 910, the second super-pixel region 920 and the third super-pixel region 930 should be processed using the compression architecture, while the fourth super-pixel region 940, the fifth super-pixel region 950 and the sixth super-pixel region 960 should be processed using the expansion architecture.
In one embodiment, an image processing method is provided; the method is described as applied to the computer device shown in FIG. 1.
First, the computer device can obtain the image to be segmented. Image segmentation refers to dividing an image into multiple specific regions with distinctive properties. The image to be segmented can be obtained in real time or pre-stored; for example, the image to be segmented can be collected in real time by a camera, or it can be stored in a database in advance and then retrieved from the database.
Then, the computer device can input the image to be segmented into the input variable of the fully convolutional neural network and output a convolution feature map. The image to be segmented serves as the input data of the fully convolutional neural network trained in advance. The image to be segmented is input into the convolutional layer, where the convolution kernels scan the input image from front to back with a stride corresponding to the kernel size and perform convolution operations. After the convolution processing, a pooling layer performs further processing; the pooling layer can effectively reduce the dimensionality. The fully convolutional neural network outputs the convolution feature map obtained after processing by the convolutional layers and pooling layers.
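The dimensionality reduction performed by the pooling layer can be illustrated with a minimal 2×2 max-pooling over a single-channel map (a sketch; the text does not specify the pooling window size):

```python
def max_pool_2x2(fmap):
    # fmap: H x W single-channel map; non-overlapping 2x2 windows halve each side
    H, W = len(fmap), len(fmap[0])
    return [[max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, W - 1, 2)]
            for y in range(0, H - 1, 2)]
```

A 4×4 map becomes 2×2, keeping the strongest activation of each window, which is what "effectively reduce the dimensionality" refers to.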
Then, the computer device can input the convolution feature map into the input variable of the context-switchable neural network and output the context representation information. The context-switchable neural network is trained in advance according to image structure and depth data; it is a kind of fully convolutional neural network and also contains convolutional layers and pooling layers. The context representation information refers to some or all of the information that affects the objects in the image. After the convolution feature map is input into the input variable of the context-switchable neural network, the convolutional layers and pooling layers in the context-switchable neural network process the convolution feature map, and the context representation information of the convolution feature map can be obtained.
The computer device can also divide the convolution feature map into super-pixel regions, where a super-pixel region is a subregion of the convolution feature map. After super-pixel division of the image, many regions of varying sizes are obtained, and these regions contain effective information such as color histograms and texture information. For example, if there is a person in an image, we can perform super-pixel segmentation on the image of the person and then, through feature extraction on each small region, identify which part of the human body (head, shoulder, leg) each region belongs to, and thereby build a joint image of the human body. After super-pixel region division of the convolution feature map, multiple super-pixel regions are obtained; the obtained super-pixel regions do not overlap, and each of them is a subregion of the convolution feature map. The computer device can also generate local feature maps according to the super-pixel regions.
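Super-pixel division is typically produced by an algorithm such as SLIC; as an illustration only, a regular grid labeling yields the kind of non-overlapping subregion map the paragraph describes (a simplified stand-in, not the actual super-pixel algorithm):

```python
def grid_superpixels(height, width, cell):
    # label each pixel by the grid cell it falls in; every pixel gets exactly
    # one label, so the regions are non-overlapping subregions of the map
    cols = (width + cell - 1) // cell
    return [[(y // cell) * cols + (x // cell) for x in range(width)]
            for y in range(height)]
```

A real super-pixel method would additionally snap region boundaries to image edges, but the label-map data structure is the same.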
The computer device can also calculate the average depth value of each super-pixel region. The gray value of each pixel in a depth image can be used to characterize the distance from a certain point in the scene to the camera; the depth value is exactly that distance. Multiple objects can coexist within a super-pixel region. By obtaining the depth value of each object in the super-pixel region, the average depth value of the entire super-pixel region can be calculated from the depth values of the objects. The computer device can then generate the context representation information corresponding to the super-pixel region according to the average depth value.
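Computing the average depth value per super-pixel region reduces to averaging the depth-image values under each region label. A small sketch with made-up values:

```python
def region_mean_depth(labels, depth):
    # labels, depth: H x W maps of the same size; returns {label: mean depth}
    sums, counts = {}, {}
    for row_l, row_d in zip(labels, depth):
        for s, d in zip(row_l, row_d):
            sums[s] = sums.get(s, 0.0) + d
            counts[s] = counts.get(s, 0) + 1
    return {s: sums[s] / counts[s] for s in sums}
```

The resulting per-region averages are what get compared against the condition depth value.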
The computer device can also compare the average depth value with the condition depth value. When the average depth value is less than the condition depth value, the super-pixel region is compressed; when the average depth value is greater than or equal to the condition depth value, the super-pixel region is expanded. When compressing the super-pixel region, the local feature map corresponding to the super-pixel region is input into three preset convolutional neural networks for processing, and the compressed super-pixel region is obtained; here the three convolutional neural networks include two networks whose convolution kernel size is 1 and one network whose convolution kernel size is 3. When expanding the super-pixel region, the local feature map corresponding to the super-pixel region is input into three preset convolutional neural networks for processing, and the expanded super-pixel region is obtained; here the three convolutional neural networks include two networks whose convolution kernel size is 7 and one network whose convolution kernel size is 1.
Then, the context-switchable neural network is trained as follows: an input layer node sequence is obtained according to the convolution feature map and the class of the convolution feature map, the input layer node sequence is projected to obtain the hidden node sequence corresponding to the first hidden layer, and the first hidden layer is taken as the currently processed hidden layer. According to the hidden node sequence corresponding to the currently processed hidden layer and the weights and biases of the neuron nodes of the currently processed hidden layer, the hidden node sequence of the next hidden layer is obtained using a nonlinear mapping; the next hidden layer is then taken as the currently processed hidden layer, and this step is repeated until the output layer is reached, which outputs the context representation information probability matrix corresponding to the class of the convolution feature map.
It should be understood that although the steps in the above flow charts are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least part of the steps in the above flow charts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments; their execution order is not necessarily sequential either, and they may be executed in turn or alternately with other steps, or with the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 10, an image processing apparatus is provided, including: an image acquisition module 1010, a feature map output module 1020, an information output module 1030 and a feature map generation module 1040, wherein:
The image acquisition module 1010 is configured to obtain the image to be segmented.
The feature map output module 1020 is configured to input the image to be segmented into the input variable of the fully convolutional neural network and output a convolution feature map.
The information output module 1030 is configured to input the convolution feature map into the input variable of the context-switchable neural network and output context representation information.
The feature map generation module 1040 is configured to generate an intermediate feature map according to the convolution feature map and the context representation information; the intermediate feature map is used for image segmentation.
In one embodiment, the information output module 1030 can be further configured to divide the convolution feature map into super-pixel regions, a super-pixel region being a subregion of the convolution feature map, and to generate local feature maps according to the super-pixel regions.
In one embodiment, the information output module 1030 can be further configured to calculate the average depth value of the super-pixel region and to generate context representation information corresponding to the super-pixel region according to the average depth value.
In one embodiment, as shown in FIG. 11, the information output module 1030 includes a comparison module 1032, a compression module 1034 and an expansion module 1036, wherein:
The comparison module 1032 is configured to compare the average depth value with the condition depth value.
The compression module 1034 is configured to compress the super-pixel region when the average depth value is less than the condition depth value.
The expansion module 1036 is configured to expand the super-pixel region when the average depth value is greater than or equal to the condition depth value.
In one embodiment, the compression module 1034 can be further configured to input the local feature map corresponding to the super-pixel region into three preset convolutional neural networks for processing, to obtain the compressed super-pixel region, wherein the three convolutional neural networks include two networks whose convolution kernel size is 1 and one network whose convolution kernel size is 3.
In one embodiment, the expansion module 1036 can be further configured to input the local feature map corresponding to the super-pixel region into three preset convolutional neural networks for processing, to obtain the expanded super-pixel region, wherein the three convolutional neural networks include two networks whose convolution kernel size is 7 and one network whose convolution kernel size is 1.
In one embodiment, in the image segmentation apparatus provided, the context-switchable neural network is trained as follows: an input layer node sequence is obtained according to the convolution feature map and the class of the convolution feature map, the input layer node sequence is projected to obtain the hidden node sequence corresponding to the first hidden layer, and the first hidden layer is taken as the currently processed hidden layer. According to the hidden node sequence corresponding to the currently processed hidden layer and the weights and biases of the neuron nodes of the currently processed hidden layer, the hidden node sequence of the next hidden layer is obtained using a nonlinear mapping; the next hidden layer is then taken as the currently processed hidden layer, and this step is repeated until the output layer is reached, which outputs the context representation information probability matrix corresponding to the class of the convolution feature map.
For the specific limitations of the image segmentation apparatus, reference may be made to the limitations of the image segmentation method above, which are not repeated here. Each module in the above image segmentation apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of the processor of the computer device in hardware form, or stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, including a memory and a processor; the memory stores a computer program, and the processor implements the following steps when executing the computer program:
obtaining the image to be segmented; inputting the image to be segmented into the input variable of the fully convolutional neural network and outputting a convolution feature map; inputting the convolution feature map into the input variable of the context-switchable neural network and outputting context representation information; and generating an intermediate feature map according to the convolution feature map and the context representation information, the intermediate feature map being used for image segmentation.
In one embodiment, the processor further implements the following steps when executing the computer program: dividing the convolution feature map into super-pixel regions, a super-pixel region being a subregion of the convolution feature map; and generating local feature maps according to the super-pixel regions.
In one embodiment, the processor further implements the following steps when executing the computer program: calculating the average depth value of the super-pixel region; and generating context representation information corresponding to the super-pixel region according to the average depth value.
In one embodiment, the processor further implements the following steps when executing the computer program: comparing the average depth value with the condition depth value; compressing the super-pixel region when the average depth value is less than the condition depth value; and expanding the super-pixel region when the average depth value is greater than or equal to the condition depth value.
In one embodiment, the processor further implements the following steps when executing the computer program: inputting the local feature map corresponding to the super-pixel region into three preset convolutional neural networks for processing, to obtain the compressed super-pixel region, wherein the three convolutional neural networks include two networks whose convolution kernel size is 1 and one network whose convolution kernel size is 3.
In one embodiment, the processor further implements the following steps when executing the computer program: inputting the local feature map corresponding to the super-pixel region into three preset convolutional neural networks for processing, to obtain the expanded super-pixel region, wherein the three convolutional neural networks include two networks whose convolution kernel size is 7 and one network whose convolution kernel size is 1.
In one embodiment, the context-switchable neural network is trained as follows: an input layer node sequence is obtained according to the convolution feature map and the class of the convolution feature map, the input layer node sequence is projected to obtain the hidden node sequence corresponding to the first hidden layer, and the first hidden layer is taken as the currently processed hidden layer; according to the hidden node sequence corresponding to the currently processed hidden layer and the weights and biases of the neuron nodes of the currently processed hidden layer, the hidden node sequence of the next hidden layer is obtained using a nonlinear mapping; the next hidden layer is then taken as the currently processed hidden layer, and this step is repeated until the output layer is reached, which outputs the context representation information probability matrix corresponding to the class of the convolution feature map.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the following steps are implemented:
obtaining the image to be segmented; inputting the image to be segmented into the input variable of the fully convolutional neural network and outputting a convolution feature map; inputting the convolution feature map into the input variable of the context-switchable neural network and outputting context representation information; and generating an intermediate feature map according to the convolution feature map and the context representation information, the intermediate feature map being used for image segmentation.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: dividing the convolution feature map into super-pixel regions, a super-pixel region being a subregion of the convolution feature map; and generating local feature maps according to the super-pixel regions.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: calculating the average depth value of the super-pixel region; and generating context representation information corresponding to the super-pixel region according to the average depth value.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: comparing the average depth value with the condition depth value; compressing the super-pixel region when the average depth value is less than the condition depth value; and expanding the super-pixel region when the average depth value is greater than or equal to the condition depth value.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: inputting the local feature map corresponding to the super-pixel region into three preset convolutional neural networks for processing, to obtain the compressed super-pixel region, wherein the three convolutional neural networks include two networks whose convolution kernel size is 1 and one network whose convolution kernel size is 3.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: inputting the local feature map corresponding to the super-pixel region into three preset convolutional neural networks for processing, to obtain the expanded super-pixel region, wherein the three convolutional neural networks include two networks whose convolution kernel size is 7 and one network whose convolution kernel size is 1.
In one embodiment, the context-switchable neural network is trained as follows: an input layer node sequence is obtained according to the convolution feature map and the class of the convolution feature map, the input layer node sequence is projected to obtain the hidden node sequence corresponding to the first hidden layer, and the first hidden layer is taken as the currently processed hidden layer; according to the hidden node sequence corresponding to the currently processed hidden layer and the weights and biases of the neuron nodes of the currently processed hidden layer, the hidden node sequence of the next hidden layer is obtained using a nonlinear mapping; the next hidden layer is then taken as the currently processed hidden layer, and this step is repeated until the output layer is reached, which outputs the context representation information probability matrix corresponding to the class of the convolution feature map.
The technical solution of the present application has been proved feasible by experiments; the specific experimental procedure is described below:
In the experiments, the context-switchable neural network of the present invention was tested on two common benchmarks for semantic segmentation of RGB-D (Red, Green, Blue, Depth Map) depth images: the NYUDv2 dataset and the SUN-RGBD dataset. The NYUDv2 dataset is widely used to evaluate segmentation performance and contains 1449 RGB-D images. In this dataset, 795 images are used for training and 654 images for testing. First, a validation set of 414 images can be selected from the original training set. Image classes are annotated with pixel-wise labels; all pixels are labeled with 40 classes. The NYUDv2 dataset is used to evaluate the above method, and the SUN-RGBD dataset is further used for comparison with state-of-the-art methods.
Then, segmentation results are computed using multi-scale testing. That is, the test image is resized using four ratios (i.e., 0.6, 0.8, 1, 1.1) before being fed to the network. For post-processing with the conditional random field (CRF) algorithm, the output segmentation scores of the re-scaled images are averaged.
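The averaging step of multi-scale testing can be sketched as follows. Assuming the score maps of the four re-scaled copies have already been resized back to a common resolution (the resizing itself is omitted), the averaging is element-wise:

```python
def average_scores(score_maps):
    # score_maps: list of H x W x C class-score maps at a common size;
    # returns their element-wise mean, the score fed to CRF post-processing
    n = len(score_maps)
    H, W, C = len(score_maps[0]), len(score_maps[0][0]), len(score_maps[0][0][0])
    return [[[sum(m[y][x][c] for m in score_maps) / n for c in range(C)]
             for x in range(W)]
            for y in range(H)]
```

With the four test ratios, `score_maps` would hold four entries, one per re-scaled copy of the test image.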
When testing on the NYUDv2 dataset, the sensitivity to the number of super-pixels first needs to be evaluated. In the context-switchable neural network, the control of the context representation information depends on the size of the super-pixels. The size of the super-pixels is adjusted with a tool, and different scales are selected empirically: 500, 1000, 2000, 4000, 8000 and 12000. For each scale, a context-switchable neural network can be trained based on the ResNet-101 model. Of the inputs to the context-switchable neural network, the depth image is used for switching features and the RGB image is used for segmenting the image. The segmentation accuracy on the NYUDv2 validation set is shown in Table 1:
Super-pixel scale          500    1000   2000   4000   8000   12000
Segmentation accuracy (%)  42.7   43.5   45.6   43.6   44.2   42.9
Table 1
As shown in Table 1, the segmentation accuracy for each scale is reported as mean intersection-over-union (%). Accuracy is lowest when the scale is set to 500; this happens because such small superpixels contain too little contextual information. As the scale increases, segmentation performance improves, and the context-switchable neural network segments best when the scale is set to 2000. Overly large superpixels reduce performance, because a superpixel that is too large may cover additional objects, which hinders preserving the properties of the superpixel. Subsequent experiments therefore continue to use a scale of 2000 to build the context-switchable neural network.
Then, the experiments also examine the strategy of local structure information propagation. Propagating local structure information produces features that are more strongly related to their regions. As analyzed in Table 2, the local structure information propagation strategy is replaced by other ways of using structure information. The first experiment measures the performance of the method without local structure information: the full context-switchable neural network achieves a segmentation score of 45.6 on the NYUDv2 validation set. The context-switchable neural network is then retrained without propagating the local structure information of superpixels; that is, all intermediate features are processed by a global identity mapping, which reaches an accuracy of 40.3. In addition, new features are generated using interpolation and deconvolution, in which each region covers broader but regular receptive field information; however, these methods produce structure-insensitive features, and their scores are lower than that of the context-switchable neural network.
Table 2
As shown in Table 2, there are several methods that can propagate the local structure information of superpixels. Information can be propagated by averaging the features within each superpixel region, which means the local structure mapping is implemented by an identical kernel; this achieves a segmentation score of 43.8. Since an identical kernel contains no learnable parameters, it lacks the flexibility to select useful information. Different convolution kernels, for example of size 3 × 3 and 5 × 5, are also used to capture the finer structure of superpixels; compared with 1 × 1 kernels, the larger kernels produce worse results.
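The parameter-free "identical kernel" baseline described above (averaging the features within each superpixel) can be sketched as a few lines of numpy; the function name and shapes are illustrative, not from the patent:

```python
import numpy as np

def average_within_superpixels(features, labels):
    """Replace each feature vector by the mean over its superpixel.

    `features`: H x W x C feature map; `labels`: H x W integer superpixel
    ids. Every position in a superpixel receives the same averaged
    feature, so no information is selected or weighted -- which is why
    this baseline scores below the learned local structure mapping.
    """
    out = np.empty_like(features, dtype=float)
    for sp in np.unique(labels):
        mask = labels == sp
        out[mask] = features[mask].mean(axis=0)
    return out
```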
Then, the experiments also evaluate the top-down switchable propagation. Given the local structure features, top-down switchable information propagation can be applied to generate context expressions, and the generated context expressions are guided by superpixels and depth. The specific process is as follows:
As shown in Table 3, top-down propagation is measured under different settings. Without using superpixels and depth, context expressions are constructed using only deconvolution and interpolation; the resulting segmentation accuracy is lower than that of the context-switchable neural network.
Table 3
In the next test, only the superpixel guidance is disabled, while top-down information propagation is retained. Without superpixels, the switchable propagation is performed on the compressed and extended feature maps, where information propagation is defined by conventional kernels. Compared with this setting, the complete context-switchable neural network performs better. Besides the fact that superpixels provide more natural information propagation, the average depth computed within each superpixel avoids noisy depth values in isolated regions and achieves more stable feature switching.
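The per-superpixel depth averaging that stabilizes feature switching can be sketched as follows; this is a minimal numpy illustration with an assumed dict-valued return, not the patent's implementation:

```python
import numpy as np

def superpixel_mean_depth(depth, labels):
    """Mean depth per superpixel, used to drive feature switching.

    Averaging the depth inside each superpixel suppresses noisy depth
    values in isolated regions, giving a more stable switching signal
    than raw per-pixel depth. Returns {superpixel id: mean depth}.
    """
    return {int(sp): float(depth[labels == sp].mean())
            for sp in np.unique(labels)}
```

The resulting mean is what claims 3 and 4 compare against the conditional depth value to decide between compression and extension.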
In addition, the experiments investigate top-down switchable information propagation without using depth. In this case, the compressed and extended feature maps are each used directly as context expressions. As shown in Table 3, a single compressed/extended feature map lacks the flexibility to identify appropriate segmentation features; its performance is lower than that of the depth-driven switchable structure of context expressions.
Next, the experiments examine the compact features used to adjust contextual information. Top-down switchable information propagation consists of a compression structure and an extension structure, which provide different contextual information. These structures use compact features to generate context expressions. The experiments compare the compression structure and the extension structure, and show that effective compact features can be achieved to adjust contextual information.
Table 4
Table 4 compares different designs of the compression structure. A simple way to compress information is to learn compact features with a 1*1 convolution, followed by another 1*1 convolution to restore the feature dimensionality; this produces lower accuracy than the compression structure. Compared with this simple alternative of two consecutive 1*1 convolutions, the compression structure inserts a 3*3 convolution between the two 1*1 convolutions. To some extent, the 3*3 convolution reaches wider contextual information, compensating the information loss caused by the dimensionality reduction that yields the compact features, while the features obtained by the 3*3 convolution of the compression structure remain compact. When the last 1*1 convolution used to restore the feature dimensionality is removed and the 3*3 convolution directly produces relatively high-dimensional features, the performance is below that of the compression structure. This demonstrates the importance of the compact features generated by the 3*3 convolution.
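The 1*1 → 3*3 → 1*1 compression structure can be sketched with naive numpy convolutions. The channel sizes and random weights below are purely illustrative assumptions; only the layer ordering follows the description above:

```python
import numpy as np

def conv1x1(x, w):
    # pointwise convolution: a per-pixel linear map over channels
    return x @ w

def conv3x3(x, w):
    # naive 3x3 convolution with zero padding ("same" output size);
    # w has shape (3, 3, C_in, C_out)
    h, wd, cin = x.shape
    pad = np.zeros((h + 2, wd + 2, cin))
    pad[1:-1, 1:-1] = x
    out = np.zeros((h, wd, w.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            out += pad[dy:dy + h, dx:dx + wd] @ w[dy, dx]
    return out

def compression_structure(x, c_mid=8):
    """Compression structure sketch: 1x1 -> 3x3 -> 1x1 convolutions.

    The first 1x1 conv learns compact features, the 3x3 conv widens the
    receptive field while staying in the compact dimension, and the last
    1x1 conv restores the feature dimensionality. Weights are random
    here; in the patent they are learned.
    """
    rng = np.random.default_rng(0)
    c_in = x.shape[-1]
    x = conv1x1(x, rng.standard_normal((c_in, c_mid)))          # compress
    x = conv3x3(x, rng.standard_normal((3, 3, c_mid, c_mid)))   # widen context
    x = conv1x1(x, rng.standard_normal((c_mid, c_in)))          # restore dim
    return x
```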
Table 5 studies the extension structure and compares it with different ways of expanding information. Again, using a single 7*7 convolutional layer to enlarge the receptive field produces a segmentation score of 43.8. Assuming that convolutional layers with additional kernels can further improve performance, two 7*7 convolutional layers are used and obtain a higher score of 44.2. The segmentation scores produced by the above convolutions are still lower than that of the extension structure, which computes compact features using a 1*1 convolutional layer.
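Analogously to the compression structure, the extension structure (two 7*7 convolutions followed by a 1*1 convolution, matching claim 6) can be sketched in numpy. Channel sizes and random weights are illustrative assumptions:

```python
import numpy as np

def conv2d(x, w):
    """Naive "same"-padded 2D convolution; w has shape (k, k, C_in, C_out)."""
    k = w.shape[0]
    p = k // 2
    h, wd, cin = x.shape
    pad = np.zeros((h + 2 * p, wd + 2 * p, cin))
    pad[p:p + h, p:p + wd] = x
    out = np.zeros((h, wd, w.shape[-1]))
    for dy in range(k):
        for dx in range(k):
            out += pad[dy:dy + h, dx:dx + wd] @ w[dy, dx]
    return out

def extension_structure(x, c_mid=8):
    """Extension structure sketch: two 7x7 convs, then a 1x1 conv.

    The stacked 7x7 convolutions enlarge the receptive field, and the
    final 1x1 convolution computes the compact feature that outperformed
    the plain 7x7 baselines in Table 5. Weights are random here; in the
    patent they are learned.
    """
    rng = np.random.default_rng(0)
    c_in = x.shape[-1]
    x = conv2d(x, rng.standard_normal((7, 7, c_in, c_mid)))
    x = conv2d(x, rng.standard_normal((7, 7, c_mid, c_mid)))
    x = conv2d(x, rng.standard_normal((1, 1, c_mid, c_in)))
    return x
```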
Table 5
Then, the comparative experiments between the context-switchable neural network and state-of-the-art methods are as follows. The context-switchable neural network is compared with the state-of-the-art methods, which are divided into two groups; all methods are evaluated on the NYUDv2 test set. The first group contains methods that segment using only RGB images, and their performance is listed under RGB input. Deep networks with top-down information propagation generate high-quality segmentation features, and the accuracy of the multi-path refinement network is the highest in this group, as shown in Table 6:
Table 6
Then, the context-switchable neural network is compared with the second group of methods, which take RGB-D images as input. Each depth image is encoded into a 3-channel HHA image to retain richer geometric information. A separate segmentation network is trained using HHA images in place of RGB images. The trained network is tested on the HHA images to obtain segmentation score maps, which are combined with the score maps computed by the network trained on RGB images. With this combination strategy, the best method is the cascaded feature network, with a result of 47.7. Compared with single networks, using both RGB and HHA images improves segmentation accuracy.
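The combination of the RGB-trained and HHA-trained score maps can be sketched as follows. Simple averaging is an assumption for illustration; the description only states that the two score maps are combined:

```python
import numpy as np

def combine_score_maps(rgb_scores, hha_scores):
    """Fuse per-class score maps from an RGB-trained and an HHA-trained
    network by averaging, then take the per-pixel argmax as the label.

    Both inputs are H x W x C score maps over the same C classes.
    """
    fused = (rgb_scores + hha_scores) / 2.0
    return fused.argmax(axis=-1)
```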
Furthermore, RGB and HHA images can both be used as training and test data. Based on ResNet-101, the context-switchable neural network reaches 48.3. Building the context-switchable neural network further on the deeper ResNet-152 structure raises the segmentation score to 49.6. This result is about 2% higher than the state-of-the-art methods.
As shown in Figure 12, images segmented by the context-switchable neural network are compared with images segmented by state-of-the-art methods, where the pictures are collected from the NYUDv2 dataset. The context-switchable neural network improves image segmentation accuracy.
Then, context can be switched neural network and also be tested on SUN-RGBD data sets.In SUN-RGBD data The image for including 10335 use, 37 classes label is concentrated, compared with NYUDv2 data sets, SUNRGBD data sets have more complicated Scene and depth conditions.From this data set, 5285 images is selected to be trained, it is remaining, it is tested.At this In a experiment, neural network is can be switched into context again and is carried out collectively as the method for input picture with using RGB and HHA Compare.The optimum performance on SUN-RGBD data sets was that cascade nature network method generates in the past.The model is based on ResNet-152 structures handle due to having carried out rational modeling to information transmission, can use simpler ResNet-101 Structure obtains better result.With deeper ResNet-152, the segmentation precision of acquisition is 50.7, is better than all comparison sides Method.
As shown in Figure 13, images segmented by the context-switchable neural network are compared with images segmented by state-of-the-art methods, where the pictures are collected from the SUN-RGBD dataset. The context-switchable neural network improves image segmentation accuracy.
One of ordinary skill in the art will appreciate that all or part of the flows in the above embodiment methods may be accomplished by instructing relevant hardware through a computer program; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the flows of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM), etc.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope recorded in this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be determined by the appended claims.

Claims (10)

1. An image segmentation method, the method comprising:
obtaining an image to be segmented;
inputting the image to be segmented into an input variable of a fully convolutional neural network, and outputting a convolutional feature map;
inputting the convolutional feature map into an input variable of a context-switchable neural network, and outputting context expression information;
generating an intermediate feature map according to the convolutional feature map and the context expression information, the intermediate feature map being used for image segmentation.
2. The method according to claim 1, wherein inputting the convolutional feature map into the input variable of the context-switchable neural network and outputting context expression information comprises:
dividing the convolutional feature map into superpixel regions, a superpixel region being a sub-region of the convolutional feature map;
generating a local feature map according to the superpixel regions.
3. The method according to claim 2, wherein inputting the convolutional feature map into the input variable of the context-switchable neural network and outputting context expression information further comprises:
calculating the average depth value of the superpixel region;
generating context expression information corresponding to the superpixel region according to the average depth value.
4. The method according to claim 3, wherein generating context expression information corresponding to the superpixel region according to the average depth value comprises:
comparing the average depth value with a conditional depth value;
compressing the superpixel region when the average depth value is less than the conditional depth value;
extending the superpixel region when the average depth value is greater than or equal to the conditional depth value.
5. The method according to claim 4, wherein compressing the superpixel region when the average depth value is less than the conditional depth value comprises:
inputting the local feature map corresponding to the superpixel region into three preset convolutional neural networks for processing, to obtain the compressed superpixel region;
wherein the three convolutional neural networks comprise two neural networks with a convolution kernel of 1 and one neural network with a convolution kernel of 3.
6. The method according to claim 4, wherein extending the superpixel region when the average depth value is greater than or equal to the conditional depth value comprises:
inputting the local feature map corresponding to the superpixel region into three preset convolutional neural networks for processing, to obtain the extended superpixel region;
wherein the three convolutional neural networks comprise two neural networks with a convolution kernel of 7 and one neural network with a convolution kernel of 1.
7. The method according to any one of claims 1 to 6, wherein the context-switchable neural network is trained as follows:
obtaining an input layer node sequence according to the convolutional feature map and the category of the convolutional feature map, projecting the input layer node sequence to obtain the hidden node sequence corresponding to the first hidden layer, and taking the first hidden layer as the currently processed hidden layer;
obtaining the hidden node sequence of the next hidden layer by nonlinear mapping according to the hidden node sequence corresponding to the currently processed hidden layer and the weights and biases of the neuron nodes corresponding to the currently processed hidden layer, taking the next hidden layer as the currently processed hidden layer, and repeating the step of obtaining the hidden node sequence of the next hidden layer by nonlinear mapping according to the hidden node sequence corresponding to the currently processed hidden layer and the weights and biases of the neuron nodes corresponding to the currently processed hidden layer, until the output layer is reached, to obtain the probability matrix of context expression information corresponding to the categories of the convolutional feature map output by the output layer.
8. An image segmentation apparatus, wherein the apparatus comprises:
an image obtaining module, configured to obtain an image to be segmented;
a feature map output module, configured to input the image to be segmented into an input variable of a fully convolutional neural network and output a convolutional feature map;
an information output module, configured to input the convolutional feature map into an input variable of a context-switchable neural network and output context expression information;
a feature map generation module, configured to generate an intermediate feature map according to the convolutional feature map and the context expression information, the intermediate feature map being used for image segmentation.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN201810463609.0A 2018-05-15 2018-05-15 Image segmentation method and device, computer equipment and storage medium Active CN108765425B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810463609.0A CN108765425B (en) 2018-05-15 2018-05-15 Image segmentation method and device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN108765425A true CN108765425A (en) 2018-11-06
CN108765425B CN108765425B (en) 2022-04-22

Family

ID=64007824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810463609.0A Active CN108765425B (en) 2018-05-15 2018-05-15 Image segmentation method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108765425B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127725A (en) * 2016-05-16 2016-11-16 北京工业大学 A kind of millimetre-wave radar cloud atlas dividing method based on multiresolution CNN
US20160358024A1 (en) * 2015-06-03 2016-12-08 Hyperverge Inc. Systems and methods for image processing
CN106530320A (en) * 2016-09-30 2017-03-22 深圳大学 End-to-end image segmentation processing method and system
CN107169974A (en) * 2017-05-26 2017-09-15 中国科学技术大学 It is a kind of based on the image partition method for supervising full convolutional neural networks more
CN107403430A (en) * 2017-06-15 2017-11-28 中山大学 A kind of RGBD image, semantics dividing method
CN107784654A (en) * 2016-08-26 2018-03-09 杭州海康威视数字技术股份有限公司 Image partition method, device and full convolutional network system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DI LIN 等: ""Cascaded_Feature_Network_for_Semantic_Segmentation_of_RGB-D_Images"", 《2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019218136A1 (en) * 2018-05-15 2019-11-21 深圳大学 Image segmentation method, computer device, and storage medium
US11409994B2 (en) 2018-05-15 2022-08-09 Shenzhen University Methods for image segmentation, computer devices, and storage mediums
CN109785336A (en) * 2018-12-18 2019-05-21 深圳先进技术研究院 Image partition method and device based on multipath convolutional neural networks model
CN109785336B (en) * 2018-12-18 2020-11-27 深圳先进技术研究院 Image segmentation method and device based on multipath convolutional neural network model
CN109886990A (en) * 2019-01-29 2019-06-14 理光软件研究所(北京)有限公司 A kind of image segmentation system based on deep learning
CN110490876A (en) * 2019-03-12 2019-11-22 珠海上工医信科技有限公司 A kind of lightweight neural network for image segmentation
CN110490876B (en) * 2019-03-12 2022-09-16 珠海全一科技有限公司 Image segmentation method based on lightweight neural network
CN110047047A (en) * 2019-04-17 2019-07-23 广东工业大学 Method, apparatus, equipment and the storage medium of three-dimensional appearance image information interpretation
CN110689020A (en) * 2019-10-10 2020-01-14 湖南师范大学 Segmentation method of mineral flotation froth image and electronic equipment
CN110689514A (en) * 2019-10-11 2020-01-14 深圳大学 Training method and computer equipment for new visual angle synthetic model of transparent object
CN110689514B (en) * 2019-10-11 2022-11-11 深圳大学 Training method and computer equipment for new visual angle synthetic model of transparent object
CN110852394B (en) * 2019-11-13 2022-03-25 联想(北京)有限公司 Data processing method and device, computer system and readable storage medium
CN110852394A (en) * 2019-11-13 2020-02-28 联想(北京)有限公司 Data processing method and device, computer system and readable storage medium
CN111739025A (en) * 2020-05-08 2020-10-02 北京迈格威科技有限公司 Image processing method, device, terminal and storage medium
CN111739025B (en) * 2020-05-08 2024-03-19 北京迈格威科技有限公司 Image processing method, device, terminal and storage medium
CN112215243A (en) * 2020-10-30 2021-01-12 百度(中国)有限公司 Image feature extraction method, device, equipment and storage medium
CN113421276A (en) * 2021-07-02 2021-09-21 深圳大学 Image processing method, device and storage medium
CN113421276B (en) * 2021-07-02 2023-07-21 深圳大学 Image processing method, device and storage medium

Also Published As

Publication number Publication date
CN108765425B (en) 2022-04-22

Similar Documents

Publication Publication Date Title
CN108765425A (en) Image partition method, device, computer equipment and storage medium
CN108510485B (en) Non-reference image quality evaluation method based on convolutional neural network
US11409994B2 (en) Methods for image segmentation, computer devices, and storage mediums
CN109410239A (en) A kind of text image super resolution ratio reconstruction method generating confrontation network based on condition
CN110738207A (en) character detection method for fusing character area edge information in character image
CN108765344A (en) A method of the single image rain line removal based on depth convolutional neural networks
CN103824272B (en) The face super-resolution reconstruction method heavily identified based on k nearest neighbor
CN107680077A (en) A kind of non-reference picture quality appraisement method based on multistage Gradient Features
CN111523521A (en) Remote sensing image classification method for double-branch fusion multi-scale attention neural network
CN109087375A (en) Image cavity fill method based on deep learning
CN111582230A (en) Video behavior classification method based on space-time characteristics
TWI719512B (en) Method and system for algorithm using pixel-channel shuffle convolution neural network
CN111275057A (en) Image processing method, device and equipment
CN112200724A (en) Single-image super-resolution reconstruction system and method based on feedback mechanism
CN110781912A (en) Image classification method based on channel expansion inverse convolution neural network
CN112950640A (en) Video portrait segmentation method and device, electronic equipment and storage medium
CN109492610A (en) A kind of pedestrian recognition methods, device and readable storage medium storing program for executing again
Cao et al. Adversarial and adaptive tone mapping operator for high dynamic range images
CN110084181B (en) Remote sensing image ship target detection method based on sparse MobileNet V2 network
Hu et al. Hierarchical discrepancy learning for image restoration quality assessment
CN109978074A (en) Image aesthetic feeling and emotion joint classification method and system based on depth multi-task learning
CN113689382A (en) Tumor postoperative life prediction method and system based on medical images and pathological images
CN110992320B (en) Medical image segmentation network based on double interleaving
CN113361589A (en) Rare or endangered plant leaf identification method based on transfer learning and knowledge distillation
CN107180419A (en) A kind of medium filtering detection method based on PCA networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant