WO2019218136A1 - Image segmentation method, computer device and storage medium (图像分割方法、计算机设备和存储介质) - Google Patents

Image segmentation method, computer device and storage medium

Info

Publication number
WO2019218136A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
neural network
depth value
hidden layer
pixel region
Prior art date
Application number
PCT/CN2018/086832
Other languages
English (en)
French (fr)
Inventor
林迪 (Di Lin)
黄惠 (Hui Huang)
Original Assignee
深圳大学 (Shenzhen University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University (深圳大学)
Priority to US 16/490,696 (US11409994B2)
Priority to PCT/CN2018/086832 (WO2019218136A1)
Publication of WO2019218136A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/2163Partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the present application relates to the field of image processing technologies, and in particular, to an image segmentation method, a computer device, and a storage medium.
  • depth data can provide geometric information about the image and contains a large amount of information useful for image segmentation. By encoding the depth image into three different images on three different channels, and then using the color image together with the encoded images as input to train a convolutional neural network that computes the segmentation features of the image, image segmentation can be achieved.
  • an image segmentation method, a computer device, and a storage medium are provided.
  • An image segmentation method comprising:
  • a computer device comprising a memory and a processor, the memory storing a computer program, and the processor, when executing the computer program, implementing the following steps:
  • a computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements the following steps:
  • FIG. 1 is a diagram showing the internal structure of a computer device in an embodiment
  • FIG. 2 is a schematic flow chart of an image segmentation method in an embodiment
  • FIG. 3 is a schematic flow chart of a method for generating a local feature map in an embodiment
  • FIG. 4 is a schematic flow chart of a method for generating context expression information in an embodiment
  • FIG. 5 is a schematic flow chart of a method for processing a super pixel region in an embodiment
  • Figure 6 is a schematic diagram of a compression architecture in one embodiment
  • Figure 7 is a schematic diagram of an extended architecture in one embodiment
  • FIG. 8 is an architectural diagram of a context switchable neural network in one embodiment
  • FIG. 9 is a partial structural diagram corresponding to an average depth value of a super pixel region in one embodiment
  • Figure 10 is a block diagram showing the structure of an image segmentation device in an embodiment
  • Figure 11 is a block diagram showing the structure of an information output module in an embodiment
  • Figure 12 is a schematic diagram of comparison of different segmentation methods in the NYUDv2 data set during the experiment.
  • Figure 13 is a schematic diagram showing the comparison of different segmentation methods in the SUN-RGBD dataset during the experiment.
  • the computer device may be a terminal or a server; the terminal may be a communication device such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, a wearable device, or an in-vehicle device, and the server may be a standalone server or a server cluster.
  • the computer device includes a processor, a non-volatile storage medium, an internal memory, and a network interface connected by a system bus.
  • the non-volatile storage medium of the computer device can store an operating system and a computer program, which when executed, can cause the processor to perform an image processing method.
  • the processor of the computer device is used to provide computing and control capabilities to support the operation of the entire computer device.
  • An operating system, a computer program, and a database can be stored in the internal memory.
  • the processor when the computer program is executed by the processor, the processor can be caused to perform an image processing method.
  • the network interface of the computer device is used for network communication.
  • FIG. 1 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied.
  • the specific computer device may include more or fewer components than shown in the figure, may combine some components, or may have a different component arrangement.
  • an image segmentation method is provided.
  • the method is applied to the computer device in FIG. 1 as an example, and includes the following steps:
  • Step 202 Acquire an image to be segmented.
  • an image is a depiction or photograph of a physical object and is a commonly used information carrier; the image contains relevant information about the object being described.
  • the image may be a photo taken by a camera containing different objects, or may be an information carrier containing different object information synthesized by computer software.
  • Image segmentation refers to the division of an image into a number of specific, uniquely characterized regions.
  • the image to be segmented may be acquired in real time or may be pre-stored.
  • the computer device may acquire the image to be segmented collected in real time by the user through a camera, or may store the image to be segmented in a database in advance and then obtain the image to be segmented from the database.
  • Step 204 Input the image to be segmented to the full convolutional neural network, and output a convolution feature map.
  • a fully convolutional network (FCN) is a pre-trained neural network model that can be used for image segmentation.
  • an FCN used for image segmentation can recover, from the abstract features, the category to which each pixel belongs, that is, it extends image-level classification to pixel-level classification.
  • the full convolutional neural network model can include convolutional layers and pooling layers. The convolutional layer contains convolution kernels, the weight matrices used to extract features, and the number of weights can be reduced by setting the stride of the convolution operation.
  • the pooling layer, also known as the downsampling layer, can reduce the dimensions of the matrix.
  • the computer device uses the image to be segmented as input data for a pre-trained full convolutional neural network.
  • the computer device inputs the image to be segmented to the convolution layer; the convolution kernel scans the input image from front to back with a stride matching the kernel size and performs the convolution operation. After the convolution processing, the result is processed in the pooling layer, which can effectively reduce the dimensionality.
  • the full convolutional neural network can output the convolutional feature maps obtained by the processing of the convolutional layers and pooling layers.
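  • the convolution and pooling operations described above can be sketched as follows; the image size, kernel values, and stride are illustrative assumptions, not values from the patent.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid convolution: slide the kernel over the image with the given stride."""
    kh, kw = kernel.shape
    h, w = image.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool(feat, size=2):
    """Downsample by taking the max over non-overlapping size x size windows."""
    h, w = feat.shape
    h, w = h - h % size, w - w % size
    return feat[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.random.rand(16, 16)
kernel = np.random.rand(3, 3)
feature_map = max_pool(conv2d(image, kernel))
print(feature_map.shape)  # (7, 7)
```

A 16x16 input convolved with a 3x3 kernel gives a 14x14 map, which the 2x2 pooling reduces to 7x7, illustrating how pooling lowers the dimensionality.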
  • Step 206 Input the convolution feature map to the context switchable neural network, and output context expression information.
  • the context switchable neural network is pre-trained based on image structure and depth data.
  • the context switchable neural network is a full convolutional neural network that includes convolutional layers and pooling layers.
  • context expression information refers to some or all of the information in an image that can affect an object in it.
  • the convolutional layers and pooling layers in the context switchable neural network process the convolution feature map, and the context expression information of the convolution feature map can be obtained.
  • Step 208 Generate an intermediate feature map according to the convolution feature map and the context expression information, where the intermediate feature map is used for image segmentation.
  • the intermediate feature map may be a plurality of feature maps of different resolutions.
  • the intermediate feature maps can be sorted in descending order from top to bottom resolution.
  • the intermediate feature map can be generated according to the convolution feature map and the context expression information; in the generation formula:
  • L represents the number of intermediate feature maps;
  • F_(l+1) represents the resulting intermediate feature map;
  • F_1 represents the intermediate feature map with the lowest resolution;
  • F_L represents the intermediate feature map with the highest resolution;
  • M denotes the convolutional feature map output by the full convolutional neural network;
  • D_(l→(l+1)) denotes the context expression information corresponding to the convolutional feature map M.
  • the generated intermediate feature map is used for image segmentation.
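  • as a loose illustration of how an intermediate feature map might combine the two inputs, the sketch below upsamples lower-resolution context information and adds it element-wise to the convolution feature map; the exact combination operator is not given in this text, so the addition here is an assumption.

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour upsampling by a factor of 2 in both dimensions."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def intermediate_feature_map(M, D):
    """Combine conv feature map M with half-resolution context information D."""
    return M + upsample2x(D)

M = np.ones((8, 8))          # stand-in for the convolution feature map
D = np.full((4, 4), 0.5)     # stand-in for context expression information
F = intermediate_feature_map(M, D)
print(F.shape, F[0, 0])  # (8, 8) 1.5
```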
  • the computer device obtains the image to be segmented, inputs it to the full convolutional neural network, and outputs a convolution feature map; the computer device then inputs the convolution feature map to the context switchable neural network and outputs the context expression information.
  • the computer device generates an intermediate feature map according to the convolution feature map and the context expression information, and the intermediate feature map is used for image segmentation.
  • by combining the full convolutional neural network, which computes the features of the image semantics, with the context expression information generated by the context switchable neural network, the accuracy of image segmentation can be improved.
  • an image segmentation method provided may further include a process of generating a local feature map, and the specific steps include:
  • Step 302 Divide the convolution feature map into super pixel regions, where the super pixel regions are sub-regions of the convolution feature map.
  • superpixel division turns a picture that is originally pixel-level into a region-level map.
  • the computer device can use a superpixel algorithm to subdivide the image into superpixel regions. After the computer device performs superpixel division on the image, many regions of different sizes can be obtained, which contain valid information such as color histograms and texture information. For example, if there is a person in the image, the image of the person can be superpixel-segmented, and then, through feature extraction on each small region, it can be identified which part of the body (head, shoulder, leg) each region belongs to, so as to establish a joint image of the human body.
  • a plurality of super-pixel regions can be obtained; the obtained super-pixel regions do not overlap, and all of them are sub-regions of the convolution feature map.
  • Step 304 Generate a local feature map according to the super pixel region.
  • each super pixel region corresponds to a local feature map.
  • the formula for generating the local feature map can be written as D_l = H(Φ(S_n)), where:
  • S_n represents a super pixel region;
  • ri represents a receptive field in the image to be segmented.
  • the receptive field is the size of the visual perception area.
  • the receptive field is defined as the size of the area on the original image mapped to by a pixel point on the feature map output by each layer of the convolutional neural network.
  • Φ(S_n) represents the set of receptive field centers falling in the super pixel region S_n;
  • H(·) represents the local structure mapping. It can be seen from the formula that, for the region ri, the generated local feature map contains the features of the intermediate feature map; therefore, the generated local feature map retains the content of the original region ri.
  • the computer device divides the convolution feature map into superpixel regions, which are sub-regions of the convolution feature map, and the computer device generates a local feature map according to the super-pixel region.
  • the computer device first divides the convolutional feature map into super-pixel regions, and then generates a local feature map according to the super-pixel region, which can preserve the content in the original region and make the image segmentation more accurate.
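  • the division into non-overlapping sub-regions can be illustrated with a simple regular-grid partition; a real system would use a proper superpixel algorithm (e.g., SLIC), so the grid below is only a stand-in for the patent's superpixel step.

```python
import numpy as np

def grid_superpixels(h, w, cell=4):
    """Assign each pixel an integer superpixel label on a regular grid.

    The resulting regions are disjoint and together cover the whole map,
    mirroring the non-overlapping sub-region property described above.
    """
    cols = (w + cell - 1) // cell
    labels = np.zeros((h, w), dtype=int)
    for i in range(h):
        for j in range(w):
            labels[i, j] = (i // cell) * cols + (j // cell)
    return labels

labels = grid_superpixels(8, 8, cell=4)
print(np.unique(labels).size)  # 4 non-overlapping regions
```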
  • an image segmentation method provided may further include a process of generating context expression information, and the specific steps include:
  • Step 402 Calculate an average depth value of the super pixel region.
  • the gray value of each pixel in the depth image can be used to represent the distance of a point in the scene from the camera.
  • the depth value is the distance from the camera at a certain point in the scene.
  • Multiple objects can coexist in the super pixel area at the same time.
  • the computer device can calculate the average depth value of the entire super pixel region according to the depth value of each object.
  • Step 404 Generate context expression information corresponding to the super pixel region according to the average depth value.
  • the depth value is important data for generating contextual expression information.
  • the context expression information corresponding to each super pixel region is generated based on the average depth value of each super pixel region.
  • the computer device calculates the average depth value of the super pixel region and generates the context expression information corresponding to the super pixel region according to that average depth value.
  • the computer device generates corresponding context expression information according to the average depth value of the super pixel region, which can make the generated context expression information more accurate, thereby improving the accuracy of image segmentation.
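  • the average depth computation of step 402 can be sketched as follows, assuming a depth map aligned pixel-for-pixel with a superpixel label map (both names here are illustrative).

```python
import numpy as np

def average_depth(depth_map, labels, region):
    """Mean depth over the pixels whose superpixel label equals `region`."""
    return float(depth_map[labels == region].mean())

# Toy 2x2 depth map with two superpixel regions (top row / bottom row).
depth = np.array([[1.0, 2.0], [3.0, 6.0]])
labels = np.array([[0, 0], [1, 1]])
print(average_depth(depth, labels, 0))  # 1.5
print(average_depth(depth, labels, 1))  # 4.5
```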
  • an image segmentation method provided may further include a process of processing a super pixel region, and the specific steps include:
  • Step 502 Compare the average depth value with the conditional depth value.
  • the conditional depth value may be a specific value set in advance.
  • after calculating the average depth value, the computer device can compare it with the conditional depth value.
  • Step 504 Compress the super pixel region when the average depth value is less than the conditional depth value.
  • the computer device needs to use the compression architecture to refine the information in the super-pixel region and reduce excessively diverse information in the super-pixel region.
  • the compression architecture can learn to re-weight the corresponding superpixel region, and the formula for compressing the superpixel region can be written as D̂_l(rj) = C(D_l(rj)) ⊙ D_l(rj), where rj denotes a receptive field in a super-pixel region whose average depth value is less than the conditional depth value, C denotes the compression architecture, D_l denotes the local feature map, and ⊙ denotes element-wise matrix multiplication; D̂_l(rj) represents the superpixel region compressed by the compression architecture.
  • Step 506 Expand the super pixel region when the average depth value is greater than or equal to the conditional depth value.
  • the computer device needs to adopt an extended architecture to enrich the information in the superpixel region.
  • the formula for the extended architecture acting on a superpixel region can be written as D̂_l(rj) = E(D_l(rj)), where E denotes the extended architecture and D̂_l(rj) represents the superpixel region extended by the extended architecture.
  • the computer device compares the average depth value with the conditional depth value; when the average depth value is less than the conditional depth value, the computer device compresses the super pixel region, and when the average depth value is greater than or equal to the conditional depth value, the computer device expands the super pixel region.
  • the computer device selects the compression architecture or the extended architecture to process the super pixel region according to the size of the average depth value, which can improve the accuracy of image segmentation.
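  • the switching rule of steps 502 to 506 can be sketched as below; the conditional depth value 5.0 is an illustrative assumption, not a value from the patent.

```python
CONDITIONAL_DEPTH = 5.0  # assumed preset threshold, not from the patent

def choose_architecture(avg_depth, threshold=CONDITIONAL_DEPTH):
    """Return which branch processes a superpixel region of this average depth.

    Per steps 504/506: compress when the average depth is below the
    conditional depth value, otherwise extend.
    """
    return "compress" if avg_depth < threshold else "extend"

print(choose_architecture(3.6))  # compress
print(choose_architecture(6.8))  # extend
```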
  • the context expression information generated by the context switchable neural network may be expressed by a formula of the form D_(l→(l+1))(ri) = 1[d(S_n) ≤ d(S_m)] · C(D_l(rj)) ⊙ D_l(rj) + 1[d(S_n) > d(S_m)] · E(D_l(rj)), where the superpixel region S_n and the superpixel region S_m are adjacent. Through this formula, the computer device can pass information top-down from the receptive field rj of superpixel region S_m to the receptive field ri; E represents the extended architecture, C represents the compression architecture, and d(S_n) represents the average depth of superpixel region S_n. The indicator function 1[·] is used to switch between the extended architecture and the compression architecture.
  • when d(S_n) ≤ d(S_m), the computer device can switch to the compression architecture to refine the information of the receptive field ri, and when d(S_n) > d(S_m), the computer device can switch to the extended architecture to enrich the information of the receptive field ri.
  • an image segmentation method may further include a process of compressing a super pixel region; specifically, the computer device inputs the local feature map corresponding to the super pixel region to three preset convolutional neural networks for processing, obtaining a compressed superpixel region.
  • the three convolutional neural networks include two with a convolution kernel of 1 and one with a convolution kernel of 3.
  • the computer device takes the local feature map 610 as input to the compression architecture.
  • the compression architecture consists of a first 1*1 convolutional layer 620, a 3*3 convolutional layer 630, and a second 1*1 convolutional layer 640.
  • the first 1*1 convolutional layer 620 and the second 1*1 convolutional layer 640 are convolutional layers with a convolution kernel of 1, and the 3*3 convolutional layer 630 is a convolutional layer with a convolution kernel of 3.
  • after the computer device inputs the local feature map 610 into the compression architecture, it is first processed by the first 1*1 convolutional layer 620.
  • the first 1*1 convolutional layer 620 is used to halve the dimension of the local feature map 610; halving the dimension filters out useless information in the local feature map 610 while preserving the useful information in it.
  • the 3*3 convolutional layer 630 can then restore the original dimension.
  • the computer device then uses the second 1*1 convolutional layer to generate the re-weighting vector C(D_l(rj)), and generates the compressed super pixel region according to the re-weighting vector C(D_l(rj)).
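  • a rough sketch of the compression branch follows; for brevity the 3*3 convolution is replaced by a second 1*1 convolution and all weights are random placeholders, so this only illustrates the halve-restore-reweight pattern, not the patent's exact layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(feat, out_channels):
    """A 1x1 convolution is a per-pixel linear map over the channel axis."""
    w = rng.standard_normal((feat.shape[-1], out_channels)) * 0.1
    return feat @ w

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def compress(local_feat):
    """Re-weight the local feature map, as in C(D_l(rj)) ⊙ D_l(rj)."""
    c = local_feat.shape[-1]
    x = conv1x1(local_feat, c // 2)   # halve dimension, filtering information
    x = conv1x1(x, c)                 # stand-in for the dimension-restoring 3*3 conv
    weights = sigmoid(conv1x1(x, c))  # re-weighting vector in (0, 1)
    return weights * local_feat       # element-wise multiplication

D = rng.standard_normal((4, 4, 8))    # toy local feature map, 8 channels
out = compress(D)
print(out.shape)  # (4, 4, 8)
```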
  • an image segmentation method may further include a process of expanding a super pixel region; specifically, the computer device inputs the local feature map corresponding to the super pixel region to three preset convolutional neural networks for processing, obtaining an extended superpixel region.
  • the three convolutional neural networks include two with a convolution kernel of 7 and one with a convolution kernel of 1.
  • the extended architecture is composed of a first 7*7 convolutional layer 720, a 1*1 convolutional layer 730, and a second 7*7 convolutional layer 740.
  • the first 7*7 convolutional layer 720 uses a larger kernel to expand the receptive field and learn relevant contextual expression information.
  • the 1*1 convolutional layer 730 is used to halve the dimension, removing redundant information introduced by the large kernel of the first 7*7 convolutional layer 720.
  • the second 7*7 convolutional layer 740 is used to restore the dimension; the second 7*7 convolutional layer 740 can also match the dimensions of E(D_l(rj)) and D_l(rj).
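  • a rough sketch of the extension branch follows; a neighbourhood average stands in for the 7*7 convolutions and channel slicing/duplication stands in for the 1*1 halving and the dimension restoration, so this only illustrates the widen-halve-restore pattern.

```python
import numpy as np

def neighbourhood_average(feat, radius=3):
    """Average each pixel over its (2*radius+1)^2 neighbourhood (edge-padded)."""
    h, w, c = feat.shape
    padded = np.pad(feat, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    out = np.zeros_like(feat)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + 2 * radius + 1,
                               j:j + 2 * radius + 1].mean(axis=(0, 1))
    return out

def extend(local_feat):
    """Enrich a local feature map with wider-context information."""
    x = neighbourhood_average(local_feat)  # stand-in for the first 7*7 conv
    x = x[..., : x.shape[-1] // 2]         # stand-in for the 1*1 halving conv
    x = np.concatenate([x, x], axis=-1)    # restore dimension (second 7*7 conv)
    return x

feat = np.random.rand(6, 6, 4)
print(extend(feat).shape)  # (6, 6, 4)
```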
  • the context switchable neural network is trained as follows: the computer device obtains an input layer node sequence according to the convolution feature map and the category of the convolution feature map, projects the input layer node sequence to obtain the hidden layer node sequence corresponding to the first hidden layer, and takes the first hidden layer as the current processing hidden layer.
  • the computer device obtains the hidden layer node sequence of the next hidden layer from the hidden layer node sequence corresponding to the current processing hidden layer and the weights and biases of the neuron nodes corresponding to the current processing hidden layer, using a nonlinear mapping.
  • the computer device then takes the next hidden layer as the current processing hidden layer and repeats the step of obtaining the next hidden layer's node sequence from the current hidden layer's node sequence and the corresponding weights and biases, until the output layer is reached; the computer device obtains the context expression information probability matrix output by the output layer corresponding to the category of the convolutional feature map.
  • the convolutional feature map is input to the context switchable neural network to generate local feature maps of different resolutions, and the generated local feature maps are sent to the pixel-by-pixel classifier for semantic segmentation.
  • the pixel-by-pixel classifier can output a set of category labels for the pixels of the local feature map.
  • the objective function for training the context switchable neural network can be expressed as a sum of softmax loss values over the receptive field regions; in it, the function L(·) is the softmax loss value, and for the receptive field region ri, the predicted category label of the receptive field region ri can be represented by y(ri).
  • the computer device can compare the output of the context switchable neural network with the predicted category tag to implement training of the context switchable neural network.
  • the processing in the context switchable neural network is as follows: the computer device takes the convolution feature map generated by the full convolutional neural network and the category label of the convolution feature map as input, divides the input convolution feature map into super pixel regions, and generates a local feature map according to the super pixel regions. The computer device calculates the average depth value of each super pixel region and processes the super pixel region according to the size of the average depth value.
  • when the average depth value is less than the conditional depth value, the computer device processes the superpixel region using the compression architecture to obtain the context expression information of the compressed superpixel region; when the average depth value is greater than or equal to the conditional depth value, the computer device processes the super pixel region using the extended architecture to obtain the context expression information of the extended super pixel region, and then outputs the obtained context expression information.
  • the weighting parameters in the context switchable neural network are adjusted using a gradient descent method.
  • in the formula for calculating the gradient, S_n and S_m are two adjacent super pixel regions; ri, rj, and rk are receptive fields of the segmented image; and J represents the objective function for training the context switchable neural network model.
  • a term of the gradient formula indicates an update signal.
  • the intermediate feature map can be optimized when the weighting parameters in the context switchable neural network are adjusted using the gradient calculation formula.
  • the receptive field region ri receives an update signal from the receptive field region rk in the same superpixel region S_n; the update signal is used to adjust features located in the same superpixel region so that these regions exhibit object coexistence.
  • the receptive field region ri also receives an update signal from the receptive field region rj of F l+1 in the adjacent superpixel region S m .
  • whether the parameter θ_c or the parameter θ_e is used is determined by the switch based on the average depth values of the superpixel region S_n and the superpixel region S_m; that is, according to the average depth values of S_n and S_m, it can be determined whether the compression parameter θ_c or the extension parameter θ_e is used.
  • the compression architecture C(·) can be optimized by reverse transfer (back-propagation), in which the gradient of the objective function is passed to the compression architecture C(·).
  • the re-weighting vector C(D_l(rk)) also participates in the update signal.
  • the computer device uses the vector C(D l (rk)) to select the information useful for segmentation in the local feature map D l (rk) to construct the intermediate feature map F l+1 (rj).
  • the information useful for segmentation in the receptive field region rj can better guide the receptive field region ri update information.
  • a skip connection between the receptive field region rj and the receptive field region ri can be formed by the factor 1; a skip connection means that the information propagated between the receptive field regions ri and rj does not pass through any neural network structure. When this information is transmitted between different regions, it is weighted by the factor 1, and the signal is not changed.
  • the extended architecture obtains contextual expression information by broadening the receptive field, but its large convolution kernel may disperse the back-propagated signal from the receptive field region rj to the receptive field region ri during training.
  • using a skip connection allows the back-propagated signal to be transmitted directly from the receptive field region rj to the receptive field region ri.
  • the computer device can optimize the weight parameters of the context switchable neural network by using the gradient descent algorithm, which is more advantageous for image segmentation.
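  • plain gradient descent, as applied to the weighting parameters, can be sketched on a toy one-dimensional objective; the quadratic below is only a stand-in for the objective J, and the learning rate is an illustrative choice.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step the weight against the gradient of the objective."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Toy objective J(w) = (w - 3)^2 with gradient 2 * (w - 3); minimum at w = 3.
w = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(round(w, 3))  # 3.0
```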
  • the architecture of the context switchable neural network is as shown in FIG. 8.
  • when the computer device inputs the image to be segmented into the full convolutional neural network, multiple convolution feature maps are output.
  • the fourth convolution feature map 840 is input to the context switchable neural network, which divides the fourth convolution feature map 840 into super-pixel regions and generates a local feature map 844 according to the super-pixel regions.
  • the context switchable neural network calculates an average depth value for the superpixel region and selects a compression or extension architecture based on the average depth value to generate contextual representation information 846.
  • the context switchable neural network generates an intermediate feature map 842 for image segmentation based on local feature map 844 and contextual representation information 846.
  • the local structure corresponding to the average depth value of the super pixel region is as shown in FIG. 9.
  • the context switchable neural network calculates an average depth value for each superpixel region and compares the calculated average depth value with the conditional depth value to determine whether the superpixel region is processed using the compression architecture or the extended architecture.
  • the average depth value of the first superpixel region 910 is 6.8, the average depth value of the second superpixel region 920 is 7.5, the average depth value of the third superpixel region 930 is 7.3, the average depth value of the fourth superpixel region 940 is 3.6, the average depth value of the fifth superpixel region 950 is 4.3, and the average depth value of the sixth superpixel region 960 is 3.1.
  • the first superpixel region 910, the second superpixel region 920, and the third superpixel region 930 should be processed using a compression architecture
  • the fourth superpixel region 940, the fifth super Pixel region 950 and sixth superpixel region 960 should be processed using an extended architecture.
  • an image processing method is provided, exemplified as applied to the computer device shown in FIG. 1.
  • the computer device can acquire an image to be segmented.
  • Image segmentation refers to the division of an image into a number of specific, uniquely characterized regions.
  • the image to be segmented may be acquired in real time or may be pre-stored.
  • for example, the computer device may acquire the image to be segmented in real time through a camera, or it may store the image to be segmented in a database in advance and then retrieve it from the database.
  • the computer device can input the image to be segmented into an input variable of the full convolutional neural network, and output a convolution feature map.
  • the computer device uses the image to be segmented as input data for a pre-trained full convolutional neural network.
  • the computer device inputs the image to be segmented to the convolution layer, where the convolution kernel scans the input image from front to back with a stride matching the kernel size and performs the convolution operation. The convolved output is then processed in the pooling layer, which effectively reduces the dimensionality.
  • the full convolutional neural network can output convolutional feature maps obtained by processing the convolutional layer and the pooled layer.
  • the computer device can then input the convolutional feature map into the input variables of the context switchable neural network to output contextual expression information.
  • the context switchable neural network is pre-trained based on image structure and depth data.
  • the context switchable neural network is a full convolutional neural network, including a convolutional layer and a pooled layer.
  • Contextually expressed information refers to some or all of the information that can affect an object in an image.
  • the computer device can also divide the convolutional feature map into superpixel regions, which are subregions of the convolutional feature map. After superpixel division, many regions of different sizes are obtained, each containing valid information such as a color histogram and texture information. For example, if there is a person in the image, the person's image can be superpixel-segmented, and feature extraction on each small region can identify which part of the body (head, shoulder, leg) each region belongs to, so as to build a joint image of the human body.
  • after the computer device divides the convolutional feature map into superpixel regions, a plurality of non-overlapping superpixel regions is obtained, each of which is a subregion of the convolution feature map.
  • the computer device can also generate a local feature map based on the superpixel region.
  • the computer device can also calculate an average depth value for the superpixel region.
  • the gray value of each pixel in the depth image can be used to represent the distance of a point in the scene from the camera.
  • the depth value is the distance of a point in the scene from the camera. Multiple objects can coexist in the same superpixel region at the same time.
  • the computer device can calculate the average depth value of the entire super pixel region according to the depth value of each object.
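The per-region averaging described above can be sketched as follows. This is a minimal illustration with assumed inputs: a depth map and a superpixel label map of the same shape, neither of which is a data structure specified by the patent.

```python
import numpy as np

# Minimal sketch (assumed shapes): compute the average depth value of each
# superpixel region from a depth map and a superpixel label map.
def average_depths(depth_map, superpixel_labels):
    """Map each superpixel label to the mean depth of its pixels."""
    depth_map = np.asarray(depth_map, dtype=float)
    labels = np.asarray(superpixel_labels)
    return {int(lab): float(depth_map[labels == lab].mean())
            for lab in np.unique(labels)}

depth = [[1.0, 2.0], [3.0, 5.0]]
labels = [[0, 0], [1, 1]]
# average_depths(depth, labels) -> {0: 1.5, 1: 4.0}
```

In practice the depth values of the objects in a region would come from the depth image rather than a hand-written array.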
  • the computer device can also generate contextual expression information corresponding to the superpixel region based on the average depth value.
  • the computer device can also compare the average depth value to the conditional depth value. When the average depth value is less than the conditional depth value, the computer device compresses the superpixel region; when the average depth value is greater than or equal to the conditional depth value, it expands the superpixel region. When compressing a superpixel region, the computer device inputs the local feature map corresponding to the superpixel region into three preset convolutional neural networks for processing and obtains the compressed superpixel region.
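The comparison rule just stated can be sketched as a one-line switch. The conditional depth value of 5.0 and the example depths are illustrative assumptions, not values from the patent:

```python
# Hypothetical sketch of the stated switching rule: a superpixel region whose
# average depth is below the conditional depth value is refined by the
# compression architecture, otherwise enriched by the extension architecture.
CONDITIONAL_DEPTH = 5.0  # illustrative threshold, not specified by the patent

def choose_architecture(avg_depth, conditional_depth=CONDITIONAL_DEPTH):
    """Select the processing architecture for one superpixel region."""
    return "compress" if avg_depth < conditional_depth else "extend"

# e.g. choose_architecture(3.6) -> "compress", choose_architecture(7.5) -> "extend"
```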
  • the three convolutional neural networks include two neural networks with a convolution kernel of 1 and a neural network with a convolution kernel of 3.
  • the computer device inputs the local feature map corresponding to the super pixel region to the preset three convolutional neural networks for processing, and obtains the extended super pixel region.
  • the three convolutional neural networks include two neural networks with a convolution kernel of 7 and a neural network with a convolution kernel of 1.
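The two preset three-layer stacks above can be recorded as kernel-size layouts. The "same" padding values below are an assumption (stride 1, odd kernels) chosen so that each layer preserves the spatial size; the patent does not state padding explicitly.

```python
# Illustrative sketch of the two preset three-layer convolution stacks,
# recorded as kernel sizes, with assumed stride-1 "same" padding.
COMPRESS_STACK = [1, 3, 1]   # two 1x1 layers around one 3x3 layer
EXTEND_STACK = [7, 1, 7]     # two 7x7 layers around one 1x1 layer

def same_padding(kernel_size):
    """Padding that keeps H x W unchanged for an odd kernel at stride 1."""
    return (kernel_size - 1) // 2

compress_pads = [same_padding(k) for k in COMPRESS_STACK]  # [0, 1, 0]
extend_pads = [same_padding(k) for k in EXTEND_STACK]      # [3, 0, 3]
```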
  • the context switchable neural network is trained as follows: the computer device obtains an input layer node sequence from the convolution feature map and the category of the convolution feature map, projects the input layer node sequence to obtain the hidden layer node sequence corresponding to the first hidden layer, and takes the first hidden layer as the currently processed hidden layer.
  • the computer device then obtains the hidden layer node sequence of the next hidden layer by applying a nonlinear mapping to the node sequence of the currently processed hidden layer together with the weights and biases of its neuron nodes, takes that next hidden layer as the currently processed hidden layer, and repeats this step until the output layer is reached.
  • at the output layer, the computer device obtains the probability matrix of context expression information, output by the output layer, corresponding to the category of the convolutional feature map.
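The layer-by-layer propagation described above can be sketched as a plain forward pass. The layer sizes, the tanh nonlinearity, and the softmax output are illustrative assumptions; the patent only specifies weights, biases, a nonlinear mapping, and a probability matrix at the output.

```python
import numpy as np

# Hedged sketch of the forward pass: each hidden layer's node sequence is
# produced from the previous one via weights, a bias, and a nonlinear mapping.
def forward(x, layers):
    """layers: list of (W, b) pairs; returns output-layer class probabilities."""
    h = np.asarray(x, dtype=float)
    for W, b in layers[:-1]:
        h = np.tanh(W @ h + b)          # nonlinear mapping to the next hidden layer
    W_out, b_out = layers[-1]
    z = W_out @ h + b_out
    e = np.exp(z - z.max())
    return e / e.sum()                  # probability matrix over categories

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),   # hidden layer: 3 -> 4 nodes
          (rng.normal(size=(2, 4)), np.zeros(2))]   # output layer: 4 -> 2 categories
probs = forward([0.1, 0.2, 0.3], layers)            # sums to 1 over categories
```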
  • an image processing apparatus including: an image acquisition module 1010, a feature map output module 1020, an information output module 1030, and a feature map generation module 1040, wherein:
  • the image obtaining module 1010 is configured to acquire an image to be segmented.
  • the feature map output module 1020 is configured to input the image to be segmented into an input variable of the full convolutional neural network, and output a convolution feature map.
  • the information output module 1030 is configured to input the convolution feature map into an input variable of the context switchable neural network, and output context expression information.
  • the feature map generating module 1040 is configured to generate an intermediate feature map according to the convolution feature map and the context expression information, where the intermediate feature map is used for image segmentation.
  • the information output module 1030 is further configured to divide the convolution feature map into a super pixel region, the super pixel region is a sub-region of the convolution feature map, and generate a local feature map according to the super pixel region.
  • the information output module 1030 may be further configured to calculate an average depth value of the super pixel region, and generate context expression information corresponding to the super pixel region according to the average depth value.
  • the information output module 1030 includes a comparison module 1032, a compression module 1034, and an expansion module 1036, where:
  • the comparison module 1032 is configured to compare the average depth value with the conditional depth value.
  • the compression module 1034 is configured to compress the super pixel region when the average depth value is less than the condition depth value.
  • the expansion module 1036 is configured to expand the super pixel region when the average depth value is greater than or equal to the condition depth value.
  • the compression module 1034 is further configured to input the local feature map corresponding to the super pixel region to the preset three convolutional neural networks for processing to obtain the compressed super pixel region.
  • the three convolutional neural networks include two neural networks with a convolution kernel of 1 and a neural network with a convolution kernel of 3.
  • the expansion module 1036 is further configured to input the local feature map corresponding to the super pixel region to the preset three convolutional neural networks for processing to obtain the extended super pixel region.
  • the three convolutional neural networks include two neural networks with a convolution kernel of 7 and a neural network with a convolution kernel of 1.
  • in an image segmentation apparatus, the context switchable neural network is obtained through training by the computer device as follows: an input layer node sequence is obtained from the convolution feature map and the category of the convolution feature map, the input layer node sequence is projected to obtain the hidden layer node sequence corresponding to the first hidden layer, and the first hidden layer is taken as the currently processed hidden layer.
  • the computer device then obtains the hidden layer node sequence of the next hidden layer by applying a nonlinear mapping to the node sequence of the currently processed hidden layer together with the weights and biases of its neuron nodes, takes that next hidden layer as the currently processed hidden layer, and repeats this step until the output layer is reached, where the computer device obtains the probability matrix of context expression information, output by the output layer, corresponding to the category of the convolutional feature map.
  • each of the above-described image segmentation devices may be implemented in whole or in part by software, hardware, and combinations thereof.
  • Each of the above modules may be embedded in or independent of the processor in the computer device, or may be stored in a memory in the computer device in a software form, so that the processor invokes the operations corresponding to the above modules.
  • a computer apparatus is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the steps of: acquiring an image to be segmented; inputting the image to be segmented into the input variables of a full convolutional neural network and outputting a convolution feature map; inputting the convolution feature map into the input variables of a context switchable neural network and outputting context expression information; and generating an intermediate feature map from the convolution feature map and the context expression information, the intermediate feature map being used for image segmentation.
  • the processor further implements the steps of dividing the convolved feature map into a superpixel region, the superpixel region being a subregion of the convolved feature map, and generating a local feature map from the superpixel region.
  • the processor further implements the steps of: calculating an average depth value of the super pixel region when the computer program is executed; and generating context expression information corresponding to the super pixel region according to the average depth value.
  • the processor when executing the computer program, further implements the steps of: comparing the average depth value to the conditional depth value; compressing the superpixel region when the average depth value is less than the conditional depth value; and when the average depth value is greater than Or equal to the conditional depth value, the superpixel region is expanded.
  • the processor executes the computer program, the following steps are further performed: the local feature map corresponding to the super pixel region is input to the preset three convolutional neural networks for processing, and the compressed super pixel region is obtained;
  • the three convolutional neural networks include two neural networks with a convolution kernel of one and a neural network with a convolution kernel of three.
  • the processor executes the computer program, the following steps are further performed: the local feature map corresponding to the super pixel region is input to the preset three convolutional neural networks for processing, and the extended super pixel region is obtained;
  • the three convolutional neural networks include two neural networks with a convolution kernel of 7 and a neural network with a convolution kernel of 1.
  • the context switchable neural network is trained by: obtaining an input layer node sequence from the convolution feature map and the category of the convolution feature map; projecting the input layer node sequence to obtain the hidden layer node sequence corresponding to the first hidden layer, with the first hidden layer taken as the currently processed hidden layer; applying a nonlinear mapping to the node sequence of the currently processed hidden layer together with the weights and biases of its neuron nodes to obtain the hidden layer node sequence of the next hidden layer, which is then taken as the currently processed hidden layer; and repeating this step until the output layer, which outputs the probability matrix of context expression information corresponding to the category of the convolutional feature map.
  • a computer readable storage medium is provided, having stored thereon a computer program that, when executed by a processor, implements the steps of: acquiring an image to be segmented; inputting the image to be segmented into the input variables of a full convolutional neural network and outputting a convolution feature map; inputting the convolution feature map into the input variables of a context switchable neural network and outputting context expression information; and generating an intermediate feature map from the convolution feature map and the context expression information, the intermediate feature map being used for image segmentation.
  • the processor further implements the steps of dividing the convolved feature map into a superpixel region, the superpixel region being a subregion of the convolved feature map, and generating a local feature map from the superpixel region.
  • the processor when executing the computer program, further implements the steps of: calculating an average depth value of the superpixel region; and generating contextual expression information corresponding to the superpixel region based on the average depth value.
  • the processor when executing the computer program, further implements the steps of: comparing the average depth value to the conditional depth value; compressing the superpixel region when the average depth value is less than the conditional depth value; and when the average depth value is greater than Or equal to the conditional depth value, the superpixel region is expanded.
  • the processor executes the computer program, the following steps are further performed: the local feature map corresponding to the super pixel region is input to the preset three convolutional neural networks for processing, and the compressed super pixel region is obtained;
  • the three convolutional neural networks include two neural networks with a convolution kernel of one and a neural network with a convolution kernel of three.
  • the processor executes the computer program, the following steps are further performed: the local feature map corresponding to the super pixel region is input to the preset three convolutional neural networks for processing, and the extended super pixel region is obtained;
  • the three convolutional neural networks include two neural networks with a convolution kernel of 7 and a neural network with a convolution kernel of 1.
  • the context switchable neural network is trained by: obtaining an input layer node sequence from the convolution feature map and the category of the convolution feature map; projecting the input layer node sequence to obtain the hidden layer node sequence corresponding to the first hidden layer, with the first hidden layer taken as the currently processed hidden layer; applying a nonlinear mapping to the node sequence of the currently processed hidden layer together with the weights and biases of its neuron nodes to obtain the hidden layer node sequence of the next hidden layer, which is then taken as the currently processed hidden layer; and repeating this step until the output layer, which outputs the probability matrix of context expression information corresponding to the category of the convolutional feature map.
  • the computer device tested the context switchable neural network of the present invention using two common benchmarks for semantic segmentation of RGB-D (Red, Green, Blue, Depth) images: the NYUDv2 data set and the SUN-RGBD data set.
  • the NYUDv2 data set has been widely used to evaluate segmentation performance with 1,449 RGB-D images. In this data set, 795 images were used for training and 654 images were used for testing.
  • the computer device can select a validation set of 414 images from the original training set. The computer device marks image categories with pixel-wise labels, and all pixels are labeled with 40 categories. The computer device evaluates the above method using the NYUDv2 data set and further uses the SUN-RGBD data set for comparison with state-of-the-art methods.
  • the computer device uses a multi-scale test to calculate the segmentation results. That is, the computer device resizes each test image with four ratios (i.e., 0.6, 0.8, 1, 1.1) before providing it to the network; the output segmentation scores of the rescaled images are averaged before post-processing with the conditional random field (CRF) algorithm.
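The multi-scale test above can be sketched as follows. `score_fn` and `resize_fn` stand in for the segmentation network and an image-resizing routine, so their interfaces here are assumptions, not the patent's API:

```python
import numpy as np

# Sketch of the multi-scale test: rescale the image by each ratio, score it,
# resize the score maps back to the original resolution, and average them.
SCALES = (0.6, 0.8, 1.0, 1.1)

def multi_scale_scores(image, score_fn, resize_fn, scales=SCALES):
    """Average per-pixel segmentation scores over rescaled copies of `image`."""
    h, w = image.shape[:2]
    maps = []
    for s in scales:
        scaled = resize_fn(image, (int(h * s), int(w * s)))   # rescale the input
        maps.append(resize_fn(score_fn(scaled), (h, w)))      # score, resize back
    return np.mean(maps, axis=0)                              # averaged score map
```

In the experiments the averaged score map would then be passed to the CRF post-processing step.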
  • when experimenting on the NYUDv2 dataset, the computer device first needs to measure the sensitivity to the number of superpixels.
  • the control of contextual expression information depends in part on the size of the superpixel.
  • the computer device adjusts the size of the superpixels using a tool and empirically selects different scales, namely 500, 1000, 2000, 4000, 8000, and 12000.
  • the computer device can train the context switchable neural network based on the ResNet-101 model.
  • the depth image is used to switch features
  • the RGB image is used to segment images.
  • the segmentation accuracy of the NYUDv2 verification set is shown in Table 1:
  • the segmentation precision corresponding to each scale is expressed as an average cross ratio (%).
  • at the smallest scale, the segmentation accuracy is the lowest. This happens because the superpixels are too small and therefore contain too little context information.
  • as the scale increases, the segmentation performance increases.
  • when the scale is set to 2000, the context switchable neural network has the best segmentation accuracy. Overly large superpixels can degrade performance because they may include extra objects that hinder the preservation of superpixel properties. In subsequent experiments, the computer device can continue to build the context switchable neural network using the 2000 scale.
  • the experiment also requires a strategy for local structural information transfer.
  • local structural information transfer produces features that have a stronger relationship with the region. For the analysis shown in Table 2, local structure information transfer was replaced by other strategies that use structural information.
  • to measure the contribution of local structural information, the first experiment applied the full version of the context switchable neural network, which achieved a segmentation score of 45.6 on the NYUDv2 validation set.
  • the computer device then retrains the context switchable neural network without passing the local structure information of the superpixel.
  • all intermediate features are processed by a global identity map, achieving an accuracy of 40.3.
  • the computer device applies interpolation and deconvolution to generate new features, where each region contains more extensive but regular receptive field information; however, these methods produce structurally insensitive features, with lower scores than the context switchable neural network.
  • the information is calculated by averaging the features of the regions within the same superpixel, which means the local structure mapping is implemented by an identical kernel; this achieves a segmentation score of 43.8. Because the identical kernel contains no learnable parameters, it lacks the flexibility to select useful information. Among different convolution kernels, such as sizes 3x3 and 5x5, a larger kernel produces poorer results than a 1x1 kernel, which captures the finer structure of superpixels.
  • the experiment also requires an evaluation of top-down switchable delivery.
  • the computer device can apply top-down switchable information transfer to generate contextual expressions, while generating contextual expressions is guided by superpixels and depths, as follows:
  • the computer device can measure top-down transfer based on different data without using superpixels and depths, applying only deconvolution and interpolation to construct contextual representations.
  • the computer device obtains a lower segmentation accuracy than the context switchable neural network.
  • the experiment also studied the case where depth is not used in top-down switching information transfer.
  • the computer device separately maps the compressed and extended feature maps as context.
  • independent compression/expansion feature mapping lacks the flexibility to identify appropriate segmentation features. Their performance is lower than the switchable structure expressed by the depth-driven context.
  • the experiment also needs to adjust the compact features of the context information.
  • the top-down switchable information transfer consists of a compression structure and an extension structure, which provide different context information. These architectures use compact features to generate contextual representations.
  • the computer device adjusted the context information through the compression structure and the extension structure, showing that they can achieve effective compact features.
  • the context switchable neural network is compared to a second set of methods that take RGB-D images as input.
  • the computer device encodes each depth image into an HHA image with 3 channels to maintain richer geometric information.
  • the computer device uses HHA images instead of RGB images to train a separate segmentation network.
  • the trained network is tested on the HHA images to obtain a segmentation score map, which is combined with the score map calculated by the network trained on the RGB images.
  • among the compared methods, the best is the cascaded feature network approach, with a result of 47.7.
  • Using RGB and HHA images can improve segmentation accuracy compared to the network.
  • computer devices can also use RGB and HHA images as training and test data.
  • the context switchable neural network reached 48.3 points.
  • the computer device further uses a deeper ResNet-152 architecture to build a context switchable neural network that increases the segmentation score to 49.6. This result is about 2% better than the most advanced method.
  • the computer device compares the context-switchable neural network segmentation processed image with the advanced method segmentation processed image, wherein the image is captured in the NYUDv2 data set.
  • Context-switchable neural networks can improve image segmentation accuracy.
  • the context switchable neural network is also experimenting on the SUN-RGBD data set.
  • the SUN-RGBD data set contains 10,335 images labeled with 37 classes. Compared to the NYUDv2 data set, the SUN-RGBD data set has more complex scene and depth conditions. From this data set, the computer device selects 5,285 images for training, and the rest are used for testing. In this experiment, the computer device again compared the context switchable neural network with methods that use RGB and HHA together as input images. Previously, the best performance on the SUN-RGBD dataset was produced by the cascaded feature network approach, whose model is based on the ResNet-152 architecture. Owing to its reasonable modeling of information transfer, the computer device can achieve better results with the simpler ResNet-101 structure. With the deeper ResNet-152, the computer device achieves a segmentation accuracy of 50.7, which is superior to all comparison methods.
  • the computer device compares the image of the context switchable neural network segmentation with the image of the advanced method segmentation process, wherein the image is collected in the SUN-RGBD data set.
  • the context switchable neural network can improve image segmentation accuracy.
  • Non-volatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

An image segmentation method, a computer device, and a storage medium. The method includes: acquiring an image to be segmented; inputting the image to be segmented into the input variables of a full convolutional neural network and outputting a convolution feature map; inputting the convolution feature map into the input variables of a context switchable neural network and outputting context expression information; and generating an intermediate feature map from the convolution feature map and the context expression information, the intermediate feature map being used for image segmentation.

Description

Image segmentation method, computer device and storage medium — Technical Field
The present application relates to the field of image processing technology, and in particular to an image segmentation method, a computer device, and a storage medium.
Background
With the development of image processing technology, image segmentation has become an important part of the field, and machine learning plays an important role in it. Depth data can provide the geometric information in an image and contains a large amount of information useful for image segmentation. By encoding the depth image into a three-dimensional, three-channel image and then using the color image and the encoded image as input to train a convolutional neural network that computes the segmentation features of the image, image segmentation can be realized.
However, in this current way of implementing image segmentation through convolutional neural networks, the output of the convolutional neural network loses most of the information contained in the depth data that is useful for segmentation, resulting in poor segmentation accuracy.
Summary
According to various embodiments of the present application, an image segmentation method, a computer device, and a storage medium are provided.
An image segmentation method, the method comprising:
acquiring an image to be segmented;
inputting the image to be segmented into the input variables of a full convolutional neural network, and outputting a convolution feature map;
inputting the convolution feature map into the input variables of a context switchable neural network, and outputting context expression information; and
generating an intermediate feature map from the convolution feature map and the context expression information, the intermediate feature map being used for image segmentation.
A computer device, comprising a memory and a processor, the memory storing a computer program, and the processor implementing the following steps when executing the computer program:
acquiring an image to be segmented;
inputting the image to be segmented into the input variables of a full convolutional neural network, and outputting a convolution feature map;
inputting the convolution feature map into the input variables of a context switchable neural network, and outputting context expression information; and
generating an intermediate feature map from the convolution feature map and the context expression information, the intermediate feature map being used for image segmentation.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the following steps:
acquiring an image to be segmented;
inputting the image to be segmented into the input variables of a full convolutional neural network, and outputting a convolution feature map;
inputting the convolution feature map into the input variables of a context switchable neural network, and outputting context expression information; and
generating an intermediate feature map from the convolution feature map and the context expression information, the intermediate feature map being used for image segmentation.
The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the present application will become apparent from the specification, the drawings, and the claims.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram of the internal structure of a computer device in an embodiment;
FIG. 2 is a schematic flowchart of an image segmentation method in an embodiment;
FIG. 3 is a schematic flowchart of a method for generating a local feature map in an embodiment;
FIG. 4 is a schematic flowchart of a method for generating context expression information in an embodiment;
FIG. 5 is a schematic flowchart of a method for processing a superpixel region in an embodiment;
FIG. 6 is a schematic diagram of the compression architecture in an embodiment;
FIG. 7 is a schematic diagram of the extension architecture in an embodiment;
FIG. 8 is an architecture diagram of the context switchable neural network in an embodiment;
FIG. 9 is a local structure diagram corresponding to the average depth values of superpixel regions in an embodiment;
FIG. 10 is a structural block diagram of an image segmentation apparatus in an embodiment;
FIG. 11 is a structural block diagram of an information output module in an embodiment;
FIG. 12 is a schematic comparison of images from the NYUDv2 data set segmented by different methods during the experiments;
FIG. 13 is a schematic comparison of images from the SUN-RGBD data set segmented by different methods during the experiments.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit the application. The technical features of the above embodiments can be combined arbitrarily; for brevity, not all possible combinations of the technical features are described, but as long as such combinations are not contradictory, they should be considered within the scope of this specification.
FIG. 1 is a schematic diagram of the internal structure of a computer device in an embodiment. The computer device may be a terminal or a server, where the terminal may be an electronic device with communication functions such as a smartphone, a tablet, a laptop, a desktop computer, a personal digital assistant, a wearable device, or an in-vehicle device, and the server may be a standalone server or a server cluster. Referring to FIG. 1, the computer device includes a processor, a non-volatile storage medium, an internal memory, and a network interface connected through a system bus. The non-volatile storage medium can store an operating system and a computer program which, when executed, causes the processor to perform an image processing method. The processor provides computing and control capabilities and supports the operation of the entire computer device. The internal memory can store an operating system, a computer program, and a database; when the computer program is executed by the processor, it causes the processor to perform an image processing method. The network interface of the computer device is used for network communication.
Those skilled in the art can understand that the structure shown in FIG. 1 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, as shown in FIG. 2, an image segmentation method is provided. Taking the application of the method to the computer device of FIG. 1 as an example, it includes the following steps:
Step 202: acquire an image to be segmented.
An image is a description or portrait of an objective object and a commonly used information carrier containing information about the described object. For example, an image may be a photograph of different objects captured by a camera, or an information carrier containing information about different objects synthesized by computer software.
Image segmentation refers to dividing an image into multiple specific regions with unique properties. The image to be segmented may be acquired in real time or pre-stored; for example, the computer device may acquire the image in real time through a camera, or store it in a database in advance and then retrieve it from the database.
Step 204: input the image to be segmented into the input variables of the full convolutional neural network, and output a convolution feature map.
A fully convolutional network (FCN) is a pre-trained neural network model that can be used for image segmentation. An FCN for image segmentation can recover the category of each pixel from abstract features, extending classification from the image level to the pixel level. The full convolutional neural network model may include convolution layers and pooling layers. A convolution layer contains convolution kernels, which are weight matrices used to extract features; setting the stride of the convolution operation can reduce the number of weights. The pooling layer, also called the downsampling layer, can reduce the dimensionality of the matrix.
The computer device takes the image to be segmented as the input data of the pre-trained full convolutional neural network. It inputs the image into the convolution layer, where the convolution kernel scans the input image from front to back with a stride matching the kernel size and performs the convolution operation. After convolution, processing continues in the pooling layer, which effectively reduces the dimensionality. The full convolutional neural network outputs the convolution feature map obtained after processing by the convolution and pooling layers.
Step 206: input the convolution feature map into the input variables of the context switchable neural network, and output context expression information.
The context switchable neural network is pre-trained from the image structure and depth data; it is a kind of full convolutional neural network and also includes convolution layers and pooling layers. Context expression information refers to some or all of the information that can affect the objects in an image.
After the computer device inputs the convolution feature map into the input variables of the context switchable neural network, the convolution and pooling layers of the network process the feature map to obtain its context expression information.
Step 208: generate an intermediate feature map from the convolution feature map and the context expression information; the intermediate feature map is used for image segmentation.
The intermediate feature maps may be multiple feature maps of different resolutions, ordered top-down from low to high resolution. An intermediate feature map can be generated from the convolution feature map and the context expression information; the formula can be expressed as: F_{l+1} = M_{l+1} + D_{l→(l+1)}, l = 0, ..., L-1, where L denotes the number of intermediate feature maps, F_{l+1} denotes the generated intermediate feature map, F_1 denotes the intermediate feature map with the lowest resolution, and F_L denotes the one with the highest resolution. M denotes the convolution feature map output by the full convolutional neural network, and D_{l→(l+1)} denotes the context expression information corresponding to the convolution feature map M. The generated intermediate feature maps are used for image segmentation.
The computer device acquires the image to be segmented, inputs it into the input variables of the full convolutional neural network to output a convolution feature map, inputs the convolution feature map into the input variables of the context switchable neural network to output context expression information, and generates from them intermediate feature maps used for image segmentation. By combining the convolutional neural network, which shrinks and stacks images to compute semantic features of the image, with the context expression information generated by the context switchable neural network, the accuracy of image segmentation can be improved.
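The combination step of Step 208 can be sketched directly from the formula F_{l+1} = M_{l+1} + D_{l→(l+1)}: the intermediate feature map is the elementwise sum of the convolution feature map and the context expression information. The array shapes below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of F_{l+1} = M_{l+1} + D_{l->(l+1)}: elementwise sum of the
# convolution feature map and the context expression information.
def intermediate_feature_map(M_next, D_context):
    M_next = np.asarray(M_next, dtype=float)
    D_context = np.asarray(D_context, dtype=float)
    assert M_next.shape == D_context.shape  # both live at the same resolution
    return M_next + D_context

F = intermediate_feature_map([[1.0, 2.0]], [[0.5, -1.0]])
# F == [[1.5, 1.0]]
```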
As shown in FIG. 3, in an embodiment, the image segmentation method may also include a process of generating a local feature map, with the following steps:
Step 302: divide the convolution feature map into superpixel regions, each a subregion of the convolution feature map.
Superpixel division turns an originally pixel-level image into a district-level image; the computer device may use a superpixel algorithm to divide the image into superpixel regions. After superpixel division, many regions of different sizes are obtained, containing valid information such as color histograms and texture information. For example, if there is a person in the image, the person's image can be superpixel-segmented, and feature extraction on each small region can identify which part of the body (head, shoulder, leg) each region belongs to, so as to build a joint image of the human body.
After the computer device divides the convolution feature map into superpixel regions, multiple non-overlapping superpixel regions are obtained, all of which are subregions of the convolution feature map.
Step 304: generate a local feature map from the superpixel regions.
Each superpixel region corresponds to a local feature map, generated by the formula:
(equation image PCTCN2018086832-appb-000001)
where S_n denotes a superpixel region and region ri denotes a receptive field in the image to be segmented. The receptive field is the size of the visual perception region; in a convolutional neural network, the receptive field is defined as the region of the original image mapped to by a pixel of the feature map output by each layer. φ(S_n) denotes the set of receptive field centers within the superpixel regions, and H(:) denotes the local structure map. From the formula, for region ri the generated local feature map contains the features of the intermediate feature map, so the generated local feature map retains the content of the original region ri.
The computer device divides the convolution feature map into superpixel regions, which are subregions of the convolution feature map, and generates local feature maps from them. Dividing the convolution feature map into superpixel regions first and then generating the local feature maps preserves the content of the original regions, making image segmentation more precise.
In an embodiment, as shown in FIG. 4, the image segmentation method may also include a process of generating context expression information, with the following steps:
Step 402: calculate the average depth value of a superpixel region.
The gray value of each pixel in a depth image can represent the distance of a point in the scene from the camera; the depth value is that distance. Multiple objects can coexist in a superpixel region at the same time. By obtaining the depth value of each object in the superpixel region, the computer device can calculate the average depth value of the entire region from the depth values of the objects.
Step 404: generate context expression information corresponding to the superpixel region from the average depth value.
The depth value is important data for generating context expression information. The context expression information corresponding to each superpixel region is generated from that region's average depth value.
The computer device calculates the average depth value of the superpixel region and generates the corresponding context expression information from it. Generating the context expression information from the average depth value of the superpixel region makes the generated information more accurate and thereby improves the accuracy of image segmentation.
如图5所示,在一个实施例中,提供的一种图像分割方法还可以包括对超像素区域进行处理的过程,具体步骤包括:
步骤502,将平均深度值与条件深度值进行比较。
条件深度值可以是预先设置好的一个具体的数值。计算机设备在计算出平均深度值后可以与条件深度值进行大小比较。
步骤504,当平均深度值小于条件深度值时,对超像素区域进行压缩。
当平均深度值小于条件深度值时,表示超像素区域中的信息量较大,计算机设备需要采用压缩体系结构对超像素区域中的信息进行细化,减少超像素区域内过度多样化的信息。压缩体系结构可以学习重新对相应的超像素区域进行加权,对超像素区域进行压缩的公式为:
D̃_l(rj)=c(D_l(rj))⊙D_l(rj)
其中,rj表示平均深度值小于条件深度值的超像素区域中的感受野,c表示压缩体系结构,D_l表示结构特征图,而⊙表示矩阵对应元素相乘。D̃_l(rj)表示经过压缩体系结构压缩后的超像素区域。
步骤506,当平均深度值大于或者等于条件深度值时,对超像素区域进行扩展。
当平均深度值大于或者等于条件深度值时,表示超像素区域中的信息量较少,计算机设备需要采用扩展体系结构来丰富超像素区域中的信息。扩展体系结构对超像素区域进行扩展的公式为:
D̃_l(rj)=ε(D_l(rj))+D_l(rj)
其中,ε表示扩展体系结构,D̃_l(rj)表示经过扩展体系结构扩展后的超像素区域。
计算机设备通过将平均深度值与条件深度值进行比较,当平均深度值小于条件深度值时,计算机设备对超像素区域进行压缩,当平均深度值大于或者等于条件深度值时,计算机设备对超像素区域进行扩展。计算机设备根据平均深度值的大小选择压缩体系结构或者扩展体系结构对超像素区域进行处理,可以提高图像分割的精确度。
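"平均深度值小于条件深度值则压缩、否则扩展"的切换控制逻辑可以草拟如下,其中compress与expand为说明用途的占位函数,并非本申请的实际体系结构:

```python
def switch_context(region, mean_depth, cond_depth, compress, expand):
    # 平均深度值小于条件深度值时压缩,否则扩展
    if mean_depth < cond_depth:
        return compress(region)
    return expand(region)

# 占位体系结构:仅用于演示切换逻辑
def compress(region):
    return ("compressed", region)

def expand(region):
    return ("expanded", region)

out1 = switch_context("S1", mean_depth=3.6, cond_depth=5.0,
                      compress=compress, expand=expand)
out2 = switch_context("S2", mean_depth=6.8, cond_depth=5.0,
                      compress=compress, expand=expand)
```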
在一个实施例中,经过上下文可切换神经网络生成的上下文表达信息可以通过公式表示,具体公式为:
D_{l→(l+1)}(ri)=1[d(S_n)<d(S_m)]·(c(D_l(rj))⊙D_l(rj))+1[d(S_n)≥d(S_m)]·(ε(D_l(rj))+D_l(rj))
其中,超像素区域S_n与超像素区域S_m相邻,计算机设备通过这个公式可以将超像素区域S_m中的感受野区域rj自顶向下的信息传递到感受野区域ri。ε(:)表示扩展体系结构,c(:)表示压缩体系结构,d(S_n)表示超像素区域S_n的平均深度,指示函数1[:]用于切换扩展体系结构和压缩体系结构。当d(S_n)<d(S_m)时,计算机设备可以切换到压缩体系结构来细化感受野区域ri的信息;当d(S_n)≥d(S_m)时,计算机设备可以切换到扩展体系结构来丰富感受野区域ri的信息。
在一个实施例中,如图6所示,提供的一种图像分割方法还可以包括对超像素区域进行压缩的过程,具体包括:计算机设备将超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到压缩后的超像素区域。其中,三个卷积神经网络包括两个卷积核为1的神经网络和一个卷积核为3的神经网络。
如图6所示,计算机设备将局部特征图610作为输入,输出到压缩体系结构中。压缩体系结构由第一1*1卷积层620、3*3卷积层630以及第二1*1卷积层640组成。其中,第一1*1卷积层620和第二1*1卷积层640为卷积核为1的卷积层,3*3卷积层630为卷积核为3的卷积层。
计算机设备将局部特征图610输入压缩体系结构后,先由第一1*1卷积层620进行处理。第一1*1卷积层620用于将局部特征图610的维度减半,维度减半的过程可以过滤局部特征图610中的无用信息,同时保留局部特征图610中的有用信息。降维后,3*3卷积层630在紧凑特征上引入更大范围的上下文信息,计算机设备再使用第二1*1卷积层640恢复维度并产生重新加权向量c(D_l(rj)),并根据加权向量c(D_l(rj))生成压缩后的超像素区域。
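压缩体系结构(1*1降维、3*3紧凑特征、1*1恢复维度并产生加权向量,再与输入逐元素相乘)的前向过程可以用numpy草拟如下。卷积权重为随机初始化,sigmoid仅是将加权向量限制在(0,1)的一种假设选择,并非本申请的实际实现:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1x1(x, w):
    # x: (C_in,H,W), w: (C_out,C_in) -> (C_out,H,W)
    return np.tensordot(w, x, axes=([1], [0]))

def conv3x3(x, w):
    # x: (C_in,H,W), w: (C_out,C_in,3,3),填充1保持空间尺寸
    C_in, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], H, W))
    for i in range(H):
        for j in range(W):
            out[:, i, j] = np.tensordot(
                w, xp[:, i:i + 3, j:j + 3], axes=([1, 2, 3], [0, 1, 2]))
    return out

def compress(D):
    # D: (C,H,W) 局部特征图;输出 c(D) ⊙ D
    C = D.shape[0]
    w1 = rng.standard_normal((C // 2, C)) * 0.1             # 第一 1*1:降维一半
    w2 = rng.standard_normal((C // 2, C // 2, 3, 3)) * 0.1  # 3*3:紧凑特征
    w3 = rng.standard_normal((C, C // 2)) * 0.1             # 第二 1*1:恢复维度
    c = sigmoid(conv1x1(conv3x3(conv1x1(D, w1), w2), w3))
    return c * D                                            # 逐元素相乘

D = rng.standard_normal((8, 5, 5))
out = compress(D)
```

由于加权向量的每个分量都在(0,1)之间,压缩输出的幅值逐元素不超过输入,起到细化、抑制过度多样化信息的作用。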
如图7所示,在一个实施例中,提供的一种图像分割方法还可以包括对超像素区域进行扩展的过程,具体包括:计算机设备将超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到扩展后的超像素区域。其中,三个卷积神经网络包括两个卷积核为7的神经网络和一个卷积核为1的神经网络。
如图7所示,扩展体系结构由第一7*7卷积层720、1*1卷积层730以及第二7*7卷积层740组成。计算机设备将局部特征图710输入扩展体系结构后,第一7*7卷积层720使用较大的内核来扩大感受野,并学习相关的上下文表达信息。1*1卷积层730用于将维度减半,去除第一7*7卷积层720的大内核包含的冗余信息。第二7*7卷积层740用于恢复维度,还可以使ε(D_l(rj))与D_l(rj)的维度匹配。
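扩展体系结构(7*7扩大感受野、1*1降维、7*7恢复维度,输出与输入维度匹配以便跳跃连接)的前向过程可以草拟如下。权重为随机初始化,仅示意结构,并非本申请的实际实现:

```python
import numpy as np

rng = np.random.default_rng(1)

def conv(x, w):
    # x: (C_in,H,W), w: (C_out,C_in,k,k),填充 k//2 保持空间尺寸
    k = w.shape[-1]
    p = k // 2
    C_in, H, W = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((w.shape[0], H, W))
    for i in range(H):
        for j in range(W):
            out[:, i, j] = np.tensordot(
                w, xp[:, i:i + k, j:j + k], axes=([1, 2, 3], [0, 1, 2]))
    return out

def expand(D):
    # D: (C,H,W);7*7 扩大感受野 -> 1*1 降维 -> 7*7 恢复维度,
    # 输出维度与 D 匹配,便于形成 ε(D)+D 的跳跃连接
    C = D.shape[0]
    w1 = rng.standard_normal((C, C, 7, 7)) * 0.01       # 第一 7*7
    w2 = rng.standard_normal((C // 2, C, 1, 1)) * 0.1   # 1*1 降维
    w3 = rng.standard_normal((C, C // 2, 7, 7)) * 0.01  # 第二 7*7 恢复维度
    eps = conv(conv(conv(D, w1), w2), w3)
    return eps + D                                      # 跳跃连接

D = rng.standard_normal((4, 9, 9))
out = expand(D)
```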
在一个实施例中,上下文可切换神经网络通过如下方式训练得到:计算机设备根据卷积特征图以及卷积特征图的类别得到输入层节点序列,将输入层节点序列进行投影得到第一隐层对应的隐层节点序列,将第一隐层作为当前处理隐层。
计算机设备根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列,计算机设备将下一层隐层作为当前处理隐层,重复进入根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点对应的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列的步骤,直到输出层,计算机设备获取输出层输出的与卷积特征图的类别对应的上下文表达信息概率矩阵。
卷积特征图输入到上下文可切换神经网络后会产生不同分辨率的局部特征图,而生成的局部特征图被发送到用于语义分割的逐像素分类器中。逐像素分类器可以为局部特征图的像素输出一组类别标签,类别标签的序列可以表示为:y=f(F_l),其中,函数f(:)是一个产生逐像素类别的柔性最大值(softmax)回归量。y=f(F_l)可以用来预测逐像素的类别标签。训练上下文可切换神经网络的目标函数可以用公式表示为:
Figure PCTCN2018086832-appb-000010
其中,函数L(:)是柔性最大值(softmax)损失,对于感受野区域ri,可以用y(ri)表示感受野区域ri的预测种类标签。计算机设备可以将通过上下文可切换神经网络输出的结果与预测种类标签进行对比,从而实现上下文可切换神经网络的训练。
上下文可切换神经网络中的处理过程为:计算机设备将通过全卷积神经网络生成的卷积特征图以及卷积特征图的类别标签作为输入,计算机设备将输入的卷积特征图划分为超像素区域,根据超像素区域生成局部特征图。计算机设备对超像素区域的平均深度值进行计算,根据平均深度值的大小对超像素区域进行处理。当平均深度值小于条件深度值时,计算机设备采用压缩体系结构对超像素区域进行处理,得到压缩后的超像素区域的上下文表达信息;当平均深度值大于或者等于条件深度值时,计算机设备采用扩展体系结构对超像素区域进行处理,并得到扩展后的超像素区域的上下文表达信息,再将得到的上下文表达信息输出。
在一个实施例中,采用梯度下降法对上下文可切换神经网络中的权重参数进行调整。
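梯度下降法对权重参数的基本更新步骤可以草拟如下,学习率为假设值:

```python
def sgd_step(params, grads, lr=0.1):
    # 对每个权重执行 w <- w - lr * dJ/dw
    return [p - lr * g for p, g in zip(params, grads)]

params = [1.0, -2.0]
grads = [0.5, -1.0]
new_params = sgd_step(params, grads, lr=0.1)
```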
梯度的计算公式为:
Figure PCTCN2018086832-appb-000011
其中,S_n和S_m是两个相邻的超像素区域,ri、rj、rk分别表示待分割图像中的感受野,J表示训练上下文可切换神经网络模型的目标函数。梯度的计算公式中,
Figure PCTCN2018086832-appb-000012
表示更新信号。使用梯度的计算公式对上下文可切换神经网络中的权重参数进行调整时,可以对中间特征图进行优化。感受野区域ri从超像素区域S_n的感受野区域rk接收更新信号
Figure PCTCN2018086832-appb-000013
该更新信号
Figure PCTCN2018086832-appb-000014
用于调整位于同一超像素区域中的特征,以使这些区域表现出物体共存性。感受野区域ri还从相邻超像素区域S_m中F_{l+1}的感受野区域rj中接收更新信号。
梯度的计算公式中,
Figure PCTCN2018086832-appb-000015
表示来自感受野区域rj的更新信号。当更新信号从感受野区域rj传递到感受野区域ri时,更新信号
Figure PCTCN2018086832-appb-000016
根据信号
Figure PCTCN2018086832-appb-000017
进行加权,使其扩展感受野区域rj。同时,参数λ_c和参数λ_e是由超像素区域S_n和超像素区域S_m的平均深度确定的开关,即,根据超像素区域S_n和超像素区域S_m的平均深度可以决定是采用参数λ_c进行压缩,还是采用参数λ_e进行扩展。例如,信号
Figure PCTCN2018086832-appb-000018
可以被扩展为:
Figure PCTCN2018086832-appb-000019
同时,信号
Figure PCTCN2018086832-appb-000020
可以被扩展为:
Figure PCTCN2018086832-appb-000021
当d(S_n)<d(S_m),即超像素区域S_n的平均深度值小于超像素区域S_m的平均深度值时,信号
Figure PCTCN2018086832-appb-000022
对从感受野区域rj传递的信号
Figure PCTCN2018086832-appb-000023
反向传递的梯度进行加权。压缩体系结构C(:)可以通过反向传递进行优化,反向传递是指目标函数
Figure PCTCN2018086832-appb-000024
传递到压缩体系结构C(:)。其中,重新加权向量C(D_l(rk))也参与更新信号
Figure PCTCN2018086832-appb-000025
在训练上下文可切换神经网络时,计算机设备使用向量C(D_l(rk))来选择局部特征图D_l(rk)中对于分割有用的信息来构造中间特征图F_{l+1}(rj)。同重新加权向量C(D_l(rk))一起,感受野区域rj中对于分割有用的信息可以更好地指导感受野区域ri更新信息。
当d(S_n)≥d(S_m),即超像素区域S_n的平均深度值大于或者等于超像素区域S_m的平均深度值时,信号
Figure PCTCN2018086832-appb-000026
会对信号
Figure PCTCN2018086832-appb-000027
产生影响。通过因子1可以形成感受野区域rj和感受野区域ri之间的跳跃连接,跳跃连接是指感受野区域ri以及rj之间信息传播没有经过任何的神经网络结构。这种信息在不同区域之间传播的时候,是被因子1加权的,信号不经过任何改变。扩展体系结构是通过拓宽感受野来获得上下文表达信息,但是扩展体系结构的大的卷积核可能在训练期间分散从感受野区域rj到感受野区域ri的反向传递信号,使用跳跃连接就可以允许反向传递的信号直接从感受野区域rj传递到感受野区域ri。
计算机设备通过使用梯度下降算法可以对上下文可切换神经网络的权重参数进行优化,更有利于图像分割。
在一个实施例中,上下文可切换神经网络的体系结构如图8所示。计算机设备将待分割图像输入全卷积神经网络后,会输出多张卷积特征图,如第一卷积特征图810、第二卷积特征图820、第三卷积特征图830、第四卷积特征图840等等。以第四卷积特征图840为例,将第四卷积特征图840输入至上下文可切换神经网络,上下文可切换神经网络对第四卷积特征图840划分超像素区域,并根据超像素区域生成局部特征图844。上下文可切换神经网络计算出超像素区域的平均深度值,并根据平均深度值选择压缩或者扩展体系结构,生成上下文表达信息846。上下文可切换神经网络根据局部特征图844以及上下文表达信息846生成中间特征图842,中间特征图842用于图像分割。
在一个实施例中,超像素区域的平均深度值对应的局部结构如图9所示。上下文可切换神经网络会对每一个超像素区域计算平均深度值,并将计算出的平均深度值与条件深度值进行比较,从而决定该超像素区域是使用压缩体系结构处理,还是使用扩展体系结构处理。例如,第一超像素区域910的平均深度值为6.8,第二超像素区域920的平均深度值为7.5,第三超像素区域930的平均深度值为7.3,第四超像素区域940的平均深度值为3.6,第五超像素区域950的平均深度值为4.3,第六超像素区域960的平均深度值为3.1。当预设的条件深度值为5.0时,按照平均深度值小于条件深度值采用压缩、大于或者等于条件深度值采用扩展的规则,第四超像素区域940、第五超像素区域950以及第六超像素区域960应该使用压缩体系结构进行处理,而第一超像素区域910、第二超像素区域920以及第三超像素区域930应该使用扩展体系结构进行处理。
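图9示例中的切换判断可以用如下代码验证,按本申请"平均深度值小于条件深度值采用压缩、否则采用扩展"的规则编写:

```python
# 图9示例:各超像素区域的平均深度值
depths = {910: 6.8, 920: 7.5, 930: 7.3, 940: 3.6, 950: 4.3, 960: 3.1}
cond_depth = 5.0

# 小于条件深度值 -> 压缩;大于或者等于 -> 扩展
choice = {k: ("压缩" if d < cond_depth else "扩展")
          for k, d in depths.items()}
```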
在一个实施例中,提供了一种图像处理方法,该方法以应用于如图1所示的计算机设备中进行举例说明。
首先,计算机设备可以获取待分割图像。图像分割是指将图像分成多个特定的、具有独特性质的区域。待分割图像可以是实时获取的,也可以是预先存储的,例如,计算机设备可以通过摄像头实时获取用户采集的待分割图像,计算机设备也可以是预先将待分割图像存储到数据库中,然后从数据库中获取待分割图像。
接着,计算机设备可以将待分割图像输入到全卷积神经网络的输入变量中,输出卷积特征图。计算机设备将待分割图像作为预先训练好的全卷积神经网络的输入项数据。计算机设备将待分割图像输入至卷积层,卷积核按照其大小从前向后对输入的待分割图像进行扫描,并执行卷积操作。经过卷积处理后接着在池化层进行处理,池化层能够有效降低维度。全卷积神经网络可以将通过卷积层和池化层处理后得到的卷积特征图输出。
接着,计算机设备可以将卷积特征图输入到上下文可切换神经网络的输入变量中,输出上下文表达信息。上下文可切换神经网络是根据图像结构和深度数据预先训练好的,上下文可切换神经网络是一种全卷积神经网络,也包括卷积层和池化层。上下文表达信息是指能够影响图像中的对象的一些信息或者全部信息。计算机设备将卷积特征图输入到上下文可切换神经网络的输入变量后,上下文可切换神经网络中的卷积层和池化层对卷积特征图进行处理,可以得到卷积特征图的上下文表达信息。
计算机设备还可以将卷积特征图划分为超像素区域,超像素区域为卷积特征图的子区域。计算机设备对图像进行超像素划分之后,可以得到许多大小不一的区域,这些区域中包含有有效的信息,比如颜色直方图、纹理信息。例如,图像中有一个人,我们可以对这个人的图像进行超像素分割,进而通过对每个小区域的特征提取,辨识出这些区域是处于人体的哪个部分(头部、肩部、腿部),进而建立人体的关节图像。计算机设备对卷积特征图进行超像素区域划分后,可以得到多个超像素区域,得到的多个超像素区域均不是重叠的区域,这些超像素区域都是卷积特征图的子区域。计算机设备还可以根据超像素区域生成局部特征图。
计算机设备还可以计算超像素区域的平均深度值。深度图像中的每个像素点的灰度值可用于表征场景中某一点距离摄像机的远近,深度值就是场景中某一点距离摄像机的距离值。超像素区域中可以同时有多个物体共存。计算机设备通过获取超像素区域中每个物体的深度值,可以根据每个物体的深度值计算出整个超像素区域的平均深度值。计算机设备还可以根据平均深度值生成与超像素区域对应的上下文表达信息。
计算机设备还可以将平均深度值与条件深度值进行比较。当平均深度值小于条件深度值时,计算机设备对超像素区域进行压缩。当平均深度值大于或者等于条件深度值时,计算机设备对超像素区域进行扩展。其中,对超像素区域进行压缩时,将超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,计算机设备得到压缩后的超像素区域。其中,三个卷积神经网络包括两个卷积核为1的神经网络和一个卷积核为3的神经网络。其中,对超像素区域进行扩展时,计算机设备将超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到扩展后的超像素区域。其中,三个卷积神经网络包括两个卷积核为7的神经网络和一个卷积核为1的神经网络。
接着,上下文可切换神经网络通过如下方式训练得到:计算机设备根据卷积特征图以及卷积特征图的类别得到输入层节点序列,将输入层节点序列进行投影得到第一隐层对应的隐层节点序列,将第一隐层作为当前处理隐层。计算机设备根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列,将下一层隐层作为当前处理隐层,重复进入根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点对应的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列的步骤,直到输出层,计算机设备获取输出层输出的与卷积特征图的类别对应的上下文表达信息概率矩阵。
应该理解的是,虽然上述流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本申请中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,上述流程图中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
在一个实施例中,如图10所示,提供了一种图像处理装置,包括:图像获取模块1010、特征图输出模块1020、信息输出模块1030以及特征图生成模块1040,其中:
图像获取模块1010,用于获取待分割图像。
特征图输出模块1020,用于将待分割图像输入到全卷积神经网络的输入变量中,输出卷积特征图。
信息输出模块1030,用于将卷积特征图输入到上下文可切换神经网络的输入变量中,输出上下文表达信息。
特征图生成模块1040,用于根据卷积特征图与上下文表达信息生成中间特征图,中间特征图用于进行图像分割。
在一个实施例中,信息输出模块1030还可以用于将卷积特征图划分为超像素区域,超像素区域为卷积特征图的子区域,根据超像素区域生成局部特征图。
在一个实施例中,信息输出模块1030还可以用于计算超像素区域的平均深度值,根据平均深度值生成与超像素区域对应的上下文表达信息。
在一个实施例中,如图11所示,信息输出模块1030包括比较模块1032、压缩模块1034以及扩展模块1036,其中:
比较模块1032,用于将平均深度值与条件深度值进行比较。
压缩模块1034,用于当平均深度值小于条件深度值时,对超像素区域进行压缩。
扩展模块1036,用于当平均深度值大于或者等于条件深度值时,对超像素区域进行扩展。
在一个实施例中,压缩模块1034还可以用于将超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到压缩后的超像素区域。其中,三个卷积神经网络包括两个卷积核为1的神经网络和一个卷积核为3的神经网络。
在一个实施例中,扩展模块1036还可以用于将超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到扩展后的超像素区域。其中,三个卷积神经网络包括两个卷积核为7的神经网络和一个卷积核为1的神经网络。
在一个实施例中,提供的一种图像分割装置中上下文可切换神经网络通过如下方式训练得到:计算机设备根据卷积特征图以及卷积特征图的类别得到输入层节点序列,将输入层节点序列进行投影得到第一隐层对应的隐层节点序列,将第一隐层作为当前处理隐层。计算机设备根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列,将下一层隐层作为当前处理隐层,重复进入根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点对应的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列的步骤,直到输出层,计算机设备获取输出层输出的与卷积特征图的类别对应的上下文表达信息概率矩阵。
关于图像分割装置的具体限定可以参见上文中对于图像分割方法的限定,在此不再赘述。上述图像分割装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在一个实施例中,提供了一种计算机设备,包括存储器和处理器,存储器中存储有计算机程序,该处理器执行计算机程序时实现以下步骤:
获取待分割图像;将待分割图像输入到全卷积神经网络的输入变量中,输出卷积特征图;将卷积特征图输入到上下文可切换神经网络的输入变量中,输出上下文表达信息;根据卷积特征图与上下文表达信息生成中间特征图,中间特征图用于进行图像分割。
在一个实施例中,处理器执行计算机程序时还实现以下步骤:将卷积特征图划分为超像素区域,超像素区域为卷积特征图的子区域;根据超像素区域生成局部特征图。
在一个实施例中,处理器执行计算机程序时还实现以下步骤:计算超像素区域的平均深度值;根据平均深度值生成与超像素区域对应的上下文表达信息。
在一个实施例中,处理器执行计算机程序时还实现以下步骤:将平均深度值与条件深度值进行比较;当平均深度值小于条件深度值时,对超像素区域进行压缩;当平均深度值大于或者等于条件深度值时,对超像素区域进行扩展。
在一个实施例中,处理器执行计算机程序时还实现以下步骤:将超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到压缩后的超像素区域;其中,三个卷积神经网络包括两个卷积核为1的神经网络和一个卷积核为3的神经网络。
在一个实施例中,处理器执行计算机程序时还实现以下步骤:将超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到扩展后的超像素区域;其中,三个卷积神经网络包括两个卷积核为7的神经网络和一个卷积核为1的神经网络。
在一个实施例中,上下文可切换神经网络通过如下方式训练得到:根据卷积特征图以及卷积特征图的类别得到输入层节点序列,将输入层节点序列进行投影得到第一隐层对应的隐层节点序列,将第一隐层作为当前处理隐层;根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列,将下一层隐层作为当前处理隐层,重复进入根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点对应的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列的步骤,直到输出层,获取输出层输出的与卷积特征图的类别对应的上下文表达信息概率矩阵。
在一个实施例中,提供了一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时实现以下步骤:
获取待分割图像;将待分割图像输入到全卷积神经网络的输入变量中,输出卷积特征图;将卷积特征图输入到上下文可切换神经网络的输入变量中,输出上下文表达信息;根据卷积特征图与上下文表达信息生成中间特征图,中间特征图用于进行图像分割。
在一个实施例中,处理器执行计算机程序时还实现以下步骤:将卷积特征图划分为超像素区域,超像素区域为卷积特征图的子区域;根据超像素区域生成局部特征图。
在一个实施例中,处理器执行计算机程序时还实现以下步骤:计算超像素区域的平均深度值;根据平均深度值生成与超像素区域对应的上下文表达信息。
在一个实施例中,处理器执行计算机程序时还实现以下步骤:将平均深度值与条件深度值进行比较;当平均深度值小于条件深度值时,对超像素区域进行压缩;当平均深度值大于或者等于条件深度值时,对超像素区域进行扩展。
在一个实施例中,处理器执行计算机程序时还实现以下步骤:将超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到压缩后的超像素区域;其中,三个卷积神经网络包括两个卷积核为1的神经网络和一个卷积核为3的神经网络。
在一个实施例中,处理器执行计算机程序时还实现以下步骤:将超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到扩展后的超像素区域;其中,三个卷积神经网络包括两个卷积核为7的神经网络和一个卷积核为1的神经网络。
在一个实施例中,上下文可切换神经网络通过如下方式训练得到:根据卷积特征图以及卷积特征图的类别得到输入层节点序列,将输入层节点序列进行投影得到第一隐层对应的隐层节点序列,将第一隐层作为当前处理隐层;根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列,将下一层隐层作为当前处理隐层,重复进入根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点对应的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列的步骤,直到输出层,获取输出层输出的与卷积特征图的类别对应的上下文表达信息概率矩阵。
本申请的技术方案是经过实验证明可行的,具体的实验过程如下所述:
实验中,计算机设备使用NYUDv2数据集和SUN-RGBD数据集这两个公共基准,测试本发明中的上下文可切换神经网络用于深度图像RGB-D(Red、Green、Blue、Depth Map)语义分割的性能。NYUDv2数据集已广泛用于评估分割性能,有1449个RGB-D图像。在这个数据集中,795个图像用于训练,654个图像用于测试。首先,计算机设备可以从原始训练集中选择一个包含414幅图像的验证集。计算机设备使用逐像素标注对图像的类别进行标记,所有像素都标记了40个类别。计算机设备使用NYUDv2数据集对上述方法进行评估,再进一步使用SUN-RGBD数据集来与先进方法进行比较。
接着,计算机设备使用多尺度测试来计算分割结果。也就是说,在将测试图像提供给网络之前,计算机设备使用四个比例(即0.6、0.8、1、1.1)重新设置测试图像的大小。在条件随机场算法CRF(conditional random field algorithm)后处理之前,计算机设备对重新缩放的图像输出的分割分数进行平均。
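多尺度测试(按0.6、0.8、1、1.1四个比例缩放图像,分别送入网络,再把输出缩放回原尺寸并平均)可以草拟如下。其中model为占位的假设网络,resize为简化的最近邻缩放,均非本申请的实际实现:

```python
import numpy as np

def resize(x, size):
    # 最近邻缩放(仅作示意)
    H2, W2 = size
    H, W = x.shape[0], x.shape[1]
    rows = np.arange(H2) * H // H2
    cols = np.arange(W2) * W // W2
    return x[rows][:, cols]

def multiscale_scores(image, model, scales=(0.6, 0.8, 1.0, 1.1)):
    # 对每个比例:缩放图像 -> 网络前向 -> 缩放回原尺寸,最后取平均
    H, W = image.shape[0], image.shape[1]
    outs = []
    for s in scales:
        resized = resize(image, (max(1, int(H * s)), max(1, int(W * s))))
        outs.append(resize(model(resized), (H, W)))
    return np.mean(outs, axis=0)

def model(img):
    # 占位的"分割网络":仅把输入乘以 2,用于演示平均过程
    return img * 2.0

image = np.ones((10, 10))
avg = multiscale_scores(image, model)
```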
在NYUDv2数据集上进行实验时,计算机设备首先需要评估分割结果对超像素数量的敏感性。在上下文可切换神经网络中,上下文表达信息的控制部分取决于超像素的大小。计算机设备通过使用工具调整超像素的大小,并且凭经验选择不同的尺度,分别是500、1000、2000、4000、8000和12000。对于每个尺度,计算机设备可以基于ResNet-101模型来训练上下文可切换神经网络。上下文可切换神经网络的输入为RGB图像和深度图像,其中深度图像用于切换特征,RGB图像用于分割图像。NYUDv2验证集的分割精度如表1所示:
超像素尺度 500 1000 2000 4000 8000 12000
分割精度 42.7 43.5 45.6 43.6 44.2 42.9
表1
如表1所示,每个尺度对应的分割精度以平均交并比(%)表示。当尺度设置为500时,分割准确性最低。发生这种情况是因为超像素太小,因此包含的上下文信息太少。随着尺度的增加,分割性能在提高。当尺度设置为2000时,上下文可切换神经网络分割精度最好。超像素太大会降低性能,是因为太大的超像素可能包括额外的对象,这种对象限制了超像素性质的阶段性保存。在随后的实验中,计算机设备可以继续使用2000的尺度来构建上下文可切换神经网络。
接着,实验还评估了局部结构信息传递的策略。局部结构信息传递产生与区域有更强关系的特征。分析如表2所示,局部结构信息传递被其他使用结构信息的策略代替。第一个实验测量不使用局部结构信息的方法的性能:应用完整版的上下文可切换神经网络,在NYUDv2验证集上实现了45.6的分割评分;然后计算机设备重新训练上下文可切换神经网络而不传递超像素的局部结构信息,即所有的中间特征都由全局恒等映射来处理,达到了40.3的准确度。另外,计算机设备应用插值和反卷积来产生新的特征,其中每个区域包含更广泛但规则的感受野的信息,但是,这些方法产生结构不敏感的特征,得分比上下文可切换神经网络低。
表2
如表2所示,有几种方法可以传递超像素的局部结构信息。信息通过对相同超像素中的区域的特征进行平均来计算,这意味着局部结构映射是通过同一内核实现的,据此实现了43.8的分割分数。由于同一内核不包含可学习的参数,因此错过了选择有用信息的灵活性。使用不同的卷积核,例如尺寸为3×3和5×5时,与捕捉超像素更精细结构的1×1内核相比,较大的内核产生较差的结果。
接着,实验还需要自上而下的可切换传递的评估。给定局部结构特征,计算机设备可以应用自上而下的可切换信息传递来产生上下文表达,而产生上下文表达由超像素和深度引导,具体过程如下:
如表3所示,计算机设备可以根据不同数据测量自顶向下的传递,而不使用超像素和深度,只应用反卷积和插值来构造上下文表达。计算机设备获得的分割精度低于上下文可切换神经网络。
表3
在接下来的测试中,仅停用超像素对自上而下信息传递的引导。没有超像素,计算机设备在压缩和扩展特征映射上执行可切换的传递过程,其中信息传递由常规内核定义。与此设定相比,完整上下文可切换神经网络具有更好的性能。除了超像素提供更自然的信息传递这一事实之外,计算机设备在每个超像素上计算的平均深度通过避免孤立区域的噪声深度来实现更稳定的特征切换。
此外,实验还研究了在自顶向下切换信息传递中不使用深度的情况。在这种情况下,计算机设备分别将压缩和扩展特征映射作为上下文表达。如表3所示,独立的压缩/扩展特征映射缺乏识别适当的分割特征的灵活性。它们的性能低于由深度驱动的上下文表达的可切换结构。
接下来,实验还考察了用于调整上下文信息的紧凑特征。自上而下的可切换信息传递由压缩结构和扩展结构组成,它们提供不同的上下文信息。这些体系结构使用紧凑特征来生成上下文表达。在实验中,计算机设备分析了压缩结构和扩展结构,并且表明它们可以利用有效的紧凑特征来调整上下文信息。
表4
在表4中,实验提供了不同压缩结构设计的比较。其中,一种简单的进行信息压缩的方法是应用1*1卷积学习紧凑特征,随之而来的1*1卷积用于恢复特征维度,这比压缩体系结构产生更低的准确性。与使用两个连续的1*1卷积的简单替代方案相比,压缩结构在两个1*1卷积之间引入了3*3卷积。在某种程度上,3*3卷积获得了更广泛的上下文信息,弥补了尺寸缩减可能导致的信息丢失,并且压缩结构的3*3卷积获得的特征仍然紧凑。当移除最后一个用于恢复特征维度的1*1卷积,而直接使用3*3卷积来产生相对较高维度的特征时,性能低于压缩体系结构。这表明了3*3卷积生成的紧凑特征的重要性。
在表5中,实验开始研究扩展结构,并将其与不同的信息扩展方式进行比较。与前面类似,只使用一个单一的卷积核为7*7的卷积层来扩大感受野,产生43.8的分割分数。增加额外的大卷积核卷积层可以进一步提高性能,因此使用两个7*7的卷积层获得了更高的44.2分。由上面的卷积产生的分割分数均低于扩展结构,扩展结构使用1*1的卷积层来计算紧凑特征。
表5
接着,上下文可切换神经网络与先进方法的比较实验如下:计算机设备将上下文可切换神经网络与最先进的方法进行比较,并将所有方法分为两组。所有的方法都是在NYUDv2测试集上评估的。第一组包括仅使用RGB图像进行分割的方法,计算机设备将这些方法的性能列在RGB输入列中。深层网络具有自上而下的信息传递,产生高质量的分割特征。多路径细化网络的精确度在这个组中是最高的。如表6所示:
表6
接着,上下文可切换神经网络与第二组方法进行比较,这些方法以RGB-D图像作为输入。计算机设备将每个深度图像编码成具有3个通道的HHA图像,以保持更丰富的几何信息。计算机设备使用HHA图像来训练一个独立的分割网络来代替RGB图像。计算机设备训练好的网络在HHA图像上进行测试以获得分割分数映射,这个映射与通过在RGB图像上训练的网络计算的分数映射相结合。使用这种组合策略,最好的方法是级联特征网络的方法,结果为47.7。与仅使用RGB图像的网络相比,使用RGB和HHA图像可以提升分割精度。
另外,计算机设备还可以使用RGB和HHA图像作为训练和测试数据。基于ResNet-101,上下文可切换神经网络达到了48.3分。计算机设备进一步采用更深层次的ResNet-152结构来构建上下文可切换神经网络,将分割分数提高到了49.6。这个结果比最先进的方法要好2%左右。
如图12所示,计算机设备将上下文可切换神经网络分割处理的图像与先进方法分割处理的图像进行比较,其中,图片采集于NYUDv2数据集。上下文可切换神经网络可以提升图像分割精度。
接着,上下文可切换神经网络还在SUN-RGBD数据集上进行实验。SUN-RGBD数据集中包含10335个用37个类标记的图像,与NYUDv2数据集相比,SUN-RGBD数据集具有更复杂的场景和深度条件。从这个数据集中,计算机设备选择5285张图像进行训练,剩下的则进行测试。在这个实验中,计算机设备再次将上下文可切换神经网络与采用RGB和HHA来共同作为输入图像的方法进行比较。以前在SUN-RGBD数据集上的最佳性能是级联特征网络方法产生的。该模型基于ResNet-152结构,由于计算机设备对信息传递进行了合理的建模处理,计算机设备可以使用更简单的ResNet-101结构来获得更好的结果。随着更深的ResNet-152,计算机设备获得的分割精度为50.7,优于所有比较方法。
如图13所示,计算机设备将上下文可切换神经网络分割处理的图像与先进方法分割处理的图像进行比较,其中,图片采集于SUN-RGBD数据集。上下文可切换神经网络可以提升图像分割精度。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一非易失性计算机可读取存储介质中,该计算机程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对发明专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (30)

  1. 一种图像分割方法,执行于计算机设备,所述方法包括:
    获取待分割图像;
    将所述待分割图像输入到全卷积神经网络的输入变量中,输出卷积特征图;
    将所述卷积特征图输入到上下文可切换神经网络的输入变量中,输出上下文表达信息;
    根据所述卷积特征图与所述上下文表达信息生成中间特征图,所述中间特征图用于进行图像分割。
  2. 根据权利要求1所述的方法,其特征在于,所述将所述卷积特征图输入到上下文可切换神经网络的输入变量中,输出上下文表达信息,包括:
    将所述卷积特征图划分为超像素区域,所述超像素区域为所述卷积特征图的子区域;
    根据所述超像素区域生成局部特征图。
  3. 根据权利要求2所述的方法,其特征在于,在根据所述超像素区域生成局部特征图之后,所述步骤还包括:
    将所述局部特征图输入至用于语义分割的逐像素分类器中,得到所述局部特征图对应的类别标签。
  4. 根据权利要求2所述的方法,其特征在于,所述将所述卷积特征图输入到上下文可切换神经网络的输入变量中,输出上下文表达信息,还包括:
    计算所述超像素区域的平均深度值;
    根据所述平均深度值生成与所述超像素区域对应的上下文表达信息。
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述平均深度值生成与所述超像素区域对应的上下文表达信息,包括:
    将所述平均深度值与条件深度值进行比较;
    当所述平均深度值小于所述条件深度值时,对所述超像素区域进行压缩;
    当所述平均深度值大于或者等于所述条件深度值时,对所述超像素区域进行扩展。
  6. 根据权利要求5所述的方法,其特征在于,所述当所述平均深度值小于所述条件深度值时,对所述超像素区域进行压缩,包括:
    将所述超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到压缩后的超像素区域;
    其中,所述三个卷积神经网络包括两个卷积核为1的神经网络和一个卷积核为3的神经网络。
  7. 根据权利要求5所述的方法,其特征在于,所述当所述平均深度值大于或者等于所述条件深度值时,对所述超像素区域进行扩展,包括:
    将所述超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到扩展后的超像素区域;
    其中,所述三个卷积神经网络包括两个卷积核为7的神经网络和一个卷积核为1的神经网络。
  8. 根据权利要求1至7任一项所述的方法,其特征在于,所述上下文可切换神经网络通过如下方式训练得到:
    根据所述卷积特征图以及所述卷积特征图的类别得到输入层节点序列,将所述输入层节点序列进行投影得到第一隐层对应的隐层节点序列,将第一隐层作为当前处理隐层;
    根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列,将下一层隐层作为当前处理隐层,重复进入根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点对应的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列的步骤,直到输出层,获取输出层输出的与所述卷积特征图的类别对应的上下文表达信息概率矩阵。
  9. 根据权利要求8所述的方法,其特征在于,所述方法还包括:
    采用梯度下降法对所述上下文可切换神经网络的权重参数进行调整。
  10. 根据权利要求9所述的方法,其特征在于,对所述上下文可切换神经网络的权重参数进行调整,包括反向传递和跳跃连接;
    其中,所述反向传递是从所述上下文可切换神经网络传递到超像素区域,所述跳跃连接是超像素区域之间信息传播没有经过神经网络。
  11. 一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,其特征在于,所述处理器执行所述计算机程序时,使得所述处理器执行以下步骤:获取待分割图像;将所述待分割图像输入到全卷积神经网络的输入变量中,输出卷积特征图;将所述卷积特征图输入到上下文可切换神经网络的输入变量中,输出上下文表达信息;根据所述卷积特征图与所述上下文表达信息生成中间特征图,所述中间特征图用于进行图像分割。
  12. 根据权利要求11所述的计算机设备,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器在执行将所述卷积特征图输入到上下文可切换神经网络的输入变量中,输出上下文表达信息的步骤时,还执行以下步骤:将所述卷积特征图划分为超像素区域,所述超像素区域为所述卷积特征图的子区域;根据所述超像素区域生成局部特征图。
  13. 根据权利要求12所述的计算机设备,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器还执行以下步骤:将所述局部特征图输入至用于语义分割的逐像素分类器中,得到所述局部特征图对应的类别标签。
  14. 根据权利要求12所述的计算机设备,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器在执行将所述卷积特征图输入到上下文可切换神经网络的输入变量中,输出上下文表达信息的步骤时,还执行以下步骤:计算所述超像素区域的平均深度值;根据所述平均深度值生成与所述超像素区域对应的上下文表达信息。
  15. 根据权利要求14所述的计算机设备,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器在执行根据所述平均深度值生成与所述超像素区域对应的上下文表达信息的步骤时,还执行以下步骤:将所述平均深度值与条件深度值进行比较;当所述平均深度值小于所述条件深度值时,对所述超像素区域进行压缩;当所述平均深度值大于或者等于所述条件深度值时,对所述超像素区域进行扩展。
  16. 根据权利要求15所述的计算机设备,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器在执行当所述平均深度值小于所述条件深度值时,对所述超像素区域进行压缩的步骤时,还执行以下步骤:将所述超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到压缩后的超像素区域;其中,所述三个卷积神经网络包括两个卷积核为1的神经网络和一个卷积核为3的神经网络。
  17. 根据权利要求15所述的计算机设备,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器在执行当所述平均深度值大于或者等于所述条件深度值时,对所述超像素区域进行扩展的步骤时,还执行以下步骤:将所述超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到扩展后的超像素区域;其中,所述三个卷积神经网络包括两个卷积核为7的神经网络和一个卷积核为1的神经网络。
  18. 根据权利要求11至17任一项所述的计算机设备,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器在执行训练上下文可切换神经网络时,还执行以下步骤:根据所述卷积特征图以及所述卷积特征图的类别得到输入层节点序列,将所述输入层节点序列进行投影得到第一隐层对应的隐层节点序列,将第一隐层作为当前处理隐层;根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列,将下一层隐层作为当前处理隐层,重复进入根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点对应的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列的步骤,直到输出层,获取输出层输出的与所述卷积特征图的类别对应的上下文表达信息概率矩阵。
  19. 根据权利要求18所述的计算机设备,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器还执行以下步骤:采用梯度下降法对所述上下文可切换神经网络的权重参数进行调整。
  21. 一种非易失性的计算机可读存储介质,存储有计算机可读指令,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行以下步骤:获取待分割图像;将所述待分割图像输入到全卷积神经网络的输入变量中,输出卷积特征图;将所述卷积特征图输入到上下文可切换神经网络的输入变量中,输出上下文表达信息;根据所述卷积特征图与所述上下文表达信息生成中间特征图,所述中间特征图用于进行图像分割。
  22. 根据权利要求21所述的计算机可读存储介质,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器在执行将所述卷积特征图输入到上下文可切换神经网络的输入变量中,输出上下文表达信息的步骤时,还执行以下步骤:将所述卷积特征图划分为超像素区域,所述超像素区域为所述卷积特征图的子区域;根据所述超像素区域生成局部特征图。
  23. 根据权利要求22所述的计算机可读存储介质,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器还执行以下步骤:将所述局部特征图输入至用于语义分割的逐像素分类器中,得到所述局部特征图对应的类别标签。
  24. 根据权利要求22所述的计算机可读存储介质,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器在执行将所述卷积特征图输入到上下文可切换神经网络的输入变量中,输出上下文表达信息的步骤时,还执行以下步骤:计算所述超像素区域的平均深度值;根据所述平均深度值生成与所述超像素区域对应的上下文表达信息。
  25. 根据权利要求24所述的计算机可读存储介质,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器在执行根据所述平均深度值生成与所述超像素区域对应的上下文表达信息的步骤时,还执行以下步骤:将所述平均深度值与条件深度值进行比较;当所述平均深度值小于所述条件深度值时,对所述超像素区域进行压缩;当所述平均深度值大于或者等于所述条件深度值时,对所述超像素区域进行扩展。
  26. 根据权利要求25所述的计算机可读存储介质,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器在执行当所述平均深度值小于所述条件深度值时,对所述超像素区域进行压缩的步骤时,还执行以下步骤:将所述超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到压缩后的超像素区域;其中,所述三个卷积神经网络包括两个卷积核为1的神经网络和一个卷积核为3的神经网络。
  27. 根据权利要求25所述的计算机可读存储介质,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器在执行当所述平均深度值大于或者等于所述条件深度值时,对所述超像素区域进行扩展的步骤时,还执行以下步骤:将所述超像素区域对应的局部特征图输入到预设的三个卷积神经网络进行处理,得到扩展后的超像素区域;其中,所述三个卷积神经网络包括两个卷积核为7的神经网络和一个卷积核为1的神经网络。
  28. 根据权利要求21至27任一项所述的计算机可读存储介质,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器在执行训练上下文可切换神经网络时,还执行以下步骤:根据所述卷积特征图以及所述卷积特征图的类别得到输入层节点序列,将所述输入层节点序列进行投影得到第一隐层对应的隐层节点序列,将第一隐层作为当前处理隐层;根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列,将下一层隐层作为当前处理隐层,重复进入根据当前处理隐层对应的隐层节点序列和当前处理隐层对应的各个神经元节点对应的权重和偏差采用非线性映射得到下一层隐层的隐层节点序列的步骤,直到输出层,获取输出层输出的与所述卷积特征图的类别对应的上下文表达信息概率矩阵。
  29. 根据权利要求28所述的计算机可读存储介质,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器还执行以下步骤:采用梯度下降法对所述上下文可切换神经网络的权重参数进行调整。
  30. 根据权利要求29所述的计算机可读存储介质,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器还执行以下步骤:对所述上下文可切换神经网络的权重参数进行调整,包括反向传递和跳跃连接;其中,所述反向传递是从所述上下文可切换神经网络传递到超像素区域,所述跳跃连接是超像素区域之间信息传播没有经过神经网络。
PCT/CN2018/086832 2018-05-15 2018-05-15 图像分割方法、计算机设备和存储介质 WO2019218136A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/490,696 US11409994B2 (en) 2018-05-15 2018-05-15 Methods for image segmentation, computer devices, and storage mediums
PCT/CN2018/086832 WO2019218136A1 (zh) 2018-05-15 2018-05-15 图像分割方法、计算机设备和存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/086832 WO2019218136A1 (zh) 2018-05-15 2018-05-15 图像分割方法、计算机设备和存储介质

Publications (1)

Publication Number Publication Date
WO2019218136A1 true WO2019218136A1 (zh) 2019-11-21

Family

ID=68539242

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/086832 WO2019218136A1 (zh) 2018-05-15 2018-05-15 图像分割方法、计算机设备和存储介质

Country Status (2)

Country Link
US (1) US11409994B2 (zh)
WO (1) WO2019218136A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462060A (zh) * 2020-03-24 2020-07-28 湖南大学 胎儿超声图像中标准切面图像的检测方法和装置
CN111627029A (zh) * 2020-05-28 2020-09-04 北京字节跳动网络技术有限公司 图像实例分割结果的获取方法及装置
CN112418233A (zh) * 2020-11-18 2021-02-26 北京字跳网络技术有限公司 图像处理方法、装置、可读介质及电子设备
CN114119627A (zh) * 2021-10-19 2022-03-01 北京科技大学 基于深度学习的高温合金微观组织图像分割方法及装置
CN114429607A (zh) * 2022-01-24 2022-05-03 中南大学 一种基于Transformer的半监督视频目标分割方法

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956575B (zh) * 2018-09-26 2022-04-12 京东方科技集团股份有限公司 转变图像风格的方法和装置、卷积神经网络处理器
EP3896647A4 (en) * 2018-12-14 2022-01-26 FUJIFILM Corporation MINI-BATCH LEARNING DEVICE, OPERATING PROGRAM FOR MINI-BATCH LEARNING DEVICE, OPERATING METHOD FOR MINI-BATCH LEARNING DEVICE, AND IMAGE PROCESSING DEVICE
CN112215243A (zh) * 2020-10-30 2021-01-12 百度(中国)有限公司 图像特征提取方法、装置、设备及存储介质
CN112785575B (zh) * 2021-01-25 2022-11-18 清华大学 一种图像处理的方法、装置和存储介质
TWI810946B (zh) * 2022-05-24 2023-08-01 鴻海精密工業股份有限公司 圖像識別方法、電腦設備及儲存介質
CN115841492B (zh) * 2023-02-24 2023-05-12 合肥恒宝天择智能科技有限公司 基于云边协同的松材线虫病变色立木遥感智能识别方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106651877A (zh) * 2016-12-20 2017-05-10 北京旷视科技有限公司 实例分割方法及装置
CN107808111A (zh) * 2016-09-08 2018-03-16 北京旷视科技有限公司 用于行人检测和姿态估计的方法和装置
US9939272B1 (en) * 2017-01-06 2018-04-10 TCL Research America Inc. Method and system for building personalized knowledge base of semantic image segmentation via a selective random field approach
CN108765425A (zh) * 2018-05-15 2018-11-06 深圳大学 图像分割方法、装置、计算机设备和存储介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10095950B2 (en) 2015-06-03 2018-10-09 Hyperverge Inc. Systems and methods for image processing
CN106127725B (zh) 2016-05-16 2019-01-22 北京工业大学 一种基于多分辨率cnn的毫米波雷达云图分割方法
CN107784654B (zh) 2016-08-26 2020-09-25 杭州海康威视数字技术股份有限公司 图像分割方法、装置及全卷积网络系统
CN106530320B (zh) 2016-09-30 2019-12-17 深圳大学 一种端到端的图像分割处理方法及系统
US10957045B2 (en) * 2016-12-12 2021-03-23 University Of Notre Dame Du Lac Segmenting ultrasound images
CN107169974A (zh) 2017-05-26 2017-09-15 中国科学技术大学 一种基于多监督全卷积神经网络的图像分割方法
CN107403430B (zh) 2017-06-15 2020-08-07 中山大学 一种rgbd图像语义分割方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107808111A (zh) * 2016-09-08 2018-03-16 北京旷视科技有限公司 用于行人检测和姿态估计的方法和装置
CN106651877A (zh) * 2016-12-20 2017-05-10 北京旷视科技有限公司 实例分割方法及装置
US9939272B1 (en) * 2017-01-06 2018-04-10 TCL Research America Inc. Method and system for building personalized knowledge base of semantic image segmentation via a selective random field approach
CN108765425A (zh) * 2018-05-15 2018-11-06 深圳大学 图像分割方法、装置、计算机设备和存储介质

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462060A (zh) * 2020-03-24 2020-07-28 湖南大学 胎儿超声图像中标准切面图像的检测方法和装置
CN111627029A (zh) * 2020-05-28 2020-09-04 北京字节跳动网络技术有限公司 图像实例分割结果的获取方法及装置
CN112418233A (zh) * 2020-11-18 2021-02-26 北京字跳网络技术有限公司 图像处理方法、装置、可读介质及电子设备
CN114119627A (zh) * 2021-10-19 2022-03-01 北京科技大学 基于深度学习的高温合金微观组织图像分割方法及装置
CN114119627B (zh) * 2021-10-19 2022-05-17 北京科技大学 基于深度学习的高温合金微观组织图像分割方法及装置
CN114429607A (zh) * 2022-01-24 2022-05-03 中南大学 一种基于Transformer的半监督视频目标分割方法
CN114429607B (zh) * 2022-01-24 2024-03-29 中南大学 一种基于Transformer的半监督视频目标分割方法

Also Published As

Publication number Publication date
US20210374478A1 (en) 2021-12-02
US11409994B2 (en) 2022-08-09


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18919236

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10.03.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18919236

Country of ref document: EP

Kind code of ref document: A1