WO2020077604A1 - Image semantic segmentation method, computer device and storage medium - Google Patents

Image semantic segmentation method, computer device and storage medium

Info

Publication number
WO2020077604A1
WO2020077604A1 PCT/CN2018/110918 CN2018110918W WO2020077604A1 WO 2020077604 A1 WO2020077604 A1 WO 2020077604A1 CN 2018110918 W CN2018110918 W CN 2018110918W WO 2020077604 A1 WO2020077604 A1 WO 2020077604A1
Authority
WO
WIPO (PCT)
Prior art keywords: prediction, image, feature map, level, branch
Prior art date
Application number
PCT/CN2018/110918
Other languages
English (en)
French (fr)
Inventor
林迪
黄惠
Original Assignee
深圳大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳大学 filed Critical 深圳大学
Priority to PCT/CN2018/110918 priority Critical patent/WO2020077604A1/zh
Publication of WO2020077604A1 publication Critical patent/WO2020077604A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation

Definitions

  • the present application relates to the field of image segmentation technology, and in particular to an image semantic segmentation method, a computer device, and a storage medium.
  • Image semantic segmentation is an important research topic in the fields of computer vision and pattern recognition and is widely used in AI (Artificial Intelligence) scenarios such as autonomous driving systems and drones. Its goal is to classify each pixel of an image, divide the image into region blocks with certain semantic meaning, identify the category of each region block, and finally obtain a segmented image with semantic annotation.
  • AI Artificial Intelligence
  • An image semantic segmentation method, a computer device, and a storage medium are provided.
  • An image semantic segmentation method includes: discretizing, according to a preset scene resolution, the depth image corresponding to an image to be processed, and determining the pixel area of each level of prediction branch; determining, in each level of prediction branch, the context information of the convolution feature maps corresponding to the image to be processed; obtaining the enhanced feature map of a convolution feature map in the current prediction branch according to the context information of the convolution feature map in the current prediction branch and its enhanced feature map in the superior prediction branch; and performing classification prediction according to the enhanced feature maps and pixel areas of the prediction branches at all levels to obtain a segmented image of the image to be processed.
  • A computer device includes a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the image semantic segmentation method in any embodiment.
  • One or more non-volatile storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the image semantic segmentation method in any embodiment.
  • FIG. 1 is an application environment diagram of an image semantic segmentation method in an embodiment
  • FIG. 2 is a schematic flowchart of an image semantic segmentation method in an embodiment
  • FIG. 3 is a schematic diagram of an RGB image and a depth image in an embodiment
  • FIG. 4 is a schematic diagram of image semantic segmentation using a cascaded feature network in an embodiment
  • FIG. 5 is a schematic flowchart of a context information obtaining step in an embodiment
  • FIG. 6 is a schematic diagram of the effect of superpixel division and feature enhancement processing in an embodiment
  • FIG. 7 is a schematic flowchart of an image semantic segmentation method in an embodiment
  • FIG. 8 is a schematic diagram of the cascade structure and decoder processing in an embodiment
  • FIG. 9 is a comparison diagram of image semantic segmentation effects in an embodiment
  • FIG. 10 is a structural block diagram of an image semantic segmentation apparatus in an embodiment
  • FIG. 11 is a structural block diagram of a computer device in an embodiment.
  • the image semantic segmentation method provided by this application can be applied in the application environment shown in FIG. 1.
  • When detecting an image semantic segmentation instruction, the terminal 102 performs image semantic segmentation on the image to be processed to obtain a segmented image corresponding to the image to be processed. Specifically, the terminal 102 may perform the steps of the image semantic segmentation method in any of the following embodiments.
  • the terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and server terminals.
  • an image semantic segmentation method is provided.
  • the method is applied to the terminal 102 in FIG. 1 as an example for illustration.
  • the method includes the following steps:
  • S202 According to a preset scene resolution, the depth image corresponding to the image to be processed is discretized, and the pixel area of each level of prediction branch is determined.
  • the image to be processed refers to a color image that needs to be semantically segmented by the image.
  • the image to be processed is a color image in the RGB format.
  • Depth image refers to an image or image channel that contains surface distance information from the scene / object.
  • a depth image is similar to a grayscale image, except that each pixel value is the actual distance from the sensor or camera to the object. This pixel value can also be called a depth or depth value.
  • RGB images and depth images are registered, so there is a one-to-one correspondence between pixels.
  • Scene resolution is a collective term for the resolution of scenes and objects. More specifically, a scene resolution refers to a depth interval of scenes and objects: different scene resolutions correspond to different depth intervals, a high scene resolution corresponds to a low depth interval, and a low scene resolution corresponds to a high depth interval.
  • FIG. 3 shows an RGB image and its corresponding depth image in an embodiment: the near field is composed of pixels with high scene resolution (light areas), and the far field is composed of pixels with low scene resolution (dark areas).
  • In the depth image of FIG. 3, the darker the color, the higher the depth value of the scene/object and the lower the scene resolution. In other embodiments, a darker color may instead indicate a lower depth value and a higher scene resolution, or different depth ranges may be indicated by different color shades, which is not limited herein.
  • Moreover, in areas with lower scene resolution, objects and scenes coexist densely, and more complex correlations are formed between objects/scenes than in areas with higher scene resolution.
  • In this embodiment, the depth image corresponding to the image to be processed is discretized according to the preset scene resolution, pixels whose depth values fall in the same depth interval are assigned the same scene resolution, and each level of prediction branch corresponds to one of the scene resolutions; for example, the k-th level prediction branch corresponds to the k-th scene resolution, thereby determining the pixel area of each level of prediction branch.
  • The pixel area of each level of prediction branch refers to the pixel area predicted by that branch; the pixel areas of different branches do not overlap, and together the pixel areas of all branches make up the entire image.
  • The relationship between the level of a prediction branch and the level of its scene resolution may be a positive or a negative correlation. For ease of description, it is assumed in the embodiments that the branch level is negatively correlated with the scene resolution: for example, if a total of K levels of prediction branches are included, the scene resolution of the level-1 prediction branch is the highest and that of the level-K prediction branch is the lowest.
  • the number of preset scene resolutions is the same as the number of configured prediction branches.
  • Referring to FIG. 4, the number of prediction branches is configured as 3, and a discrete depth image is obtained by discretizing the depth image.
  • As can be seen from the discrete depth image, the depth image is divided into three pixel areas with different scene resolutions: pixel area 1 at the level-1 scene resolution, pixel area 2 at the level-2 scene resolution, and pixel area 3 at the level-3 scene resolution.
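  • The following sketch illustrates one possible way to perform this discretization; it is an assumption for illustration only (the equal-population split, the function name discretize_depth, and the use of NumPy are choices made here, not part of the present application):

```python
import numpy as np

def discretize_depth(depth, k=3):
    """Assign each pixel to one of k prediction branches by depth interval.

    Branch 1 receives the nearest pixels (highest scene resolution),
    branch k the farthest (lowest scene resolution).
    """
    # Split the observed depth range into k intervals with equal pixel counts
    # (an assumption; the text only requires preset, non-overlapping intervals).
    edges = np.quantile(depth[np.isfinite(depth)], np.linspace(0.0, 1.0, k + 1))
    labels = np.digitize(depth, edges[1:-1])   # branch index 0 .. k-1 per pixel
    return [labels == i for i in range(k)]     # one boolean pixel area per branch
```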
  • S204 Determine the context information of the convolution feature map corresponding to the image to be processed in the prediction branches at all levels.
  • the convolution feature map refers to the feature map obtained by performing convolution processing on the image to be processed.
  • the convolution process includes multiple layers of convolutions. Therefore, the number of convolution feature maps corresponding to the image to be processed is the same as the number of layers of the convolution layer.
  • the step of obtaining a convolutional feature map includes: performing convolution processing on the image to be processed using a convolutional neural network to obtain convolutional feature maps of each convolutional layer.
  • the convolutional neural network may be a commonly used CNN (Convolutional Neural Network, convolutional neural network).
  • the context information of each neuron in the convolution feature map of each layer is separately determined, and the context information of the convolution feature map is composed of the context information of each neuron.
  • context information refers to the interaction information between different objects and between objects and scenes.
  • In an image, an object cannot exist in isolation; it necessarily has some relationship with other surrounding objects and the scene, which is what is commonly referred to as context information.
  • For example, in a photograph of a road, the road usually contains pedestrians and vehicles, and there is a certain co-occurrence between the road, the pedestrians, and the vehicles; the information reflecting this co-occurrence is the context information.
  • The context information helps classify and predict pedestrians and vehicles; for example, an object that appears on the road is more likely to be a pedestrian or a vehicle.
  • the context information of a convolutional feature map refers to the combination of context information of neurons in the convolutional feature map.
  • S206 Obtain an enhanced feature map of the convolution feature map in the current prediction branch according to the context information of the convolution feature map in the current prediction branch and the enhanced feature map of the superior prediction branch.
  • the current prediction branch is an arbitrary-level prediction branch.
  • the purpose of naming any level of prediction branch as the current prediction branch is to illustrate that in the steps describing the current prediction branch, the same steps need to be performed for each level of prediction branch.
  • the higher-level prediction branch refers to a prediction branch whose scene resolution is one level higher than the current prediction branch. Still taking the prediction branch including K levels as an example, when the current prediction branch is the K-th prediction branch, the superior prediction branch is the K-1th prediction branch.
  • the context information of the convolution feature map in the current prediction branch and the enhanced feature map in the upper-level prediction branch are added to obtain the enhanced feature map of the convolution feature map in the current prediction branch.
  • Referring further to FIG. 8, the schematic diagram shows the cascade structure with three levels of prediction branches.
  • Suppose there is a set of convolution feature maps {B_l | l = 1, ..., L} from L different convolutional layers. For the convolution feature map B_l of layer l, a cascade of K levels of prediction branches realizes the semantic segmentation of K regions with different scene resolutions, the scene resolution of the level-1 prediction branch being the highest.
  • Given the depth image D, each pixel is projected by discretization onto one of the K prediction branches, and each prediction branch classifies a group of pixels within a specific pixel area.
  • Given a color image I as input, the enhanced feature map F_{l,k} output by the k-th level prediction branch is: F_{l,k} = F_{l,k-1} + Q_{l,k}, k = 1, ..., K (formula (1)), where F_{l,k} denotes the enhanced feature map of the convolution feature map B_l at the k-th level prediction branch, F_{l,k-1} denotes its enhanced feature map at the (k-1)-th level prediction branch, and Q_{l,k} denotes its context information at the k-th level prediction branch.
  • In this embodiment, the k-th level prediction branch is the current prediction branch and the (k-1)-th level prediction branch is the superior prediction branch. Since the level-1 prediction branch has no superior branch, it is stipulated that when the current prediction branch is the level-1 branch, the enhanced feature map of the superior branch is the convolution feature map of the corresponding layer, that is, F_{l,0} = B_l.
  • Each prediction branch focuses on the classification prediction of a specific scene resolution, and by cascading parallel prediction branches, the context information of the convolution feature map in each prediction branch is enriched, thereby improving the overall segmentation performance.
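  • As a minimal sketch of the cascade of formula (1), the following illustrative code (variable names are assumptions; the inputs may be NumPy arrays or PyTorch tensors of matching shape) adds each branch's context feature map Q_{l,k} to the enhanced feature map of the superior branch, starting from F_{l,0} = B_l:

```python
def cascade_enhanced_maps(b_l, q_l):
    """b_l: conv feature map of layer l; q_l: list of K context maps Q_{l,k}."""
    f = b_l                     # F_{l,0} = B_l (no superior branch for k = 1)
    enhanced = []
    for q_lk in q_l:            # k = 1 .. K, highest scene resolution first
        f = f + q_lk            # F_{l,k} = F_{l,k-1} + Q_{l,k}
        enhanced.append(f)
    return enhanced             # [F_{l,1}, ..., F_{l,K}]
```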
  • S208 Perform classification prediction according to the enhanced feature maps and pixel areas of the prediction branches at all levels to obtain a segmented image of the image to be processed. Specifically, according to the enhanced feature maps, each level of prediction branch classifies its corresponding pixel area to obtain the classification result of that pixel area, and a complete segmented image is obtained by combining the classification results of all pixel areas.
  • The above image semantic segmentation method discretizes the depth image according to the preset scene resolution, determines the pixel area of each level of prediction branch, determines the context information of the convolution feature maps corresponding to the image to be processed in each level of prediction branch, and then, in a cascading manner, obtains the enhanced feature map of a convolution feature map in the current prediction branch from its enhanced feature map in the superior prediction branch and its context information in the current prediction branch.
  • In this cascading manner, the context information of the superior prediction branch is transferred to the enhanced feature maps of the lower-level prediction branches, which enriches the context information of the enhanced feature maps in each level of prediction branch and thereby improves the accuracy of classification prediction using the enhanced feature maps at all levels.
  • In an embodiment, the context information is determined through two stages of enhancement: local enhancement and global enhancement. As shown in FIG. 5, step S204 further includes the following sub-steps:
  • S502 In each level of prediction branch, superpixel division is performed on the image to be processed, and the superpixels are determined.
  • Superpixel division refers to the process of subdividing a digital image into multiple image sub-regions.
  • a super pixel refers to a small area composed of a series of adjacent pixels with similar characteristics such as color, brightness, and texture.
  • In this embodiment, each level of prediction branch performs superpixel division on the image to be processed with a superpixel division tool based on a preset superpixel division rule, and the superpixels of the image to be processed are determined in each level of prediction branch. The superpixel division rule may be, for example, the size range of each superpixel.
  • In a specific embodiment, step S502 includes: in each level of prediction branch, performing superpixel division on the image to be processed according to a superpixel division rule determined by the corresponding scene resolution, and determining the superpixels of the image to be processed in each level of prediction branch.
  • Different superpixel division rules are configured for different scene resolutions. Because the scene resolution of each level of prediction branch is different, the superpixel division rule corresponding to each scene resolution is first determined, and each level of prediction branch then performs superpixel division on the image to be processed with the superpixel division tool according to the determined rule. Usually, for a prediction branch corresponding to a low scene resolution the superpixels are larger and contain more object and scene information, while for a branch corresponding to a high scene resolution the superpixels are smaller and finer; adaptively adjusting the superpixel size according to the scene resolution helps capture the complex object/scene relationships in different regions, as illustrated in FIG. 6(a).
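  • One possible realization of the per-branch superpixel division is sketched below; SLIC from scikit-image is used here as the superpixel division tool, and the per-branch segment counts are purely illustrative, since the present application does not prescribe a specific tool or sizes:

```python
from skimage.segmentation import slic

def branch_superpixels(rgb, n_segments_per_branch=(800, 400, 200)):
    """Return one superpixel label map per prediction branch.

    Finer superpixels for the high-resolution (near-field) branch,
    coarser ones for the low-resolution (far-field) branch.
    """
    return [slic(rgb, n_segments=n, compactness=10, start_label=0)
            for n in n_segments_per_branch]
```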
  • S504 Determine the local enhancement feature of each neuron in the convolution feature map corresponding to the image to be processed.
  • the local enhancement feature is determined by each neuron in the superpixel to which it belongs.
  • In this embodiment, the local enhancement feature of a neuron is determined by the neurons within the superpixel to which that neuron belongs, that is, the local enhancement feature is determined through the first-stage local enhancement.
  • the superpixel of a neuron refers to the superpixel where the neuron is located.
  • step S504 includes: determining the local weight of each neuron according to the neuron in the superpixel to which each neuron belongs; and determining the local enhancement feature of the corresponding neuron according to the local weight.
  • Suppose that, for a given image to be processed I, the superpixel division tool generates a group of non-overlapping superpixels {S_i}, satisfying ∪_i S_i = I and S_i ∩ S_j = ∅ for i ≠ j, where S_i denotes the i-th superpixel and S_j denotes the j-th superpixel.
  • In the first-stage local enhancement, the neurons located in the same superpixel are enhanced. First, the local weight of each neuron is determined, where the local weight w_i is obtained by the following formula: w_i(c) = σ[W^T B_i(c)] (formula (2)), in which W denotes a weight matrix learned through training, σ denotes a fully connected layer with a sigmoid activation function, and B_i(c) is computed by formula (3) from the neurons of the convolution feature map B located in the superpixel S_i.
  • Then, the local enhancement feature of the neuron is determined according to the local weight by the following formula: M(x, y, c) = w_i(c) · B(x, y, c) (formula (4)), where B denotes the convolution feature map; B(x, y, c) denotes a neuron in the convolution feature map B; (x, y) denotes the coordinates of the neuron in the feature map, with (x, y) ∈ Φ(S_i); c denotes the feature channel index; w_i denotes the local weight; M denotes the first feature map generated by local enhancement; and M(x, y, c) denotes the neuron in the first feature map M, that is, the local enhancement feature of the neuron B(x, y, c) in the convolution feature map B.
  • FIG. 6 (b) is a schematic diagram of local enhancement by local weighting.
  • This implementation uses local weighting to achieve local enhancement, and introduces weights to avoid using the same features to represent different neurons.
  • Weighting the neuron B(x, y, c) by w_i produces the neuron M(x, y, c), so that M(x, y, c) differs in its representation from the other neurons in S_i and is highly sensitive to the overall content of S_i conveyed by the weight w_i.
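  • The first-stage local enhancement of formulas (2)-(4) could be sketched as follows; the per-channel mean over each superpixel is assumed here as the aggregation B_i(c) of formula (3) (whose exact form is not reproduced in this text), and the tensor shapes and module name are likewise assumptions:

```python
import torch
import torch.nn as nn

class LocalEnhance(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Linear(channels, channels)    # learned weight matrix W

    def forward(self, b, sp):
        """b: (C, H, W) conv feature map; sp: (H, W) superpixel label map."""
        m = torch.zeros_like(b)
        for i in sp.unique():
            mask = (sp == i)                       # neurons inside S_i
            region = b[:, mask]                    # (C, N_i)
            pooled = region.mean(dim=1)            # assumed B_i(c), formula (3)
            w_i = torch.sigmoid(self.fc(pooled))   # w_i(c), formula (2)
            m[:, mask] = region * w_i[:, None]     # M(x, y, c), formula (4)
        return m
```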
  • S506 Obtain the global enhancement feature of each neuron according to the local enhancement feature of the neuron and the local enhancement features of the neurons in the adjacent superpixels; the global enhancement features of the neurons compose the context information of the corresponding convolution feature map.
  • the local enhancement feature of each neuron and the local enhancement feature of each neuron of the adjacent superpixel are used to perform the second-level global enhancement to obtain the global enhancement feature.
  • the global enhanced features of the neuron constitute the context information of the convolutional feature map.
  • the adjacent superpixel of a neuron refers to the superpixel adjacent to the superpixel of the neuron.
  • the context information of the convolutional feature map is actually a feature map, such as the second feature map Q mentioned below.
  • In the second-stage global enhancement, a second feature map Q is generated by gathering the local enhancement features within adjacent superpixels; the neuron Q(x, y, c) in the second feature map is expressed by formula (5), in which N(S_i) denotes the set of superpixels adjacent to superpixel S_i; (x', y') denotes the coordinates of a neuron in an adjacent superpixel of S_i, with (x', y') ∈ Φ(S_j); |Φ(S_i)| denotes the number of receptive field centers located in superpixel S_i; |Φ(S_j)| denotes the number of receptive field centers located in superpixel S_j; w_s denotes the first weight map, obtained by performing a 3 × 3 convolution on the first feature map M; and w_a denotes the second weight map, obtained by performing a 1 × 1 convolution on the first feature map M.
  • As formula (5) shows, the first global weight controls the information of M(x, y, c) passed to the neuron Q(x, y, c) at the same position, and the second global weight controls the information of M(x', y', c) passed to Q(x, y, c) from adjacent superpixels, giving access to information over the global range. Each neuron Q(x, y, c) represents the context information of the neuron B(x, y, c).
  • For ease of description, the context information obtained by the local enhancement of formula (4) and the global enhancement of formula (5) is defined as a CARF (Context-Aware Receptive Field); CARF also denotes the network module that performs these two enhancement stages.
  • As shown in FIG. 6(c), after local enhancement the neurons in each superpixel are aggregated; as shown in FIG. 6(d), each neuron is then further enhanced by global enhancement, where global weighting uses the content of adjacent superpixels to form the CARF.
  • By adopting the CARF, the context information of the features can be learned better, and the context information obtained in this way can mitigate the negative impact of mixing the features of regions that are too small or too large. Further, the two-stage enhancement of the convolution feature maps through local enhancement and global enhancement enables the context information to fully represent superpixels with different contents and adaptively adjusts the communication between adjacent superpixels, thereby capturing more global context information.
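  • The second-stage global enhancement might be approximated as in the sketch below. Because the exact aggregation of formula (5) is not reproduced in this text, the sketch only follows the described roles of the weight maps (w_s from a 3 × 3 convolution on M gating the same-position term, w_a from a 1 × 1 convolution gating contributions from adjacent superpixels); the shapes, names, and the averaging over neighbouring superpixels are assumptions:

```python
import torch
import torch.nn as nn

class GlobalEnhance(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv_s = nn.Conv2d(channels, channels, 3, padding=1)  # w_s
        self.conv_a = nn.Conv2d(channels, channels, 1)             # w_a

    def forward(self, m, sp, adjacency):
        """m: (C, H, W) first feature map; sp: (H, W) superpixel labels;
        adjacency: dict {i: set of adjacent superpixel ids}."""
        w_s = self.conv_s(m[None])[0]
        w_a = self.conv_a(m[None])[0]
        q = w_s * m                                    # same-position term
        for i, neighbours in adjacency.items():
            if not neighbours:
                continue
            mask_i = (sp == i)
            # one aggregated, w_a-gated message per adjacent superpixel S_j
            msgs = [(w_a[:, sp == j] * m[:, sp == j]).mean(dim=1)
                    for j in neighbours]
            q[:, mask_i] += torch.stack(msgs).mean(dim=0)[:, None]
        return q                                       # context feature map Q
```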
  • In an embodiment, step S208 further includes the following sub-steps: obtaining the classification result of each level of prediction branch according to the enhanced feature maps and pixel area of that branch; and combining the classification results to obtain a segmented image of the image to be processed.
  • Specifically, each level of prediction branch obtains the classification result of its pixel area according to the obtained enhanced feature maps and pixel area. Since the pixel area of each level of prediction branch is different, the classification results of the branches correspond to different pixel areas. The segmented image of the image to be processed is then obtained by combining the classification results of all pixel areas.
  • Further, obtaining the classification result of each level of prediction branch according to its enhanced feature maps and pixel area includes: merging the enhanced feature maps of the convolutional layers on the same-level prediction branch to obtain the branch feature map of each level of prediction branch; and classifying the corresponding branch feature map according to the pixel area of each level of prediction branch to obtain the classification result of each level of prediction branch.
  • the resolution of the branch feature map is the same as the maximum resolution of the convolution feature map.
  • the maximum resolution of the convolution feature map is defined as the first resolution.
  • Specifically, for the enhanced feature maps of the convolutional layers on a same-level prediction branch, the enhanced feature maps are merged one by one through upsampling, in order of increasing resolution, until the enhanced feature map at the first resolution has been merged, and the branch feature map of that level of prediction branch is obtained; refer to the decoder shown in FIG. 8. Then, according to the pixel area of that level of prediction branch, the corresponding pixel area in the branch feature map is classified to obtain the classification result of each level of prediction branch.
  • Merging the enhanced feature maps on a same-level prediction branch is realized by the following formula: U_{l,k} = F_{l,k} + H_{l+1,k} * U_{l+1,k}, k = 1, ..., K (formula (7)), where U_{L+1,k} = 0, so that U_{L,k} = F_{L,k}, and H_{l+1,k} denotes an upsampling convolution kernel that matches the sizes of the feature maps U_{l+1,k} and F_{l,k}. For the upsampled feature map H_{l+1,k} * U_{l+1,k}, the neuron H_{l+1,k} * U_{l+1,k}(x, y, c) contains information of neurons located in adjacent superpixels. The feature map U_{1,k} finally obtained by formula (7) is the branch feature map.
  • By combining the enhanced feature maps of all layers on the same-level branch, a high-resolution branch feature map is generated, so that the branch feature map used for classification prediction carries more feature information, which yields higher segmentation accuracy.
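  • A hedged sketch of the decoder recursion of formula (7) follows; a learned transposed convolution stands in for the upsampling kernel H_{l+1,k}, and the bilinear size alignment is an added convenience not stated in the text:

```python
import torch.nn.functional as F

def decode_branch(enhanced_maps, upsample_convs):
    """enhanced_maps: [F_{1,k}, ..., F_{L,k}] (finest to coarsest, (1, C, H, W));
    upsample_convs: matching nn.ConvTranspose2d modules standing in for H_{l+1,k}."""
    u = enhanced_maps[-1]                       # U_{L,k} = F_{L,k}
    for f_lk, up in zip(reversed(enhanced_maps[:-1]), reversed(upsample_convs)):
        u = up(u)                               # H_{l+1,k} * U_{l+1,k}
        if u.shape[-2:] != f_lk.shape[-2:]:     # align spatial sizes if needed
            u = F.interpolate(u, size=f_lk.shape[-2:], mode='bilinear',
                              align_corners=False)
        u = f_lk + u                            # U_{l,k} = F_{l,k} + ...
    return u                                    # branch feature map U_{1,k}
```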
  • In an embodiment, classifying the corresponding branch feature maps according to the pixel areas of the prediction branches at all levels to obtain their classification results includes: determining the classification area of each branch feature map according to the pixel area of each level of prediction branch; and performing classification prediction on the classification area of each branch feature map to obtain the classification result of each level of prediction branch.
  • For example, given all pixels in the pixel area assigned to the k-th level prediction branch, the branch feature map U_{1,k} is input to the predictor of the corresponding prediction branch, and the predictor outputs the class labels of all pixels in that pixel area as the set y_k: y_k = f(U_{1,k}) (formula (8)), where the function f is the Softmax predictor used for pixel-level classification. By combining the prediction results y_k of all prediction branches, the class label set of all pixels in the image to be processed is obtained, forming the final segmentation and yielding the segmented image.
  • the branch predictor 1 classifies and predicts the pixel region 1
  • the branch predictor 2 classifies and predicts the pixel region 2
  • the branch predictor 3 classifies and predicts the pixel region 3.
  • the classification results are combined to output the final segmented image.
  • By constructing different branch feature maps for different scene resolutions, each prediction branch focuses on the semantic segmentation of the pixel area corresponding to a specific scene resolution, and the classification results are then combined into the segmented image, which effectively improves the segmentation accuracy of the segmented image.
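  • The per-branch prediction of formula (8) and the combination of the branch results could look like the following sketch (the predictor heads, tensor shapes, and names are assumptions):

```python
import torch

def predict_and_combine(branch_maps, predictors, branch_masks):
    """branch_maps: K branch feature maps U_{1,k}, each (1, C, H, W);
    predictors: K classification heads (e.g. 1x1 conv producing class logits);
    branch_masks: K boolean (H, W) pixel areas from the discretized depth."""
    h, w = branch_masks[0].shape
    segmentation = torch.zeros((h, w), dtype=torch.long)
    for u, head, mask in zip(branch_maps, predictors, branch_masks):
        y_k = head(u).argmax(dim=1)[0]        # class label per pixel, y_k = f(U_{1,k})
        segmentation[mask] = y_k[mask]        # keep labels inside this branch's area
    return segmentation
```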
  • the image semantic segmentation method further includes the step of: acquiring a depth image corresponding to the image to be processed.
  • the depth image can be obtained by performing depth processing on the image to be processed, or by placing a sensor at the shooting position of the image to be processed, and using the distance information sensed by the sensor.
  • an image semantic segmentation method includes the following steps:
  • S701 Acquire a depth image corresponding to the image to be processed.
  • S702 Discretize the depth image corresponding to the image to be processed according to the preset scene resolution, and determine the pixel area of each level of prediction branch.
  • S703 Perform convolution processing on the image to be processed by using a convolutional neural network to obtain convolution feature maps of each convolutional layer.
  • S704 In each level of prediction branch, perform superpixel division on the image to be processed according to the superpixel division rule determined by the corresponding scene resolution, and determine the superpixels of the image to be processed in each level of prediction branch.
  • S705 Determine the local weight of each neuron according to the neurons within the superpixel to which it belongs.
  • S706 Determine the local enhancement feature of the corresponding neuron according to the local weight.
  • S707 Obtain the global enhancement feature of each neuron according to the local enhancement feature of the neuron and the local enhancement features of the neurons in the adjacent superpixels; the global enhancement features of the neurons compose the context information of the corresponding convolution feature map.
  • S708 Obtain the enhanced feature map of the convolution feature map in the current prediction branch according to the context information of the convolution feature map in the current prediction branch and its enhanced feature map in the superior prediction branch.
  • S709 Combine the enhanced feature maps of the convolution layers on the prediction branches at the same level to obtain the branch feature maps of the prediction branches at all levels.
  • S710 Determine the classification area of each branch feature map according to the pixel area of the prediction branch at each level.
  • S711 Perform classification prediction on the classification area of each branch feature map to obtain the classification result of each level of prediction branch.
  • S712 Combine the classification results to obtain the segmented image of the image to be processed.
  • FIG. 9 compares the results of image semantic segmentation on the NYUD-v2 dataset using three different methods: the first column is the image to be processed; the second column is the corresponding ground truth; the third column is the segmented image obtained by RefineNet; the fourth column is the segmented image obtained by the existing CFN (cascaded feature network); and the fifth column is the segmented image obtained by the method shown in FIG. 7 (with three levels of prediction branches).
  • As can be seen from FIG. 9, the image semantic segmentation method of the present application achieves better segmentation accuracy than existing image semantic segmentation methods. This is because, through the cascade structure and the CARF, the context information of the features can be learned better, so that the context information fully represents superpixels with different contents and the communication between adjacent superpixels is adjusted adaptively to capture more global context information; in addition, cascading the prediction branches enriches the context information of each branch, thereby effectively improving the segmentation accuracy for the image to be processed.
  • the cascaded feature network shown in FIG. 4 is used to perform the steps in the embodiment shown in FIG. 7, and the training process of the cascaded feature network will be described below.
  • ⁇ k represents the pixel set of the pixel area corresponding to the k-th prediction branch; probability Indicates that the predicted pixel (x, y) has a ground-truth label Probability; J k represents the objective function that penalizes pixel-level classification errors.
  • a standard back-propagation algorithm is used to train the cascaded feature network.
  • the features in formula (10) are updated in each iteration.
  • To update the decoded feature maps {U_{l,k} | l = 1, ..., L, k = 1, ..., K}, the gradient of the objective function J with respect to U_{l,k} is computed by combining the definitions of formulas (7)-(10); the computation is given by formula (11).
  • To update the enhanced feature maps {F_{l,k}} produced by the cascaded feature network, the gradient of J with respect to F_{l,k} is computed from formulas (1), (7), and (11), as given by formula (12). The update signal of F_{l,k} acts as a compromise between the back-propagated information of the feature maps U_{l,k} and U_{l,k+1}, and the update-signal term corresponds to the k-th level prediction branch.
  • Through the cascade structure connecting two adjacent levels of prediction branches, the signal of the (k+1)-th level prediction branch affects the update of F_{l,k} during the training phase. Since adjacent prediction branches at each level communicate through the cascade structure, any two levels of prediction branches can be balanced effectively.
  • In the k-th level prediction branch, the update signal is transferred from the feature map Q_{l,k} to the feature map M_{l,k}, which affects the update of the local neurons of M_{l,k}. To update the neuron M_{l,k}(x, y, c) corresponding to a receptive field in image space, the gradient of the objective function J with respect to M_{l,k}(x, y, c) is computed by combining the definition of formula (5); the computation is given by formula (14).
  • The weight map w_g is calculated by applying a 1 × 1 convolution on the feature map M. Since the 1 × 1 convolution does not expand the receptive field of the feature map Q, the corresponding partial derivative reduces to zero and the last term of formula (15) is omitted. When the superpixel S_i has many adjacent superpixels, using formula (15) significantly reduces the computation of partial derivatives.
  • As shown in formula (14), the update of the local neuron M_{l,k}(x, y, c) is affected by the signals of its adjacent superpixels. Although this communication is implemented between adjacent superpixels, non-adjacent superpixels can still influence each other in turn along paths of adjacent superpixels.
  • Through the cascade structure, one prediction branch can receive signals from other prediction branches. In addition, using the adjacency defined by the CARF, signals from other prediction branches can diffuse to any local area within a prediction branch; therefore, the feature map M_{l,k} can be updated by capturing signals of the relationships between neurons in different prediction branches.
  • steps in the embodiments of the present application are not necessarily executed in the order indicated by the step numbers. Unless clearly stated in this article, the execution of these steps is not strictly limited in order, and these steps can be executed in other orders. Moreover, at least a part of the steps in each embodiment may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed and completed at the same time, but may be executed at different times. The order is not necessarily sequential, but may be executed in turn or alternately with at least a part of other steps or sub-steps or stages of other steps.
  • an image semantic segmentation apparatus includes: a pixel division module 1002, a context determination module 1004, an enhanced feature map acquisition module 1006, and a classification prediction module 1008.
  • the pixel division module 1002 is configured to discretize the depth image corresponding to the image to be processed according to a preset scene resolution and determine the pixel area of the prediction branch at each level.
  • the context determination module 1004 is used to determine the context information of the convolution feature map corresponding to the image to be processed in the prediction branches at all levels.
  • the enhanced feature map acquisition module 1006 is configured to obtain the enhanced feature map of the convolution feature map in the current prediction branch according to the context information of the convolution feature map in the current prediction branch and the enhanced feature map in the superior prediction branch.
  • the classification prediction module 1008 is used to perform classification prediction based on the enhanced feature maps and pixel regions of the prediction branches at various levels to obtain a segmented image of the image to be processed.
  • Through the cascading manner, the above image semantic segmentation apparatus transfers the context information of the superior prediction branch to the enhanced feature maps of the lower-level prediction branches, which enriches the context information of the enhanced feature maps in each level of prediction branch and thereby improves the accuracy of classification prediction using the enhanced feature maps at all levels.
  • The context determination module 1004 further includes a superpixel division module, a local enhancement module, and a global enhancement module, wherein:
  • the superpixel division module is used to divide the superpixels of the image to be processed in each prediction branch at each level to determine each superpixel.
  • the local enhancement module is used to determine the local enhancement feature of each neuron in the convolution feature map corresponding to the image to be processed, and the local enhancement feature is determined by each neuron in the superpixel to which it belongs.
  • The global enhancement module is used to obtain the global enhancement feature of each neuron according to the local enhancement feature of the neuron and the local enhancement features of the neurons in the adjacent superpixels; the global enhancement features of the neurons respectively compose the context information of the corresponding convolution feature map.
  • the local enhancement module includes a local weight determination module and a local feature determination module.
  • the local weight determination module is used to determine the local weight of each neuron according to the neuron in the superpixel to which each neuron belongs;
  • the local feature determination module is used to determine the local enhancement feature of the corresponding neuron according to the local weight.
  • The superpixel division module is also used to perform superpixel division on the image to be processed in each level of prediction branch according to the superpixel division rules determined by different scene resolutions, and to determine the superpixels of the image to be processed in each level of prediction branch.
  • the classification prediction module 1008 includes a branch prediction module and a combination module.
  • the branch prediction module is used to obtain the classification results of the prediction branches at all levels according to the enhanced feature maps and pixel regions of the prediction branches at each level respectively;
  • the combination module is used to combine the classification results to obtain the segmented images of the image to be processed.
  • the branch prediction module includes a feature map merging module and a classification result acquisition module.
  • The feature map merging module is used to merge the enhanced feature maps of the convolutional layers on the same-level prediction branch to obtain the branch feature map of each level of prediction branch; the classification result acquisition module is used to classify the corresponding branch feature map according to the pixel area of each level of prediction branch to obtain the classification result of each level of prediction branch.
  • The classification result acquisition module is also used to determine the classification area of each branch feature map according to the pixel area of each level of prediction branch, and to perform classification prediction on the classification area of each branch feature map to obtain the classification result of each level of prediction branch.
  • the image semantic segmentation device further includes a depth image acquisition module, configured to acquire a depth image corresponding to the image to be processed.
  • the image semantic segmentation device further includes a convolution processing module, which is used to perform convolution processing on the image to be processed using a convolutional neural network to obtain convolution feature maps of each convolutional layer.
  • Each module in the above image semantic segmentation device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above modules may be embedded in the hardware or independent of the processor in the computer device, or may be stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided, and its internal structure diagram may be as shown in FIG. 11.
  • the computer equipment includes a processor, memory, network interface, display screen, input device, and microphone array connected by a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and computer programs.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the network interface of the computer device is used to communicate with external terminals through a network connection. When the computer program is executed by the processor, an image semantic segmentation method is realized.
  • the display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen
  • The input device of the computer device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
  • FIG. 11 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • The specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • In an embodiment, a computer device is provided, which includes a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the image semantic segmentation method in any of the above embodiments.
  • One or more non-volatile storage media storing computer-readable instructions are provided; when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the steps of the image semantic segmentation method in any of the above embodiments.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous chain (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.

Abstract

This application relates to an image semantic segmentation method, a computer device, and a storage medium. The method includes: discretizing, according to a preset scene resolution, the depth image corresponding to an image to be processed, and determining the pixel area of each level of prediction branch; determining, in each level of prediction branch, the context information of the convolution feature maps corresponding to the image to be processed; obtaining the enhanced feature map of a convolution feature map in the current prediction branch according to the context information of the convolution feature map in the current prediction branch and its enhanced feature map in the superior prediction branch; and performing classification prediction according to the enhanced feature maps and pixel areas of the prediction branches at all levels to obtain a segmented image of the image to be processed.

Description

Image semantic segmentation method, computer device and storage medium

Technical Field

This application relates to the field of image segmentation technology, and in particular to an image semantic segmentation method, a computer device, and a storage medium.

Background

Image semantic segmentation is an important research topic in the fields of computer vision and pattern recognition and is widely used in AI (Artificial Intelligence) scenarios such as autonomous driving systems and drones. Its goal is to classify each pixel of an image, divide the image into region blocks with certain semantic meaning, identify the category of each region block, and finally obtain a segmented image with semantic annotation.

In existing research on image semantic segmentation, attempts have been made to use the depth information of an image to assist semantic segmentation. However, there is almost no correlation between depth information and the color channels used in semantic segmentation, and the segmentation results of existing methods that use depth information to assist semantic segmentation still need improvement. Therefore, how to make full use of depth information to further improve the accuracy of semantic segmentation remains a current research difficulty.
Summary

According to various embodiments of this application, an image semantic segmentation method, a computer device, and a storage medium are provided.

An image semantic segmentation method includes:

discretizing, according to a preset scene resolution, the depth image corresponding to an image to be processed, and determining the pixel area of each level of prediction branch;

determining, in each level of the prediction branches, the context information of the convolution feature maps corresponding to the image to be processed;

obtaining the enhanced feature map of a convolution feature map in the current prediction branch according to the context information of the convolution feature map in the current prediction branch and its enhanced feature map in the superior prediction branch; and

performing classification prediction according to the enhanced feature maps of the prediction branches at all levels and the pixel areas, to obtain a segmented image of the image to be processed.

A computer device includes a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the image semantic segmentation method in any of the embodiments.

One or more non-volatile storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the image semantic segmentation method in any of the embodiments.

Details of one or more embodiments of this application are set forth in the following drawings and description. Other features, objects, and advantages of this application will become apparent from the specification, the drawings, and the claims.
Brief Description of the Drawings

To describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is an application environment diagram of an image semantic segmentation method in an embodiment;

FIG. 2 is a schematic flowchart of an image semantic segmentation method in an embodiment;

FIG. 3 is a schematic diagram of an RGB image and a depth image in an embodiment;

FIG. 4 is a schematic diagram of image semantic segmentation using a cascaded feature network in an embodiment;

FIG. 5 is a schematic flowchart of a context information obtaining step in an embodiment;

FIG. 6 is a schematic diagram of the effects of superpixel division and feature enhancement processing in an embodiment;

FIG. 7 is a schematic flowchart of an image semantic segmentation method in an embodiment;

FIG. 8 is a schematic diagram of the cascade structure and decoder processing in an embodiment;

FIG. 9 is a comparison diagram of image semantic segmentation effects in an embodiment;

FIG. 10 is a structural block diagram of an image semantic segmentation apparatus in an embodiment;

FIG. 11 is a structural block diagram of a computer device in an embodiment.
Detailed Description

To make the objectives, technical solutions, and advantages of this application clearer, this application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain this application and do not limit its scope of protection.

The image semantic segmentation method provided by this application can be applied in the application environment shown in FIG. 1. When detecting an image semantic segmentation instruction, the terminal 102 performs image semantic segmentation on an image to be processed to obtain a segmented image corresponding to the image to be processed. Specifically, the terminal 102 may perform the steps of the image semantic segmentation method in any of the following embodiments. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and servers.

In one embodiment, as shown in FIG. 2, an image semantic segmentation method is provided. The method is described using its application to the terminal 102 in FIG. 1 as an example and includes the following steps:
S202: According to a preset scene resolution, discretize the depth image corresponding to the image to be processed, and determine the pixel area of each level of prediction branch.

The image to be processed refers to a color image that needs to undergo image semantic segmentation; specifically, the image to be processed is a color image in RGB format. A depth image refers to an image or image channel that contains distance information to the surfaces of scenes/objects. A depth image is similar to a grayscale image, except that each pixel value is the actual distance from the sensor or camera to the object; this pixel value may also be called the depth or depth value. Usually an RGB image and its depth image are registered, so there is a one-to-one correspondence between their pixels.

Scene resolution is a collective term for the resolution of scenes and objects. More specifically, a scene resolution refers to a depth interval of scenes and objects: different scene resolutions correspond to different depth intervals, a high scene resolution corresponds to a low depth interval, and a low scene resolution corresponds to a high depth interval.

FIG. 3 shows an RGB image and its corresponding depth image in an embodiment: the near field is composed of pixels with high scene resolution (light areas), and the far field is composed of pixels with low scene resolution (dark areas). In the depth image of FIG. 3, the darker the color, the higher the depth value of the scene/object and the lower the scene resolution. In other embodiments, a darker color may instead indicate a lower depth value and a higher scene resolution, or different depth ranges may be indicated by different color shades, which is not limited herein. Moreover, in areas with lower scene resolution, objects and scenes coexist densely, and more complex correlations are formed between objects/scenes than in areas with higher scene resolution.

In this embodiment, the depth image corresponding to the image to be processed is discretized according to the preset scene resolution, pixels whose depth values fall in the same depth interval are assigned the same scene resolution, and each level of prediction branch corresponds to one of the scene resolutions; for example, the k-th level prediction branch corresponds to the k-th scene resolution, thereby determining the pixel area of each level of prediction branch.

The pixel area of each level of prediction branch refers to the pixel area predicted by that branch; the pixel areas of different branches do not overlap, and together the pixel areas of all branches make up the entire image.

The relationship between the level of a prediction branch and the level of its scene resolution may be a positive or a negative correlation: for a positive correlation, a higher branch level corresponds to a higher scene resolution; for a negative correlation, a lower branch level corresponds to a higher scene resolution. For ease of description, it is assumed in the embodiments that the branch level is negatively correlated with the scene resolution; for example, if a total of K levels of prediction branches are included, the scene resolution of the level-1 prediction branch is the highest and that of the level-K prediction branch is the lowest.

The number of preset scene resolutions is the same as the number of configured prediction branches. Referring to FIG. 4, the number of prediction branches is configured as 3, and a discrete depth image is obtained by discretizing the depth image. As can be seen from the discrete depth image, the depth image is divided into three pixel areas with different scene resolutions: pixel area 1 at the level-1 scene resolution, pixel area 2 at the level-2 scene resolution, and pixel area 3 at the level-3 scene resolution.
S204: Determine, in each level of prediction branch, the context information of the convolution feature maps corresponding to the image to be processed.

A convolution feature map is a feature map obtained by performing convolution processing on the image to be processed. The convolution processing usually includes multiple convolutional layers, so the number of convolution feature maps corresponding to the image to be processed is the same as the number of convolutional layers. Specifically, obtaining the convolution feature maps includes: performing convolution processing on the image to be processed with a convolutional neural network to obtain the convolution feature map of each convolutional layer, where the convolutional neural network may be a commonly used CNN (Convolutional Neural Network).

In this embodiment, for each level of prediction branch, the context information of each neuron in the convolution feature map of each layer is determined separately, and the context information of a convolution feature map is composed of the context information of its neurons.

Context information refers to the interaction information between different objects and between objects and the scene. In an image, an object cannot exist in isolation; it necessarily has some relationship with other surrounding objects and the scene, which is what is commonly referred to as context information. For example, in a photograph of a road, the road usually contains pedestrians and vehicles, and there is a certain co-occurrence between the road, the pedestrians, and the vehicles; the information reflecting this co-occurrence is the context information. Such context information helps classify and predict pedestrians and vehicles; for example, an object appearing on the road is more likely to be a pedestrian or a vehicle. For a convolution feature map, its context information is the combination of the context information of its neurons.
S206: Obtain the enhanced feature map of the convolution feature map in the current prediction branch according to the context information of the convolution feature map in the current prediction branch and its enhanced feature map in the superior prediction branch.

The current prediction branch is any level of prediction branch; naming an arbitrary level of prediction branch the current prediction branch indicates that, in the steps describing the current prediction branch, the same steps are performed for every level of prediction branch. The superior prediction branch refers to the prediction branch whose scene resolution is one level higher than that of the current prediction branch. Still taking K levels of prediction branches as an example, when the current prediction branch is the K-th level branch, the superior prediction branch is the (K-1)-th level branch.

Specifically, the context information of the convolution feature map in the current prediction branch and its enhanced feature map in the superior prediction branch are added to obtain its enhanced feature map in the current prediction branch.

Referring further to FIG. 8, the schematic diagram shows the cascade structure with three levels of prediction branches. Suppose there is a set of convolution feature maps {B_l | l = 1, ..., L} from L different convolutional layers. For the convolution feature map B_l of layer l, a cascade of K levels of prediction branches realizes the semantic segmentation of K regions with different scene resolutions, the scene resolution of the level-1 prediction branch being the highest. Given the depth image D, each pixel is projected by discretization onto one of the K prediction branches, and each prediction branch classifies a group of pixels within a specific pixel area. Given a color image I as input, the enhanced feature map F_{l,k} output by the k-th level prediction branch is:

F_{l,k} = F_{l,k-1} + Q_{l,k},  k = 1, ..., K        (1)

In formula (1), F_{l,k} denotes the enhanced feature map of the convolution feature map B_l at the k-th level prediction branch; F_{l,k-1} denotes its enhanced feature map at the (k-1)-th level prediction branch; and Q_{l,k} denotes its context information at the k-th level prediction branch. In this embodiment, the k-th level prediction branch is the current prediction branch and the (k-1)-th level prediction branch is the superior prediction branch.

Since the level-1 prediction branch has no superior prediction branch, it is stipulated that when the current prediction branch is the level-1 branch, the enhanced feature map of the superior branch is the convolution feature map of the corresponding layer, that is, F_{l,0} = B_l.

Each prediction branch focuses on the classification prediction of a specific scene resolution, and cascading the parallel prediction branches enriches the context information of the convolution feature maps in each prediction branch, thereby improving the overall segmentation performance.

S208: Perform classification prediction according to the enhanced feature maps of the prediction branches at all levels and the pixel areas, to obtain a segmented image of the image to be processed.

Specifically, according to the enhanced feature maps of the prediction branches at all levels, each level of prediction branch classifies its corresponding pixel area to obtain the classification result of that pixel area, and a complete segmented image is obtained by combining the classification results of all pixel areas.
In the above image semantic segmentation method, the depth image is discretized according to the preset scene resolution to determine the pixel area of each level of prediction branch; the context information of the convolution feature maps corresponding to the image to be processed is determined in each level of prediction branch; and then, in a cascading manner, the enhanced feature map of a convolution feature map in the current prediction branch is obtained from its enhanced feature map in the superior prediction branch and its context information in the current prediction branch. In this cascading manner, the context information of the superior prediction branch is transferred to the enhanced feature maps of the lower-level prediction branches, which enriches the context information of the enhanced feature maps in each level of prediction branch and thereby improves the accuracy of classification prediction using the enhanced feature maps at all levels.

In one embodiment, the context information is determined through two stages of enhancement: local enhancement and global enhancement. As shown in FIG. 5, step S204 further includes the following sub-steps:
S502: In each level of prediction branch, perform superpixel division on the image to be processed, and determine the superpixels.

Superpixel division is the process of subdividing a digital image into multiple image sub-regions. A superpixel is a small region composed of a series of adjacent pixels with similar characteristics such as color, brightness, and texture.

In this embodiment, each level of prediction branch performs superpixel division on the image to be processed with a superpixel division tool based on a preset superpixel division rule, and the superpixels of the image to be processed are determined in each level of prediction branch. By superpixel division, the image to be processed is divided into regions defined by non-overlapping superpixels, so that context information can be extracted per superpixel. The superpixel division rule may be, for example, the size range of each superpixel.

In a specific embodiment, step S502 includes: in each level of prediction branch, performing superpixel division on the image to be processed according to a superpixel division rule determined by the corresponding scene resolution, and determining the superpixels of the image to be processed in each level of prediction branch.

Different superpixel division rules are configured for different scene resolutions. Because the scene resolution of each level of prediction branch is different, the superpixel division rule corresponding to each scene resolution is first determined, and each level of prediction branch then performs superpixel division on the image to be processed with the superpixel division tool according to the determined rule, determining the superpixels of the image to be processed in each level of prediction branch.

Usually, for a prediction branch corresponding to a low scene resolution, the superpixels are larger and contain more object and scene information; for a branch corresponding to a high scene resolution, the superpixels are smaller and finer to avoid excessive diversity of information. Adaptively adjusting the superpixel size according to the scene resolution helps capture the complex object/scene relationships in different regions. FIG. 6(a) shows superpixel division under three different scene resolutions in an embodiment.
S504: Determine the local enhancement feature of each neuron in the convolution feature maps corresponding to the image to be processed, the local enhancement feature being determined by the neurons within the superpixel to which the neuron belongs.

In this embodiment, the local enhancement feature of a neuron is determined by the neurons within the superpixel to which that neuron belongs, that is, the local enhancement feature is determined through the first-stage local enhancement. The superpixel to which a neuron belongs is the superpixel in which the neuron is located.

Further, step S504 includes: determining the local weight of each neuron according to the neurons within the superpixel to which it belongs; and determining the local enhancement feature of the corresponding neuron according to the local weight.

Suppose that, for a given image to be processed I, the superpixel division tool generates a group of non-overlapping superpixels {S_i}, satisfying ∪_i S_i = I and S_i ∩ S_j = ∅ for i ≠ j, where S_i denotes the i-th superpixel and S_j denotes the j-th superpixel. In the first-stage local enhancement, the neurons located in the same superpixel are enhanced. First, the local weight of each neuron is determined, where the local weight w_i is obtained by the following formula:

w_i(c) = σ[W^T B_i(c)]        (2)

where W denotes a weight matrix learned through training, σ denotes a fully connected layer with a sigmoid activation function, and B_i(c) is computed by formula (3) from the neurons of the convolution feature map B located in the superpixel S_i.

Then, the local enhancement feature of the neuron is determined according to the local weight by the following formula:

M(x, y, c) = w_i(c) · B(x, y, c)        (4)

where B denotes the convolution feature map; B(x, y, c) denotes a neuron in the convolution feature map B; (x, y) denotes the coordinates of the neuron in the feature map, with (x, y) ∈ Φ(S_i); c denotes the feature channel index; w_i denotes the local weight; M denotes the first feature map generated by local enhancement; and M(x, y, c) denotes the neuron in the first feature map M, that is, the local enhancement feature of the neuron B(x, y, c) in the convolution feature map B.

FIG. 6(b) is a schematic diagram of local enhancement by local weighting. This embodiment realizes local enhancement by local weighting and introduces weights to avoid using the same features to represent different neurons. Weighting the neuron B(x, y, c) by w_i produces the neuron M(x, y, c), so that M(x, y, c) differs in its representation from the other neurons in S_i and is highly sensitive to the overall content of S_i conveyed by the weight w_i.
S506: Obtain the global enhancement feature of each neuron according to the local enhancement feature of the neuron and the local enhancement features of the neurons in the adjacent superpixels; the global enhancement features of the neurons respectively compose the context information of the corresponding convolution feature map.

For any convolution feature map, the second-stage global enhancement is performed using the local enhancement feature of each neuron and the local enhancement features of the neurons in the adjacent superpixels to obtain the global enhancement features, and the global enhancement features of the neurons in that convolution feature map compose its context information. The adjacent superpixels of a neuron are the superpixels adjacent to the superpixel to which that neuron belongs. The context information of a convolution feature map is itself a feature map, namely the second feature map Q mentioned below.

In the second-stage global enhancement, a second feature map Q is generated by gathering the local enhancement features within adjacent superpixels; the neuron Q(x, y, c) in the second feature map is expressed by formula (5), in which N(S_i) denotes the set of superpixels adjacent to superpixel S_i, and (x', y') denotes the coordinates of a neuron in an adjacent superpixel of S_i, with (x', y') ∈ Φ(S_j). The first global weight and the second global weight (the weight of the adjacent superpixels) are computed by formula (6), in which |Φ(S_i)| denotes the number of receptive field centers located in superpixel S_i, |Φ(S_j)| denotes the number of receptive field centers located in superpixel S_j, w_s denotes the first weight map, obtained by performing a 3 × 3 convolution on the first feature map M, and w_a denotes the second weight map, obtained by performing a 1 × 1 convolution on the first feature map M.

As shown in formula (5), the first global weight controls the information of M(x, y, c) passed to the neuron Q(x, y, c) at the same position, and the second global weight controls the information of M(x', y', c) passed to Q(x, y, c) from adjacent superpixels, giving access to information over the global range. Each neuron Q(x, y, c) represents the context information of the neuron B(x, y, c). For ease of description, the context information obtained by the local enhancement of formula (4) and the global enhancement of formula (5) is defined as a CARF (context-aware receptive field); CARF also denotes the network model that performs the local enhancement of formula (4) and the global enhancement of formula (5).

As shown in FIG. 6(c), after local enhancement the neurons in each superpixel are aggregated; as shown in FIG. 6(d), each neuron is then further enhanced by global enhancement, where global weighting uses the content of adjacent superpixels to form the CARF.

By adopting the CARF, the context information of the features can be learned better, and the context information obtained in this way can mitigate the negative impact of mixing the features of regions that are too small or too large. Further, the two-stage enhancement of the convolution feature maps through local enhancement and global enhancement enables the context information to fully represent superpixels with different contents and adaptively adjusts the communication between adjacent superpixels, thereby capturing more global context information.
In one embodiment, step S208 further includes the following sub-steps: obtaining the classification result of each level of prediction branch according to the enhanced feature maps and the pixel area of that branch; and combining the classification results to obtain the segmented image of the image to be processed.

Specifically, each level of prediction branch obtains the classification result of its pixel area according to the obtained enhanced feature maps and pixel area. Since the pixel area of each level of prediction branch is different, the classification results of the branches correspond to different pixel areas. The segmented image of the image to be processed is then obtained by combining the classification results of all pixel areas.

Further, obtaining the classification result of each level of prediction branch according to its enhanced feature maps and pixel area includes: merging the enhanced feature maps of the convolutional layers on the same-level prediction branch to obtain the branch feature map of each level of prediction branch; and classifying the corresponding branch feature map according to the pixel area of each level of prediction branch to obtain the classification result of each level of prediction branch.

The resolution of the branch feature map is the same as the maximum resolution of the convolution feature maps; for ease of description, the maximum resolution of the convolution feature maps is defined as the first resolution. Specifically, for the enhanced feature maps of the convolutional layers on a same-level prediction branch, the enhanced feature maps are merged one by one through upsampling, in order of increasing resolution, until the enhanced feature map at the first resolution has been merged, and the branch feature map of that level of prediction branch is obtained; refer to the decoder shown in FIG. 8. Then, according to the pixel area of that level of prediction branch, the corresponding pixel area in the branch feature map is classified to obtain the classification result of each level of prediction branch.

Specifically, merging the enhanced feature maps on a same-level prediction branch is realized by the following formula:

U_{l,k} = F_{l,k} + H_{l+1,k} * U_{l+1,k},  k = 1, ..., K        (7)

where U_{L+1,k} = 0, so that U_{L,k} = F_{L,k}; H_{l+1,k} denotes an upsampling convolution kernel that matches the sizes of the feature maps U_{l+1,k} and F_{l,k}. For the upsampled feature map H_{l+1,k} * U_{l+1,k}, the neuron H_{l+1,k} * U_{l+1,k}(x, y, c) contains information of neurons located in adjacent superpixels. The feature map U_{1,k} finally obtained by formula (7) is the branch feature map.

By combining the enhanced feature maps of all layers on the same-level branch, a high-resolution branch feature map is generated, so that the branch feature map used for classification prediction carries more feature information, which yields higher segmentation accuracy.

In one embodiment, classifying the corresponding branch feature maps according to the pixel areas of the prediction branches at all levels to obtain their classification results includes: determining the classification area of each branch feature map according to the pixel area of each level of prediction branch; and performing classification prediction on the classification area of each branch feature map to obtain the classification result of each level of prediction branch.

For example, given all pixels in the pixel area assigned to the k-th level prediction branch, the branch feature map U_{1,k} is input to the predictor of the corresponding prediction branch, and the predictor outputs the class labels of all pixels in that pixel area as the set y_k:

y_k = f(U_{1,k})        (8)

where the function f is the Softmax predictor used for pixel-level classification. By combining the prediction results y_k of all prediction branches, the class label set y of all pixels in the image to be processed I is obtained, forming the final segmentation y on I and yielding the segmented image.

Referring to FIG. 4, branch predictor 1 classifies pixel area 1, branch predictor 2 classifies pixel area 2, and branch predictor 3 classifies pixel area 3; the classification results of the three branch predictors are combined to output the final segmented image.

By constructing different branch feature maps for different scene resolutions, each level of prediction branch focuses on the semantic segmentation of the pixel area corresponding to a specific scene resolution, and the classification results are then combined into the segmented image, which effectively improves the segmentation accuracy of the segmented image.
In one embodiment, the image semantic segmentation method further includes the step of acquiring a depth image corresponding to the image to be processed. The depth image can be obtained by performing depth processing on the image to be processed, or by placing a sensor at the shooting position of the image to be processed and using the distance information sensed by the sensor.

In one embodiment, as shown in FIG. 7, an image semantic segmentation method is provided, which includes the following steps:

S701: Acquire a depth image corresponding to the image to be processed.

S702: According to a preset scene resolution, discretize the depth image corresponding to the image to be processed, and determine the pixel area of each level of prediction branch.

S703: Perform convolution processing on the image to be processed with a convolutional neural network to obtain the convolution feature map of each convolutional layer.

S704: In each level of prediction branch, perform superpixel division on the image to be processed according to the superpixel division rule determined by the corresponding scene resolution, and determine the superpixels of the image to be processed in each level of prediction branch.

S705: Determine the local weight of each neuron according to the neurons within the superpixel to which it belongs.

S706: Determine the local enhancement feature of the corresponding neuron according to the local weight.

S707: Obtain the global enhancement feature of each neuron according to the local enhancement feature of the neuron and the local enhancement features of the neurons in the adjacent superpixels; the global enhancement features of the neurons respectively compose the context information of the corresponding convolution feature map.

S708: Obtain the enhanced feature map of the convolution feature map in the current prediction branch according to the context information of the convolution feature map in the current prediction branch and its enhanced feature map in the superior prediction branch.

S709: Merge the enhanced feature maps of the convolutional layers on the same-level prediction branch to obtain the branch feature map of each level of prediction branch.

S710: Determine the classification area of each branch feature map according to the pixel area of each level of prediction branch.

S711: Perform classification prediction on the classification area of each branch feature map to obtain the classification result of each level of prediction branch.

S712: Combine the classification results to obtain the segmented image of the image to be processed.
Further, using the method steps shown in FIG. 7, comparative experiments were conducted on the segmentation performance with 1, 2, 3, 4, and 5 prediction branches; the results are shown in Table 1.

Table 1: Comparison of segmentation performance for different numbers of prediction branches (table content not reproduced here)

As can be seen from Table 1, with 1 prediction branch the segmentation accuracy reaches only 43.7%, the worst result; as the number of prediction branches is increased to 3, the segmentation accuracy rises steadily and is highest, 46.4%, with 3 prediction branches. This is because a single prediction branch uses only one CARF and therefore cannot realize context-information representations specific to different scene resolutions, whereas adding prediction branches correspondingly adds CARFs and yields more scene-resolution-specific context information. Table 1 also shows that further increasing the number of prediction branches, i.e., using 4 or 5 branches, reduces segmentation accuracy. This is because larger superpixels are used as the number of prediction branches grows, and overly large superpixels are unsuitable: they greatly diversify the objects/scenes and thus disperse the stable patterns that the cascade structure should learn.

Furthermore, with two prediction branches, comparative experiments were conducted on three variants: using neither local weighting nor global weighting, using only local weighting, and using both local weighting and global weighting; the results are shown in Table 2. Here local weighting refers to the local enhancement scheme of formula (4), and global weighting refers to the global enhancement scheme of formula (5).

Table 2: Comparison of the effects of local weighting and/or global weighting (table content not reproduced here)

As can be seen from Table 2, segmentation accuracy is highest when the local weighting of formula (4) and the global weighting of formula (5) are used together.
FIG. 9 compares the results of image semantic segmentation on the NYUD-v2 dataset using three different methods: the first column is the image to be processed; the second column is the corresponding ground truth; the third column is the segmented image obtained by RefineNet; the fourth column is the segmented image obtained by the existing CFN (cascaded feature network); and the fifth column is the segmented image obtained by the method shown in FIG. 7 (with three levels of prediction branches).

As can be seen from FIG. 9, the image semantic segmentation method of this application achieves better segmentation accuracy than existing image semantic segmentation methods. This is because, through the cascade structure and the CARF, the context information of the features can be learned better, so that the context information fully represents superpixels with different contents and the communication between adjacent superpixels is adjusted adaptively to capture more global context information; in addition, cascading the prediction branches enriches the context information of each branch, thereby effectively improving the segmentation accuracy for the image to be processed.
Taking the cascaded feature network shown in FIG. 4 as an example, which is used to perform the steps in the embodiment shown in FIG. 7, the training process of the cascaded feature network is described below.

Let y* denote the ground truth of the image to be processed I, and compute the segmentation of I by formula (8). To train the cascaded feature network for segmentation, an overall objective function J is defined as formula (9), and network training is completed by minimizing the objective J in formula (9). In formulas (9)-(10), Ω_k denotes the pixel set of the pixel area corresponding to the k-th level prediction branch; the probability term denotes the probability that the predicted pixel (x, y) takes its ground-truth label; and J_k denotes the objective function that penalizes pixel-level classification errors.

In addition, a standard back-propagation algorithm is used to train the cascaded feature network; in the standard back-propagation stage, the features in formula (10) are updated in each iteration. To update the decoded feature maps {U_{l,k} | l = 1, ..., L, k = 1, ..., K}, the gradient of the objective function J with respect to U_{l,k} is computed by combining the definitions of formulas (7)-(10); the computation is given by formula (11).

To update the enhanced feature maps {F_{l,k} | l = 1, ..., L, k = 1, ..., K} produced by the cascaded feature network, the gradient of the objective function J with respect to F_{l,k} is computed by combining the definitions of formulas (1), (7), and (11); the computation is given by formula (12). The update signal of F_{l,k} acts as a compromise between the back-propagated information of the feature maps U_{l,k} and U_{l,k+1}, and the update-signal term corresponds to the k-th level prediction branch. Through the cascade structure connecting two adjacent levels of prediction branches, the signal of the (k+1)-th level prediction branch affects the update of F_{l,k} during the training phase. Since adjacent prediction branches at each level communicate through the cascade structure, any two levels of prediction branches can be balanced effectively.

The feature maps {Q_{l,k} | l = 1, ..., L, k = 1, ..., K} produced by the CARF are updated during network training. Combining the definition of formula (1), the gradient of the objective function J with respect to Q_{l,k} is computed as given in formula (13), in which one term can be obtained from formula (12) and the other term is regarded as the gradient propagated to the neuron Q_{l,k}(x, y, c).

In the k-th level prediction branch, the update signal is transferred from the feature map Q_{l,k} to the feature map M_{l,k}, affecting the update of the local neurons of M_{l,k}. To update the neuron M_{l,k}(x, y, c) corresponding to a receptive field in image space, the gradient of the objective function J with respect to M_{l,k}(x, y, c) is computed by combining the definition of formula (5); the computation is given by formula (14), in which (x, y) ∈ Φ(S_i) and the partial-derivative terms are computed, further combining the definitions of formulas (5) and (6), as given by formula (15).

In formula (15), the weight map w_g is calculated by applying a 1 × 1 convolution on the feature map M. Since the 1 × 1 convolution does not expand the receptive field of the feature map Q, the corresponding partial derivative reduces to zero and the last term of formula (15) is omitted. When the superpixel S_i has many adjacent superpixels, using formula (15) significantly reduces the computation of partial derivatives. In addition, the remaining partial-derivative term in formula (14) is computed as given by formula (16).

As shown in formula (14), the update of the local neuron M_{l,k}(x, y, c) is affected by the signals of its adjacent superpixels. Although this communication is implemented between adjacent superpixels, non-adjacent superpixels can still influence each other in turn along paths of adjacent superpixels. Through the cascade structure, one prediction branch can receive signals from other prediction branches. Furthermore, using the adjacency defined by the CARF, signals from other prediction branches can diffuse to any local area within a prediction branch; therefore, the feature map M_{l,k} can be updated by capturing signals of the relationships between neurons in different prediction branches.
It should be understood that the steps in the embodiments of this application are not necessarily executed in the order indicated by the step numbers. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and the steps may be executed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential; they may be executed in turn or alternately with at least part of the other steps or the sub-steps or stages of other steps.
In one embodiment, an image semantic segmentation apparatus is provided. Referring to FIG. 10, the image semantic segmentation apparatus 1000 includes a pixel division module 1002, a context determination module 1004, an enhanced feature map acquisition module 1006, and a classification prediction module 1008.

The pixel division module 1002 is configured to discretize the depth image corresponding to the image to be processed according to a preset scene resolution and to determine the pixel area of each level of prediction branch.

The context determination module 1004 is configured to determine, in each level of prediction branch, the context information of the convolution feature maps corresponding to the image to be processed.

The enhanced feature map acquisition module 1006 is configured to obtain the enhanced feature map of the convolution feature map in the current prediction branch according to the context information of the convolution feature map in the current prediction branch and its enhanced feature map in the superior prediction branch.

The classification prediction module 1008 is configured to perform classification prediction according to the enhanced feature maps of the prediction branches at all levels and the pixel areas, to obtain a segmented image of the image to be processed.

Through the cascading manner, the above image semantic segmentation apparatus transfers the context information of the superior prediction branch to the enhanced feature maps of the lower-level prediction branches, which enriches the context information of the enhanced feature maps in each level of prediction branch and thereby improves the accuracy of classification prediction using the enhanced feature maps at all levels.

In one embodiment, the context determination module 1004 further includes a superpixel division module, a local enhancement module, and a global enhancement module, wherein:

the superpixel division module is configured to perform superpixel division on the image to be processed in each level of prediction branch and to determine the superpixels;

the local enhancement module is configured to determine the local enhancement feature of each neuron in the convolution feature maps corresponding to the image to be processed, the local enhancement feature being determined by the neurons within the superpixel to which the neuron belongs;

the global enhancement module is configured to obtain the global enhancement feature of each neuron according to the local enhancement feature of the neuron and the local enhancement features of the neurons in the adjacent superpixels, the global enhancement features of the neurons respectively composing the context information of the corresponding convolution feature map.

Further, the local enhancement module includes a local weight determination module and a local feature determination module, wherein the local weight determination module is configured to determine the local weight of each neuron according to the neurons within the superpixel to which it belongs, and the local feature determination module is configured to determine the local enhancement feature of the corresponding neuron according to the local weight.

In one embodiment, the superpixel division module is further configured to perform superpixel division on the image to be processed in each level of prediction branch according to the superpixel division rule determined by the corresponding scene resolution, and to determine the superpixels of the image to be processed in each level of prediction branch.

In one embodiment, the classification prediction module 1008 includes a branch prediction module and a combination module, wherein the branch prediction module is configured to obtain the classification result of each level of prediction branch according to its enhanced feature maps and pixel area, and the combination module is configured to combine the classification results to obtain the segmented image of the image to be processed.

Further, the branch prediction module includes a feature map merging module and a classification result acquisition module, wherein the feature map merging module is configured to merge the enhanced feature maps of the convolutional layers on the same-level prediction branch to obtain the branch feature map of each level of prediction branch, and the classification result acquisition module is configured to classify the corresponding branch feature map according to the pixel area of each level of prediction branch to obtain the classification result of each level of prediction branch.

In one embodiment, the classification result acquisition module is further configured to determine the classification area of each branch feature map according to the pixel area of each level of prediction branch, and to perform classification prediction on the classification area of each branch feature map to obtain the classification result of each level of prediction branch.

In one embodiment, the image semantic segmentation apparatus further includes a depth image acquisition module configured to acquire a depth image corresponding to the image to be processed.

In one embodiment, the image semantic segmentation apparatus further includes a convolution processing module configured to perform convolution processing on the image to be processed with a convolutional neural network to obtain the convolution feature map of each convolutional layer.

For specific limitations of the image semantic segmentation apparatus, refer to the limitations of the image semantic segmentation method above, which are not repeated here. Each module in the above image semantic segmentation apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, the processor in a computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, whose internal structure diagram may be as shown in FIG. 11. The computer device includes a processor, a memory, a network interface, a display screen, an input apparatus, and a microphone array connected by a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements an image semantic segmentation method. The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input apparatus may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse. Those skilled in the art will understand that the structure shown in FIG. 11 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution of this application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, which includes a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the image semantic segmentation method in any of the above embodiments.

In one embodiment, one or more non-volatile storage media storing computer-readable instructions are provided; when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the steps of the image semantic segmentation method in any of the above embodiments.

A person of ordinary skill in the art can understand that all or part of the processes of the methods in the above embodiments can be completed by instructing relevant hardware through a computer program, which may be stored in a non-volatile computer-readable storage medium; when executed, the program may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered within the scope of this specification.

The above embodiments express only several implementations of this application, and their description is relatively specific, but they should not therefore be construed as limiting the patent scope. It should be noted that a person of ordinary skill in the art can make several modifications and improvements without departing from the concept of this application, all of which fall within the scope of protection of this application. Therefore, the scope of protection of this application's patent shall be subject to the appended claims.

Claims (20)

  1. An image semantic segmentation method, the method comprising:
    discretizing, according to a preset scene resolution, a depth image corresponding to an image to be processed, and determining a pixel area of each level of prediction branch;
    determining, in each level of the prediction branches, context information of convolution feature maps corresponding to the image to be processed;
    obtaining an enhanced feature map of a convolution feature map in a current prediction branch according to the context information of the convolution feature map in the current prediction branch and an enhanced feature map of the convolution feature map in a superior prediction branch; and
    performing classification prediction according to the enhanced feature maps of the prediction branches at all levels and the pixel areas, to obtain a segmented image of the image to be processed.
  2. The method according to claim 1, wherein determining, in each level of the prediction branches, the context information of the convolution feature maps corresponding to the image to be processed comprises:
    performing, in each level of the prediction branches, superpixel division on the image to be processed, and determining the superpixels;
    determining a local enhancement feature of each neuron in the convolution feature maps corresponding to the image to be processed, the local enhancement feature being determined by the neurons within the superpixel to which the neuron belongs; and
    obtaining a global enhancement feature of each neuron according to the local enhancement feature of the neuron and the local enhancement features of the neurons in adjacent superpixels, the global enhancement features of the neurons respectively composing the context information of the corresponding convolution feature map.
  3. The method according to claim 2, wherein determining the local enhancement feature of each neuron in the convolution feature maps corresponding to the image to be processed comprises:
    determining a local weight of each neuron according to the neurons within the superpixel to which the neuron belongs; and
    determining the local enhancement feature of the corresponding neuron according to the local weight.
  4. The method according to claim 2, wherein performing, in each level of the prediction branches, superpixel division on the image to be processed and determining the superpixels comprises:
    performing, in each level of the prediction branches, superpixel division on the image to be processed according to superpixel division rules determined by different scene resolutions, and determining the superpixels of the image to be processed in each level of the prediction branches.
  5. The method according to claim 1, wherein performing classification prediction according to the enhanced feature maps of the prediction branches at all levels and the pixel areas to obtain the segmented image of the image to be processed comprises:
    obtaining a classification result of each level of the prediction branches according to the enhanced feature maps and the pixel area of that level of prediction branch; and
    combining the classification results to obtain the segmented image of the image to be processed.
  6. The method according to claim 5, wherein obtaining the classification result of each level of the prediction branches according to the enhanced feature maps and the pixel area of that level of prediction branch comprises:
    merging the enhanced feature maps of the convolutional layers on the same-level prediction branch to obtain a branch feature map of each level of the prediction branches; and
    performing classification prediction on the corresponding branch feature map according to the pixel area of each level of the prediction branches to obtain the classification result of each level of the prediction branches.
  7. The method according to claim 6, wherein performing classification prediction on the corresponding branch feature map according to the pixel area of each level of the prediction branches to obtain the classification result of each level of the prediction branches comprises:
    determining a classification area of each branch feature map according to the pixel area of each level of the prediction branches; and
    performing classification prediction on the classification area of each branch feature map to obtain the classification result of each level of the prediction branches.
  8. The method according to claim 1, further comprising:
    acquiring the depth image corresponding to the image to be processed.
  9. The method according to claim 1, further comprising:
    performing convolution processing on the image to be processed with a convolutional neural network to obtain the convolution feature map of each convolutional layer.
  10. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
    discretizing, according to a preset scene resolution, a depth image corresponding to an image to be processed, and determining a pixel area of each level of prediction branch;
    determining, in each level of the prediction branches, context information of convolution feature maps corresponding to the image to be processed;
    obtaining an enhanced feature map of a convolution feature map in a current prediction branch according to the context information of the convolution feature map in the current prediction branch and an enhanced feature map of the convolution feature map in a superior prediction branch; and
    performing classification prediction according to the enhanced feature maps of the prediction branches at all levels and the pixel areas, to obtain a segmented image of the image to be processed.
  11. The computer device according to claim 10, wherein the computer-readable instructions further cause the processor to perform the following steps:
    performing, in each level of the prediction branches, superpixel division on the image to be processed, and determining the superpixels;
    determining a local enhancement feature of each neuron in the convolution feature maps corresponding to the image to be processed, the local enhancement feature being determined by the neurons within the superpixel to which the neuron belongs; and
    obtaining a global enhancement feature of each neuron according to the local enhancement feature of the neuron and the local enhancement features of the neurons in adjacent superpixels, the global enhancement features of the neurons respectively composing the context information of the corresponding convolution feature map.
  12. The computer device according to claim 11, wherein the computer-readable instructions further cause the processor to perform the following steps:
    determining a local weight of each neuron according to the neurons within the superpixel to which the neuron belongs; and
    determining the local enhancement feature of the corresponding neuron according to the local weight.
  13. The computer device according to claim 11, wherein the computer-readable instructions further cause the processor to perform the following step:
    performing, in each level of the prediction branches, superpixel division on the image to be processed according to superpixel division rules determined by different scene resolutions, and determining the superpixels of the image to be processed in each level of the prediction branches.
  14. The computer device according to claim 10, wherein the computer-readable instructions further cause the processor to perform the following steps:
    obtaining a classification result of each level of the prediction branches according to the enhanced feature maps and the pixel area of that level of prediction branch; and
    combining the classification results to obtain the segmented image of the image to be processed.
  15. The computer device according to claim 14, wherein the computer-readable instructions further cause the processor to perform the following steps:
    merging the enhanced feature maps of the convolutional layers on the same-level prediction branch to obtain a branch feature map of each level of the prediction branches; and
    performing classification prediction on the corresponding branch feature map according to the pixel area of each level of the prediction branches to obtain the classification result of each level of the prediction branches.
  16. One or more non-volatile storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    discretizing, according to a preset scene resolution, a depth image corresponding to an image to be processed, and determining a pixel area of each level of prediction branch;
    determining, in each level of the prediction branches, context information of convolution feature maps corresponding to the image to be processed;
    obtaining an enhanced feature map of a convolution feature map in a current prediction branch according to the context information of the convolution feature map in the current prediction branch and an enhanced feature map of the convolution feature map in a superior prediction branch; and
    performing classification prediction according to the enhanced feature maps of the prediction branches at all levels and the pixel areas, to obtain a segmented image of the image to be processed.
  17. The storage media according to claim 16, wherein the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    performing, in each level of the prediction branches, superpixel division on the image to be processed, and determining the superpixels;
    determining a local enhancement feature of each neuron in the convolution feature maps corresponding to the image to be processed, the local enhancement feature being determined by the neurons within the superpixel to which the neuron belongs; and
    obtaining a global enhancement feature of each neuron according to the local enhancement feature of the neuron and the local enhancement features of the neurons in adjacent superpixels, the global enhancement features of the neurons respectively composing the context information of the corresponding convolution feature map.
  18. The storage media according to claim 17, wherein the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    determining a local weight of each neuron according to the neurons within the superpixel to which the neuron belongs; and
    determining the local enhancement feature of the corresponding neuron according to the local weight.
  19. The storage media according to claim 17, wherein the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to perform the following step:
    performing, in each level of the prediction branches, superpixel division on the image to be processed according to superpixel division rules determined by different scene resolutions, and determining the superpixels of the image to be processed in each level of the prediction branches.
  20. The storage media according to claim 16, wherein the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    obtaining a classification result of each level of the prediction branches according to the enhanced feature maps and the pixel area of that level of prediction branch; and
    combining the classification results to obtain the segmented image of the image to be processed.
PCT/CN2018/110918 2018-10-19 2018-10-19 Image semantic segmentation method, computer device and storage medium WO2020077604A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/110918 WO2020077604A1 (zh) 2018-10-19 2018-10-19 Image semantic segmentation method, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/110918 WO2020077604A1 (zh) 2018-10-19 2018-10-19 Image semantic segmentation method, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2020077604A1 true WO2020077604A1 (zh) 2020-04-23

Family

ID=70284419

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/110918 WO2020077604A1 (zh) 2018-10-19 2018-10-19 Image semantic segmentation method, computer device and storage medium

Country Status (1)

Country Link
WO (1) WO2020077604A1 (zh)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106462771A (zh) * 2016-08-05 2017-02-22 深圳大学 一种3d图像的显著性检测方法
CN107403430A (zh) * 2017-06-15 2017-11-28 中山大学 一种rgbd图像语义分割方法
CN108664974A (zh) * 2018-04-03 2018-10-16 华南理工大学 一种基于rgbd图像与全残差网络的语义分割方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
OMID HOSSEINI: "Analyzing modular CNN architectures for joint depth prediction and semantic segmentation", 2017 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA, 24 July 2017 (2017-07-24), XP081398872 *

Similar Documents

Publication Publication Date Title
WO2020216227A9 (zh) Image classification method, data processing method and apparatus
US9436895B1 (en) Method for determining similarity of objects represented in images
WO2021043168A1 (zh) Training method for a pedestrian re-identification network, and pedestrian re-identification method and apparatus
WO2020151167A1 (zh) Target tracking method and apparatus, computer apparatus, and readable storage medium
WO2022001805A1 (zh) Neural network distillation method and apparatus
JP2017062781A (ja) Similarity-based detection of important objects using deep CNN pooling layers as features
WO2021147325A1 (zh) Object detection method and apparatus, and storage medium
CN110473137A (zh) Image processing method and apparatus
CN107688772A (zh) Method and apparatus for entering insurance policy information, computer device and storage medium
CN109544559A (zh) Image semantic segmentation method and apparatus, computer device and storage medium
WO2021018251A1 (zh) Image classification method and apparatus
WO2021218517A1 (zh) Method for obtaining a neural network model, and image processing method and apparatus
CN110765860A (zh) Fall determination method and apparatus, computer device and storage medium
CN114549913B (zh) Semantic segmentation method and apparatus, computer device and storage medium
Yang et al. An improving faster-RCNN with multi-attention ResNet for small target detection in intelligent autonomous transport with 6G
WO2023036157A1 (en) Self-supervised spatiotemporal representation learning by exploring video continuity
CN114219044A (zh) Image classification method and apparatus, terminal and storage medium
CN115512251A (zh) Low-illumination target tracking method for unmanned aerial vehicles based on dual-branch progressive feature enhancement
CN113240120A (zh) Knowledge distillation method and apparatus based on a review mechanism, computer device and medium
CN111104941B (zh) Image orientation correction method and apparatus, and electronic device
Wang et al. Fire detection in video surveillance using superpixel-based region proposal and ESE-ShuffleNet
Xu et al. A novel dynamic graph evolution network for salient object detection
US10789481B2 (en) Video data processing
Hua et al. Visual saliency detection via a recurrent residual convolutional neural network based on densely aggregated features
WO2020077604A1 (zh) Image semantic segmentation method, computer device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18937439

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13.08.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18937439

Country of ref document: EP

Kind code of ref document: A1