WO2020077535A1 - Image semantic segmentation method, computer device, and storage medium - Google Patents

Image semantic segmentation method, computer device, and storage medium

Info

Publication number
WO2020077535A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
feature
input
adjacent
interleaved
Prior art date
Application number
PCT/CN2018/110493
Other languages
English (en)
French (fr)
Inventor
林迪
黄惠
Original Assignee
深圳大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳大学 filed Critical 深圳大学
Priority to PCT/CN2018/110493 priority Critical patent/WO2020077535A1/zh
Publication of WO2020077535A1 publication Critical patent/WO2020077535A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion

Definitions

  • the present application relates to the field of image segmentation technology, and in particular to an image semantic segmentation method, a computer device, and a storage medium.
  • Image semantic segmentation is one of the important research topics in computer vision and pattern recognition, and is widely used in scenarios such as automatic driving systems, drones, and medical imaging. Its goal is to classify each pixel of an image, divide the image into a group of region blocks with certain semantic meaning, identify the category of each region block, and finally obtain an image with semantic annotation.
  • Taking automatic driving systems as an example, image semantic segmentation can separate and classify objects of different categories such as people, vehicles, and trees, and different labeling methods can be applied to different categories of objects to obtain a semantic segmentation image, so that obstacles such as pedestrians and vehicles can be avoided based on the semantic segmentation image.
  • In existing image semantic segmentation methods, the feature maps obtained by convolution are combined in order from low resolution to high resolution.
  • Since low-resolution feature maps lose information during convolution, the feature maps obtained by combining them in this way also suffer from information attenuation, which in turn affects the accuracy of semantic segmentation.
  • According to various embodiments of the present application, an image semantic segmentation method, a computer device, and a storage medium are provided.
  • An image semantic segmentation method includes: performing convolution processing on an image to be processed to obtain a multi-scale feature atlas, and using the multi-scale feature atlas as the input feature atlas of context interleaving processing; performing context interleaving processing on each adjacent feature map pair in the input feature atlas to obtain an interleaved feature atlas; using the interleaved feature atlas as the input feature atlas of the context interleaving processing, and returning to the step of performing context interleaving processing on each adjacent feature map pair, until the obtained interleaved feature atlas includes only one interleaved feature map; and performing semantic prediction on the interleaved feature map to obtain a semantic segmentation image corresponding to the image to be processed.
  • A computer device includes a memory and a processor, where the memory stores computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the image semantic segmentation method in any embodiment.
  • One or more non-volatile storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the image semantic segmentation method in any embodiment.
  • FIG. 1 is an application environment diagram of an image semantic segmentation method in an embodiment
  • FIG. 2 is a schematic flowchart of an image semantic segmentation method in an embodiment
  • FIG. 3 is a schematic diagram of a context interleaving process in an embodiment
  • FIG. 4 is a schematic diagram of performing feature enhancement processing on adjacent feature map pairs in an embodiment
  • FIG. 5 is a schematic flowchart of a step of performing feature enhancement processing in an embodiment
  • FIG. 6 is a schematic diagram of context messaging based on superpixels in an embodiment
  • FIG. 7 is a schematic flowchart of an image semantic segmentation method in an embodiment;
  • FIG. 8 is a comparison diagram of image semantic segmentation results in an embodiment;
  • FIG. 9 is a structural block diagram of an image semantic segmentation apparatus in an embodiment;
  • FIG. 10 is a structural block diagram of a computer device in an embodiment.
  • the image semantic segmentation method provided by this application can be applied in the application environment shown in FIG. 1.
  • When the terminal 102 detects an image semantic segmentation instruction, it uses a convolutional neural network to perform convolution processing (i.e., convolution filtering) on the input image to be processed to obtain a multi-scale feature atlas. Each adjacent feature map pair in the multi-scale feature atlas is then subjected to context interleaving processing, finally producing an interleaved feature map with the same resolution as the image to be processed, and semantic prediction is performed on this interleaved feature map to obtain the semantic segmentation image corresponding to the image to be processed.
  • The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or server.
  • an image semantic segmentation method is provided. Taking the method applied to the terminal 102 in FIG. 1 as an example for illustration, it includes the following steps:
  • S202 Perform convolution processing on the image to be processed to obtain a multi-scale feature atlas, and use the multi-scale feature atlas as an input feature atlas for context interleaving processing.
  • The multi-scale feature atlas refers to a collection of convolution feature maps of different resolutions; multi-scale here means multi-resolution.
  • Specifically, different convolution kernels are applied in turn to the image to be processed and to the convolution feature maps produced by previous convolutions, yielding convolution feature maps of different resolutions. All of these convolution feature maps together form the multi-scale feature atlas. Referring to FIG. 3, convolving X0 yields the convolution feature map X1; convolving X1 yields the corresponding convolution feature map X2; and convolving X2 yields the corresponding convolution feature map X3. X0, X1, X2, and X3 constitute the multi-scale feature atlas.
  • the multi-scale feature atlas is used as the input feature atlas of the context interleaving process, so as to perform the context interleaving process on each convolutional feature map in the multi-scale feature atlas.
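  • As a concrete illustration of step S202, the following is a minimal PyTorch sketch of building such a multi-scale feature atlas. The backbone here (channel counts, kernel sizes, and the `MultiScaleAtlas` name) is a hypothetical stand-in; the patent does not prescribe a specific network, only that successive convolutions yield feature maps of progressively lower resolution:

```python
# Sketch of step S202: build a multi-scale feature atlas X0..X3 by repeated
# convolution; each stage halves the resolution of the previous feature map.
import torch
import torch.nn as nn

class MultiScaleAtlas(nn.Module):
    def __init__(self, in_ch=3, ch=64, levels=3):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, ch, 3, stride=1, padding=1)
        # each stage convolves the previous feature map down to half resolution
        self.stages = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, stride=2, padding=1) for _ in range(levels)
        )

    def forward(self, img):
        x = torch.relu(self.stem(img))      # X0: full resolution
        atlas = [x]
        for stage in self.stages:           # X1, X2, X3: progressively coarser
            x = torch.relu(stage(x))
            atlas.append(x)
        return atlas                        # the multi-scale feature atlas

atlas = MultiScaleAtlas()(torch.randn(1, 3, 128, 128))
print([tuple(f.shape[-2:]) for f in atlas])  # [(128,128), (64,64), (32,32), (16,16)]
```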
  • S204 Perform context interleaving processing on each adjacent feature map pair in the input feature map set to obtain an interleaved feature map set.
  • The adjacent feature map pair refers to a combination of two feature maps of adjacent resolutions in the same input feature atlas. Taking FIG. 3 as an example, X0-X1 is an adjacent feature map pair, X1-X2 is an adjacent feature map pair, and X2-X3 is an adjacent feature map pair.
  • each adjacent feature map pair in the input feature map set is subjected to context interleaving processing to obtain an interleaved feature map corresponding to each adjacent feature map pair, and all the interleaved feature maps form an interleaved feature map set.
  • the context interleaving process refers to the process in which two convolutional feature maps transfer context information to each other in a pair of adjacent feature maps, and finally generate an interleaved feature map.
  • context information refers to the interaction information between different objects and the interaction information between objects and scenes.
  • In an image, an object never exists in isolation; it always has some relationship with the surrounding objects and environment, which is what is commonly called context information.
  • For example, in a photograph of a road, the road usually contains pedestrians and vehicles; a certain co-occurrence exists among the road, pedestrians, and vehicles, and the information reflecting this co-occurrence is the context information.
  • Such context information helps classify and predict pedestrians and vehicles; for example, an object appearing on a road is more likely to be a pedestrian or a vehicle.
  • Step S206: Determine whether the obtained interleaved feature map set includes only one interleaved feature map. If not, go to step S208; otherwise, go to step S210.
  • S208: Use the interleaved feature atlas as the input feature atlas of the context interleaving process, and return to step S204.
  • In this embodiment, when the newly obtained interleaved feature atlas includes at least two interleaved feature maps, context interleaving continues on the interleaved feature maps; the context interleaving process ends only when a single interleaved feature map is finally obtained.
  • As shown in FIG. 3, context-interleaving the adjacent feature map pairs composed of up-sampled feature maps generates new interleaved feature maps, and the adjacent feature map pairs composed of these interleaved feature maps are then context-interleaved in turn; iterating in this way continuously transfers context information between adjacent feature maps and finally yields one interleaved feature map with better classification characteristics.
  • As can be seen from FIG. 3, performing context interleaving in this iterative manner allows context information to propagate along two different dimensions.
  • The first dimension exchanges multi-scale context information between adjacent feature maps along the vertical, deep-level structure; the second dimension, along the horizontal hierarchy, feeds the interleaved feature map generated by one context interleaving process into the next stage of context interleaving.
  • The context information of each feature map is continuously transmitted along these two directions and encoded into the newly generated interleaved feature maps, significantly enhancing the descriptive ability of the features in the interleaved feature map and yielding more accurate semantic labeling.
  • S210 Perform semantic prediction on the interleaved feature map to obtain a semantic segmented image corresponding to the image to be processed.
  • The interleaved feature map used for semantic prediction is the finally obtained interleaved feature map (hereinafter, the final interleaved feature map), which has the same resolution as the image to be processed.
  • the final interleaved feature map is used as a to-be-predicted image to perform semantic prediction, classify and mark objects with different semantics, and obtain a semantic segmentation image corresponding to the image to be processed.
  • the classification identifier may be a different color identifier for different objects, or it may be another representation form that can distinguish objects of different categories.
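  • For illustration, a minimal sketch of such a prediction head follows; the 1x1 classifier, the class count, and the random color palette are assumptions made for demonstration, not values from the patent:

```python
# Sketch of step S210: semantic prediction on the final interleaved feature map.
# A 1x1 convolution scores each pixel per class; argmax picks the class, and a
# (hypothetical) palette maps class ids to colors for the segmentation image.
import torch
import torch.nn as nn

num_classes = 21
classifier = nn.Conv2d(64, num_classes, kernel_size=1)

interleaved = torch.randn(1, 64, 128, 128)        # final interleaved feature map
logits = classifier(interleaved)                  # per-pixel class scores
labels = logits.argmax(dim=1)                     # (1, 128, 128) class ids

palette = torch.randint(0, 256, (num_classes, 3))  # one color per class
segmentation_image = palette[labels]               # (1, 128, 128, 3) colored output
```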
  • In the above image semantic segmentation method, the image to be processed is convolved, and each adjacent feature map pair in the obtained multi-scale feature atlas is context-interleaved to obtain an interleaved feature atlas with richer features. Each adjacent feature map pair in the interleaved feature atlas is then context-interleaved in turn, merging adjacent feature map pairs in a bidirectional and recursive manner, until the obtained interleaved feature atlas includes only one interleaved feature map. Through this recursive context interleaving, the context information of adjacent feature maps is continuously learned, so that the final interleaved feature map has better classification characteristics, and semantic prediction on it yields a more accurate semantic segmentation image.
  • In an embodiment, performing context interleaving on each adjacent feature map pair in the input feature atlas to obtain an interleaved feature atlas includes: performing feature enhancement processing on each adjacent feature map pair in the input feature atlas to obtain the enhanced feature map pair corresponding to each adjacent feature map pair; and merging the two final enhanced feature maps in each enhanced feature map pair to obtain the interleaved feature map corresponding to that adjacent feature map pair, the interleaved feature maps forming the interleaved feature atlas.
  • Feature enhancement processing refers to processing that strengthens the descriptive ability of features. Specifically, for each adjacent feature map pair, one of the two feature maps to be interleaved is used to perform feature enhancement processing on the other, generating two final enhanced feature maps corresponding to the two feature maps to be interleaved; the two final enhanced feature maps constitute one enhanced feature map pair and are then merged to generate an interleaved feature map.
  • Since each adjacent feature map pair in the input feature atlas generates one corresponding interleaved feature map, the interleaved feature maps form an interleaved feature atlas, which is used as the input feature atlas to continue the context interleaving, as shown in the sketch below.
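```python
# Sketch of the recursive scheme in S204-S208: each round interleaves every
# adjacent feature map pair, producing one map per pair, so N maps shrink to
# N-1, then N-2, ... until a single interleaved feature map remains.
# `interleave_pair` is a hypothetical stand-in for interleaving one pair.
def context_interleave_all(atlas, interleave_pair):
    maps = list(atlas)
    while len(maps) > 1:
        # one interleaved map per adjacent pair (high-res, low-res)
        maps = [interleave_pair(maps[i], maps[i + 1]) for i in range(len(maps) - 1)]
    return maps[0]   # resolution matches the highest-resolution input map
```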
  • the resolution of the interleaved feature map is the same as the high-resolution feature map of the corresponding adjacent feature map pair.
  • In an embodiment, performing feature enhancement processing on each adjacent feature map pair in the input feature atlas to obtain the enhanced feature map pair corresponding to each adjacent feature map pair includes: performing feature enhancement processing on each feature map to be interleaved in each adjacent feature map pair according to its corresponding adjacent feature map, to obtain the final enhanced feature map corresponding to each feature map to be interleaved, where the adjacent feature map is the feature map to be interleaved that belongs to the same adjacent feature map pair as the one currently being enhanced; and combining the final enhanced feature maps of the same adjacent feature map pair as an enhanced feature map pair.
  • Specifically, for each adjacent feature map pair, the context information of one feature map to be interleaved is used to perform feature enhancement processing on the other, generating two final enhanced feature maps corresponding to the two feature maps to be interleaved; the two final enhanced feature maps constitute one enhanced feature map pair.
  • An adjacent feature map pair contains two feature maps to be interleaved of adjacent resolutions: one is a high-resolution feature map and the other is a low-resolution feature map. Understandably, when the feature map being enhanced is the high-resolution one, its adjacent feature map is the low-resolution feature map of the same pair; when the feature map being enhanced is the low-resolution one, its adjacent feature map is the high-resolution feature map of the same pair.
  • FIG. 4 is a schematic diagram of feature enhancement processing for a given adjacent feature map pair: the context information of the high-resolution feature map is used to enhance the low-resolution feature map and vice versa, so that the two perform feature enhancement processing on each other and finally generate two final enhanced feature maps, where l represents the resolution level and T represents the maximum number of intra-pair interleaving iterations.
  • S502: Use each adjacent feature map pair in the input feature atlas as an input feature map pair, and use each feature map to be interleaved as an input feature map.
  • S504: Extract the context information of each adjacent input feature map.
  • The adjacent input feature map refers to the other input feature map belonging to the same input feature map pair. Understandably, within one input feature map pair, the high-resolution input feature map is the adjacent input feature map of the low-resolution one, and the low-resolution input feature map is likewise the adjacent input feature map of the high-resolution one.
  • In this embodiment, extracting the context information of each adjacent input feature map is in fact extracting the context information of each input feature map.
  • S506 Perform feature enhancement processing on the corresponding input feature map according to the context information of each adjacent input feature map to obtain an enhanced feature map corresponding to each input feature map.
  • Specifically, the high-resolution input feature map is semantically enhanced according to the context information of the low-resolution input feature map to obtain a high-resolution enhanced feature map, and the low-resolution input feature map is semantically enhanced according to the context information of the high-resolution input feature map to obtain a low-resolution enhanced feature map.
  • As shown in FIG. 4, the context information of the low-resolution input feature map is passed to the high-resolution input feature map, which undergoes semantic enhancement to obtain the high-resolution enhanced feature map; likewise, the context information of the high-resolution input feature map is passed to the low-resolution input feature map, which undergoes semantic enhancement to obtain the low-resolution enhanced feature map, where t denotes the intra-pair interleaving iteration.
  • Step S508: Determine whether the preset total number of intra-pair interleaving iterations has been reached. If not, go to step S510; otherwise, go to step S512.
  • The total number of intra-pair interleaving iterations refers to the total number of times one adjacent feature map pair is context-interleaved. It can be preset as needed; generally it should not be set too large, to avoid overly long processing times.
  • S510: Use each enhanced feature map as an input feature map, combine the enhanced feature maps corresponding to the same input feature map pair as an input feature map pair, and return to step S504.
  • When the total number of intra-pair interleaving iterations has not been reached, feature enhancement processing continues on the newly generated enhanced feature maps in the context-interleaving manner (see the sketch below).
  • S512: Use the finally obtained enhanced feature maps as the final enhanced feature maps.
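```python
# Sketch of the intra-pair loop (S504-S512): for T iterations, the two maps of
# a pair enhance each other using the partner's context. `enhance` is a
# hypothetical stand-in for the LSTM-based enhancement described below, and
# T = 2 is an arbitrary illustration of the preset total.
def enhance_pair(high, low, enhance, T=2):
    for _ in range(T):                  # preset total number of intra-pair interleaving
        new_high = enhance(high, context_of=low)   # low-res context -> high-res map
        new_low = enhance(low, context_of=high)    # high-res context -> low-res map
        high, low = new_high, new_low   # enhanced maps become the next inputs
    return high, low                    # the two final enhanced feature maps
```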
  • Specifically, the context interleaving realized by the steps in FIG. 5 can be performed in a bidirectional and recursive manner through the connections between two LSTM (Long Short-Term Memory) chains, as shown in FIG. 4.
  • In an embodiment, the step of performing feature enhancement processing on the corresponding input feature map according to the context information of each adjacent input feature map, to obtain the enhanced feature map corresponding to each input feature map, includes: obtaining the enhanced features of an input feature map according to the context information of its adjacent input feature map and the input feature map itself; and merging the enhanced features with the corresponding input feature map to obtain the enhanced feature map corresponding to each input feature map.
  • Specifically, within one input feature map pair, the enhanced features of the high-resolution input feature map are obtained from the context information of the low-resolution input feature map and the high-resolution input feature map, and are merged with the high-resolution input feature map to obtain a high-resolution enhanced feature map; the enhanced features of the low-resolution input feature map are obtained from the context information of the high-resolution input feature map and the low-resolution input feature map, and are merged with the low-resolution input feature map to obtain a low-resolution enhanced feature map.
  • the to-be-processed image is an image obtained by superpixel dividing the original image.
  • super-pixel segmentation refers to the process of subdividing a digital image into multiple image sub-regions.
  • a super pixel refers to a small area composed of a series of adjacent pixels with similar characteristics such as color, brightness, and texture.
  • the image semantic segmentation method further includes: performing superpixel segmentation on the original image to obtain a to-be-processed image including a preset number of superpixels.
  • By superpixel-segmenting the original image, the original image is divided into regions defined by multiple non-overlapping superpixels, which facilitates obtaining context information.
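  • For illustration, superpixel segmentation of this kind can be performed with SLIC from scikit-image; the segment count below is an arbitrary choice, not a value from the patent:

```python
# Sketch of the superpixel preprocessing: SLIC divides the original image into
# a preset number of non-overlapping superpixels.
import numpy as np
from skimage.segmentation import slic

original = np.random.rand(128, 128, 3)        # stand-in for the original image
labels = slic(original, n_segments=256, compactness=10.0)
# `labels[h, w]` is the superpixel id of pixel (h, w); adjacent-superpixel
# lookups for context passing can be derived from this label map.
```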
  • In an embodiment, the step of obtaining the enhanced features of an input feature map according to the context information of its adjacent input feature map and the input feature map itself includes: obtaining the enhanced feature of each receptive field center in the input feature map according to the features of each receptive field center in the input feature map and the context information of the corresponding receptive field centers in the adjacent input feature map.
  • The receptive field refers to the area of the input image onto which a pixel of a convolutional layer's output feature map is mapped.
  • The receptive field center is the center of that mapped area.
  • the context information of the receptive field center refers to the characteristics of the receptive field center itself and the characteristics of the receptive field center of the adjacent area.
  • Specifically, each receptive field center in the input feature map is first mapped to determine the corresponding receptive field center in the adjacent input feature map; the features of that receptive field center and of the receptive field centers in its neighboring areas are then obtained, and the feature of the receptive field center in the input feature map is merged with the features obtained from the adjacent input feature map to give the enhanced feature of that receptive field center.
  • the enhanced feature set of each receptive field center in the input feature map is the enhanced feature of the input feature map.
  • Understandably, when the input feature map is the high-resolution input feature map, the corresponding adjacent input feature map is the low-resolution input feature map; when the input feature map is the low-resolution input feature map, the corresponding adjacent input feature map is the high-resolution input feature map.
  • Further, when the image to be processed is obtained by superpixel-segmenting the original image, the context information includes the aggregated feature of the superpixel to which the receptive field center belongs and the aggregated features of adjacent superpixels.
  • The adjacent superpixels are the superpixels adjacent to the superpixel to which the receptive field center belongs.
  • An aggregated feature is the sum of the features of all receptive field centers within the area defined by a superpixel.
  • FIG. 6 is a schematic diagram of superpixel-based context passing. FIG. 6(a) shows the context information of the low-resolution input feature map being passed to the high-resolution input feature map, and FIG. 6(b) shows the context information of the high-resolution input feature map being passed to the low-resolution input feature map.
  • Taking FIG. 6(a) as an example, suppose the high-resolution input feature map has a receptive field center O. According to the positional correspondence, the corresponding receptive field center in the low-resolution input feature map is O', which belongs to superpixel A; the areas adjacent to superpixel A include superpixels B, C, D, E, and J. The features of all receptive field centers in superpixels A, B, C, D, E, and J are aggregated to obtain the aggregated feature of each superpixel, and the context information comprising these aggregated features is passed to the receptive field center O.
  • Given a feature map and an area S_n, aggregating the features of all receptive field centers in S_n gives the aggregated feature of S_n:

    f_{S_n} = \sum_{(h,w) \in \varphi(S_n)} x_{h,w}

  • where (h, w) denotes the coordinates of a receptive field center in the feature map, x_{h,w} denotes the feature of that receptive field center, \varphi(S_n) denotes the set of receptive-field-center coordinates within area S_n, and n is the area identifier.
  • Further, a more global aggregated feature is obtained by aggregating the features of the neighborhood N(S_n) of area S_n (the set of areas defined by adjacent superpixels):

    g_{S_n} = \sum_{m \in N(S_n)} f_{S_m}

  • where m denotes the identifier of each area in the neighborhood N(S_n).
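  • A minimal sketch of these two aggregations follows; the array shapes and the adjacency mapping are illustrative assumptions:

```python
# Sum the features of all receptive field centers inside each superpixel
# (f_{S_n}), then sum over the neighboring superpixels (g_{S_n}).
import numpy as np

def aggregate(features, labels):
    """features: (H, W, C) receptive-field-center features; labels: (H, W) superpixel ids."""
    return {n: features[labels == n].sum(axis=0) for n in np.unique(labels)}

def neighborhood_feature(f_agg, neighbors, n):
    # neighbors[n] lists the superpixels adjacent to superpixel n
    return sum(f_agg[m] for m in neighbors[n])

features = np.random.rand(32, 32, 64)
labels = np.random.randint(0, 8, (32, 32))
f_agg = aggregate(features, labels)                # f_{S_n} per superpixel
g = neighborhood_feature(f_agg, {0: [1, 2]}, 0)    # g_{S_0} from the neighbors of S_0
```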
  • In an embodiment, when LSTM is used for context interleaving, the enhanced features are determined by the gate function values and the cell state, which are in turn determined by the context information of the adjacent input feature map and the corresponding input feature map.
  • Specifically, the step of obtaining the enhanced feature of each receptive field center in the input feature map includes the following sub-steps: obtaining the gate function values and the cell state according to the features of each receptive field center in the input feature map and the context information of the corresponding receptive field center in the adjacent input feature map; and obtaining the enhanced feature of each receptive field center in the input feature map according to the gate function values and the cell state.
  • the gate function value refers to the specific function value of the input gate, input value, forget gate and output gate of the receptive field center in LSTM.
  • In FIG. 4, the hidden features output by the corresponding LSTM unit include the gate function value of the output gate and the cell state.
  • Specifically, following the way LSTM computes gate function values and cell states, the features of each receptive field center in the input feature map and the context information of the corresponding receptive field center in the adjacent input feature map are used to compute the gate function values and the cell state.
  • Further, the step of obtaining the gate function values and the cell state includes the following sub-steps: obtaining the gate function values according to the features of each receptive field center in the input feature map and the context information of the corresponding receptive field center in the adjacent input feature map; and obtaining the cell state according to the gate function values and the historical cell state.
  • The historical cell state refers to the cell state computed the previous time feature enhancement processing was performed on the feature map of the same resolution.
  • Taking stage t, in which the LSTM unit generates an enhanced feature map from an input feature map and the context information of its adjacent input feature map, as an example, the gate function values and cell state can be computed in the standard LSTM form, with z_{h,w} = [x_{h,w}, g_{h,w}] denoting the combination of a receptive field center's own feature and the context information passed from the adjacent input feature map:

    i^t_{h,w} = \sigma(W_i * z_{h,w} + b_i)
    f^t_{h,w} = \sigma(W_f * z_{h,w} + b_f)
    o^t_{h,w} = \sigma(W_o * z_{h,w} + b_o)
    c^t_{h,w} = \tanh(W_c * z_{h,w} + b_c)
    s^t_{h,w} = f^t_{h,w} \odot s^{t-1}_{h,w} + i^t_{h,w} \odot c^t_{h,w}

  • where i, f, o, and c denote the input gate, forget gate, output gate, and input value, respectively; W represents the convolution kernel corresponding to each gate or the cell state; b represents the bias; \sigma is a preset coefficient (the gate activation); and s^t is the cell state, with s^{t-1} the historical cell state, which in turn serves as the historical cell state for the next stage of context interleaving.
  • Further, the step of obtaining the enhanced feature of each receptive field center in the input feature map according to the gate function values and the cell state includes: obtaining the enhanced feature of each receptive field center according to the gate function value of the output gate and the cell state, which is achieved by the following formula:

    r^t_{h,w} = o^t_{h,w} \odot \tanh(s^t_{h,w})

  • where r^t_{h,w} denotes the enhanced feature of the receptive field center (h, w).
  • In an embodiment, the step of merging the enhanced features with the corresponding input feature maps to obtain the enhanced feature map corresponding to each input feature map includes: merging the feature of each receptive field center in the input feature map with its corresponding enhanced feature.
  • Specifically, the feature of each receptive field center in the input feature map is added to the corresponding enhanced feature to give the feature of that receptive field center in the enhanced feature map, i.e.

    \tilde{x}_{h,w} = x_{h,w} + r^t_{h,w}

  • where \tilde{x}_{h,w} denotes the feature of receptive field center (h, w) in the enhanced feature map.
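  • The following sketch assembles the reconstructed update above into one step; using a single shared 1x1 convolution for all four gates is an assumption made for brevity, since the patent's exact formulation is given in figures not reproduced in this text:

```python
# Gates are computed from a receptive field center's own feature and the
# context passed from the adjacent map; the cell state mixes in the historical
# cell state; the enhanced feature is added back onto the input feature.
import torch
import torch.nn as nn

class ContextLSTMStep(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # one conv producing input gate i, input value c, forget gate f, output gate o
        self.gates = nn.Conv2d(2 * ch, 4 * ch, kernel_size=1)

    def forward(self, x, context, cell_prev):
        z = self.gates(torch.cat([x, context], dim=1))
        i, c, f, o = z.chunk(4, dim=1)
        i, f, o = i.sigmoid(), f.sigmoid(), o.sigmoid()
        cell = f * cell_prev + i * c.tanh()   # cell state from the historical cell state
        enhanced = o * cell.tanh()            # enhanced feature per receptive field center
        return x + enhanced, cell             # merged enhanced feature map, new cell state

step = ContextLSTMStep(64)
x = torch.randn(1, 64, 32, 32)
out, cell = step(x, torch.randn(1, 64, 32, 32), torch.zeros(1, 64, 32, 32))
```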
  • By propagating along the LSTM chains, the enhanced feature map comes to contain features with larger receptive fields, i.e., richer global context.
  • Moreover, the cell state of the LSTM memorizes the context information exchanged at different stages.
  • Local context from early stages can therefore easily propagate to the final stage, encoding multi-scale context information, including both local and global information, into the final enhanced feature map.
  • In an embodiment, the step of merging the two final enhanced feature maps in each enhanced feature map pair to obtain the interleaved feature map corresponding to the adjacent feature map pair, the interleaved feature maps forming an interleaved feature atlas, includes the following sub-steps: up-sampling the low-resolution final enhanced feature map in each enhanced feature map pair to obtain an up-sampled feature map whose resolution equals that of the high-resolution final enhanced feature map in the pair; and merging the features of the up-sampled feature map and of the high-resolution final enhanced feature map to obtain the interleaved feature map corresponding to the adjacent feature map pair, the interleaved feature maps forming the interleaved feature atlas.
  • Specifically, a specific up-sampling convolution kernel is used to up-sample the low-resolution final enhanced feature map into an up-sampled feature map, and the corresponding features of the up-sampled feature map and the high-resolution final enhanced feature map are added to obtain the interleaved feature map corresponding to the adjacent feature map pair; the interleaved feature maps form the interleaved feature atlas.
  • Different up-sampling convolution kernels are used for up-sampling between different resolutions.
  • The low-resolution final enhanced feature map is up-sampled and then merged with the high-resolution final enhanced feature map, which is achieved by the following formula:

    Q_l = \tilde{X}_l + \mathrm{up}(\tilde{X}_{l+1}; W_{up})

  • where Q_l denotes the interleaved feature map, \tilde{X}_l and \tilde{X}_{l+1} denote the high- and low-resolution final enhanced feature maps, and W_{up} denotes the up-sampling convolution kernel.
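  • A minimal sketch of this merge, assuming a learned x2 transposed convolution as the up-sampling kernel (an illustrative configuration, not one fixed by the patent):

```python
# A transposed convolution upsamples the low-resolution final enhanced feature
# map to the high-resolution size, and the two maps are added to form Q_l.
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)  # learned upsampling kernel

high = torch.randn(1, 64, 64, 64)   # high-resolution final enhanced feature map
low = torch.randn(1, 64, 32, 32)    # low-resolution final enhanced feature map
q = high + up(low)                  # interleaved feature map Q_l, same size as `high`
```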
  • In an embodiment, as shown in FIG. 7, an image semantic segmentation method is provided, which specifically includes the following steps:
  • Super pixel segmentation is performed on the original image to obtain a to-be-processed image including a preset number of super pixels.
  • S701 Perform convolution processing on the image to be processed to obtain a multi-scale feature atlas, and use the multi-scale feature atlas as an input feature atlas for context interleaving processing.
  • S702 Use each adjacent feature map pair in the input feature map set as an input feature map pair, and use each feature map to be interleaved as an input feature map.
  • S703: Extract the context information of each adjacent input feature map. The context information includes the aggregated feature of the superpixel to which the receptive field center belongs and the aggregated features of adjacent superpixels.
  • S704: Obtain the gate function values according to the features of each receptive field center in the input feature map and the context information of the corresponding receptive field center in the adjacent input feature map.
  • S705: Obtain the cell state according to the gate function values and the historical cell state.
  • S706: Obtain the enhanced feature of each receptive field center in the input feature map according to the gate function values and the cell state.
  • S707 Combine the features of each receptive field center in the input feature map with the corresponding enhanced features to obtain an enhanced feature map corresponding to each input feature map.
  • Step S708: Determine whether the preset total number of intra-pair interleaving iterations has been reached. If not, go to step S709; otherwise, go to step S710.
  • S709: Use each enhanced feature map as an input feature map, combine the enhanced feature maps corresponding to the same input feature map pair as an input feature map pair, and return to step S703.
  • S710: Use the finally obtained enhanced feature maps as the final enhanced feature maps, and combine the final enhanced feature maps of the same adjacent feature map pair as an enhanced feature map pair.
  • S711: Up-sample the low-resolution final enhanced feature map in each enhanced feature map pair to obtain an up-sampled feature map whose resolution equals that of the high-resolution final enhanced feature map in the pair.
  • S712: Merge the features of the up-sampled feature map and the high-resolution final enhanced feature map to obtain the interleaved feature map corresponding to the adjacent feature map pair; the interleaved feature maps form an interleaved feature atlas.
  • Step S713: Determine whether the obtained interleaved feature atlas includes only one interleaved feature map. If not, go to step S714; otherwise, go to step S715.
  • step S714 Use the interleaved feature atlas as the input feature atlas of the context interleaving process, and return to step S702.
  • S715 Perform semantic prediction on the interleaved feature map to obtain a semantic segmented image corresponding to the image to be processed.
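  • Putting the steps together, a high-level driver might look as follows; every component, including the hypothetical `preprocess_with_superpixels` helper, is an illustrative stand-in for the stages sketched earlier:

```python
# Sketch tying steps S701-S715 together, reusing the pieces sketched above
# (superpixel segmentation, multi-scale atlas, pairwise interleaving, and the
# semantic prediction head).
def segment(original, backbone, interleave_pair, predict):
    image = preprocess_with_superpixels(original)    # superpixel division (hypothetical helper)
    atlas = backbone(image)                          # S701: multi-scale feature atlas
    final_map = context_interleave_all(atlas, interleave_pair)  # S702-S714
    return predict(final_map)                        # S715: semantic segmentation image
```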
  • FIG. 8 compares the results of semantically segmenting original images using existing image semantic segmentation methods and the method shown in FIG. 7 against the actual segmentation. The first column shows the original images; the second column shows the ground-truth segmentation; the third column shows segmentation with the ASPP (Atrous Spatial Pyramid Pooling) model; the fourth column shows segmentation with the Encoder-Decoder+ASPP model (an encoder-decoder with the ASPP model); and the fifth column shows segmentation with the implementation shown in FIG. 7.
  • As can be seen from FIG. 8, the method of the present application clearly achieves more accurate segmentation than the existing methods.
  • This is because the present application context-interleaves each adjacent feature map pair in a bidirectional and recursive manner and merges the adjacent feature map pairs, so that the context information of each feature map is continuously transmitted along both the vertical and horizontal dimensions and encoded into the newly generated interleaved feature maps. This significantly enhances the descriptive ability of the features in the interleaved feature maps, gives the final interleaved feature map better classification characteristics, and in turn makes the semantic prediction performed on it yield a more accurate semantic segmentation image.
  • It should be understood that the steps in the embodiments of the present application are not necessarily executed in the order indicated by the step numbers. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and whose execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
  • In an embodiment, as shown in FIG. 9, an image semantic segmentation apparatus 900 is provided, which includes a convolution module 902, a context interleaving module 904, an input feature atlas determination module 906, and a prediction module 908, wherein:
  • the convolution module 902 is configured to perform convolution processing on the image to be processed to obtain a multi-scale feature atlas, and use the multi-scale feature atlas as an input feature atlas for context interleaving processing.
  • the context interleaving module 904 is configured to perform a context interleaving process on each adjacent feature map pair in the input feature map set to obtain an interleaved feature map set.
  • The input feature atlas determination module 906 is configured to use the interleaved feature atlas as the input feature atlas of the context interleaving process, until the obtained interleaved feature atlas includes only one interleaved feature map.
  • the prediction module 908 is used to perform semantic prediction on the interleaved feature map to obtain a semantic segmented image corresponding to the image to be processed.
  • In an embodiment, the context interleaving module includes a feature enhancement module and a feature map merging module, wherein:
  • the feature enhancement module is used to perform feature enhancement processing on each adjacent feature map pair in the input feature map set to obtain enhanced feature map pairs corresponding to each adjacent feature map pair.
  • the feature map merging module is used to combine the two final enhanced feature maps in each enhanced feature map pair to obtain an interleaved feature map corresponding to the adjacent feature map pair.
  • In an embodiment, the feature enhancement module is further configured to perform feature enhancement processing on each feature map to be interleaved in each adjacent feature map pair in the input feature atlas according to the corresponding adjacent feature map, to obtain the final enhanced feature map corresponding to each feature map to be interleaved, and to combine the final enhanced feature maps of the same adjacent feature map pair as an enhanced feature map pair.
  • The adjacent feature map is the feature map to be interleaved that belongs to the same adjacent feature map pair as the feature map to be interleaved in the current feature enhancement processing.
  • In an embodiment, the feature enhancement module includes an input feature map determination module, a context extraction module, and a feature enhancement sub-module, wherein:
  • the input feature map determination module is configured to use each adjacent feature map pair in the input feature map set as an input feature map pair, and each feature map to be interleaved as an input feature map.
  • the context extraction module is used to extract the context information of each adjacent input feature map.
  • the feature enhancement submodule is configured to perform feature enhancement processing on the corresponding input feature map according to the context information of each adjacent input feature map to obtain an enhanced feature map corresponding to each input feature map.
  • Further, the input feature map determination module is also configured to use each enhanced feature map as an input feature map and to combine the enhanced feature maps corresponding to the same input feature map pair as an input feature map pair, until the preset total number of intra-pair interleaving iterations is reached.
  • the feature enhancement sub-module includes an enhanced feature determination module and a feature merge module.
  • The enhanced feature determination module is configured to obtain the enhanced features of an input feature map according to the context information of its adjacent input feature map and the input feature map itself.
  • The feature merging module is configured to merge the enhanced features with the corresponding input feature map to obtain the enhanced feature map corresponding to each input feature map.
  • In an embodiment, the enhanced feature determination module is further configured to obtain the enhanced feature of each receptive field center in the input feature map according to the features of each receptive field center in the input feature map and the context information of the corresponding receptive field center in the adjacent input feature map.
  • the enhanced feature determination module includes a parameter determination module and an enhanced feature determination sub-module.
  • the parameter determination module is used to obtain the gate function value and the cell state according to the characteristics of each receptive field center in the input feature map and the context information of the corresponding receptive field center in the adjacent input feature map.
  • the enhanced feature determination submodule is used to obtain the enhanced feature of each receptive field center in the input feature map according to the gate function value and the state of the cell.
  • the parameter determination module includes a gate function determination module and a cell state determination module.
  • the gate function determination module is used to obtain the gate function value according to the characteristics of each receptive field center in the input feature map and the context information of the corresponding receptive field center in the adjacent input feature map.
  • the cell state determination module is used to obtain the cell state based on the gate function value and the historical cell state.
  • the feature map merging module is also used to combine the features of the centers of the receptive fields in the input feature map with the corresponding enhanced features to obtain enhanced feature maps corresponding to the input feature maps.
  • In an embodiment, the feature map merging module includes an up-sampling module and a merging sub-module, wherein:
  • The up-sampling module is configured to up-sample the low-resolution final enhanced feature map in each enhanced feature map pair to obtain an up-sampled feature map whose resolution equals that of the high-resolution final enhanced feature map in the pair.
  • The merging sub-module is configured to merge the features of the up-sampled feature map and the high-resolution final enhanced feature map to obtain the interleaved feature map corresponding to the adjacent feature map pair; the interleaved feature maps form the interleaved feature atlas.
  • the image semantic segmentation device further includes a superpixel processing module for superpixel segmenting the original image to obtain a to-be-processed image including a preset number of superpixels.
  • The above image semantic segmentation apparatus context-interleaves each adjacent feature map pair in a bidirectional and recursive manner and merges the adjacent feature map pairs, so that the context information of each feature map is continuously transmitted along both the vertical and horizontal dimensions and encoded into the newly generated interleaved feature maps, significantly enhancing the descriptive ability of the features in the interleaved feature maps; the final interleaved feature map thus has better classification characteristics, and performing semantic prediction on it yields a more accurate semantic segmentation image.
  • Each module in the above image semantic segmentation device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above modules may be embedded in the hardware or independent of the processor in the computer device, or may be stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided, and its internal structure diagram may be as shown in FIG. 10.
  • The computer device includes a processor, a memory, a network interface, a display screen, an input device, and a microphone array connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and computer programs.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements an image semantic segmentation method.
  • The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
  • Those skilled in the art will understand that the structure shown in FIG. 10 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or arrange components differently.
  • In an embodiment, a computer device is provided, which includes a memory and a processor. The memory stores computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the image semantic segmentation method in any of the above embodiments.
  • In an embodiment, one or more non-volatile storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the image semantic segmentation method in any of the above embodiments.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to an image semantic segmentation method, a computer device, and a storage medium. The method includes: performing convolution processing on an image to be processed to obtain a multi-scale feature atlas, and using the multi-scale feature atlas as the input feature atlas of context interleaving processing; performing context interleaving processing on each adjacent feature map pair in the input feature atlas to obtain an interleaved feature atlas; using the interleaved feature atlas as the input feature atlas of the context interleaving processing, and returning to the step of performing context interleaving processing on each adjacent feature map pair in the input feature atlas to obtain an interleaved feature atlas, until the obtained interleaved feature atlas includes only one interleaved feature map; and performing semantic prediction on the interleaved feature map to obtain the semantic segmentation image corresponding to the image to be processed. Through context interleaving, the context information of adjacent feature maps is continuously learned, so that the finally obtained interleaved feature map has better classification characteristics, yielding a more accurate semantic segmentation image.

Description

图像语义分割方法、计算机设备和存储介质 技术领域
本申请涉及图像分割技术领域,特别是涉及一种图像语义分割方法、计算机设备和存储介质。
背景技术
图像语义分割是计算机视觉和模式识别领域重要研究课题之一,广泛应用于自动驾驶系统、无人机、医学影像等场景中,其目标是对图像的每个像素点进行分类,将图像分割成一组具有一定语义含义的区域块,并识别出每个区域块的类别,最终得到一幅具有语义标注的图像。以应用于自动驾驶系统为例,通过图像语义分割可将人、车辆、树木等不同类别的对象进行分割归类,并针对不同类别的对象采用不同的标注方式,得到语义分割图像,以根据语义分割图像避让行人和车辆等障碍。
在现有的图像语义分割方法中,按照低分辨率至高分辨率的顺序,依次组合卷积获得的各特征图。而低分辨率特征图由于卷积时会遗漏信息,因此通过上述方式组合得到的特征图,也存在信息衰减的问题,进而影响语义分割的准确性。
申请内容
根据本申请的各种实施例,提供一种图像语义分割方法、计算机设备和存储介质。
一种图像语义分割方法,所述方法包括:
对待处理图像进行卷积处理,得到多尺度特征图集,将所述多尺度特征图集作为上下文交织处理的输入特征图集;
对所述输入特征图集中的各相邻特征图对分别进行上下文交织处理,获得交织特征图集;
将所述交织特征图集作为所述上下文交织处理的输入特征图集,返回对所述输入特征图集中的各相邻特征图对分别进行上下文交织处理,获得交织 特征图集的步骤,直至所获得的所述交织特征图集仅包括一个交织特征图;
对所述交织特征图进行语义预测,获得与所述待处理图像对应的语义分割图像。
一种计算机设备,包括存储器和处理器,所述存储器中存储有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述处理器执行任一项实施例中图像语义分割方法的步骤。
一个或多个存储有计算机可读指令的非易失性存储介质,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行任一项实施例中图像语义分割方法的步骤。
本申请的一个或多个实施例的细节在下面的附图和描述中提出。本申请的其它特征、目的和优点将从说明书、附图以及权利要求书变得明显。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为一个实施例中图像语义分割方法的应用环境图;
图2为一个实施例中图像语义分割方法的流程示意图;
图3为一个实施例中上下文交织处理过程的示意图;
图4为一个实施例中相邻特征图对进行特征增强处理的示意图;
图5为一个实施例中进行特征增强处理步骤的流程示意图;
图6为一个实施例中基于超像素进行上下文信传递的示意图;
图7为一个实施例中图像语义分割方法的流程示意图;
图8为一个实施例中图像语义分割效果对比图;
图9为一个实施例中基于口音的语音识别处理装置的结构框图;
图10为一个实施例中计算机设备的结构框图。
具体实施方式
为使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步的详细说明。应当理解,此处所描述的具体实施方式仅仅用以解释本申请,并不限定本申请的保护范围。
本申请提供的图像语义分割方法,可以应用于如图1所示的应用环境中。当终端102检测到图像语义分割指令时,利用卷积神经网络对输入的待处理图像进行卷积处理,以对其进行卷积滤波,获得多尺度特征图集。而后对多尺度特征图集中的各相邻特征图对分别进行上下文交织处理,最终获得与待处理图像具有分辨率相同的交织特征图,并对该交织特征图进行语义预测,获得与待处理图像对应的语义分割图像。其中,终端102可以但不限于是各种个人计算机、笔记本电脑、智能手机、平板电脑和服务器端等。
在一个实施例中,如图2所示,提供了一种图像语义分割方法,以该方法应用于图1中的终端102为例进行说明,包括以下步骤:
S202,对待处理图像进行卷积处理,得到多尺度特征图集,将多尺度特征图集作为上下文交织处理的输入特征图集。其中,多尺度特征图集是指不同分辨率的卷积特征图的集合,这里的的多尺度也即是多分辨率。
具体地,通过不同卷积核,依次对待处理图像以及卷积处理得到的卷积特征图进行卷积处理(此处的卷积处理也即卷积处理),得到不同分辨率的卷积特征图,所有不同分辨率的卷积特征图即组成多尺度特征图集。参照图3,通过对X 0进行卷积处理,得到卷积特征图X 1;再对X 1进行卷积,得到对应的卷积特征图X 2;对进行X 2卷积,得到对应的卷积特征图X 3,X 0、X 1、X 2和X 3组成多尺度特征图集。将多尺度特征图集作为上下文交织处理的输入特征图集,以便对多尺度特征图集中的各卷积特征图进行上下文交织处理。
S204,对输入特征图集中的各相邻特征图对分别进行上下文交织处理,获得交织特征图集。
其中,相邻特征图对是指同一输入特征图集中,两个相邻分辨率的特征图组合。以图3为例,X 0-X 1为一个相邻特征图对,X 1-X 2为一个相邻特征图对,X 2-X 3为一个相邻特征图对。
本实施例中,对输入特征图集中的各相邻特征图对分别进行上下文交织处理,得到与各相邻特征图对相应的交织特征图,所有交织特征图组成一个交织特征图集。
上下文交织处理是指在相邻特征图对中,两个卷积特征图相互传递上下文信息,最终生成一个交织特征图的处理。其中,上下文信息是指不同的对象之间的相互作用信息、对象与场景之间的相互作用信息。在图像中,某一对象不可能单独的存在,它一定会与周围其他的对象和环境有着或多或少的关系,这就是通常所说的上下文信息。比如,在拍摄的马路图片中,马路上通常包括行人、车辆,马路、行人和车辆之间会存在一定共现性,而体现这一共现性的相关信息即为上下文信息,该上下文信息能够有助于对行人、车辆进行分类预测,比如,在马路上出现的物体是行人或车辆的几率更大。
S206,判断所获得的交织特征图集是否仅包括一个交织特征图。若否,执行步骤S208;否则,执行步骤S210。
S208,将交织特征图集作为上下文交织处理的输入特征图集,返回步骤S204。
本实施例中,当最新得到的交织特征图集包括至少两个交织特征图时,继续对交织特征图进行上下文交织处理,直至最终仅能得到一个交织特征图时,结束上下文交织处理。如图3所示,通过对由上采样特征图构成的相邻特征图对进行上下文交织,生成新的交织特征图,再对由交织特征图构成的相邻特征图对进行上下文交织,依次迭代,将上下文信息在相邻特征图之间不断传递,最终得到一个具有更好分类特性的交织特征图。
从图3可看出,采用依次迭代的方式进行上下文件交织处理,可使上下文信息沿着不同的维度进行传播。第一个维度沿着垂直的深层次结构,在相邻特征图之间交换多尺度上下文信息;第二维度沿着水平层次结构,将上下文交织处理生成的交织特征图被馈送到下一阶段的上下文交织处理。各特征图的上下文信息沿着这两个方向维度不断传递,编码至新生成的交织特征图中,从而显著增强交织特征图中特征的描述能力,以获得更精确的语义标识。
S210,对交织特征图进行语义预测,获得与待处理图像对应的语义分割图像。
其中,进行语义预测的交织特征图为最终得到的一个交织特征图(以下简称最终交织特征图),且最终交织特征图具有与待处理图像相同的分辨率。
具体地,将最终交织特征图作为待预测图进行语义预测,对具有不同语义的对象进行分类标识,获得与待处理图像对应的语义分割图像。其中,分类标识可以是针对不同对象采用不同的颜色标识,也可以是其他可区分不同类别的对象的表示形式。
上述图像语义分割方法,通过对待处理图像进行卷积处理,将得到的多尺度特征图集中的各相邻特征图对,分别进行上下文交织处理,获得具有更丰富特征的交织特征图集。再对交织特征图集作中的各相邻特征图对分别进行上下文交织处理,以双向和递归的方式合并相邻特征图对,直至所获得的交织特征图集仅包括一个交织特征图。通过递归的上下文交织处理,不断学习相邻特征图的上下文信息,使得最终获得的交织特征图具有更好的分类特性,进而使利用最终获得的交织特征图进行语义预测时,能够得到更为精确的语义分割图像。
在一实施例中,对输入特征图集中的各相邻特征图对分别进行上下文交织处理,获得交织特征图集,包括:对输入特征图集中的各相邻特征图对分别进行特征增强处理,获得与各相邻特征图对相应的增强特征图对;分别将各增强特征图对中的两个最终增强特征图合并,获得与相邻特征图对相应的交织特征图,由各交织特征图组成交织特征图集。
特征增强处理是指增强特征的描述性能力的处理。具体地,针对每个相邻特征图对,利用相邻特征图对中的其中一个待交织特征图,对另一个待交织特征图进行特征增强处理,生成与两个待交织特征图分别对应的两个最终增强特征图,两个最终增强特征图即构成一个增强特征图对。再对两个最终增强特征图进行合并,生成交织特征图。
由于输入特征图集中的每个相邻特征图对,均生成一个对应的交织特征 图,各交织特征图即组成一个交织特征图集,以将交织特征图集作为输入特征图集,继续进行上下文交织处理。其中,交织特征图的分辨率与对应的相邻特征图对中的高分辨率特征图相同。
在一实施例中,对输入特征图集中的各相邻特征图对分别进行特征增强处理,获得与各相邻特征图对应的增强特征图对,包括:分别根据对应的相邻特征图,对输入特征图集中各相邻特征图对中的各待交织特征图进行特征增强处理,获得各待交织特征图对应的最终增强特征图;相邻特征图为:与当前特征增强处理的待交织特征图同属一个相邻特征图对的待交织特征图;将同一相邻特征图对的各最终增强特征图组合,作为增强特征图对。
具体地,针对每个相邻特征图对,利用相邻特征图对中的其中一个待交织特征图的上下文信息,对另一个待交织特征图进行特征增强处理,生成与两个待交织特征图分别对应的两个最终增强特征图,两个最终增强特征图即构成一个增强特征图对。
由于相邻特征图对包括两个相邻分辨率的待交织特征图,其中一个待交织特征图为高分辨率特征图,另一个待交织特征图为低分辨率特征图。可以理解,当前特征增强处理的待交织特征图为高分辨率特征图,则相邻特征图为同属一个相邻特征图对的低分辨率特征图;当前特征增强处理的为低分辨率特征图,则相邻特征图为同属一个相邻特征图对的高分辨率特征图。
更具体地,利用高分辨率特征图的上下文信息,对低分辨率特征图进行特征增强处理,生成一个低分辨率的最终增强特征图,利用低分辨率特征图的上下文信息,对高分辨率特征图进行特征增强处理,生成一个高分辨率的最终增强特征图,两个最终增强特征图即构成一个增强特征图对。
参照图4,给出一相邻特征图对
Figure PCTCN2018110493-appb-000001
进行特征增强处理的示意图,图4中,高分辨率特征图
Figure PCTCN2018110493-appb-000002
和低分辨率特征图
Figure PCTCN2018110493-appb-000003
互相进行特征增强处理,最终生成两个最终增强特征图
Figure PCTCN2018110493-appb-000004
Figure PCTCN2018110493-appb-000005
其中,l表示分辨率等级,T表示对内交织的最大次数。
在一实施例中,如图5所示,分别根据对应的相邻特征图,对输入特征 图集中各相邻特征图对中的各待交织特征图进行特征增强处理,获得各待交织特征图对应的最终增强特征图的步骤,包括以下子步骤:
S502,将输入特征图集中的各相邻特征图对作为输入特征图对,将各待交织特征图作为输入特征图。
例如,
Figure PCTCN2018110493-appb-000006
作为其中一个输入特征图对,
Figure PCTCN2018110493-appb-000007
Figure PCTCN2018110493-appb-000008
均作为输入特征图。
S504,提取各相邻输入特征图的上下文信息。
其中,相邻输入特征图是指同属一个输入特征图对的另一个输入特征图。可以理解,在同一个输入特征图对中,高分辨率输入特征图是低分辨率输入特征图的相邻输入特征图,低分辨率输入特征图同样也是高分辨率输入特征图的相邻输入特征图。
本实施例中,提取各相邻输入特征图的上下文信息,实际就是提取各输入特征图的上下文信息。
S506,分别根据各相邻输入特征图的上下文信息,对对应的输入特征图进行特征增强处理,获得与各输入特征图对应的增强特征图。
具体地,根据低分辨率输入特征图的上下文信息,对高分辨率输入特征图进行语义增强处理,获得高分辨率增强特征图;根据高分辨率输入特征图的上下文信息,对低分辨率输入特征图进行语义增强处理,获得低分辨率增强特征图。
如图4所示,将低分辨率输入特征图
Figure PCTCN2018110493-appb-000009
的上下文信息,传递给高分辨率输入特征图
Figure PCTCN2018110493-appb-000010
Figure PCTCN2018110493-appb-000011
进行语义增强处理,获得高分辨率增强特征图
Figure PCTCN2018110493-appb-000012
Figure PCTCN2018110493-appb-000013
的上下文信息传递给
Figure PCTCN2018110493-appb-000014
Figure PCTCN2018110493-appb-000015
进行语义增强处理,获得低分辨率增强特征图
Figure PCTCN2018110493-appb-000016
其中,t表示对内交织次数。
S508,判断是否达到预设的对内交织总次数。若否,执行步骤S510;否则,执行步骤S512。
其中,对内交织总次数是指一个相邻特征图对进行上下文交织的总次数。可按需预先设置。通常对内交织总次数不宜过大,以免造成处理耗时过长。
S510,将各增强特征图作为输入特征图,将同一输入特征图对相应的各 增强特征图组合,作为输入特征图对,返回步骤S504。
当未达到对内交织总次数时,则采用上下文交织的方式,继续对新生成的增强特征图进行特征增强处理。
S512,将最终获得的各增强特征图作为最终增强特征图。
具体地,图5中各步骤所实现的上下文交织,可通过两个LSTM(Long Short-Term Memory,长短期记忆网络)链之间的连接,以双向和递归的方式进行,如图4所示。
在一实施例中,分别根据各相邻输入特征图的上下文信息,对对应的输入特征图进行特征增强处理,获得与各输入特征图对应的增强特征图的步骤,包括:分别根据各相邻输入特征图的上下文信息,以及对应的输入特征图,获得输入特征图的增强特征;将增强特征与对应的输入特征图进行合并处理,获得与各输入特征图对应的增强特征图。
具体地,在同一输入特征图对中,根据低分辨率输入特征图的上下文信息和高分辨率输入特征图,获得高分辨率输入特征图的增强特征,将增强特征与高分辨率输入特征图进行合并处理,获得高分辨率增强特征图;根据高分辨率输入特征图的上下文信息和低分辨率输入特征图,获得低分辨率输入特征图的增强特征,将该增强特征与低分辨率输入特征图进行合并处理,获得低分辨率增强特征图。
在一实施例中,待处理图像为对原始图像进行超像素分割后的图像。其中,超像素分割是指将数字图像细分为多个图像子区域的过程。超像素是指由一系列位置相邻且颜色、亮度、纹理等特征相似的像素点组成的小区域。
进一步地,图像语义分割方法还包括:对原始图像进行超像素分割,获得包括预设数量的超像素的待处理图像。通过对原始图像进行超像素分割,将原始图像划分为多个不重叠的超像素定义的区域,以便于获取上下文信息。
在一实施例中,分别根据各相邻输入特征图的上下文信息,以及对应的输入特征图,获得输入特征图的增强特征的步骤,包括:根据输入特征图中各感受野中心的特征,以及相邻输入特征图中对应的感受野中心的上下文信 息,获得输入特征图中各感受野中心的增强特征。
其中,感受野是指卷积神经网络每一层输出特征图上的像素点,在输入图像上映射的区域大小。感受野中心即为映射的区域中心。感受野中心的上下文信息即指该感受野中心本身的特征,以及相邻区域的感受野中心的特征。
具体地,首先输入特征图中各感受野中心进行映射,确定相邻输入特征图中与之对应的感受野中心,再获取所确定的感受野中心以及相邻区域的感受野中心的特征,将输入特征图中感受野中心本身的特征,以及从相邻输入特征图获取的特征进行合并,得到输入特征图中感受野中心的增强特征。输入特征图中每一个感受野中心的增强特征集合,即为输入特征图的增强特征。
可以理解,当输入特征图为高分辨率输入特征图时,对应的相邻输入特征图中即为低分辨率输入特征图;当输入特征图为低分辨率输入特征图时,对应的相邻输入特征图中即为高分辨率输入特征图。
进一步地,当待处理图像为对原始图像进行超像素分割后的图像时,上下文信息包括:感受野中心所属的超像素的聚合特征以及相邻超像素的聚合特征。其中,相邻超像素是指与所属超像素相邻的超像素;聚合特征是指超像素定义的区域内所有感受野中心的特征之和。
如图6所示,为基于超像素进行上下文信传递的示意图。其中,图6(a)为将低分辨率输入特征图
Figure PCTCN2018110493-appb-000017
的上下文信息,传递至高分辨率输入特征图
Figure PCTCN2018110493-appb-000018
图6(b)为将高分辨率输入特征图
Figure PCTCN2018110493-appb-000019
的上下文信息,传递至低分辨率输入特征图
Figure PCTCN2018110493-appb-000020
以图6(a)为例,假设高分辨率输入特征图
Figure PCTCN2018110493-appb-000021
中存在一个感受野中心O,根据位置对应关系,确定在低分辨率输入特征图
Figure PCTCN2018110493-appb-000022
中与之对应的为感受野中心O’,O’属于超像素A,与超像素A相邻的区域包括:超像素B、超像素C、超像素D、超像素E和超像素J。聚合超像素A、超像素B、超像素C、超像素D、超像素E和超像素J中所有感受野中心的特征,分别得到各超像素的聚合特征,将包括各聚合特征的上下文信息传递给感受野中心O。
假设给定特征图
Figure PCTCN2018110493-appb-000023
和区域S n,聚合区域S n中所有感受野中心的特征,得 到区域S n的聚合特征
Figure PCTCN2018110493-appb-000024
Figure PCTCN2018110493-appb-000025
其中,(h,w)表示感受野中心在特征图中
Figure PCTCN2018110493-appb-000026
的坐标,
Figure PCTCN2018110493-appb-000027
表示感受野中心的特征,φ(S n)表示区域S n内感受野中心的坐标集合,n表示区域标识。
进一步地,通过聚合区域S n的邻域N(S n)(相邻超像素定义的区域集合)的特征,得到一个更全局的聚合特征
Figure PCTCN2018110493-appb-000028
Figure PCTCN2018110493-appb-000029
其中,m表示邻域N(S n)中各区域的标识。
在一实施例中,采用LSTM进行上下文交织时,增强特征由的门函数值和细胞状态确定,而门函数值和细胞状态进一步由相邻输入特征图的上下文信息,以及对应的输入特征图确定。
具体地,根据输入特征图中各感受野中心的特征,以及相邻输入特征图中对应的感受野中心的上下文信息,获得输入特征图中各感受野中心的增强特征的步骤,包括以下子步骤:根据输入特征图中各感受野中心的特征,以及相邻输入特征图中对应的感受野中心的上下文信息,获得门函数值和细胞状态;根据门函数值和细胞状态,获得输入特征图中各感受野中心的增强特征。
其中,门函数值是指感受野中心在LSTM的输入门、输入值、遗忘门和输出门的具体函数值。在图4中,
Figure PCTCN2018110493-appb-000030
表示对应LSTM单元输出的隐藏特征,包括输出门的门函数值和细胞状态。
具体地,结合LSTM的门函数值和细胞状态的计算方式,利用输入特征图中各感受野中心的特征,以及相邻输入特征图中对应的感受野中心的上下文信息进行计算,获得门函数值和细胞状态。
进一步地,根据输入特征图中各感受野中心的特征,以及相邻输入特征图中对应的感受野中心的上下文信息,获得门函数值和细胞状态的步骤,包括以下子步骤:根据输入特征图中各感受野中心的特征,以及相邻输入特征图中对应的感受野中心的上下文信息,获得门函数值;根据门函数值和历史 细胞状态,获得细胞状态。其中历史细胞状态是指上一次对相同分别率的特征图进行特征增强处理时,计算得到的细胞状态。
以在t阶段,LSTM单元从特征图
Figure PCTCN2018110493-appb-000031
Figure PCTCN2018110493-appb-000032
生成增强特征图
Figure PCTCN2018110493-appb-000033
为例,门函数值和细胞状态的计算通过以下公式实现:
Figure PCTCN2018110493-appb-000034
其中,
Figure PCTCN2018110493-appb-000035
表示输入门i的门函数值,
Figure PCTCN2018110493-appb-000036
表示输入值c的门函数值,
Figure PCTCN2018110493-appb-000037
表示遗忘门f的门函数值,
Figure PCTCN2018110493-appb-000038
表示输出门o的门函数值,
Figure PCTCN2018110493-appb-000039
表示细胞状态,W表示对应门或者细胞状态的卷积核,b表示偏差,σ为预设系数。在本实施例中,
Figure PCTCN2018110493-appb-000040
表示历史细胞状态,
Figure PCTCN2018110493-appb-000041
又可以作为下一阶段上下文交织的历史细胞状态。
进一步地,根据门函数值和细胞状态,获得输入特征图中各感受野中心的增强特征的步骤,包括:根据输出门的门函数值和细胞状态,获得输入特征图中各感受野中心的增强特征。具体通过以下公式实现:
Figure PCTCN2018110493-appb-000042
其中,
Figure PCTCN2018110493-appb-000043
表示感受野中心(h,w)的增强特征。
在一实施例中,将增强特征与对应的输入特征图进行合并处理,获得与各输入特征图对应的增强特征图的步骤,包括:将输入特征图中各感受野中心的特征,分别与对应的增强特征进行合并处理,获得与各输入特征图对应的增强特征图。
具体地,分别将输入特征图中各感受野中心的特征,以及对应的增强特征相加,得到各感受野中心在增强特征图中的特征,从而得到增强特征图。具体通过以下公式实现:
Figure PCTCN2018110493-appb-000044
其中,
Figure PCTCN2018110493-appb-000045
表示感受野中心(h,w)在增强特征图
Figure PCTCN2018110493-appb-000046
中的特征。
通过沿着LSTM链,使得增强特征图包含具有更大感受野的特征,即具有更丰富的全局上下文。此外,LSTM的细胞状态还可以记忆在不同阶段交换的上下文信息,来自早期阶段的局部上下文可以容易地传播到最后阶段,将包括局部和全局信息的多尺度上下文信息编码到最终增强特征图中。
在一实施例中,分别将各增强特征图对中的两个最终增强特征图合并,获得与相邻特征图对相应的交织特征图,由各交织特征图组成交织特征图集的步骤,包括以下子步骤:分别将各增强特征图对中低分辨率的最终增强特征图进行上采样处理,获得上采样特征图,上采样特征图的分辨率与增强特征图对中高分辨率的最终增强特征图相同;将上采样特征图和高分辨率的最终增强特征图中的特征合并,获得与相邻特征图对相应的交织特征图,由各交织特征图组成交织特征图集。
具体地,利用特定的上采样卷积核对低分辨率的最终增强特征图进行上采样处理,获得上采样特征图,将上采样特征图和高分辨率的最终增强特征图中的对应特征相加,获得与相邻特征图对相应的交织特征图,由各交织特征图组成交织特征图集。其中,对于不同分辨率的上采样处理,所采用的上采样卷积核也不同。
如下式所示,对低分辨率的最终增强特征图进行上采样处理,再与高分辨率的最终增强特征图合并,通过以下公式实现:
Figure PCTCN2018110493-appb-000047
其中,Q l表示交织特征图,
Figure PCTCN2018110493-appb-000048
表示上采样卷积核。
在一实施例中,如图7所示,提供一种图像语义分割方法,具体包括以下步骤:
对原始图像进行超像素分割,获得包括预设数量的超像素的待处理图像。
S701,对待处理图像进行卷积处理,得到多尺度特征图集,将多尺度特征图集作为上下文交织处理的输入特征图集。
S702,将输入特征图集中的各相邻特征图对作为输入特征图对,将各待 交织特征图作为输入特征图。
S703,提取各相邻输入特征图的上下文信息。上下文信息包括:感受野中心所属的超像素的聚合特征以及相邻超像素的聚合特征。
S704,根据输入特征图中各感受野中心的特征,以及相邻输入特征图中对应的感受野中心的上下文信息,获得门函数值。
S705,根据门函数值和历史细胞状态,获得细胞状态。
S706,根据门函数值和细胞状态,获得输入特征图中各感受野中心的增强特征。
S707,将输入特征图中各感受野中心的特征,分别与对应的增强特征进行合并处理,获得与各输入特征图对应的增强特征图。
S708,判断是否达到预设的对内交织总次数。若否,执行步骤S709;否则,执行步骤S710。
S709,将各增强特征图作为输入特征图,将同一输入特征图对相应的各增强特征图组合,作为输入特征图对,返回步骤S703。
S710,将最终获得的各增强特征图作为最终增强特征图,将同一相邻特征图对的各最终增强特征图组合,作为增强特征图对。
S711,分别将各增强特征图对中低分辨率的最终增强特征图进行上采样处理,获得上采样特征图,上采样特征图的分辨率与增强特征图对中高分辨率的最终增强特征图相同;
S712,将上采样特征图和高分辨率的最终增强特征图中的特征合并,获得与相邻特征图对相应的交织特征图,由各交织特征图组成交织特征图集。
S713,判断所获得的交织特征图集是否仅包括一个交织特征图。若否,执行步骤S714;否则,执行步骤S715。
S714,将交织特征图集作为上下文交织处理的输入特征图集,返回步骤S702。
S715,对交织特征图进行语义预测,获得与待处理图像对应的语义分割图像。
如图8所示,分别示出了采用现有的图像语义分割方法、图7所示方法对原始图像进行语义分割以及实际分割的效果对比图。其中,第一列为原始图像,第二列ground-truth分割效果图,第三列为采用ASPP(Atrous Spatial pyramid pooling,多孔空间金字塔池化)模型分割的效果图,第四列为采用Encoder-Decoder+ASPP模型(带有ASPP模型的编码器-解码器)分割的效果图,第五列为采用图7所示实施方法分割的效果图。
从图8可看出,本申请方法相比于现有方法,明显具有更准确的分割效果。这是因为,本申请以双向和递归的方式对各相邻特征图对进行上下文交织,并合并相邻特征图对,使各特征图的上下文信息沿着垂直和水平两个维度不断传递,编码至新生成的交织特征图,从而显著增强交织特征图中特征的描述能力,使得最终获得的交织特征图具有更好的分类特性,进而使利用最终获得的交织特征图进行语义预测时,能够得到更为精确的语义分割图像。
应该理解的是,虽然本申请各实施例中的各个步骤并不是必然按照步骤标号指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,各实施例中至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
在一实施例中,如图9所示,提供一种图像语义分割装置900,该装置包括:卷积模块902、上下文交织模块904、输入特征图集确定模块906和预测模块908。其中:
卷积模块902,用于对待处理图像进行卷积处理,得到多尺度特征图集,将多尺度特征图集作为上下文交织处理的输入特征图集。
上下文交织模块904,用于对输入特征图集中的各相邻特征图对分别进行上下文交织处理,获得交织特征图集。
输入特征图集确定模块906,用于将交织特征图集作为上下文交织处理 的输入特征图集,直至所获得的交织特征图集仅包括一个交织特征图。
预测模块908,用于对交织特征图进行语义预测,获得与待处理图像对应的语义分割图像。
在一实施例中,上下文交织模块包括特征增强模块和特征图合并模块。其中:
特征增强模块,用于对输入特征图集中的各相邻特征图对分别进行特征增强处理,获得与各相邻特征图对相应的增强特征图对。
特征图合并模块,用于分别将各增强特征图对中的两个最终增强特征图合并,获得与相邻特征图对相应的交织特征图。
在一实施例中,特征增强模块还用于分别根据对应的相邻特征图,对输入特征图集中各相邻特征图对中的各待交织特征图进行特征增强处理,获得各待交织特征图对应的最终增强特征图;将同一相邻特征图对的各最终增强特征图组合,作为增强特征图对。其中,相邻特征图为:与当前特征增强处理的待交织特征图同属一个相邻特征图对的待交织特征图。
In one embodiment, the feature enhancement module includes an input feature map determination module, a context extraction module, and a feature enhancement sub-module, where:

The input feature map determination module is configured to take each adjacent feature map pair in the input feature map set as an input feature map pair, and to take each to-be-interleaved feature map as an input feature map.

The context extraction module is configured to extract the context information of each adjacent input feature map.

The feature enhancement sub-module is configured to perform feature enhancement on the corresponding input feature map according to the context information of each adjacent input feature map, to obtain the enhanced feature map corresponding to each input feature map.

Further, the input feature map determination module is also configured to take each enhanced feature map as an input feature map and combine the enhanced feature maps corresponding to the same input feature map pair into an input feature map pair, until the preset total number of intra-pair interleaving iterations is reached.
In one embodiment, the feature enhancement sub-module includes an enhanced feature determination module and a feature merging module. The enhanced feature determination module is configured to obtain the enhanced features of the input feature map according to the context information of each adjacent input feature map and the corresponding input feature map; the feature merging module is configured to merge the enhanced features with the corresponding input feature map, to obtain the enhanced feature map corresponding to each input feature map.

In one embodiment, the enhanced feature determination module is further configured to obtain the enhanced feature of each receptive field center in the input feature map according to the features of the receptive field centers in the input feature map and the context information of the corresponding receptive field centers in the adjacent input feature map.

In one embodiment, the enhanced feature determination module includes a parameter determination module and an enhanced feature determination sub-module.

The parameter determination module is configured to obtain the gate function values and the cell state according to the features of the receptive field centers in the input feature map and the context information of the corresponding receptive field centers in the adjacent input feature map.

The enhanced feature determination sub-module is configured to obtain the enhanced feature of each receptive field center in the input feature map according to the gate function values and the cell state.
In one embodiment, the parameter determination module includes a gate function determination module and a cell state determination module.

The gate function determination module is configured to obtain the gate function values according to the features of the receptive field centers in the input feature map and the context information of the corresponding receptive field centers in the adjacent input feature map.

The cell state determination module is configured to obtain the cell state according to the gate function values and the historical cell state.
In one embodiment, the feature map merging module is further configured to merge the feature of each receptive field center in the input feature map with the corresponding enhanced feature, to obtain the enhanced feature map corresponding to each input feature map.

In one embodiment, the feature map merging module includes an upsampling module and a merging sub-module, where:

The upsampling module is configured to upsample the lower-resolution final enhanced feature map in each enhanced feature map pair to obtain an upsampled feature map whose resolution is the same as that of the higher-resolution final enhanced feature map in the pair.

The merging sub-module is configured to merge the features of the upsampled feature map and the higher-resolution final enhanced feature map, to obtain the interleaved feature map corresponding to the adjacent feature map pair, the interleaved feature maps forming the interleaved feature map set.
In one embodiment, the image semantic segmentation apparatus further includes a superpixel processing module configured to perform superpixel segmentation on the original image, to obtain a to-be-processed image containing a preset number of superpixels.

The above image semantic segmentation apparatus performs context interleaving on each adjacent feature map pair in a bidirectional and recursive manner and merges the adjacent feature map pairs, so that the context information of each feature map is continuously propagated along both the vertical and horizontal dimensions and encoded into the newly generated interleaved feature maps. This markedly strengthens the descriptive power of the features in the interleaved feature maps, gives the finally obtained interleaved feature map better classification properties, and in turn yields a more accurate semantic segmentation image when semantic prediction is performed on it.
For the specific limitations of the image semantic segmentation apparatus, reference may be made to the limitations of the image semantic segmentation method above, which are not repeated here. Each module in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, whose internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface, a display screen, an input apparatus, and a microphone array connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program; the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements an image semantic segmentation method. The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input apparatus may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like.

Those skilled in the art will understand that the structure shown in FIG. 10 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the image semantic segmentation method in any of the above embodiments.

In one embodiment, one or more non-volatile storage media storing computer-readable instructions are provided; the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the image semantic segmentation method in any of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing the relevant hardware through a computer program. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application, and their descriptions are specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (20)

  1. An image semantic segmentation method, the method comprising:
    performing convolution processing on a to-be-processed image to obtain a multi-scale feature map set, and using the multi-scale feature map set as an input feature map set for context interleaving processing;
    performing the context interleaving processing on each adjacent feature map pair in the input feature map set to obtain an interleaved feature map set;
    using the interleaved feature map set as the input feature map set for the context interleaving processing, and returning to the step of performing the context interleaving processing on each adjacent feature map pair in the input feature map set to obtain an interleaved feature map set, until the obtained interleaved feature map set includes only one interleaved feature map; and
    performing semantic prediction on the interleaved feature map to obtain a semantic segmentation image corresponding to the to-be-processed image.
  2. The method according to claim 1, wherein the performing the context interleaving processing on each adjacent feature map pair in the input feature map set to obtain an interleaved feature map set comprises:
    performing feature enhancement processing on each adjacent feature map pair in the input feature map set to obtain an enhanced feature map pair corresponding to each adjacent feature map pair; and
    merging the two final enhanced feature maps in each enhanced feature map pair to obtain the interleaved feature map corresponding to the adjacent feature map pair, the interleaved feature maps forming the interleaved feature map set.
  3. The method according to claim 2, wherein the performing feature enhancement processing on each adjacent feature map pair in the input feature map set to obtain the enhanced feature map pair corresponding to each adjacent feature map pair comprises:
    performing feature enhancement processing on each to-be-interleaved feature map in each adjacent feature map pair in the input feature map set according to the corresponding adjacent feature map, to obtain the final enhanced feature map corresponding to each to-be-interleaved feature map, the adjacent feature map being the to-be-interleaved feature map that belongs to the same adjacent feature map pair as the to-be-interleaved feature map currently undergoing the feature enhancement processing; and
    combining the final enhanced feature maps of the same adjacent feature map pair into an enhanced feature map pair.
  4. The method according to claim 3, wherein the performing feature enhancement processing on each to-be-interleaved feature map in each adjacent feature map pair in the input feature map set according to the corresponding adjacent feature map, to obtain the final enhanced feature map corresponding to each to-be-interleaved feature map, comprises:
    taking each adjacent feature map pair in the input feature map set as an input feature map pair, and taking each to-be-interleaved feature map as an input feature map;
    extracting context information of each adjacent input feature map;
    performing feature enhancement processing on the corresponding input feature map according to the context information of each adjacent input feature map, to obtain an enhanced feature map corresponding to each input feature map; and
    taking each enhanced feature map as an input feature map, combining the enhanced feature maps corresponding to the same input feature map pair into an input feature map pair, and returning to the step of extracting context information of each adjacent input feature map, until a preset total number of intra-pair interleaving iterations is reached, and taking the finally obtained enhanced feature maps as the final enhanced feature maps.
  5. The method according to claim 4, wherein the performing feature enhancement processing on the corresponding input feature map according to the context information of each adjacent input feature map, to obtain an enhanced feature map corresponding to each input feature map, comprises:
    obtaining enhanced features of the input feature map according to the context information of each adjacent input feature map and the corresponding input feature map; and
    merging the enhanced features with the corresponding input feature map to obtain the enhanced feature map corresponding to each input feature map.
  6. The method according to claim 5, wherein the obtaining enhanced features of the input feature map according to the context information of each adjacent input feature map and the corresponding input feature map comprises:
    obtaining the enhanced feature of each receptive field center in the input feature map according to the features of the receptive field centers in the input feature map and the context information of the corresponding receptive field centers in the adjacent input feature map.
  7. The method according to claim 6, wherein the obtaining the enhanced feature of each receptive field center in the input feature map according to the features of the receptive field centers in the input feature map and the context information of the corresponding receptive field centers in the adjacent input feature map comprises:
    obtaining gate function values and a cell state according to the features of the receptive field centers in the input feature map and the context information of the corresponding receptive field centers in the adjacent input feature map; and
    obtaining the enhanced feature of each receptive field center in the input feature map according to the gate function values and the cell state.
  8. The method according to claim 7, wherein the obtaining gate function values and a cell state according to the features of the receptive field centers in the input feature map and the context information of the corresponding receptive field centers in the adjacent input feature map comprises:
    obtaining the gate function values according to the features of the receptive field centers in the input feature map and the context information of the corresponding receptive field centers in the adjacent input feature map; and
    obtaining the cell state according to the gate function values and a historical cell state.
  9. The method according to claim 6, wherein the merging the enhanced features with the corresponding input feature map to obtain the enhanced feature map corresponding to each input feature map comprises:
    merging the feature of each receptive field center in the input feature map with the corresponding enhanced feature, to obtain the enhanced feature map corresponding to each input feature map.
  10. The method according to claim 6, wherein the to-be-processed image is an image obtained by performing superpixel segmentation on an original image; and the context information includes an aggregated feature of a superpixel to which the receptive field center belongs and aggregated features of adjacent superpixels.
  11. The method according to claim 2, wherein the merging the two final enhanced feature maps in each enhanced feature map pair to obtain the interleaved feature map corresponding to the adjacent feature map pair, the interleaved feature maps forming the interleaved feature map set, comprises:
    upsampling the lower-resolution final enhanced feature map in each enhanced feature map pair to obtain an upsampled feature map, a resolution of the upsampled feature map being the same as that of the higher-resolution final enhanced feature map in the enhanced feature map pair; and
    merging features of the upsampled feature map and the higher-resolution final enhanced feature map to obtain the interleaved feature map corresponding to the adjacent feature map pair, the interleaved feature maps forming the interleaved feature map set.
  12. The method according to any one of claims 1 to 11, wherein the method further comprises:
    performing superpixel segmentation on an original image to obtain a to-be-processed image containing a preset number of superpixels.
  13. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
    performing convolution processing on a to-be-processed image to obtain a multi-scale feature map set, and using the multi-scale feature map set as an input feature map set for context interleaving processing;
    performing the context interleaving processing on each adjacent feature map pair in the input feature map set to obtain an interleaved feature map set;
    using the interleaved feature map set as the input feature map set for the context interleaving processing, and returning to the step of performing the context interleaving processing on each adjacent feature map pair in the input feature map set to obtain an interleaved feature map set, until the obtained interleaved feature map set includes only one interleaved feature map; and
    performing semantic prediction on the interleaved feature map to obtain a semantic segmentation image corresponding to the to-be-processed image.
  14. The computer device according to claim 13, wherein the computer-readable instructions further cause the processor to perform the following steps:
    performing feature enhancement processing on each adjacent feature map pair in the input feature map set to obtain an enhanced feature map pair corresponding to each adjacent feature map pair; and
    merging the two final enhanced feature maps in each enhanced feature map pair to obtain the interleaved feature map corresponding to the adjacent feature map pair, the interleaved feature maps forming the interleaved feature map set.
  15. The computer device according to claim 14, wherein the computer-readable instructions further cause the processor to perform the following steps:
    performing feature enhancement processing on each to-be-interleaved feature map in each adjacent feature map pair in the input feature map set according to the corresponding adjacent feature map, to obtain the final enhanced feature map corresponding to each to-be-interleaved feature map, the adjacent feature map being the to-be-interleaved feature map that belongs to the same adjacent feature map pair as the to-be-interleaved feature map currently undergoing the feature enhancement processing; and
    combining the final enhanced feature maps of the same adjacent feature map pair into an enhanced feature map pair.
  16. The computer device according to claim 15, wherein the computer-readable instructions further cause the processor to perform the following steps:
    taking each adjacent feature map pair in the input feature map set as an input feature map pair, and taking each to-be-interleaved feature map as an input feature map;
    extracting context information of each adjacent input feature map;
    performing feature enhancement processing on the corresponding input feature map according to the context information of each adjacent input feature map, to obtain an enhanced feature map corresponding to each input feature map; and
    taking each enhanced feature map as an input feature map, combining the enhanced feature maps corresponding to the same input feature map pair into an input feature map pair, and returning to the step of extracting context information of each adjacent input feature map, until a preset total number of intra-pair interleaving iterations is reached, and taking the finally obtained enhanced feature maps as the final enhanced feature maps.
  17. One or more non-volatile storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    performing convolution processing on a to-be-processed image to obtain a multi-scale feature map set, and using the multi-scale feature map set as an input feature map set for context interleaving processing;
    performing the context interleaving processing on each adjacent feature map pair in the input feature map set to obtain an interleaved feature map set;
    using the interleaved feature map set as the input feature map set for the context interleaving processing, and returning to the step of performing the context interleaving processing on each adjacent feature map pair in the input feature map set to obtain an interleaved feature map set, until the obtained interleaved feature map set includes only one interleaved feature map; and
    performing semantic prediction on the interleaved feature map to obtain a semantic segmentation image corresponding to the to-be-processed image.
  18. The storage media according to claim 17, wherein the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    performing feature enhancement processing on each adjacent feature map pair in the input feature map set to obtain an enhanced feature map pair corresponding to each adjacent feature map pair; and
    merging the two final enhanced feature maps in each enhanced feature map pair to obtain the interleaved feature map corresponding to the adjacent feature map pair, the interleaved feature maps forming the interleaved feature map set.
  19. The storage media according to claim 18, wherein the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    performing feature enhancement processing on each to-be-interleaved feature map in each adjacent feature map pair in the input feature map set according to the corresponding adjacent feature map, to obtain the final enhanced feature map corresponding to each to-be-interleaved feature map, the adjacent feature map being the to-be-interleaved feature map that belongs to the same adjacent feature map pair as the to-be-interleaved feature map currently undergoing the feature enhancement processing; and
    combining the final enhanced feature maps of the same adjacent feature map pair into an enhanced feature map pair.
  20. The storage media according to claim 19, wherein the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    taking each adjacent feature map pair in the input feature map set as an input feature map pair, and taking each to-be-interleaved feature map as an input feature map;
    extracting context information of each adjacent input feature map;
    performing feature enhancement processing on the corresponding input feature map according to the context information of each adjacent input feature map, to obtain an enhanced feature map corresponding to each input feature map; and
    taking each enhanced feature map as an input feature map, combining the enhanced feature maps corresponding to the same input feature map pair into an input feature map pair, and returning to the step of extracting context information of each adjacent input feature map, until a preset total number of intra-pair interleaving iterations is reached, and taking the finally obtained enhanced feature maps as the final enhanced feature maps.
PCT/CN2018/110493 2018-10-16 2018-10-16 Image semantic segmentation method, computer device and storage medium WO2020077535A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/110493 WO2020077535A1 (zh) 2018-10-16 2018-10-16 Image semantic segmentation method, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/110493 WO2020077535A1 (zh) 2018-10-16 2018-10-16 Image semantic segmentation method, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2020077535A1 true WO2020077535A1 (zh) 2020-04-23

Family

ID=70283211

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/110493 WO2020077535A1 (zh) 2018-10-16 2018-10-16 Image semantic segmentation method, computer device and storage medium

Country Status (1)

Country Link
WO (1) WO2020077535A1 (zh)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766794A (zh) * 2017-09-22 2018-03-06 Tianjin University Image semantic segmentation method with learnable feature fusion coefficients
CN108230329A (zh) * 2017-12-18 2018-06-29 Sun Ying Semantic segmentation method based on multi-scale convolutional neural networks
CN108596330A (zh) * 2018-05-16 2018-09-28 Army Engineering University of PLA Parallel-feature fully convolutional neural network and construction method thereof

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507995A (zh) * 2020-04-30 2020-08-07 Liuzhou Zhishi Technology Co., Ltd. Image segmentation method based on color image pyramid and color channel classification
CN111507995B (zh) * 2020-04-30 2023-05-23 Liuzhou Zhishi Technology Co., Ltd. Image segmentation method based on color image pyramid and color channel classification


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18937461

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 05.08.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18937461

Country of ref document: EP

Kind code of ref document: A1