WO2018188270A1 - Image semantic segmentation method and apparatus (一种图像语义分割方法及装置) - Google Patents

Image semantic segmentation method and apparatus (一种图像语义分割方法及装置)

Info

Publication number
WO2018188270A1
Authority
WO
WIPO (PCT)
Prior art keywords
semantic segmentation
segmentation result
neural network
sub-convolutional neural network
Application number
PCT/CN2017/102031
Other languages
English (en)
French (fr)
Inventor
DAI Hengchen (戴恒晨)
WANG Naiyan (王乃岩)
Original Assignee
Beijing Tusen Weilai Technology Co., Ltd. (北京图森未来科技有限公司)
Application filed by Beijing Tusen Weilai Technology Co., Ltd. (北京图森未来科技有限公司)
Publication of WO2018188270A1
Priority to US16/577,753 (US11205271B2)
Priority to US17/556,900 (US11875511B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/143Segmentation; Edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/162Segmentation; Edge detection involving graph-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the present invention relates to the field of computers, and in particular, to an image semantic segmentation method and an image semantic segmentation device.
  • semantic segmentation of images is required in various application scenarios (such as object recognition, object detection, etc.).
  • the purpose of image semantic segmentation is to assign a category label to each pixel in the image, that is, to classify the image pixel by pixel.
  • graph models such as conditional random field models (ie, CRF), Markov random field models, and the like.
  • CRF is a probabilistic model based on an undirected graph; it is used to label sequence data and has strong probabilistic reasoning ability. Assuming that each pixel i has a category label y_i and an observed value x_i, each pixel is taken as a node and the relationship between pixels as an edge, forming the conditional random field shown in FIG. 1. The category label y_i of pixel i is then estimated from the corresponding observation x_i.
  • the unary potential function is derived from the output of the front-end FCN, and ψ_p(y_i, y_j) is a binary (pairwise) potential function defined over pairs of pixels.
  • the binary potential function is used to describe the relationship between pairs of pixels: it assigns the same category label to pixels with smaller differences and different category labels to pixels with larger differences. The difference between two pixels is expressed as a "distance" that depends on the color values of the two pixels and on their actual relative distance in the image.
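As a rough illustration of such a "distance"-based relationship (a sketch, not taken from the patent — the function name and the Gaussian bandwidths `sigma_color` / `sigma_pos` are hypothetical choices), a bilateral affinity that combines color difference and spatial distance could look like:

```python
import numpy as np

def pixel_affinity(color_i, color_j, pos_i, pos_j,
                   sigma_color=10.0, sigma_pos=50.0):
    """Bilateral-style affinity: close to 1 when two pixels are similar in
    both color and image position, near 0 when either difference is large."""
    color_d2 = np.sum((np.asarray(color_i, float) - np.asarray(color_j, float)) ** 2)
    pos_d2 = np.sum((np.asarray(pos_i, float) - np.asarray(pos_j, float)) ** 2)
    return np.exp(-color_d2 / (2 * sigma_color ** 2)
                  - pos_d2 / (2 * sigma_pos ** 2))
```

Under such a kernel, similar nearby pixels get a high affinity (encouraging the same label) and dissimilar or distant pixels get a low one.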
  • in this way the image tends to be segmented along boundaries, so erroneous results in the initial semantic segmentation can be corrected to a certain extent and the accuracy of the semantic segmentation result improved.
  • however, because the CRF needs to consider the correlation between every pair of pixels, the amount of computation is large, so this post-processing method is slow and inefficient.
  • the present invention provides an image semantic segmentation method and apparatus to improve semantic segmentation efficiency and accuracy.
  • An embodiment of the present invention provides an image semantic segmentation method, where the method includes:
  • Semantic segmentation of the image to obtain an initial semantic segmentation result
  • the image information including the initial semantic segmentation result is input into the pre-trained convolutional neural network for semantic segmentation post-processing to obtain the final semantic segmentation result.
  • an image semantic segmentation apparatus comprising:
  • a receiving unit configured to receive an image
  • a segmentation unit configured to perform semantic segmentation on the image to obtain an initial semantic segmentation result
  • the post-processing unit is configured to input the image information including the initial semantic segmentation result into the pre-trained convolutional neural network for semantic segmentation post-processing to obtain a final semantic segmentation result.
  • an image semantic segmentation apparatus comprising: a processor and at least one memory, wherein the memory stores at least one machine executable instruction, and the processor executes the at least one instruction to: receive an image;
  • Semantic segmentation of the image to obtain an initial semantic segmentation result
  • the image information including the initial semantic segmentation result is input into the pre-trained convolutional neural network for semantic segmentation post-processing to obtain the final semantic segmentation result.
  • the image information including the initial semantic segmentation result is input into the convolutional neural network for semantic segmentation post-processing to obtain the final semantic segmentation result. With the image segmentation scheme provided by the present invention, since the convolutional neural network is pre-trained, post-processing can be performed quickly from the image information including the initial semantic segmentation result; unlike the prior-art CRF approach, there is no need to compute the correlation between every pair of pixels in the image, which improves post-processing speed and efficiency.
  • FIG. 1 is a schematic diagram of a conditional random field in the prior art
  • FIG. 2 is a flowchart of an image semantic segmentation method according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a method for performing semantic segmentation post-processing by a convolutional neural network according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of a training convolutional neural network according to an embodiment of the present invention.
  • FIG. 5 is a second schematic diagram of a training convolutional neural network according to an embodiment of the present invention.
  • FIG. 6 is a second flowchart of a method for performing semantic segmentation post-processing by a convolutional neural network according to an embodiment of the present invention
  • FIG. 7 is a schematic diagram of semantic segmentation processing by a convolutional neural network according to an embodiment of the present invention.
  • FIG. 8 is a third flowchart of a method for performing semantic segmentation post-processing by a convolutional neural network according to an embodiment of the present invention.
  • FIG. 9 is a second schematic diagram of semantic segmentation processing by a convolutional neural network according to an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of processing a convolutional neural network after global information optimization according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of a partial edge optimization post-processing convolutional neural network according to an embodiment of the present invention.
  • FIG. 12 is a schematic structural diagram of an image semantic segmentation apparatus according to an embodiment of the present invention.
  • FIG. 13 is a schematic structural diagram of a post-processing unit according to an embodiment of the present invention.
  • FIG. 14 is another schematic structural diagram of an image semantic segmentation apparatus according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a method for image semantic segmentation according to an embodiment of the present invention, where the method includes:
  • Step 201 Receive an image.
  • Step 202 Perform semantic segmentation on the image to obtain an initial semantic segmentation result.
  • step 202 may perform semantic segmentation on the received image through a pre-trained neural network (such as a fully convolutional network), or through an image segmentation algorithm.
  • the initial semantic segmentation result may be the category label to which each pixel in the image belongs (hereinafter referred to as a label).
  • the initial semantic segmentation result input to the convolutional neural network in the embodiment of the present invention may be a confidence map (i.e., a Confidence Map) rather than a per-pixel label representation of the image. Assuming there are n category labels in total, the initial semantic segmentation result is then, for each pixel in the image, the probability values of that pixel belonging to each of the n category labels.
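For concreteness, a confidence map can be represented as an (n, H, W) array of per-pixel probabilities, for example obtained by a softmax over per-class scores; this representation is an illustrative assumption, not a detail stated in the patent:

```python
import numpy as np

def confidence_map(logits):
    """Convert raw per-class scores of shape (n, H, W) into a confidence
    map: for every pixel, a probability distribution over the n labels."""
    z = logits - logits.max(axis=0, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)
```

Each pixel's n probability values sum to 1, so the map carries strictly more information than a hard per-pixel label.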
  • Step 203 Input image information including the initial semantic segmentation result into a pre-trained convolutional neural network for semantic segmentation post-processing to obtain a final semantic segmentation result.
  • the image information including the initial semantic segmentation result is input into the convolutional neural network for semantic segmentation post-processing to obtain the final semantic segmentation result.
  • since the convolutional neural network is pre-trained, it can quickly post-process the image information including the initial semantic segmentation result; unlike the prior-art CRF approach, there is no need to compute the correlation between every pair of pixels in the image, which improves post-processing speed and efficiency.
  • the image information may only include an initial semantic segmentation result.
  • the image information includes the initial semantic segmentation result and at least one modality corresponding to the image that describes feature information of the image; the modality may include one or more of the following: visible-image modalities (e.g., RGB, HSV (Hue, Saturation, Value)), a depth modality, a CT (Computed Tomography) modality, an infrared modality, a millimeter-wave modality, and an ultrasonic modality.
  • in practice, pixels belonging to the same category label generally share the same feature information, so erroneous results in the semantic segmentation result are corrected more accurately when the image's modalities are taken into account; therefore, when the image information includes at least one modality, the scheme can further improve the accuracy of the semantic segmentation result.
  • the convolutional neural network only includes the first-order convolutional neural network, and the foregoing step 203 can be specifically implemented by the following steps A1 to A2:
  • Step A1 input image information including an initial semantic segmentation result to the first-order convolutional neural network, to obtain a modified semantic segmentation result;
  • Step A2 Obtain a final semantic segmentation result according to the modified semantic segmentation result.
  • the image information may include only an initial semantic segmentation result, or may include at least one modality including an initial semantic segmentation result and the image corresponding.
  • the modified semantic segmentation result is a semantic segmentation result obtained by correcting the erroneous result in the initial semantic segmentation result by the convolutional neural network. If the initial semantic segmentation result is a label of each pixel in the image, the modified semantic segmentation result is a label of each pixel of the image; if the initial semantic segmentation result is a Confidence Map, the modified semantic segmentation result is also a Confidence Map.
  • if the modified semantic segmentation result is a Confidence Map, the foregoing step A2 is specifically implemented as follows: for each pixel of the image, determine, according to the modified semantic segmentation result, the maximum of the probability values of the pixel belonging to each category label, and take the category label with the largest probability value as the category label to which the pixel finally belongs.
  • if the modified semantic segmentation result in the embodiment of the present invention is the label of each pixel of the image, the foregoing step A2 is specifically implemented as follows: the modified semantic segmentation result is used directly as the final semantic segmentation result.
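The two variants of step A2 can be sketched as follows, under the assumption that a confidence map is stored as an (n, H, W) array (the function name is hypothetical, not from the patent):

```python
import numpy as np

def final_labels(modified_result, is_confidence_map=True):
    """Step A2 sketch: if the modified result is an (n, H, W) confidence
    map, pick for each pixel the label with the largest probability;
    if it is already an (H, W) label map, return it unchanged."""
    if is_confidence_map:
        return np.asarray(modified_result).argmax(axis=0)
    return np.asarray(modified_result)
```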
  • in the second embodiment, the convolutional neural network likewise includes only the first-order convolutional neural network, but the convolutional neural network performs multiple iterative optimizations until the optimization requirements are met, and the final semantic segmentation result is determined according to the modified semantic segmentation result obtained in the last iteration.
  • Step 301 Input image information including an initial semantic segmentation result into the convolutional neural network, to obtain a modified semantic segmentation result;
  • Step 302: Determine whether the iteration condition is met; if yes, perform step 303; if not, perform step 304;
  • Step 303: Use the modified semantic segmentation result as the initial semantic segmentation result in the image information and repeat the foregoing step 301; that is, the initial semantic segmentation result in step 301 becomes the modified semantic segmentation result previously obtained in step 301;
  • Step 304 Determine to stop the iteration, and obtain a final semantic segmentation result according to the modified semantic segmentation result.
  • the modified semantic segmentation result is a semantic segmentation result obtained by correcting the erroneous result in the initial semantic segmentation result by the convolutional neural network. If the initial semantic segmentation result is a label of each pixel in the image, the modified semantic segmentation result is a label of each pixel of the image; if the initial semantic segmentation result is a Confidence Map, the modified semantic segmentation result is also a Confidence Map.
  • if the modified semantic segmentation result is a Confidence Map, the foregoing step 304 is specifically implemented as: for each pixel of the image, determine, according to the modified semantic segmentation result obtained in the last iteration of the convolutional neural network, the maximum of the probability values of the pixel belonging to each category label, and take the category label with the largest probability value as the category label to which the pixel finally belongs.
  • if the modified semantic segmentation result is the label of each pixel of the image, the foregoing step 304 is specifically implemented as: the modified semantic segmentation result obtained in the last iteration of the convolutional neural network is used as the final semantic segmentation result.
  • the image information may only include an initial semantic segmentation result, and may also include an initial semantic segmentation result and at least one modality corresponding to the image.
  • the iteration condition may be based on whether the cumulative number of iterations has reached a preset threshold, or on whether the modified semantic segmentation result currently output by the convolutional neural network and the previously output semantic segmentation result satisfy a convergence condition; the present application does not strictly limit this.
  • the foregoing step 302 determines whether the iteration condition is met, and can be implemented by, but not limited to, the following two methods:
  • Mode 1: Determine whether the cumulative number of iterations reaches the preset threshold; if yes, determine that the iteration condition is not met; otherwise, determine that the iteration condition is met. The number of iterations may be counted by a counter that is incremented by one for each iteration.
  • Mode 2: Determine, from the modified semantic segmentation result currently output by the convolutional neural network and the previously output semantic segmentation result, whether the convergence condition is satisfied; if yes, determine that the iteration condition is not met; otherwise, determine that the iteration condition is met.
  • the convolutional neural network in the foregoing first embodiment and the second embodiment can be obtained by training a large number of sample images in advance.
  • the category label to which each pixel in the sample image belongs is marked in advance, and the training process is as shown in FIG. 4 .
  • the category labels of each pixel in the sample image are labeled in advance, and each modal value corresponding to the sample image is determined, and training is performed. The process is shown in Figure 5.
  • in this embodiment, the convolutional neural network is composed of at least two levels of sub-convolutional neural networks connected in cascade, and the structures of the sub-convolutional neural networks at different levels may be the same or different.
  • preferably, the structures of the sub-convolutional neural networks at different levels are different; the modalities corresponding to the sub-convolutional neural networks at each level may likewise be the same or different. The present application does not strictly limit this: those skilled in the art may flexibly configure the sub-networks at each level according to actual needs, so that sub-networks at different levels emphasize different aspects when optimizing the initial semantic segmentation result. More preferably, when the sub-network structures at each level are the same, the modalities corresponding to the sub-networks at each level are set to be partially the same or completely different; when the sub-network structures at different levels are different, the modalities corresponding to the sub-networks at each level may be set to be identical, partially the same, or completely different.
  • step 203 can be specifically implemented by the following steps B1 to B2, where:
  • Step B1: For each level of sub-convolutional neural network in cascading order, perform the following: input the initial semantic segmentation result into the sub-convolutional neural network at the current level to obtain a modified semantic segmentation result, and use the modified semantic segmentation result as the initial semantic segmentation result of the next-level sub-convolutional neural network.
  • the initial semantic segmentation result input to the first-level sub-convolutional neural network is the initial semantic segmentation result obtained in the foregoing step 202; the initial semantic segmentation result of each subsequent level is the modified semantic segmentation result output by the previous-level sub-convolutional neural network.
  • Step B2 Determine a final semantic segmentation result according to a modified semantic segmentation result output by the last-level sub-convolution neural network.
  • the foregoing step 203 can be specifically implemented by the following steps C1 to C2, where:
  • Step C1: For each level of sub-convolutional neural network in cascading order, perform the following: input the initial semantic segmentation result and the modalities corresponding to the current-level sub-convolutional neural network into that sub-network to obtain a modified semantic segmentation result, and use the modified semantic segmentation result as the initial semantic segmentation result of the next-level sub-convolutional neural network.
  • the initial semantic segmentation result input to the first-level sub-convolutional neural network is the initial semantic segmentation result obtained in the foregoing step 202; the initial semantic segmentation result of each subsequent level is the modified semantic segmentation result output by the previous-level sub-convolutional neural network.
  • Step C2 Determine a final semantic segmentation result according to a modified semantic segmentation result output by the last-level sub-convolution neural network.
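Steps C1 and C2 amount to a pipeline in which each level's output becomes the next level's input. In the sketch below, `subnets` and `modalities` are hypothetical stand-ins for the trained per-level sub-networks and their modality data:

```python
def cascade_postprocess(initial_result, subnets, modalities):
    """Step C1 sketch: run the cascaded sub-networks in order. Each subnet
    takes (current result, its own modality inputs) and returns a modified
    result, which is fed to the next level; step C2 then derives the final
    labels from the last level's output."""
    result = initial_result
    for subnet, mods in zip(subnets, modalities):
        result = subnet(result, mods)
    return result
```

Because each level receives its own modality list, different levels can correct different kinds of errors (for example one level using depth, another using CT).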
  • the foregoing steps C1 to C2 can be implemented in more detail through the method flow shown in FIG. 6.
  • the method process includes:
  • Step 601: Input the initial semantic segmentation result and the modalities corresponding to the current-level sub-convolutional neural network into that sub-network to obtain a modified semantic segmentation result;
  • Step 602: Determine whether the current-level sub-convolutional neural network is the last-level sub-convolutional neural network; if not, perform step 603; if yes, perform step 604;
  • Step 603: Use the modified semantic segmentation result as the initial semantic segmentation result of the next-level sub-convolutional neural network, take the next-level sub-convolutional neural network as the current-level sub-convolutional neural network, and perform step 601;
  • Step 604: Obtain the final semantic segmentation result according to the modified semantic segmentation result of the current-level sub-convolutional neural network.
  • the modified semantic segmentation result is a semantic segmentation result obtained after the sub-convolutional neural network corrects the erroneous results in the initial semantic segmentation result input to it. If the initial semantic segmentation result is a label of each pixel in the image, the modified semantic segmentation result is a label of each pixel of the image; if the initial semantic segmentation result is a Confidence Map, the modified semantic segmentation result is also a Confidence Map.
  • if the modified semantic segmentation result is a Confidence Map, step B2 and step C2 are specifically implemented as: for each pixel of the image, determine, according to the modified semantic segmentation result output by the last-level sub-convolutional neural network, the maximum of the probability values of the pixel belonging to each category label, and take the category label with the largest probability value as the category label to which the pixel finally belongs.
  • if the modified semantic segmentation result is the label of each pixel of the image, step B2 and step C2 are specifically implemented as: the modified semantic segmentation result output by the last-level sub-convolutional neural network is used as the final semantic segmentation result.
  • the sub-convolution neural network at each level can be independently trained in advance.
  • the training mode of each sub-convolutional neural network is as shown in FIG. 4.
  • alternatively, the training method for each sub-convolutional neural network is as shown in FIG. 5, in which case the modalities of the sample images used to train each sub-convolutional neural network correspond to that sub-network's own modalities.
  • for example, assume the convolutional neural network includes a first-level sub-convolutional neural network and a second-level sub-convolutional neural network, where the modalities corresponding to the first-level sub-network are the depth modality and the RGB modality, and the modalities corresponding to the second-level sub-network are the RGB modality and the CT modality.
  • the training data of the first-level sub-convolutional neural network is then the initial semantic segmentation result, the RGB modality, and the depth modality of the sample images, and the training data of the second-level sub-convolutional neural network is the initial semantic segmentation result, the RGB modality, and the CT modality of the sample images.
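In such a setup, each level's network input can be formed by stacking the confidence map with that level's modality channels. The shapes below (n = 5 labels, a 32×32 image, 3-channel RGB, 1-channel depth) are illustrative assumptions, not values from the patent:

```python
import numpy as np

def build_input(conf, modality_maps):
    """Stack an (n, H, W) confidence map with each modality's channel maps
    (each shaped (c_k, H, W)) along the channel axis, yielding one input
    tensor for a sub-network."""
    return np.concatenate([conf] + list(modality_maps), axis=0)

# Hypothetical shapes for the first-level sub-network (RGB + depth):
conf = np.zeros((5, 32, 32))
rgb = np.zeros((3, 32, 32))
depth = np.zeros((1, 32, 32))
level1_input = build_input(conf, [rgb, depth])
```

The second level would be built the same way, substituting its own modality maps (e.g. RGB + CT) for the depth channel.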
  • the modalities corresponding to the respective sub-convolution neural networks include the visible image modality.
  • the process of post-processing by the sub-convolution neural network including at least two stages can be as shown in FIG. 7 .
  • in this embodiment, the convolutional neural network is likewise composed of at least two levels of sub-convolutional neural networks, and the structures of the sub-convolutional neural networks at different levels may be the same or different.
  • preferably, the structures of the sub-convolutional neural networks at different levels are different; the modalities corresponding to the sub-convolutional neural networks at each level may likewise be the same or different. The present application does not strictly limit this: those skilled in the art may flexibly configure the sub-networks at each level according to actual needs, so that sub-networks at different levels emphasize different aspects when optimizing the initial semantic segmentation result. More preferably, when the sub-network structures at each level are the same, the modalities corresponding to the sub-networks at each level are set to be partially the same or completely different; when the sub-network structures at different levels are different, the modalities corresponding to the sub-networks at each level may be set to be identical, partially the same, or completely different.
  • in this embodiment, each level of sub-convolutional neural network performs multiple iterative optimizations, and the modified semantic segmentation result obtained in the last iteration at one level is used as the initial semantic segmentation result of the next-level sub-convolutional neural network.
  • the number of iterations of the sub-convolutional neural network at each level may be the same or different; those skilled in the art may set it flexibly according to actual needs, and the present application does not strictly limit it.
  • step 203 can be specifically implemented by the following steps D1 to D2, where:
  • Step D1: For each level of sub-convolutional neural network in cascading order, perform the following: input the initial semantic segmentation result into the sub-convolutional neural network at the current level to obtain a modified semantic segmentation result; determine whether the iteration condition is met; if not, stop iterating and use the modified semantic segmentation result as the initial semantic segmentation result of the next-level sub-convolutional neural network; if so, use the modified semantic segmentation result as the initial semantic segmentation result of the current-level sub-convolutional neural network and repeat the step of inputting the initial semantic segmentation result into the current-level sub-network;
  • Step D2 Determine a final semantic segmentation result according to a modified semantic segmentation result output by the last-level sub-convolution neural network.
  • step 203 can be specifically implemented by the following steps E1 to E2, wherein:
  • Step E1: For each level of sub-convolutional neural network in cascading order, perform the following: input the initial semantic segmentation result and the modalities corresponding to the current-level sub-convolutional neural network into that sub-network to obtain a modified semantic segmentation result; determine whether the iteration condition is met; if not, stop iterating and use the modified semantic segmentation result as the initial semantic segmentation result of the next-level sub-convolutional neural network; if so, use the modified semantic segmentation result as the initial semantic segmentation result of the current-level sub-network and repeat the step of inputting the initial semantic segmentation result and the corresponding modalities into the current-level sub-network;
  • Step E2 Determine a final semantic segmentation result according to a modified semantic segmentation result output by the last-level sub-convolution neural network.
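Steps D1/E1 combine the cascade with per-level iteration. The sketch below uses a fixed per-level iteration count as the stopping condition (one of the two modes the text allows); the function and parameter names are hypothetical:

```python
def cascade_with_iteration(initial_result, subnets, iters_per_level):
    """Step D1 sketch: each level feeds its own output back into itself
    for a set number of iterations; the last iterate becomes the next
    level's initial semantic segmentation result."""
    result = initial_result
    for subnet, n_iter in zip(subnets, iters_per_level):
        for _ in range(n_iter):
            result = subnet(result)
    return result
```

A convergence-based variant would replace the inner fixed-count loop with the mode-2 check described earlier.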
  • the foregoing steps E1 to E2 can be implemented in more detail through the method flow shown in FIG. 8.
  • the method process includes:
  • Step 801: Input the initial semantic segmentation result and the modality corresponding to the current-level sub-convolutional neural network into the current-level sub-convolutional neural network to obtain a modified semantic segmentation result;
  • Step 802: Determine whether the iteration condition is satisfied; if not, perform step 803; if yes, perform step 804;
  • The number of iterations of the current-level sub-convolutional neural network is counted by a counter that is incremented by one on each iteration; when the iteration of the current-level sub-convolutional neural network is completed, the counter is cleared.
  • Step 803: Determine whether the current-level sub-convolutional neural network is the last-level sub-convolutional neural network; if yes, perform step 806; otherwise, perform step 805;
  • Step 804: Use the modified semantic segmentation result as the initial semantic segmentation result of the current-level sub-convolutional neural network, and repeat the foregoing step 801;
  • Step 805: Determine to stop the iteration, use the modified semantic segmentation result as the initial semantic segmentation result of the next-level sub-convolutional neural network, take the next-level sub-convolutional neural network as the current-level sub-convolutional neural network, and perform step 801;
  • Step 806: Obtain the final semantic segmentation result according to the modified semantic segmentation result of the current-level sub-convolutional neural network.
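The cascaded iterative flow of steps 801-806 can be sketched as follows; the toy sub-network, the modality values and the fixed iteration threshold are illustrative placeholders, not the patent's actual networks:

```python
def post_process(initial_result, sub_networks, modalities, max_iters=2):
    """Steps 801-806: each level iterates on its own output until the
    iteration condition fails, then hands the result to the next level."""
    result = initial_result
    for net, modality in zip(sub_networks, modalities):
        count = 0                            # per-level counter, cleared each level
        while True:
            result = net(result, modality)   # step 801: modified result
            count += 1
            if count >= max_iters:           # step 802: condition no longer met
                break                        # steps 803/805: move on to next level
    return result                            # step 806: final result

# toy sub-network: average the current result with the modality values
blend = lambda r, m: [0.5 * (a + b) for a, b in zip(r, m)]
```

For example, with two identical levels, `post_process([0.0, 1.0], [blend, blend], [[1.0, 1.0], [0.0, 0.0]])` runs each level twice and returns `[0.1875, 0.25]`.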
  • The modified semantic segmentation result is the semantic segmentation result obtained after the sub-convolutional neural network corrects the erroneous results in the initial semantic segmentation result input to it. If the initial semantic segmentation result is the label of each pixel in the image, the modified semantic segmentation result is also the label of each pixel of the image; if the initial semantic segmentation result is a confidence map (Confidence Map), the modified semantic segmentation result is also a confidence map.
  • In this case, step D2 and step E2 are specifically implemented as follows: for each pixel of the image, determine, according to the modified semantic segmentation result obtained in the last iteration of the last-level sub-convolutional neural network, the maximum of the probability values of the pixel belonging to each category label, and take the category label with the largest probability value as the category label to which the pixel ultimately belongs.
  • Alternatively, step D2 and step E2 are specifically implemented as follows: take the modified semantic segmentation result obtained in the last iteration of the last-level sub-convolutional neural network as the final semantic segmentation result.
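When the modified semantic segmentation result is a confidence map, the per-pixel argmax described above can be illustrated as follows (the class names and probability values are made-up examples):

```python
def final_labels(confidence_map, class_names):
    """For each pixel, take the category label with the largest probability
    value in the modified semantic segmentation result (a confidence map)."""
    labels = []
    for pixel_probs in confidence_map:    # one probability vector per pixel
        best = max(range(len(class_names)), key=lambda c: pixel_probs[c])
        labels.append(class_names[best])
    return labels

# two pixels, three classes
print(final_labels([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]], ["road", "car", "tree"]))
# → ['car', 'road']
```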
  • The iteration condition may be that the cumulative number of iterations reaches a preset threshold, or that the modified semantic segmentation result currently output by the sub-convolutional neural network and the previously output semantic segmentation result satisfy a convergence condition; this application does not strictly limit it.
  • In step D1 and step E1, whether the iteration condition is satisfied can be determined by, but not limited to, the following two methods:
  • Method 1: Determine whether the cumulative number of iterations reaches the preset threshold; if yes, determine that the iteration condition is not satisfied; otherwise, determine that the iteration condition is satisfied. The number of iterations is counted by a counter that is incremented by one on each iteration, and the counter is cleared when the current-level sub-convolutional neural network finishes iterating;
  • Method 2: Determine whether the convergence condition is satisfied according to the modified semantic segmentation result currently output by the current-level sub-convolutional neural network and the previously output semantic segmentation result; if yes, determine that the iteration condition is not satisfied; otherwise, determine that the iteration condition is satisfied.
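A minimal sketch of the two checks, assuming a fixed count threshold and a max-absolute-difference convergence test (both thresholds are illustrative defaults); the function returns True while iteration should continue:

```python
def iteration_condition(prev_result, curr_result, count,
                        max_count=5, tol=1e-3):
    """Method 1: the condition fails once the cumulative iteration count
    reaches the preset threshold. Method 2: it fails once successive
    outputs have converged (largest per-element change below tol)."""
    if count >= max_count:                          # Method 1
        return False
    if prev_result is not None:                     # Method 2
        change = max(abs(a - b) for a, b in zip(prev_result, curr_result))
        if change < tol:
            return False
    return True
```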
  • the sub-convolution neural networks at each level are independently trained in advance.
  • When the image information contains the initial semantic segmentation result and at least one modality, each sub-convolutional neural network is trained on the initial semantic segmentation result and the at least one modality; for the training method of each sub-convolutional neural network, refer to the method shown in FIG. 5.
  • The modalities used to train each sub-convolutional neural network are those to which that sub-convolutional neural network corresponds. For example, suppose the convolutional neural network includes a first-level sub-convolutional neural network and a second-level sub-convolutional neural network, the modalities corresponding to the first-level sub-convolutional neural network are the depth modality and the RGB modality, and the modalities corresponding to the second-level sub-convolutional neural network are the RGB modality and the CT modality. Then the training data for the first-level sub-convolutional neural network is the initial semantic segmentation result of the sample image together with its depth modality and RGB modality, and the training data for the second-level sub-convolutional neural network is the initial semantic segmentation result of the sample image together with its RGB modality and CT modality.
  • the modalities corresponding to the respective sub-convolution neural networks include the visible image modality.
  • The process of post-processing by a convolutional neural network comprising at least two levels of sub-convolutional neural networks can be as shown in FIG. 9, where the convolutional neural network is composed of two levels of sub-convolutional neural networks: the first-level sub-convolutional neural network is a global-information-optimization post-processing convolutional neural network, and the second-level sub-convolutional neural network is a local-edge-optimization post-processing convolutional neural network.
  • The structure of the global-information-optimization post-processing convolutional neural network may be as shown in FIG. 10: global information is obtained by rapid downsampling, and the erroneous results are then corrected by upsampling that combines the global information with low-level information. The structure of the local-edge-optimization post-processing convolutional neural network may be as shown in FIG. 11.
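The downsample-then-upsample idea behind the global-information network can be caricatured in one dimension; the pooling window and the 50/50 fusion weights below are arbitrary illustration choices, not the network's actual parameters:

```python
def global_refine(confidence, window=4):
    """Downsample by window averaging to obtain a coarse global summary,
    then upsample (nearest neighbour) and blend it with the original
    low-level values, so isolated outliers are pulled toward context."""
    coarse = [sum(confidence[i:i + window]) / window
              for i in range(0, len(confidence), window)]
    return [0.5 * confidence[i] + 0.5 * coarse[i // window]
            for i in range(len(confidence))]

print(global_refine([1.0, 1.0, 1.0, 0.0]))
# → [0.875, 0.875, 0.875, 0.375]  (the lone 0.0 is lifted toward its context)
```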
  • an embodiment of the present invention provides an image semantic segmentation device.
  • the structure of the device is as shown in FIG. 12, and includes:
  • the receiving unit 11 is configured to receive an image
  • a dividing unit 12 configured to perform semantic segmentation on the image to obtain an initial semantic segmentation result
  • the post-processing unit 13 is configured to input the image information including the initial semantic segmentation result into the pre-trained convolutional neural network for semantic segmentation post-processing to obtain a final semantic segmentation result.
  • the image information may only include an initial semantic segmentation result, and may also include an initial semantic segmentation result and at least one modality corresponding to the image describing feature information of the image.
  • the schematic diagram of the structure of the post-processing unit 13 is as shown in FIG. 13 , and specifically includes:
  • a correction sub-unit 131 configured to input image information into the convolutional neural network to obtain a modified semantic segmentation result
  • the determining sub-unit 132 is configured to determine whether the iterative condition is satisfied, if yes, the first processing sub-unit 133 is triggered, and if not, the second processing sub-unit 134 is triggered;
  • the first processing sub-unit 133 is configured to use the modified semantic segmentation result as an initial semantic segmentation result, and trigger the correction sub-unit 131;
  • the second processing sub-unit 134 is configured to determine to stop the iteration, and obtain a final semantic segmentation result according to the modified semantic segmentation result.
  • In an embodiment, the determining sub-unit 132 is specifically configured to: determine whether the cumulative number of iterations reaches the preset threshold, and if yes, determine that the iteration condition is not satisfied, otherwise determine that the iteration condition is satisfied; or, determine whether the convergence condition is satisfied according to the modified semantic segmentation result currently output by the convolutional neural network and the previously output semantic segmentation result, and if yes, determine that the iteration condition is not satisfied, otherwise determine that the iteration condition is satisfied.
  • In an embodiment, the convolutional neural network is composed of at least two levels of sub-convolutional neural networks; in this case, the post-processing unit 13 can include a third processing sub-unit and a fourth processing sub-unit, wherein:
  • the third processing sub-unit is configured to, following the cascading order, perform the following steps for each level of sub-convolutional neural network in turn: input the initial semantic segmentation result into the current-level sub-convolutional neural network to obtain a modified semantic segmentation result, and use the modified semantic segmentation result as the initial semantic segmentation result of the next-level sub-convolutional neural network;
  • the fourth processing sub-unit is configured to determine a final semantic segmentation result according to the modified semantic segmentation result output by the last-level sub-convolution neural network.
  • In an embodiment, the convolutional neural network is composed of at least two levels of sub-convolutional neural networks; in this case, the post-processing unit 13 can include a fifth processing sub-unit and a sixth processing sub-unit, wherein:
  • the fifth processing sub-unit is configured to, following the cascading order, perform the following steps for each level of sub-convolutional neural network in turn: input the initial semantic segmentation result into the current-level sub-convolutional neural network to obtain a modified semantic segmentation result; determine whether the iteration condition is satisfied; if not satisfied, determine to stop the iteration and use the modified semantic segmentation result as the initial semantic segmentation result of the next-level sub-convolutional neural network; if satisfied, use the modified semantic segmentation result as the initial semantic segmentation result of the current-level sub-convolutional neural network, and repeat the step of inputting the initial semantic segmentation result into the current-level sub-convolutional neural network;
  • the sixth processing sub-unit is configured to determine a final semantic segmentation result according to the modified semantic segmentation result output by the last-level sub-convolution neural network.
  • In an embodiment, the convolutional neural network is composed of at least two levels of sub-convolutional neural networks; in this case, the post-processing unit 13 specifically includes a seventh processing sub-unit and an eighth processing sub-unit, wherein:
  • the seventh processing sub-unit is configured to, following the cascading order, perform the following steps for each level of sub-convolutional neural network in turn: input the initial semantic segmentation result and the modality corresponding to the current-level sub-convolutional neural network into the current-level sub-convolutional neural network to obtain a modified semantic segmentation result, and use the modified semantic segmentation result as the initial semantic segmentation result of the next-level sub-convolutional neural network;
  • the eighth processing sub-unit is configured to determine a final semantic segmentation result according to the modified semantic segmentation result output by the last-level sub-convolution neural network.
  • In an embodiment, the convolutional neural network is composed of at least two levels of sub-convolutional neural networks; in this case, the post-processing unit 13 specifically includes a ninth processing sub-unit and a tenth processing sub-unit, wherein:
  • the ninth processing sub-unit is configured to, following the cascading order, perform the following steps for each level of sub-convolutional neural network in turn: input the initial semantic segmentation result and the modality corresponding to the current-level sub-convolutional neural network into the current-level sub-convolutional neural network to obtain a modified semantic segmentation result; determine whether the iteration condition is satisfied; if not satisfied, determine to stop the iteration and use the modified semantic segmentation result as the initial semantic segmentation result of the next-level sub-convolutional neural network; if satisfied, use the modified semantic segmentation result as the initial semantic segmentation result of the current-level sub-convolutional neural network, and repeat the foregoing step of inputting the initial semantic segmentation result and the corresponding modality into the current-level sub-convolutional neural network;
  • the tenth processing sub-unit is configured to determine a final semantic segmentation result according to the modified semantic segmentation result output by the last-level sub-convolution neural network.
  • In an embodiment, the convolutional neural network is composed of two levels of sub-convolutional neural networks, where the first-level sub-convolutional neural network is a global-information-optimization post-processing convolutional neural network and the second-level sub-convolutional neural network is a local-edge-optimization post-processing convolutional neural network.
  • In an embodiment, the initial semantic segmentation result is a confidence map (i.e., a Confidence Map), or the initial semantic segmentation result is the category label to which each pixel in the image belongs.
  • The modality types include one or more of the following: visible image modalities (e.g., RGB modality, HSV modality), depth modality, computed tomography (CT) modality, infrared modality, millimeter-wave modality and ultrasonic modality.
  • In the technical solution of the present invention, since the convolutional neural network is pre-trained, post-processing can be performed quickly according to the image information including the initial semantic segmentation result, without needing, as in the prior-art CRF approach, to compute the correlation between the individual pixels in the image for post-processing, which improves the post-processing speed and efficiency;
  • moreover, the data input to the convolutional neural network includes not only the initial semantic segmentation result but also at least one modality representing the feature information of each pixel in the image, such as the depth modality and the RGB modality. In practice, pixels belonging to the same category label generally have the same feature information, so correcting the erroneous results in the semantic segmentation result with the image's modalities is more accurate. Therefore, when at least one modality is included in the image information, the scheme can further improve the accuracy of the semantic segmentation result.
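One plausible way to realize this, assuming each modality is supplied as a list of 2-D channels, is simply to stack the confidence map's per-class channels with the modality channels as the network input; this channel layout is an assumption for illustration, not the patent's specification:

```python
def build_input(confidence_map, modalities):
    """Stack the initial semantic segmentation result (one confidence
    channel per class) with every channel of every modality, giving a
    (num_classes + total_modality_channels) x H x W input."""
    return list(confidence_map) + [ch for m in modalities for ch in m]

conf = [[[0.2]], [[0.8]]]              # 2 classes, 1x1 image
rgb = [[[10]], [[20]], [[30]]]         # 3-channel RGB modality
depth = [[[5]]]                        # 1-channel depth modality
x = build_input(conf, [rgb, depth])
print(len(x))    # → 6 input channels
```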
  • An embodiment of the present invention provides an image semantic segmentation device. The device is structured as shown in FIG. 14 and includes a processor 1401 and at least one memory 1402, the at least one memory 1402 storing at least one machine-executable instruction. The processor 1401 executes the at least one instruction to: receive an image; perform semantic segmentation on the image to obtain an initial semantic segmentation result; and input image information including the initial semantic segmentation result into a pre-trained convolutional neural network for semantic segmentation post-processing to obtain a final semantic segmentation result.
  • the image information further includes at least one modality corresponding to the image that describes feature information of the image.
  • In an embodiment, the processor 1401 executes the at least one instruction to input the image information including the initial semantic segmentation result into the pre-trained convolutional neural network for semantic segmentation post-processing to obtain the final semantic segmentation result, which specifically includes: inputting the image information into the convolutional neural network to obtain a modified semantic segmentation result; determining whether the iteration condition is satisfied; if satisfied, using the modified semantic segmentation result as the initial semantic segmentation result in the image information and repeating the foregoing step of inputting the image information into the convolutional neural network; if not satisfied, determining to stop the iteration and obtaining the final semantic segmentation result according to the modified semantic segmentation result.
  • In an embodiment, the convolutional neural network is composed of at least two levels of sub-convolutional neural networks; then the processor 1401 executes the at least one instruction to input the image information including the initial semantic segmentation result into the pre-trained convolutional neural network for semantic segmentation post-processing to obtain the final semantic segmentation result, which includes: following the cascading order, performing the following steps for each level of sub-convolutional neural network in turn: inputting the initial semantic segmentation result into the current-level sub-convolutional neural network to obtain a modified semantic segmentation result, and using the modified semantic segmentation result as the initial semantic segmentation result of the next-level sub-convolutional neural network; and determining the final semantic segmentation result according to the modified semantic segmentation result output by the last-level sub-convolutional neural network.
  • In an embodiment, the convolutional neural network is composed of at least two levels of sub-convolutional neural networks; then the processor 1401 executes the at least one instruction to input the image information including the initial semantic segmentation result into the pre-trained convolutional neural network for semantic segmentation post-processing to obtain the final semantic segmentation result, which includes: following the cascading order, performing the following steps for each level of sub-convolutional neural network in turn: inputting the initial semantic segmentation result into the current-level sub-convolutional neural network to obtain a modified semantic segmentation result; determining whether the iteration condition is satisfied; if not satisfied, determining to stop the iteration and using the modified semantic segmentation result as the initial semantic segmentation result of the next-level sub-convolutional neural network; if satisfied, using the modified semantic segmentation result as the initial semantic segmentation result of the current-level sub-convolutional neural network and repeating the step of inputting the initial semantic segmentation result into the current-level sub-convolutional neural network; and determining the final semantic segmentation result according to the modified semantic segmentation result output by the last-level sub-convolutional neural network.
  • In an embodiment, the convolutional neural network is composed of at least two levels of sub-convolutional neural networks; then the processor 1401 executes the at least one instruction to input the image information including the initial semantic segmentation result into the pre-trained convolutional neural network for semantic segmentation post-processing to obtain the final semantic segmentation result, which includes: following the cascading order, performing the following steps for each level of sub-convolutional neural network in turn: inputting the initial semantic segmentation result and the modality corresponding to the current-level sub-convolutional neural network into the current-level sub-convolutional neural network to obtain a modified semantic segmentation result, and using the modified semantic segmentation result as the initial semantic segmentation result of the next-level sub-convolutional neural network; and determining the final semantic segmentation result according to the modified semantic segmentation result output by the last-level sub-convolutional neural network.
  • In an embodiment, the convolutional neural network is composed of at least two levels of sub-convolutional neural networks; then the processor 1401 executes the at least one instruction to input the image information including the initial semantic segmentation result into the pre-trained convolutional neural network for semantic segmentation post-processing to obtain the final semantic segmentation result, which includes: following the cascading order, performing the following steps for each level of sub-convolutional neural network in turn: inputting the initial semantic segmentation result and the modality corresponding to the current-level sub-convolutional neural network into the current-level sub-convolutional neural network to obtain a modified semantic segmentation result; determining whether the iteration condition is satisfied; if not satisfied, determining to stop the iteration and using the modified semantic segmentation result as the initial semantic segmentation result of the next-level sub-convolutional neural network; if satisfied, using the modified semantic segmentation result as the initial semantic segmentation result of the current-level sub-convolutional neural network and repeating the foregoing step of inputting the initial semantic segmentation result and the corresponding modality into the current-level sub-convolutional neural network; and determining the final semantic segmentation result according to the modified semantic segmentation result output by the last-level sub-convolutional neural network.
  • In an embodiment, the convolutional neural network is composed of two levels of sub-convolutional neural networks, where the first-level sub-convolutional neural network is a global-information-optimization post-processing convolutional neural network and the second-level sub-convolutional neural network is a local-edge-optimization post-processing convolutional neural network.
  • In an embodiment, the processor 1401 executes the at least one instruction to determine whether the iteration condition is satisfied, which specifically includes: determining whether the cumulative number of iterations reaches a preset threshold, and if yes, determining that the iteration condition is not satisfied, otherwise determining that the iteration condition is satisfied; or, determining whether the convergence condition is satisfied according to the modified semantic segmentation result currently output by the current-level sub-convolutional neural network and the previously output semantic segmentation result, and if yes, determining that the iteration condition is not satisfied, otherwise determining that the iteration condition is satisfied.
  • In an embodiment, the initial semantic segmentation result is a confidence map, or the initial semantic segmentation result is the category label to which each pixel in the image belongs.
  • The modality categories corresponding to the image include one or more of the following: visible image modality, depth modality, computed tomography (CT) modality, infrared modality, millimeter-wave modality and ultrasonic modality.
  • An embodiment of the present invention further provides a storage medium (which may be a non-volatile machine-readable storage medium) storing a computer program for image semantic segmentation, the computer program having code segments configured to perform the steps of: receiving an image; performing semantic segmentation on the image to obtain an initial semantic segmentation result; and inputting image information including the initial semantic segmentation result into a pre-trained convolutional neural network for semantic segmentation post-processing to obtain a final semantic segmentation result.
  • An embodiment of the present invention further provides a computer program having code segments configured to perform image semantic segmentation: receiving an image; performing semantic segmentation on the image to obtain an initial semantic segmentation result; and inputting image information including the initial semantic segmentation result into a pre-trained convolutional neural network for semantic segmentation post-processing to obtain a final semantic segmentation result.
  • In the technical solution of the present invention, since the convolutional neural network is pre-trained, post-processing can be performed quickly according to the image information including the initial semantic segmentation result, without needing, as in the prior-art CRF approach, to compute the correlation between the individual pixels in the image for post-processing, which improves the post-processing speed and efficiency.
  • each functional unit in each embodiment of the present invention may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer readable storage medium.
  • Embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage) in which computer-usable program code is embodied.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.


Abstract

The present invention discloses an image semantic segmentation method and device, to solve the problem in the prior art that image semantic segmentation is slow and inefficient. The method includes: receiving an image; performing semantic segmentation on the image to obtain an initial semantic segmentation result; and inputting image information including the initial semantic segmentation result into a pre-trained convolutional neural network for semantic segmentation post-processing to obtain a final semantic segmentation result. The technical solution of the present invention post-processes the initial semantic segmentation result through a convolutional neural network, which can improve the speed and efficiency of image semantic segmentation.

Description

Image semantic segmentation method and device

This application claims priority to Chinese patent application No. 201710247372.8, filed with the Chinese Patent Office on April 14, 2017 and entitled "Image semantic segmentation method and device", the entire contents of which are incorporated herein by reference.
Technical Field

The present invention relates to the field of computers, and in particular to an image semantic segmentation method and an image semantic segmentation device.
Background

At present, semantic segmentation of images is required in various application scenarios (e.g., object recognition, object detection). The purpose of image semantic segmentation is to classify each pixel in an image, that is, to assign a category label to each pixel.

Because the correlation between pixels is weak, the initial semantic segmentation result obtained by existing image semantic segmentation methods based on conventional deep learning is still inaccurate. Further post-processing of the initial semantic segmentation result is therefore needed, through which the erroneous results in the initial semantic segmentation result are corrected.

At present, the most widely used post-processing approaches are graphical models, such as the conditional random field model (i.e., CRF) and the Markov random field model.
A CRF is a probabilistic model based on an undirected graph; it is used to label sequence data and has strong probabilistic inference capability. Assume that each pixel i has a category label yi and an observed value xi. Taking each pixel as a node and the relationship between pixels as edges, a conditional random field as shown in FIG. 1 is formed, and the category label xi corresponding to pixel i is inferred from the observed variable yi of that pixel.
The conditional random field obeys a Gibbs distribution:

P(X = x | I) = (1/Z(I)) · exp(−E(x|I))

where x is the aforementioned observed value and E(x|I) is the energy function. For brevity, omitting the global observation I from the energy function gives:

E(x) = Σi ψu(xi) + Σi<j ψp(xi, xj)

where ψu(xi) is the unary potential, which comes from the output of the front-end FCN, and ψp(xi, xj) is the binary potential, which is specifically as follows:

ψp(xi, xj) = μ(xi, xj) · [ w(1) · exp(−‖pi − pj‖²/(2θα²) − ‖Ii − Ij‖²/(2θβ²)) + w(2) · exp(−‖pi − pj‖²/(2θγ²)) ]

where μ(xi, xj) is the label compatibility term, pi denotes the position of pixel i and Ii its color value.
The binary potential describes the relationship between pixels: it assigns the same category label to pixels with small differences and different category labels to pixels with large differences. The difference between two pixels is evaluated by a "distance", which depends on the color values of the two pixels and their actual relative distance.

Through the CRF, the image is segmented at boundaries as far as possible, so the erroneous results in the initial semantic segmentation result can be corrected to a certain extent, improving the accuracy of the semantic segmentation result. However, because the CRF needs to consider the pairwise correlation between pixels, the amount of computation is large, so this post-processing approach is slow and inefficient.
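The binary potential described above (an appearance kernel over position and color plus a smoothness kernel over position alone) can be sketched as follows; the weights and bandwidths are illustrative defaults, and μ is taken as the simple Potts compatibility:

```python
from math import exp

def pairwise_potential(label_i, label_j, pos_i, pos_j, color_i, color_j,
                       w1=1.0, w2=1.0, theta_a=1.0, theta_b=1.0, theta_g=1.0):
    """Fully connected CRF binary potential: nearby, similarly colored
    pixels pay a high cost for taking different labels, which pushes
    the segmentation toward object boundaries."""
    if label_i == label_j:            # Potts mu(x_i, x_j): no cost if equal
        return 0.0
    d_pos = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))
    d_col = sum((a - b) ** 2 for a, b in zip(color_i, color_j))
    appearance = w1 * exp(-d_pos / (2 * theta_a**2) - d_col / (2 * theta_b**2))
    smoothness = w2 * exp(-d_pos / (2 * theta_g**2))
    return appearance + smoothness
```

Two coincident, identically colored pixels with different labels incur the maximal cost w1 + w2; the cost decays as spatial and color distance grow, which is exactly why evaluating it over all pixel pairs is expensive.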
Summary of the Invention

In view of the above problems, the present invention provides an image semantic segmentation method and device to improve the efficiency and accuracy of semantic segmentation.

In one aspect, an embodiment of the present invention provides an image semantic segmentation method, the method including:

receiving an image;

performing semantic segmentation on the image to obtain an initial semantic segmentation result;

inputting image information including the initial semantic segmentation result into a pre-trained convolutional neural network for semantic segmentation post-processing to obtain a final semantic segmentation result.
In another aspect, an embodiment of the present invention provides an image semantic segmentation device, the device including:

a receiving unit configured to receive an image;

a segmentation unit configured to perform semantic segmentation on the image to obtain an initial semantic segmentation result;

a post-processing unit configured to input image information including the initial semantic segmentation result into a pre-trained convolutional neural network for semantic segmentation post-processing to obtain a final semantic segmentation result.
In another aspect, an embodiment of the present invention provides an image semantic segmentation device, the device including a processor and at least one memory, the at least one memory storing at least one machine-executable instruction, the processor executing the at least one instruction to implement: receiving an image;

performing semantic segmentation on the image to obtain an initial semantic segmentation result;

inputting image information including the initial semantic segmentation result into a pre-trained convolutional neural network for semantic segmentation post-processing to obtain a final semantic segmentation result.
In the technical solution of the present invention, after the received image is semantically segmented to obtain an initial semantic segmentation result, the image information including the initial semantic segmentation result is input into a convolutional neural network for semantic segmentation post-processing to obtain a final semantic segmentation result. With the image segmentation scheme provided by the present invention, since the convolutional neural network is pre-trained, post-processing can be performed quickly according to the image information including the initial semantic segmentation result, without needing, as in the prior-art CRF approach, to compute the correlation between the individual pixels in the image for post-processing, which improves the post-processing speed and efficiency. Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by practicing the present invention. The objects and other advantages of the present invention can be realized and obtained by the structures particularly pointed out in the written description, the claims and the accompanying drawings.
The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings

The accompanying drawings are provided for a further understanding of the present invention and constitute a part of the specification; together with the embodiments of the present invention they serve to explain the present invention, and they do not limit the present invention. Obviously, the drawings described below are merely some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a schematic diagram of a conditional random field in the prior art;

FIG. 2 is a flowchart of an image semantic segmentation method in an embodiment of the present invention;

FIG. 3 is a first flowchart of a method for semantic segmentation post-processing by a convolutional neural network in an embodiment of the present invention;

FIG. 4 is a first schematic diagram of training a convolutional neural network in an embodiment of the present invention;

FIG. 5 is a second schematic diagram of training a convolutional neural network in an embodiment of the present invention;

FIG. 6 is a second flowchart of a method for semantic segmentation post-processing by a convolutional neural network in an embodiment of the present invention;

FIG. 7 is a first schematic diagram of semantic segmentation post-processing by a convolutional neural network in an embodiment of the present invention;

FIG. 8 is a third flowchart of a method for semantic segmentation post-processing by a convolutional neural network in an embodiment of the present invention;

FIG. 9 is a second schematic diagram of semantic segmentation post-processing by a convolutional neural network in an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a global-information-optimization post-processing convolutional neural network in an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of a local-edge-optimization post-processing convolutional neural network in an embodiment of the present invention;

FIG. 12 is a schematic structural diagram of an image semantic segmentation device provided by an embodiment of the present invention;

FIG. 13 is a schematic structural diagram of a post-processing unit in an embodiment of the present invention;

FIG. 14 is another schematic structural diagram of an image semantic segmentation device provided by an embodiment of the present invention.
Detailed Description

In order to enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are merely some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

The above is the core idea of the present invention. In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, and to make the above objects, features and advantages of the embodiments of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Referring to FIG. 2, a flowchart of an image semantic segmentation method provided by an embodiment of the present invention is shown; the method includes:

Step 201: Receive an image.

Step 202: Perform semantic segmentation on the image to obtain an initial semantic segmentation result.

In the embodiments of the present invention, step 202 can perform semantic segmentation on the received image either through a pre-trained neural network (e.g., a fully convolutional neural network) or through an image segmentation algorithm; this application does not strictly limit it.

In the embodiments of the present invention, the initial semantic segmentation result may be the category label (hereinafter denoted label) to which each pixel contained in the image belongs.

Preferably, to reduce the information distortion rate and maintain the integrity of the information, the initial semantic segmentation result input to the convolutional neural network in the embodiments of the present invention may be a confidence map (i.e., Confidence Map) rather than the label representation of each pixel of the image. For example, if n category labels are preset (e.g., bicycle, car, tricycle, pedestrian, road surface, fence, street lamp, tree, traffic light), the initial semantic segmentation result consists of the probability values of each pixel in the image belonging to each of the n category labels.

Step 203: Input image information including the initial semantic segmentation result into a pre-trained convolutional neural network for semantic segmentation post-processing to obtain a final semantic segmentation result.

In the technical solution of the present invention, after the received image is semantically segmented to obtain an initial semantic segmentation result, the image information including the initial semantic segmentation result is input into a convolutional neural network for semantic segmentation post-processing to obtain a final semantic segmentation result. With the image segmentation scheme provided by the present invention, since the convolutional neural network is pre-trained, post-processing can be performed quickly according to the image information including the initial semantic segmentation result, without needing, as in the prior-art CRF approach, to compute the correlation between the individual pixels in the image for post-processing, which improves the post-processing speed and efficiency.
In the first embodiment of the present invention, the image information may include only the initial semantic segmentation result. Preferably, to further improve the accuracy of the post-processing performed by the convolutional neural network, the image information includes the initial semantic segmentation result and at least one modality corresponding to the image that describes the feature information of the image. The modality types may include one or more of the following: visible image modalities (e.g., RGB modality, HSV (Hue, Saturation, Value) modality), depth modality, CT (Computed Tomography) modality, infrared modality, millimeter-wave modality and ultrasonic modality.

In practice, pixels belonging to the same category label generally have the same feature information, so correcting the erroneous results in the semantic segmentation result with the image's modalities is more accurate. Therefore, when at least one modality is included in the image information, the scheme can further improve the accuracy of the semantic segmentation result.
To describe the technical solution of the present invention in more detail, several specific examples are described in detail below.

Embodiment 1

In Embodiment 1, the convolutional neural network contains only one level of convolutional neural network, and the foregoing step 203 can be implemented specifically through the following steps A1 to A2:

Step A1: Input the image information including the initial semantic segmentation result into the one-level convolutional neural network to obtain a modified semantic segmentation result;

Step A2: Obtain the final semantic segmentation result according to the modified semantic segmentation result.

In Embodiment 1 of the present invention, the image information may include only the initial semantic segmentation result, or it may include the initial semantic segmentation result and at least one modality corresponding to the image.

In Embodiment 1 of the present invention, the modified semantic segmentation result is the semantic segmentation result obtained after the convolutional neural network corrects the erroneous results in the initial semantic segmentation result. If the initial semantic segmentation result is the label of each pixel in the image, the modified semantic segmentation result is the label of each pixel of the image; if the initial semantic segmentation result is a Confidence Map, the modified semantic segmentation result is also a Confidence Map.

If the modified semantic segmentation result in the embodiment of the present invention is also a confidence map, the foregoing step A2 is specifically implemented as follows: for each pixel of the image, determine, according to the modified semantic segmentation result, the maximum of the probability values of the pixel belonging to each category label, and take the category label with the largest probability value as the category label to which the pixel ultimately belongs.

If the modified semantic segmentation result in the embodiment of the present invention is the label of each pixel of the image, the foregoing step A2 is specifically implemented as follows: take the modified semantic segmentation result as the final semantic segmentation result.
实施例二
实施例二中,卷积神经网络仅包含一级卷积神经网络,为进一步提高卷积神经网络进行后处理的准确性,通过该卷积神经网络进行多次迭代优化,直到满足优化需求之后,再根据最后一次迭代得到的修正语义分割结果确定最终的语义分割结果。前述步骤203的具体实现方式如图3所示的流程,该流程包括:
步骤301、将包含初始语义分割结果的图像信息输入至所述卷积神经网络中,得到修正语义分割结果;
步骤302、判断是否满足迭代条件,若满足则执行步骤303,若不满足则执行步骤304;
步骤303、将所述修正语义分割结果作为所述图像信息中的初始语义分割结果,重复前述步骤301,即此时步骤301中的初始语义分割结果为上一次执行步骤301得到的修正语义分割结果;
步骤304、确定停止迭代,并根据所述修正语义分割结果得到最终的语义分割结果。
本发明实施例二中,修正语义分割结果为通过卷积神经网络对初始语义分割结果中错误的结果进行纠正后得到的语义分割结果。若初始语义分割结果为图像中各像素的label,则该修正语义分割结果为所述图像的各像素的label;若初始语义分割结果为Confidence Map,则修正语义分割结果也为Confidence Map。若本发明实施例中的修正语义分割结果也为置信图,前述步骤304具体实现为:针对图像的每一个像素,根据卷积神经网络最后一次迭代得到的修正语义分割结果确定出该像素属于各类别标签的概率值的最大值,将概率值最大的类别标签作为该像素最终所属的类别标签。
若本发明实施例中的修正语义分割结果为图像的各像素的label,则前述步骤304具体实现为:将卷积神经网络最后一次迭代得到的修正语义分割结果作为最终的语义分割结果。
本发明实施例二中,所述图像信息可以仅包含初始语义分割结果,也可以包含初始语义分割结果和所述图像对应的至少一种模态。
该实施例二中,迭代条件可以是迭代累积次数达到预置的次数阈值,也可以是卷积神经网络本次输出的修正语义分割结果与前一次输出的语义分割结果满足收敛条件,本申请并不做严格限定。前述步骤302中判断是否满足迭代条件,可通过但不仅限于以下两种方式实现:
方式1、判断迭代累积次数是否达到预置的次数阈值,若是则确定不满足迭代条件,若否则确定满足迭代条件;如通过计数器对迭代次数进行计数,每迭代一次累加1次。
方式2、根据所述卷积神经网络本次输出的修正语义分割结果与前一次输出的语义分割结果确定是否满足收敛条件,若是则确定不满足迭代条件,若否则确定满足迭代条件。
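上述迭代后处理及两种迭代条件的控制逻辑,可用如下假设性的Python示意表达(其中max_iters、eps均为假设的超参数,refine代表一次卷积神经网络前向后处理,此处用一个简单的收敛函数代替,仅为演示控制流程):

```python
import numpy as np

def should_continue(iter_count, prev_result, curr_result, max_iters=5, eps=1e-3):
    """判断是否满足迭代条件:
    方式1:迭代累积次数达到预置的次数阈值则停止;
    方式2:本次输出与前一次输出满足收敛条件则停止(此处假设用最大逐元素差)。"""
    if iter_count >= max_iters:                          # 方式1:次数阈值
        return False
    if np.abs(curr_result - prev_result).max() < eps:    # 方式2:收敛条件
        return False
    return True

def post_process(initial_result, refine, max_iters=5, eps=1e-3):
    """对初始语义分割结果迭代后处理,返回最后一次迭代的修正语义分割结果。"""
    prev, curr, count = None, initial_result, 0
    while True:
        prev, curr = curr, refine(curr)   # 一次卷积神经网络前向,得到修正结果
        count += 1
        if not should_continue(count, prev, curr, max_iters, eps):
            return curr

# 用一个向固定点 0.5 收敛的假设性 refine 函数演示:5 次迭代后因次数阈值停止
result = post_process(np.zeros((2, 2)), lambda x: (x + 0.5) / 2)
assert result.shape == (2, 2)
```

示意中先触发哪个停止条件取决于max_iters与eps的取值,与正文"二者取其一即可"的描述一致。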
前述实施例一和实施例二中的卷积神经网络,可以预先通过大量的样本图像训练得到。以图像信息中仅包含初始语义分割结果为例,则预先对样本图像中的各个像素所属的类别标签进行标注,训练过程如图4所示。以图像信息中包含初始语义分割结果、至少一种模态为例,则预先对样本图像中的各个像素所属的类别标签进行标注,并且确定所述样本图像对应的各个模态取值,训练过程如图5所示。
实施例三
为进一步提高卷积神经网络后处理的准确性,本发明实施例三中,卷积神经网络由至少两级子卷积神经网络构成,各级子卷积神经网络的结构可以相同也可以不相同。优选地,当图像信息中仅包含初始语义分割结果时,各级子卷积神经网络的结构不相同。优选地,当图像信息中包含初始语义分割结果和至少一种模态时,各级子卷积神经网络的结构可以相同也可以不相同,且各级子卷积神经网络对应的模态可以相同也可以不相同,本申请不做严格的限定,本领域技术人员可以根据实际的需求灵活地设置各级子卷积神经网络,使得各级子卷积神经网络优化的方向不同,以实现对初始语义分割结果进行全方面的优化。更优地,当各级子卷积神经网络结构相同时,各级子卷积神经网络对应的模态项部分相同或完全不同;当各级子卷积神经网络结构不相同时,各级子卷积神经网络对应的模态项设置为完全相同、部分相同或完全不同。
当所述图像信息中仅包含初始语义分割结果时,前述步骤203具体可通过以下步骤B1~步骤B2实现,其中:
步骤B1、按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果输入至本级子卷积神经网络,得到修正语义分割结果,将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果。
需要说明的是,本发明实施例中,输入给第一级子卷积神经网络的初始语义分割结果为前述步骤202得到的初始语义分割结果;其他级子卷积神经网络的初始语义分割结果为其前一级子卷积神经网络输出的修正语义分割结果。
步骤B2、根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
当所述图像信息中包含初始语义分割结果和至少一种模态时,前述步骤203具体可通过以下步骤C1~步骤C2实现,其中:
步骤C1、按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络,得到修正语义分割结果,将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果。
需要说明的是,本发明实施例中,输入给第一级子卷积神经网络的初始语义分割结果为前述步骤202得到的初始语义分割结果;其他级子卷积神经网络的初始语义分割结果为其前一级子卷积神经网络输出的修正语义分割结果。
步骤C2、根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
更为详细的可通过图6所示的方法流程实现前述步骤C1~步骤C2,该方法流程包括:
步骤601、将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络,得到修正语义分割结果;
步骤602、判断本级子卷积神经网络是否为最后一级子卷积神经网络,若否则执行步骤603,若是则执行步骤604;
步骤603、将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果,并将下一级子卷积神经网络作为本级子卷积神经网络,并执行步骤601;
步骤604、根据本级子卷积神经网络的修正语义分割结果,得到最终的语义分割结果。
本发明实施例三中,修正语义分割结果为通过子卷积神经网络对输入该子卷积神经网络的初始语义分割结果中错误的结果进行纠正后得到的语义分割结果。若初始语义分割结果为图像中各像素的label,则该修正语义分割结果为所述图像的各像素的label;若初始语义分割结果为Confidence Map,则修正语义分割结果也为Confidence Map。
若本发明实施例三中的修正语义分割结果也为置信图,前述步骤B2、步骤C2具体实现为:针对图像的每一个像素,根据最后一级子卷积神经网络输出的修正语义分割结果确定出该像素属于各类别标签的概率值的最大值,将概率值最大的类别标签作为该像素最终所属的类别标签。
若本发明实施例中的修正语义分割结果为图像的各像素的label,则前述步骤B2、步骤C2具体实现为:将最后一级子卷积神经网络的修正语义分割结果作为最终的语义分割结果。
该实施例三中,各级子卷积神经网络可预先独立训练得到。以图像信息中仅包含初始语义分割结果为例,则对各子卷积神经网络的训练方式参见图4所示的方式。以图像信息中包含初始语义分割结果和至少一种模态为例,则对各子卷积神经网络的训练方式参见图5所示的方式,且用于训练各子卷积神经网络的训练样本图像的模态分别与相应子卷积神经网络对应。例如,所述卷积神经网络包括第一级子卷积神经网络和第二级子卷积神经网络,第一级子卷积神经网络对应的模态为深度模态和RGB模态,第二级子卷积神经网络对应的模态为RGB模态和CT模态,则在训练第一级子卷积神经网络时,训练数据为样本图像的初始语义分割结果、RGB模态和深度模态,训练第二级子卷积神经网络的训练数据为样本图像的初始语义分割结果、RGB模态和CT模态。
优选地,由于可见图像模态是像素最为重要的特征信息,因此,前述各级子卷积神经网络对应的模态中均包含可见图像模态。以各级子卷积神经网络对应的模态中均包含RGB模态为例,通过包含至少两级的子卷积神经网络进行后处理的过程可如图7所示。
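上述级联后处理的整体流程可用如下假设性的Python示意表达(其中stages中的每个元素代表一级子卷积神经网络及其对应的模态,此处用恒等函数代替训练好的子网络,仅演示"逐级传递修正结果、末级取argmax"的流程):

```python
import numpy as np

def cascade_post_process(initial_result, stages):
    """按级联顺序依次执行每一级子卷积神经网络:
    上一级输出的修正语义分割结果作为下一级的初始语义分割结果,
    最后根据末级输出的置信图取 argmax 得到各像素最终所属的类别标签。"""
    result = initial_result
    for sub_net, modality in stages:        # 每一级子网络及其对应的模态
        result = sub_net(result, modality)  # 得到本级的修正语义分割结果
    return result.argmax(axis=0)            # 置信图 -> 各像素的类别标签

# 假设性的两级子网络:仅作恒等示意,实际为训练好的卷积神经网络
identity = lambda result, modality: result
conf = np.array([[[0.9, 0.2]], [[0.1, 0.8]]])   # 2 类、1x2 像素的置信图
labels = cascade_post_process(conf, [(identity, "rgb+depth"),
                                     (identity, "rgb+ct")])
assert labels.tolist() == [[0, 1]]
```

若图像信息中不含模态(步骤B1~B2的情形),只需将modality置空即可,控制流程不变。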
实施例四
为进一步提高卷积神经网络后处理的准确性,本发明实施例四中,卷积神经网络由至少两级子卷积神经网络构成,各级子卷积神经网络的结构可以相同也可以不相同。优选地,当图像信息中仅包含初始语义分割结果时,各级子卷积神经网络的结构不相同。优选地,当图像信息中包含初始语义分割结果和至少一种模态时,各级子卷积神经网络的结构可以相同也可以不相同,且各级子卷积神经网络对应的模态可以相同也可以不相同,本申请不做严格的限定,本领域技术人员可以根据实际的需求灵活地设置各级子卷积神经网络,使得各级子卷积神经网络优化的方向不同,以实现对初始语义分割结果进行全方面的优化。更优地,当各级子卷积神经网络结构相同时,各级子卷积神经网络对应的模态项部分相同或完全不同;当各级子卷积神经网络结构不相同时,各级子卷积神经网络对应的模态项设置为完全相同、部分相同或完全不同。
与实施例三相比,本实施例四在每一级子卷积神经网络均进行至少一次迭代之后才将该级子卷积神经网络最后一次迭代得到的修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果输出,各级子卷积神经网络的迭代次数可以相同也可以不相同,本领域技术人员可根据实际需求灵活设置,本申请不做严格的限定。
当图像信息中仅包含初始语义分割结果时,前述步骤203具体可通过以下的步骤D1~步骤D2实现,其中:
步骤D1、按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果输入至本级子卷积神经网络,得到修正语义分割结果;判断是否满足迭代条件;若不满足,则确定停止迭代并将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;若满足,则将该修正语义分割结果作为本级子卷积神经网络的初始语义分割结果,并重复前述将初始语义分割结果输入至本级子卷积神经网络的步骤;
步骤D2、根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
当图像信息中包含初始语义分割结果和至少一种模态时,前述步骤203具体可通过以下的步骤E1~步骤E2实现,其中:
步骤E1、按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络,得到修正语义分割结果;判断是否满足迭代条件;若不满足,则确定停止迭代并将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;若满足,则将该修正语义分割结果作为本级子卷积神经网络的初始语义分割结果,并重复前述将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络的步骤;
步骤E2、根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
更为详细的可通过图8所示的方法流程实现前述步骤E1~步骤E2,该方法流程包括:
步骤801、将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络,得到修正语义分割结果;
步骤802、判断是否满足迭代条件,若不满足则执行步骤803,若满足则执行步骤804;
本发明实施例中,通过计数器来对本级子卷积神经网络进行迭代的次数进行计数,每迭代一次累加1;当本级子卷积神经网络迭代结束后,该计数器被清零。
步骤803、判断本级子卷积神经网络是否为最后一级子卷积神经网络,若是则执行步骤806,若否则执行步骤805;
步骤804、将该修正语义分割结果作为本级子卷积神经网络的初始语义分割结果,并重复前述步骤801;
步骤805、确定停止迭代并将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果,将所述下一级子卷积神经网络作为本级子卷积神经网络,并执行步骤801;
步骤806、根据本级子卷积神经网络的修正语义分割结果得到最终的语义分割结果。
本发明实施例四中,修正语义分割结果为通过子卷积神经网络对输入该子卷积神经网络的初始语义分割结果中错误的结果进行纠正后得到的语义分割结果。若初始语义分割结果为图像中各像素的label,则该修正语义分割结果为所述图像的各像素的label;若初始语义分割结果为Confidence Map,则修正语义分割结果也为Confidence Map。
若本发明实施例四中的修正语义分割结果也为置信图,前述步骤D2、步骤E2具体实现为:针对图像的每一个像素,根据最后一级子卷积神经网络最后一次迭代得到的修正语义分割结果确定出各像素属于各类别标签的概率值的最大值,将概率值最大的类别标签作为该像素最终所属的类别标签。
若本发明实施例中的修正语义分割结果为图像的各像素的label,则前述步骤D2、步骤E2具体实现为:将最后一级子卷积神经网络最后一次迭代得到的修正语义分割结果作为最终的语义分割结果。
该实施例四中,迭代条件可以是迭代累积次数达到预置的次数阈值,也可以是本级子卷积神经网络本次输出的修正语义分割结果与前一次输出的语义分割结果满足收敛条件,本申请并不做严格限定。前述步骤D1、步骤E1中判断是否满足迭代条件,可通过但不仅限于以下两种方式实现:
方式1、判断迭代累积次数是否达到预置的次数阈值,若是则确定不满足迭代条件,若否则确定满足迭代条件;如通过计数器对迭代次数进行计数,每迭代一次累加1,该计数器在本级子卷积神经网络结束迭代时被清零;
方式2、根据本级子卷积神经网络本次输出的修正语义分割结果与前一次输出的语义分割结果确定是否满足收敛条件,若是则确定不满足迭代条件,若否则确定满足迭代条件。
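实施例四中"每级子卷积神经网络迭代至少一次、计数器在该级迭代结束后清零"的控制流程,可用如下假设性的Python示意表达(以方式1的次数阈值作为迭代条件,子网络用简单函数代替,仅为演示控制流程,并非训练好的网络):

```python
import numpy as np

def cascade_iterative_post_process(initial_result, stages, max_iters=3):
    """每一级子卷积神经网络先进行若干次迭代(此处以次数阈值为迭代条件),
    计数器在该级迭代结束后清零,再将该级最后一次迭代的修正语义分割结果
    作为下一级子卷积神经网络的初始语义分割结果。"""
    result = initial_result
    for sub_net in stages:
        count = 0                       # 计数器:进入新一级时清零
        while count < max_iters:        # 方式1:迭代累积次数达到阈值则停止
            result = sub_net(result)    # 本级的一次迭代,得到修正结果
            count += 1
    return result

# 假设性的子网络:每次迭代加 1,便于验证"两级 x 每级 3 次迭代"的执行次数
add_one = lambda r: r + 1
out = cascade_iterative_post_process(np.zeros((1,)), [add_one, add_one])
assert out.tolist() == [6.0]
```

各级的max_iters也可以各不相同,或替换为方式2的收敛条件,对应正文中"各级迭代次数可以相同也可以不相同"的描述。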
该实施例四中,各级子卷积神经网络预先独立训练得到。以图像信息中仅包含初始语义分割结果为例,则对各子卷积神经网络的训练方式参见图4所示的方式。以图像信息中包含初始语义分割结果和至少一种模态为例,对各子卷积神经网络的训练方式参见图5所示的方式,用于训练各子卷积神经网络的训练样本图像的模态分别与相应子卷积神经网络对应。例如,所述卷积神经网络包括第一级子卷积神经网络和第二级子卷积神经网络,第一级子卷积神经网络对应的模态为深度模态和RGB模态,第二级子卷积神经网络对应的模态为RGB模态和CT模态,则在训练第一级子卷积神经网络时,训练数据为样本图像的初始语义分割结果、RGB模态和深度模态,训练第二级子卷积神经网络的训练数据为样本图像的初始语义分割结果、RGB模态和CT模态。
优选地,由于可见图像模态是像素最为重要的特征信息,因此,前述各级子卷积神经网络对应的模态中均包含可见图像模态。以各级子卷积神经网络对应的模态均包含RGB模态为例,通过包含至少两级的子卷积神经网络进行后处理的过程可如图9所示。
优选地,本发明实施例三和实施例四中,前述卷积神经网络由两级子卷积神经网络构成,其中第一级子卷积神经网络为全局信息优化后处理卷积神经网络,第二级子卷积神经网络为局部边缘优化后处理卷积神经网络。
所述全局信息优化后处理卷积神经网络的结构可如图10所示,通过快速的下采样得到全局信息,再通过上采样结合全局信息和low-level信息纠正错误结果。局部边缘优化后处理卷积神经网络的结构可如图11所示。
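图10所示"快速下采样获取全局信息、再经上采样结合low-level信息"的思路,可用如下不依赖深度学习框架的假设性示意表达(用均值池化代替带步长的卷积、用最近邻上采样代替反卷积,融合权重亦为假设值,仅示意数据流向):

```python
import numpy as np

def downsample2x(x):
    """2 倍均值池化:快速降低分辨率以汇聚全局信息(代替带步长的卷积)。"""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x(x):
    """2 倍最近邻上采样:恢复分辨率(代替反卷积/插值)。"""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.arange(16, dtype=float).reshape(4, 4)    # 假设的单通道特征图
global_feat = downsample2x(downsample2x(x))     # 快速下采样得到全局信息
restored = upsample2x(upsample2x(global_feat))  # 逐级上采样恢复分辨率
fused = 0.5 * restored + 0.5 * x                # 结合 low-level 信息(假设性加权融合)

assert global_feat.shape == (1, 1)              # 全局信息被汇聚为低分辨率特征
assert fused.shape == (4, 4)                    # 融合后恢复到原始分辨率
```

实际网络中下采样/上采样由可学习的卷积层实现,融合方式(相加、拼接等)由训练决定,此处仅示意"先汇聚全局、再结合局部"的结构。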
基于前述图像语义分割方法相同的构思,本发明实施例提供一种图像语义分割装置,该装置的结构如图12所示,包括:
接收单元11,用于接收图像;
分割单元12,用于对所述图像进行语义分割,得到初始语义分割结果;
后处理单元13,用于将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果。
所述图像信息可以仅包含初始语义分割结果,也可以包含初始语义分割结果和所述图像对应的描述所述图像的特征信息的至少一种模态。
在一个具体实例中,所述后处理单元13的结构示意图如图13所示,具体包括:
修正子单元131,用于将图像信息输入至所述卷积神经网络中,得到修正语义分割结果;
判断子单元132,用于判断是否满足迭代条件,若满足则触发第一处理子单元133,若不满足则触发第二处理子单元134;
第一处理子单元133,用于将所述修正语义分割结果作为初始语义分割结果,并触发所述修正子单元131;
第二处理子单元134,用于确定停止迭代,并根据所述修正语义分割结果得到最终的语义分割结果。
优选地,所述判断子单元132具体用于:
判断迭代累积次数是否达到预置的次数阈值,若是则确定不满足迭代条件,若否则确定满足迭代条件;或者,根据所述卷积神经网络本次输出的修正语义分割结果与前一次输出的语义分割结果确定是否满足收敛条件,若是则确定不满足迭代条件,若否则确定满足迭代条件。
在另一个实例中,所述卷积神经网络由至少两级子卷积神经网络构成,所述后处理单元13的结构可包括第三处理子单元和第四处理子单元,其中:
第三处理子单元,用于按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果输入至本级子卷积神经网络,得到修正语义分割结果,将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;
第四处理子单元,用于根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
在另一个实例中,所述卷积神经网络由至少两级子卷积神经网络构成,所述后处理单元13的结构可包括第五处理子单元和第六处理子单元,其中:
第五处理子单元,用于按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果输入至本级子卷积神经网络中,得到修正语义分割结果;判断是否满足迭代条件;若不满足,则确定停止迭代并将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;若满足,则将该修正语义分割结果作为本级子卷积神经网络的初始语义分割结果,并重复前述将初始语义分割结果输入至本级子卷积神经网络中的步骤;
第六处理子单元,用于根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
在另一个实例中,所述卷积神经网络由至少两级子卷积神经网络构成,所述后处理单元13具体包括第七处理子单元和第八处理子单元,其中:
第七处理子单元,用于按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络,得到修正语义分割结果,将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;
第八处理子单元,用于根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
在最后一个实例中,所述卷积神经网络由至少两级子卷积神经网络构成,所述后处理单元13具体包括第九处理子单元和第十处理子单元,其中:
第九处理子单元,用于按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络中,得到修正语义分割结果;判断是否满足迭代条件;若不满足,则确定停止迭代并将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;若满足,则将该修正语义分割结果作为本级子卷积神经网络的初始语义分割结果,并重复前述将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络中的步骤;
第十处理子单元,用于根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
优选地,所述卷积神经网络由两级子卷积神经网络构成,第一级子卷积神经网络为全局信息优化后处理卷积神经网络,第二级子卷积神经网络为局部边缘优化后处理卷积神经网络。
优选地,所述初始语义分割结果为置信图(即Confidence Map),或者所述初始语义分割结果为所述图像中的各像素所属的label。
优选地,所述模态的种类包括以下一种或多种:可见图像模态(例如RGB模态、HSV模态)、深度模态、电子计算机断层扫描CT模态、红外模态、毫米波模态和超声波模态。
采用本发明提供的图像分割方案,一方面,由于卷积神经网络是预先训练得到,能够快速的根据包含初始语义分割结果的图像信息进行后处理,无需像现有技术的CRF方式需要计算图像中的各个像素间的相关性来进行后处理,提高了后处理速度和效率;另一方面,输入给卷积神经网络的数据不仅仅包括初始语义分割结果还包括表示图像中各个像素的特征信息的至少一种模态(如深度模态、RGB模态等),在实际中属于同一类别标签的像素一般具有相同的特征信息,因此结合图像的模态来对语义分割结果中的错误结果进行纠正的准确性更高,因此,当图像信息中包含至少一种模态时,本方案还可进一步提高语义分割结果的准确性。
基于前述图像语义分割方法相同的构思,本发明实施例提供一种图像语义分割装置,该装置的结构如图14所示,包括:一个处理器1401和至少一个存储器1402,所述至少一个存储器1402存储有至少一条机器可执行指令,所述处理器1401执行所述至少一条指令以实现:接收图像;对所述图像进行语义分割,得到初始语义分割结果;将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果。
在一个实施例中,所述图像信息还包含所述图像对应的描述所述图像的特征信息的至少一种模态。
在一个实施例中,所述处理器1401执行所述至少一条指令实现将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:将图像信息输入至所述卷积神经网络中,得到修正语义分割结果;判断是否满足迭代条件;若满足,则将所述修正语义分割结果作为所述图像信息中的初始语义分割结果,并重复前述将图像信息输入至所述卷积神经网络中的步骤;若不满足,则确定停止迭代,并根据所述修正语义分割结果得到最终的语义分割结果。
在一个实施例中,所述卷积神经网络由至少两级子卷积神经网络构成;则,所述处理器1401执行所述至少一条指令实现将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果输入至本级子卷积神经网络,得到修正语义分割结果,将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
在一个实施例中,所述卷积神经网络由至少两级子卷积神经网络构成;则,所述处理器1401执行所述至少一条指令实现将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果输入至本级子卷积神经网络中,得到修正语义分割结果;判断是否满足迭代条件;若不满足,则确定停止迭代并将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;若满足,则将该修正语义分割结果作为本级子卷积神经网络的初始语义分割结果,并重复前述将初始语义分割结果输入至本级子卷积神经网络中的步骤;根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
在一个实施例中,所述卷积神经网络由至少两级子卷积神经网络构成;则,所述处理器1401执行所述至少一条指令实现将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络,得到修正语义分割结果,将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
在一个实施例中,所述卷积神经网络由至少两级子卷积神经网络构成;则,所述处理器1401执行所述至少一条指令实现将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络中,得到修正语义分割结果;判断是否满足迭代条件;若不满足,则确定停止迭代并将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;若满足,则将该修正语义分割结果作为本级子卷积神经网络的初始语义分割结果,并重复前述将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络中的步骤;根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
在一个实施例中,所述卷积神经网络由两级子卷积神经网络构成,第一级子卷积神经网络为全局信息优化后处理卷积神经网络,第二级子卷积神经网络为局部边缘优化后处理卷积神经网络。
在一个实施例中,所述处理器1401执行所述至少一条指令实现判断是否满足迭代条件,具体包括:判断迭代累积次数是否达到预置的次数阈值,若是则确定不满足迭代条件,若否则确定满足迭代条件;或者,根据本级子卷积神经网络本次输出的修正语义分割结果与前一次输出的语义分割结果确定是否满足收敛条件,若是则确定不满足迭代条件,若否则确定满足迭代条件。
在上述多个实施例中,初始语义分割结果为置信图,或者初始语义分割结果为所述图像中各像素所属的类别标签。
在上述多个实施例中,所述图像对应的模态种类包括以下一种或多种:可见图像模态、深度模态、电子计算机断层扫描CT模态、红外模态、毫米波模态和超声波模态。
基于与前述方法相同的构思,本发明实施例还提供一种存储介质(该存储介质可以是非易失性机器可读存储介质),该存储介质中存储有用于图像语义分割的计算机程序,该计算机程序具有被配置用于执行以下步骤的代码段:接收图像;对所述图像进行语义分割,得到初始语义分割结果;将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果。
基于与前述方法相同的构思,本发明实施例还提供一种计算机程序,该计算机程序具有被配置用于执行以下图像语义分割的代码段:接收图像;对所述图像进行语义分割,得到初始语义分割结果;将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果。
综上所述,根据本发明技术方案,在对接收到的图像进行语义分割得到初始语义分割结果之后,将包含初始语义分割结果的图像信息输入至卷积神经网络中进行语义分割后处理,得到最终语义分割结果。采用本发明提供的图像分割方案,由于卷积神经网络是预先训练得到,能够快速的根据包含初始语义分割结果的图像信息进行后处理,无需像现有技术的CRF方式需要计算图像中的各个像素间的相关性来进行后处理,提高了后处理速度和效率。本发明的其它特征和优点将在随后的说明书中阐述,并且,部分地从说明书中变得显而易见,或者通过实施本发明而了解。本发明的目的和其他优点可通过在所写的说明书、权利要求书、以及附图中所特别指出的结构来实现和获得。
以上结合具体实施例描述了本发明的基本原理,但是,需要指出的是,对本领域普通技术人员而言,能够理解本发明的方法和装置的全部或者任何步骤或者部件可以在任何计算装置(包括处理器、存储介质等)或者计算装置的网络中,以硬件固件、软件或者他们的组合加以实现,这是本领域普通技术人员在阅读了本发明的说明的情况下运用它们的基本编程技能就能实现的。
本领域普通技术人员可以理解实现上述实施例方法携带的全部或部分步骤是可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,该程序在执行时,包括方法实施例的步骤之一或其组合。
另外,在本发明各个实施例中的各功能单元可以集成在一个处理模块中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。所述集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。
本领域内的技术人员应明白,本发明的实施例可提供为方法、系统、或计算机程序产品。因此,本发明可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器和光学存储器等)上实施的计算机程序产品的形式。
本发明是参照根据本发明实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本发明的上述实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括上述实施例以及落入本发明范围的所有变更和修改。
显然,本领域的技术人员可以对本发明进行各种改动和变型而不脱离本发明的精神和范围。这样,倘若本发明的这些修改和变型属于本发明权利要求及其等同技术的范围之内,则本发明也意图包含这些改动和变型在内。

Claims (33)

  1. 一种图像语义分割方法,其特征在于,包括:
    接收图像;
    对所述图像进行语义分割,得到初始语义分割结果;
    将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果。
  2. 根据权利要求1所述的方法,其特征在于,所述图像信息还包含所述图像对应的描述所述图像的特征信息的至少一种模态。
  3. 根据权利要求1或2所述的方法,其特征在于,将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:
    将图像信息输入至所述卷积神经网络中,得到修正语义分割结果;
    判断是否满足迭代条件;
    若满足,则将所述修正语义分割结果作为所述图像信息中的初始语义分割结果,并重复前述将图像信息输入至所述卷积神经网络中的步骤;
    若不满足,则确定停止迭代,并根据所述修正语义分割结果得到最终的语义分割结果。
  4. 根据权利要求1所述的方法,其特征在于,所述卷积神经网络由至少两级子卷积神经网络构成;
    将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:
    按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果输入至本级子卷积神经网络,得到修正语义分割结果,将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;
    根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
  5. 根据权利要求1所述的方法,其特征在于,所述卷积神经网络由至少两级子卷积神经网络构成;
    将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:
    按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果输入至本级子卷积神经网络中,得到修正语义分割结果;判断是否满足迭代条件;若不满足,则确定停止迭代并将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;若满足,则将该修正语义分割结果作为本级子卷积神经网络的初始语义分割结果, 并重复前述将初始语义分割结果输入至本级子卷积神经网络中的步骤;
    根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
  6. 根据权利要求2所述的方法,其特征在于,所述卷积神经网络由至少两级子卷积神经网络构成;
    将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:
    按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络,得到修正语义分割结果,将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;
    根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
  7. 根据权利要求2所述的方法,其特征在于,所述卷积神经网络由至少两级子卷积神经网络构成;
    将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:
    按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络中,得到修正语义分割结果;判断是否满足迭代条件;若不满足,则确定停止迭代并将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;若满足,则将该修正语义分割结果作为本级子卷积神经网络的初始语义分割结果,并重复前述将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络中的步骤;
    根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
  8. 根据权利要求1~7任一项所述的方法,其特征在于,所述卷积神经网络由两级子卷积神经网络构成,第一级子卷积神经网络为全局信息优化后处理卷积神经网络,第二级子卷积神经网络为局部边缘优化后处理卷积神经网络。
  9. 根据权利要求5或7所述的方法,其特征在于,判断是否满足迭代条件,具体包括:
    判断迭代累积次数是否达到预置的次数阈值,若是则确定不满足迭代条件,若否则确定满足迭代条件;
    或者,
    根据本级子卷积神经网络本次输出的修正语义分割结果与前一次输出的语义分割结果确定是否满足收敛条件,若是则确定不满足迭代条件,若否则确定满足迭代条件。
  10. 根据权利要求1~7任一项所述的方法,其特征在于,初始语义分割结果为置信图,或者初始语义分割结果为所述图像中各像素所属的类别标签。
  11. 根据权利要求2、6、7任一项所述的方法,其特征在于,所述图像对应的模态种类包括以下一种或多种:可见图像模态、深度模态、电子计算机断层扫描CT模态、红外模态、毫米波模态和超声波模态。
  12. 一种图像语义分割装置,其特征在于,包括:
    接收单元,用于接收图像;
    分割单元,用于对所述图像进行语义分割,得到初始语义分割结果;
    后处理单元,用于将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果。
  13. 根据权利要求12所述的装置,其特征在于,所述图像信息还包含所述图像对应的描述所述图像的特征信息的至少一种模态。
  14. 根据权利要求12或13所述的装置,其特征在于,所述后处理单元具体包括:
    修正子单元,用于将图像信息输入至所述卷积神经网络中,得到修正语义分割结果;
    判断子单元,用于判断是否满足迭代条件,若满足则触发第一处理子单元,若不满足则触发第二处理子单元;
    第一处理子单元,用于将所述修正语义分割结果作为所述图像信息中的初始语义分割结果,并触发所述修正子单元;
    第二处理子单元,用于确定停止迭代,并根据所述修正语义分割结果得到最终的语义分割结果。
  15. 根据权利要求12所述的装置,其特征在于,所述卷积神经网络由至少两级子卷积神经网络构成,所述后处理单元具体包括:
    第三处理子单元,用于按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果输入至本级子卷积神经网络,得到修正语义分割结果,将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;
    第四处理子单元,用于根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
  16. 根据权利要求12所述的装置,其特征在于,所述卷积神经网络由至少两级子卷积神经网络构成,所述后处理单元具体包括:
    第五处理子单元,用于按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果输入至本级子卷积神经网络中,得到修正语义分割结果;判断是否满足迭代条件;若不满足,则确定停止迭代并将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;若满足,则将该修正语义分割结果作为本级子卷积神经网络的初始语义分割结果,并重复前述将初始语义分割结果输入至本级子卷积神经网络中的步骤;
    第六处理子单元,用于根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
  17. 根据权利要求13所述的装置,其特征在于,所述卷积神经网络由至少两级子卷积神经网络构成,所述后处理单元具体包括:
    第七处理子单元,用于按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络,得到修正语义分割结果,将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;
    第八处理子单元,用于根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
  18. 根据权利要求13所述的装置,其特征在于,所述卷积神经网络由至少两级子卷积神经网络构成,所述后处理单元具体包括:
    第九处理子单元,用于按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络中,得到修正语义分割结果;判断是否满足迭代条件;若不满足,则确定停止迭代并将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;若满足,则将该修正语义分割结果作为本级子卷积神经网络的初始语义分割结果,并重复前述将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络中的步骤;
    第十处理子单元,用于根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
  19. 根据权利要求12~18任一项所述的装置,其特征在于,所述卷积神经网络由两级子卷积神经网络构成,第一级子卷积神经网络为全局信息优化后处理卷积神经网络,第二级子卷积神经网络为局部边缘优化后处理卷积神经网络。
  20. 根据权利要求14所述的装置,其特征在于,所述判断子单元具体用于:
    判断迭代累积次数是否达到预置的次数阈值,若是则确定不满足迭代条件,若否则确定满足迭代条件;
    或者,
    根据所述卷积神经网络本次输出的修正语义分割结果与前一次输出的语义分割结果确定是否满足收敛条件,若是则确定不满足迭代条件,若否则确定满足迭代条件。
  21. 根据权利要求12~18任一项所述的装置,其特征在于,所述初始语义分割结果为置信图,或者初始语义分割结果为所述图像中各像素所属的类别标签。
  22. 根据权利要求13、17、18任一项所述的装置,其特征在于,所述图像对应的模态种类包括以下一种或多种:可见图像模态、深度模态、电子计算机断层扫描CT模态、红外模态、毫米波模态和超声波模态。
  23. 一种图像语义分割装置,其特征在于,包括:一个处理器和至少一个存储器,所述至少一个存储器存储有至少一条机器可执行指令,所述处理器执行所述至少一条指令以实现:
    接收图像;
    对所述图像进行语义分割,得到初始语义分割结果;
    将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果。
  24. 根据权利要求23所述的装置,其特征在于,所述图像信息还包含所述图像对应的描述所述图像的特征信息的至少一种模态。
  25. 根据权利要求23或24所述的装置,其特征在于,所述处理器执行所述至少一条指令实现将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:
    将图像信息输入至所述卷积神经网络中,得到修正语义分割结果;
    判断是否满足迭代条件;
    若满足,则将所述修正语义分割结果作为所述图像信息中的初始语义分割结果,并重复前述将图像信息输入至所述卷积神经网络中的步骤;
    若不满足,则确定停止迭代,并根据所述修正语义分割结果得到最终的语义分割结果。
  26. 根据权利要求23所述的装置,其特征在于,所述卷积神经网络由至少两级子卷积神经网络构成;则,
    所述处理器执行所述至少一条指令实现将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:
    按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果输入至本级子卷积神经网络,得到修正语义分割结果,将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;
    根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
  27. 根据权利要求23所述的装置,其特征在于,所述卷积神经网络由至少两级子卷积神经网络构成;则,
    所述处理器执行所述至少一条指令实现将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:
    按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果输入至本级子卷积神经网络中,得到修正语义分割结果;判断是否满足迭代条件;若不满足,则确定停止迭代并将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;若满足,则将该修正语义分割结果作为本级子卷积神经网络的初始语义分割结果,并重复前述将初始语义分割结果输入至本级子卷积神经网络中的步骤;
    根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
  28. 根据权利要求24所述的装置,其特征在于,所述卷积神经网络由至少两级子卷积神经网络构成;则,
    所述处理器执行所述至少一条指令实现将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:
    按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络,得到修正语义分割结果,将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;
    根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
  29. 根据权利要求24所述的装置,其特征在于,所述卷积神经网络由至少两级子卷积神经网络构成;则,
    所述处理器执行所述至少一条指令实现将包含初始语义分割结果的图像信息输入至预先训练得到的卷积神经网络中进行语义分割后处理,得到最终语义分割结果,具体包括:
    按照级联顺序,依次对每一级子卷积神经网络,执行以下步骤:将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络中,得到修正语义分割结果;判断是否满足迭代条件;若不满足,则确定停止迭代并将该修正语义分割结果作为下一级子卷积神经网络的初始语义分割结果;若满足,则将该修正语义分割结果作为本级子卷积神经网络的初始语义分割结果,并重复前述将初始语义分割结果、所述模态中与本级子卷积神经网络对应的模态输入至本级子卷积神经网络中的步骤;
    根据最后一级子卷积神经网络输出的修正语义分割结果,确定最终语义分割结果。
  30. 根据权利要求23~29任一项所述的装置,其特征在于,所述卷积神经网络由两级子卷积神经网络构成,第一级子卷积神经网络为全局信息优化后处理卷积神经网络,第二级子卷积神经网络为局部边缘优化后处理卷积神经网络。
  31. 根据权利要求27或29所述的装置,其特征在于,所述处理器执行所述至少一条指令实现判断是否满足迭代条件,具体包括:
    判断迭代累积次数是否达到预置的次数阈值,若是则确定不满足迭代条件,若否则确定满足迭代条件;
    或者,
    根据本级子卷积神经网络本次输出的修正语义分割结果与前一次输出的语义分割结果确定是否满足收敛条件,若是则确定不满足迭代条件,若否则确定满足迭代条件。
  32. 根据权利要求23~29任一项所述的装置,其特征在于,初始语义分割结果为置信图,或者初始语义分割结果为所述图像中各像素所属的类别标签。
  33. 根据权利要求24、28、29任一项所述的装置,其特征在于,所述图像对应的模态种类包括以下一种或多种:可见图像模态、深度模态、电子计算机断层扫描CT模态、红外模态、毫米波模态和超声波模态。
PCT/CN2017/102031 2017-04-14 2017-09-18 一种图像语义分割方法及装置 WO2018188270A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/577,753 US11205271B2 (en) 2017-04-14 2019-09-20 Method and device for semantic segmentation of image
US17/556,900 US11875511B2 (en) 2017-04-14 2021-12-20 Method and device for semantic segmentation of image

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710247372.8A CN106886801B (zh) 2017-04-14 2017-04-14 一种图像语义分割方法及装置
CN201710247372.8 2017-04-14

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/577,753 Continuation US11205271B2 (en) 2017-04-14 2019-09-20 Method and device for semantic segmentation of image

Publications (1)

Publication Number Publication Date
WO2018188270A1 true WO2018188270A1 (zh) 2018-10-18

Family

ID=59183947

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/102031 WO2018188270A1 (zh) 2017-04-14 2017-09-18 一种图像语义分割方法及装置

Country Status (3)

Country Link
US (2) US11205271B2 (zh)
CN (1) CN106886801B (zh)
WO (1) WO2018188270A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110415231A (zh) * 2019-07-25 2019-11-05 山东浪潮人工智能研究院有限公司 一种基于注意力先验网络的cnv分割方法
CN113361529A (zh) * 2020-03-03 2021-09-07 北京四维图新科技股份有限公司 图像语义分割方法、装置、电子设备及存储介质
CN113393421A (zh) * 2021-05-08 2021-09-14 深圳市识农智能科技有限公司 一种果实评估方法、装置和巡视设备

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106886801B (zh) 2017-04-14 2021-12-17 北京图森智途科技有限公司 一种图像语义分割方法及装置
CN107274406A (zh) * 2017-08-07 2017-10-20 北京深睿博联科技有限责任公司 一种检测敏感区域的方法及装置
CN107564025B (zh) * 2017-08-09 2020-05-29 浙江大学 一种基于深度神经网络的电力设备红外图像语义分割方法
CN109426825A (zh) * 2017-08-31 2019-03-05 北京图森未来科技有限公司 一种物体封闭轮廓的检测方法和装置
CN107729987A (zh) * 2017-09-19 2018-02-23 东华大学 基于深度卷积‑循环神经网络的夜视图像的自动描述方法
CN107767380A (zh) * 2017-12-06 2018-03-06 电子科技大学 一种基于全局空洞卷积的高分辨率复合视野皮肤镜图像分割方法
CN108197623A (zh) * 2018-01-19 2018-06-22 百度在线网络技术(北京)有限公司 用于检测目标的方法和装置
CN108229575A (zh) * 2018-01-19 2018-06-29 百度在线网络技术(北京)有限公司 用于检测目标的方法和装置
CN110057352B (zh) * 2018-01-19 2021-07-16 北京图森智途科技有限公司 一种相机姿态角确定方法及装置
CN108268870B (zh) * 2018-01-29 2020-10-09 重庆师范大学 基于对抗学习的多尺度特征融合超声图像语义分割方法
CN108427951B (zh) * 2018-02-08 2023-08-04 腾讯科技(深圳)有限公司 图像处理方法、装置、存储介质和计算机设备
CN108345890B (zh) 2018-03-01 2022-10-28 腾讯科技(深圳)有限公司 图像处理方法、装置和相关设备
CN108491889A (zh) * 2018-04-02 2018-09-04 深圳市易成自动驾驶技术有限公司 图像语义分割方法、装置及计算机可读存储介质
CN108596884B (zh) * 2018-04-15 2021-05-18 桂林电子科技大学 一种胸部ct图像中的食管癌分割方法
US10586456B2 (en) * 2018-04-27 2020-03-10 TuSimple System and method for determining car to lane distance
CN112020721A (zh) * 2018-06-15 2020-12-01 富士通株式会社 用于语义分割的分类神经网络的训练方法及装置、电子设备
CN108831162B (zh) * 2018-06-26 2021-03-02 青岛科技大学 移动通信终端的交通信号控制方法及交通信号控制系统
CN109084955A (zh) * 2018-07-02 2018-12-25 北京百度网讯科技有限公司 显示屏质量检测方法、装置、电子设备及存储介质
CN109241872B (zh) * 2018-08-20 2022-03-18 电子科技大学 基于多级网络的图像语义快速分割方法
CN110866526A (zh) 2018-08-28 2020-03-06 北京三星通信技术研究有限公司 图像分割方法、电子设备及计算机可读存储介质
CN110880001A (zh) * 2018-09-06 2020-03-13 银河水滴科技(北京)有限公司 一种语义分割神经网络的训练方法、设备和存储介质
CN111077166A (zh) * 2018-10-19 2020-04-28 北京金山云网络技术有限公司 液晶屏的瑕疵检测方法、装置及终端设备
US11188799B2 (en) * 2018-11-12 2021-11-30 Sony Corporation Semantic segmentation with soft cross-entropy loss
CN110210487A (zh) * 2019-05-30 2019-09-06 上海商汤智能科技有限公司 一种图像分割方法及装置、电子设备和存储介质
CN110321897A (zh) * 2019-07-08 2019-10-11 四川九洲视讯科技有限责任公司 基于图像语义分割识别非机动车异常行为的方法
CN111179283A (zh) * 2019-12-30 2020-05-19 深圳市商汤科技有限公司 图像语义分割方法及装置、存储介质
CN111275721B (zh) * 2020-02-14 2021-06-08 推想医疗科技股份有限公司 一种图像分割方法、装置、电子设备及存储介质
CN111325212A (zh) * 2020-02-18 2020-06-23 北京奇艺世纪科技有限公司 模型训练方法、装置、电子设备和计算机可读存储介质
CN111652231B (zh) * 2020-05-29 2023-05-30 沈阳铸造研究所有限公司 一种基于特征自适应选择的铸件缺陷语义分割方法
CN112199539A (zh) * 2020-09-10 2021-01-08 佛山聚卓科技有限公司 无人机三维地图摄影图像内容自动标注方法、系统及设备
US11694301B2 (en) 2020-09-30 2023-07-04 Alibaba Group Holding Limited Learning model architecture for image data semantic segmentation
CN112330598B (zh) * 2020-10-14 2023-07-25 浙江华睿科技股份有限公司 一种化纤表面僵丝缺陷检测的方法、装置及存储介质
CN112381832A (zh) * 2020-12-04 2021-02-19 江苏科技大学 一种基于优化卷积神经网络的图像语义分割方法
CN112528873B (zh) * 2020-12-15 2022-03-22 西安电子科技大学 基于多级语义表征和语义计算的信号语义识别方法
CN114693694A (zh) * 2020-12-25 2022-07-01 日本电气株式会社 图像处理的方法、设备和计算机可读存储介质
CN113723411B (zh) * 2021-06-18 2023-06-27 湖北工业大学 一种用于遥感图像语义分割的特征提取方法和分割系统
CN114419321B (zh) * 2022-03-30 2022-07-08 珠海市人民医院 一种基于人工智能的ct图像心脏分割方法及系统

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573669A (zh) * 2015-01-27 2015-04-29 中国科学院自动化研究所 图像物体检测方法
CN104700099A (zh) * 2015-03-31 2015-06-10 百度在线网络技术(北京)有限公司 识别交通标志的方法和装置
US20160171341A1 (en) * 2014-12-15 2016-06-16 Samsung Electronics Co., Ltd. Apparatus and method for detecting object in image, and apparatus and method for computer-aided diagnosis
CN105787510A (zh) * 2016-02-26 2016-07-20 华东理工大学 基于深度学习实现地铁场景分类的系统及方法
CN106204522A (zh) * 2015-05-28 2016-12-07 奥多比公司 对单个图像的联合深度估计和语义标注
US20160358024A1 (en) * 2015-06-03 2016-12-08 Hyperverge Inc. Systems and methods for image processing
CN106447658A (zh) * 2016-09-26 2017-02-22 西北工业大学 基于全局和局部卷积网络的显著性目标检测方法
CN106548192A (zh) * 2016-09-23 2017-03-29 北京市商汤科技开发有限公司 基于神经网络的图像处理方法、装置和电子设备
CN106886801A (zh) * 2017-04-14 2017-06-23 北京图森未来科技有限公司 一种图像语义分割方法及装置

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8965044B1 (en) * 2009-06-18 2015-02-24 The Boeing Company Rotorcraft threat detection system
US9317926B2 (en) * 2013-03-06 2016-04-19 Siemens Aktiengesellschaft Automatic spinal canal segmentation using cascaded random walks
US9730643B2 (en) * 2013-10-17 2017-08-15 Siemens Healthcare Gmbh Method and system for anatomical object detection using marginal space deep neural networks
CN106156712A (zh) * 2015-04-23 2016-11-23 信帧电子技术(北京)有限公司 一种基于自然场景下的身份证号码识别方法与装置
CN106327469B (zh) * 2015-06-29 2019-06-18 北京航空航天大学 一种语义标签引导的视频对象分割方法
US9674536B2 (en) * 2015-11-10 2017-06-06 Applied Materials Israel, Ltd. Technique for visualizing elements in images by color coding
US9916522B2 (en) * 2016-03-11 2018-03-13 Kabushiki Kaisha Toshiba Training constrained deconvolutional networks for road scene semantic segmentation
US9704257B1 (en) * 2016-03-25 2017-07-11 Mitsubishi Electric Research Laboratories, Inc. System and method for semantic segmentation using Gaussian random field network
US9972092B2 (en) * 2016-03-31 2018-05-15 Adobe Systems Incorporated Utilizing deep learning for boundary-aware image segmentation
CN105912990B (zh) * 2016-04-05 2019-10-08 深圳先进技术研究院 人脸检测的方法及装置
CN106204587B (zh) * 2016-05-27 2019-01-08 浙江德尚韵兴图像科技有限公司 基于深度卷积神经网络和区域竞争模型的多器官分割方法
CN106045220A (zh) 2016-07-22 2016-10-26 宁国市渠源净水设备有限公司 一种整治生产铅的行业废水废气的装备
US9589374B1 (en) * 2016-08-01 2017-03-07 12 Sigma Technologies Computer-aided diagnosis system for medical images using deep convolutional neural networks
CN106447622A (zh) * 2016-08-30 2017-02-22 乐视控股(北京)有限公司 一种图像雾霾去除方法及装置
CN106530305B (zh) * 2016-09-23 2019-09-13 北京市商汤科技开发有限公司 语义分割模型训练和图像分割方法及装置、计算设备
US10803378B2 (en) * 2017-03-15 2020-10-13 Samsung Electronics Co., Ltd System and method for designing efficient super resolution deep convolutional neural networks by cascade network training, cascade network trimming, and dilated convolutions

Also Published As

Publication number Publication date
US11875511B2 (en) 2024-01-16
US20200020102A1 (en) 2020-01-16
CN106886801B (zh) 2021-12-17
US20220114731A1 (en) 2022-04-14
US11205271B2 (en) 2021-12-21
CN106886801A (zh) 2017-06-23
