CN110276765B - Image panorama segmentation method based on multitask learning deep neural network - Google Patents

Image panorama segmentation method based on multitask learning deep neural network

Info

Publication number
CN110276765B
CN110276765B (application CN201910544228.XA)
Authority
CN
China
Prior art keywords
segmentation
candidate
graph
network
semantic
Prior art date
Legal status
Active
Application number
CN201910544228.XA
Other languages
Chinese (zh)
Other versions
CN110276765A (en)
Inventor
白双
王聪聪
李沛安
Current Assignee
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN201910544228.XA
Publication of CN110276765A
Application granted
Publication of CN110276765B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/11: Region-based segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/136: Segmentation; Edge detection involving thresholding
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10004: Still image; Photographic image
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20092: Interactive image processing based on input by user
    • G06T 2207/20104: Interactive definition of region of interest [ROI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image panorama segmentation method based on a multitask learning deep neural network, comprising the following steps: inputting an image into a backbone convolutional neural network for feature extraction to obtain a corresponding feature map; inputting the feature map into a semantic segmentation network head and a region proposal network head, respectively, to obtain a semantic segmentation map and a plurality of candidate regions of the image; screening the candidate regions according to the semantic segmentation map; inputting the screened candidate regions to an object recognition network head and a bounding box offset prediction network head, respectively, for classification and bounding box correction; inputting the classified and bounding-box-corrected candidate regions into an instance segmentation network head to obtain an instance segmentation map; fusing the semantic segmentation map and the instance segmentation map to obtain a panoramic segmentation map; training and optimizing the panoramic segmentation network through a training optimization mechanism to obtain an optimized image panorama segmentation model; and performing panoramic segmentation on images with the optimized model. The method completes the semantic and instance segmentation tasks of panoramic segmentation simultaneously while reducing computation.

Description

Image panorama segmentation method based on multitask learning deep neural network
Technical Field
The invention relates to the technical field of computer vision recognition, in particular to an image panorama segmentation method based on a multitask learning deep neural network.
Background
As computer vision research and deep learning methods have advanced, deep-learning-based technologies such as image classification, semantic segmentation, and instance segmentation have improved greatly. Semantic segmentation assigns a semantic class label to each pixel in an image, but cannot distinguish different object instances of the same semantic class within an image. Instance segmentation performs pixel-level segmentation of object instances in an image, but does not cover the many uncountable objects that lack an explicit shape. The panoramic segmentation task unifies the semantic segmentation and instance segmentation tasks and is very important for applications such as autonomous driving and intelligent robots that depend on visual perception of image scenes.
Traditional panorama segmentation techniques generally execute the semantic segmentation and instance segmentation tasks independently and then fuse the two results to obtain a panoramic segmentation result. Such methods rely on two independent networks and are computationally expensive. A multitask network segmentation method that can complete the semantic and instance segmentation tasks of panoramic segmentation simultaneously while reducing computation is therefore needed.
Disclosure of Invention
The invention provides an image panorama segmentation method based on a multitask learning deep neural network to solve the above problems.
To achieve this purpose, the invention adopts the following technical scheme.
The invention provides an image panorama segmentation method based on a multitask learning deep neural network, comprising:
inputting an image into a backbone convolutional neural network for feature extraction to obtain a corresponding feature map;
inputting the feature map into a semantic segmentation network head and a region proposal network head, respectively, to obtain a semantic segmentation map and a plurality of candidate regions of the image;
screening the candidate regions according to the semantic segmentation map;
inputting the screened candidate regions to an object recognition network head and a bounding box offset prediction network head, respectively, for classification and bounding box correction;
inputting the classified and bounding-box-corrected candidate regions into an instance segmentation network head to obtain an instance segmentation map;
fusing the semantic segmentation map and the instance segmentation map to obtain an image panorama segmentation map;
training and optimizing the panoramic segmentation network through a training optimization mechanism according to the image panorama segmentation map to obtain an optimized image panorama segmentation model;
and performing panoramic segmentation on images with the optimized image panorama segmentation model.
Preferably, inputting the feature map into a semantic segmentation network head and a region proposal network head, respectively, to obtain a semantic segmentation map and candidate regions of the image comprises:
inputting the feature map into the semantic segmentation network head and generating pixel-level class predictions through full convolution operations, thereby obtaining the semantic segmentation map of the image;
and inputting the feature map into the region proposal network head, generating candidate regions of different sizes and aspect ratios through multiple convolution operations, and obtaining the category of each candidate region and the coordinates of its bounding box.
Preferably, screening the candidate regions according to the semantic segmentation map comprises:
determining the region of the semantic segmentation map corresponding to the position of each candidate region according to its bounding box coordinates;
for each candidate region, computing the area of the pixels belonging to countable objects within the corresponding semantic segmentation map region, and then computing the ratio of that area to the area of the candidate region;
and judging whether the area ratio of the candidate region falls within a certain threshold range, and deleting the candidate region if it does not.
Preferably, the threshold ranges from 0.5 to 0.7.
Preferably, the method further comprises preliminarily screening the candidate regions before screening them according to the semantic segmentation map, removing candidate regions that do not meet the rules.
Preferably, inputting the screened candidate regions to the object recognition network head and the bounding box offset prediction network head, respectively, for classification and bounding box correction comprises:
extracting the feature maps corresponding to the screened candidate regions from the feature map;
performing a region-of-interest pooling operation on the screened candidate region feature maps to obtain pooled candidate regions of a fixed size;
inputting the pooled candidate regions to the object recognition network head and the bounding box offset prediction network head, respectively, to obtain the category and the bounding box coordinate offset of each pooled candidate region;
and correcting the bounding boxes of the pooled candidate regions according to their categories and bounding box coordinate offsets.
Preferably, inputting the classified and bounding-box-corrected candidate regions into an instance segmentation network head to obtain an instance segmentation map comprises:
inputting the feature map and the instance regions into the instance segmentation network head and executing the same operations as the semantic segmentation network head to obtain instance segmentation binary distribution features;
and acquiring the target instance mask corresponding to each instance region, thereby generating the instance segmentation map.
Preferably, fusing the semantic segmentation map and the instance segmentation map to obtain an image panorama segmentation map comprises:
performing convolution operations on the feature maps generated by the backbone network to generate two groups of feature maps, which are concatenated with the semantic segmentation map and the instance segmentation map, respectively;
applying convolution operations and a sigmoid activation function to the concatenated semantic segmentation map and instance segmentation map, respectively, to obtain an instance segmentation soft-threshold distribution feature map and a semantic segmentation soft-threshold distribution feature map;
taking the element-wise product of the instance segmentation soft-threshold distribution feature map and the instance segmentation map, and likewise of the semantic segmentation soft-threshold distribution feature map and the semantic segmentation map;
concatenating the semantic segmentation map and instance segmentation map after the element-wise products, preliminarily fusing them with a convolution operation, then extracting features with dilated convolutions of different dilation rates, and concatenating the extracted results;
further fusing the concatenated results with a convolution operation and thresholding the fused result to obtain a gating value distribution map of 0-1 values;
and, according to the gating value distribution map, selecting the semantic segmentation result or the instance segmentation result for each pixel based on its 0-1 value to obtain the panoramic segmentation map.
Preferably, the training optimization mechanism comprises:
1) training the semantic segmentation network head and the region proposal network head with the objective function $L_{step\text{-}1} = L_{seg} + L_{rpn}$;
2) training the object recognition network head, the bounding box offset prediction network head, and the instance segmentation network head with the objective function $L_{step\text{-}2} = L_{cls\text{-}m} + L_{reg} + L_{ins}$;
3) training the back-end fusion network that generates the panoramic segmentation map with the binary cross-entropy loss function as the objective function;
and summing the objective functions of the three steps to obtain a unified objective function, and optimizing the model based on the unified objective function to obtain the optimized panoramic segmentation model.
Preferably, the backbone convolutional neural network uses a dilated (hole) convolution structure or an encoding-decoding structure.
According to the technical scheme provided by the image panorama segmentation method based on the multitask learning deep neural network, a unified multitask network is built that realizes image semantic segmentation and instance segmentation simultaneously and then performs panoramic segmentation. The semantic segmentation result assists the execution of the instance segmentation task, which further improves instance segmentation accuracy, so high-quality semantic segmentation and instance segmentation results can be obtained; the final panoramic segmentation result is then obtained through fusion at the back end.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart of an image panorama segmentation method based on a multitask learning deep neural network according to an embodiment;
fig. 2 is a schematic structural diagram of an image panorama segmentation method based on a multitask learning deep neural network according to an embodiment;
FIG. 3 is a schematic diagram of an implementation of an image panorama segmentation method based on a multitask learning deep neural network according to an embodiment;
fig. 4 is an implementation schematic diagram of the fusion of the semantic segmentation map and the instance segmentation map provided in the embodiment.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
To facilitate understanding of the embodiments of the present invention, the following description will be further explained by taking specific embodiments as examples with reference to the accompanying drawings.
Examples
Meaning of the panorama segmentation task: panoramic segmentation performs semantic classification and instance ID labeling for each pixel in an image. For semantic categories corresponding to uncountable objects, all pixels belonging to a given semantic category have the same semantic category label and the same instance ID; for semantic categories corresponding to countable objects, pixels belonging to a given object category have the same semantic category label and are assigned different instance IDs according to the different object instances to which they belong.
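For illustration only, the following minimal sketch shows the labeling convention just described as a pair of per-pixel maps: a semantic class map and an instance ID map. The class names and IDs here are made up and are not from the patent.

```python
# A tiny illustration of the panoptic labeling convention: each pixel carries
# a semantic class and an instance ID. Uncountable "stuff" classes (here:
# road) share one instance ID; countable "thing" classes (here: car) get a
# distinct ID per object. All class names and IDs are assumptions.
import numpy as np

semantic = np.full((4, 6), fill_value=0)   # class 0 = road (uncountable)
instance = np.zeros((4, 6), dtype=int)     # stuff pixels share instance ID 0
semantic[1:3, 0:2] = 1                     # class 1 = car (countable), object A
instance[1:3, 0:2] = 1                     # ... receives instance ID 1
semantic[1:3, 4:6] = 1                     # another car, same semantic class
instance[1:3, 4:6] = 2                     # ... but a different instance ID
print(semantic, instance, sep="\n\n")
```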
Fig. 1 is a flowchart of an image panorama segmentation method based on a multitask learning deep neural network provided in this embodiment, fig. 2 is a schematic structural diagram of the image panorama segmentation method based on the multitask learning deep neural network provided in this embodiment, and fig. 3 is an implementation schematic diagram of the image panorama segmentation method based on the multitask learning deep neural network provided in this embodiment, referring to fig. 1, fig. 2, and fig. 3, the method includes the following steps:
and S1, inputting the image into the backbone convolutional neural network for feature extraction to obtain a corresponding feature map.
Preferably, the backbone convolutional neural network uses a dilated (hole) convolution structure or an encoding-decoding structure. Such a structure produces feature maps with richer semantic information and higher resolution, enhancing the robustness of identifying both larger and smaller objects.
Illustratively, an encoding-decoding network architecture is adopted for the backbone convolutional neural network, in which the encoder consists of the first four modules of ResNeXt-101 and the decoder consists of two stages of decoding modules based on bilinear upsampling and convolution operations. The backbone first uses the encoder to extract semantically rich feature maps from the image and then gradually restores their spatial information through the decoder.
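A minimal PyTorch sketch of such a backbone follows. It is an illustrative reconstruction rather than the patented implementation: the encoder reuses the stem and four residual stages of torchvision's ResNeXt-101, and the decoder channel widths are assumptions.

```python
# A minimal sketch of the encoder-decoder backbone, assuming torchvision's
# ResNeXt-101 as encoder and two bilinear-upsample + conv decoding stages.
import torch
import torch.nn as nn
import torchvision


class EncoderDecoderBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        resnext = torchvision.models.resnext101_32x8d(weights=None)
        # Encoder: stem plus the four residual stages of ResNeXt-101.
        self.encoder = nn.Sequential(
            resnext.conv1, resnext.bn1, resnext.relu, resnext.maxpool,
            resnext.layer1, resnext.layer2, resnext.layer3, resnext.layer4,
        )

        # Decoder: two stages of bilinear upsampling followed by convolution,
        # gradually restoring spatial resolution of the semantic features.
        def decode_stage(cin, cout):
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )

        self.decoder = nn.Sequential(decode_stage(2048, 512), decode_stage(512, 256))

    def forward(self, image):
        return self.decoder(self.encoder(image))


features = EncoderDecoderBackbone()(torch.randn(1, 3, 512, 512))
print(features.shape)  # torch.Size([1, 256, 64, 64])
```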
S2, the feature map is input into the semantic segmentation network head and the region proposal network head, respectively, to obtain the semantic segmentation map and a plurality of candidate regions of the image.
The feature map is input to the semantic segmentation network head, and pixel-level class predictions are generated through full convolution operations to obtain the semantic segmentation map of the image. The semantic segmentation network head has a fully convolutional structure consisting of two convolutional layers, two deconvolution layers, a 1x1 convolutional layer, and a softmax layer; after passing through this structure, the feature map yields pixel-level class probability predictions, from which the semantic segmentation map of the input image is obtained.
The feature map is also input into the region proposal network head, which generates candidate regions of different sizes and aspect ratios, together with their bounding box coordinates, through multiple convolution operations. The region proposal network head consists of a Region Proposal Network (RPN), to which the feature map is input. A sketch of both heads is given after the following remarks.
Of course, the fully convolutional structure may take other forms, which are not limited herein.
The configurations of the semantic segmentation network head and the region proposal network head are likewise not limited to the above; any other configuration usable as a semantic segmentation network head or a region proposal network head falls within the scope of the present invention.
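For illustration, a hedged PyTorch sketch of the two heads follows: a fully convolutional semantic segmentation head with the layer sequence described above, and a minimal RPN-style proposal head. Layer widths, the class count, and the anchor count are assumptions, not values from the patent.

```python
# Sketches of the semantic segmentation head (two convs, two deconvs, a 1x1
# conv, softmax) and a minimal RPN-style head with per-anchor objectness and
# box outputs. All channel sizes and the anchor count are assumptions.
import torch
import torch.nn as nn


class SemanticHead(nn.Module):
    def __init__(self, in_ch=256, num_classes=19):
        super().__init__()
        self.fcn = nn.Sequential(
            nn.Conv2d(in_ch, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 128, 2, stride=2), nn.ReLU(inplace=True),
            nn.Conv2d(128, num_classes, 1),           # 1x1 conv to class scores
        )

    def forward(self, feats):
        return torch.softmax(self.fcn(feats), dim=1)  # pixel-level class probs


class RegionProposalHead(nn.Module):
    def __init__(self, in_ch=256, num_anchors=9):
        super().__init__()
        self.shared = nn.Conv2d(in_ch, 256, 3, padding=1)
        self.objectness = nn.Conv2d(256, num_anchors, 1)      # countable-object score
        self.box_deltas = nn.Conv2d(256, num_anchors * 4, 1)  # bounding box offsets

    def forward(self, feats):
        h = torch.relu(self.shared(feats))
        return torch.sigmoid(self.objectness(h)), self.box_deltas(h)


feats = torch.randn(1, 256, 64, 64)
seg = SemanticHead()(feats)
scores, deltas = RegionProposalHead()(feats)
print(seg.shape, scores.shape, deltas.shape)
```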
S3, screening the candidate regions according to the semantic segmentation map.
Preferably, before this step the method further includes preliminarily screening the plurality of candidate regions to remove those that do not meet the rules. Specifically: first, candidate regions that are too small or that extend beyond the image boundary are removed; second, the regions are sorted in descending order of the category confidence scores obtained from the RPN, and a fixed number of candidate regions are kept; then the Non-Maximum Suppression (NMS) algorithm is used to eliminate overlapping candidate regions; finally, some of the high-scoring candidate regions are retained according to their category confidence scores.
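A minimal sketch of this preliminary screening follows, using torchvision's NMS; the size limit, kept counts, and IoU threshold are illustrative assumptions.

```python
# Preliminary candidate screening, assuming boxes in (x1, y1, x2, y2) format
# with RPN confidence scores. The numeric limits are illustrative values.
import torch
from torchvision.ops import nms


def preliminary_screen(boxes, scores, img_w, img_h,
                       min_size=8.0, pre_nms_keep=2000,
                       iou_thresh=0.7, post_nms_keep=300):
    # 1) Remove boxes that are too small or cross the image boundary.
    w, h = boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]
    inside = (boxes[:, 0] >= 0) & (boxes[:, 1] >= 0) & \
             (boxes[:, 2] <= img_w) & (boxes[:, 3] <= img_h)
    keep = inside & (w >= min_size) & (h >= min_size)
    boxes, scores = boxes[keep], scores[keep]
    # 2) Sort by descending confidence and keep a fixed number.
    order = scores.argsort(descending=True)[:pre_nms_keep]
    boxes, scores = boxes[order], scores[order]
    # 3) Non-Maximum Suppression eliminates overlapping candidates.
    keep = nms(boxes, scores, iou_thresh)
    # 4) Retain the high-scoring survivors.
    keep = keep[:post_nms_keep]
    return boxes[keep], scores[keep]


boxes = torch.tensor([[10., 10., 60., 60.], [12., 12., 62., 62.], [100., 100., 180., 160.]])
scores = torch.tensor([0.9, 0.8, 0.7])
print(preliminary_screen(boxes, scores, img_w=512, img_h=512))
```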
Screening the candidate regions according to the semantic segmentation map comprises the following steps. First, the region corresponding to each candidate region's position in the semantic segmentation map is determined from its bounding box coordinates. Then the area of the pixels belonging to countable objects in that semantic segmentation region is computed: within the region, a pixel position is set to '1' if the pixel's category belongs to a countable object and to '0' otherwise, and the total area of all pixels with value '1' in the region is counted. Finally, the ratio of this area to the area of the corresponding candidate region is computed, and if the ratio is smaller than a certain threshold T1, the candidate region is discarded.
Preferably, the threshold T1 ranges from 0.5 to 0.7.
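The following sketch illustrates the semantic-guided screening under stated assumptions: boxes in (x1, y1, x2, y2) pixel coordinates, a per-pixel semantic label map, and a made-up set of countable ("thing") class IDs.

```python
# Semantic-guided screening: for each candidate box, compute the fraction of
# pixels in the matching semantic segmentation region that belong to
# countable classes, and discard boxes below threshold T1. The set of
# countable class ids here is an assumption.
import torch


def screen_by_semantics(boxes, sem_labels, thing_ids, t1=0.6):
    """boxes: (N, 4) int tensor (x1, y1, x2, y2); sem_labels: (H, W) class map."""
    # Binary map: 1 where the pixel's semantic class is a countable object.
    thing_mask = torch.zeros_like(sem_labels, dtype=torch.float32)
    for cid in thing_ids:
        thing_mask[sem_labels == cid] = 1.0
    kept = []
    for k, (x1, y1, x2, y2) in enumerate(boxes.tolist()):
        region = thing_mask[y1:y2, x1:x2]             # region matching the box
        if region.numel() == 0:
            continue
        ratio = region.sum().item() / region.numel()  # countable-pixel area ratio
        if ratio >= t1:                               # keep well-supported boxes
            kept.append(k)
    return boxes[kept]


sem = torch.zeros(128, 128, dtype=torch.long)
sem[20:60, 20:60] = 5                                 # a countable object of class 5
boxes = torch.tensor([[20, 20, 60, 60], [80, 80, 120, 120]])
print(screen_by_semantics(boxes, sem, thing_ids={5}, t1=0.6))
```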
S4, the screened candidate regions are input to the object recognition network head and the bounding box offset prediction network head, respectively, for classification and bounding box correction.
The feature maps corresponding to the screened candidate regions are first extracted from the feature map.
A region-of-interest (RoI) pooling operation is then performed on the screened candidate region feature maps to obtain pooled candidate regions of a fixed size; the purpose of this step is to allow each candidate region to be input to the fully connected layers for classification and other processing.
The pooled candidate regions are input to the object recognition network head and the bounding box offset prediction network head, respectively, to obtain the category and the bounding box coordinate offset of each pooled candidate region.
The bounding boxes of the pooled candidate regions are then corrected according to these categories and coordinate offsets: countable-object candidate bounding boxes judged by the object recognition network head to be background are discarded, and the positions of the remaining candidate bounding boxes are corrected based on the predicted coordinate offsets.
S5, the classified and coordinate-corrected candidate regions are input to the instance segmentation network head to obtain the instance segmentation map.
The feature map and the instance regions are input into the instance segmentation network head, which performs the same operations as the semantic segmentation network head to obtain instance segmentation binary distribution features. The instance segmentation network head uses the same structure as the semantic segmentation network head and shares its parameters; the difference is that the semantic segmentation network head generates probability distribution maps for all semantic classes when producing semantic segmentation predictions, whereas for instance segmentation predictions the outputs corresponding to non-instance objects are ignored and only the probability distribution maps corresponding to instance objects are retained. The target instance mask corresponding to each instance region is then acquired, from which the instance segmentation map is generated.
Further, when an overlap occurs between different instances, the prediction with the higher confidence score in the instance segmentation binary distribution features is selected for the instance segmentation map.
S6, fusing the semantic segmentation map and the instance segmentation map to obtain the image panorama segmentation map. Fig. 4 is an implementation schematic diagram of this fusion provided in this embodiment.
There may be conflicts between the instance segmentation output and the semantic segmentation output. To obtain a unified panoramic segmentation result, the semantic segmentation map and the instance segmentation map need to be fused, specifically through the following steps:
S61, performing convolution operations on the feature maps generated by the backbone network to generate two groups of feature maps, which are concatenated with the semantic segmentation map and the instance segmentation map, respectively;
S62, applying convolution operations and a sigmoid activation function to the concatenated semantic segmentation map and instance segmentation map, respectively, to obtain an instance segmentation soft-threshold distribution feature map and a semantic segmentation soft-threshold distribution feature map;
S63, taking the element-wise product of the instance segmentation soft-threshold distribution feature map and the instance segmentation map, and likewise of the semantic segmentation soft-threshold distribution feature map and the semantic segmentation map;
S64, concatenating the semantic segmentation map and instance segmentation map after the element-wise products, preliminarily fusing them with a convolution operation, extracting features with dilated convolutions of different dilation rates, and concatenating the extracted results;
S65, further fusing the concatenated results with a convolution operation and thresholding the fused result to obtain a gating value distribution map of 0-1 values;
S66, according to the gating value distribution map, selecting the semantic segmentation or instance segmentation result for each pixel based on its 0-1 value to obtain the panorama segmentation map.
Preferably, the threshold in this step is 0.5.
S7, training and optimizing the panoramic segmentation model through the training optimization mechanism according to the image panorama segmentation map to obtain the optimized image panorama segmentation model.
Because panoramic segmentation involves semantic segmentation and instance segmentation simultaneously, it covers several basic tasks such as detection, recognition, and segmentation, and the panoramic segmentation network architecture is complex. To obtain the best optimization result, the training optimization mechanism divides the training process of the whole panoramic segmentation model into the following four steps.
The training optimization mechanism comprises:

1) Training the semantic segmentation network head and the region proposal network head with the objective function $L_{step\text{-}1} = L_{seg} + L_{rpn}$, so as to minimize it.

The multitask loss function $L_{step\text{-}1}$, representing the losses of training the semantic segmentation network head and the region proposal network head, is defined by formula (1):

$$L_{step\text{-}1} = L_{seg} + L_{rpn} \tag{1}$$

where

$$L_{seg} = -\frac{1}{N_{IP}} \sum_{i=1}^{N_{IP}} \sum_{m=1}^{M} y_i^m \log p_i^m$$

is the cross-entropy loss used as the semantic segmentation loss; $N_{IP}$ is the number of pixels in the image, $M$ is the number of semantic categories, $m$ denotes a semantic category, $y_i^m$ is the one-hot label of pixel $i$, and $p_i^m$ is the model's predicted output for pixel $i$.

$$L_{rpn} = \sum_i L_{cls\text{-}b}(a_i, a_i^*) + \lambda \sum_i a_i^* L_{reg}(t_i, t_i^*)$$

is the region proposal loss, where $L_{cls\text{-}b}$ is a binary cross-entropy classification loss,

$$L_{cls\text{-}b}(a_i, a_i^*) = -\left[a_i^* \log a_i + (1 - a_i^*)\log(1 - a_i)\right],$$

$i$ is the index of a candidate proposal region in the image, $a_i$ is the predicted probability that proposal region $i$ is a countable object, and $a_i^* \in \{0, 1\}$ indicates whether proposal region $i$ is a countable object, taking 1 if it is and 0 otherwise. $L_{reg}$ is the bounding box offset prediction loss function; in the second term the coefficient $a_i^*$ ensures that the bounding box coordinate offset loss is computed only for candidate proposal regions corresponding to countable objects, and $\lambda$ is a weighting coefficient used to balance the offset loss against the classification loss. $t_i$ denotes the predicted parameterized 4-dimensional bounding box coordinate offset vector, and $t_i^*$ is the 4-dimensional coordinate offset of the ground-truth bounding box associated with proposal region $i$. Since bounding box coordinate offset prediction is a regression problem, $L_{reg}$ is defined as

$$L_{reg}(t_i, t_i^*) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\!\left(t_i^j - t_i^{*j}\right),$$

where $j$ indexes the coordinate representation of the candidate region bounding box, $x, y$ are the top-left coordinates of the bounding box, $w, h$ are its width and height measured from the top-left coordinates, and

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise.} \end{cases}$$
2) Training the object recognition network head, the bounding box offset prediction network head, and the instance segmentation network head with the objective function $L_{step\text{-}2} = L_{cls\text{-}m} + L_{reg} + L_{ins}$.

The invention extracts bounding box features from the feature map using the candidate bounding boxes passed from the preceding stage, and defines a multitask loss function over each bounding box feature as shown in formula (2):

$$L_{step\text{-}2} = L_{cls\text{-}m} + L_{reg} + L_{ins} \tag{2}$$

where

$$L_{cls\text{-}m} = -\frac{1}{N_R} \sum_{i=1}^{N_R} \sum_{m=1}^{M_{ins}} y_i^m \log p_i^m$$

is the multi-class cross-entropy loss for classifying countable objects and the background; $N_R$ is the number of bounding box features, and $M_{ins}$ is the number of countable object categories plus one, the added one indicating that all background categories are treated as a single category. $L_{reg}$ is defined as in step 1) and measures the loss between the predicted and ground-truth bounding box coordinate offsets of countable object instances.

$$L_{ins} = -\frac{1}{N_{RP}} \sum_{i=1}^{N_{RP}} \sum_{m=1}^{M_{ins}} y_i^m \log p_i^m$$

is the segmentation loss of a candidate region; $N_{RP}$ is the number of pixels in the candidate region, $m$ is an instance-level semantic category, $y_i^m$ is the one-hot label of pixel $i$, and $p_i^m$ is the model's predicted output for pixel $i$. When computing the $L_{ins}$ loss value, only the countable object categories and the background are considered.
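The following sketch expresses the staged losses of formulas (1) and (2) with standard PyTorch primitives; the tensor shapes, the weighting coefficient λ, and the exact reduction scheme are assumptions.

```python
# Hedged sketch of the staged losses: L_seg and L_cls-m as cross-entropy,
# L_cls-b as binary cross-entropy over proposal objectness, L_reg as
# smooth-L1 over box offsets of countable objects only, and L_ins as
# per-region cross-entropy. Shapes and the lambda weight are assumptions.
import torch
import torch.nn.functional as F


def step1_loss(sem_logits, sem_labels, obj_probs, obj_targets,
               pred_deltas, gt_deltas, lam=1.0):
    l_seg = F.cross_entropy(sem_logits, sem_labels)           # L_seg over all pixels
    l_cls_b = F.binary_cross_entropy(obj_probs, obj_targets)  # L_cls-b on proposals
    pos = obj_targets > 0                                     # countable objects only
    l_reg = F.smooth_l1_loss(pred_deltas[pos], gt_deltas[pos]) if pos.any() else 0.0
    return l_seg + l_cls_b + lam * l_reg                      # L_step-1 = L_seg + L_rpn


def step2_loss(cls_logits, cls_labels, pred_deltas, gt_deltas,
               ins_logits, ins_labels):
    l_cls_m = F.cross_entropy(cls_logits, cls_labels)         # L_cls-m, bg as one class
    l_reg = F.smooth_l1_loss(pred_deltas, gt_deltas)          # box offset regression
    l_ins = F.cross_entropy(ins_logits, ins_labels)           # L_ins per candidate region
    return l_cls_m + l_reg + l_ins                            # formula (2)


sem_logits, sem_labels = torch.randn(1, 19, 64, 64), torch.randint(0, 19, (1, 64, 64))
obj_probs, obj_targets = torch.rand(10), torch.randint(0, 2, (10,)).float()
deltas, gt = torch.randn(10, 4), torch.randn(10, 4)
print(step1_loss(sem_logits, sem_labels, obj_probs, obj_targets, deltas, gt))
```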
3) Training the back-end fusion network that generates the panoramic segmentation map, with the binary cross-entropy loss function as the objective function.
For training the fusion network of the semantic segmentation output and the instance segmentation output, the fusion network outputs a single-channel gating value distribution map containing only the two values 0 and 1, so the semantic-instance segmentation gating problem is expressed as a binary classification problem. Taking the predicted gating value distribution map and the binarized image as the ground-truth label, the back-end fusion network generating the panoramic segmentation map is trained by computing a binary cross-entropy loss function.
4) The objective functions of the above three steps are summed to obtain a unified objective function, and the model is optimized based on this unified objective function to obtain the optimized panoramic segmentation model.
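A minimal, runnable sketch of this four-step schedule follows; the tiny modules and losses are placeholders standing in for the actual heads and objectives.

```python
# Four-step training schedule: stages 1-3 each optimize their own objective,
# then stage 4 fine-tunes everything on the unified sum. The modules and
# losses below are placeholders, not the patent's networks.
import torch
import torch.nn as nn

heads = nn.ModuleDict({
    "stage1": nn.Linear(8, 8),   # stands in for semantic head + RPN head
    "stage2": nn.Linear(8, 8),   # stands in for recognition/offset/instance heads
    "stage3": nn.Linear(8, 1),   # stands in for the back-end fusion network
})
x, y = torch.randn(4, 8), torch.rand(4, 1)
losses = {
    "stage1": lambda: heads["stage1"](x).pow(2).mean(),                  # ~ L_step-1
    "stage2": lambda: heads["stage2"](x).pow(2).mean(),                  # ~ L_step-2
    "stage3": lambda: nn.functional.binary_cross_entropy_with_logits(
        heads["stage3"](x), y),                                          # fusion BCE
}

for name in ("stage1", "stage2", "stage3"):          # steps 1)-3): stage-wise training
    opt = torch.optim.SGD(heads[name].parameters(), lr=0.01)
    for _ in range(10):
        opt.zero_grad(); losses[name]().backward(); opt.step()

opt = torch.optim.SGD(heads.parameters(), lr=0.001)  # step 4): unified objective
for _ in range(10):
    opt.zero_grad()
    total = sum(losses[n]() for n in losses)         # sum of the three objectives
    total.backward(); opt.step()
print(float(total))
```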
S8, performing panoramic segmentation on images with the optimized image panorama segmentation model.
Those of ordinary skill in the art will understand that: the drawings are merely schematic representations of one embodiment, and the flow charts in the drawings are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. An image panorama segmentation method based on a multitask learning deep neural network, comprising:
inputting an image into a backbone convolutional neural network for feature extraction to obtain a corresponding feature map;
inputting the feature map into a semantic segmentation network head and a region proposal network head, respectively, to obtain a semantic segmentation map and a plurality of candidate regions of the image;
screening the candidate regions according to the semantic segmentation map;
inputting the screened candidate regions to an object recognition network head and a bounding box offset prediction network head, respectively, for classification and bounding box correction;
inputting the classified and bounding-box-corrected candidate regions into an instance segmentation network head to obtain an instance segmentation map;
fusing the semantic segmentation map and the instance segmentation map to obtain an image panorama segmentation map, comprising:
performing convolution operations on the feature maps generated by the backbone network to generate two groups of feature maps, which are concatenated with the semantic segmentation map and the instance segmentation map, respectively;
applying convolution operations and a sigmoid activation function to the concatenated semantic segmentation map and instance segmentation map, respectively, to obtain an instance segmentation soft-threshold distribution feature map and a semantic segmentation soft-threshold distribution feature map;
taking the element-wise product of the instance segmentation soft-threshold distribution feature map and the instance segmentation map, and likewise of the semantic segmentation soft-threshold distribution feature map and the semantic segmentation map;
concatenating the semantic segmentation map and instance segmentation map after the element-wise products, preliminarily fusing them with a convolution operation, then extracting features with dilated convolutions of different dilation rates, and concatenating the extracted results;
further fusing the concatenated results with a convolution operation and thresholding the fused result to obtain a gating value distribution map of 0-1 values;
according to the gating value distribution map, selecting the semantic segmentation result or the instance segmentation result for each pixel based on its 0-1 value to obtain the panoramic segmentation map;
training and optimizing the panoramic segmentation network through a training optimization mechanism according to the image panorama segmentation map to obtain an optimized image panorama segmentation model;
and performing panoramic segmentation on images with the optimized image panorama segmentation model.
2. The method of claim 1, wherein inputting the feature map into a semantic segmentation network head and a region proposal network head, respectively, to obtain a semantic segmentation map and a plurality of candidate regions of the image comprises:
inputting the feature map into the semantic segmentation network head and generating pixel-level class predictions through full convolution operations, thereby obtaining the semantic segmentation map of the image;
and inputting the feature map into the region proposal network head, generating candidate regions of different sizes and aspect ratios through multiple convolution operations, and obtaining the category of each candidate region and the coordinates of its bounding box.
3. The method of claim 2, wherein screening the candidate regions according to the semantic segmentation map comprises:
determining the region of the semantic segmentation map corresponding to the position of each candidate region according to its bounding box coordinates;
for each candidate region, computing the area of the pixels belonging to countable objects within the corresponding semantic segmentation map region, and then computing the ratio of that area to the area of the candidate region;
and judging whether the area ratio of the candidate region falls within a certain threshold range, and deleting the candidate region if it does not.
4. The method of claim 3, wherein the threshold ranges from 0.5 to 0.7.
5. The method of claim 1, further comprising, before screening the candidate regions according to the semantic segmentation map, preliminarily screening the candidate regions to remove those that do not meet the rules.
6. The method of claim 1, wherein inputting the screened candidate regions to the object recognition network head and the bounding box offset prediction network head, respectively, for classification and bounding box correction comprises:
extracting the feature maps corresponding to the screened candidate regions from the feature map;
performing a region-of-interest pooling operation on the screened candidate region feature maps to obtain pooled candidate regions of a fixed size;
inputting the pooled candidate regions to the object recognition network head and the bounding box offset prediction network head, respectively, to obtain the category and the bounding box coordinate offset of each pooled candidate region;
and correcting the bounding boxes of the pooled candidate regions according to their categories and bounding box coordinate offsets.
7. The method of claim 1, wherein inputting the classified and bounding-box-corrected candidate regions into an instance segmentation network head to obtain an instance segmentation map comprises:
inputting the feature map and the instance regions into the instance segmentation network head and executing the same operations as the semantic segmentation network head to obtain instance segmentation binary distribution features;
and acquiring the target instance mask corresponding to each instance region, thereby generating the instance segmentation map.
8. The method of claim 1, wherein the training optimization mechanism comprises:
1) training the semantic segmentation network head and the region proposal network head with the objective function $L_{step\text{-}1} = L_{seg} + L_{rpn}$;
wherein $L_{step\text{-}1}$ is a multitask loss function representing the losses of training the semantic segmentation network head and the region proposal network head,

$$L_{seg} = -\frac{1}{N_{IP}} \sum_{i=1}^{N_{IP}} \sum_{m=1}^{M} y_i^m \log p_i^m$$

is the cross-entropy loss used as the semantic segmentation loss, $N_{IP}$ is the number of pixels in the image, $M$ is the number of semantic categories, $m$ denotes a semantic category, $y_i^m$ is the one-hot label of pixel $i$, and $p_i^m$ is the model's predicted output for pixel $i$;

$$L_{rpn} = \sum_i L_{cls\text{-}b}(a_i, a_i^*) + \lambda \sum_i a_i^* L_{reg}(t_i, t_i^*)$$

is the region proposal loss, wherein $L_{cls\text{-}b}$ is a binary cross-entropy classification loss,

$$L_{cls\text{-}b}(a_i, a_i^*) = -\left[a_i^* \log a_i + (1 - a_i^*)\log(1 - a_i)\right],$$

$i$ is the index of a candidate proposal region in the image, $a_i$ is the predicted probability that proposal region $i$ is a countable object, and $a_i^* \in \{0, 1\}$ indicates whether proposal region $i$ is a countable object, taking 1 if it is and 0 otherwise; $L_{reg}$ is the bounding box offset prediction loss function; in the second term the coefficient $a_i^*$ ensures that the bounding box coordinate offset loss is computed only for candidate proposal regions corresponding to countable objects, and $\lambda$ is a weighting coefficient used to balance the offset loss against the classification loss; $t_i$ denotes the predicted parameterized 4-dimensional bounding box coordinate offset vector, and $t_i^*$ is the 4-dimensional coordinate offset of the ground-truth bounding box associated with proposal region $i$; since bounding box coordinate offset prediction is a regression problem, $L_{reg}$ is defined as

$$L_{reg}(t_i, t_i^*) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\!\left(t_i^j - t_i^{*j}\right),$$

where $j$ indexes the coordinate representation of the candidate region bounding box, $x, y$ are the top-left coordinates of the bounding box, $w, h$ are its width and height measured from the top-left coordinates, and

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise;} \end{cases}$$

2) training the object recognition network head, the bounding box offset prediction network head, and the instance segmentation network head with the objective function $L_{step\text{-}2} = L_{cls\text{-}m} + L_{reg} + L_{ins}$;
wherein

$$L_{cls\text{-}m} = -\frac{1}{N_R} \sum_{i=1}^{N_R} \sum_{m=1}^{M_{ins}} y_i^m \log p_i^m$$

is the multi-class cross-entropy loss for classifying countable objects and the background, $N_R$ is the number of bounding box features, and $M_{ins}$ is the number of countable object categories plus one, the added one indicating that all background categories are treated as a single category; $L_{reg}$ defines the loss between the predicted and ground-truth bounding box coordinate offsets of countable object instances;

$$L_{ins} = -\frac{1}{N_{RP}} \sum_{i=1}^{N_{RP}} \sum_{m=1}^{M_{ins}} y_i^m \log p_i^m$$

is the segmentation loss of a candidate region, $N_{RP}$ is the number of pixels in the candidate region, $m$ is an instance-level semantic category, $y_i^m$ is the one-hot label of pixel $i$, and $p_i^m$ is the model's predicted output for pixel $i$;
3) training the back-end fusion network that generates the panoramic segmentation map with the binary cross-entropy loss function as the objective function;
and summing the objective functions of the three steps to obtain a unified objective function, and optimizing the panoramic segmentation network based on the unified objective function to obtain the optimized panoramic segmentation model.
9. The method of claim 1, wherein the backbone convolutional neural network uses a dilated (hole) convolution structure or an encoding-decoding structure.
CN201910544228.XA 2019-06-21 2019-06-21 Image panorama segmentation method based on multitask learning deep neural network Active CN110276765B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910544228.XA CN110276765B (en) 2019-06-21 2019-06-21 Image panorama segmentation method based on multitask learning deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910544228.XA CN110276765B (en) 2019-06-21 2019-06-21 Image panorama segmentation method based on multitask learning deep neural network

Publications (2)

Publication Number Publication Date
CN110276765A CN110276765A (en) 2019-09-24
CN110276765B true CN110276765B (en) 2021-04-23

Family

ID=67961578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910544228.XA Active CN110276765B (en) 2019-06-21 2019-06-21 Image panorama segmentation method based on multitask learning deep neural network

Country Status (1)

Country Link
CN (1) CN110276765B (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199199B (en) * 2019-12-27 2023-05-05 同济大学 Action recognition method based on self-adaptive context area selection
CN111210443B (en) * 2020-01-03 2022-09-13 吉林大学 Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
CN111259900A (en) * 2020-01-13 2020-06-09 河海大学 Semantic segmentation method for satellite remote sensing image
CN111368845B (en) * 2020-03-16 2023-04-07 河南工业大学 Feature dictionary construction and image segmentation method based on deep learning
CN111768415A (en) * 2020-06-15 2020-10-13 哈尔滨工程大学 Image instance segmentation method without quantization pooling
CN111814593B (en) * 2020-06-19 2024-08-06 浙江大华技术股份有限公司 Traffic scene analysis method and equipment and storage medium
CN111915628B (en) * 2020-06-24 2023-11-24 浙江大学 Single-stage instance segmentation method based on prediction target dense boundary points
CN111985457A (en) * 2020-09-11 2020-11-24 北京百度网讯科技有限公司 Traffic facility damage identification method, device, equipment and storage medium
CN112053358B (en) * 2020-09-28 2024-09-13 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for determining instance category of pixel in image
US11602132B2 (en) 2020-10-06 2023-03-14 Sixgill, LLC System and method of counting livestock
CN112257649A (en) * 2020-11-03 2021-01-22 深圳创新奇智科技有限公司 Article identification method, model training method, device and electronic equipment
CN112489060B (en) * 2020-12-07 2022-05-10 北京医准智能科技有限公司 System and method for pneumonia focus segmentation
CN112489064B (en) * 2020-12-14 2022-03-25 桂林电子科技大学 Panorama segmentation method based on edge scaling correction
CN112766165B (en) * 2021-01-20 2022-03-22 燕山大学 Falling pre-judging method based on deep neural network and panoramic segmentation
CN112802039B (en) * 2021-01-26 2022-03-01 桂林电子科技大学 Panorama segmentation method based on global edge attention
US20220261593A1 (en) * 2021-02-16 2022-08-18 Nvidia Corporation Using neural networks to perform object detection, instance segmentation, and semantic correspondence from bounding box supervision
CN112819840B (en) * 2021-02-24 2022-08-02 北京航空航天大学 High-precision image instance segmentation method integrating deep learning and traditional processing
CN112950642A (en) * 2021-02-25 2021-06-11 中国工商银行股份有限公司 Point cloud instance segmentation model training method and device, electronic equipment and medium
US11816841B2 (en) 2021-03-17 2023-11-14 Huawei Technologies Co., Ltd. Method and system for graph-based panoptic segmentation
CN113052858B (en) * 2021-03-23 2023-02-14 电子科技大学 Panorama segmentation method based on semantic stream
CN113139549B (en) * 2021-03-25 2024-03-15 北京化工大学 Parameter self-adaptive panoramic segmentation method based on multitask learning
CN113096136A (en) * 2021-03-30 2021-07-09 电子科技大学 Panoramic segmentation method based on deep learning
CN113240723A (en) * 2021-05-18 2021-08-10 中德(珠海)人工智能研究院有限公司 Monocular depth estimation method and device and depth evaluation equipment
CN114758128B (en) * 2022-04-11 2024-04-16 西安交通大学 Scene panorama segmentation method and system based on controlled pixel embedding characterization explicit interaction
CN114838729A (en) * 2022-04-27 2022-08-02 中国建设银行股份有限公司 Path planning method, device and equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106530305B (en) * 2016-09-23 2019-09-13 北京市商汤科技开发有限公司 Semantic segmentation model training and image partition method and device calculate equipment
CN108090911B (en) * 2018-01-08 2022-04-01 北京航空航天大学 Near-shore ship segmentation method for optical remote sensing image
CN108335305B (en) * 2018-02-09 2020-10-30 北京市商汤科技开发有限公司 Image segmentation method and apparatus, electronic device, program, and medium
CN109447169B (en) * 2018-11-02 2020-10-27 北京旷视科技有限公司 Image processing method, training method and device of model thereof and electronic system
CN109493330A (en) * 2018-11-06 2019-03-19 电子科技大学 A kind of nucleus example dividing method based on multi-task learning
CN109685060B (en) * 2018-11-09 2021-02-05 安徽科大讯飞医疗信息技术有限公司 Image processing method and device
CN109801307A (en) * 2018-12-17 2019-05-24 中国科学院深圳先进技术研究院 A kind of panorama dividing method, device and equipment
CN109801297B (en) * 2019-01-14 2020-12-11 浙江大学 Image panorama segmentation prediction optimization method based on convolution

Also Published As

Publication number Publication date
CN110276765A (en) 2019-09-24

Similar Documents

Publication Publication Date Title
CN110276765B (en) Image panorama segmentation method based on multitask learning deep neural network
CN112884064B (en) Target detection and identification method based on neural network
CN111553387B (en) Personnel target detection method based on Yolov3
CN109902600B (en) Road area detection method
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN110263786B (en) Road multi-target identification system and method based on feature dimension fusion
CN113486726A (en) Rail transit obstacle detection method based on improved convolutional neural network
CN111104903B (en) Depth perception traffic scene multi-target detection method and system
US11640714B2 (en) Video panoptic segmentation
CN111914727B (en) Small target human body detection method based on balance sampling and nonlinear feature fusion
CN111696110B (en) Scene segmentation method and system
CN110705412A (en) Video target detection method based on motion history image
CN108764244B (en) Potential target area detection method based on convolutional neural network and conditional random field
CN112434723B (en) Day/night image classification and object detection method based on attention network
Munir et al. LDNet: End-to-end lane marking detection approach using a dynamic vision sensor
CN112241757A (en) Apparatus and method for operating a neural network
CN115984537A (en) Image processing method and device and related equipment
CN115019039A (en) Example segmentation method and system combining self-supervision and global information enhancement
CN114330234A (en) Layout structure analysis method and device, electronic equipment and storage medium
CN113807176A (en) Small sample video behavior identification method based on multi-knowledge fusion
CN117542082A (en) Pedestrian detection method based on YOLOv7
Saravanarajan et al. Improving semantic segmentation under hazy weather for autonomous vehicles using explainable artificial intelligence and adaptive dehazing approach
CN114332797A (en) Road scene semantic segmentation method and system with self-evaluation mechanism
CN117636032A (en) Multi-label image classification method based on multi-scale local features and difficult class mining
Alajlan et al. Automatic lane marking prediction using convolutional neural network and S-Shaped Binary Butterfly Optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant