CN114549833A - Instance segmentation method and device, electronic device and storage medium


Info

Publication number: CN114549833A
Application number: CN202210087999.2A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 侯亚丽, 杨玉源, 侯志江, 郝晓莉, 申艳, 陈后金
Assignee: Beijing Jiaotong University
Application filed by Beijing Jiaotong University; priority to CN202210087999.2A
Publication of CN114549833A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Abstract

The application provides an instance segmentation method and device, an electronic device, and a storage medium. The method comprises: acquiring a class response map, an instance center offset field, and an instance boundary response map of an image to be segmented; establishing a prior template of a target according to the pose of the target in the image to be segmented; based on the class response map, the instance center offset field, the instance boundary response map, and the prior template, refining the class response map and the instance boundary response map through template matching to obtain a corresponding template-based class response map and template-based boundary map; and, under the constraint of the template-based boundary map, performing region expansion on the template-based class response map to obtain an instance segmentation result. The scheme reduces both under-segmentation and over-segmentation.

Description

Instance segmentation method and device, electronic device and storage medium
Technical Field
The invention belongs to the technical field of image segmentation, and in particular relates to an instance segmentation method and device, an electronic device, and a storage medium.
Background
Existing instance segmentation techniques in the image segmentation field fall mainly into two categories: fully supervised instance segmentation and weakly supervised instance segmentation. Fully supervised methods, built on deep learning, can automatically predict the class of each pixel in an image and the instance to which it belongs. Their main drawback is that training the neural network requires accurate pixel-level class and instance labels, and producing such labels demands enormous manpower and time. The resulting cost limits the scale of available data and hinders further development of fully supervised instance segmentation.
To reduce annotation complexity, weakly supervised instance segmentation methods have emerged in recent years. They greatly relax the requirements on image annotation; among them, methods based only on image-level class labels have the lowest annotation cost. Such methods need only image class information, from which a detailed segmentation of each target instance is derived through neural networks and related techniques. However, existing weakly supervised methods based on image-level class labels, which currently have the lowest labeling cost, segment pixels using only the perceived image information, and therefore readily produce results that plainly contradict human prior knowledge of object shape.
Disclosure of Invention
An object of the embodiments of the present specification is to provide an instance segmentation method, device, electronic device, and storage medium.
In order to solve the above technical problem, the embodiments of the present application are implemented as follows:
in a first aspect, the present application provides an instance segmentation method, including:
acquiring a class response map, an instance center offset field, and an instance boundary response map of an image to be segmented;
establishing a prior template of a target according to the pose of the target in the image to be segmented;
based on the class response map, the instance center offset field, the instance boundary response map, and the prior template, refining the class response map and the instance boundary response map through template matching to obtain a corresponding template-based class response map and template-based boundary map;
and, under the constraint of the template-based boundary map, performing region expansion on the template-based class response map to obtain an instance segmentation result.
In one embodiment, refining the class response map and the instance boundary response map through template matching, based on the class response map, the instance center offset field, the instance boundary response map, and the prior template, to obtain a corresponding template-based class response map and template-based boundary map, includes:
determining instance center positions according to the instance center offset field;
determining target candidate positions according to the instance center positions;
scaling the prior template according to preset ratios to obtain a plurality of scaled templates, placing the scaled templates at the target candidate positions, calculating a template matching score for each scaled template, and selecting, among the scaled templates whose matching scores exceed a score threshold, the one with the largest score as the matching template;
and refining the class response map and the instance boundary response map according to the matching template to obtain the corresponding template-based class response map and template-based boundary map.
In one embodiment, determining the instance center positions based on the instance center offset field comprises:
estimating instance center regions from the instance center offset field;
determining the set of pixels pointing into an instance center region as an instance region;
and determining the center of the instance region as the instance center position.
In one embodiment, determining the target candidate positions based on the instance center positions comprises:
selecting, as a target candidate position, an instance center position whose instance region area is greater than or equal to an area threshold;
and/or,
selecting, as a target candidate position according to the instance region and the class response map, an instance center position whose instance region contains a number of pixels with probability of the specified class above a preset probability that is at least a preset percentage of the region's total area.
In one embodiment, the template matching score comprises at least one of an edge direction matching score, an offset magnitude matching score, a template region matching score, and a template boundary matching score;
the edge direction matching score is determined from the edge contour pixel set of the scaled template, the offset-field direction of each edge pixel of the scaled template relative to the template center, and the predicted instance center offset field direction;
the offset magnitude matching score is determined from the edge contour pixel set of the scaled template and the normalized lengths of the instance center offset field vectors;
the template region matching score is determined from the set of pixels of the template foreground region covered by the scaled template and the set of pixels of the foreground region determined by the instance center offset field vector lengths;
and the template boundary matching score is determined from the edge contour pixel set of the scaled template and the chamfer distance between the scaled template and the instance boundary response map.
In one embodiment, refining the class response map and the instance boundary response map according to the matching template to obtain the corresponding template-based class response map and template-based boundary map includes:
if the chamfer distance between the matching template and the instance boundary response map is greater than a chamfer distance threshold, retaining the boundary of the matching template so as to refine the instance boundary response map and obtain the template-based boundary map;
and determining the region of the class response map covered by the matching template, and amplifying the response scores of the class response map within that region by a preset ratio, so as to refine the class response map and obtain the template-based class response map.
In one embodiment, performing region expansion based on the template-based class response map under the constraint of the template-based boundary map to obtain an instance segmentation result includes:
determining a similarity matrix according to the instance boundary response map;
applying the Hadamard product to the similarity matrix several times and normalizing the matrix rows to obtain a transition matrix;
adjusting the response scores of the template-based class response map according to the boundary probabilities of the template-based boundary map, to obtain an adjusted class response map;
and multiplying the transition matrix with the adjusted class response map several times to obtain the instance segmentation result.
In a second aspect, the present application provides an instance segmentation device, comprising:
an acquisition module, configured to acquire a class response map, an instance center offset field, and an instance boundary response map of an image to be segmented;
a template establishing module, configured to establish a prior template of a target according to the pose of the target in the image to be segmented;
a matching module, configured to refine the class response map and the instance boundary response map through template matching, based on the class response map, the instance center offset field, the instance boundary response map, and the prior template, to obtain a corresponding template-based class response map and template-based boundary map;
and an expansion module, configured to perform region expansion based on the template-based class response map under the constraint of the template-based boundary map, to obtain an instance segmentation result.
In a third aspect, the present application provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the instance segmentation method of the first aspect.
In a fourth aspect, the present application provides a readable storage medium storing a computer program which, when executed by a processor, implements the instance segmentation method of the first aspect.
As can be seen from the technical solutions provided in the embodiments of the present specification: refining the class response map through template matching reduces under-segmentation, and refining the instance boundary response map through template matching reduces over-segmentation.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below are only some of the embodiments of the present specification, and that those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow diagram of an instance segmentation method provided herein;
FIG. 2 is a schematic structural diagram of the ResNet50 backbone network provided herein;
FIG. 3 is a schematic structural diagram of the DP-IRNet network provided herein;
FIG. 4 is a schematic structural diagram of the EDGE-IRNet network provided herein;
FIG. 5 is a flowchart of the training process of the convolutional-neural-network-based image classification network, the DP-IRNet network, and the EDGE-IRNet network provided herein;
FIG. 6 is a schematic flow chart of object matching provided herein;
FIG. 7 is a functional block diagram of an instance segmentation method provided herein;
FIG. 8 is a comparison of experimental results of the method of the present application with those of the original method;
FIG. 9 is a schematic structural diagram of an instance segmentation device provided herein;
FIG. 10 is a schematic structural diagram of an electronic device provided herein.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments described herein without departing from the scope or spirit of the application. Other embodiments will be apparent to the skilled person from the description of the present application. The specification and examples are exemplary only.
As used herein, the terms "comprising," "including," "having," "containing," and the like are open-ended terms that mean including, but not limited to.
In the present application, "parts" are in parts by mass unless otherwise specified.
In the related art, target instance segmentation based on image-level class labels is mainly derived from the class response map. In a convolutional-neural-network-based image classification network, the class response map of an object, i.e., the salient region of the corresponding class in the image, can be obtained through gradient back-propagation and similar techniques. To obtain instance information, the Peak Response Map (PRM) method adds a peak stimulation strategy during classification-network training that guides the network to highlight peak points in the response map, yielding approximate instance target positions. Segmentation candidates from a separate object proposal generation method are then ranked and filtered according to the peak positions and the peak response map to produce the final instance segmentation result. Later researchers combined the PRM method with fully supervised instance segmentation, using PRM's segmentation results as pseudo-labels for training a fully supervised instance segmentation network. To address the incompleteness of the target regions extracted by the peak response map, subsequent research used segmentation results from object proposal generation methods as pseudo-labels, enabling a network to learn to fill in the PRM-extracted peak response map; the filled map, called an instance activation map, covers the target more completely than the PRM. These existing PRM-based weakly supervised methods thus use not only image class information but also separate object proposal generation methods, in effect relying on annotation information beyond the image class. The Inter-pixel Relation Network (IRNet) method instead trains a dedicated neural network to mine instance information and inter-pixel semantic similarity from the class response map, obtaining for each pixel a center offset vector relative to its instance and a map of boundaries between classes. A random walk algorithm and a conditional random field are then adopted to complete each instance region in the class response map.
The class-label-based instance segmentation method built on the IRNet architecture has three parts. The first part is a convolutional-neural-network-based image classification network that generates the class response map. Throughout the pipeline, the class response map serves as the seed region in the semantic propagation process and provides supervision information for the IRNet network. In the second part, IRNet mines class similarity between neighboring pixels from the class response map, training the convolutional neural network to predict a class boundary map and an instance center offset field; the offset field indicates the locations of potential instance centers in the image. In the third part, under the constraint of the class boundary map, the instance-level class response map undergoes region expansion to produce the final instance segmentation result; the class boundary map prevents each instance region from over-expanding during this refinement.
The present method improves on this pipeline and reuses the offset-field result to further obtain an instance boundary map. Specifically: the first part, the image classification network, produces the class response map. In the second part, the original IRNet is split into two separate networks, DP-IRNet and EDGE-IRNet. DP-IRNet predicts the offset field from the class response map; the offset field encodes, for each pixel position, the offset vector to the center of its corresponding target. EDGE-IRNet, in addition to the class response map, introduces the offset-field information into its training to obtain an instance-level boundary map, which represents the probability that each pixel position in the image belongs to an instance boundary. Finally, under the constraint of the instance boundary map, the instance-level response map undergoes region expansion to produce the final instance segmentation result.
In summary, existing PRM-type weakly supervised instance segmentation methods use not only image class information but also separate object proposal methods, and so in effect rely on annotation information beyond the image class. IRNet-based class-label methods achieve instance segmentation from image class labels alone, but the prior art still has the following defects:
(1) Because the segmentation boundaries between classes or instances are obtained from perceived image information alone, instance boundaries are often partially missing in difficult conditions such as similar foreground and background, ultimately causing unreasonable over-segmentation.
(2) Because the class response map obtained from a convolutional-neural-network classifier usually highlights only locally salient parts of each instance, such as individual body parts, instance under-segmentation readily persists even after the incomplete class response map is expanded by semantic propagation.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Referring to FIG. 1, a flowchart of an instance segmentation method applicable to the embodiments of the present application is shown.
As shown in FIG. 1, the instance segmentation method may include:
S110, acquiring a class response map, an instance center offset field, and an instance boundary response map of the image to be segmented.
With only image class information as supervision, the extracted class response map gives the approximate region of each class. The class response map may be obtained with a convolutional-neural-network-based image classification network, for example via CAM, Grad-CAM, and the like, which is not limited herein. This embodiment is described using a class response map obtained from a convolutional-neural-network-based image classification network.
Specifically, the image classification network may adopt a ResNet50 classification network (hereinafter the ResNet50 network), and may equally adopt other classifiers from the ResNet, Inception, Xception, or MobileNet families, which is not limited herein. This embodiment uses the ResNet50 classification network as an example; the ResNet50 backbone structure is shown in FIG. 2. In use, the stride of the last downsampling layer of the ResNet50 network is set to 1, preventing further reduction of the class response map's resolution. To extract the class response map, the final classification layer and the global pooling layer of the trained network are removed, and the feature map output by the last convolutional layer is taken. The vector of classification weights for each class is multiplied with the feature map and normalized to obtain the class response map. The value of each pixel in the class response map represents the probability that the pixel belongs to that class. A minimal sketch of this extraction follows.
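The following is a minimal sketch of this CAM-style extraction in PyTorch, assuming a standard torchvision ResNet-50; the layer names, the omission of the stride-1 modification, and the normalization scheme are assumptions of this sketch rather than details taken from the patent.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights=None)  # assume trained classifier weights are loaded here
model.eval()

@torch.no_grad()
def class_response_map(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """CAM-style class response map: the last conv stage's feature map,
    weighted by the classification weights of class_idx, then normalized.
    image: (N, 3, H, W) float tensor."""
    x = model.conv1(image)
    x = model.bn1(x)
    x = model.relu(x)
    x = model.maxpool(x)
    x = model.layer1(x)
    x = model.layer2(x)
    x = model.layer3(x)
    x = model.layer4(x)                      # (N, 2048, H/32, W/32); the patent sets
                                             # the last stride to 1 for higher resolution
    w = model.fc.weight[class_idx]           # (2048,) classification weights
    cam = torch.einsum('nchw,c->nhw', x, w)  # weighted sum over channels
    cam = F.relu(cam)
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)  # scale to [0, 1]
    return cam
```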
The instance center offset field, or offset field for short, encodes the offset vector of each pixel position relative to its corresponding target center position. The offset field can be obtained from the class response map with the Displacement Field Inter-pixel Relation Network (DP-IRNet).
Specifically, the DP-IRNet network may also use the ResNet50 network as its backbone; its structure is shown in FIG. 3. The feature maps of the five stages of the ResNet50 network each undergo a 1×1 convolution; the per-stage features are then fused through a series of convolution, upsampling, and feature concatenation operations; and the predicted instance center offset field is finally output.
When training the DP-IRNet network to obtain the instance center offset field, the training data come from neighborhood point pairs in the class response map. Points within a neighborhood that belong to the same class are regarded as coming from the same instance, and points belonging to different classes as coming from different instances. The training loss is defined on the vector between two pixel positions: for two pixels of the same instance, the difference between their offset vectors, each pointing to the instance center, must equal the displacement between the two pixel positions. A sketch of such a network and loss follows.
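The sketch below illustrates one way such a displacement-field head and its pairwise loss could look, assuming the torchvision ResNet-50 backbone; the channel widths, the fusion head, and the L1 loss form are assumptions made for illustration, since the patent names only the operations (1×1 convolutions, upsampling, feature concatenation) and the pairwise constraint.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class DPIRNet(nn.Module):
    """Predicts a 2-channel instance-center offset field from the five
    backbone stages (sketch; channel widths are assumptions)."""
    def __init__(self):
        super().__init__()
        r = resnet50(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])
        self.reduce = nn.ModuleList(
            [nn.Conv2d(c, 64, 1) for c in (64, 256, 512, 1024, 2048)])
        self.head = nn.Sequential(
            nn.Conv2d(64 * 5, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 2, 1))            # (dx, dy) offset to the center

    def forward(self, x):
        feats = [self.stem(x)]
        for stage in self.stages:
            feats.append(stage(feats[-1]))
        size = feats[0].shape[-2:]
        fused = torch.cat([F.interpolate(r(f), size=size, mode='bilinear',
                                         align_corners=False)
                           for r, f in zip(self.reduce, feats)], dim=1)
        return self.head(fused)

def same_instance_pair_loss(d_i, d_j, p_i, p_j):
    """Both offsets point at one shared center, so p_i + d_i == p_j + d_j,
    i.e. d_i - d_j == p_j - p_i; penalize the deviation (L1, assumed)."""
    return torch.abs((d_i - d_j) - (p_j - p_i)).sum(-1).mean()
```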
The instance boundary response map, or instance boundary map for short, can be obtained with the Edge Inter-pixel Relation Network (EDGE-IRNet), using the offset field and the semantic similarity of neighborhood point pairs in the class response map.
Specifically, the EDGE-IRNet network can also use the ResNet50 network as its backbone; its structure is shown in FIG. 4. EDGE-IRNet uses the feature maps of the five stages of the ResNet50 network: the first- and second-stage feature maps pass through 1×1 convolutions and are concatenated with the upsampled results of the 1×1-convolved third- to fifth-stage feature maps, and a final 1×1 convolution outputs the predicted instance boundary map.
When training the EDGE-IRNet network to obtain the instance boundary map, the training data come from neighborhood point pairs in the class response map and the instance center offset field. Point pairs within a neighborhood that share the same class and the same instance are regarded as coming from the same instance; point pairs with different classes or different instances, as coming from different instances. The training loss encodes that two pixels of the same instance have high semantic similarity and no instance boundary between them, while two pixels from different instances have low semantic similarity and an instance boundary between them. A corresponding sketch follows.
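A boundary head following the wiring described above might look as follows; again a sketch under the same backbone assumption, with assumed channel widths.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class EdgeIRNet(nn.Module):
    """Predicts a per-pixel instance-boundary probability map (sketch)."""
    def __init__(self):
        super().__init__()
        r = resnet50(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])
        self.reduce = nn.ModuleList(
            [nn.Conv2d(c, 32, 1) for c in (64, 256, 512, 1024, 2048)])
        self.out = nn.Conv2d(32 * 5, 1, 1)   # final 1x1 convolution

    def forward(self, x):
        feats = [self.stem(x)]
        for stage in self.stages:
            feats.append(stage(feats[-1]))
        size = feats[1].shape[-2:]           # stages 1-2 share this resolution
        # 1x1-reduce every stage; stages 3-5 are upsampled before concatenation
        parts = [F.interpolate(r(f), size=size, mode='bilinear',
                               align_corners=False)
                 for r, f in zip(self.reduce, feats)]
        return torch.sigmoid(self.out(torch.cat(parts, dim=1)))
```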
It should be understood that before the convolutional-neural-network-based image classification network, DP-IRNet, and EDGE-IRNet are used to obtain the class response map, the offset field, and the instance boundary map, the three networks need to be trained. As shown in FIG. 5, the image classification network is trained first; the class response map is obtained from the trained classification network; DP-IRNet is trained on the class response map; the offset field is obtained from the trained DP-IRNet; and EDGE-IRNet is trained on the class response map and the offset field.
S120, establishing a prior template of the target according to the pose of the target in the image to be segmented.
The target in the image to be segmented may also be called the object; depending on the image to be segmented, the object may be a human body, a vehicle, and so on.
Illustratively, taking the human body class in the Pascal VOC 2012 dataset as an example, a total of 27 human-body templates were constructed. The human body is roughly divided into three regions: head-shoulder, torso, and legs. The head-shoulder region has 5 templates, corresponding to 5 different inclination angles of head and shoulders. The torso is modeled in two poses, frontal standing and side standing; combining each with the 5 head-shoulder templates yields 10 upper-body templates. The legs are likewise modeled frontally and from the side and combined with the corresponding 10 upper-body templates to form 10 full-body templates. In addition, to handle images in which only the head is visible, 2 extra head templates with larger inclination angles are included.
It should be noted that the present application does not limit the execution order of S110 and S120: S110 may be executed before S120, S120 before S110, or the two may be executed simultaneously.
S130, based on the class response map, the instance center offset field, the instance boundary response map, and the prior template, refining the class response map and the instance boundary response map through template matching to obtain the corresponding template-based class response map and template-based boundary map.
Specifically, this step introduces a prior model of the object by way of template matching, on the basis of the class response map, the instance center offset field, and the instance boundary response map, and completes the class response map and the instance boundary map with prior human knowledge of the object.
Specifically, the template matching process, shown in FIG. 6, includes four stages: instance center localization, instance center screening, template matching evaluation, and template information fusion, corresponding to S1301-S1304 respectively.
S1301, determining instance center positions according to the instance center offset field, which may include:
estimating instance center regions from the instance center offset field;
determining the set of pixels pointing into an instance center region as an instance region;
and determining the center of each instance region as an instance center position.
Specifically, the instance center offset field is obtained from the DP-IRNet network. Since offset vectors near an instance center have small magnitude, the approximate region containing each instance center can be found by collecting the pixels of the offset field whose vectors are short (this yields the estimated instance center regions). Each estimated center region is treated as a candidate target center, and every pixel of the image to be segmented is assigned according to where its offset vector points: the set of pixels pointing at a given candidate center forms one instance region, also called a candidate region. These instance regions partition the image into parts, together forming the instance map; each part corresponds to a candidate target, and its center is taken as the instance center position (also called the candidate position or target center position).
Step S1301 thus performs instance center localization using the center information recovered from the offset field. A sketch of this step follows.
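A minimal NumPy/SciPy sketch of this localization step follows; the magnitude threshold `mag_thresh` and the rounding-based assignment are assumptions of the sketch, not values from the patent.

```python
import numpy as np
from scipy import ndimage

def locate_instance_centers(offset, mag_thresh=2.0):
    """offset: (2, H, W) array; offset[:, y, x] is the vector from pixel
    (x, y) to its instance center. Returns candidate centers and a label
    map assigning each pixel to the candidate its vector points into."""
    H, W = offset.shape[1:]
    mag = np.linalg.norm(offset, axis=0)
    # 1. Pixels with short offset vectors lie near some instance center.
    near_center = mag < mag_thresh
    center_regions, n = ndimage.label(near_center)
    centers = ndimage.center_of_mass(near_center, center_regions,
                                     range(1, n + 1))
    # 2. Assign every pixel to the center region its offset points into.
    ys, xs = np.mgrid[0:H, 0:W]
    ty = np.clip(np.round(ys + offset[1]).astype(int), 0, H - 1)
    tx = np.clip(np.round(xs + offset[0]).astype(int), 0, W - 1)
    instance_map = center_regions[ty, tx]    # 0 = points at no candidate
    return centers, instance_map
```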
S1302, determining target candidate positions according to the instance center positions.
Specifically, in this step the instance center positions obtained in S1301 may be further screened according to the area and class composition of their instance regions.
In one embodiment, S1302 may include: selecting as target candidate positions the instance center positions whose instance region area is greater than or equal to an area threshold, i.e., excluding undersized candidate regions.
The area threshold may be set according to actual requirements and is measured in pixels; for example, it may be set to 1000 pixels.
Illustratively, for human targets in the Pascal VOC 2012 dataset, when the area of the instance region containing a candidate position is less than 1000 pixels, the corresponding instance center position is excluded; that is, instance center positions whose region area is at least 1000 pixels are retained as target candidate positions.
In one embodiment, S1302 may further include: according to the instance region and the class response map, selecting as target candidate positions the instance center positions whose region contains a number of pixels with probability of the specified class above a preset probability that is at least a preset percentage of the region's total area.
The class probability threshold may be set according to actual requirements, for example 25%; the preset percentage may likewise be set according to actual requirements, for example 20%.
It can be understood that the class of the target is obtained from the class response map; combining the instance region with the class response map determines the class of the pixels within the region.
Illustratively, for human targets in the Pascal VOC 2012 dataset, when the number of pixels in a candidate's instance region whose probability of the specified class exceeds 25% does not reach 20% of the region's total area, the candidate position is excluded; that is, instance center positions whose region has at least 20% of its area at class probability above 25% are retained as target candidate positions.
In the instance center screening of S1302, both criteria use the class response map and related information to screen the target candidate positions and exclude localization results that contradict the prior. Either criterion may be used alone, or both together; when both are used, the order in which they are applied is not limited. A sketch applying both criteria follows.
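A sketch of the screening with both criteria, using the thresholds quoted above for the Pascal VOC 2012 person class; the function and argument names are this document's, not the patent's.

```python
import numpy as np

def screen_candidates(centers, instance_map, cam, class_idx,
                      min_area=1000, prob_thresh=0.25, min_frac=0.20):
    """Keep a candidate center only if its instance region is large
    enough (criterion 1) and contains enough confident pixels of the
    specified class (criterion 2). cam: (C, H, W) class response map."""
    kept = []
    for i, center in enumerate(centers, start=1):
        region = instance_map == i
        area = region.sum()
        if area < min_area:                       # criterion 1: region size
            continue
        confident = (cam[class_idx] > prob_thresh) & region
        if confident.sum() < min_frac * area:     # criterion 2: class support
            continue
        kept.append(center)
    return kept
```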
S1303, scaling the prior template according to preset ratios to obtain a plurality of scaled templates, placing the scaled templates at the target candidate positions, calculating a template matching score for each scaled template, and selecting, among the scaled templates whose matching score exceeds the score threshold, the one with the largest score as the matching template.
Specifically, the preset scaling ratios are set according to actual requirements; the resolution of the prior template is enlarged or reduced by each ratio to obtain the scaled templates. Each scaled template is placed at a target candidate position and the template matching procedure is executed. According to the resulting matching scores, a suitable instance template and its corresponding size are selected as the matching template, guiding the template information fusion of S1304.
The score threshold may be set according to actual requirements.
In this scheme, the template prior information is expressed in three respects: the template boundary, the shape of the template region, and the template's direction field. (The formal expression appears only as an image in the original document.) Through these three prior characteristics, the template combines naturally with the boundary map, instance map, offset field, and related information in the overall IRNet pipeline to yield an evaluation of the template matching degree.
In one embodiment, the template matching score comprises at least one of an edge direction matching score, an offset magnitude matching score, a template region matching score, and a template boundary matching score.
The edge direction matching score S_dp_ori evaluates edge direction matching: it compares the pointing direction of the scaled template's edge pixels with the pointing direction of the corresponding pixels in the offset field predicted by IRNet. The score is determined from the edge contour pixel set of the scaled template, the offset-field direction of each edge pixel relative to the scaled template's center, and the predicted instance center offset field direction. Specifically, denote the edge contour pixel set of the scaled template by Ω_tem_b, the direction of the template's own offset field at each edge pixel relative to the template center by α_TEM, and the predicted instance center offset field direction by α_DP; the edge direction matching score S_dp_ori is then given by Equation 1 (rendered as an image in the original). Ω_tem_b, α_TEM, and α_DP are all two-dimensional matrices; Ω_tem_b is 1 at edge contour pixels and 0 elsewhere, and x, y denote offsets relative to the scaled template center.
The offset magnitude matching score S_dp_mag evaluates offset magnitude matching: it encourages the offset-field region covered by the template's edge pixels Ω_tem_b to have offset vectors of large length, thereby encouraging the scaled template to cover the target region as completely as possible. The score is determined from the edge contour pixel set of the scaled template and the normalized lengths of the instance center offset field vectors. Specifically, with γ_DP denoting the normalized length of the predicted offset field vectors, the offset magnitude matching score S_dp_mag is given by Equation 2 (rendered as an image in the original).
The template region matching score S_dp_region evaluates template region matching: it measures the agreement between the scaled template region and the offset-field coverage region by computing the intersection-over-union of the scaled template region and the predicted foreground region. The larger the intersection-over-union, the closer the area covered by the scaled template is to the predicted target foreground. The score is determined from the set of pixels of the template foreground covered by the scaled template and the set of pixels of the foreground region determined by the instance center offset field vector lengths. Specifically, Ω_tem denotes the pixel set of the target foreground region covered by the scaled template, and Ω_fg denotes the pixel set of the foreground region determined from the lengths of the offset field vectors predicted by IRNet at the corresponding instance position: a pixel belongs to Ω_fg if its offset vector length is at least 10% of the longest offset vector length. The template region matching score S_dp_region is given by Equation 3 (rendered as an image in the original). Ω_tem is 1 at foreground pixels and 0 elsewhere.
The template boundary matching score S_boundary evaluates template boundary matching: the agreement between the scaled template's boundary and the boundary of the target region to be matched is measured by computing the chamfer distance between the scaled template boundary and the instance boundary map predicted by the EDGE-IRNet network. A smaller average chamfer distance indicates a better match. The score is determined from the edge contour pixel set of the scaled template and the chamfer distance between the scaled template and the instance boundary response map. Specifically, denoting the chamfer distance between the template and the instance boundary map by d_CD (the original symbol appears only as an image), the template boundary matching score S_boundary is given by Equation 4 (likewise rendered as an image in the original).
The template matching score S_match may be any one of the edge direction matching score, the offset magnitude matching score, the template region matching score, and the template boundary matching score, or the sum of two or more of them. When S_match is the sum of all four:
S_match = S_dp_ori + S_dp_mag + S_dp_region + 10 × S_boundary    (Equation 5)
It is understood that the template matching score may also be determined by other edge matching methods and the like; a sketch of the four score terms follows.
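Since Equations 1-4 appear only as images in the original document, the formulas in the following sketch are plausible reconstructions from the surrounding prose (mean directional agreement, mean normalized magnitude, intersection-over-union, and a score that decays with a one-directional chamfer distance); only Equation 5's weighting is given in the text.

```python
import numpy as np
from scipy import ndimage

def template_matching_score(tem_b, tem_fg, alpha_tem, alpha_dp, gamma_dp,
                            fg_pred, boundary_map):
    """tem_b / tem_fg: binary edge-contour and foreground masks of the
    placed, scaled template; alpha_tem / alpha_dp: per-pixel offset-field
    directions (radians) of the template and the prediction; gamma_dp:
    normalized predicted offset-vector lengths; fg_pred: predicted
    foreground (offset length >= 10% of the maximum); boundary_map:
    predicted instance-boundary probabilities. All maps share one shape."""
    nb = tem_b.sum() + 1e-8
    # Eq. 1 (assumed form): mean direction agreement on template edges.
    s_ori = (np.cos(alpha_tem - alpha_dp) * tem_b).sum() / nb
    # Eq. 2 (assumed form): mean normalized offset magnitude on edges.
    s_mag = (gamma_dp * tem_b).sum() / nb
    # Eq. 3: intersection-over-union of template and predicted foreground.
    inter = np.logical_and(tem_fg, fg_pred).sum()
    s_region = inter / (np.logical_or(tem_fg, fg_pred).sum() + 1e-8)
    # Eq. 4 (assumed form): decay with the mean chamfer distance from
    # template edge pixels to the nearest predicted boundary pixel.
    dist_to_boundary = ndimage.distance_transform_edt(boundary_map < 0.5)
    chamfer = (dist_to_boundary * tem_b).sum() / nb
    s_boundary = 1.0 / (1.0 + chamfer)
    # Eq. 5 (given in the text): boundary term weighted by 10.
    return s_ori + s_mag + s_region + 10.0 * s_boundary
```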
Scaled templates whose matching score exceeds the score threshold proceed to the template information fusion of S1304.
S1304, refining the class response map and the instance boundary response map according to the matching template to obtain the corresponding template-based class response map and template-based boundary map.
Specifically, after the qualifying matching template is obtained in S1303, S1304 fuses the template prior information into the weakly supervised instance segmentation process in two respects.
In the first respect, the matching template is used to refine the instance boundary response map derived by EDGE-IRNet: if the chamfer distance between the matching template and the instance boundary response map is greater than a chamfer distance threshold, the boundary of the matching template is retained so as to refine the instance boundary response map and obtain the template-based boundary map.
The chamfer distance threshold may be set according to actual requirements. When the chamfer distance between the matching template and the instance boundary response map exceeds the threshold, the matching template's boundary is added to the instance boundary response map, thereby completing it.
It will be appreciated that different strategies may be used when refining the instance boundary response map with the matching template, such as fully retaining the template edges or retaining them only partially, which is not limited herein.
In the second respect, the matching template is used to refine the class response map. Since the class response map usually highlights only locally salient regions, the region of the class response map covered by the matching template is determined, and the response scores within that region are amplified by a preset ratio, so as to refine the class response map and obtain the template-based class response map.
The preset ratio may be set according to actual requirements.
Refining the class response map with the matching template mitigates the under-segmentation problem.
It will be appreciated that different strategies may be used to expand the class response map with the matching template, such as region growing or increasing the response scores within the template coverage region, which is not limited herein. A minimal sketch of both fusion steps follows.
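The two fusion steps can be sketched as follows; `chamfer_thresh` and `boost` are assumed values, and overwriting the boundary map with an element-wise maximum is only one of the strategies the text leaves open.

```python
import numpy as np

def fuse_template(match_b, match_fg, chamfer, boundary_map, cam, class_idx,
                  chamfer_thresh=5.0, boost=1.5):
    """match_b / match_fg: binary boundary and coverage masks of the
    matched template; chamfer: its chamfer distance to the predicted
    boundary map. Returns the template-based boundary and class maps."""
    # Step 1: if the prediction misses the template boundary, keep the
    # template boundary to fill in the missing contour.
    if chamfer > chamfer_thresh:
        boundary_map = np.maximum(boundary_map, match_b.astype(float))
    # Step 2: amplify the class response inside the template coverage
    # region so region expansion can reach under-segmented parts.
    cam = cam.copy()
    cam[class_idx][match_fg] = np.minimum(cam[class_idx][match_fg] * boost, 1.0)
    return boundary_map, cam
```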
S140, under the constraint of the template-based boundary map, performing region expansion based on the template-based class response map to obtain an instance segmentation result, which may include:
determining a similarity matrix according to the instance boundary response map;
applying the Hadamard product to the similarity matrix several times and normalizing the matrix rows to obtain a transition matrix;
adjusting the response scores of the template-based class response map according to the boundary probabilities of the template-based boundary map, to obtain an adjusted class response map;
and multiplying the transition matrix with the adjusted class response map several times to obtain the instance segmentation result.
In S140, a random walk algorithm propagates the class response scores (response scores for short) of the class response map into reasonable related regions.
Specifically, from the instance boundary map produced by EDGE-IRNet, a semantic similarity matrix between pixels (the similarity matrix for short) can be derived. Each element of the similarity matrix represents the correlation between a pair of pixels; the higher the element's score, the stronger the correlation and the greater the share of the response score propagated between the pixels. To smooth the values in the similarity matrix, the Hadamard product is applied several times, and the matrix rows are then normalized to obtain the transition matrix. The transition matrix is multiplied several times with the template-based class response map obtained in S1304 to produce the region-growing result. Before multiplication with the transition matrix, the template-based class response map is adjusted with the predicted boundary probability of the template-based boundary map: the higher the boundary probability, the lower the response score; illustratively, adjusted score = (1 - boundary probability) × response score. The positions whose final response score falls in the lowest 25% are treated as background, and the remaining positions yield the final instance segmentation result. A sketch of this propagation follows.
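A sketch of this propagation on flattened maps follows; the text fixes the 25% background quantile and the (1 - boundary probability) adjustment, while the numbers of Hadamard and propagation steps and the construction of the affinity matrix are assumptions (a dense (H·W)×(H·W) matrix is used purely for illustration and only suits small maps).

```python
import numpy as np

def propagate(cam_t, boundary_t, affinity, n_hadamard=2, n_steps=16,
              bg_quantile=0.25):
    """cam_t: (C, H*W) template-based class responses; boundary_t: (H*W,)
    template-based boundary probabilities; affinity: (H*W, H*W)
    pixel-similarity matrix derived from the boundary response map."""
    # Smooth the similarity values by repeated element-wise (Hadamard)
    # multiplication, then row-normalize into a transition matrix.
    T = affinity ** n_hadamard
    T = T / (T.sum(axis=1, keepdims=True) + 1e-8)
    # Suppress responses on likely boundaries before propagating.
    scores = cam_t * (1.0 - boundary_t)[None, :]
    for _ in range(n_steps):                  # repeated multiplication
        scores = scores @ T.T
    # The lowest-scoring 25% of positions become background; elsewhere
    # the highest-scoring class gives the instance-level label.
    top = scores.max(axis=0)
    labels = scores.argmax(axis=0) + 1
    labels[top <= np.quantile(top, bg_quantile)] = 0
    return labels
```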
Referring to FIG. 7, a functional block diagram of the instance segmentation method provided herein is shown. As shown in FIG. 7, the overall scheme comprises four major parts. First, a class response map is obtained with a convolutional-neural-network-based image classification network; with only image class information as supervision, the class response map extracted from the classification network gives the approximate region of each class. In the second part, DP-IRNet predicts the offset field from the class response map; the offset field encodes the offset vector from each pixel position to its corresponding target center. EDGE-IRNet then obtains the instance boundary map from the offset-field information and the semantic similarity of neighborhood point pairs in the class response map. In the third part, a template matching stage is introduced before region expansion: through the explicit introduction of object prior knowledge, the class response map is adjusted and supplemented, and the instance boundary response map is completed. Finally, under the constraint of the instance boundary map, the instance-level response map undergoes region expansion to produce the final instance segmentation result.
In the embodiments of the present application, prior knowledge of object shape is explicitly introduced through object shape templates, on top of the boundary map and instance center offset field generated by the neural networks. Template matching essentially determines the position and pose of each object instance, so that missing portions of instance boundaries are supplemented by prior knowledge of the object's shape. The more complete instance boundaries make the segmentation result consistent with prior cognition and avoid unreasonable over-segmentation.
Likewise, by introducing the object shape prior template through template matching and amplifying the class response scores within the matched template region, each instance is completed to a certain extent, reducing the under-segmentation that arises from region expansion.
Experimental verification
Through segmentation experiments on human targets in the Pascal VOC 2012 dataset, the segmentation annotations produced by the proposed scheme are of higher quality than those of the original scheme. The quantitative results are shown in Table 1; an instance is counted as correctly segmented when its intersection-over-union with the ground truth is at least 50%.
TABLE 1. Instance segmentation accuracy comparison on the Pascal VOC 2012 dataset
(The table contents are rendered as an image in the original document and are not recoverable here.)
Part of the experimental results are shown in FIG. 8 (the original figure is in color and is reproduced here in grayscale). Compared with the original method, the segmentation results of this embodiment are closer to the ground-truth annotations, effectively alleviating both under-segmentation and over-segmentation of the target and improving the segmentation quality.
Referring to FIG. 9, a schematic structural diagram of an instance segmentation device according to one embodiment of the present application is shown.
As shown in FIG. 9, the instance segmentation device 900 may include:
an obtaining module 910, configured to obtain a class response map, an instance center offset field, and an instance boundary response map of an image to be segmented;
a template establishing module 920, configured to establish a prior template of the target according to the pose of the target in the image to be segmented;
a matching module 930, configured to refine the class response map and the instance boundary response map through template matching, based on the class response map, the instance center offset field, the instance boundary response map, and the prior template, to obtain a corresponding template-based class response map and template-based boundary map;
and an expansion module 940, configured to perform region expansion based on the template-based class response map under the constraint of the template-based boundary map, to obtain an instance segmentation result.
Optionally, the matching module 930 is further configured to:
determine instance center positions according to the instance center offset field;
determine target candidate positions according to the instance center positions;
scale the prior template according to preset ratios to obtain a plurality of scaled templates, place the scaled templates at the target candidate positions, calculate a template matching score for each scaled template, and select, among scaled templates whose matching score exceeds the score threshold, the one with the largest score as the matching template;
and refine the class response map and the instance boundary response map according to the matching template to obtain the corresponding template-based class response map and template-based boundary map.
Optionally, the matching module 930 is further configured to:
estimate instance center regions from the instance center offset field;
determine the set of pixels pointing into an instance center region as an instance region;
and determine the center of the instance region as the instance center position.
Optionally, the matching module 930 is further configured to:
select, as a target candidate position, an instance center position whose instance region area is greater than or equal to the area threshold;
and/or,
select, as a target candidate position according to the instance region and the class response map, an instance center position whose instance region contains a number of pixels with probability of the specified class above the preset probability that is at least the preset percentage of the region's total area.
Optionally, the template matching score comprises at least one of an edge direction matching score, an offset magnitude matching score, a template region matching score, and a template boundary matching score;
the edge direction matching score is determined from the edge contour pixel set of the scaled template, the offset-field direction of each edge pixel relative to the scaled template's center, and the instance center offset field direction;
the offset magnitude matching score is determined from the edge contour pixel set of the scaled template and the normalized lengths of the instance center offset field vectors;
the template region matching score is determined from the set of pixels of the template foreground covered by the scaled template and the set of pixels of the foreground region determined by the instance center offset field vector lengths;
and the template boundary matching score is determined from the edge contour pixel set of the scaled template and the chamfer distance between the scaled template and the instance boundary response map.
Optionally, the matching module 930 is further configured to:
retain the boundary of the matching template if the chamfer distance between the matching template and the instance boundary response map is greater than the chamfer distance threshold, so as to refine the instance boundary response map and obtain the template-based boundary map;
and determine the region of the class response map covered by the matching template, and amplify the response scores of the class response map within that region by the preset ratio, so as to refine the class response map and obtain the template-based class response map.
Optionally, the expansion module 940 is further configured to:
determine the similarity matrix according to the instance boundary response map;
apply the Hadamard product to the similarity matrix several times and normalize the matrix rows to obtain the transition matrix;
adjust the response scores of the template-based class response map according to the boundary probabilities of the template-based boundary map to obtain the adjusted class response map;
and multiply the transition matrix with the adjusted class response map several times to obtain the instance segmentation result.
The method embodiments may be implemented by the instance segmentation device provided in this embodiment; the implementation principle and technical effects are similar and are not repeated here.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, showing an electronic device 300 suitable for implementing an embodiment of the present application.
As shown in fig. 10, the electronic device 300 includes a Central Processing Unit (CPU) 301 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage section 308 into a Random Access Memory (RAM) 303. The RAM 303 also stores various programs and data necessary for the operation of the electronic device 300. The CPU 301, the ROM 302 and the RAM 303 are connected to one another via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
The following components are connected to the I/O interface 305: an input section 306 including a keyboard, a mouse and the like; an output section 307 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 308 including a hard disk and the like; and a communication section 309 including a network interface card such as a LAN card or a modem. The communication section 309 performs communication processing via a network such as the Internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 310 as necessary, so that a computer program read therefrom is installed into the storage section 308 as needed.
In particular, the process described above with reference to fig. 1 may be implemented as a computer software program, according to an embodiment of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the example segmentation method described above. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 309, and/or installed from the removable medium 311.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor. The names of these units or modules do not in some cases constitute a limitation of the unit or module itself.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a mobile phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
As another aspect, the present application also provides a storage medium, which may be the storage medium contained in the foregoing device in the above embodiment; or may be a storage medium that exists separately and is not assembled into the device. The storage medium stores one or more programs for use by one or more processors in performing the example segmentation methods described herein.
Storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
It is to be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

Claims (10)

1. An instance segmentation method, the method comprising:
acquiring a class response map, an instance center offset field and an instance boundary response map of an image to be segmented;
establishing a prior template of a target according to the posture of the target in the image to be segmented;
refining the class response map and the instance boundary response map respectively through template matching, based on the class response map, the instance center offset field, the instance boundary response map and the prior template, to obtain a corresponding template-based class response map and a corresponding template-based boundary map;
and performing region expansion on the template-based class response map under the limitation of the template-based boundary map to obtain an instance segmentation result.
2. The method of claim 1, wherein refining the class response map and the instance boundary response map respectively through template matching, based on the class response map, the instance center offset field, the instance boundary response map and the prior template, to obtain a corresponding template-based class response map and a template-based boundary map comprises:
determining an instance center position according to the instance center offset field;
determining a target candidate position according to the instance center position;
scaling the prior template by preset proportions to obtain a plurality of scaled templates, placing the scaled templates at the target candidate position, calculating a template matching score for each scaled template, and selecting, as a matching template, the scaled template whose template matching score is greater than a score threshold and is the largest;
and refining the class response map and the instance boundary response map respectively according to the matching template to obtain the corresponding template-based class response map and template-based boundary map.
3. The method of claim 2, wherein determining an instance center position according to the instance center offset field comprises:
estimating an instance center position area according to the instance center offset field;
determining the set of pixels pointing to the instance center position area as an instance region;
and determining the center of the instance region as the instance center position.
4. The method of claim 3, wherein determining a target candidate position according to the instance center position comprises:
selecting, as the target candidate position, the instance center position whose instance region area is greater than or equal to an area threshold;
and/or,
selecting, according to the instance region and the class response map, the instance region in which the number of pixels whose probability of belonging to the specified class is greater than a preset probability is greater than or equal to a preset percentage of the total area of the instance region, and taking the corresponding instance center position as the target candidate position.
5. The method of claim 2, wherein the template matching score comprises at least one of an edge direction matching score, an offset amplitude matching score, a template region matching score and a template boundary matching score;
the edge direction matching score is determined according to the set of edge contour pixels in the scaled template, the offset field direction of each edge pixel in the scaled template relative to the center of the scaled template, and the direction of the instance center offset field;
the offset amplitude matching score is determined according to the set of edge contour pixels in the scaled template and the normalized length of the instance center offset field vector;
the template region matching score is determined according to the set of pixels of the template foreground region covered by the scaled template and the set of foreground-region pixels determined by the vector length of the instance center offset field;
and the template boundary matching score is determined according to the set of edge contour pixels in the scaled template and the chamfer distance between the scaled template and the instance boundary response map.
6. The method of claim 2, wherein refining the class response map and the instance boundary response map respectively according to the matching template to obtain the corresponding template-based class response map and template-based boundary map comprises:
if the chamfer distance between the matching template and the instance boundary response map is greater than a chamfer distance threshold, retaining the boundary of the matching template, so as to refine the instance boundary response map and obtain the template-based boundary map;
and determining the coverage area where the matching template covers the class response map, and amplifying the response scores of the class response map within the coverage area by a preset proportion, so as to refine the class response map and obtain the template-based class response map.
7. The method of claim 1, wherein performing region expansion on the template-based class response map under the limitation of the template-based boundary map to obtain an instance segmentation result comprises:
determining a similarity matrix according to the instance boundary response map;
performing Hadamard product and matrix row normalization processing on the similarity matrix several times to obtain a transfer matrix;
adjusting the response scores of the template-based class response map according to the boundary probabilities of the template-based boundary map to obtain an adjusted class response map;
and multiplying the transfer matrix with the adjusted class response map several times to obtain the instance segmentation result.
8. An instance segmentation apparatus, comprising:
an acquisition module, configured to acquire a class response map, an instance center offset field and an instance boundary response map of an image to be segmented;
a template establishing module, configured to establish a prior template of a target according to the posture of the target in the image to be segmented;
a matching module, configured to refine the class response map and the instance boundary response map respectively through template matching, based on the class response map, the instance center offset field, the instance boundary response map and the prior template, to obtain a corresponding template-based class response map and a template-based boundary map;
and an expansion module, configured to perform region expansion on the template-based class response map under the limitation of the template-based boundary map to obtain an instance segmentation result.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the instance segmentation method of any one of claims 1 to 7.
10. A readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the instance segmentation method of any one of claims 1 to 7.
CN202210087999.2A 2022-01-25 2022-01-25 Instance partitioning method and device, electronic equipment and storage medium Pending CN114549833A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210087999.2A CN114549833A (en) 2022-01-25 2022-01-25 Instance partitioning method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114549833A (en) 2022-05-27

Family

ID=81670882

Country Status (1)

Country Link
CN (1) CN114549833A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170249507A1 (en) * 2004-05-17 2017-08-31 Google Inc. Processing Techniques for Text Capture From a Rendered Document
US20100098331A1 (en) * 2008-09-26 2010-04-22 Sony Corporation System and method for segmenting foreground and background in a video
CN107256558A (en) * 2017-05-18 2017-10-17 深思考人工智能机器人科技(北京)有限公司 The cervical cell image automatic segmentation method and system of a kind of unsupervised formula
CN112116629A (en) * 2020-08-11 2020-12-22 西安交通大学 End-to-end multi-target tracking method using global response graph
CN113469287A (en) * 2021-07-27 2021-10-01 北京信息科技大学 Spacecraft multi-local component detection method based on instance segmentation network
CN113393459A (en) * 2021-08-09 2021-09-14 旻投电力发展有限公司 Infrared image photovoltaic module visual identification method based on example segmentation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HE K et al.: "Deep Residual Learning for Image Recognition", CVPR, 30 June 2016 (2016-06-30) *
尚方信 (Shang Fangxin): "Research on Noise Image Segmentation Methods Based on Level Sets and Convolutional Neural Networks", China Masters' Theses Full-text Database, 15 August 2019 (2019-08-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115131558A (en) * 2022-06-10 2022-09-30 South China University of Technology Semantic segmentation method in a few-sample environment
CN115131558B (en) * 2022-06-10 2024-05-14 South China University of Technology Semantic segmentation method in a few-sample environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination