CN113988179A - Target segmentation method, system and equipment based on improved attention and loss function - Google Patents


Info

Publication number
CN113988179A
CN113988179A
Authority
CN
China
Prior art keywords
image
segmented
segmentation
target
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111259594.4A
Other languages
Chinese (zh)
Inventor
王坤峰
徐鹏斌
张衡
李大字
楚纪正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Chemical Technology
Original Assignee
Beijing University of Chemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Chemical Technology filed Critical Beijing University of Chemical Technology
Priority to CN202111259594.4A priority Critical patent/CN113988179A/en
Publication of CN113988179A publication Critical patent/CN113988179A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20016 - Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform

Abstract

The invention belongs to the field of computer vision and image processing, and particularly relates to a target segmentation method, system and device based on improved attention and loss functions, aiming at solving the problems in existing instance segmentation techniques of low detection precision and inaccurate post-segmentation masks caused by the complex spatial layout of pictures. The invention comprises the following steps: extracting features of the image to be segmented, and performing global information extraction and attention-mechanism enhancement on the extracted two-dimensional image features; inputting the enhanced features into a feature processing network to obtain multi-scale features; constructing a target segmentation model based on an instance segmentation network and training the model with a target segmentation loss function constructed on the regional focal loss; and carrying out target segmentation with the trained model to obtain the target segmentation result of the image to be segmented. The method achieves a good segmentation effect, high small-target detection precision, few missed detections, and accurate post-segmentation masks.

Description

Target segmentation method, system and equipment based on improved attention and loss function
Technical Field
The invention belongs to the field of computer vision and image processing, and particularly relates to a target segmentation method, system, and device based on improved attention and loss functions.
Background
The purpose of object detection is to detect each object in an image and identify its class. The purpose of semantic segmentation is to perform pixel-level segmentation of the input image while assigning a semantic class to each object in the image. Instance segmentation combines the two: the goal is to predict class labels and pixel-level instance masks in order to localize the varying numbers of instances present in an image. This task benefits a wide range of applications such as autonomous driving, robotics, and video surveillance.
Instance segmentation has advanced greatly in the vision field over the past few years with deep convolutional neural networks. Currently, instance segmentation methods are generally classified into two categories. One is the two-stage method with region proposals, such as Mask R-CNN: the first stage proposes a set of regions of interest (RoIs), and the second stage predicts an instance mask from features extracted using RoIAlign. Since the input image is processed in two stages, two-stage models have higher accuracy than single-stage models. The other is the single-stage instance segmentation method without region proposals, such as YOLACT, whose authors designed two branch networks that run in parallel: (1) the prediction head branch generates the class confidences of all candidate boxes, the anchor-box positions, and the prototype-mask coefficients; (2) the prototype branch network generates k prototype masks for each picture, the numbers of prototype masks and coefficients being equal. Since there is no region-proposal step and instance position and shape are predicted at the same time, the speed is faster, and YOLACT became the first algorithm to achieve real-time segmentation speed on the COCO dataset. To achieve higher performance and accuracy, feature processing is introduced to extract multi-scale features within the network, where a top-down cross-connected path is added to propagate semantically strong features.
Some recently released datasets leave considerable room for algorithmic improvement. For example, the COCO dataset consists of 200,000 pictures, each containing many instances with complex spatial layouts, and the MVD and Cityscapes datasets likewise provide a large number of street views with many traffic participants per picture.
Although the YOLACT method has been very successful, the following problems remain. First, each picture in the dataset has a complex spatial layout and the spatial positioning precision is low, so small targets in the images are easily missed. Second, the detection accuracy of small targets is low. Third, the segmented mask is not accurate.
Disclosure of Invention
In order to alleviate the above problems in the prior art, namely the low detection precision and inaccurate post-segmentation mask representation caused by complex picture spatial layouts in existing instance segmentation techniques, the present invention provides an object segmentation method based on improved attention and loss functions, the object segmentation method comprising:
step S10, extracting the features of the image to be segmented, and extracting the global information and enhancing the attention mechanism of the extracted two-dimensional image features to obtain the enhanced features of the image to be segmented;
step S20, inputting the enhanced features of the image to be segmented into a feature processing network to obtain the multi-scale features of the image to be segmented;
step S30, constructing a target segmentation model based on the example segmentation network, and performing model training through a target segmentation loss function constructed based on the regional focus loss;
step S40, based on the multi-scale features of the image to be segmented, obtaining the category confidence and the position of each candidate frame and k set image masks and k set prototype masks through a trained target segmentation model;
step S50, screening the candidate frame through NMS screening algorithm, and performing matrix multiplication of k image masks and k prototype masks respectively to obtain a target boundary frame and a target object mask of the image to be segmented;
and step S60, performing binarization processing on the mask of the target object by using a set threshold value, and clearing the mask outside the target boundary frame of the image to be segmented to obtain the target segmentation result of the image to be segmented.
In some preferred embodiments, step S10 includes:
step S11, extracting the characteristics of the image to be segmented through a characteristic extraction network to obtain the two-dimensional image characteristics of the image to be segmented;
step S12, extracting global information of the two-dimensional image features of the image to be segmented through global average pooling to obtain one-dimensional feature codes of the image to be segmented;
and step S13, performing iterative attention mechanism enhancement for a set number of times on the one-dimensional feature code of the image to be segmented to obtain the enhancement feature of the image to be segmented.
In some preferred embodiments, the one-dimensional feature of the image to be segmented is encoded, which is represented as:
V(h) = f(h) = (1/W) Σ_{0≤i<W} h_i, the sum running along the width dimension
wherein V(h) represents the one-dimensional feature code of the image to be segmented, W represents the width of V(h), h represents the two-dimensional image feature of the image to be segmented, and f(·) represents the global average pooling operation.
In some preferred embodiments, step S13 includes:
step S131, performing feature grouping of one-dimensional feature codes of the image to be segmented through group convolution operation to obtain m feature groups, wherein each feature group comprises n subgroups;
step S132, uniformly mixing different groups of subgroups, and performing group convolution operation and attention mechanism enhancement on the mixed first features to obtain second features;
and S133, fusing the first characteristic and the second characteristic through a Sigmoid function to obtain an enhanced characteristic of the image to be segmented.
In some preferred embodiments, the enhanced features of the image to be segmented are expressed as:
F′(h) = δ(M_s(GC(V(h))))
wherein F′(h) represents the enhanced feature of the image to be segmented, GC(·) represents the group convolution operation, M_s(·) represents the iterative attention-mechanism enhancement operation, and δ(·) represents the Sigmoid function.
In some preferred embodiments, the target segmentation loss function is expressed as:
L_total = λ1·L_cls + λ2·L_box + λ3·L_mask + λ4·L_af
wherein L_total represents the target segmentation loss function, L_cls represents the classification loss, L_box represents the bounding-box loss, L_mask represents the mask loss, L_af represents the regional focal loss, and λ1, λ2, λ3, λ4 respectively represent the balance parameters of the classification loss, bounding-box loss, mask loss and area focal loss in the target segmentation loss function.
In some preferred embodiments, the regional focal loss L_af is expressed as:
L_af = -(1.125 - p_t)^γ · log(p_t)
wherein p_t represents the ratio of the bounding box to the prototype region, and γ represents a parameter that adjusts the rate of weight reduction in model training.
In another aspect of the present invention, an object segmentation system based on improved attention and loss functions is proposed, the object segmentation system comprising:
the feature extraction and enhancement module is configured to extract features of the image to be segmented, extract global information and enhance attention mechanism of the extracted two-dimensional image features, and obtain enhanced features of the image to be segmented;
the characteristic processing module is configured to input the enhanced characteristics of the image to be segmented into a characteristic processing network to obtain the multi-scale characteristics of the image to be segmented;
the model building and training module is configured to build a target segmentation model based on the example segmentation network and perform model training through a target segmentation loss function built based on the regional focal loss;
the target segmentation module is configured to obtain the category confidence and the position of each candidate frame and set k image masks and k prototype masks through a trained target segmentation model based on the multi-scale features of the image to be segmented;
the target screening module is configured to screen the candidate frame through an NMS screening algorithm and perform matrix multiplication of k image masks and k prototype masks respectively to obtain a target boundary frame and a target object mask of the image to be segmented;
the threshold segmentation module is configured to perform binarization processing on a target object mask by using a set threshold, clear the mask outside a target boundary frame of the image to be segmented and obtain a target segmentation result of the image to be segmented;
and the segmentation result output module is configured to output a target segmentation result of the image to be segmented.
In a third aspect of the present invention, an electronic device is provided, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the processor for execution by the processor to implement the above-described target segmentation method based on an improved attention and loss function.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions for execution by the computer to implement the above-mentioned target segmentation method based on improved attention and loss functions.
The invention has the beneficial effects that:
(1) according to the target segmentation method based on the improved attention and loss functions, the image features are subjected to global information extraction to convert the two-dimensional image features into one-dimensional feature codes, partial position information and remote dependency relationship of the target are obtained in the longitudinal axis direction of the image, and the one-dimensional feature codes are subjected to iterative attention mechanism enhancement, so that the obtained features can represent richer information, the accuracy and precision of the target segmentation result are improved, and the segmentation performance of small targets is effectively improved.
(2) The invention discloses a target segmentation method based on improved attention and loss functions, which is characterized in that a target segmentation model is constructed on the basis of a YOLACT network, model training is carried out through a target segmentation loss function constructed on the basis of the area focal-loss, the proportion of small targets in loss is increased by using the area focal-loss function, the segmentation performance of the small targets is improved, and the problems of low detection precision of the small targets, easiness in missed detection and inaccuracy of masks after segmentation in the prior art in target segmentation are solved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow chart diagram of the object segmentation method based on the improved attention and loss functions of the present invention;
FIG. 2 is an image to be segmented according to an embodiment of the present invention based on an improved attention and loss function object segmentation method;
FIG. 3 is a schematic structural diagram of an attention mechanism of an embodiment of an object segmentation method based on improved attention and loss functions according to the present invention;
FIG. 4 is a diagram illustrating prototype results of an embodiment of the object segmentation method based on the modified attention and loss functions according to the present invention;
FIG. 5 is a diagram illustrating a segmentation result of an example of an object according to an embodiment of the object segmentation method based on an improved attention and loss function;
FIG. 6 is a diagram of other example segmentation results of an embodiment of the object segmentation method based on the improved attention and loss functions of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention relates to a target segmentation method based on an improved attention and loss function, which comprises the following steps:
step S10, extracting the features of the image to be segmented, and extracting the global information and enhancing the attention mechanism of the extracted two-dimensional image features to obtain the enhanced features of the image to be segmented;
step S20, inputting the enhanced features of the image to be segmented into a feature processing network to obtain the multi-scale features of the image to be segmented;
step S30, constructing a target segmentation model based on an example segmentation network (namely, a YOLACT network), and performing model training through a target segmentation loss function constructed based on the regional focal loss (namely, area focal loss);
step S40, based on the multi-scale features of the image to be segmented, obtaining the category confidence and the position of each candidate frame and k set image masks and k set prototype masks through a trained target segmentation model;
step S50, screening the candidate frame through NMS screening algorithm, and performing matrix multiplication of k image masks and k prototype masks respectively to obtain a target boundary frame and a target object mask of the image to be segmented;
and step S60, performing binarization processing on the mask of the target object by using a set threshold value, and clearing the mask outside the target boundary frame of the image to be segmented to obtain the target segmentation result of the image to be segmented.
In order to more clearly describe the object segmentation method based on the improved attention and loss functions of the present invention, the following describes the steps in the embodiment of the present invention in detail with reference to fig. 1.
The object segmentation method based on the improved attention and loss function according to the first embodiment of the present invention includes steps S10-S60, wherein each step is described in detail as follows:
and step S10, extracting the features of the image to be segmented, and extracting global information and enhancing an attention mechanism of the extracted two-dimensional image features to obtain enhanced features of the image to be segmented.
Fig. 2 shows images to be segmented according to an embodiment of the object segmentation method based on improved attention and loss functions of the present invention. In the left image of fig. 2, the racket and the tennis ball are small objects and the person playing is a large object; in the right image of fig. 2, the skateboard is a small object, the person on the skateboard is a large object, and the large object on the left is incomplete due to the complicated background light.
And step S11, extracting the features of the image to be segmented through a feature extraction network to obtain the two-dimensional image features of the image to be segmented.
Step S12, performing global information extraction of the two-dimensional image features of the image to be segmented by global average pooling to obtain a one-dimensional feature code of the image to be segmented, as shown in formula (1):
V(h) = f(h) = (1/W) Σ_{0≤i<W} h_i    (1)
wherein V(h) represents the one-dimensional feature code of the image to be segmented, W represents the width of V(h), h represents the two-dimensional image feature of the image to be segmented, and f(·) represents the global average pooling operation.
And global information extraction is carried out on the two-dimensional image features, namely the two-dimensional image features are converted into one-dimensional features through global average pooling on the basis of SENet, and partial position information and remote dependency relationship of the target in the image are obtained in the longitudinal axis direction.
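The axis-wise pooling just described can be sketched as a minimal NumPy illustration of equation (1): a two-dimensional feature map collapses to a one-dimensional code by averaging over the width. The function name `encode_1d` and the array shapes are assumptions for illustration, not the patent's implementation.

```python
import numpy as np

def encode_1d(h):
    """Collapse a 2-D feature map h of shape (H, W) into a 1-D code by
    averaging along the width axis, as in equation (1): each row is
    summarized by its mean, so positional information is kept along the
    vertical axis while dependencies along the horizontal axis are
    aggregated."""
    W = h.shape[1]
    return h.sum(axis=1) / W

# A 2 x 2 toy feature map: each row collapses to its mean.
h = np.array([[1.0, 3.0],
              [2.0, 4.0]])
v = encode_1d(h)  # array([2.0, 3.0])
```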
And step S13, performing iterative attention mechanism enhancement for a set number of times on the one-dimensional feature code of the image to be segmented to obtain the enhancement feature of the image to be segmented.
Step S131, performing feature grouping of the one-dimensional feature codes of the image to be segmented through group convolution operation to obtain m feature groups, wherein each feature group comprises n subgroups.
And S132, uniformly mixing different groups of subgroups, and performing group convolution operation and attention mechanism enhancement on the mixed first features to obtain second features.
Step S133, the first feature and the second feature are fused through a Sigmoid function to obtain an enhanced feature of the image to be segmented, as shown in formula (2):
F′(h) = δ(M_s(GC(V(h))))    (2)
wherein F′(h) represents the enhanced feature of the image to be segmented, GC(·) represents the group convolution operation, M_s(·) represents the iterative attention-mechanism enhancement operation, and δ(·) represents the Sigmoid function.
The enhanced features of the image to be segmented obtained by the method can represent richer information.
And step S20, inputting the enhanced features of the image to be segmented into a feature processing network to obtain the multi-scale features of the image to be segmented.
Fig. 3 is a schematic diagram of the attention-mechanism structure according to an embodiment of the object segmentation method based on improved attention and loss functions of the present invention. The input of the attention structure is divided into two lines after residual processing. One line is the first output; the other line passes in sequence through global average pooling, global maximum pooling, group convolution and ReLU processing to become the second output. The second output then passes in sequence through channel shuffling and 1 × 1 convolution and is spliced with the second output to form the third output; after Sigmoid processing, the third output is fused with the first output by a weighting method to produce the final output of the attention structure.
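The channel-shuffle step in the structure above, which evenly remixes subgroups across the m feature groups before the 1 × 1 convolution, can be illustrated with a ShuffleNet-style reshape-and-transpose. The function below is a hypothetical NumPy sketch, not the patent's code.

```python
import numpy as np

def channel_shuffle(x, m):
    """Evenly remix subgroups across m feature groups: view the channel
    axis as (m groups, n subgroups), transpose the two group axes, and
    flatten back, so every resulting group holds one subgroup from each
    original group."""
    c, height, width = x.shape
    n = c // m
    return (x.reshape(m, n, height, width)
             .transpose(1, 0, 2, 3)
             .reshape(c, height, width))

# 8 channels in m = 4 groups of n = 2: channel order 0..7 becomes 0,2,4,6,1,3,5,7.
x = np.arange(8, dtype=float).reshape(8, 1, 1)
y = channel_shuffle(x, m=4)
```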
And step S30, constructing a target segmentation model based on the example segmentation network, and performing model training through a target segmentation loss function constructed based on the regional focus loss.
An objective segmentation loss function, which is expressed as shown in equation (3):
L_total = λ1·L_cls + λ2·L_box + λ3·L_mask + λ4·L_af    (3)
wherein L_total represents the target segmentation loss function, L_cls represents the classification loss, L_box represents the bounding-box loss, L_mask represents the mask loss, L_af represents the regional focal loss, and λ1, λ2, λ3, λ4 respectively represent the balance parameters of the classification loss, bounding-box loss, mask loss and area focal loss in the target segmentation loss function.
In one embodiment of the invention, λ1 = 1, λ2 = 1.25, λ3 = 6.125, λ4 = 1.
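With these balance parameters, the weighted sum of equation (3) can be sketched as follows; the function name `total_loss` and its call signature are illustrative assumptions, not the patent's implementation.

```python
def total_loss(l_cls, l_box, l_mask, l_af,
               lambdas=(1.0, 1.25, 6.125, 1.0)):
    """Weighted sum of equation (3) with the balance parameters of this
    embodiment: lambda1 = 1, lambda2 = 1.25, lambda3 = 6.125, lambda4 = 1."""
    l1, l2, l3, l4 = lambdas
    return l1 * l_cls + l2 * l_box + l3 * l_mask + l4 * l_af

# With every component loss equal to 1, the total is 1 + 1.25 + 6.125 + 1 = 9.375.
```

The mask loss carries the largest weight, which is consistent with the invention's emphasis on accurate post-segmentation masks.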
The regional focal loss L_af is represented by formula (4):
L_af = -(1.125 - p_t)^γ · log(p_t)    (4)
wherein p_t represents the ratio of the bounding box to the prototype region, and γ represents a parameter that adjusts the rate of weight reduction in model training.
In one embodiment of the present invention, γ = 2.
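Under these settings, formula (4) can be written directly. The sketch below is a hedged NumPy rendering that assumes p_t is a scalar or array in (0, 1]; it illustrates how a small bounding-box-to-prototype ratio raises the modulating factor and hence the share of small targets in the loss.

```python
import numpy as np

def area_focal_loss(p_t, gamma=2.0):
    """Formula (4): L_af = -(1.125 - p_t)^gamma * log(p_t), where p_t is
    the ratio of the bounding box to the prototype region. A small p_t
    (a small target) yields a larger modulating factor, so small targets
    take a larger share of the total loss."""
    return -((1.125 - p_t) ** gamma) * np.log(p_t)

# A small box (p_t = 0.1) is penalized far more than a large one (p_t = 0.9).
```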
And step S40, based on the multi-scale features of the image to be segmented, obtaining the class confidence and the position of each candidate frame and the set k image masks and k prototype masks through the trained target segmentation model.
Fig. 4 is a schematic diagram of prototype results of an embodiment of the target segmentation method based on the improved attention and loss functions of the present invention; it shows the feature maps produced during network processing after weighting by the mask coefficients.
And step S50, screening the candidate frames through an NMS screening algorithm, and respectively performing matrix multiplication on k image masks and k prototype masks to obtain a target boundary frame and a target object mask of the image to be segmented.
And step S60, performing binarization processing on the mask of the target object by using a set threshold value, and clearing the mask outside the target boundary frame of the image to be segmented to obtain the target segmentation result of the image to be segmented.
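Steps S50 and S60 above, combining the prototype masks with a detection's mask coefficients, binarizing with the set threshold, and clearing the mask outside the target bounding box, can be sketched in NumPy for a single detection that survived NMS. The shapes, the function name `assemble_mask`, and the sigmoid squashing (as in YOLACT) are assumptions for illustration rather than the patent's exact procedure.

```python
import numpy as np

def assemble_mask(prototypes, coeffs, box, thresh=0.5):
    """For one detection that survived NMS: linearly combine the k
    prototype masks with that detection's k mask coefficients, squash
    with a sigmoid, binarize at `thresh`, and zero everything outside
    the predicted bounding box.
    prototypes: (k, H, W); coeffs: (k,); box: (x1, y1, x2, y2)."""
    m = np.tensordot(coeffs, prototypes, axes=1)      # (H, W) linear combination
    m = 1.0 / (1.0 + np.exp(-m))                      # sigmoid squashing
    m = (m > thresh).astype(np.uint8)                 # binarize with the set threshold
    x1, y1, x2, y2 = box
    cropped = np.zeros_like(m)
    cropped[y1:y2, x1:x2] = m[y1:y2, x1:x2]           # clear mask outside the box
    return cropped
```

On a toy 4 × 4 prototype pair with positive coefficients, only the pixels inside the bounding box survive the final crop.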
As shown in fig. 5 and fig. 6, which are schematic diagrams of a target example and other example segmentation results of an embodiment of the target segmentation method based on the improved attention and loss function of the present invention, respectively, the diagrams include interferences such as motion shadow, illumination change, image noise, etc., and it can be seen from the segmentation results that the method of the present invention has strong robustness, can overcome these interferences, and obtains an accurate target segmentation result.
Although the foregoing embodiments describe the steps in the above sequential order, those skilled in the art will understand that, in order to achieve the effect of the present embodiments, the steps may not be executed in such an order, and may be executed simultaneously (in parallel) or in an inverse order, and these simple variations are within the scope of the present invention.
The second embodiment of the present invention is an object segmentation system based on improved attention and loss functions, the object segmentation system comprising:
the feature extraction and enhancement module is configured to extract features of the image to be segmented, extract global information and enhance attention mechanism of the extracted two-dimensional image features, and obtain enhanced features of the image to be segmented;
the characteristic processing module is configured to input the enhanced characteristics of the image to be segmented into a characteristic processing network to obtain the multi-scale characteristics of the image to be segmented;
the model building and training module is configured to build a target segmentation model based on an example segmentation network and carry out model training through a target segmentation loss function built based on regional focus loss;
the target segmentation module is configured to obtain the category confidence and the position of each candidate frame and set k image masks and k prototype masks through a trained target segmentation model based on the multi-scale features of the image to be segmented;
the target screening module is configured to screen the candidate frame through an NMS screening algorithm and perform matrix multiplication of k image masks and k prototype masks respectively to obtain a target boundary frame and a target object mask of the image to be segmented;
the threshold segmentation module is configured to perform binarization processing on a target object mask by using a set threshold, clear the mask outside a target boundary frame of the image to be segmented and obtain a target segmentation result of the image to be segmented;
and the segmentation result output module is configured to output a target segmentation result of the image to be segmented.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the objective segmentation system based on the improved attention and loss function provided in the foregoing embodiment is only illustrated by the division of the foregoing functional modules, and in practical applications, the foregoing function allocation may be completed by different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
An electronic apparatus according to a third embodiment of the present invention includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the processor for execution by the processor to implement the above-described target segmentation method based on an improved attention and loss function.
A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by the computer to implement the above-described target segmentation method based on improved attention and loss functions.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art will appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules and method steps may be located in Random Access Memory (RAM), Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. An object segmentation method based on an improved attention and loss function, the object segmentation method comprising:
step S10, extracting the features of the image to be segmented, and extracting the global information and enhancing the attention mechanism of the extracted two-dimensional image features to obtain the enhanced features of the image to be segmented;
step S20, inputting the enhanced features of the image to be segmented into a feature processing network to obtain the multi-scale features of the image to be segmented;
step S30, constructing a target segmentation model based on the example segmentation network, and performing model training through a target segmentation loss function constructed based on the regional focus loss;
step S40, based on the multi-scale features of the image to be segmented, obtaining the category confidence and position of each candidate box, together with a set number k of image masks and k prototype masks, through the trained target segmentation model;
step S50, screening the candidate boxes through a non-maximum suppression (NMS) algorithm, and performing matrix multiplication of the k image masks with the k prototype masks respectively to obtain the target bounding box and target object mask of the image to be segmented;
and step S60, performing binarization processing on the target object mask with a set threshold value, and clearing the mask outside the target bounding box of the image to be segmented to obtain the target segmentation result of the image to be segmented.
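As a non-limiting illustration, the prototype-mask combination, binarization, and bounding-box cropping of steps S50–S60 could be sketched as follows. A YOLACT-style prototype representation is assumed, and the function name is hypothetical, not taken from the specification:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def assemble_instance_mask(prototypes, coeffs, box, threshold=0.5):
    """Combine k prototype masks with k per-instance coefficients (step S50),
    then binarize and clear everything outside the bounding box (step S60).

    prototypes: (H, W, k) prototype masks from the mask branch
    coeffs:     (k,) mask coefficients predicted for one candidate box
    box:        (x1, y1, x2, y2) bounding box in pixel coordinates
    """
    h, w, k = prototypes.shape
    assert coeffs.shape == (k,)
    # Matrix multiplication of the k prototypes with the k coefficients.
    mask = sigmoid(prototypes.reshape(-1, k) @ coeffs).reshape(h, w)
    # Binarize with the set threshold.
    mask = (mask > threshold).astype(np.uint8)
    # Clear the mask outside the target bounding box.
    x1, y1, x2, y2 = box
    cropped = np.zeros_like(mask)
    cropped[y1:y2, x1:x2] = mask[y1:y2, x1:x2]
    return cropped
```

Candidate-box screening by NMS would run before this step, so only the surviving boxes have their masks assembled.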
2. The method for object segmentation based on improved attention and loss function as claimed in claim 1, wherein the step S10 includes:
step S11, extracting the characteristics of the image to be segmented through a characteristic extraction network to obtain the two-dimensional image characteristics of the image to be segmented;
step S12, extracting global information of the two-dimensional image features of the image to be segmented through global average pooling to obtain one-dimensional feature codes of the image to be segmented;
and step S13, performing iterative attention mechanism enhancement for a set number of times on the one-dimensional feature code of the image to be segmented to obtain the enhancement feature of the image to be segmented.
3. The method for target segmentation based on the improved attention and loss function as claimed in claim 2, wherein the one-dimensional feature code of the image to be segmented is:
V(h) = f(h) = (1/W)·Σ_{i=1}^{W} h_i
wherein V(h) represents the one-dimensional feature code of the image to be segmented, W represents the width of V(h), h represents the two-dimensional image feature of the image to be segmented, and f() represents the global average pooling operation.
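An illustrative sketch of the global average pooling of step S12, which collapses each channel's spatial plane into one scalar to form the one-dimensional feature code (NumPy is assumed; the function name is hypothetical):

```python
import numpy as np

def global_average_pool(h):
    """h: (C, H, W) two-dimensional feature map.
    Global average pooling averages over the H x W spatial plane,
    producing a length-C one-dimensional feature code."""
    return h.mean(axis=(1, 2))
```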
4. The method for object segmentation based on improved attention and loss function as claimed in claim 3, wherein the step S13 includes:
step S131, performing feature grouping of one-dimensional feature codes of the image to be segmented through group convolution operation to obtain m feature groups, wherein each feature group comprises n subgroups;
step S132, uniformly mixing different groups of subgroups, and performing group convolution operation and attention mechanism enhancement on the mixed first features to obtain second features;
and S133, fusing the first characteristic and the second characteristic through a Sigmoid function to obtain an enhanced characteristic of the image to be segmented.
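The uniform mixing of subgroups across groups in step S132 resembles a channel-shuffle operation. A minimal sketch, assuming ShuffleNet-style interleaving (the function name is hypothetical, not taken from the specification):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Uniformly mix the subgroups of different feature groups (step S132).

    x:      (C, H, W) feature tensor whose C channels form `groups` groups,
            each containing n = C // groups subgroups
    """
    c, h, w = x.shape
    assert c % groups == 0
    n = c // groups  # subgroups per group
    # Reshape to (groups, n, H, W), swap the group and subgroup axes,
    # then flatten back so channels from different groups interleave.
    return x.reshape(groups, n, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)
```

After shuffling, the mixed features would pass through another group convolution and the attention enhancement before the Sigmoid fusion of step S133.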
5. The method of object segmentation based on improved attention and loss functions as claimed in claim 4, wherein the enhanced features of the image to be segmented are expressed as:
F′(h) = δ(M_s(GC(V(h))))
wherein F′(h) represents the enhancement feature of the image to be segmented, GC(·) represents the group convolution operation, M_s(·) represents the iterative attention mechanism enhancement operation, and δ(·) represents the Sigmoid function.
6. The improved attention and loss function based object segmentation method according to claim 1, characterized in that the object segmentation loss function is expressed as:
L_total = λ1·L_cls + λ2·L_box + λ3·L_mask + λ4·L_af
wherein L_total represents the target segmentation loss function, L_cls represents the classification loss, L_box represents the bounding box loss, L_mask represents the mask loss, L_af represents the regional focus loss, and λ1, λ2, λ3, λ4 represent the balance parameters of the classification loss, bounding box loss, mask loss, and regional focus loss, respectively, in the target segmentation loss function.
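A minimal sketch of the weighted combination of the four loss terms (the λ values shown are illustrative placeholders, not values from the specification):

```python
def total_loss(l_cls, l_box, l_mask, l_af, lambdas=(1.0, 1.5, 6.125, 1.0)):
    """Weighted sum L_total = λ1·L_cls + λ2·L_box + λ3·L_mask + λ4·L_af.
    The default λ tuple is a placeholder; the balance parameters would be
    tuned for the particular training setup."""
    l1, l2, l3, l4 = lambdas
    return l1 * l_cls + l2 * l_box + l3 * l_mask + l4 * l_af
```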
7. The method of claim 6, wherein the regional focus loss L_af is expressed as:
L_af = -(1.125 - p_t)^γ · log(p_t)
wherein p_t represents the ratio of the bounding box to the prototype region, and γ represents the parameter that adjusts the rate of weight reduction in model training.
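A direct transcription of the regional focus loss formula above (a natural logarithm and 0 &lt; p_t ≤ 1 are assumed; the function name is hypothetical):

```python
import math

def regional_focus_loss(p_t, gamma=2.0):
    """L_af = -(1.125 - p_t)^gamma * log(p_t), where p_t is the ratio of
    the bounding box to the prototype region (0 < p_t <= 1).
    The 1.125 offset keeps the modulating factor positive even at p_t = 1,
    and smaller p_t (poorer coverage) yields a larger loss."""
    return -((1.125 - p_t) ** gamma) * math.log(p_t)
```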
8. An object segmentation system based on an improved attention and loss function, characterized in that the object segmentation system comprises the following modules:
the feature extraction and enhancement module is configured to extract features of the image to be segmented, extract global information and enhance attention mechanism of the extracted two-dimensional image features, and obtain enhanced features of the image to be segmented;
the characteristic processing module is configured to input the enhanced characteristics of the image to be segmented into a characteristic processing network to obtain the multi-scale characteristics of the image to be segmented;
the model building and training module is configured to build a target segmentation model based on an example segmentation network and carry out model training through a target segmentation loss function built based on regional focus loss;
the target segmentation module is configured to obtain, based on the multi-scale features of the image to be segmented, the category confidence and position of each candidate box, together with a set number k of image masks and k prototype masks, through the trained target segmentation model;
the target screening module is configured to screen the candidate boxes through a non-maximum suppression (NMS) algorithm, and to perform matrix multiplication of the k image masks with the k prototype masks respectively to obtain the target bounding box and target object mask of the image to be segmented;
the threshold segmentation module is configured to perform binarization processing on the target object mask with a set threshold value, and to clear the mask outside the target bounding box of the image to be segmented to obtain the target segmentation result of the image to be segmented;
and the segmentation result output module is configured to output a target segmentation result of the image to be segmented.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein:
the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the target segmentation method based on an improved attention and loss function of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for execution by the computer to implement the improved attention and loss function based object segmentation method of any one of claims 1-7.
CN202111259594.4A 2021-10-28 2021-10-28 Target segmentation method, system and equipment based on improved attention and loss function Pending CN113988179A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111259594.4A CN113988179A (en) 2021-10-28 2021-10-28 Target segmentation method, system and equipment based on improved attention and loss function


Publications (1)

Publication Number Publication Date
CN113988179A true CN113988179A (en) 2022-01-28

Family

ID=79742989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111259594.4A Pending CN113988179A (en) 2021-10-28 2021-10-28 Target segmentation method, system and equipment based on improved attention and loss function

Country Status (1)

Country Link
CN (1) CN113988179A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114812398A (en) * 2022-04-10 2022-07-29 同济大学 High-precision real-time crack detection platform based on unmanned aerial vehicle
CN114812398B (en) * 2022-04-10 2023-10-03 同济大学 High-precision real-time crack detection platform based on unmanned aerial vehicle
CN115170934A (en) * 2022-09-05 2022-10-11 粤港澳大湾区数字经济研究院(福田) Image segmentation method, system, equipment and storage medium
CN117407557A (en) * 2023-12-13 2024-01-16 江西云眼视界科技股份有限公司 Zero sample instance segmentation method, system, readable storage medium and computer
CN117407557B (en) * 2023-12-13 2024-05-07 江西云眼视界科技股份有限公司 Zero sample instance segmentation method, system, readable storage medium and computer
CN117593530A (en) * 2024-01-19 2024-02-23 杭州灵西机器人智能科技有限公司 Dense carton segmentation method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination