CN113988179A - Target segmentation method, system and equipment based on improved attention and loss function - Google Patents


Info

Publication number
CN113988179A
CN113988179A
Authority
CN
China
Prior art keywords
image
segmented
segmentation
target
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111259594.4A
Other languages
Chinese (zh)
Inventor
王坤峰
徐鹏斌
张衡
李大字
楚纪正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Chemical Technology
Original Assignee
Beijing University of Chemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Chemical Technology filed Critical Beijing University of Chemical Technology
Priority to CN202111259594.4A priority Critical patent/CN113988179A/en
Publication of CN113988179A publication Critical patent/CN113988179A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20016 - Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform

Abstract

The invention belongs to the field of computer vision and image processing, and particularly relates to a target segmentation method, system and device based on improved attention and loss functions, aiming at solving the problems in existing instance segmentation techniques of low detection precision and inaccurate post-segmentation masks caused by the complex spatial layout of pictures. The invention comprises the following steps: extracting features of the image to be segmented, and performing global information extraction and attention-mechanism enhancement on the extracted two-dimensional image features; inputting the enhanced features into a feature processing network to obtain multi-scale features; constructing a target segmentation model based on an instance segmentation network and training the model with a target segmentation loss function constructed on the regional focal loss; and carrying out target segmentation with the trained model to obtain the target segmentation result of the image to be segmented. The method achieves a good segmentation effect, high small-target detection precision, few missed detections, and accurate post-segmentation masks.

Description

Target segmentation method, system and equipment based on improved attention and loss function
Technical Field
The invention belongs to the field of computer vision and image processing, and particularly relates to a target segmentation method, system, and device based on improved attention and loss functions.
Background
The purpose of object detection is to detect each object in an image and identify its class. The purpose of semantic segmentation is to perform pixel-level segmentation of the input image while assigning a semantic class to each object in the image. Instance segmentation combines the two: the goal is to predict class labels and pixel-level instance masks in order to localize the varying numbers of instances present in an image. This task benefits a wide range of applications such as autonomous driving, robotics, and video surveillance.
Instance segmentation has advanced greatly in the vision field over the past few years with deep convolutional neural networks. Currently, instance segmentation methods are generally classified into two categories. One is the two-stage method with region proposals, such as Mask R-CNN: the first stage proposes a set of regions of interest (RoIs), and the second stage predicts an instance mask from features extracted using RoIAlign. Since the input image is processed in two stages, two-stage models have higher accuracy than single-stage models. The other is the single-stage instance segmentation method without region proposals, such as YOLACT, whose authors designed two branch networks that run in parallel: (1) the prediction head branch generates the class confidences of all candidate boxes, the anchor-box positions, and the prototype-mask coefficients; (2) the prototype branch network generates k prototype masks for each picture, the numbers of prototype masks and coefficients being equal. Since there is no region-proposal step and instance position and shape are predicted at the same time, the speed is faster, and YOLACT became the first algorithm to achieve real-time segmentation speed on the COCO dataset. To achieve higher performance and accuracy, feature processing is introduced to extract multi-scale features within the network, where a top-down cross-connected path is added to propagate semantically strong features.
Some recently released datasets leave considerable room for algorithmic improvement. For example, the COCO dataset consists of 200,000 pictures, each containing many instances with complex spatial layouts, and the MVD and Cityscapes datasets likewise provide a large number of street views with many traffic participants per picture.
Although the YOLACT method has been very successful, the following problems remain. First, each picture in the dataset has a complex spatial layout and the spatial positioning precision is low, so small targets in the images are easily missed. Second, the detection accuracy of small targets is low. Third, the segmented mask is not accurate.
Disclosure of Invention
In order to alleviate the above problems in the prior art, namely the low detection precision and inaccurate post-segmentation mask representation caused by complex picture spatial layouts in existing instance segmentation techniques, the present invention provides an object segmentation method based on improved attention and loss functions, the object segmentation method comprising:
step S10, extracting the features of the image to be segmented, and extracting the global information and enhancing the attention mechanism of the extracted two-dimensional image features to obtain the enhanced features of the image to be segmented;
step S20, inputting the enhanced features of the image to be segmented into a feature processing network to obtain the multi-scale features of the image to be segmented;
step S30, constructing a target segmentation model based on the example segmentation network, and performing model training through a target segmentation loss function constructed based on the regional focus loss;
step S40, based on the multi-scale features of the image to be segmented, obtaining the category confidence and the position of each candidate frame and k set image masks and k set prototype masks through a trained target segmentation model;
step S50, screening the candidate frame through NMS screening algorithm, and performing matrix multiplication of k image masks and k prototype masks respectively to obtain a target boundary frame and a target object mask of the image to be segmented;
and step S60, performing binarization processing on the mask of the target object by using a set threshold value, and clearing the mask outside the target boundary frame of the image to be segmented to obtain the target segmentation result of the image to be segmented.
In some preferred embodiments, step S10 includes:
step S11, extracting the characteristics of the image to be segmented through a characteristic extraction network to obtain the two-dimensional image characteristics of the image to be segmented;
step S12, extracting global information of the two-dimensional image features of the image to be segmented through global average pooling to obtain one-dimensional feature codes of the image to be segmented;
and step S13, performing iterative attention mechanism enhancement for a set number of times on the one-dimensional feature code of the image to be segmented to obtain the enhancement feature of the image to be segmented.
In some preferred embodiments, the one-dimensional feature of the image to be segmented is encoded, which is represented as:
V(h) = f(h) = (1/W) Σ_{0≤i<W} h_i, the sum running along the width dimension
wherein V(h) represents the one-dimensional feature code of the image to be segmented, W represents the width of V(h), h represents the two-dimensional image feature of the image to be segmented, and f(·) represents the global average pooling operation.
In some preferred embodiments, step S13 includes:
step S131, performing feature grouping of one-dimensional feature codes of the image to be segmented through group convolution operation to obtain m feature groups, wherein each feature group comprises n subgroups;
step S132, uniformly mixing different groups of subgroups, and performing group convolution operation and attention mechanism enhancement on the mixed first features to obtain second features;
and S133, fusing the first characteristic and the second characteristic through a Sigmoid function to obtain an enhanced characteristic of the image to be segmented.
In some preferred embodiments, the enhanced features of the image to be segmented are expressed as:
F′(h) = δ(M_s(GC(V(h))))
wherein F′(h) represents the enhanced feature of the image to be segmented, GC(·) represents the group convolution operation, M_s(·) represents the iterative attention-mechanism enhancement operation, and δ(·) represents the Sigmoid function.
In some preferred embodiments, the target segmentation loss function is expressed as:
L_total = λ1·L_cls + λ2·L_box + λ3·L_mask + λ4·L_af
wherein L_total represents the target segmentation loss function, L_cls represents the classification loss, L_box represents the bounding-box loss, L_mask represents the mask loss, L_af represents the regional focal loss, and λ1, λ2, λ3, λ4 respectively represent the balance parameters of the classification loss, bounding-box loss, mask loss and area focal loss in the target segmentation loss function.
In some preferred embodiments, the regional focal loss L_af is expressed as:
L_af = -(1.125 - p_t)^γ · log(p_t)
wherein p_t represents the ratio of the bounding box to the prototype region, and γ represents a parameter that adjusts the rate of weight reduction in model training.
In another aspect of the present invention, an object segmentation system based on improved attention and loss functions is proposed, the object segmentation system comprising:
the feature extraction and enhancement module is configured to extract features of the image to be segmented, extract global information and enhance attention mechanism of the extracted two-dimensional image features, and obtain enhanced features of the image to be segmented;
the characteristic processing module is configured to input the enhanced characteristics of the image to be segmented into a characteristic processing network to obtain the multi-scale characteristics of the image to be segmented;
the model building and training module is configured to build a target segmentation model based on the example segmentation network and perform model training through a target segmentation loss function built based on the regional focal loss;
the target segmentation module is configured to obtain the category confidence and the position of each candidate frame and set k image masks and k prototype masks through a trained target segmentation model based on the multi-scale features of the image to be segmented;
the target screening module is configured to screen the candidate frame through an NMS screening algorithm and perform matrix multiplication of k image masks and k prototype masks respectively to obtain a target boundary frame and a target object mask of the image to be segmented;
the threshold segmentation module is configured to perform binarization processing on a target object mask by using a set threshold, clear the mask outside a target boundary frame of the image to be segmented and obtain a target segmentation result of the image to be segmented;
and the segmentation result output module is configured to output a target segmentation result of the image to be segmented.
In a third aspect of the present invention, an electronic device is provided, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the processor for execution by the processor to implement the above-described target segmentation method based on an improved attention and loss function.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions for execution by the computer to implement the above-mentioned target segmentation method based on improved attention and loss functions.
The invention has the beneficial effects that:
(1) according to the target segmentation method based on the improved attention and loss functions, the image features are subjected to global information extraction to convert the two-dimensional image features into one-dimensional feature codes, partial position information and remote dependency relationship of the target are obtained in the longitudinal axis direction of the image, and the one-dimensional feature codes are subjected to iterative attention mechanism enhancement, so that the obtained features can represent richer information, the accuracy and precision of the target segmentation result are improved, and the segmentation performance of small targets is effectively improved.
(2) The invention discloses a target segmentation method based on improved attention and loss functions, which is characterized in that a target segmentation model is constructed on the basis of a YOLACT network, model training is carried out through a target segmentation loss function constructed on the basis of the area focal-loss, the proportion of small targets in loss is increased by using the area focal-loss function, the segmentation performance of the small targets is improved, and the problems of low detection precision of the small targets, easiness in missed detection and inaccuracy of masks after segmentation in the prior art in target segmentation are solved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow chart diagram of the object segmentation method based on the improved attention and loss functions of the present invention;
FIG. 2 is an image to be segmented according to an embodiment of the present invention based on an improved attention and loss function object segmentation method;
FIG. 3 is a schematic structural diagram of an attention mechanism of an embodiment of an object segmentation method based on improved attention and loss functions according to the present invention;
FIG. 4 is a diagram illustrating prototype results of an embodiment of the object segmentation method based on the modified attention and loss functions according to the present invention;
FIG. 5 is a diagram illustrating a segmentation result of an example of an object according to an embodiment of the object segmentation method based on an improved attention and loss function;
FIG. 6 is a diagram of other example segmentation results of an embodiment of the object segmentation method based on the improved attention and loss functions of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention relates to a target segmentation method based on an improved attention and loss function, which comprises the following steps:
step S10, extracting the features of the image to be segmented, and extracting the global information and enhancing the attention mechanism of the extracted two-dimensional image features to obtain the enhanced features of the image to be segmented;
step S20, inputting the enhanced features of the image to be segmented into a feature processing network to obtain the multi-scale features of the image to be segmented;
step S30, constructing a target segmentation model based on an example segmentation network (namely, a YOLACT network), and performing model training through a target segmentation loss function constructed based on the regional focal loss (namely, area focal loss);
step S40, based on the multi-scale features of the image to be segmented, obtaining the category confidence and the position of each candidate frame and k set image masks and k set prototype masks through a trained target segmentation model;
step S50, screening the candidate frame through NMS screening algorithm, and performing matrix multiplication of k image masks and k prototype masks respectively to obtain a target boundary frame and a target object mask of the image to be segmented;
and step S60, performing binarization processing on the mask of the target object by using a set threshold value, and clearing the mask outside the target boundary frame of the image to be segmented to obtain the target segmentation result of the image to be segmented.
In order to more clearly describe the object segmentation method based on the improved attention and loss functions of the present invention, the following describes the steps in the embodiment of the present invention in detail with reference to fig. 1.
The object segmentation method based on the improved attention and loss function according to the first embodiment of the present invention includes steps S10-S60, wherein each step is described in detail as follows:
and step S10, extracting the features of the image to be segmented, and extracting global information and enhancing an attention mechanism of the extracted two-dimensional image features to obtain enhanced features of the image to be segmented.
Fig. 2 shows images to be segmented according to an embodiment of the object segmentation method based on improved attention and loss functions of the present invention. In the left image of fig. 2, the racket and the tennis ball are small objects and the person playing is a large object; in the right image of fig. 2, the skateboard is a small object, the person on the skateboard is a large object, and the large object on the left is incomplete due to the complicated background light.
And step S11, extracting the features of the image to be segmented through a feature extraction network to obtain the two-dimensional image features of the image to be segmented.
Step S12, performing global information extraction of the two-dimensional image features of the image to be segmented by global average pooling to obtain a one-dimensional feature code of the image to be segmented, as shown in formula (1):
V(h) = f(h) = (1/W) Σ_{0≤i<W} h_i    (1)
wherein V(h) represents the one-dimensional feature code of the image to be segmented, W represents the width of V(h), h represents the two-dimensional image feature of the image to be segmented, and f(·) represents the global average pooling operation.
And global information extraction is carried out on the two-dimensional image features, namely the two-dimensional image features are converted into one-dimensional features through global average pooling on the basis of SENet, and partial position information and remote dependency relationship of the target in the image are obtained in the longitudinal axis direction.
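The axis-wise pooling just described can be sketched as a minimal NumPy illustration of equation (1): a two-dimensional feature map collapses to a one-dimensional code by averaging over the width. The function name `encode_1d` and the array shapes are assumptions for illustration, not the patent's implementation.

```python
import numpy as np

def encode_1d(h):
    """Collapse a 2-D feature map h of shape (H, W) into a 1-D code by
    averaging along the width axis, as in equation (1): each row is
    summarized by its mean, so positional information is kept along the
    vertical axis while dependencies along the horizontal axis are
    aggregated."""
    W = h.shape[1]
    return h.sum(axis=1) / W

# A 2 x 2 toy feature map: each row collapses to its mean.
h = np.array([[1.0, 3.0],
              [2.0, 4.0]])
v = encode_1d(h)  # array([2.0, 3.0])
```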
And step S13, performing iterative attention mechanism enhancement for a set number of times on the one-dimensional feature code of the image to be segmented to obtain the enhancement feature of the image to be segmented.
Step S131, performing feature grouping of the one-dimensional feature codes of the image to be segmented through group convolution operation to obtain m feature groups, wherein each feature group comprises n subgroups.
And S132, uniformly mixing different groups of subgroups, and performing group convolution operation and attention mechanism enhancement on the mixed first features to obtain second features.
Step S133, the first feature and the second feature are fused through a Sigmoid function to obtain an enhanced feature of the image to be segmented, as shown in formula (2):
F′(h) = δ(M_s(GC(V(h))))    (2)
wherein F′(h) represents the enhanced feature of the image to be segmented, GC(·) represents the group convolution operation, M_s(·) represents the iterative attention-mechanism enhancement operation, and δ(·) represents the Sigmoid function.
The enhanced features of the image to be segmented obtained by the method can represent richer information.
And step S20, inputting the enhanced features of the image to be segmented into a feature processing network to obtain the multi-scale features of the image to be segmented.
Fig. 3 is a schematic diagram of the attention-mechanism structure according to an embodiment of the object segmentation method based on improved attention and loss functions of the present invention. The input of the attention structure is divided into two lines after residual processing. One line is the first output; the other line passes in sequence through global average pooling, global maximum pooling, group convolution and ReLU processing to become the second output. The second output then passes in sequence through channel shuffling and 1 × 1 convolution and is spliced with the second output to form the third output; after Sigmoid processing, the third output is fused with the first output by a weighting method to produce the final output of the attention structure.
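The channel-shuffle step in the structure above, which evenly remixes subgroups across the m feature groups before the 1 × 1 convolution, can be illustrated with a ShuffleNet-style reshape-and-transpose. The function below is a hypothetical NumPy sketch, not the patent's code.

```python
import numpy as np

def channel_shuffle(x, m):
    """Evenly remix subgroups across m feature groups: view the channel
    axis as (m groups, n subgroups), transpose the two group axes, and
    flatten back, so every resulting group holds one subgroup from each
    original group."""
    c, height, width = x.shape
    n = c // m
    return (x.reshape(m, n, height, width)
             .transpose(1, 0, 2, 3)
             .reshape(c, height, width))

# 8 channels in m = 4 groups of n = 2: channel order 0..7 becomes 0,2,4,6,1,3,5,7.
x = np.arange(8, dtype=float).reshape(8, 1, 1)
y = channel_shuffle(x, m=4)
```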
And step S30, constructing a target segmentation model based on the example segmentation network, and performing model training through a target segmentation loss function constructed based on the regional focus loss.
An objective segmentation loss function, which is expressed as shown in equation (3):
L_total = λ1·L_cls + λ2·L_box + λ3·L_mask + λ4·L_af    (3)
wherein L_total represents the target segmentation loss function, L_cls represents the classification loss, L_box represents the bounding-box loss, L_mask represents the mask loss, L_af represents the regional focal loss, and λ1, λ2, λ3, λ4 respectively represent the balance parameters of the classification loss, bounding-box loss, mask loss and area focal loss in the target segmentation loss function.
In one embodiment of the invention, λ1 = 1, λ2 = 1.25, λ3 = 6.125, λ4 = 1.
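With these balance parameters, the weighted sum of equation (3) can be sketched as follows; the function name `total_loss` and its call signature are illustrative assumptions, not the patent's implementation.

```python
def total_loss(l_cls, l_box, l_mask, l_af,
               lambdas=(1.0, 1.25, 6.125, 1.0)):
    """Weighted sum of equation (3) with the balance parameters of this
    embodiment: lambda1 = 1, lambda2 = 1.25, lambda3 = 6.125, lambda4 = 1."""
    l1, l2, l3, l4 = lambdas
    return l1 * l_cls + l2 * l_box + l3 * l_mask + l4 * l_af

# With every component loss equal to 1, the total is 1 + 1.25 + 6.125 + 1 = 9.375.
```

The mask loss carries the largest weight, which is consistent with the invention's emphasis on accurate post-segmentation masks.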
The regional focal loss L_af is represented by formula (4):
L_af = -(1.125 - p_t)^γ · log(p_t)    (4)
wherein p_t represents the ratio of the bounding box to the prototype region, and γ represents a parameter that adjusts the rate of weight reduction in model training.
In one embodiment of the present invention, γ = 2.
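Under these settings, formula (4) can be written directly. The sketch below is a hedged NumPy rendering that assumes p_t is a scalar or array in (0, 1]; it illustrates how a small bounding-box-to-prototype ratio raises the modulating factor and hence the share of small targets in the loss.

```python
import numpy as np

def area_focal_loss(p_t, gamma=2.0):
    """Formula (4): L_af = -(1.125 - p_t)^gamma * log(p_t), where p_t is
    the ratio of the bounding box to the prototype region. A small p_t
    (a small target) yields a larger modulating factor, so small targets
    take a larger share of the total loss."""
    return -((1.125 - p_t) ** gamma) * np.log(p_t)

# A small box (p_t = 0.1) is penalized far more than a large one (p_t = 0.9).
```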
And step S40, based on the multi-scale features of the image to be segmented, obtaining the class confidence and the position of each candidate frame and the set k image masks and k prototype masks through the trained target segmentation model.
Fig. 4 is a schematic diagram of prototype results of an embodiment of the target segmentation method based on the improved attention and loss functions of the present invention; it shows the feature maps produced during network processing after weighting by the mask coefficients.
And step S50, screening the candidate frames through an NMS screening algorithm, and respectively performing matrix multiplication on k image masks and k prototype masks to obtain a target boundary frame and a target object mask of the image to be segmented.
And step S60, performing binarization processing on the mask of the target object by using a set threshold value, and clearing the mask outside the target boundary frame of the image to be segmented to obtain the target segmentation result of the image to be segmented.
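Steps S50 and S60 above, combining the prototype masks with a detection's mask coefficients, binarizing with the set threshold, and clearing the mask outside the target bounding box, can be sketched in NumPy for a single detection that survived NMS. The shapes, the function name `assemble_mask`, and the sigmoid squashing (as in YOLACT) are assumptions for illustration rather than the patent's exact procedure.

```python
import numpy as np

def assemble_mask(prototypes, coeffs, box, thresh=0.5):
    """For one detection that survived NMS: linearly combine the k
    prototype masks with that detection's k mask coefficients, squash
    with a sigmoid, binarize at `thresh`, and zero everything outside
    the predicted bounding box.
    prototypes: (k, H, W); coeffs: (k,); box: (x1, y1, x2, y2)."""
    m = np.tensordot(coeffs, prototypes, axes=1)      # (H, W) linear combination
    m = 1.0 / (1.0 + np.exp(-m))                      # sigmoid squashing
    m = (m > thresh).astype(np.uint8)                 # binarize with the set threshold
    x1, y1, x2, y2 = box
    cropped = np.zeros_like(m)
    cropped[y1:y2, x1:x2] = m[y1:y2, x1:x2]           # clear mask outside the box
    return cropped
```

On a toy 4 × 4 prototype pair with positive coefficients, only the pixels inside the bounding box survive the final crop.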
As shown in fig. 5 and fig. 6, which are schematic diagrams of a target example and other example segmentation results of an embodiment of the target segmentation method based on the improved attention and loss function of the present invention, respectively, the diagrams include interferences such as motion shadow, illumination change, image noise, etc., and it can be seen from the segmentation results that the method of the present invention has strong robustness, can overcome these interferences, and obtains an accurate target segmentation result.
Although the foregoing embodiments describe the steps in the above sequential order, those skilled in the art will understand that, in order to achieve the effect of the present embodiments, the steps may not be executed in such an order, and may be executed simultaneously (in parallel) or in an inverse order, and these simple variations are within the scope of the present invention.
The second embodiment of the present invention is an object segmentation system based on improved attention and loss functions, the object segmentation system comprising:
the feature extraction and enhancement module is configured to extract features of the image to be segmented, extract global information and enhance attention mechanism of the extracted two-dimensional image features, and obtain enhanced features of the image to be segmented;
the characteristic processing module is configured to input the enhanced characteristics of the image to be segmented into a characteristic processing network to obtain the multi-scale characteristics of the image to be segmented;
the model building and training module is configured to build a target segmentation model based on an example segmentation network and carry out model training through a target segmentation loss function built based on regional focus loss;
the target segmentation module is configured to obtain the category confidence and the position of each candidate frame and set k image masks and k prototype masks through a trained target segmentation model based on the multi-scale features of the image to be segmented;
the target screening module is configured to screen the candidate frame through an NMS screening algorithm and perform matrix multiplication of k image masks and k prototype masks respectively to obtain a target boundary frame and a target object mask of the image to be segmented;
the threshold segmentation module is configured to perform binarization processing on a target object mask by using a set threshold, clear the mask outside a target boundary frame of the image to be segmented and obtain a target segmentation result of the image to be segmented;
and the segmentation result output module is configured to output a target segmentation result of the image to be segmented.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the objective segmentation system based on the improved attention and loss function provided in the foregoing embodiment is only illustrated by the division of the foregoing functional modules, and in practical applications, the foregoing function allocation may be completed by different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
An electronic apparatus according to a third embodiment of the present invention includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the processor for execution by the processor to implement the above-described target segmentation method based on an improved attention and loss function.
A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by the computer to implement the above-described target segmentation method based on improved attention and loss functions.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art will appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules and method steps may be located in Random Access Memory (RAM), Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. An object segmentation method based on an improved attention and loss function, the object segmentation method comprising:
step S10, extracting the features of the image to be segmented, and extracting the global information and enhancing the attention mechanism of the extracted two-dimensional image features to obtain the enhanced features of the image to be segmented;
step S20, inputting the enhanced features of the image to be segmented into a feature processing network to obtain the multi-scale features of the image to be segmented;
step S30, constructing a target segmentation model based on the example segmentation network, and performing model training through a target segmentation loss function constructed based on the regional focus loss;
step S40, based on the multi-scale features of the image to be segmented, obtaining the category confidence and position of each candidate box, together with a set number k of image masks and k prototype masks, through the trained target segmentation model;
step S50, screening the candidate boxes through a non-maximum suppression (NMS) algorithm, and performing matrix multiplication of the k image masks with the k prototype masks respectively to obtain the target bounding box and target object mask of the image to be segmented;
and step S60, performing binarization processing on the target object mask with a set threshold value, and clearing the mask outside the target bounding box of the image to be segmented to obtain the target segmentation result of the image to be segmented.
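As a non-limiting illustration, the prototype-mask combination, binarization, and bounding-box cropping of steps S50–S60 could be sketched as follows. A YOLACT-style prototype representation is assumed, and the function name is hypothetical, not taken from the specification:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def assemble_instance_mask(prototypes, coeffs, box, threshold=0.5):
    """Combine k prototype masks with k per-instance coefficients (step S50),
    then binarize and clear everything outside the bounding box (step S60).

    prototypes: (H, W, k) prototype masks from the mask branch
    coeffs:     (k,) mask coefficients predicted for one candidate box
    box:        (x1, y1, x2, y2) bounding box in pixel coordinates
    """
    h, w, k = prototypes.shape
    assert coeffs.shape == (k,)
    # Matrix multiplication of the k prototypes with the k coefficients.
    mask = sigmoid(prototypes.reshape(-1, k) @ coeffs).reshape(h, w)
    # Binarize with the set threshold.
    mask = (mask > threshold).astype(np.uint8)
    # Clear the mask outside the target bounding box.
    x1, y1, x2, y2 = box
    cropped = np.zeros_like(mask)
    cropped[y1:y2, x1:x2] = mask[y1:y2, x1:x2]
    return cropped
```

Candidate-box screening by NMS would run before this step, so only the surviving boxes have their masks assembled.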
2. The method for object segmentation based on improved attention and loss function as claimed in claim 1, wherein the step S10 includes:
step S11, extracting the characteristics of the image to be segmented through a characteristic extraction network to obtain the two-dimensional image characteristics of the image to be segmented;
step S12, extracting global information of the two-dimensional image features of the image to be segmented through global average pooling to obtain one-dimensional feature codes of the image to be segmented;
and step S13, performing iterative attention mechanism enhancement for a set number of times on the one-dimensional feature code of the image to be segmented to obtain the enhancement feature of the image to be segmented.
3. The method for target segmentation based on the improved attention and loss function as claimed in claim 2, wherein the one-dimensional feature code of the image to be segmented is:
V(h) = f(h) = (1/W)·Σ_{i=1}^{W} h_i
wherein V(h) represents the one-dimensional feature code of the image to be segmented, W represents the width of V(h), h represents the two-dimensional image feature of the image to be segmented, and f() represents the global average pooling operation.
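An illustrative sketch of the global average pooling of step S12, which collapses each channel's spatial plane into one scalar to form the one-dimensional feature code (NumPy is assumed; the function name is hypothetical):

```python
import numpy as np

def global_average_pool(h):
    """h: (C, H, W) two-dimensional feature map.
    Global average pooling averages over the H x W spatial plane,
    producing a length-C one-dimensional feature code."""
    return h.mean(axis=(1, 2))
```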
4. The method for object segmentation based on improved attention and loss function as claimed in claim 3, wherein the step S13 includes:
step S131, performing feature grouping of one-dimensional feature codes of the image to be segmented through group convolution operation to obtain m feature groups, wherein each feature group comprises n subgroups;
step S132, uniformly mixing different groups of subgroups, and performing group convolution operation and attention mechanism enhancement on the mixed first features to obtain second features;
and S133, fusing the first characteristic and the second characteristic through a Sigmoid function to obtain an enhanced characteristic of the image to be segmented.
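The uniform mixing of subgroups across groups in step S132 resembles a channel-shuffle operation. A minimal sketch, assuming ShuffleNet-style interleaving (the function name is hypothetical, not taken from the specification):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Uniformly mix the subgroups of different feature groups (step S132).

    x:      (C, H, W) feature tensor whose C channels form `groups` groups,
            each containing n = C // groups subgroups
    """
    c, h, w = x.shape
    assert c % groups == 0
    n = c // groups  # subgroups per group
    # Reshape to (groups, n, H, W), swap the group and subgroup axes,
    # then flatten back so channels from different groups interleave.
    return x.reshape(groups, n, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)
```

After shuffling, the mixed features would pass through another group convolution and the attention enhancement before the Sigmoid fusion of step S133.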
5. The method of object segmentation based on improved attention and loss functions as claimed in claim 4, wherein the enhanced features of the image to be segmented are expressed as:
F′(h) = δ(M_s(GC(V(h))))
wherein F′(h) represents the enhancement feature of the image to be segmented, GC(·) represents the group convolution operation, M_s(·) represents the iterative attention mechanism enhancement operation, and δ(·) represents the Sigmoid function.
6. The improved attention and loss function based object segmentation method according to claim 1, characterized in that the object segmentation loss function is expressed as:
L_total = λ1·L_cls + λ2·L_box + λ3·L_mask + λ4·L_af
wherein L_total represents the target segmentation loss function, L_cls represents the classification loss, L_box represents the bounding box loss, L_mask represents the mask loss, L_af represents the regional focus loss, and λ1, λ2, λ3, λ4 represent the balance parameters of the classification loss, bounding box loss, mask loss, and regional focus loss, respectively, in the target segmentation loss function.
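A minimal sketch of the weighted combination of the four loss terms (the λ values shown are illustrative placeholders, not values from the specification):

```python
def total_loss(l_cls, l_box, l_mask, l_af, lambdas=(1.0, 1.5, 6.125, 1.0)):
    """Weighted sum L_total = λ1·L_cls + λ2·L_box + λ3·L_mask + λ4·L_af.
    The default λ tuple is a placeholder; the balance parameters would be
    tuned for the particular training setup."""
    l1, l2, l3, l4 = lambdas
    return l1 * l_cls + l2 * l_box + l3 * l_mask + l4 * l_af
```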
7. The method of claim 6, wherein the regional focus loss L_af is expressed as:
L_af = -(1.125 - p_t)^γ · log(p_t)
wherein p_t represents the ratio of the bounding box to the prototype region, and γ represents the parameter that adjusts the rate of weight reduction in model training.
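A direct transcription of the regional focus loss formula above (a natural logarithm and 0 &lt; p_t ≤ 1 are assumed; the function name is hypothetical):

```python
import math

def regional_focus_loss(p_t, gamma=2.0):
    """L_af = -(1.125 - p_t)^gamma * log(p_t), where p_t is the ratio of
    the bounding box to the prototype region (0 < p_t <= 1).
    The 1.125 offset keeps the modulating factor positive even at p_t = 1,
    and smaller p_t (poorer coverage) yields a larger loss."""
    return -((1.125 - p_t) ** gamma) * math.log(p_t)
```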
8. An object segmentation system based on an improved attention and loss function, characterized in that the object segmentation system comprises the following modules:
the feature extraction and enhancement module is configured to extract features of the image to be segmented, extract global information and enhance attention mechanism of the extracted two-dimensional image features, and obtain enhanced features of the image to be segmented;
the characteristic processing module is configured to input the enhanced characteristics of the image to be segmented into a characteristic processing network to obtain the multi-scale characteristics of the image to be segmented;
the model building and training module is configured to build a target segmentation model based on an example segmentation network and carry out model training through a target segmentation loss function built based on regional focus loss;
the target segmentation module is configured to obtain, based on the multi-scale features of the image to be segmented, the category confidence and position of each candidate box, together with a set number k of image masks and k prototype masks, through the trained target segmentation model;
the target screening module is configured to screen the candidate boxes through a non-maximum suppression (NMS) algorithm, and to perform matrix multiplication of the k image masks with the k prototype masks respectively to obtain the target bounding box and target object mask of the image to be segmented;
the threshold segmentation module is configured to perform binarization processing on the target object mask with a set threshold value, and to clear the mask outside the target bounding box of the image to be segmented to obtain the target segmentation result of the image to be segmented;
and the segmentation result output module is configured to output a target segmentation result of the image to be segmented.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein:
the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the target segmentation method based on an improved attention and loss function of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for execution by the computer to implement the improved attention and loss function based object segmentation method of any one of claims 1-7.
CN202111259594.4A 2021-10-28 2021-10-28 Target segmentation method, system and equipment based on improved attention and loss function Pending CN113988179A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111259594.4A CN113988179A (en) 2021-10-28 2021-10-28 Target segmentation method, system and equipment based on improved attention and loss function


Publications (1)

Publication Number Publication Date
CN113988179A true CN113988179A (en) 2022-01-28

Family

ID=79742989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111259594.4A Pending CN113988179A (en) 2021-10-28 2021-10-28 Target segmentation method, system and equipment based on improved attention and loss function

Country Status (1)

Country Link
CN (1) CN113988179A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114812398A (en) * 2022-04-10 2022-07-29 同济大学 High-precision real-time crack detection platform based on unmanned aerial vehicle
CN114812398B (en) * 2022-04-10 2023-10-03 同济大学 High-precision real-time crack detection platform based on unmanned aerial vehicle
CN115170934A (en) * 2022-09-05 2022-10-11 粤港澳大湾区数字经济研究院(福田) Image segmentation method, system, equipment and storage medium
CN117407557A (en) * 2023-12-13 2024-01-16 江西云眼视界科技股份有限公司 Zero sample instance segmentation method, system, readable storage medium and computer
CN117407557B (en) * 2023-12-13 2024-05-07 江西云眼视界科技股份有限公司 Zero sample instance segmentation method, system, readable storage medium and computer
CN117593530A (en) * 2024-01-19 2024-02-23 杭州灵西机器人智能科技有限公司 Dense carton segmentation method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination