CN111461127A - Instance segmentation method based on a one-stage target detection framework - Google Patents

Instance segmentation method based on a one-stage target detection framework

Info

Publication number
CN111461127A
Authority
CN
China
Prior art keywords
detection
segmentation
result
results
target detection
Prior art date
Legal status
Granted
Application number
CN202010239127.4A
Other languages
Chinese (zh)
Other versions
CN111461127B (en)
Inventor
罗荣华
李嘉明
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010239127.4A priority Critical patent/CN111461127B/en
Publication of CN111461127A publication Critical patent/CN111461127A/en
Application granted granted Critical
Publication of CN111461127B publication Critical patent/CN111461127B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 - Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An instance segmentation method based on a one-stage target detection framework comprises the following steps: 1) encoding the image dataset annotations, with each target in an image defined as a dense point object; 2) constructing an instance segmentation network model comprising a backbone network, a main-body feature extraction module, a detection module that produces target detection results, and a segmentation module that produces semantic segmentation results; 3) performing deep-learning training, chiefly embodied in the multi-task loss function designed by the invention for the instance segmentation task; 4) in the inference stage, combining the target detection results with the semantic segmentation result and applying a non-maximum segmentation screening method to obtain the instance segmentation result. The method is simple and reasonable in design, preserves the detection quality and detection speed of the original target detection framework while simultaneously producing high-precision segmentation masks, and has good robustness.

Description

Instance segmentation method based on a one-stage target detection framework
Technical Field
The invention belongs to the technical field of deep learning and computer vision, and particularly relates to an instance segmentation method based on a one-stage target detection framework.
Background
In recent years, object detection in computer vision has improved greatly, and the instance segmentation task, which is closely related to it, has likewise advanced by drawing on the powerful capabilities of target detection frameworks.
The instance segmentation task locates potential targets in an image, representing each location with a detection bounding box, and labels the pixels within the different target regions in the manner of semantic segmentation; the task can therefore be completed by extending a target detection framework accordingly. Much existing work extends mature target detection frameworks to instance segmentation, but most of it builds on two-stage frameworks, and such extensions find it difficult to maintain detection speed while guaranteeing segmentation quality.
In 2018, Mask R-CNN, a simple and flexible instance segmentation method that extends a two-stage target detection framework, proposed by Kaiming He's team at Facebook AI Research, drew the field's attention to the commonalities between the target detection and instance segmentation tasks and to how methods for the two can extend one another. However, the method sacrifices time efficiency to obtain its high-quality segmentation results, and this is where it remains worth improving.
One reason Mask R-CNN is inefficient in time is that the two-stage target detection framework it uses is inherently slow. The present invention provides a faster instance segmentation pipeline by effectively extending a one-stage target detection framework, which addresses this problem. However, an instance segmentation method built on a one-stage detection framework cannot simply attach a mask branch, as Mask R-CNN does, to complete the corresponding instance segmentation task: extending instance segmentation on a one-stage target detection framework requires resolving the ambiguity between the detection boxes of different objects and the ambiguity of the segmentation semantics.
Therefore, there is a need for an instance segmentation method that resolves the above ambiguities while maintaining speed.
Disclosure of Invention
In view of the problems described in the background art, the present invention provides an instance segmentation method based on a one-stage target detection framework, so as to provide an instance segmentation method that takes both segmentation quality and model speed into account.
To solve the above technical problem, the present invention provides an instance segmentation method based on a one-stage target detection framework, comprising:
S1, labeling the target objects in the training images as dense point objects at the network output layer;
S2, constructing a complete instance segmentation network model based on a one-stage target detection framework;
S3, designing a multi-task loss function adapted to the instance segmentation task for deep-learning training;
S4, in the inference stage, combining the target detection results and the semantic segmentation result to obtain the final instance segmentation result.
Further, the multi-task loss function in step S3 consists of four parts, namely the target detection box loss, the object confidence loss, the object classification loss and the semantic segmentation loss, specifically as follows:
S31, the target detection box part adopts an intersection-over-union (IoU) loss function;
S32, the object confidence part adopts a focal loss function adapted to the object centerness (its expression is reproduced as image BDA0002431967590000021 in the original publication);
S33, the object classification part adopts a focal loss function adapted to the multi-class task (its expression is reproduced as image BDA0002431967590000022 in the original publication);
S34, the semantic segmentation part adopts a cross-entropy loss function,
where y_t is the labeled ground-truth value; an indicator function equals 1 where y_t is greater than 0 and 0 elsewhere; α_t is a fraction in [0, 0.5]; γ is a number greater than 0; p_t is the predicted classification probability of the corresponding class; and N_t is the number of objects of that class in the dataset.
Further, in step S1, the target objects contained in each training image are expressed as dense point objects: a tensor of size [B, H, W] is output at each scale, where B is the batch size and H, W are the height and width of that scale; if and only if the target detection box of an object contains certain points of the tensor, those points are encoded as point objects of that object, in the following manner:
S11, the object class is encoded as a one-hot vector of length C;
S12, the object confidence is encoded as the object centerness;
S13, the target detection box is encoded as a vector (L, R, T, B) of length 4;
where C is the number of classification categories, and L, R, T, B denote the distances from the point object to the left, right, top and bottom edges of the corresponding target detection box.
Further, in step S2, the instance segmentation network model based on the one-stage target detection framework comprises four parts, namely a backbone network, a main-body feature extraction module, a target detection module and a semantic segmentation module, specifically as follows:
S21, the backbone network assists in completing basic feature extraction and comprises a network input layer that receives the image tensor and five stages of down-sampling feature extraction layers;
S22, the main-body feature extraction module further mines the basic features; it comprises three parts, namely lateral connections from the basic feature layers to the feature extraction layers, bottom-up down-sampling from lower to higher feature layers, and top-down up-sampling from higher to lower feature layers, and finally outputs one feature tensor per scale;
S23, the target detection module produces the target detection output; five detection modules at different scales are each responsible for detecting objects of a different size range; each detection module contains two branches, both of which receive the feature tensor extracted by the main-body feature extraction module and pass it through four convolutional layers with kernel size 3 and down-sampling scale 1; one branch is the regression branch, in which the feature tensor passes through a convolutional branch with 4 output channels to give the target detection regression result, and the other is the classification branch, in which the feature tensor passes through convolutional branches with C output channels and 1 output channel respectively to give the classification probabilities and the object confidence;
S24, the semantic segmentation module produces the semantic segmentation output; it receives only the bottom-level feature tensor of the main-body feature extraction module as input and obtains the per-class semantic segmentation result through four convolutional layers with kernel size 3 and down-sampling scale 1 followed by a convolutional layer with C output channels.
Further, the instance segmentation network model outputs target detection results at five different scales and a semantic segmentation result at one scale; at the inference stage the target detection results are merged, the Top-K results are screened out, and finally a non-maximum segmentation screening method combining the target detection results with the semantic segmentation result is used to obtain the final instance segmentation result.
Further, the algorithm steps of the non-maximum segmentation screening method are as follows:
1) sort the detection results in descending order of their object scores;
2) select the detection m with the highest score, add it to the final detection result list and delete it from the original list; according to its detected class and detection box, obtain the corresponding per-pixel segmentation result from the semantic segmentation output and use it as the candidate instance segmentation set of m, storing the set as key-value pairs of object score and per-pixel segmentation result;
3) compute the intersection-over-union between m and the remaining detections, find all results whose IoU exceeds 0.5, delete them from the detection result list, and add their object scores and segmentation results to the candidate instance segmentation set of m;
4) linearly combine all candidate instance segmentation results of m to obtain the final instance segmentation result of m;
5) repeat 2)-4) until the detection result list is empty, then terminate the algorithm and return the final detection result list.
Compared with the prior art, the invention has the following beneficial effects: with only minor modifications to an existing, mature one-stage target detection framework, the model can also complete the instance segmentation task, obtaining instance segmentation results of good quality while retaining the detection speed of the original framework.
Drawings
FIG. 1 is a schematic diagram of the instance segmentation method based on a one-stage target detection framework;
FIG. 2 is a schematic diagram of target object labeling in an image;
FIG. 3 is a schematic diagram of the main-body feature extraction module, the second part of the network architecture;
FIG. 4 is a block diagram of the detection branch and segmentation branch modules of the network;
FIG. 5 is a schematic diagram of the dataset processing flow.
Detailed Description
The present invention is described in further detail below; however, its embodiments are not limited to the following, and the scope of protection is not limited to these examples.
An instance segmentation method based on a one-stage target detection framework is implemented through the following steps:
S1, acquire the images to be detected and their annotations for the instance segmentation task, and label the target objects in the training images as dense point objects at the network output layer, as shown in FIG. 2; the data acquisition and preprocessing steps are shown in FIG. 5. In this embodiment, the target objects contained in each image are expressed as dense point objects: a tensor of size [B, H, W] is output at each scale, where B is the batch size and H, W are the height and width of that scale; if and only if the detection box of an object contains certain points of the tensor, those points are encoded as point objects of that object, as shown in FIG. 2. Specifically, each point object is encoded as follows:
S11, the object class is encoded as a one-hot vector of length C;
S12, the object confidence is encoded as the object centerness;
S13, the target detection box is encoded as a vector (L, R, T, B) of length 4, as shown in FIG. 2(b),
where C is the number of classification categories, and L, R, T, B denote the distances from the point object to the left, right, top and bottom edges of the corresponding target detection box.
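The encoding of one scale can be illustrated with a short sketch. The snippet below is only an illustration under assumptions (NumPy, per-image processing, and the helper name encode_point_objects are not part of the patent); it encodes the ground-truth boxes as per-location one-hot class vectors, a centerness value, and (L, R, T, B) distances, with the centerness assumed to take the FCOS-style form noted later in the text.

import numpy as np

def encode_point_objects(boxes, labels, feat_h, feat_w, stride, num_classes):
    # Hedged sketch: encode the ground-truth boxes of one image as dense point objects
    # on a single output scale.  boxes is (N, 4) in (x1, y1, x2, y2) image coordinates,
    # labels is (N,) with class indices in [0, num_classes).
    cls_map = np.zeros((feat_h, feat_w, num_classes), dtype=np.float32)  # one-hot classes
    ctr_map = np.zeros((feat_h, feat_w), dtype=np.float32)               # object centerness
    reg_map = np.zeros((feat_h, feat_w, 4), dtype=np.float32)            # (L, R, T, B)

    # centers of the feature-map cells mapped back to image coordinates
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    cx = (xs + 0.5) * stride
    cy = (ys + 0.5) * stride

    for (x1, y1, x2, y2), c in zip(boxes, labels):
        inside = (cx >= x1) & (cx <= x2) & (cy >= y1) & (cy <= y2)
        L, R = cx - x1, x2 - cx            # distances to the left / right box edges
        T, B = cy - y1, y2 - cy            # distances to the top / bottom box edges
        # centerness (assumed FCOS-style): 1 at the box center, smaller towards the edges
        eps = 1e-9
        ctr = np.sqrt((np.minimum(L, R) / (np.maximum(L, R) + eps))
                      * (np.minimum(T, B) / (np.maximum(T, B) + eps)))
        cls_map[inside, c] = 1.0
        ctr_map[inside] = ctr[inside]
        reg_map[inside] = np.stack([L, R, T, B], axis=-1)[inside]
    return cls_map, ctr_map, reg_map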
S2, construct the complete instance segmentation network model based on a one-stage target detection framework. As shown in the training stage of FIG. 1, basic feature extraction of the original image is completed by the backbone network, the basic features are further mined by the main-body feature extraction module, and the final outputs of the network model are obtained through the detection layers and the segmentation layer. The complete network structure is shown in the training stage of FIG. 1. The deep network model designed by the invention consists of a backbone network, a main-body feature extraction module, several target detection modules and a semantic segmentation module; the four parts are as follows:
s21, a backbone network for assisting in completing basic feature extraction, which includes a network input layer for receiving image tensor input and five stages of down-sampling feature extraction layers, in this embodiment, ResNet50-C4 is selected as the backbone network for assisting in completing basic feature extraction, the feature extraction capability of this part has been verified in an image recognition model, and its structure is roughly as shown in fig. 1, which includes a network input layer and 5 stages of feature layers, and fig. 1 omits the feature C1 layer without any modification. As shown in the bottom layer of the backbone network portion of fig. 1, the network model input is an image pixel matrix with an image size of 512 x 512.
S22, the main-body feature extraction module further mines the basic features and consists mainly of three parts: lateral connections from the basic feature layers to the feature extraction layers, bottom-up down-sampling from lower to higher feature layers, and top-down up-sampling from higher to lower feature layers. In this embodiment, the three parts of the module are as follows:
1) The first part is the lateral connection from a basic feature layer to a feature extraction layer, which consists of a batch normalization layer and a 1 x 1 convolution preceded by an activation layer, without any down-sampling or up-sampling, as shown in the lateral-connection portion of FIG. 3. In this embodiment this connection is used in four places, namely C2 to P2, C3 to P3, C4 to P4 and C5 to P5 in FIG. 1.
2) The second part is the bottom-up connection from a lower feature layer to a higher feature layer, as shown in the down-sampling portion of FIG. 3; it consists of a batch normalization layer and a 3 x 3 convolution preceded by an activation layer, where the convolution performs the down-sampling. In this embodiment this connection is used in two places, namely P5 to P6 and P6 to P7 in FIG. 1.
3) The third part is the top-down connection from a higher feature layer to a lower feature layer, as shown in the up-sampling portion of FIG. 3; it is a concatenation layer that concatenates the up-sampled layer with the lateral connection. In this embodiment this connection is used in three places, namely P5 to P4, P4 to P3 and P3 to P2 in FIG. 1.
Together, the three parts of the module finally output one feature tensor for each scale.
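A compact sketch of this module is given below as an illustration only: PyTorch, the ReLU activation, the channel counts, the nearest-neighbour up-sampling, and the 1 x 1 convolution used after each concatenation to keep the channel count fixed are all assumptions not fixed by the text, and the ordering of activation, batch normalization and convolution in each connection follows one possible reading of the description.

import torch
import torch.nn as nn
import torch.nn.functional as F

def act_bn_conv(in_ch, out_ch, kernel, stride):
    # activation, then batch norm, then convolution; the exact ordering is an assumption
    return nn.Sequential(
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(in_ch),
        nn.Conv2d(in_ch, out_ch, kernel_size=kernel, stride=stride, padding=kernel // 2),
    )

class BodyFeatureModule(nn.Module):
    # Hedged sketch of the main-body feature extraction module: lateral 1x1 connections from
    # C2-C5, bottom-up 3x3 stride-2 connections producing P6/P7, and top-down
    # upsample-and-concatenate connections refining P4-P2.  Channel counts are assumptions.
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_ch=256):
        super().__init__()
        self.lateral = nn.ModuleList(act_bn_conv(c, out_ch, 1, 1) for c in in_channels)
        self.down = nn.ModuleList(act_bn_conv(out_ch, out_ch, 3, 2) for _ in range(2))
        self.merge = nn.ModuleList(nn.Conv2d(2 * out_ch, out_ch, 1) for _ in range(3))

    def forward(self, c2, c3, c4, c5):
        p2, p3, p4, p5 = (lat(c) for lat, c in zip(self.lateral, (c2, c3, c4, c5)))
        p6 = self.down[0](p5)                      # bottom-up: P5 -> P6
        p7 = self.down[1](p6)                      # bottom-up: P6 -> P7
        # top-down: upsample and concatenate with the lateral connection
        p4 = self.merge[0](torch.cat([F.interpolate(p5, scale_factor=2), p4], dim=1))
        p3 = self.merge[1](torch.cat([F.interpolate(p4, scale_factor=2), p3], dim=1))
        p2 = self.merge[2](torch.cat([F.interpolate(p3, scale_factor=2), p2], dim=1))
        return p2, p3, p4, p5, p6, p7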
S23, the target detection module produces the target detection output. The third part of the model, the detection layers, performs detection-specific feature selection on the features generated by the second part, the main-body feature layers, and finally outputs two result branches. As shown in FIG. 1, the features of the five levels P3, P4, P5, P6 and P7 go through the detection layers to generate the corresponding target detection outputs. In this embodiment the five detection modules are each responsible for detecting objects of a different size, with the detection modules generated from P3, P4, P5, P6 and P7 responsible for the size ranges [0,16], [16,32], [32,64], [64,128] and [128,512] respectively, which helps the model detect objects of different sizes. Each detection module contains two branches, both of which receive the feature tensor extracted by the main-body feature extraction module and pass it through four convolutional layers with kernel size 3 and down-sampling scale 1; the two branches are a classification branch and a regression branch. In the classification branch, the feature tensor passes through convolutional branches with C output channels and 1 output channel respectively to give two outputs, the classification probabilities and the object confidence, as shown in the classification branch of FIG. 4. The regression branch performs regression prediction of the box: the feature tensor passes through a convolutional branch with 4 output channels to give the target detection regression result, as shown in the regression branch of FIG. 4; the four box position parameters output by the regression branch are (L, R, T, B), and the final convolution result is activated with the ReLU function. The two outputs of the classification branch have a value range of [0,1], and their final convolution results are activated with the sigmoid function. The object confidence is expressed as the target centerness, which represents the degree to which a point object lies at the center of its object.
The centerness expression is reproduced as image BDA0002431967590000061 in the original publication. This representation effectively filters out low-quality detections, and its value range is [0, 1]. In particular, the bias variables of the final convolution kernels of this module are all initialized to the constant 0. This module is shown in FIG. 4.
The fourth part of the model, the semantic segmentation module, produces the semantic segmentation output. It receives only the lowest-level feature tensor of the main-body feature extraction module as input and obtains a per-class semantic segmentation mask through four convolutional layers with kernel size 3 and down-sampling scale 1 followed by a convolutional layer with C output channels; the result matrix has size [B, 128, 128, C], and the final convolution result is activated with the sigmoid function, where B is the batch size and C is the number of categories. In this embodiment B equals 12 and C equals 80 (using the MS COCO dataset).
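The two output heads described in S23 and S24 can be sketched as follows. This is an illustration under assumptions (PyTorch, ReLU between the four 3 x 3 convolutions, 3 x 3 kernels for the final output convolutions, and a separate convolution tower per branch), not the patented implementation itself.

import torch
import torch.nn as nn

def conv_tower(channels, n=4):
    # four 3 x 3 convolutions at stride 1, as described; the ReLU in between is an assumption
    layers = []
    for _ in range(n):
        layers += [nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class DetectionHead(nn.Module):
    # One per-scale detection module: a regression branch (4 channels, ReLU-activated) and a
    # classification branch (C class channels plus 1 confidence channel, sigmoid-activated).
    def __init__(self, channels, num_classes):
        super().__init__()
        self.reg_tower = conv_tower(channels)
        self.cls_tower = conv_tower(channels)
        self.reg = nn.Conv2d(channels, 4, kernel_size=3, padding=1)
        self.cls = nn.Conv2d(channels, num_classes, kernel_size=3, padding=1)
        self.ctr = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        for conv in (self.reg, self.cls, self.ctr):   # final-layer biases initialized to 0, as stated
            nn.init.constant_(conv.bias, 0.0)

    def forward(self, x):
        ltrb = torch.relu(self.reg(self.reg_tower(x)))   # (L, R, T, B) regression, non-negative
        f = self.cls_tower(x)
        return ltrb, torch.sigmoid(self.cls(f)), torch.sigmoid(self.ctr(f))

class SegmentationHead(nn.Module):
    # Semantic segmentation module: a four-conv tower followed by a C-channel output, sigmoid-activated.
    def __init__(self, channels, num_classes):
        super().__init__()
        self.tower = conv_tower(channels)
        self.out = nn.Conv2d(channels, num_classes, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.sigmoid(self.out(self.tower(x)))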
S3, design a multi-task loss function adapted to the instance segmentation task for deep-learning training. The multi-task loss function designed by the invention consists of the loss functions for the target detection box, the object confidence, the object classification and the object segmentation mask.
In this embodiment, both the labels and the outputs of the target detection box use the (L, R, T, B) representation, which is more intuitive and more efficient for computing the intersection-over-union than the traditional (X, Y, W, H) representation; the target detection box part of the multi-task loss function therefore adopts an intersection-over-union (IoU) loss function.
S32, the object confidence part adopts a focal loss function adapted to the object centerness, which improves the accuracy of the target detection task. In this embodiment the improved focal loss is defined by an expression (reproduced as image BDA0002431967590000071 in the original publication) built from the following terms: y_t is the labeled ground-truth value; an indicator function equals 1 where y_t is greater than 0 and 0 elsewhere; α_t is a fraction in [0, 0.5] whose weighting makes the loss of the majority samples (those with y_t equal to 0, i.e. locations with no object label) lower than that of the minority samples (those with y_t greater than 0, i.e. locations carrying an object label); γ is a number greater than 0, and the modulating factor |y_t - p_t|^γ makes the loss of easily predicted samples (small |y_t - p_t|) lower than that of hard samples (large |y_t - p_t|); p_t is the predicted classification probability of the corresponding output.
S33, the object classification part adopts a focal loss function adapted to the multi-class task; the improved focal loss improves the classification accuracy for classes with few samples in the dataset. In this embodiment it is defined by an expression (reproduced as image BDA0002431967590000074 in the original publication) built from the following terms: N_t is the number of objects of the class in the dataset, and the corresponding weighting makes the loss of majority-class samples lower than that of minority-class samples; y_t is the labeled ground-truth class; γ is a number greater than 0 with the same meaning as in the improved binary focal loss; p_t is the predicted classification probability of the corresponding class.
S34, in this embodiment, the semantic segmentation part employs a cross-entropy loss function.
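Because the improved focal loss expressions appear only as images in the published text, the sketch below gives plausible forms assembled from the properties described above (the α_t weighting, the |y_t - p_t|^γ modulation, and the per-class N_t weighting). PyTorch, the function names, and the exact arrangement of the terms are assumptions, not the patented formulas.

import torch
import torch.nn.functional as F

def centerness_focal_loss(pred, target, alpha=0.25, gamma=2.0):
    # pred and target are per-location centerness values in [0, 1]; target == 0 means "no object".
    pos = (target > 0).float()                              # indicator 1[y_t > 0]
    alpha_t = alpha * (1.0 - pos) + (1.0 - alpha) * pos     # no-object locations weighted by alpha <= 0.5
    modulator = (target - pred).abs().pow(gamma)            # down-weight easy samples (small |y_t - p_t|)
    bce = F.binary_cross_entropy(pred, target, reduction="none")
    return (alpha_t * modulator * bce).sum() / pos.sum().clamp(min=1.0)

def class_balanced_focal_loss(pred, target, class_counts, gamma=2.0):
    # pred and target have shape (..., C); class_counts[c] stands for N_t, the number of
    # objects of class c in the dataset.
    weight = 1.0 / class_counts.float().clamp(min=1.0)      # classes with few samples get larger weight
    weight = weight * (len(class_counts) / weight.sum())    # normalise so the weights average to 1
    modulator = (target - pred).abs().pow(gamma)
    bce = F.binary_cross_entropy(pred, target, reduction="none")
    return (weight * modulator * bce).sum() / target.sum().clamp(min=1.0)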
During neural network training, the network parameters are optimized with stochastic gradient descent. In this embodiment the initial learning rate is 0.01, the momentum is 0.9, the weight decay is 0.0005, the batch size is 12, and training runs for 300 epochs of 500 batches each; the learning rate is decayed at epochs 180, 240 and 280, each time by a factor of 0.1.
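As an illustration of the schedule just described, a training loop might look like the sketch below; PyTorch and the assumption that the model returns the multi-task loss when called with images and targets are used only for illustration.

import torch

def train(model, loader, epochs=300, iters_per_epoch=500):
    # Hedged sketch of the optimisation schedule described above.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                                momentum=0.9, weight_decay=0.0005)
    # learning rate multiplied by 0.1 at epochs 180, 240 and 280 of 300
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                     milestones=[180, 240, 280], gamma=0.1)
    for epoch in range(epochs):
        it = iter(loader)
        for _ in range(iters_per_epoch):          # 500 batches per epoch, batch size 12
            images, targets = next(it)
            loss = model(images, targets)         # assumed to return the combined multi-task loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()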
S4, inference stage: the model outputs are processed to obtain the final result by combining the target detection results with the semantic segmentation result. The deep network model outputs target detection results at five different scales and one semantic segmentation result; the target detection results are first merged, the Top-K results are then screened out, and finally a non-maximum segmentation screening method combining the target detection results with the semantic segmentation result is used to obtain the final instance segmentation result, as shown in the inference stage of FIG. 1. The specific steps are as follows.
the output of the semantic segmentation module can be directly used as a segmentation mask result, and the output of the target detection module is processed by the following steps of firstly obtaining the output of 5 detection modules, obtaining all target detection results, wherein the position parameters (L, R, T, B) of a target detection frame need to be decoded into a traditional representation mode of (X, Y, X, Y), multiplying the downsampling scale corresponding to each output to restore the size corresponding to the original image, finally combining all detection outputs to obtain an initial detection result, simultaneously directly using the output of the semantic segmentation module as the segmentation mask result, then sequencing the obtained initial detection results according to object scores, selecting Top K results, and screening out the result of which the object score is more than 0.5.
After obtaining the post-processed detection results and segmentation result, they are screened with a modified non-maximum suppression (NMS) method, the non-maximum segmentation screening method; its algorithm steps are as follows (a code sketch is given after the steps):
1) sort the detection results in descending order of their object scores;
2) select the detection m with the highest score, add it to the final detection result list and delete it from the original list; according to its detected class and detection box, obtain the corresponding per-pixel segmentation result from the semantic segmentation output and use it as the candidate instance segmentation set of m, storing the set as key-value pairs of object score and per-pixel segmentation result;
3) compute the intersection-over-union between m and the remaining detections, find all results whose IoU exceeds 0.5, delete them from the detection result list, and add their object scores and segmentation results to the candidate instance segmentation set of m;
4) linearly combine all candidate instance segmentation results of m to obtain the final instance segmentation result of m;
5) repeat 2)-4) until the detection result list is empty, then terminate the algorithm and return the final detection result list.
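A sketch of these steps in code is given below. NumPy, the helper names, and the score-weighted average used for the linear combination in step 4) are assumptions; the patent does not fix the combination weights.

import numpy as np

def box_iou(a, b):
    # IoU between one box a = (x1, y1, x2, y2) and an array of boxes b with shape (N, 4)
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def crop_mask(sem_seg, box, cls):
    # per-pixel segmentation of class `cls` restricted to the detection box
    x1, y1, x2, y2 = box.astype(int)
    m = np.zeros(sem_seg.shape[1:], dtype=np.float32)
    m[y1:y2, x1:x2] = sem_seg[cls, y1:y2, x1:x2]
    return m

def segmentation_nms(boxes, scores, classes, sem_seg, iou_thresh=0.5):
    # boxes: (N, 4), scores: (N,), classes: (N,), sem_seg: (C, H, W) semantic segmentation output
    order = list(np.argsort(-scores))                 # 1) sort by object score, descending
    results = []
    while order:
        i = order.pop(0)                              # 2) highest-scoring remaining detection m
        candidates = [(scores[i], crop_mask(sem_seg, boxes[i], classes[i]))]
        if order:
            ious = box_iou(boxes[i], boxes[order])
            overlapped = [j for j, iou in zip(order, ious) if iou > iou_thresh]
            order = [j for j in order if j not in overlapped]
            for j in overlapped:                      # 3) absorb heavily overlapping detections
                candidates.append((scores[j], crop_mask(sem_seg, boxes[j], classes[j])))
        weights = np.array([s for s, _ in candidates])
        weights = weights / weights.sum()             # 4) linear combination, score-weighted average assumed
        mask = sum(w * m for w, (_, m) in zip(weights, candidates))
        results.append({"score": scores[i], "class": classes[i],
                        "box": boxes[i], "mask": mask})
    return results                                    # 5) final detection result list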
The final detection result list contains, for each detected object, its object score, class, target detection box and segmentation mask, and can be used as the final output of the instance segmentation task.
In summary, the instance segmentation method balances detection quality and detection speed to a certain extent while making only minor modifications to an existing, mature one-stage target detection framework; the model is simple yet efficient and practical.
The above embodiment is one with good experimental results, but the present invention is not limited to this implementation or form; any modification of the technical scheme of the invention, such as simple alteration, simplification, combination or replacement, falls within the protection scope of the invention.

Claims (6)

1. An instance segmentation method based on a one-stage target detection framework, characterized by comprising the following steps:
S1, labeling the target objects in the training images as dense point objects at the network output layer;
S2, constructing a complete instance segmentation network model based on a one-stage target detection framework;
S3, designing a multi-task loss function adapted to the instance segmentation task for deep-learning training;
S4, in the inference stage, combining the target detection results and the semantic segmentation result to obtain the final instance segmentation result.
2. The instance segmentation method based on a one-stage target detection framework of claim 1, wherein the multi-task loss function in step S3 consists of four parts, namely the target detection box loss, the object confidence loss, the object classification loss and the semantic segmentation loss, specifically as follows:
S31, the target detection box part adopts an intersection-over-union (IoU) loss function;
S32, the object confidence part adopts a focal loss function adapted to the object centerness (its expression is reproduced as image FDA0002431967580000011 in the original publication);
S33, the object classification part adopts a focal loss function adapted to the multi-class task (its expression is reproduced as image FDA0002431967580000012 in the original publication);
S34, the semantic segmentation part adopts a cross-entropy loss function,
where y_t is the labeled ground-truth value; an indicator function equals 1 where y_t is greater than 0 and 0 elsewhere; α_t is a fraction in [0, 0.5]; γ is a number greater than 0; p_t is the predicted classification probability of the corresponding class; and N_t is the number of objects of that class in the dataset.
3. The instance segmentation method based on a one-stage target detection framework of claim 2, wherein in step S1 the target objects contained in each training image are expressed as dense point objects: a tensor of size [B, H, W] is output at each scale, where B is the batch size and H, W are the height and width of that scale; if and only if the detection box of an object contains certain points of the tensor, those points are encoded as point objects of that object, in the following manner:
S11, the object class is encoded as a one-hot vector of length C;
S12, the object confidence is encoded as the object centerness;
S13, the target detection box is encoded as a vector (L, R, T, B) of length 4;
where C is the number of classification categories, and L, R, T, B denote the distances from the point object to the left, right, top and bottom edges of the corresponding target detection box.
4. The instance segmentation method based on a one-stage target detection framework of claim 1, wherein in step S2 the instance segmentation network model based on the one-stage target detection framework comprises a backbone network, a main-body feature extraction module, a target detection module and a semantic segmentation module, specifically as follows:
S21, the backbone network assists in completing basic feature extraction and comprises a network input layer that receives the image tensor and five stages of down-sampling feature extraction layers;
S22, the main-body feature extraction module further mines the basic features; it comprises three parts, namely lateral connections from the basic feature layers to the feature extraction layers, bottom-up down-sampling from lower to higher feature layers, and top-down up-sampling from higher to lower feature layers, and finally outputs one feature tensor per scale;
S23, the target detection module produces the target detection output; five detection modules at different scales are each responsible for detecting objects of a different size range; each detection module contains two branches, both of which receive the feature tensor extracted by the main-body feature extraction module and pass it through four convolutional layers with kernel size 3 and down-sampling scale 1; one branch is the regression branch, in which the feature tensor passes through a convolutional branch with 4 output channels to give the target detection regression result, and the other is the classification branch, in which the feature tensor passes through convolutional branches with C output channels and 1 output channel respectively to give the classification probabilities and the object confidence;
S24, the semantic segmentation module produces the semantic segmentation output; it receives only the bottom-level feature tensor of the main-body feature extraction module as input and obtains the per-class semantic segmentation result through four convolutional layers with kernel size 3 and down-sampling scale 1 followed by a convolutional layer with C output channels.
5. The instance segmentation method based on a one-stage target detection framework of claim 4, wherein the instance segmentation network model outputs target detection results at five different scales and a semantic segmentation result at one scale; at the inference stage the target detection results are merged, the Top-K results are screened out, and finally a non-maximum segmentation screening method combining the target detection results with the semantic segmentation result is used to obtain the final instance segmentation result.
6. The instance segmentation method based on a one-stage target detection framework of claim 5, wherein the algorithm steps of the non-maximum segmentation screening method are as follows:
1) sort the detection results in descending order of their object scores;
2) select the detection m with the highest score, add it to the final detection result list and delete it from the original list; according to its detected class and detection box, obtain the corresponding per-pixel segmentation result from the semantic segmentation output and use it as the candidate instance segmentation set of m, storing the set as key-value pairs of object score and per-pixel segmentation result;
3) compute the intersection-over-union between m and the remaining detections, find all results whose IoU exceeds 0.5, delete them from the detection result list, and add their object scores and segmentation results to the candidate instance segmentation set of m;
4) linearly combine all candidate instance segmentation results of m to obtain the final instance segmentation result of m;
5) repeat 2)-4) until the detection result list is empty, then terminate the algorithm and return the final detection result list.
CN202010239127.4A 2020-03-30 2020-03-30 Instance segmentation method based on one-stage target detection framework Active CN111461127B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010239127.4A CN111461127B (en) 2020-03-30 2020-03-30 Instance segmentation method based on one-stage target detection framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010239127.4A CN111461127B (en) 2020-03-30 2020-03-30 Instance segmentation method based on one-stage target detection framework

Publications (2)

Publication Number Publication Date
CN111461127A true CN111461127A (en) 2020-07-28
CN111461127B CN111461127B (en) 2023-06-06

Family

ID=71679336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010239127.4A Active CN111461127B (en) 2020-03-30 2020-03-30 Instance segmentation method based on one-stage target detection framework

Country Status (1)

Country Link
CN (1) CN111461127B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016559A (en) * 2020-08-26 2020-12-01 北京推想科技有限公司 Example segmentation model training method and device and image processing method and device
CN112036555A (en) * 2020-11-05 2020-12-04 北京亮亮视野科技有限公司 Method and device for optimizing target detection framework, storage medium and electronic equipment
CN112102250A (en) * 2020-08-20 2020-12-18 西北大学 Method for establishing and detecting pathological image detection model with training data as missing label
CN112507777A (en) * 2020-10-10 2021-03-16 厦门大学 Optical remote sensing image ship detection and segmentation method based on deep learning
CN112508029A (en) * 2020-12-03 2021-03-16 苏州科本信息技术有限公司 Instance segmentation method based on target box labeling
CN112580646A (en) * 2020-12-08 2021-03-30 北京农业智能装备技术研究中心 Tomato fruit maturity dividing method and picking robot
CN112766046A (en) * 2020-12-28 2021-05-07 深圳市捷顺科技实业股份有限公司 Target detection method and related device
CN112836615A (en) * 2021-01-26 2021-05-25 西南交通大学 Remote sensing image multi-scale solid waste detection method based on deep learning and global reasoning
CN113673505A (en) * 2021-06-29 2021-11-19 北京旷视科技有限公司 Example segmentation model training method, device and system and storage medium
CN113762190A (en) * 2021-09-15 2021-12-07 中科微至智能制造科技江苏股份有限公司 Neural network-based parcel stacking detection method and device
CN114663724A (en) * 2022-03-21 2022-06-24 国网江苏省电力有限公司南通供电分公司 Intelligent identification method and system for kite string image
CN117152422A (en) * 2023-10-31 2023-12-01 国网湖北省电力有限公司超高压公司 Ultraviolet image anchor-free frame target detection method, storage medium and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985269A (en) * 2018-08-16 2018-12-11 东南大学 Converged network driving environment sensor model based on convolution sum cavity convolutional coding structure
CN110084234A (en) * 2019-03-27 2019-08-02 东南大学 A kind of sonar image target identification method of Case-based Reasoning segmentation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985269A (en) * 2018-08-16 2018-12-11 东南大学 Converged network driving environment sensor model based on convolution sum cavity convolutional coding structure
CN110084234A (en) * 2019-03-27 2019-08-02 东南大学 A kind of sonar image target identification method of Case-based Reasoning segmentation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
罗晖; 芦春雨; 郑翔文: "A semantic segmentation network based on multi-scale corner detection" *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102250B (en) * 2020-08-20 2022-11-04 西北大学 Method for establishing and detecting pathological image detection model with training data as missing label
CN112102250A (en) * 2020-08-20 2020-12-18 西北大学 Method for establishing and detecting pathological image detection model with training data as missing label
CN112016559A (en) * 2020-08-26 2020-12-01 北京推想科技有限公司 Example segmentation model training method and device and image processing method and device
CN112507777A (en) * 2020-10-10 2021-03-16 厦门大学 Optical remote sensing image ship detection and segmentation method based on deep learning
CN112036555B (en) * 2020-11-05 2021-02-05 北京亮亮视野科技有限公司 Method and device for optimizing target detection framework, storage medium and electronic equipment
CN112036555A (en) * 2020-11-05 2020-12-04 北京亮亮视野科技有限公司 Method and device for optimizing target detection framework, storage medium and electronic equipment
CN112508029A (en) * 2020-12-03 2021-03-16 苏州科本信息技术有限公司 Instance segmentation method based on target box labeling
CN112580646A (en) * 2020-12-08 2021-03-30 北京农业智能装备技术研究中心 Tomato fruit maturity dividing method and picking robot
CN112766046A (en) * 2020-12-28 2021-05-07 深圳市捷顺科技实业股份有限公司 Target detection method and related device
CN112766046B (en) * 2020-12-28 2024-05-10 深圳市捷顺科技实业股份有限公司 Target detection method and related device
CN112836615A (en) * 2021-01-26 2021-05-25 西南交通大学 Remote sensing image multi-scale solid waste detection method based on deep learning and global reasoning
CN113673505A (en) * 2021-06-29 2021-11-19 北京旷视科技有限公司 Example segmentation model training method, device and system and storage medium
CN113762190A (en) * 2021-09-15 2021-12-07 中科微至智能制造科技江苏股份有限公司 Neural network-based parcel stacking detection method and device
CN114663724A (en) * 2022-03-21 2022-06-24 国网江苏省电力有限公司南通供电分公司 Intelligent identification method and system for kite string image
CN117152422A (en) * 2023-10-31 2023-12-01 国网湖北省电力有限公司超高压公司 Ultraviolet image anchor-free frame target detection method, storage medium and electronic equipment
CN117152422B (en) * 2023-10-31 2024-02-13 国网湖北省电力有限公司超高压公司 Ultraviolet image anchor-free frame target detection method, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN111461127B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
CN111461127B (en) Instance segmentation method based on one-stage target detection framework
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN111428718B (en) Natural scene text recognition method based on image enhancement
CN112966684B (en) Cooperative learning character recognition method under attention mechanism
CN111210443A (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
CN110197182A (en) Remote sensing image semantic segmentation method based on contextual information and attention mechanism
CN108062756A (en) Image, semantic dividing method based on the full convolutional network of depth and condition random field
CN110322495A (en) A kind of scene text dividing method based on Weakly supervised deep learning
CN110070091B (en) Semantic segmentation method and system based on dynamic interpolation reconstruction and used for street view understanding
CN110674305A (en) Deep feature fusion model-based commodity information classification method
CN110826609B (en) Double-current feature fusion image identification method based on reinforcement learning
CN111310766A (en) License plate identification method based on coding and decoding and two-dimensional attention mechanism
CN113569865A (en) Single sample image segmentation method based on class prototype learning
CN112733590A (en) Pedestrian re-identification method based on second-order mixed attention
CN116645592B (en) Crack detection method based on image processing and storage medium
CN114092815B (en) Remote sensing intelligent extraction method for large-range photovoltaic power generation facility
CN112733942A (en) Variable-scale target detection method based on multi-stage feature adaptive fusion
CN114821022A (en) Credible target detection method integrating subjective logic and uncertainty distribution modeling
Sethy et al. Off-line Odia handwritten numeral recognition using neural network: a comparative analysis
CN114373092A (en) Progressive training fine-grained vision classification method based on jigsaw arrangement learning
CN116844143B (en) Embryo development stage prediction and quality assessment system based on edge enhancement
Li A deep learning-based text detection and recognition approach for natural scenes
CN116797821A (en) Generalized zero sample image classification method based on fusion visual information
CN110222222A (en) Based on deep layer theme from the multi-modal retrieval method of encoding model
CN110070018A (en) A kind of earthquake disaster scene recognition method of combination deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant