CN115019039B - Instance segmentation method and system combining self-supervision and global information enhancement - Google Patents

Instance segmentation method and system combining self-supervision and global information enhancement

Info

Publication number: CN115019039B
Authority: CN (China)
Prior art keywords: instance, network, supervision, global information, self
Legal status: Active (granted)
Application number: CN202210582668.6A
Other languages: Chinese (zh)
Other versions: CN115019039A (en)
Inventors: 高榕 (Gao Rong), 沈加伟 (Shen Jiawei), 邵雄凯 (Shao Xiongkai)
Assignee: Hubei University of Technology
Application filed 2022-05-26 by Hubei University of Technology; priority to CN202210582668.6A
Publication of application CN115019039A: 2022-09-06; grant of CN115019039B: 2024-04-16

Classifications

    • G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06N3/02: Neural networks; G06N3/08: Learning methods
    • G06V10/764: Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/806: Fusion of extracted features, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82: Image or video recognition using neural networks


Abstract

The invention discloses an instance segmentation method and system combining self-supervision and global information enhancement. A feature extraction network based on a ResNet backbone and an FPN module first builds a feature pyramid and fuses the feature maps; a Fastformer-based global information enhancement network then models the interactions among the pixels of the feature maps and extracts global information. Instance segmentation is performed by a prediction network: a category prediction network performs multi-label classification of the instances of interest, and a mask prediction network classifies the pixel values of the region where each instance is located to generate the instance mask. In addition, a self-supervised learning network performs contrastive learning among the instances in a picture, strengthening the model's understanding of the picture and its generalization. The method addresses the poor detection of occluded and incomplete objects, strengthens the generalization ability of the model, and improves segmentation performance in noisy scenes.

Description

Instance segmentation method and system combining self-supervision and global information enhancement
Technical Field
The invention relates to the technical field of artificial intelligence and computer vision, in particular to an instance segmentation method and system combining self-supervision and global information enhancement.
Background
Instance segmentation is a more challenging task than object detection in computer vision, since it combines object detection and semantic segmentation: objects of interest in an image are first located and classified, and each instance is then segmented semantically to separate foreground from background. With the rapid development of intelligent driving, medical image segmentation, and related technologies, higher demands are placed on the accuracy and real-time performance of instance segmentation algorithms. However, both the conventional top-down methods based on object detection and the bottom-up methods based on semantic segmentation still struggle to meet the real-time and accuracy requirements of fields such as intelligent driving.
Enhancing the performance of instance segmentation while shortening forward inference time is therefore of great significance. In recent years, several strong single-stage instance segmentation algorithms have been proposed that alleviate these problems and achieve reasonably good results. Nevertheless, these algorithms still suffer from drawbacks: convolution-based feature extraction networks lack global information, so detection of incomplete or occluded objects is poor; in addition, purely supervised training yields models with weak generalization, and performance degrades in high-noise scenes.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide an instance segmentation method and system combining self-supervision and global information enhancement, so as to solve the problems that existing instance segmentation methods lack global information in the feature extraction stage, generalize poorly, and segment badly in noisy scenes.
To achieve the above object, the present invention provides an instance segmentation method and system combining self-supervision and global information enhancement, comprising:
Step S1: establishing an instance segmentation model;
the instance segmentation model comprises a feature extraction network, a global information enhancement network, a self-supervised learning network, a category prediction network, and a mask prediction network;
The feature extraction network comprises a ResNet network and an FPN network. ResNet obtains a feature pyramid by stacking convolutional layers, ReLU layers, and normalization layers with residual connections; the FPN performs feature fusion by combining the rich semantic information of the upper-level feature maps with the accurate positional information of the lower-level feature maps in the pyramid (an illustrative sketch of this fusion follows the component list below);
The global information enhancement network is composed of Fastformer modules and models the interactions among the pixels of the feature map, extracting context information and enhancing the global information of the feature map;
The self-supervised learning network performs contrastive learning among the instances in a picture, strengthening the understanding of the picture and the generalization ability of the model;
The category prediction network performs multi-label classification of the instances of interest to obtain the category corresponding to each instance;
The mask prediction network performs binary classification of the pixels in the selected instance region, distinguishing foreground from background and generating the mask of the instance.
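The following is a minimal PyTorch sketch of the ResNet + FPN top-down fusion described above. It is illustrative only: the module name SimpleFPN, the channel sizes, and the nearest-neighbor upsampling are assumptions rather than the patented implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Top-down FPN fusion: 1x1 lateral convs align channels, and each
    upsampled higher-level map is added to the lower-level one."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, feats):  # feats: C2..C5 from ResNet-50, low to high level
        laterals = [lat(f) for lat, f in zip(self.laterals, feats)]
        for i in range(len(laterals) - 1, 0, -1):  # top-down pathway
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]  # fused P2..P5
```

Here the four input channel counts match the C2–C5 stages of a standard ResNet-50, and the 3×3 smoothing convolutions reduce the aliasing introduced by upsampling.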
Step S2: training the instance segmentation model;
The selected training data set, comprising picture data and corresponding label files, is input. Feature maps are first extracted and then fused; global information is then enhanced and the result is input to the prediction networks. A loss function is obtained by comparison with the label files, and back-propagation of the loss function guides the direction of model training.
Step S3: instance segmentation
The picture is divided into S×S grids, each of which is responsible for predicting the instance whose center point falls in that cell; that is, taking the grid cell as the center, the category and mask of the corresponding instance are predicted.
Optionally, the feature extraction network comprises ResNet-50 and FPN networks.
Further, the global information enhancement module is a Fastformer network based on additive attention.
The additive attention applies linear transformations to the input feature sequence E ∈ R^(N×d) (N is the sequence length and d is the hidden dimension) to obtain a query matrix, a key matrix, and a value matrix, denoted Q, K, V ∈ R^(N×d).
Additive attention over the query matrix Q generates attention weights, and the weighted sum of Q yields a global query vector. The global query vector is then multiplied element-wise with the key vectors in K, modeling their interrelationships.
Further, the same operation generates a global key vector, which is interactively modeled with the value vectors V; finally, feature vectors containing rich global semantic information are obtained.
The self-supervised learning network first obtains the feature representations of all instances using the bounding-box label information; for a randomly selected sample instance A, the remaining instances serve as a candidate pool, and similarity scores between A and the candidate pool are calculated.
Optionally, the similarity score is calculated as follows:
Further, the instances are ranked by similarity score, the top-k are taken as the query set Q, and the query set is then used to mine pseudo-positive instances from the candidate pool.
The pseudo-positive mining process comprises the following steps:
(1) The similarity between each instance in Q and each instance in the candidate pool is calculated; each instance I of the candidate pool obtains N similarity scores (N is the number of instances in the query set Q).
(2) The similarity scores are aggregated and sorted; the top-k instances exceeding a threshold are taken as pseudo-positive instances and added to the query set Q.
(3) Pseudo-positive mining continues with the updated query set Q until the scores of newly mined candidates fall below the threshold. The query set is taken as the pseudo-positive set, and the remaining instances in the candidate pool as the negative set.
(4) A similarity score between sample A and each instance of the pseudo-positive set is obtained with a softmax function:
S(A, p_i) = exp(A · p_i) / (exp(A · p_i) + Σ_{j=1}^{N_n} exp(A · n_j))
where p_i is an instance of the pseudo-positive set, N_n is the number of negative samples, and n_j is an instance of the negative set.
Optionally, taking the negative logarithm of the similarity score yields the contrastive learning loss function:
L_con = −Σ_i log S(A, p_i)
Further, the category prediction network adopts the Focal loss, obtaining a loss function from the predicted probability that each instance belongs to a certain category.
The mask prediction network performs binary classification of the pixels in the selected instance region, distinguishing foreground from background and generating the mask of the instance.
Optionally, the mask prediction network loss function is:
L_mask = (1 / N_pos) Σ_k ψ(p*_{i,j} > 0) · d_mask(m_k, m*_k)
where N_pos is the number of positive samples, p*_{i,j} is the category score predicted by the grid cell at position (i, j), ψ is the indicator function, and m_k and m*_k denote the predicted and ground-truth masks.
Optionally, d_mask adopts the Dice loss:
L_Dice = 1 − D(p, q),  D(p, q) = 2 Σ_{x,y} (p_{x,y} · q_{x,y}) / (Σ_{x,y} p_{x,y}² + Σ_{x,y} q_{x,y}²)
where p_{x,y} denotes the pixel value predicted by the cell at (x, y) and q_{x,y} the ground-truth pixel value at (x, y).
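A hedged sketch of the mask loss above, assuming the SOLO-style combination of an indicator over positive grid cells with the Dice distance; the function names, the eps stabilizer, and the soft-mask inputs are illustrative assumptions.

```python
import torch

def dice_coefficient(p, q, eps=1e-6):
    # p: predicted soft mask, q: ground-truth mask, both (H, W) in [0, 1]
    inter = (p * q).sum()
    return 2 * inter / (p.pow(2).sum() + q.pow(2).sum() + eps)

def mask_loss(pred_masks, gt_masks, pos_indicator):
    """L_mask = (1 / N_pos) * sum_k psi(positive) * (1 - Dice(p_k, q_k))."""
    n_pos = pos_indicator.sum().clamp(min=1)          # avoid division by zero
    losses = torch.stack([
        (1 - dice_coefficient(p, q)) * ind
        for p, q, ind in zip(pred_masks, gt_masks, pos_indicator.float())
    ])
    return losses.sum() / n_pos
```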
Compared with the prior art, the technical scheme provided by the invention has the following beneficial effects:
(1) On top of a single-stage instance segmentation algorithm, the method adds a Fastformer module based on additive attention to model pixel-level global semantic information in the feature maps, improving the segmentation of occluded and incomplete objects.
(2) A self-supervised learning module is added to the prediction network; contrastive learning over all instances in a picture strengthens the model's understanding of the picture and enhances its generalization ability.
Drawings
FIG. 1 is a flow chart of the instance segmentation model provided by an embodiment of the present invention;
FIG. 2 is a framework diagram of the instance segmentation model provided by an embodiment of the present invention;
FIG. 3 is an image to be tested provided by an embodiment;
FIG. 4 (a) is the segmentation result obtained by the original single-stage instance segmentation method;
FIG. 4 (b) is the instance segmentation result obtained using the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the invention provides an instance segmentation method and system combining self-supervision and global information enhancement, comprising the following steps:
Step S1: establishing an instance segmentation model;
As shown in FIG. 1, the instance segmentation model includes a feature extraction network, a global information enhancement network, a self-supervised learning network, a category prediction network, and a mask prediction network;
The feature extraction network comprises a ResNet-50 network and an FPN network. ResNet obtains a four-level feature pyramid at different scales by stacking convolutional layers, ReLU layers, and normalization layers with residual connections; the FPN performs feature fusion by combining the rich semantic information of the upper-level feature maps with the accurate positional information of the lower-level feature maps in the pyramid;
The global information enhancement network is a Fastformer module that models the interactions among the pixels of the feature map, extracting context information and enhancing the global information of the feature map.
Linear transformations are applied to the input feature sequence E ∈ R^(N×d) (N is the sequence length and d is the hidden dimension) to obtain a query matrix, a key matrix, and a value matrix, denoted Q, K, V ∈ R^(N×d): Q = [q_1, q_2, ..., q_N], K = [k_1, k_2, ..., k_N], V = [v_1, v_2, ..., v_N].
Additive attention over the query matrix Q produces attention weights, and the weighted sum of Q yields the global query vector:
q = Σ_{i=1}^{N} α_i · q_i,  α_i = exp(w_q^T q_i / √d) / Σ_{j=1}^{N} exp(w_q^T q_j / √d)
where α_i is the attention weight of vector q_i in the query matrix Q and w_q ∈ R^d is a learnable parameter vector. The global query vector q is then multiplied element-wise with each key vector in K, modeling their interrelationships.
The same operation generates a global key vector, which is interactively modeled with the value vectors V; finally, feature vectors containing rich global semantic information are obtained.
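The additive attention just described can be sketched in PyTorch as follows, following the public Fastformer formulation; the single-head layout, the residual connection, and the parameter initialization are simplifying assumptions.

```python
import math
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Fastformer-style additive attention with linear complexity in N."""
    def __init__(self, d):
        super().__init__()
        self.q_proj = nn.Linear(d, d)
        self.k_proj = nn.Linear(d, d)
        self.v_proj = nn.Linear(d, d)
        self.w_q = nn.Parameter(torch.randn(d))   # learnable scoring vectors
        self.w_k = nn.Parameter(torch.randn(d))
        self.out = nn.Linear(d, d)
        self.scale = math.sqrt(d)

    def forward(self, e):               # e: (B, N, d) flattened pixel sequence
        Q, K, V = self.q_proj(e), self.k_proj(e), self.v_proj(e)
        alpha = torch.softmax(Q @ self.w_q / self.scale, dim=1)        # (B, N)
        q_global = (alpha.unsqueeze(-1) * Q).sum(dim=1, keepdim=True)  # (B,1,d)
        P = q_global * K                 # element-wise query-key interaction
        beta = torch.softmax(P @ self.w_k / self.scale, dim=1)
        k_global = (beta.unsqueeze(-1) * P).sum(dim=1, keepdim=True)
        U = k_global * V                 # element-wise key-value interaction
        return self.out(U) + Q           # residual to the query, as in Fastformer
```

To apply this to a feature map, its H×W spatial grid is flattened into a sequence of N = H·W pixel tokens before the attention and reshaped back afterwards.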
The self-supervised learning network performs contrastive learning among the instances in the picture, strengthening the understanding of the picture and the generalization ability of the model;
First, the feature representations of all instances are obtained using the bounding-box label information; for a randomly selected sample instance A, the remaining instances serve as a candidate pool, and similarity scores between A and the candidate pool are calculated as follows:
The instances are ranked by similarity score, the top-k are taken as the query set Q, and pseudo-positive instances are then mined from the candidate pool using the query set. The mining process comprises the following steps:
(1) The similarity between each instance in Q and each instance in the candidate pool is calculated; each instance I of the candidate pool obtains N similarity scores (N is the number of instances in the query set Q):
S(I, Q) = (S(I, q_1), S(I, q_2), ..., S(I, q_N))
(2) The similarity scores are aggregated and sorted; the top-k instances exceeding a threshold are taken as pseudo-positive instances and added to the query set Q.
(3) Pseudo-positive mining continues with the updated query set Q until the scores of newly mined candidates fall below the threshold. The query set is taken as the pseudo-positive set, and the remaining instances in the candidate pool as the negative set.
(4) A similarity score between sample A and each instance of the pseudo-positive set is obtained with a softmax function:
S(A, p_i) = exp(A · p_i) / (exp(A · p_i) + Σ_{j=1}^{N_n} exp(A · n_j))
where p_i is an instance of the pseudo-positive set, N_n is the number of negative samples, and n_j is an instance of the negative set.
Taking the negative logarithm of the similarity score yields the contrastive learning loss function:
L_con = −Σ_i log S(A, p_i)
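A hedged sketch of the pseudo-positive mining loop and the contrastive loss described above; the cosine similarity, the mean aggregation of the N scores, the temperature tau, and the stopping rule are illustrative assumptions that this text does not fix.

```python
import torch
import torch.nn.functional as F

def mine_pseudo_positives(anchor, pool, k=3, threshold=0.7, max_rounds=5):
    """Grow a query set with pool instances whose aggregated similarity
    to the current query set exceeds `threshold`."""
    anchor = F.normalize(anchor, dim=-1)               # (d,) sample instance A
    pool = F.normalize(pool, dim=-1)                   # (M, d) candidate pool
    sims = pool @ anchor                               # similarity to A
    k = min(k, pool.size(0))
    query_idx = sims.topk(k).indices.tolist()          # top-k seed the query set
    remaining = [i for i in range(pool.size(0)) if i not in query_idx]
    for _ in range(max_rounds):
        if not remaining:
            break
        q = pool[query_idx]                            # (|Q|, d)
        agg = (pool[remaining] @ q.T).mean(dim=1)      # aggregate the N scores
        keep = (agg > threshold).nonzero(as_tuple=True)[0].tolist()
        if not keep:
            break                                      # scores below threshold
        new = [remaining[i] for i in keep]
        query_idx += new
        remaining = [i for i in remaining if i not in new]
    return query_idx, remaining         # pseudo-positive / negative indices

def contrastive_loss(anchor, positives, negatives, tau=0.1):
    """Negative log of each pseudo-positive's softmax score vs. the negatives."""
    a = F.normalize(anchor, dim=-1)
    pos = F.normalize(positives, dim=-1) @ a / tau     # (N_p,) scores
    neg = F.normalize(negatives, dim=-1) @ a / tau     # (N_n,) scores
    denom = pos.exp() + neg.exp().sum()
    return -(pos.exp() / denom).log().mean()
```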
The category prediction network performs multi-label classification of the instances of interest to obtain the category corresponding to each instance;
The mask prediction network performs binary classification of the pixels in the selected instance region, distinguishing foreground from background and generating the mask of the instance. The mask prediction network loss function is:
L_mask = (1 / N_pos) Σ_k ψ(p*_{i,j} > 0) · d_mask(m_k, m*_k)
where N_pos is the number of positive samples, p*_{i,j} is the category score predicted by the grid cell at position (i, j), and ψ is the indicator function.
For d_mask, the Dice loss is adopted:
L_Dice = 1 − D(p, q),  D(p, q) = 2 Σ_{x,y} (p_{x,y} · q_{x,y}) / (Σ_{x,y} p_{x,y}² + Σ_{x,y} q_{x,y}²)
Step S2: training the instance segmentation model;
The selected training data set, comprising picture data and corresponding label files, is input. Feature maps are first extracted and then fused; global information is then enhanced, and the result is fed to the head networks for prediction. The resulting loss function is back-propagated to guide model training.
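A minimal sketch of one training step, under the assumption that the model returns its loss terms in a dictionary; the output keys and the weighting coefficients are assumptions, not values fixed by this description.

```python
import torch

def train_step(model, images, targets, optimizer, w_mask=3.0, w_ssl=0.1):
    """One optimization step over the combined loss."""
    optimizer.zero_grad()
    out = model(images, targets)      # assumed to return a dict of loss terms
    loss = (out["category_loss"]      # Focal loss of the category branch
            + w_mask * out["mask_loss"]          # Dice-based mask loss
            + w_ssl * out["contrastive_loss"])   # self-supervised term
    loss.backward()                   # back-propagate to guide training
    optimizer.step()
    return loss.item()
```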
The invention uses the urban road street-view dataset Cityscapes for model training, which contains street-view images of different cities: 2,975 training images, 500 validation images, and 1,525 test images with high-quality annotations.
Step S3: instance segmentation
The picture is first divided into S×S grids, each of which is responsible for predicting the instance whose center point falls in that cell; that is, taking the grid cell as the center, the category and mask of the corresponding instance are predicted.
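A minimal sketch of the S×S grid assignment described above, in the style of SOLO; S = 40 and the flat cell indexing are illustrative assumptions.

```python
import torch

def assign_instances_to_grid(centers, img_size, S=40):
    """Map each instance center (x, y), in pixels, to the grid cell that is
    responsible for predicting that instance's category and mask."""
    h, w = img_size
    xs, ys = centers[:, 0], centers[:, 1]    # centers: float tensor (K, 2)
    grid_x = (xs / w * S).long().clamp(0, S - 1)
    grid_y = (ys / h * S).long().clamp(0, S - 1)
    return grid_y * S + grid_x                # flat index of the responsible cell
```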
FIG. 3 is the image to be tested provided in the embodiment. The segmentation results of the original single-stage instance segmentation method are shown in FIG. 4 (a): the mask generated for the motorcycle on the right of the first picture fits poorly; in the second picture, the enclosing wall is misidentified as a truck owing to poor light and heavy noise in the right half; and in the third picture, the incomplete instances (the motorcycle and its rider) are not well separated. The instance segmentation results obtained with the method of the present invention, shown in FIG. 4 (b), improve markedly on all of the above cases.
The method alleviates, to a certain extent, the poor detection of occluded or incomplete objects by the original single-stage instance segmentation algorithm, and it considerably strengthens the generalization ability of the model and its segmentation in scenes with insufficient illumination, over-strong exposure, or rain.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. An instance segmentation method combining self-supervision and global information enhancement, comprising:
Step S1: establishing an instance segmentation model;
the instance segmentation model comprises a feature extraction network, a global information enhancement network, a self-supervised learning network, a category prediction network, and a mask prediction network;
the feature extraction network comprises a ResNet network and an FPN network, the ResNet network obtaining a feature pyramid by stacking convolutional layers, ReLU layers, and normalization layers with residual connections, and the FPN performing feature fusion by combining the rich semantic information of the upper-level feature maps with the accurate positional information of the lower-level feature maps in the feature pyramid;
the global information enhancement network is composed of Fastformer modules and models the interactions among the pixels of the feature map, extracting context information and enhancing the global information extraction capability of the feature map;
the self-supervised learning network performs self-supervised contrastive learning among the instances in a picture, strengthening the understanding of the picture and the generalization ability of the model;
the category prediction network performs multi-label classification of the instances of interest to obtain the category corresponding to each instance;
the mask prediction network performs binary classification of the pixels in the selected instance region, distinguishing foreground from background and generating the mask of the instance;
Step S2: training the instance segmentation model;
inputting a selected training data set comprising picture data and corresponding label files; extracting feature maps and then fusing them; enhancing global information, feeding the result to the head networks for prediction, obtaining a loss function, and back-propagating the loss function to guide the direction of model training;
Step S3: instance segmentation
firstly, dividing the picture into S×S grids, each grid being responsible for predicting the instance whose center point falls in that cell; that is, taking the grid cell as the center, predicting the category and mask of the corresponding instance.
2. The instance segmentation method combining self-supervision and global information enhancement according to claim 1, wherein the feature extraction network comprises ResNet-50 and FPN networks.
3. The instance segmentation method combining self-supervision and global information enhancement according to claim 1, wherein the global information enhancement network is a Fastformer network based on additive attention.
4. The instance segmentation method combining self-supervision and global information enhancement according to claim 3, wherein the additive attention applies linear transformations to the input feature sequence E ∈ R^(B×d), B being the sequence length and d the hidden dimension, to obtain a query matrix, a key matrix, and a value matrix, denoted Q, K, V ∈ R^(B×d).
5. The instance segmentation method combining self-supervision and global information enhancement according to claim 4, wherein additive attention over the query matrix Q generates attention weights whose weighted sum of Q yields a global query vector; the global query vector is then multiplied element-wise with K, modeling their interrelationship.
6. The instance segmentation method combining self-supervision and global information enhancement according to claim 5, wherein additive attention over the key matrix K generates attention weights whose weighted sum of K yields a global key vector, which is interactively modeled with V to finally obtain feature vectors containing rich global semantic information.
7. The instance segmentation method combining self-supervision and global information enhancement according to claim 1, wherein the self-supervised learning network first obtains the feature representations of all instances using the bounding-box label information and, for a randomly selected sample instance A, takes the remaining instances as a candidate pool and calculates similarity scores between A and the candidate pool.
8. The method of claim 7, wherein the similarity score is calculated as follows:
the instances are ranked by similarity score, the top-k instances are taken as the query set Q, and pseudo-positive instances are then mined from the candidate pool using the query set.
9. The method of claim 8, wherein the pseudo-positive mining process comprises:
(1) calculating the similarity between each instance in Q and each instance in the candidate pool, each instance I of the candidate pool obtaining N similarity scores, wherein N is the number of instances in the query set Q;
(2) aggregating and sorting the similarity scores, taking the top-k instances exceeding a threshold as pseudo-positive instances, and adding them to the query set Q;
(3) continuing pseudo-positive mining with the updated query set Q until the scores of newly mined candidates fall below the threshold, taking the query set as the pseudo-positive set and the remaining instances in the candidate pool as the negative set;
(4) obtaining a similarity score between sample A and each instance of the pseudo-positive set with a softmax function:
S(A, p_i) = exp(A · p_i) / (exp(A · p_i) + Σ_{j=1}^{N_n} exp(A · n_j))
wherein p_i is an instance of the pseudo-positive set, N_n is the number of negative samples, and n_j is an instance of the negative set;
(5) taking the negative logarithm of the similarity score to obtain the contrastive learning loss function:
L_con = −Σ_i log S(A, p_i).
10. The method of claim 1, wherein the category prediction network adopts the Focal loss, obtaining a loss function from the predicted probability that each instance belongs to a certain category; and the mask prediction network performs binary classification of the pixels in the selected instance region, distinguishing foreground from background and generating the mask of the instance.
CN202210582668.6A 2022-05-26 2022-05-26 Instance segmentation method and system combining self-supervision and global information enhancement Active CN115019039B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210582668.6A CN115019039B (en) 2022-05-26 2022-05-26 Instance segmentation method and system combining self-supervision and global information enhancement


Publications (2)

Publication Number Publication Date
CN115019039A CN115019039A (en) 2022-09-06
CN115019039B (en) 2024-04-16

Family

ID=83071360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210582668.6A Active CN115019039B (en) 2022-05-26 2022-05-26 Instance segmentation method and system combining self-supervision and global information enhancement

Country Status (1)

Country Link
CN (1) CN115019039B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024103380A1 (en) * 2022-11-18 2024-05-23 Robert Bosch Gmbh Method and apparatus for instance segmentation
CN116664845B (en) * 2023-07-28 2023-10-13 山东建筑大学 Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism
CN117853732A (en) * 2024-01-22 2024-04-09 广东工业大学 Self-supervision re-digitizable terahertz image dangerous object instance segmentation method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10430946B1 (en) * 2019-03-14 2019-10-01 Inception Institute of Artificial Intelligence, Ltd. Medical image segmentation and severity grading using neural network architectures with semi-supervised learning techniques
CN112927245A (en) * 2021-04-12 2021-06-08 华中科技大学 End-to-end instance segmentation method based on instance query
CN113392711A (en) * 2021-05-19 2021-09-14 中国科学院声学研究所南海研究站 Smoke semantic segmentation method and system based on high-level semantics and noise suppression
CN113837205A (en) * 2021-09-28 2021-12-24 北京有竹居网络技术有限公司 Method, apparatus, device and medium for image feature representation generation
CN114387454A (en) * 2022-01-07 2022-04-22 东南大学 Self-supervision pre-training method based on region screening module and multi-level comparison

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11830253B2 (en) * 2020-04-14 2023-11-28 Toyota Research Institute, Inc. Semantically aware keypoint matching
US11941086B2 (en) * 2020-11-16 2024-03-26 Salesforce, Inc. Systems and methods for contrastive attention-supervised tuning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang X. et al. "SOLOv2: Dynamic and Fast Instance Segmentation." Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 17721–17732. *
Sadek, Assem et al. "Self-Supervised Attention Learning for Depth and Ego-motion Estimation." 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021. *

Also Published As

Publication number Publication date
CN115019039A (en) 2022-09-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant