CN112926673A - Semi-supervised target detection method based on consistency constraint - Google Patents

Semi-supervised target detection method based on consistency constraint

Info

Publication number
CN112926673A
CN112926673A
Authority
CN
China
Prior art keywords
image
images
reconstructed
training
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110286708.8A
Other languages
Chinese (zh)
Other versions
CN112926673B (en)
Inventor
王好谦
王颢涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University filed Critical Shenzhen International Graduate School of Tsinghua University
Priority to CN202110286708.8A priority Critical patent/CN112926673B/en
Publication of CN112926673A publication Critical patent/CN112926673A/en
Application granted granted Critical
Publication of CN112926673B publication Critical patent/CN112926673B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

A semi-supervised target detection method based on consistency constraint comprises the following steps: performing data enhancement on the training set to obtain a reconstructed training set; constructing an arbitrary deep-learning-based target detection model; during training, in each batch of each training epoch, inputting the images sampled from the training set together with the corresponding reconstructed images from the reconstructed training set into the model network, calculating the error between the prediction on each original image and its ground-truth label, calculating the consistency error between the original image and its reconstructed image, and taking the weighted sum of the two errors as the total training error; updating the parameters by batch gradient descent; and carrying out target detection on an input image with the trained network to obtain the positions and categories of the targets in the image. Compared with traditional fully supervised target detection models, the method and device of the present application can achieve comparable performance with fewer manual labels, or better performance with the same number of labels and additional unlabeled images.

Description

Semi-supervised target detection method based on consistency constraint
Technical Field
The invention relates to the field of computer vision and image processing, in particular to a semi-supervised target detection method based on consistency constraint.
Background
Target detection (Object Detection) is one of the most important and challenging problems in computer vision. Given an input image of arbitrary size, a target detection model outputs the locations and classes of one or more objects belonging to predefined categories in the image. Target detection has a wide range of application scenarios, such as autonomous driving, industrial production, video surveillance, medical image processing and satellite image processing. It has therefore long been an active research topic in both academia and industry.
Currently, most mainstream target detection models are based on deep neural networks and adopt a fully supervised training paradigm. Under fully supervised learning, every training image must carry accurate and complete annotations. Studies indicate that accurately annotating a single object takes about 10 seconds, and an image often contains multiple objects. Since training a deep neural network requires large amounts of data, annotating the training images consumes considerable time and labor. Meanwhile, unlabeled data is abundant in many application scenarios, but existing fully supervised methods cannot exploit it effectively. Using unlabeled training data therefore helps reduce the dependence of deep neural networks on manual annotation, and allows models to take full advantage of unlabeled data, which is more plentiful and more widely available.
Semi-supervised learning is a paradigm that obtains strong supervision signals from labels while also mining useful information from unlabeled training data. However, existing semi-supervised learning focuses mainly on classification tasks; it remains insufficiently explored for target detection, where labeling is more expensive and learning is more difficult. Introducing semi-supervised learning into the target detection task therefore has strong academic value and application prospects.
Existing semi-supervised target detection methods have much in common with semi-supervised classification methods, and the mainstream ones adopt a self-training scheme. In self-training, an initial model is first trained on the labeled images in a fully supervised manner; the model then processes the unlabeled images, and high-confidence results are used as pseudo labels for them; this process is iterated until a stopping condition is met. However, such methods require lengthy training and are overly sensitive to the hyper-parameters used for pseudo-label screening.
Another common and effective semi-supervised classification approach is based on consistency constraints: when the input image is slightly perturbed, the output should remain consistent. Since the output of a classification problem is only a fixed-dimension class vector, which is robust to the pixel positions and color distribution of the input image, it is simple and natural to perturb the input, for example by mirror flipping, cropping or color jittering. For target detection, however, the output is highly correlated with the pixel positions of the input image, so designing a suitable perturbation of the input from which the detection task can learn consistency is very challenging.
It is to be noted that the information disclosed in the above background section is only for understanding the background of the present application and thus may include information that does not constitute prior art known to a person of ordinary skill in the art.
Disclosure of Invention
The invention mainly aims to provide a semi-supervised target detection method based on consistency constraint, so as to solve the problem described in the background that model training is highly dependent on manual labels.
In order to achieve the purpose, the invention adopts the following technical scheme:
a semi-supervised target detection method based on consistency constraint comprises the following steps:
first step, data enhancement: performing data enhancement on the training set to obtain a reconstructed training set;
secondly, model initialization: constructing any target detection model based on deep learning;
thirdly, model training: in each training batch of each training epoch, simultaneously inputting the images sampled from the training set and the corresponding reconstructed images from the reconstructed training set into the model network; calculating the error between the prediction on each original image and its ground-truth label, where the corresponding loss term is 0 if the original image is unlabeled; calculating the consistency error between the original image and the reconstructed image; taking the weighted sum of the two errors as the total training error, where the corresponding term in the weighted sum is 0 if the original image is unlabeled; and then updating the model parameters by batch gradient descent;
the fourth step, target detection: carrying out target detection on an input image with the trained network model to obtain the positions and categories of the targets in the image.
Further:
the first step comprises: for each image a in the training set, it is cropped into a plurality of sub-images and rearranged spatially, generating a reconstructed image b of the same size as image a.
In the first step, each image in the training set is cut along its horizontal center line and vertical center line to obtain four sub-images, upper left, upper right, lower right and lower left, denoted A, B, C and D; A is then horizontally translated to the position of B, B is vertically translated to the position of C, C is horizontally translated to the position of D, and D is vertically translated to the position of A, yielding the reconstructed image.
The target detection model is Faster R-CNN, YOLO, SSD, CenterNet or CornerNet; the Faster R-CNN comprises a backbone network (Backbone), a feature pyramid network (Feature Pyramid Network), a region proposal network (RPN) and a detection head network (Head Network).
In the third step, the labeled original images and the unlabeled images are mixed and shuffled, and then each image and its corresponding reconstructed image are sequentially input into the network to calculate the loss function; the loss function includes two parts: the error between the outputs of the labeled images and the corresponding labels, and the consistency loss between the results of all images and their corresponding reconstructed images.
The third step includes:
inputting the original images in the training set and the corresponding reconstructed images into the network in sequence, and predicting the bounding box set B of the original image and the bounding box set B' of the reconstructed image respectively; constructing a loss function comprising two parts: one part is the error between the bounding boxes output for a labeled original image and the ground-truth labels, where the position error uses the smooth L1 loss function and the class error uses the cross-entropy loss function, and this part of the loss is 0 if the original image has no ground-truth labels; the other part is the consistency loss between the original image and the reconstructed image, wherein the bounding box set B of the original image is transformed according to the reconstruction to obtain the bounding box set B2 on the theoretically reconstructed image, B2 is taken as the ground-truth label for the prediction B', and the error between B' and B2 is defined using the DIoU loss function as
L_DIoU = 1 - IoU + d²/c²
Wherein IoU represents the ratio of the area of intersection of the two bounding boxes to the area of union, d represents the Euclidean distance between the center points of the two bounding boxes, and c represents the diagonal distance of the minimum closure region containing both bounding boxes.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method.
The beneficial effect of this application and prior art contrast includes:
compared with other traditional full-supervision target detection models, the semi-supervision target detection method based on consistency constraint can achieve equivalent performance by using fewer manual labels or achieve better performance by using the same number of labels and more label-free images. The method and the device design a consistency constraint, so that a prediction result of a reconstructed image and a prediction result of an original image meet a certain geometric relationship in space, a model is ensured to obtain a supervision signal under the condition that a training image is not labeled, and useful knowledge is learned. By adding a geometric constraint to the unlabeled image, the model can effectively learn useful knowledge from the unlabeled image, thereby reducing the number of labels required for model training.
Drawings
FIG. 1 is a simplified flowchart of a semi-supervised target detection method based on consistency constraints according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a method for splitting and reconstructing a training image in a semi-supervised target detection method based on consistency constraint according to an embodiment of the present invention, where the left side in fig. 2 is an original image, and the right side is a reconstructed image.
Fig. 3 is a schematic diagram of a theoretical change of a bounding box after a training image is split and reconstructed in the semi-supervised target detection method based on the consistency constraint according to an embodiment of the present invention, where the left side in fig. 3 is an original image prediction result B, and the right side is a theoretically reconstructed image prediction result B2.
Fig. 4 is a schematic diagram of a DIoU in a consistency constraint-based loss function in a consistency constraint-based semi-supervised target detection method according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of the training process when the Faster R-CNN target detection model is used in the semi-supervised target detection method based on consistency constraint according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in detail below. It should be emphasized that the following description is merely exemplary in nature and is not intended to limit the scope of the invention or its application.
Referring to fig. 1, an embodiment of the present invention provides a semi-supervised target detection method based on consistency constraint, including the following steps:
first step, data enhancement: performing data enhancement on the training set to obtain a reconstructed training set;
secondly, model initialization: constructing any target detection model based on deep learning;
thirdly, model training: in each training batch of each training epoch, simultaneously inputting the images sampled from the training set and the corresponding reconstructed images from the reconstructed training set into the model network; calculating the error between the prediction on each original image and its ground-truth label, where the corresponding loss term is 0 if the original image is unlabeled; calculating the consistency error between the original image and the reconstructed image; taking the weighted sum of the two errors as the total training error, where the corresponding term in the weighted sum is 0 if the original image is unlabeled; and then updating the model parameters by batch gradient descent;
the fourth step, target detection: carrying out target detection on an input image with the trained network model to obtain the positions and categories of the targets in the image.
In a preferred embodiment, the first step comprises: for each image a in the training set, it is cropped into a plurality of sub-images and rearranged spatially, generating a reconstructed image b of the same size as image a.
In a preferred embodiment, in the first step, each image in the training set is cropped along its horizontal center line and vertical center line to obtain four sub-images, upper left, upper right, lower right and lower left, denoted A, B, C and D; A is then horizontally translated to the position of B, B is vertically translated to the position of C, C is horizontally translated to the position of D, and D is vertically translated to the position of A, so as to obtain the reconstructed image.
The target detection model can be Faster R-CNN, YOLO, SSD, CenterNet or CornerNet, among others.
In a preferred embodiment, the Faster R-CNN includes a backbone network (Backbone), a feature pyramid network (Feature Pyramid Network), a region proposal network (RPN) and a detection head network (Head Network).
In a preferred embodiment, in the third step, the labeled image and the unlabeled image are mixed and scrambled, and then the image and the corresponding reconstructed image are sequentially input into a network to calculate a loss function; the loss function includes two parts, one is the error between the output of the labeled image and the corresponding label, and the other is the loss of consistency of the results between all images and the corresponding reconstructed image.
Specific embodiments of the present invention are further described below.
A semi-supervised target detection method based on consistency constraint comprises the following steps:
In the first step, each image a in the data set is cropped into N sub-images which are rearranged spatially, generating a reconstructed image b of the same size as a. In the second step, an arbitrary deep-learning-based target detection model (e.g., Faster R-CNN) is constructed; Faster R-CNN is composed of a backbone network (Backbone), a feature pyramid network (Feature Pyramid Network), a region proposal network (RPN) and a detection head network (Head Network). In the third step, the model is trained: the labeled and unlabeled images are first mixed and shuffled, then each image and its corresponding reconstructed image are sequentially input into the network and the loss function is computed. The loss function consists of two parts: the error between the outputs of the labeled images and the corresponding labels, and the consistency between the results of all images and their corresponding reconstructed images. In the fourth step, target detection is carried out on the input image with the trained network to obtain the positions and categories of the targets in the image.
The first step specifically comprises: each image in the training set is cut along its horizontal center line and vertical center line to obtain four sub-images, upper left, upper right, lower right and lower left, denoted A, B, C and D, as shown in the left part of FIG. 2. A is then horizontally translated to the position of B, B is vertically translated to the position of C, C is horizontally translated to the position of D, and D is vertically translated to the position of A, yielding a new reconstructed image, as shown in the right part of FIG. 2.
The target detection model used in the second step is not limited to Faster R-CNN; it can be any deep-learning-based target detection model, such as YOLO, SSD, CenterNet, CornerNet, and the like. The method places no special requirement on the structure of the target detection model; it only requires that the model output bounding box positions and categories.
The third step specifically comprises: the original images in the training set and their corresponding reconstructed images are sequentially input into the network, and the bounding box set B of the original image and the bounding box set B' of the reconstructed image are predicted respectively. A loss function is then constructed, consisting of two parts. One part is the error between the bounding boxes output for a labeled original image and the ground-truth labels, where the position error uses the smooth L1 loss function and the class error uses the cross-entropy loss function; this part is 0 if the original image has no ground-truth labels. The other part is the consistency loss between the original image and the reconstructed image: according to the image reconstruction method described above, the bounding box set B of the original image is transformed to obtain the bounding box set B2 on the theoretically reconstructed image (B2 is obtained directly from B by the reconstruction transform, not by feeding the reconstructed image into the network), as shown in FIG. 3. B2 is then taken as the ground-truth label for the prediction B', and the error between B' and B2 is defined using the DIoU loss function as
L_DIoU = 1 - IoU + d²/c²
Where IoU represents the ratio of the area of intersection of the two bounding boxes to the area of their union, d represents the Euclidean distance between the center points of the two bounding boxes, and c represents the diagonal distance of the minimum closure region containing both bounding boxes, as shown in FIG. 4. In the method, the original image is input into the network to obtain B, the reconstructed image is input into the network to obtain B', and B2 is what B' should equal under ideal conditions, obtained from B by the reconstruction transform. The consistency loss is therefore the loss between B2 and B'. The detailed flow of the third step is shown in FIG. 5.
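By way of illustration only, the following Python sketch outlines one possible form of this training step; the model interface and the helpers detection_loss, reconstruct_boxes and diou_consistency_loss are hypothetical placeholders, not code from the patent.

```python
import torch

def semi_supervised_step(model, optimizer, batch, w_t):
    """One illustrative semi-supervised training step (sketch, not the patented code).

    batch: iterable of (image, recon_image, target) triples; target is None
    for unlabeled images. detection_loss, reconstruct_boxes and
    diou_consistency_loss are hypothetical helpers.
    """
    total_loss = torch.zeros(())
    for image, recon_image, target in batch:
        pred = model(image)              # boxes + classes on the original image
        pred_recon = model(recon_image)  # boxes + classes on the reconstructed image

        # Supervised detection loss: contributes only for labeled images.
        l_det = detection_loss(pred, target) if target is not None else 0.0

        # Consistency loss: transform the original-image predictions with the
        # same split-and-reconstruction applied to the image, then compare them
        # with the reconstructed-image predictions using a DIoU-based loss.
        pred_transformed = reconstruct_boxes(pred)
        l_con = diou_consistency_loss(pred_transformed, pred_recon)

        total_loss = total_loss + l_det + w_t * l_con

    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```

Here w_t stands for the consistency-loss weight w(t) described below.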
As described in further detail below.
Data enhancement:
and cutting each image in the training set along a horizontal center line and a vertical center line to obtain four sub-images, namely, upper left sub-image, upper right sub-image, lower right sub-image and lower left sub-image, which are marked as A, B, C and D, as shown in the left image of FIG. 2. Then, a is horizontally translated to the position of B, B is vertically translated to the position of C, C is horizontally translated to the position of D, and D is vertically translated to the position of a, so as to obtain a new reconstructed image, as shown in the right diagram of fig. 2.
Model training:
a batch gradient descent method is used. And if the Batch Size (Batch Size) is N (N is an even number), sampling N/2 images from the training set, and finding out a reconstructed image corresponding to the N/2 images from the reconstructed training set. The N images are sequentially input into a model network (e.g., Faster R-CNN). Let the images in the training set and the reconstruction training set be I respectivelyi,I′i(i is 1,2 … … N/2), and the output result is B after passing through the model networki,B′i. Wherein the content of the first and second substances,
Bi={(bi,ci),i=1,2......X}
B′i={(b′i,c′i),i=1,2......Y}
bi,cirespectively representing a four-dimensional bounding box vector (x, y, w, h) and a category vector. X, Y represent the number of bounding boxes that the original image and the reconstructed image are ultimately predicted to (after NMS if the detection model involves an NMS procedure).
The loss function is defined as Loss = L_det + w(t)*L_con, where

L_det = L_smoothL1(B_i, B*) + L_CE(C_i, C*)
L_con = L_DIoU(B_2i, B'_i)

L_smoothL1 denotes the Smooth L1 loss function, L_CE the cross-entropy loss function, and L_DIoU the DIoU loss function. B* and C* denote the ground-truth bounding boxes and class labels respectively, and B_2i denotes the new bounding box positions obtained by applying the split-and-reconstruction shown in FIG. 2 to B_i; the generation process is shown in the following table:
Region of the original image containing the box | Translation applied to obtain the corresponding box in B_2i
A (upper left) | shifted right by W/2
B (upper right) | shifted down by H/2
C (lower right) | shifted left by W/2
D (lower left) | shifted up by H/2

where W and H denote the width and height of the image.
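A minimal sketch of this bounding-box mapping is shown below, assuming boxes are given as (x, y, w, h) with (x, y) the box center and that each box is assigned to a quadrant by its center point; handling of boxes straddling a cut line is omitted, and the names and interface are illustrative assumptions.

```python
def reconstruct_boxes(boxes, img_w, img_h):
    """Map boxes predicted on the original image to their theoretical positions
    on the reconstructed image (B_i -> B_2i). Each box is (x, y, w, h) with
    (x, y) the box center; quadrant assignment is by the center point."""
    half_w, half_h = img_w / 2.0, img_h / 2.0
    out = []
    for (x, y, w, h) in boxes:
        if x < half_w and y < half_h:        # region A (upper left): shift right
            x += half_w
        elif x >= half_w and y < half_h:     # region B (upper right): shift down
            y += half_h
        elif x >= half_w and y >= half_h:    # region C (lower right): shift left
            x -= half_w
        else:                                # region D (lower left): shift up
            y -= half_h
        out.append((x, y, w, h))
    return out
```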
b is to be2iIs regarded as B'iUsing the DIoU loss function:
L_DIoU = 1 - IoU + d²/c²
where IoU represents the ratio of the area of intersection of the two bounding boxes to the area of union, d represents the Euclidean distance between the center points of the two bounding boxes, and c represents the diagonal distance of the minimum closure region containing both bounding boxes, as shown in FIG. 4. Fig. 4 is a schematic diagram of a DIoU, with two dashed black lines representing bounding boxes in the prediction and truth labels, and the outermost dotted line representing the minimum closure containing both.
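For illustration, a generic PyTorch-style sketch of this DIoU loss between boxes in (x, y, w, h) center format is given below; it follows the published DIoU formulation and is not code taken from the patent.

```python
import torch

def diou_loss(box_p, box_g):
    """DIoU loss between predicted and target boxes, each a tensor (..., 4)
    in (cx, cy, w, h) format: L = 1 - IoU + d^2 / c^2."""
    # Corner coordinates of both boxes.
    p_x1, p_y1 = box_p[..., 0] - box_p[..., 2] / 2, box_p[..., 1] - box_p[..., 3] / 2
    p_x2, p_y2 = box_p[..., 0] + box_p[..., 2] / 2, box_p[..., 1] + box_p[..., 3] / 2
    g_x1, g_y1 = box_g[..., 0] - box_g[..., 2] / 2, box_g[..., 1] - box_g[..., 3] / 2
    g_x2, g_y2 = box_g[..., 0] + box_g[..., 2] / 2, box_g[..., 1] + box_g[..., 3] / 2

    # Intersection and union areas -> IoU.
    inter_w = (torch.min(p_x2, g_x2) - torch.max(p_x1, g_x1)).clamp(min=0)
    inter_h = (torch.min(p_y2, g_y2) - torch.max(p_y1, g_y1)).clamp(min=0)
    inter = inter_w * inter_h
    union = box_p[..., 2] * box_p[..., 3] + box_g[..., 2] * box_g[..., 3] - inter
    iou = inter / union.clamp(min=1e-7)

    # d^2: squared distance between the two box centers.
    d2 = (box_p[..., 0] - box_g[..., 0]) ** 2 + (box_p[..., 1] - box_g[..., 1]) ** 2

    # c^2: squared diagonal of the smallest region enclosing both boxes.
    c_w = torch.max(p_x2, g_x2) - torch.min(p_x1, g_x1)
    c_h = torch.max(p_y2, g_y2) - torch.min(p_y1, g_y1)
    c2 = (c_w ** 2 + c_h ** 2).clamp(min=1e-7)

    return 1.0 - iou + d2 / c2
```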
w(t) represents the weight of the consistency loss. At the start of training, the detection capability of the network model is poor and the quality of its detection results is low; in this case a low consistency-loss weight is needed to prevent the network from learning too much erroneous information. As training proceeds, the quality of the model's detection results improves, so the consistency between results carries more correct information, and a higher consistency-loss weight can be adopted. In the present application, w(t) takes the value 0 in the initial stage of training, then increases linearly, reaches 1 in the final stage of training, and remains unchanged until training ends.
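One possible realization of this ramp-up schedule is sketched below; the breakpoints (first and last thirds of the epochs) are illustrative assumptions, not values specified by the patent.

```python
def consistency_weight(epoch, total_epochs):
    """Ramp-up schedule for the consistency-loss weight w(t):
    0 during an initial stage, a linear increase in between, and 1 for the
    rest of training (breakpoints here are illustrative assumptions)."""
    ramp_start = total_epochs / 3.0
    ramp_end = 2.0 * total_epochs / 3.0
    if epoch < ramp_start:
        return 0.0
    if epoch >= ramp_end:
        return 1.0
    return (epoch - ramp_start) / (ramp_end - ramp_start)
```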
Target detection:
and inputting the test set image to be detected into the trained semi-supervised neural network model based on the consistency constraint, so as to obtain the position and the category of the object boundary box in the image.
The background of the present invention may contain background information related to the problem or environment of the present invention and does not necessarily describe the prior art. Accordingly, the inclusion in the background section is not an admission of prior art by the applicant.
The foregoing is a more detailed description of the invention in connection with specific/preferred embodiments and is not intended to limit the practice of the invention to those descriptions. It will be apparent to those skilled in the art that various substitutions and modifications can be made to the described embodiments without departing from the spirit of the invention, and these substitutions and modifications should be considered to fall within the scope of the invention. In the description herein, references to the description of the term "one embodiment," "some embodiments," "preferred embodiments," "an example," "a specific example," or "some examples" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction. Although embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the claims.

Claims (7)

1. A semi-supervised target detection method based on consistency constraint is characterized by comprising the following steps:
first step, data enhancement: performing data enhancement on the training set to obtain a reconstructed training set;
secondly, model initialization: constructing any target detection model based on deep learning;
thirdly, model training: in each training cycle, in each training batch, simultaneously inputting the images sampled in the training set and the corresponding reconstructed images in the reconstructed training set into a model network, and calculating the error between the prediction result of the original images and the true value label of the original images, wherein if the original images are not labeled, the corresponding loss function value is 0; calculating a consistency error between the original image and the reconstructed image, and performing weighted summation on the two errors to serve as a total error of model training, wherein if the original image is not labeled, the value of a corresponding item in the weighted summation process is 0; then updating the model parameters by using a batch gradient descent method;
the fourth step: target detection: and carrying out target detection on the input image by using the trained network model to obtain the position and the category of the target in the input image.
2. The semi-supervised target detection method of claim 1, wherein the first step comprises: for each image a in the training set, it is cropped into a plurality of sub-images and rearranged spatially, generating a reconstructed image b of the same size as image a.
3. The semi-supervised object detection method of claim 2, wherein in the first step, each image in the training set is cropped along a horizontal center line and a vertical center line to obtain four sub-images, namely, upper left sub-image, upper right sub-image, lower right sub-image and lower left sub-image, which are marked as A, B, C and D, then, A is horizontally translated to the position of B, B is vertically translated to the position of C, C is horizontally translated to the position of D, and D is vertically translated to the position of A, so that a reconstructed image is obtained.
4. The semi-supervised object detection method as recited in any one of claims 1 to 3, wherein the object detection model is Faster R-CNN, YOLO, SSD, CenterNet, or CornerNet; the Faster R-CNN comprises a backbone network (Backbone), a feature pyramid network (Feature Pyramid Network), a region proposal network (RPN) and a detection head network (Head Network).
5. The semi-supervised object detection method as recited in any one of claims 1 to 4, wherein in the third step, the labeled image and the unlabeled image are mixed and scrambled, and then the images and the corresponding reconstructed images are sequentially input into a network to calculate a loss function; the loss function includes two parts, one is the error between the bounding box of the labeled original image output and the corresponding label, and the other is the loss of consistency of the results between all images and the corresponding reconstructed images.
6. The semi-supervised object detection method of claim 5, wherein the third step comprises:
inputting the original images in the training set and the corresponding reconstructed images into a network in sequence, and respectively predicting the bounding box set B of the original image and the bounding box set B' of the reconstructed image; constructing a loss function comprising two parts: one part is the error between the bounding boxes output for a labeled original image and the ground-truth labels, wherein the position error uses the smooth L1 loss function and the class error uses the cross-entropy loss function, and this part of the loss is 0 if the original image has no ground-truth labels; the other part is the consistency loss between the original image and the reconstructed image, wherein the bounding box set B of the original image is transformed according to the reconstruction to obtain the bounding box set B2 on the theoretically reconstructed image, B2 is taken as the ground-truth label for the prediction B', and the error between B' and B2 is defined using the DIoU loss function as
L_DIoU = 1 - IoU + d²/c²
Wherein IoU represents the ratio of the area of intersection of the two bounding boxes to the area of union, d represents the Euclidean distance between the center points of the two bounding boxes, and c represents the diagonal distance of the minimum closure region containing both bounding boxes.
7. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 6.
CN202110286708.8A 2021-03-17 2021-03-17 Semi-supervised target detection method based on consistency constraint Active CN112926673B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110286708.8A CN112926673B (en) 2021-03-17 2021-03-17 Semi-supervised target detection method based on consistency constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110286708.8A CN112926673B (en) 2021-03-17 2021-03-17 Semi-supervised target detection method based on consistency constraint

Publications (2)

Publication Number Publication Date
CN112926673A true CN112926673A (en) 2021-06-08
CN112926673B CN112926673B (en) 2023-01-17

Family

ID=76175866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110286708.8A Active CN112926673B (en) 2021-03-17 2021-03-17 Semi-supervised target detection method based on consistency constraint

Country Status (1)

Country Link
CN (1) CN112926673B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753992A (en) * 2018-12-10 2019-05-14 南京师范大学 The unsupervised domain for generating confrontation network based on condition adapts to image classification method
US20200405242A1 (en) * 2019-06-27 2020-12-31 Retrace Labs System And Methods For Restorative Dentistry Treatment Planning Using Adversarial Learning
CN112115916A (en) * 2020-09-29 2020-12-22 西安电子科技大学 Domain-adaptive fast R-CNN semi-supervised SAR detection method
CN112395987A (en) * 2020-11-18 2021-02-23 西安电子科技大学 SAR image target detection method based on unsupervised domain adaptive CNN

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JISOO JEONG ET AL: "Consistency-based Semi-supervised Learning for Object Detection", 《33RD CONFERENCE ON NEURAL INFORMATION PROCESSING SYSTEMS (NEURIPS 2019)》 *
XU ZHIWEI ET AL: ""Semi-supervised self-growing generative adversarial networks for image recognition"", 《MULTIMEDIA TOOLS AND APPLICATIONS》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113454649A (en) * 2021-06-17 2021-09-28 商汤国际私人有限公司 Target detection method, target detection device, electronic equipment and computer-readable storage medium
CN113454649B (en) * 2021-06-17 2024-05-24 商汤国际私人有限公司 Target detection method, apparatus, electronic device, and computer-readable storage medium
CN115514686A (en) * 2021-06-23 2022-12-23 深信服科技股份有限公司 Flow acquisition method and device, electronic equipment and storage medium
CN113627479A (en) * 2021-07-09 2021-11-09 中国科学院信息工程研究所 Graph data anomaly detection method based on semi-supervised learning
CN113627479B (en) * 2021-07-09 2024-02-20 中国科学院信息工程研究所 Graph data anomaly detection method based on semi-supervised learning
CN113780389A (en) * 2021-08-31 2021-12-10 中国人民解放军战略支援部队信息工程大学 Deep learning semi-supervised dense matching method and system based on consistency constraint
CN113780389B (en) * 2021-08-31 2023-05-26 中国人民解放军战略支援部队信息工程大学 Deep learning semi-supervised dense matching method and system based on consistency constraint
WO2023047164A1 (en) * 2021-09-22 2023-03-30 Sensetime International Pte. Ltd. Object sequence recognition method, network training method, apparatuses, device, and medium
CN113962737A (en) * 2021-10-26 2022-01-21 北京沃东天骏信息技术有限公司 Target recognition model training method and device, and target recognition method and device

Also Published As

Publication number Publication date
CN112926673B (en) 2023-01-17

Similar Documents

Publication Publication Date Title
CN112926673B (en) Semi-supervised target detection method based on consistency constraint
US11176715B2 (en) Method and system for color representation generation
CN108537269B (en) Weak interactive object detection deep learning method and system thereof
CN111738908B (en) Scene conversion method and system for generating countermeasure network by combining instance segmentation and circulation
EP3754549A1 (en) A computer vision method for recognizing an object category in a digital image
JP6612487B1 (en) Learning device, classification device, learning method, classification method, learning program, and classification program
CN111274981B (en) Target detection network construction method and device and target detection method
CN114067119B (en) Training method of panorama segmentation model, panorama segmentation method and device
JP6612486B1 (en) Learning device, classification device, learning method, classification method, learning program, and classification program
WO2022133627A1 (en) Image segmentation method and apparatus, and device and storage medium
US20180032806A1 (en) Producing a flowchart object from an image
CN114387608B (en) Table structure identification method combining convolution and graph neural network
KR20230073751A (en) System and method for generating images of the same style based on layout
CN115631374A (en) Control operation method, control detection model training method, device and equipment
Bakhtiarnia et al. PromptMix: Text-to-image diffusion models enhance the performance of lightweight networks
Singh et al. Automatic trimap and alpha-matte generation for digital image matting
CN109754416A (en) Image processing apparatus and method
Hsu et al. DensityLayout: Density-Conditioned Layout GAN for Visual-Textual Presentation Designs
Metri et al. Image generation using generative adversarial networks
KR20230162115A (en) Learning devices and learning methods
CN113065336A (en) Text automatic generation method and device based on deep learning and content planning
Liu et al. Anime Sketch Coloring with Swish-gated Residual U-net and Spectrally Normalized GAN.
Mütze et al. Semi-Supervised Domain Adaptation with CycleGAN Guided by Downstream Task Awareness.
Yu et al. Meta-simulation for the Automated Design of Synthetic Overhead Imagery
Stelling et al. " Just Drive": Colour Bias Mitigation for Semantic Segmentation in the Context of Urban Driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant