CN111091151A - Method for constructing a generative adversarial network for target detection data enhancement - Google Patents

Method for constructing a generative adversarial network for target detection data enhancement

Info

Publication number
CN111091151A
Authority
CN
China
Prior art keywords
image
objects
generator
target detection
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911301874.XA
Other languages
Chinese (zh)
Other versions
CN111091151B (en)
Inventor
王智慧
李豪杰
刘崇威
王世杰
唐涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN201911301874.XA
Publication of CN111091151A
Application granted
Publication of CN111091151B
Active legal status (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/251 Fusion techniques of input or preprocessed data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer image generation and provides a method for constructing a generative adversarial network (GAN) for target detection data enhancement. The method fuses Poisson fusion from traditional digital image processing with the generator of a GAN, so that the network can change the size, number, and position of detected objects in an image. A loss function is also specifically designed for the generator so that it produces better images. The method effectively alleviates the class imbalance problem in target detection tasks, allowing the trained detection model to achieve better performance. At the same time, a large amount of labor cost is saved by automatically amplifying a small dataset into a large one.

Description

Method for constructing a generative adversarial network for target detection data enhancement
Technical Field
The invention belongs to the field of computer image generation and relates to a method for constructing a generative adversarial network for target detection data enhancement.
Background
Data enhancement refers to adding more variation to the training data to improve the generalization ability of the trained model. Data enhancement strategies such as flipping and scaling are widely applied when training CNNs. In recent years, generative adversarial networks (GANs) have shown excellent performance in many image-to-image tasks (Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiri Matas. DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks. arXiv:1711.07064, Nov 2017.), and there has accordingly been work applying them to data enhancement. AugGAN (Sheng-Wei Huang, Che-Tsung Lin, Shu-Ping Chen, Yen-Yi Wu, Po-Hao Hsu, and Shang-Hong Lai. AugGAN: Cross domain adaptation with GAN-based data augmentation. In Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss, editors, Computer Vision - ECCV 2018, pages 731-744, Cham, 2018. Springer International Publishing.) has structure-aware semantic segmentation and soft weight sharing, so its generated images are realistic enough to train on. However, the ground truth this method requires contains instance masks, which are inconvenient to label. CycleGAN (Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision, 2017.) does not require paired training data, which reduces the difficulty of preparing the training set, and a number of works have used CycleGAN for data enhancement (Weijian Deng, Liang Zheng, Guoliang Kang, Yi Yang, Qixiang Ye, and Jianbin Jiao. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. arXiv:1711.07027, 2017.). However, the drawbacks of CycleGAN cannot be ignored: it tends to overfit when generating images, which affects the accuracy of the trained model.
Moreover, existing GAN-based data enhancement methods realize enhancement only by transforming the style of the image. While this helps image classification tasks, it works poorly for target detection, because the number, size, and location of objects in the image cannot be changed.
Disclosure of Invention
The invention aims to provide a method (Poisson GAN) for constructing a generative adversarial network for target detection data enhancement, which changes the size, number, and position of detected objects in an image by fusing Poisson fusion from traditional digital image processing with the generator of a GAN. A loss function is also specifically designed for the generator so that it produces better images. The method effectively alleviates the class imbalance problem in target detection tasks, allowing the trained detection model to achieve better performance.
The technical scheme of the invention is as follows:
a generative confrontation network for target detection data augmentation comprising the steps of:
1) Construct the Poisson fusion part of the generator; the flow is shown in Fig. 1 (left). Poisson fusion is embedded into the generator so that the number, position, or size of objects can be altered when generating an image. X, Y, and Z objects are selected from each class in the original dataset (assuming 3 classes of objects) to build an object set P. Each time an image is generated, x, y, and z objects are randomly selected from the set P to form a subset P_a:

P_a = {A1, ..., Ax, B1, ..., By, C1, ..., Cz}    (1)

where A, B, and C respectively denote the classes in the original dataset. P_a is then embedded into a temporary image T ∈ R^(3×720×405) to generate a source image S ∈ R^(3×720×405). To eliminate sharp boundaries, a mask M (each instance having its own mask) is automatically created from the embedding positions in S; it is then combined with a background image B ∈ R^(3×720×405) and the source image S to obtain a clone image C.
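As an illustration of this step, the following minimal Python sketch Poisson-blends randomly chosen object crops into a background image. OpenCV's seamlessClone is used here as the Poisson blending operator; the sampling counts, placement policy, and rectangular per-instance masks are simplifying assumptions, not the patented implementation.

```python
import random
import cv2
import numpy as np

def make_clone_image(background, object_pool, counts=(2, 3, 1)):
    """Poisson-blend randomly chosen objects of each class into `background`.

    background : HxWx3 uint8 BGR image (e.g. 720x405 as in the text above).
    object_pool: dict mapping class name -> list of HxWx3 uint8 object crops
                 (assumed smaller than the background and plentiful enough).
    counts     : how many objects of each class to embed, i.e. (x, y, z).
    """
    clone = background.copy()
    h, w = clone.shape[:2]
    for cls, n in zip(sorted(object_pool), counts):
        for obj in random.sample(object_pool[cls], n):
            oh, ow = obj.shape[:2]
            # random placement that keeps the crop fully inside the frame
            x0 = random.randint(1, w - ow - 1)
            y0 = random.randint(1, h - oh - 1)
            center = (x0 + ow // 2, y0 + oh // 2)
            # rectangular per-instance mask; the patent derives each
            # instance's mask from its embedding position in S
            mask = 255 * np.ones(obj.shape[:2], dtype=np.uint8)
            clone = cv2.seamlessClone(obj, clone, mask, center,
                                      cv2.NORMAL_CLONE)
    return clone
```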
2) Construct the network-learning part of the generator; the structure is shown in Fig. 1 (right). The network is built on the work of Ci et al. (Yuanzheng Ci, Xinzhu Ma, Zhihui Wang, Haojie Li, and Zhongxuan Luo. User-guided deep anime line art colorization with conditional adversarial networks. 2018 ACM Multimedia Conference on Multimedia - MM '18, 2018). The encoder is built from a stack of 3 × 3 convolutional layers, with U-Net as the backbone structure. The decoder is constructed from 4 ResNeXt blocks, denoted block n, n ∈ {1, …, 4}. In the experiments, block n is set to [20, 10, 10, 5].
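A hedged PyTorch sketch of such a generator is given below. Only the 3 × 3 convolutional encoder stack, the decoder of 4 ResNeXt block groups, and the depths [20, 10, 10, 5] come from the text; the channel widths, activations, and the omission of U-Net skip connections are simplifying assumptions.

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """Residual block with a grouped 3x3 convolution (assumed cardinality 32)."""
    def __init__(self, ch, groups=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1, groups=groups),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch, 1))
    def forward(self, x):
        return x + self.body(x)

def conv3x3(cin, cout, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1),
                         nn.LeakyReLU(0.2, inplace=True))

class Generator(nn.Module):
    def __init__(self, depths=(20, 10, 10, 5), width=256):
        super().__init__()
        # encoder: stacked 3x3 convolutions with four stride-2 downsamplings
        self.enc = nn.Sequential(conv3x3(3, 64, 2), conv3x3(64, 128, 2),
                                 conv3x3(128, 256, 2), conv3x3(256, width, 2))
        # decoder: block n, n in {1..4}, with [20, 10, 10, 5] ResNeXt blocks,
        # each group followed by a 2x upsampling back to input resolution
        dec = []
        for n_blocks in depths:
            dec += [ResNeXtBlock(width) for _ in range(n_blocks)]
            dec += [nn.Upsample(scale_factor=2), conv3x3(width, width)]
        self.dec = nn.Sequential(*dec)
        self.to_rgb = nn.Conv2d(width, 3, 3, padding=1)
    def forward(self, x):
        return torch.tanh(self.to_rgb(self.dec(self.enc(x))))
```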
3) Construct the discriminator, as shown in Fig. 1 (top). The discriminator is likewise made up of a stack of ResNeXt blocks and convolutional layers. The architecture is similar to the setup of SRGAN, with more layers added to process the 512 × 512-resolution input.
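A sketch of such a discriminator follows: an SRGAN-style stack of strided convolutions interleaved with the same ResNeXt blocks, extended to six downsampling stages so a 512 × 512 input reduces to a single realness score. The stage count and channel widths are assumptions.

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """Same grouped-convolution residual block as in the generator sketch."""
    def __init__(self, ch, groups=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1, groups=groups),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch, 1))
    def forward(self, x):
        return x + self.body(x)

class Discriminator(nn.Module):
    def __init__(self, widths=(32, 64, 128, 256, 512, 512)):
        super().__init__()
        layers, cin = [], 3
        for cout in widths:               # six stride-2 stages: 512 -> 8
            layers += [nn.Conv2d(cin, cout, 3, 2, 1),
                       nn.LeakyReLU(0.2, inplace=True),
                       ResNeXtBlock(cout)]
            cin = cout
        self.features = nn.Sequential(*layers)
        self.head = nn.Conv2d(cin, 1, 8)  # 8x8 feature map -> 1x1 score
    def forward(self, x):
        return self.head(self.features(x)).flatten(1)
```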
4) Set the loss function. For the discriminator loss, we follow the function proposed by Ci et al. For the generator loss, we define:

L_G = L_cont + λ1 · L_adv + L_reg    (2)

where λ1 is 1e-4, and L_cont and L_adv are the same as the setup in Ci et al. L_reg is defined by equation (3):

[Equation (3), defining L_reg, appears only as a formula image in the original document.]

where c, h, and w are the channel number, height, and width of the feature map, and M is a mask whose value is 100 on the fused portion and 0.1 in the other regions.
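A sketch of this generator loss in PyTorch. Because equation (3) survives only as an image, L_reg is written here as a mask-weighted L1 term normalized by c·h·w, an assumption consistent with the stated weights (100 on the fused portion, 0.1 elsewhere); L_cont and L_adv are left as callables following Ci et al.

```python
import torch

LAMBDA_1 = 1e-4  # lambda_1 from equation (2)

def reg_loss(fake, real, fuse_mask):
    """Assumed form of L_reg: mask-weighted L1 distance.

    fake, real: (N, c, h, w) tensors; fuse_mask: (N, 1, h, w) in {0, 1},
    marking the Poisson-fused portion of the image.
    """
    c, h, w = fake.shape[1:]
    weights = fuse_mask * 100.0 + (1.0 - fuse_mask) * 0.1
    return (weights * (fake - real).abs()).sum() / (c * h * w)

def generator_loss(fake, real, fuse_mask, content_loss, adv_loss):
    # L_G = L_cont + lambda_1 * L_adv + L_reg                  (2)
    return (content_loss(fake, real)
            + LAMBDA_1 * adv_loss(fake)
            + reg_loss(fake, real, fuse_mask))
```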
5) Generate the image pairs required for training. The two images in a pair need only differ in the edge information of the embedded portion for the generator to learn the mapping from fused image to normal image. An image pair is therefore created by overlaying objects in the images of the original dataset with objects of the same class automatically cropped from the clone image C. Given the appearance similarity within a species, the two images (original and processed) are almost identical except for the edge information, so the original image can be directly regarded as the real image and the processed image as the false image.
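A minimal sketch of this pairing step, assuming bounding-box annotations from the detection labels and a same-class crop already cut from a clone image C; only the pasted region's edges differ between the two returned images.

```python
import cv2

def make_training_pair(original, box, same_class_crop):
    """original: HxWx3 uint8 image from the original dataset;
    box: (x, y, w, h) of an annotated object in `original`;
    same_class_crop: object of the same class cut from a clone image C."""
    x, y, w, h = box
    fake = original.copy()
    # overlay the annotated object; only edge information now differs
    fake[y:y + h, x:x + w] = cv2.resize(same_class_crop, (w, h))
    return original, fake  # (real image, false image)
```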
The method can expand an original dataset of roughly 1000 to 2000 images to the scale of tens of thousands, saving a great deal of manual labeling cost.
Drawings
Fig. 1 is a diagram of a network architecture of the present invention.
Fig. 2 shows generation results of the invention on the UDD dataset: (a) the original image; (b) the image after Poisson fusion; (c) the final generated image.
Fig. 3 shows an image enhanced by the Copy-Pasting method on UDD: (a) the input; (b) the output.
Fig. 4 shows images enhanced by other style-transfer GANs on UDD: (a) CycleGAN; (b) StarGAN.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following detailed description of the embodiments of the present invention is provided.
Our Poisson GAN is implemented in PyTorch. Training and inference use an input size of 512 × 512. We use the Adam optimizer, initializing the learning rate to 1e-4 in both the generator and the discriminator, and reducing it to 1e-5 after 125K iterations. Experiments were performed on a single NVIDIA TITAN Xp GPU with a batch size of 4. The UDD dataset is used as the original dataset for data enhancement. UDD is a real sea-farm target detection dataset comprising 2227 images (1827 for training, 400 for testing) of three detection targets: sea cucumber, sea urchin, and scallop.
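The optimization schedule above might be set up as follows in PyTorch; only Adam, the 1e-4 initial learning rate, and the drop to 1e-5 after 125K iterations come from the text, and the scheduler is assumed to be stepped once per training iteration rather than per epoch.

```python
import torch

def build_optimizers(generator, discriminator):
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
    # 1e-4 -> 1e-5 after 125K iterations (gamma = 0.1)
    sched_g = torch.optim.lr_scheduler.MultiStepLR(opt_g, [125_000], gamma=0.1)
    sched_d = torch.optim.lr_scheduler.MultiStepLR(opt_d, [125_000], gamma=0.1)
    return opt_g, opt_d, sched_g, sched_d
```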
To construct the object set P, we cut 1000 sea cucumbers, 150 sea urchins, and 35 scallops from UDD and fused them into background images with Poisson GAN. Images from UDD are used as the background images, so the composites are more realistic and can serve as a complement to UDD. The resulting dataset, which we name AUDD, contains 18661 images with 18350 sea cucumbers, 101422 sea urchins, and 9624 scallops.
We also put instances and background images cut from the NSFC dataset (another underwater target detection dataset) through Poisson fusion to generate clone images and construct a pre-training dataset. The main purpose of this dataset is to make the detector more robust for the automatic grasping process; it contains up to 589080 images covering different background colors, viewing angles, and terrains.
Fig. 4 shows images generated by CycleGAN and StarGAN. Both achieve data enhancement by changing the background color. As mentioned previously, they cannot solve the class imbalance problem; moreover, some small objects may disappear during the transformation, which is detrimental to training a target detection model. In contrast, Poisson GAN can change the locations of objects while retaining all of them.
The Copy-Pasting method is a small-object data enhancement method that uses instance segmentation masks to copy small objects from their original locations to other locations, as shown in Fig. 3. Unfortunately, this approach applies only to datasets with instance segmentation masks and is therefore difficult to use on UDD. It also performs no extra smoothing on the edges of pasted objects, which reduces the quality of the generated images and makes the composites noticeably unnatural.
We train YOLOv3 on the expanded datasets to demonstrate the effectiveness of Poisson GAN. The detector is first trained for 70 epochs on the pre-training dataset and then trained on AUDD from the pre-trained parameters. Test results on the UDD test set are shown in Table 1. Compared with the results in the third row of Table 2, the model improves mAP significantly (by about 30%), largely alleviating the class imbalance problem.
In addition, we performed experiments comparing different initializations. YOLOv3 was trained on AUDD using random initialization, an ImageNet pre-trained model, and the model pre-trained on our pre-training dataset, then tested on the UDD test set; results are shown in Table 2. Clearly, AUDD helps solve both insufficient training data and class imbalance. With our pre-trained model, YOLOv3 achieves better accuracy than random initialization (+12%) and ImageNet pre-training (+9%). Overall, Poisson GAN improves the accuracy of YOLOv3 by 33.7% compared with the original results on UDD.
TABLE 1 UDD accuracy for different detection networks
[Table 1 appears only as an image in the original document.]
TABLE 2 accuracy of YOLOv3 on AUDD with different initialization modes
[Table 2 appears only as an image in the original document.]
While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (2)

1. A method for constructing a generative adversarial network for target detection data enhancement, the method comprising the steps of:
1) building the Poisson fusion part of the generator: Poisson fusion is embedded into the generator to alter the number, position, or size of objects when generating an image; assuming 3 classes of objects, X, Y, and Z objects are respectively selected from each class in the original dataset to establish an object set P; each time an image is generated, x, y, and z objects are randomly selected from the set P to form a subset P_a:

P_a = {A1, ..., Ax, B1, ..., By, C1, ..., Cz}    (1)

where A, B, and C respectively denote the classes in the original dataset; P_a is then embedded into a temporary image T ∈ R^(3×720×405) to generate a source image S ∈ R^(3×720×405); to eliminate sharp boundaries, a mask M is automatically created from the embedding positions in S and then combined with a background image B ∈ R^(3×720×405) and the source image S to obtain a clone image C;
2) building the network-learning part of the generator: the encoder is built from a stack of 3 × 3 convolutional layers with U-Net as the backbone structure; the decoder is constructed from 4 ResNeXt blocks, denoted block n, n ∈ {1, …, 4};
3) building the discriminator: the discriminator is likewise made up of a stack of ResNeXt blocks and convolutional layers; the architecture follows the setup of SRGAN, with more layers added to handle the 512 × 512-resolution input;
4) setting the loss function; the generator loss is defined as:

L_G = L_cont + λ1 · L_adv + L_reg    (2)

where λ1 is 1e-4 and L_reg is defined by equation (3):

[Equation (3), defining L_reg, appears only as a formula image in the original document.]

where c, h, and w are the channel number, height, and width of the feature map, and M is a mask taking the value 100 on the fused portion and 0.1 in the other regions;
5) generating the image pairs required for training: the two images in a pair need only differ in the edge information of the embedded portion, so that the generator learns the mapping from fused image to normal image; an image pair is created by overlaying objects in the images of the original dataset with same-class objects automatically cropped from the clone image C; given the appearance similarity within a species, the original image and the processed image are almost identical except for the edge information, so the original image is directly regarded as the real image and the processed image as the false image.
2. The method for constructing a generative adversarial network for target detection data enhancement as claimed in claim 1, wherein the decoder is constructed from the 4 ResNeXt blocks with block n set to [20, 10, 10, 5].
CN201911301874.XA 2019-12-17 2019-12-17 Construction method of a generative adversarial network for target detection data enhancement Active CN111091151B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911301874.XA CN111091151B (en) 2019-12-17 2019-12-17 Construction method of a generative adversarial network for target detection data enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911301874.XA CN111091151B (en) 2019-12-17 2019-12-17 Construction method of a generative adversarial network for target detection data enhancement

Publications (2)

Publication Number Publication Date
CN111091151A true CN111091151A (en) 2020-05-01
CN111091151B CN111091151B (en) 2021-11-05

Family

ID=70395675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911301874.XA Active CN111091151B (en) 2019-12-17 2019-12-17 Construction method of a generative adversarial network for target detection data enhancement

Country Status (1)

Country Link
CN (1) CN111091151B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832443A (en) * 2020-06-28 2020-10-27 华中科技大学 Construction method and application of construction violation detection model
CN112800906A (en) * 2021-01-19 2021-05-14 吉林大学 Improved YOLOv 3-based cross-domain target detection method for automatic driving automobile
US11785262B1 (en) 2022-03-16 2023-10-10 International Business Machines Corporation Dynamic compression of audio-visual data
CN117409192A (en) * 2023-12-14 2024-01-16 武汉大学 Data enhancement-based infrared small target detection method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018053340A1 (en) * 2016-09-15 2018-03-22 Twitter, Inc. Super resolution using a generative adversarial network
EP3404586A1 (en) * 2017-05-18 2018-11-21 INTEL Corporation Novelty detection using discriminator of generative adversarial network
US20190057520A1 (en) * 2017-08-18 2019-02-21 Synapse Technology Corporation Generating Synthetic Image Data
CN109377448A (en) * 2018-05-20 2019-02-22 北京工业大学 A kind of facial image restorative procedure based on generation confrontation network
CN109409274A (en) * 2018-10-18 2019-03-01 广州云从人工智能技术有限公司 A kind of facial image transform method being aligned based on face three-dimensional reconstruction and face
CN109993825A (en) * 2019-03-11 2019-07-09 北京工业大学 A kind of three-dimensional rebuilding method based on deep learning
CN110222628A (en) * 2019-06-03 2019-09-10 电子科技大学 A kind of face restorative procedure based on production confrontation network
CN110378985A (en) * 2019-07-19 2019-10-25 中国传媒大学 A kind of animation drawing auxiliary creative method based on GAN
US20190332894A1 (en) * 2018-08-10 2019-10-31 Baidu Online Network Technology (Beijing) Co., Ltd. Method for Processing Automobile Image Data, Apparatus, and Readable Storage Medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018053340A1 (en) * 2016-09-15 2018-03-22 Twitter, Inc. Super resolution using a generative adversarial network
EP3404586A1 (en) * 2017-05-18 2018-11-21 INTEL Corporation Novelty detection using discriminator of generative adversarial network
US20190057520A1 (en) * 2017-08-18 2019-02-21 Synapse Technology Corporation Generating Synthetic Image Data
CN109377448A (en) * 2018-05-20 2019-02-22 北京工业大学 A kind of facial image restorative procedure based on generation confrontation network
US20190332894A1 (en) * 2018-08-10 2019-10-31 Baidu Online Network Technology (Beijing) Co., Ltd. Method for Processing Automobile Image Data, Apparatus, and Readable Storage Medium
CN109409274A (en) * 2018-10-18 2019-03-01 广州云从人工智能技术有限公司 A kind of facial image transform method being aligned based on face three-dimensional reconstruction and face
CN109993825A (en) * 2019-03-11 2019-07-09 北京工业大学 A kind of three-dimensional rebuilding method based on deep learning
CN110222628A (en) * 2019-06-03 2019-09-10 电子科技大学 A kind of face restorative procedure based on production confrontation network
CN110378985A (en) * 2019-07-19 2019-10-25 中国传媒大学 A kind of animation drawing auxiliary creative method based on GAN

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CHRISTIAN LEDIG et al.: "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network", https://arxiv.org/abs/1609.04802 *
HUIKAI WU et al.: "Towards Realistic High-Resolution Image Blending", https://arxiv.org/pdf/1703.07195.pdf *
LANLAN LIU et al.: "Generative Modeling for Small-Data Object Detection", https://arxiv.org/abs/1910.07169 *
YUANZHENG CI et al.: "User-Guided Deep Anime Line Art Colorization with Conditional Adversarial Networks", https://arxiv.org/abs/1808.03240 *
SUN Liang et al.: "Multi-view learning and reconstruction algorithms based on generative adversarial networks" (基于生成对抗网络的多视图学习与重构算法), Acta Automatica Sinica (自动化学报) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832443A (en) * 2020-06-28 2020-10-27 华中科技大学 Construction method and application of construction violation detection model
CN112800906A (en) * 2021-01-19 2021-05-14 吉林大学 Improved YOLOv 3-based cross-domain target detection method for automatic driving automobile
US11785262B1 (en) 2022-03-16 2023-10-10 International Business Machines Corporation Dynamic compression of audio-visual data
CN117409192A (en) * 2023-12-14 2024-01-16 武汉大学 Data enhancement-based infrared small target detection method and device
CN117409192B (en) * 2023-12-14 2024-03-08 武汉大学 Data enhancement-based infrared small target detection method and device

Also Published As

Publication number Publication date
CN111091151B (en) 2021-11-05

Similar Documents

Publication Publication Date Title
CN111091151B (en) Construction method of a generative adversarial network for target detection data enhancement
Li et al. AADS: Augmented autonomous driving simulation using data-driven algorithms
Lu et al. From depth what can you see? Depth completion via auxiliary image reconstruction
CN101714262B (en) Method for reconstructing three-dimensional scene of single image
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN109829391B (en) Significance target detection method based on cascade convolution network and counterstudy
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN108416751A (en) A kind of new viewpoint image combining method assisting full resolution network based on depth
Zhang et al. Personal photograph enhancement using internet photo collections
Xiao et al. Single image dehazing based on learning of haze layers
CN115049556A (en) StyleGAN-based face image restoration method
CN114581571A (en) Monocular human body reconstruction method and device based on IMU and forward deformation field
CN115272437A (en) Image depth estimation method and device based on global and local features
Ali et al. Single image Façade segmentation and computational rephotography of House images using deep learning
Yang et al. Underwater self-supervised depth estimation
CN114793457A (en) Apparatus and method for improving the process of determining a depth map, relative pose or semantic segmentation
Haji-Esmaeili et al. Large-scale monocular depth estimation in the wild
Goncalves et al. Guidednet: Single image dehazing using an end-to-end convolutional neural network
CN111738061A (en) Binocular vision stereo matching method based on regional feature extraction and storage medium
CN116958393A (en) Incremental image rendering method and device
Berenguel-Baeta et al. Fredsnet: Joint monocular depth and semantic segmentation with fast fourier convolutions from single panoramas
Zhang et al. Capitalizing on RGB-FIR hybrid imaging for road detection
Jain et al. Enhanced stable view synthesis
Feng et al. Foreground-aware dense depth estimation for 360 images
Qiu et al. Multi-scale Fusion for Visible Watermark Removal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant