CN111091151B - Construction method of a generative adversarial network for target detection data enhancement

Construction method of a generative adversarial network for target detection data enhancement

Info

Publication number
CN111091151B
CN111091151B (application CN201911301874.XA; published as CN111091151A)
Authority
CN
China
Prior art keywords
image
objects
generator
target detection
adversarial network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911301874.XA
Other languages
Chinese (zh)
Other versions
CN111091151A (en)
Inventor
王智慧 (Zhihui Wang)
李豪杰 (Haojie Li)
刘崇威 (Chongwei Liu)
王世杰 (Shijie Wang)
唐涛 (Tao Tang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201911301874.XA
Publication of CN111091151A
Application granted
Publication of CN111091151B
Legal status: Active
Anticipated expiration

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/251Fusion techniques of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer image generation and provides a method for constructing a generative adversarial network for target detection data enhancement. The method fuses Poisson fusion from traditional digital image processing with the generator of a generative adversarial network, so that the network can change the size, number, and position of detected objects in a picture. A loss function is also specifically designed for the generator so that it generates better pictures. The method can effectively alleviate the class-imbalance problem in target detection tasks, so that the trained detection model achieves better performance. At the same time, automatically amplifying a small dataset into a large one saves a great deal of labor cost.

Description

Construction method of a generative adversarial network for target detection data enhancement
Technical Field
The invention belongs to the field of computer image generation and relates to a method for constructing a generative adversarial network for target detection data enhancement.
Background
Data enhancement refers to adding more variation to the training data to improve the generalization ability of the trained model. Data enhancement strategies such as flipping and scaling are widely applied in training CNNs. In recent years, generative adversarial networks (GANs) have shown excellent performance in many image-to-image tasks (Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiri Matas. DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks. arXiv:1711.07064, Nov 2017). Accordingly, GANs have been applied to data enhancement; AugGAN (Sheng-Wei Huang, Che-Tsung Lin, Shu-Ping Chen, Yen-Yi Wu, Po-Hao Hsu, and Shang-Hong Lai. AugGAN: Cross Domain Adaptation with GAN-based Data Augmentation. In Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss, editors, Computer Vision - ECCV 2018, pages 731-744, Cham, 2018. Springer International Publishing) combines structure-aware semantic segmentation with soft weight sharing, so the images it produces are sufficiently realistic for training. However, the ground truth used by this method contains instance masks, which are inconvenient to annotate. CycleGAN (Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In IEEE International Conference on Computer Vision, 2017) requires no paired training data, and a number of works have used CycleGAN for data enhancement (Weijian Deng, Liang Zheng, Guoliang Kang, Yi Yang, Qixiang Ye, and Jianbin Jiao. Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification. arXiv:1711.07027, 2017). However, the drawbacks of CycleGAN cannot be ignored: it tends to overfit when generating images, which degrades the accuracy of the trained model.
Moreover, existing GAN-based data enhancement methods achieve enhancement only by transforming the style of the image. While helpful for image classification, this approach does not work well for target detection, because it cannot change the number, size, or location of objects in the image.
Disclosure of Invention
The invention aims to provide a construction method (Poisson GAN) for a generative adversarial network for target detection data enhancement, which fuses Poisson fusion from traditional digital image processing with the generator of a generative adversarial network so that the size, number, and position of detected objects in a picture can be changed. A loss function is also specifically designed for the generator so that it generates better pictures. The method can effectively alleviate the class-imbalance problem in target detection tasks, so that the trained detection model achieves better performance.
The technical scheme of the invention is as follows:
a method of constructing a spanning confrontation network for target detection data augmentation comprising the steps of:
1) Construct the Poisson fusion part of the generator; the flow is shown in Fig. 1 (left). We embed Poisson fusion into the generator so that it can alter the number, position, or size of objects when generating a picture. Assuming 3 classes of objects, we select X, Y, and Z objects from the original dataset and build an object set P. Each time a picture is generated, x, y, and z objects are randomly selected from the set P to form a subset P_a:
P_a = {A_1, ..., A_x, B_1, ..., B_y, C_1, ..., C_z}    (1)
where A, B, and C denote the categories in the original dataset. P_a is then embedded into a temporary image T ∈ R^(3×720×405) to generate a source image S ∈ R^(3×720×405). To eliminate sharp boundaries, a mask M (each instance having its own mask) is automatically created from the embedded positions in S; the source image S is then combined with a background image B ∈ R^(3×720×405) to obtain a clone image C.
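For illustration only, a minimal Python sketch of this fusion step follows; it uses OpenCV's seamlessClone as the Poisson solver and, for brevity, blends each cropped object directly into the background rather than materializing the intermediate images T and S. All class names and sampling counts are placeholder assumptions, not part of the disclosed method.

import random
import cv2
import numpy as np

def sample_subset(object_set, counts):
    # Randomly draw x, y, z cropped objects per class to form the subset P_a.
    subset = []
    for cls, n in counts.items():
        subset += random.sample(object_set[cls], k=n)
    return subset

def poisson_embed(background, crops):
    # Blend each cropped object into the background at a random position
    # via Poisson fusion, yielding a clone image C without sharp boundaries.
    clone = background.copy()
    h, w = clone.shape[:2]
    for crop in crops:
        ch, cw = crop.shape[:2]
        cx = random.randint(cw // 2 + 1, w - cw // 2 - 1)  # keep crop inside image
        cy = random.randint(ch // 2 + 1, h - ch // 2 - 1)
        mask = 255 * np.ones(crop.shape[:2], dtype=np.uint8)  # one mask per instance
        clone = cv2.seamlessClone(crop, clone, mask, (cx, cy), cv2.NORMAL_CLONE)
    return clone

# Example with classes {A: sea cucumber, B: sea urchin, C: scallop}:
# clone = poisson_embed(bg, sample_subset(P, {"A": 2, "B": 5, "C": 1}))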
2) Construct the network learning part of the generator; the structure is shown in Fig. 1 (right). We build our network on the basis of the work of Ci et al. (Yuanzheng Ci, Xinzhu Ma, Zhihui Wang, Haojie Li, and Zhongxuan Luo. User-Guided Deep Anime Line Art Colorization with Conditional Adversarial Networks. In 2018 ACM Multimedia Conference (MM '18), 2018). The encoder is built from stacks of 3 × 3 convolutional layers, with U-Net as the backbone structure. The decoder is built from 4 stages of ResNeXt blocks, denoted block n, n ∈ {1, ..., 4}. In our experiments, block n is set to [20, 10, 10, 5].
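As a rough PyTorch sketch of this generator (only the 3 × 3 encoder convolutions, the U-Net backbone, and the [20, 10, 10, 5] decoder stages are taken from the text; the channel widths, cardinality, and simplified ResNeXt block are assumptions of the sketch, not the disclosed implementation):

import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    # Simplified residual block with grouped 3x3 convolutions (the grouped
    # convolution stands in for ResNeXt's aggregated transformations).
    def __init__(self, ch, cardinality=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, groups=cardinality),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1, groups=cardinality))

    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    def __init__(self, blocks=(20, 10, 10, 5), chs=(64, 128, 256, 512)):
        super().__init__()
        # Encoder: stride-2 3x3 convolutions (the U-Net contracting path).
        self.enc = nn.ModuleList()
        in_ch = 3
        for ch in chs:
            self.enc.append(nn.Sequential(
                nn.Conv2d(in_ch, ch, 3, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True)))
            in_ch = ch
        # Decoder: four stages of ResNeXt blocks ("block n" = 20/10/10/5),
        # each followed by upsampling back toward full resolution.
        self.dec = nn.ModuleList()
        for n, ch in zip(blocks, reversed(chs)):
            out_ch = ch // 2 if ch // 2 >= chs[0] else 3
            self.dec.append(nn.Sequential(
                *[ResNeXtBlock(ch) for _ in range(n)],
                nn.Upsample(scale_factor=2, mode='nearest'),
                nn.Conv2d(ch, out_ch, 3, padding=1)))

    def forward(self, x):
        skips = []
        for enc in self.enc:
            x = enc(x)
            skips.append(x)
        skips.pop()  # the deepest feature is x itself; no skip needed there
        for dec in self.dec:
            x = dec(x)
            if skips:
                x = x + skips.pop()  # U-Net skip connection
        return torch.tanh(x)

The generator takes the fused clone image C as input and outputs a natural-looking image; Generator()(torch.randn(1, 3, 512, 512)) runs end to end at the stated 512 × 512 resolution.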
3) Construct the discriminator, as shown in Fig. 1 (top). The discriminator is likewise composed of a stack of ResNeXt blocks and convolutional layers. The architecture is similar to that of SRGAN, with additional layers to process 512 × 512 inputs.
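A correspondingly hedged sketch of the discriminator (an SRGAN-style strided-convolution ladder interleaved with ResNeXt blocks; the depths and widths below are assumptions chosen only so a 512 × 512 input reduces to a single realness score; ResNeXtBlock is the one from the generator sketch above):

import torch.nn as nn  # ResNeXtBlock as defined in the generator sketch

class Discriminator(nn.Module):
    def __init__(self, chs=(32, 64, 128, 256, 512)):
        super().__init__()
        layers, in_ch = [], 3
        for ch in chs:  # five stride-2 stages: 512 -> 16
            layers += [nn.Conv2d(in_ch, ch, 3, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True),
                       ResNeXtBlock(ch)]
            in_ch = ch
        # The "more layers" for 512 x 512 inputs: two extra stride-2 stages,
        # then global pooling and a single realness logit.
        layers += [nn.Conv2d(in_ch, in_ch, 3, stride=2, padding=1),
                   nn.LeakyReLU(0.2, inplace=True),
                   nn.Conv2d(in_ch, in_ch, 3, stride=2, padding=1),
                   nn.LeakyReLU(0.2, inplace=True),
                   nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                   nn.Linear(in_ch, 1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)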
4) Set the loss functions. For the discriminator, we follow the loss function proposed by Ci et al. For the generator, we define:
L_G = L_cont + λ_1 · L_adv + L_reg    (2)
where λ_1 is 1e-4, L_cont and L_adv are the same as in Ci et al., and L_reg is defined as:
L_reg = (1 / (c·h·w)) · Σ M ⊙ |G(x) − y|    (3)
(Equation (3) appears only as an image in the original document; the mask-weighted pixel-wise form above is reconstructed from the surrounding definitions.)
where c, h, and w are the number of channels, height, and width of the feature map, and M is a mask that takes the value 100 in the fused region and 0.1 in other regions.
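The generator loss of equations (2) and (3) might be sketched as follows, with L_cont and L_adv abstracted as callables following Ci et al., and L_reg read as the mask-weighted per-pixel L1 penalty implied by the text (this reading is an assumption):

import torch

def make_region_mask(fused_region, high=100.0, low=0.1):
    # Weight map M: 100 inside the Poisson-fused region, 0.1 elsewhere.
    # fused_region is a {0,1} tensor broadcastable to (N, C, H, W).
    return fused_region * high + (1.0 - fused_region) * low

def region_loss(fake, real, mask):
    # L_reg: mask-weighted mean absolute error, normalized by c*h*w (Eq. 3).
    n, c, h, w = fake.shape
    return (mask * (fake - real).abs()).sum() / (n * c * h * w)

def generator_loss(fake, real, mask, content_loss, adv_loss, lam1=1e-4):
    # L_G = L_cont + lambda_1 * L_adv + L_reg  (Eq. 2)
    return (content_loss(fake, real)
            + lam1 * adv_loss(fake)
            + region_loss(fake, real, mask))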
5) Generate the image pairs required for training. The two images in a pair need only differ in the edge information of the embedded portion for the generator to learn the mapping from a fused image to a normal image. We therefore create an image pair by overlaying the objects in an original-dataset image with same-class objects automatically cropped from the clone image C. Given the appearance similarity within a species, the two images (the original image and the processed image) are almost identical except for the edge information, so the original image can be regarded directly as the real image and the processed image as the fake image.
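A minimal sketch of this pair construction (the bounding-box annotation format is an assumed placeholder):

import cv2

def make_pair(original, crops_by_class, boxes):
    # boxes: assumed list of (cls, x1, y1, x2, y2) annotations in `original`.
    # Returns (real, fake): the fake differs only at the pasted objects' edges.
    fake = original.copy()
    for cls, x1, y1, x2, y2 in boxes:
        crop = crops_by_class[cls].pop()             # same-class crop from clone image C
        crop = cv2.resize(crop, (x2 - x1, y2 - y1))  # fit the annotated box
        fake[y1:y2, x1:x2] = crop                    # hard paste; edges stay sharp
    return original, fake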
The method can expand an original dataset of roughly one to two thousand images to tens of thousands of images, saving a great deal of manual labeling cost.
Drawings
Fig. 1 is a diagram of a network architecture of the present invention.
Fig. 2 shows generation results on the UDD dataset: (a) the original image; (b) the image after Poisson fusion; (c) the final generated image.
Fig. 3 shows an enhanced image produced by the Copy-Pasting method on UDD: (a) the input; (b) the output.
Fig. 4 shows enhanced images produced by other style-transfer GANs on UDD: (a) CycleGAN; (b) StarGAN.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following detailed description of the embodiments of the present invention is provided.
Our Poisson GAN is implemented in PyTorch. We train and run inference with an input size of 512 × 512. We use the Adam optimizer, initializing the learning rate to 1e-4 for both the generator and the discriminator and reducing it to 1e-5 after 125K iterations. Our experiments are performed on a single NVIDIA TITAN XP GPU with a batch size of 4. We use the UDD dataset as the original dataset for data enhancement. UDD is a real-world marine-farm target detection dataset comprising 2227 pictures (1827 for training, 400 for testing) with three detection targets: sea cucumber, sea urchin, and scallop.
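This schedule corresponds to training scaffolding along the following lines (a sketch under the stated settings; the loss callables and data loader are placeholders, and Generator/Discriminator refer to the sketches above):

import torch

def train(G, D, loader, g_loss_fn, d_loss_fn, max_iters=250_000):
    # Adam at 1e-4 for both networks, decayed to 1e-5 after 125K iterations;
    # `loader` is assumed to yield (fused, real) batches of four 512x512 pairs.
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
    for it, (fused, real) in enumerate(loader):
        if it == 125_000:  # learning-rate decay point stated in the text
            for opt in (opt_g, opt_d):
                for group in opt.param_groups:
                    group['lr'] = 1e-5
        fake = G(fused)
        opt_d.zero_grad()  # discriminator step on detached fakes
        d_loss_fn(D, fake.detach(), real).backward()
        opt_d.step()
        opt_g.zero_grad()  # generator step
        g_loss_fn(D, fake, real).backward()
        opt_g.step()
        if it + 1 >= max_iters:
            break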
To construct the object set P, we crop 1000 sea cucumbers, 150 sea urchins, and 35 scallops from UDD and then fuse them into background images with Poisson GAN. Images from UDD are used as the background images, so the synthesized images are more realistic and can serve as a complement to UDD. The resulting dataset, which we name AUDD, contains 18661 images with 18350 sea cucumbers, 101422 sea urchins, and 9624 scallops.
We also put instances and background images cropped from the NSFC-dataset (another underwater target detection dataset) through Poisson fusion to generate clone images and construct a pre-training dataset. The main purpose of this dataset is to make the detector more robust during automatic grabbing; it contains up to 589080 images with different background colors, viewing angles, and terrains.
Fig. 4 shows images generated by CycleGAN and StarGAN. Both achieve data enhancement only by changing the background color and, as mentioned above, cannot solve the class-imbalance problem. Moreover, some small objects may disappear during the transformation, which is harmful when training a target detection model. In contrast, Poisson GAN can change the locations of objects while retaining all of them.
The Copy-Pasting method is a small-object data enhancement method that uses instance segmentation masks to copy small objects from their original locations to other locations, as shown in Fig. 3. Unfortunately, this approach applies only to datasets with instance segmentation masks and is therefore difficult to use on UDD. It also performs no extra smoothing on the edges of pasted objects, which lowers the quality of the generated images and makes the composites noticeably unnatural.
We use the expanded datasets to train YOLOv3 to demonstrate the effectiveness of Poisson GAN. The detector is first trained for 70 epochs on the pre-training dataset and then trained on AUDD starting from the pre-trained parameters. The test results on the UDD test set are shown in Table 1. Compared with the results in the third row of Table 2, the model improves the mAP significantly (by about 30%), largely alleviating the class-imbalance problem.
In addition, we compare the performance of different initializations. We train YOLOv3 on AUDD using random initialization, an ImageNet pre-trained model, and the model pre-trained on our pre-training dataset, and test on the UDD test set; the results are shown in Table 2. Clearly, AUDD helps address both insufficient training data and class imbalance. With our pre-trained model, YOLOv3 achieves better accuracy than with random initialization (+12%) or ImageNet pre-training (+9%). Overall, Poisson GAN improves the accuracy of YOLOv3 by 33.7% compared with the original results on UDD.
TABLE 1 UDD accuracy for different detection networks
[Table 1 is reproduced only as an image in the original publication.]
TABLE 2 accuracy of YOLOv3 on AUDD with different initialization modes
[Table 2 is reproduced only as an image in the original publication.]
While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (2)

1. A method for constructing a generative adversarial network for target detection data enhancement, the method comprising the steps of:
1) building the Poisson fusion part of a generator: embedding Poisson fusion into the generator to alter the number, position, or size of objects when generating a picture; assuming 3 classes of objects, selecting X, Y, and Z objects respectively from each class of objects in the original dataset and then establishing an object set P; each time a picture is generated, randomly selecting x, y, and z objects from the set P to form a subset
P_a = {N_1, ..., N_x, O_1, ..., O_y, U_1, ..., U_z}    (1)
wherein N, O, and U respectively represent the categories in the original dataset; P_a is then embedded into a temporary image T ∈ R^(3×720×405) to generate a source image S ∈ R^(3×720×405); to eliminate sharp boundaries, a mask M is automatically created from the embedded positions in S and then combined with a background image B ∈ R^(3×720×405) and the source image S to obtain a clone image C;
2) building the network learning part of the generator: building the encoder using stacks of 3 × 3 convolutional layers with U-Net as the backbone structure; for the decoder, using 4 stages of ResNeXt blocks, denoted block n, n ∈ {1, ..., 4};
3) building a discriminator: the discriminator is likewise made up of a stack of ResNeXt blocks and convolutional layers; the architecture follows the setup of SRGAN, with more layers added to handle 512 × 512 inputs;
4) setting a loss function; the loss function of the generator is defined as: L_G = L_cont + λ_1 · L_adv + L_reg    (2)
wherein λ_1 is 1e-4, and L_reg is defined as:
L_reg = (1 / (c·h·w)) · Σ M ⊙ |G(x) − y|    (3)
(as in the description, equation (3) appears only as an image in the original document; the form above is reconstructed from the surrounding definitions)
wherein c, h, and w are the number of channels, height, and width of the feature map, and M is a mask taking the value 100 in the fused region and 0.1 in other regions;
5) generating the image pairs required for training: the two images in an image pair need only differ in the edge information of the embedded portion, causing the generator to learn the mapping from the fused image to the normal image; an image pair is created by overlaying the objects in an original-dataset image with same-class objects automatically cropped from the clone image C; in view of the appearance similarity within a species, the original image and the processed image are almost identical except for the edge information, so the original image is directly regarded as the real image and the processed image as the fake image.
2. The method of claim 1, wherein the decoder is constructed from 4 stages of ResNeXt blocks, with block n set to [20, 10, 10, 5].
CN201911301874.XA 2019-12-17 2019-12-17 Construction method of a generative adversarial network for target detection data enhancement Active CN111091151B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911301874.XA CN111091151B (en) 2019-12-17 2019-12-17 Construction method of a generative adversarial network for target detection data enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911301874.XA CN111091151B (en) 2019-12-17 2019-12-17 Construction method of a generative adversarial network for target detection data enhancement

Publications (2)

Publication Number Publication Date
CN111091151A CN111091151A (en) 2020-05-01
CN111091151B (en) 2021-11-05

Family

ID=70395675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911301874.XA Active CN111091151B (en) 2019-12-17 2019-12-17 Construction method of a generative adversarial network for target detection data enhancement

Country Status (1)

Country Link
CN (1) CN111091151B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832443B (en) * 2020-06-28 2022-04-12 华中科技大学 Construction method and application of construction violation detection model
CN112800906B (en) * 2021-01-19 2022-08-30 吉林大学 Improved YOLOv 3-based cross-domain target detection method for automatic driving automobile
US11785262B1 (en) 2022-03-16 2023-10-10 International Business Machines Corporation Dynamic compression of audio-visual data
US12079912B2 (en) 2022-11-10 2024-09-03 International Business Machines Corporation Enhancing images in text documents
CN117409192B (en) * 2023-12-14 2024-03-08 武汉大学 Data enhancement-based infrared small target detection method and device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018053340A1 (en) * 2016-09-15 2018-03-22 Twitter, Inc. Super resolution using a generative adversarial network
US20180336439A1 (en) * 2017-05-18 2018-11-22 Intel Corporation Novelty detection using discriminator of generative adversarial network
US10210631B1 (en) * 2017-08-18 2019-02-19 Synapse Technology Corporation Generating synthetic image data
CN109377448B (en) * 2018-05-20 2021-05-07 北京工业大学 Face image restoration method based on generation countermeasure network
CN109190504B (en) * 2018-08-10 2020-12-22 百度在线网络技术(北京)有限公司 Automobile image data processing method and device and readable storage medium
CN109409274B (en) * 2018-10-18 2020-09-04 四川云从天府人工智能科技有限公司 Face image transformation method based on face three-dimensional reconstruction and face alignment
CN109993825B (en) * 2019-03-11 2023-06-20 北京工业大学 Three-dimensional reconstruction method based on deep learning
CN110222628A (en) * 2019-06-03 2019-09-10 电子科技大学 A kind of face restorative procedure based on production confrontation network
CN110378985B (en) * 2019-07-19 2023-04-28 中国传媒大学 Animation drawing auxiliary creation method based on GAN

Also Published As

Publication number Publication date
CN111091151A (en) 2020-05-01

Similar Documents

Publication Publication Date Title
CN111091151B (en) Construction method of a generative adversarial network for target detection data enhancement
Pittaluga et al. Revealing scenes by inverting structure from motion reconstructions
Li et al. AADS: Augmented autonomous driving simulation using data-driven algorithms
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN111292264B (en) Image high dynamic range reconstruction method based on deep learning
CN110111236B (en) Multi-target sketch image generation method based on progressive confrontation generation network
CN109919209B (en) Domain self-adaptive deep learning method and readable storage medium
CN109829391B (en) Significance target detection method based on cascade convolution network and counterstudy
CN101714262A (en) Method for reconstructing three-dimensional scene of single image
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN108416751A (en) A kind of new viewpoint image combining method assisting full resolution network based on depth
Hong et al. USOD10K: a new benchmark dataset for underwater salient object detection
CN115131492A (en) Target object relighting method and device, storage medium and background replacement method
CN115049556A (en) StyleGAN-based face image restoration method
Lin et al. Immesh: An immediate lidar localization and meshing framework
CN115272437A (en) Image depth estimation method and device based on global and local features
Yang et al. Underwater self-supervised depth estimation
Vallone et al. Danish airs and grounds: A dataset for aerial-to-street-level place recognition and localization
Haji-Esmaeili et al. Large-scale monocular depth estimation in the wild
Sun et al. Learning robust image-based rendering on sparse scene geometry via depth completion
Qiu et al. Multi-scale Fusion for Visible Watermark Removal
Domingo et al. 3d visualization of mangrove and aquaculture conversion in Banate Bay, Iloilo
Tang et al. NDPC-Net: A dehazing network in nighttime hazy traffic environments
Liang et al. Building placements in urban modeling using conditional generative latent optimization
Tang et al. Feature Matching-Based Undersea Panoramic Image Stitching in VR Animation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant