CN111091151B - Construction method of a generative adversarial network for target detection data enhancement
- Publication number: CN111091151B
- Application number: CN201911301874.XA
- Authority: CN (China)
- Prior art keywords: image, objects, generator, target detection, adversarial network
- Prior art date: 2019-12-17
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/251 — Pattern recognition: fusion techniques of input or preprocessed data
- G06N20/00 — Machine learning
Abstract
The invention belongs to the field of computer image generation and provides a method for constructing a generative adversarial network for target detection data enhancement. The method fuses Poisson fusion from traditional digital image processing with the generator of a generative adversarial network, so that the network can change the size, number, and position of detected objects in a picture. A loss function is also specifically designed for the generator so that it generates better pictures. The method can effectively alleviate the class-imbalance problem in target detection tasks, so that the trained detection model achieves better performance. Meanwhile, automatically amplifying a small dataset into a large one saves a great deal of labor cost.
Description
Technical Field
The invention belongs to the field of computer image generation and relates to a method for constructing a generative adversarial network for target detection data enhancement.
Background
Data enhancement refers to adding more variation to the training data to improve the generalization ability of the trained model. Data enhancement strategies such as flipping and scaling are widely applied when training CNNs. In recent years, generative adversarial networks (GANs) have performed excellently in many image-to-image tasks (Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiri Matas. DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks. arXiv:1711.07064, Nov 2017), and GAN-based data enhancement has accordingly been explored. AugGAN (Sheng-Wei Huang, Che-Tsung Lin, Shu-Ping Chen, Yen-Yi Wu, Po-Hao Hsu, and Shang-Hong Lai. AugGAN: Cross domain adaptation with GAN-based data augmentation. In Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss, editors, Computer Vision — ECCV 2018, pages 731-744, Cham, 2018. Springer International Publishing) has structure-aware semantic segmentation and soft weight sharing, so the resulting images are realistic enough for training; however, the ground truth this method requires contains instance masks, which are inconvenient to label. CycleGAN (Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision, 2017) requires no paired training data, and many works have used CycleGAN for data enhancement (Weijian Deng, Liang Zheng, Guoliang Kang, Yi Yang, Qixiang Ye, and Jianbin Jiao. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. arXiv:1711.07027, 2017). However, the drawbacks of CycleGAN cannot be ignored: it tends to overfit when generating images, which affects the accuracy of the model.
Moreover, existing GAN-based data enhancement methods achieve enhancement only by transforming the style of the image. While helpful for image classification, this approach does not work well for target detection, because the number, size, and position of objects in the image cannot be changed.
Disclosure of Invention
The invention aims to provide a construction method (Poisson GAN) of a generative adversarial network for target detection data enhancement, which can change the size, number, and position of detected objects in a picture by fusing Poisson fusion from traditional digital image processing with the generator of a generative adversarial network. A loss function is also specifically designed for the generator so that it generates better pictures. The method can effectively alleviate the class-imbalance problem in target detection tasks, so that the trained detection model achieves better performance.
The technical scheme of the invention is as follows:
A method of constructing a generative adversarial network for target detection data enhancement comprises the following steps:
1) Construct the Poisson fusion part of the generator; the flow is shown in FIG. 1 (left). We embed Poisson fusion into the generator to alter the number, position, or size of objects when generating a picture. Assuming 3 object classes, we select X, Y, and Z objects from the original dataset and build an object set P. Each time a picture is generated, x, y, and z objects are randomly selected from P to form a subset

P_a = {A_1, ..., A_x, B_1, ..., B_y, C_1, ..., C_z}    (1)

where A, B, and C denote the classes in the original dataset. P_a is then embedded into a temporary image T ∈ R^(3×720×405), generating a source image S ∈ R^(3×720×405). To eliminate sharp boundaries, a mask M (each instance having its own mask) is automatically created from the embedding positions in S and combined with the background image B ∈ R^(3×720×405) and the source image S to obtain a clone image C, as sketched below.
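A minimal sketch of this step follows, assuming OpenCV's cv2.seamlessClone as the Poisson blending solver; the object pool layout, paste positions, and full-crop masks are illustrative assumptions, not values fixed by the patent.

```python
# Hedged sketch of step 1: embed randomly chosen object crops into a
# background image with Poisson blending (cv2.seamlessClone implements
# the Poisson image-editing solver).
import random
import cv2
import numpy as np

def build_clone_image(background, object_pool, x, y, z):
    """Fuse x+y+z randomly chosen object crops (classes A, B, C) into
    `background` (H x W x 3, uint8) and return the clone image C."""
    subset = (random.sample(object_pool["A"], x)
              + random.sample(object_pool["B"], y)
              + random.sample(object_pool["C"], z))
    clone = background.copy()
    h, w = clone.shape[:2]
    for obj in subset:  # obj: small BGR crop of a single instance
        oh, ow = obj.shape[:2]
        # Random paste position that keeps the crop inside the frame
        # (assumes every crop is smaller than the background).
        cx = random.randint(ow // 2 + 1, w - ow // 2 - 1)
        cy = random.randint(oh // 2 + 1, h - oh // 2 - 1)
        # Each instance gets its own mask; a full-crop mask is the
        # simplest choice when no instance mask is available.
        mask = 255 * np.ones(obj.shape[:2], dtype=np.uint8)
        clone = cv2.seamlessClone(obj, clone, mask, (cx, cy),
                                  cv2.NORMAL_CLONE)
    return clone
```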
2) Construct the network learning part of the generator; the structure is shown in FIG. 1 (right). We build our network on the work of Ci et al. (Yuanzheng Ci, Xinzhu Ma, Zhihui Wang, Haojie Li, and Zhongxuan Luo. User-guided deep anime line art colorization with conditional adversarial networks. 2018 ACM Multimedia Conference on Multimedia — MM '18, 2018). The encoder is built from stacked 3 × 3 convolutional layers, with U-Net as the backbone structure. The decoder is built from 4 ResNeXt blocks, denoted block n, n ∈ {1, ..., 4}. In the experiments, the numbers of units in the blocks are set to [20, 10, 10, 5], as sketched below.
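The PyTorch sketch below illustrates this layout. The patent fixes only the 3 × 3 encoder stack, the U-Net backbone, and 4 ResNeXt decoder blocks with [20, 10, 10, 5] units; channel widths, cardinality, and the exact skip wiring are assumptions.

```python
# Hedged sketch of the step-2 generator: U-Net-style encoder of
# stacked 3x3 convolutions, decoder of 4 ResNeXt blocks.
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """Grouped-convolution residual unit (assumed internals)."""
    def __init__(self, ch, cardinality=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1, groups=cardinality),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 1),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))

def conv3x3(cin, cout, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1),
                         nn.ReLU(inplace=True))

class Generator(nn.Module):
    def __init__(self, blocks=(20, 10, 10, 5)):
        super().__init__()
        self.e1 = nn.Sequential(conv3x3(3, 64), conv3x3(64, 64))
        self.e2 = nn.Sequential(conv3x3(64, 128, 2), conv3x3(128, 128))
        self.e3 = nn.Sequential(conv3x3(128, 256, 2), conv3x3(256, 256))
        # Decoder: block n holds blocks[n] ResNeXt units, n in {1..4}.
        self.d = nn.ModuleList(
            nn.Sequential(*[ResNeXtBlock(256) for _ in range(k)])
            for k in blocks)
        self.up1 = nn.ConvTranspose2d(256, 128, 4, 2, 1)
        self.up2 = nn.ConvTranspose2d(128, 64, 4, 2, 1)
        self.out = nn.Conv2d(64, 3, 3, padding=1)
    def forward(self, x):
        s1 = self.e1(x); s2 = self.e2(s1); h = self.e3(s2)
        for blk in self.d:
            h = blk(h)
        h = torch.relu(self.up1(h)) + s2   # U-Net skip connections
        h = torch.relu(self.up2(h)) + s1
        return torch.tanh(self.out(h))
```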
3) Construct the discriminator, as shown in FIG. 1 (top). The discriminator is likewise a stack of ResNeXt blocks and convolutional layers. The architecture is similar to the setup of SRGAN, with more layers added to handle the 512 × 512 resolution input; a sketch follows below.
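The sketch below follows the SRGAN pattern of strided 3 × 3 convolutions that double the channel count while halving resolution, with extra downsampling stages for the 512 × 512 input. Widths and depth are assumptions; ResNeXtBlock and conv3x3 are the helpers from the generator sketch above.

```python
# Hedged sketch of the step-3 discriminator: SRGAN-style strided
# convolution stack interleaved with ResNeXt blocks.
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        layers, ch = [conv3x3(3, 64)], 64
        for _ in range(6):                      # 512 -> 8 spatial
            nxt = min(ch * 2, 512)
            layers += [conv3x3(ch, nxt, stride=2), ResNeXtBlock(nxt)]
            ch = nxt
        self.features = nn.Sequential(*layers)
        self.head = nn.Conv2d(ch, 1, 8)         # one real/fake logit
    def forward(self, x):
        return self.head(self.features(x)).flatten(1)
```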
4) Set the loss functions. For the discriminator, we follow the loss proposed by Ci et al. For the generator, we define:
L_G = L_cont + λ_1 · L_adv + L_reg    (2)
where λ_1 is 1e-4 and L_cont and L_adv follow the setup of Ci et al. L_reg is a regularization term over the feature map, where c, h, and w are the number of channels, height, and width of the feature map and M is a mask that takes the value 100 on the fused region and 0.1 on other regions (a hedged sketch of this loss follows below).
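Equation (3) defining L_reg does not survive in this text, so the masked-L1 form below is an assumption consistent with the stated ingredients: normalization by c · h · w and a mask M of 100 on the fused region, 0.1 elsewhere.

```python
# Hedged sketch of the generator loss in equation (2); L_reg is an
# assumed mask-weighted L1 distance over the feature maps.
import torch

def generator_loss(l_cont, l_adv, fake_feat, real_feat, fuse_mask,
                   lambda1=1e-4):
    """fake_feat, real_feat: (N, c, h, w) feature maps; fuse_mask:
    (N, 1, h, w) binary map of the Poisson-fused region."""
    m = fuse_mask * 100.0 + (1.0 - fuse_mask) * 0.1     # mask M
    _, c, h, w = fake_feat.shape
    l_reg = (m * (fake_feat - real_feat).abs()).sum() / (c * h * w)
    return l_cont + lambda1 * l_adv + l_reg             # equation (2)
```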
5) Generate the image pairs required for training. The two images in a pair need differ only in the edge information of the embedded portion, so that the generator can learn the mapping from fused images to normal images. We therefore create an image pair by overlaying objects in an original-dataset image with objects of the same class automatically cropped from the clone image C. Given the appearance similarity within a species, the two images (original and processed) are almost identical except for edge information, so the original image can be regarded directly as the real image and the processed image as the fake image, as sketched below.
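A minimal sketch of this pairing step, assuming an (x1, y1, x2, y2) box layout and a clone-image crop at least as large as the covered object; both are simplifying assumptions for illustration.

```python
# Hedged sketch of step 5: build a (real, fake) training pair by
# pasting a same-class object cropped from clone image C over the
# corresponding object in an original image.
import numpy as np

def make_image_pair(original, clone_crop, box):
    """original: H x W x 3 uint8 image; clone_crop: same-class object
    cut from C; box: (x1, y1, x2, y2) of the object to cover."""
    x1, y1, x2, y2 = box
    fake = original.copy()
    # Hard paste with no smoothing: only the edge of the pasted region
    # differs from the original, which is exactly what the generator
    # must learn to repair.
    fake[y1:y2, x1:x2] = np.asarray(clone_crop)[: y2 - y1, : x2 - x1]
    return original, fake   # (real image, fake image)
```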
The method can expand an original dataset of roughly 1000 to 2000 images to the scale of tens of thousands, saving a great deal of manual labeling cost.
Drawings
Fig. 1 is a diagram of a network architecture of the present invention.
Fig. 2 shows generation results on the UDD dataset: (a) the original image; (b) the image after Poisson fusion; (c) the final generated image.
FIG. 3 shows an enhanced image produced by the Copy-Pasting method on UDD: (a) the input; (b) the output.
FIG. 4 shows enhanced images produced by other style-transforming GANs on UDD: (a) CycleGAN; (b) StarGAN.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following detailed description of the embodiments of the present invention is provided.
Our Poisson GAN is implemented in PyTorch. We train and run inference with an input size of 512 × 512, using the Adam optimizer with the learning rate initialized to 1e-4 in both the generator and the discriminator and reduced to 1e-5 after 125K iterations. Experiments are performed on a single NVIDIA TITAN XP GPU with a batch size of 4 (a sketch of this setup follows below). We use the UDD dataset as the original dataset for data enhancement. UDD is a real marine-ranch target detection dataset containing 2227 pictures (1827 for training, 400 for testing) with three detection targets: sea cucumber, sea urchin, and scallop.
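The snippet below wires up the optimizers and schedule just described. Generator and Discriminator are the hedged sketches from the Disclosure section, and the tensor dataset is a placeholder standing in for the step-5 image pairs.

```python
# Hedged sketch of the training setup: Adam at 1e-4 for both networks,
# reduced to 1e-5 after 125K iterations, batch size 4, 512x512 inputs.
import torch

generator, discriminator = Generator(), Discriminator()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
# Stepped once per training iteration, MultiStepLR drops the rate
# 1e-4 -> 1e-5 at iteration 125000.
g_sched = torch.optim.lr_scheduler.MultiStepLR(g_opt, milestones=[125000], gamma=0.1)
d_sched = torch.optim.lr_scheduler.MultiStepLR(d_opt, milestones=[125000], gamma=0.1)

# Placeholder dataset of (real, fake) 512x512 pairs.
pair_dataset = torch.utils.data.TensorDataset(
    torch.rand(8, 3, 512, 512), torch.rand(8, 3, 512, 512))
loader = torch.utils.data.DataLoader(pair_dataset, batch_size=4, shuffle=True)
```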
To construct the object set P, we cut 1000 sea cucumbers, 150 sea urchins, and 35 scallops from UDD and fuse them into background images with Poisson GAN. Images from UDD serve as the background images, so the composites are more realistic and can be used to supplement UDD. The resulting dataset, which we name AUDD, contains 18661 images with 18350 sea cucumbers, 101422 sea urchins, and 9624 scallops.
We also put instances and background images cut from the NSFC-dataset (another underwater target detection dataset) through Poisson fusion to generate clone images and construct a pre-training dataset. The main purpose of this dataset is to make the detector more robust during automatic grabbing; it contains up to 589080 images with different background colors, viewing angles, and terrains.
FIG. 4 shows images generated by CycleGAN and StarGAN. Both achieve data enhancement by changing the background color and, as mentioned previously, cannot solve the class-imbalance problem. Moreover, some small objects may disappear during the transformation, which is detrimental to training a target detection model. In contrast, Poisson GAN can change the location of objects while retaining all of them.
The Copy-Pasting method is a small-object data enhancement method that uses an instance segmentation mask to copy small objects from their original location to another location, as shown in FIG. 3. Unfortunately, this approach applies only to datasets with instance segmentation masks and is therefore difficult to use on UDD. It also performs no extra smoothing on the edges of pasted objects, which lowers the quality of the generated images and makes the composites look obviously unnatural.
We trained YOLOv3 on the expanded datasets to demonstrate the effectiveness of Poisson GAN. The detector was trained for 70 epochs on the pre-training dataset and then trained on AUDD starting from the pre-trained parameters. Results on the UDD test set are shown in Table 1. Compared with the results in the third row of Table 2, the model improves mAP significantly (by about 30%), and the class-imbalance problem is solved to a great extent.
In addition, we performed experiments comparing different initializations: YOLOv3 trained on AUDD with random initialization, with an ImageNet pre-trained model, and with the model pre-trained on our pre-training dataset, all tested on the UDD test set (Table 2). AUDD is clearly advantageous in addressing insufficient training data and class imbalance. With our pre-trained model, YOLOv3 achieves better accuracy than random initialization (+12%) and ImageNet (+9%). Overall, Poisson GAN improves the accuracy of YOLOv3 by 33.7% over the original results on UDD.
TABLE 1. Accuracy of different detection networks on UDD
TABLE 2. Accuracy of YOLOv3 on AUDD with different initialization modes
While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (2)
1. A method for constructing a generative adversarial network for target detection data enhancement, the method comprising the steps of:
1) building the Poisson fusion part of the generator: Poisson fusion is embedded into the generator to alter the number, position, or size of objects when generating a picture; assuming 3 object classes, X, Y, and Z objects are respectively selected from each class of the original dataset to establish an object set P; each time a picture is generated, x, y, and z objects are randomly selected from P to form a subset
P_a = {N_1, ..., N_x, O_1, ..., O_y, U_1, ..., U_z}    (1)
where N, O, and U respectively denote the classes in the original dataset; P_a is then embedded into a temporary image T ∈ R^(3×720×405), generating a source image S ∈ R^(3×720×405); to eliminate sharp boundaries, a mask M is automatically created from the embedding positions in S and combined with the background image B ∈ R^(3×720×405) and the source image S to obtain a clone image C;
2) building the network learning part of the generator: the encoder is built from stacked 3 × 3 convolutional layers with U-Net as the backbone structure; the decoder is built from 4 ResNeXt blocks, denoted block n, n ∈ {1, ..., 4};
3) building the discriminator: the discriminator is likewise a stack of ResNeXt blocks and convolutional layers; the architecture follows the setup of SRGAN, with more layers added to handle the 512 × 512 resolution input;
4) setting a loss function; the loss function of the generator is defined as: L_G = L_cont + λ_1 · L_adv + L_reg    (2)
where λ_1 is 1e-4 and L_reg is defined over the feature map, in which c, h, and w are the number of channels, height, and width of the feature map and M is a mask; the fused region takes the value 100 and other regions take 0.1;
5) generating the image pairs required for training: the two images in a pair differ only in the edge information of the embedded portion, causing the generator to learn the mapping from fused images to normal images; an image pair is created by overlaying objects in an original-dataset image with objects of the same class automatically cropped from the clone image C; given the appearance similarity within a species, the original image and the processed image are almost identical except for edge information, so the original image is directly regarded as the real image and the processed image as the fake image.
2. The method of claim 1, wherein the decoder is built from 4 ResNeXt blocks, with the numbers of units in block n set to [20, 10, 10, 5].
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911301874.XA | 2019-12-17 | 2019-12-17 | Construction method of a generative adversarial network for target detection data enhancement |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN111091151A | 2020-05-01 |
| CN111091151B | 2021-11-05 |
Family ID: 70395675
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |