CN114049536A - Virtual sample generation method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN114049536A
Authority
CN
China
Prior art keywords
sample
data
samples
virtual
data samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111365555.2A
Other languages
Chinese (zh)
Inventor
韦泰丞
刘雁兵
左少燕
王吉斌
陈浩
王金桥
朱优松
赵朝阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Tobacco Guangxi Industrial Co Ltd
Original Assignee
China Tobacco Guangxi Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Tobacco Guangxi Industrial Co Ltd
Priority to CN202111365555.2A
Publication of CN114049536A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/90 Dynamic range modification of images or parts thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a virtual sample generation method and device, a storage medium, and electronic equipment. The method includes: acquiring background scene samples and data samples to be detected; constructing a sample relationship model according to information about the data samples; performing data augmentation on the data samples to be detected to obtain augmented data samples; arranging the augmented data samples according to arrangement rules constructed from the sample relationship model and embedding them into a background scene sample to generate a first sample; and performing style transfer on the first sample with an adversarial network to obtain a virtual sample. By introducing relationship modeling into the generation of high-quality virtual samples, the arrangement of samples in the generated virtual samples better matches the actual application scenario. At the same time, a generative adversarial network is used to transfer the style of the data samples, so that the style of the final generated samples is consistent and the sample images can be embedded smoothly into the background, making the images more realistic.

Description

A virtual sample generation method, device, storage medium and electronic device

Technical Field

The invention relates to the technical field of computer vision and pattern recognition, and in particular to a virtual sample generation method, device, storage medium and electronic device.

Background

In recent years, artificial intelligence technology has developed rapidly in industry. Deep learning methods are used for image recognition and object detection to meet a wide range of needs, and classification and detection models built with neural network algorithms have achieved good results in industrial settings. However, technologies such as image recognition are data-driven: a model achieves good results in practice only after it has been trained on a large amount of training data.

In practical application scenarios, data is often insufficient, and a model cannot learn to perform well from a small amount of data. To obtain better results, more images containing rich features have to be collected manually. However, a larger sample size not only increases cost but also makes the samples harder to obtain.

Summary of the Invention

In view of this, embodiments of the present invention provide a virtual sample generation method, device, storage medium and electronic device, to solve the technical problem in the prior art that few training samples are available when training a model.

The technical solution proposed by the present invention is as follows:

A first aspect of the embodiments of the present invention provides a virtual sample generation method, including: acquiring background scene samples and data samples to be detected; constructing a sample relationship model according to information about the data samples; performing data augmentation on the data samples to be detected to obtain augmented data samples; arranging the augmented data samples according to arrangement rules constructed from the sample relationship model and embedding them into the background scene sample to generate a first sample; and performing style transfer on the first sample with an adversarial network to obtain a virtual sample.

Optionally, the data augmentation methods include: random scale jitter, data augmentation based on the HSV space, image sharpness enhancement, and random flipping and rotation. Performing data augmentation on the data samples to be detected includes: for each data augmentation method, randomly applying it to the data samples to be detected according to a preset probability.

Optionally, constructing the sample relationship model according to information about the data samples includes: acquiring sales information of the data samples; determining product specification information of the data samples according to the sales information; based on a data mining algorithm, calculating from the sales information and the product specification information the correlation between the sales volume of each data sample brand and the sales volumes of other brands, to obtain the product specifications whose correlation is greater than a threshold; and constructing the sample relationship model from the product specification information of the data samples and the product specifications whose correlation is greater than the threshold.

Optionally, arranging the augmented data samples according to the arrangement rules constructed from the sample relationship model and embedding them into the background scene sample to generate the first sample includes: arranging the data samples by random placement, by product specification information, or by correlation information to obtain arranged data samples; and embedding the arranged data samples into the background scene sample to generate the first sample.

Optionally, the loss function of the adversarial network is expressed by the following formula:

[Formula image BDA0003360245090000021: not recoverable from the extracted text]

where [symbol image BDA0003360245090000022] denotes the adversarial loss, G denotes the generator, D_Y denotes the discriminator, and [symbol image BDA0003360245090000023] denotes the cycle-consistency loss.
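The terms named above (an adversarial loss, a generator G, a discriminator D_Y and a cycle-consistency loss) are the ones used in the published CycleGAN objective. The patent's own formula is an image that did not survive extraction, so the block below gives the CycleGAN objective for orientation only and is not a reconstruction of the patent formula. The second generator F, the second discriminator D_X and the weight λ come from CycleGAN and are assumptions here.

```latex
\mathcal{L}(G, F, D_X, D_Y)
  = \mathcal{L}_{GAN}(G, D_Y, X, Y)
  + \mathcal{L}_{GAN}(F, D_X, Y, X)
  + \lambda \, \mathcal{L}_{cyc}(G, F)

\mathcal{L}_{GAN}(G, D_Y, X, Y)
  = \mathbb{E}_{y \sim p_{data}(y)}\big[\log D_Y(y)\big]
  + \mathbb{E}_{x \sim p_{data}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big]

\mathcal{L}_{cyc}(G, F)
  = \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]
```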

Optionally, performing style transfer on the first sample with the adversarial network to obtain the virtual sample includes: cropping each data sample image out of the first sample and inputting it into the adversarial network for style transfer to obtain a transferred data sample image; and embedding the transferred data sample image back into the first sample to obtain the virtual sample.
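The crop, transfer and paste-back sequence described above can be sketched as follows. This is a minimal illustration on nested lists rather than real images. The `stylize` callable is a hypothetical stand-in for the trained generator, and the `(x, y, w, h)` box format is an assumption; the patent does not specify either.

```python
def transfer_style_per_box(image, boxes, stylize):
    """Crop each box, run `stylize` on the crop, and paste the result back.

    image   : list of rows (each row a list of pixel values)
    boxes   : iterable of (x, y, w, h) tuples, assumed to lie inside the image
    stylize : hypothetical stand-in for the GAN generator; must keep crop size
    """
    out = [row[:] for row in image]  # work on a copy, leave the input intact
    for x, y, w, h in boxes:
        crop = [row[x:x + w] for row in image[y:y + h]]
        styled = stylize(crop)
        if len(styled) != h or any(len(r) != w for r in styled):
            raise ValueError("stylize must preserve the crop size")
        for dy, row in enumerate(styled):
            out[y + dy][x:x + w] = row
    return out
```

Only the pixels inside each box change, so the background and the box annotations of the first sample stay valid for the resulting virtual sample.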

Optionally, the data samples to be detected are samples of the outer packaging of cigarette packs.

A second aspect of the embodiments of the present invention provides a virtual sample generation device, including: a sample acquisition module, configured to acquire background scene samples and data samples to be detected; a relationship model construction module, configured to construct a sample relationship model according to information about the data samples; an augmentation module, configured to perform data augmentation on the data samples to be detected to obtain augmented data samples; an arrangement module, configured to arrange the augmented data samples according to arrangement rules constructed from the sample relationship model and embed them into the background scene sample to generate a first sample; and a style transfer module, configured to perform style transfer on the first sample with an adversarial network to obtain a virtual sample.

A third aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to execute the virtual sample generation method according to the first aspect of the embodiments of the present invention or any implementation of the first aspect.

A fourth aspect of the embodiments of the present invention provides an electronic device, including a memory and a processor that are communicatively connected to each other. The memory stores computer instructions, and the processor executes the computer instructions so as to perform the virtual sample generation method according to the first aspect of the embodiments of the present invention or any implementation of the first aspect.

The technical solution provided by the present invention has the following effects:

With the virtual sample generation method, device, storage medium and electronic device provided by the embodiments of the present invention, relationship modeling is introduced into the generation of high-quality virtual samples, so that the arrangement of samples in the generated virtual samples better matches the actual application scenario. At the same time, a generative adversarial network is used to transfer the style of the data samples, so that the style of the final generated samples is consistent and the sample images can be embedded smoothly into the background, making the images more realistic.

Brief Description of the Drawings

To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a virtual sample generation method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a data augmentation method based on random jitter of image size according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a data augmentation method based on the HSV space according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a data augmentation method based on image sharpness enhancement according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a data augmentation method based on random flipping and rotation according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a virtual sample generated by the virtual sample generation method according to an embodiment of the present invention;

FIG. 7 is a flowchart of a virtual sample generation method according to another embodiment of the present invention;

FIG. 8 is a structural block diagram of a virtual sample generation device according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

As described in the Background, a model cannot learn to perform well from a small amount of data, so more images containing rich features have to be obtained manually to achieve better results. There are currently two ways to enlarge the sample set. The first is to adjust the sampling frequency so that samples are used for training more often. However, this may cause the model to overfit the training samples and reduce its robustness, and may actually lower the model's real-world performance. The second is to apply data augmentation to the small number of data samples, making small adjustments to existing dataset images such as rotation, translation and flipping. This introduces the prior knowledge of semantic invariance, that is, these operations do not change the semantic information of the image. These operations make the network more robust and also supplement the classes that have few images.

Data augmentation in image classification, that is, the second way of enlarging the sample set, transforms the whole image. Because the whole image contains only one class, its label does not change no matter what transformation is applied. In object detection, however, a single image contains several types of samples to be recognized, and the position of each target must also be located, so samples generated in the second way are of poor quality. Enlarging the sample set therefore requires more complex processing.

In view of this, an embodiment of the present invention provides a virtual sample generation method, including: acquiring background scene samples and data samples to be detected; constructing a sample relationship model according to the product specification information of the data samples; performing data augmentation on the data samples to be detected to obtain augmented data samples; arranging the augmented data samples according to arrangement rules constructed from the sample relationship model to obtain arranged data samples; embedding the arranged data samples into the background scene sample to generate a first sample; cropping each data sample image out of the first sample and inputting it into an adversarial network for style transfer to obtain a transferred data sample image; and embedding the transferred data sample image back into the first sample to obtain a virtual sample.

To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides a virtual sample generation method. As shown in FIG. 1, the method includes the following steps:

Step S101: Acquire background scene samples and data samples to be detected.

Specifically, the data samples may be any data for which the sample set needs to be enlarged. In one embodiment, the data are samples of the outer packaging of cigarette packs. At present, when a neural network is used for detection on cigarette images, the images are usually collected at cigarette retail terminals. However, because some cigarette product specifications (brand and specification) have a low on-shelf rate, they appear only a few times in the collected images. As a result, those classes cannot be adequately trained, which degrades the detection performance of the model. The virtual sample generation method can therefore be used to generate virtual samples from outer-packaging images of cigarette packs and enlarge the sample set. The data samples may also be other data for which the sample set needs to be enlarged.

In one embodiment, when the data samples to be detected are samples of the outer packaging of cigarette packs, the background scene sample may be a sample containing an image of a retail cabinet in which cigarettes are displayed.

Step S102: Construct a sample relationship model according to information about the data samples. Specifically, to make the generated virtual samples fit the actual application scenario more closely and to improve their quality, a sample relationship model can be constructed from information related to the data samples. For example, to make virtual cigarette samples fit the retail scenario, retail or sales information about the cigarettes can be acquired to build a relationship model of the cigarettes.

In one embodiment, constructing the sample relationship model according to information about the data samples includes: acquiring sales information of the data samples; determining product specification information of the data samples according to the sales information; based on a data mining algorithm, calculating from the sales information and the product specification information the correlation between the sales volume of each data sample brand and the sales volumes of other brands, to obtain the product specifications whose correlation is greater than a threshold; and constructing the sample relationship model from the product specification information of the data samples and the product specifications whose correlation is greater than the threshold.

For cigarette samples, the product specification information of cigarettes on sale in retail stores and the related cigarette sales data can be collected. A data mining algorithm such as the Apriori algorithm is then applied to the collected sales information to calculate the correlation between the sales volume of each cigarette brand and the sales volumes of other cigarette brands. The cigarette brands to be evaluated are input into the data mining model, which yields, for each cigarette product specification in the cigarette samples, the five other product specifications most strongly correlated with it. These are stored in a database together with the product specification information to form the cigarette relationship model. Querying the database then returns the cigarette product specifications most strongly associated with a given product specification, as well as information such as the brand and manufacturer to which that product specification belongs.
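The lookup "five most correlated product specifications per product specification" can be illustrated with a simplified pairwise co-occurrence count. This is not a full Apriori implementation. It assumes the sales data is available as per-transaction lists of product specifications, which the patent does not state, and it ranks partners by confidence P(other | item), which is also an assumption. The function name `top_related` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def top_related(transactions, k=5, min_support=0.0):
    """For each item, return up to k co-purchased items ranked by confidence."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for basket in transactions:
        items = sorted(set(basket))          # de-duplicate within a transaction
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    related = defaultdict(list)
    for (a, b), c in pair_count.items():
        if c / n < min_support:              # drop pairs that are too rare
            continue
        related[a].append((b, c / item_count[a]))   # confidence a -> b
        related[b].append((a, c / item_count[b]))   # confidence b -> a

    # Sort by descending confidence, then by name for a stable order.
    return {
        item: [other for other, _ in sorted(pairs, key=lambda p: (-p[1], p[0]))[:k]]
        for item, pairs in related.items()
    }
```

The returned dictionary plays the role of the database query described above: given one product specification, it returns its most strongly associated neighbors.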

Step S103: Perform data augmentation on the data samples to be detected to obtain augmented data samples. Specifically, a large number of high-quality virtual samples must be generated from a small amount of image data. Samples produced by simple cropping and pasting cannot meet the demand for large batches of cigarette pack outer-packaging image samples, and they also lack image diversity. To increase the diversity of the generated images and improve the quality of the augmented data, random per-sample data augmentation is applied in every round of augmentation, which enriches the images. When augmenting the data samples to be detected, several augmentation methods can therefore be selected, and each method is applied to the data samples at random according to a preset probability.
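Applying each augmentation method independently with a preset probability can be sketched as below. The function name, the `(name, function)` pair format and the default probability are illustrative assumptions, not taken from the patent.

```python
import random


def apply_random_augmentations(sample, augmentations, p=0.5, rng=None):
    """Apply each augmentation to `sample` independently with probability p.

    augmentations : list of (name, function) pairs; each function maps a
                    sample to a new sample
    Returns the augmented sample and the names of the methods that fired.
    """
    rng = rng or random.Random()
    applied = []
    for name, fn in augmentations:
        if rng.random() < p:      # independent coin flip per method
            sample = fn(sample)
            applied.append(name)
    return sample, applied
```

Passing a seeded `random.Random` makes a generation run reproducible, which helps when a generated dataset needs to be rebuilt.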

In one embodiment, the data augmentation strategy for cigarette samples focuses on two broad directions: color channels and geometric transformations. The data augmentation methods therefore include random image scale jitter, data augmentation based on the HSV color space, image sharpness enhancement, and random flipping and rotation. These increase the richness of the generated virtual samples so that every generated image is different.

As shown in FIG. 2, random scale jitter rescales the image to be augmented by a coefficient chosen at random between 0.5 and 1.5; the size of the augmented image is the width and height of the original image multiplied by this coefficient. As shown in FIG. 3, data augmentation based on the HSV space converts the image from RGB space to HSV space and then fine-tunes it along the three dimensions of hue, saturation and value (brightness), with the adjustment kept within the range 0.8 to 1.2 to imitate image capture under different lighting. The adjustment range is kept small to avoid generating overly distorted images, and it further increases the diversity of the augmented images. The RGB-to-HSV conversion is expressed by the following formulas:

[Formula image BDA0003360245090000071: not recoverable from the extracted text]

H = 0°, if max(R, G, B) = min(R, G, B)

[Formula images BDA0003360245090000081, BDA0003360245090000082 and BDA0003360245090000083: not recoverable from the extracted text]
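The scale jitter and HSV adjustment described above can be sketched with the Python standard library. `colorsys` implements the standard RGB/HSV conversion with hue scaled to [0, 1) rather than degrees; whether it matches the patent's formula images term by term cannot be confirmed. Treating the 0.8 to 1.2 range as a multiplicative factor on each channel is an assumption, and the function names are hypothetical.

```python
import colorsys
import random


def jitter_size(width, height, rng, low=0.5, high=1.5):
    """Scale width and height by one shared random coefficient in [low, high]."""
    s = rng.uniform(low, high)
    return max(1, round(width * s)), max(1, round(height * s))


def jitter_hsv_pixel(rgb, rng, low=0.8, high=1.2):
    """Jitter one 8-bit RGB pixel in HSV space (assumed multiplicative factors)."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h * rng.uniform(low, high)) % 1.0      # hue wraps around
    s = min(1.0, s * rng.uniform(low, high))    # clamp to the valid range
    v = min(1.0, v * rng.uniform(low, high))
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r, g, b))
```

A real pipeline would apply the same three factors to every pixel of a crop, typically with an array library; the per-pixel form here only shows the arithmetic.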

As shown in FIG. 4, image sharpness enhancement uses image sharpening to adjust the edge features and overall clarity of the image. As shown in FIG. 5, random flipping and rotation randomly applies a horizontal flip and a small-angle tilt to the cigarette instance being augmented, simulating the different ways cigarettes can be placed and improving the robustness of the model.
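For the geometric operations, a horizontal flip of a crop and the axis-aligned box that encloses a tilted crop can be sketched as follows. The patent does not describe how box labels are updated after rotation, so the bounding-box formula is a standard geometric result offered as an assumption about how labels could be kept consistent. Both function names are hypothetical.

```python
import math


def hflip_crop(pixels):
    """Mirror a crop left to right; `pixels` is a list of rows."""
    return [list(reversed(row)) for row in pixels]


def rotated_bbox_size(w, h, angle_deg):
    """Width and height of the axis-aligned box enclosing a w x h
    rectangle rotated by angle_deg about its center."""
    t = math.radians(angle_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    return w * c + h * s, w * s + h * c
```

For a small tilt the enclosing box grows only slightly, which matches the description of small-angle rotations that keep the pack recognizable.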

Based on the four data augmentation approaches above, ten ways of augmenting a sample are obtained: (1) horizontal shift of the bounding box; (2) vertical shift of the bounding box; (3) size jitter of the bounding box; (4) brightness adjustment of the bounding box; (5) saturation adjustment of the bounding box; (6) hue adjustment of the bounding box; (7) sharpening of the bounding box; (8) horizontal flip of the bounding box; (9) rotation of the bounding box; and (10) histogram equalization.

During data augmentation, the ten augmentation methods above can be applied to an image at random according to a preset probability. For example, if the preset probability is 0.5, then in theory the ten methods can produce 2^10 = 1024 different samples from each image. In practice, however, cigarette data samples are usually cropped from images of cigarette display cabinets collected in stores, and a crop contains not only the cigarette pack but also some of the background around it. So even when the cigarette image is the same, the surrounding background differs (for example, because different display cabinets are used). Moreover, in actual processing the cropped cigarette samples are pasted, dozens of different samples at a time, into a single image. So even for the same cigarette image, the surrounding background information and the arrangement of neighboring cigarettes differ.

Therefore, when each cigarette sample in an image is augmented, the context around each cigarette sample (the surrounding background information and the arrangement of surrounding cigarettes) is different. Compared with direct oversampling, augmenting data in this way helps prevent the network from overfitting and improves its robustness. At the same time, data augmentation makes it possible to generate a large number of virtual cigarette pack outer-packaging samples from a very small number of outer-packaging images, which expands and balances the training dataset for cigarette pack recognition and thereby enriches data diversity and balances the cigarette classes.
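The figure of 2^10 = 1024 counts the on/off combinations of the ten methods, not the random parameters inside each method. A short check, with variable names chosen only for illustration:

```python
from itertools import product

NUM_METHODS = 10
P_APPLY = 0.5

# Every on/off pattern over the ten methods is one possible augmentation recipe.
recipes = list(product([False, True], repeat=NUM_METHODS))
num_recipes = len(recipes)                 # 2 ** 10

# With p = 0.5 each pattern is equally likely.
prob_per_recipe = P_APPLY ** NUM_METHODS
```

Because several methods also draw a random magnitude (scale factor, hue shift, angle), the number of distinct output images is larger than the number of recipes.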

Step S104: Arrange the augmented data samples according to the arrangement rules constructed from the sample relationship model and embed them into the background scene sample to generate a first sample. Specifically, although the augmented data samples satisfy the need for more samples, the quality of the enlarged sample set, that is, the expanded training dataset, may still fall short of requirements. To make the augmented data samples correspond more closely to real conditions, they can be arranged using arrangement rules constructed from the sample relationship model.

In one embodiment, the arrangement rule includes random placement, placement based on product specification information, and placement based on correlation information. Random placement means that cigarette samples of different categories are selected completely at random and placed at randomly chosen positions; during placement, the position and size must not exceed the limits. Placement based on product specification information means that, for each row, one cigarette product specification is drawn at random and fed into the relationship model, and all product specifications of the same manufacturer are then selected for placement, so that the cigarette specifications within each row are consistent. Placement based on correlation information means that, for a cigarette sample of a given type, the relationship model returns the few product specifications most strongly associated with it, and those specifications are placed on its left and right sides.
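The three arrangement rules can be illustrated with a small sketch. The relationship model is represented here by two hypothetical lookup tables (`MANUFACTURER` and `RELATED`) with made-up specification names; how the patent actually stores or queries its relationship model is not stated in this passage, so the data layout and function names are assumptions.

```python
import random
from typing import Dict, List

# Hypothetical relationship model: specification -> manufacturer, and
# specification -> other specifications ordered by decreasing correlation.
MANUFACTURER: Dict[str, str] = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
RELATED: Dict[str, List[str]] = {"A1": ["B1", "A2"], "B1": ["A1", "B2"]}


def arrange_random(specs: List[str], n_slots: int, rng: random.Random) -> List[str]:
    """Random placement: every slot gets an independently drawn specification."""
    return [rng.choice(specs) for _ in range(n_slots)]


def arrange_by_manufacturer(
    manufacturer_of: Dict[str, str], n_slots: int, rng: random.Random
) -> List[str]:
    """Draw one specification, then fill the row from the same manufacturer."""
    all_specs = sorted(manufacturer_of)
    seed = rng.choice(all_specs)
    same = [s for s in all_specs if manufacturer_of[s] == manufacturer_of[seed]]
    return [rng.choice(same) for _ in range(n_slots)]


def arrange_by_correlation(center: str, related: Dict[str, List[str]]) -> List[str]:
    """Put the most strongly related specifications to the left and right."""
    neighbors = related[center][:2]
    left = neighbors[0]
    right = neighbors[1] if len(neighbors) > 1 else neighbors[0]
    return [left, center, right]


if __name__ == "__main__":
    rng = random.Random(1)
    print(arrange_random(sorted(MANUFACTURER), 5, rng))
    print(arrange_by_manufacturer(MANUFACTURER, 5, rng))
    print(arrange_by_correlation("A1", RELATED))
```

The position and size limits mentioned in the text are not modeled here; the sketch only decides which specification goes into which slot of a row.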

After the augmented samples have been arranged in one of the ways above, they can be embedded into a background scene sample to obtain the first sample. Specifically, the arranged samples can be embedded into one of the background scene samples, that is, into the retail cabinet shown in that background scene sample. The placement limits can therefore be determined from the background scene sample. Arranging the augmented data samples according to the arrangement rules constructed from the sample relationship model simulates the variety of cigarette placements found in real retail scenes, so that the resulting first sample is closer to a real scene and of higher quality.
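A minimal sketch of the embedding step is given below, assuming images are plain nested lists and that the placement limit is simply the extent of the background image. The helper names `paste` and `embed_row` are hypothetical; the returned boxes are an illustrative way of keeping the locations of the pasted samples for later use as labels.

```python
from typing import List, Tuple

Image = List[List[int]]            # toy grayscale image (assumption)
Box = Tuple[int, int, int, int]    # top, left, height, width


def paste(background: Image, patch: Image, top: int, left: int) -> Image:
    """Return a copy of background with patch pasted at (top, left)."""
    h, w = len(patch), len(patch[0])
    if top < 0 or left < 0 or top + h > len(background) or left + w > len(background[0]):
        raise ValueError("patch exceeds the limits of the background")
    out = [row[:] for row in background]
    for r in range(h):
        out[top + r][left:left + w] = patch[r]
    return out


def embed_row(
    background: Image, patches: List[Image], top: int, left: int, gap: int = 0
) -> Tuple[Image, List[Box]]:
    """Paste patches side by side on one shelf row and record their boxes."""
    out, boxes, x = background, [], left
    for patch in patches:
        out = paste(out, patch, top, x)
        boxes.append((top, x, len(patch), len(patch[0])))
        x += len(patch[0]) + gap
    return out, boxes


if __name__ == "__main__":
    shelf = [[0] * 6 for _ in range(2)]
    packs = [[[1], [1]], [[2, 2], [2, 2]]]
    image, boxes = embed_row(shelf, packs, top=0, left=1, gap=1)
    print(image)
    print(boxes)
```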

Step S105: Perform style transfer on the first sample using an adversarial network to obtain a virtual sample.

When images are collected, the lighting, viewing angle and other conditions differ from image to image, so the image style also differs. During virtual sample generation, the images of different cigarette product specifications may therefore not share a uniform style, which makes the generated virtual samples differ noticeably from real captured images. In addition, when an augmented cigarette image is embedded directly into a background image, the edges of the cigarette image differ markedly from the original background and are not smooth enough, which also degrades the quality of the generated virtual samples. To address this, an adversarial network (CycleGAN) is used to perform image style transfer so that the image keeps a relatively consistent style.

Before the adversarial network is used for style transfer, the CycleGAN network is first trained on real cigarette display images and artificially generated pasted images to obtain the trained adversarial network. The loss function of the network is as follows:

[Equation image BDA0003360245090000101: loss function of the network]

where [equation image BDA0003360245090000102] denotes the adversarial loss: the generator G takes a sample x from domain X and produces a fake sample G(x); the goal is to make G(x) as similar as possible to the samples y in domain Y, while the goal of the discriminator D_Y is to distinguish y from G(x) as well as possible. [Equation image BDA0003360245090000103] denotes the cycle-consistency loss, which constrains the sample G(x) produced by the generator to remain consistent in content with the original sample x.
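For reference, the loss of the original CycleGAN formulation is commonly written as shown below. Whether the patent's equation images show exactly this form is an assumption: the second generator F (mapping Y back to X), the second discriminator D_X and the weight λ are part of the standard formulation and do not appear in the surrounding text.

```latex
\begin{aligned}
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{GAN}(G, D_Y, X, Y)
  + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda\, \mathcal{L}_{cyc}(G, F) \\
\mathcal{L}_{GAN}(G, D_Y, X, Y) &= \mathbb{E}_{y \sim p_{data}(y)}\big[\log D_Y(y)\big]
  + \mathbb{E}_{x \sim p_{data}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big] \\
\mathcal{L}_{cyc}(G, F) &= \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]
\end{aligned}
```

In this form the adversarial term pushes G(x) toward the distribution of domain Y, and the cycle term requires that mapping a sample to the other domain and back recovers the original, which is what keeps the content of G(x) tied to x.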

Thus, during style transfer, each cigarette pack outer-packaging sample in the virtual sample under construction (the first sample), together with its surrounding context region, is cropped and fed into the trained CycleGAN network. The network outputs the style-transferred image, in which the overall style of the outer-packaging sample is consistent with that of the background image. The style-transferred data sample image can then be embedded back into the background scene sample of the first sample to obtain a high-quality virtual sample. FIG. 6 shows virtual samples generated with this virtual sample generation method.
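The crop, transfer and paste-back loop described above can be sketched as follows. The trained CycleGAN generator is represented by an arbitrary callable that returns an image of the same size as its input; this stand-in, the `margin` parameter used to include the context region, and the function names are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]            # toy grayscale image (assumption)
Box = Tuple[int, int, int, int]    # top, left, height, width


def crop_with_context(img: Image, box: Box, margin: int) -> Tuple[Image, Tuple[int, int]]:
    """Crop the box plus a margin of surrounding context, clipped to the image."""
    top, left, h, w = box
    t, l = max(0, top - margin), max(0, left - margin)
    b, r = min(len(img), top + h + margin), min(len(img[0]), left + w + margin)
    return [row[l:r] for row in img[t:b]], (t, l)


def restyle(
    img: Image, boxes: List[Box], generator: Callable[[Image], Image], margin: int = 1
) -> Image:
    """Run each crop through the generator and write the result back in place."""
    out = [row[:] for row in img]
    for box in boxes:
        crop, (t, l) = crop_with_context(img, box, margin)
        styled = generator(crop)  # assumed to preserve the crop's size
        for r, row in enumerate(styled):
            out[t + r][l:l + len(row)] = row
    return out


if __name__ == "__main__":
    first_sample = [[0] * 4 for _ in range(3)]
    fake_generator = lambda crop: [[px + 1 for px in row] for row in crop]
    print(restyle(first_sample, [(1, 1, 1, 1)], fake_generator, margin=1))
```

Each crop is taken from the unmodified input so that one sample's transfer does not feed into another's; where context regions overlap, the later box overwrites the earlier one, which is a simplification of this sketch.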

In the virtual sample generation method provided by the embodiment of the present invention, relationship modeling is introduced into the generation of high-quality virtual samples so that the arrangement of samples within each generated virtual sample better matches real application scenarios. In addition, a generative adversarial network is used to perform style transfer on the data samples, so that the final samples have a consistent style and the sample images blend smoothly into the background, making the images more realistic.

In one embodiment, the virtual sample generation method can be applied to generating virtual samples of cigarette pack outer-packaging images. As shown in FIG. 7, the generation process is as follows. A small number of outer-packaging images of rare cigarette specifications serve as the basis for virtual sample generation. Exploiting the regular arrangement and fixed scenes that characterize retail cigarette images, the collected images of rare cigarette categories are turned, through pasting and image data augmentation, into augmented training images, which are added to the data set used to train the model, improving its ability to recognize cigarette specifications with few images. To ensure that the artificially generated virtual samples are of high quality, a relationship model of cigarette categories is constructed: the different cigarette types on sale are modeled, the relationships between cigarette varieties in displays are analyzed, and the resulting rules are used to arrange the cigarettes when generating virtual samples. To eliminate the style inconsistency and edge differences introduced by pasting, the CycleGAN algorithm is used to learn the style-transfer transformation, and the style transformation module produces cigarette outer-packaging sample images that are closer to reality.

The virtual sample generation method provided by the embodiment of the present invention addresses the problem that, in images collected at retail counters, some cigarette pack outer packagings have too few samples for adequate training. It uses collected pack images of rare cigarette categories and artificially generates virtual outer-packaging samples by pasting. Specifically, it exploits the regular arrangement and relatively fixed scenes of retail cigarette pack images to generate, from a small number of outer-packaging image samples of rare categories and with multiple data augmentation techniques, a large number of high-quality virtual samples. To make the virtual samples more realistic, the correlations between cigarette categories are used to derive more targeted placement strategies, and the CycleGAN network is then used to unify the style of the pack images and the background in each virtual sample and to smooth the boundaries of the embedded cigarette images, so that the generated samples are more realistic and closer to outer-packaging images from real retail scenes. The method thus offers data-driven deep learning methods, such as cigarette pack image recognition and object detection, a high-quality way to expand the outer-packaging training samples.

An embodiment of the present invention further provides a virtual sample generation device. As shown in FIG. 8, the device includes:

a sample acquisition module, configured to acquire background scene samples and data samples to be detected; for details, refer to the corresponding part of the method embodiment above, which is not repeated here;

a relationship model construction module, configured to construct a sample relationship model according to the product specification information of the data samples; for details, refer to the corresponding part of the method embodiment above, which is not repeated here;

an augmentation module, configured to perform data augmentation on the data samples to be detected to obtain augmented data samples; for details, refer to the corresponding part of the method embodiment above, which is not repeated here;

an arrangement module, configured to arrange the augmented data samples according to the arrangement rule constructed from the sample relationship model and then embed them into the background scene sample to generate a first sample; for details, refer to the corresponding part of the method embodiment above, which is not repeated here;

a style transfer module, configured to perform style transfer on the first sample using an adversarial network to obtain a virtual sample; for details, refer to the corresponding part of the method embodiment above, which is not repeated here.
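The way the five modules chain together can be sketched as a simple pipeline object. The class name `VirtualSampleGenerator`, the field names and the call signatures are hypothetical; each module is represented by a plain callable so that the data flow between the modules is visible without committing to any particular implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class VirtualSampleGenerator:
    """Illustrative wiring of the five modules; signatures are assumptions."""

    acquire: Callable[[], Tuple[Any, Any]]              # -> (backgrounds, data samples)
    build_relation_model: Callable[[Any], Any]          # data samples -> relationship model
    augment: Callable[[Any], Any]                       # data samples -> augmented samples
    arrange_and_embed: Callable[[Any, Any, Any], Any]   # (augmented, model, backgrounds) -> first sample
    style_transfer: Callable[[Any], Any]                # first sample -> virtual sample

    def run(self) -> Any:
        backgrounds, samples = self.acquire()
        model = self.build_relation_model(samples)
        augmented = self.augment(samples)
        first_sample = self.arrange_and_embed(augmented, model, backgrounds)
        return self.style_transfer(first_sample)


if __name__ == "__main__":
    generator = VirtualSampleGenerator(
        acquire=lambda: (["cabinet"], ["pack_a", "pack_b"]),
        build_relation_model=lambda samples: {"count": len(samples)},
        augment=lambda samples: [s + "_aug" for s in samples],
        arrange_and_embed=lambda aug, model, bgs: (bgs[0], aug, model["count"]),
        style_transfer=lambda first: ("styled", first),
    )
    print(generator.run())
```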

In the virtual sample generation device provided by the embodiment of the present invention, relationship modeling is introduced into the generation of high-quality virtual samples so that the arrangement of samples within each generated virtual sample better matches real application scenarios. In addition, a generative adversarial network is used to perform style transfer on the data samples, so that the final samples have a consistent style and the sample images blend smoothly into the background, making the images more realistic.

For a detailed description of the functions of the virtual sample generation device provided by the embodiment of the present invention, refer to the description of the virtual sample generation method in the embodiment above.

An embodiment of the present invention further provides a storage medium. As shown in FIG. 9, a computer program 601 is stored on it, and when the program is executed by a processor, the steps of the virtual sample generation method in the embodiment above are implemented. The storage medium also stores audio and video stream data, feature frame data, interaction request signaling, encrypted data, preset data sizes and the like. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD) or the like; the storage medium may also include a combination of the above types of memory.

Those skilled in the art will understand that all or part of the processes in the methods of the embodiments above can be carried out by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD) or the like; the storage medium may also include a combination of the above types of memory.

An embodiment of the present invention further provides an electronic device. As shown in FIG. 10, the electronic device may include a processor 51 and a memory 52, which may be connected by a bus or by other means; FIG. 10 takes connection by a bus as an example.

The processor 51 may be a central processing unit (CPU). The processor 51 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component or a similar chip, or a combination of such chips.

The memory 52, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the embodiments of the present invention. By running the non-transitory software programs, instructions and modules stored in the memory 52, the processor 51 executes its various functional applications and data processing, that is, it implements the virtual sample generation method of the method embodiments above.

The memory 52 may include a program storage area and a data storage area. The program storage area may store an operating device and an application program required by at least one function; the data storage area may store data created by the processor 51 and the like. In addition, the memory 52 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device or other non-transitory solid-state storage device. In some embodiments, the memory 52 may optionally include memory located remotely from the processor 51, and such remote memory may be connected to the processor 51 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations thereof.

The one or more modules are stored in the memory 52 and, when executed by the processor 51, carry out the virtual sample generation method of the embodiments shown in FIGS. 1 to 7.

The specific details of the electronic device above can be understood by referring to the corresponding descriptions and effects in the embodiments shown in FIGS. 1 to 7, and are not repeated here.

Although embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the present invention, and such modifications and variations all fall within the scope defined by the appended claims.

Claims (10)

1. A virtual sample generation method, characterized by comprising:
acquiring background scene samples and data samples to be detected;
constructing a sample relationship model according to information of the data samples;
performing data augmentation on the data samples to be detected to obtain augmented data samples;
arranging the augmented data samples according to an arrangement rule constructed from the sample relationship model and then embedding them into the background scene sample to generate a first sample;
performing style transfer on the first sample using an adversarial network to obtain a virtual sample.

2. The virtual sample generation method according to claim 1, characterized in that the data augmentation methods comprise: random scale jitter, data augmentation based on the HSV space, image sharpness enhancement, and random flipping and rotation;
performing data augmentation on the data samples to be detected comprises:
for each data augmentation method, randomly applying data augmentation to the data samples to be detected according to a preset probability.

3. The virtual sample generation method according to claim 1, characterized in that constructing a sample relationship model according to information of the data samples comprises:
acquiring sales information of the data samples;
determining product specification information of the data samples according to the sales information;
based on a data mining algorithm, calculating, according to the sales information and the product specification information, the correlation between the sales volume of the brand of each data sample and the sales volumes of other brands, to obtain the data product specifications whose correlation is greater than a threshold;
constructing the sample relationship model according to the product specification information of the data samples and the data product specifications whose correlation is greater than the threshold.

4. The virtual sample generation method according to claim 1, characterized in that arranging the augmented data samples according to the arrangement rule constructed from the sample relationship model and then embedding them into the background scene sample to generate the first sample comprises:
arranging the data samples according to a random placement manner, product specification information or correlation information to obtain arranged data samples;
embedding the arranged data samples into the background scene sample to generate the first sample.

5. The virtual sample generation method according to claim 1, characterized in that the loss function of the adversarial network is expressed by the following formula:
[Equation image FDA0003360245080000021]
where [equation image FDA0003360245080000022] denotes the adversarial loss, G denotes the generator, D_Y denotes the discriminator, and [equation image FDA0003360245080000023] denotes the cycle-consistency loss.

6. The virtual sample generation method according to claim 1, characterized in that performing style transfer on the first sample using the adversarial network to obtain the virtual sample comprises:
cropping each data sample image in the first sample and inputting it into the adversarial network for style transfer to obtain a transferred data sample image;
embedding the transferred data sample image back into the first sample to obtain the virtual sample.

7. The virtual sample generation method according to claim 1, characterized in that the data samples to be detected are cigarette pack outer-packaging samples.

8. A virtual sample generation device, characterized by comprising:
a sample acquisition module, configured to acquire background scene samples and data samples to be detected;
a relationship model construction module, configured to construct a sample relationship model according to information of the data samples;
an augmentation module, configured to perform data augmentation on the data samples to be detected to obtain augmented data samples;
an arrangement module, configured to arrange the augmented data samples according to an arrangement rule constructed from the sample relationship model and then embed them into the background scene sample to generate a first sample;
a style transfer module, configured to perform style transfer on the first sample using an adversarial network to obtain a virtual sample.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a computer to execute the virtual sample generation method according to any one of claims 1 to 7.

10. An electronic device, characterized by comprising a memory and a processor communicatively connected to each other, wherein the memory stores computer instructions, and the processor executes the computer instructions so as to carry out the virtual sample generation method according to any one of claims 1 to 7.
CN202111365555.2A 2021-11-17 2021-11-17 Virtual sample generation method and device, storage medium and electronic equipment Pending CN114049536A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111365555.2A CN114049536A (en) 2021-11-17 2021-11-17 Virtual sample generation method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111365555.2A CN114049536A (en) 2021-11-17 2021-11-17 Virtual sample generation method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114049536A true CN114049536A (en) 2022-02-15

Family

ID=80209949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111365555.2A Pending CN114049536A (en) 2021-11-17 2021-11-17 Virtual sample generation method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114049536A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782642A (en) * 2022-04-08 2022-07-22 珠海金山数字网络科技有限公司 Virtual model placing method and device
CN114882937A (en) * 2022-04-30 2022-08-09 苏州浪潮智能科技有限公司 Solid state disk durability testing method, sample amount calculating method and device
CN114882207A (en) * 2022-06-21 2022-08-09 上海商汤临港智能科技有限公司 Image generation method, model training method, detection method, device and system
CN115205432A (en) * 2022-09-03 2022-10-18 深圳爱莫科技有限公司 Simulation method and model for automatic generation of cigarette terminal display sample image
CN115205432B (en) * 2022-09-03 2022-11-29 深圳爱莫科技有限公司 Simulation method and model for automatic generation of cigarette terminal display sample image
CN115601631A (en) * 2022-12-15 2023-01-13 深圳爱莫科技有限公司(Cn) Cigarette display image recognition method, model, equipment and storage medium
CN116341561A (en) * 2023-03-27 2023-06-27 京东科技信息技术有限公司 A voice sample data generation method, device, equipment and storage medium
CN116341561B (en) * 2023-03-27 2024-02-02 京东科技信息技术有限公司 Voice sample data generation method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN114049536A (en) Virtual sample generation method and device, storage medium and electronic equipment
JP7490141B2 (en) IMAGE DETECTION METHOD, MODEL TRAINING METHOD, IMAGE DETECTION APPARATUS, TRAINING APPARATUS, DEVICE, AND PROGRAM
CN109472365B (en) Systems and methods for refining synthetic data using auxiliary inputs via generative adversarial networks
WO2020165557A1 (en) 3d face reconstruction system and method
CN110992238B (en) Digital image tampering blind detection method based on dual-channel network
CN111860380B (en) Face image generation method, device, server and storage medium
CN114511041B (en) Model training method, image processing method, apparatus, equipment and storage medium
CN110717962B (en) Dynamic photo generation method, device, photographing equipment and storage medium
US20210166073A1 (en) Image generation method and computing device
CN117094986A (en) Self-adaptive defect detection method based on small sample and terminal equipment
CN109377552B (en) Image occlusion calculating method, device, calculating equipment and storage medium
CN116074087B (en) Encrypted traffic classification method based on network traffic context representation, electronic equipment and storage medium
WO2019127940A1 (en) Video classification model training method, device, storage medium, and electronic device
CN112991208B (en) Image processing method and device, computer readable medium and electronic equipment
CN114511702A (en) Remote sensing image segmentation method and system based on multi-scale weighted attention
CN114511441A (en) Model training, image stylization method, device, electronic device and storage medium
CN111814811B (en) Image information extraction method, training method and device, medium and electronic equipment
Wu et al. FlagDetSeg: Multi-nation flag detection and segmentation in the wild
CN108230227A (en) A kind of recognition methods of distorted image, device and electronic equipment
CN114821128B (en) Scale-adaptive template matching method
KR102572415B1 (en) Method and apparatus for creating a natural three-dimensional digital twin through verification of a reference image
US11954875B2 (en) Method for determining height of plant, electronic device, and storage medium
CN112950641B (en) Image processing method and device, computer readable storage medium and electronic equipment
CN115937372A (en) Facial expression simulation method, device, equipment and storage medium
CN114882510A (en) Corrugated carton trademark robust detection method with attack prevention function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination