WO2024022149A1 - Data enhancement method and apparatus, and electronic device - Google Patents

Data enhancement method and apparatus, and electronic device

Info

Publication number
WO2024022149A1
Authority
WO
WIPO (PCT)
Prior art keywords
original image
image
category
target detection
categories
Prior art date
Application number
PCT/CN2023/107709
Other languages
French (fr)
Chinese (zh)
Inventor
吕永春
朱徽
周迅溢
蒋宁
吴海英
Original Assignee
马上消费金融股份有限公司
Priority date
Filing date
Publication date
Application filed by 马上消费金融股份有限公司
Publication of WO2024022149A1


Classifications

    • G06N 3/0464 Computing arrangements based on biological models; neural networks; convolutional networks [CNN, ConvNet]
    • G06V 10/25 Image preprocessing; determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/774 Pattern recognition or machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/803 Fusion of input or preprocessed data
    • G06V 10/82 Image or video recognition or understanding using neural networks
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 2201/07 Indexing scheme relating to image or video recognition or understanding; target detection

Abstract

Provided in the embodiments of the present application are a data enhancement method and apparatus, and an electronic device. The data enhancement method comprises: acquiring an original image and a background image set, wherein one original image corresponds to one background image set; performing target detection on the original image according to a target detection network, so as to obtain a first detection frame of the original image; and fusing an area corresponding to the first detection frame with at least one background image in the background image set corresponding to the original image, so as to obtain an enhanced image for the original image. In this way, the data enhancement effect can be improved.

Description

Data enhancement method, apparatus, and electronic device
This application claims priority to Chinese patent application No. 202210904226.9, titled "Data enhancement method, apparatus, and electronic device" and filed on July 29, 2022, the content of which is incorporated herein by reference.
Technical Field
The present application relates to the field of image processing technology, and in particular to a data enhancement method, an apparatus, and an electronic device.
Background
In recent years, deep learning has been widely applied in image processing, computer vision, and other fields. However, as network depth increases, overfitting in large-scale deep neural networks becomes increasingly severe, which in turn degrades performance. An important cause of overfitting is an insufficient amount of training data. To expand the available training data, a variety of data enhancement techniques for image data have been proposed. Currently, common image enhancement schemes obtain new images by flipping, cropping, translating, and color-transforming existing images, thereby expanding the image data.
Summary of the Invention
Embodiments of the present application provide a data enhancement method, an apparatus, and an electronic device.
In one aspect, an embodiment of the present application provides a data enhancement method, including:
acquiring an original image and a background image set, where one original image corresponds to one background image set;
performing target detection on the original image according to a target detection network to obtain a first detection frame of the original image; and
fusing an area corresponding to the first detection frame with at least one background image in the background image set corresponding to the original image to obtain an enhanced image of the original image.
It can be seen that, in the data enhancement method of the embodiments of the present application, the first detection frame can be obtained by performing target detection on the original image, and the area corresponding to the first detection frame in the original image is fused with at least one background image in the background image set corresponding to the original image to obtain at least one enhanced image corresponding to the original image, thereby achieving data enhancement of the original image.
In one aspect, an embodiment of the present application provides a data enhancement apparatus, including:
an acquisition module, configured to acquire an original image and a background image set, where one original image corresponds to one background image set;
a target detection module, configured to perform target detection on the original image according to a target detection network to obtain a first detection frame of the original image; and
a fusion module, configured to fuse the area corresponding to the first detection frame with at least one background image in the background image set corresponding to the original image to obtain an enhanced image of the original image.
In one aspect, an embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the above data enhancement method.
In one aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the above data enhancement method.
In one aspect, the present disclosure provides a computer program that, when executed by a processor, implements the steps of the above data enhancement method.
In one aspect, the present disclosure provides a computer program product including a computer program that, when executed by a processor, implements the steps of the above data enhancement method.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
Figure 1 is a flowchart of a data enhancement method provided by an embodiment of the present application;
Figure 2 is a flowchart of a data enhancement method provided by an embodiment of the present application;
Figure 3 is a schematic diagram of convolutional neural network training provided by an embodiment of the present application;
Figure 4 is a schematic diagram of a data enhancement method provided by an embodiment of the present application;
Figure 5 is an application scenario diagram of a data enhancement method provided by an embodiment of the present application;
Figure 6 is a schematic structural diagram of a data enhancement apparatus provided by an embodiment of the present application;
Figure 7 is a schematic structural diagram of a data enhancement apparatus provided by an embodiment of the present application;
Figure 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
When a network is trained on image data, the image data used for training is also called image training samples, and this data directly affects the quality of training. Overfitting is a common problem during network training, and an important cause of overfitting is an insufficient number of image training samples. Image data enhancement is an important way to increase the sample size. However, enhanced image data currently obtained by flipping, cropping, translating, or color-transforming the original image provides limited additional information relative to the original image, so the enhancement effect is poor. Even if the resulting enhanced images are used together with the original images for network training, the enhanced images differ little from the original images and provide little additional information, so overfitting remains likely. Based on this, embodiments of the present application provide a data enhancement method in which the first detection frame obtained by performing target detection on the original image is fused with an additional background image to obtain an enhanced image. In this way, the enhanced image differs substantially from the original image and can provide more additional information, improving the image enhancement effect; subsequently training a network with the original images and with enhanced images that differ substantially from them and provide more additional information can reduce overfitting.
It should be noted that the method can be applied to, and executed by, an electronic device, which may be any device capable of implementing data enhancement, including but not limited to a terminal device or a server device.
Referring to Figure 1, Figure 1 is a flowchart of a data enhancement method provided by an embodiment of the present application. As shown in Figure 1, the method includes the following steps:
Step 101: Acquire an original image and a background image set, where one original image corresponds to one background image set.
The original image may also be called the image to be enhanced, and there is a correspondence between original images and background image sets. The number of original images is not specifically limited here; there may be one or more, and correspondingly there may be one or more background image sets. Each background image set may include one or more background images, which is not limited by the embodiments of the present application.
Step 102: Perform target detection on the original image according to a target detection network to obtain a first detection frame of the original image.
After the original image is acquired, target detection can be performed on it according to the target detection network to obtain its first detection frame. Target detection can be understood as detecting the position of a target in an image. The result of target detection can be represented by a detection frame, also called a bounding box; one implementation is a rectangular frame, and the detected target lies within the corresponding detection frame. It should be noted that although the embodiments of the present application take a rectangular frame as an example, they are not limited to this, and the technical solutions of the embodiments are equally applicable to detection frames of other shapes.
It should be noted that if there are multiple original images, target detection needs to be performed on them one by one to obtain the first detection frame of each original image. Of course, if there are multiple target detection networks, each target detection network can be used to perform target detection on the original image, and the first detection frame corresponding to the original image is then obtained based on the detection results.
Step 103: Fuse the area corresponding to the first detection frame with at least one background image in the background image set corresponding to the original image to obtain an enhanced image of the original image.
After the first detection frame of the original image is obtained, the area corresponding to the first detection frame can be extracted from the original image according to that frame, and the extracted area can then be fused with at least one background image in the background image set corresponding to the original image, so that an enhanced image corresponding to the original image is obtained and image data enhancement is achieved. As mentioned above, there may be more than one background image in the background image set; during fusion, depending on actual needs, the area corresponding to the first detection frame may be fused with every background image, or only with some (one or more) of the background images in the set, which is not limited by the embodiments of the present application.
The area corresponding to the first detection frame in the original image can be understood as the area determined in the original image by the vertex coordinates of the first detection frame. Taking a rectangular frame as an example, the area corresponding to the first detection frame may be the area enclosed in the original image by the four vertex coordinates of the frame.
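A minimal sketch of this fusion step is given below, assuming the fusion is implemented by pasting the detection-frame region of the original image onto a background image of the same size at the same coordinates; the patent does not prescribe a particular fusion operator, so the paste-based blending and the function name are illustrative assumptions.

```python
import numpy as np

def fuse_box_with_background(original: np.ndarray,
                             box: tuple,
                             background: np.ndarray) -> np.ndarray:
    """Paste the region of `original` enclosed by `box` onto `background`.

    `box` is (x1, y1, x2, y2) in pixel coordinates, i.e. the rectangle
    determined by the four vertex coordinates of the first detection frame.
    The background is assumed to have the same size as the original image
    (it could also be resized to match before calling this function).
    """
    x1, y1, x2, y2 = box
    enhanced = background.copy()
    # Copy the detection-frame region of the original image into the background.
    enhanced[y1:y2, x1:x2] = original[y1:y2, x1:x2]
    return enhanced

# One enhanced image per background image in the set (illustrative usage):
# enhanced_images = [fuse_box_with_background(img, first_box, bg) for bg in background_set]
```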
It should be noted that because the background image set may contain at least one background image, at least one enhanced image corresponding to the original image can be obtained; this at least one enhanced image may be called an enhanced image set. The original image and its corresponding enhanced image set can subsequently be used to train a deep neural network. This not only increases the number of training samples; because the enhanced images obtained in the embodiments of the present application differ substantially from the original image, they also provide more additional information, which reduces overfitting during training.
In the data enhancement method of this embodiment, the first detection frame can be obtained by performing target detection on the original image, and the area corresponding to the first detection frame in the original image is fused with at least one background image in the background image set corresponding to the original image to obtain at least one enhanced image corresponding to the original image, thereby achieving data enhancement of the original image. Unlike the related art, in which the entire original image is flipped or otherwise transformed, the embodiments of the present application obtain the first detection frame of the original image and fuse it with a background image to produce the enhanced image. An enhanced image that differs substantially from the original image can thus be obtained, and an enhanced image obtained in this way can provide more additional information, improving the image enhancement effect.
The data enhancement method provided by the present application has been described above with reference to Figure 1. In practice, to improve the accuracy of the first detection frame of the original image, more than one target detection network can be used to perform target detection on the original image, so that the first detection frame is derived from the combined detection results. This process is described in detail below with reference to Figure 2. It should be noted that in the following embodiments there may be more than one original image, but it can be understood that the processing for each original image is similar.
Referring to Figure 2, Figure 2 is a flowchart of a data enhancement method provided by an embodiment of the present application. As shown in Figure 2, the method includes the following steps:
Step 201: Acquire N original images and N background image sets, where one original image corresponds to one background image set and N is an integer greater than or equal to 1.
The original images may also be called the images to be enhanced; the N original images correspond one-to-one to the N background image sets.
Each of the N background image sets includes at least one background image, that is, each background image set may include one or more background images. Among the N background image sets, the number of background images in each set may be the same or different, which is not limited by the embodiments of the present application.
Step 202: Perform target detection on each of the N original images according to M target detection networks to obtain M target detection frames for each original image, where M is an integer greater than 1.
Similar to step 101, target detection can be understood as detecting the position of a target in an image. The result of target detection can be represented by a detection frame, also called a bounding box; one implementation is a rectangular frame, and the detected target lies within the corresponding target detection frame.
In addition, the M target detection networks may be the networks obtained at M iteration rounds during iterative training of an initial detection network, where the smallest of the M iteration rounds is greater than a target round. The target round may be preset according to the actual situation, or set according to the maximum number of training rounds configured for model training, which is not specifically limited in this embodiment. For example, the target round may be set to a round between half of the maximum training round and the maximum training round, the M iteration rounds may be M iterations of training performed after the initial detection network has been trained for the target number of rounds, and the largest of the M iteration rounds is less than or equal to the maximum training round of model training.
For example, if the maximum number of training rounds for model training is 40, the target round may be set to 20 and M may be 20. The initial detection network is first trained for 20 iterations; starting from the 21st iteration, the network obtained after each iteration is recorded until the 40th iteration is completed. In this way, 20 networks can be recorded, that is, the 20 target detection networks are the networks obtained at rounds 21 through 40, respectively.
Step 203: Determine the first detection frame of each original image, where the first detection frame of an original image is the detection frame determined from the M target detection frames of that original image.
After the M target detection frames of each original image are obtained, they can be processed to obtain the first detection frame of each original image.
As an implementation, the M target detection frames of each original image (also called the M target detection frames corresponding to each original image) may be rectangular frames. It should be noted that although the embodiments of the present application take a rectangular frame as an example, they are not limited to this, and the technical solutions of the embodiments are equally applicable to target detection frames of other shapes.
The processing of the M target detection frames is similar for each original image. Taking one of the N original images as an example, the M target detection frames of that original image can be averaged to obtain its first detection frame. For example, a target detection frame can be represented by four vertex coordinates, and the four vertex coordinates of the M target detection frames of the original image can be averaged to obtain the first detection frame of the original image; that is, each vertex coordinate of the first detection frame is the average of the corresponding vertex coordinates of the M target detection frames. It should be noted that any vertex coordinate includes two component coordinates, and averaging means averaging the same component coordinate of that vertex across the M target detection frames. For example, suppose the M target detection frames of an original image include a first target detection frame with vertices J11(X11, Y11), J12(X12, Y12), J13(X13, Y13), and J14(X14, Y14) and a second target detection frame with vertices J21(X21, Y21), J22(X22, Y22), J23(X23, Y23), and J24(X24, Y24), where J11, J12, J13, and J14 are the top-left, top-right, bottom-left, and bottom-right vertices of the first target detection frame, and J21, J22, J23, and J24 are the top-left, top-right, bottom-left, and bottom-right vertices of the second target detection frame. Averaging gives the four vertex coordinates of the first detection frame of the original image: ((X11+X21)/2, (Y11+Y21)/2), ((X12+X22)/2, (Y12+Y22)/2), ((X13+X23)/2, (Y13+Y23)/2), and ((X14+X24)/2, (Y14+Y24)/2). Applying a similar process to the M target detection frames of each original image yields the first detection frame of each original image.
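A minimal sketch of this averaging step is shown below, assuming each target detection frame is represented by its four (x, y) vertex coordinates listed in a fixed order (top-left, top-right, bottom-left, bottom-right); the function name and the example coordinates are illustrative.

```python
import numpy as np

def average_detection_frames(frames: np.ndarray) -> np.ndarray:
    """Average M rectangular detection frames into a single first detection frame.

    `frames` has shape (M, 4, 2): M frames, each with four (x, y) vertices
    listed in the same order (top-left, top-right, bottom-left, bottom-right).
    Each vertex of the result is the mean of the corresponding vertices,
    averaging the x and y components separately.
    """
    return frames.mean(axis=0)

# Example with two frames, as in the text: the first vertex of the result is
# ((X11 + X21) / 2, (Y11 + Y21) / 2), and likewise for the other three vertices.
frames = np.array([
    [[10, 20], [110, 20], [10, 120], [110, 120]],   # first target detection frame
    [[14, 24], [114, 24], [14, 124], [114, 124]],   # second target detection frame
], dtype=float)
first_frame = average_detection_frames(frames)       # [[12, 22], [112, 22], [12, 122], [112, 122]]
```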
Step 204: Fuse the area corresponding to the first detection frame in each original image with at least one background image in the background image set corresponding to that original image to obtain at least one enhanced image corresponding to each original image.
For each original image, after its first detection frame is determined, the area corresponding to the first detection frame can be extracted from the original image according to that frame, and the extracted area can then be fused with at least one background image in the background image set corresponding to the original image, so that at least one enhanced image corresponding to the original image is obtained and image data enhancement is achieved. It should be noted that in this fusion process, the area corresponding to the first detection frame may be fused separately with every background image in the corresponding background image set (in which case the number of enhanced images corresponding to the original image equals the number of background images in the set), or fused separately with only some of the background images in the set (in which case the number of enhanced images corresponding to the original image is smaller than the number of background images in the set). Performing a similar fusion process for each original image yields at least one enhanced image corresponding to each original image.
It should be noted that obtaining at least one enhanced image corresponding to each of the N original images can be understood as obtaining an enhanced image set corresponding to each original image, that is, obtaining N enhanced image sets, where the enhanced image set corresponding to any original image includes the at least one enhanced image corresponding to that original image. The N original images and the N enhanced image sets can be used for subsequent training of a deep neural network. Training the deep neural network with the N original images and the N enhanced image sets not only increases the number of training samples; because the enhanced images obtained in the embodiments of the present application differ substantially from the original images, they also provide more additional information, which reduces overfitting during training.
In addition, the area corresponding to the first detection frame in the original image can be understood as the area determined in the original image by the vertex coordinates of the first detection frame. Taking a rectangular frame as an example, the area corresponding to the first detection frame may be the area enclosed in the original image by the four vertex coordinates of the frame.
In the data enhancement method of this embodiment, M target detection frames of the original image can be obtained through M different target detection networks, and these M target detection frames are used to determine the first detection frame of the original image, improving the accuracy of the first detection frame. The area corresponding to the first detection frame in the original image is fused with at least one background image in the background image set corresponding to the original image to obtain at least one enhanced image corresponding to the original image, thereby achieving data enhancement of the original image. Unlike the related art, in which the entire original image is flipped or otherwise transformed, the embodiments of the present application obtain the first detection frame of the original image and fuse it with a background image to produce the enhanced image. An enhanced image that differs substantially from the original image can thus be obtained, and an enhanced image obtained in this way can provide more additional information, improving the image enhancement effect.
In one embodiment, performing target detection on each of the N original images according to the M target detection networks to obtain M target detection frames for each of the N original images includes:
inputting the N original images into the M target detection networks for feature extraction to obtain M feature maps of each original image;
normalizing the M feature maps of each original image to obtain M heat maps of each original image; and
computing the M target detection frames of the M heat maps of each original image as the M target detection frames of that original image.
As an example, the target detection network may include a convolutional neural network, the convolutional neural network may include multiple convolutional layers, and the feature map may be the feature map output by the last convolutional layer of the convolutional neural network. Using a convolutional neural network to extract features from the original image can capture more detailed features of the original image and yield a feature map that better characterizes it. The M feature maps of the original image are then normalized to obtain M heat maps of the original image, so that the target detection frames in the image can subsequently be determined. As an example, a feature map can be normalized to a heat map with pixel values in the range [0, 1].
As an example, the above convolutional neural network can be obtained through iterative training with the SimSiam self-supervised method. This self-supervised method directly maximizes the similarity between two views of the same image without using negative samples and without a momentum encoder. As shown in Figure 3, an image x is randomly augmented twice (for example, by rotation, color processing, etc.) to obtain two different views x1 and x2 as input. The two views x1 and x2 are each passed through an encoder network to obtain a first vector z1 and a second vector z2, respectively; z1 is processed by a projection layer (which may be a multi-layer perceptron) to obtain a third vector p1, and z2 is processed by the projection layer to obtain a fourth vector p2. The encoder network is encoder f in Figure 3, and the projection layer is projector h in Figure 3. The stop-gradient operation shown in Figure 3 (stop-grad in Figure 3) is the key to preventing model collapse. The negative cosine similarity is then minimized:
D(p1, z2) = -(p1 / ||p1||2) · (z2 / ||z2||2)
where the cosine similarity corresponds to "similarity" in Figure 3, ||·||2 denotes the l2 norm, and D(p1, z2) is the negative cosine similarity between p1 and z2.
The loss function L takes the symmetric form:
L = (1/2) D(p1, stopgrad(z2)) + (1/2) D(p2, stopgrad(z1))
where D(p2, z1) is the negative cosine similarity between p2 and z1, expressed as follows:
D(p2, z1) = -(p2 / ||p2||2) · (z1 / ||z1||2)
The model is trained with the above loss function for n (for example, 40) rounds using a stochastic gradient descent (SGD) optimizer. It should be noted that the above encoder network may be a convolutional neural network (which may include a feature extraction network and a conversion layer, where the last convolutional layer of the feature extraction network outputs a feature map and the conversion layer converts the feature map into a vector); once this training is complete, the convolutional neural network has been trained. It should also be noted that the training set used during training may include the above N original images.
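A minimal sketch of this training objective is shown below, assuming PyTorch modules `encoder` and `projector` play the roles of encoder f and projector h in Figure 3; the module names are illustrative, and only the loss computation is sketched, not the full SimSiam training loop.

```python
import torch
import torch.nn.functional as F

def negative_cosine_similarity(p: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """D(p, z): negative cosine similarity, with stop-gradient applied to z."""
    z = z.detach()  # stop-grad: z contributes no gradient, preventing collapse
    return -F.cosine_similarity(p, z, dim=-1).mean()

def simsiam_loss(x1, x2, encoder, projector):
    """Symmetric loss L = D(p1, z2)/2 + D(p2, z1)/2 for two views x1, x2.

    `encoder` plays the role of encoder f and `projector` of projector h
    in Figure 3; both are assumed to be torch.nn.Module instances.
    """
    z1, z2 = encoder(x1), encoder(x2)
    p1, p2 = projector(z1), projector(z2)
    return 0.5 * negative_cosine_similarity(p1, z2) + \
           0.5 * negative_cosine_similarity(p2, z1)

# Training would minimize this loss with an SGD optimizer for n (e.g. 40) rounds.
```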
The above self-supervised learning can capture the approximate position information of the target, and the embodiments of the present application use this property to estimate the object bounding box in the feature map corresponding to an image. The position information in the early stage of self-supervised training may not be accurate enough, so the convolutional neural networks obtained after m (for example, 20) rounds of iterative training, up to the convergence round (the training round at the end of training, for example, the n rounds mentioned above), can be used to extract features from an original image A. That is, the convolutional neural networks obtained at training rounds m+1 through n are used to extract features from the original image A, yielding (n - m), that is, M feature maps (taking the feature map output by the last convolutional layer of each convolutional neural network). Each feature map is normalized to [0, 1] to generate a heat map, and the target detection frame in each heat map is then computed as follows:
B = K(l[R > i])
where R denotes the heat map, i denotes the threshold for activation points (that is, the preset pixel threshold), and l is an indicator function: l[R > i] is 1 for a pixel of R whose value is greater than i and 0 otherwise. Applying this to every pixel of R binarizes R and yields a binarized image. K is the function that computes the rectangular closure, that is, the function that computes the target detection frame; the K function returns the target detection frame of the binarized image of the heat map R.
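A minimal sketch of this computation is shown below, assuming the heat map is produced by min-max normalization of the feature map and the rectangular closure K is taken as the tightest axis-aligned rectangle enclosing the above-threshold pixels; the function names and the default threshold value are illustrative assumptions.

```python
import numpy as np

def feature_map_to_heat_map(feature_map: np.ndarray) -> np.ndarray:
    """Normalize a feature map to a heat map R with values in [0, 1]."""
    fmin, fmax = feature_map.min(), feature_map.max()
    return (feature_map - fmin) / (fmax - fmin + 1e-8)

def heat_map_to_detection_frame(R: np.ndarray, i: float = 0.5):
    """B = K(l[R > i]): binarize the heat map, then take the rectangular closure.

    l[R > i] is the indicator image (1 where the pixel value exceeds the
    threshold i, 0 elsewhere); K is taken here as the tightest axis-aligned
    rectangle enclosing all non-zero pixels, returned as (x1, y1, x2, y2).
    """
    binary = (R > i).astype(np.uint8)   # indicator function l[R > i]
    ys, xs = np.nonzero(binary)
    if len(xs) == 0:                    # no activation above the threshold
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```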
Since M convolutional neural networks are recorded and each outputs one heat map, M heat maps of the original image A are obtained. The M target detection frames of these M heat maps are computed as the M target detection frames of the original image A, and the results are averaged; for example, if the target detection frames are rectangular, the four vertex coordinates of the M target detection frames can be averaged in turn to obtain the final first detection frame of the original image A.
In one embodiment, computing the M target detection frames of the M heat maps of each original image as the M target detection frames of that original image includes:
for each heat map, binarizing the heat map according to a preset pixel threshold to obtain a binarized image of the heat map, and computing the target detection frame of the heat map based on the binarized image of the heat map.
The binarization can set the values of pixels in the heat map greater than the preset pixel threshold to a first value, for example 1, and the values of pixels less than or equal to the preset pixel threshold to a second value, for example 0, so that the value of any pixel in the resulting binarized image is either the first value or the second value. As an example, the binarization can be implemented as the indicator function l[R > i] mentioned above. Because only two pixel values exist in the binarized image, computing the target detection frame from the binarized image can improve the accuracy of the computed detection frame.
In one embodiment, acquiring the N background image sets includes:
for each original image, performing the following operations:
determining, according to the category of the original image, at least one category matching the category of the original image, where the similarity between each of the at least one category and the category of the original image is greater than a preset threshold;
acquiring at least one reference image corresponding to each of the at least one category to obtain a reference image set corresponding to the original image; and
acquiring the background image in each reference image of the reference image set to obtain the background image set corresponding to the original image.
The category of the original image can also be understood as the category of the target in the original image. Multiple categories can be set in advance, and the category of the original image may be one of them; for example, the multiple categories may be, but are not limited to, person, pig, sheep, cat, dog, deer, horse, and bird. The category of the original image can be obtained in advance. A background image can be understood as an image with the target removed, for example, the background area remaining after the target is removed from a reference image.
In this embodiment, the background image of a reference image is fused with the area of the first detection frame, and the reference image is an image whose category matches the category of the original image; this can reduce the difference between the area of the first detection frame and the background image of the reference image and improve the plausibility of the enhanced image obtained by fusion.
In one embodiment, for each original image, before determining, according to the category of the original image, at least one category matching the category of the original image, the method further includes:
determining the similarity between every two categories among multiple categories, where the multiple categories include the category of the original image and the at least one category;
where determining, according to the category of the original image, at least one category matching the category of the original image includes:
determining, according to the similarity between the category of the original image and the remaining categories among the multiple categories, at least one category matching the category of the original image from the remaining categories, where the remaining categories are the categories among the multiple categories other than the category of the original image, and the similarity between each of the at least one matching category and the category of the original image is greater than the preset threshold.
It can be understood that the multiple categories include the category of the original image as well as the remaining categories; therefore, a category matching the category of the original image can be selected from the remaining categories as the above at least one category based on the similarity between categories. That is, in this embodiment, the background image of a reference image is fused with the area of the first detection frame, and the reference image is an image corresponding to at least one category whose similarity to the category of the original image, determined from the similarity between categories, is greater than the preset threshold; this can reduce the difference between the area of the first detection frame and the background image of the reference image and improve the plausibility of the enhanced image obtained by fusion.
As an example, the at least one category may be a single category, that is, there is one category matching the original image, and this matching category may be the category with the greatest similarity to the category of the original image among the multiple categories.
For example, as shown in Figures 4 and 5, an original image A is input into the M target detection networks, M target detection frames of the original image are extracted, and the first detection frame J of the original image A is obtained from the M target detection frames. According to the recorded similarity between every two of the multiple categories, the category with the highest similarity to the category of the original image A is determined, the background area D in the reference image C corresponding to that category is obtained, and the first detection frame J is fused into the background area D in the reference image C to obtain an enhanced image Q. It should be noted that there may be one or more reference images corresponding to the category with the highest similarity, that is, there may be one or more reference images C, and correspondingly one or more enhanced images Q may be obtained based on the reference images C.
In one embodiment, determining the similarity between every two categories among the multiple categories includes:
inputting the multiple categories into a semantic model for semantic analysis to obtain a semantic vector representation of each of the multiple categories; and
calculating the similarity between the semantic vector representations of every two categories among the multiple categories.
There are many kinds of semantic models, which are not specifically limited in this embodiment; for example, the semantic model may be a GloVe model, word2vec (a word vector model), or the like. Since in this embodiment the categories are semantically analyzed by a semantic model, a semantic vector representation (also called a word vector representation) of each category can be extracted. It can be understood that the semantic vector representation of a category can be used to characterize the semantic information of that category (that is, to express the semantic information of the category in the form of a vector). The similarity between two categories, also called semantic similarity, can be calculated from their semantic vector representations; for example, the cosine similarity between the semantic vector representations of the two categories is used as the similarity between them. In this way, the accuracy of the similarity between categories can be improved.
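A minimal sketch of this step is shown below, assuming pre-trained word vectors (for example, GloVe or word2vec vectors) are available as a dictionary mapping each category name to a vector; the function and variable names are illustrative.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two semantic vector representations."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def most_similar_category(target: str, categories: list, embeddings: dict) -> str:
    """Return the category (other than `target`) whose word vector is most
    similar to the word vector of `target`.

    `embeddings` maps category names (e.g. "dog", "cat", "horse") to
    pre-trained word vectors such as GloVe or word2vec vectors.
    """
    others = [c for c in categories if c != target]
    sims = [cosine_similarity(embeddings[target], embeddings[c]) for c in others]
    return others[int(np.argmax(sims))]

# Illustrative usage: for the category "dog", this might return "cat" if their
# word vectors are the most similar among the predefined categories.
```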
It should be noted that by using the semantic similarity between categories, a category similar to the category of the original image is obtained, and the area corresponding to the first detection frame in the original image is fused with the background image of a reference image corresponding to at least one category similar to the category of the original image, so that image reconstruction is achieved and an enhanced image of the original image is obtained. To a certain extent, this constrains and reduces the possibility that the reconstructed enhanced image is implausible, improving the plausibility of the enhanced image. For example, for an image of a dog, the background may be grass, furniture, and the like, and for its most similar category, cat, the background is likely to be similar as well. Taking the dog as the foreground and fusing it with the background of an image of a cat, the category most similar to dog, yields an enhanced image that can serve as an augmented sample and improves the plausibility of the enhanced image obtained by fusion. The data enhancement method proposed in the embodiments of the present application changes the original image in a way that is simple, reasonable, and substantial, can provide more additional information, and therefore has greater potential to resist overfitting.
It should be noted that the original images that can be used include, but are not limited to, expression data, face images, natural-world biological classification images, and the like; original images with rich background information work best. Before data enhancement is performed on the original images, self-supervised learning is first carried out with the original images to obtain position estimates of the targets in them. The semantic similarity between categories is then calculated according to the categories to obtain the most similar category. Data enhancement is then performed based on the original image, the first detection frame, the most similar category, and so on, for use in the subsequent deep neural network training process. During training, for a given image, more sample images (enhanced images) are reconstructed by blending its target with the target-removed background areas of reference images whose categories are similar to its category.
The data enhancement method of the embodiments of the present application is applicable to image data enhancement during the training of a deep neural network. For example, during deep neural network training, an insufficient number of original images easily leads to overfitting; to reduce overfitting, image data enhancement needs to be performed on the original images. In the embodiments of the present application, target detection is performed on the original image to determine its first detection frame, and the area corresponding to the first detection frame in the original image is fused with at least one background image in the background image set corresponding to the original image to obtain at least one enhanced image corresponding to the original image, achieving image data enhancement. It can be understood that the enhanced image is obtained by fusing the original image with an additional background image, so that, relative to the original image, the enhanced image can provide more additional information. Using the original images and the obtained enhanced images for deep neural network training not only increases the number of training samples; because the enhanced images obtained in the embodiments of the present application differ substantially from the original images, they also provide more additional information, which reduces overfitting during training.
参见图6,图6是本申请实施例提供的数据增强装置的结构图,能实现上述实施例中数据增强方法的细节,并达到相同的效果。如图6所示,数据增强装置600,包括:Referring to Figure 6, Figure 6 is a structural diagram of a data enhancement device provided by an embodiment of the present application, which can implement the details of the data enhancement method in the above embodiment and achieve the same effect. As shown in Figure 6, data enhancement device 600 includes:
获取模块601,用于获取原始图像以及背景图像集,一张原始图像对应一个背景图像集;The acquisition module 601 is used to acquire the original image and the background image set. One original image corresponds to one background image set;
目标检测模块602,用于根据目标检测网络,对原始图像进行目标检测,获得原始图像的第一检测框;The target detection module 602 is used to perform target detection on the original image according to the target detection network and obtain the first detection frame of the original image;
融合模块603,用于对第一检测框对应的区域以及原始图像对应的背景图像集中至少一张背景图像进行融合,得到原始图像的增强图像。The fusion module 603 is used to fuse the area corresponding to the first detection frame and at least one background image in the background image set corresponding to the original image to obtain an enhanced image of the original image.
在一个实施例中,目标检测网络为M个;In one embodiment, there are M target detection networks;
目标检测模块602具体用于:The target detection module 602 is specifically used for:
根据M个目标检测网络,对原始图像进行目标检测,获得原始图像的M个目标检测框;According to M target detection networks, target detection is performed on the original image to obtain M target detection frames of the original image;
根据原始图像的M个目标检测框确定原始图像的第一检测框。The first detection frame of the original image is determined based on the M target detection frames of the original image.
在一个实施例中,目标检测模块602具体用于:In one embodiment, the target detection module 602 is specifically used to:
将原始图像输入M个目标检测网络进行特征提取,得到原始图像的M张特征图;Input the original image into M target detection networks for feature extraction, and obtain M feature maps of the original image;
对原始图像的M张特征图进行归一化处理,得到原始图像的M张热力图;Normalize the M feature maps of the original image to obtain M heat maps of the original image;
计算原始图像的M张热力图的目标检测框,作为原始图像的M个目标检测框。 Calculate the target detection frames of M heat maps of the original image as M target detection frames of the original image.
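下面是上述“特征图归一化为热力图”步骤的一个极简示意；按通道取均值与最小-最大归一化仅为示意性假设，实施例仅要求将M张特征图归一化为M张热力图。A minimal sketch of the feature-map-to-heat-map step above follows; channel-wise averaging and min-max normalization are illustrative assumptions, since the embodiment only requires that the M feature maps be normalized into M heat maps.

    import numpy as np

    def feature_map_to_heatmap(feature_map, eps=1e-8):
        # feature_map: C x H x W activations from one target detection network
        activation = feature_map.mean(axis=0)                 # aggregate channels per location
        a_min, a_max = activation.min(), activation.max()
        return (activation - a_min) / (a_max - a_min + eps)   # heat map in [0, 1]

    # One heat map per target detection network, i.e. M heat maps per original image:
    # heatmaps = [feature_map_to_heatmap(fm) for fm in feature_maps]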
在一个实施例中,目标检测模块602具体用于:In one embodiment, the target detection module 602 is specifically used to:
对于每张热力图,根据预设像素阈值对热力图进行二值化处理,得到热力图的二值化图像,并基于热力图的二值化图像计算热力图的目标检测框。For each heat map, the heat map is binarized according to the preset pixel threshold to obtain a binary image of the heat map, and the target detection frame of the heat map is calculated based on the binarized image of the heat map.
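上述二值化及检测框计算可草拟如下；0.5的像素阈值仅为示意性假设，实施例只限定按预设像素阈值二值化并基于二值化图像计算检测框。The binarization and box computation described above can be sketched as follows; the 0.5 pixel threshold is an illustrative assumption, as the embodiment only specifies binarizing with a preset pixel threshold and deriving the box from the binary image.

    import numpy as np

    def heatmap_to_box(heatmap, pixel_threshold=0.5):
        # Binarize the heat map with the preset pixel threshold.
        binary = (heatmap >= pixel_threshold).astype(np.uint8)
        ys, xs = np.nonzero(binary)
        if xs.size == 0:
            h, w = heatmap.shape
            return 0, 0, w, h                      # nothing above threshold: fall back to full image
        # Tightest axis-aligned rectangle enclosing all foreground pixels.
        return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1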
在一个实施例中,目标检测模块602具体用于:In one embodiment, the target detection module 602 is specifically used to:
对于每张原始图像,对原始图像的M个目标检测框进行平均处理,得到原始图像的第一检测框。For each original image, the M target detection frames of the original image are averaged to obtain the first detection frame of the original image.
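对M个目标检测框取平均得到第一检测框的一种直接做法是逐坐标求均值，示意如下（仅为“平均处理”的一种假设性解读）。One straightforward reading of averaging the M target detection frames into the first detection frame is coordinate-wise averaging, sketched below (an assumed interpretation of the "averaging" operation).

    import numpy as np

    def average_boxes(boxes):
        # boxes: M rectangular detection frames, each (x1, y1, x2, y2)
        boxes = np.asarray(boxes, dtype=np.float32)    # shape (M, 4)
        x1, y1, x2, y2 = boxes.mean(axis=0)
        return int(round(x1)), int(round(y1)), int(round(x2)), int(round(y2))

    # first_box = average_boxes(m_boxes)   # m_boxes holds the M detection frames of one image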
在一个实施例中,获取模块601具体用于:In one embodiment, the acquisition module 601 is specifically used to:
根据原始图像的类别,确定与原始图像的类别匹配的至少一种类别;determining at least one category that matches the category of the original image based on the category of the original image;
获取至少一种类别中每种类别对应的至少一张参考图像,以得到原始图像对应的参考图像集;Obtain at least one reference image corresponding to each category in at least one category to obtain a reference image set corresponding to the original image;
获取参考图像集的每张参考图像中的背景图像,以得到原始图像对应的背景图像集。Obtain the background image in each reference image of the reference image set to obtain the background image set corresponding to the original image.
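上述三个获取步骤大致可示意如下；其中 images_by_category、detect_box 与 remove_target_region 均为假设性辅助函数，“背景图像”既可通过遮挡参考图像中的目标区域得到，也可通过图像修复等方式得到，此处不作限定。The three acquisition steps above can be sketched roughly as follows; images_by_category, detect_box and remove_target_region are hypothetical helpers, and the "background image" could be obtained by masking or by in-painting the target region of the reference image, neither of which is prescribed here.

    def build_background_set(matched_categories, images_by_category,
                             detect_box, remove_target_region):
        # matched_categories: categories similar to the original image's category
        # images_by_category: dict mapping a category to its list of reference images
        reference_set = []
        for category in matched_categories:
            reference_set.extend(images_by_category.get(category, []))
        background_set = []
        for ref in reference_set:
            box = detect_box(ref)                      # locate the target in the reference image
            background_set.append(remove_target_region(ref, box))
        return background_set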
在一个实施例中,装置600还包括:In one embodiment, the apparatus 600 further includes:
相似度确定模块,确定多个类别中每两个类别之间的相似度,多个类别包括原始图像的类别以及至少一种类别;a similarity determination module that determines the similarity between each two categories in a plurality of categories, the plurality of categories including the category of the original image and at least one category;
其中,获取模块601具体用于:Among them, the acquisition module 601 is specifically used for:
根据原始图像的类别与多个类别中其余类别之间的相似度，从其余类别中确定与原始图像的类别匹配的至少一种类别，其余类别为多个类别中除原始图像的类别之外的类别，至少一种类别中每种类别与原始图像的类别之间的相似度大于预设阈值。According to the similarity between the category of the original image and the remaining categories among the plurality of categories, at least one category that matches the category of the original image is determined from the remaining categories, where the remaining categories are the categories among the plurality of categories other than the category of the original image, and the similarity between each of the at least one category and the category of the original image is greater than a preset threshold.
在一个实施例中,相似度确定模块具体用于:In one embodiment, the similarity determination module is specifically used to:
将多个类别输入语义模型进行语义分析,获得多个类别中每个类别的语义向量表示;Input multiple categories into the semantic model for semantic analysis, and obtain the semantic vector representation of each category in the multiple categories;
计算多个类别中每两个类别的语义向量表示之间的相似度。Calculate the similarity between the semantic vector representations of each two categories in multiple categories.
在一个实施例中,每两个类别的语义向量表示之间的相似度为每两个类别的语义向量表示之间的余弦相似度。In one embodiment, the similarity between the semantic vector representations of each two categories is a cosine similarity between the semantic vector representations of each two categories.
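类别相似度计算可示意如下；其中 embed 代表任一将类别名称映射为语义向量的语义模型，0.5的阈值为示意性假设。The category-similarity computation can be sketched as follows; embed stands for any semantic model that maps a category label to a semantic vector, and the 0.5 threshold is an illustrative assumption.

    import numpy as np

    def cosine_similarity(u, v, eps=1e-8):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

    def matched_categories(categories, embed, original_category, threshold=0.5):
        # Return the categories whose semantic similarity to the original image's
        # category exceeds the preset threshold.
        vectors = {c: embed(c) for c in categories}     # semantic vector per category label
        anchor = vectors[original_category]
        return [c for c in categories
                if c != original_category and cosine_similarity(vectors[c], anchor) > threshold]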
在一个实施例中,第一检测框为矩形检测框。In one embodiment, the first detection frame is a rectangular detection frame.
本申请实施例提供的数据增强装置能够实现上述实施例中数据增强方法实现的各个过程,技术特征一一对应,为避免重复,这里不再赘述。The data enhancement device provided by the embodiment of the present application can implement each process implemented by the data enhancement method in the above embodiment, and the technical features correspond one to one. To avoid duplication, they will not be described again here.
参见图7,图7是本申请实施例提供的数据增强装置的结构图,能实现上述实施例中数据增强方法的细节,并达到相同的效果。如图7所示,数据增强装置700,包括:Referring to Figure 7, Figure 7 is a structural diagram of a data enhancement device provided by an embodiment of the present application, which can implement the details of the data enhancement method in the above embodiment and achieve the same effect. As shown in Figure 7, data enhancement device 700 includes:
第一获取模块701,用于获取N张原始图像以及N个背景图像集,一张原始图像对应一个背景图像集,N为大于1的整数;The first acquisition module 701 is used to acquire N original images and N background image sets. One original image corresponds to one background image set, and N is an integer greater than 1;
目标检测模块702,用于根据M个目标检测网络,对N张原始图像中每张原始图像进行目标检测,获得每张原始图像的M个目标检测框,M为大于1的整数;The target detection module 702 is used to perform target detection on each of the N original images according to M target detection networks, and obtain M target detection frames for each original image, where M is an integer greater than 1;
第一确定模块703,用于确定每张原始图像的第一检测框,原始图像的第一检测框为通过原始图像的M个目标检测框确定的检测框;The first determination module 703 is used to determine the first detection frame of each original image. The first detection frame of the original image is the detection frame determined by the M target detection frames of the original image;
融合模块704,用于对每张原始图像中第一检测框对应的区域以及原始图像对应的背景图像集中至少一张背景图像进行融合,得到每张原始图像对应的至少一张增强图像。The fusion module 704 is used to fuse the area corresponding to the first detection frame in each original image and at least one background image in the background image set corresponding to the original image, to obtain at least one enhanced image corresponding to each original image.
在一个实施例中,目标检测模块702,包括:In one embodiment, the target detection module 702 includes:
提取模块，用于将N张原始图像输入M个目标检测网络进行特征提取，得到每张原始图像的M张特征图；The extraction module is used to input the N original images into the M target detection networks for feature extraction, to obtain M feature maps of each original image;
归一化处理模块,用于对每张原始图像的M张特征图进行归一化处理,得到每张原始图像的M张热力图;The normalization processing module is used to normalize the M feature maps of each original image to obtain M heat maps of each original image;
检测框确定模块,用于计算每张原始图像的M张热力图的M个目标检测框,作为每张原始图像的M个目标检测框。The detection frame determination module is used to calculate M target detection frames of M heat maps of each original image as M target detection frames of each original image.
在一个实施例中,检测框确定模块,包括二值化处理模块和检测框计算模块;In one embodiment, the detection frame determination module includes a binarization processing module and a detection frame calculation module;
对于每张热力图，二值化处理模块，用于根据预设像素阈值对热力图进行二值化处理，得到热力图的二值化图像；检测框计算模块，用于基于热力图的二值化图像，计算热力图的目标检测框。For each heat map, the binarization processing module is used to binarize the heat map according to a preset pixel threshold to obtain a binary image of the heat map; the detection frame calculation module is used to calculate the target detection frame of the heat map based on the binary image of the heat map.
在一个实施例中,第一获取模块701,包括类别确定模块、第一图像获取模块和第二图像获取模块;In one embodiment, the first acquisition module 701 includes a category determination module, a first image acquisition module and a second image acquisition module;
对于每张原始图像，类别确定模块用于根据原始图像的类别，确定与原始图像的类别匹配的至少一种类别；第一图像获取模块，用于获取至少一种类别中每种类别对应的至少一张参考图像，以得到原始图像对应的参考图像集；第二图像获取模块，用于获取参考图像集的每张参考图像中的背景图像，以得到原始图像对应的背景图像集。For each original image, the category determination module is used to determine, according to the category of the original image, at least one category that matches the category of the original image; the first image acquisition module is used to acquire at least one reference image corresponding to each of the at least one category, to obtain the reference image set corresponding to the original image; the second image acquisition module is used to acquire the background image in each reference image of the reference image set, to obtain the background image set corresponding to the original image.
在一个实施例中,装置700还包括:In one embodiment, the apparatus 700 further includes:
相似度确定模块,用于确定多个类别中每两个类别之间的相似度,多个类别包括原始图像的类别以及至少一种类别;a similarity determination module, configured to determine the similarity between each two categories in a plurality of categories, the plurality of categories including the category of the original image and at least one category;
其中,类别确定模块,用于:Among them, the category determination module is used for:
根据原始图像的类别与多个类别中其余类别之间的相似度，从其余类别中确定与原始图像的类别匹配的至少一种类别，其余类别为多个类别中除原始图像的类别之外的类别，至少一种类别中每种类别与原始图像的类别之间的相似度大于预设阈值。According to the similarity between the category of the original image and the remaining categories among the plurality of categories, at least one category that matches the category of the original image is determined from the remaining categories, where the remaining categories are the categories among the plurality of categories other than the category of the original image, and the similarity between each of the at least one category and the category of the original image is greater than a preset threshold.
在一个实施例中,相似度确定模块,包括:In one embodiment, the similarity determination module includes:
向量表示获取模块,用于将多个类别输入语义模型进行语义分析,获得多个类别中每个类别的语义向量表示;The vector representation acquisition module is used to input multiple categories into the semantic model for semantic analysis and obtain the semantic vector representation of each category in the multiple categories;
相似度计算模块,用于计算多个类别中每两个类别的语义向量表示之间的相似度。The similarity calculation module is used to calculate the similarity between the semantic vector representations of each two categories in multiple categories.
在一个实施例中,每两个类别的语义向量表示之间的相似度为每两个类别的语义向量表示之间的余弦相似度。In one embodiment, the similarity between the semantic vector representations of each two categories is a cosine similarity between the semantic vector representations of each two categories.
在一个实施例中,第一确定模块703,用于:In one embodiment, the first determination module 703 is used for:
对于每张原始图像,对原始图像的M个目标检测框进行平均处理,得到原始图像的第一检测框。For each original image, the M target detection frames of the original image are averaged to obtain the first detection frame of the original image.
在一个实施例中,第一检测框为矩形检测框。In one embodiment, the first detection frame is a rectangular detection frame.
本申请实施例提供的数据增强装置能够实现上述实施例中数据增强方法实现的各个过程,技术特征一一对应,为避免重复,这里不再赘述。The data enhancement device provided by the embodiment of the present application can implement each process implemented by the data enhancement method in the above embodiment, and the technical features correspond one to one. To avoid duplication, they will not be described again here.
图8为实现本申请各个实施例的一种电子设备的硬件结构示意图。FIG. 8 is a schematic diagram of the hardware structure of an electronic device that implements various embodiments of the present application.
该电子设备800包括但不限于：射频单元801、网络模块802、音频输出单元803、输入单元804、传感器805、显示单元806、用户输入单元807、接口单元808、存储器809、处理器810、以及电源811等部件。本领域技术人员可以理解，图8中示出的电子设备结构并不构成对电子设备的限定，电子设备可以包括比图示更多或更少的部件，或者组合某些部件，或者不同的部件布置。在本申请实施例中，电子设备包括但不限于手机、平板电脑、笔记本电脑、掌上电脑、车载终端、可穿戴设备、以及计步器等。The electronic device 800 includes but is not limited to: a radio frequency unit 801, a network module 802, an audio output unit 803, an input unit 804, a sensor 805, a display unit 806, a user input unit 807, an interface unit 808, a memory 809, a processor 810, a power supply 811, and other components. Those skilled in the art can understand that the structure of the electronic device shown in Figure 8 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown in the figure, combine certain components, or use a different arrangement of components. In the embodiments of the present application, electronic devices include but are not limited to mobile phones, tablet computers, notebook computers, palmtop computers, vehicle-mounted terminals, wearable devices, pedometers, and the like.
其中,处理器810,用于:Among them, processor 810 is used for:
获取N张原始图像以及N个背景图像集,一张原始图像对应一个背景图像集,N为大于1的整数;Obtain N original images and N background image sets. One original image corresponds to one background image set, and N is an integer greater than 1;
根据M个目标检测网络,对N张原始图像中每张原始图像进行目标检测,获得每张原始图像的M个目标检测框,M为大于1的整数;According to M target detection networks, target detection is performed on each of the N original images to obtain M target detection frames for each original image, where M is an integer greater than 1;
确定每张原始图像的第一检测框,原始图像的第一检测框为通过原始图像的M个目标检测框确定的检测框;Determine the first detection frame of each original image. The first detection frame of the original image is the detection frame determined by the M target detection frames of the original image;
对每张原始图像中第一检测框对应的区域以及原始图像对应的背景图像集中至少一张背景图像进行融合,得到每张原始图像对应的至少一张增强图像。The area corresponding to the first detection frame in each original image and at least one background image in the background image set corresponding to the original image are fused to obtain at least one enhanced image corresponding to each original image.
在一个实施例中,处理器810,具体用于:In one embodiment, the processor 810 is specifically configured to:
将N张原始图像输入M个目标检测网络进行特征提取,得到每张原始图像的M张特征图;Input N original images into M target detection networks for feature extraction, and obtain M feature maps of each original image;
对每张原始图像的M张特征图进行归一化处理,得到每张原始图像的M张热力图;Normalize the M feature maps of each original image to obtain M heat maps of each original image;
计算每张原始图像的M张热力图的M个目标检测框,作为每张原始图像的M个目标检测框。Calculate M target detection frames of M heat maps of each original image as M target detection frames of each original image.
在一个实施例中,处理器810,具体用于:In one embodiment, the processor 810 is specifically configured to:
对于每张热力图,根据预设像素阈值对热力图进行二值化处理,得到热力图的二值化图像,并基于热力图的二值化图像计算热力图的目标检测框。For each heat map, the heat map is binarized according to the preset pixel threshold to obtain a binary image of the heat map, and the target detection frame of the heat map is calculated based on the binarized image of the heat map.
在一个实施例中,处理器810,具体用于:In one embodiment, the processor 810 is specifically configured to:
对于每张原始图像,执行如下操作:For each original image, do the following:
根据原始图像的类别,确定与原始图像的类别匹配的至少一种类别;determining at least one category that matches the category of the original image based on the category of the original image;
获取至少一种类别中每种类别对应的至少一张参考图像,以得到原始图像对应的参考图像集;Obtain at least one reference image corresponding to each category in at least one category to obtain a reference image set corresponding to the original image;
获取参考图像集的每张参考图像中的背景图像,以得到原始图像对应的背景图像集。Obtain the background image in each reference image of the reference image set to obtain the background image set corresponding to the original image.
在一个实施例中,处理器810,还用于:In one embodiment, processor 810 is also used to:
确定多个类别中每两个类别之间的相似度，多个类别包括原始图像的类别以及至少一种类别；Determine the similarity between every two categories among a plurality of categories, where the plurality of categories includes the category of the original image and the at least one category;
其中,处理器810,还具体用于:Among them, the processor 810 is also specifically used for:
根据原始图像的类别与多个类别中其余类别之间的相似度，从其余类别中确定与原始图像的类别匹配的至少一种类别，其余类别为多个类别中除原始图像的类别之外的类别，至少一种类别中每种类别与原始图像的类别之间的相似度大于预设阈值。According to the similarity between the category of the original image and the remaining categories among the plurality of categories, at least one category that matches the category of the original image is determined from the remaining categories, where the remaining categories are the categories among the plurality of categories other than the category of the original image, and the similarity between each of the at least one category and the category of the original image is greater than a preset threshold.
在一个实施例中,处理器810,还具体用于:In one embodiment, the processor 810 is also specifically configured to:
将多个类别输入语义模型进行语义分析,获得多个类别中每个类别的语义向量表示;Input multiple categories into the semantic model for semantic analysis, and obtain the semantic vector representation of each category in the multiple categories;
计算多个类别中每两个类别的语义向量表示之间的相似度。 Calculate the similarity between the semantic vector representations of each two categories in multiple categories.
在一个实施例中,每两个类别的语义向量表示之间的相似度为每两个类别的语义向量表示之间的余弦相似度。In one embodiment, the similarity between the semantic vector representations of each two categories is a cosine similarity between the semantic vector representations of each two categories.
在一个实施例中,处理器810,还具体用于:In one embodiment, the processor 810 is also specifically configured to:
对于每张原始图像,对原始图像的M个目标检测框进行平均处理,得到原始图像的第一检测框。For each original image, the M target detection frames of the original image are averaged to obtain the first detection frame of the original image.
在一个实施例中,第一检测框为矩形检测框。In one embodiment, the first detection frame is a rectangular detection frame.
本申请实施例同样具有与上述数据增强方法实施例相同的有益技术效果,具体在此不再赘述。此外,处理器810还可执行与图6对应的实施例中各模块的操作。The embodiments of the present application also have the same beneficial technical effects as the above-mentioned data enhancement method embodiments, and details will not be described again here. In addition, the processor 810 can also perform operations of each module in the embodiment corresponding to FIG. 6 .
应理解的是，本申请实施例中，射频单元801可用于收发信息或通话过程中，信号的接收和发送，具体的，将来自基站的下行数据接收后，给处理器810处理；另外，将上行的数据发送给基站。通常，射频单元801包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器、双工器等。此外，射频单元801还可以通过无线通信系统与网络和其他设备通信。It should be understood that, in the embodiments of the present application, the radio frequency unit 801 can be used to receive and send signals during the process of sending and receiving information or during a call. Specifically, downlink data from the base station is received and then handed to the processor 810 for processing; in addition, uplink data is sent to the base station. Generally, the radio frequency unit 801 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, etc. In addition, the radio frequency unit 801 can also communicate with the network and other devices through a wireless communication system.
电子设备通过网络模块802为用户提供了无线的宽带互联网访问,如帮助用户收发电子邮件、浏览网页和访问流式媒体等。The electronic device provides users with wireless broadband Internet access through the network module 802, such as helping users send and receive emails, browse web pages, and access streaming media.
音频输出单元803可以将射频单元801或网络模块802接收的或者在存储器809中存储的音频数据转换成音频信号并且输出为声音。而且,音频输出单元803还可以提供与电子设备800执行的特定功能相关的音频输出(例如,呼叫信号接收声音、消息接收声音等等)。音频输出单元803包括扬声器、蜂鸣器以及受话器等。The audio output unit 803 may convert the audio data received by the radio frequency unit 801 or the network module 802 or stored in the memory 809 into an audio signal and output it as a sound. Furthermore, the audio output unit 803 may also provide audio output related to a specific function performed by the electronic device 800 (eg, call signal reception sound, message reception sound, etc.). The audio output unit 803 includes a speaker, a buzzer, a receiver, and the like.
输入单元804用于接收音频或视频信号。输入单元804可以包括图形处理器（Graphics Processing Unit，GPU）8041和麦克风8042，图形处理器8041对在视频捕获模式或图像捕获模式中由图像捕获装置（如摄像头）获得的静态图片或视频的图像数据进行处理。处理后的图像帧可以显示在显示单元806上。经图形处理器8041处理后的图像帧可以存储在存储器809（或其它存储介质）中或者经由射频单元801或网络模块802进行发送。麦克风8042可以接收声音，并且能够将这样的声音处理为音频数据。处理后的音频数据可以在电话通话模式的情况下转换为可经由射频单元801发送到移动通信基站的格式输出。The input unit 804 is used to receive audio or video signals. The input unit 804 may include a graphics processing unit (GPU) 8041 and a microphone 8042. The graphics processor 8041 processes image data of still pictures or videos obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 806. The image frames processed by the graphics processor 8041 may be stored in the memory 809 (or other storage media) or sent via the radio frequency unit 801 or the network module 802. The microphone 8042 can receive sound and can process such sound into audio data. In the phone call mode, the processed audio data can be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 801 and output.
电子设备800还包括至少一种传感器805，比如光传感器、运动传感器以及其他传感器。具体地，光传感器包括环境光传感器及接近传感器，其中，环境光传感器可根据环境光线的明暗来调节显示面板8061的亮度，接近传感器可在电子设备800移动到耳边时，关闭显示面板8061和/或背光。作为运动传感器的一种，加速计传感器可检测各个方向上（一般为三轴）加速度的大小，静止时可检测出重力的大小及方向，可用于识别电子设备姿态（比如横竖屏切换、相关游戏、磁力计姿态校准）、振动识别相关功能（比如计步器、敲击）等；传感器805还可以包括指纹传感器、压力传感器、虹膜传感器、分子传感器、陀螺仪、气压计、湿度计、温度计、红外线传感器等，在此不再赘述。The electronic device 800 further includes at least one sensor 805, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 8061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 8061 and/or the backlight when the electronic device 800 is moved to the ear. As a type of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in all directions (usually three axes), can detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the electronic device (such as portrait/landscape switching, related games, magnetometer attitude calibration) and vibration-recognition-related functions (such as pedometer, tapping); the sensor 805 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, etc., which will not be described in detail here.
显示单元806用于显示由用户输入的信息或提供给用户的信息。显示单元806可包括显示面板8061，可以采用液晶显示器（Liquid Crystal Display，LCD）、有机发光二极管（Organic Light-Emitting Diode，OLED）等形式来配置显示面板8061。The display unit 806 is used to display information input by the user or information provided to the user. The display unit 806 may include a display panel 8061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
用户输入单元807可用于接收输入的数字或字符信息，以及产生与电子设备的用户设置以及功能控制有关的键信号输入。具体地，用户输入单元807包括触控面板8071以及其他输入设备8072。触控面板8071，也称为触摸屏，可收集用户在其上或附近的触摸操作（比如用户使用手指、触笔等任何适合的物体或附件在触控面板8071上或在触控面板8071附近的操作）。触控面板8071可包括触摸检测装置和触摸控制器两个部分。其中，触摸检测装置检测用户的触摸方位，并检测触摸操作带来的信号，将信号传送给触摸控制器；触摸控制器从触摸检测装置上接收触摸信息，并将它转换成触点坐标，再送给处理器810，接收处理器810发来的命令并加以执行。此外，可以采用电阻式、电容式、红外线以及表面声波等多种类型实现触控面板8071。除了触控面板8071，用户输入单元807还可以包括其他输入设备8072。具体地，其他输入设备8072可以包括但不限于物理键盘、功能键（比如音量控制按键、开关按键等）、轨迹球、鼠标、操作杆，在此不再赘述。The user input unit 807 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 807 includes a touch panel 8071 and other input devices 8072. The touch panel 8071, also called a touch screen, can collect touch operations performed by the user on or near it (such as operations performed by the user with a finger, a stylus, or any other suitable object or accessory on or near the touch panel 8071). The touch panel 8071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 810, and receives and executes commands sent by the processor 810. In addition, the touch panel 8071 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 8071, the user input unit 807 may also include other input devices 8072. Specifically, the other input devices 8072 may include but are not limited to a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which will not be described again here.
进一步的，触控面板8071可覆盖在显示面板8061上，当触控面板8071检测到在其上或附近的触摸操作后，传送给处理器810以确定触摸事件的类型，随后处理器810根据触摸事件的类型在显示面板8061上提供相应的视觉输出。虽然在图8中，触控面板8071与显示面板8061是作为两个独立的部件来实现电子设备的输入和输出功能，但是在某些实施例中，可以将触控面板8071与显示面板8061集成而实现电子设备的输入和输出功能，具体此处不做限定。Further, the touch panel 8071 may cover the display panel 8061. When the touch panel 8071 detects a touch operation on or near it, the operation is transmitted to the processor 810 to determine the type of touch event, and the processor 810 then provides a corresponding visual output on the display panel 8061 according to the type of touch event. Although in Figure 8 the touch panel 8071 and the display panel 8061 are shown as two independent components implementing the input and output functions of the electronic device, in some embodiments the touch panel 8071 and the display panel 8061 may be integrated to implement the input and output functions of the electronic device, which is not limited here.
接口单元808为外部装置与电子设备800连接的接口。例如，外部装置可以包括有线或无线头戴式耳机端口、外部电源（或电池充电器）端口、有线或无线数据端口、存储卡端口、用于连接具有识别模块的装置的端口、音频输入/输出（I/O）端口、视频I/O端口、耳机端口等等。接口单元808可以用于接收来自外部装置的输入（例如，数据信息、电力等等）并且将接收到的输入传输到电子设备800内的一个或多个元件或者可以用于在电子设备800和外部装置之间传输数据。The interface unit 808 is an interface for connecting an external device to the electronic device 800. For example, the external device may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 808 may be used to receive input (for example, data information, power, etc.) from an external device and transmit the received input to one or more elements within the electronic device 800, or may be used to transmit data between the electronic device 800 and an external device.
存储器809可用于存储软件程序以及各种数据。存储器809可主要包括存储程序区和存储数据区，其中，存储程序区可存储操作系统、至少一个功能所需的应用程序（比如声音播放功能、图像播放功能等）等；存储数据区可存储根据手机的使用所创建的数据（比如音频数据、电话本等）等。此外，存储器809可以包括高速随机存取存储器，还可以包括非易失性存储器，例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。The memory 809 can be used to store software programs and various data. The memory 809 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, application programs required for at least one function (such as a sound playback function, an image playback function, etc.), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data, a phone book, etc.), and the like. In addition, the memory 809 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
处理器810是电子设备的控制中心，利用各种接口和线路连接整个电子设备的各个部分，通过运行或执行存储在存储器809内的软件程序和/或模块，以及调用存储在存储器809内的数据，执行电子设备的各种功能和处理数据，从而对电子设备进行整体监控。处理器810可包括一个或多个处理单元；优选的，处理器810可集成应用处理器和调制解调处理器，其中，应用处理器主要处理操作系统、用户界面和应用程序等，调制解调处理器主要处理无线通信。可以理解的是，上述调制解调处理器也可以不集成到处理器810中。The processor 810 is the control center of the electronic device. It connects various parts of the entire electronic device by using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 809 and invoking data stored in the memory 809, thereby monitoring the electronic device as a whole. The processor 810 may include one or more processing units; preferably, the processor 810 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, etc., and the modem processor mainly handles wireless communication. It can be understood that the above modem processor may also not be integrated into the processor 810.
电子设备800还可以包括给各个部件供电的电源811（比如电池），优选的，电源811可以通过电源管理系统与处理器810逻辑相连，从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。The electronic device 800 may further include a power supply 811 (such as a battery) that supplies power to each component. Preferably, the power supply 811 may be logically connected to the processor 810 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.
另外，电子设备800包括一些未示出的功能模块，在此不再赘述。In addition, the electronic device 800 includes some functional modules that are not shown, which will not be described again here.
优选的，本申请实施例还提供一种电子设备，包括处理器810，存储器809，存储在存储器809上并可在所述处理器810上运行的计算机程序，该计算机程序被处理器810执行时实现上述数据增强方法实施例的各个过程，且能达到相同的技术效果，为避免重复，这里不再赘述。Preferably, an embodiment of the present application further provides an electronic device, including a processor 810, a memory 809, and a computer program stored on the memory 809 and executable on the processor 810. When the computer program is executed by the processor 810, each process of the above data enhancement method embodiments is implemented and the same technical effects can be achieved; to avoid repetition, details are not described again here.
本申请实施例还提供一种计算机可读存储介质，计算机可读存储介质上存储有计算机程序，该计算机程序被处理器执行时实现上述数据增强方法实施例的各个过程，且能达到相同的技术效果，为避免重复，这里不再赘述。其中，所述的计算机可读存储介质，如只读存储器（Read-Only Memory，简称ROM）、随机存取存储器（Random Access Memory，简称RAM）、磁碟或者光盘等。An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, each process of the above data enhancement method embodiments is implemented and the same technical effects can be achieved; to avoid repetition, details are not described again here. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
本申请实施例还提供一种计算机程序,该计算机程序被处理器执行时实现上述数据增强方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。Embodiments of the present application also provide a computer program that, when executed by a processor, implements each process of the above data enhancement method embodiment and can achieve the same technical effect. To avoid duplication, the details will not be described here.
本申请实施例还提供一种计算机程序产品，包括计算机程序，该计算机程序被处理器执行时实现上述数据增强方法实施例的各个过程，且能达到相同的技术效果，为避免重复，这里不再赘述。An embodiment of the present application further provides a computer program product, including a computer program. When the computer program is executed by a processor, each process of the above data enhancement method embodiments is implemented and the same technical effects can be achieved; to avoid repetition, details are not described again here.
需要说明的是，在本文中，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。It should be noted that, in this document, the terms "comprising", "including" or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that includes the element.
通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质（如ROM/RAM、磁碟、光盘）中，包括若干指令用以使得一台终端（可以是手机，计算机，服务器，空调器，或者网络设备等）执行本申请各个实施例所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
上面结合附图对本申请的实施例进行了描述，但是本申请并不局限于上述的具体实施方式，上述的具体实施方式仅仅是示意性的，而不是限制性的，本领域的普通技术人员在本申请的启示下，在不脱离本申请宗旨和权利要求所保护的范围情况下，还可做出很多形式，均属于本申请的保护之内。The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above specific implementations. The above specific implementations are merely illustrative rather than restrictive. Inspired by the present application, those of ordinary skill in the art can make many further forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.

Claims (15)

  1. 一种数据增强方法,包括:A data augmentation method that includes:
    获取原始图像以及背景图像集,一张原始图像对应一个背景图像集;Obtain the original image and background image set, one original image corresponds to one background image set;
    根据目标检测网络,对所述原始图像进行目标检测,获得所述原始图像的第一检测框;According to the target detection network, perform target detection on the original image to obtain the first detection frame of the original image;
    对所述第一检测框对应的区域以及所述原始图像对应的背景图像集中至少一张背景图像进行融合,得到所述原始图像的增强图像。The area corresponding to the first detection frame and at least one background image in the background image set corresponding to the original image are fused to obtain an enhanced image of the original image.
  2. 根据权利要求1所述的方法,其中,所述目标检测网络为M个;The method according to claim 1, wherein there are M target detection networks;
    所述根据目标检测网络,对所述原始图像进行目标检测,获得所述原始图像的第一检测框,包括:The method of performing target detection on the original image and obtaining the first detection frame of the original image according to the target detection network includes:
    根据M个所述目标检测网络,对所述原始图像进行目标检测,获得所述原始图像的M个目标检测框;Perform target detection on the original image according to M target detection networks, and obtain M target detection frames of the original image;
    根据所述原始图像的M个目标检测框确定所述原始图像的第一检测框。The first detection frame of the original image is determined according to the M target detection frames of the original image.
  3. 根据权利要求2所述的方法,其中,所述根据M个所述目标检测网络,对所述原始图像进行目标检测,获得所述原始图像的M个目标检测框,包括:The method according to claim 2, wherein performing target detection on the original image according to M target detection networks to obtain M target detection frames of the original image includes:
    将所述原始图像输入M个所述目标检测网络进行特征提取,得到所述原始图像的M张特征图;Input the original image into M target detection networks for feature extraction, and obtain M feature maps of the original image;
    对所述原始图像的M张特征图进行归一化处理,得到所述原始图像的M张热力图;Perform normalization processing on the M feature maps of the original image to obtain M heat maps of the original image;
    计算所述原始图像的M张热力图的目标检测框,作为所述原始图像的M个目标检测框。Calculate the target detection frames of the M heat maps of the original image as the M target detection frames of the original image.
  4. 根据权利要求3所述的方法,其中,所述计算所述原始图像的M张热力图的M个目标检测框,作为所述原始图像的M个目标检测框,包括:The method according to claim 3, wherein calculating the M target detection frames of the M heat maps of the original image as the M target detection frames of the original image includes:
    对于每张热力图，根据预设像素阈值对所述热力图进行二值化处理，得到所述热力图的二值化图像，并基于所述热力图的二值化图像计算所述热力图的目标检测框。For each heat map, the heat map is binarized according to a preset pixel threshold to obtain a binary image of the heat map, and the target detection frame of the heat map is calculated based on the binarized image of the heat map.
  5. 根据权利要求2至4中任一项所述的方法,其中,所述确定所述原始图像的第一检测框,包括:The method according to any one of claims 2 to 4, wherein determining the first detection frame of the original image includes:
    对所述原始图像的M个所述目标检测框进行平均处理,得到所述原始图像的第一检测框。The M target detection frames of the original image are averaged to obtain the first detection frame of the original image.
  6. 根据权利要求1至5中任一项所述的方法,其中,所述获取所述背景图像集,包括:The method according to any one of claims 1 to 5, wherein said obtaining the background image set includes:
    根据所述原始图像的类别,确定与所述原始图像的类别匹配的至少一种类别;determining at least one category that matches the category of the original image according to the category of the original image;
    获取所述至少一种类别中每种类别对应的至少一张参考图像,以得到所述原始图像对应的参考图像集;Obtain at least one reference image corresponding to each category in the at least one category to obtain a reference image set corresponding to the original image;
    获取所述参考图像集的每张参考图像中的背景图像,以得到所述原始图像对应的背景图像集。Obtain the background image in each reference image of the reference image set to obtain the background image set corresponding to the original image.
  7. 根据权利要求6所述的方法,其中,所述根据所述原始图像的类别,确定与所述原始图像的类别匹配的至少一种类别之前,所述方法还包括:The method according to claim 6, wherein before determining at least one category matching the category of the original image according to the category of the original image, the method further includes:
    确定多个类别中每两个类别之间的相似度,所述多个类别包括所述原始图像的类别以及所述至少一种类别;determining a degree of similarity between each two categories in a plurality of categories, the plurality of categories including a category of the original image and the at least one category;
    其中,所述根据所述原始图像的类别,确定与所述原始图像的类别匹配的至少一种类别,包括:Wherein, determining at least one category that matches the category of the original image according to the category of the original image includes:
    根据所述原始图像的类别与所述多个类别中其余类别之间的相似度，从所述其余类别中确定与所述原始图像的类别匹配的至少一种类别，所述其余类别为所述多个类别中除所述原始图像的类别之外的类别，所述至少一种类别中每种类别与所述原始图像的类别之间的相似度大于预设阈值。According to the similarity between the category of the original image and the remaining categories among the plurality of categories, at least one category that matches the category of the original image is determined from the remaining categories, where the remaining categories are the categories among the plurality of categories other than the category of the original image, and the similarity between each of the at least one category and the category of the original image is greater than a preset threshold.
  8. 根据权利要求7所述的方法,其中,所述确定多个类别中每两个类别之间的相似度,包括:The method according to claim 7, wherein determining the similarity between each two categories in the plurality of categories includes:
    将所述多个类别输入语义模型进行语义分析,获得所述多个类别中每个类别的语义向量表示;Input the multiple categories into the semantic model for semantic analysis, and obtain the semantic vector representation of each category in the multiple categories;
    计算所述多个类别中每两个类别的语义向量表示之间的相似度。The similarity between the semantic vector representations of each two categories in the plurality of categories is calculated.
  9. 根据权利要求8所述的方法,其中,所述每两个类别的语义向量表示之间的相似度为所述每两个类别的语义向量表示之间的余弦相似度。The method according to claim 8, wherein the similarity between the semantic vector representations of every two categories is a cosine similarity between the semantic vector representations of every two categories.
  10. 根据权利要求1至9中任一项所述的方法,其中,所述第一检测框为矩形检测框。The method according to any one of claims 1 to 9, wherein the first detection frame is a rectangular detection frame.
  11. 一种数据增强装置,包括:A data enhancement device including:
    获取模块,用于获取原始图像以及背景图像集,一张原始图像对应一个背景图像集;The acquisition module is used to obtain the original image and the background image set. One original image corresponds to one background image set;
    目标检测模块,用于根据目标检测网络,对所述原始图像进行目标检测,获得所述原始图像的第一检测框;A target detection module, configured to perform target detection on the original image according to the target detection network, and obtain the first detection frame of the original image;
    融合模块,用于对所述第一检测框对应的区域以及所述原始图像对应的背景图像集中至少一张背景图像进行融合,得到所述原始图像的增强图像。A fusion module configured to fuse the area corresponding to the first detection frame and at least one background image in the background image set corresponding to the original image to obtain an enhanced image of the original image.
  12. 一种电子设备，包括：存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机程序，所述处理器执行所述计算机程序时实现如权利要求1至10中任一项所述的数据增强方法中的步骤。An electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps in the data enhancement method according to any one of claims 1 to 10.
  13. 一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现如权利要求1至10中任一项所述的数据增强方法中的步骤。A computer-readable storage medium having a computer program stored on the computer-readable storage medium. When the computer program is executed by a processor, the steps in the data enhancement method as claimed in any one of claims 1 to 10 are implemented. .
  14. 一种计算机程序产品,包括计算机可执行指令,当处理器执行所述计算机可执行指令时,实现如权利要求1至10中任一项所述的数据增强方法中的步骤。A computer program product includes computer-executable instructions. When a processor executes the computer-executable instructions, the steps in the data enhancement method according to any one of claims 1 to 10 are implemented.
  15. 一种计算机程序,当处理器执行所述计算机程序时,实现如权利要求1至10中任一项所述的数据增强方法中的步骤。 A computer program that, when executed by a processor, implements the steps in the data enhancement method as claimed in any one of claims 1 to 10.
PCT/CN2023/107709 2022-07-29 2023-07-17 Data enhancement method and apparatus, and electronic device WO2024022149A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210904226.9A CN117541770A (en) 2022-07-29 2022-07-29 Data enhancement method and device and electronic equipment
CN202210904226.9 2022-07-29

Publications (1)

Publication Number Publication Date
WO2024022149A1 true WO2024022149A1 (en) 2024-02-01

Family

ID=89705367

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/107709 WO2024022149A1 (en) 2022-07-29 2023-07-17 Data enhancement method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN117541770A (en)
WO (1) WO2024022149A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190279346A1 (en) * 2018-03-07 2019-09-12 Adobe Inc. Image-blending via alignment or photometric adjustments computed by a neural network
WO2020038065A1 (en) * 2018-08-21 2020-02-27 中兴通讯股份有限公司 Image processing method, terminal, and computer storage medium
CN112258504A (en) * 2020-11-13 2021-01-22 腾讯科技(深圳)有限公司 Image detection method, device and computer readable storage medium
CN112348765A (en) * 2020-10-23 2021-02-09 深圳市优必选科技股份有限公司 Data enhancement method and device, computer readable storage medium and terminal equipment
CN112581522A (en) * 2020-11-30 2021-03-30 平安科技(深圳)有限公司 Method and device for detecting position of target object in image, electronic equipment and storage medium
CN113012054A (en) * 2019-12-20 2021-06-22 舜宇光学(浙江)研究院有限公司 Sample enhancement method and training method based on sectional drawing, system and electronic equipment thereof
CN113688957A (en) * 2021-10-26 2021-11-23 苏州浪潮智能科技有限公司 Target detection method, device, equipment and medium based on multi-model fusion

Also Published As

Publication number Publication date
CN117541770A (en) 2024-02-09


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23845361

Country of ref document: EP

Kind code of ref document: A1