CN111340124A - Method and device for identifying entity category in image - Google Patents
Method and device for identifying entity category in image
- Publication number
- CN111340124A (application CN202010139339.5A)
- Authority
- CN
- China
- Prior art keywords
- target
- image
- classification
- features
- entity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/464—Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The application provides a method and a device for identifying entity categories in an image. The method comprises the following steps: extracting image features of a target image according to a preset algorithm, and determining at least one target area containing a target entity in the target image according to the image features; extracting the classification feature of each target area in the at least one target area, and fusing all the classification features to obtain a local classification feature; extracting a global classification feature of the target image, and fusing the local classification feature and the global classification feature to obtain a target classification feature; and determining the entity category of the target entity according to the target classification feature. In this way, the local features of the entity are strengthened, and the entity category in the image can be accurately determined from the combination of the global and local features, which solves the technical problem in the prior art that a specific entity category is difficult to determine because the global features of images are similar.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for identifying entity categories in an image.
Background
As a basic and important task in image understanding, image object classification has attracted great research interest in recent years, and has been successfully deployed in many application products to solve practical problems. With the rapid development of deep learning in recent years, deep learning has also become the state-of-the-art technique for image object classification, that is, entity categories in an image are learned from the global depth features of the image.
In the related art, however, the global features of images are similar in many scenes, which makes it difficult to identify entity categories from the global features alone, and entities of different categories are often classified into one category. For example, when identifying categories of food, the high similarity between different foods makes it harder to determine the specific food category.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide a method for identifying entity categories in an image, so as to strengthen the local features of an entity and accurately determine the entity category in the image from the combination of global and local features, thereby solving the technical problem in the prior art that the global features of images are similar and the specific category of an entity is difficult to determine.
A second object of the present application is to provide an apparatus for identifying a class of an entity in an image.
A third object of the present application is to provide a terminal device.
A fourth object of the present application is to propose a computer readable storage medium.
In order to achieve the above object, an embodiment of the first aspect of the present application provides a method for identifying an entity class in an image, where the method includes: extracting image features of a target image according to a preset algorithm, and determining at least one target area containing a target entity in the target image according to the image features; extracting the classification characteristic of each target area in the at least one target area, and fusing all the classification characteristics to obtain a local classification characteristic; extracting global classification features of the target image, and fusing the local classification features and the global classification features to obtain target classification features; and determining the entity class of the target entity according to the target classification characteristic.
The device for identifying entity categories in images provided by the embodiment of the second aspect of the application comprises: the first determination module is used for extracting image characteristics of a target image according to a preset algorithm and determining at least one target area containing a target entity in the target image according to the image characteristics; the first fusion module is used for extracting the classification characteristic of each target area in the at least one target area and fusing all the classification characteristics to obtain a local classification characteristic; the second fusion module is used for extracting the global classification characteristic of the target image and fusing the local classification characteristic and the global classification characteristic to obtain a target classification characteristic; and the second determining module is used for determining the entity class of the target entity according to the target classification characteristic.
The terminal device provided in the embodiment of the third aspect of the present application includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for identifying entity categories in an image described above.
A computer-readable storage medium is provided in an embodiment of the fourth aspect of the present application, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for identifying the entity class in the image according to the embodiment of the first aspect of the present application.
The technical scheme provided by the application at least comprises the following technical effects:
the method comprises the steps of extracting image features of a target image according to a preset algorithm, determining at least one target area containing a target entity in the target image according to the image features, extracting classification features of each target area in the at least one target area, fusing all the classification features to obtain a local classification feature, further extracting a global classification feature of the target image, fusing the local classification feature and the global classification feature to obtain a target classification feature, and finally determining the entity category of the target entity according to the target classification feature. In this way, the local features of the entity are strengthened, and the entity category can be determined accurately from the combination of global and local features.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart illustrating an entity class identification method according to an embodiment of the present disclosure;
fig. 2 is a schematic view of an application scenario for identifying entity categories according to an embodiment of the present application;
FIG. 3 is an architectural diagram of a P-net algorithm according to one embodiment of the present application;
FIG. 4 is a schematic diagram of a P-net algorithm training flow according to one embodiment of the present application;
FIG. 5 is an architecture diagram of the R-Net algorithm according to one embodiment of the present application;
FIG. 6 is a flowchart illustrating a target area acquisition method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a training flow of the R-Net algorithm according to one embodiment of the present application;
FIG. 8 is an architectural diagram of a fine classification model according to one embodiment of the present application;
FIG. 9 is a schematic diagram of the training of a fine classification model according to one embodiment of the present application;
FIG. 10 is a flow diagram of a method for identifying an entity class according to another embodiment of the present application;
FIG. 11 is a schematic diagram of a training flow of a classifier according to one embodiment of the present application;
FIG. 12 is a block diagram of an entity class identification apparatus according to an embodiment of the present application; and
fig. 13 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The following describes an entity class identification method and apparatus according to an embodiment of the present application with reference to the drawings.
The entity in the embodiments of the present application may be food, sporting goods, jewelry, and the like, and the category identification referred to in the present application can be understood as identifying the sub-category under such an entity.
Fig. 1 is a flowchart illustrating an entity class identification method according to an embodiment of the present disclosure.
In view of the problem that the entity categories mentioned in the above background are difficult to distinguish, the embodiment of the present application provides an entity category identification method to achieve accurate identification of entity categories, as shown in fig. 1, the method includes the following steps:
Step 101: extracting image features of a target image according to a preset algorithm, and determining at least one target area containing a target entity in the target image according to the image features.

The target image may be actively uploaded by a user, or may be a currently captured image automatically determined from an album.
In this embodiment, the image features of the target image are extracted according to a preset algorithm, and at least one target region containing the target entity is determined in the target image according to the image features; that is, the regions of the target image where an entity exists are selected. For example, as shown in fig. 2, at least one target region may be determined according to the image features, and each target region contains the corresponding entity.
It should be noted that, in different application scenarios, the manner of acquiring at least one target area is different, and the following examples are illustrated:
example one:
In this example, a first one-dimensional feature map of the target image is obtained according to a preset first convolution algorithm, that is, a one-dimensional feature map of the image is extracted so that entities in the image can be determined more efficiently. Further, image features of the first one-dimensional feature map are extracted, at least one candidate region where the target entity is located is determined, and the at least one candidate region is determined as the at least one target region.
As an example, the first convolution algorithm is a P-Net algorithm. Referring to the architecture of the P-Net algorithm shown in fig. 3, in actual execution the target image is downsampled to 12 × 12 and, taking its color channels into account, processed as a 12 × 12 × 3 input.
First, the 12 × 12 × 3 target image is input into the first convolutional layer. After the 10 convolution kernels of 3 × 3 of the first convolutional layer and a 2 × 2 max pooling operation (stride 2), 10 feature maps of 5 × 5 are generated. These feature maps are input into the second convolutional layer, where the 16 convolution kernels of 3 × 3 × 10 generate 16 feature maps of 3 × 3. These are input into the third convolutional layer, where the 32 convolution kernels of 3 × 3 × 16 generate 32 feature maps of 1 × 1. From these 32 feature maps of 1 × 1, 2 feature maps of 1 × 1 used for classification are generated by 2 convolution kernels of 1 × 1, and further 1 × 1 convolution kernels can be used to determine the regression features in this example, so as to optimize the structure of the algorithm.
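For illustration only, the following is a minimal PyTorch sketch of a P-Net-style proposal network consistent with the layer sizes described above. The class name, the PReLU activations, and the choice of 4 bounding-box regression outputs are assumptions, not details taken from this application:

```python
import torch
import torch.nn as nn

class PNetSketch(nn.Module):
    """Illustrative P-Net-style proposal network for 12x12x3 inputs."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 10, kernel_size=3),        # 12x12 -> 10x10, 10 maps
            nn.PReLU(10),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 10x10 -> 5x5
            nn.Conv2d(10, 16, kernel_size=3),       # 5x5 -> 3x3, 16 maps
            nn.PReLU(16),
            nn.Conv2d(16, 32, kernel_size=3),       # 3x3 -> 1x1, 32 maps
            nn.PReLU(32),
        )
        # 1x1 convolutions: 2 maps for entity/background classification,
        # 4 maps for bounding-box regression (4 is an assumed value).
        self.cls_head = nn.Conv2d(32, 2, kernel_size=1)
        self.reg_head = nn.Conv2d(32, 4, kernel_size=1)

    def forward(self, x):
        feat = self.backbone(x)
        return self.cls_head(feat), self.reg_head(feat)

# Usage: scores/boxes for a 12x12 crop; applied fully convolutionally to a larger
# image, each output position corresponds to one candidate window.
scores, boxes = PNetSketch()(torch.randn(1, 3, 12, 12))
```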
Of course, in order to improve the performance of the P-Net algorithm, as shown in fig. 4, the P-Net is trained in advance: after down-sampling a large number of sample images, the P-Net is trained, a loss function is calculated from the difference between the obtained candidate regions and the actually labeled regions, and the P-Net architecture is optimized according to the loss function.
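As a hedged illustration of the training flow in fig. 4, one possible optimization step is sketched below. The particular losses (cross-entropy for the entity/background decision and smooth L1 for the difference between predicted and labeled regions) and the function names are assumptions:

```python
import torch
import torch.nn.functional as F

def pnet_training_step(model, optimizer, images, gt_labels, gt_boxes):
    """One illustrative optimization step: images are pre-downsampled crops,
    gt_labels mark entity/background, gt_boxes are the labeled regions."""
    scores, boxes = model(images)                 # (N,2,1,1), (N,4,1,1)
    scores = scores.flatten(1)                    # (N,2)
    boxes = boxes.flatten(1)                      # (N,4)
    cls_loss = F.cross_entropy(scores, gt_labels)
    pos = gt_labels == 1
    reg_loss = (F.smooth_l1_loss(boxes[pos], gt_boxes[pos])
                if pos.any() else boxes.sum() * 0)  # skip box loss without positives
    loss = cls_loss + reg_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```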
Considering that the P-net algorithm can only roughly determine the area where the corresponding entity is located, in order to further determine the area where the entity is located, the target area can be further refined and determined on the basis of the candidate area.
Specifically, a second one-dimensional feature map of each candidate region in the at least one candidate region is extracted according to a preset second convolution algorithm, image features of the first one-dimensional feature map are further extracted, the at least one candidate region is converged according to these image features, and the converged at least one candidate region is determined as the at least one target region. Convergence here may be understood as deleting a wrong candidate region in which no entity is present, or, for a candidate region in which an entity is present, deleting the sub-region that does not contain the entity.
In some possible examples, when the second convolution algorithm is the R-Net algorithm, referring to the architecture diagram of the R-Net algorithm shown in fig. 5, the candidate region obtained by the P-Net algorithm is processed into a 24 × 24 image and input into the first convolutional layer of the R-Net algorithm. After the 28 convolution kernels of 3 × 3 and a 3 × 3 max pooling operation (stride 2), 28 feature maps of 11 × 11 are generated. These feature maps are input into the second convolutional layer, where the 48 convolution kernels of 3 × 3 × 28 and a 3 × 3 max pooling operation (stride 2) generate 48 feature maps of 4 × 4. These are input into the third convolutional layer of the algorithm, where the 64 convolution kernels of 2 × 2 × 48 generate the 3 × 3 × 64 feature maps, which are then converted by a fully connected layer. This is followed by a fully connected layer for the classification problem of the bounding box and a fully connected layer for the position regression problem of the bounding box, and the target regions among the candidate regions are determined according to the outputs of these fully connected layers.
That is, taking food as the target entity, as shown in fig. 6, the target image is input into the P-Net algorithm to obtain a large number of candidate regions that are likely to contain food, and the R-Net then converges these regions to filter out the unqualified ones. During this filtering, finer and more compact target regions are obtained (since a region produced by the P-Net may be too large or may mistakenly contain no food). The R-Net crops the original target image on the basis of the candidate region (bounding box) information predicted by the P-Net, resizes each candidate region to a fixed scale, and screens the candidate regions with a non-maximum suppression (NMS) algorithm to determine the target regions.
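Non-maximum suppression itself is a standard procedure; the NumPy sketch below shows the usual greedy form and is illustrative rather than the exact implementation used in this application:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """boxes: (N,4) as [x1, y1, x2, y2]; scores: (N,). Returns kept indices."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]              # highest-confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the best remaining box with all the others
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]  # drop heavily overlapping boxes
    return keep
```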
Similarly, referring to fig. 7, the R-Net algorithm also needs to be trained in advance: images with labeled target regions containing entities are downsampled to a fixed size and input into the R-Net, a loss function is determined by comparing the region obtained by the R-Net with the pre-labeled region, and the R-Net architecture is adjusted according to the loss function.
Example two:
In this example, a general image feature of the target entity is preset, and a connected region containing the general image feature is determined as a target region.
Step 102: extracting the classification feature of each target area in the at least one target area, and fusing all the classification features to obtain a local classification feature.
Specifically, the classification features of each target region in at least one target region are extracted, and all the classification features are fused to obtain local classification features, that is, the entity category is determined by mainly considering the local classification features of the entity.
As a possible implementation, a fine classification model is preset, and the fine classification model can extract the classification features of each target area in the at least one target area.
In an embodiment of the present application, the selected target areas are sorted, and the sorting is performed to ensure consistency of the order. For example, suppose there are only three target areas of sizes 4 × 4, 3 × 3, and 2 × 2. The feature vectors extracted by the fine classification model all have the same length, such as 1 × 1028, but the meaning actually represented by each vector differs. Therefore, if, for example, 100 random boxes are used during training and they are sorted in descending order, then when a target image is processed after training is completed the same sorting rule should be followed: 100 random boxes are used and arranged in descending order, and the target areas are then sequentially input into the fine classification network to obtain the candidate classification feature corresponding to each target area. That is, if the vectors corresponding to the n target areas are i ∈ {i1, ..., in}, they are input into the fine classification model to obtain n feature vectors v ∈ {v1, ..., vn}.
In other words, the size of each target area is determined, the at least one target area is sorted according to size, the sorted target areas are sequentially input into a pre-trained fine classification model, and the classification feature of each target area is determined. This ensures the accuracy of the acquired classification features and allows the model to keep improving: after each extraction of classification features, the model is optimized according to a regression mechanism, so that the classification features obtained from subsequent inputs become increasingly accurate.
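A small sketch of this ordering step is given below, with the fine classification model represented by an arbitrary feature extractor; the descending-by-area ordering and the 224 × 224 input size are assumptions used only for illustration:

```python
import torch

def extract_local_features(regions, fine_model, input_size=224):
    """regions: list of cropped region tensors (C,H,W). Sort by area (descending,
    an assumed ordering) and run each through the fine classification model."""
    order = sorted(range(len(regions)),
                   key=lambda k: regions[k].shape[-1] * regions[k].shape[-2],
                   reverse=True)
    feats = []
    for k in order:
        crop = torch.nn.functional.interpolate(
            regions[k].unsqueeze(0), size=(input_size, input_size),
            mode="bilinear", align_corners=False)
        feats.append(fine_model(crop).flatten(1))   # e.g. a 1 x D vector per region
    return feats
```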
In an embodiment of the present application, a confidence level that each target region contains the target entity may also be determined, for example according to the degree of feature matching with the target entity. After the candidate classification feature of each target region is extracted, the corresponding classification feature is determined from the candidate classification feature and the confidence level of that target region; for example, a weight value corresponding to the confidence level is determined, and the classification feature is taken as the product of the weight value and the corresponding candidate classification feature.
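For example, if the weight is simply taken to be the confidence value itself (an assumption; the application only requires that the weight correspond to the confidence), the weighting could look like this:

```python
def weight_by_confidence(candidate_features, confidences):
    """candidate_features: list of 1 x D tensors; confidences: list of floats in [0, 1].
    Each region's classification feature is its candidate feature scaled by a weight
    derived from how confident the detector is that the region contains the entity."""
    return [conf * feat for feat, conf in zip(candidate_features, confidences)]
```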
Step 103: extracting the global classification feature of the target image, and fusing the local classification feature and the global classification feature to obtain the target classification feature.
Specifically, the global feature of the image is also taken into account: the global classification feature of the target image is extracted, and the local classification feature and the global classification feature are fused to obtain the target classification feature. In an embodiment of the present application, the global classification feature and the local classification features may each be processed into a one-dimensional feature and then concatenated into the target classification feature, so as to reduce the amount of computation.
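A minimal sketch of the flatten-and-concatenate fusion described above, assuming the global and local features are already available as tensors:

```python
import torch

def fuse_features(global_feat, local_feats):
    """global_feat: 1 x Dg tensor from the whole image; local_feats: list of 1 x Dl
    tensors from the target regions. Flattening and concatenating keeps the
    computation cheap, as noted above."""
    parts = [global_feat.flatten(1)] + [f.flatten(1) for f in local_feats]
    return torch.cat(parts, dim=1)   # the fused target classification feature
```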
In one embodiment of the present application, referring to fig. 8, a fine classification model based on unsupervised training is assumed. Its architecture is divided into two branches, and it should be noted that the feature extractors in the upper and lower branches share the same network parameters and structure. The network is selected according to task requirements, for example ResNet on a server and MobileNet on mobile phones and other devices, and the model is pre-trained on large data sets such as ImageNet. The upper branch in the figure is a traditional forward network: the whole target image is input into the feature extractor to obtain the global classification feature, which is then sent into the feature fusion network. The guidance network in fig. 8 solves the problem that it is not known in advance which local area of the target image is locally optimal, by selecting candidate boxes of local areas. Referring to fig. 9, the guidance network needs to be trained, and the teacher network plays a supervisory and guiding role: it supervises feature extraction by comparing the information confidence of each target area obtained by the guidance network, and adjusts the network parameters of the guidance network according to the calculated loss. After the target areas are input into the guidance network, i.e., the lower branch in fig. 8, the classification feature of each target area can be obtained (taking 3 target areas as an example in the figure), and the one-dimensional features extracted from each target area are concatenated to form a fused one-dimensional vector. One-dimensional vectors are adopted because they require little computation and are efficient to process, while the obtained result is comparable to that of higher-dimensional feature vectors.
Step 104: determining the entity category of the target entity according to the target classification feature.
Specifically, the target classification feature embodies not only the local features but also the global features, and the entity category of the target entity is determined according to the target classification feature, for example a specific sub-category of food.
Furthermore, after the entity categories are determined, a photo album can be organized according to the categories. When there are multiple entity categories, the multiple categories can be added to the image's file information to serve as a description of the image, so that the user can know what the image contains without viewing it.
In an embodiment of the present application, after the entity category is determined, corresponding beautification parameters are matched according to the entity category and appropriate beautification processing is performed on the different entities in the target image, which greatly improves the beautification experience.
Referring to fig. 10, after the fine classification model obtains the fused target classification feature, the target classification feature may be input into a pre-trained classifier, which may be a non-linear classifier such as a non-linear SVM. For the training of the classifier, as shown in fig. 11, the classifier iterates over the training samples until the value of the optimization function converges to its optimum. A non-linear classifier can effectively expand the classification dimensionality; taking an SVM as an example, the SVM projects the features into a high-dimensional space and then separates them non-linearly. Linear classifiers such as Softmax and fully connected layers only work well for low-dimensional linear classification, so this scheme reduces the shortcomings of Softmax for non-linear classification.
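As an illustration of the classifier stage, an RBF-kernel SVM from scikit-learn could be fitted on the fused target classification features. The file names and hyperparameters below are hypothetical:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: (num_samples, fused_dim) fused target classification features
# y: (num_samples,) entity sub-category labels
X = np.load("fused_features.npy")   # hypothetical file names
y = np.load("labels.npy")

# Standardize, then fit a non-linear (RBF) SVM; fit() iterates over the
# training samples until its optimization converges.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)
predicted_category = clf.predict(X[:1])
```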
Therefore, the method of the embodiments of the present application solves the problem that the positions of local features are difficult to determine manually, while the extraction of global depth features is retained. By combining global and local features, the pain point of entity classification is addressed: entities of different categories in an image can be identified accurately, and different entities can be beautified in a targeted way when a user takes a picture. Since entities such as food are a major demand in photographing functions, the application prospects are broad.
To sum up, in the method for identifying entity categories in an image according to the embodiments of the present application, image features of a target image are extracted according to a preset algorithm, at least one target region containing a target entity is determined in the target image according to the image features, the classification feature of each target region in the at least one target region is extracted, all the classification features are fused to obtain a local classification feature, the global classification feature of the target image is further extracted, the local classification feature and the global classification feature are fused to obtain a target classification feature, and finally the entity category of the target entity is determined according to the target classification feature. In this way, the local features of the entity are strengthened, and the entity category in the image can be accurately determined from the combination of global and local features.
In order to implement the above embodiments, the present application further provides an apparatus for identifying entity categories in an image.
Fig. 12 is a schematic structural diagram of an apparatus for identifying an entity category in an image according to an embodiment of the present application.
As shown in fig. 12, the apparatus for recognizing the entity category in the image includes: a first determination module 10, a first fusion module 20, a second fusion module 30 and a second determination module 40.
The first determining module 10 is configured to extract image features of a target image according to a preset algorithm, and determine at least one target area containing a target entity in the target image according to the image features;
the first fusion module 20 is configured to extract a classification feature of each target region in at least one target region, and fuse all the classification features to obtain a local classification feature;
the second fusion module 30 is configured to extract global classification features of the target image, and fuse the local classification features and the global classification features to obtain target classification features;
and a second determining module 40, configured to determine an entity class of the target entity according to the target classification characteristic.
Further, in a possible implementation manner of the embodiment of the present application, the first determining module 10 is specifically configured to:
acquiring a first one-dimensional feature map of a target image according to a preset first convolution algorithm;
extracting image features of the first one-dimensional feature map, and determining at least one candidate region where a target entity is located;
at least one candidate region is determined as at least one target region.
In this embodiment, the first determining module 10 is specifically configured to: extracting a second one-dimensional feature map of each candidate region in at least one candidate region according to a preset second convolution algorithm;
extracting image features of the first one-dimensional feature map, and converging at least one candidate region according to the image features of the first one-dimensional feature map;
and determining the converged at least one candidate region as at least one target region.
It should be noted that the foregoing explanation of the embodiment of the method for identifying entity classes in an image is also applicable to the apparatus for identifying entity classes in an image of the embodiment, and is not repeated herein.
To sum up, the device for identifying entity categories in images according to the embodiments of the present application extracts image features of a target image according to a preset algorithm, determines at least one target region containing a target entity in the target image according to the image features, extracts the classification feature of each target region in the at least one target region, fuses all the classification features to obtain a local classification feature, further extracts the global classification feature of the target image, fuses the local classification feature and the global classification feature to obtain a target classification feature, and finally determines the entity category of the target entity according to the target classification feature. In this way, the local features of the entity are strengthened, and the entity category in the image can be accurately determined from the combination of global and local features.
In order to implement the above embodiment, the present application further provides a terminal device.
Fig. 13 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 13, the terminal device 1300 may include: a memory 1302, a processor 1304, and a computer program 1306 stored in the memory 1302 and executable on the processor 1304. When the processor 1304 executes the computer program 1306, the method for identifying entity categories in an image according to any of the above embodiments of the present application is implemented.
In order to achieve the above embodiments, the present application further proposes a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method for identifying the entity class in the image according to any of the above embodiments of the present application.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (10)
1. A method for identifying a category of an entity in an image, the method comprising:
extracting image features of a target image according to a preset algorithm, and determining at least one target area containing a target entity in the target image according to the image features;
extracting the classification characteristic of each target area in the at least one target area, and fusing all the classification characteristics to obtain a local classification characteristic;
extracting global classification features of the target image, and fusing the local classification features and the global classification features to obtain target classification features;
and determining the entity class of the target entity according to the target classification characteristic.
2. The method of claim 1, wherein the extracting image features of the target image according to a preset algorithm, and determining at least one target region containing a target entity in the target image according to the image features comprises:
acquiring a first one-dimensional feature map of the target image according to a preset first convolution algorithm;
extracting image features of the first one-dimensional feature map, and determining at least one candidate region where the target entity is located;
determining the at least one candidate region as the at least one target region.
3. The method of claim 2, wherein said determining the at least one candidate region as the at least one target region comprises:
extracting a second one-dimensional feature map of each candidate region in the at least one candidate region according to a preset second convolution algorithm;
extracting image features of the first one-dimensional feature map, and converging the at least one candidate region according to the image features of the first one-dimensional feature map;
determining the at least one candidate region after convergence as the at least one target region.
4. The method of claim 1, wherein said extracting classification features for each of said at least one target region comprises:
determining a confidence level of each target region containing the target entity;
extracting candidate classification features of each target region;
and determining corresponding classification features according to the candidate classification features and the confidence degrees of each target region.
5. The method of claim 1, wherein said extracting classification features for each of said at least one target region comprises:
determining the size of each target area;
sorting the at least one target area according to the size;
and sequentially inputting the at least one target area into a pre-trained fine classification model according to the sequencing result, and determining the classification characteristic of each target area.
6. An apparatus for identifying a category of an entity in an image, comprising:
the first determination module is used for extracting image characteristics of a target image according to a preset algorithm and determining at least one target area containing a target entity in the target image according to the image characteristics;
the first fusion module is used for extracting the classification characteristic of each target area in the at least one target area and fusing all the classification characteristics to obtain a local classification characteristic;
the second fusion module is used for extracting the global classification characteristic of the target image and fusing the local classification characteristic and the global classification characteristic to obtain a target classification characteristic;
and the second determining module is used for determining the entity class of the target entity according to the target classification characteristic.
7. The apparatus of claim 6, wherein the first determining module is specifically configured to:
acquiring a first one-dimensional feature map of the target image according to a preset first convolution algorithm;
extracting image features of the first one-dimensional feature map, and determining at least one candidate region where the target entity is located;
determining the at least one candidate region as the at least one target region.
8. The apparatus of claim 6, wherein the first determining module is specifically configured to:
extracting a second one-dimensional feature map of each candidate region in the at least one candidate region according to a preset second convolution algorithm;
extracting image features of the first one-dimensional feature map, and converging the at least one candidate region according to the image features of the first one-dimensional feature map;
determining the at least one candidate region after convergence as the at least one target region.
9. A terminal device, comprising: memory, processor and computer program stored on the memory and executable on the processor, the processor implementing a method of identifying a category of an entity in an image according to any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method of identifying a category of an entity in an image according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010139339.5A CN111340124A (en) | 2020-03-03 | 2020-03-03 | Method and device for identifying entity category in image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010139339.5A CN111340124A (en) | 2020-03-03 | 2020-03-03 | Method and device for identifying entity category in image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111340124A true CN111340124A (en) | 2020-06-26 |
Family
ID=71182086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010139339.5A Pending CN111340124A (en) | 2020-03-03 | 2020-03-03 | Method and device for identifying entity category in image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111340124A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112085035A (en) * | 2020-09-14 | 2020-12-15 | 北京字节跳动网络技术有限公司 | Image processing method, image processing device, electronic equipment and computer readable medium |
CN112528897A (en) * | 2020-12-17 | 2021-03-19 | Oppo(重庆)智能科技有限公司 | Portrait age estimation method, Portrait age estimation device, computer equipment and storage medium |
CN112580694A (en) * | 2020-12-01 | 2021-03-30 | 中国船舶重工集团公司第七0九研究所 | Small sample image target identification method and system based on joint attention mechanism |
CN114140613A (en) * | 2021-12-08 | 2022-03-04 | 北京有竹居网络技术有限公司 | Image detection method, image detection device, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170293824A1 (en) * | 2014-12-30 | 2017-10-12 | Baidu Online Network Technology ( Beijing) Co., Ltd. | Method and device for recognizing subject area of image |
CN108961157A (en) * | 2018-06-19 | 2018-12-07 | Oppo广东移动通信有限公司 | Image processing method, picture processing unit and terminal device |
CN109034119A (en) * | 2018-08-27 | 2018-12-18 | 苏州广目信息技术有限公司 | A kind of method for detecting human face of the full convolutional neural networks based on optimization |
CN109784186A (en) * | 2018-12-18 | 2019-05-21 | 深圳云天励飞技术有限公司 | A kind of pedestrian recognition methods, device, electronic equipment and computer readable storage medium again |
CN110555446A (en) * | 2019-08-19 | 2019-12-10 | 北京工业大学 | Remote sensing image scene classification method based on multi-scale depth feature fusion and transfer learning |
CN110705653A (en) * | 2019-10-22 | 2020-01-17 | Oppo广东移动通信有限公司 | Image classification method, image classification device and terminal equipment |
CN110751218A (en) * | 2019-10-22 | 2020-02-04 | Oppo广东移动通信有限公司 | Image classification method, image classification device and terminal equipment |
- 2020-03-03: CN202010139339.5A filed; published as CN111340124A (CN); status: active, pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170293824A1 (en) * | 2014-12-30 | 2017-10-12 | Baidu Online Network Technology ( Beijing) Co., Ltd. | Method and device for recognizing subject area of image |
CN108961157A (en) * | 2018-06-19 | 2018-12-07 | Oppo广东移动通信有限公司 | Image processing method, picture processing unit and terminal device |
CN109034119A (en) * | 2018-08-27 | 2018-12-18 | 苏州广目信息技术有限公司 | A kind of method for detecting human face of the full convolutional neural networks based on optimization |
CN109784186A (en) * | 2018-12-18 | 2019-05-21 | 深圳云天励飞技术有限公司 | A kind of pedestrian recognition methods, device, electronic equipment and computer readable storage medium again |
CN110555446A (en) * | 2019-08-19 | 2019-12-10 | 北京工业大学 | Remote sensing image scene classification method based on multi-scale depth feature fusion and transfer learning |
CN110705653A (en) * | 2019-10-22 | 2020-01-17 | Oppo广东移动通信有限公司 | Image classification method, image classification device and terminal equipment |
CN110751218A (en) * | 2019-10-22 | 2020-02-04 | Oppo广东移动通信有限公司 | Image classification method, image classification device and terminal equipment |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112085035A (en) * | 2020-09-14 | 2020-12-15 | 北京字节跳动网络技术有限公司 | Image processing method, image processing device, electronic equipment and computer readable medium |
CN112580694A (en) * | 2020-12-01 | 2021-03-30 | 中国船舶重工集团公司第七0九研究所 | Small sample image target identification method and system based on joint attention mechanism |
CN112580694B (en) * | 2020-12-01 | 2024-04-19 | 中国船舶重工集团公司第七0九研究所 | Small sample image target recognition method and system based on joint attention mechanism |
CN112528897A (en) * | 2020-12-17 | 2021-03-19 | Oppo(重庆)智能科技有限公司 | Portrait age estimation method, Portrait age estimation device, computer equipment and storage medium |
CN112528897B (en) * | 2020-12-17 | 2023-06-13 | Oppo(重庆)智能科技有限公司 | Portrait age estimation method, device, computer equipment and storage medium |
CN114140613A (en) * | 2021-12-08 | 2022-03-04 | 北京有竹居网络技术有限公司 | Image detection method, image detection device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109447169B (en) | Image processing method, training method and device of model thereof and electronic system | |
CN109614985B (en) | Target detection method based on densely connected feature pyramid network | |
CN111340124A (en) | Method and device for identifying entity category in image | |
CN111476302B (en) | fast-RCNN target object detection method based on deep reinforcement learning | |
CN112131978B (en) | Video classification method and device, electronic equipment and storage medium | |
CN106682696B (en) | The more example detection networks and its training method refined based on online example classification device | |
CN111274981B (en) | Target detection network construction method and device and target detection method | |
CN114399644A (en) | Target detection method and device based on small sample | |
CN111881849A (en) | Image scene detection method and device, electronic equipment and storage medium | |
CN109753884A (en) | A kind of video behavior recognition methods based on key-frame extraction | |
CN111027347A (en) | Video identification method and device and computer equipment | |
CN113223614A (en) | Chromosome karyotype analysis method, system, terminal device and storage medium | |
CN111432206A (en) | Video definition processing method and device based on artificial intelligence and electronic equipment | |
CN112766170A (en) | Self-adaptive segmentation detection method and device based on cluster unmanned aerial vehicle image | |
CN116311218A (en) | Noise plant point cloud semantic segmentation method and system based on self-attention feature fusion | |
CN115240024A (en) | Method and system for segmenting extraterrestrial pictures by combining self-supervised learning and semi-supervised learning | |
CN112529025A (en) | Data processing method and device | |
CN113139540B (en) | Backboard detection method and equipment | |
CN105102607A (en) | Image processing device, program, storage medium, and image processing method | |
CN115018884B (en) | Visible light infrared visual tracking method based on multi-strategy fusion tree | |
CN111738310A (en) | Material classification method and device, electronic equipment and storage medium | |
CN111144422A (en) | Positioning identification method and system for aircraft component | |
CN115512428A (en) | Human face living body distinguishing method, system, device and storage medium | |
CN115063628A (en) | Fruit picking prediction method based on visual semantic segmentation | |
CN114972965A (en) | Scene recognition method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |