CN113902944A - Model training and scene recognition method, device, equipment and medium - Google Patents

Model training and scene recognition method, device, equipment and medium

Info

Publication number
CN113902944A
Authority
CN
China
Prior art keywords
scene
image
sample
category
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111159087.3A
Other languages
Chinese (zh)
Inventor
常河河
查林
白晓楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Xinxin Microelectronics Technology Co Ltd
Original Assignee
Qingdao Xinxin Microelectronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Xinxin Microelectronics Technology Co Ltd
Priority to CN202111159087.3A
Publication of CN113902944A
Legal status: Pending

Classifications

    • G06F 18/214: Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06F 18/2415: Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/045: Physics; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06N 3/08: Physics; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a model training and scene recognition method, apparatus, device and medium. An original scene recognition model is trained based on the scene probability vector and the scene label, the sample feature together with the class-center feature of the first scene category, and the sample feature together with the class-center features of the second scene categories. As a result, the trained scene recognition model pulls the image features of images of the same scene category toward that category's class-center feature and pushes them away from the class-center features of other scene categories. By further working at the feature level of the image, the model can correctly handle images of scene categories that are not contained in the closed image set, which improves the precision, performance and naturalness of the scene recognition model.

Description

Model training and scene recognition method, device, equipment and medium

Technical Field

The present application relates to the technical field of image processing, and in particular to a model training and scene recognition method, apparatus, device and medium.

Background

With the development of multimedia technology, people watch an ever-wider variety of video images every day, and the products involved in video content are becoming increasingly rich. Automatically identifying and classifying the scene information of an image helps machines understand the image better and helps downstream algorithms develop functions tailored to different scenes.

With the development of neural networks in the field of computer vision, their performance on image classification tasks has surpassed most traditional algorithms. However, most neural-network-based scene recognition systems are trained and tested on a closed image set, meaning that such a system can only recognize the scene categories contained in that closed set. In practical applications, the scene categories to which images may belong cannot be exhaustively enumerated, so the scene category of an image that currently needs to be recognized may not be among the categories contained in the closed image set. If the scene recognition system is nevertheless used to identify the scene category of such an image, an erroneous result will be obtained, which in turn affects the processing of downstream algorithms.

Therefore, there is an urgent need for a scene recognition system that can not only accurately identify images belonging to the scene categories contained in a closed image set, but also properly handle images that do not belong to those categories.

Summary of the Invention

The present application provides a model training and scene recognition method, apparatus, device and medium, to solve the problem that existing scene recognition systems cannot accurately handle images of scene categories that are not contained in a closed image set.

The present application provides a scene recognition model training method, the method including:

acquiring any sample image in a sample set, wherein the sample image corresponds to a scene label, and the scene label identifies the first scene category to which the sample image belongs;

determining, through an original scene recognition model, the scene probability vector corresponding to the sample image and the sample feature of the sample image, wherein the scene probability vector includes the probability value of the sample image belonging to each scene category;

training the original scene recognition model based on the scene probability vector and the scene label, the sample feature and the class-center feature corresponding to the first scene category, and the sample feature and the class-center features corresponding to the second scene categories, to obtain a trained scene recognition model, wherein a second scene category is any of the scene categories other than the first scene category.

The present application provides a scene recognition method, the method including:

determining, through a pre-trained scene recognition model, the image feature of an image to be recognized;

determining the similarity between the image feature and the target class-center feature of each scene category;

determining, according to each similarity and a similarity threshold, whether the scene categories include the scene category to which the image to be recognized belongs;

if it is determined that the scene categories include the scene category to which the image to be recognized belongs, determining that scene category through the scene recognition model;

if it is determined that the scene categories do not include the scene category to which the image to be recognized belongs, not continuing to identify the scene category of the image to be recognized.

The present application provides a scene recognition model training apparatus, the apparatus including:

an acquisition unit, configured to acquire any sample image in a sample set, wherein the sample image corresponds to a scene label, and the scene label identifies the first scene category to which the sample image belongs;

a processing unit, configured to determine, through an original scene recognition model, the scene probability vector corresponding to the sample image and the sample feature of the sample image, wherein the scene probability vector includes the probability value of the sample image belonging to each scene category;

a training unit, configured to train the original scene recognition model based on the scene probability vector and the scene label, the sample feature and the class-center feature corresponding to the first scene category, and the sample feature and the class-center features corresponding to the second scene categories, to obtain a trained scene recognition model, wherein a second scene category is any of the scene categories other than the first scene category.

The present application provides a scene recognition apparatus, the apparatus including:

a first processing module, configured to determine, through a pre-trained scene recognition model, the image feature of an image to be recognized;

a second processing module, configured to determine the similarity between the image feature and the target class-center feature of each scene category;

a third processing module, configured to determine, according to each similarity and a similarity threshold, whether the scene categories include the scene category to which the image to be recognized belongs; if so, to determine that scene category through the scene recognition model; and if not, to not continue identifying the scene category of the image to be recognized.

The present application provides an electronic device, which includes a processor configured, when executing a computer program stored in a memory, to implement the steps of the scene recognition model training method described above, or to implement the steps of the scene recognition method described above.

The present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the scene recognition model training method described above, or implements the steps of the scene recognition method described above.

In the process of training the original scene recognition model on the sample images in the sample set, the scene probability vector corresponding to an input sample image and the sample feature of that image can be obtained through the original scene recognition model. The original scene recognition model can then be trained based on the scene probability vector and the scene label, the sample feature and the class-center feature corresponding to the first scene category, and the sample feature and the class-center features corresponding to the second scene categories, so as to obtain a trained scene recognition model. The trained model therefore pulls the image features of images of the same scene category toward that category's class-center feature while pushing them away from the class-center features of other scene categories. By further working at the feature level of an image, it can be determined whether the scene category of the image is recognizable and, if so, which scene category the image belongs to. This not only accurately identifies images belonging to the scene categories contained in the closed image set, but also properly handles images that do not belong to those categories, improving the precision, performance and naturalness of the scene recognition model.

Brief Description of the Drawings

In order to explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a scene recognition model training process provided by some embodiments of the present application;

FIG. 2 is a schematic diagram of a specific scene recognition model training flow provided by some embodiments of the present application;

FIG. 3 is a schematic structural diagram of an original scene recognition model provided by some embodiments of the present application;

FIG. 4 is a schematic diagram of a scene recognition process provided by some embodiments of the present application;

FIG. 5 is a schematic diagram of a specific scene recognition flow provided by some embodiments of the present application;

FIG. 6 is a schematic structural diagram of a scene recognition model training apparatus provided by some embodiments of the present application;

FIG. 7 is a schematic structural diagram of a scene recognition apparatus provided by some embodiments of the present application;

FIG. 8 is a schematic structural diagram of an electronic device provided by some embodiments of the present application;

FIG. 9 is a schematic structural diagram of an electronic device provided by some embodiments of the present application.

Detailed Description of Embodiments

In order to make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.

Enabling a scene recognition system to properly handle images of scene categories that are not contained in a closed image set is essentially an open-set recognition problem: the scene recognition system needs to be able to discover and learn the scene categories to which images of unknown scene categories belong. Open-set recognition is therefore an important and challenging problem in the pattern recognition and multimedia communities.

Accordingly, in order to enable a scene recognition system to accurately handle images of scene categories that are not contained in a closed image set, the present application provides a model training and scene recognition method, apparatus, device and medium.

Embodiment 1:

FIG. 1 is a schematic diagram of a scene recognition model training process provided by some embodiments of the present application. The process includes the following steps.

S101: Acquire any sample image in a sample set, wherein the sample image corresponds to a scene label, and the scene label identifies the first scene category to which the sample image belongs.

The scene recognition model training method provided by the present application is applied to an electronic device. The electronic device may be a smart device such as a mobile terminal, or a server such as a home brain. Of course, the electronic device may also be a display device such as a television.

In order to obtain an accurate scene recognition model, the original scene recognition model needs to be trained on each sample image in a pre-acquired sample set. Any sample image in the sample set is obtained in the following way: a collected original image is determined as a sample image; and/or, after the pixel values of the pixels in a collected original image are adjusted, the adjusted image is determined as a sample image.

It should be noted that, to facilitate training of the scene recognition model, each sample image in the sample set corresponds to a scene label, and the scene label identifies the scene category to which the sample image belongs (for convenience of description, referred to as the first scene category). For example, the scene categories may include a live-streaming scene, a game scene, a mukbang (eating show) scene, and so on.

As one possible implementation, if the sample set contains a sufficient number of sample images, that is, a large number of original images collected in different environments, the original scene recognition model can be trained directly on the sample images in the sample set.

As another possible implementation, in order to ensure the diversity of the sample images and thereby improve the accuracy of the scene recognition model, the pixel values of the pixels in an original image can be adjusted, for example by blurring, sharpening or contrast processing, to obtain a large number of adjusted images, and the adjusted images are determined as sample images for training the original scene recognition model.

Statistically, taking a display device such as a television as the electronic device, the more common image quality problems in images acquired in the device's working scenes include blur, over-exposure, excessive darkness, low contrast and noise in the picture; for example, in a live-streaming scene the acquired image may suffer from exposure problems. In order to ensure the diversity of the sample images and improve the accuracy of the scene recognition model, the image quality of the collected original images can be adjusted in advance to reflect the image quality problems that may occur in the device's working scenes. The pixel values of the pixels in a collected original image can be adjusted in at least one of the following ways:

Mode 1: adjusting the pixel values of the pixels in the original image with a preset convolution kernel;

Mode 2: performing contrast adjustment on the pixel values of the pixels in the original image;

Mode 3: performing brightness adjustment on the pixel values of the pixels in the original image;

Mode 4: performing noise addition on the pixel values of the pixels in the original image.

For example, if it is desired to add noise to the original image so as to obtain adjusted images containing different kinds of noise, noise can be added to the pixel values of the pixels in the original image, that is, noise is added to the original image at random. In the process of adding noise to the original image, as many kinds of noise as possible should be used, such as white noise, salt-and-pepper noise and Gaussian noise, so that the sample images in the sample set become more diverse, thereby improving the accuracy and robustness of the scene recognition model.
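As an illustration only, the following sketch shows one plausible way of producing adjusted sample images with the four adjustment modes listed above (convolution kernel, contrast, brightness, noise). It is not the implementation of the present application: the kernel size, adjustment amounts and noise parameters are assumptions chosen for readability.

```python
# Illustrative sketch: generate adjusted sample images from one collected
# original image, following the four adjustment modes described above.
import numpy as np

def augment_original_image(img: np.ndarray, rng: np.random.Generator) -> list:
    """img: H x W x 3 uint8 original image; returns a list of adjusted images."""
    img_f = img.astype(np.float32)
    adjusted = []

    # Mode 1: adjust pixel values with a preset convolution kernel (3x3 mean blur).
    kernel = np.ones((3, 3), dtype=np.float32) / 9.0
    pad = np.pad(img_f, ((1, 1), (1, 1), (0, 0)), mode="edge")
    blurred = np.zeros_like(img_f)
    for dy in range(3):
        for dx in range(3):
            blurred += kernel[dy, dx] * pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    adjusted.append(blurred)

    # Mode 2: contrast adjustment (scale pixel values around the mean).
    mean = img_f.mean()
    adjusted.append((img_f - mean) * 0.5 + mean)   # lower contrast

    # Mode 3: brightness adjustment (shift pixel values).
    adjusted.append(img_f - 60.0)                  # darker image

    # Mode 4: add noise (Gaussian here; salt-and-pepper or white noise also work).
    adjusted.append(img_f + rng.normal(0.0, 15.0, img_f.shape))

    return [np.clip(a, 0, 255).astype(np.uint8) for a in adjusted]
```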

It should be noted that the process of processing the pixel values of the pixels in the original image belongs to the prior art and is not described in detail here.

By acquiring sample images in the above way, the number of sample images in the sample set can be multiplied, so that a large number of sample images can be obtained quickly and the difficulty, cost and resources required to acquire sample images are reduced. The original scene recognition model can then be trained on more sample images, which improves the accuracy and robustness of the scene recognition model.

As yet another possible implementation, both the collected original images and the adjusted images obtained by adjusting the pixel values of the pixels in the collected original images may be determined as sample images, and the original scene recognition model is trained jointly on the original images and the adjusted images in the sample set.

S102: Determine, through the original scene recognition model, the scene probability vector corresponding to the sample image and the sample feature of the sample image, wherein the scene probability vector includes the probability value of the sample image belonging to each scene category.

After the sample set for training the original scene recognition model is obtained based on the above embodiments, the original scene recognition model can be trained on each sample image in the sample set.

In a specific implementation, any sample image is input into the original scene recognition model. Through the original scene recognition model, the scene probability vector corresponding to the sample image and the image feature of the sample image (for convenience of description, referred to as the sample feature) can be obtained. The scene probability vector includes the probability value of the sample image belonging to each scene category, where the set of scene categories is determined by the scene categories to which the sample images in the sample set belong. A sample feature represents a higher-dimensional, more abstract image feature extracted from the sample image.

The original scene recognition model may be a decision tree, logistic regression (LR), a naive Bayes (NB) classification algorithm, a random forest (RF) algorithm, a support vector machine (SVM) classification algorithm, a histogram of oriented gradients (HOG), a deep learning algorithm, and so on. The deep learning algorithm may include a neural network, a deep neural network, a convolutional neural network (CNN), and the like.

In a possible implementation, in order to perform scene recognition through the scene recognition model, the original scene recognition model includes a feature extraction layer, a feature output layer and a classification output layer. The feature extraction layer is connected to the feature output layer, and the feature output layer is connected to the classification output layer. When a sample image is input into the original scene recognition model, the sample feature of the input sample image is obtained through the feature extraction layer, the sample feature is output through the feature output layer, and, based on the sample feature, the scene probability vector corresponding to the sample image is obtained and output through the classification output layer.
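The following is a minimal PyTorch sketch of a model with the three components described above: a feature extraction layer, a feature output layer that exposes the sample feature, and a classification output layer that produces the scene probability vector. The backbone, the feature dimension and the layer sizes are assumptions for illustration and are not specified by the application.

```python
# Sketch of the described three-part structure (assumed backbone and sizes).
import torch
import torch.nn as nn

class SceneRecognitionModel(nn.Module):
    def __init__(self, num_scene_classes: int, feature_dim: int = 128):
        super().__init__()
        # Feature extraction layer: a small convolutional backbone.
        self.feature_extraction = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Feature output layer: outputs the sample feature used for class centers.
        self.feature_output = nn.Linear(64, feature_dim)
        # Classification output layer: maps the feature to per-class probabilities.
        self.classification_output = nn.Linear(feature_dim, num_scene_classes)

    def forward(self, images: torch.Tensor):
        sample_feature = self.feature_output(self.feature_extraction(images))
        scene_probability = torch.softmax(self.classification_output(sample_feature), dim=1)
        return scene_probability, sample_feature
```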

S103: Train the original scene recognition model based on the scene probability vector and the scene label, the sample feature and the class-center feature corresponding to the first scene category, and the sample feature and the class-center features corresponding to the second scene categories, to obtain a trained scene recognition model, wherein a second scene category is any of the scene categories other than the first scene category.

Since every sample image in the sample set corresponds to a scene label that identifies the scene category to which the sample image actually belongs, in the present application, after the scene probability vector corresponding to the sample image and the sample feature of the sample image are determined, the original scene recognition model can be trained, using the scene recognition model training method provided by the present application, based on the scene probability vector, the corresponding scene label and the sample feature.

If the scene category of an image is one that the pre-trained scene recognition model can recognize, the image feature of that image will generally have a smaller metric distance to the image features of the sample images in the sample set that belong to that scene category, and a larger metric distance to the image features of the sample images that do not belong to it. Based on this, when identifying the scene category of an image, the metric distance between the image feature of the image and the image feature of each sample image in the sample set can be determined, and it can be determined from the obtained metric distances whether the scene category of the image is one that the pre-trained scene recognition model can recognize.

The metric distance can be obtained by means of the Euclidean distance, the cosine similarity, a KL divergence function, or the like.

Further, since the sample set contains a large number of sample images, determining the metric distance between the image feature of an image and the image feature of every sample image in the sample set would consume a large amount of computing resources and reduce the efficiency with which the scene recognition system determines the scene category of the image. Based on this, the class-center feature of the scene category to which each sample image in the sample set belongs can be obtained, so that the class-center feature characterizes the features that images of that scene category generally have. Subsequently, when identifying the scene category of an image, the metric distance between the image feature of the image and the class-center feature of each scene category in the sample set can be determined, and it can be determined from the obtained metric distances whether the scene category of the image is one that the pre-trained scene recognition model can recognize.

It should be noted that the dimension of the sample feature is the same as the dimension of the class-center feature.

In a possible implementation, in order to accurately obtain the class-center feature of each scene category, the class-center features can be obtained in the following ways.

Method 1: In order to combine the acquisition of the class-center features with the model training process, for each iteration of training of the original scene recognition model, the sample features of one or more sample images of each scene category in the sample set are obtained through the scene recognition model of the current iteration. The candidate class-center feature corresponding to each scene category is then determined from these sample features, and, based on each candidate class-center feature, the class-center feature of each scene category for the next iteration of training is determined.

In a possible implementation, for each scene category, the sample images of that category that are correctly recognized by the scene recognition model of the current iteration are determined (for convenience of description, referred to as target sample images). Being correctly recognized by the scene recognition model of the current iteration means that the scene category of the sample image determined by the model of the current iteration is the same as the first scene category of that sample image. A weighted average vector is then determined from the sample features of the target sample images and the weight values of the target sample images, and, based on the weighted average vector, the candidate class-center feature corresponding to that scene category is determined.

The weight value of a target sample image can be pre-configured manually, for example by setting the weight value of every target sample image to 1, or the probability value of the target sample image belonging to the scene category can be obtained through the scene recognition model of the current iteration and determined as the weight value of that target sample image.

For example, if the weight values of the target sample images are pre-configured manually, the weighted average vector determined from the sample features of the target sample images and their weight values can be expressed by the following formula:

C_i = \frac{1}{N_i} \sum_{j=1}^{N_i} f_j^{(i)}

where C_i is the class-center feature of scene category i, f_j^{(i)} is the sample feature of the j-th target sample image correctly recognized as scene category i, N_i is the number of target sample images correctly recognized as scene category i, and the weight value of each target sample image is 1.

As another example, if the probability value of a target sample image belonging to the scene category is obtained through the scene recognition model of the current iteration and used as the weight value of that target sample image, the weighted average vector determined from the sample features of the target sample images and their weight values can be expressed by the following formula:

C_i = \frac{\sum_{j=1}^{N_i} p_j^{(i)} f_j^{(i)}}{\sum_{j=1}^{N_i} p_j^{(i)}}

where C_i is the class-center feature of scene category i, f_j^{(i)} is the sample feature of the j-th target sample image correctly recognized as scene category i, N_i is the number of target sample images correctly recognized as scene category i, and p_j^{(i)} is the probability value, obtained through the scene recognition model of the current iteration, of the j-th target sample image belonging to scene category i. The higher this weight value, the more accurate the recognition result of the current iteration's scene recognition model, and the more the sample features of correctly recognized sample images contribute to the class center.

In another possible implementation, for each scene category, the target sample images of that category that are correctly recognized by the scene recognition model of the current iteration are determined; based on a preset target algorithm, the target features in the sample features of those target sample images are obtained, and the candidate class-center feature corresponding to the scene category is determined based on these target features. The target features are principal component features, or normalized features.

The specific process of obtaining the principal component features, or the normalized features, of the image features of an image through a preset target algorithm belongs to the prior art and is not repeated here.

After the candidate class-center feature of each scene category is obtained based on the above embodiments, the class-center feature of each scene category can be determined from its candidate class-center feature. Specifically, the process of determining the class-center feature of each scene category from its candidate class-center feature mainly covers the following two cases.

Case 1: Since the class-center feature of each scene category is generated by random initialization before the original scene recognition model is trained, the class-center features of the current iteration are inaccurate during the first iteration of training. Therefore, in the present application, if the current iteration is determined to be the first iteration, each candidate class-center feature can be directly determined as the class-center feature of the corresponding scene category for the next iteration of training; that is, the class-center feature of each scene category in the current iteration is updated with its candidate class-center feature, which improves the accuracy of the obtained class-center features.

The dimension of the randomly initialized class-center feature is the same as the dimension of the sample feature.

Case 2: In order to make the class-center feature of each scene category determined in each iteration more accurate and its change more stable, in the present application a weight vector is pre-configured, and the weight vector is used to adjust the magnitude of each update of the class-center features. After the candidate class-center feature of each scene category is determined based on the above embodiments, if the current iteration is determined not to be the first iteration, then for each scene category, the difference vector between the candidate class-center feature corresponding to that scene category and the class-center feature corresponding to that scene category determined in the current iteration is computed. The difference vector is then adjusted according to the difference vector and the pre-configured weight vector, and the class-center feature corresponding to that scene category for the next iteration of training is determined from the adjusted difference vector and the class-center feature corresponding to that scene category determined in the current iteration.

In a possible implementation, the product vector of the difference vector and the pre-configured weight vector can be obtained and determined as the adjusted difference vector.

In a possible implementation, a sum vector can be determined from the adjusted difference vector and the currently determined class-center feature corresponding to the scene category, and the sum vector is determined as the class-center feature corresponding to that scene category for the next iteration of training.

For example, based on Case 1 and Case 2 above, the process of determining the class-center feature of each scene category from its candidate class-center feature can be expressed by the following formula:

C_i^{t+1} = \begin{cases} \hat{C}_i, & \text{if the current iteration is the first iteration} \\ C_i^{t} + W \cdot (\hat{C}_i - C_i^{t}), & \text{otherwise} \end{cases}

where C_i^{t+1} is the class-center feature corresponding to scene category i for the next iteration of training, \hat{C}_i is the candidate class-center vector corresponding to scene category i, C_i^{t} is the class-center feature corresponding to scene category i determined in the current iteration, and W is the pre-configured weight vector.

Method 2: The sample feature of each sample image in the sample set can also be obtained through a pre-trained feature extraction model; it will be understood that such a feature extraction model is also a form of feature extraction algorithm. A clustering algorithm, such as a fuzzy clustering algorithm, K-means clustering or a maximum-minimum distance clustering algorithm, is then applied to the sample features to obtain the cluster corresponding to each scene category, where the cluster corresponding to a scene category contains the sample features of that scene category. The class-center feature within each cluster, that is, the class-center feature of each scene category, is then determined from the sample features contained in the cluster corresponding to that scene category.

Any sample feature contained in a cluster may be determined as the class-center feature, or the average vector of the sample features contained in the cluster may be determined as the class-center feature. In a specific implementation, this can be chosen flexibly according to actual needs and is not described in detail here.
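A minimal sketch of Method 2 using scikit-learn's K-means, which is one of the clustering algorithms named above; setting the number of clusters to the number of scene categories and using the cluster centers directly as class-center features are assumptions for illustration.

```python
# Sketch: derive class-center features by clustering pre-extracted sample features.
import numpy as np
from sklearn.cluster import KMeans

def class_centers_by_clustering(sample_features: np.ndarray, num_scene_classes: int):
    """sample_features: [N, D] features produced by a pre-trained extraction model."""
    kmeans = KMeans(n_clusters=num_scene_classes, n_init=10).fit(sample_features)
    return kmeans.cluster_centers_      # one class-center feature per cluster
```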

It should be noted that the process of training the feature extraction model and the process of clustering the sample features with a clustering algorithm belong to the prior art and are not described in detail here.

So that the trained scene recognition model can determine the scene to which an image belongs at the level of image features, in the present application the metric distance between the sample feature of a sample image and the class-center feature of the scene category to which each sample image belongs is taken into account in the process of training the original scene recognition model, and the original scene recognition model is then trained based on these metric distances, the scene probability vector and the scene label. It will be understood that the original scene recognition model is trained based on the scene probability vector and the scene label, the sample feature and the class-center feature corresponding to the first scene category, and the sample feature and the class-center features corresponding to the second scene categories, where a second scene category is any of the scene categories, among those to which the sample images in the sample set belong, other than the first scene category.

In a possible implementation, the metric distance between the sample feature and the class-center feature of the scene category to which each sample image belongs can be determined by the following Euclidean distance formula:

d(x, y_i) = \lVert x - y_i \rVert_2

where d(x, y_i) denotes the metric distance between the sample feature x and the class-center feature y_i of the i-th scene category.

In another possible implementation, since the Euclidean distance reflects how close two vectors are in absolute distance while the cosine similarity reflects how close they are in direction, the metric distance between the sample feature and the class-center feature of the scene category to which each sample image belongs can be determined by the following formula:

d(x, y_i) = \alpha_1 \lVert x - y_i \rVert_2 + \alpha_2 \bigl(1 - \mathrm{cos\_sim}(x, y_i)\bigr)

where d(x, y_i) denotes the metric distance between the sample feature x and the class-center feature of the i-th scene category, cos_sim(x, y_i) denotes the cosine similarity between the sample feature x and the class-center feature of the i-th scene category, α_1 denotes the weight value corresponding to the Euclidean distance, and α_2 denotes the weight value corresponding to the cosine similarity.
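A minimal sketch of the combined metric distance reconstructed above, assuming the cosine term enters as (1 - cosine similarity) so that a smaller value always means a closer match; the weights α_1 and α_2 are placeholders.

```python
# Sketch: weighted mix of Euclidean distance and cosine (dis)similarity.
import torch
import torch.nn.functional as F

def metric_distance(x, center, alpha1=1.0, alpha2=1.0):
    """x, center: [D] feature vectors; smaller return value = more similar."""
    euclidean = torch.norm(x - center, p=2)
    cosine = F.cosine_similarity(x.unsqueeze(0), center.unsqueeze(0), dim=1).squeeze(0)
    return alpha1 * euclidean + alpha2 * (1.0 - cosine)
```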

In a possible implementation, a loss value (for convenience of description, referred to as the first loss value) can be determined based on the scene probability vector and the scene label; a loss value (referred to as the second loss value) can be determined based on the sample feature and the class-center feature corresponding to the first scene category; and a loss value (referred to as the third loss value) can be determined based on the sample feature and the class-center features corresponding to the second scene categories. A comprehensive loss value is then determined from the first loss value and its corresponding first weight value, the second loss value and its corresponding second weight value, and the third loss value and its corresponding third weight value. Based on the comprehensive loss value, the original scene recognition model is trained so that the parameter values of the parameters in the original scene recognition model are updated, thereby obtaining the trained scene recognition model.

In a specific implementation, when training the original scene recognition model with the comprehensive loss value, a gradient descent algorithm can be used to back-propagate the gradients of the parameters in the original scene recognition model, thereby training the model.

It will be understood that the second loss value can be determined from the metric distance between the sample feature and the class-center feature corresponding to the first scene category, and the third loss value can likewise be determined from the metric distance between the sample feature and the class-center features corresponding to the second scene categories.

For example, the comprehensive loss value determined from the first loss value and its first weight value, the second loss value and its second weight value, and the third loss value and its third weight value can be expressed by the following formula:

loss = \omega_1 \, L_{cls}(y, \hat{y}) + \omega_2 \, d(x_i, C_i) + \omega_3 \, d(x_i, C_{cls \neq i})

where L_{cls}(y, \hat{y}) is the first loss value determined from the scene probability vector y and the scene label \hat{y}, d(x_i, C_i) is the metric distance between the sample feature x_i and the class-center feature C_i corresponding to the first scene category, d(x_i, C_{cls \neq i}) is the metric distance between the sample feature x_i and the class-center features C_{cls \neq i} corresponding to the second scene categories, \omega_1 is the first weight value, \omega_2 is the second weight value, and \omega_3 is the third weight value.

In practical applications, the smaller the metric distance between the image features of images of the same scene category, and the larger the metric distance between the image features of images of different scene categories, the better. Therefore, when setting the first weight value, the second weight value and the third weight value, the term associated with the third loss value can be given a negative sign (for example by taking the third weight value as a negative number), while the first and second loss values remain positive. In this way, when the comprehensive loss value is optimized, the optimization direction of the scene recognition model is toward minimizing the first loss value, minimizing the second loss value and maximizing the third loss value, so as to increase the metric distance between different scene categories and reduce the metric distance between sample features of the same scene category. Viewed in feature space, the sample features of different scene categories are pushed apart, while the sample features of the same scene category are pulled together.

By training the scene recognition model with the above comprehensive loss value, the sample features of different scene categories become less similar in feature space while the sample features of the same scene category become more similar, which helps improve the accuracy with which it is determined whether the scene category of an image can be recognized by the scene recognition model.
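A minimal sketch of the comprehensive loss under the assumptions above: a cross-entropy-style first loss from the scene probability vector and label, Euclidean distances to the own-class and other-class centers for the second and third losses, and a negative sign on the third term so that minimizing the total loss pulls features toward their own class center and pushes them away from the others. The weight values are placeholders.

```python
# Sketch: comprehensive loss = w1 * classification loss
#                             + w2 * distance to own class center
#                             - w3 * distance to other class centers.
import torch
import torch.nn.functional as F

def comprehensive_loss(scene_probs, labels, features, class_centers,
                       w1=1.0, w2=0.1, w3=0.1):
    """scene_probs: [N, C]; labels: [N]; features: [N, D]; class_centers: [C, D]."""
    # First loss: classification loss from the scene probability vector and label.
    loss1 = F.nll_loss(torch.log(scene_probs + 1e-8), labels)

    dists = torch.cdist(features, class_centers)            # [N, C] metric distances
    own = dists.gather(1, labels.unsqueeze(1)).squeeze(1)    # distance to own class center
    mask = torch.ones_like(dists).scatter_(1, labels.unsqueeze(1), 0.0)
    other = (dists * mask).sum(1) / mask.sum(1)              # mean distance to other centers

    # Second loss minimized, third loss maximized (hence the minus sign).
    return w1 * loss1 + w2 * own.mean() - w3 * other.mean()
```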

Since the sample set contains a large number of sample images, the above operations are performed for each sample image, and the training of the scene recognition model is completed when a preset convergence condition is satisfied.

The preset convergence condition may be that the sum of the comprehensive loss values obtained in the current iteration of training is smaller than a set loss threshold, that the number of iterations of training reaches a set maximum number of iterations, and so on. This can be set flexibly in a specific implementation and is not specifically limited here.

In a possible implementation, in order to determine the accuracy of the trained scene recognition model, the scene recognition model can be tested before it is released online, in order to determine whether it can correctly handle images of scene categories that are not contained in the sample set, as well as its recognition accuracy on the images it can recognize.

具体实施过程中,获取用于对已训练的场景识别模型进行测试的测试集,该测试集中包含有测试样本图像,以基于测试样本图像对上述已训练的场景识别模型的可靠程度进行验证。在获取该测试集中包含的测试样本图像时,可以重新采集测试集中包含的测试样本图像的方式获取,和/或,也可以通过把样本集中的样本图像分为训练样本图像和测试样本图像的方式获取。需要说明的是,采集测试集中包含的测试样本图像的具体过程,与上述采集样本集中包含的样本图像的过程类似,重复之处不做赘述。In the specific implementation process, a test set for testing the trained scene recognition model is obtained, and the test set includes test sample images, so as to verify the reliability of the above-mentioned trained scene recognition model based on the test sample images. When acquiring the test sample images contained in the test set, the test sample images contained in the test set can be obtained by re-collecting the test sample images, and/or, the sample images in the sample set can also be divided into training sample images and test sample images. Obtain. It should be noted that the specific process of collecting the test sample images included in the test set is similar to the above-mentioned process of collecting the sample images included in the sample set, and the repeated parts will not be repeated.

To ensure that the test can verify whether the scene recognition model can correctly handle images that do not belong to the scene categories contained in the sample set, among the acquired test sample images there must be at least one test sample image whose scene category differs from all the scene categories contained in the sample set.

Each test sample image corresponds to a scene label and a processing label. The scene label identifies the scene category to which the test sample image belongs (for convenience of description, referred to as the third scene category), and the processing label identifies whether the scene categories contained in the sample set include this third scene category.

For each test sample image in the test set, the test sample image is input into the scene recognition model, and the image feature of the test sample image (for convenience of description, referred to as the test sample feature) is obtained through the model. The similarity between the test sample feature and the target class center feature of each scene category is then determined. The target class center feature may be the class center feature of each scene category contained in the sample set at the last iteration of training of the original scene recognition model. Then, based on the similarity threshold and the obtained similarities, it is determined whether the scene categories include the scene category to which the test sample image belongs.

In a possible implementation, the similarity threshold may be configured manually, or, for each target class center feature, the reference similarity between that target class center feature and the other target class center features may be determined, and the similarity threshold is then determined from the reference similarities corresponding to each target class center feature.

在一种可能的实施方式中,若参考相似度是根据欧式距离等度量距离确定的,则可以根据各个参考相似度中的最小值,确定该相似度阈值。In a possible implementation manner, if the reference similarity is determined according to metric distances such as Euclidean distance, the similarity threshold may be determined according to the minimum value of each reference similarity.

在一种可能的实施方式中,若参考相似度是根据余弦相似度等度量距离确定的,则可以根据各个参考相似度中的最大值,确定该相似度阈值。In a possible implementation manner, if the reference similarity is determined according to metric distances such as cosine similarity, the similarity threshold may be determined according to the maximum value of each reference similarity.
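A sketch of deriving the similarity threshold from the pairwise reference similarities between target class center features, following the two cases above (minimum for a distance-style measure, maximum for a similarity-style measure); the function name and arguments are illustrative assumptions.

```python
import itertools
import numpy as np

def similarity_threshold(centers, metric="euclidean"):
    """centers: list of target class center features as numpy arrays.
    Returns the threshold derived from all pairwise reference similarities."""
    refs = []
    for a, b in itertools.combinations(centers, 2):
        if metric == "euclidean":
            refs.append(np.linalg.norm(a - b))          # distance-style reference value
        else:
            refs.append(np.dot(a, b) /                   # cosine-similarity reference value
                        (np.linalg.norm(a) * np.linalg.norm(b)))
    # Minimum for Euclidean-style references, maximum for cosine-style references.
    return min(refs) if metric == "euclidean" else max(refs)
```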

In a possible implementation, if the similarity is determined from a metric distance such as the Euclidean distance, then the smaller the similarity value of two image features, the more similar the two image features are and the more likely they belong to the same scene category; the larger the similarity value, the less similar the two image features are and the more likely they belong to different scene categories. Therefore, when determining, based on each similarity and the similarity threshold, whether the scene categories include the scene category to which the test sample image belongs: if any similarity is smaller than the similarity threshold, the image feature of the test sample image and the target class center feature corresponding to that similarity very likely belong to the same scene category, so it is determined that the scene categories contained in the sample set include the scene category to which the test sample image belongs; if no similarity is smaller than the similarity threshold, the image feature of the test sample image and every target class center feature come from different scene categories, so it is determined that the scene categories contained in the sample set do not include the scene category to which the test sample image belongs.

In a possible implementation, if the similarity is determined from a measure such as cosine similarity, then the smaller the similarity value of two image features, the less similar the two image features are and the less likely they belong to the same scene category; the larger the similarity value, the more similar the two image features are and the more likely they belong to the same scene category. Therefore, when determining, based on each similarity and the similarity threshold, whether the scene categories include the scene category to which the test sample image belongs: if any similarity is greater than the similarity threshold, the image feature of the test sample image and the target class center feature corresponding to that similarity very likely belong to the same scene category, so it is determined that the scene categories contained in the sample set include the scene category to which the test sample image belongs; if no similarity is greater than the similarity threshold, the image feature of the test sample image and every target class center feature come from different scene categories, so it is determined that the scene categories contained in the sample set do not include the scene category to which the test sample image belongs.
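A sketch of the known/unknown decision rule described in the two implementations above, covering both the Euclidean-distance case and the cosine-similarity case; the function and variable names are illustrative assumptions.

```python
import numpy as np

def category_is_known(feature, centers, threshold, metric="euclidean"):
    """feature: image feature of the (test) image; centers: target class center features.
    Euclidean: known if any distance to a class center is below the threshold.
    Cosine:    known if any similarity to a class center is above the threshold."""
    if metric == "euclidean":
        distances = [np.linalg.norm(feature - c) for c in centers]
        return any(d < threshold for d in distances)
    similarities = [np.dot(feature, c) / (np.linalg.norm(feature) * np.linalg.norm(c))
                    for c in centers]
    return any(s > threshold for s in similarities)
```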

Specifically, if it is determined that the scene categories include the scene category to which the test sample image belongs, the scene category of the test sample image can be accurately determined by the scene recognition model, i.e. it is known, and the scene recognition model is then used to determine the scene category to which the test sample image belongs. If it is determined that the scene categories do not include the scene category to which the test sample image belongs, the scene category of the test sample image cannot be accurately determined by the scene recognition model, i.e. it is unknown, and the scene category of the image is not identified further.

Since the test set contains a large number of test sample images, the above operations are performed on each test sample image. Based on the processing result of each test sample image (including whether the scene recognition model treats the scene category of the test sample image as recognizable and, where it does, the scene probability vector obtained for that test sample image), the processing label of each test sample image, and the scene label of each test sample image, the corresponding calculations are performed to determine the evaluation indicators of the scene recognition model, such as accuracy, error rate and precision. If the evaluation indicators of the scene recognition model meet the preset release requirements, the scene recognition model can be released online; if they do not, the scene recognition model can be further trained based on the sample images in the sample set.
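A rough sketch of computing evaluation indicators from the per-image test results described above; the record field names and the accuracy definition used here are illustrative assumptions, not taken from the patent.

```python
def evaluate(results):
    """results: list of dicts, one per test sample image, assumed to hold
    'predicted_known' (model treated the image as a known category),
    'known' (processing label), 'predicted_category' (None if rejected) and
    'category' (scene label)."""
    total = len(results)
    correct = 0
    for r in results:
        if not r["known"]:
            correct += not r["predicted_known"]                    # correctly rejected as unknown
        else:
            correct += r["predicted_category"] == r["category"]    # correctly classified
    accuracy = correct / total
    return {"accuracy": accuracy, "error_rate": 1 - accuracy}
```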

Because the present application simultaneously learns the sample features of the sample images of each scene category in the sample set when training the scene recognition model, thereby obtaining the class center feature of each scene category, the trained model can subsequently be used or tested as follows: the image feature of an input image is obtained through the scene recognition model, and the metric distance between that image feature and the class center feature of each scene category is determined. If the image feature is not close to the class center feature of any scene category, none of the known scene categories includes the scene category to which the image belongs, i.e. the scene category of the image is unknown, and no further scene-type recognition is performed on the image. In this way the image features extracted by the scene recognition model are discriminative, helping the model determine whether the scene category of an image is one it can recognize and avoiding misidentifying the scene category of the image, which would otherwise affect the accuracy of downstream algorithms.

In the process of training the original scene recognition model based on the sample images in the sample set, the scene probability vector corresponding to an input sample image and the sample feature of the sample image can be obtained through the original scene recognition model, so that the original scene recognition model can subsequently be trained based on the scene probability vector and the scene label, the sample feature and the class center feature corresponding to the first scene category, and the sample feature and the class center feature corresponding to the second scene category, to obtain the trained scene recognition model. The trained scene recognition model can thus draw the image features of images within the same scene category towards the class center feature of that category while keeping them away from the class center features of other categories, and, further combined with the feature level of the image, determine whether the scene category of the image can be recognized and, if so, the scene category to which the image belongs. This not only accurately recognizes images of the scene categories contained in the closed image set, but also handles images of scene categories not contained in the closed image set, improving the accuracy, performance and naturalness of the scene recognition model.

实施例2:Example 2:

Taking a display device as the execution subject as an example, the scene recognition model training method provided by the present application is described in detail below through specific embodiments. FIG. 2 is a schematic diagram of a specific scene recognition model training process provided by some embodiments of the present application; the process includes:

S201:构建原始场景识别模型。S201: Build an original scene recognition model.

S202:随机构建每个场景类别的类中心特征。S202: Randomly construct the class center feature of each scene category.

S203:获取样本集中任一样本图像。S203: Obtain any sample image in the sample set.

其中,样本图像对应有场景标签,场景标签用于标识样本图像所归属的第一场景类别。The sample image corresponds to a scene label, and the scene label is used to identify the first scene category to which the sample image belongs.

S204:通过原始场景识别模型,确定样本图像对应的场景概率向量以及样本图像的样本特征。S204: Determine the scene probability vector corresponding to the sample image and the sample feature of the sample image through the original scene recognition model.

其中,场景概率向量包括样本图像分别归属于每个场景类别的概率值。Wherein, the scene probability vector includes the probability values that the sample images belong to each scene category respectively.

The process of determining the scene probability vector corresponding to a sample image and the sample feature of the sample image through the original scene recognition model is described in detail below with reference to FIG. 3, which is a schematic structural diagram of an original scene recognition model provided by some embodiments of the present application. After any sample image is input into the original scene recognition model, the sample feature of the input sample image can be obtained through the feature extraction layer of the model. The sample feature can then be output through the feature output layer of the model. Through the classification output layer of the model, the scene probability vector corresponding to the sample image can be obtained and output based on the sample feature.
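An illustrative sketch of a model with the three parts described above (feature extraction layer, feature output, classification output), assuming PyTorch; the backbone, layer sizes and class name are assumptions, since FIG. 3 does not fix a concrete architecture.

```python
import torch.nn as nn

class SceneRecognitionModel(nn.Module):
    """Minimal structure following Fig. 3: extracts a sample feature and produces
    a scene probability vector from it."""
    def __init__(self, feature_dim=128, num_categories=10):
        super().__init__()
        # Feature extraction layer (any convolutional backbone could be used here).
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feature_dim),
        )
        # Classification output layer: maps the sample feature to scene probabilities.
        self.classifier = nn.Linear(feature_dim, num_categories)

    def forward(self, image):
        feature = self.feature_extractor(image)                  # feature output layer
        scene_probs = self.classifier(feature).softmax(dim=-1)   # scene probability vector
        return scene_probs, feature
```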

由于样本集包含大量的样本图像,对每个样本图像都进行上述操作S203~S204的步骤。Since the sample set includes a large number of sample images, the above steps of operations S203 to S204 are performed for each sample image.

S205:更新当前迭代的每个场景类别的类中心特征。S205: Update the class center feature of each scene category of the current iteration.

If the current iteration is the first iteration, the candidate class center feature corresponding to each scene category is determined from each sample feature obtained in the current iteration, and each candidate class center feature is taken as the class center feature of the corresponding scene category for the next iteration of training.

If the current iteration is not the first iteration, the candidate class center feature corresponding to each scene category is determined from each sample feature obtained in the current iteration; for each scene category, the difference vector between the candidate class center feature of that scene category and the class center feature of that scene category determined in the current iteration is computed; and the class center feature of that scene category for the next iteration of training is determined from the difference vector, the pre-configured weight vector and the class center feature of that scene category determined in the current iteration.
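A sketch of the class-center update in S205 under simplifying assumptions: the candidate class center is taken as the plain mean of the current iteration's sample features per category, and the pre-configured weight vector is reduced to a scalar; variable and function names are illustrative.

```python
import numpy as np

def update_class_centers(sample_features, labels, current_centers, weight, first_iteration):
    """sample_features: list of feature vectors from the current iteration;
    labels: their scene categories; current_centers: dict of current class centers."""
    categories = sorted(set(labels))
    # Candidate class center per category (here: mean of that category's sample features).
    candidates = {c: np.mean([f for f, l in zip(sample_features, labels) if l == c], axis=0)
                  for c in categories}
    if first_iteration:
        # First iteration: candidate centers are used directly for the next iteration.
        return candidates
    updated = {}
    for c in categories:
        # Later iterations: move the current center along the difference vector,
        # scaled by the pre-configured weight.
        diff = candidates[c] - current_centers[c]
        updated[c] = current_centers[c] + weight * diff
    return updated
```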

S206: For each sample image, determine the comprehensive loss value based on the scene probability vector and scene label of the sample image, the sample feature of the sample image together with the class center feature corresponding to the first scene category to which the sample image belongs, and the sample feature of the sample image together with the class center feature corresponding to the second scene category.

S207:确定每个综合损失值的和是否小于预设的损失值阈值,若小于,则执行S208,否则,执行S209。S207: Determine whether the sum of each comprehensive loss value is less than the preset loss value threshold, if it is less than, execute S208, otherwise, execute S209.

S208:获取到训练完成的场景识别模型并保存。S208: Acquire and save the scene recognition model that has been trained.

S209:对原始场景识别模型的参数的参数值进行调整,执行S203。S209: Adjust the parameter values of the parameters of the original scene recognition model, and execute S203.

实施例3:Example 3:

本申请还提供了一种场景识别方法,图4为本申请一些实施例提供的场景识别过程示意图,该过程包括:The present application also provides a scene recognition method. FIG. 4 is a schematic diagram of a scene recognition process provided by some embodiments of the present application, and the process includes:

S401:通过预先训练的场景识别模型,确定待识别图像的图像特征。S401: Determine image features of the image to be recognized by using a pre-trained scene recognition model.

S402:确定所述图像特征,分别与每个场景类别的目标类中心特征的相似度。S402: Determine the similarity between the image features and the target class center feature of each scene category respectively.

S403:根据每个所述相似度以及相似度阈值,确定所述每个场景类别是否包含所述待识别图像所归属的场景类别。S403: Determine whether each scene category includes the scene category to which the to-be-identified image belongs, according to each of the similarity and the similarity threshold.

S404:若确定所述每个场景类别包含所述待识别图像所归属的场景类别,则通过所述场景识别模型,确定所述待识别图像所归属的场景类别。S404: If it is determined that each scene category includes the scene category to which the to-be-identified image belongs, determine the scene category to which the to-be-identified image belongs by using the scene recognition model.

S405:若确定所述每个场景类别不包含所述待识别图像所归属的场景类别,则不继续识别所述待识别图像所归属的场景类别。S405: If it is determined that each scene category does not include the scene category to which the to-be-identified image belongs, do not continue to identify the scene category to which the to-be-identified image belongs.

本申请提供的场景识别方法应用于电子设备,该电子设备可以是如移动终端等智能设备,也可以是服务器。当然,该电子设备还可以是如电视机等显示设备。The scene recognition method provided in this application is applied to an electronic device, and the electronic device may be a smart device such as a mobile terminal, or a server. Of course, the electronic device may also be a display device such as a television.

In one possible application scenario, taking a television as the electronic device and real-time scene classification of the video pictures played by the television as an example: in order to better analyze the video pictures, the television may first perform scene recognition on the images contained in the video, and then process the video pictures according to the scene category to which the video belongs in combination with downstream algorithms, for example by optimizing the picture quality of the video pictures.

In a possible implementation, after the electronic device receives a processing request for scene recognition of an image in a video, the image is determined as the image to be recognized, and the scene recognition method provided in this application is applied to the image to be recognized for corresponding processing.

其中,进行场景识别的电子设备接收到对某一视频中的图像进行场景识别的处理请求,主要包括以下至少一种情况:Wherein, the electronic device that performs scene recognition receives a processing request to perform scene recognition on an image in a video, which mainly includes at least one of the following situations:

Case 1: When scene recognition is required, the user may input a scene recognition service processing request to the smart device; after receiving the service processing request, the smart device sends a processing request for scene recognition of the images in the video to the electronic device that performs scene recognition.

Case 2: When the smart device determines that a video has been recorded, it generates a processing request for scene recognition of the images in the recorded video and sends it to the electronic device that performs scene recognition.

Case 3: When the user needs to perform scene recognition on a specific video, the user may input a service processing request for scene recognition of that video to the smart device; after receiving the service processing request, the smart device sends a processing request for scene recognition of the images in that video to the electronic device that performs scene recognition.

需要说明的是,进行场景识别的电子设备可以与该智能设备相同,也可以不同。It should be noted that the electronic device for scene recognition may be the same as the smart device, or may be different.

As a possible implementation, scene recognition conditions may also be preset, for example: performing scene recognition on the images of a video whenever a video sent by a display device is received; performing scene recognition on a preset number of frame images whenever that preset number of frame images of a video sent by a display device is received; or performing scene recognition on the images of the currently acquired video at a preset period. When the electronic device determines that the current time satisfies the preset scene recognition condition, it performs scene recognition on the images in the video.

In this application, when acquiring images from a video, some video frames may be extracted from the video according to a preset frame-extraction strategy and converted into corresponding images, or all video frames of the video may be taken and converted into corresponding images.
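A minimal sketch of one simple frame-extraction strategy using OpenCV (keep one frame out of every `step` frames; `step=1` corresponds to taking every frame); the patent does not prescribe a particular strategy, so this is only an example.

```python
import cv2

def extract_frames(video_path, step=30):
    """Read a video and return every `step`-th frame as an image."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```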

In order to accurately determine the scene to which an image belongs, a scene recognition model is trained in advance. When the electronic device performing scene recognition needs to perform scene recognition on an image to be recognized, the image can be input into the pre-trained scene recognition model, and the scene category to which the input image belongs is determined through the pre-trained scene recognition model.

The process of training the scene recognition model has been described in the above embodiments and is not repeated here. With this training method, in the process of training the original scene recognition model based on the sample images in the sample set, the scene probability vector corresponding to an input sample image and the sample feature of the sample image can be obtained through the original scene recognition model, so that the original scene recognition model can subsequently be trained based on the scene probability vector and the scene label, the sample feature and the class center feature corresponding to the first scene category, and the sample feature and the class center feature corresponding to the second scene category, to obtain the trained scene recognition model. The trained scene recognition model can thus draw the image features of images within the same scene category towards the class center feature of that category while keeping them away from the class center features of other categories, and, further combined with the feature level of the image, determine whether the scene category of the image can be recognized and, if so, the scene category to which the image belongs. This not only accurately recognizes images of the scene categories contained in the closed image set, but also handles images of scene categories not contained in the closed image set, improving the accuracy, performance and naturalness of the scene recognition model.

需要说明的是,进行训练场景识别模型的电子设备与进行场景识别的电子设备可以相同,也可以不同。It should be noted that the electronic device for training the scene recognition model and the electronic device for scene recognition may be the same or different.

Since the scene category to which the image to be recognized belongs cannot be predicted and is quite diverse, the scene category to which the image actually belongs may not be one of the scene categories contained in the sample set used to train the scene recognition model; if the scene category of such an image were identified directly by the scene recognition model, an incorrect result would be obtained, which would in turn affect the processing of downstream algorithms. Moreover, if the scene category of an image is one that the pre-trained scene recognition model can recognize, the image feature of that image generally has a small metric distance to the image features of the sample images in the sample set that belong to that scene category, and a large metric distance to the image features of the sample images that do not. Therefore, to ensure that the scene recognition model accurately recognizes images of the scene categories contained in the closed image set, in this application the image feature of the image to be recognized can be obtained through the pre-trained scene recognition model, and for each scene category contained in the sample set used to train the model, the target class center feature of that scene category is obtained in advance. After the image to be recognized is input into the pre-trained scene recognition model according to the above embodiments, the image feature of the image to be recognized is obtained through the model; the similarity between that image feature and the target class center feature of each scene category is then determined, and based on each similarity it is determined whether the scene category of the image to be recognized is any of the scene categories contained in the sample set.

其中,该目标类中心特征可以是在对原始场景识别模型的最后一次迭代训练时,样本集中包含的每个场景类别的类中心特征。Wherein, the target class center feature may be the class center feature of each scene category included in the sample set during the last iterative training of the original scene recognition model.

具体实施过程中,通过该场景识别模型中的特征提取层,可以获取输入的待识别图像的图像特征。然后通过该场景识别模型中的特征输出层,可以将该图像特征输出。然后确定该图像特征,分别与每个场景类别的目标类中心特征的相似度。根据每个相似度,确定该待识别图像所归属的场景类别是否为样本集中包含的任一场景类别。In the specific implementation process, through the feature extraction layer in the scene recognition model, the image features of the input image to be recognized can be acquired. Then, through the feature output layer in the scene recognition model, the image features can be output. Then determine the similarity of the image feature with the target class center feature of each scene category, respectively. According to each similarity, it is determined whether the scene category to which the image to be recognized belongs is any scene category included in the sample set.

It should be noted that the method of obtaining the class center feature of each scene category contained in the sample set at the last iteration of training of the original scene recognition model can refer to the obtaining methods in Case 1 and Case 2, and the repeated parts are not described again.

In a possible implementation, the similarity between the image feature and the target class center feature of each scene category may be determined from the metric distance between the image feature and each target class center feature. The metric distance may be obtained by means such as the Euclidean distance, cosine similarity or the KL divergence function.

In a possible implementation, the metric distance between the image feature and the target class center feature of each scene category may be determined by the following Euclidean distance formula:

$$d(x, y_i) = \left\lVert x - y_i \right\rVert_2 = \sqrt{\sum_{j}\left(x_j - y_{i,j}\right)^2}$$

where d(x, y_i) denotes the metric distance between the image feature x and the target class center feature of the i-th scene category.

In another possible implementation, since the Euclidean distance reflects how close two vectors are in absolute distance while the cosine similarity reflects how close they are in direction, the metric distance between the image feature and the target class center feature of each scene category may be determined by the following formula:

$$d(x, y_i) = \alpha_1 \left\lVert x - y_i \right\rVert_2 + \alpha_2 \cdot \mathrm{cos\_sim}(x, y_i)$$

where d(x, y_i) denotes the metric distance between the image feature x and the target class center feature of the i-th scene category, cos_sim(x, y_i) denotes the cosine similarity between the image feature x and the target class center feature of the i-th scene category, α_1 is the weight value corresponding to the Euclidean distance, and α_2 is the weight value corresponding to the cosine similarity.
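The following is a sketch of a weighted combination of the two measures, assuming the additive form shown above; since the patent's original formula is not reproduced here, the exact combination, the function name and the default weights are assumptions.

```python
import numpy as np

def metric_distance(x, y, alpha1=0.5, alpha2=0.5):
    """Combine Euclidean distance and cosine similarity between image feature x
    and target class center feature y, weighted by alpha1 and alpha2."""
    euclidean = np.linalg.norm(x - y)
    cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return alpha1 * euclidean + alpha2 * cos_sim
```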

In a possible implementation, the similarity threshold may be configured manually, or, for each target class center feature, the reference similarity between that target class center feature and the other target class center features may be determined, and the similarity threshold is then determined from the reference similarities corresponding to each target class center feature.

在一种可能的实施方式中,若参考相似度是根据欧式距离等度量距离确定的,则可以根据各个参考相似度中的最小值,确定该相似度阈值。In a possible implementation manner, if the reference similarity is determined according to metric distances such as Euclidean distance, the similarity threshold may be determined according to the minimum value of each reference similarity.

在一种可能的实施方式中,若参考相似度是根据余弦相似度等度量距离确定的,则可以根据各个参考相似度中的最大值,确定该相似度阈值。In a possible implementation manner, if the reference similarity is determined according to metric distances such as cosine similarity, the similarity threshold may be determined according to the maximum value of each reference similarity.

In a possible implementation, if the similarity is determined from a metric distance such as the Euclidean distance, then the smaller the similarity value of two image features, the more similar the two image features are and the more likely they belong to the same scene category; the larger the similarity value, the less similar the two image features are and the more likely they belong to different scene categories. Therefore, when determining, based on each similarity and the similarity threshold, whether the scene categories include the scene category to which the image to be recognized belongs: if any similarity is smaller than the similarity threshold, the image feature of the image to be recognized and the target class center feature corresponding to that similarity very likely belong to the same scene category, so it is determined that the scene categories contained in the sample set include the scene category to which the image to be recognized belongs; if no similarity is smaller than the similarity threshold, the image feature of the image to be recognized and every target class center feature come from different scene categories, so it is determined that the scene categories contained in the sample set do not include the scene category to which the image to be recognized belongs.

In a possible implementation, if the similarity is determined from a measure such as cosine similarity, then the smaller the similarity value of two image features, the less similar the two image features are and the less likely they belong to the same scene category; the larger the similarity value, the more similar the two image features are and the more likely they belong to the same scene category. Therefore, when determining, based on each similarity and the similarity threshold, whether the scene categories include the scene category to which the image to be recognized belongs: if any similarity is greater than the similarity threshold, the image feature of the image to be recognized and the target class center feature corresponding to that similarity very likely belong to the same scene category, so it is determined that the scene categories contained in the sample set include the scene category to which the image to be recognized belongs; if no similarity is greater than the similarity threshold, the image feature of the image to be recognized and every target class center feature come from different scene categories, so it is determined that the scene categories contained in the sample set do not include the scene category to which the image to be recognized belongs.

In a specific implementation, if it is determined that the scene categories include the scene category to which the image to be recognized belongs, the scene category of the image can be accurately determined by the scene recognition model, i.e. it is known, and the pre-trained scene recognition model is then used to determine the scene category to which the image to be recognized belongs. If it is determined that the scene categories do not include the scene category to which the image to be recognized belongs, the scene category of the image cannot be accurately determined by the scene recognition model, i.e. it is unknown, and the scene category of the image is not identified further.

Further, if it is determined that the scene categories include the scene category to which the image to be recognized belongs, the scene category to which the image belongs can be obtained and output through the classification output layer of the scene recognition model, based on the image feature of the image to be recognized.

Since the scene recognition model is trained in advance, and it is obtained by training the original scene recognition model based on the scene probability vector and scene label of the sample images, the sample feature together with the class center feature corresponding to the first scene category of the sample images, and the sample feature together with the class center feature corresponding to the second scene category of the sample images, then in the process of recognizing the scene category of the image to be recognized based on this model, the image features of images within the same scene category can be drawn towards the class center feature of that category while kept away from the class center features of other categories. Further combined with the feature level of the image, it can be determined whether the scene category of the image can be recognized and, if so, the scene category to which the image belongs. This not only accurately recognizes images of the scene categories contained in the closed image set, but also handles images of scene categories not contained in the closed image set, improving the accuracy, performance and naturalness of the scene recognition model.

实施例4:Example 4:

Taking a television as the electronic device that performs scene recognition as an example, the scene recognition method provided by the present application is described in detail below through specific embodiments. FIG. 5 is a schematic diagram of a specific scene recognition process provided by some embodiments of the present application; the process includes:

S501:获取预先训练的场景识别模型。S501: Obtain a pre-trained scene recognition model.

S502:通过预先训练的场景识别模型,确定待识别图像的图像特征。S502: Determine an image feature of an image to be recognized by using a pre-trained scene recognition model.

S503: Determine the similarity between the image feature and the target class center feature of each scene category.

S504:若相似度为欧氏距离,则判断是否存在任一相似度小于相似度阈值,若存在,则执行S505,否则,执行S506。S504: If the similarity is the Euclidean distance, determine whether there is any similarity smaller than the similarity threshold, if so, execute S505, otherwise, execute S506.

S505:通过场景识别模型,确定待识别图像所归属的场景类别。S505: Determine the scene category to which the image to be recognized belongs by the scene recognition model.

S506:不继续识别待识别图像所归属的场景类别。S506: Do not continue to identify the scene category to which the image to be identified belongs.
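Below is a minimal sketch tying steps S501–S506 together for the Euclidean-distance case; the function names are illustrative, and the model is assumed to return the scene probability vector and the image feature as numpy arrays.

```python
import numpy as np

def recognize_scene(image, model, centers, threshold):
    """centers: dict mapping scene category -> target class center feature."""
    scene_probs, feature = model(image)                   # S502: image feature of the image
    distances = {c: np.linalg.norm(feature - center)      # S503: similarity to each target
                 for c, center in centers.items()}        #       class center feature
    if min(distances.values()) < threshold:               # S504: any distance below threshold?
        return int(np.argmax(scene_probs))                # S505: output the scene category
    return None                                           # S506: unknown scene, stop here
```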

实施例5:Example 5:

本申请提供了一种场景识别模型训练装置,图6为本申请一些实施例提供的一种场景识别模型训练装置结构示意图,该装置包括:The application provides a scene recognition model training device, and FIG. 6 is a schematic structural diagram of a scene recognition model training device provided by some embodiments of the application, and the device includes:

获取单元61,用于获取样本集中任一样本图像;其中,所述样本图像对应有场景标签,所述场景标签用于标识所述样本图像所归属的第一场景类别;The obtaining unit 61 is configured to obtain any sample image in the sample set; wherein, the sample image corresponds to a scene label, and the scene label is used to identify the first scene category to which the sample image belongs;

a processing unit 62, configured to determine, through the original scene recognition model, the scene probability vector corresponding to the sample image and the sample feature of the sample image; wherein the scene probability vector includes the probability values that the sample image belongs to each scene category respectively;

训练单元63,用于基于所述场景概率向量以及所述场景标签、所述样本特征以及所述第一场景类别对应的类中心特征、所述样本特征以及第二场景类别对应的类中心特征,对所述原始场景识别模型进行训练,以获取到训练完成的场景识别模型;其中,所述第二场景类别为所述每个场景类别中,除所述第一场景类别之外的场景类别。A training unit 63 is configured to, based on the scene probability vector and the scene label, the sample feature, and the class center feature corresponding to the first scene category, the sample feature, and the class center feature corresponding to the second scene category, The original scene recognition model is trained to obtain the trained scene recognition model; wherein the second scene category is a scene category other than the first scene category in each scene category.

In some possible implementations, the training unit 63 is further configured to: for each iteration of training of the original scene recognition model, obtain, through the scene recognition model of the current iteration, the sample features of the sample images of each scene category in the sample set; determine, from each of the sample features, the candidate class center feature corresponding to each scene category; and determine, based on each candidate class center feature, the class center feature of each scene category for the next iteration of training; or, obtain the sample features of each sample image in the sample set through a pre-trained feature extraction model, and cluster the sample features of each sample image to determine the class center feature of each scene category.

In some possible implementations, the training unit 63 is specifically configured to: for each scene category, determine, among the sample images of that scene category, the target sample images correctly recognized by the scene recognition model of the current iteration; determine a weighted average vector from the sample features of the target sample images and the weight values of the target sample images, and determine, based on the weighted average vector, the candidate class center feature corresponding to that scene category, wherein the weight value of a target sample image is preset or determined from the probability value, obtained through the scene recognition model of the current iteration, that the target sample image belongs to that scene category; or, for each scene category, determine, among the sample images of that scene category, the target sample images correctly recognized by the scene recognition model of the current iteration, obtain target features from the sample features of the target sample images based on a preset target algorithm, and determine, based on the target features, the candidate class center feature corresponding to that scene category, wherein the target features are principal component features or normalized features.

In some possible implementations, the training unit 63 is specifically configured to: if the current iteration is the first iteration, determine each candidate class center feature as the class center feature of each scene category for the next iteration of training; if the current iteration is not the first iteration, for each scene category, determine the difference vector between the candidate class center feature of that scene category and the class center feature of that scene category determined in the current iteration, and determine, from the difference vector, the pre-configured weight vector and the class center feature of that scene category determined in the current iteration, the class center feature of that scene category for the next iteration of training.

In some possible implementations, the training unit 63 is specifically configured to: determine a first loss value, a second loss value and a third loss value, wherein the first loss value is determined from the scene probability vector and the scene label, the second loss value is determined from the sample feature and the class center feature corresponding to the first scene category, and the third loss value is determined from the sample feature and the class center feature corresponding to the second scene category; determine a comprehensive loss value from the first loss value and its corresponding first weight value, the second loss value and its corresponding second weight value, and the third loss value and its corresponding third weight value; and train the original scene recognition model based on the comprehensive loss value.

In the process of training the original scene recognition model based on the sample images in the sample set, the scene probability vector corresponding to an input sample image and the sample feature of the sample image can be obtained through the original scene recognition model, so that the original scene recognition model can subsequently be trained based on the scene probability vector and the scene label, the sample feature and the class center feature corresponding to the first scene category, and the sample feature and the class center feature corresponding to the second scene category, to obtain the trained scene recognition model. The trained scene recognition model can thus draw the image features of images within the same scene category towards the class center feature of that category while keeping them away from the class center features of other categories, and, further combined with the feature level of the image, determine whether the scene category of the image can be recognized and, if so, the scene category to which the image belongs. This not only accurately recognizes images of the scene categories contained in the closed image set, but also handles images of scene categories not contained in the closed image set, improving the accuracy, performance and naturalness of the scene recognition model.

实施例6:Example 6:

图7为本申请一些实施例提供的一种场景识别装置结构示意图,本申请提供了一种场景识别装置,包括:FIG. 7 is a schematic structural diagram of a scene recognition apparatus provided by some embodiments of the present application. The present application provides a scene recognition apparatus, including:

第一处理模块71,用于通过预先训练的场景识别模型,确定待识别图像的图像特征;The first processing module 71 is used to determine the image feature of the image to be recognized through a pre-trained scene recognition model;

第二处理模块72,用于确定所述图像特征,分别与每个场景类别的目标类中心特征的相似度;The second processing module 72 is used to determine the similarity between the image features and the target class center feature of each scene category;

a third processing module 73, configured to determine, based on each similarity and the similarity threshold, whether the scene categories include the scene category to which the image to be recognized belongs; if it is determined that the scene categories include the scene category to which the image to be recognized belongs, determine, through the scene recognition model, the scene category to which the image to be recognized belongs; and if it is determined that the scene categories do not include the scene category to which the image to be recognized belongs, not continue to identify the scene category to which the image to be recognized belongs.

Since the scene recognition model is trained in advance, and it is obtained by training the original scene recognition model based on the scene probability vector and scene label of the sample images, the sample feature together with the class center feature corresponding to the first scene category of the sample images, and the sample feature together with the class center feature corresponding to the second scene category of the sample images, then in the process of recognizing the scene category of the image to be recognized based on this model, the image features of images within the same scene category can be drawn towards the class center feature of that category while kept away from the class center features of other categories. Further combined with the feature level of the image, it can be determined whether the scene category of the image can be recognized and, if so, the scene category to which the image belongs. This not only accurately recognizes images of the scene categories contained in the closed image set, but also handles images of scene categories not contained in the closed image set, improving the accuracy, performance and naturalness of the scene recognition model.

实施例7:Example 7:

FIG. 8 is a schematic structural diagram of an electronic device provided by some embodiments of the present application. On the basis of the above embodiments, the present application further provides an electronic device, as shown in FIG. 8, including: a processor 81, a communication interface 82, a memory 83 and a communication bus 84, wherein the processor 81, the communication interface 82 and the memory 83 communicate with each other through the communication bus 84;

所述存储器83中存储有计算机程序,当所述程序被所述处理器81执行时,使得所述处理器81执行如下步骤:A computer program is stored in the memory 83, and when the program is executed by the processor 81, the processor 81 is caused to perform the following steps:

获取样本集中任一样本图像;其中,所述样本图像对应有场景标签,所述场景标签用于标识所述样本图像所归属的第一场景类别;Obtain any sample image in the sample set; wherein, the sample image corresponds to a scene label, and the scene label is used to identify the first scene category to which the sample image belongs;

通过原始场景识别模型,确定所述样本图像对应的场景概率向量以及所述样本图像的样本特征;其中,所述场景概率向量包括所述样本图像分别归属于每个场景类别的概率值;Determine the scene probability vector corresponding to the sample image and the sample feature of the sample image through the original scene recognition model; wherein, the scene probability vector includes the probability value of the sample image belonging to each scene category respectively;

train the original scene recognition model based on the scene probability vector and the scene label, the sample feature and the class center feature corresponding to the first scene category, and the sample feature and the class center feature corresponding to the second scene category, to obtain the trained scene recognition model; wherein the second scene category is a scene category, among the scene categories, other than the first scene category.

Since the principle by which the above electronic device solves the problem is similar to that of the scene recognition model training method, the implementation of the electronic device may refer to the implementation of the method, and repeated details are not described again.

The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.

The communication interface 82 is used for communication between the above electronic device and other devices.

The memory may include a Random Access Memory (RAM), and may also include a Non-Volatile Memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The above processor may be a general-purpose processor, including a central processing unit, a Network Processor (NP) and the like; it may also be a Digital Signal Processor (DSP), an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.

Since, in the process of training the original scene recognition model based on the sample images in the sample set, the scene probability vector corresponding to the input sample image and the sample feature of the sample image can be obtained through the original scene recognition model, the original scene recognition model can subsequently be trained based on the scene probability vector and the scene label, the sample feature and the class center feature corresponding to the first scene category, and the sample feature and the class center feature corresponding to the second scene category, to obtain a trained scene recognition model. The trained scene recognition model thereby gains the property that the image features of images within the same scene category are drawn toward the class center feature of that scene category while being kept away from the class center features of other scene categories. By further taking the feature level of an image into account, it can be determined whether the scene category of the image is recognizable and, if so, which scene category the image belongs to. This not only allows images of the scene categories contained in a closed image set to be recognized accurately, but also allows images whose scene categories are not contained in the closed image set to be handled, improving the accuracy, performance and naturalness of the scene recognition model.

Embodiment 8:

FIG. 9 is a schematic structural diagram of an electronic device provided by some embodiments of the present application. On the basis of the above embodiments, the present application further provides an electronic device, as shown in FIG. 9, including: a processor 91, a communication interface 92, a memory 93 and a communication bus 94, where the processor 91, the communication interface 92 and the memory 93 communicate with one another through the communication bus 94;

The memory 93 stores a computer program which, when executed by the processor 91, causes the processor 91 to perform the following steps:

determining the image features of an image to be recognized through a pre-trained scene recognition model;

determining the similarity between the image features and the target class center feature of each scene category;

determining, according to each similarity and a similarity threshold, whether the scene categories include the scene category to which the image to be recognized belongs;

if it is determined that the scene categories include the scene category to which the image to be recognized belongs, determining, through the scene recognition model, the scene category to which the image to be recognized belongs;

if it is determined that the scene categories do not include the scene category to which the image to be recognized belongs, not continuing to identify the scene category to which the image to be recognized belongs.
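
Purely as an illustration of the recognition steps described above, a minimal PyTorch sketch is given below. The helper names, the assumption that the model returns both the scene logits and the image features, and the value of the similarity threshold are introduced only for this sketch and are not prescribed by this application.

```python
# Illustrative sketch only: names and the threshold value are assumptions.
import torch
import torch.nn.functional as F

def recognize_scene(model, target_centers, image, similarity_threshold=0.6):
    """Illustrative open-set recognition of the scene category of one image.

    `model` is assumed (hypothetically) to return (scene_logits, image_features);
    `target_centers` has shape [num_classes, feature_dim].
    """
    with torch.no_grad():
        logits, features = model(image.unsqueeze(0))         # image features of the image to be recognized
        features = F.normalize(features, dim=1)
        centers = F.normalize(target_centers, dim=1)

        # Similarity between the image features and the target class center feature of each scene category.
        similarities = (features @ centers.t()).squeeze(0)   # [num_classes]

        # If no similarity reaches the threshold, the known scene categories do not
        # include the category of this image, so recognition is not continued.
        if similarities.max().item() < similarity_threshold:
            return None

        # Otherwise the scene recognition model determines the scene category.
        return int(logits.argmax(dim=1).item())
```

Returning None in this sketch corresponds to not continuing to identify the scene category to which the image to be recognized belongs.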

Since the principle by which the above electronic device solves the problem is similar to that of the scene recognition method, the implementation of the electronic device may refer to the implementation of the method, and repeated details are not described again.

The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.

The communication interface 92 is used for communication between the above electronic device and other devices.

The memory may include a Random Access Memory (RAM), and may also include a Non-Volatile Memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The above processor may be a general-purpose processor, including a central processing unit, a Network Processor (NP) and the like; it may also be a Digital Signal Processor (DSP), an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.

Since a scene recognition model is pre-trained, and this scene recognition model is obtained by training the original scene recognition model based on the scene probability vector of the sample image and the scene label of the sample image, the sample feature of the sample image and the class center feature corresponding to the first scene category of the sample image, and the sample feature of the sample image and the class center feature corresponding to the second scene category of the sample image, when the scene category to which an image to be recognized belongs is identified based on this scene recognition model, the property that the image features of images within the same scene category are drawn toward the class center feature of that scene category while being kept away from the class center features of other scene categories can be exploited. By further taking the feature level of the image into account, it can be determined whether the scene category of the image is recognizable and, if so, which scene category the image belongs to. This not only allows images of the scene categories contained in a closed image set to be recognized accurately, but also allows images whose scene categories are not contained in the closed image set to be handled, improving the accuracy, performance and naturalness of the scene recognition model.

Embodiment 9:

On the basis of the above embodiments, the present application further provides a computer-readable storage medium in which a computer program executable by a processor is stored; when the program runs on the processor, the processor is caused to perform the following steps:

acquiring any sample image in a sample set; where the sample image corresponds to a scene label, and the scene label is used to identify the first scene category to which the sample image belongs;

determining, through an original scene recognition model, a scene probability vector corresponding to the sample image and a sample feature of the sample image; where the scene probability vector includes the probability value of the sample image belonging to each scene category;

training the original scene recognition model based on the scene probability vector and the scene label, the sample feature and the class center feature corresponding to the first scene category, and the sample feature and the class center feature corresponding to the second scene category, to obtain a trained scene recognition model; where the second scene category is any scene category, among all the scene categories, other than the first scene category.
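
The training step above relies on a class center feature for each scene category. Purely as an illustration, one plausible way to maintain these centers, along the lines of the candidate class-center computation and iterative update described elsewhere in this application (see claims 2 to 4 below), is sketched here in PyTorch; the variable names and the smoothing factor alpha are assumptions made only for this example.

```python
# Illustrative sketch only: variable names and the value of alpha are assumptions.
import torch

def update_class_centers(prev_centers, features, labels, predictions, alpha=0.9):
    """Illustrative update of the class center feature of each scene category.

    prev_centers: [num_classes, feature_dim] centers used in the current iteration.
    features:     [num_samples, feature_dim] sample features from the current iteration.
    labels, predictions: [num_samples] ground-truth and predicted scene categories.
    """
    new_centers = prev_centers.clone()
    for c in range(prev_centers.size(0)):
        # Target sample images: samples of category c correctly recognized in this iteration.
        correct = (labels == c) & (predictions == c)
        if correct.any():
            # Candidate class center feature: mean of the correctly recognized sample features.
            candidate = features[correct].mean(dim=0)
            # Move the stored center toward the candidate, i.e. add a weighted difference vector.
            new_centers[c] = prev_centers[c] + (1.0 - alpha) * (candidate - prev_centers[c])
    return new_centers
```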

Since the principle by which the above computer-readable medium solves the problem is similar to that of the scene recognition model training method, the steps implemented after the processor executes the computer program in the computer-readable medium may refer to the implementation of the method, and repeated details are not described again.

Since, in the process of training the original scene recognition model based on the sample images in the sample set, the scene probability vector corresponding to the input sample image and the sample feature of the sample image can be obtained through the original scene recognition model, the original scene recognition model can subsequently be trained based on the scene probability vector and the scene label, the sample feature and the class center feature corresponding to the first scene category, and the sample feature and the class center feature corresponding to the second scene category, to obtain a trained scene recognition model. The trained scene recognition model thereby gains the property that the image features of images within the same scene category are drawn toward the class center feature of that scene category while being kept away from the class center features of other scene categories. By further taking the feature level of an image into account, it can be determined whether the scene category of the image is recognizable and, if so, which scene category the image belongs to. This not only allows images of the scene categories contained in a closed image set to be recognized accurately, but also allows images whose scene categories are not contained in the closed image set to be handled, improving the accuracy, performance and naturalness of the scene recognition model.

Embodiment 10:

On the basis of the above embodiments, the present application further provides a computer-readable storage medium in which a computer program executable by a processor is stored; when the program runs on the processor, the processor is caused to perform the following steps:

determining the image features of an image to be recognized through a pre-trained scene recognition model;

determining the similarity between the image features and the target class center feature of each scene category;

determining, according to each similarity and a similarity threshold, whether the scene categories include the scene category to which the image to be recognized belongs;

if it is determined that the scene categories include the scene category to which the image to be recognized belongs, determining, through the scene recognition model, the scene category to which the image to be recognized belongs;

if it is determined that the scene categories do not include the scene category to which the image to be recognized belongs, not continuing to identify the scene category to which the image to be recognized belongs.

Since the principle by which the above computer-readable medium solves the problem is similar to that of the scene recognition method, the steps implemented after the processor executes the computer program in the computer-readable medium may refer to the implementation of the method, and repeated details are not described again.

Since a scene recognition model is pre-trained, and this scene recognition model is obtained by training the original scene recognition model based on the scene probability vector of the sample image and the scene label of the sample image, the sample feature of the sample image and the class center feature corresponding to the first scene category of the sample image, and the sample feature of the sample image and the class center feature corresponding to the second scene category of the sample image, when the scene category to which an image to be recognized belongs is identified based on this scene recognition model, the property that the image features of images within the same scene category are drawn toward the class center feature of that scene category while being kept away from the class center features of other scene categories can be exploited. By further taking the feature level of the image into account, it can be determined whether the scene category of the image is recognizable and, if so, which scene category the image belongs to. This not only allows images of the scene categories contained in a closed image set to be recognized accurately, but also allows images whose scene categories are not contained in the closed image set to be handled, improving the accuracy, performance and naturalness of the scene recognition model.

Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage and the like) containing computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of the methods, devices (systems) and computer program products according to the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, and the instruction means implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

Obviously, those skilled in the art can make various changes and modifications to the present application without departing from the spirit and scope of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include these modifications and variations.

Claims (10)

1. A method for training a scene recognition model, the method comprising:
acquiring any sample image in a sample set; the sample image corresponds to a scene label, and the scene label is used for identifying a first scene category to which the sample image belongs;
determining a scene probability vector corresponding to the sample image and a sample feature of the sample image through an original scene recognition model; wherein the scene probability vector comprises probability values that the sample image respectively belongs to each scene category;
training the original scene recognition model based on the scene probability vector, the scene label, the sample feature, the class center feature corresponding to the first scene category, the sample feature and the class center feature corresponding to the second scene category to obtain a trained scene recognition model; wherein the second scene category is a scene category other than the first scene category in each of the scene categories.
2. The method of claim 1, wherein obtaining the class center feature corresponding to each scene category comprises:
for each iterative training of the original scene recognition model, acquiring the sample features of the sample images of each scene category in the sample set through the scene recognition model of the current iteration; determining the candidate class center feature respectively corresponding to each scene category according to each sample feature; determining the class center feature of each scene category in the next iterative training based on each candidate class center feature;
obtaining the sample features of each sample image in the sample set through a pre-trained feature extraction model; and clustering the sample features of each sample image to determine the class center feature of each scene category.
3. The method according to claim 2, wherein the determining, according to each sample feature, the candidate class center feature respectively corresponding to each scene category comprises:
for each scene category, determining, among the sample images of the scene category, the target sample images correctly recognized by the scene recognition model of the current iteration; determining a weighted average vector according to the sample features of the target sample images and the weight values of the target sample images, and determining the candidate class center feature corresponding to the scene category based on the weighted average vector; wherein the weight value of a target sample image is preset, or is determined according to a probability value, acquired through the scene recognition model of the current iteration, that the target sample image belongs to the scene category; or
for each scene category, determining, among the sample images of the scene category, the target sample images correctly recognized by the scene recognition model of the current iteration; acquiring a target feature from the sample features of the target sample images based on a preset target algorithm, and determining the candidate class center feature corresponding to the scene category based on the target feature; wherein the target feature is a principal component feature or a normalized feature.
4. The method of claim 3, wherein the determining the class center feature of each scene category in the next iterative training based on each candidate class center feature comprises:
if the current iteration is the first iteration, determining each candidate class center feature as the class center feature of each scene category in the next iterative training;
if the current iteration is not the first iteration, for each scene category, determining a difference vector between the candidate class center feature corresponding to the scene category and the class center feature corresponding to the scene category determined in the current iteration; and determining, according to the difference vector, a pre-configured weight vector and the class center feature corresponding to the scene category determined in the current iteration, the class center feature corresponding to the scene category in the next iterative training.
5. The method of claim 1, wherein the training the original scene recognition model based on the scene probability vector and the scene label, the sample feature and the class center feature corresponding to the first scene category, and the sample feature and the class center feature corresponding to the second scene category comprises:
determining a first loss value, a second loss value and a third loss value; wherein the first loss value is determined based on the scene probability vector and the scene label; the second loss value is determined based on the sample feature and the class center feature corresponding to the first scene category; and the third loss value is determined based on the sample feature and the class center feature corresponding to the second scene category;
determining a comprehensive loss value according to the first loss value and a first weight value corresponding to the first loss value, the second loss value and a second weight value corresponding to the second loss value, and the third loss value and a third weight value corresponding to the third loss value;
and training the original scene recognition model based on the comprehensive loss value.
6. A method for scene recognition, the method comprising:
determining the image features of an image to be recognized through a pre-trained scene recognition model;
determining the similarity between the image features and the target class center feature of each scene category;
determining, according to each similarity and a similarity threshold, whether the scene categories include the scene category to which the image to be recognized belongs;
if it is determined that the scene categories include the scene category to which the image to be recognized belongs, determining, through the scene recognition model, the scene category to which the image to be recognized belongs;
and if it is determined that the scene categories do not include the scene category to which the image to be recognized belongs, not continuing to identify the scene category to which the image to be recognized belongs.
7. An apparatus for training a scene recognition model, the apparatus comprising:
the acquisition unit is used for acquiring any sample image in the sample set; the sample image corresponds to a scene label, and the scene label is used for identifying a first scene category to which the sample image belongs;
the processing unit is used for determining a scene probability vector corresponding to the sample image and a sample feature of the sample image through an original scene recognition model; wherein the scene probability vector comprises probability values that the sample image respectively belongs to each scene category;
a training unit, configured to train the original scene recognition model based on the scene probability vector and the scene label, the sample feature and the class center feature corresponding to the first scene category, and the sample feature and the class center feature corresponding to the second scene category, to obtain a trained scene recognition model; wherein the second scene category is a scene category other than the first scene category in each of the scene categories.
8. A scene recognition apparatus, characterized in that the apparatus comprises:
the first processing module is used for determining the image features of an image to be recognized through a pre-trained scene recognition model;
the second processing module is used for determining the similarity between the image features and the target class center feature of each scene category;
a third processing module, configured to determine, according to each similarity and a similarity threshold, whether the scene categories include the scene category to which the image to be recognized belongs; if it is determined that the scene categories include the scene category to which the image to be recognized belongs, determine, through the scene recognition model, the scene category to which the image to be recognized belongs; and if it is determined that the scene categories do not include the scene category to which the image to be recognized belongs, not continue to identify the scene category to which the image to be recognized belongs.
9. An electronic device, characterized in that the electronic device comprises a processor for implementing the steps of the scene recognition model training method as claimed in any one of claims 1 to 5, or the steps of the scene recognition method as claimed in claim 6, when executing a computer program stored in a memory.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when being executed by a processor, carries out the steps of the scene recognition model training method as defined in any one of claims 1 to 5, or carries out the steps of the scene recognition method as defined in claim 6.
CN202111159087.3A 2021-09-30 2021-09-30 Model training and scene recognition method, device, equipment and medium Pending CN113902944A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111159087.3A CN113902944A (en) 2021-09-30 2021-09-30 Model training and scene recognition method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN113902944A true CN113902944A (en) 2022-01-07

Family

ID=79189662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111159087.3A Pending CN113902944A (en) 2021-09-30 2021-09-30 Model training and scene recognition method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113902944A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020056902A1 (en) * 2018-09-20 2020-03-26 北京字节跳动网络技术有限公司 Method and apparatus for processing mouth image
CN111259967A (en) * 2020-01-17 2020-06-09 北京市商汤科技开发有限公司 Image classification and neural network training method, device, equipment and storage medium
CN111340126A (en) * 2020-03-03 2020-06-26 腾讯云计算(北京)有限责任公司 Article identification method and device, computer equipment and storage medium
CN111507419A (en) * 2020-04-22 2020-08-07 腾讯科技(深圳)有限公司 Training method and device of image classification model

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445760A (en) * 2022-01-24 2022-05-06 腾讯科技(深圳)有限公司 A scene recognition method, system, storage medium and terminal device
CN114494747A (en) * 2022-01-28 2022-05-13 北京百度网讯科技有限公司 Model training method, image processing method, device, electronic device and medium
CN114750147A (en) * 2022-03-10 2022-07-15 深圳甲壳虫智能有限公司 Robot space pose determining method and device and robot
CN114750147B (en) * 2022-03-10 2023-11-24 深圳甲壳虫智能有限公司 Space pose determining method and device of robot and robot
CN117115596A (en) * 2023-10-25 2023-11-24 腾讯科技(深圳)有限公司 Training method, device, equipment and medium of object action classification model
CN117115596B (en) * 2023-10-25 2024-02-02 腾讯科技(深圳)有限公司 Training method, device, equipment and medium of object action classification model

Similar Documents

Publication Publication Date Title
WO2021169723A1 (en) Image recognition method and apparatus, electronic device, and storage medium
EP3767536B1 (en) Latent code for unsupervised domain adaptation
CN111523621B (en) Image recognition method and device, computer equipment and storage medium
CN113902944A (en) Model training and scene recognition method, device, equipment and medium
WO2019100724A1 (en) Method and device for training multi-label classification model
CN108288051B (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
Murray et al. A deep architecture for unified aesthetic prediction
WO2020253127A1 (en) Facial feature extraction model training method and apparatus, facial feature extraction method and apparatus, device, and storage medium
CN111753863B (en) Image classification method, device, electronic device and storage medium
CN110728294A (en) Cross-domain image classification model construction method and device based on transfer learning
US20170061252A1 (en) Method and device for classifying an object of an image and corresponding computer program product and computer-readable medium
CN109086811A (en) Multi-tag image classification method, device and electronic equipment
CN108564102A (en) Image clustering evaluation of result method and apparatus
CN110956615B (en) Image quality evaluation model training method and device, electronic equipment and storage medium
CN112418327A (en) Training method and device of image classification model, electronic equipment and storage medium
CN114612728A (en) Model training method and device, computer equipment and storage medium
US20240193790A1 (en) Data processing method and apparatus, electronic device, storage medium, and program product
WO2022095476A1 (en) Data enhancement method and apparatus, computer device, and computer-readable storage medium
WO2024016949A1 (en) Label generation method and apparatus, image classification model method and apparatus, and image classification method and apparatus
CN117152459A (en) Image detection method, device, computer readable medium and electronic equipment
CN113704534A (en) Image processing method and device and computer equipment
CN113139540A (en) Backboard detection method and equipment
WO2020135054A1 (en) Method, device and apparatus for video recommendation and storage medium
CN113392867A (en) Image identification method and device, computer equipment and storage medium
CN110135428B (en) Image segmentation processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination