CN113902944A - Model training and scene recognition method, device, equipment and medium

Info

Publication number: CN113902944A
Application number: CN202111159087.3A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Prior art keywords: scene, image, sample, category, feature
Inventors: 常河河, 查林, 白晓楠
Assignee (original assignee and applicant): Qingdao Xinxin Microelectronics Technology Co Ltd

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 — Matching criteria, e.g. proximity measures
    • G06F 18/24 — Classification techniques
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks
    • G06N 3/08 — Learning methods

Abstract

The application discloses a model training and scene recognition method, device, equipment and medium. An original scene recognition model can be trained based on the scene probability vector, the scene label, the sample feature together with the class center feature corresponding to the first scene category, and the sample feature together with the class center feature corresponding to the second scene category, so that the trained scene recognition model draws the image features of images of the same scene category close to that category's class center feature while keeping them away from the class center features of other scene categories. By further combining the feature level of the image, scene category images not contained in the closed image set can be processed accurately, improving the precision, performance and naturalness of the scene recognition model.

Description

Model training and scene recognition method, device, equipment and medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a medium for model training and scene recognition.
Background
With the development of multimedia technology, the variety of video images people watch daily keeps growing, and products related to video content are increasingly abundant. Automatically identifying and classifying the scene information of an image helps a machine better understand the image and helps downstream algorithms provide functions for different scenes.
With the development of neural networks in the visual field, their performance on image classification tasks has surpassed that of most traditional algorithms. However, most neural-network-based scene recognition systems are trained and tested on a closed image set; that is, the scene recognition system can only recognize the scene categories contained in the closed image set. In practical applications, however, because the scene categories to which images may belong cannot be exhaustively enumerated, the scene category to which an image currently requiring scene recognition actually belongs may not be among the scene categories contained in the closed image set; if the scene category of such an image is nevertheless recognized by the scene recognition system, an erroneous result may be obtained, thereby affecting the processing of downstream algorithms.
Therefore, there is a need for a scene recognition system that can accurately process not only images of the scene categories contained in the closed image set but also images of scene categories not contained in it.
Disclosure of Invention
The application provides a model training and scene recognition method, device, equipment and medium, which are used for solving the problem that an existing scene recognition system cannot accurately process scene type images which are not included in a closed image set.
The application provides a scene recognition model training method, which comprises the following steps:
acquiring any sample image in a sample set; the sample image corresponds to a scene label, and the scene label is used for identifying a first scene category to which the sample image belongs;
determining scene probability vectors corresponding to the sample images and sample characteristics of the sample images through an original scene recognition model; wherein the scene probability vector comprises probability values that the sample image respectively belongs to each scene category;
training the original scene recognition model based on the scene probability vector, the scene label, the sample feature, the class center feature corresponding to the first scene category, the sample feature and the class center feature corresponding to the second scene category to obtain a trained scene recognition model; wherein the second scene category is a scene category other than the first scene category in each of the scene categories.
The application provides a scene recognition method, which comprises the following steps:
determining the image characteristics of an image to be recognized through a pre-trained scene recognition model;
determining the similarity of the image features and the target class center features of each scene class;
determining whether each scene category comprises the scene category to which the image to be identified belongs according to each similarity and a similarity threshold;
if it is determined that each scene category comprises the scene category to which the image to be identified belongs, determining the scene category to which the image to be identified belongs through the scene identification model;
and if the scene category to which the image to be identified belongs is determined not to be contained in each scene category, the scene category to which the image to be identified belongs is not continuously identified.
The application provides a scene recognition model training device, the device includes:
the acquisition unit is used for acquiring any sample image in the sample set; the sample image corresponds to a scene label, and the scene label is used for identifying a first scene category to which the sample image belongs;
the processing unit is used for determining a scene probability vector corresponding to the sample image and the sample characteristics of the sample image through an original scene recognition model; wherein the scene probability vector comprises probability values that the sample image respectively belongs to each scene category;
a training unit, configured to train the original scene identification model based on the scene probability vector and the scene label, the sample feature, the class center feature corresponding to the first scene category, the sample feature, and the class center feature corresponding to the second scene category, so as to obtain a trained scene identification model; wherein the second scene category is a scene category other than the first scene category in each of the scene categories.
The application provides a scene recognition device, the device includes:
the first processing module is used for determining the image characteristics of the image to be recognized through a pre-trained scene recognition model;
the second processing module is used for determining the similarity between the image characteristics and the target class center characteristics of each scene class;
a third processing module, configured to determine, according to each of the similarities and a similarity threshold, whether each of the scene categories includes a scene category to which the image to be identified belongs; if it is determined that each scene category comprises the scene category to which the image to be identified belongs, determining the scene category to which the image to be identified belongs through the scene identification model; and if the scene category to which the image to be identified belongs is determined not to be contained in each scene category, the scene category to which the image to be identified belongs is not continuously identified.
The present application provides an electronic device comprising a processor for implementing the steps of the scene recognition model training method as described above, or implementing the steps of the scene recognition method as described above, when executing a computer program stored in a memory.
The present application provides a computer readable storage medium, storing a computer program which, when being executed by a processor, carries out the steps of the scene recognition model training method as described above, or carries out the steps of the scene recognition method as described above.
In the process of training the original scene recognition model based on the sample images in the sample set, the scene probability vector corresponding to an input sample image and the sample feature of the sample image can be acquired through the original scene recognition model, so that the original scene recognition model can subsequently be trained based on the scene probability vector and the scene label, the sample feature and the class center feature corresponding to the first scene category, and the sample feature and the class center feature corresponding to the second scene category, to obtain the trained scene recognition model. The trained scene recognition model thus draws the image features of images of the same scene category close to that category's class center feature while keeping them away from the class center features of other scene categories. By further combining the feature level of an image, it is determined whether the scene category of the image can be recognized; when it can, the scene category to which the image belongs is recognized accurately for images of scene categories belonging to the closed image set, while images of scene categories not belonging to the closed image set are also processed correctly, improving the precision, performance and naturalness of the scene recognition model.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of a scene recognition model training process provided in some embodiments of the present application;
FIG. 2 is a schematic diagram of a specific scene recognition model training process according to some embodiments of the present disclosure;
FIG. 3 is a schematic structural diagram of an original scene recognition model according to some embodiments of the present application;
FIG. 4 is a schematic diagram of a scene recognition process provided by some embodiments of the present application;
fig. 5 is a schematic view of a specific scene recognition process provided in some embodiments of the present application;
FIG. 6 is a schematic structural diagram of a scene recognition model training apparatus according to some embodiments of the present application;
fig. 7 is a schematic structural diagram of a scene recognition apparatus according to some embodiments of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to some embodiments of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to some embodiments of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Enabling a scene recognition system to accurately process scene category images that do not belong to the closed image set is essentially an open-set recognition problem: the scene recognition system needs to be able to discover and learn the scene categories to which images of unknown scene categories belong. In short, the open-set recognition problem is an important and challenging problem in the pattern recognition and multimedia communities.
Therefore, in order to realize that the scene recognition system can accurately process the scene type images which are not included in the closed image set, the application provides a model training and scene recognition method, device, equipment and medium.
Example 1:
fig. 1 is a schematic diagram of a scene recognition model training process provided in some embodiments of the present application, where the process includes:
s101: acquiring any sample image in a sample set; the sample image corresponds to a scene label, and the scene label is used for identifying a first scene category to which the sample image belongs.
The scene recognition model training method provided by the application is applied to an electronic device, which may be a smart device such as a mobile terminal, or a server such as a home-hub ("family brain") server. Of course, the electronic device may also be a display device such as a television.
In order to acquire an accurate scene recognition model, an original scene recognition model needs to be trained according to each sample image in a sample set acquired in advance. Wherein, any sample image in the sample set is obtained by the following method: determining the acquired original image as a sample image; and/or after adjusting the pixel values of the pixel points in the acquired original image, determining the adjusted image as a sample image.
For convenience of training the scene recognition model, any sample image in the sample set corresponds to a scene tag, and any scene tag is used to identify a scene type (for convenience of description, referred to as a first scene type) to which the sample image belongs. For example, the scene category is a live scene, a game scene, an eating scene, and the like.
As a possible implementation, if the sample set contains a sufficient number of sample images, that is, a large number of raw images acquired under different environments, the original scene recognition model may be trained according to the sample images in the sample set.
As another possible implementation, in order to ensure the diversity of the sample images and thereby improve the accuracy of the scene recognition model, a large number of adjusted images may be obtained by adjusting the pixel values of the pixel points in the original images, for example by blurring, sharpening or contrast processing the original images, and the adjusted images are determined as sample images for training the original scene recognition model.
Statistically, taking the electronic device as a display device such as a television, the image quality problems commonly found in images acquired in the working scenarios of the display device include blurring, over-exposure, over-darkness, too-low contrast, noise in the picture, and the like; for example, in a live scene the acquired image may have an exposure problem. In order to ensure the diversity of the sample images and improve the accuracy of the scene recognition model, the quality of the acquired original image may be adjusted in advance for the image quality problems possibly existing in images acquired in the working scenarios of the display device. Adjusting the pixel values of the pixel points in the acquired original image comprises at least one of the following:
Mode 1: adjusting the pixel values of the pixel points in the original image through a preset convolution kernel;
Mode 2: adjusting the contrast of the pixel values of the pixel points in the original image;
Mode 3: adjusting the brightness of the pixel values of the pixel points in the original image;
Mode 4: performing noise addition processing on the pixel values of the pixel points in the original image.
For example, if it is desired to obtain adjusted images with different kinds of noise, noise addition processing may be performed on the pixel values of the pixel points in the original image, that is, noise may be randomly added to the original image. In the process of adding noise to the original image, as many types of noise as possible should be used, such as white noise, salt-and-pepper noise and Gaussian noise, so that the sample images in the sample set are more diverse, improving the accuracy and robustness of the scene recognition model.
It should be noted that, the process of processing the pixel values of the pixel points in the original image belongs to the prior art, and is not described herein in detail.
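As an illustration only, the four adjustment modes above can be sketched in Python as follows; the kernel size, the contrast and brightness factors, and the noise parameters are illustrative assumptions rather than values prescribed by the application.

```python
# A minimal sketch of the four adjustment modes, using NumPy only.
import numpy as np

def blur_with_kernel(img: np.ndarray, k: int = 5) -> np.ndarray:
    """Mode 1: convolve with a preset (here: uniform) kernel, per channel."""
    kernel = np.ones((k, k), dtype=np.float32) / (k * k)
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge").astype(np.float32)
    out = np.zeros_like(img, dtype=np.float32)
    for dy in range(k):
        for dx in range(k):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out.clip(0, 255).astype(np.uint8)

def adjust_contrast(img: np.ndarray, factor: float = 0.5) -> np.ndarray:
    """Mode 2: scale pixel values around the mean to lower/raise contrast."""
    mean = img.mean()
    return (mean + factor * (img.astype(np.float32) - mean)).clip(0, 255).astype(np.uint8)

def adjust_brightness(img: np.ndarray, delta: int = -40) -> np.ndarray:
    """Mode 3: shift pixel values to brighten (delta > 0) or darken (delta < 0)."""
    return (img.astype(np.int16) + delta).clip(0, 255).astype(np.uint8)

def add_gaussian_noise(img: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Mode 4: add random noise; white or salt-and-pepper noise work analogously."""
    noise = np.random.normal(0.0, sigma, img.shape)
    return (img.astype(np.float32) + noise).clip(0, 255).astype(np.uint8)

original = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # stand-in image
sample_images = [original, blur_with_kernel(original), adjust_contrast(original),
                 adjust_brightness(original), add_gaussian_noise(original)]
```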
By the method, the sample images are obtained, the number of the sample images in the sample set can be multiplied, a large number of sample images can be quickly obtained, and the difficulty, cost and consumed resources for obtaining the sample images are reduced. The original scene recognition model can be trained subsequently according to more sample images, and the accuracy and robustness of the scene recognition model are improved.
As another possible implementation manner, the acquired original image and an adjusted image obtained by adjusting pixel values of pixel points in the acquired original image may be determined as a sample image. And training the original scene recognition model together according to the original images in the sample set and the adjusted images.
S102: determining scene probability vectors corresponding to the sample images and sample characteristics of the sample images through an original scene recognition model; wherein the scene probability vector comprises probability values that the sample image respectively belongs to each scene category.
After the sample set used for training the original scene recognition model is acquired based on the above embodiment, the original scene recognition model may be trained based on each sample image in the sample set.
In a specific implementation process, any sample image is input into the original scene recognition model. Through the original scene recognition model, the scene probability vector corresponding to the sample image and the image feature (for convenience of description, referred to as the sample feature) of the sample image can be obtained. The scene probability vector comprises the probability values that the sample image respectively belongs to each scene category, and the scene categories are determined by the scene categories to which the sample images in the sample set belong. Any sample feature represents a higher-level, more abstract image feature extracted from the sample image.
The original scene recognition model may be a decision tree, a Logistic Regression (LR), a Naive Bayes (NB) classification algorithm, a Random Forest (RF) algorithm, a Support Vector Machine (SVM) classification algorithm, a Histogram of Oriented Gradients (HOG), a deep learning algorithm, or the like. The deep learning algorithm may include a neural network, a deep neural network, a convolutional neural network (CNN), and the like.
In one possible implementation, for scene recognition by the scene recognition model, the original scene recognition model includes a feature extraction layer, a feature output layer and a classification output layer. The feature extraction layer is connected to the feature output layer, and the feature output layer is connected to the classification output layer. When a sample image is input into the original scene recognition model, the sample feature of the input sample image can be obtained through the feature extraction layer in the original scene recognition model. The sample feature can then be output through the feature output layer in the original scene recognition model. Through the classification output layer in the original scene recognition model, the scene probability vector corresponding to the sample image can be obtained based on the sample feature and output.
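As a hedged sketch of this three-layer structure (not the application's concrete network), the model could look as follows in PyTorch; the ResNet-18 backbone, the 128-dimensional feature output and the number of scene categories are illustrative assumptions.

```python
# A minimal sketch: feature extraction layer -> feature output layer
# -> classification output layer, as described above.
import torch
import torch.nn as nn
import torchvision.models as models

class SceneRecognitionModel(nn.Module):
    def __init__(self, feat_dim: int = 128, num_classes: int = 10):
        super().__init__()
        backbone = models.resnet18(weights=None)     # assumed backbone choice
        backbone.fc = nn.Identity()                  # feature extraction layer
        self.feature_extractor = backbone
        self.feature_output = nn.Linear(512, feat_dim)       # feature output layer
        self.classifier = nn.Linear(feat_dim, num_classes)   # classification output layer

    def forward(self, x: torch.Tensor):
        feat = self.feature_output(self.feature_extractor(x))  # sample feature
        probs = torch.softmax(self.classifier(feat), dim=1)    # scene probability vector
        return probs, feat

model = SceneRecognitionModel()
probs, feats = model(torch.randn(2, 3, 224, 224))
```

Calling the model on a batch thus yields both outputs used during training: the scene probability vector from the classification output layer and the sample feature from the feature output layer.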
S103: training the original scene recognition model based on the scene probability vector, the scene label, the sample feature, the class center feature corresponding to the first scene category, the sample feature and the class center feature corresponding to the second scene category to obtain a trained scene recognition model; wherein the second scene category is a scene category other than the first scene category in each of the scene categories.
Any sample image in the sample set corresponds to a scene label, namely, a scene type to which the sample image actually belongs is identified, so that in the application, after a scene probability vector corresponding to the sample image and a sample feature of the sample image are determined, an original scene recognition model can be trained by adopting the scene recognition model training method provided by the application based on the scene probability vector, the corresponding scene label and the sample feature.
If the scene classification to which a certain image belongs is one that can be recognized by the pre-trained scene recognition model, the image feature of the image generally has a smaller metric distance from the image features of the sample images in the sample set that belong to that scene category, and a greater metric distance from the image features of the sample images that do not belong to that scene category. Based on this, when recognizing the scene classification of an image, the metric distance between the image feature of the image and the image feature of each sample image in the sample set can be determined, and it can then be determined from the obtained metric distances whether the scene classification to which the image belongs is one that can be recognized by the pre-trained scene recognition model.
The measurement distance can be obtained by means of Euclidean distance, cosine similarity, KL divergence function and the like.
Further, since the sample set may include a large number of sample images, if the metric distance between the image feature of a certain image and the image feature of each sample image in the sample set is determined, a large amount of computing resources may be consumed, and the efficiency of the scene recognition system in determining the scene category to which the image belongs may be reduced. Based on this, the class center feature of the scene class to which each sample image belongs in the sample set can be obtained, so that the features generally possessed by the scene class image can be represented through the class center feature. Subsequently, when identifying the scene classification of an image, a metric distance between an image feature of the image and a class center feature of a scene class to which each sample image in the sample set belongs may be determined. And determining whether the scene classification to which the image belongs is a scene classification which can be recognized by a pre-trained scene recognition model according to the acquired measurement distance.
It should be noted that the dimension of the sample feature is the same as the dimension of the class center feature.
In one possible implementation, in order to accurately obtain the class center feature of each scene category, the class center feature of each scene category may be obtained as follows:
mode 1, in order to combine the process of obtaining the class center feature of each scene category into the process of model training, for each iterative training of the original scene recognition model, the sample features of one or more sample images of each scene category in the sample set may be obtained through the currently iterated scene recognition model. And then determining candidate class center features respectively corresponding to each scene category according to each sample feature. And determining the class center feature of each scene category in the next iterative training based on each candidate class center feature.
In one possible implementation, for each scene class, a sample image (denoted as a target sample image for convenience of description) correctly identified by the scene identification model of the current iteration among the sample images of the scene class may be determined. The correct recognition by the scene recognition model of the current iteration may be understood as determining that the scene classification of the sample image is the same as the first scene classification of the sample image by the scene recognition model of the current iteration. And then determining a weighted average vector according to the sample characteristics of the target sample image and the weight value of the target sample image, and determining candidate class center characteristics corresponding to the scene type based on the weighted average vector.
The weight value of the target sample image may be configured in advance in a manual configuration manner, for example, the weight value of each target sample image is set to be 1, and a probability value that the target sample image belongs to the scene category may also be obtained through a scene identification model of the current iteration, and the probability value is determined as the weight value of the target sample image.
For example, if the weight value of each target sample image is configured in advance by manual configuration, e.g. set to 1, the weighted average vector determined from the sample features of the target sample images and their weight values can be expressed by the following formula:

$$C_i = \frac{1}{N_i} \sum_{j=1}^{N_i} x_j^i$$

where $C_i$ is the class center feature of scene category $i$, $x_j^i$ is the sample feature of the $j$-th target sample image correctly identified as scene category $i$, $N_i$ is the number of target sample images correctly identified as scene category $i$, and the weight value of each target sample image is 1.
For another example, if the probability value that a target sample image belongs to the scene category is obtained through the scene recognition model of the current iteration and determined as the weight value of that target sample image, the weighted average vector determined from the sample features of the target sample images and their weight values can be expressed by the following formula:

$$C_i = \frac{\sum_{j=1}^{N_i} p_j^i \, x_j^i}{\sum_{j=1}^{N_i} p_j^i}$$

where $C_i$ is the class center feature of scene category $i$, $x_j^i$ is the sample feature of the $j$-th target sample image correctly identified as scene category $i$, $N_i$ is the number of target sample images correctly identified as scene category $i$, and $p_j^i$ is the probability value, obtained through the scene recognition model of the current iteration, that the $j$-th target sample image belongs to scene category $i$. The higher the weight value, the more accurate the recognition result of the current iteration's scene recognition model, and the more accurately the sample feature of the recognized sample image contributes to the class center.
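A minimal sketch of computing the candidate class center features under the probability-weighted variant above; the tensor shapes and the PyTorch framing are assumptions for illustration.

```python
import torch

def candidate_class_centers(feats, probs, labels, num_classes):
    """feats: (N, D) sample features; probs: (N, C) scene probability vectors;
    labels: (N,) first scene categories. Returns {scene category i: candidate center}."""
    preds = probs.argmax(dim=1)
    centers = {}
    for i in range(num_classes):
        mask = (labels == i) & (preds == i)   # target sample images: correctly recognised
        if mask.any():
            w = probs[mask, i].unsqueeze(1)   # weight = probability of belonging to i
            centers[i] = (w * feats[mask]).sum(dim=0) / w.sum()
    return centers
```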
In another possible implementation manner, for each scene category, a target sample image, which is correctly identified by the currently iterated scene identification model, in the sample images of the scene category may also be determined; and acquiring target features in the sample features of the target sample image based on a preset target algorithm, and determining candidate class center features respectively corresponding to scene categories based on the target features. Wherein the target feature is a principal component feature, or a normalized feature.
The process of acquiring the principal component features among the sample features of an image, or of normalizing the features, through a preset target algorithm belongs to the prior art and is not described here again.
After the candidate class center feature of each scene category is obtained based on the above embodiment, the class center feature of each scene category may be determined according to the candidate class center feature of each scene category. Specifically, the process of determining the class center feature of each scene category according to the candidate class center feature of each scene category mainly includes the following two cases:
in case 1, since the class center feature of each scene category is generated by random initialization before the original scene recognition model is trained, the class center feature of each scene category of the current iteration is inaccurate when the original scene recognition model is trained for the first iteration. Therefore, in the present application, if it is determined that the current iteration is the first iteration, each candidate class center feature may be directly determined as the class center feature of each scene class in the next iteration training, that is, the class center feature of each scene class of the current iteration is updated according to each candidate class center feature, so as to improve the accuracy of the obtained class center feature of each scene class.
And the dimension of the class center feature generated by the random initialization is the same as that of the sample feature.
In case 2, in order to make the class center feature of each scene category determined at each iteration more accurate and its change more stable, a weight vector is preconfigured in the present application to adjust the amplitude of each update of the class center features. After the candidate class center feature of each scene category is determined based on the above embodiment, if it is determined that the current iteration is not the first iteration, then for each scene category, a difference vector between the candidate class center feature corresponding to the scene category and the class center feature corresponding to the scene category determined by the current iteration is determined. The difference vector is then adjusted based on the preconfigured weight vector, and the class center feature corresponding to the scene category in the next iteration of training is determined from the adjusted difference vector and the class center feature corresponding to the scene category determined by the current iteration.
In one possible embodiment, a product vector of the difference vector and a preconfigured weight vector may be obtained, and the product vector may be determined as the adjusted difference vector.
In a possible implementation manner, a sum vector may be determined according to the adjusted difference vector and the currently determined class center feature corresponding to the scene category, and the sum vector is determined as the class center feature corresponding to the scene category in the next iterative training.
For example, based on cases 1 and 2 above, the process of determining the class center feature of each scene category from the candidate class center feature of each scene category can be expressed by the following formula:

$$C_i^{t+1} = C_i^t + W \odot \left( \tilde{C}_i^t - C_i^t \right)$$

where $C_i^{t+1}$ is the class center feature corresponding to scene category $i$ in the next iteration of training, $\tilde{C}_i^t$ is the candidate class center feature corresponding to scene category $i$, $C_i^t$ is the class center feature corresponding to scene category $i$ determined by the current iteration, $W$ is the preconfigured weight vector, and $\odot$ denotes the element-wise product; in the first iteration (case 1), $C_i^{t+1} = \tilde{C}_i^t$ directly.
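A minimal sketch of the class center update across iterations, covering case 1 (first iteration) and case 2 (later iterations); representing the centers as a dictionary of tensors is an illustrative choice.

```python
import torch

def update_centers(centers, candidates, W, first_iteration):
    """centers/candidates: dicts {scene category i: (D,) tensor}; W: (D,) weight vector."""
    if first_iteration:
        # case 1: adopt the candidate centers directly (keep the random
        # initialisation for any category without a candidate this round)
        return {i: candidates.get(i, c) for i, c in centers.items()}
    updated = dict(centers)
    for i, cand in candidates.items():
        diff = cand - centers[i]             # difference vector
        updated[i] = centers[i] + W * diff   # amplitude controlled by the weight vector
    return updated
```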
In mode 2, the sample features of each sample image in the sample set can be obtained through a pre-trained feature extraction model (which may equally be viewed as a feature extraction algorithm). The sample features are then clustered with a clustering algorithm, such as a fuzzy clustering algorithm, the K-means clustering algorithm or the maximum-minimum distance clustering algorithm, to obtain a cluster corresponding to each scene category, where the cluster corresponding to any scene category contains the sample features of that scene category. The class center feature within each cluster, that is, the class center feature of each scene category, is then determined from the sample features contained in the cluster corresponding to that scene category.
Any sample feature included in the cluster may be determined as a class center feature, or an average vector of each sample feature included in the cluster may be determined as a class center feature. In the specific implementation process, the flexible application can be performed according to actual requirements, and detailed description is not given here.
It should be noted that the process of training the feature extraction model and how to cluster the sample features according to the clustering algorithm belong to the prior art, and are not described in detail herein.
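As an illustrative sketch of mode 2, assuming the sample features have already been extracted, scikit-learn's K-means (one of the clustering algorithms mentioned above) directly yields cluster centers, which are the average vectors of the sample features in each cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

def centers_by_clustering(sample_feats: np.ndarray, num_classes: int) -> np.ndarray:
    """sample_feats: (N, D) pre-extracted sample features. Returns (num_classes, D)."""
    km = KMeans(n_clusters=num_classes, n_init=10, random_state=0).fit(sample_feats)
    # cluster_centers_ are the mean vectors of the features in each cluster
    return km.cluster_centers_
```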
In the present application, in the process of training the original scene recognition model, the metric distance between the sample feature of the sample image and the class center feature of each scene category may be taken into account. The original scene recognition model is then trained based on the metric distances, the scene probability vector, and the scene label. It can be understood that the original scene recognition model is trained based on the scene probability vector and the scene label, the sample feature and the class center feature corresponding to the first scene category, and the sample feature and the class center feature corresponding to the second scene category. The second scene category is a scene category, among the scene categories to which the sample images in the sample set belong, other than the first scene category.
In one possible implementation, when determining the metric distance between the sample feature and the class center feature of each scene category, the metric distance may be determined by the following Euclidean distance formula:

$$d(x, y_i) = \left\| x - y_i \right\|_2 = \sqrt{\sum_k \left( x_k - y_{i,k} \right)^2}$$

where $d(x, y_i)$ represents the metric distance between the sample feature $x$ and the class center feature $y_i$ of the $i$-th scene category.
In another possible implementation, since the Euclidean distance reflects how close two vectors are in absolute distance while the cosine similarity reflects how close they are in direction, the two may be combined. When determining the metric distance between the sample feature and the class center feature of each scene category, the metric distance may then be determined by the following formula:

$$d(x, y_i) = \alpha_1 \, d_{euc}(x, y_i) + \alpha_2 \, \cos\_sim(x, y_i)$$

where $d(x, y_i)$ represents the metric distance between the sample feature $x$ and the class center feature of the $i$-th scene category, $d_{euc}(x, y_i)$ is the Euclidean distance between them, $\cos\_sim(x, y_i)$ represents the cosine similarity between the sample feature $x$ and the class center feature of the $i$-th scene category, $\alpha_1$ represents the weight value corresponding to the Euclidean distance, and $\alpha_2$ represents the weight value corresponding to the cosine similarity.
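A hedged sketch of the combined metric; writing the cosine term as 1 − cos_sim, so that a smaller value always means "closer", is an assumption about the sign convention rather than something fixed by the application.

```python
import torch
import torch.nn.functional as F

def metric_distance(x, y, alpha1=1.0, alpha2=1.0):
    """x: (D,) sample feature; y: (D,) class center feature."""
    euclidean = torch.norm(x - y, p=2)
    cosine = F.cosine_similarity(x.unsqueeze(0), y.unsqueeze(0)).squeeze()
    # assumed convention: convert the similarity into a distance-like term
    return alpha1 * euclidean + alpha2 * (1.0 - cosine)
```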
In one possible implementation, a loss value (for convenience of explanation, denoted as a first loss value) may be determined based on the scene probability vector and the scene tag; determining a loss value (for convenience of description, recorded as a second loss value) based on the sample feature and the class center feature corresponding to the first scene category; based on the sample features and the class center features corresponding to the second scene type, a loss value (referred to as a third loss value for convenience of description) is determined. And then determining a comprehensive loss value according to the first loss value and a first weight value corresponding to the first loss value, the second loss value and a second weight value corresponding to the second loss value, and the third loss value and a third weight value corresponding to the third loss value. And training the original scene recognition model based on the comprehensive loss value so as to update parameter values of parameters in the original scene recognition model, thereby obtaining the trained scene recognition model.
In specific implementation, when the original scene recognition model is trained according to the comprehensive loss value, a gradient descent algorithm can be adopted to perform back propagation on the gradient of the parameter in the original scene recognition model, so that the original scene recognition model is trained.
It will be appreciated that the second loss value may be determined by a metric distance between the sample feature and the class-centered feature corresponding to the first scene class, and the third loss value may also be determined by a metric distance between the sample feature and the class-centered feature corresponding to the second scene class.
For example, the integrated loss value determined from the first loss value and its corresponding first weight value, the second loss value and its corresponding second weight value, and the third loss value and its corresponding third weight value may be determined by the following formula:

$$L = \omega_1 \, L_1\!\left(y, \hat{y}\right) + \omega_2 \, d\!\left(x_i, C_i\right) + \omega_3 \, d\!\left(x_i, C_{cls \neq i}\right)$$

where $L_1(y, \hat{y})$ is the first loss value determined from the scene probability vector $y$ and the scene label $\hat{y}$, $d(x_i, C_i)$ is the metric distance between the sample feature $x_i$ and the class center feature $C_i$ corresponding to the first scene category, $d(x_i, C_{cls \neq i})$ is the metric distance between the sample feature $x_i$ and the class center features $C_{cls \neq i}$ corresponding to the second scene categories, $\omega_1$ is the first weight value, $\omega_2$ is the second weight value, and $\omega_3$ is the third weight value.
In practical applications, the smaller the metric distance between the image features of images of the same scene category, and the larger the metric distance between the image features of images of different scene categories, the better. Therefore, when the first weight value, the second weight value and the third weight value are set, the third weight value may be a negative number while the first weight value and the second weight value are positive numbers, so that when the integrated loss value is minimized, the optimization direction of the scene recognition model is towards minimizing the first loss value, minimizing the second loss value and maximizing the third loss value, thereby increasing the metric distance between sample features of different scene categories and decreasing the metric distance between sample features of the same scene category. Viewed from the feature space, the distributions of sample features of different scene categories are dispersed from one another, while the sample features of the same scene category are gathered together.
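A minimal sketch of the integrated loss under the sign convention discussed above; the weight values are illustrative assumptions, and the Euclidean variant of the metric distance is used for brevity.

```python
import torch
import torch.nn.functional as F

def integrated_loss(probs, label, feat, centers, w1=1.0, w2=0.1, w3=-0.05):
    """probs: (C,) scene probability vector; label: int first scene category;
    feat: (D,) sample feature; centers: (C, D) class center features."""
    l1 = F.nll_loss(torch.log(probs + 1e-8).unsqueeze(0), torch.tensor([label]))
    dists = torch.norm(feat.unsqueeze(0) - centers, dim=1)   # distance to every center
    l2 = dists[label]                                        # own category's center
    l3 = dists[torch.arange(len(dists)) != label].mean()     # other categories' centers
    # negative w3 means minimizing the total maximizes the third loss value
    return w1 * l1 + w2 * l2 + w3 * l3
```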
By training the scene recognition model through the comprehensive loss value, the similarity of the sample characteristics of different scene types in the feature space is reduced, and the similarity of the sample characteristics of the same scene type in the feature space is increased, so that the accuracy of determining whether the scene type of a certain image can be recognized by the scene recognition model is improved.
Because the sample set contains a large number of sample images, the above operation is performed on each sample image, and when a preset convergence condition is met, the training of the scene recognition model is completed.
The preset convergence condition may be, for example, that the sum of the integrated loss values obtained in the current iteration of training is smaller than a set loss-value threshold, that the number of training iterations reaches a set maximum number, and the like. This can be set flexibly in specific implementations and is not particularly limited here.
In one possible implementation, to determine the accuracy of the trained scene recognition model, before the scene recognition model is released online, the scene recognition model may be tested to determine whether the scene recognition model can accurately process images that do not belong to the scene category contained in the sample set, and the accuracy of the scene recognition model's recognition of the recognizable images.
In a specific implementation process, a test set for testing the trained scene recognition model is obtained, and the test set comprises a test sample image so as to verify the reliability of the trained scene recognition model based on the test sample image. When the test sample image contained in the test set is obtained, the test sample image contained in the test set may be re-acquired, and/or the sample image contained in the sample set may be divided into a training sample image and a test sample image. It should be noted that the specific process of acquiring the test sample images included in the test set is similar to the process of acquiring the sample images included in the sample set, and repeated parts are not repeated.
In order to be able to test whether the scene recognition model can accurately process images of scene categories that do not belong to the sample set, among the acquired test sample images there must be at least one test sample image whose scene category is different from the scene categories contained in the sample set.
Each test sample image corresponds to a scene tag and a processing tag, the scene tag is used to identify a scene category (for convenience of description, denoted as a third scene category) to which the test sample image belongs, and the processing tag is used to identify whether the scene categories included in the sample set include the third scene category.
For each test sample image in the test set, the test sample image is input into a scene recognition model. Image features of the test sample image (for convenience of description, referred to as test sample features) are obtained by the scene recognition model. And then determining the similarity of the test sample characteristics and the target class center characteristics of each scene class. The target class center feature may be a class center feature of each scene class included in the sample set during the last iterative training of the original scene recognition model. And then determining whether each scene category comprises the scene category to which the test sample image belongs according to the similarity threshold and the acquired similarity.
In one possible implementation, the similarity threshold may be configured manually; alternatively, for each target class center feature, a reference similarity between that target class center feature and each of the other target class center features may be determined, and the similarity threshold is then determined from the reference similarities corresponding to the target class center features.
In a possible implementation manner, if the reference similarity is determined according to a metric distance such as euclidean distance, the similarity threshold may be determined according to a minimum value of the respective reference similarities.
In a possible implementation manner, if the reference similarity is determined according to a metric distance such as cosine similarity, the similarity threshold may be determined according to a maximum value of the respective reference similarities.
In one possible implementation, if the similarity is determined by a metric distance such as the Euclidean distance, a smaller similarity value indicates a higher similarity between the two image features, which are then more likely to belong to the same scene category, while a greater similarity value indicates a lower similarity, making it more likely that the two image features do not belong to the same scene category. Therefore, when determining from each similarity and the similarity threshold whether the scene categories contain the scene category to which the test sample image belongs: if any similarity is smaller than the similarity threshold, the image feature of the test sample image and the target class center feature corresponding to that similarity most likely belong to the same scene category, and it is determined that the scene categories contained in the sample set contain the scene category to which the test sample image belongs; if no similarity is smaller than the similarity threshold, the image feature of the test sample image and each target class center feature come from different scene categories, and it is determined that the scene categories contained in the sample set do not contain the scene category to which the test sample image belongs.
In one possible implementation, if the similarity is determined by a measure such as the cosine similarity, a smaller similarity value indicates a lower similarity between the two image features, which are then less likely to belong to the same scene category, while a greater similarity value indicates a higher similarity, making it more likely that the two image features belong to the same scene category. Therefore, when determining from each similarity and the similarity threshold whether the scene categories contain the scene category to which the test sample image belongs: if any similarity is greater than the similarity threshold, the image feature of the test sample image and the target class center feature corresponding to that similarity most likely belong to the same scene category, and it is determined that the scene categories contained in the sample set contain the scene category to which the test sample image belongs; if no similarity is greater than the similarity threshold, the image feature of the test sample image and each target class center feature come from different scene categories, and it is determined that the scene categories contained in the sample set do not contain the scene category to which the test sample image belongs.
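A hedged sketch of the cosine-similarity variant: the threshold is derived from the reference similarities between the target class center features (their maximum, as described above), and an image is treated as belonging to a known scene category only if some similarity exceeds the threshold.

```python
import torch
import torch.nn.functional as F

def cosine_threshold(centers: torch.Tensor) -> torch.Tensor:
    """centers: (C, D). Threshold = max cosine similarity between distinct centers."""
    normed = F.normalize(centers, dim=1)
    sims = normed @ normed.T
    sims.fill_diagonal_(-1.0)                 # ignore self-similarity
    return sims.max()

def is_known_category(feat: torch.Tensor, centers: torch.Tensor, threshold) -> bool:
    """True if some class center is similar enough to the image feature."""
    sims = F.cosine_similarity(feat.unsqueeze(0), centers)
    return bool((sims > threshold).any())
```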
Specifically, if it is determined that each scene type includes the scene type to which the test sample image belongs, it is indicated that the scene type to which the test sample image belongs can be accurately determined by the scene recognition model, that is, the scene type to which the test sample image belongs is known, the scene type to which the test sample image belongs is determined by the scene recognition model; if it is determined that each scene type does not include the scene type to which the test sample image belongs, it is indicated that the scene type to which the test sample image belongs is not accurately determinable by the scene recognition model, that is, the scene type to which the test sample image belongs is unknown, and the scene type to which the image belongs is not continuously recognized.
Since the test set contains a large number of test sample images, the above operation is performed for each test sample image. Based on the processing result of each test sample image (including whether the scene recognition model recognized the scene category to which the test sample image belongs and, when it did, the scene probability vector obtained for the test sample image), the processing label of each test sample image, and the scene label of each test sample image, corresponding calculations are performed to determine the evaluation indexes of the scene recognition model, such as accuracy, error rate and precision. If each evaluation index of the scene recognition model is determined to meet the preset release requirement, the scene recognition model is released online. If the evaluation indexes of the scene recognition model are determined not to meet the preset release requirement, the scene recognition model may be further trained based on the sample images in the sample set.
According to the above method, when the scene recognition model is trained, the sample features of the sample images of each scene category in the sample set are learned at the same time, so that the class center feature of each scene category is obtained. When the trained scene recognition model is subsequently used or tested, the image feature of an input image can be obtained through the scene recognition model, and the metric distance between that image feature and the class center feature of each scene category is then determined. If the image feature is not close to the class center feature of any scene category, it is determined that the known scene categories do not include the scene category to which the image belongs, that is, the scene category of the image is unknown, and no further scene-category recognition is performed on the image. The image features extracted by the scene recognition model are thus discriminative, which helps the scene recognition model determine whether the scene category to which an image belongs is recognizable, and avoids the impact on the accuracy of downstream algorithms that mistakenly identifying the scene category of the image would cause.
In the process of training the original scene recognition model based on the sample images in the sample set, the scene probability vector corresponding to an input sample image and the sample feature of the sample image can be acquired through the original scene recognition model, so that the original scene recognition model can subsequently be trained based on the scene probability vector and the scene label, the sample feature and the class center feature corresponding to the first scene category, and the sample feature and the class center feature corresponding to the second scene category, to obtain the trained scene recognition model. The trained scene recognition model thus draws the image features of images of the same scene category close to that category's class center feature while keeping them away from the class center features of other scene categories. By further combining the feature level of an image, it is determined whether the scene category of the image can be recognized; when it can, the scene category to which the image belongs is recognized accurately for images of scene categories belonging to the closed image set, while images of scene categories not belonging to the closed image set are also processed correctly, improving the precision, performance and naturalness of the scene recognition model.
Example 2:
taking an execution subject as a display device as an example, the following describes in detail the scene recognition model training method provided by the present application through specific embodiments, and fig. 2 is a schematic diagram of a specific scene recognition model training process provided by some embodiments of the present application, where the process includes:
s201: and constructing an original scene recognition model.
S202: and randomly constructing class center characteristics of each scene category.
S203: an image of any sample in the sample set is acquired.
The sample images correspond to scene labels, and the scene labels are used for identifying the first scene category to which the sample images belong.
S204: and determining a scene probability vector corresponding to the sample image and the sample characteristics of the sample image through the original scene recognition model.
The scene probability vector comprises probability values of sample images belonging to each scene category respectively.
A process of determining a scene probability vector corresponding to a sample image and a sample feature of the sample image by using an original scene recognition model is described in detail below with reference to fig. 3, where fig. 3 is a schematic structural diagram of an original scene recognition model according to some embodiments of the present disclosure. After any sample image is input into the original scene recognition model, the sample characteristics of the input sample image can be obtained through a characteristic extraction layer in the original scene recognition model. The sample features can then be output through a feature output layer in the original scene recognition model. Through a classification output layer in the original scene recognition model, based on the sample characteristics, a scene probability vector corresponding to the sample image can be obtained and output.
Since the sample set contains a large number of sample images, the steps of the above-described operations S203 to S204 are performed for each sample image.
S205: and updating the class center feature of each scene category of the current iteration.
If the current iteration is the first iteration, determining candidate class center features respectively corresponding to each scene class according to each sample feature obtained by the current iteration; and determining each candidate class center feature as the class center feature of each scene category in the next iterative training.
If the current iteration is not the first iteration, determining candidate class center features respectively corresponding to each scene category according to each sample feature obtained by the current iteration; for each scene category, determining the difference vector between the candidate class center feature corresponding to the scene category and the class center feature corresponding to the scene category determined by the current iteration; and determining the class center feature corresponding to the scene category in the next iteration training according to the difference vector, the preconfigured weight vector and the class center feature corresponding to the scene category determined by the current iteration.
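As a sketch of the S205 update rule, assuming the candidate class center feature has already been computed from the sample features of the current iteration and the preconfigured weight vector is reduced to a scalar weight:

```python
from typing import Optional
import numpy as np

def update_class_center(candidate_center: np.ndarray,
                        current_center: Optional[np.ndarray],
                        weight: float = 0.1) -> np.ndarray:
    """First iteration: adopt the candidate directly. Later iterations:
    move the current center along the difference vector toward the candidate."""
    if current_center is None:  # first iteration
        return candidate_center
    difference = candidate_center - current_center  # difference vector
    return current_center + weight * difference
```

With a small weight the class centers change smoothly between iterations, which keeps the pull and push losses in S206 stable.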
S206: and for each sample image, determining a comprehensive loss value according to the scene probability vector and the scene label of the sample image, the sample feature of the sample image, the class center feature corresponding to the first scene category to which the sample image belongs, the sample feature of the sample image and the class center feature corresponding to the second scene category of the sample image.
S207: and determining whether the sum of each comprehensive loss value is smaller than a preset loss value threshold, if so, executing S208, otherwise, executing S209.
S208: and acquiring and storing the trained scene recognition model.
S209: the parameter values of the parameters of the original scene recognition model are adjusted, and S203 is performed.
Example 3:
fig. 4 is a schematic view of a scene recognition process provided in some embodiments of the present application, where the process includes:
S401: And determining the image characteristics of the image to be recognized through a pre-trained scene recognition model.
S402: and determining the similarity between the image features and the target class center features of each scene class.
S403: and determining whether each scene category comprises the scene category to which the image to be identified belongs according to each similarity and a similarity threshold value.
S404: and if the scene category of each scene category comprises the scene category to which the image to be identified belongs, determining the scene category to which the image to be identified belongs through the scene identification model.
S405: and if the scene category to which the image to be identified belongs is determined not to be contained in each scene category, the scene category to which the image to be identified belongs is not continuously identified.
The scene recognition method is applied to electronic equipment, and the electronic equipment can be intelligent equipment such as a mobile terminal and can also be a server. Of course, the electronic device may also be a display device such as a television.
In a possible application scenario, take the electronic device being a television, and the scenario of performing real-time scene classification on the video pictures played by the television, as an example. In order to better analyze the video pictures, the television side may first perform scene recognition on the images contained in the video, so that, in combination with downstream algorithms, the video pictures can be processed according to the scene type they belong to, for example to optimize the image quality of the video pictures.
In a possible implementation manner, after the electronic device receives a processing request for scene recognition of an image in a certain video, the image is determined as the image to be recognized, and corresponding processing is performed by using the scene recognition method provided by the present application based on the image to be recognized.
The situations in which the electronic device performing scene recognition receives a processing request for scene recognition of an image in a certain video mainly include at least one of the following:
In the first situation, when scene recognition is needed, a user can input a service processing request for scene recognition to the intelligent device; after receiving the service processing request, the intelligent device can send a processing request for scene recognition of images in the video to the electronic device performing scene recognition.

In the second situation, when the intelligent device determines to record a video, it generates a processing request for scene recognition of images in the recorded video and sends the request to the electronic device performing scene recognition.

In the third situation, when a user needs to perform scene recognition on a specific video, a service processing request for scene recognition of that video can be input to the intelligent device; after receiving the service processing request, the intelligent device can send a processing request for scene recognition of images in the video to the electronic device performing scene recognition.
The electronic device for scene recognition may be the same as or different from the smart device.
As a possible implementation, a scene recognition condition may also be preset, for example: performing scene recognition on the images in a video when the video sent by the display device is received; performing scene recognition on a preset number of frame images when that preset number of frame images of a certain video sent by the display device has been received; or performing scene recognition on images in the currently acquired video according to a preset period. When the electronic device determines that the current time meets the preset scene recognition condition, it recognizes the scenes of the images in the video.
In the present application, when images in a video are acquired, a part of the video frames can be extracted from the video according to a preset frame extraction strategy and converted into corresponding images; alternatively, all video frames in the video can be converted into corresponding images in a full-frame extraction mode.
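Purely as an illustration of such a frame extraction strategy, a sketch using OpenCV (the library and the step size are assumptions, not part of this application) could be:

```python
import cv2

def extract_frames(video_path: str, step: int = 30):
    """Convert every `step`-th video frame to an image; step=1 keeps all frames."""
    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames
```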
In order to accurately determine a scene to which an image belongs, a scene recognition model is trained in advance. When the electronic device performing scene recognition needs to perform scene recognition on a certain image to be recognized, the image to be recognized may be input to a pre-trained scene recognition model, so as to determine, through the pre-trained scene recognition model, a scene category to which the input image to be recognized belongs.
The process of training the scene recognition model is described in the above embodiments, and repeated parts are not described again. With this training method, in the process of training the original scene recognition model on the sample images in the sample set, the original scene recognition model outputs, for each input sample image, the scene probability vector and the sample features of that image. The model is then trained based on the scene probability vector and the scene label, the sample features and the class center feature corresponding to the first scene category, and the sample features and the class center features corresponding to the second scene categories, to obtain the trained scene recognition model. As a result, the image features of images in the same scene category are close to the class center feature of that category and far from the class center features of other categories, so that, at the feature level, it can be determined whether the scene category of an image is recognizable, and the category is determined only when it is. In this way, the scene category images contained in the closed image set are recognized accurately, while images whose scene categories do not belong to the closed image set can still be handled, improving the accuracy, performance and robustness of the scene recognition model.
The electronic device that performs the training of the scene recognition model may be the same as or different from the electronic device that performs the scene recognition.
Since the scene category to which an image to be recognized belongs is unpredictable and diverse, the category it actually belongs to may not be among the scene categories contained in the sample set used to train the scene recognition model. If the scene category of such an image were recognized directly by the scene recognition model, an erroneous result could be obtained, affecting the processing of downstream algorithms. In addition, if the scene category to which a certain image belongs is one that the pre-trained scene recognition model can recognize, the image features of that image are generally at a smaller metric distance from the image features of the sample images of that scene category in the sample set, and at a greater metric distance from the image features of sample images of other scene categories. Therefore, to ensure that the scene recognition model accurately recognizes the scene category images contained in the closed image set, in the present application the image features of the image to be recognized can be acquired through the pre-trained scene recognition model, and the target class center features of the scene categories contained in the sample set used to train the model are acquired in advance. After the image to be recognized is input into the pre-trained scene recognition model as described in the above embodiments, its image features are obtained through the model. Then the similarity between the image features and the target class center feature of each scene category is determined, and whether the scene category to which the image belongs is any scene category contained in the sample set is determined according to the similarities.
The target class center feature may be a class center feature of each scene class included in the sample set during the last iterative training of the original scene recognition model.
In a specific implementation process, the image features of the input image to be recognized can be obtained through the feature extraction layer in the scene recognition model. The image features may then be output through a feature output layer in the scene recognition model. Then, the similarity of the image features and the target class center features of each scene class is determined. And determining whether the scene category to which the image to be identified belongs is any scene category contained in the sample set according to each similarity.
It should be noted that, when the original scene recognition model is trained in the last iteration, the method for acquiring the class center feature of each scene category contained in the sample set may refer to the acquisition methods in case 1 and case 2 above; repeated parts are not described again.
In one possible implementation, the similarity between the image features and the target class center feature of each scene category may be determined according to the metric distance between them. The metric distance can be obtained by means of the Euclidean distance, cosine similarity, KL divergence, and the like.
In one possible implementation, when determining the metric distance between the image features and the target class center feature of each scene category, the metric distance may be determined by the following Euclidean distance formula:

$d(x, y_i) = \lVert x - y_i \rVert_2 = \sqrt{\sum_{j}(x_j - y_{i,j})^2}$

where $d(x, y_i)$ represents the metric distance between the image feature $x$ and the target class center feature $y_i$ of the i-th scene category.
In another possible implementation, since the Euclidean distance represents how close two vectors are in absolute distance while the cosine similarity represents how close they are in direction, the two can be combined. When determining the metric distance between the image features and the target class center feature of each scene category, the metric distance may be determined by the following formula:

$d(x, y_i) = \alpha_1 \lVert x - y_i \rVert_2 + \alpha_2 \,\mathrm{cos\_sim}(x, y_i)$

where $d(x, y_i)$ represents the metric distance between the image feature $x$ and the target class center feature $y_i$ of the i-th scene category, $\mathrm{cos\_sim}(x, y_i)$ represents the cosine similarity between them, $\alpha_1$ represents the weight value corresponding to the Euclidean distance, and $\alpha_2$ represents the weight value corresponding to the cosine similarity.
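A sketch that follows the formula above directly, with NumPy and the weight values left configurable (their concrete values are not fixed by this application):

```python
import numpy as np

def metric_distance(x: np.ndarray, y: np.ndarray,
                    alpha1: float = 0.5, alpha2: float = 0.5) -> float:
    """Weighted combination of Euclidean distance (absolute closeness)
    and cosine similarity (directional closeness)."""
    euclidean = float(np.linalg.norm(x - y))
    cosine_sim = float(np.dot(x, y) /
                       (np.linalg.norm(x) * np.linalg.norm(y)))
    return alpha1 * euclidean + alpha2 * cosine_sim
```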
In a possible implementation, the similarity threshold may be configured manually. Alternatively, for each target class center feature, a reference similarity between that target class center feature and each of the other target class center features may be determined, and the similarity threshold is then determined according to the reference similarities corresponding to the target class center features.
In a possible implementation, if the reference similarities are determined according to a distance-like measure such as the Euclidean distance, the similarity threshold may be determined according to the minimum value of the reference similarities.

In a possible implementation, if the reference similarities are determined according to a measure such as cosine similarity, the similarity threshold may be determined according to the maximum value of the reference similarities.
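A sketch of this threshold derivation: the pairwise reference similarities between target class center features are computed, and the minimum is taken for a distance-like measure or the maximum for cosine similarity, as in the two implementations above:

```python
import numpy as np

def similarity_threshold(centers: np.ndarray, distance_like: bool = True) -> float:
    """centers: (num_classes, feature_dim) target class center features."""
    refs = []
    n = centers.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if distance_like:   # e.g. Euclidean distance
                refs.append(float(np.linalg.norm(centers[i] - centers[j])))
            else:               # e.g. cosine similarity
                refs.append(float(np.dot(centers[i], centers[j]) /
                                  (np.linalg.norm(centers[i]) *
                                   np.linalg.norm(centers[j]))))
    return min(refs) if distance_like else max(refs)
```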
In a possible implementation, if the similarity is determined according to a distance-like measure such as the Euclidean distance, a smaller similarity value indicates that the two image features are more alike and more likely to belong to the same scene category, while a greater similarity value indicates that they are less alike and less likely to belong to the same scene category. Therefore, whether the scene categories include the scene category to which the image to be recognized belongs is determined according to the similarities and the similarity threshold. If any similarity is smaller than the similarity threshold, the image features of the image to be recognized and the target class center feature corresponding to that similarity most likely belong to the same scene category, and it is determined that the scene categories contained in the sample set include the scene category to which the image to be recognized belongs. If no similarity is smaller than the similarity threshold, the image features of the image to be recognized and each target class center feature come from different scene categories, and it is determined that the scene categories contained in the sample set do not include the scene category to which the image to be recognized belongs.
In a possible implementation, if the similarity is determined according to a measure such as cosine similarity, a smaller similarity value indicates that the two image features are less alike and less likely to belong to the same scene category, while a greater similarity value indicates that they are more alike and more likely to belong to the same scene category. Therefore, whether the scene categories include the scene category to which the image to be recognized belongs is determined according to the similarities and the similarity threshold. If any similarity is greater than the similarity threshold, the image features of the image to be recognized and the target class center feature corresponding to that similarity most likely belong to the same scene category, and it is determined that the scene categories contained in the sample set include the scene category to which the image to be recognized belongs. If no similarity is greater than the similarity threshold, the image features of the image to be recognized and each target class center feature come from different scene categories, and it is determined that the scene categories contained in the sample set do not include the scene category to which the image to be recognized belongs.
In a specific implementation process, if it is determined that the scene categories include the scene category to which the image to be recognized belongs, the scene recognition model can accurately determine that category, that is, the scene category of the image is known, and the category is determined through the pre-trained scene recognition model. If it is determined that the scene categories do not include the scene category to which the image to be recognized belongs, the scene recognition model cannot accurately determine that category, that is, the scene category of the image is unknown, and the scene category of the image is not recognized further.
Further, if it is determined that each scene type includes a scene type to which the image to be recognized belongs, the scene type to which the image to be recognized belongs may be acquired and output based on the image feature of the image to be recognized through a classification output layer in the scene recognition model.
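Putting S401 to S405 together for the distance-like case, a minimal open-set decision might look like the sketch below; the nearest class center stands in here for the model's classification output layer, and all names are illustrative:

```python
import numpy as np

def recognize_scene(image_features: np.ndarray, centers: np.ndarray,
                    threshold: float):
    """Return the index of the recognized scene category, or None if unknown."""
    # S402: distance-like similarity, smaller means more alike.
    similarities = np.linalg.norm(centers - image_features, axis=1)
    best = int(np.argmin(similarities))
    if similarities[best] < threshold:   # S403/S404: known scene category
        return best
    return None                          # S405: unknown, stop recognizing
```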
Because the scene recognition model is trained in advance, by training the original scene recognition model based on the scene probability vector of the sample image, the scene label of the sample image, the sample features of the sample image and the class center feature corresponding to its first scene category, and the sample features and the class center features corresponding to its second scene categories, the image features of images in the same scene category are close to the class center feature of that category and far from the class center features of other categories. In the process of recognizing the scene category of the image to be recognized based on the scene recognition model, this property is used at the feature level to first determine whether the scene category of the image is recognizable, and the scene category is determined only when it is. In this way, the scene category images contained in the closed image set are recognized accurately, while images whose scene categories are not contained in the closed image set can still be handled, improving the accuracy, performance and robustness of the scene recognition model.
Example 4:
In the following, taking the electronic device performing scene recognition being a television as an example, the scene recognition method provided by the present application is described in detail through specific embodiments. Fig. 5 is a schematic diagram of a specific scene recognition process provided by some embodiments of the present application, where the process includes:
S501: And acquiring a pre-trained scene recognition model.
S502: and determining the image characteristics of the image to be recognized through a pre-trained scene recognition model.
S503: and determining the similarity of the image features and the target class center features of each scene class.
S504: if the similarity is the Euclidean distance, whether any similarity is smaller than a similarity threshold value is judged, if yes, S505 is executed, and if not, S506 is executed.
S505: and determining the scene category to which the image to be recognized belongs through the scene recognition model.
S506: and the scene category to which the image to be identified belongs is not continuously identified.
Example 5:
the present application provides a scene recognition model training device, and fig. 6 is a schematic structural diagram of a scene recognition model training device provided in some embodiments of the present application, and the device includes:
an obtaining unit 61, configured to obtain an image of any sample in the sample set; the sample image corresponds to a scene label, and the scene label is used for identifying a first scene category to which the sample image belongs;
a processing unit 62, configured to determine, through an original scene identification model, a scene probability vector corresponding to the sample image and a sample feature of the sample image; wherein the scene probability vector comprises probability values that the sample image respectively belongs to each scene category;
a training unit 63, configured to train the original scene identification model based on the scene probability vector and the scene label, the sample feature, the class center feature corresponding to the first scene category, the sample feature, and the class center feature corresponding to the second scene category, so as to obtain a trained scene identification model; wherein the second scene category is a scene category other than the first scene category in each of the scene categories.
In some possible embodiments, the training unit 63 is further configured to, for each iteration training of the original scene recognition model, obtain, through a currently iterated scene recognition model, a sample feature of a sample image of each scene category in the sample set; determining candidate class center features respectively corresponding to each scene category according to each sample feature; determining the class center feature of each scene category in the next iterative training based on each candidate class center feature; or, obtaining the sample characteristics of each sample image in the sample set through a pre-trained characteristic extraction model; and clustering the sample features of each sample image, and determining the class center feature of each scene category.
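The clustering alternative mentioned above could, as a sketch, use k-means over the sample features produced by the pre-trained feature extraction model; scikit-learn is an assumption here, not something prescribed by this application:

```python
import numpy as np
from sklearn.cluster import KMeans

def class_centers_by_clustering(sample_features: np.ndarray,
                                num_scene_classes: int) -> np.ndarray:
    """Cluster all sample features; cluster centers serve as class center features."""
    kmeans = KMeans(n_clusters=num_scene_classes, n_init=10, random_state=0)
    kmeans.fit(sample_features)
    return kmeans.cluster_centers_
```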
In some possible embodiments, the training unit 63 is specifically configured to determine, for each scene class, a target sample image that is correctly identified by the scene identification model of the current iteration, in the sample images of the scene class; determining a weighted average vector according to the sample characteristics of the target sample image and the weight value of the target sample image, and determining candidate class center characteristics corresponding to the scene type based on the weighted average vector; the weight value of the target sample image is preset, or is determined according to a probability value of the target sample image belonging to the scene category, which is acquired through the current iterative scene identification model; or, for each scene category, determining a target sample image correctly identified by the currently iterated scene identification model in the sample images of the scene category; acquiring a target feature in the sample features of the target sample image based on a preset target algorithm, and determining a candidate class center feature corresponding to the scene category based on the target feature; wherein the target feature is a principal component feature, or a normalized feature.
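A sketch of the weighted-average variant described above, where the weights are either preset or the probability values the current model assigns to the correctly identified target sample images:

```python
import numpy as np

def candidate_class_center(target_features: np.ndarray,
                           weights: np.ndarray) -> np.ndarray:
    """Weighted average of the features of correctly identified sample images."""
    weights = weights / weights.sum()            # normalize the weight values
    return (target_features * weights[:, None]).sum(axis=0)
```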
In some possible embodiments, the training unit 63 is specifically configured to determine each candidate class center feature as the class center feature of each scene class in the next iteration training if the current iteration is the first iteration; if the current iteration is not the first iteration, determining a candidate class center feature corresponding to the scene category and a difference vector of the class center feature corresponding to the scene category determined by the current iteration for each scene category; and determining the class center feature corresponding to the scene category in the next iteration training according to the difference vector, the pre-configured weight vector and the class center feature corresponding to the scene category determined by the current iteration.
In some possible embodiments, the training unit 63 is specifically configured to determine a first loss value, a second loss value, and a third loss value; wherein the first loss value is determined based on the scene probability vector and the scene tag; the second loss value is determined based on the sample feature and a class center feature corresponding to the first scene category; the third loss value is determined based on the sample feature and a class center feature corresponding to the second scene category; determining a comprehensive loss value according to the first loss value and a first weight value corresponding to the first loss value, the second loss value and a second weight value corresponding to the second loss value, and a third loss value and a third weight value corresponding to the third loss value; and training the original scene recognition model based on the comprehensive loss value.
In the process of training the original scene recognition model based on the sample images in the sample set, the original scene recognition model outputs, for each input sample image, the scene probability vector corresponding to the sample image and the sample features of the sample image. The original scene recognition model can then be trained based on the scene probability vector and the scene label, the sample features and the class center feature corresponding to the first scene category, and the sample features and the class center features corresponding to the second scene categories, so as to obtain the trained scene recognition model. The trained model thereby makes the image features of images in the same scene category close to the class center feature of that category while keeping them away from the class center features of other categories. At the feature level, it can therefore be determined whether the scene category of an image is recognizable, and the category is determined only when it is. In this way, the scene category images belonging to the closed image set are recognized accurately, while images whose scene categories do not belong to the closed image set can still be handled, improving the accuracy, performance and robustness of the scene recognition model.
Example 6:
Fig. 7 is a schematic structural diagram of a scene recognition device provided by some embodiments of the present application; the device includes:
the first processing module 71 is configured to determine, through a pre-trained scene recognition model, image features of an image to be recognized;
a second processing module 72, configured to determine similarity between the image features and the target class center feature of each scene class;
a third processing module 73, configured to determine, according to each of the similarities and a similarity threshold, whether each of the scene categories includes a scene category to which the image to be identified belongs; if it is determined that each scene category comprises the scene category to which the image to be identified belongs, determining the scene category to which the image to be identified belongs through the scene identification model; and if the scene category to which the image to be identified belongs is determined not to be contained in each scene category, the scene category to which the image to be identified belongs is not continuously identified.
Because the scene recognition model is trained in advance, by training the original scene recognition model based on the scene probability vector of the sample image, the scene label of the sample image, the sample features of the sample image and the class center feature corresponding to its first scene category, and the sample features and the class center features corresponding to its second scene categories, the image features of images in the same scene category are close to the class center feature of that category and far from the class center features of other categories. In the process of recognizing the scene category of the image to be recognized based on the scene recognition model, this property is used at the feature level to first determine whether the scene category of the image is recognizable, and the scene category is determined only when it is. In this way, the scene category images contained in the closed image set are recognized accurately, while images whose scene categories are not contained in the closed image set can still be handled, improving the accuracy, performance and robustness of the scene recognition model.
Example 7:
Fig. 8 is a schematic structural diagram of an electronic device according to some embodiments of the present application. On the basis of the foregoing embodiments, the present application further provides an electronic device, as shown in fig. 8, including: a processor 81, a communication interface 82, a memory 83 and a communication bus 84, wherein the processor 81, the communication interface 82 and the memory 83 communicate with each other through the communication bus 84;
the memory 83 has stored therein a computer program which, when executed by the processor 81, causes the processor 81 to perform the steps of:
acquiring any sample image in a sample set; the sample image corresponds to a scene label, and the scene label is used for identifying a first scene category to which the sample image belongs;
determining scene probability vectors corresponding to the sample images and sample characteristics of the sample images through an original scene recognition model; wherein the scene probability vector comprises probability values that the sample image respectively belongs to each scene category;
training the original scene recognition model based on the scene probability vector, the scene label, the sample feature, the class center feature corresponding to the first scene category, the sample feature and the class center feature corresponding to the second scene category to obtain a trained scene recognition model; wherein the second scene category is a scene category other than the first scene category in each of the scene categories.
Because the principle by which the electronic device solves the problem is similar to that of the scene recognition model training method, the implementation of the electronic device may refer to the implementation of the method, and repeated parts are not described again.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface 82 is used for communication between the above-described electronic apparatus and other apparatuses.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit, a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and the like.
In the process of training the original scene recognition model based on the sample images in the sample set, the original scene recognition model outputs, for each input sample image, the scene probability vector corresponding to the sample image and the sample features of the sample image. The original scene recognition model can then be trained based on the scene probability vector and the scene label, the sample features and the class center feature corresponding to the first scene category, and the sample features and the class center features corresponding to the second scene categories, so as to obtain the trained scene recognition model. The trained model thereby makes the image features of images in the same scene category close to the class center feature of that category while keeping them away from the class center features of other categories. At the feature level, it can therefore be determined whether the scene category of an image is recognizable, and the category is determined only when it is. In this way, the scene category images belonging to the closed image set are recognized accurately, while images whose scene categories do not belong to the closed image set can still be handled, improving the accuracy, performance and robustness of the scene recognition model.
Example 8:
Fig. 9 is a schematic structural diagram of an electronic device according to some embodiments of the present application. On the basis of the foregoing embodiments, the present application further provides an electronic device, as shown in fig. 9, including: a processor 91, a communication interface 92, a memory 93 and a communication bus 94, wherein the processor 91, the communication interface 92 and the memory 93 communicate with each other through the communication bus 94;
the memory 93 has stored therein a computer program which, when executed by the processor 91, causes the processor 91 to perform the steps of:
determining the image characteristics of an image to be recognized through a pre-trained scene recognition model;
determining the similarity of the image features and the target class center features of each scene class;
determining whether each scene category comprises the scene category to which the image to be identified belongs according to each similarity and a similarity threshold;
if it is determined that each scene category comprises the scene category to which the image to be identified belongs, determining the scene category to which the image to be identified belongs through the scene identification model;
and if the scene category to which the image to be identified belongs is determined not to be contained in each scene category, the scene category to which the image to be identified belongs is not continuously identified.
Because the principle by which the electronic device solves the problem is similar to that of the scene recognition method, the implementation of the electronic device may refer to the implementation of the method, and repeated parts are not described again.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface 92 is used for communication between the above-described electronic apparatus and other apparatuses.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit, a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and the like.
Because the scene recognition model is trained in advance, by training the original scene recognition model based on the scene probability vector of the sample image, the scene label of the sample image, the sample features of the sample image and the class center feature corresponding to its first scene category, and the sample features and the class center features corresponding to its second scene categories, the image features of images in the same scene category are close to the class center feature of that category and far from the class center features of other categories. In the process of recognizing the scene category of the image to be recognized based on the scene recognition model, this property is used at the feature level to first determine whether the scene category of the image is recognizable, and the scene category is determined only when it is. In this way, the scene category images contained in the closed image set are recognized accurately, while images whose scene categories are not contained in the closed image set can still be handled, improving the accuracy, performance and robustness of the scene recognition model.
Example 9:
on the basis of the foregoing embodiments, the present application further provides a computer-readable storage medium, in which a computer program executable by a processor is stored, and when the program is run on the processor, the processor is caused to execute the following steps:
acquiring any sample image in a sample set; the sample image corresponds to a scene label, and the scene label is used for identifying a first scene category to which the sample image belongs;
determining scene probability vectors corresponding to the sample images and sample characteristics of the sample images through an original scene recognition model; wherein the scene probability vector comprises probability values that the sample image respectively belongs to each scene category;
training the original scene recognition model based on the scene probability vector, the scene label, the sample feature, the class center feature corresponding to the first scene category, the sample feature and the class center feature corresponding to the second scene category to obtain a trained scene recognition model; wherein the second scene category is a scene category other than the first scene category in each of the scene categories.
Because the principle by which the computer-readable medium solves the problem is similar to that of the scene recognition model training method, after the processor executes the computer program in the computer-readable medium, the implementation steps may refer to the implementation of the method, and repeated parts are not described again.
In the process of training the original scene recognition model based on the sample images in the sample set, the original scene recognition model outputs, for each input sample image, the scene probability vector corresponding to the sample image and the sample features of the sample image. The original scene recognition model can then be trained based on the scene probability vector and the scene label, the sample features and the class center feature corresponding to the first scene category, and the sample features and the class center features corresponding to the second scene categories, so as to obtain the trained scene recognition model. The trained model thereby makes the image features of images in the same scene category close to the class center feature of that category while keeping them away from the class center features of other categories. At the feature level, it can therefore be determined whether the scene category of an image is recognizable, and the category is determined only when it is. In this way, the scene category images belonging to the closed image set are recognized accurately, while images whose scene categories do not belong to the closed image set can still be handled, improving the accuracy, performance and robustness of the scene recognition model.
Example 10:
on the basis of the foregoing embodiments, the present application further provides a computer-readable storage medium, in which a computer program executable by a processor is stored, and when the program is run on the processor, the processor is caused to execute the following steps:
determining the image characteristics of an image to be recognized through a pre-trained scene recognition model;
determining the similarity of the image features and the target class center features of each scene class;
determining whether each scene category comprises the scene category to which the image to be identified belongs according to each similarity and a similarity threshold;
if it is determined that each scene category comprises the scene category to which the image to be identified belongs, determining the scene category to which the image to be identified belongs through the scene identification model;
and if the scene category to which the image to be identified belongs is determined not to be contained in each scene category, the scene category to which the image to be identified belongs is not continuously identified.
Because the principle by which the computer-readable medium solves the problem is similar to that of the scene recognition method, after the processor executes the computer program in the computer-readable medium, the implementation steps may refer to the implementation of the method, and repeated parts are not described again.
Because the scene recognition model is trained in advance, by training the original scene recognition model based on the scene probability vector of the sample image, the scene label of the sample image, the sample features of the sample image and the class center feature corresponding to its first scene category, and the sample features and the class center features corresponding to its second scene categories, the image features of images in the same scene category are close to the class center feature of that category and far from the class center features of other categories. In the process of recognizing the scene category of the image to be recognized based on the scene recognition model, this property is used at the feature level to first determine whether the scene category of the image is recognizable, and the scene category is determined only when it is. In this way, the scene category images contained in the closed image set are recognized accurately, while images whose scene categories are not contained in the closed image set can still be handled, improving the accuracy, performance and robustness of the scene recognition model.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method for training a scene recognition model, the method comprising:
acquiring any sample image in a sample set; the sample image corresponds to a scene label, and the scene label is used for identifying a first scene category to which the sample image belongs;
determining scene probability vectors corresponding to the sample images and sample characteristics of the sample images through an original scene recognition model; wherein the scene probability vector comprises probability values that the sample image respectively belongs to each scene category;
training the original scene recognition model based on the scene probability vector, the scene label, the sample feature, the class center feature corresponding to the first scene category, the sample feature and the class center feature corresponding to the second scene category to obtain a trained scene recognition model; wherein the second scene category is a scene category other than the first scene category in each of the scene categories.
2. The method of claim 1, wherein obtaining the class center feature of each scene category comprises:
for each iteration training of the original scene recognition model, acquiring sample characteristics of the sample image of each scene category in the sample set through the current iteration scene recognition model; determining candidate class center features respectively corresponding to each scene category according to each sample feature; determining the class center feature of each scene category in the next iterative training based on each candidate class center feature;
or, obtaining the sample characteristics of each sample image in the sample set through a pre-trained characteristic extraction model; and clustering the sample features of each sample image, and determining the class center feature of each scene category.
3. The method according to claim 2, wherein the determining, according to each of the sample features, a candidate class center feature corresponding to each of the scene categories respectively comprises:
for each scene category, determining a target sample image correctly identified by the scene identification model of the current iteration in each sample image of the scene category; determining a weighted average vector according to the sample characteristics of the target sample image and the weight value of the target sample image, and determining candidate class center characteristics corresponding to the scene type based on the weighted average vector; the weight value of the target sample image is preset, or is determined according to a probability value of the target sample image belonging to the scene category, which is acquired through the current iterative scene identification model; or
For each scene category, determining a target sample image correctly identified by the scene identification model of the current iteration in each sample image of the scene category; acquiring a target feature in the sample features of the target sample image based on a preset target algorithm, and determining a candidate class center feature corresponding to the scene category based on the target feature; wherein the target feature is a principal component feature, or a normalized feature.
4. The method of claim 3, wherein said determining a class center feature for said each scene class in a next iteration of training based on each of said candidate class center features comprises:
if the current iteration is the first iteration, determining each candidate class center feature as the class center feature of each scene class in the next iteration training;
if the current iteration is not the first iteration, determining a candidate class center feature corresponding to the scene category and a difference vector of the class center feature corresponding to the scene category determined by the current iteration for each scene category; and determining the class center feature corresponding to the scene category in the next iteration training according to the difference vector, the pre-configured weight vector and the class center feature corresponding to the scene category determined by the current iteration.
5. The method of claim 1, wherein the training the original scene recognition model based on the scene probability vector and the scene label, the sample feature and the class center feature corresponding to the first scene category, the sample feature and the class center feature corresponding to the second scene category comprises:
determining a first loss value, a second loss value and a third loss value; wherein the first loss value is determined based on the scene probability vector and the scene tag; the second loss value is determined based on the sample feature and a class center feature corresponding to the first scene category; the third loss value is determined based on the sample feature and a class center feature corresponding to the second scene category;
determining a comprehensive loss value according to the first loss value and a first weight value corresponding to the first loss value, the second loss value and a second weight value corresponding to the second loss value, and a third loss value and a third weight value corresponding to the third loss value;
and training the original scene recognition model based on the comprehensive loss value.
6. A method for scene recognition, the method comprising:
determining the image characteristics of an image to be recognized through a pre-trained scene recognition model;
determining the similarity of the image features and the target class center features of each scene class;
determining whether each scene category comprises the scene category to which the image to be identified belongs according to each similarity and a similarity threshold;
if it is determined that each scene category comprises the scene category to which the image to be identified belongs, determining the scene category to which the image to be identified belongs through the scene identification model;
and if the scene category to which the image to be identified belongs is determined not to be contained in each scene category, the scene category to which the image to be identified belongs is not continuously identified.
7. An apparatus for training a scene recognition model, the apparatus comprising:
the acquisition unit is used for acquiring any sample image in the sample set; the sample image corresponds to a scene label, and the scene label is used for identifying a first scene category to which the sample image belongs;
the processing unit is used for determining a scene probability vector corresponding to the sample image and the sample characteristics of the sample image through an original scene recognition model; wherein the scene probability vector comprises probability values that the sample image respectively belongs to each scene category;
a training unit, configured to train the original scene identification model based on the scene probability vector and the scene label, the sample feature, the class center feature corresponding to the first scene category, the sample feature, and the class center feature corresponding to the second scene category, so as to obtain a trained scene identification model; wherein the second scene category is a scene category other than the first scene category in each of the scene categories.
8. A scene recognition apparatus, characterized in that the apparatus comprises:
the first processing module is used for determining the image characteristics of the image to be recognized through a pre-trained scene recognition model;
the second processing module is used for determining the similarity between the image characteristics and the target class center characteristics of each scene class;
a third processing module, configured to determine, according to each of the similarities and a similarity threshold, whether each of the scene categories includes a scene category to which the image to be identified belongs; if it is determined that each scene category comprises the scene category to which the image to be identified belongs, determining the scene category to which the image to be identified belongs through the scene identification model; and if the scene category to which the image to be identified belongs is determined not to be contained in each scene category, the scene category to which the image to be identified belongs is not continuously identified.
9. An electronic device, characterized in that the electronic device comprises a processor for implementing the steps of the scene recognition model training method as claimed in any one of claims 1 to 5, or the steps of the scene recognition method as claimed in claim 6, when executing a computer program stored in a memory.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when being executed by a processor, carries out the steps of the scene recognition model training method as defined in any one of claims 1 to 5, or carries out the steps of the scene recognition method as defined in claim 6.
CN202111159087.3A 2021-09-30 2021-09-30 Model training and scene recognition method, device, equipment and medium Pending CN113902944A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111159087.3A CN113902944A (en) 2021-09-30 2021-09-30 Model training and scene recognition method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111159087.3A CN113902944A (en) 2021-09-30 2021-09-30 Model training and scene recognition method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN113902944A 2022-01-07

Family

ID=79189662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111159087.3A Pending CN113902944A (en) 2021-09-30 2021-09-30 Model training and scene recognition method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113902944A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445760A (en) * 2022-01-24 2022-05-06 腾讯科技(深圳)有限公司 Scene recognition method, scene recognition system, storage medium and terminal equipment
CN114494747A (en) * 2022-01-28 2022-05-13 北京百度网讯科技有限公司 Model training method, image processing method, device, electronic device and medium
CN114750147A (en) * 2022-03-10 2022-07-15 深圳甲壳虫智能有限公司 Robot space pose determining method and device and robot
CN114750147B (en) * 2022-03-10 2023-11-24 深圳甲壳虫智能有限公司 Space pose determining method and device of robot and robot
CN117115596A (en) * 2023-10-25 2023-11-24 腾讯科技(深圳)有限公司 Training method, device, equipment and medium of object action classification model
CN117115596B (en) * 2023-10-25 2024-02-02 腾讯科技(深圳)有限公司 Training method, device, equipment and medium of object action classification model

Similar Documents

Publication Publication Date Title
CN111523621B (en) Image recognition method and device, computer equipment and storage medium
CN112329619B (en) Face recognition method and device, electronic equipment and readable storage medium
CN113902944A (en) Model training and scene recognition method, device, equipment and medium
CN111814810A (en) Image recognition method and device, electronic equipment and storage medium
CN110188829B (en) Neural network training method, target recognition method and related products
CN112862093B (en) Graphic neural network training method and device
CN109086811A (en) Multi-tag image classification method, device and electronic equipment
CN108228684B (en) Method and device for training clustering model, electronic equipment and computer storage medium
JP2017062778A (en) Method and device for classifying object of image, and corresponding computer program product and computer-readable medium
CN110135505B (en) Image classification method and device, computer equipment and computer readable storage medium
CN111753863A (en) Image classification method and device, electronic equipment and storage medium
CN112418327A (en) Training method and device of image classification model, electronic equipment and storage medium
CN113743426A (en) Training method, device, equipment and computer readable storage medium
CN111401343B (en) Method for identifying attributes of people in image and training method and device for identification model
CN113762382B (en) Model training and scene recognition method, device, equipment and medium
CN113011532A (en) Classification model training method and device, computing equipment and storage medium
CN114170654A (en) Training method of age identification model, face age identification method and related device
CN111062440B (en) Sample selection method, device, equipment and storage medium
CN110880018B (en) Convolutional neural network target classification method
CN117523218A (en) Label generation, training of image classification model and image classification method and device
CN111814846A (en) Training method and recognition method of attribute recognition model and related equipment
CN112446428B (en) Image data processing method and device
CN112906810B (en) Target detection method, electronic device, and storage medium
CN113920382A (en) Cross-domain image classification method based on class consistency structured learning and related device
CN113762005A (en) Method, device, equipment and medium for training feature selection model and classifying objects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination