CN116259083A - Image quality recognition model determining method and related device
Abstract
The embodiment of the application discloses a method and a related device for determining an image quality recognition model, relating to image recognition and machine learning in the field of artificial intelligence. In an image classification scene, image quality is related to classification difficulty. An image sample labeled with an actual classification category is therefore obtained, a probability distribution of the image sample over a plurality of classification categories is determined by an initial classification model, and an attention layer generates, based on the probability distribution, an attention weight corresponding to the classification difficulty of the image sample. The attention weight makes the model pay more attention to image samples that are hard to classify, and the magnitude of the attention weight output by the attention layer can distinguish the image quality of an input image. Therefore, by changing the attention layer of the trained classification model into the model output layer, a recognition model that recognizes image quality according to the attention weight can be obtained without specially labeling image quality samples, which reduces the acquisition cost of the recognition model.
Description
Technical Field
The present invention relates to the field of image processing, and in particular, to a method and an apparatus for determining an image quality recognition model.
Background
Currently, many business scenarios need to apply image recognition technology: services such as face recognition, face editing, video understanding and content recommendation are performed on an input image to be recognized based on image recognition technology.
In these business scenarios, the accuracy of the image recognition result is affected by the image quality of the input image to be recognized. The related art mostly adopts a depth model to recognize image quality, but training such a depth model needs a large number of image samples, each of which must be labeled with an image quality label, and this labeling consumes a large amount of manpower and time.
Therefore, in order to save cost, the image quality of the image to be recognized is generally not recognized before processing in the business scenario, so that the accuracy of the image recognition result is difficult to guarantee.
Disclosure of Invention
In order to solve the above technical problems, the application provides a method and a related device for determining an image quality recognition model, which can obtain a recognition model for image quality recognition without specially labeling image quality samples, thereby greatly reducing the acquisition cost of the recognition model.
The embodiment of the application discloses the following technical scheme:
in one aspect, an embodiment of the present application provides a method for determining an image quality recognition model, where the method includes:
obtaining an image sample comprising a sample tag, the sample tag being used to identify an actual classification category of the image sample;
inputting the image sample into an initial classification model, and determining probability distribution under a plurality of classification categories through the initial classification model;
determining an attention weight corresponding to the image sample through an attention layer of the initial classification model according to the probability distribution, wherein the attention weight is used for identifying the classification difficulty of the image sample;
determining a loss function corresponding to the image sample according to the attention weight and the difference between the probability distribution and the actual classification category;
model training is carried out on the initial classification model through the loss function, and a classification model is obtained;
and changing the attention layer of the classification model into a model output layer to obtain an identification model for identifying the image quality according to the attention weight.
On the other hand, the embodiment of the application provides a determining device of an image quality recognition model, which comprises an obtaining unit, a determining unit, a training unit and a changing unit:
The acquisition unit is used for acquiring an image sample comprising a sample label, wherein the sample label is used for identifying the actual classification category of the image sample;
the determining unit is used for inputting the image sample into an initial classification model, and determining probability distribution under a plurality of classification categories through the initial classification model;
the determining unit is further configured to determine, according to the probability distribution, an attention weight corresponding to the image sample through an attention layer of the initial classification model, where the attention weight is used to identify classification difficulty of the image sample;
the determining unit is further configured to determine a loss function corresponding to the image sample according to the attention weight and a difference between the probability distribution and the actual classification category;
the training unit is used for carrying out model training on the initial classification model through the loss function to obtain a classification model;
and the changing unit is used for changing the attention layer of the classification model into a model output layer to obtain an identification model for identifying the image quality according to the attention weight.
In yet another aspect, embodiments of the present application provide a computer device comprising a processor and a memory:
The memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the method for determining the image quality recognition model according to the above aspect according to the instructions in the program code.
In yet another aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program for executing the method for determining an image quality recognition model described in the above aspect.
In yet another aspect, embodiments of the present application provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of determining an image quality recognition model as described in the above aspects.
According to the technical scheme, in an image classification scene an image whose classification category is easy to determine generally has better image quality, while an image whose classification category is difficult to determine generally has poorer image quality. Therefore, an image sample labeled with an actual classification category is obtained, and a probability distribution of the image sample over a plurality of classification categories is determined by an initial classification model; the probability distribution directly shows whether the image sample is easy to classify. An attention layer is arranged in the initial classification model and generates, based on the probability distribution, an attention weight corresponding to the classification difficulty of the image sample. When the initial classification model is trained with a loss function determined by the attention weight and the sample label, the attention weight makes the model pay more attention to image samples that are difficult to classify, so that in the trained classification model the magnitude of the attention weight output by the attention layer can distinguish whether an input image is easy or difficult to classify, which is equivalent to intuitively representing the quality of the image. Therefore, a recognition model that recognizes image quality according to the attention weight can be obtained without specially labeling image quality samples: a large number of samples with existing category labels are used to train the classification model, and the attention layer is changed into the model output layer. This greatly reduces the acquisition cost of the recognition model, and the recognition model can conveniently be used to improve the image quality of input images in business scenarios that apply image recognition technology.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a determination scenario of an image quality recognition model according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for determining an image quality recognition model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of classification difficulty in a vector space according to an embodiment of the present application;
FIG. 4a is a diagram of a model structure of a classification model;
FIG. 4b is a schematic diagram of a model structure of a classification model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an image quality recognition result obtained by a recognition model according to an embodiment of the present application;
FIG. 6 is a structural diagram of a determining device for an image quality recognition model according to an embodiment of the present application;
FIG. 7 is a structural diagram of a terminal device according to an embodiment of the present application;
FIG. 8 is a structural diagram of a server according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
Before image recognition technology is applied, if the image quality of the image to be recognized can be recognized in advance, the accuracy of the subsequent image recognition can be effectively improved. In image recognition fields such as face recognition, it is desirable in most cases that the recognition object is a high-quality face image.
However, in the related art, recognition of image quality is mainly realized by training a depth model. Before training, a large number of image samples with image quality labels need to be specifically labeled to complete the training of the depth model, and collecting and labeling these image samples consumes considerable time and effort, so image quality recognition is basically not performed before the image recognition technology is applied.
Therefore, the embodiment of the application provides a method and a related device for determining an image quality recognition model, which can obtain the recognition model for image quality recognition without specially marking an image quality sample, and greatly reduces the acquisition cost of the recognition model.
The method for determining the image quality recognition model provided by the embodiment of the application can be implemented through computer equipment, wherein the computer equipment can be terminal equipment or a server, and the server can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server for providing cloud computing service. The terminal equipment comprises, but is not limited to, mobile phones, computers, intelligent voice interaction equipment, intelligent household appliances, vehicle-mounted terminals and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
The present application may be applied in the field of artificial intelligence (Artificial Intelligence, AI). Artificial intelligence is a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
With the research and progress of artificial intelligence technology, artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, robots, smart medical care, smart customer service, the Internet of Vehicles and smart transportation. It is believed that, with the development of technology, artificial intelligence technology will be applied in more fields and become increasingly important.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level technologies and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include directions such as computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, autonomous driving and intelligent transportation.
The embodiment of the application mainly relates to the directions of computer vision technology, machine learning and the like.
Computer Vision (CV) is a science that studies how to make machines "see"; more specifically, it refers to using cameras and computers instead of human eyes to perform machine vision such as recognition, tracking and measurement on a target, and to further perform graphics processing so that the computer produces an image more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies and attempts to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specially studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and teaching learning.
For example, in the embodiments of the present application, various types of information included in an image can be recognized by the classification model or the recognition model based on computer vision technology, and the initial classification model can be trained based on machine learning so that the attention layer in the initial classification model can distinguish simple samples that are easy to classify from difficult samples that are hard to classify. In this way the relationship between the attention weight and the classification difficulty is established, and images to be recognized with different image quality can be distinguished in the recognition model based on the output attention weight.
It will be appreciated that in the specific embodiment of the present application, the image samples and the images to be identified may relate to relevant data such as user information, facial features, etc., and when the embodiments of the present application are applied to specific products or technologies, user permission or consent is required to be obtained, and the collection, use and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions.
Fig. 1 shows a schematic representation of a model determination scenario provided in an embodiment of the present application, where a server 100 is illustrated as an example of the foregoing computer device.
Since many images have already been labeled with classification categories (rather than image quality), the classification model can be trained by using the already labeled actual classification category as the sample label of each image sample, without specifically labeling image samples with sample labels related to image quality. The training process is supervised with respect to the classification model, and unsupervised with respect to the recognition model for recognizing image quality that is finally obtained.
In this embodiment of the present application, image quality covers not only conventional attributes of an image, such as sharpness and brightness, but also quality related to the image recognition technology, such as how well the object to be recognized is displayed; for example, in a face recognition scene, whether the face in a face image is a frontal face and whether the face is occluded both belong to the category of image quality.
In an image classification scene, an image whose classification category is easy to determine generally has better image quality, and an image whose classification category is difficult to determine generally has poorer image quality; for example, the classification of a face image is affected by blurring, non-frontal faces, occlusion and the like.
In the scenario shown in fig. 1, the image sample may be a face image, and the sample label thereof may be a label labeled with an actual person category, for example, person a or person b, or the like.
By acquiring an image sample labeled with an actual person category, a probability distribution of the image sample under a plurality of person categories is determined based on the initial classification model, which can directly reveal whether the image sample is easily classified.
Taking two person categories, person a and person b, as an example: when the probability distribution corresponding to face image 1 is [51%, 49%], the face in the image sample has a 51% probability of being person a and a 49% probability of being person b. Because the probabilities are very close, the features of the face in face image 1 are not distinct (for example, the image is blurred or the face is occluded), the image quality is not high, and the image belongs to the face images that are not easy to classify correctly.
When the probability distribution corresponding to face image 2 is [91%, 9%], the face in the image sample has a 91% probability of being person a and a 9% probability of being person b. Because the probabilities differ greatly, the features of the face in face image 2 are distinct, the image quality is high, and the image belongs to the face images that are easy to classify correctly.
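The following is an illustrative sketch (in Python; the language and the margin heuristic are assumptions of this illustration, not part of the application, which derives classification difficulty through the attention layer described below) showing how the closeness of the probabilities in the probability distribution reflects classification difficulty for the two hypothetical face images above:

```python
def probability_margin(probability_distribution):
    """Difference between the two largest class probabilities.

    A small margin (probabilities close together) suggests the image sample
    is hard to classify; a large margin suggests it is easy to classify.
    """
    top_two = sorted(probability_distribution, reverse=True)[:2]
    return top_two[0] - top_two[1]

# Hypothetical probability distributions for the two face images above.
print(probability_margin([0.51, 0.49]))  # 0.02 -> close probabilities, hard to classify, likely low quality
print(probability_margin([0.91, 0.09]))  # 0.82 -> well separated, easy to classify, likely high quality
```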
By setting the attention layer in the initial classification model, an attention weight corresponding to the classification difficulty is generated for each image sample based on its probability distribution. When the initial classification model is trained with the loss function determined by the attention weight and the sample label, the attention weight makes the model pay more attention to image samples that are not easy to classify, so the attention weight assigned to face image 1 is obviously different from that assigned to face image 2.
The size of the attention weight output by the attention layer of the classification model obtained through training can play a role in distinguishing whether the input image is easy to classify or not, and is equivalent to intuitively representing the quality of the image. For example, the face image 1 belongs to an image with low image quality, the face image 2 belongs to an image with high image quality, and the face image 1 and the face image 2 can be effectively distinguished by the attention weight.
Therefore, no samples specially labeled with image quality are required: a large number of existing category-labeled samples, such as the above face images labeled with identified person categories, are used to train the classification model, and the attention layer is changed into the model output layer to obtain a recognition model that recognizes image quality according to the attention weight. This greatly reduces the acquisition cost of the recognition model, and the recognition model can conveniently be used to improve the image quality of input images in business scenarios that apply image recognition technology.
Fig. 2 is a flowchart of a method for determining an image quality recognition model according to an embodiment of the present application, where in this embodiment, a server is used as the foregoing computer device for description.
The method comprises the following steps:
s201: obtaining an image sample comprising a sample tag, the sample tag being used to identify an actual classification category of the image sample;
the application does not limit the classification scene corresponding to the classification model, and the classification scene can be determined according to the actual classification category serving as the sample label, for example, when the image sample is a face image, if the corresponding actual classification category includes a person category, the corresponding classification scene can be classified as a person, and if the corresponding actual classification category includes a gender, the corresponding classification scene can be classified as a gender. In addition, various other classification scenes that are irrelevant to the image quality, such as classification of difficulty in classification, etc., may be included.
It is emphasized that the sample label of the image sample does not directly represent the image quality of the image sample, but rather the identification of the classification category associated with the image content in the image sample.
The content included in the image sample is not limited, and the image sample may be a face image, a landscape image, a movie image, or the like, for example.
In a possible implementation manner, the image sample is a face image sample, and the sample tag is used for identifying an actual user identifier corresponding to a face in the face image sample.
The face image sample includes image content related to a face, and the image content is labeled with a corresponding actual user identifier. The actual user identifier is used to identify a classification category related to the face included in the sample, for example, a specific personal identity such as a certain star or scholar, a specific appearance characteristic such as long hair or a bald head, or a human attribute such as race or gender.
Accordingly, the initial classification model can implement image recognition under a plurality of classification categories. When the image sample is a face image sample, the plurality of classification categories may be determined based on the range covered by the actual user identifiers of the face image samples; that is, the plurality of classification categories may be formed from the range to which the actual user identifiers relate.
For example, when the actual user identifier identifies a specific person, the plurality of classification categories may be determined as star a, star b and scholar c based on the different person identifiers; when the actual user identifier identifies gender, the plurality of classification categories may be determined as male and female.
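As an illustration of how such sample labels can be organized for training, the actual user identifiers can simply be mapped to classification category indices; the file names, identifiers and data structure below are assumptions made only for this sketch, not specified by the application:

```python
# Hypothetical face image samples labeled with actual user identifiers (person categories).
samples = [
    ("face_001.jpg", "star_a"),
    ("face_002.jpg", "star_b"),
    ("face_003.jpg", "scholar_c"),
]

# The plurality of classification categories is formed from the range of actual user identifiers.
categories = sorted({label for _, label in samples})           # ['scholar_c', 'star_a', 'star_b']
category_index = {name: i for i, name in enumerate(categories)}

# Each sample tag becomes the class index used to supervise the initial classification model.
labeled = [(path, category_index[label]) for path, label in samples]
print(labeled)  # [('face_001.jpg', 1), ('face_002.jpg', 2), ('face_003.jpg', 0)]
```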
S202: the image samples are input into an initial classification model by which probability distributions under a plurality of classification categories are determined.
How many classification categories the initial classification model recognizes can be decided based on the actual application scene or on the existing sample labels of the image samples. For example, in the case where the sample tag can only identify whether an image belongs to a certain classification category, the recognition performed by the initial classification model may be a binary classification over two categories, i.e., the classification category and not the classification category. In the case where the sample tag may identify one of a plurality of classification categories, the recognition performed by the initial classification model may be a multi-classification case (which may also include binary classification).
The initial classification model is a classification model: according to the image features carried in an input image sample, it recognizes, through artificial intelligence, information related to the plurality of classification categories, identifies based on this information which of the plurality of classification categories the image sample may belong to, and outputs a quantized probability distribution. The probability values included in the probability distribution correspond one-to-one to the plurality of classification categories and reflect the probability that the image sample belongs to the corresponding classification category.
The probability distribution represents the probabilities that the image sample belongs to each of the plurality of classification categories; the sum of these probabilities may or may not be 1, which is not limited in this application. The classification difficulty of the image sample can be expressed by the differences among, and magnitudes of, the probabilities in the probability distribution, and the classification difficulty directly influences the loss function of the image sample, i.e., the difference between the probability distribution and the actual classification category.
The classification difficulty represented by the probability distribution is directly reflected in the loss function corresponding to the image sample. When the classification difficulty of an image sample is high, the initial classification model cannot easily obtain the correct classification category, that is, the predicted category differs greatly from the actual classification category identified by the sample label, so a larger loss is obtained. When the classification difficulty of an image sample is low, the initial classification model easily obtains the correct classification category, that is, the predicted category differs little from the actual classification category identified by the sample label, so a smaller loss is obtained.
As shown in fig. 3, the circular points and the triangular points represent the positions of the feature vectors of image samples in the same vector space, and the middle straight line is the feature vector boundary between two different classification categories of the initial classification model.
An image sample whose feature vector lies toward the upper left corner is more likely to be identified by the initial classification model as one classification category, and an image sample whose feature vector lies toward the lower right corner as the other. Image samples whose feature vectors are far from the feature vector boundary belong to the samples that are easy to classify: the features related to the classification category are clearly reflected in the image sample, and the image quality is generally relatively good.
Accordingly, the closer the position of a feature vector is to the feature vector boundary, the more difficult it is for the initial classification model to identify the image sample as classification category 1 or classification category 2. Image samples whose feature vectors are close to the feature vector boundary belong to the samples that are not easy to classify: the features related to the classification category are blurred in the image sample, and the image quality is generally relatively poor.
In the initial stage of training, the magnitude of the loss cannot intuitively represent the image quality of an image sample, because the initial classification model has not yet learned enough classification knowledge; for example, when training has just started, an erroneous classification result may be obtained even for an image sample with good image quality, resulting in a larger loss.
Accordingly, although in the initial training period the loss of a low-quality image sample does not differ greatly from that of a high-quality image sample, a low-quality image sample has fewer classification-related features; even as the initial classification model learns classification knowledge during training, the low-quality image sample remains difficult to classify accurately, its loss drops slowly, and its feature vector stays near the feature vector boundary.
Therefore, the probability distribution of one image sample can effectively show the classification difficulty of the image sample.
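To make the link between classification difficulty and the loss concrete, the following sketch computes a per-sample cross-entropy loss (cross-entropy being the classification loss used later in this description) for a sample near the feature vector boundary and a sample far from it; the numbers reuse the hypothetical distributions from the earlier face image example and are assumptions of this illustration:

```python
import math

def cross_entropy(probability_distribution, actual_category_index):
    """Per-sample cross-entropy between the predicted distribution and the actual classification category."""
    return -math.log(probability_distribution[actual_category_index])

# Assume the actual classification category of both samples is category index 0.
print(round(cross_entropy([0.51, 0.49], 0), 3))  # 0.673 -> near the boundary, hard sample, larger loss
print(round(cross_entropy([0.91, 0.09], 0), 3))  # 0.094 -> far from the boundary, easy sample, smaller loss
```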
S203: and determining the attention weight corresponding to the image sample through the attention layer of the initial classification model according to the probability distribution.
For the purpose of classification training, an attention layer is arranged in the initial classification model so that the attention mechanism provided by the attention layer focuses on difficult samples, i.e., image samples that are not easy to classify. Since the probability distribution reflects the classification difficulty of an image sample, the attention weight output by the attention layer for the probability distribution of an image sample can be associated with the classification difficulty of that image sample; that is, the attention weight is used to identify the classification difficulty of the image sample.
The attention layer may employ a self-attention mechanism (self-attention), which includes two linear layers and activation functions, and is expressed mathematically as follows:
hidden = W1 * tanh(W2 * embedding)
attention = sigmoid(hidden)
where hidden is the output of the hidden layer in the attention layer, W1 and W2 are model parameters adjusted through model training, embedding is the vector corresponding to the probability distribution, and tanh and sigmoid are activation functions.
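The following is a minimal PyTorch-style sketch of an attention layer of this form; the use of PyTorch, the hidden dimension and the presence of bias terms are assumptions for illustration, and only the structure (two linear layers with tanh and sigmoid activations acting on the probability-distribution vector) follows the expression above:

```python
import torch
import torch.nn as nn

class AttentionLayer(nn.Module):
    """hidden = W1 * tanh(W2 * embedding); attention = sigmoid(hidden)."""

    def __init__(self, num_classes: int, hidden_dim: int = 64):
        super().__init__()
        self.w2 = nn.Linear(num_classes, hidden_dim)  # W2
        self.w1 = nn.Linear(hidden_dim, 1)            # W1

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        # embedding: the vector corresponding to the probability distribution, shape (batch, num_classes)
        hidden = self.w1(torch.tanh(self.w2(embedding)))
        return torch.sigmoid(hidden)                  # one attention weight per image sample, in (0, 1)
```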
Since the purpose of the attention weight is to direct the initial classification model to focus on difficult samples, the attention weight may optionally grow with the classification difficulty of the image sample: for an image sample that is difficult to classify correctly, the corresponding attention weight is larger, which increases the extent to which that image sample affects the adjustment of the model parameters during training. In one possible implementation, the greater the classification difficulty, the lower the image quality.
S204: and determining a loss function corresponding to the image sample according to the attention weight and the difference between the probability distribution and the actual classification category.
S205: and carrying out model training on the initial classification model through the loss function to obtain a classification model.
The initial classification model is guided by the attention weight to pay more attention to the image sample with higher classification difficulty, namely the difficult sample, and pay less attention to the image sample with lower classification difficulty, namely the easy sample, when in training, so that the classification capability of the initial classification model is improved, the overfitting on the easy sample is reduced, and the like.
Fig. 4a shows the model structure of an initial classification model in the related art: for an input image sample, a probability distribution related to the classification purpose is obtained through the feature extraction layer, and a corresponding loss function is then determined from the probability distribution and the sample label.
As shown in fig. 4b, compared with fig. 4a, the model structure of the initial classification model of the embodiment of the present application adds an attention layer to the original model structure. The input of the attention layer is a vector (embedding) including the probability distribution, and its output is the attention weight; the attention weight and the loss function in fig. 4a together produce the loss function of the embodiment of the present application, so that the initial classification model of the present application is guided by the attention weight to pay more attention to image samples with greater classification difficulty.
S206: and changing the attention layer of the classification model into a model output layer to obtain an identification model for identifying the image quality according to the attention weight.
Through the above training for the classification purpose, the obtained classification model can classify images effectively and accurately under the plurality of classification categories. At the same time, the attention weight output by the attention layer of the classification model can reasonably distinguish images of different classification difficulties, and based on the association between classification difficulty and image quality, images of high image quality and images of low image quality can be distinguished accurately through the attention weight.
In one possible implementation, the recognition model may be obtained by removing the model output layer of the original classification model. S206 includes:
s2061: and deleting the classification layer serving as a model output layer in the classification model.
S2062: and taking the attention layer of the classification model as a model output layer to obtain a recognition model for recognizing the image quality according to the attention weight.
The original output layer of the trained classification model, i.e., the classification layer, is deleted, and the attention layer is then used as the model output layer of the newly generated model, i.e., the recognition model. The recognition model can thus output, for an input image to be recognized, the corresponding attention weight through the attention layer serving as the model output layer, and accurately recognize the image quality of the image to be recognized through the attention weight, which is associated with the classification difficulty and hence with the image quality.
Thus, the original output layer of the classification model can be replaced by the attention layer, and the recognition model can be obtained. I.e. the output of the recognition model is the attention weight output by the attention layer. The recognition model has the function of recognizing the image quality of the input image through the distinguishing capability of the attention weight on the image quality.
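A minimal sketch of how the model output layer can be swapped in code is shown below. It assumes a PyTorch-style model in which the classification layer still computes the probability distribution internally (only its role as the model output is removed), which is one reading of S2061/S2062; the backbone, feature dimension and layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ClassificationModel(nn.Module):
    """Trained classification model of FIG. 4b: feature extraction layer,
    classification layer (model output layer) and attention layer."""

    def __init__(self, backbone: nn.Module, feature_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                               # feature extraction layer, outputs (batch, feature_dim)
        self.classifier = nn.Linear(feature_dim, num_classes)  # classification layer (model output layer)
        self.attention = nn.Sequential(                        # attention layer
            nn.Linear(num_classes, 64), nn.Tanh(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        probs = torch.softmax(self.classifier(self.backbone(x)), dim=1)  # probability distribution
        return probs, self.attention(probs)

class RecognitionModel(nn.Module):
    """Recognition model: the classification layer no longer serves as the model
    output layer; the attention layer is the model output layer instead."""

    def __init__(self, trained: ClassificationModel):
        super().__init__()
        self.backbone = trained.backbone
        self.classifier = trained.classifier   # still used internally to produce the probability distribution
        self.attention = trained.attention     # now serves as the model output layer

    def forward(self, x):
        probs = torch.softmax(self.classifier(self.backbone(x)), dim=1)
        return self.attention(probs)            # output: the attention weight identifying image quality
```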
In one possible implementation, embodiments of the present application provide a way to perform image quality recognition by a recognition model. The method comprises the following steps:
s11: and acquiring an image to be identified.
S12: and determining the attention weight corresponding to the image to be identified according to the identification model.
S13: and determining an identification result of the image quality of the image to be identified based on the association relation between the attention weight and the classification difficulty.
As described above, the attention weight output by the recognition model for an image to be recognized can represent the classification difficulty of the image to be recognized, and the classification difficulty is correlated with the image quality: an image that is easy to classify has higher image quality than an image that is difficult to classify, i.e., image quality and classification difficulty are inversely related, and the greater the classification difficulty, the lower the image quality.
If a larger attention weight identifies a higher classification difficulty, then according to the recognition model, the smaller the attention weight output for an image to be recognized, the higher its image quality, so images with better image quality can be accurately identified from the images to be recognized based on the attention weight. For example, when the attention weight is greater than a first threshold, the image to be recognized is identified as having high classification difficulty and correspondingly low image quality, which does not meet the processing requirements of the subsequent image recognition technology; when the attention weight is smaller than a second threshold (the second threshold being smaller than the first threshold), the image to be recognized is identified as having low classification difficulty and correspondingly high image quality, which meets the processing requirements of the subsequent image recognition technology.
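A minimal sketch of S11-S13 under the convention used in this example (a larger attention weight identifies higher classification difficulty and thus lower image quality); the concrete threshold values are assumptions chosen only for illustration:

```python
def recognize_image_quality(attention_weight: float,
                            first_threshold: float = 0.8,
                            second_threshold: float = 0.3) -> str:
    """Map the attention weight output by the recognition model to an image quality result."""
    if attention_weight > first_threshold:
        return "low image quality: does not meet the processing requirements of subsequent image recognition"
    if attention_weight < second_threshold:
        return "high image quality: meets the processing requirements of subsequent image recognition"
    return "intermediate image quality"

print(recognize_image_quality(0.9))  # low image quality
print(recognize_image_quality(0.1))  # high image quality
```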
It can be seen that, in an image classification scene, an image whose classification category is easy to determine generally has better image quality, while an image whose classification category is difficult to determine generally has poorer image quality. Therefore, an image sample labeled with an actual classification category is obtained, and a probability distribution of the image sample over a plurality of classification categories is determined by the initial classification model; the probability distribution directly shows whether the image sample is easy to classify. An attention layer is arranged in the initial classification model and generates, based on the probability distribution, an attention weight corresponding to the classification difficulty of the image sample. When the initial classification model is trained with a loss function determined by the attention weight and the sample label, the attention weight makes the model pay more attention to image samples that are difficult to classify, so that in the trained classification model the magnitude of the attention weight output by the attention layer can distinguish whether an input image is easy or difficult to classify, which is equivalent to intuitively representing the quality of the image. Therefore, a recognition model that recognizes image quality according to the attention weight can be obtained without specially labeling image quality samples: a large number of samples with existing category labels are used to train the classification model, and the attention layer is changed into the model output layer. This greatly reduces the acquisition cost of the recognition model, and the recognition model can conveniently be used to improve the image quality of input images in business scenarios that apply image recognition technology.
Multiple rounds of training are performed during model training of the initial classification model, one round being performed with one batch of image samples.
So in one possible implementation, S201 includes: acquiring a sample batch including a target number of image samples from an image sample set. The image sample set includes a large number of collected image samples labeled with actual classification categories; based on the number of samples required for one round of training, i.e., the target number, image samples can be selected from the image sample set to obtain a corresponding sample batch.
In the process of training the initial classification model with one sample batch, the initial classification model classifies each image sample in the sample batch, and a corresponding probability distribution and attention weight are generated for each image sample during recognition.
So in one possible implementation, S203 includes: and determining the attention weights respectively corresponding to the image samples of the target number through the attention layer of the initial classification model according to the probability distribution respectively corresponding to the image samples of the target number.
The sum of the attention weights respectively corresponding to the target number of image samples is a constant, which is expressed mathematically as follows:
attention_1 + attention_2 + … + attention_N = const
where const is a constant and N is the target number of image samples included in the sample batch.
That is, in order to improve how well the attention weights discriminate classification difficulty, the total amount of the attention weights is controlled for one sample batch, so that the final value of each attention weight in the batch is measured against the attention weights of the whole batch. This improves the stability of the attention weights for images of similar quality while preserving the discrimination between images whose quality differs greatly.
Model training for an initial classification model generally requires multiple training iterations, each of which can be performed for a batch of image samples. The number of image samples used in different batches may be the same or different.
Optionally, for S201: and sequentially acquiring a plurality of sample batches from the image sample set according to the target quantity respectively corresponding to the different sample batches.
In the process of determining, through the attention layer of the initial classification model, the attention weights respectively corresponding to the image samples in the plurality of sample batches, the sums of the attention weights of the respective sample batches are the same. That is, for different sample batches, the corresponding attention weights are generated and controlled within the same numerical range during model training, which further improves the ability of the attention weights of the initial classification model to distinguish classification difficulties.
Since the number of image samples included in different sample batches may be different, in order to reasonably represent the attention weights of the different sample batches, normalization processing needs to be performed in a unified numerical space, and even if the number of image samples included in different sample batches is the same, subsequent calculation can be facilitated through normalization processing.
So in one possible implementation, S203 includes:
determining an initial attention value corresponding to the image sample through an attention layer of the initial classification model according to the probability distribution;
and normalizing the initial attention value according to the target number of the image samples in the sample batch where the image samples are located and the constant to obtain the attention weight.
The normalization is expressed mathematically as follows:
attention = const * attention / Σ attention
where batch_size is the number of image samples in the sample batch over which Σ attention, the sum of the attention values, is computed; the attention on the left of the equation is the attention weight, and the attention on the right of the equation is the initial attention value.
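A minimal sketch of one plausible form of this normalization, under the assumption that the initial attention values of a sample batch are rescaled so that their sum equals the constant const (the text above only requires the batch totals to be held constant):

```python
import torch

def normalize_attention(initial_attention: torch.Tensor, const: float = 1.0) -> torch.Tensor:
    """Rescale the initial attention values of one sample batch so that the
    attention weights of the batch_size image samples sum to const."""
    batch_size = initial_attention.shape[0]   # number of image samples in the sample batch
    return const * initial_attention / initial_attention.sum()

weights = normalize_attention(torch.tensor([0.2, 0.6, 0.7, 0.9]), const=1.0)
print(weights, weights.sum())  # the sum equals const regardless of the batch contents
```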
Next, how the loss function is determined from the attention weight and the probability distribution is explained. For S204, the method includes:
an initial loss function is determined based on the difference of the probability distribution and the actual classification category.
And taking the attention weight as the weight of the initial loss function, and determining the loss function corresponding to the image sample.
In the related art, the loss function of a classification model may be expressed as:
loss = cross_entropy(log_softmax(embedding), labels)
In this application, for the difference between the prediction result of an image sample (determined by the probability distribution) and its actual classification category, the attention weight corresponding to the image sample is used to guide the influence of that difference on model training. The specific mathematical expressions are given below.
In order to add an evaluation of the image quality of the image samples to the classification training task, the application uses an adaptively learned parameter to learn the contribution of the current image sample to the overall loss function, so that the loss of each image sample is adaptively weighted and image samples of different image quality contribute differently to the overall loss. In the stage where the recognition model recognizes image quality, the adaptively learned parameter value (i.e., the attention weight) can then be used directly as the image quality score (i.e., the quality level) of the current image to be recognized.
Based on this, the loss function for the initial classification model in the embodiment of the present application may be:
loss_cls = log_softmax(embedding)
att = self_attention(embedding)
loss_att = loss_cls * att
loss = cross_entropy(loss_att, labels)
Here, log_softmax calculates the log-softmax value (loss_cls) of the embedding; meanwhile, the embedding is fed into the attention layer to obtain the attention weight. The attention weight is then used to weight the original value to obtain the weighted value (loss_att), and finally the cross entropy (cross_entropy) of loss_att and the sample labels (labels) is calculated to obtain the final loss function.
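A minimal PyTorch-style sketch of this loss computation, interpreting the expressions above as weighting each image sample's cross-entropy loss (the initial loss determined from the difference between the probability distribution and the actual classification category) by its attention weight, consistent with S204; this interpretation and the PyTorch API choices are assumptions of the sketch rather than a definitive reading:

```python
import torch
import torch.nn.functional as F

def attention_weighted_loss(logits: torch.Tensor,
                            labels: torch.Tensor,
                            attention: torch.Tensor) -> torch.Tensor:
    """logits: (batch, num_classes) raw scores; labels: (batch,) actual categories;
    attention: (batch, 1) attention weights from the attention layer."""
    # Initial per-sample loss from the difference between the probability distribution and the actual category.
    per_sample_loss = F.cross_entropy(logits, labels, reduction="none")   # shape (batch,)
    # Use the attention weight as the weight of the initial loss function.
    return (attention.squeeze(1) * per_sample_loss).mean()

logits = torch.tensor([[0.1, 0.0], [2.5, -2.5]])
labels = torch.tensor([0, 0])
attention = torch.tensor([[0.9], [0.2]])   # hard sample weighted more than easy sample
print(attention_weighted_loss(logits, labels, attention))
```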
It should be noted that, although the embodiment of the present application provides a scheme for training a classification model, the actual purpose is to obtain an identification model for performing image quality identification by means of training of the classification model, so that the attention layer, which is the model output layer of the identification model, is the focus of the training.
In general, the value range of the attention weight is between 0 and 1, i.e., any attention weight output through the attention layer lies within this range.
When the image quality of the image samples in one sample batch is too similar, the attention weights of the image samples are also too close to each other. For example, in a sample batch whose images all have good image quality, the attention weights output by the attention layer are close to 0; such small attention weights make the overall loss too small, which provides little help for training the attention layer, makes it difficult to learn to distinguish different classification difficulties, and is not conducive to stable training of the attention layer.
In one possible implementation, the method further includes: the numerical range of the attention weights determined by the attention layer is pre-adjusted to increase the upper and lower limits of the numerical range.
The upper limit and the lower limit of the numerical range are increased, for example from 0-1 to 0.5-1.5: raising the upper limit increases the influence of image samples with lower image quality on the loss during model training, and raising the lower limit prevents the loss corresponding to image samples with better image quality from becoming so small that it is unfavorable for training the attention layer.
Therefore, the training effect on the attention layer in the initial classification model can be improved by improving the upper limit and the lower limit of the numerical range of the attention weight, so that the degree of distinction of the attention weight on different classification difficulties is enhanced, and the training stability is ensured.
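A minimal sketch of one way to pre-adjust the numerical range of the attention weight, assuming the sigmoid output is simply shifted so that the range 0-1 becomes 0.5-1.5 as in the example above; the exact adjustment is an assumption, since the text only requires raising the upper and lower limits:

```python
import torch

def shifted_attention(hidden: torch.Tensor, shift: float = 0.5) -> torch.Tensor:
    """Attention weight in the range (shift, 1 + shift) instead of (0, 1)."""
    return torch.sigmoid(hidden) + shift

print(shifted_attention(torch.tensor([-4.0, 0.0, 4.0])))  # values lie in (0.5, 1.5)
```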
By combining the foregoing embodiments, it can be seen that, by the method for determining an image quality recognition model provided in the embodiments of the present application, at least the following features may be achieved:
First, unsupervised: no additional image quality labeling is required for the image samples; existing image samples labeled with actual classification categories are used.
Second, the effect is obvious: pictures with low image quality, such as blurred, dim or occluded pictures, can be effectively filtered out.
Third, for face images, screening the face data effectively improves the performance and efficiency of video face recognition, video face clustering and the like.
Fourth, the trained classification model can be migrated to other targets and tasks, such as object classification.
Fifth, existing mobile-terminal backbone network structures such as MobileNetV2 and MobileNetV3 can be conveniently supported; only the last classification layer (i.e., the model output layer) of the classification model needs to be replaced.
For face images, the recognition model provided by the application can effectively filter out face images with low image quality, such as blurred, side-face, dim or occluded images. In the recognition result shown in fig. 5, the image quality of the images to be recognized can be effectively distinguished by the recognition model provided by the application; for example, the image quality ranking shown in fig. 5 is obtained, in which the image quality becomes progressively worse from left to right.
Quantitative analysis: a specially trained quality evaluation model is used to filter the test data set and remove images with poor image quality, and image quality recognition is then performed on the same test data set by the recognition model. The recognition accuracy is shown below, which indicates that the recognition model provided by the embodiment of the present application recognizes image quality with high accuracy:

| All test data | Top 50% test data | Top 20% test data |
| --- | --- | --- |
| 86.7% | 93.7% | 95.1% |
On the basis of the foregoing embodiments corresponding to fig. 1 to fig. 5, fig. 6 is a structural diagram of a device for determining an image quality recognition model. The device 600 for determining an image quality recognition model includes an obtaining unit 601, a determining unit 602, a training unit 603 and a modifying unit 604:
the acquiring unit 601 is configured to acquire an image sample including a sample tag, where the sample tag is used to identify an actual classification category of the image sample;
the determining unit 602 is configured to input the image sample into an initial classification model, and determine probability distribution under a plurality of classification categories through the initial classification model;
the determining unit 602 is further configured to determine, according to the probability distribution, an attention weight corresponding to the image sample through an attention layer of the initial classification model, where the attention weight is used to identify a classification difficulty of the image sample;
the determining unit 602 is further configured to determine a loss function corresponding to the image sample according to the attention weight and a difference between the probability distribution and the actual classification category;
the training unit 603 is configured to perform model training on the initial classification model through the loss function to obtain a classification model;
The modifying unit 604 is configured to modify an attention layer of the classification model into a model output layer, so as to obtain an identification model for identifying image quality according to the attention weight.
In a possible implementation manner, the acquiring unit is further configured to acquire an image to be identified;
the determining unit is further used for determining the attention weight corresponding to the image to be identified according to the identification model;
the determining unit is further configured to determine a recognition result for the image quality of the image to be recognized based on the association relationship between the attention weight and the classification difficulty, where the greater the classification difficulty is, the lower the image quality is.
In a possible implementation manner, the obtaining unit is further configured to:
obtaining a sample batch including a target number of image samples from the image sample set;
the determining, according to the probability distribution, the attention weight corresponding to the image sample through the attention layer of the initial classification model includes:
and determining the attention weights respectively corresponding to the image samples of the target number through the attention layer of the initial classification model according to the probability distribution respectively corresponding to the image samples of the target number, wherein the sum of the attention weights respectively corresponding to the image samples of the target number is a constant.
In a possible implementation manner, the obtaining unit is further configured to:
sequentially acquiring a plurality of sample batches from the image sample set according to the target quantity respectively corresponding to the different sample batches;
in the process of determining, through the attention layer of the initial classification model, the attention weights respectively corresponding to the image samples in the plurality of sample batches, the sums of the attention weights respectively corresponding to the plurality of sample batches are the same.
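One plausible way (an assumption, not the disclosed formula) to keep the per-batch sum of attention weights at the same constant even when batches contain different target numbers of image samples is a batch-wise softmax scaled by that constant:

```python
def batch_attention_weights(initial_values: torch.Tensor, batch_sum: float = 1.0) -> torch.Tensor:
    """initial_values: raw attention values of one sample batch, shape (N,).
    Returns attention weights that always sum to `batch_sum`, whatever N is."""
    return batch_sum * F.softmax(initial_values, dim=0)

# Batches of different sizes still yield the same total attention weight:
for n in (16, 32, 64):
    w = batch_attention_weights(torch.randn(n))
    assert torch.isclose(w.sum(), torch.tensor(1.0))
```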
In a possible implementation, the determining unit is further configured to:
determining an initial attention value corresponding to the image sample through an attention layer of the initial classification model according to the probability distribution;
and normalizing the initial attention value according to the constant and the target number of image samples in the sample batch to which the image sample belongs, to obtain the attention weight.
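Written out, one plausible form of this normalization (an assumption, not the disclosed formula) for a sample batch containing the target number N of image samples, with initial attention values a_1, ..., a_N and constant C, is w_i = C · exp(a_i) / (exp(a_1) + ... + exp(a_N)), so that w_1 + ... + w_N = C. The target number N enters through the softmax taken over the whole batch, and the constant C fixes the batch total, which matches the batch-wise softmax used in the sketch above.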
In a possible implementation manner, the apparatus further includes an adjusting unit, where the adjusting unit is configured to pre-adjust a numerical range of the attention weight determined by the attention layer to increase an upper limit and a lower limit of the numerical range.
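How the numerical range is widened is not detailed here; one simple reading is an affine rescaling of the attention weights (or of the raw values before normalization). The scale and shift below are illustrative assumptions only:

```python
def adjust_attention_range(weights: torch.Tensor, scale: float = 2.0, shift: float = 0.1) -> torch.Tensor:
    """Affine pre-adjustment that widens and lifts the numerical range of the
    attention weights; `scale` and `shift` are not taken from the disclosure."""
    return weights * scale + shift
```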
In a possible implementation, the determining unit is further configured to:
determining an initial loss function based on the difference between the probability distribution and the actual classification category;
and taking the attention weight as the weight of the initial loss function, and determining the loss function corresponding to the image sample.
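Assuming a cross-entropy initial loss (the specific loss is not fixed by this description), the weighting could be sketched as:

```python
def attention_weighted_loss(logits: torch.Tensor,
                            labels: torch.Tensor,
                            attn_weights: torch.Tensor) -> torch.Tensor:
    """Per-sample initial loss (difference between the predicted distribution and the
    actual category) weighted by each image sample's attention weight."""
    initial_loss = F.cross_entropy(logits, labels, reduction="none")  # shape (N,)
    return (attn_weights * initial_loss).mean()
```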
In a possible implementation, the modifying unit is further configured to:
deleting a classification layer serving as a model output layer in the classification model;
and taking the attention layer of the classification model as a model output layer to obtain a recognition model for recognizing the image quality according to the attention weight.
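Under the module layout assumed earlier, the change could be sketched as below. One interpretive assumption is made: the "deleted classification layer" is read as the final class-decision output, while the linear layer producing the class distribution is kept because the attention layer consumes that distribution.

```python
class ImageQualityRecognitionModel(nn.Module):
    """Recognition model obtained from the trained classification model: the
    classification output is dropped and the attention layer becomes the output layer."""
    def __init__(self, trained: InitialClassificationModel):
        super().__init__()
        self.backbone = trained.backbone
        self.classifier = trained.classifier   # kept: it produces the distribution the attention layer needs
        self.attention = trained.attention     # now serves as the model output layer

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        probs = F.softmax(self.classifier(self.backbone(images)), dim=1)
        return self.attention(probs).squeeze(1)   # attention weight, read as the image quality score
```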
In one possible implementation, the attention weight is inversely related to the classification difficulty of the image sample.
In a possible implementation manner, the image sample is a face image sample, and the sample tag is used for identifying an actual user identifier corresponding to a face in the face image sample.
In one possible implementation, the plurality of classification categories are determined according to the actual user identification.
It can be seen that, in an image classification scene, an image whose classification category is easy to determine generally has better image quality, while an image whose classification category is difficult to determine generally has poor image quality. An image sample labeled with its actual classification category is therefore obtained, and the probability distribution of the image sample under a plurality of classification categories is determined through the initial classification model; this probability distribution directly reflects whether the image sample is easy to classify. An attention layer is arranged in the initial classification model and generates, based on the probability distribution, an attention weight corresponding to the classification difficulty of the image sample. When the initial classification model is trained through the loss function determined by the attention weight and the sample label, the attention weight enables the model to pay more attention to the image samples that are difficult to classify. In the trained classification model, the magnitude of the attention weight output by the attention layer can thus distinguish whether an input image is easy or difficult to classify, which is equivalent to an intuitive representation of image quality. Therefore, without specially labeling image quality samples, a recognition model that recognizes image quality according to the attention weight can be obtained by training the classification model with a large number of samples that already carry category labels and then changing the attention layer into the model output layer. This greatly reduces the acquisition cost of the recognition model and makes it convenient to use the recognition model to improve the image quality of input images in business scenes that apply image recognition technology.
An embodiment of the present application further provides a computer device, namely the computer device introduced above, which may be a terminal device or a server, and the device for determining the image quality recognition model may be configured in the computer device. The computer device is described below with reference to the accompanying drawings.
If the computer device is a terminal device, please refer to fig. 7. An embodiment of the present application provides a terminal device, and a mobile phone is taken as an example of the terminal device:
fig. 7 is a block diagram showing a part of the structure of a mobile phone related to a terminal device provided in an embodiment of the present application. Referring to fig. 7, the mobile phone includes: radio frequency (RF) circuitry 1410, memory 1420, input unit 1430, display unit 1440, sensor 1450, audio circuitry 1460, wireless fidelity (Wireless Fidelity, WiFi) module 1470, processor 1480, and power supply 1490. It will be appreciated by those skilled in the art that the structure shown in fig. 7 does not constitute a limitation on the mobile phone, which may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The following describes the components of the mobile phone in detail with reference to fig. 7:
The RF circuit 1410 may be used for receiving and transmitting signals during messaging or a call; in particular, after downlink information of a base station is received, the downlink information is handed to the processor 1480 for processing, and uplink data is sent to the base station. Typically, the RF circuitry 1410 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (Low Noise Amplifier, LNA for short), a duplexer, and the like. In addition, the RF circuitry 1410 may also communicate with networks and other devices through wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communications (Global System of Mobile communication, GSM for short), general packet radio service (General Packet Radio Service, GPRS for short), code division multiple access (Code Division Multiple Access, CDMA for short), wideband code division multiple access (Wideband Code Division Multiple Access, WCDMA for short), long term evolution (Long Term Evolution, LTE for short), email, short message service (Short Messaging Service, SMS for short), and the like.
The memory 1420 may be used to store software programs and modules, and the processor 1480 performs various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1420. The memory 1420 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and application programs required for at least one function (such as a sound playing function and an image playing function), and the data storage area may store data created according to the use of the mobile phone (such as audio data and a phonebook). In addition, the memory 1420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input unit 1430 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile phone. In particular, the input unit 1430 may include a touch panel 1431 and other input devices 1432. The touch panel 1431, also referred to as a touch screen, may collect touch operations by the user on or near it (e.g., operations performed by the user on or near the touch panel 1431 with a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connection device according to a preset program. Optionally, the touch panel 1431 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends them to the processor 1480, and can also receive commands from the processor 1480 and execute them. Further, the touch panel 1431 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1431, the input unit 1430 may further include other input devices 1432. In particular, the other input devices 1432 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display unit 1440 may be used to display information input by the user or information provided to the user, as well as various menus of the mobile phone. The display unit 1440 may include a display panel 1441; optionally, the display panel 1441 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), or the like. Further, the touch panel 1431 may overlay the display panel 1441; when the touch panel 1431 detects a touch operation on or near it, the touch operation is transferred to the processor 1480 to determine the type of the touch event, and the processor 1480 then provides a corresponding visual output on the display panel 1441 according to the type of the touch event. Although in fig. 7 the touch panel 1431 and the display panel 1441 are two separate components implementing the input and output functions of the mobile phone, in some embodiments the touch panel 1431 may be integrated with the display panel 1441 to implement the input and output functions of the mobile phone.
The mobile phone may also include at least one sensor 1450, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor may adjust the brightness of the display panel 1441 according to the brightness of ambient light, and the proximity sensor may turn off the display panel 1441 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications for recognizing the posture of the mobile phone (such as switching between landscape and portrait screens, related games, and magnetometer posture calibration) and in vibration-recognition-related functions (such as a pedometer and tapping); other sensors that may also be configured in the mobile phone, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described in detail herein.
WiFi is a short-range wireless transmission technology. Through the WiFi module 1470, the mobile phone can help the user send and receive e-mails, browse web pages, access streaming media, and the like, providing the user with wireless broadband Internet access. Although fig. 7 shows the WiFi module 1470, it can be understood that it is not an essential component of the mobile phone and may be omitted as needed without changing the essence of the invention.
The processor 1480 is a control center of the mobile phone, connects various parts of the entire mobile phone using various interfaces and lines, and performs various functions and processes data of the mobile phone by running or executing software programs and/or modules stored in the memory 1420, and calling data stored in the memory 1420, thereby performing overall monitoring of the mobile phone. In the alternative, processor 1480 may include one or more processing units; preferably, the processor 1480 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 1480.
The mobile phone further includes a power supply 1490 (e.g., a battery) for supplying power to the various components. Preferably, the power supply may be logically connected to the processor 1480 via a power management system, so that charging, discharging, and power consumption management are implemented through the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which will not be described herein.
In this embodiment, the processor 1480 included in the terminal apparatus also has the following functions:
obtaining an image sample comprising a sample tag, the sample tag being used to identify an actual classification category of the image sample;
inputting the image sample into an initial classification model, and determining probability distribution under a plurality of classification categories through the initial classification model;
determining an attention weight corresponding to the image sample through an attention layer of the initial classification model according to the probability distribution, wherein the attention weight is used for identifying the classification difficulty of the image sample;
determining a loss function corresponding to the image sample according to the attention weight and the difference between the probability distribution and the actual classification category;
model training is carried out on the initial classification model through the loss function, and a classification model is obtained;
and changing the attention layer of the classification model into a model output layer to obtain an identification model for identifying the image quality according to the attention weight.
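Putting the earlier sketches together, the overall flow such a processor could run might look like the loop below; the optimizer choice, batch sum, epoch count, and learning rate are assumptions, and the helper functions are the illustrative ones defined above.

```python
def build_recognition_model(model: InitialClassificationModel,
                            loader,                      # yields (images, labels) sample batches
                            epochs: int = 10,
                            lr: float = 1e-3) -> ImageQualityRecognitionModel:
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:
            logits, attn_values = model(images)                  # distribution + initial attention values
            attn_weights = batch_attention_weights(attn_values)  # per-batch sum kept constant
            loss = attention_weighted_loss(logits, labels, attn_weights)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    # change the attention layer into the model output layer
    return ImageQualityRecognitionModel(model)
```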
If the computer device is a server, as shown in fig. 8, fig. 8 is a block diagram of the server 1500 provided in the embodiment of the present application. The server 1500 may vary considerably in configuration or performance, and may include one or more central processing units (Central Processing Units, abbreviated as CPU) 1522 (e.g., one or more processors), a memory 1532, and one or more storage media 1530 (e.g., one or more mass storage devices) storing application programs 1542 or data 1544. The memory 1532 and the storage medium 1530 may be transitory or persistent storage. The program stored on the storage medium 1530 may include one or more modules (not shown), and each module may include a series of instruction operations on the server. Still further, the central processor 1522 may be configured to communicate with the storage medium 1530 and execute, on the server 1500, the series of instruction operations in the storage medium 1530.
The server 1500 may also include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input/output interfaces 1558, and/or one or more operating systems 1541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 8.
In addition, an embodiment of the present application further provides a storage medium for storing a computer program, where the computer program is used to execute the method provided by the above embodiments.
The present embodiments also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method provided by the above embodiments.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, where the above program may be stored in a computer readable storage medium, and when the program is executed, the program performs steps including the above method embodiments; and the aforementioned storage medium may be at least one of the following media: read-only Memory (ROM), RAM, magnetic disk or optical disk, etc.
It should be noted that the embodiments in this specification are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus and system embodiments are described relatively briefly because they are substantially similar to the method embodiments, and reference may be made to the description of the method embodiments for the relevant parts. The apparatus and system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without undue effort.
The foregoing is merely one specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered in the protection scope of the present application. Further combinations of the implementations provided in the above aspects may be made to provide further implementations. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (15)
1. A method of determining an image quality recognition model, the method comprising:
obtaining an image sample comprising a sample tag, the sample tag being used to identify an actual classification category of the image sample;
inputting the image sample into an initial classification model, and determining probability distribution under a plurality of classification categories through the initial classification model;
determining an attention weight corresponding to the image sample through an attention layer of the initial classification model according to the probability distribution, wherein the attention weight is used for identifying the classification difficulty of the image sample;
determining a loss function corresponding to the image sample according to the attention weight and the difference between the probability distribution and the actual classification category;
model training is carried out on the initial classification model through the loss function, and a classification model is obtained;
and changing the attention layer of the classification model into a model output layer to obtain an identification model for identifying the image quality according to the attention weight.
2. The method according to claim 1, wherein the method further comprises:
acquiring an image to be identified;
determining the attention weight corresponding to the image to be identified according to the identification model;
and determining an identification result of the image quality of the image to be identified based on the association relation between the attention weight and the classification difficulty, wherein the higher the classification difficulty is, the lower the image quality is.
3. The method of claim 1, wherein the acquiring the image sample including the sample tag comprises:
obtaining a sample batch including a target number of image samples from the image sample set;
the determining, according to the probability distribution, the attention weight corresponding to the image sample through the attention layer of the initial classification model includes:
and determining, according to the probability distributions respectively corresponding to the target number of image samples, the attention weights respectively corresponding to the target number of image samples through the attention layer of the initial classification model, wherein the sum of the attention weights respectively corresponding to the target number of image samples is a constant.
4. A method according to claim 3, wherein the obtaining a sample batch comprising a target number of image samples from the set of image samples comprises:
sequentially acquiring a plurality of sample batches from the image sample set according to the target numbers respectively corresponding to different sample batches;
in the process of determining, through the attention layer of the initial classification model, the attention weights respectively corresponding to the image samples in the plurality of sample batches, the sums of the attention weights respectively corresponding to the plurality of sample batches are the same.
5. A method according to claim 3, wherein said determining, from the probability distribution, the attention weight corresponding to the image sample by the attention layer of the initial classification model comprises:
determining an initial attention value corresponding to the image sample through an attention layer of the initial classification model according to the probability distribution;
and normalizing the initial attention value according to the constant and the target number of image samples in the sample batch to which the image sample belongs, to obtain the attention weight.
6. The method according to any one of claims 1-5, further comprising:
the numerical range of the attention weights determined by the attention layer is pre-adjusted to increase the upper and lower limits of the numerical range.
7. The method according to any one of claims 1-5, wherein said determining a loss function for said image sample based on said attention weight and a difference of said probability distribution from said actual classification category comprises:
determining an initial loss function based on the difference between the probability distribution and the actual classification category;
and taking the attention weight as the weight of the initial loss function, and determining the loss function corresponding to the image sample.
8. The method according to any one of claims 1-5, wherein the changing the attention layer of the classification model into a model output layer to obtain an identification model for identifying image quality according to the attention weight comprises:
deleting a classification layer serving as a model output layer in the classification model;
and taking the attention layer of the classification model as a model output layer to obtain a recognition model for recognizing the image quality according to the attention weight.
9. The method of any one of claims 1-5, wherein the attention weight is inversely related to the difficulty of classification of the image sample.
10. The method according to any one of claims 1-5, wherein the image sample is a face image sample, and the sample tag is used to identify an actual user identifier corresponding to a face in the face image sample.
11. The method of claim 10, wherein the plurality of classification categories are determined based on the actual user identification.
12. A determining device of an image quality recognition model, characterized in that the device comprises an acquiring unit, a determining unit, a training unit and a modifying unit:
the acquisition unit is used for acquiring an image sample comprising a sample label, wherein the sample label is used for identifying the actual classification category of the image sample;
the determining unit is used for inputting the image sample into an initial classification model, and determining probability distribution under a plurality of classification categories through the initial classification model;
the determining unit is further configured to determine, according to the probability distribution, an attention weight corresponding to the image sample through an attention layer of the initial classification model, where the attention weight is used to identify classification difficulty of the image sample;
the determining unit is further configured to determine a loss function corresponding to the image sample according to the attention weight and a difference between the probability distribution and the actual classification category;
the training unit is used for carrying out model training on the initial classification model through the loss function to obtain a classification model;
and the modifying unit is used for changing the attention layer of the classification model into a model output layer to obtain an identification model for identifying the image quality according to the attention weight.
13. A computer device, the computer device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any of claims 1-11 according to instructions in the program code.
14. A computer readable storage medium, characterized in that the computer readable storage medium is for storing a computer program for executing the method of any one of claims 1-11.
15. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any of claims 1-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111493694.3A CN116259083A (en) | 2021-12-08 | 2021-12-08 | Image quality recognition model determining method and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116259083A true CN116259083A (en) | 2023-06-13 |
Family
ID=86677888
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111493694.3A (CN116259083A, pending) | Image quality recognition model determining method and related device | 2021-12-08 | 2021-12-08 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116259083A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117036830A (en) * | 2023-10-07 | 2023-11-10 | 之江实验室 | Tumor classification model training method and device, storage medium and electronic equipment |
CN117036830B (en) * | 2023-10-07 | 2024-01-09 | 之江实验室 | Tumor classification model training method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40086925; Country of ref document: HK |
| SE01 | Entry into force of request for substantive examination | |