CN111814810A - Image recognition method and device, electronic equipment and storage medium - Google Patents

Image recognition method and device, electronic equipment and storage medium

Info

Publication number
CN111814810A
CN111814810A (application CN202010802994.4A)
Authority
CN
China
Prior art keywords
image
preset
category
recognition model
preset category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010802994.4A
Other languages
Chinese (zh)
Inventor
颜波 (Yan Bo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010802994.4A priority Critical patent/CN111814810A/en
Publication of CN111814810A publication Critical patent/CN111814810A/en
Priority to PCT/CN2021/099185 priority patent/WO2022033150A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/29 Graphical models, e.g. Bayesian networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The application discloses an image recognition method and apparatus, an electronic device, and a storage medium. The image recognition method includes: acquiring an image to be recognized; inputting the image to be recognized into a pre-trained image recognition model to obtain a probability for each preset category output by the model, where the image recognition model is trained on a plurality of positive sample images labeled with the preset categories and a plurality of negative sample images labeled as non-preset categories; and, when the probability for every preset category is smaller than a set threshold, outputting a result representing that the image to be recognized does not belong to any preset category. The method effectively prevents the image recognition model from misclassifying an image of a category unknown to the model as one of the existing categories, thereby improving the accuracy of image recognition.

Description

Image recognition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image recognition method and apparatus, an electronic device, and a storage medium.
Background
With rapid technological development, recognizing objects in images has attracted great research interest. Image recognition has been deployed in many application products, helping to solve everyday problems intelligently, and is used in fields such as security, public safety, and justice. Conventional image recognition is performed by a trained image recognition model; however, for categories that do not exist in the model, misrecognition can occur, making image recognition applications inaccurate.
Disclosure of Invention
In view of the foregoing, the present application provides an image recognition method, an image recognition apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides an image recognition method, the method including: acquiring an image to be recognized; inputting the image to be recognized into a pre-trained image recognition model to obtain a probability for each preset category output by the model, where the image recognition model is trained on a plurality of positive sample images labeled with the preset categories and a plurality of negative sample images labeled as non-preset categories; and, when the probability for every preset category is smaller than a set threshold, outputting a result representing that the image to be recognized does not belong to any preset category.
In a second aspect, an embodiment of the present application provides an image recognition apparatus, including an image acquisition module, an image input module, and a result output module. The image acquisition module is configured to acquire an image to be recognized. The image input module is configured to input the image to be recognized into a pre-trained image recognition model to obtain a probability for each preset category output by the model, where the image recognition model is trained on a plurality of positive sample images labeled with the preset categories and a plurality of negative sample images labeled as non-preset categories. The result output module is configured to output, when the probability for every preset category is smaller than a set threshold, a result representing that the image to be recognized does not belong to any preset category.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a memory; and one or more application programs, where the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the image recognition method provided in the first aspect above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a program code is stored in the computer-readable storage medium, and the program code may be called by a processor to execute the image recognition method provided in the first aspect.
According to the scheme provided by the application, an image to be recognized is acquired and input into a pre-trained image recognition model to obtain a probability for each preset category, the model having been trained on positive sample images labeled with the preset categories and negative sample images labeled as non-preset categories. When the probability for every preset category is smaller than the set threshold, a result representing that the image to be recognized does not belong to any preset category is output. In this way, even an image of a category that does not exist in the image recognition model can be handled: because every probability output by the model remains low, the image is determined not to belong to any preset category. This effectively avoids misrecognizing such an image as an existing category and improves the accuracy of image recognition.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 shows a flow diagram of an image recognition method according to one embodiment of the present application.
FIG. 2 shows a flow diagram of an image recognition method according to another embodiment of the present application.
Fig. 3 shows a flowchart of step S210 in an image recognition method according to another embodiment of the present application.
FIG. 4 shows a flow diagram of an image recognition method according to yet another embodiment of the present application.
Fig. 5 shows a flowchart of step S320 in an image recognition method according to another embodiment of the present application.
FIG. 6 shows a block diagram of an image recognition device according to an embodiment of the present application.
Fig. 7 is a block diagram of an electronic device for executing an image recognition method according to an embodiment of the present application.
Fig. 8 is a storage unit for storing or carrying a program code implementing an image recognition method according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
With the development of science and technology, image recognition based on artificial intelligence is widely applied across industries, so that people feel its influence in daily life. For example, image recognition may be used in security systems to identify and predict events; it may be used in screening objectionable images to reduce the cost of manual review; and it may be used in criminal investigations to screen images of criminals from a large number of images.
In the related art, most conventional image recognition methods target a specific field, and for objects in that field they achieve satisfactory results. In the real world, however, object categories cannot be exhausted; an image recognition model cannot cover all real-world categories. For an image of a category that does not exist in the model, conventional methods still classify the image into a known category with some probability, causing misrecognition. Image recognition is essentially a classification problem. Existing methods treat training as a closed-set problem, in which every training image belongs to one of the given categories, while practical application is an open-set problem, in which the category of an image to be recognized may not be among the given categories. Such an image can still be recognized as an object of a given category with some probability, and the resulting misrecognition seriously reduces product value and user experience. Reducing the misrecognition rate while maintaining accuracy is therefore particularly important.
In view of the above problems, the inventor proposes the image recognition method, apparatus, electronic device, and storage medium provided in the embodiments of the present application. An image to be recognized is recognized by an image recognition model trained in advance on positive sample images labeled with preset categories and negative sample images labeled as non-preset categories, so that when an image of a category that does not exist in the model is recognized, the image is effectively prevented from being erroneously recognized as an existing category, improving the accuracy of image recognition. The specific image recognition method is described in detail in the following embodiments.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an image recognition method according to an embodiment of the present application. In a specific embodiment, the image recognition method is applied to the image recognition apparatus 400 shown in fig. 6 and the electronic device 100 (fig. 7) equipped with the image recognition apparatus 400. The following will describe a specific process of this embodiment by taking an electronic device as an example, and it is understood that the electronic device applied in this embodiment may be a smart phone, a tablet computer, a smart watch, smart glasses, a notebook computer, and the like, which is not limited herein. As will be explained in detail with respect to the flow shown in fig. 1, the image recognition method may specifically include the following steps:
step S110: acquiring an image to be recognized.
In the embodiment of the application, the electronic device may take an image whose content is to undergo category recognition as the image to be recognized. Category recognition of image content means recognizing the category of the entity objects in the image, for example recognizing the animals, plants, and so on that appear in it.
As one embodiment, when the electronic device is a mobile terminal equipped with a camera, such as a smartphone, tablet computer, or smartwatch, it may capture an image through a front or rear camera to obtain the image whose content is to undergo category recognition; for example, the electronic device may capture an image through the rear camera and use it as the input image. As another embodiment, the electronic device may obtain the image locally, that is, from a locally stored file; for example, a mobile terminal may read the image from an album, the image having been captured by the camera and saved, or downloaded from the network and saved, in advance. As another embodiment, when the electronic device is a mobile terminal or a computer, it may download the image from a network, for example from a corresponding server over a wireless or data network, and perform category recognition on the downloaded image. As a further embodiment, the electronic device may receive, through a user input operation, the image on which category recognition of image content is to be performed. As yet another embodiment, when the electronic device is a server, the image to be recognized may be acquired from a database server or from a user's client. Of course, the specific way in which the electronic device acquires the image to be recognized is not limited here.
Step S120: inputting the image to be recognized into a pre-trained image recognition model to obtain the probability corresponding to each preset category output by the image recognition model, where the image recognition model is trained on a plurality of positive sample images labeled with the preset categories and a plurality of negative sample images labeled as non-preset categories.
In this embodiment of the application, the electronic device may input the image to be recognized into a pre-trained image recognition model to obtain a probability for each preset category output by the model. In some embodiments, the image recognition model may be stored locally on the electronic device in advance, so the device can call the model directly and input the image into it. The model may also be stored on a server; when the electronic device needs to recognize the content category of an image, it may call the model on the server, for example by sending the image to the server to instruct the server to input it into the image recognition model for category recognition.
In some embodiments, the image recognition model may be trained from a large number of training samples, which may include a plurality of positive sample images labeled with a preset category and a plurality of negative sample images labeled as non-preset categories. Specifically, a positive sample image may be input into the initial recognition model to obtain the probability it outputs for each of the plurality of preset categories; the difference between the true probability of the preset category corresponding to the positive sample image and the probability output by the initial recognition model is then computed to obtain the loss of the output result for that positive sample image. Likewise, a negative sample image is input into the initial recognition model to obtain the output probability for each preset category, and the loss for that negative sample image is obtained from the difference between the true probabilities corresponding to a non-preset-category image and the probabilities output by the model. The total loss for a training batch is then computed from the losses of the output results of all positive and negative sample images in the batch, and the initial recognition model is iteratively trained with an optimization algorithm to obtain the trained image recognition model. The initial recognition model may be a convolutional neural network model or the like, which is not limited here.
Because the initial recognition model is trained with negative sample images of non-preset categories, when the resulting image recognition model recognizes the category of an input image that contains no content of any preset category, the probability it outputs for each preset category is close or equal to the true probability corresponding to a non-preset-category image, so such an image is not mistaken for one of the preset categories.
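A minimal sketch of the batch-loss computation described above, in plain Python. The application does not fix the exact loss function or the target distribution for negative samples; here, cross-entropy against a uniform target over the preset categories is one plausible reading of the "true probability" of a non-preset-category image, chosen so that no single preset category dominates.

```python
import math

def softmax(logits):
    """Convert raw scores into per-category probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, target):
    """Soft-target cross-entropy between model output and a label distribution."""
    eps = 1e-12
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))

def batch_loss(batch, num_classes):
    """Total loss over a batch of (logits, label) pairs.

    label is a class index for a positive sample, or None for a negative
    sample. A negative sample is given a uniform target over the preset
    categories (an assumption; the application only says the loss is the
    difference from the true probabilities of a non-preset-category image).
    """
    total = 0.0
    for logits, label in batch:
        probs = softmax(logits)
        if label is None:  # negative sample: no preset category should dominate
            target = [1.0 / num_classes] * num_classes
        else:              # positive sample: one-hot target on its category
            target = [1.0 if i == label else 0.0 for i in range(num_classes)]
        total += cross_entropy(probs, target)
    return total
```

With this target, a negative sample is penalized whenever the model concentrates probability on any one preset category, which is exactly the behavior the trained model should avoid on non-preset-category inputs.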
Step S130: when the probability corresponding to every preset category is smaller than the set threshold, outputting a result representing that the image to be recognized does not belong to any preset category.
In the embodiment of the application, after the electronic device obtains the probabilities for the preset categories output by the image recognition model, it may determine the output result from those probabilities. Specifically, the electronic device may compare the probability for each preset category with the set threshold, which serves as the basis for judging whether the image to be recognized belongs to one of the preset categories. The threshold is generally set high, for example 70%, 80%, or 90%; the specific value is not limited. It can be understood that only when the probability for a certain preset category is particularly high does the image have a high likelihood of belonging to that category, so that it can be determined to belong to it. If, after comparison, the probability for every preset category is smaller than the set threshold, the probabilities are all too small to judge the image as any preset category, so a result representing that the image to be recognized does not belong to any preset category is output. If the probability for a target category among the preset categories is greater than the set threshold, the image to be recognized is likely to belong to that target category, its probability exceeding those of the other preset categories, so a result representing that the image belongs to the target category is output.
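Step S130 reduces to a simple comparison, sketched below. The probabilities are assumed to arrive as a list aligned with the preset category names; the 0.8 default is only an illustrative value in the "generally high" range the text suggests, and returning None to represent "not any preset category" is an implementation choice, not part of the method.

```python
def classify_open_set(probabilities, categories, threshold=0.8):
    """Return the preset category whose probability clears the threshold,
    or None when every preset category's probability is below it,
    i.e. the image belongs to no preset category."""
    best = max(probabilities)
    if best < threshold:
        return None  # image to be recognized matches no preset category
    return categories[probabilities.index(best)]
```

For example, `classify_open_set([0.05, 0.10, 0.85], ["cat", "dog", "pig"])` selects "pig", while `classify_open_set([0.3, 0.3, 0.4], ["cat", "dog", "pig"])` is rejected as belonging to no preset category.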
According to the image recognition method provided by the embodiment of the application, the image to be recognized is recognized by an image recognition model trained in advance on positive sample images labeled with preset categories and negative sample images labeled as non-preset categories. When an image of a category that does not exist in the model is recognized, the image is effectively prevented from being erroneously recognized as an existing category, improving the accuracy of image recognition.
Moreover, because the category of the image to be recognized is determined from the probabilities the image recognition model outputs for the preset categories, an image that belongs to no preset category yields no probability greater than the set threshold and is therefore judged not to belong to any preset category. This avoids the problems of setting up a separate non-preset category: with an insufficient number of negative samples, such a model cannot learn to recognize enough non-preset categories, and for a non-preset category it cannot recognize, some preset category may still receive a high probability in the output, so the image is wrongly classified into that category. The present approach therefore makes image recognition more accurate.
In addition, since no separate category is set up for non-preset categories beyond the plurality of preset categories to be recognized, the image recognition model only needs to output a probability for each preset category, and the category of the image to be recognized can be determined from those probabilities alone.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an image recognition method according to another embodiment of the present application. The image recognition method is applied to the electronic device, and will be described in detail with respect to the flow shown in fig. 2, and the image recognition method may specifically include the following steps:
step S210: obtaining a sample image set, wherein the sample image set comprises a plurality of positive sample images marked with preset categories and a plurality of negative sample images marked with non-preset categories.
In this embodiment of the present application, a training method for the image recognition model mentioned in the foregoing embodiment is also provided. It is worth noting that the image recognition model may be trained in advance from an acquired sample image set and then reused each time category recognition of image content needs to be performed on an image to be recognized, rather than being retrained for every recognition.
In the embodiment of the application, when the image recognition model is trained, a sample image set may be obtained, where the sample image set includes a plurality of positive sample images labeled with a preset category and a plurality of negative sample images labeled with a non-preset category.
In some embodiments, referring to fig. 3, the electronic device acquiring the sample image set may include:
step S211: acquiring a plurality of first images corresponding to a plurality of preset categories and a plurality of second images corresponding to a non-preset category, wherein the first image corresponding to each preset category comprises an entity object of the preset category, each preset category corresponds to at least one first image, and the second image corresponding to the non-preset category comprises an entity object of the non-preset category;
step S212: preprocessing each first image in the plurality of first images respectively to obtain a plurality of positive sample images marked with preset categories;
step S213: and respectively carrying out the preprocessing on each second image in the plurality of second images to obtain a plurality of negative sample images marked with non-preset categories.
In this embodiment, the plurality of preset categories may include the categories of image content that the image recognition model is required to recognize, and the number of preset categories may be set according to actual demands. For example, where the five categories cat, dog, pig, sheep, and cow need to be identified, the plurality of preset categories includes cat, dog, pig, sheep, and cow.
In this embodiment, for each preset category, a plurality of images may be acquired and labeled with the corresponding preset category. It can be understood that the more images each preset category has, the better the recognition capability of the subsequently trained image recognition model on images of that category. For example, for a given preset category, images of its entity objects in a plurality of different scenes may be acquired, so that the images corresponding to each preset category cover a wide range. When acquiring images for the non-preset category, any image containing an entity object of a non-preset category may be used, and such images may be gathered widely. For example, when the preset categories include the animal categories cat and dog, images of entity objects of other categories, such as flowers, grass, trees, pigs, and sheep, may be gathered widely; the acquired non-target-category images must not contain entity objects of the preset categories. The above images may come from training sets containing large numbers of images, such as the COCO dataset or the ImageCLEF dataset, which is not limited here.
In some embodiments, preprocessing the plurality of first images may include:
acquiring, according to a pre-trained object detection model, the target region where the entity object in each first image is located;
scaling the target region obtained for each first image to obtain a region image corresponding to each first image;
and normalizing the pixel values of all pixels in each region image.
In this embodiment, a pre-trained object detection model may be used to detect the target region where an entity object of a preset category is located, and that region is cropped from the first image. Separating the region containing the entity object and discarding irrelevant regions means the training image carries fewer features unrelated to the object, reducing the amount of computation during subsequent training and improving the model training effect. The cropped target region is then scaled, up or down, so that the images used for model training have a consistent size, for example uniformly 224 × 224. Choosing 224 × 224 rather than another size is a trade-off among input size, running speed, and model performance: the larger the scaled size, the better the classification performance, but model size and running time grow accordingly, so the specific size may be chosen according to actual requirements.
In addition, the pixel values of the target region are normalized to [0, 1], which removes the influence of the absolute magnitude of pixel values on classification performance; some images are sharp while others are blurry, and normalization reduces the effect of pixel magnitude on recognition, letting the model learn feature information such as the texture and structure of the image itself while also speeding up model training and convergence.
In this embodiment, the pre-trained object detection model may be a MobileNet-SSD (Single Shot multibox Detector), which can effectively save the storage space of the electronic device and improve the operating efficiency of the electronic device.
In the above embodiment, after obtaining a plurality of positive sample images labeled with a preset category and a plurality of negative sample images labeled with a non-preset category, an image set formed by the positive sample images and the negative sample images is a sample image set to be obtained.
Step S220: and respectively inputting each positive sample image and each negative sample image into the initial recognition model to obtain the probability corresponding to each preset category output by the initial recognition model.
In this embodiment of the application, after the sample image set is obtained, the positive sample images and the negative sample images may be respectively input to the initial recognition model, so as to obtain the probability corresponding to each preset category output by the initial recognition model.
In some implementations, the initial recognition model can include a feature extraction module and a classification module. The feature extraction module extracts image features of the image, inputs the image features into the classification module, and then the classification module outputs probabilities corresponding to all preset categories according to the input image features.
In some approaches, the feature extraction module may be a pre-trained neural network. For example, the neural network may be a Visual Geometry Group (VGG) model, a deep residual network (ResNet) model, a MobileNetV2, or the like for extracting image features. Optionally, the neural network may be a convolutional neural network (e.g., VGG19) pre-trained on ImageNet; since such a network already has a strong feature extraction capability, it does not need to be retrained, and features can be extracted based on it directly. Optionally, the neural network may also be MobileNetV2, which can effectively save the storage space of the electronic device and improve the operating efficiency of the electronic device.
In some embodiments, the classification module may be a Softmax logistic regression model, a Support Vector Machine (SVM), or the like, and the specific classification module is not limited here.
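A minimal sketch of such a Softmax classification module is shown below; the feature dimension and the weights `W` and offsets `b` are illustrative placeholders rather than trained values:

```python
import math

def softmax(logits):
    """Turn raw scores into per-category probabilities."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify(x, W, b):
    """x: image feature vector; W: k weight vectors (one per preset category); b: k offsets.
    Returns the probability corresponding to each of the k preset categories."""
    logits = [sum(wi * xi for wi, xi in zip(w, x)) + bi for w, bi in zip(W, b)]
    return softmax(logits)
```

The output vector always sums to 1, matching the per-category probabilities the initial recognition model is described as producing.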
Step S230: and determining a total loss value according to the probability output by the initial identification model, the labeled preset category of each positive sample image and the labeled non-preset category of each negative sample image.
In this embodiment of the application, the loss of the output result corresponding to each positive sample image is calculated from the output result of the initial recognition model for that positive sample image and its labeled preset category; the loss of the output result corresponding to each negative sample image is calculated from the output result of the initial recognition model for that negative sample image and its labeled non-preset category; and the total loss value is then determined according to the calculated losses of the output results corresponding to the positive sample images and the negative sample images.
In some embodiments, the process of determining the total loss value may include:
obtaining a loss value of an output result corresponding to the positive sample image according to a difference between a probability output by the initial identification model corresponding to the positive sample image and a real probability corresponding to a labeled preset category of the positive sample image, wherein in the real probability corresponding to the labeled preset category of the positive sample image, the probability corresponding to the labeled preset category is greater than or equal to the set threshold, and probabilities corresponding to other preset categories are smaller than the set threshold, and the other preset categories are preset categories except the labeled preset category in all the preset categories; obtaining a loss value of an output result corresponding to the negative sample image according to a difference between a probability output by the initial identification model corresponding to the negative sample image and a real probability corresponding to a non-preset type labeled to the negative sample image, wherein in the real probability corresponding to the non-preset type labeled to the negative sample image, the probability corresponding to each preset type is smaller than the set threshold value; and obtaining the total loss value of the output result corresponding to the sample image set according to the loss value of the output result corresponding to each positive sample image and the loss value of the output result corresponding to each negative sample image.
It can be understood that only when the probability corresponding to a certain preset category is particularly high does the image have a high probability of belonging to that preset category, so that the image can be determined to belong to it. For a positive sample image, in the true probability corresponding to the labeled preset category, only the probability of the labeled preset category is greater than or equal to the set threshold, and the probabilities of the other preset categories are less than the set threshold; that is, only the probability of the labeled preset category is very high, and the probabilities of the other preset categories are very low. For a negative sample image, since it does not belong to any preset category, in the true probability corresponding to the labeled non-preset category, the probability corresponding to each preset category should be smaller than the set threshold. With this setting, the loss of the output result corresponding to each positive sample image and each negative sample image can be obtained from the output result of the initial recognition model and the true probability of the label of each sample image, and the total loss value of the output results corresponding to all images in the sample image set can then be calculated from these losses.
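The true-probability targets described above can be sketched as follows (a hedged illustration; beyond the threshold conditions, the exact target encoding is not specified by the application): a positive sample gets a one-hot target on its labeled preset category, and a negative sample gets a uniform target whose per-category probability 1/k stays below the set threshold whenever k > 1/threshold:

```python
def positive_target(label, k):
    """True probability for a positive sample: only the labeled preset
    category (index `label`) meets the set threshold."""
    return [1.0 if i == label else 0.0 for i in range(k)]

def negative_target(k):
    """True probability for a negative sample: uniform over the k preset
    categories, so every per-category probability is below the threshold."""
    return [1.0 / k] * k
```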
In this embodiment, when determining the total loss value from the losses of the output results corresponding to the positive and negative sample images, the average loss value of the output results corresponding to the positive sample images may be obtained as a first loss value, the average loss value of the output results corresponding to the negative sample images may be obtained as a second loss value, and the total loss value of the output result corresponding to the sample image set may then be obtained from the first loss value and the second loss value.
As a specific embodiment, different weights may be set for the average loss value corresponding to the positive sample image and the average loss value corresponding to the negative sample image, that is, different weights may be set for the first loss value and the second loss value, respectively, then the product of the first loss value and its corresponding weight and the product of the second loss value and its corresponding weight are obtained, and then the sum of the two products is determined, so as to obtain the total loss value of the output result corresponding to the sample image set.
The above describes the total loss value obtained when all images in the sample image set are used as the images required for one training batch. When all images in the sample image set are used as the images required for one training batch, the numbers of positive sample images and negative sample images can be guaranteed to be equal, for example, both set to N, where N is a positive integer.
As a specific embodiment, the loss of the output result corresponding to the positive sample images can be calculated according to the following formula:
L_p = -(1/N) · Σ_{n=1}^{N} log( exp(W_y · x + b_y) / Σ_{i=1}^{k} exp(W_i · x + b_i) )
where p_i is the probability that the positive sample image belongs to category i, x is the feature vector after feature extraction (such as the aforementioned output vector of MobileNetV2), W is a weight vector, b is an offset, y is the labeled label (i.e., the labeled preset category), and N is the number of positive sample images.
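A sketch of this positive-sample loss, taking already-softmaxed probabilities as input (the feature extraction and the W·x + b projection are assumed to have been applied upstream, which is an assumption about the interface rather than something the application specifies):

```python
import math

def positive_loss(prob_batch, labels):
    """Average cross-entropy over N positive samples.
    prob_batch: per-sample lists of per-category probabilities (already softmax-ed);
    labels: the labeled preset category index y for each sample."""
    n = len(prob_batch)
    return -sum(math.log(probs[y]) for probs, y in zip(prob_batch, labels)) / n
```

The loss is zero when the model assigns probability 1 to the labeled preset category, and grows as that probability shrinks.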
For the loss calculation of the output result corresponding to the negative sample image: since the entity object in the negative sample image does not belong to any of the preset categories, the probabilities obtained after classification (for example, by the aforementioned Softmax classifier) should be uniformly distributed; ideally, the probability of the image belonging to each preset category is the same, that is:
p_1 = p_2 = … = p_k = 1/k
In this way, the probability on any single preset category is not particularly high, that is, the probability corresponding to each preset category is smaller than the set threshold, so that false recognition does not occur. That is, for sample data without a category label, the following should be satisfied:
max p_1 · p_2 · … · p_k
s.t. p_1 + p_2 + … + p_k = 1
where p_k is the probability, output by the initial recognition model, that the negative sample image belongs to the k-th preset category. The constraint requires that the sum of the probabilities corresponding to all preset categories be 1; moreover, for a negative sample image, it is desirable that the probabilities of belonging to each preset category be equal, i.e., p_1 = p_2 = … = p_k, which is the target of the required model output. This target is equivalent to maximizing the product p_1 · p_2 · … · p_k, that is, the product takes its maximum value exactly when p_1 = p_2 = … = p_k.
Further, maximizing p_1 · p_2 · … · p_k is equivalent to maximizing its logarithm log(p_1 · p_2 · … · p_k), because log is monotonically increasing. Since loss functions are expressed as minimizations, maximizing the logarithm is equivalent to minimizing its negative, -log(p_1 · p_2 · … · p_k), which by the properties of logarithms expands to -[log(p_1) + log(p_2) + … + log(p_k)].
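A quick numeric check of this argument (illustrative only; the distributions are arbitrary examples): among probability vectors summing to 1, the uniform one yields the smallest value of -Σ log(p_i):

```python
import math

def neg_log_product(probs):
    # -log(p1 * p2 * ... * pk), expanded as -(log p1 + ... + log pk)
    return -sum(math.log(p) for p in probs)

uniform = [0.25, 0.25, 0.25, 0.25]   # the ideal target for a negative sample
skewed = [0.7, 0.1, 0.1, 0.1]        # a confidently wrong prediction
```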
Therefore, the loss value of the output result corresponding to the negative sample image can be calculated according to the following formula:
L_n = -(1/N) · Σ_{n=1}^{N} Σ_{i=1}^{k} log p_i
where p_i is the probability that the negative sample image belongs to category i, x is the feature vector after feature extraction (such as the aforementioned output vector of MobileNetV2), W is a weight vector, b is an offset, y is the labeled label (i.e., the labeled non-preset category), and N is the number of negative sample images.
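A corresponding sketch of the negative-sample loss, again over already-softmaxed probabilities (an illustrative assumption about the interface):

```python
import math

def negative_loss(prob_batch):
    """-(1/N) * sum over the N negative samples of sum over the k categories
    of log p_i; minimized when each sample's probabilities are uniform."""
    n = len(prob_batch)
    return -sum(math.log(p) for probs in prob_batch for p in probs) / n
```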
Further, after calculating the loss value of the output result corresponding to the positive sample images and that corresponding to the negative sample images according to the above formulas, if the numbers of positive and negative sample images are equal, both being N, the total loss value is obtained by weighting as follows:
L = L_p + ω · L_n
where ω represents the weight of the average loss value corresponding to the negative sample images, and its value range may be [0.1, 0.5].
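Combining the two losses as described, with ω weighting the negative-sample average (a sketch; the per-sample losses are assumed precomputed by the two functions the formulas above describe):

```python
def total_loss(pos_losses, neg_losses, omega=0.3):
    """Weighted total: average positive-sample loss plus omega times the
    average negative-sample loss, with omega typically in [0.1, 0.5]."""
    l_p = sum(pos_losses) / len(pos_losses)
    l_n = sum(neg_losses) / len(neg_losses)
    return l_p + omega * l_n
```

Choosing ω below 1 keeps the uniform-distribution objective on negative samples from dominating the cross-entropy objective on positive samples.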
Step S240: and performing iterative training on the initial recognition model according to the total loss value to obtain the image recognition model.
In the embodiment of the application, after the total loss value of the output result corresponding to the sample image set is obtained, the initial recognition model may be iteratively trained according to the total loss value to obtain a final image recognition model.
In some embodiments, an Adam optimizer may be used to iteratively train the initial recognition model according to the total loss value until the loss value of the output result of the initial recognition model converges, and the model at that point is stored as the trained image recognition model. The Adam optimizer combines the advantages of two optimization algorithms, Adaptive Gradient (AdaGrad) and RMSProp, and comprehensively considers the first moment estimate (the mean of the gradient) and the second moment estimate (the uncentered variance of the gradient) of the gradient to calculate the update step size.
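A compact sketch of a single Adam update for one scalar parameter, showing how the first and second moment estimates jointly determine the step; the hyperparameters are the common published defaults, not values prescribed by this application:

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first moment estimate (mean of gradient)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment estimate (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

In a real training loop `theta` would be each weight of the recognition model and `grad` the gradient of the total loss value with respect to it.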
In some embodiments, the termination condition of the iterative training may include: the number of iterations reaches a target number; or the total loss value of the output result of the initial recognition model satisfies a set condition.
In one embodiment, the convergence condition is to make the total loss value as small as possible; an initial learning rate of 1e-3 is used, the learning rate decays with the cosine of the step number, the batch_size is 8, and after 16 epochs of training, convergence is considered complete. Here, batch_size may be understood as a batch parameter whose upper limit is the total number of samples in the training set, and an epoch refers to one pass of training using all samples in the training set; colloquially, the value of epoch is the number of times the entire data set is cycled through, and 1 epoch equals one round of training with all samples in the training set.
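The cosine decay schedule described here can be sketched as follows (one plausible form of "decays with the cosine of the step number"; the exact formula is not given in the application). `total_steps` would be (dataset size / batch_size) × 16 epochs:

```python
import math

def cosine_lr(step, total_steps, initial_lr=1e-3):
    """Cosine-decayed learning rate: initial_lr at step 0, approaching 0 at total_steps."""
    return 0.5 * initial_lr * (1 + math.cos(math.pi * step / total_steps))
```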
In another embodiment, the total loss value satisfying the set condition may include: the total loss value is less than the set threshold. Of course, the specific setting conditions may not be limiting.
In some embodiments, the trained image recognition model may be stored locally in the electronic device, and the trained image recognition model may also be stored in a server in communication connection with the electronic device, so that the storage space occupied by the electronic device may be reduced, and the operating efficiency of the electronic device may be improved.
In some embodiments, the image recognition model may also periodically or aperiodically acquire new training data, train and update the image recognition model. For example, when the image is recognized by mistake, the image can be used as a sample image, and after the sample image is labeled, the training is performed through the above training mode, so that the recognition degree and the recognition accuracy of the image recognition model can be improved.
In some embodiments, since the image recognition model is used to recognize certain categories of images, when a user needs a change in the categories recognized by the image recognition model, a new preset category may be added or a certain preset category may be deleted, and the image recognition model is then trained according to the changed preset categories.
With the above training method for the image recognition model, an existing image recognition model can be improved: to reduce the false recognition rate, training can be carried out directly in this manner without separately adding a category (i.e., adding a category corresponding to the non-preset category), so that the existing image recognition model can be improved more simply and conveniently.
Step S250: and acquiring an image to be identified.
Step S260: and inputting the image to be recognized into a pre-trained image recognition model to obtain the probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training according to a plurality of positive sample images marked with the preset categories and a plurality of negative sample images marked with the non-preset categories.
Step S270: and when the probability corresponding to each preset category is smaller than a set threshold value, outputting a result for representing that the image to be recognized does not belong to any preset category.
In the embodiment of the present application, steps S250 to S270 may refer to the contents of the foregoing embodiments, and are not described herein again.
The image recognition method provided by the embodiment of the application provides a training process for the image recognition model: a sample image set comprising a plurality of positive sample images labeled with preset categories and a plurality of negative sample images labeled with non-preset categories is obtained; each positive sample image and each negative sample image are respectively input into the initial recognition model to obtain the probability corresponding to each preset category output by the initial recognition model; the total loss value is determined according to the results output by the initial recognition model, the labeled preset category of each positive sample image, and the labeled non-preset category of each negative sample image; and the initial recognition model is then iteratively trained according to the total loss value to obtain the image recognition model. Because the initial recognition model is trained with negative sample images of non-preset categories, when the obtained image recognition model recognizes the category of an input image that does not contain image content of any preset category, the probability of each preset category output by the image recognition model is close or equal to the true probability corresponding to an image of a non-preset category, so that an image of a non-preset category will not be wrongly determined to be one of the preset categories.
Referring to fig. 4, fig. 4 is a flowchart illustrating an image recognition method according to another embodiment of the present application. The image recognition method is applied to the electronic device, and will be described in detail with respect to the flow shown in fig. 4, and the image recognition method may specifically include the following steps:
step S310: and acquiring an image to be identified.
In the embodiment of the present application, the step S310 may refer to the contents of the foregoing embodiments, and is not described herein again.
Step S320: and preprocessing the image to be recognized.
In the embodiment of the application, in order to enable the image to be recognized to meet the image input standard of the image recognition model, the recognition accuracy rate is improved, the processing efficiency is improved, and the image to be recognized can be preprocessed.
In some embodiments, referring to fig. 5, the preprocessing the image to be recognized includes:
step S321: acquiring the area of the entity object in the image to be recognized according to a pre-trained object detection model;
step S322: carrying out proportion adjustment on the region where the entity object in the image to be identified is located to obtain a region image corresponding to the image to be identified;
step S323: and normalizing the pixel values of all pixel points in the area image corresponding to the image to be identified.
The electronic device can detect the region where the entity object is located in the image to be recognized by using the pre-trained object detection model, cut the region out of the image to be recognized, separate it from the rest of the image, and remove the content of other irrelevant regions. In addition, the cut-out region is scaled up or down, so that the size of the image input to the image recognition model matches the size of the images used for model training. The pixel values in the above region are normalized to [0, 1], which eliminates the influence of the absolute magnitude of pixel values on classification performance; for example, some images are clearer while others are blurrier, and after normalization the influence of the magnitude of the image pixel values on recognition performance is reduced, so that the model can learn feature information such as the texture structure of the image itself, improving the accuracy of image recognition.
Step S330: inputting the preprocessed object to be recognized into a pre-trained image recognition model to obtain the probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training according to a plurality of positive sample images marked with the preset categories and a plurality of negative sample images marked with non-preset categories.
Step S340: and when the probability corresponding to each preset category is smaller than a set threshold value, outputting a result for representing that the image to be recognized does not belong to any preset category.
Step S350: and when the probability corresponding to the target class in all the preset classes is greater than or equal to the set threshold, outputting a result for representing that the image to be recognized belongs to the target class.
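The decision logic of steps S340 and S350 can be sketched as follows (assuming, for illustration, that the target category is the single highest-probability category; the category names and the threshold of 0.5 are placeholders):

```python
def decide(probs, categories, threshold=0.5):
    """probs: per-category probabilities from the image recognition model;
    returns None when every probability is below the set threshold (the image
    belongs to no preset category), otherwise the target category."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return None
    return categories[best]
```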
According to the image recognition method provided by the embodiment of the application, the image to be recognized is recognized by an image recognition model trained in advance on positive sample images labeled with preset categories and negative sample images labeled with non-preset categories; when an image whose category does not exist in the image recognition model is recognized, it is effectively prevented from being wrongly recognized as an existing category, thereby improving the accuracy of image recognition. Moreover, preprocessing the image to be recognized before inputting it into the image recognition model can further improve the recognition accuracy.
Referring to fig. 6, a block diagram of an image recognition apparatus 400 according to an embodiment of the present disclosure is shown. The image recognition apparatus 400 applies the above-mentioned electronic device, and the image recognition apparatus 400 includes: an image acquisition module 410, an image input module 420, and a result output module 430. The image obtaining module 410 is configured to obtain an image to be identified; the image input module 420 is configured to input the image to be recognized into a pre-trained image recognition model, and obtain a probability corresponding to each preset category output by the image recognition model, where the image recognition model is obtained by training according to multiple positive sample images labeled with preset categories and multiple negative sample images labeled with non-preset categories; the result output module 430 is configured to output a result for representing that the image to be recognized does not belong to any preset category when the probability corresponding to each preset category is smaller than a set threshold.
In some embodiments, the image recognition apparatus 400 may further include: the device comprises an image set acquisition module, a probability acquisition module, a loss acquisition module and an iterative training module. The image set obtaining module is used for obtaining a sample image set before the image to be recognized is input into a pre-trained image recognition model and the probability corresponding to each preset category output by the image recognition model is obtained, wherein the sample image set comprises a plurality of positive sample images marked with the preset categories and a plurality of negative sample images marked with the non-preset categories; the probability obtaining module is used for respectively inputting each positive sample image and each negative sample image into the initial recognition model to obtain the probability corresponding to each preset category output by the initial recognition model; the loss acquisition module is used for determining a total loss value according to the probability output by the initial identification model, the labeled preset category of each positive sample image and the labeled non-preset category of each negative sample image; and the iterative training module is used for performing iterative training on the initial recognition model according to the total loss value to obtain the image recognition model.
In this embodiment, the loss acquisition module may include: a first loss acquisition unit, a second loss acquisition unit, and a total loss acquisition unit. The first loss obtaining unit is configured to obtain a loss value of an output result corresponding to the positive sample image according to a difference between a probability output by the initial identification model corresponding to the positive sample image and a real probability corresponding to a preset category labeled to the positive sample image, where, in the real probabilities corresponding to the preset category labeled to the positive sample image, the probability corresponding to the preset category labeled to the positive sample image is greater than or equal to the set threshold, and probabilities corresponding to other preset categories are smaller than the set threshold, and the other preset categories are preset categories except the preset category labeled to the preset category in all the preset categories; the second loss obtaining unit is used for obtaining a loss value of an output result corresponding to the negative sample image according to a difference between the probability output by the initial identification model corresponding to the negative sample image and the real probability corresponding to the labeled non-preset type of the negative sample image, wherein the probability corresponding to each preset type in the real probability corresponding to the labeled non-preset type of the negative sample image is smaller than the set threshold value; the total loss obtaining unit is used for obtaining a total loss value of the output result corresponding to the sample image set according to the loss value of the output result corresponding to each positive sample image and the loss value of the output result corresponding to each negative sample image.
Further, the total loss obtaining unit may be specifically configured to: acquiring an average loss value of output results corresponding to the multiple positive sample images according to the loss value of the output result corresponding to each positive sample image, wherein the average loss value is used as a first loss value; acquiring an average loss value of the output results corresponding to the negative sample images as a second loss value according to the loss value of the output result corresponding to each negative sample image; and obtaining a total loss value of the output result corresponding to the sample image set according to the first loss value and the second loss value.
In this embodiment, the image set acquisition module includes: a first image acquisition unit, a second image acquisition unit, and a third image acquisition unit. The first image acquisition unit is used for acquiring a plurality of first images corresponding to a plurality of preset categories and a plurality of second images corresponding to a non-preset category, wherein the first image corresponding to each preset category comprises entity objects of the preset category, each preset category corresponds to at least one first image, and the second image corresponding to the non-preset category comprises entity objects of the non-preset category; the second image acquisition unit is used for respectively preprocessing each first image in the plurality of first images to obtain a plurality of positive sample images marked with preset categories; the third image acquisition unit is used for respectively preprocessing each second image in the plurality of second images to obtain a plurality of negative sample images marked with non-preset categories.
Further, the second image acquisition unit may be specifically configured to: acquiring a target area where an entity object in each first image is located according to a pre-trained object detection model; carrying out proportion adjustment on the target area corresponding to each obtained first image to obtain an area image corresponding to each first image; and normalizing the pixel values of all pixel points in each region image.
In some embodiments, the image input module 420 may include: a preprocessing unit and an input unit. The preprocessing unit is used for preprocessing the image to be recognized; the input unit is used for inputting the preprocessed image to be recognized to the pre-trained image recognition model.
In this embodiment, the pre-processing unit may be specifically configured to: acquiring the area of the entity object in the image to be recognized according to a pre-trained object detection model; carrying out proportion adjustment on the region where the entity object in the image to be identified is located to obtain a region image corresponding to the image to be identified; and normalizing the pixel values of all pixel points in the area image corresponding to the image to be identified.
In some embodiments, the result output module may be further configured to, after the image to be recognized is input to a pre-trained image recognition model, and a probability corresponding to each preset category output by the image recognition model is obtained, output a result used for representing that the image to be recognized belongs to the target category when the probability corresponding to the target category in all the preset categories is greater than or equal to the set threshold.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, the coupling between the modules may be electrical, mechanical or other type of coupling.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
In summary, according to the scheme provided by the application, an image to be recognized is acquired and input into a pre-trained image recognition model to obtain the probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training on a plurality of positive sample images labeled with preset categories and a plurality of negative sample images labeled with non-preset categories; when the probability corresponding to each preset category is smaller than a set threshold, a result representing that the image to be recognized does not belong to any preset category is output. Because the image to be recognized is recognized by an image recognition model trained in advance on positive sample images labeled with preset categories and negative sample images labeled with non-preset categories, an image of a category that does not exist in the image recognition model is effectively prevented from being falsely identified as an existing category, thereby improving the accuracy of image recognition.
Referring to fig. 7, a block diagram of an electronic device according to an embodiment of the present application is shown. The electronic device 100 may be any electronic device capable of running an application, such as a smart phone, a tablet computer, a smart watch, smart glasses, or a notebook computer. The electronic device 100 in the present application may include one or more of the following components: a processor 110, a memory 120, and one or more applications, wherein the one or more applications may be stored in the memory 120 and configured to be executed by the one or more processors 110, the one or more applications being configured to perform the method described in the foregoing method embodiments.
The processor 110 may include one or more processing cores. The processor 110 connects various parts within the electronic device 100 using various interfaces and lines, and performs the various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and by calling data stored in the memory 120. Optionally, the processor 110 may be implemented in hardware using at least one of Digital Signal Processing (DSP), a Field-Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA). The processor 110 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 110 but instead be implemented by a separate communication chip.
The memory 120 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 120 may be used to store instructions, programs, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the various method embodiments described above, and the like. The data storage area may store data created by the electronic device 100 during use (e.g., phone book, audio and video data, and chat log data), and the like.
Referring to fig. 8, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable storage medium 800 stores program code that can be invoked by a processor to execute the method described in the above method embodiments.
The computer-readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read-Only Memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 800 includes a non-volatile computer-readable storage medium. The computer-readable storage medium 800 has storage space for program code 810 for performing any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code 810 may, for example, be compressed in a suitable form.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present application.

Claims (12)

1. An image recognition method, characterized in that the method comprises:
acquiring an image to be recognized;
inputting the image to be recognized into a pre-trained image recognition model to obtain the probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training according to a plurality of positive sample images marked with the preset categories and a plurality of negative sample images marked with non-preset categories;
and when the probability corresponding to each preset category is smaller than a set threshold value, outputting a result for representing that the image to be recognized does not belong to any preset category.
2. The method according to claim 1, wherein before the image to be recognized is input to the pre-trained image recognition model and the probability corresponding to each preset category output by the image recognition model is obtained, the method further comprises:
obtaining a sample image set, wherein the sample image set comprises a plurality of positive sample images marked with a preset category and a plurality of negative sample images marked with a non-preset category;
respectively inputting each positive sample image and each negative sample image into an initial recognition model to obtain the probability corresponding to each preset category output by the initial recognition model;
determining a total loss value according to the probability output by the initial recognition model, the labeled preset category of each positive sample image, and the labeled non-preset category of each negative sample image;
and performing iterative training on the initial recognition model according to the total loss value to obtain the image recognition model.
3. The method of claim 2, wherein determining the total loss value according to the probability output by the initial recognition model, the labeled preset category of each positive sample image and the labeled non-preset category of each negative sample image comprises:
obtaining a loss value of an output result corresponding to the positive sample image according to a difference between the probability output by the initial recognition model corresponding to the positive sample image and a real probability corresponding to the labeled preset category of the positive sample image, wherein, in the real probability corresponding to the labeled preset category of the positive sample image, the probability corresponding to the labeled preset category is greater than or equal to the set threshold and the probabilities corresponding to other preset categories are smaller than the set threshold, the other preset categories being the preset categories other than the labeled preset category among all the preset categories;
obtaining a loss value of an output result corresponding to the negative sample image according to a difference between the probability output by the initial recognition model corresponding to the negative sample image and a real probability corresponding to the labeled non-preset category of the negative sample image, wherein, in the real probability corresponding to the labeled non-preset category of the negative sample image, the probability corresponding to each preset category is smaller than the set threshold;
and obtaining the total loss value of the output result corresponding to the sample image set according to the loss value of the output result corresponding to each positive sample image and the loss value of the output result corresponding to each negative sample image.
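Claims 3 and 4 can be illustrated with a small numeric sketch. The application does not fix a particular loss function, so per-category binary cross-entropy is assumed here, with a one-hot "real probability" vector for positive samples and an all-zeros vector (every category below the threshold) for negative samples; combining the two average losses by simple summation is likewise an assumption.

```python
import numpy as np

def category_loss(probs, targets, eps=1e-7):
    # Per-category binary cross-entropy between the model output and the
    # "real probability" vector (assumed loss; the claims leave it open).
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    targets = np.asarray(targets, dtype=float)
    return float(-(targets * np.log(probs) + (1 - targets) * np.log(1 - probs)).sum())

def total_loss(pos_probs, pos_labels, neg_probs, n_categories):
    # Positive sample: real probability 1 for the labeled preset category,
    # 0 (below the threshold) for every other preset category.
    pos_losses = [category_loss(p, np.eye(n_categories)[y])
                  for p, y in zip(pos_probs, pos_labels)]
    # Negative sample: real probability below the threshold for every
    # preset category, modeled here as an all-zeros target vector.
    neg_losses = [category_loss(p, np.zeros(n_categories)) for p in neg_probs]
    first_loss = float(np.mean(pos_losses))   # average over positive samples (claim 4)
    second_loss = float(np.mean(neg_losses))  # average over negative samples
    return first_loss + second_loss           # total loss; summation is assumed

# A confident, correct batch yields a near-zero total loss
print(total_loss([[0.99, 0.01]], [0], [[0.01, 0.01]], n_categories=2))
```

Training the negative samples toward an all-low target is what teaches the model to keep every per-category probability under the threshold for inputs outside the preset categories.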
4. The method of claim 3, wherein obtaining the total loss value of the output results corresponding to the sample image set according to the loss value of the output result corresponding to each positive sample image and the loss value of the output result corresponding to each negative sample image comprises:
acquiring an average loss value of output results corresponding to the multiple positive sample images according to the loss value of the output result corresponding to each positive sample image, wherein the average loss value is used as a first loss value;
acquiring an average loss value of the output results corresponding to the negative sample images as a second loss value according to the loss value of the output result corresponding to each negative sample image;
and obtaining a total loss value of the output result corresponding to the sample image set according to the first loss value and the second loss value.
5. The method of claim 2, wherein the acquiring a sample image set comprises:
acquiring a plurality of first images corresponding to a plurality of preset categories and a plurality of second images corresponding to a non-preset category, wherein the first image corresponding to each preset category comprises an entity object of the preset category, each preset category corresponds to at least one first image, and the second image corresponding to the non-preset category comprises an entity object of the non-preset category;
preprocessing each first image in the plurality of first images respectively to obtain a plurality of positive sample images marked with preset categories;
and respectively carrying out the preprocessing on each second image in the plurality of second images to obtain a plurality of negative sample images marked with non-preset categories.
6. The method according to claim 5, wherein the pre-processing each of the plurality of first images to obtain a plurality of positive sample images labeled with a predetermined category comprises:
acquiring a target area where an entity object in each first image is located according to a pre-trained object detection model;
carrying out proportion adjustment on the target area corresponding to each obtained first image to obtain an area image corresponding to each first image;
and normalizing the pixel values of all pixel points in each region image.
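The three preprocessing steps of claim 6 can be sketched as follows. The bounding box is assumed to come from a pre-trained object detection model (not implemented here), and the 224x224 target size and nearest-neighbor scaling are illustrative choices only.

```python
import numpy as np

def preprocess(image, bbox, size=(224, 224)):
    """Crop the target area where the entity object is located, scale it
    to a fixed size, and normalize the pixel values to [0, 1]."""
    x0, y0, x1, y1 = bbox  # region produced by the (assumed) detection model
    region = image[y0:y1, x0:x1]
    # Nearest-neighbor scaling via index arithmetic, to keep the sketch
    # dependency-free; a real pipeline would use a library resize.
    h, w = region.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    scaled = region[rows[:, None], cols]
    return scaled.astype(np.float32) / 255.0  # normalize all pixel values

img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
out = preprocess(img, (100, 50, 400, 350))
print(out.shape)  # (224, 224, 3)
```

Applying the identical preprocessing to training samples and to the image to be recognized (claim 8) keeps the model's input distribution consistent between training and inference.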
7. The method according to any one of claims 1-6, wherein the inputting the image to be recognized into a pre-trained image recognition model comprises:
preprocessing the image to be recognized;
and inputting the preprocessed image to be recognized into a pre-trained image recognition model.
8. The method according to claim 7, wherein the preprocessing the image to be recognized comprises:
acquiring the area of the entity object in the image to be recognized according to a pre-trained object detection model;
carrying out proportion adjustment on the region where the entity object in the image to be recognized is located to obtain a region image corresponding to the image to be recognized;
and normalizing the pixel values of all pixel points in the region image corresponding to the image to be recognized.
9. The method according to any one of claims 1 to 6, wherein after the image to be recognized is input to the pre-trained image recognition model and the probability corresponding to each preset category output by the image recognition model is obtained, the method further comprises:
and when the probability corresponding to the target class in all the preset classes is greater than or equal to the set threshold, outputting a result for representing that the image to be recognized belongs to the target class.
10. An image recognition apparatus, characterized in that the apparatus comprises: an image acquisition module, an image input module, and a result output module, wherein,
the image acquisition module is used for acquiring an image to be recognized;
the image input module is used for inputting the image to be recognized into a pre-trained image recognition model to obtain the probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training according to a plurality of positive sample images marked with preset categories and a plurality of negative sample images marked with non-preset categories;
and the result output module is used for outputting a result for representing that the image to be recognized does not belong to any preset category when the probability corresponding to each preset category is smaller than a set threshold.
11. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method of any one of claims 1-9.
12. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 9.
CN202010802994.4A 2020-08-11 2020-08-11 Image recognition method and device, electronic equipment and storage medium Pending CN111814810A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010802994.4A CN111814810A (en) 2020-08-11 2020-08-11 Image recognition method and device, electronic equipment and storage medium
PCT/CN2021/099185 WO2022033150A1 (en) 2020-08-11 2021-06-09 Image recognition method, apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010802994.4A CN111814810A (en) 2020-08-11 2020-08-11 Image recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111814810A true CN111814810A (en) 2020-10-23

Family

ID=72858927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010802994.4A Pending CN111814810A (en) 2020-08-11 2020-08-11 Image recognition method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111814810A (en)
WO (1) WO2022033150A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331062B (en) * 2022-08-29 2023-08-08 北京达佳互联信息技术有限公司 Image recognition method, image recognition device, electronic device and computer-readable storage medium
CN117094966B (en) * 2023-08-21 2024-04-05 青岛美迪康数字工程有限公司 Tongue image identification method and device based on image amplification and computer equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019137167A1 (en) * 2018-01-10 2019-07-18 Oppo广东移动通信有限公司 Photo album management method and apparatus, storage medium, and electronic device
CN110135514A (en) * 2019-05-22 2019-08-16 国信优易数据有限公司 A kind of workpiece classification method, device, equipment and medium
CN111126346A (en) * 2020-01-06 2020-05-08 腾讯科技(深圳)有限公司 Face recognition method, training method and device of classification model and storage medium
CN111260665A (en) * 2020-01-17 2020-06-09 北京达佳互联信息技术有限公司 Image segmentation model training method and device
CN111259968A (en) * 2020-01-17 2020-06-09 腾讯科技(深圳)有限公司 Illegal image recognition method, device, equipment and computer readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9311558B2 (en) * 2014-04-16 2016-04-12 I.R.I.S. Pattern recognition system
CN109191453A (en) * 2018-09-14 2019-01-11 北京字节跳动网络技术有限公司 Method and apparatus for generating image category detection model
CN109522967A (en) * 2018-11-28 2019-03-26 广州逗号智能零售有限公司 A kind of commodity attribute recognition methods, device, equipment and storage medium
CN109766872B (en) * 2019-01-31 2021-07-09 广州视源电子科技股份有限公司 Image recognition method and device
CN109934293B (en) * 2019-03-15 2023-06-13 苏州大学 Image recognition method, device, medium and confusion perception convolutional neural network
CN111814810A (en) * 2020-08-11 2020-10-23 Oppo广东移动通信有限公司 Image recognition method and device, electronic equipment and storage medium

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022033150A1 (en) * 2020-08-11 2022-02-17 Oppo广东移动通信有限公司 Image recognition method, apparatus, electronic device, and storage medium
CN112508062A (en) * 2020-11-20 2021-03-16 普联国际有限公司 Open set data classification method, device, equipment and storage medium
CN112488012A (en) * 2020-12-03 2021-03-12 浙江大华技术股份有限公司 Pedestrian attribute identification method, electronic device and storage medium
WO2022117096A1 (en) * 2020-12-03 2022-06-09 影石创新科技股份有限公司 First person point-of-view image recognition method and apparatus, and computer-readable storage medium
CN112785599A (en) * 2020-12-25 2021-05-11 深兰工业智能创新研究院(宁波)有限公司 Image expansion method and device
CN112712052A (en) * 2021-01-13 2021-04-27 安徽水天信息科技有限公司 Method for detecting and identifying weak target in airport panoramic video
WO2022237726A1 (en) * 2021-05-13 2022-11-17 杭州睿胜软件有限公司 Image recognition method, readable storage medium, and image recognition system
CN113239804A (en) * 2021-05-13 2021-08-10 杭州睿胜软件有限公司 Image recognition method, readable storage medium, and image recognition system
CN113657406A (en) * 2021-07-13 2021-11-16 北京旷视科技有限公司 Model training and feature extraction method and device, electronic equipment and storage medium
CN113657406B (en) * 2021-07-13 2024-04-23 北京旷视科技有限公司 Model training and feature extraction method and device, electronic equipment and storage medium
CN113569691A (en) * 2021-07-19 2021-10-29 新疆爱华盈通信息技术有限公司 Human head detection model generation method and device, human head detection model and human head detection method
CN116012656A (en) * 2023-01-20 2023-04-25 北京百度网讯科技有限公司 Sample image generation method and image processing model training method and device
CN116012656B (en) * 2023-01-20 2024-02-13 北京百度网讯科技有限公司 Sample image generation method and image processing model training method and device
CN117115596A (en) * 2023-10-25 2023-11-24 腾讯科技(深圳)有限公司 Training method, device, equipment and medium of object action classification model
CN117115596B (en) * 2023-10-25 2024-02-02 腾讯科技(深圳)有限公司 Training method, device, equipment and medium of object action classification model

Also Published As

Publication number Publication date
WO2022033150A1 (en) 2022-02-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination