WO2022033150A1 - Image recognition method and apparatus, electronic device, and storage medium - Google Patents

Image recognition method and apparatus, electronic device, and storage medium

Info

Publication number
WO2022033150A1
WO2022033150A1 (PCT/CN2021/099185, CN2021099185W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
preset
recognition model
loss value
category
Prior art date
Application number
PCT/CN2021/099185
Other languages
English (en)
French (fr)
Inventor
颜波
Original Assignee
Oppo广东移动通信有限公司
Priority date
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2022033150A1 publication Critical patent/WO2022033150A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/29 Graphical models, e.g. Bayesian networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Definitions

  • the present application relates to the technical field of image processing, and more particularly, to an image recognition method, apparatus, electronic device and storage medium.
  • the present application proposes an image recognition method, apparatus, electronic device and storage medium.
  • In a first aspect, an embodiment of the present application provides an image recognition method. The method includes: acquiring an image to be recognized; inputting the image to be recognized into a pre-trained image recognition model to obtain a probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training on multiple positive sample images marked with preset categories and multiple negative sample images marked with non-preset categories; and, when the probabilities corresponding to all the preset categories are smaller than a set threshold, outputting a result indicating that the image to be recognized does not belong to any preset category.
  • In a second aspect, an embodiment of the present application provides an image recognition apparatus. The apparatus includes an image acquisition module, an image input module, and a result output module. The image acquisition module is used to acquire an image to be recognized; the image input module is used to input the image to be recognized into a pre-trained image recognition model to obtain a probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training on multiple positive sample images marked with preset categories and multiple negative sample images marked with non-preset categories; and the result output module is used to output, when the probabilities corresponding to all the preset categories are smaller than a set threshold, a result indicating that the image to be recognized does not belong to any preset category.
  • In a third aspect, embodiments of the present application provide an electronic device, comprising: one or more processors; a memory; and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to execute the image recognition method provided in the first aspect above.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium in which program code is stored, and the program code can be invoked by a processor to execute the image recognition method provided in the first aspect above.
  • the probability corresponding to each preset category output by the image recognition model is obtained, wherein the image recognition model is based on multiple A positive sample image marked with a preset category and a number of negative sample images marked with a non-preset category are obtained by training.
  • the output is used to represent the pending category.
  • Recognition which can be used to identify a category that does not exist in the image recognition model, when the category of the image to be recognized is identified, using the probability of each preset category output by the image recognition model to determine it does not belong to any preset category category, thereby effectively avoiding the erroneous identification of the image to be recognized as an existing category, thereby improving the accuracy of image recognition.
  • FIG. 1 shows a flowchart of an image recognition method according to an embodiment of the present application.
  • FIG. 2 shows a flowchart of an image recognition method according to another embodiment of the present application.
  • FIG. 3 shows a flowchart of step S210 in the image recognition method provided by another embodiment of the present application.
  • FIG. 4 shows a flowchart of an image recognition method according to still another embodiment of the present application.
  • FIG. 5 shows a flowchart of step S320 in the image recognition method provided by another embodiment of the present application.
  • FIG. 6 shows a block diagram of an image recognition apparatus according to an embodiment of the present application.
  • FIG. 7 is a block diagram of an electronic device for executing an image recognition method according to an embodiment of the present application.
  • FIG. 8 shows a storage unit for storing or carrying program code for implementing the image recognition method according to an embodiment of the present application.
  • For example, image recognition can be used in security systems to identify and predict events; image recognition can be used in the screening of undesirable images to reduce the cost of manual screening; and image recognition can also be used in the search for criminals, to filter out images of criminals from a large number of images.
  • the inventor proposes the image recognition method, device, electronic device, and storage medium provided by the embodiments of the present application.
  • Through the image recognition model obtained by pre-training on positive sample images marked with preset categories and negative sample images marked with non-preset categories, the image to be recognized can be recognized, and for a category that does not exist in the image recognition model, the image to be recognized can be effectively prevented from being incorrectly recognized as an existing category, thereby improving the accuracy of image recognition.
  • the specific image recognition method will be described in detail in the following embodiments.
  • FIG. 1 shows a schematic flowchart of an image recognition method provided by an embodiment of the present application.
  • the image recognition method is applied to the image recognition apparatus 400 shown in FIG. 6 and the electronic device 100 ( FIG. 7 ) equipped with the image recognition apparatus 400 .
  • the following will take an electronic device as an example to describe the specific process of this embodiment.
  • the electronic device applied in this embodiment may be a smart phone, a tablet computer, a smart watch, smart glasses, a notebook computer, etc., which is not limited here.
  • the flow shown in FIG. 1 will be described in detail below, and the image recognition method may specifically include the following steps:
  • Step S110 Acquire an image to be recognized.
  • the electronic device may acquire an image on which category recognition of the image content is to be performed, and use it as the image to be recognized.
  • the category recognition of the image content is performed, that is, the category recognition of the entity objects in the image is performed, for example, the category recognition of animals, plants, etc. in the image is performed.
  • When the electronic device is a mobile terminal equipped with a camera, such as a smart phone, a tablet computer, or a smart watch, the front camera or the rear camera can be used to capture an image on which category recognition of the image content is to be performed.
  • For example, the electronic device can capture images through the rear camera and use the obtained image as the input image; as another implementation, the electronic device can obtain locally the image on which detection of the target object is to be performed, that is, the electronic device can obtain the image on which category recognition of the image content is to be performed from locally stored files.
  • For example, when the electronic device is a mobile terminal, the image can be obtained from the album; that is, the electronic device captures images through the camera in advance and stores them in the local album, or downloads images from the network in advance and stores them in the local album, and then, when category recognition of the image content is required, reads the image to be recognized from the album.
  • the electronic device can also download the image to be identified by the category of the image content from the network.
  • the electronic device can download the required image from the corresponding server through a wireless network, data network, etc.
  • the electronic device can also receive the input image to be recognized by the category of the image content through the user's input operation, so as to obtain the to-be-recognized image;
  • the image to be recognized may also be acquired from a database server or from a client of a user.
  • the specific manner in which the electronic device acquires the image to be recognized may not be limited.
  • Step S120 Input the image to be recognized into a pre-trained image recognition model to obtain a probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training on multiple positive sample images marked with preset categories and multiple negative sample images marked with non-preset categories.
  • the electronic device may input the image to be recognized into a pre-trained image recognition model to obtain a probability corresponding to each preset category output by the image recognition model.
  • In some implementations, the image recognition model can be pre-stored locally in the electronic device, and the electronic device can directly call the image recognition model locally and input the image to be recognized into it; the image recognition model can also be stored in a server, and when the electronic device needs to recognize the content of the image to be recognized, it can call the image recognition model in the server, for example by sending the image to be recognized to the server to instruct the server to input the image to be recognized into the image recognition model for category recognition of the image content.
  • the image recognition model can be trained from a large number of training samples.
  • the training samples may include multiple positive sample images marked with preset categories and multiple negative sample images marked with non-preset categories.
  • Specifically, a positive sample image can be input into the initial recognition model to obtain the probability of each of the multiple preset categories output by the initial recognition model, and the difference between the true probabilities of the preset categories corresponding to the positive sample image and the probabilities output by the initial recognition model is calculated to obtain the loss of the output result of the model for the positive sample image.
  • Likewise, a negative sample image is input into the initial recognition model to obtain the output probability of each preset category, and the difference between the true probabilities of the preset categories corresponding to the negative sample image and the probabilities output by the initial recognition model is calculated to obtain the loss of the output result of the model for the negative sample image.
  • Then, according to the losses of the output results corresponding to the positive sample images and the losses of the output results corresponding to the negative sample images, the total loss of the output results corresponding to the sample images in one training batch is calculated, and the initial recognition model is iteratively trained according to an optimization algorithm to obtain the trained image recognition model.
  • The initial recognition model may be a convolutional neural network model or the like, which is not limited here. Because negative sample images of non-preset categories are used to train the initial recognition model, when the obtained image recognition model performs category recognition on an input image that does not contain image content of any preset category, the probability of each preset category output by the image recognition model will be close or equal to the true probability corresponding to an image of a non-preset category, so the image will not be judged as belonging to any of the preset categories.
  • Step S130 When the probability corresponding to each preset category is less than a set threshold, output a result indicating that the image to be recognized does not belong to any preset category.
  • the electronic device may determine the output result according to the probability corresponding to the preset category. Specifically, the electronic device may compare the probability corresponding to each preset category with a preset threshold, and the preset threshold is used as a judgment basis for determining whether the image to be recognized belongs to one of the preset categories.
  • the preset threshold can usually be set to a relatively high value, such as 70%, 80%, or 90%; the specific value is not limited here.
  • When the probability corresponding to a preset category is particularly high, it means the image has a high probability of belonging to that preset category, so the image can be determined to belong to that preset category.
  • When the probability corresponding to every preset category is smaller than the set threshold, it means that none of the preset-category probabilities is high enough and the image cannot be determined to belong to any preset category, so a result indicating that the image to be recognized does not belong to any preset category can be output; conversely, when the probability corresponding to a target category among all preset categories is greater than the set threshold, it means that the probability of the image belonging to the target category is relatively high and greater than the probabilities corresponding to the other preset categories, so a result indicating that the image to be recognized belongs to the target category can be output.
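  • As a concrete illustration of this thresholding rule, the following minimal Python sketch shows the decision step only; it is not taken from the patent, and the function name, the category names, and the example threshold of 0.8 are assumptions chosen for illustration.

```python
from typing import List, Optional

def decide_category(probs: List[float], categories: List[str],
                    threshold: float = 0.8) -> Optional[str]:
    """Return the predicted preset category, or None if the image
    does not belong to any preset category."""
    best_idx = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best_idx] < threshold:
        # every preset-category probability is below the threshold
        return None
    return categories[best_idx]

# usage: probabilities output by the image recognition model
print(decide_category([0.05, 0.10, 0.07], ["cat", "dog", "pig"]))  # None
print(decide_category([0.91, 0.04, 0.05], ["cat", "dog", "pig"]))  # "cat"
```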
  • In this way, an image recognition model obtained by pre-training on positive sample images marked with preset categories and negative sample images marked with non-preset categories can recognize the image to be recognized and, for a category that does not exist in the image recognition model, effectively avoid the image being incorrectly recognized as an existing category, thereby improving the accuracy of image recognition.
  • Because the category of the image to be recognized is determined from the probability of each preset category output by the image recognition model, when the input image does not belong to any preset category, none of the output probabilities will be greater than the set threshold, and the image is judged as not belonging to any preset category. This effectively avoids the problem that would arise if a separate category were set for the non-preset categories: since the negative samples cannot cover a large enough number of different categories, the image recognition model could not recognize enough non-preset categories, and for an image of a non-preset category the output result might still assign a relatively high probability to some preset category, so that the image would be wrongly classified into that preset category. The present approach therefore makes image recognition more accurate.
  • In addition, a separate category is not set for the non-preset categories, so the image recognition model only needs to output the probability corresponding to each preset category, and the category to which the image to be recognized belongs can then be determined from those probabilities. In this way, when the image recognition model needs to identify many preset categories, the amount of calculation of the image recognition model can be reduced, thereby reducing the burden on the electronic device.
  • FIG. 2 shows a schematic flowchart of an image recognition method provided by another embodiment of the present application.
  • the image recognition method is applied to the above-mentioned electronic equipment, and the flow shown in FIG. 2 will be described in detail below.
  • the image recognition method may specifically include the following steps:
  • Step S210 Obtain a sample image set, where the sample image set includes a plurality of positive sample images marked with a preset category and a plurality of negative sample images marked with a non-preset category.
  • the embodiment of the present application further includes a training method for the image recognition model. It is worth noting that the training of the image recognition model may be performed in advance based on the acquired sample image set; afterwards, each time category recognition of the image content needs to be performed, the trained image recognition model can be used directly, instead of training the image recognition model every time an image is to be recognized.
  • Specifically, when training the image recognition model, a sample image set may be obtained, where the sample image set includes multiple positive sample images marked with preset categories and multiple negative sample images marked with non-preset categories.
  • the electronic device obtains a sample image set, which may include:
  • Step S211 Acquire multiple first images corresponding to multiple preset categories and multiple second images corresponding to non-preset categories, wherein the first images corresponding to each preset category contain entity objects of that preset category, each preset category corresponds to at least one first image, and the second images corresponding to the non-preset categories contain entity objects of non-preset categories;
  • Step S212 Preprocess each of the first images respectively to obtain multiple positive sample images marked with preset categories;
  • Step S213 Perform the same preprocessing on each of the multiple second images respectively to obtain multiple negative sample images marked with non-preset categories.
  • the plurality of preset categories may include categories of image content recognized by the image recognition model that are required to be recognized, and the number of preset categories may be set according to actual requirements. For example, when it is required to identify five categories of cats, dogs, pigs, sheep and cattle, the multiple preset categories include cats, dogs, pigs, sheep and cattle.
  • a plurality of images may be acquired and marked as the corresponding preset category.
  • The more images corresponding to each preset category, the better the recognition ability of the image recognition model obtained by subsequent training for images of that preset category.
  • images of the entity objects of the preset category in multiple different scenes can be obtained, so that the multiple images corresponding to each preset category can be extensive.
  • For the negative sample images, images containing entity objects of any non-preset category can be acquired, and such images can be acquired as widely as possible.
  • For example, when the preset categories include the animal categories of cats and dogs, images corresponding to entity objects of categories other than cats and dogs, such as flowers, grass, trees, pigs, and sheep, can be widely obtained, and the acquired images corresponding to non-preset categories do not contain entity objects of any preset category.
  • The above images can be obtained from training sets containing a large number of images, such as the COCO dataset, the ImageCLEF dataset, etc., which are not limited here.
  • In some embodiments, the preprocessing of the multiple first images may include the following steps.
  • First, according to the pre-trained object detection model, the target area where the entity object in each first image is located is obtained.
  • That is, a pre-trained object detection model can be used to detect the target area where the entity object of the preset category is located, and the target area can be cropped from the first image, so that the area containing the entity object is separated from the first image and the content of other irrelevant areas is discarded. In subsequent training, the first image used for training will then not contain too many features irrelevant to the entity object, which reduces the amount of calculation and also improves the effect of model training.
  • Then, the scale of the cropped target area is adjusted, that is, enlarged or reduced, so that the images used for model training have a consistent size. The choice of this size, rather than other sizes, is a trade-off among model size, running speed, and performance: the larger the scale, the better the classification performance, but the model size and running time increase accordingly.
  • The specific scale can be chosen according to actual needs.
  • Finally, normalizing the pixel values of the target area can eliminate the influence of the absolute magnitude of the pixel values on the classification performance. For example, some images are clearer and their pixel values are large, while other images are relatively blurred and their pixel values are small; after normalization, the influence of the magnitude of the pixel values on recognition performance is reduced, so that the model can learn more feature information such as the texture structure of the image itself, and model training and convergence can also be accelerated.
  • the pre-trained object detection model can be MobileNet-SSD, etc., which can effectively save the storage space of the electronic device and improve the operation efficiency of the electronic device.
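  • To make the crop-resize-normalize preprocessing concrete, here is a minimal Python sketch; it is an illustration rather than the patent's implementation. `detect_target_box` stands in for whatever pre-trained object detection model (for example a MobileNet-SSD) is used, and the 224x224 target size is an assumed choice.

```python
import numpy as np
import cv2  # OpenCV, assumed available for resizing

def preprocess(image: np.ndarray, detect_target_box) -> np.ndarray:
    """Crop the detected entity object, resize to a fixed scale,
    and normalize pixel values to [0, 1]."""
    # detect_target_box is assumed to return (x, y, w, h) for the entity object
    x, y, w, h = detect_target_box(image)
    target_area = image[y:y + h, x:x + w]               # crop the target area
    target_area = cv2.resize(target_area, (224, 224))   # unify the input scale (assumed size)
    return target_area.astype(np.float32) / 255.0       # normalize pixel values to [0, 1]
```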
  • The image set formed by these positive sample images and negative sample images is the sample image set to be acquired.
  • Step S220 Input each positive sample image and each negative sample image into the initial recognition model respectively, and obtain the probability corresponding to each preset category output by the initial recognition model.
  • Specifically, each positive sample image and each negative sample image can be input into the initial recognition model respectively, to obtain the probability corresponding to each preset category output by the initial recognition model.
  • the initial recognition model may include a feature extraction module as well as a classification module. Wherein, after the feature extraction module extracts the image features of the image, it is input to the classification module, and then the classification module outputs the probability corresponding to each preset category according to the input image features.
  • the feature extraction module can be a pre-trained neural network.
  • For example, the neural network can be a VGG (Visual Geometry Group) model, a deep residual network (ResNet, Deep Residual Network) model, MobileNetV2, or other models for extracting image features.
  • For example, the neural network can be a convolutional neural network pre-trained on ImageNet (such as VGG19). Since a convolutional neural network pre-trained on ImageNet already has strong feature extraction capabilities, retraining is not required, and the above features can be extracted directly by the pre-trained convolutional neural network.
  • the neural network can also be MobileNetV2, which can effectively save the storage space of the electronic device and improve the operation efficiency of the electronic device.
  • In some embodiments, the classification module may be a Softmax logistic regression model, or a support vector machine (Support Vector Machine, SVM), etc.
  • the specific classification module may not be limited.
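  • The feature-extraction-plus-classification structure described above can be sketched as follows in PyTorch; this is an assumed illustration, not the patent's code: MobileNetV2 serves as the feature extraction module, a fully connected layer followed by Softmax serves as the classification module, and the number of preset categories (5) is only an example.

```python
import torch
import torch.nn as nn
from torchvision import models

class ImageRecognitionModel(nn.Module):
    """Feature extraction module (pre-trained MobileNetV2) + classification module."""
    def __init__(self, num_preset_categories: int = 5):
        super().__init__()
        backbone = models.mobilenet_v2(pretrained=True)
        self.features = backbone.features            # feature extraction module
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(1280, num_preset_categories)  # W x + b

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.pool(self.features(x)).flatten(1)    # feature vector after extraction
        logits = self.classifier(feat)
        return torch.softmax(logits, dim=1)              # probability of each preset category

# usage: probabilities for a batch of preprocessed 224x224 RGB images
model = ImageRecognitionModel()
probs = model(torch.randn(2, 3, 224, 224))  # shape (2, num_preset_categories)
```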
  • Step S230 Determine the total loss value according to the probabilities output by the initial recognition model, the preset category with which each positive sample image is marked, and the non-preset category with which each negative sample image is marked.
  • In some embodiments, the loss of the output result corresponding to each positive sample image may be calculated from the output result of the initial recognition model for that positive sample image and the preset category with which the positive sample image is marked.
  • Similarly, the loss of the output result corresponding to each negative sample image may be calculated from the output result of the initial recognition model for that negative sample image and the non-preset category with which the negative sample image is marked; the total loss value is then determined from the losses of the output results corresponding to the positive sample images and the negative sample images.
  • the process of determining the total loss value may include:
  • The loss value of the output result corresponding to a positive sample image is obtained from the difference between the probabilities output by the initial recognition model for that positive sample image and the true probabilities corresponding to the preset category with which the positive sample image is marked. Among the true probabilities corresponding to the marked preset category of a positive sample image, the probability corresponding to the marked preset category is greater than or equal to the set threshold, and the probabilities corresponding to the other preset categories are less than the set threshold, where the other preset categories are all preset categories other than the marked one.
  • The loss value of the output result corresponding to a negative sample image is obtained from the difference between the probabilities output by the initial recognition model for that negative sample image and the true probabilities corresponding to the non-preset category with which the negative sample image is marked. Among the true probabilities corresponding to the marked non-preset category of a negative sample image, the probability corresponding to every preset category is less than the set threshold. The total loss value of the output results corresponding to the sample image set is then obtained from the loss values of the output results corresponding to the positive sample images and the negative sample images.
  • In other words, for a positive sample image, the true probability distribution has only the marked preset category with a probability greater than or equal to the set threshold, while the probabilities of the other preset categories are small; for a negative sample image, since it does not belong to any preset category, the probability corresponding to each preset category in its true distribution is less than the set threshold.
  • Accordingly, the loss of the output result corresponding to each positive sample image and each negative sample image can be obtained from the output of the initial recognition model and the true probabilities of the sample image's label, and the total loss value of the output results corresponding to all the images in the sample image set can then be calculated from these losses.
  • Specifically, the total loss value may be determined as follows: according to the loss value of the output result corresponding to each positive sample image, the average loss value of the output results corresponding to the multiple positive sample images is obtained as a first loss value; according to the loss value of the output result corresponding to each negative sample image, the average loss value of the output results corresponding to the multiple negative sample images is obtained as a second loss value; the total loss value of the output results corresponding to the sample image set is then obtained from the first loss value and the second loss value.
  • Further, different weights can be set for the first loss value and the second loss value. The product of the first loss value and its weight is taken as a first product, the product of the second loss value and its weight is taken as a second product, and the sum of the first product and the second product gives the total loss value of the output results corresponding to the sample image set.
  • The above is the total loss value obtained when all images in the sample image set are used as the images required for one training batch and training is performed on that batch.
  • In some embodiments, the numbers of positive sample images and negative sample images are equal, for example both are set to N, where N is a positive integer.
  • the loss calculation of the output result corresponding to the positive sample image can be calculated according to the following formula:
  • p_i is the probability that the positive sample image belongs to category i
  • x_i is the feature vector after feature extraction, such as the output vector of the aforementioned MobileNetV2
  • W is the weight vector
  • b is the bias
  • y is the labeled label (ie annotated preset category)
  • N is the number of positive sample images.
  • Here, finding the maximum value of p_1*p_2*...*p_k is equivalent to finding the maximum value of its logarithm log(p_1*p_2*...*p_k), because the logarithm is monotonically increasing; and because loss functions are conventionally expressed as quantities to be minimized, maximizing this logarithm is equivalent to minimizing its negative.
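  • The formula itself appeared as an image in the original publication and is not reproduced in this text. Based on the variable definitions above, it most plausibly takes the standard Softmax cross-entropy form sketched below; this should be read as an assumed reconstruction rather than the patent's literal equation.

```latex
% Softmax probability of category i for one sample, from the classification module
% parameters W, b and the extracted feature vector x:
p_i = \frac{e^{W_i^{\top} x + b_i}}{\sum_{k} e^{W_k^{\top} x + b_k}}

% assumed positive-sample loss: average negative log-likelihood over the N positive
% sample images, where p_{y_j} is the predicted probability that the j-th positive
% sample belongs to its labeled preset category y_j:
L_{\mathrm{pos}} = -\frac{1}{N} \sum_{j=1}^{N} \log p_{y_j}
```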
  • the loss value of the output result corresponding to the negative sample image can be calculated according to the following formula:
  • p_i is the probability that the negative sample image belongs to category i
  • x_i is the feature vector after feature extraction, such as the output vector of the aforementioned MobileNetV2
  • W is the weight vector
  • b is the bias
  • y is the labeled label (i.e., the labeled preset category)
  • N is the number of negative sample images.
  • When the numbers of positive sample images and negative sample images are equal, assuming both are N, the total loss value is obtained by weighting as follows:
  • Here the weight of the average loss value corresponding to the negative sample images can take a value in the range [0.1, 0.5].
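  • For concreteness, the following PyTorch sketch shows one way the weighted combination of the positive-sample and negative-sample average losses could be computed. It is an assumption for illustration only: the positive-sample term is ordinary cross-entropy, and since the negative-sample formula is not reproduced in this text, a uniform target distribution over the preset categories is assumed so that every preset-category probability is pushed down; `neg_weight` plays the role of the weighting coefficient in [0.1, 0.5].

```python
import torch

def total_loss(pos_probs: torch.Tensor, pos_labels: torch.Tensor,
               neg_probs: torch.Tensor, neg_weight: float = 0.3) -> torch.Tensor:
    """Weighted sum of the average positive-sample and negative-sample losses."""
    eps = 1e-8
    # positive samples: negative log-likelihood of the labeled preset category
    pos_loss = -torch.log(pos_probs.gather(1, pos_labels.unsqueeze(1)) + eps).mean()
    # negative samples (assumed form): cross-entropy against a uniform target,
    # which keeps every preset-category probability small
    k = neg_probs.size(1)
    neg_loss = -(torch.log(neg_probs + eps) / k).sum(dim=1).mean()
    return pos_loss + neg_weight * neg_loss

# usage with N = 2 positive and N = 2 negative samples over 5 preset categories
pos_probs = torch.softmax(torch.randn(2, 5), dim=1)
neg_probs = torch.softmax(torch.randn(2, 5), dim=1)
loss = total_loss(pos_probs, torch.tensor([0, 3]), neg_probs)
```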
  • Step S240 Perform iterative training on the initial recognition model according to the total loss value to obtain the image recognition model.
  • the initial recognition model can be iteratively trained according to the total loss value to obtain the final image recognition model.
  • As an implementation, the Adam optimizer can be used to iteratively train the initial recognition model according to the total loss value until the loss of the output result of the initial recognition model converges, and the model at that point is saved to obtain the trained image recognition model.
  • The Adam optimizer combines the advantages of the AdaGrad (Adaptive Gradient) and RMSProp optimization algorithms: the first-order moment estimate of the gradient (i.e., the mean of the gradient) and the second-order moment estimate (i.e., the uncentered variance of the gradient) are jointly considered to calculate the update step size.
  • the termination condition of the iterative training may include: the number of times of the iterative training reaches a target number; or the total loss value of the output result of the initial recognition model satisfies a set condition.
  • batch_size can be understood as a batch parameter whose upper limit is the total number of samples in the training set, and an epoch refers to one pass of training using all the samples in the training set, that is, one epoch is equivalent to training once with all the samples in the training set.
  • the total loss value satisfying the set condition may include: the total loss value is less than the set threshold value.
  • the specific setting conditions may not be limited.
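  • Combining the elements above, a compact training-loop sketch might look as follows; the model, the data loaders, the target epoch count, and the loss threshold are placeholder assumptions, and `total_loss` refers to the hypothetical helper sketched earlier.

```python
import torch

def train(model, pos_loader, neg_loader, target_epochs: int = 50,
          loss_threshold: float = 0.05, neg_weight: float = 0.3):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(target_epochs):              # termination condition 1: target number of epochs
        epoch_loss = 0.0
        for (pos_imgs, pos_labels), (neg_imgs, _) in zip(pos_loader, neg_loader):
            optimizer.zero_grad()
            loss = total_loss(model(pos_imgs), pos_labels, model(neg_imgs),
                              neg_weight=neg_weight)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss / max(len(pos_loader), 1) < loss_threshold:
            break                                   # termination condition 2: total loss below a set threshold
    return model
```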
  • the image recognition model obtained by training can be stored locally on the electronic device, and the image recognition model obtained by training can also be stored on a server in communication with the electronic device.
  • Storing the image recognition model on the server can reduce the occupation of the storage space of the electronic device and improve the operation efficiency of the electronic device.
  • the image recognition model may also acquire new training data periodically or irregularly to train and update the image recognition model. For example, when an image is misrecognized, the image can be used as a sample image, and the sample image can be labeled, and then trained through the above training methods, so as to improve the recognition degree and recognition accuracy of the image recognition model.
  • In addition, since the image recognition model is used to recognize images of certain categories, when the categories to be identified by the image recognition model change according to the user's requirements, a new preset category may be added or an existing preset category may be deleted, and the image recognition model is then retrained according to the changed preset categories.
  • the training method for an image recognition model provided by the embodiment of the present application can realize that when an existing image recognition model is improved to reduce its misrecognition rate, the training method can be directly used for training without adding a separate category (that is, adding a category corresponding to a non-preset category), so that the existing image recognition model can be improved more simply and conveniently.
  • Step S250 Acquire the image to be recognized.
  • Step S260 Input the image to be recognized into a pre-trained image recognition model to obtain a probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training on multiple positive sample images marked with preset categories and multiple negative sample images marked with non-preset categories.
  • Step S270 When the probability corresponding to each preset category is smaller than a set threshold, output a result indicating that the image to be recognized does not belong to any preset category.
  • the image recognition method provided by the embodiment of the present application provides a training process for an image recognition model.
  • Specifically, a sample image set including multiple positive sample images marked with preset categories and multiple negative sample images marked with non-preset categories is acquired; each positive sample image and each negative sample image is input into the initial recognition model respectively to obtain the probability corresponding to each preset category output by the initial recognition model; the total loss value is then determined according to the results output by the initial recognition model, the preset category with which each positive sample image is marked, and the non-preset category with which each negative sample image is marked; and the initial recognition model is iteratively trained according to the total loss value to obtain the image recognition model.
  • Since negative sample images of non-preset categories are used to train the initial recognition model, when the obtained image recognition model performs category recognition on an input image that does not contain image content of any preset category, the probability of each preset category output by the image recognition model will be close or equal to the true probability corresponding to an image of a non-preset category, so the image will not be judged as belonging to any of the preset categories.
  • FIG. 4 shows a schematic flowchart of an image recognition method provided by another embodiment of the present application.
  • the image recognition method is applied to the above-mentioned electronic device, and the flow shown in FIG. 4 will be described in detail below.
  • the image recognition method may specifically include the following steps:
  • Step S310 Acquire the image to be recognized.
  • For step S310, reference may be made to the content of the foregoing embodiments, and details are not described herein again.
  • Step S320 Preprocess the to-be-identified image.
  • In order to make the image to be recognized meet the image input standard of the image recognition model, improve the recognition accuracy, and improve the processing efficiency, the image to be recognized may also be preprocessed.
  • the preprocessing of the to-be-recognized image includes:
  • Step S321 According to the pre-trained object detection model, obtain the region where the entity object in the to-be-recognized image is located;
  • Step S322 adjusting the scale of the region where the entity object is located in the to-be-recognized image to obtain a region image corresponding to the to-be-recognized image;
  • Step S323 Normalize the pixel values of all pixel points in the region image corresponding to the to-be-identified image.
  • Specifically, the electronic device can first use a pre-trained object detection model to detect the area where the entity object is located in the image to be recognized, and crop that area from the image to be recognized, so that the area containing the entity object is separated from the image and the content of other irrelevant areas is discarded.
  • In this way, during image recognition, the amount of calculation can be reduced and the recognition accuracy can be improved.
  • Then, the scale of the cropped area is adjusted, that is, enlarged or reduced, so that the scale of the image input to the image recognition model is consistent with the scale of the images used for model training. Normalizing the pixel values of the above area, that is, normalizing them to [0, 1], can eliminate the influence of the absolute magnitude of the pixel values on the classification performance.
  • For example, some images are clearer and their pixel values are large, while other images are blurred and their pixel values are small; after normalization, the influence of the magnitude of the pixel values on recognition performance is reduced, so that the model can better learn higher-level feature information such as the texture structure of the image itself, which improves the accuracy of image recognition.
  • Step S330 Input the preprocessed image to be recognized into a pre-trained image recognition model to obtain a probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training on positive sample images marked with preset categories and negative sample images marked with non-preset categories.
  • Step S340 When the probability corresponding to each preset category is smaller than a set threshold, output a result indicating that the image to be recognized does not belong to any preset category.
  • Step S350 When the probability corresponding to the target category in all preset categories is greater than or equal to the set threshold, output a result indicating that the image to be recognized belongs to the target category.
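  • Putting steps S310 to S350 together, an end-to-end inference sketch could look like the following; it is again an illustration under assumptions, where `preprocess`, `decide_category`, and `model` are the hypothetical helpers sketched earlier, and the category names and threshold are examples.

```python
import torch

def recognize(image, model, detect_target_box,
              categories=("cat", "dog", "pig", "sheep", "cattle"),
              threshold: float = 0.8) -> str:
    # Step S320: preprocess (crop detected object, resize, normalize)
    area = preprocess(image, detect_target_box)                     # HxWx3, values in [0, 1]
    tensor = torch.from_numpy(area).permute(2, 0, 1).unsqueeze(0)   # 1x3xHxW
    # Step S330: probabilities for each preset category
    probs = model(tensor).squeeze(0).tolist()
    # Steps S340/S350: threshold the probabilities
    result = decide_category(probs, list(categories), threshold)
    return result if result is not None else "not any preset category"
```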
  • In this way, an image recognition model obtained by pre-training on positive sample images marked with preset categories and negative sample images marked with non-preset categories can recognize the image to be recognized and, for a category that does not exist in the image recognition model, effectively avoid the image to be recognized being incorrectly recognized as an existing category, thereby improving the accuracy of image recognition.
  • the to-be-recognized image is preprocessed, so that the recognition accuracy can be further improved.
  • FIG. 6 shows a structural block diagram of an image recognition apparatus 400 provided by an embodiment of the present application.
  • The image recognition apparatus 400 is applied to the above-mentioned electronic device, and the image recognition apparatus 400 includes: an image acquisition module 410, an image input module 420, and a result output module 430.
  • the image acquisition module 410 is used to acquire the image to be recognized;
  • the image input module 420 is used to input the image to be recognized into a pre-trained image recognition model to obtain the probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training on multiple positive sample images marked with preset categories and multiple negative sample images marked with non-preset categories;
  • the result output module 430 is used to output, when the probability corresponding to each preset category is smaller than the set threshold, a result indicating that the image to be recognized does not belong to any preset category.
  • the image recognition apparatus 400 may further include: an image set acquisition module, a probability acquisition module, a loss acquisition module, and an iterative training module.
  • the image set acquisition module is configured to acquire a sample image set before inputting the to-be-recognized image into a pre-trained image recognition model to obtain a probability corresponding to each preset category output by the image recognition model.
  • the sample image set includes multiple positive sample images marked with preset categories and multiple negative sample images marked with non-preset categories; the probability acquisition module is used to input each positive sample image and each negative sample image into the initial recognition model respectively, to obtain the probability corresponding to each preset category output by the initial recognition model;
  • the loss acquisition module is used to determine a total loss value according to the probabilities output by the initial recognition model, the preset category with which each positive sample image is marked, and the non-preset category with which each negative sample image is marked;
  • the iterative training module is configured to iteratively train the initial recognition model according to the total loss value to obtain the image recognition model.
  • the loss acquisition module may include: a first loss acquisition unit, a second loss acquisition unit, and a total loss acquisition unit.
  • the first loss obtaining unit is configured to obtain the loss value of the output result corresponding to a positive sample image according to the difference between the probabilities output by the initial recognition model for the positive sample image and the true probabilities corresponding to the preset category with which the positive sample image is marked, wherein, among the true probabilities corresponding to the marked preset category of the positive sample image, the probability corresponding to the marked preset category is greater than or equal to the set threshold, the probabilities corresponding to the other preset categories are less than the set threshold, and the other preset categories are all preset categories other than the marked one;
  • the second loss acquisition unit is used to obtain the loss value of the output result corresponding to a negative sample image according to the difference between the probabilities output by the initial recognition model for the negative sample image and the true probabilities corresponding to the non-preset category with which the negative sample image is marked, wherein, among the true probabilities corresponding to the marked non-preset category of the negative sample image, the probability corresponding to each preset category is less than the set threshold;
  • the total loss acquisition unit is used to obtain the total loss value of the output results corresponding to the sample image set according to the loss value of the output result corresponding to each positive sample image and the loss value of the output result corresponding to each negative sample image.
  • In some embodiments, the total loss obtaining unit may be specifically configured to: according to the loss value of the output result corresponding to each positive sample image, obtain the average loss value of the output results corresponding to the multiple positive sample images as a first loss value; according to the loss value of the output result corresponding to each negative sample image, obtain the average loss value of the output results corresponding to the multiple negative sample images as a second loss value; and obtain the total loss value of the output results corresponding to the sample image set according to the first loss value and the second loss value.
  • the image set acquisition module includes: a first image acquisition unit, a second image acquisition unit, and a third image acquisition unit.
  • the first image acquisition unit is configured to acquire multiple first images corresponding to multiple preset categories and multiple second images corresponding to non-preset categories, wherein the first images corresponding to each preset category contain entity objects of that preset category, each preset category corresponds to at least one first image, and the second images corresponding to the non-preset categories contain entity objects of non-preset categories;
  • the second image acquisition unit is used to preprocess each first image in the multiple first images respectively, to obtain multiple positive sample images marked with preset categories;
  • the third image acquisition unit is used to perform the same preprocessing on each second image in the multiple second images respectively, to obtain multiple negative sample images marked with non-preset categories.
  • In some embodiments, the second image acquisition unit can be specifically used to: acquire the target area where the entity object in each first image is located according to the pre-trained object detection model; adjust the scale of the target area to obtain the area image corresponding to each first image; and normalize the pixel values of all the pixel points in each area image.
  • the image input module 420 may include: a preprocessing unit and an input unit.
  • the preprocessing unit is used for preprocessing the to-be-recognized image;
  • the input unit is used for inputting the preprocessed image to be recognized into a pre-trained image recognition model.
  • the preprocessing unit may be specifically configured to: obtain the region where the entity object is located in the image to be recognized according to a pre-trained object detection model; adjust the scale of the region where the entity object is located in the image to be recognized, Obtain an area image corresponding to the to-be-recognized image; and normalize the pixel values of all pixel points in the area image corresponding to the to-be-recognized image.
  • In some embodiments, the result output module may also be configured to, after the image to be recognized is input into the pre-trained image recognition model to obtain the probability corresponding to each preset category output by the image recognition model, output a result indicating that the image to be recognized belongs to a target category when the probability corresponding to the target category among all the preset categories is greater than or equal to the set threshold.
  • the coupling between the modules may be electrical, mechanical or other forms of coupling.
  • each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, or can be implemented in the form of software function modules.
  • To sum up, in the solution provided by the present application, the image to be recognized is acquired and input into a pre-trained image recognition model to obtain the probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is trained on multiple positive sample images marked with preset categories and multiple negative sample images marked with non-preset categories.
  • When the probability corresponding to every preset category is smaller than the set threshold, a result indicating that the image to be recognized does not belong to any preset category is output. In this way, through the image recognition model obtained by pre-training on positive sample images marked with preset categories and negative sample images marked with non-preset categories, the image to be recognized can be recognized, and for a category that does not exist in the image recognition model, the image can be effectively prevented from being incorrectly recognized as an existing category, thereby improving the accuracy of image recognition.
  • the electronic device 100 may be an electronic device capable of running an application program, such as a smart phone, a tablet computer, a smart watch, a smart glasses, a notebook computer, or the like.
  • the electronic device 100 in the present application may include one or more of the following components: a processor 110, a memory 120, and one or more application programs, wherein the one or more application programs may be stored in the memory 120 and configured to be executed by One or more processors 110 execute, and one or more programs are configured to execute the methods described in the foregoing method embodiments.
  • the processor 110 may include one or more processing cores.
  • the processor 110 uses various interfaces and lines to connect various parts of the entire electronic device 100, and performs various functions of the electronic device 100 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 120 and calling the data stored in the memory 120.
  • the processor 110 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA).
  • the processor 110 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like.
  • the CPU mainly handles the operating system, user interface and application programs, etc.
  • the GPU is used for rendering and drawing of the display content
  • the modem is used to handle wireless communication. It can be understood that, the above-mentioned modem may also not be integrated into the processor 110, and is implemented by a communication chip alone.
  • the memory 120 may include random access memory (Random Access Memory, RAM), or may include read-only memory (Read-Only Memory, ROM). The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets.
  • the memory 120 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.) , instructions for implementing the following method embodiments, and the like.
  • the storage data area may also store data (such as phone book, audio and video data, chat record data) created by the electronic device 100 during use.
  • FIG. 8 shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application.
  • the computer-readable medium 800 stores program codes, and the program codes can be invoked by the processor to execute the methods described in the above method embodiments.
  • the computer readable storage medium 800 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the computer-readable storage medium 800 includes a non-transitory computer-readable storage medium.
  • Computer readable storage medium 800 has storage space for program code 810 to perform any of the method steps in the above-described methods. These program codes can be read from or written to one or more computer program products.
  • Program code 810 may be compressed, for example, in a suitable form.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses an image recognition method and apparatus, an electronic device, and a storage medium. The image recognition method includes: acquiring an image to be recognized; inputting the image to be recognized into a pre-trained image recognition model to obtain a probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training on multiple positive sample images marked with preset categories and multiple negative sample images marked with non-preset categories; and, when the probabilities corresponding to all the preset categories are smaller than a set threshold, outputting a result indicating that the image to be recognized does not belong to any preset category. The method can effectively prevent the image recognition model from incorrectly recognizing an image of a category that does not exist in the model as an existing category, thereby improving the recognition accuracy of image recognition.

Description

Image recognition method and apparatus, electronic device, and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202010802994.4, filed on August 11, 2020, the entire contents of which are incorporated herein by reference for all purposes.
Technical Field
The present application relates to the technical field of image processing, and more particularly, to an image recognition method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of science and technology, the recognition of objects in images has attracted great research interest and has been deployed in many application products, intelligently solving many problems in daily life, for example in the fields of security, public safety, and justice. Traditional image recognition techniques rely on a trained image recognition model, but categories that do not exist in the image recognition model can be misrecognized, so the application of image recognition can also be inaccurate.
Summary
In view of the above problems, the present application proposes an image recognition method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides an image recognition method. The method includes: acquiring an image to be recognized; inputting the image to be recognized into a pre-trained image recognition model to obtain a probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training on multiple positive sample images marked with preset categories and multiple negative sample images marked with non-preset categories; and, when the probabilities corresponding to all the preset categories are smaller than a set threshold, outputting a result indicating that the image to be recognized does not belong to any preset category.
In a second aspect, an embodiment of the present application provides an image recognition apparatus. The apparatus includes an image acquisition module, an image input module, and a result output module. The image acquisition module is used to acquire an image to be recognized; the image input module is used to input the image to be recognized into a pre-trained image recognition model to obtain a probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training on multiple positive sample images marked with preset categories and multiple negative sample images marked with non-preset categories; and the result output module is used to output, when the probabilities corresponding to all the preset categories are smaller than a set threshold, a result indicating that the image to be recognized does not belong to any preset category.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a memory; and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to execute the image recognition method provided in the first aspect above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium in which program code is stored, the program code being invocable by a processor to execute the image recognition method provided in the first aspect above.
According to the solution provided by the present application, an image to be recognized is acquired and input into a pre-trained image recognition model to obtain the probability corresponding to each preset category output by the image recognition model, wherein the image recognition model is obtained by training on multiple positive sample images marked with preset categories and multiple negative sample images marked with non-preset categories. When the probabilities corresponding to all the preset categories are smaller than a set threshold, a result indicating that the image to be recognized does not belong to any preset category is output. In this way, through the image recognition model obtained by pre-training on positive sample images marked with preset categories and negative sample images marked with non-preset categories, the image to be recognized can be recognized; for a category that does not exist in the image recognition model, the probabilities of the existing preset categories output by the model are used to determine that the image does not belong to any preset category, thereby effectively avoiding erroneously recognizing the image to be recognized as an existing category and improving the accuracy of image recognition.
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1示出了根据本申请一个实施例的图像识别方法流程图。
图2示出了根据本申请另一个实施例的图像识别方法流程图。
图3示出了本申请另一个实施例提供的图像识别方法中步骤S210的流程图。
图4示出了根据本申请又一个实施例的图像识别方法流程图。
图5示出了本申请另一个实施例提供的图像识别方法中步骤S320的流程图。
图6示出了根据本申请一个实施例的图像识别装置的一种框图。
图7是本申请实施例的用于执行根据本申请实施例的图像识别方法的电子设备的框图。
图8是本申请实施例的用于保存或者携带实现根据本申请实施例的图像识别方法的程序代码的存储单元。
具体实施方式
为了使本技术领域的人员更好地理解本申请方案,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。
随着科学技术的发展,基于人工智能的图像识别被广泛应用于各行各业,以至于在日常生活中,人们能够感受到人工智能所带来的影响。例如,图像识别可以用于安保系统中,可以用于识别和预测事件;又例如,图像识别可以用在不良图像的筛选中,以减少人工进行筛选的成本;还例如,图像识别也可以用在犯罪分子的查找中,可以用于从海量图像中筛选出犯罪分子的图像。
在相关技术中,传统的图像识别方法多是针对特定领域的,对于特定领域内的物体,现有的方法可以取得较满意的效果,但是现实世界中,物体的类别是不可能穷尽的,也就是说图像识别模型不可能包含现实中的所有物体类别,这样对于模型中不存在的类别的图像,传统的方法中,对图像进行识别时,依然有一定概率将其划分到已知类别中,从而造成误识别。这是因为图像识别的本质是一个分类问题,现有的图像识别方法在训练过程中是一个闭集问题,即训练过程中所有图片的类别,都是在给定类别内的,但是实际应用过程中却是一个开集问题,即实际需要识别的图片类别可能并不在给定类别之内,因此依然会有一定概率将其识别成给定类别内的物体,从而造成误识别,误识别会严重降低产品的价值和用户体验,所以在保证准确率的情况下降低误识别率就显得尤为重要。
针对上述问题，发明人提出了本申请实施例提供的图像识别方法、装置、电子设备以及存储介质，通过预先根据被标注有预设类别的正样本图像以及被标注有非预设类别的负样本图像进行训练得到的图像识别模型，对待识别图像进行识别，能够对于图像识别模型中不存在的类别，对该类别的待识别图像进行识别时，有效避免错误地将待识别图像识别为存在的类别，进而提升图像识别的准确率。其中，具体的图像识别方法在后续的实施例中进行详细的说明。
请参阅图1,图1示出了本申请一个实施例提供的图像识别方法的流程示意图。在具体的实施例中,所述图像识别方法应用于如图6所示的图像识别装置400以及配置有所述图像识别装置400的电子设备100(图7)。下面将以电子设备为例,说明本实施例的具体流程,当然,可以理解的,本实施例所应用的电子设备可以为智能手机、平板电脑、智能手表、智能眼镜、笔记本电脑等,在此不做限定。下面将针对图1所示的流程进行详细的阐述,所述图像识别方法具体可以包括以下步骤:
步骤S110:获取待识别图像。
在本申请实施例中，电子设备可以获取待进行图像内容的类别识别的图像，并将其作为待识别图像。其中，进行图像内容的类别识别，即对图像中的实体对象进行类别的识别，例如对图像中的动物、植物等进行类别识别。
作为一种实施方式,电子设备为智能手机、平板电脑、智能手表等设置有摄像头的移动终端时,可以通过前置摄像头或者后置摄像头进行图像采集,从而获得待进行图像内容的类别识别的图像,例如,电子设备可以通过后置摄像头采集图像,并将获得的图像作为输入图像;作为又一种实施方式,电子设备可以从本地获取待进行目标对象的检测的图像,也就是说,电子设备可以从本地存储的文件中获取待进行图像内容的类别识别的图像,例如,电子设备为移动终端时,可以从相册获取待进行图像内容的类别识别的图像,即电子设备预先通过摄像头采集图像后存储在本地相册,或者预先从网络下载图像后存储在本地相册等,然后在需要对图像进行图像内容的类别识别的时,从相册中读取待进行图像内容的类别识别的图像;作为再一种方式,电子设备为移动终端或者电脑时,也可以从网络下载待进行图像内容的类别识别的图像,例如,电子设备可以通过无线网络、数据网络等从相应的服务器下载需求的图像,以进行对下载的图像进行图像内容的类别识别;作为还一种实施方式,电子设备也可以通过用户的输入操作,对输入的待进行图像内容的类别识别的图像进行接收,从而获得待识别图像;作为又另一种实施方式,电子设备为服务器时,还可以从数据库服务器或者从用户的客户端获取待识别图像。当然,电子设备具体获取待识别图像的方式可以不作为限定。
步骤S120:将所述待识别图像输入至预先训练的图像识别模型,得到所述图像识别模型输出的每个预设类别对应的概率,其中,所述图像识别模型根据多张被标注有预设类别的正样本图像以及多张被标注有非预设类别的负样本图像训练得到。
在本申请实施例中,电子设备可以将待识别图像输入至预先训练的图像识别模型中,以得到图像识别模型输出的每个预设类别对应的概率。在一些实施方式中,图像识别模型可以预先存储于电子设备本地,电子设备可以直接从本地调用图像识别模型,并将待识别图像输入至图像识别模型;图像识别模型也可以存储于服务器,电子设备在需要对待识别图像进行内容的类别识别时,可以调用服务器中的图像识别模型,例如,将待识别图像发送至服务器,以指示服务器将该待识别图像输入至图像识别模型中,进行图像内容的类别识别。
在一些实施方式中,图像识别模型可以由大量训练样本训练得到。训练样本可以包括多张被标注有预设类别的正样本图像以及多张被标注有非预设类别的负样本图像。具体地,可以将正样本图像输入至初始识别模型,获得初始识别模型输出的多个预设类别中每个预设类别的概率,并计算正样本图像对应的预设类别的真实概率与初始识别模型输出的概率之间的差异,获得图像识别模型根据正样本图像输出的输出结果的损失;另外,将负样本图像输入至初始识别模型,获得输出每个预设类别的概率,然后根据计算负样本图像对应的预设类别的真实概率与初始识别模型输出的概率之间的差异,获得图像识别模型根据负样本图像输出的输出结果的损失;然后根据各个正 样本图像对应的输出结果的损失,以及各个负样本图像对应的输出结果的损失,计算得到一个训练批次中各个样本图像对应的输出结果的总损失;再根据优化算法,对初始识别模型进行迭代训练,得到训练后的图像识别模型。其中,初始识别模型可以为卷积神经网络模型等,在此不作限定。由于利用了非预设类别的负样本图像,对初始识别模型进行训练,从而在获得的图像识别模型,在对输入图像进行类别的识别时,如果输入图像中不包含任一预设类别的图像内容,则图像识别模型输出的各个预设类别的概率,会与非预设类别的图像对应的真实概率接近或者相等,从而不会被判别为其中一个预设类别。
步骤S130:在所述每个预设类别对应的概率均小于设定阈值时,输出用于表征所述待识别图像不属于任一预设类别的结果。
在本申请实施例中,电子设备在获得到图像识别模型输出的各个预设类别对应的概率之后,则可以根据预设类别对应的概率,确定输出结果。具体地,电子设备可以将各个预设类别对应的概率与预设阈值进行比较,该预设阈值作为确定待识别图像是否属于其中一个预设类别的判断依据。其中,预设阈值通常可以设置的较高,例如70%,80%,90%等,具体数值可以不作为限定,可以理解的,只有当某个预设类别对应的概率特别高时,才表示图像属于该预设类别的概率较大,从而可以将其确定为属于该预设类别。在将各个预设类别对应的概率与设定阈值进行比较之后,当每个预设类别对应的概率均小于设定阈值时,表示每个预设类别对应的概率都较小,而不能将其判定为任一预设类别,因此可以输出用于表征该待识别图像不属于任一预设类别的结果;当所有预设类别中目标类别对应的概率大于设定阈值时,则表示该待识别图像属于该目标类别的概率较大,并且都是大于其他预设类别对应的概率的,从而可以输出用于表征待识别图像属于目标类别的结果。
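As an illustrative sketch only, not part of the original disclosure, the threshold decision described above could be coded roughly as follows in PyTorch; the category names, the threshold value of 0.8, and the function name classify are assumptions chosen for the example.

import torch
import torch.nn.functional as F

PRESET_CATEGORIES = ["cat", "dog", "pig", "sheep", "cow"]   # example categories, not mandated by the text
THRESHOLD = 0.8                                             # illustrative "set threshold"

def classify(model: torch.nn.Module, image_tensor: torch.Tensor):
    """Return a preset category name, or None when no probability reaches the threshold."""
    model.eval()
    with torch.no_grad():
        logits = model(image_tensor.unsqueeze(0))           # shape (1, k)
        probs = F.softmax(logits, dim=1).squeeze(0)          # probability per preset category
    top_prob, top_idx = probs.max(dim=0)
    if top_prob.item() < THRESHOLD:
        return None                                          # "does not belong to any preset category"
    return PRESET_CATEGORIES[top_idx.item()]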
本申请实施例提供的图像识别方法，通过预先根据被标注有预设类别的正样本图像以及被标注有非预设类别的负样本图像进行训练得到的图像识别模型，对待识别图像进行识别，能够对于图像识别模型中不存在的类别，对该类别的待识别图像进行识别时，有效避免错误地将待识别图像识别为存在的类别，进而提升图像识别的准确率。
并且,由于通过图像识别模型输出的各个预设类别的概率来确定待识别模型所属的类别,因此使得当输入到图像识别模型的待识别图像不属于任一预设类别的情况下,图像识别模型输出的各个预设类别的概率不会大于设定阈值,从而将其判定为不属于任一预设类别,能够有效避免对非预设类别的图像单独设置一个非预设类别,而负样本的数量不足够多的情况下,由于图像识别模型并不能识别足够的多的非预设类别,因此图像识别模型对于无法识别的非预设类别时,输出的结果中可能会存在某个预设类别的概率较大,而错误地分类为该概率较大的预设类别的情况发生,进而使得图像识别的更加准确。
另外,对于图像识别模型,除了需要进行识别的多种预设类别之外,不对非预设类别单独设置一个类别,使图像识别模型只需要输出各个预设类别对应的概率,后续根据各个预设类别对应的概率,即可确定出待识别图像所属的类别,这样的话,在需求图像识别模型识别的预设类别较多时,可以减少图像识别模型的计算量,从而降低电子设备的负担。
请参阅图2,图2示出了本申请另一个实施例提供的图像识别方法的流程示意图。该图像识别方法应用于上述电子设备,下面将针对图2所示的流程进行详细的阐述,所述图像识别方法具体可以包括以下步骤:
步骤S210:获取样本图像集,所述样本图像集包括多张被标注有预设类别的正样本图像以及多张被标注有非预设类别的负样本图像。
在本申请实施例中,针对前述实施例中提到的图像识别模型,本申请实施例还包括对该图像识别模型的训练方法,值得说明的是,对图像识别模型的训练可以是根据 获取的样本图像集预先进行的,后续在每次需要对待识别图像进行图像内容的类别识别时,则可以利用图像识别模型进行,而无需每次对待识别图像进行图像内容的类别识别时对图像识别模型进行训练。
在本申请实施例中,在对图像识别模型进行训练时,可以获取样本图像集,其中,该样本图像集包括多张被标注有预设类别的正样本图像,以及多张被标注有非预设类别的负样本图像。
在一些实施方式中,请参阅图3,电子设备获取样本图像集,可以包括:
步骤S211:获取多个预设类别对应的多张第一图像,以及非预设类别对应的多张第二图像,其中,每个预设类别对应的第一图像中包含该预设类别的实体对象,每个预设类别对应有至少一张第一图像,非预设类别对应第二图像中包含非预设类别的实体对象;
步骤S212:分别对所述多张第一图像中每张第一图像进行预处理,获得多张被标注有预设类别的正样本图像;
步骤S213：分别对所述多张第二图像中每张第二图像进行所述预处理，获得多张被标注有非预设类别的负样本图像。
在该实施方式中,多个预设类别可以包括需求图像识别模型识别的图像内容的类别,预设类别的数量可以根据实际需求设定。例如,需求识别猫、狗、猪、羊和牛5种类别时,则多个预设类别包括猫、狗、猪、羊和牛。
在该实施方式中，可以对每个预设类别，获取多张图像，并将其标注为对应的预设类别。可以理解的，每个预设类别对应的图像越多，则后续训练得到的图像识别模型对预设类别的图像的识别能力也越好。例如，对于一个预设类别，可以获取该预设类别的实体对象在多种不同场景的图像，从而使得每个预设类别对应的多张图像可以具有广泛性。另外，在获取非预设类别对应的图像时，可以获取包含任意的非预设类别的实体对象的图像，并且可以广泛地获取非预设类别的实体对象对应的图像，例如，预设类别包括猫和狗的动物类别时，则可以广泛地获取除了猫和狗以外的其他类别的实体对象对应的图像，例如，可以获取花、草、树、猪、羊等类别的实体对象对应的图像，并且获取的非预设类别对应的图像中不包含预设类别的实体对象。以上的图像可以来源于包含大量图像的训练集，例如COCO Dataset数据集，IMAGECLEF数据集等，在此不做限定。
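A minimal sketch of how such a sample set could be assembled; the folder layout, file extension, and the internal marker NEGATIVE_LABEL are hypothetical and only illustrate that negative samples are labeled as non-preset rather than as an extra output class.

from pathlib import Path

PRESET_CATEGORIES = ["cat", "dog", "pig", "sheep", "cow"]   # example preset categories from the text
NEGATIVE_LABEL = -1                                         # internal marker only; no extra output class for negatives

def build_sample_list(positive_root: str, negative_root: str):
    """Collect (image_path, label) pairs; the directory layout is hypothetical."""
    samples = []
    for idx, category in enumerate(PRESET_CATEGORIES):
        for path in Path(positive_root, category).glob("*.jpg"):
            samples.append((path, idx))                     # positive sample labeled with its preset category
    for path in Path(negative_root).glob("*.jpg"):           # e.g. images drawn from COCO / IMAGECLEF
        samples.append((path, NEGATIVE_LABEL))               # negative sample labeled as non-preset
    return samples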
在一些实施方式中,对多张第一图像进行预处理,可以包括:
根据预先训练的物体检测模型,获取每张第一图像中的实体对象所在的目标区域;
对获得的所述每张第一图像所对应的目标区域进行比例调整,获得所述每张第一图像所对应的区域图像;
将每张区域图像中所有像素点的像素值进行归一化。
在该实施方式中，可以先利用预先训练的物体检测模型，将预设类别的实体对象所在的目标区域进行检测，并将目标区域从第一图像中裁剪出，以从第一图像分离出实体对象所在的目标区域，而其他无关的区域的内容则被清除掉，使得在后续进行训练时，用于训练的第一图像中不会包含过多与实体对象无关的特征，从而减少计算量，也能提升模型训练的效果。另外，将裁剪出来的目标区域进行比例调整，即比例放大或者比例缩小，可以使用于模型训练的图像的尺寸大小一致，例如，可以统一调整为224*224的大小，通过设置224*224的大小，而不是其他大小，是在权衡了模型大小、运行速度和性能后的选择，比例大小越大的话一般分类性能越好，但是相应地模型大小和运行速度会增加，具体的比例大小可根据实际需求进行选择。
另外,将目标区域的像素值进行归一化,即归一化到[0,1],可以消除像素值的绝对大小对分类性能的影响,比如有的图像比较清晰像素值就会很大,有的图像比较模糊像素值就会很小,归一化之后会减小这种图像本身像素值的大小对识别性能的影响, 使模型更能够学习到图像本身的更层次的纹理结构等特征信息,同时也可以加快模型训练和收敛速度等。
在该实施方式中,预先训练的物体检测模型可以为MobileNet-SSD等,可以有效节省电子设备的存储空间,提升电子设备的运行效率。
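One possible form of the preprocessing just described, given as a sketch rather than the patented implementation; detect_object_region stands in for a pre-trained detector such as MobileNet-SSD and its return format is an assumption.

import numpy as np
from PIL import Image

TARGET_SIZE = (224, 224)    # the size quoted in the text as a speed/accuracy trade-off

def preprocess(image_path, detect_object_region):
    """detect_object_region is a hypothetical detector callback assumed to return a
    (left, top, right, bottom) box around the main object in the image."""
    image = Image.open(image_path).convert("RGB")
    box = detect_object_region(image)                        # hypothetical detector call
    region = image.crop(box)                                 # keep only the object region
    region = region.resize(TARGET_SIZE)                      # scale every image to the same size
    return np.asarray(region, dtype=np.float32) / 255.0      # normalize pixel values to [0, 1]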
在以上实施方式中,获得到多张被标注有预设类别的正样本图像,以及多张被标注有非预设类别的负样本图像之后,这些正样本图像以及负样本图像构成的图像集合,即为需要获取的样本图像集。
步骤S220:分别将每张正样本图像以及每张负样本图像输入至初始识别模型,得到所述初始识别模型输出的每个预设类别对应的概率。
在本申请实施例中，在获得样本图像集之后，则可以分别将每张正样本图像以及每张负样本图像输入至初始识别模型，得到初始识别模型输出的每个预设类别对应的概率。
在一些实施方式中,初始识别模型可以包括特征提取模块以及分类模块。其中,特征提取模块对图像的图像特征进行提取之后,将其输入至分类模块,然后分类模块根据输入的图像特征,输出各个预设类别对应的概率。
在一些方式中，特征提取模块可以为预先训练的神经网络。例如，神经网络可以为VGG（Visual Geometry Group）网络模型，深度残差网络（ResNet，Deep Residual Network）模型、MobileNetV2等用于提取图像特征的模型。可选的，神经网络可以为ImageNet中预训练的卷积神经网络（例如VGG19），由于ImageNet中预训练的卷积神经网络已经有很强的特征提取能力，因此不需要重新训练，基于ImageNet中预训练的卷积神经网络即可对上述特征进行提取。可选的，神经网络也可以是MobileNetV2，可以有效节省电子设备的存储空间，提升电子设备的运行效率。
在一些方式中,分类模块可以为Softmax逻辑回归模型(Softmax logical regression),也可以是支持向量机(Support Vector Machine,SVM)等,具体的分类模块可以不作为限定。
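A sketch of one way the feature-extraction plus classification structure could be assembled with a pre-trained MobileNetV2 backbone; the class count and the choice of a single linear layer as the classifier are assumptions for illustration (Softmax is applied later, inside the loss).

import torch.nn as nn
from torchvision import models

NUM_PRESET_CATEGORIES = 5    # one output per preset category; no extra "unknown" class

class RecognitionModel(nn.Module):
    """Pre-trained MobileNetV2 features followed by a linear classification head."""
    def __init__(self, num_classes: int = NUM_PRESET_CATEGORIES):
        super().__init__()
        backbone = models.mobilenet_v2(weights="IMAGENET1K_V1")   # older torchvision versions use pretrained=True
        self.features = backbone.features                          # convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(backbone.last_channel, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.pool(x).flatten(1)
        return self.classifier(x)    # logits over the preset categories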
步骤S230:根据所述初始识别模型输出的概率,每张正样本图像被标注的预设类别以及每张负样本图像被标注的非预设类别,确定总损失值。
在本申请实施例中,可以针对每张正样本图像对应的初始识别模型的输出结果,以及每张正样本图像被标注的预设类别,计算每张正样本图像对应的输出结果的损失,针对每张负样本图像对应的初始识别模型的输出结果,以及每张负样本图像被标注的预设类别,计算每张负样本图像对应的输出结果的损失,然后根据计算的每张正样本图像以及每张负样本图像对应的输出结果的损失,确定总损失值。
在一些实施方式中,确定总损失值的过程可以包括:
根据正样本图像对应的所述初始识别模型输出的概率,与正样本图像被标注的预设类别所对应的真实概率之间的差异,获得正样本图像对应的输出结果的损失值,其中,在正样本图像被标注的预设类别所对应的真实概率中,被标注的预设类别对应的概率大于或等于所述设定阈值,且其他预设类别对应的概率小于所述设定阈值,所述其他预设类别为所有预设类别中除被标注的预设类别以外的预设类别;根据负样本图像对应的所述初始识别模型输出的概率,与负样本图像被标注的非预设类别所对应的真实概率之间的差异,获得负样本图像对应的输出结果的损失值,其中,在负样本图像被标注的非预设类别所对应的真实概率中,每个预设类别对应的概率均小于所述设定阈值;根据每张正样本图像对应的输出结果的损失值以及每张负样本图像对应的输出结果的损失值,获得所述样本图像集对应的输出结果的总损失值。
可以理解地,只有当某个预设类别对应的概率特别高时,才表示图像属于该预设类别的概率较大,从而可以将其确定为属于该预设类别。针对正样本图像,其被标注的预设类别对应的真实概率应该是,只有被标注的预设类别对应的概率是大于或等于 设定阈值的,而其他预设类别对应的概率是小于设定阈值的,也就是说,只有被标注的预设类别对应的概率会很大,而其他预设类别对应的概率会很小;针对负样本图像,由于其不属于任一预设类别,则其被标注的预设类别对应的真实概率应该是,每个预设类别对应的概率都小于设定阈值。通过如此设定,可以根据初始识别模型对应的输出结果,以及每个样本图像被标注的标签的真实概率,求得每个正样本图像以及每个负样本图像所对应的输出结果的损失,再根据每个正样本图像以及每个负样本图像所对应的输出结果的损失,即可计算出整个样本图像集的所有图像对应的输出结果的总损失值。
在该实施方式中,在具体根据每张正样本图像对应的输出结果的损失,以及每张负样本图像对应的输出结果的损失,确定所有样本图像对应的输出结果的总损失值时,可以根据每张正样本图像对应的输出结果的损失值,获取所述多张正样本图像对应的输出结果的平均损失值作为第一损失值,并根据每张负样本图像对应的输出结果的损失值,获取多张负样本图像对应的输出结果的平均损失值作为第二损失值,然后根据第一损失值以及第二损失值,获得样本图像集对应的输出结果的总损失值。
作为一种具体的实施方式,可以对正样本图像对应的平均损失值,以及负样本图像对应的平均损失设置不同的权重,也就是说,对第一损失值以及第二损失值分别设置不同的权重,然后获取第一损失值与其对应的权重的乘积作为第一乘积,以及第二损失值与其对应的权重的乘积作为第二乘积,再确定第一乘积与第二乘积的和,获得样本图像集对应的输出结果的总损失值。
需要说明的是，以上是将样本图像集中的所有图像作为一个训练批次所需的图像，进行训练时，获得的总损失值。在将样本图像集中的所有图像作为一个训练批次所需的图像时，可以保证正样本图像与负样本图像的数量相等，例如，均设置为N，N为正整数。
作为一种具体地实施方式,对于正样本图像对应的输出结果的损失计算,可以按照以下公式计算:
L_1 = -\frac{1}{N}\sum_{i=1}^{N}\log p_{y_i}(x_i)，\qquad p_j(x_i)=\frac{e^{W_j^{T}x_i+b_j}}{\sum_{m=1}^{k}e^{W_m^{T}x_i+b_m}}
其中，p_j(x_i)为正样本图像属于类别j的概率，x_i为经过特征提取后的特征向量，例如前述的MobileNetV2的输出向量，W为权重向量，b为偏置，y_i为被标注的标签（即被标注的预设类别），N为正样本图像的数量，k为预设类别的数量。
对于负样本图像对应的输出结果的损失计算,由于负样本图像中的实体对象不属于给定多个预设类别中的任一预设类别,因此经过分类后(例如经过前述的Softmax分类器进行分类)得到的概率应该均匀分布,理想情况下图像属于每一个预设类别的概率都是一样的,即
p_1 = p_2 = \cdots = p_k = \frac{1}{k}
这样的话就不会在某个预设类别上发生概率特别大,即每个预设类别对应的概率都会小于设定阈值,从而不会造成误识别的情况,也就是对于无类别标签样本数据,应该满足以下条件:
\max\; p_1 p_2 \cdots p_k \quad \text{s.t.}\quad p_1 + p_2 + \cdots + p_k = 1
其中，p_k是初始识别模型输出的负样本图像属于每一预设类别的概率，该条件的含义为：所有预设类别对应的概率之和应当为1，并且，对于负样本图像，希望得到它属于每一预设类别的概率都是相等的，也就是p_1=p_2=...=p_k，这就是最终需求的模型输出结果的目标，这个目标等价于求p_1*p_2*...*p_k的最大值，即取p_1、p_2、...、p_k之间乘积的最大值，也就是说，当p_1=p_2=...=p_k时，p_1*p_2*...*p_k的乘积取得最大值。
进一步地，求p_1*p_2*...*p_k的最大值也就等价于求p_1、p_2、...、p_k的对数的最大值，即求取log(p_1*p_2*...*p_k)，这是因为log是单调递增的，又因为损失函数都是用最小化来表示的，求它们的对数的最大值也就等于求它们的对数的相反数的最小值，也就是-log(p_1*p_2*...*p_k)的最小值，根据对数函数的性质展开就是-[log(p_1)+log(p_2)+...+log(p_k)]。
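The step from the constrained maximization to the uniform distribution can be made explicit with the AM–GM inequality; this short derivation is added here for completeness and is not quoted from the original:

\[
p_1 p_2 \cdots p_k \le \left(\frac{p_1 + p_2 + \cdots + p_k}{k}\right)^{k} = \frac{1}{k^{k}},
\]
with equality exactly when \(p_1 = p_2 = \cdots = p_k = \tfrac{1}{k}\); since \(\log\) is monotonically increasing, maximizing the product is equivalent to minimizing \(-\bigl[\log(p_1)+\log(p_2)+\cdots+\log(p_k)\bigr]\).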
因此,负样本图像对应的输出结果的损失值可以根据以下公式计算:
L_2 = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{k}\log p_j(x_i)
其中，p_j(x_i)为负样本图像属于类别j的概率（由Softmax计算得到），x_i为经过特征提取后的特征向量，例如前述的MobileNetV2的输出向量，W为权重向量，b为偏置，N为负样本图像的数量，k为预设类别的数量。
进一步地，在按照如上的公式计算正样本图像对应的输出结果的损失值，以及负样本图像对应的输出结果的损失值之后，正样本图像与负样本图像的数量相等，假设都为N，则加权得到总损失值，如下所示：
L = L_1 + \omega L_2
ω表示负样本图像对应的平均损失值的权重,其取值范围可以为[0.1,0.5]。
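A minimal sketch of the combined loss described above, assuming the positive-sample loss is a standard Softmax cross-entropy and the negative-sample loss pushes the predicted distribution toward uniform; the value of OMEGA below is an arbitrary choice inside the quoted range.

import torch.nn.functional as F

OMEGA = 0.3    # weight of the negative-sample loss; the text gives a range of [0.1, 0.5]

def total_loss(pos_logits, pos_labels, neg_logits, omega: float = OMEGA):
    """Weighted combination of L1 (positive samples) and L2 (negative samples) as sketched above."""
    # L1: Softmax cross-entropy against the labeled preset category, averaged over the positive batch.
    loss_pos = F.cross_entropy(pos_logits, pos_labels)
    # L2: -(log p_1 + ... + log p_k) per negative sample, averaged over the negative batch.
    log_probs = F.log_softmax(neg_logits, dim=1)
    loss_neg = -log_probs.sum(dim=1).mean()
    return loss_pos + omega * loss_neg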
步骤S240:根据所述总损失值对所述初始识别模型进行迭代训练,获得所述图像识别模型。
在本申请实施例中,在获得样本图像集对应的输出结果的总损失值之后,则可以根据总损失值对初始识别模型进行迭代训练,得到最终的图像识别模型。
在一些实施方式中，可以根据总损失函数，使用Adam优化器对初始识别模型进行迭代训练，直至初始识别模型的输出结果的损失值收敛，并将此时的模型进行保存，得到训练后的图像识别模型。其中，Adam优化器，结合了AdaGrad（Adaptive Gradient，自适应梯度）和RMSProp两种优化算法的优点，对梯度的一阶矩估计（First Moment Estimation，即梯度的均值）和二阶矩估计（Second Moment Estimation，即梯度的未中心化的方差）进行综合考虑，计算出更新步长。
在一些实施方式中,迭代训练的终止条件可以包括:迭代训练的次数达到目标次数;或者初始识别模型的输出结果的总损失值满足设定条件。
在一种具体实施方式中,收敛条件是让总损失值尽可能小,使用初始学习率1e-3,学习率随步数余弦衰减,batch_size=8,训练16个epoch后,即可认为收敛完成。其中, batch_size可以理解为批处理参数,它的极限值为训练集样本总数,epoch指使用训练集中的全部样本训练的次数,通俗的讲epoch的值就是整个数据集被轮几次,1个epoch等于使用训练集中的全部样本训练1次。
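An optimizer configuration matching the figures quoted above could look roughly like the following sketch; the helper name make_optimizer and the per-step cosine schedule are assumptions, not the original implementation.

import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

BATCH_SIZE = 8          # figures quoted in the text
EPOCHS = 16
INITIAL_LR = 1e-3

def make_optimizer(model, steps_per_epoch: int):
    """Adam with a learning rate that decays along a cosine curve over the whole run."""
    optimizer = torch.optim.Adam(model.parameters(), lr=INITIAL_LR)
    scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS * steps_per_epoch)
    return optimizer, scheduler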
在另一种具体实施方式中,总损失值满足设定条件可以包括:总损失值小于设定阈值。当然,具体设定条件可以不作为限定。
在一些实施方式中，训练得到的图像识别模型可以存储于电子设备本地，该训练得到的图像识别模型也可以存储在与电子设备通信连接的服务器，将图像识别模型存储在服务器的方式，可以减少占用电子设备的存储空间，提升电子设备运行效率。
在一些实施方式中,图像识别模型还可以周期性的或者不定期的获取新的训练数据,对该图像识别模型进行训练和更新。例如,在存在图像被误识别时,则可以将该图像作为样本图像,对样本图像进行标注后,通过以上训练方式,再进行训练,从而可以提升图像识别模型的辨识度和识别准确度。
在一些实施方式中,由于图像识别模型是用于识别某些类别的图像,因此当用户需求图像识别模型识别的类别发生变化时,还可以增加新的预设类别,或者删除某个预设类别;并根据变更后的预设类别,对图像识别模型再进行训练。
通过本申请实施例提供的对图像识别模型的训练方法，可以实现在对已有的图像识别模型进行改良，以降低其误识别率时，可以直接利用以上训练方法进行训练，而无需再单独添加类别（即添加一个非预设类别对应的类别），从而能够更加简单方便地对已有的图像识别模型进行改良。
步骤S250:获取待识别图像。
步骤S260:将所述待识别图像输入至预先训练的图像识别模型,得到所述图像识别模型输出的每个预设类别对应的概率,其中,所述图像识别模型根据多张被标注有预设类别的正样本图像以及多张被标注有非预设类别的负样本图像训练得到。
步骤S270:在所述每个预设类别对应的概率均小于设定阈值时,输出用于表征所述待识别图像不属于任一预设类别的结果。
在本申请实施例中,步骤S250至步骤S270可以参阅前述实施例的内容,在此不再赘述。
本申请实施例提供的图像识别方法,提供了对图像识别模型的训练过程,通过获取包括多张被标注有预设类别的正样本图像以及多张被标注有非预设类别的负样本图像的样本图像集,分别将每张正样本图像以及每张负样本图像输入至初始识别模型,得到初始识别模型输出的每个预设类别对应的概率,再根据初始识别模型输出的结果,每张样本图像被标注的预设类别以及每张负样本图像被标注的非预设类别,确定总损失值,然后根据总损失值对初始识别模型进行迭代训练,获得图像识别模型。由于利用了非预设类别的负样本图像,对初始识别模型进行训练,从而在获得的图像识别模型,在对输入图像进行类别的识别时,如果输入图像中不包含任一预设类别的图像内容,则图像识别模型输出的各个预设类别的概率,会与非预设类别的图像对应的真实概率接近或者相等,从而不会被判别为其中一个预设类别。
请参阅图4,图4示出了本申请又一个实施例提供的图像识别方法的流程示意图。该图像识别方法应用于上述电子设备,下面将针对图4所示的流程进行详细的阐述,所述图像识别方法具体可以包括以下步骤:
步骤S310:获取待识别图像。
在本申请实施例中,步骤S310可以参阅前述实施例的内容,在此不再赘述。
步骤S320:对所述待识别图像进行预处理。
在本申请实施例中,为使得待识别图像满足图像识别模型的图像输入标准,提升识别准确率,以及提升处理效率,还可以对待识别图像进行预处理。
在一些实施方式中,请参阅图5,对所述待识别图像进行预处理,包括:
步骤S321:根据预先训练的物体检测模型,获取所述待识别图像中的实体对象所在区域;
步骤S322:对所述待识别图像中实体对象所在区域进行比例调整,获得所述待识别图像对应的区域图像;
步骤S323:将所述待识别图像对应的区域图像中所有像素点的像素值进行归一化。
其中,电子设备可以先利用预先训练的物体检测模型,将待识别图像中实体对象所在区域进行检测,并将实体对象所在区域从待识别图像中裁剪出,以从待识别图像中分离出实体对象所在区域,而其他无关的区域的内容则被清除掉,在进行图像识别时,能够减少计算量,也能提升识别准确率。另外,将裁剪出来的区域进行比例调整,即比例放大或者比例缩小,可以输入至图像识别模型的图像的比例大小与模型训练的使用图像的比例大小一致。将以上区域的像素值进行归一化,即归一化到[0,1],可以消除像素值的绝对大小对分类性能的影响,比如有的图像比较清晰像素值就会很大,有的图像比较模糊像素值就会很小,归一化之后会减小这种图像本身像素值的大小对识别性能的影响,使模型更能够学习到图像本身的更层次的纹理结构等特征信息,从而提升图像识别的准确率。
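Tying the earlier sketches together, inference on a single image could proceed roughly as follows; preprocess, classify, and detect_object_region refer to the hypothetical helpers sketched above, not to functions named in the original.

import torch

def recognize(image_path, model, detect_object_region):
    array = preprocess(image_path, detect_object_region)          # crop, resize, normalize to [0, 1]
    tensor = torch.from_numpy(array).permute(2, 0, 1)              # HWC -> CHW, as PyTorch expects
    return classify(model, tensor)                                 # a preset category name, or None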
步骤S330:将预处理后的待识别对象输入至预先训练的图像识别模型,得到所述图像识别模型输出的每个预设类别对应的概率,其中,所述图像识别模型根据多张被标注有预设类别的正样本图像以及多张被标注有非预设类别的负样本图像训练得到。
步骤S340:在所述每个预设类别对应的概率均小于设定阈值时,输出用于表征所述待识别图像不属于任一预设类别的结果。
步骤S350:在所有预设类别中目标类别对应的概率大于或等于所述设定阈值时,输出用于表征所述待识别图像属于所述目标类别的结果。
本申请实施例提供的图像识别方法，通过预先根据被标注有预设类别的正样本图像以及被标注有非预设类别的负样本图像进行训练得到的图像识别模型，对待识别图像进行识别，能够对于图像识别模型中不存在的类别，对该类别的待识别图像进行识别时，有效避免错误地将待识别图像识别为存在的类别，进而提升图像识别的准确率。并且在将待识别图像输入至图像识别模型之前，对待识别图像进行预处理，从而能够进一步地提升识别准确率。
请参阅图6,其示出了本申请实施例提供的一种图像识别装置400的结构框图。该图像识别装置400应用上述的电子设备,该图像识别装置400包括:图像获取模块410、图像输入模块420以及结果输出模块430。其中,所述图像获取模块410用于获取待识别图像;所述图像输入模块420用于将所述待识别图像输入至预先训练的图像识别模型,得到所述图像识别模型输出的每个预设类别对应的概率,其中,所述图像识别模型根据多张被标注有预设类别的正样本图像以及多张被标注有非预设类别的负样本图像训练得到;所述结果输出模块430用于在所述每个预设类别对应的概率均小于设定阈值时,输出用于表征所述待识别图像不属于任一预设类别的结果。
在一些实施方式中,该图像识别装置400还可以包括:图像集获取模块、概率获取模块、损失获取模块以及迭代训练模块。其中,图像集获取模块用于在所述将所述待识别图像输入至预先训练的图像识别模型,得到所述图像识别模型输出的每个预设类别对应的概率之前,获取样本图像集,所述样本图像集包括多张被标注有预设类别的正样本图像以及多张被标注有非预设类别的负样本图像;概率获取模块用于分别将每张正样本图像以及每张负样本图像输入至初始识别模型,得到所述初始识别模型输出的每个预设类别对应的概率;损失获取模块用于根据所述初始识别模型输出的概率,每张正样本图像被标注的预设类别以及每张负样本图像被标注的非预设类别,确定总损失值;迭代训练模块用于根据所述总损失值对所述初始识别模型进行迭代训练,获得所述图像识别模型。
在该实施方式中,损失获取模块可以包括:第一损失获取单元、第二损失获取单元、以及总损失获取单元。其中,第一损失获取单元用于根据正样本图像对应的所述初始识别模型输出的概率,与正样本图像被标注的预设类别所对应的真实概率之间的差异,获得正样本图像对应的输出结果的损失值,其中,在正样本图像被标注的预设类别所对应的真实概率中,被标注的预设类别对应的概率大于或等于所述设定阈值,且其他预设类别对应的概率小于所述设定阈值,所述其他预设类别为所有预设类别中除被标注的预设类别以外的预设类别;第二损失获取单元用于根据负样本图像对应的所述初始识别模型输出的概率,与负样本图像被标注的非预设类别所对应的真实概率之间的差异,获得负样本图像对应的输出结果的损失值,其中,在负样本图像被标注的非预设类别所对应的真实概率中,每个预设类别对应的概率均小于所述设定阈值;总损失获取单元用于根据每张正样本图像对应的输出结果的损失值以及每张负样本图像对应的输出结果的损失值,获得所述样本图像集对应的输出结果的总损失值。
进一步地,总损失获取单元可以具体用于:根据每张正样本图像对应的输出结果的损失值,获取所述多张正样本图像对应的输出结果的平均损失值作为第一损失值;根据每张负样本图像对应的输出结果的损失值,获取所述多张负样本图像对应的输出结果的平均损失值作为第二损失值;根据所述第一损失值以及所述第二损失值,获得所述样本图像集对应的输出结果的总损失值。
在该实施方式中,图像集获取模块包括:第一图像获取单元、第二图像获取单元以及第三图像获取单元。其中,第一图像获取单元用于获取多个预设类别对应的多张第一图像,以及非预设类别对应的多张第二图像,其中,每个预设类别对应的第一图像中包含该预设类别的实体对象,每个预设类别对应有至少一张第一图像,非预设类别对应第二图像中包含非预设类别的实体对象;第二图像获取单元用于分别对所述多张第一图像中每张第一图像进行预处理,获得多张被标注有预设类别的正样本图像;第三图像获取单元用于分别对所述多张第二图像中每张第二图像进行所述预处理,获得多张被标注有预设类别的负样本图像。
进一步地,第二图像获取单元可以具体用于:根据预先训练的物体检测模型,获取每张第一图像中的实体对象所在的目标区域;对获得的所述每张第一图像所对应的目标区域进行比例调整,获得所述每张第一图像所对应的区域图像;将每张区域图像中所有像素点的像素值进行归一化。
在一些实施方式中,该图像输入模块420可以包括:预处理单元以及输入单元。其中,预处理单元用于对所述待识别图像进行预处理;输入单元用于将预处理后的待识别对象输入至预先训练的图像识别模型。
在该实施方式中,预处理单元可以具体用于:根据预先训练的物体检测模型,获取所述待识别图像中的实体对象所在区域;对所述待识别图像中实体对象所在区域进行比例调整,获得所述待识别图像对应的区域图像;将所述待识别图像对应的区域图像中所有像素点的像素值进行归一化。
在一些实施方式中,结果输出模块还可以用于在所述将所述待识别图像输入至预先训练的图像识别模型,得到所述图像识别模型输出的每个预设类别对应的概率之后,在所有预设类别中目标类别对应的概率大于或等于所述设定阈值时,输出用于表征所述待识别图像属于所述目标类别的结果。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述装置和模块的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,模块相互之间的耦合可以是电性,机械或其它形式的耦合。
另外,在本申请各个实施例中的各功能模块可以集成在一个处理模块中,也可以是各个模块单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成 的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
综上所述，本申请提供的方案，通过获取待识别图像，将该待识别图像输入至预先训练的图像识别模型，得到该图像识别模型输出的每个预设类别对应的概率，其中，该图像识别模型根据多张被标注有预设类别的正样本图像以及多张被标注有非预设类别的负样本图像训练得到，在每个预设类别对应的概率均小于设定阈值时，输出用于表征该待识别图像不属于任一预设类别的结果，从而通过预先根据被标注有预设类别的正样本图像以及被标注有非预设类别的负样本图像进行训练得到的图像识别模型，对待识别图像进行识别，能够对于图像识别模型中不存在的类别，对该类别的待识别图像进行识别时，有效避免错误地将待识别图像识别为存在的类别，进而提升图像识别的准确率。
请参考图7,其示出了本申请实施例提供的一种电子设备的结构框图。该电子设备100可以是智能手机、平板电脑、智能手表、智能眼镜、笔记本电脑等能够运行应用程序的电子设备。本申请中的电子设备100可以包括一个或多个如下部件:处理器110、存储器120、以及一个或多个应用程序,其中一个或多个应用程序可以被存储在存储器120中并被配置为由一个或多个处理器110执行,一个或多个程序配置用于执行如前述方法实施例所描述的方法。
处理器110可以包括一个或者多个处理核。处理器110利用各种接口和线路连接整个电子设备100内的各个部分,通过运行或执行存储在存储器120内的指令、程序、代码集或指令集,以及调用存储在存储器120内的数据,执行电子设备100的各种功能和处理数据。可选地,处理器110可以采用数字信号处理(Digital Signal Processing,DSP)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)、可编程逻辑阵列(Programmable Logic Array,PLA)中的至少一种硬件形式来实现。处理器110可集成中央处理器(Central Processing Unit,CPU)、图像处理器(Graphics Processing Unit,GPU)和调制解调器等中的一种或几种的组合。其中,CPU主要处理操作系统、用户界面和应用程序等;GPU用于负责显示内容的渲染和绘制;调制解调器用于处理无线通信。可以理解的是,上述调制解调器也可以不集成到处理器110中,单独通过一块通信芯片进行实现。
存储器120可以包括随机存储器(Random Access Memory,RAM),也可以包括只读存储器(Read-Only Memory)。存储器120可用于存储指令、程序、代码、代码集或指令集。存储器120可包括存储程序区和存储数据区,其中,存储程序区可存储用于实现操作系统的指令、用于实现至少一个功能的指令(比如触控功能、声音播放功能、图像播放功能等)、用于实现下述各个方法实施例的指令等。存储数据区还可以存储电子设备100在使用中所创建的数据(比如电话本、音视频数据、聊天记录数据)等。
请参考图8,其示出了本申请实施例提供的一种计算机可读存储介质的结构框图。该计算机可读介质800中存储有程序代码,所述程序代码可被处理器调用执行上述方法实施例中所描述的方法。
计算机可读存储介质800可以是诸如闪存、EEPROM(电可擦除可编程只读存储器)、EPROM、硬盘或者ROM之类的电子存储器。可选地,计算机可读存储介质800包括非易失性计算机可读介质(non-transitory computer-readable storage medium)。计算机可读存储介质800具有执行上述方法中的任何方法步骤的程序代码810的存储空间。这些程序代码可以从一个或者多个计算机程序产品中读出或者写入到这一个或者多个计算机程序产品中。程序代码810可以例如以适当形式进行压缩。
最后应说明的是:以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等 同替换;而这些修改或者替换,并不驱使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (20)

  1. 一种图像识别方法,其特征在于,所述方法包括:
    获取待识别图像;
    将所述待识别图像输入至预先训练的图像识别模型,得到所述图像识别模型输出的每个预设类别对应的概率,其中,所述图像识别模型根据多张被标注有预设类别的正样本图像以及多张被标注有非预设类别的负样本图像训练得到;
    在所述每个预设类别对应的概率均小于设定阈值时,输出用于表征所述待识别图像不属于任一预设类别的结果。
  2. 根据权利要求1所述的方法,其特征在于,在所述将所述待识别图像输入至预先训练的图像识别模型,得到所述图像识别模型输出的每个预设类别对应的概率之前,所述方法还包括:
    获取样本图像集,所述样本图像集包括多张被标注有预设类别的正样本图像以及多张被标注有非预设类别的负样本图像;
    分别将每张正样本图像以及每张负样本图像输入至初始识别模型,得到所述初始识别模型输出的每个预设类别对应的概率;
    根据所述初始识别模型输出的概率,每张正样本图像被标注的预设类别以及每张负样本图像被标注的非预设类别,确定总损失值;
    根据所述总损失值对所述初始识别模型进行迭代训练,获得所述图像识别模型。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述初始识别模型输出的概率,每张正样本图像被标注的预设类别以及每张负样本图像被标注的非预设类别,确定总损失值,包括:
    根据正样本图像对应的所述初始识别模型输出的概率,与正样本图像被标注的预设类别所对应的真实概率之间的差异,获得正样本图像对应的输出结果的损失值,其中,在正样本图像被标注的预设类别所对应的真实概率中,被标注的预设类别对应的概率大于或等于所述设定阈值,且其他预设类别对应的概率小于所述设定阈值,所述其他预设类别为所有预设类别中除被标注的预设类别以外的预设类别;
    根据负样本图像对应的所述初始识别模型输出的概率,与负样本图像被标注的非预设类别所对应的真实概率之间的差异,获得负样本图像对应的输出结果的损失值,其中,在负样本图像被标注的非预设类别所对应的真实概率中,每个预设类别对应的概率均小于所述设定阈值;
    根据每张正样本图像对应的输出结果的损失值以及每张负样本图像对应的输出结果的损失值,获得所述样本图像集对应的输出结果的总损失值。
  4. 根据权利要求3所述的方法,其特征在于,所述根据每张正样本图像对应的输出结果的损失值以及每张负样本图像对应的输出结果的损失值,获得所述样本图像集对应的输出结果的总损失值,包括:
    根据每张正样本图像对应的输出结果的损失值,获取所述多张正样本图像对应的输出结果的平均损失值作为第一损失值;
    根据每张负样本图像对应的输出结果的损失值,获取所述多张负样本图像对应的输出结果的平均损失值作为第二损失值;
    根据所述第一损失值以及所述第二损失值,获得所述样本图像集对应的输出结果的总损失值。
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述第一损失值以及所述第二损失值,获得所述样本图像集对应的输出结果的总损失值,包括:
    获取所述第一损失值与其对应的权重的乘积,作为第一乘积,以及第二损失值与其对应的权重的乘积,作为第二乘积;
    获取所述第一乘积与第二乘积的和值,得到样本图像集对应的输出结果的总损失值。
  6. 根据权利要求3-5任一项所述的方法,其特征在于,
    在负样本图像被标注的非预设类别所对应的真实概率中,所述负样本图像属于每一预设类别的概率相同,且所述负样本图像属于每一预设类别的概率之和为1。
  7. 根据权利要求2-6任一项所述的方法,其特征在于,所述获取样本图像集,包括:
    获取多个预设类别对应的多张第一图像,以及非预设类别对应的多张第二图像,其中,每个预设类别对应的第一图像中包含该预设类别的实体对象,每个预设类别对应有至少一张第一图像,非预设类别对应第二图像中包含非预设类别的实体对象;
    分别对所述多张第一图像中每张第一图像进行预处理,获得多张被标注有预设类别的正样本图像;
    分别对所述多张第二图像中每张第二图像进行所述预处理，获得多张被标注有非预设类别的负样本图像。
  8. 根据权利要求7所述的方法,其特征在于,所述分别对所述多张第一图像中每张第一图像进行预处理,获得多张被标注有预设类别的正样本图像,包括:
    根据预先训练的物体检测模型,获取每张第一图像中的实体对象所在的目标区域;
    对获得的所述每张第一图像所对应的目标区域进行比例调整,获得所述每张第一图像所对应的区域图像;
    将每张区域图像中所有像素点的像素值进行归一化。
  9. 根据权利要求8所述的方法,其特征在于,所述对获得的所述每张第一图像所对应的目标区域进行比例调整,获得所述每张第一图像所对应的区域图像,包括:
    对获得的所述每张第一图像所对应的目标区域调整为同一尺寸,获得所述每张第一图像所对应的区域图像。
  10. 根据权利要求7-9任一项所述的方法,其特征在于,所述多个类别包括需求所述图像识别模型识别的图像内容的类别。
  11. 根据权利要求2-10任一项所述的方法,其特征在于,所述根据所述总损失值对所述初始识别模型进行迭代训练,获得所述图像识别模型,包括:
    根据总损失值,并使用Adam优化器对所述初始识别模型进行迭代训练,直至所述初始识别模型的输出结果的损失值收敛,得到所述图像识别模型。
  12. 根据权利要求11所述的方法,其特征在于,所述迭代训练的终止条件包括:
    所述迭代训练的次数达到目标次数;或者
    所述初始识别模型的输出结果的总损失值满足设定条件。
  13. 根据权利要求2-12任一项所述的方法,其特征在于,所述初始识别模型包括特征提取模块以及分类模块,其中,所述特征提取模块用于对输入图像的图像特征进行提取;所述分类模块用于根据所述输入图像的图像特征,输出各个预设类别对应的概率。
  14. 根据权利要求13所述的方法,其特征在于,所述特征提取模块包括imageNet中预训练的卷积神经网络、或者预先训练的MobileNetV2。
  15. 根据权利要求1-14任一项所述的方法,其特征在于,所述将所述待识别图像输入至预先训练的图像识别模型,包括:
    对所述待识别图像进行预处理;
    将预处理后的待识别对象输入至预先训练的图像识别模型。
  16. 根据权利要求15所述的方法,其特征在于,所述对所述待识别图像进行预处理,包括:
    根据预先训练的物体检测模型,获取所述待识别图像中的实体对象所在区域;
    对所述待识别图像中实体对象所在区域进行比例调整,获得所述待识别图像对应的区域图像;
    将所述待识别图像对应的区域图像中所有像素点的像素值进行归一化。
  17. 根据权利要求1-16任一项所述的方法,其特征在于,在所述将所述待识别图像输入至预先训练的图像识别模型,得到所述图像识别模型输出的每个预设类别对应的概率之后,所述方法还包括:
    在所有预设类别中目标类别对应的概率大于或等于所述设定阈值时,输出用于表征所述待识别图像属于所述目标类别的结果。
  18. 一种图像识别装置,其特征在于,所述装置包括:图像获取模块、图像输入模块以及结果输出模块,其中,
    所述图像获取模块用于获取待识别图像;
    所述图像输入模块用于将所述待识别图像输入至预先训练的图像识别模型,得到所述图像识别模型输出的每个预设类别对应的概率,其中,所述图像识别模型根据多张被标注有预设类别的正样本图像以及多张被标注有非预设类别的负样本图像训练得到;
    所述结果输出模块用于在所述每个预设类别对应的概率均小于设定阈值时,输出用于表征所述待识别图像不属于任一预设类别的结果。
  19. 一种电子设备,其特征在于,包括:
    一个或多个处理器;
    存储器;
    一个或多个应用程序,其中所述一个或多个应用程序被存储在所述存储器中并被配置为由所述一个或多个处理器执行,所述一个或多个程序配置用于执行如权利要求1-17任一项所述的方法。
  20. 一种计算机可读取存储介质,其特征在于,所述计算机可读取存储介质中存储有程序代码,所述程序代码可被处理器调用执行如权利要求1-17任一项所述的方法。
PCT/CN2021/099185 2020-08-11 2021-06-09 图像识别方法、装置、电子设备及存储介质 WO2022033150A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010802994.4A CN111814810A (zh) 2020-08-11 2020-08-11 图像识别方法、装置、电子设备及存储介质
CN202010802994.4 2020-08-11

Publications (1)

Publication Number Publication Date
WO2022033150A1 true WO2022033150A1 (zh) 2022-02-17

Family

ID=72858927

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099185 WO2022033150A1 (zh) 2020-08-11 2021-06-09 图像识别方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN111814810A (zh)
WO (1) WO2022033150A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331062A (zh) * 2022-08-29 2022-11-11 北京达佳互联信息技术有限公司 图像识别方法、装置、电子设备和计算机可读存储介质
CN117094966A (zh) * 2023-08-21 2023-11-21 青岛美迪康数字工程有限公司 基于图像扩增的舌图像识别方法、装置和计算机设备

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814810A (zh) * 2020-08-11 2020-10-23 Oppo广东移动通信有限公司 图像识别方法、装置、电子设备及存储介质
CN112508062A (zh) * 2020-11-20 2021-03-16 普联国际有限公司 一种开集数据的分类方法、装置、设备及存储介质
CN112488012A (zh) * 2020-12-03 2021-03-12 浙江大华技术股份有限公司 行人属性识别方法、电子设备及存储介质
CN112381055A (zh) * 2020-12-03 2021-02-19 影石创新科技股份有限公司 第一人称视角图像识别方法、装置及计算机可读存储介质
CN112712052A (zh) * 2021-01-13 2021-04-27 安徽水天信息科技有限公司 一种机场全景视频中微弱目标的检测识别方法
CN112966110A (zh) * 2021-03-17 2021-06-15 中国平安人寿保险股份有限公司 文本类别识别方法及相关设备
CN113239804B (zh) * 2021-05-13 2023-06-02 杭州睿胜软件有限公司 图像识别方法、可读存储介质及图像识别系统
CN113657406B (zh) * 2021-07-13 2024-04-23 北京旷视科技有限公司 模型训练和特征提取方法、装置、电子设备及存储介质
CN113569691A (zh) * 2021-07-19 2021-10-29 新疆爱华盈通信息技术有限公司 人头检测模型生成方法、装置、人头检测模型及人头检测方法
CN116012656B (zh) * 2023-01-20 2024-02-13 北京百度网讯科技有限公司 样本图像的生成方法和图像处理模型的训练方法、装置
CN117115596B (zh) * 2023-10-25 2024-02-02 腾讯科技(深圳)有限公司 对象动作分类模型的训练方法、装置、设备及介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150302268A1 (en) * 2014-04-16 2015-10-22 I.R.I.S. Pattern recognition system
CN109191453A (zh) * 2018-09-14 2019-01-11 北京字节跳动网络技术有限公司 用于生成图像类别检测模型的方法和装置
CN109522967A (zh) * 2018-11-28 2019-03-26 广州逗号智能零售有限公司 一种商品定位识别方法、装置、设备以及存储介质
CN109766872A (zh) * 2019-01-31 2019-05-17 广州视源电子科技股份有限公司 图像识别方法和装置
CN109934293A (zh) * 2019-03-15 2019-06-25 苏州大学 图像识别方法、装置、介质及混淆感知卷积神经网络
CN111814810A (zh) * 2020-08-11 2020-10-23 Oppo广东移动通信有限公司 图像识别方法、装置、电子设备及存储介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110109878A (zh) * 2018-01-10 2019-08-09 广东欧珀移动通信有限公司 相册管理方法、装置、存储介质及电子设备
CN110135514B (zh) * 2019-05-22 2021-06-15 国信优易数据股份有限公司 一种工件分类方法、装置、设备及介质
CN111126346A (zh) * 2020-01-06 2020-05-08 腾讯科技(深圳)有限公司 脸部识别方法、分类模型的训练方法、装置和存储介质
CN111259968A (zh) * 2020-01-17 2020-06-09 腾讯科技(深圳)有限公司 非法图像识别方法、装置、设备和计算机可读存储介质
CN111260665B (zh) * 2020-01-17 2022-01-21 北京达佳互联信息技术有限公司 图像分割模型训练方法和装置

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150302268A1 (en) * 2014-04-16 2015-10-22 I.R.I.S. Pattern recognition system
CN109191453A (zh) * 2018-09-14 2019-01-11 北京字节跳动网络技术有限公司 用于生成图像类别检测模型的方法和装置
CN109522967A (zh) * 2018-11-28 2019-03-26 广州逗号智能零售有限公司 一种商品定位识别方法、装置、设备以及存储介质
CN109766872A (zh) * 2019-01-31 2019-05-17 广州视源电子科技股份有限公司 图像识别方法和装置
CN109934293A (zh) * 2019-03-15 2019-06-25 苏州大学 图像识别方法、装置、介质及混淆感知卷积神经网络
CN111814810A (zh) * 2020-08-11 2020-10-23 Oppo广东移动通信有限公司 图像识别方法、装置、电子设备及存储介质

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331062A (zh) * 2022-08-29 2022-11-11 北京达佳互联信息技术有限公司 图像识别方法、装置、电子设备和计算机可读存储介质
CN115331062B (zh) * 2022-08-29 2023-08-08 北京达佳互联信息技术有限公司 图像识别方法、装置、电子设备和计算机可读存储介质
CN117094966A (zh) * 2023-08-21 2023-11-21 青岛美迪康数字工程有限公司 基于图像扩增的舌图像识别方法、装置和计算机设备
CN117094966B (zh) * 2023-08-21 2024-04-05 青岛美迪康数字工程有限公司 基于图像扩增的舌图像识别方法、装置和计算机设备

Also Published As

Publication number Publication date
CN111814810A (zh) 2020-10-23

Similar Documents

Publication Publication Date Title
WO2022033150A1 (zh) 图像识别方法、装置、电子设备及存储介质
WO2021169723A1 (zh) 图像识别方法、装置、电子设备及存储介质
WO2021077984A1 (zh) 对象识别方法、装置、电子设备及可读存储介质
CN110533097B (zh) 一种图像清晰度识别方法、装置、电子设备及存储介质
CN110020592B (zh) 物体检测模型训练方法、装置、计算机设备及存储介质
JP2022548438A (ja) 欠陥検出方法及び関連装置、機器、記憶媒体、並びにコンピュータプログラム製品
CN109002766B (zh) 一种表情识别方法及装置
US8792722B2 (en) Hand gesture detection
US8750573B2 (en) Hand gesture detection
CN108288051B (zh) 行人再识别模型训练方法及装置、电子设备和存储介质
WO2019033525A1 (zh) Au特征识别方法、装置及存储介质
JP6309549B2 (ja) 変形可能な表現検出器
CN111652317B (zh) 基于贝叶斯深度学习的超参数图像分割方法
CN107871314B (zh) 一种敏感图像鉴别方法和装置
CN109325440B (zh) 人体动作识别方法及系统
CN110781836A (zh) 人体识别方法、装置、计算机设备及存储介质
CN111767783A (zh) 行为检测、模型训练方法、装置、电子设备及存储介质
CN109117857B (zh) 一种生物属性的识别方法、装置及设备
WO2021238586A1 (zh) 一种训练方法、装置、设备以及计算机可读存储介质
CN113869449A (zh) 一种模型训练、图像处理方法、装置、设备及存储介质
CN111401343B (zh) 识别图像中人的属性的方法、识别模型的训练方法和装置
CN112418327A (zh) 图像分类模型的训练方法、装置、电子设备以及存储介质
CN114612728A (zh) 模型训练方法、装置、计算机设备及存储介质
CN113255557A (zh) 一种基于深度学习的视频人群情绪分析方法及系统
CN114898266A (zh) 训练方法、图像处理方法、装置、电子设备以及存储介质

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21855206

Country of ref document: EP

Kind code of ref document: A1