CN112182268A - Image classification method and device, electronic equipment and storage medium - Google Patents

Image classification method and device, electronic equipment and storage medium

Info

Publication number
CN112182268A
CN112182268A
Authority
CN
China
Prior art keywords
image
image data
target
image classification
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011035129.8A
Other languages
Chinese (zh)
Other versions
CN112182268B (en)
Inventor
申世伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202011035129.8A
Publication of CN112182268A
Application granted
Publication of CN112182268B
Legal status: Active
Anticipated expiration

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 — Information retrieval of still image data
    • G06F16/55 — Clustering; Classification
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 — Classification techniques
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 — Classification techniques based on distances to training or reference patterns
    • G06F18/24133 — Distances to prototypes
    • G06F18/24137 — Distances to cluster centroids
    • G06F18/2414 — Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06F18/25 — Fusion techniques
    • G06F18/254 — Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256 — Fusion of results relating to different input data, e.g. multimodal recognition
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G06N3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to an image classification method, apparatus, electronic device and storage medium. The method includes: acquiring at least two target image data sets corresponding to an original image data set, each target image data set comprising a first image data set and a second image data set; inputting the at least two first image data sets into an image classification model respectively for training, to obtain at least two target models; inputting the at least two second image data sets into the target models respectively, to obtain a plurality of prediction results; determining, according to each prediction result, the complexity of an image classification task based on the original image data set; and re-determining the first image data set according to the relationship between the complexity parameter and a set threshold, and determining a final image classification model according to the re-determined first image data set. The scheme of the embodiments of the disclosure avoids the inaccuracy that arises when the complexity of an image classification task is evaluated by experience, so the complexity of the image classification task can be accurately determined and the accuracy of image classification is improved.

Description

Image classification method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an image classification method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of computer technology, intelligent algorithms such as deep learning are widely used to solve everyday problems; for example, image classification, pedestrian detection, and license plate recognition are all realized through deep learning models.
In the related art, when starting to process a task (for example, an image classification task), a developer needs to judge the complexity of the task by experience, judge whether it can be solved by a deep learning algorithm, and finally determine an image classification model suitable for the task.
However, the complexity of an image classification task estimated by experience is prone to evaluation error, which not only leads to inaccurate processing results but also lowers the accuracy of image classification.
Disclosure of Invention
The present disclosure provides an image classification method, an image classification device, an electronic device, and a storage medium, so as to solve the problem in the related art that the complexity of an image classification task cannot be accurately determined, resulting in a low accuracy of image classification. The technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided an image classification method, including:
acquiring at least two target image datasets corresponding to an original image dataset, each of the target image datasets comprising a first image dataset and a second image dataset;
inputting at least two first image data sets into an image classification model respectively for training to obtain at least two target models;
inputting at least two second image data sets into the target model respectively to obtain a plurality of prediction results;
determining the complexity of an image classification task based on the original image data set according to each prediction result;
and re-determining the first image data set according to the relation between the complexity parameter and the set threshold value, and determining a final image classification model according to the re-determined first image data set.
Optionally, the determining, according to each prediction result, the complexity of the image classification task based on the original image dataset includes:
and determining a complexity parameter corresponding to the image classification task according to each prediction result, and determining the complexity of the image classification task according to the complexity parameter.
Optionally, the step of determining the complexity parameter corresponding to the image classification task according to each prediction result includes:
respectively determining a target prediction result and a reference prediction result in the plurality of prediction results, and determining the deviation between the target prediction result and the reference prediction result;
and normalizing each deviation, and taking the average value of each normalization result as a complexity parameter corresponding to the image classification task.
Optionally, the complexity parameter corresponding to the image classification task is determined based on the following formula:
K = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j \neq i} \frac{\left| A(i,j) - A(i,i) \right|}{A(i,i)}
wherein K is the complexity parameter, N is the number of test sets, and N ≥ 2; A(i, j) represents the prediction result obtained by inputting test set j into target model i; A(i, i) represents the prediction result obtained by inputting test set i into target model i.
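Under this reading, the complexity parameter can be computed directly from the N×N matrix of prediction results. The sketch below assumes the deviation is the reference-normalized absolute difference, averaged over all i ≠ j pairs; the function name and list-of-lists matrix encoding are illustrative, not taken from the patent.

```python
def complexity_parameter(A):
    """Complexity parameter K from an N x N matrix of prediction results.

    A[i][j] is the prediction result (e.g. accuracy) obtained by feeding
    test set j to target model i; A[i][i] is the reference result.
    Assumes N >= 2, non-zero reference results, and that the deviation is
    the reference-normalized absolute difference averaged over i != j.
    """
    n = len(A)
    if n < 2:
        raise ValueError("at least two test sets are required (N >= 2)")
    deviations = [
        abs(A[i][j] - A[i][i]) / A[i][i]
        for i in range(n)
        for j in range(n)
        if i != j
    ]
    return sum(deviations) / len(deviations)

# Example: three models whose cross-set results barely differ,
# suggesting a task of low complexity (hypothetical numbers)
A = [[0.90, 0.88, 0.89],
     [0.87, 0.91, 0.90],
     [0.89, 0.88, 0.92]]
K = complexity_parameter(A)  # small K -> low task complexity
```

A small K means the models agree with their own references across annotators' test sets, which is exactly the "consistency" intuition used elsewhere in the text.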
Optionally, the step of re-determining the first image data set according to the relationship between the complexity parameter and the set threshold, and determining the final image classification model according to the re-determined first image data set includes:
when the complexity parameter is smaller than or equal to a set threshold value, merging at least two first image data sets to determine a first image data set again;
and inputting the redetermined first image data set into an image classification model for training to obtain a final image classification model.
Optionally, the step of re-determining the first image data set according to the relationship between the complexity parameter and the set threshold, and determining the final image classification model according to the re-determined first image data set further includes:
when the complexity parameter is larger than a set threshold value, adding supplementary data into at least two first image data sets, and using the supplementary data and the original data in each first image data set as a multi-modal training sample;
inputting the multi-modal training samples into a first deep learning model for training;
wherein the modality of the supplemental data and the raw data includes at least one of: image data, text data, voice data, and user behavior data.
Optionally, after the step of determining a final image classification model, the method further includes:
and classifying any second image data set image through the final image classification model.
According to a second aspect of the embodiments of the present disclosure, there is provided an image classification apparatus including:
an acquisition module configured to acquire at least two target image datasets corresponding to an original image dataset, each of the target image datasets comprising a first image dataset and a second image dataset;
the first training module is configured to input at least two first image data sets into a deep learning model for training respectively to obtain at least two target models;
a test module configured to input at least two of the second image data sets into the target model, respectively, resulting in a plurality of prediction results;
a determination module configured to determine, according to the prediction results, the complexity of an image classification task based on the original image dataset;
and the second training module is configured to re-determine the first image data set according to the relation between the complexity parameter and the set threshold value, and determine a final image classification model according to the re-determined first image data set.
Optionally, the determining module comprises a determining submodule configured to determine a complexity parameter corresponding to the image classification task according to each prediction result, and determine the complexity of the image classification task according to the complexity parameter.
Optionally, the determining sub-module is specifically configured to
Respectively determining a target prediction result and a reference prediction result in the plurality of prediction results, and determining the deviation between the target prediction result and the reference prediction result;
and normalizing each deviation, and taking the average value of each normalization result as a complexity parameter corresponding to the image classification task.
Optionally, the determining sub-module determines the complexity parameter corresponding to the image classification task based on the following formula:
K = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j \neq i} \frac{\left| A(i,j) - A(i,i) \right|}{A(i,i)}
wherein K is the complexity parameter, N is the number of test sets, and N ≥ 2; A(i, j) represents the prediction result obtained by inputting test set j into target model i; A(i, i) represents the prediction result obtained by inputting test set i into target model i.
Optionally, the second training module includes: a merged training submodule configured to
When the complexity parameter is smaller than or equal to a set threshold value, merging at least two first image data sets to determine a first image data set again;
and inputting the redetermined first image data set into an image classification model for training to obtain a final image classification model.
Optionally, the second training module further includes: a supplementary training submodule configured to
When the complexity parameter is larger than a set threshold value, adding supplementary data into at least two first image data sets, and using the supplementary data and the original data in each first image data set as a multi-modal training sample;
inputting the multi-modal training samples into a first deep learning model for training;
wherein the modality of the supplemental data and the raw data includes at least one of: image data, text data, voice data, and user behavior data.
Optionally, the image classification apparatus further includes: a classification module configured to classify the images in any second image data set through the final image classification model.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including: a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the image classification method according to any embodiment of the present disclosure.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the image classification method according to any one of the embodiments of the present disclosure.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer program product for use in conjunction with an electronic device, the computer program product comprising a computer-readable storage medium and a computer program mechanism embedded therein, the program being loaded into and executed by a computer to implement the image classification method according to any of the embodiments of the present disclosure.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects: acquiring at least two target image data sets corresponding to an original image data set, each target image data set comprising a first image data set and a second image data set; inputting the at least two first image data sets into an image classification model respectively for training to obtain at least two target models; inputting the at least two second image data sets into the target models respectively to obtain a plurality of prediction results; determining, according to each prediction result, the complexity of an image classification task based on the original image data set; and re-determining the first image data set according to the relationship between the complexity parameter and a set threshold, and determining a final image classification model according to the re-determined first image data set. This avoids the evaluation errors that arise when the complexity of an image classification task is judged by experience, accurately determines the complexity of the image classification task, and improves the accuracy of image classification.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating a method of image classification according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method of image classification according to an exemplary embodiment.
FIG. 3 is a flow diagram illustrating a method of image classification according to an exemplary embodiment.
FIG. 4 is a flow diagram illustrating a method of image classification according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating an image classification apparatus according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 is a flowchart illustrating an image classification method according to an exemplary embodiment. As shown in Fig. 1, the image classification method may be performed by an image classification apparatus, which may be implemented by software and/or hardware and used in an electronic device such as a computer, a server, or a smart phone. The method includes the following steps.
In step S11, at least two target image data sets corresponding to the original image data sets are acquired, each target image data set comprising a first image data set and a second image data set.
The original image data set may be any image data set, such as a human image data set, an animal image data set, or a self-recognition image data set, which is not limited in this embodiment. The target image data set may be an image data set determined by annotating the original image data set. In general, the original image data set may be annotated manually or by machine; at present, because the content of the images in the original image data set is highly variable, manual annotation is used. For example, for the image classification task, the categories contained in the image data may be labeled manually, e.g., the cat, dog, flower, or person in the image data is labeled separately, which is not limited in this embodiment.
In an optional implementation manner of this embodiment, while the original image data set is obtained, a target image data set in which different annotators annotate the original image set may be obtained; for example, a target image data set a in which the annotator a annotates an original image data set, a target image data set B in which the annotator B annotates an original image data set, and a target image data set C in which the annotator C annotates an original image data set may be obtained, respectively.
In this embodiment, each target image dataset may include a first image dataset and a second image dataset; wherein the first image dataset may be a training image dataset (training set) and the second image dataset may be a test image dataset (test set).
It should be noted that the training set is the sample set used for training, mainly for fitting the parameters of the deep learning model. After training is complete, in order to objectively evaluate the performance of the model on data it has not seen (data that influenced neither parameter nor hyper-parameter selection), the test set must be independent of and non-overlapping with the training set; the test set cannot inform changes to the parameters or hyper-parameters of the deep learning model and serves only as an index for evaluating network performance.
In a specific example of this embodiment, ten staff members may annotate the original data set matched with the image classification task to obtain ten target image data sets, and each target image data set is split at a ratio of 9:1 into a first image data set and a second image data set, i.e., a training set and a test set. For example, if each target image data set includes 1000 images, 900 images may be determined as the training set and 100 images as the test set.
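The 9:1 split described above can be sketched as follows. `split_dataset` and its parameters are illustrative names, not from the patent, and the shuffle seed exists only for reproducibility.

```python
import random

def split_dataset(dataset, train_ratio=0.9, seed=0):
    """Split one annotated target image dataset into a first (training)
    and a second (test) image data set at the given ratio.

    `dataset` may be any list of samples. Shuffling before slicing keeps
    the two subsets independent and non-overlapping.
    """
    items = list(dataset)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]

samples = list(range(1000))          # stand-in for 1000 annotated images
train_set, test_set = split_dataset(samples)
# yields a 900-image training set and a 100-image test set, disjoint
```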
In step S12, at least two first image data sets are input into the deep learning model and trained, respectively, to obtain at least two target models.
The deep learning model may be a convolutional neural network or another deep learning network, which is not limited in this embodiment. Illustratively, the deep learning model may be a neural network matched with the classification task, such as an Inception-v3 network or an Xception network, and the present embodiment is not limited thereto. It should be noted that the deep learning model involved in this embodiment may also be a neural network matched with other tasks, for example, a neural network matched with a target detection task or with a semantic segmentation task, and this embodiment is not limited by this.
Specifically, after at least two target image data sets corresponding to the original image data sets are obtained, each first image data set may be further input into the deep learning model for training, so as to obtain a target model matched with each first image data set.
Illustratively, the original image data set is annotated by three workers respectively to obtain three target image data sets, and each target image data set is divided at a ratio of 9:1 into a first image data set and a second image data set. Further, the three first image data sets can be respectively input into the deep learning model for training to obtain three target models. It should be noted that, in this embodiment, the first image data set A may be input into the deep learning model for training to obtain the target model A; the first image data set B and the first image data set C may then be input into the deep learning model in sequence, obtaining the target model B and the target model C in turn. In this embodiment, the first image data sets A, B, and C may also be trained simultaneously, for example, in different servers, or by starting three training tasks simultaneously in the same server, which is not limited in this embodiment.
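The per-annotator training step can be sketched as below. A real implementation would fit a deep network (e.g. Inception-v3) to each first image data set; here a trivial majority-label classifier stands in so the control flow stays visible, and all names and toy data are illustrative assumptions.

```python
from collections import Counter

def train_model(training_set):
    """Stand-in for training the deep learning model on one first image
    data set. 'Training' here just records the majority label; in the
    patent this would be fitting e.g. an Inception-v3 or Xception
    network. Each sample is a (features, label) pair."""
    majority = Counter(label for _, label in training_set).most_common(1)[0][0]
    return lambda features: majority  # the trained "target model"

# Three first image data sets, e.g. from three annotators (toy data)
first_sets = {
    "A": [((0,), "cat"), ((1,), "cat"), ((2,), "dog")],
    "B": [((0,), "cat"), ((1,), "dog"), ((2,), "dog")],
    "C": [((0,), "cat"), ((1,), "cat"), ((2,), "cat")],
}
# Train each set independently (sequentially here; as the text notes,
# the sets may equally be trained in parallel on separate servers)
target_models = {name: train_model(data) for name, data in first_sets.items()}
```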
In step S13, at least two second image data sets are input into the target model, respectively, resulting in a plurality of prediction results.
In an optional implementation manner of this embodiment, after the at least two first image data sets are respectively input to the deep learning model for training to obtain the at least two target models, the model may be further tested by using at least two second image data sets, that is, the at least two second image data sets are respectively input to the target models obtained by training to obtain a plurality of prediction results.
In a specific example of the embodiment, each of the at least two second image data sets may be input into each of the at least two target models to obtain a plurality of prediction results. For example, the original image data set is annotated by three staff members, resulting in three target image data sets A, B, and C, wherein the target image data set A comprises a first image data set A and a second image data set A; the target image data set B comprises a first image data set B and a second image data set B; and the target image data set C comprises a first image data set C and a second image data set C. Each of the first image data sets A, B, and C may include 800 images, and each of the second image data sets A, B, and C may include 200 images. The three first image data sets A, B, and C are input into the deep learning model respectively to obtain three target models A, B, and C. Further, the second image data set A may be input into the target models A, B, and C respectively, obtaining 600 prediction results; the second image data set B is input into the target models A, B, and C respectively, obtaining 600 prediction results; and the second image data set C is input into the target models A, B, and C respectively, obtaining 600 prediction results.
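This cross-testing step — every second image data set fed to every target model — naturally produces the A(i, j) matrix used later for the complexity parameter. A minimal sketch, with toy models and data that are purely illustrative:

```python
def cross_evaluate(models, test_sets):
    """Build the matrix A where A[i][j] is the accuracy of target model i
    on second image data set (test set) j. Models are callables mapping
    features to a predicted label; each test set is a list of
    (features, label) pairs. Names are illustrative, not from the patent.
    """
    def accuracy(model, test_set):
        correct = sum(1 for x, y in test_set if model(x) == y)
        return correct / len(test_set)
    return [[accuracy(m, ts) for ts in test_sets] for m in models]

# Two toy "target models" and two toy test sets (hypothetical data)
model_a = lambda x: "cat"
model_b = lambda x: "dog"
test_a = [((0,), "cat"), ((1,), "cat")]
test_b = [((0,), "cat"), ((1,), "dog")]
A = cross_evaluate([model_a, model_b], [test_a, test_b])
# diagonal entries A[i][i] act as the reference prediction results
```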
In step S14, the complexity of the image classification task based on the original image dataset is determined from the respective prediction results.
The complexity of the image classification task can be the difficulty level of the image classification task; in an alternative implementation manner of the present embodiment, the complexity of the image classification task may be determined according to each prediction result determined in step S13.
Optionally, determining the complexity of an image classification task based on the original image data set according to each prediction result may include: and determining a complexity parameter corresponding to the image classification task according to each prediction result, and determining the complexity of the image classification task according to the complexity parameter.
In an optional implementation manner of this embodiment, after each of the at least two second image data sets is input into the at least two target models respectively to obtain a plurality of prediction results, a complexity parameter of the image classification task may be determined according to the plurality of prediction results, and a complexity of the image classification task may be determined according to the complexity parameter.
In an optional implementation manner of this embodiment, the determining the complexity of the image classification task according to the complexity parameter may include: when the complexity parameter is less than or equal to the set threshold, the complexity of the image classification task can be determined to be low; when the complexity parameter is greater than the set threshold, it can be determined that the complexity of the image classification task is high. The threshold may be set to a value such as 0.3, 0.4, or 0.6, which is not limited in this embodiment.
For example, if the complexity parameter determined according to the plurality of prediction results is 0.2 and the threshold is set to be 0.3, it may be determined that the complexity of the image classification task is low; if the complexity parameter determined from the plurality of prediction results is 0.9 and the threshold is set to 0.3, it can be determined that the complexity of the image classification task is high.
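The comparison in the example above amounts to a single thresholding step. A sketch, where the 0.3 default is just one of the example values given in the text (0.3, 0.4, or 0.6) and the function name is illustrative:

```python
def task_complexity(k, threshold=0.3):
    """Map the complexity parameter K to a qualitative complexity level:
    low when K is at or below the set threshold, high otherwise."""
    return "low" if k <= threshold else "high"

# K = 0.2 with threshold 0.3 -> "low"; K = 0.9 -> "high"
```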
In another specific example of this embodiment, the complexity of the image classification task may also be determined according to the consistency degree of each prediction result; for example, if the similarity of the prediction results of the target models on the image data is greater than 90% for the same image data, it may be determined that the complexity of the image classification task is low. It should be noted that, in this embodiment, the complexity of the image classification task may also be determined by other methods, which is not limited in this embodiment.
In step S15, the first image data set is re-determined according to the relationship between the complexity parameter and the set threshold, and a final image classification model is determined according to the re-determined first image data set.
The threshold may be set to a value such as 0.3, 0.4, or 0.6, which is not limited in this embodiment.
In an optional implementation manner of this embodiment, after determining the complexity of the image classification task based on the original image data set according to each prediction result, the first image data set may be further determined again according to a relationship between the complexity parameter and the set threshold, and the final image classification model may be determined according to the determined first image data set.
The relationship between the complexity parameter and the set threshold may include a relationship greater than, a relationship less than, or a relationship equal to, which is not limited in this embodiment.
In an optional implementation manner of this embodiment, after determining the complexity parameter of the image classification task based on the original image data set, a relationship between the complexity parameter and the set threshold may be further determined, and the first image data set may be further re-determined according to the relationship, for example, a plurality of first image data sets are merged to obtain a new first image data set, or new image data is added to the first image data set, and the like, which is not limited in this embodiment.
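The two branches of this re-determination — merging the first image data sets when the task is simple enough, supplementing them with (possibly multi-modal) data when it is not — can be sketched as follows; function and parameter names are illustrative assumptions, not the patent's own API:

```python
def redetermine_first_set(first_sets, k, threshold=0.3, supplement=None):
    """Re-determine the first image data set from the complexity
    parameter K. When K <= threshold, merge the first image data sets
    into one larger training set; otherwise append supplementary data
    (image, text, voice, or user-behavior samples) to each set."""
    if k <= threshold:
        merged = [sample for s in first_sets for sample in s]
        return [merged]                          # one merged training set
    supplement = supplement or []
    return [s + supplement for s in first_sets]  # supplemented sets

sets = [[1, 2], [3, 4]]
# Low complexity: merge into a single larger first image data set
merged = redetermine_first_set(sets, k=0.2)
# High complexity: add supplementary data to every first image data set
supplemented = redetermine_first_set(sets, k=0.9, supplement=["text"])
```

The re-determined set(s) would then be fed back into the image classification model (or the first deep learning model, in the multi-modal branch) for the final round of training.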
In the scheme of this embodiment, at least two target image data sets corresponding to an original image data set are obtained, each target image data set comprising a first image data set and a second image data set; the at least two first image data sets are respectively input into an image classification model for training to obtain at least two target models; the at least two second image data sets are respectively input into the target models to obtain a plurality of prediction results; the complexity of the image classification task based on the original image data set is determined according to each prediction result; and the first image data set is re-determined according to the relationship between the complexity parameter and the set threshold, and the final image classification model is determined according to the re-determined first image data set. This solves the problems of low accuracy in determining the complexity of an image classification task, and the resulting low accuracy of image classification, that arise when the complexity is evaluated by experience, so that the complexity of the image classification task is determined accurately and the accuracy of image classification is improved.
Fig. 2 is a flowchart illustrating an image classification method according to an exemplary embodiment, which is a further refinement of the above technical solutions, and the technical solutions in the present embodiment may be combined with various alternatives in one or more of the above embodiments. As shown in fig. 2, the image classification method includes the following steps.
In step S21, at least two target image data sets corresponding to the original image data sets are acquired, each target image data set comprising a first image data set and a second image data set.
In step S22, the at least two first image data sets are input into the image classification model for training, respectively, to obtain at least two target models.
In step S23, at least two second image data sets are input into the target model, respectively, resulting in a plurality of prediction results.
In step S24, a complexity parameter corresponding to the image classification task is determined from each prediction result, and the complexity of the image classification task is determined from the complexity parameter.
In an optional implementation manner of this embodiment, determining, according to each prediction result, a complexity parameter corresponding to the image classification task may include: respectively determining a target prediction result and a reference prediction result in the plurality of prediction results, and determining the deviation between the target prediction result and the reference prediction result; and normalizing each deviation, and taking the average value of each normalization result as a complexity parameter corresponding to the image classification task.
The target prediction result may be a prediction result obtained by inputting the second image data set into a target model trained with the first image data set matching the second image data set. It should be noted that the second image data set and the first image data set matching the second image data set belong to the same target image data set. The reference prediction result may be a prediction result obtained by inputting the second image data set to a target model obtained by training the other first image data sets. It should be noted that the second image data set belongs to a different target image data set from the other first image data sets.
Illustratively, if two workers respectively label the original image data sets to obtain target image data sets a and B, the target image data set a comprises a first image data set a and a second image data set a; the target image dataset B comprises a first image dataset B and a second image dataset B; the first image data sets A and B are trained respectively to obtain target models A and B. The target prediction result is a prediction result obtained by inputting the second image data set a to the target model a; the reference prediction result is a prediction result obtained by inputting the second image data set a to the target model B. It will be appreciated that in this example, the target prediction result may also be a prediction result from inputting the second image data set B to the target model B; the reference prediction result may also be a prediction result obtained by inputting the second image data set B to the target model a.
In an optional implementation manner of this embodiment, after the target prediction result and the reference prediction result in the plurality of prediction results are determined respectively, the deviation between the target prediction result and the reference prediction result may be further determined, for example, the difference between the target prediction result and the reference prediction result may be calculated; further, each deviation obtained through calculation is normalized, the mean value of each normalization result is obtained, and the mean value is used as a complexity parameter corresponding to the image classification task.
In a specific example of the present embodiment, the complexity parameter corresponding to the image classification task may be determined based on the following formula:
K = (1 / (N(N − 1))) · Σ_{i=1..N} Σ_{j≠i} (1 − A(i, j) / A(i, i))

wherein K is the complexity parameter, N is the number of second image data sets, and N ≥ 2; A(i, j) represents a prediction result obtained by inputting the second image data set j to the target model i, namely a reference prediction result; A(i, i) represents a prediction result obtained by inputting the second image data set i to the target model i, namely a target prediction result.
For example, if N is 2, A(i, j) represents a prediction result obtained by inputting the second image data set j to the target model i, that is, the test result obtained by inputting the second image data set 1 to the target model 2, and the test result obtained by inputting the second image data set 2 to the target model 1; A(i, i) represents a prediction result obtained by inputting the second image data set i to the target model i, that is, the test result obtained by inputting the second image data set 1 to the target model 1, and the test result obtained by inputting the second image data set 2 to the target model 2. Further, the specific numerical values are substituted into the formula, and the complexity parameter corresponding to the image classification task can be determined.
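Under the formula above, the computation can be sketched in a few lines of Python; the function name and the toy accuracy values below are illustrative assumptions, with A given as an N × N nested list:

```python
def complexity_parameter(A):
    """K = mean over i != j of (1 - A[i][j] / A[i][i]).

    A[i][j] is the prediction result (e.g. accuracy) of target model i
    on the second image data set j; the diagonal entry A[i][i] is each
    model's result on its own matching second image data set.
    """
    n = len(A)
    deviations = [
        1.0 - A[i][j] / A[i][i]
        for i in range(n)
        for j in range(n)
        if i != j
    ]
    return sum(deviations) / len(deviations)

# N = 2: each model does well on its own test set but degrades on the
# other annotator's set, yielding a nonzero complexity parameter K.
A = [[0.90, 0.72],
     [0.63, 0.90]]
K = complexity_parameter(A)
print(round(K, 2))  # 0.25
```

K could then be compared against the set threshold (e.g. 0.3 or 0.4) to decide whether the task is complex.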
In another specific example of this embodiment, the target prediction result and the reference prediction result may also be represented by accuracy, for example, A(i, j) represents the prediction accuracy obtained by inputting the second image data set j to the target model i; A(i, i) represents the prediction accuracy obtained by inputting the second image data set i to the target model i. The advantage of this arrangement is that the amount of calculation in the process of determining the complexity parameter corresponding to the image classification task can be reduced, and the complexity parameter can be determined quickly, thereby providing a basis for quickly determining the complexity of the task.
Optionally, the step of determining the complexity of the image classification task according to the complexity parameter may include: when the complexity parameter is smaller than a set threshold value, determining that the complexity of the image classification task is low; and when the complexity parameter is larger than the set threshold value, determining that the complexity of the image classification task is high.
In the scheme of the embodiment, the deviation between the target prediction result and the reference prediction result is determined by respectively determining the target prediction result and the reference prediction result in the plurality of prediction results; and normalizing each deviation, and taking the mean value of each normalization result as a complexity parameter corresponding to the image classification task, so that the complexity parameter can be quickly determined, and a basis is provided for determining the complexity of the image classification task.
Fig. 3 is a flowchart illustrating an image classification method according to an exemplary embodiment, which is a further refinement of the above technical solutions, and the technical solutions in the present embodiment may be combined with various alternatives in one or more of the above embodiments. As shown in fig. 3, the image classification method includes the following steps.
In step S31, at least two target image data sets corresponding to the original image data sets are acquired, each target image data set comprising a first image data set and a second image data set.
In step S32, the at least two first image data sets are input into the image classification model for training, respectively, to obtain at least two target models.
In step S33, at least two second image data sets are input into the target model, respectively, resulting in a plurality of prediction results.
In step S34, the complexity of the image classification task based on the original image dataset is determined from the respective prediction results.
In step S35, when the complexity parameter is less than or equal to the set threshold, merging at least two first image data sets to redetermine the first image data sets; and inputting the redetermined first image data set into an image classification model for training to obtain a final image classification model.
The threshold may be set to a value such as 0.3, 0.4, or 0.6, which is not limited in this embodiment.
In an optional implementation manner of this embodiment, if the complexity parameter determined by each of the above embodiments is smaller than the set threshold, it may be determined that the complexity of the image classification task is low; at this time, at least two first image data sets may be merged (summarized); and inputting the merged first image data set into a deep learning model for training so as to determine a final image classification model corresponding to the image classification task.
Illustratively, if ten annotators annotate an original image data set, ten target image data sets are obtained, wherein each target image data set comprises a first image data set and a second image data set; after determining that the complexity of the image classification task matched with the target data set is low according to the steps, ten first image data sets included in the ten target image data sets can be merged, and the merged first image data sets are input into a deep learning model for training to obtain a final image classification model corresponding to the image classification task.
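The merging described above amounts to pooling the annotators' first image data sets into one re-determined training set; a minimal sketch, where the (file name, label) tuples are hypothetical placeholders:

```python
def merge_first_image_datasets(first_sets):
    """Concatenate every annotator's first image data set into a single
    re-determined training set (each sample here is an (image, label) pair)."""
    merged = []
    for dataset in first_sets:
        merged.extend(dataset)
    return merged

# Two annotators' training splits pooled into one set of 3 samples.
set_a = [("img1.jpg", "cat"), ("img2.jpg", "dog")]
set_b = [("img3.jpg", "dog")]
merged = merge_first_image_datasets([set_a, set_b])
print(len(merged))  # 3
```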
According to the scheme of the embodiment, after the complexity of the image classification task is determined to be low, at least two first image data sets can be merged, and the merged first image data sets are input into the deep learning model for training so as to determine the final image classification model corresponding to the image classification task, so that the diversity of the first image data set samples can be enhanced, the accuracy of the final image classification model can be improved, and the labeling quantity of the samples and the complexity of the model can be reduced while the task effect is ensured.
Fig. 4 is a flowchart illustrating an image classification method according to an exemplary embodiment, which is a further refinement of the above technical solutions, and the technical solutions in the present embodiment may be combined with various alternatives in one or more of the above embodiments. As shown in fig. 4, the image classification method includes the following steps.
In step S41, at least two target image data sets corresponding to the original image data sets are acquired, each target image data set comprising a first image data set and a second image data set.
In step S42, the at least two first image data sets are input into the image classification model for training, respectively, to obtain at least two target models.
In step S43, at least two second image data sets are input into the target model, respectively, resulting in a plurality of prediction results.
In step S44, the complexity of the image classification task based on the original image dataset is determined from the respective prediction results.
In step S45, when the complexity parameter is greater than the set threshold, adding supplementary data to at least two first image data sets, and using the supplementary data and the original data in each first image data set as a multi-modal training sample; and inputting the multi-modal training samples into a first deep learning model for training.
The threshold may be set to a value such as 0.3, 0.4, or 0.6, which is not limited in this embodiment.
In an optional implementation manner of this embodiment, if the complexity parameter determined in each of the above embodiments is greater than a set threshold, it may be determined that the complexity of the image classification task is high; when the complexity parameter is larger than the set threshold, supplementary data can be added to at least two first image data sets, and the supplementary data and the original data in the first image data sets are used as multi-modal training samples together; inputting the multi-modal training samples into a first deep learning model for training to obtain a final image classification model; wherein the modality of the supplemental data includes at least one of: image data, text data, voice data, and user behavior data.
Illustratively, if a target data set is labeled by ten labeling personnel, ten target image data sets are obtained, wherein each target image data set comprises a first image data set and a second image data set; after determining that the complexity of the image classification task matched with the target data set is high according to the method of each embodiment, adding supplementary data, such as adding voice data, into ten target image data sets respectively; training the ten first image data sets continuously through a first deep learning model, and further judging the complexity of an image classification task after data supplementation; if the complexity of the image classification task is determined to be higher, continuing to add supplementary data, such as user behavior data, to each first image data set; and merging the first image data sets according to the embodiment until the complexity of the image classification task is determined to be low, so as to obtain a final image classification model.
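The iterative procedure in the example above can be sketched as a loop; everything here is an illustrative assumption — compute_k stands in for the full train/predict/score cycle of the earlier steps, and supplementary modalities are attached as extra tuple fields only to show the bookkeeping:

```python
def refine_datasets(first_sets, extra_modalities, compute_k, threshold=0.4):
    """While the task stays complex (K > threshold), attach the next
    supplementary modality to every first image data set and re-evaluate;
    once K falls to or below the threshold, merge the sets."""
    modalities = list(extra_modalities)
    while compute_k(first_sets) > threshold and modalities:
        modality = modalities.pop(0)  # e.g. "voice", then "behavior"
        first_sets = [
            [sample + (modality,) for sample in dataset]
            for dataset in first_sets
        ]
    # Complexity is now low (or no modalities remain): merge the sets.
    return [sample for dataset in first_sets for sample in dataset]

# Toy stand-in: pretend each added modality lowers K by 0.3.
state = {"k": 0.9}
def fake_k(_sets):
    k = state["k"]
    state["k"] -= 0.3
    return k

result = refine_datasets([[("img1.jpg", "cat")]], ["voice", "behavior"], fake_k)
print(result)  # [('img1.jpg', 'cat', 'voice', 'behavior')]
```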
It should be noted that the first deep learning model in this embodiment may be a multi-modal deep learning model, that is, first image data sets of multiple modalities may be trained simultaneously, for example, the image data and the voice data may be trained simultaneously, which is not limited in this embodiment.
In the scheme of the embodiment, after the complexity of the image classification task is determined to be high, supplementary data are added into at least two first image data sets, and the supplementary data and original data in the first image data sets are used as multi-modal training samples together; the multi-modal training samples are input into the first deep learning model for training, so that the diversity of the first image data set samples can be enhanced, and the accuracy of the final image classification model can be improved.
In order to make those skilled in the art better understand the iterative image classification method of the present embodiment, a specific example is used below for description, and the specific process includes:
1. A collection of data sets (collected task pictures) is prepared according to the task or problem definition, and the data sets need to be labeled by a plurality of labeling personnel, with the labeled data amount of each person being consistent. Inconsistencies exist in the labeling of certain data due to each person's different degree of learning of the labeling rules and individual cognitive bias. For example, when the annotation task is the objective question of whether a 'dog' exists in the picture, the cognition is basically consistent; but for the subjective task of whether a woman in the picture is 'beautiful or not', the cognition generally differs greatly. Therefore, the difficulty degree of the problem can be judged according to the differences that arise when multiple people label the data.
2. The data labeled by each person is split into a training set and a test set at a ratio of 9:1, and one deep learning classification model with the same network architecture and training configuration (including the optimizer and iteration steps) is trained on each person's training set. Taking an image classification task as an example, the Inception-v3 network may be selected and trained until the value of the loss function (a common loss function of a deep learning classification network, such as cross-entropy loss) hardly decreases, which indicates that the network has converged and the classification model is fully trained. Assuming there are N persons, N deep learning models are generated and N test sets are used to test the N trained models, where N may be any integer greater than or equal to 2, which is not limited in this embodiment.
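The 9:1 per-annotator split in step 2 can be sketched with the standard library (a hedged illustration; the (file name, label) layout and the fixed seed are assumptions):

```python
import random

def split_annotator_sets(annotated_sets, train_ratio=0.9, seed=0):
    """Split each annotator's labeled data into a training set and a test
    set at the given ratio (9:1 by default), returning (train, test) pairs."""
    rng = random.Random(seed)
    pairs = []
    for dataset in annotated_sets:
        shuffled = dataset[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_ratio)
        pairs.append((shuffled[:cut], shuffled[cut:]))
    return pairs

# One annotator with 10 labeled images -> 9 for training, 1 for testing.
data = [[("img%d.jpg" % i, i % 2) for i in range(10)]]
(train, test), = split_annotator_sets(data)
print(len(train), len(test))  # 9 1
```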
3. The N deep learning models are tested for accuracy on the N test sets, so there are N × N accuracy values, where A(i, j) represents the accuracy of model i on test set j.
4. K = mean(1 − A(i, j)/A(i, i)) over i ≠ j is used as an index for measuring the difficulty level of the task. When the problem is more complex, the diversity of the multi-person labels is larger, resulting in A(i, j) << A(i, i) and therefore a larger K.
When K is larger than the set threshold, the task is judged to be difficult; it is then suggested to add supplementary information such as text, voice and user behavior (if such information is lacking, additional labeling may be considered) for multi-modal model training, so as to improve the performance of the algorithm on the difficult task.
When K is smaller than the set threshold, the task is judged to be simpler, and it can be completed by directly using the current image-based deep learning classification model. The data labeled by multiple persons are merged, and an Inception-v3 network is used for classification model training. At this time, because the task is simple, the sample labeling amount and the model complexity are reduced while the task effect is ensured; it can be understood that the model complexity of the multi-modal algorithm is higher than that of a deep learning classification model based on pure images.
According to the scheme of the embodiment, the difficulty degree of the task is judged more objectively and accurately in a data-driven mode, meanwhile, a technical solution is provided for the difficulty degree of the task, and the sample marking amount and the model complexity of the model can be reduced as far as possible while the task effect is guaranteed.
Fig. 5 is a block diagram illustrating an image classification device according to an exemplary embodiment. Referring to fig. 5, the apparatus includes an acquisition module 51, a first training module 52, a testing module 53, a determination module 54, and a second training module 55.
Wherein the obtaining module 51 is configured to obtain at least two target image datasets corresponding to the original image datasets, each target image dataset comprising a first image dataset and a second image dataset;
a first training module 52 configured to input the at least two first image data sets into the deep learning model for training, respectively, to obtain at least two target models;
a testing module 53 configured to input the at least two second image data sets into the target model respectively, resulting in a plurality of prediction results;
a determination module 54 configured to determine a complexity of a model training task based on the raw image dataset according to the respective prediction results;
and a second training module 55 configured to re-determine the first image data set according to the relationship between the complexity parameter and the set threshold, and determine a final image classification model according to the re-determined first image data set.
Optionally, the determination module 54 comprises a determination submodule configured to determine a complexity parameter corresponding to the image classification task according to each prediction result, and determine the complexity of the image classification task according to the complexity parameter.
Optionally, the determining sub-module is specifically configured to determine a target prediction result and a reference prediction result in the plurality of prediction results, and determine a deviation between the target prediction result and the reference prediction result; and normalizing each deviation, and taking the average value of each normalization result as a complexity parameter corresponding to the image classification task.
Optionally, the determining sub-module determines a complexity parameter corresponding to the image classification task based on the following formula:
K = (1 / (N(N − 1))) · Σ_{i=1..N} Σ_{j≠i} (1 − A(i, j) / A(i, i))

wherein K is the complexity parameter, N is the number of test sets, and N ≥ 2; A(i, j) represents the prediction result obtained by inputting the test set j to the target model i; A(i, i) represents the prediction result obtained by inputting the test set i into the target model i.
Optionally, the second training module 55 includes: a merging training sub-module configured to merge at least two first image data sets to re-determine the first image data sets when the complexity parameter is less than or equal to a set threshold; and inputting the redetermined first image data set into an image classification model for training to obtain a final image classification model.
Optionally, the second training module 55 further includes: the supplementary training sub-module is configured to add supplementary data to the at least two first image data sets when the complexity parameter is larger than a set threshold value, and the supplementary data and the original data in the first image data sets are used as multi-modal training samples together; inputting the multi-modal training samples into a first deep learning model for training; wherein the modality of the supplemental data and the original data includes at least one of: image data, text data, voice data, and user behavior data.
Optionally, the image classification device further includes: a classification module configured to classify any of the second image dataset images by the final image classification model.
With regard to the image classification apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 6 is a block diagram illustrating a structure of an electronic device according to an example embodiment. As shown in fig. 6, the electronic device includes a processor 61; a Memory 62 for storing executable instructions for the processor 61, the Memory 62 may include a Random Access Memory (RAM) and a Read-Only Memory (ROM); wherein the processor 61 is configured to execute instructions to implement the image classification method described above.
In an exemplary embodiment, there is also provided a storage medium, such as a memory 62, that includes instructions executable by a processor 61 of an electronic device (server or smart terminal) to perform the image classification method described above.
Alternatively, the storage medium may be a non-transitory computer readable storage medium, for example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, there is also provided a computer program product, wherein the instructions of the computer program product, when executed by a processor of an electronic device (server or intelligent terminal), implement the image classification method described above.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An image classification method, comprising:
acquiring at least two target image datasets corresponding to an original image dataset, each of the target image datasets comprising a first image dataset and a second image dataset;
inputting at least two first image data sets into an image classification model respectively for training to obtain at least two target models;
inputting at least two second image data sets into the target model respectively to obtain a plurality of prediction results;
determining the complexity of an image classification task based on the original image data set according to each prediction result; and re-determining the first image data set according to the relation between the complexity parameter and the set threshold value, and determining a final image classification model according to the re-determined first image data set.
2. The method of claim 1, wherein determining the complexity of the image classification task based on the original image dataset according to the respective prediction results comprises:
and determining a complexity parameter corresponding to the image classification task according to each prediction result, and determining the complexity of the image classification task according to the complexity parameter.
3. The method of claim 2, wherein determining the complexity parameter corresponding to the image classification task based on each of the predictions comprises:
respectively determining a target prediction result and a reference prediction result in the plurality of prediction results, and determining the deviation between the target prediction result and the reference prediction result;
and normalizing each deviation, and taking the average value of each normalization result as a complexity parameter corresponding to the image classification task.
4. The method of claim 3, wherein the complexity parameter corresponding to the image classification task is determined based on the following formula:
K = (1 / (N(N − 1))) · Σ_{i=1..N} Σ_{j≠i} (1 − A(i, j) / A(i, i))

wherein K is the complexity parameter, N is the number of test sets, and N ≥ 2; A(i, j) represents a prediction result obtained by inputting the test set j to the target model i; A(i, i) represents a prediction result obtained by inputting the test set i into the target model i.
5. The method according to claim 1, wherein the step of re-determining the first image data set based on the relationship between the complexity parameter and the set threshold and determining the final image classification model based on the re-determined first image data set comprises:
when the complexity parameter is smaller than or equal to a set threshold value, merging at least two first image data sets to determine a first image data set again;
and inputting the redetermined first image data set into an image classification model for training to obtain a final image classification model.
6. The method of claim 1, wherein the step of re-determining the first image data set based on the relationship of the complexity parameter to the set threshold and determining the final image classification model based on the re-determined first image data set further comprises:
when the complexity parameter is larger than a set threshold value, adding supplementary data into at least two first image data sets, and using the supplementary data and the original data in each first image data set as a multi-modal training sample;
inputting the multi-modal training samples into a first deep learning model for training;
wherein the modality of the supplemental data includes at least one of: image data, text data, voice data, and user behavior data.
7. The method according to any of claims 1-6, wherein after the determining a final image classification model step, the method further comprises:
and classifying any second image data set image through the final image classification model.
8. An image classification apparatus, comprising:
an acquisition module configured to acquire at least two target image datasets corresponding to an original image dataset, each of the target image datasets comprising a first image dataset and a second image dataset;
the first training module is configured to input at least two first image data sets into a deep learning model for training respectively to obtain at least two target models;
a test module configured to input at least two of the second image data sets into the target model, respectively, resulting in a plurality of prediction results;
a determination module configured to determine a complexity of a model training task based on the original image dataset according to the prediction results;
and the second training module is configured to re-determine the first image data set according to the relation between the complexity parameter and the set threshold value, and determine a final image classification model according to the re-determined first image data set.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the image classification method of any of claims 1 to 7.
10. A storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the image classification method of any one of claims 1 to 7.
CN202011035129.8A 2020-09-27 2020-09-27 Image classification method, device, electronic equipment and storage medium Active CN112182268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011035129.8A CN112182268B (en) 2020-09-27 2020-09-27 Image classification method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112182268A true CN112182268A (en) 2021-01-05
CN112182268B CN112182268B (en) 2024-04-05

Family

ID=73944578

Country Status (1)

Country Link
CN (1) CN112182268B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228124A (en) * 2016-07-17 2016-12-14 西安电子科技大学 SAR image object detection method based on convolutional neural networks
CN108647652A (en) * 2018-05-14 2018-10-12 北京工业大学 A kind of cotton development stage automatic identifying method based on image classification and target detection
CN109034258A (en) * 2018-08-03 2018-12-18 厦门大学 Weakly supervised object detection method based on certain objects pixel gradient figure
CN109902701A (en) * 2018-04-12 2019-06-18 华为技术有限公司 Image classification method and device
CN110503154A (en) * 2019-08-27 2019-11-26 携程计算机技术(上海)有限公司 Method, system, electronic equipment and the storage medium of image classification
CN110532946A (en) * 2019-08-28 2019-12-03 长安大学 A method of the green vehicle spindle-type that is open to traffic is identified based on convolutional neural networks


Also Published As

Publication number Publication date
CN112182268B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
Kim et al. On learning associations of faces and voices
Mettes et al. Spot on: Action localization from pointly-supervised proposals
CN108197664A (en) Model acquisition methods, device, electronic equipment and computer readable storage medium
CN108052862A (en) Age predictor method and device
CN114972222A (en) Cell information statistical method, device, equipment and computer readable storage medium
US20210319269A1 (en) Apparatus for determining a classifier for identifying objects in an image, an apparatus for identifying objects in an image and corresponding methods
Harris et al. DeepAction: a MATLAB toolbox for automated classification of animal behavior in video
Ramakrishna et al. An Expectation Maximization Approach to Joint Modeling of Multidimensional Ratings Derived from Multiple Annotators.
CN109657710B (en) Data screening method and device, server and storage medium
CN115439919B (en) Model updating method, device, equipment, storage medium and program product
CN109033078B (en) The recognition methods of sentence classification and device, storage medium, processor
CN111222026A (en) Training method of user category identification model and user category identification method
Sai Image classification for user feedback using Deep Learning Techniques
CN112182268A (en) Image classification method and device, electronic equipment and storage medium
CN110993091A (en) Generating vectors from data
TWI761090B (en) Dialogue data processing system and method thereof and computer readable medium
Mathe et al. Multiple instance reinforcement learning for efficient weakly-supervised detection in images
CN114529191A (en) Method and apparatus for risk identification
CN114022698A (en) Multi-tag behavior identification method and device based on binary tree structure
Tang et al. Detection of large-droplet macrovesicular steatosis in donor livers based on segment-anything model
CN113837910A (en) Test question recommendation method and device, electronic equipment and storage medium
KR102198359B1 (en) Image Auto Tagging management system and method using deep learning for UBT
CN112465152B (en) Online migration learning method suitable for emotional brain-computer interface
Zhou Measure face similarity based on deep learning
CN116912921B (en) Expression recognition method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant