CN112182268B - Image classification method, device, electronic equipment and storage medium - Google Patents

Image classification method, device, electronic equipment and storage medium

Info

Publication number
CN112182268B
CN112182268B
Authority
CN
China
Prior art keywords
image data
image
target
data set
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011035129.8A
Other languages
Chinese (zh)
Other versions
CN112182268A (en)
Inventor
申世伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202011035129.8A priority Critical patent/CN112182268B/en
Publication of CN112182268A publication Critical patent/CN112182268A/en
Application granted granted Critical
Publication of CN112182268B publication Critical patent/CN112182268B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to an image classification method and apparatus, an electronic device, and a storage medium. The image classification method comprises the following steps: acquiring at least two target image data sets corresponding to an original image data set; respectively inputting at least two first image data sets into an image classification model for training to obtain at least two target models; respectively inputting at least two second image data sets into the target models to obtain a plurality of prediction results; determining, according to each prediction result, the complexity of the image classification task based on the original image data set; and redetermining the first image data set according to the relation between the complexity parameter and a set threshold value, then determining a final image classification model according to the redetermined first image data set. The scheme of the embodiments of the disclosure addresses the inaccurate complexity evaluation and low classification accuracy that result when the complexity of an image classification task is assessed by experience: it determines the complexity of the task accurately and thereby improves the accuracy of image classification.

Description

Image classification method, device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of computer technology, and in particular, to an image classification method, an image classification device, electronic equipment and a storage medium.
Background
With the continuous development of computer technology, intelligent algorithms such as deep learning and the like are widely used for solving various daily problems, such as image classification, pedestrian detection, license plate recognition and the like, through a deep learning model.
In the related art, when processing of a task (e.g., an image classification task) begins, a developer must judge the complexity of the task by experience, decide whether it can be solved with a deep learning algorithm, and finally select an image classification model suited to the task.
However, judging the complexity of an image classification task by experience can lead to evaluation errors, which not only make the processing results inaccurate but also reduce the accuracy of image classification.
Disclosure of Invention
The disclosure provides an image classification method, an image classification device, electronic equipment and a storage medium, which are used for solving the problems that the complexity of an image classification task cannot be accurately determined and the accuracy of image classification is low in the related art. The technical scheme of the present disclosure is as follows:
According to a first aspect of an embodiment of the present disclosure, there is provided an image classification method, including:
acquiring at least two target image data sets corresponding to original image data sets, each target image data set comprising a first image data set and a second image data set;
respectively inputting at least two first image data sets into an image classification model for training to obtain at least two target models;
respectively inputting at least two second image data sets into the target model to obtain a plurality of prediction results;
determining the complexity of an image classification task based on the original image dataset according to each prediction result;
and re-determining a first image data set according to the relation between the complexity parameter and the set threshold value, and determining a final image classification model according to the re-determined first image data set.
Optionally, the determining the complexity of the image classification task based on the original image dataset according to each prediction result includes:
and determining complexity parameters corresponding to the image classification task according to the prediction results, and determining the complexity of the image classification task according to the complexity parameters.
Optionally, the step of determining a complexity parameter corresponding to the image classification task according to each prediction result includes:
respectively determining a target prediction result and a reference prediction result in a plurality of prediction results, and determining deviation between the target prediction result and the reference prediction result;
and normalizing each deviation, and taking the average value of each normalization result as a complexity parameter corresponding to the image classification task.
Optionally, the complexity parameter corresponding to the image classification task is determined based on the following formula:
wherein K is the complexity parameter, N is the number of test sets, and N ≥ 2; A(i, j) denotes the prediction result obtained by inputting test set j into target model i; A(i, i) denotes the prediction result obtained by inputting test set i into target model i.
Optionally, the step of redetermining the first image data set according to the relation between the complexity parameter and the set threshold value, and determining the final image classification model according to the redetermined first image data set, includes:
combining at least two of the first image data sets to redetermine the first image data sets when the complexity parameter is less than or equal to a set threshold;
And inputting the redetermined first image data set into an image classification model for training to obtain a final image classification model.
Optionally, the step of redetermining the first image data set according to the relation between the complexity parameter and the set threshold value, and determining the final image classification model according to the redetermined first image data set, further includes:
when the complexity parameter is larger than a set threshold value, adding supplementary data to at least two first image data sets, and taking the supplementary data and the original data in each first image data set together as a multi-modal training sample;
inputting the multi-modal training sample into a first deep learning model for training;
wherein the modalities of the supplemental data and the raw data include at least one of: image data, text data, voice data, and user behavior data.
Optionally, after the step of determining a final image classification model, the method further comprises:
and classifying any second image data set image through the final image classification model.
According to a second aspect of embodiments of the present disclosure, there is provided an image classification apparatus, comprising:
An acquisition module configured to acquire at least two target image data sets corresponding to original image data sets, each of the target image data sets including a first image data set and a second image data set;
the first training module is configured to input at least two first image data sets into a deep learning model respectively for training to obtain at least two target models;
the testing module is configured to input at least two second image data sets into the target model respectively to obtain a plurality of prediction results;
a determining module configured to determine a complexity of the image classification task based on the original image dataset according to each prediction result;
and the second training module is configured to redefine the first image data set according to the relation between the complexity parameter and the set threshold value, and determine a final image classification model according to the redetermined first image data set.
Optionally, the determining module includes a determining sub-module configured to
And determining complexity parameters corresponding to the image classification task according to the prediction results, and determining the complexity of the image classification task according to the complexity parameters.
Optionally, the determination submodule is specifically configured to
Respectively determining a target prediction result and a reference prediction result in a plurality of prediction results, and determining deviation between the target prediction result and the reference prediction result;
and normalizing each deviation, and taking the average value of each normalization result as a complexity parameter corresponding to the image classification task.
Optionally, the determining submodule determines the complexity parameter corresponding to the image classification task based on the following formula:
wherein K is the complexity parameter, N is the number of test sets, and N ≥ 2; A(i, j) denotes the prediction result obtained by inputting test set j into target model i; A(i, i) denotes the prediction result obtained by inputting test set i into target model i.
Optionally, the second training module includes: a merge training sub-module configured to
Combining at least two of the first image data sets to redetermine the first image data sets when the complexity parameter is less than or equal to a set threshold;
and inputting the redetermined first image data set into an image classification model for training to obtain a final image classification model.
Optionally, the second training module further includes: a supplemental training sub-module configured to
when the complexity parameter is larger than a set threshold value, adding supplementary data to at least two first image data sets, and taking the supplementary data and the original data in each first image data set together as a multi-modal training sample;
inputting the multi-modal training sample into a first deep learning model for training;
wherein the modalities of the supplemental data and the raw data include at least one of: image data, text data, voice data, and user behavior data.
Optionally, the image classification device further includes: a classification module configured to
and classifying the images in any second image data set through the final image classification model.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the image classification method according to any embodiment of the present disclosure.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having stored thereon instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the image classification method according to any of the embodiments of the present disclosure.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product for use with an electronic device, the computer program product comprising a computer-readable storage medium and a computer program embedded therein, the program being loaded and executed by a computer to implement the image classification method according to any one of the embodiments of the present disclosure.
The technical scheme provided by the embodiments of the disclosure brings at least the following beneficial effects: at least two target image data sets corresponding to the original image data set are acquired, each comprising a first image data set and a second image data set; at least two first image data sets are respectively input into an image classification model for training to obtain at least two target models; at least two second image data sets are respectively input into the target models to obtain a plurality of prediction results; the complexity of the image classification task based on the original image data set is determined according to each prediction result; and the first image data set is redetermined according to the relation between the complexity parameter and the set threshold value, and the final image classification model is determined according to the redetermined first image data set. This solves the problems of inaccurate complexity evaluation and low classification accuracy caused by evaluating the complexity of the image classification task by experience, accurately determines the complexity of the image classification task, and improves the accuracy of image classification.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
Fig. 1 is a flow chart illustrating a method of image classification according to an exemplary embodiment.
Fig. 2 is a flow chart illustrating a method of image classification according to an exemplary embodiment.
Fig. 3 is a flow chart illustrating a method of image classification according to an exemplary embodiment.
Fig. 4 is a flow chart illustrating a method of image classification according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating an image classification apparatus according to an exemplary embodiment.
Fig. 6 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Fig. 1 is a flowchart illustrating an image classification method according to an exemplary embodiment. As shown in Fig. 1, the method may be performed by an image classification apparatus, which may be implemented in software and/or hardware and used in an electronic device such as a computer, a server, or a smart phone. The method includes the following steps.
In step S11, at least two target image data sets corresponding to the original image data sets are acquired, each target image data set comprising a first image data set and a second image data set.
The original image dataset may be any image dataset, for example, a person image dataset, an animal image dataset, or a natural scenery image dataset, which is not limited in this embodiment. The target image dataset may be a dataset obtained by annotating the original image dataset. In general, the original image dataset can be annotated manually or by machine; because the content of the images in the original image dataset varies widely, manual annotation is currently the more widely used approach. For example, for the image classification task, the categories contained in the image data may be labeled manually, e.g., cats, dogs, flowers, or people in the image data are labeled separately, which is not limited in this embodiment.
In an optional implementation manner of this embodiment, when the original image dataset is obtained, a target image dataset in which different labeling personnel label the original image dataset may be obtained; for example, a target image dataset a in which the original image dataset is labeled by the labeling person a, a target image dataset B in which the original image dataset is labeled by the labeling person B, and a target image dataset C in which the original image dataset is labeled by the labeling person C may be obtained, respectively.
In this embodiment, each target image dataset may include a first image dataset and a second image dataset; wherein the first image dataset may be a training image dataset (training set) and the second image dataset may be a test image dataset (test set).
It should be noted that the training set is the sample set used for training, mainly for fitting the parameters of the deep learning model. The test set is used, after training of the deep learning model is complete, to objectively evaluate the model's performance on data it has not seen (it influences neither the ordinary parameters nor the hyper-parameter selection). The test set and the training set are therefore independent and non-overlapping; the test set cannot provide feedback for modifying the model's parameters or hyper-parameters and serves only as an index for evaluating network performance.
In a specific example of this embodiment, ten staff members may label the original data set matched with the image classification task, so as to obtain ten target image data sets, and each target image data set is divided into a first image data set and a second image data set, i.e., a training set and a test set, at a ratio of 9:1. For example, if each target image data set includes 1000 image data, 900 of them may be determined as the training set and 100 as the test set.
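The 9:1 division described above can be sketched as follows. This is an illustrative sketch only; the function name, file names, and the shuffling step are assumptions and not part of the disclosure.

```python
import random

def split_target_dataset(image_paths, train_ratio=0.9, seed=0):
    """Divide one annotated target image dataset into a first (training)
    image dataset and a second (test) image dataset at the given ratio."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # fixed seed keeps the split reproducible
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]  # (training set, test set)

# A target dataset of 1000 images yields 900 training and 100 test images,
# and the two sets are independent and non-overlapping.
dataset = [f"img_{i:04d}.jpg" for i in range(1000)]
train_set, test_set = split_target_dataset(dataset)
print(len(train_set), len(test_set))  # 900 100
```

Shuffling before the cut avoids any ordering bias in how the original dataset was collected, while the fixed seed keeps each annotator's split reproducible.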
In step S12, at least two first image datasets are respectively input into a deep learning model for training, so as to obtain at least two target models.
The deep learning model may be a convolutional neural network or another deep learning network, which is not limited in this embodiment. By way of example, the deep learning model may be a neural network that matches the classification task, such as an Inception-v3 network or an Xception network, to which the present embodiment is not limited. It should be noted that the deep learning model in this embodiment may also be a neural network matched with other tasks, for example, a neural network matched with a target detection task or with a semantic segmentation task, which is not limited in this embodiment.
Specifically, after at least two target image data sets corresponding to the original image data sets are acquired, each first image data set may be further input into a deep learning model for training, so as to obtain a target model matched with each first image data set.
By way of example, the original image data set is labeled by three workers, respectively, to obtain three target image data sets, and each target image data set is divided into a first image data set and a second image data set at a ratio of 9:1. Furthermore, the three first image data sets can be respectively input into the deep learning model for training to obtain three target models. It should be noted that in this embodiment, the first image data set A may be input into the deep learning model for training to obtain the target model A, and the first image data sets B and C may then be input in sequence to obtain the target models B and C; alternatively, the first image data sets A, B and C may be trained simultaneously, for example in different servers, or by starting three training tasks at the same time in the same server.
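The simultaneous-training option mentioned above can be sketched as below. Here `train_model` is a hypothetical stand-in for the actual deep-learning training routine (e.g. fitting an Inception-v3 style network); it is not an API from the disclosure.

```python
from concurrent.futures import ThreadPoolExecutor

def train_model(first_dataset):
    # Placeholder: a real implementation would fit a CNN on the labelled
    # images; here the "target model" just records its training-set size.
    return {"num_samples": len(first_dataset)}

# Three first image datasets A, B and C, trained concurrently on one machine
# (the disclosure equally allows training them on different servers).
first_datasets = {"A": list(range(800)), "B": list(range(800)), "C": list(range(800))}
with ThreadPoolExecutor(max_workers=3) as pool:
    target_models = dict(zip(first_datasets,
                             pool.map(train_model, first_datasets.values())))
print(sorted(target_models))  # ['A', 'B', 'C']
```

In practice GPU-bound training jobs would be launched as separate processes or on separate machines; the thread pool here only illustrates starting the three training tasks at the same time.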
In step S13, at least two second image data sets are respectively input into the target model, and a plurality of prediction results are obtained.
In an optional implementation manner of this embodiment, after at least two first image data sets are respectively input into the deep learning model to perform training to obtain at least two target models, the model may be further tested by using at least two second image data sets, that is, the at least two second image data sets are respectively input into the target models obtained by training, so as to obtain a plurality of prediction results.
In a specific example of the present embodiment, each of the at least two second image data sets may be input into the at least two target models, respectively, to obtain a plurality of prediction results. For example, the original image data set is labeled by three workers, resulting in three target image data sets A, B and C, wherein the target image data set A includes a first image data set A and a second image data set A; the target image data set B includes a first image data set B and a second image data set B; and the target image data set C includes a first image data set C and a second image data set C. The first image data sets A, B and C may each comprise 800 pieces of image data, and the second image data sets A, B and C may each comprise 200 pieces of image data. The three first image data sets A, B and C, each containing 800 pieces of image data, are respectively input into the deep learning model to obtain three target models A, B and C. Further, the second image data set A may be input into the target models A, B and C, respectively, to obtain 600 prediction results; the second image data set B may be input into the target models A, B and C, respectively, to obtain 600 prediction results; and the second image data set C may be input into the target models A, B and C, respectively, to obtain 600 prediction results.
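The cross-evaluation above naturally forms an N×N matrix of prediction results A(i, j). The sketch below assumes, for illustration, that each prediction result is summarised as a single accuracy value; the dummy accuracy table stands in for real CNN inference.

```python
def cross_evaluate(num_models, num_test_sets, evaluate):
    """Build the matrix A, where A[i][j] is the prediction result obtained
    by inputting second image dataset j into target model i."""
    return [[evaluate(i, j) for j in range(num_test_sets)]
            for i in range(num_models)]

# Hypothetical per-pair accuracies for three target models and three test
# sets; in real use, evaluate(i, j) would run model i on test set j.
accuracy = {(0, 0): 0.90, (0, 1): 0.88, (0, 2): 0.89,
            (1, 0): 0.87, (1, 1): 0.91, (1, 2): 0.90,
            (2, 0): 0.89, (2, 1): 0.88, (2, 2): 0.92}
A = cross_evaluate(3, 3, lambda i, j: accuracy[(i, j)])
print(A[1][2])  # 0.9
```

The diagonal entries A(i, i), where a model is tested on its own annotator's test set, later serve as the reference results in the complexity formula.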
In step S14, the complexity of the image classification task based on the original image dataset is determined from each prediction result.
The complexity of the image classification task can be the difficulty level of the image classification task; in an alternative implementation of this embodiment, the complexity of the image classification task may be determined according to each prediction result determined in step S13.
Optionally, determining the complexity of the image classification task based on the original image dataset according to each prediction result may include: and determining complexity parameters corresponding to the image classification tasks according to the prediction results, and determining the complexity of the image classification tasks according to the complexity parameters.
In an optional implementation manner of this embodiment, after each of the at least two second image data sets is input into the at least two target models respectively to obtain a plurality of prediction results, a complexity parameter of the image classification task may be determined according to the plurality of prediction results, and a complexity of the image classification task may be determined according to the complexity parameter.
In an optional implementation manner of this embodiment, determining the complexity of the image classification task according to the complexity parameter may include: when the complexity parameter is smaller than or equal to the set threshold value, the complexity of the image classification task can be determined to be low; when the complexity parameter is greater than the set threshold, it may be determined that the complexity of the image classification task is high. The set threshold may be a value of 0.3, 0.4, or 0.6, which is not limited in this embodiment.
For example, if the complexity parameter determined according to the plurality of prediction results is 0.2 and the threshold is set to be 0.3, it may be determined that the complexity of the image classification task is low; if the complexity parameter determined according to the plurality of prediction results is 0.9 and the set threshold is 0.3, the complexity of the image classification task can be determined to be high.
In another specific example of this embodiment, the complexity of the image classification task may also be determined according to the degree of consistency of each prediction result; for example, if the similarity of the prediction results of the image data by each target model is greater than 90% for the same image data, it may be determined that the complexity of the image classification task is low. It should be noted that, in this embodiment, the complexity of the image classification task may also be determined by other methods, which is not limited in this embodiment.
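Since the formula itself is not reproduced in this text, the sketch below follows only the verbal description: take the deviation of each cross prediction A(i, j), j ≠ i, from its reference A(i, i), normalise it by the reference, and average the results. The exact normalisation in the disclosure may differ.

```python
def complexity_parameter(A):
    """Average normalised deviation of A[i][j] (j != i) from the reference
    result A[i][i]; a larger K suggests a more complex classification task."""
    n = len(A)
    deviations = [abs(A[i][i] - A[i][j]) / A[i][i]
                  for i in range(n) for j in range(n) if j != i]
    return sum(deviations) / len(deviations)

# Illustrative accuracy matrix: models agree closely across annotators'
# test sets, so the deviations (and hence K) are small.
A = [[0.90, 0.88, 0.89],
     [0.87, 0.91, 0.90],
     [0.89, 0.88, 0.92]]
K = complexity_parameter(A)
print(K <= 0.3)  # True: the image classification task is of low complexity
```

With the set threshold of 0.3 from the example above, this K (about 0.03) indicates a low-complexity task; widely disagreeing predictions would push K above the threshold instead.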
In step S15, the first image dataset is redetermined according to the relationship of the complexity parameter and the set threshold, and the final image classification model is determined according to the redetermined first image dataset.
The set threshold may be a value of 0.3, 0.4, or 0.6, which is not limited in this embodiment.
In an alternative implementation manner of this embodiment, after determining the complexity of the image classification task based on the original image dataset according to each prediction result, the first image dataset may be further redetermined according to the relationship between the complexity parameter and the set threshold, and the final image classification model may be determined according to the redetermined first image dataset.
The relationship between the complexity parameter and the set threshold may include a relationship greater than, a relationship less than, or a relationship equal to, which is not limited in this embodiment.
In an optional implementation manner of this embodiment, after determining the complexity parameter of the image classification task based on the original image dataset, the relationship between the complexity parameter and the set threshold may be further determined, and the first image dataset may be further determined according to the relationship, for example, a plurality of first image datasets are combined to obtain a new first image dataset, or new image data is added to the first image dataset, which is not limited in this embodiment.
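The low-complexity branch described above (merging the first image datasets into one larger training set) can be sketched as follows; the function name and the placeholder for the high-complexity branch are illustrative assumptions.

```python
def redetermine_first_dataset(first_datasets, K, threshold=0.3):
    """When the complexity parameter K is at or below the threshold, merge
    the first image datasets into one larger training set for the final
    model; otherwise the disclosure adds multi-modal supplementary data
    instead (not sketched here)."""
    if K <= threshold:
        return [image for dataset in first_datasets for image in dataset]
    raise NotImplementedError("supplement the datasets with multi-modal data")

# Three small first image datasets merged into one redetermined set.
merged = redetermine_first_dataset([["a1", "a2"], ["b1"], ["c1", "c2"]], K=0.2)
print(len(merged))  # 5
```

The redetermined set is then fed back into the image classification model for a final round of training, as step S15 describes.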
In the solution of the present embodiment, at least two target image data sets corresponding to an original image data set are acquired, each target image data set including a first image data set and a second image data set; respectively inputting at least two first image data sets into an image classification model for training to obtain at least two target models; respectively inputting at least two second image data sets into a target model to obtain a plurality of prediction results; determining the complexity of an image classification task based on the original image dataset according to each prediction result; the first image data set is redetermined according to the relation between the complexity parameter and the set threshold value, and the final image classification model is determined according to the redetermined first image data set, so that the problems of low accuracy of evaluating the complexity degree of the image classification task by experience and low accuracy of image classification can be solved, the complexity degree of the image classification task is accurately determined, and the accuracy of image classification is improved.
Fig. 2 is a flowchart illustrating an image classification method according to an exemplary embodiment, which is a further refinement of the above-described technical solution, and the technical solution in this embodiment may be combined with each of the alternatives in one or more embodiments described above. As shown in fig. 2, the image classification method includes the following steps.
In step S21, at least two target image data sets corresponding to the original image data sets are acquired, each target image data set comprising a first image data set and a second image data set.
In step S22, at least two first image datasets are respectively input into an image classification model for training, so as to obtain at least two target models.
In step S23, at least two second image data sets are respectively input into the target model, and a plurality of prediction results are obtained.
In step S24, a complexity parameter corresponding to the image classification task is determined according to each prediction result, and the complexity of the image classification task is determined according to the complexity parameter.
In an optional implementation manner of this embodiment, determining, according to each prediction result, a complexity parameter corresponding to the image classification task may include: respectively determining a target prediction result and a reference prediction result in the plurality of prediction results, and determining deviation between the target prediction result and the reference prediction result; and normalizing each deviation, and taking the average value of each normalization result as a complexity parameter corresponding to the image classification task.
The target prediction result may be a prediction result obtained by inputting the second image data set into a target model obtained by training the first image data set matched with the second image data set. The second image data set and the first image data set matched with the second image data set belong to the same target image data set. The reference prediction result may be a prediction result obtained by inputting the second image data set into a target model obtained by training the other first image data set. It should be noted that the second image data set and the other first image data sets belong to different target image data sets.
For example, if the original image data set is labeled by two staff members respectively, a target image data set A and a target image data set B are obtained, wherein the target image data set A comprises a first image data set A and a second image data set A, and the target image data set B comprises a first image data set B and a second image data set B; training on the first image data set A and the first image data set B respectively yields a target model A and a target model B. The target prediction result is the prediction result obtained by inputting the second image data set A into the target model A; the reference prediction result is the prediction result obtained by inputting the second image data set A into the target model B. It will be appreciated that in this example, the target prediction result may also be the prediction result obtained by inputting the second image data set B into the target model B, and the reference prediction result may also be the prediction result obtained by inputting the second image data set B into the target model A.
In an alternative implementation of this embodiment, after the target prediction result and the reference prediction result are respectively determined among the plurality of prediction results, the deviation between the target prediction result and the reference prediction result may be further determined, for example by calculating the difference between them; further, each calculated deviation is normalized, the average value of the normalization results is computed, and that average value is taken as the complexity parameter corresponding to the image classification task.
In a specific example of this embodiment, the complexity parameter corresponding to the image classification task may be determined based on the following formula:

K = mean(1 - A(i, j)/A(i, i))

where K is the complexity parameter, N is the number of second image data sets, and N ≥ 2; A(i, j) represents the prediction result obtained by inputting the second image data set j into the target model i, namely the reference prediction result; A(i, i) represents the prediction result obtained by inputting the second image data set i into the target model i, namely the target prediction result.
For example, if N = 2, A(i, j) represents the prediction result obtained by inputting the second image data set j into the target model i, that is, the test result obtained by inputting the second image data set 1 into the target model 2 and the test result obtained by inputting the second image data set 2 into the target model 1; A(i, i) represents the prediction result obtained by inputting the second image data set i into the target model i, that is, the test result obtained by inputting the second image data set 1 into the target model 1 and the test result obtained by inputting the second image data set 2 into the target model 2. Further, substituting the specific numerical values into the formula yields the complexity parameter corresponding to the image classification task.
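As an illustrative sketch (the function name and the small accuracy table below are assumptions, not from the patent text), the formula can be evaluated with NumPy by treating the prediction results as an N × N accuracy table whose entry (i, j) is the accuracy of target model i on second image data set j; the mean here is taken over the off-diagonal (reference) entries, since the diagonal terms 1 - A(i, i)/A(i, i) are identically zero:

```python
import numpy as np

def complexity_parameter(acc):
    """K = mean(1 - A(i, j) / A(i, i)) over the reference entries i != j.

    acc[i][j] is the accuracy of target model i on second image
    data set j: the diagonal holds the target prediction results,
    the off-diagonal entries the reference prediction results.
    """
    acc = np.asarray(acc, dtype=float)
    n = acc.shape[0]
    assert acc.shape == (n, n) and n >= 2, "need an N x N table with N >= 2"
    deviations = [1.0 - acc[i, j] / acc[i, i]
                  for i in range(n) for j in range(n) if i != j]
    return float(np.mean(deviations))

# N = 2 example: each model loses 10% relative accuracy on the other
# annotator's test set, so every deviation is 0.1 and K = 0.1
acc = [[0.90, 0.81],
       [0.72, 0.80]]
k = complexity_parameter(acc)  # small K -> labels are consistent, task is simple
```

A larger K would indicate that models trained on different annotators' labels disagree strongly, i.e. a harder task.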
In another specific example of the present embodiment, the target prediction result and the reference prediction result may also be represented by an accuracy rate, for example, a (i, j) represents a prediction accuracy rate obtained by inputting the second image dataset j into the target model i; a (i, i) represents the prediction accuracy obtained by inputting the second image dataset i into the object model i. The advantage of this is that the amount of calculation of the process of determining the complexity parameter corresponding to the image classification task can be reduced, and the complexity parameter can be determined quickly, thereby providing a basis for determining the complexity of the task quickly.
Optionally, the determining the complexity of the image classification task according to the complexity parameter may include: when the complexity parameter is smaller than the set threshold value, determining that the complexity of the image classification task is low; when the complexity parameter is larger than the set threshold, determining that the complexity of the image classification task is high.
In the scheme of this embodiment, the target prediction result and the reference prediction result are respectively determined among the plurality of prediction results, and the deviation between them is determined; each deviation is then normalized, and the average value of the normalization results is taken as the complexity parameter corresponding to the image classification task, so that the complexity parameter can be determined quickly, providing a basis for determining the complexity of the image classification task.
Fig. 3 is a flowchart illustrating an image classification method according to an exemplary embodiment, which is a further refinement of the above-described technical solutions, and the technical solutions in this embodiment may be combined with each of the alternatives in one or more embodiments described above. As shown in fig. 3, the image classification method includes the following steps.
In step S31, at least two target image data sets corresponding to the original image data sets are acquired, each target image data set comprising a first image data set and a second image data set.
In step S32, at least two first image datasets are respectively input into an image classification model for training, so as to obtain at least two target models.
In step S33, at least two second image data sets are respectively input into the target model, and a plurality of prediction results are obtained.
In step S34, the complexity of the image classification task based on the original image dataset is determined from each prediction result.
In step S35, when the complexity parameter is less than or equal to the set threshold, merging at least two first image data sets to redetermine the first image data sets; and inputting the redetermined first image data set into the image classification model for training to obtain a final image classification model.
The set threshold may be a value of 0.3, 0.4, or 0.6, which is not limited in this embodiment.
In an optional implementation manner of this embodiment, if the complexity parameter determined by the foregoing embodiments is smaller than the set threshold, it may be determined that the complexity of the image classification task is low; at this time, at least two first image data sets may be combined (summarized); and inputting the combined first image data set into a deep learning model for training to determine a final image classification model corresponding to the image classification task.
For example, if the original image data sets are marked by ten marking personnel, ten target image data sets are obtained, wherein each target image data set comprises a first image data set and a second image data set; after determining that the complexity of the image classification task matched with the target data set is low according to the steps, ten first image data sets included in ten target image data sets can be combined, and the combined first image data sets are input into a deep learning model for training, so that a final image classification model corresponding to the image classification task is obtained.
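A minimal sketch of this merge-and-retrain branch (the `train_fn` callback and the dataset layout are illustrative assumptions, not the patent's API):

```python
from itertools import chain

def merge_and_retrain(first_image_datasets, train_fn, k, threshold=0.4):
    """When the complexity parameter k is at or below the set
    threshold, merge all first image data sets into one training
    set and train a single final image classification model."""
    if k > threshold:
        raise ValueError("complexity is high; add supplementary modalities instead")
    merged = list(chain.from_iterable(first_image_datasets))
    return train_fn(merged)

# ten annotators, three labeled samples each; `len` stands in for a
# real training routine just to show the size of the merged set
datasets = [[(f"img_{a}_{i}", "label") for i in range(3)] for a in range(10)]
result = merge_and_retrain(datasets, train_fn=len, k=0.1)  # trains on 30 samples
```

In practice `train_fn` would be the deep-learning training loop of the embodiments; the point of the sketch is only that the per-annotator sets are concatenated before a single final model is trained.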
According to the scheme of this embodiment, after it is determined that the complexity of the image classification task is low, at least two first image data sets can be merged, and the merged first image data set is input into the deep learning model for training so as to determine the final image classification model corresponding to the image classification task; this enhances the diversity of the samples in the first image data set, improves the accuracy of the final image classification model, and reduces the sample labeling amount and the model complexity while ensuring the task effect.
Fig. 4 is a flowchart illustrating an image classification method according to an exemplary embodiment, which is a further refinement of the above-described technical solutions, and the technical solutions in this embodiment may be combined with each of the alternatives in one or more embodiments described above. As shown in fig. 4, the image classification method includes the following steps.
In step S41, at least two target image data sets corresponding to the original image data sets are acquired, each target image data set comprising a first image data set and a second image data set.
In step S42, at least two first image datasets are respectively input into the image classification model for training, so as to obtain at least two target models.
In step S43, at least two second image data sets are input into the target model, respectively, to obtain a plurality of prediction results.
In step S44, the complexity of the image classification task based on the original image dataset is determined from each prediction result.
In step S45, when the complexity parameter is greater than the set threshold, supplementary data is added to at least two first image data sets, and the supplementary data and the original data in each first image data set are used together as multi-modal training samples; the multi-modal training samples are input into a first deep learning model for training.
The set threshold may be a value of 0.3, 0.4, or 0.6, which is not limited in this embodiment.
In an optional implementation manner of this embodiment, if the complexity parameter determined in the foregoing embodiments is greater than the set threshold, it may be determined that the complexity of the image classification task is high; namely, when the complexity parameter is larger than a set threshold value, adding supplementary data into at least two first image data sets, and taking the supplementary data and the original data in the first image data sets together as a multi-mode training sample; inputting the multi-modal training sample into a first deep learning model for training to obtain a final image classification model; wherein the modalities of the supplemental data include at least one of: image data, text data, voice data, and user behavior data.
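One way to assemble such multi-modal training samples, sketched with illustrative names (the dict layout is an assumption, not the patent's data format):

```python
def build_multimodal_samples(image_samples, supplements):
    """Attach supplementary-modality data (e.g. text, voice, user
    behavior) to each original image sample, producing one
    multi-modal training sample per image.

    image_samples: list of (image, label) pairs
    supplements:   dict mapping modality name -> list aligned
                   index-by-index with image_samples
    """
    for values in supplements.values():
        assert len(values) == len(image_samples), "modalities must align"
    samples = []
    for idx, (image, label) in enumerate(image_samples):
        sample = {"image": image, "label": label}
        for modality, values in supplements.items():
            sample[modality] = values[idx]
        samples.append(sample)
    return samples

images = [("img0.jpg", "cat"), ("img1.jpg", "dog")]
extra = {"voice": ["v0.wav", "v1.wav"], "text": ["t0", "t1"]}
multimodal = build_multimodal_samples(images, extra)
```

Each resulting sample carries the original image and label plus one field per supplementary modality, ready to be fed to a multi-modal model.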
For example, if the original image data set is labeled by ten annotators, ten target image data sets are obtained, wherein each target image data set comprises a first image data set and a second image data set; after it is determined according to the methods of the above embodiments that the complexity of the image classification task matching the target data set is high, supplementary data, such as voice data, may be added to each of the ten first image data sets; the ten first image data sets are then trained through the first deep learning model, and the complexity of the image classification task after the data supplementation is judged again; if the complexity of the image classification task is still high, supplementary data, such as user behavior data, is further added to each first image data set; this continues until the complexity of the image classification task is low, at which point the first image data sets are merged according to the above embodiments to obtain the final image classification model.
It should be noted that the first deep learning model in this embodiment may be a multi-modal deep learning model; that is, it may train on the multi-modal first image data set simultaneously, for example training on image data and voice data at the same time, which is not limited in this embodiment.
In the scheme of the embodiment, after the complexity of the image classification task is determined to be high, adding supplementary data into at least two first image data sets, and taking the supplementary data and the original data in the first image data sets together as a multi-mode training sample; the multi-mode training sample is input into the first deep learning model for training, so that the diversity of the first image dataset sample can be enhanced, and the accuracy of the final image classification model is improved.
In order to better understand the iterative image classification method according to the present embodiment, a specific example is described below, where the specific procedure includes:
1. A batch of data (the collected task pictures) is prepared according to the task or problem definition, and the data needs to be labeled by a plurality of annotators, with each person labeling the same amount of data. Because each person learns the labeling rules to a different degree and has individual cognitive deviations, the labels of certain data will be inconsistent. For example, when the labeling task is whether a dog exists in the picture, cognition is basically consistent; however, for a subjective task such as whether a woman in the picture is "beautiful or not beautiful", cognition generally differs greatly. Therefore, the difficulty level of the problem can be judged according to the differences produced in multi-person labeling.
2. The data labeled by each person is divided into training set : test set = 9:1, and one deep learning classification model with the same network architecture and training scheme (including the optimizer and the number of iteration steps) is trained on each person's training set. Taking the image classification task as an example, the Inception-v3 network can be selected and trained until the value of the loss function (a common loss function of deep learning classification networks, such as cross-entropy loss) hardly decreases any more, which indicates that the network has converged and the classification model is trained. Assuming N persons, N deep learning models are generated, and the N test sets are used to test the N trained models, where N may be any positive integer no less than 2, which is not limited in this embodiment.
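Step 2's per-annotator split can be sketched as follows (a toy illustration under the assumption that each person's labeled data fits in a list; a real pipeline would shuffle and split file paths the same way):

```python
import random

def split_per_annotator(annotated, ratio=0.9, seed=0):
    """Split each annotator's labeled data into a 9:1
    training set / test set pair, one pair per person."""
    rng = random.Random(seed)
    splits = []
    for data in annotated:      # one list of labeled samples per person
        data = data[:]          # copy so the caller's list is untouched
        rng.shuffle(data)
        cut = int(len(data) * ratio)
        splits.append((data[:cut], data[cut:]))  # (training set, test set)
    return splits

# two annotators, ten samples each -> two (9, 1) splits
annotated = [[(f"img_{i}", "label") for i in range(10)] for _ in range(2)]
splits = split_per_annotator(annotated)
```

One classification model is then trained per training set, and every model is later evaluated on every test set to fill the N × N accuracy table of step 3.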
3. The N deep learning models are tested for accuracy on the N test sets, so there are N × N accuracy values, where A(i, j) represents the accuracy of model i on test set j.
4. K = mean(1 - A(i, j)/A(i, i)) is used as an index for measuring the task difficulty. The more complex the problem, the larger the differences among the multi-person labels, so that A(i, j) << A(i, i) and K becomes larger.
When K is greater than the set threshold, the task is judged to be difficult; at this time, it is recommended to add supplementary information such as text, speech, and user behavior (additional labeling may be considered if such information is missing) for multi-modal model training, so as to improve the performance of the algorithm on the difficult task.
When K is smaller than the set threshold, the task is judged to be simple, and it can be completed directly with the current image-based deep learning classification model; that is, the data labeled by the multiple persons is merged, and a classification model is trained with the Inception-v3 network. In this case, because the task is simple, the sample labeling amount and the model complexity are reduced while the task effect is ensured; it can be understood that the model complexity of a multi-modal algorithm is higher than that of a deep learning classification model based on pure images.
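The two branches above reduce to a simple decision rule; the sketch below uses 0.4 as the threshold purely as one of the example values mentioned earlier:

```python
def training_plan(k, threshold=0.4):
    """Map the difficulty index K to the recommended training strategy."""
    if k > threshold:
        # difficult task: bring in extra modalities
        return "add text/voice/user-behavior data and train a multi-modal model"
    # simple task: one image-only classifier is enough
    return "merge all annotators' data and train a single image classifier"

plan_hard = training_plan(0.7)  # -> multi-modal route
plan_easy = training_plan(0.1)  # -> image-only route
```

The iterative scheme of the earlier embodiment is this rule applied repeatedly: after each round of adding a modality, K is recomputed until the easy branch is reached.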
According to the scheme, the difficulty degree of the task is judged more objectively and accurately in a data driving mode, meanwhile, a technical solution is provided for the difficulty degree of the task, and the sample marking amount and the model complexity of the model can be reduced as much as possible while the task effect is ensured.
Fig. 5 is a block diagram of an image classification device according to an exemplary embodiment. Referring to fig. 5, the apparatus includes an acquisition module 51, a first training module 52, a test module 53, a determination module 54, and a second training module 55.
Wherein the acquisition module 51 is configured to acquire at least two target image data sets corresponding to the original image data sets, each target image data set comprising a first image data set and a second image data set;
The first training module 52 is configured to input at least two first image data sets into the deep learning model for training respectively, so as to obtain at least two target models;
the test module 53 is configured to input at least two second image data sets into the target model respectively, so as to obtain a plurality of prediction results;
a determination module 54 configured to determine the complexity of the model training task based on the original image dataset based on the respective prediction results;
a second training module 55 is configured to redetermine the first image dataset based on the relationship of the complexity parameter to the set threshold value and to determine a final image classification model based on the redetermined first image dataset.
Optionally, the determination module 54 includes a determination sub-module configured to determine the complexity parameter corresponding to the image classification task according to each prediction result, and to determine the complexity of the image classification task according to the complexity parameter.
Optionally, the determining submodule is specifically configured to determine a target prediction result and a reference prediction result in the plurality of prediction results respectively, and determine a deviation between the target prediction result and the reference prediction result; and normalizing each deviation, and taking the average value of each normalization result as a complexity parameter corresponding to the image classification task.
Optionally, the determining submodule determines the complexity parameter corresponding to the image classification task based on the following formula:

K = mean(1 - A(i, j)/A(i, i))

wherein K is the complexity parameter, N is the number of test sets, and N ≥ 2; A(i, j) represents the prediction result obtained by inputting the test set j into the target model i; A(i, i) represents the prediction result obtained by inputting the test set i into the target model i.
Optionally, the second training module 55 includes: a merge training sub-module configured to merge at least two first image datasets to redetermine the first image datasets when the complexity parameter is less than or equal to a set threshold; and inputting the redetermined first image data set into the image classification model for training to obtain a final image classification model.
Optionally, the second training module 55 further includes: the supplementary training sub-module is configured to add supplementary data to at least two first image data sets when the complexity parameter is greater than a set threshold value, and the supplementary data and the original data in each first image data set are used as multi-mode training samples together; inputting the multi-modal training sample into a first deep learning model for training; wherein the modalities of the supplemental data and the raw data include at least one of: image data, text data, voice data, and user behavior data.
Optionally, the image classification device further includes: and a classification module configured to classify any of the second image dataset images by a final image classification model.
With respect to the image classification apparatus in the above-described embodiment, the specific manner in which the respective modules perform the operations has been described in detail in the embodiment regarding the method, and will not be described in detail here.
Fig. 6 is a block diagram illustrating a configuration of an electronic device according to an exemplary embodiment. As shown in fig. 6, the electronic device includes a processor 61 and a memory 62 for storing executable instructions of the processor 61; the memory 62 may include a random access memory (RAM) and a read-only memory (ROM); the processor 61 is configured to execute the instructions to implement the above-described image classification method.
In an exemplary embodiment, a storage medium is also provided, such as a memory 62 storing executable instructions that are executable by a processor 61 of an electronic device (server or smart terminal) to perform the above-described image classification method.
Alternatively, the storage medium may be a non-transitory computer-readable storage medium, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, which when executed by a processor of an electronic device (server or intelligent terminal) implements the above-described image classification method.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any adaptations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (8)

1. An image classification method, comprising:
acquiring at least two target image data sets corresponding to original image data sets, each target image data set comprising a first image data set and a second image data set;
Respectively inputting at least two first image data sets into an image classification model for training to obtain at least two target models;
respectively inputting at least two second image data sets into the target model to obtain a plurality of prediction results;
determining the complexity of an image classification task based on the original image dataset according to each prediction result;
re-determining a first image data set according to the relation between the complexity parameter and the set threshold value, and determining a final image classification model according to the re-determined first image data set;
the method comprises the steps that at least two target image data sets corresponding to an original image data set are obtained, wherein the target image data sets are obtained, the original image data sets are marked by different marking personnel, the first image data set is a training image data set, the second image data set is a test image data set, and the first image data set and the second image data set are independent and do not overlap;
wherein the step of redetermining the first image data set according to the relationship between the complexity parameter and the set threshold and determining the final image classification model according to the redetermined first image data set comprises:
Combining at least two of the first image data sets to redetermine the first image data sets when the complexity parameter is less than or equal to a set threshold;
inputting the redetermined first image data set into an image classification model for training to obtain a final image classification model;
wherein the step of redetermining the first image data set according to the relationship between the complexity parameter and the set threshold and determining the final image classification model according to the redetermined first image data set further comprises:
when the complexity parameter is larger than a set threshold value, adding supplementary data to at least two first image data sets, and taking the supplementary data and the original data in each first image data set together as a multi-mode training sample;
inputting the multi-modal training sample into a first deep learning model for training;
wherein the modalities of the supplemental data include at least one of: image data, text data, voice data, and user behavior data.
2. The method of claim 1, wherein the determining the complexity of the image classification task based on the target image dataset based on each prediction result comprises:
And determining complexity parameters corresponding to the image classification task according to the prediction results, and determining the complexity of the image classification task according to the complexity parameters.
3. The method of claim 2, wherein the step of determining a complexity parameter corresponding to the image classification task based on each of the prediction results comprises:
respectively determining a target prediction result and a reference prediction result in a plurality of prediction results, and determining deviation between the target prediction result and the reference prediction result;
and normalizing each deviation, and taking the average value of each normalization result as a complexity parameter corresponding to the image classification task.
4. The method according to claim 3, wherein the complexity parameter corresponding to the image classification task is determined based on the following formula:

K = mean(1 - A(i, j)/A(i, i))

wherein K is the complexity parameter, N is the number of test sets, and N ≥ 2; A(i, j) represents the prediction result obtained by inputting the test set j into the target model i; A(i, i) represents the prediction result obtained by inputting the test set i into the target model i.
5. The method according to any one of claims 1-4, wherein after the step of determining a final image classification model, the method further comprises:
And classifying any second image data set image through the final image classification model.
6. An image classification apparatus, comprising:
an acquisition module configured to acquire at least two target image data sets corresponding to original image data sets, each of the target image data sets including a first image data set and a second image data set;
the first training module is configured to input at least two first image data sets into a deep learning model respectively for training to obtain at least two target models;
the testing module is configured to input at least two second image data sets into the target model respectively to obtain a plurality of prediction results;
a determining module configured to determine a complexity of a model training task based on the original image dataset according to each prediction result;
a second training module configured to redetermine a first image dataset according to a relationship of the complexity parameter to a set threshold, and to determine a final image classification model according to the redetermined first image dataset;
the acquisition module is specifically used for acquiring target image data sets of which the original image sets are marked by different marking personnel; the first image data set is a training image data set, the second image data set is a test image data set, and the first image data set and the second image data set are independent and do not overlap;
Wherein the second training module comprises a merging training sub-module configured to merge at least two of the first image data sets to re-determine a first image data set when the complexity parameter is less than or equal to a set threshold;
inputting the redetermined first image data set into an image classification model for training to obtain a final image classification model;
the second training module further comprises a supplementary training sub-module configured to add supplementary data to at least two first image data sets when the complexity parameter is greater than the set threshold, and to use the supplementary data together with the original data in each first image data set as multi-modal training samples;
inputting the multi-modal training sample into a first deep learning model for training;
wherein the modalities of the supplemental data include at least one of: image data, text data, voice data, and user behavior data.
7. An electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the image classification method of any of claims 1 to 5.
8. A storage medium having instructions stored thereon which, when executed by a processor of an electronic device, enable the electronic device to perform the image classification method of any one of claims 1 to 5.
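Claim 6's high-complexity branch forms multi-modal training samples by attaching supplementary data to each labelled image. A minimal early-fusion sketch, where fusion-by-concatenation and the function name are illustrative assumptions rather than details taken from the patent:

```python
import numpy as np

def build_multimodal_sample(image_feat, text_feat=None,
                            voice_feat=None, behavior_feat=None):
    # Early fusion: concatenate the image features with whichever
    # supplementary modalities (text, voice, user behavior) are present.
    parts = [np.asarray(image_feat, dtype=float).ravel()]
    for extra in (text_feat, voice_feat, behavior_feat):
        if extra is not None:
            parts.append(np.asarray(extra, dtype=float).ravel())
    return np.concatenate(parts)
```

A sample with only an image vector degrades gracefully to that vector alone, so the same builder serves both the original and supplemented first image data sets.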
CN202011035129.8A 2020-09-27 2020-09-27 Image classification method, device, electronic equipment and storage medium Active CN112182268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011035129.8A CN112182268B (en) 2020-09-27 2020-09-27 Image classification method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112182268A CN112182268A (en) 2021-01-05
CN112182268B true CN112182268B (en) 2024-04-05

Family

ID=73944578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011035129.8A Active CN112182268B (en) 2020-09-27 2020-09-27 Image classification method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112182268B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228124A (en) * 2016-07-17 2016-12-14 Xidian University SAR image object detection method based on convolutional neural networks
CN108647652A (en) * 2018-05-14 2018-10-12 Beijing University of Technology Automatic cotton growth-stage identification method based on image classification and object detection
CN109034258A (en) * 2018-08-03 2018-12-18 Xiamen University Weakly supervised object detection method based on object-specific pixel gradient maps
CN109902701A (en) * 2018-04-12 2019-06-18 Huawei Technologies Co., Ltd. Image classification method and apparatus
CN110503154A (en) * 2019-08-27 2019-11-26 Ctrip Computer Technology (Shanghai) Co., Ltd. Image classification method, system, electronic device and storage medium
CN110532946A (en) * 2019-08-28 2019-12-03 Chang'an University Convolutional-neural-network-based method for identifying the axle type of green-channel freight vehicles


Also Published As

Publication number Publication date
CN112182268A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
Kao et al. Visual aesthetic quality assessment with a regression model
CN108280477B (en) Method and apparatus for clustering images
CN109740620B (en) Method, device, equipment and storage medium for establishing crowd figure classification model
CN110363084A (en) Classroom state detection method, apparatus, storage medium and electronic device
CN109117857B (en) Biological attribute identification method, device and equipment
JP2007272896A (en) Digital image processing method and device for performing adapted context-aided human classification
CN112948575B (en) Text data processing method, apparatus and computer readable storage medium
CN112182269B (en) Training of image classification model, image classification method, device, equipment and medium
EP3955217A2 (en) Human behavior recognition method, apparatus, storage medium and program product
CN113763348A (en) Image quality determination method and device, electronic equipment and storage medium
CN115984930A (en) Micro expression recognition method and device and micro expression recognition model training method
CN114972222A (en) Cell information statistical method, device, equipment and computer readable storage medium
CN108805181B (en) Image classification device and method based on multi-classification model
CN112182268B (en) Image classification method, device, electronic equipment and storage medium
CN109657710B (en) Data screening method and device, server and storage medium
CN109033078B (en) Sentence classification recognition method and apparatus, storage medium, and processor
CN111967312B (en) Method and system for identifying important persons in picture
Gonzalez-Soler et al. Semi-synthetic data generation for tattoo segmentation
CN112699908B (en) Method for labeling picture, electronic terminal, computer readable storage medium and equipment
CN109299460B (en) Method and device for analyzing evaluation data of shop, electronic device and storage medium
CN113159049A (en) Training method and device of weak supervision semantic segmentation model, storage medium and terminal
CN112115996B (en) Image data processing method, device, equipment and storage medium
US20230289522A1 (en) Deep Learning Systems and Methods to Disambiguate False Positives in Natural Language Processing Analytics
US20230289531A1 (en) Deep Learning Systems and Methods to Disambiguate False Positives in Natural Language Processing Analytics
KR102198359B1 (en) Image Auto Tagging management system and method using deep learning for UBT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant