CN111507407B - Training method and device for image classification model - Google Patents


Info

Publication number
CN111507407B
CN111507407B (application CN202010306813.9A)
Authority
CN
China
Prior art keywords
image
classification model
training
sample
image classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010306813.9A
Other languages
Chinese (zh)
Other versions
CN111507407A (en)
Inventor
郭卉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010306813.9A priority Critical patent/CN111507407B/en
Publication of CN111507407A publication Critical patent/CN111507407A/en
Application granted granted Critical
Publication of CN111507407B publication Critical patent/CN111507407B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/30: Noise filtering
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a training method and device for an image classification model, an electronic device, and a storage medium. The method comprises the following steps: performing an accuracy check, using an image test sample, on the image classification model that has completed the first-stage training, so as to determine the accuracy of the image classification model; when it is determined, based on the accuracy, that the image training samples of the image classification model satisfy the trigger condition for noise recognition, performing noise recognition on the image training samples of the image classification model, to obtain the probability that each image training sample belongs to a noise sample; taking the probability as the weight of the image training sample, and training the image classification model with the weighted image training samples until the second-stage training is completed. In this way, once the image classification model has completed the first stage, it can be trained in a second stage according to the probability that each image training sample belongs to a noise sample, improving the prediction accuracy of the trained image classification model.

Description

Training method and device for image classification model
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a training method and apparatus for an image classification model, an electronic device, and a storage medium.
Background
Artificial intelligence is a comprehensive discipline that spans a wide range of both hardware-level and software-level technologies. Artificial intelligence software technology mainly comprises several broad directions such as computer vision, speech processing, natural language processing, and machine learning/deep learning. Among them, Machine Learning (ML) is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied throughout all fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
The classification and recognition of images is an important application branch of artificial intelligence. When an image classification model is trained on training samples, the training samples often carry noise due to human factors. For example, annotators achieve a very low labeling error rate in a simple cat-versus-dog labeling task; however, when judging whether a garment is chiffon or pure cotton, the labeling error rate increases greatly. As a result, the classification prediction performance of an image classification model trained on such samples is degraded by the noise data present in the training samples.
Disclosure of Invention
The embodiments of the invention provide a training method and device for an image classification model, an electronic device, and a storage medium, which, once the image classification model has completed the first stage, can train it in a second stage according to the probability that each image training sample belongs to a noise sample, improving the prediction accuracy of the trained image classification model.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides a training method of an image classification model, which comprises the following steps:
performing an accuracy check, using an image test sample, on the image classification model that has completed the first-stage training, so as to determine the accuracy of the image classification model;
when it is determined, based on the accuracy, that the image training samples of the image classification model satisfy the trigger condition for noise recognition,
performing noise recognition on the image training samples of the image classification model through a noise recognition model, to obtain the probability that each image training sample belongs to a noise sample;
taking the probability as the weight of the image training sample, and weighting the image training sample to obtain a weighted image training sample;
and training the image classification model with the weighted image training samples until the second-stage training is completed.
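The claimed steps can be sketched as a small orchestration. Every function name, parameter, and the threshold below are illustrative assumptions, not something fixed by the patent:

```python
def train_two_stage(model, train_samples, test_samples,
                    first_stage, evaluate, noise_prob, second_stage,
                    acc_threshold=0.9):
    """Sketch of the claimed flow: first-stage training, accuracy check,
    noise recognition when triggered, then weighted second-stage training.
    All callables and the threshold are hypothetical stand-ins."""
    first_stage(model, train_samples)        # complete the first-stage training
    acc = evaluate(model, test_samples)      # accuracy check on image test samples
    if acc < acc_threshold:                  # trigger condition (assumed simple form)
        # noise recognition: probability that each training sample is noise
        weights = [noise_prob(model, s) for s in train_samples]
        second_stage(model, train_samples, weights)  # weighted second-stage training
    return model
```

Here the trigger condition is modeled as a single threshold comparison; the scheme also describes a variant based on stalled round-to-round improvement.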
The embodiment of the invention also provides a training device for an image classification model, which comprises:
the detection module is used for performing an accuracy check, using an image test sample, on the image classification model that has completed the first-stage training, so as to determine the accuracy of the image classification model;
the noise recognition module is used for performing noise recognition on the image training samples of the image classification model through the noise recognition model when it is determined, based on the accuracy, that the image training samples satisfy the trigger condition for noise recognition, so as to obtain the probability that each image training sample belongs to a noise sample;
the weighting module is used for weighting the image training samples, taking the probability as their weight, to obtain weighted image training samples;
and the training module is used for training the image classification model with the weighted image training samples until the second-stage training is completed.
In the above scheme, the device further includes:
the determining module is used for comparing the accuracy of the image classification model with an accuracy threshold to obtain a comparison result;
and, when the comparison result indicates that the accuracy is below the accuracy threshold, determining that the image training samples of the image classification model satisfy the trigger condition for noise recognition.
In the above scheme, the determining module is further configured to obtain, during the first-stage training, a first accuracy corresponding to the image classification model after the current round of training and a second accuracy corresponding to the image classification model after the previous round of training;
and, when the difference between the first accuracy and the second accuracy is below a difference threshold, to determine that the image training samples of the image classification model satisfy the trigger condition for noise recognition.
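The two trigger variants described in the scheme (accuracy below a threshold, or the round-to-round accuracy gain during first-stage training falling below a difference threshold) can be sketched as one check. The threshold values are illustrative assumptions:

```python
def noise_trigger(accuracy, prev_accuracy=None,
                  acc_threshold=0.9, diff_threshold=0.01):
    """Return True when the image training samples satisfy the trigger
    condition for noise recognition (both variants from the scheme;
    thresholds are hypothetical)."""
    # Variant 1: accuracy of the first-stage model is below the accuracy threshold.
    if accuracy < acc_threshold:
        return True
    # Variant 2: the accuracy gain over the previous round of training
    # is below the difference threshold (training has stalled).
    if prev_accuracy is not None and accuracy - prev_accuracy < diff_threshold:
        return True
    return False
```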
In the above scheme, the noise recognition module is further configured to perform feature extraction on the image training sample through the feature extraction layer of the noise recognition model, to obtain the features of the image training sample;
to acquire the class center features of at least two classes corresponding to the image training sample;
and to perform noise recognition through the noise recognition layer of the noise recognition model, based on the features of the image training sample and the class center features of the at least two classes, to obtain the probability that the image training sample belongs to a noise sample.
In the above scheme, the noise recognition module is further configured to select a target proportion of image training samples from the plurality of image training samples of the image classification model, according to the at least two categories corresponding to the image training samples;
to extract features from each selected image training sample through the feature extraction layer of the noise recognition model, obtaining the corresponding sample features;
to cluster the obtained sample features, obtaining the sample features corresponding to each category;
and to select a target number of class center features from the sample features corresponding to each category, obtaining the class center features of the at least two classes.
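The selection-and-clustering steps above can be sketched as follows. The feature extraction layer is abstracted away (features arrive as plain vectors), the tiny k-means is a generic stand-in for whichever clustering the model uses, and all names and parameters are illustrative:

```python
import random

def kmeans_centers(feats, k, iters=10, seed=0):
    """Minimal k-means over feature vectors, returning k cluster centers
    (an illustrative stand-in for the clustering step)."""
    rng = random.Random(seed)
    centers = rng.sample(feats, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for f in feats:
            # assign each feature to its nearest current center
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(f, centers[c])))
            buckets[j].append(f)
        for j, b in enumerate(buckets):
            if b:  # recompute each center as the mean of its bucket
                centers[j] = [sum(xs) / len(b) for xs in zip(*b)]
    return centers

def class_center_features(samples_by_class, target_ratio=0.5,
                          centers_per_class=1, seed=0):
    """Select a target proportion of samples per class and cluster their
    features to obtain a target number of class center features."""
    rng = random.Random(seed)
    centers = {}
    for cls, feats in samples_by_class.items():
        n = max(1, int(len(feats) * target_ratio))
        subset = rng.sample(feats, n)
        centers[cls] = kmeans_centers(subset, centers_per_class, seed=seed)
    return centers
```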
In the above scheme, the noise recognition module is further configured to determine the cosine distances between the features of the image training sample and each of the class center features;
to determine, from the class center features of the at least two classes, the target class center feature corresponding to the maximum cosine distance;
and to determine the probability that the image training sample belongs to a noise sample, based on the features of the image training sample and the target class center feature.
In the above scheme, the noise recognition module is further configured to determine, based on the features of the image training sample and the target class center feature, the probability that the image training sample belongs to a noise sample according to a formula in which a is the feature of the image training sample, b is the target class center feature, w_k is the weight of the target class center feature, and W is the probability that the image training sample belongs to a noise sample.
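The formula itself is not reproduced in this text; purely as an assumption, one plausible form consistent with the stated symbols is W = w_k · cos(a, b), i.e. the weight of the selected target class center scaled by the cosine score between the sample feature and that center. A sketch under that assumption:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def noise_probability(a, class_centers, center_weights):
    """Pick the target class center b with the maximum cosine score and
    combine it with that center's weight w_k. The combination
    W = w_k * cos(a, b) is an assumed form, not the patent's exact formula."""
    k = max(class_centers, key=lambda c: cosine(a, class_centers[c]))
    b = class_centers[k]
    return center_weights[k] * cosine(a, b)
```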
In the above scheme, the training module is further configured to perform classification prediction on the weighted image training sample through the image classification model, so as to obtain a corresponding prediction result;
acquiring the difference between the prediction result and the classification label of the weighted image training sample;
determining a value of a loss function of the image classification model based on the obtained difference and the weight of the weighted image training sample;
and updating model parameters of the image classification model based on the value of the loss function.
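The weighted-loss step above can be sketched minimally. Cross-entropy is assumed as the base loss, since the patent does not fix the exact form of the loss function:

```python
import math

def weighted_cross_entropy(probs, labels, weights):
    """Average per-sample cross-entropy, each term scaled by the sample's
    weight (here, the weight derived from its noise probability).
    probs[i][y] is the predicted probability of the true class y."""
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        total += -w * math.log(p[y])  # weighted difference between prediction and label
    return total / len(probs)
```

The model parameters would then be updated by gradient descent on this value.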
In the above scheme, the detection module is further configured to perform classification prediction on the image test samples through the image classification model that has completed the first-stage training, to obtain the corresponding classification results;
to determine the accuracy of each classification result based on the classification label of the image test sample;
and to determine the accuracy of the image classification model based on the accuracy of the classification results corresponding to the image test samples.
The embodiment of the invention also provides electronic equipment, which comprises:
a memory for storing executable instructions;
and the processor is used for realizing the training method of the image classification model provided by the embodiment of the invention when executing the executable instructions stored in the memory.
The embodiment of the invention also provides a computer readable storage medium which stores executable instructions, and when the executable instructions are executed by a processor, the training method of the image classification model provided by the embodiment of the invention is realized.
The embodiment of the invention has the following beneficial effects:
An accuracy check is performed, using the image test sample, on the image classification model that has completed the first-stage training; when it is determined, based on the accuracy, that the image training samples of the image classification model satisfy the trigger condition for noise recognition, the probability that each image training sample belongs to a noise sample is determined through the noise recognition model, and that probability is used as the sample's weight in a weighting step, so that the image classification model is trained with the weighted image training samples. In this way, once the image classification model has completed the first stage, it can be trained in a second stage according to the probability that each image training sample belongs to a noise sample, improving the prediction accuracy of the trained image classification model.
Drawings
FIGS. 1A-B are schematic diagrams of a method of identifying noise samples provided in the related art;
FIG. 2 is a schematic diagram of an implementation scenario of a training method of an image classification model according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 4 is a flowchart of a training method of an image classification model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a residual module of a differential network model provided by an embodiment of the present invention;
FIG. 6 is a flow chart of noise recognition of an image training sample provided by an embodiment of the present invention;
FIG. 7 is a flowchart of a training method of an image classification model according to an embodiment of the present invention;
FIG. 8 is a flowchart of a training method of an image classification model according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an image classification model applied to clothing image classification according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a training device for an image classification model according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
In the following description, the terms "first", "second", "third" and the like are used merely to distinguish similar objects and do not imply a specific ordering of the objects; it should be understood that, where permitted, "first", "second" and "third" may be interchanged in a specific order or sequence, so that the embodiments of the invention described herein can be practiced in orders other than those illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before describing embodiments of the present invention in further detail, the terms and terminology involved in the embodiments of the present invention will be described, and the terms and terminology involved in the embodiments of the present invention will be used in the following explanation.
1) "In response to": denotes the condition or state on which a performed operation depends; when the condition or state is satisfied, the operation or operations may be performed in real time or with a set delay. Unless otherwise specified, there is no restriction on the order in which multiple such operations are performed.
2) Image recognition: the recognition of the category of an image, considering only category-level classification (e.g., person, dog, cat, bird) and giving the category to which the image belongs.
3) Image multi-label identification: identifying, by a computer, whether the image has a plurality of specified attribute tags or category tags; an image may have multiple attributes or belong to multiple categories, and the multi-label recognition task is to determine which preset attribute labels a certain image has or belongs to which category labels.
4) Noise samples: samples that carry noise data to a certain extent; not all samples are noise. They include samples whose category labels are wrong due to annotator error, incomplete concepts, or incomplete consistency between an image and its category labels. For example, when the concepts of two categories partially overlap, an image may have 2 category attributes but be labeled with only 1 of them.
5) Clean samples: samples manually confirmed to contain no noise data.
6) Full sample: the union of the clean samples and the noise samples.
7) Check samples: samples that have undergone manual noise verification.
In the related art, as shown in Fig. 1A, the first scheme uses a CurriculumNet model: one-stage learning is first performed on the classification model with clean samples or the full sample; the noise samples are then divided into two-stage and three-stage data according to a density ρ, and different sample weights are assigned to each for noise learning. However, this scheme needs clean samples to initialize the model, which brings an additional manual labeling requirement. Moreover, the noise judgment is learned offline and, once determined, never changes, while density-based noise judgment always carries some deviation; this easily leads to inaccurate learning and hampers subsequent model optimization.
As shown in Fig. 1B, the second scheme uses a CleanNet model: a one-stage model is first learned from the full sample (clean samples + noise samples); check samples are then provided and a noise judgment model is trained from them; the noise judgment model then performs noise prediction on the full sample, and the prediction results are applied as sample weights in the two-stage model learning. However, this scheme requires collecting check samples, and the more check samples, the better the effect, which brings extra labor investment.
In view of this, the embodiments of the present invention provide a training method, apparatus, electronic device and storage medium for an image classification model, to at least solve the above-mentioned problems in the related art; these are described below in turn.
Based on the above explanation of terms and terminology involved in the embodiments of the present invention, the implementation scenario of the training method of the image classification model provided in the embodiments of the present invention will be described first, referring to fig. 2, fig. 2 is a schematic diagram of the implementation scenario of the training method of the image classification model provided in the embodiments of the present invention, and in order to support an exemplary application, the terminal 200 includes a terminal 200-1 and a terminal 200-2, where the terminal 200-1 is located on a developer side for controlling training of the image classification model, and the terminal 200-2 is located on a user side for requesting classification prediction for an image to be classified; the terminal 200 is connected to the server 100 through the network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two, and data transmission is implemented by using a wireless or wired link.
The terminal 200-1 is used for sending training instructions for the image classification model to the server;
a server 100 for performing the first-stage training of the image classification model, in response to a training instruction for the image classification model, until it is completed; performing an accuracy check, using an image test sample, on the image classification model that has completed the first-stage training, so as to determine the accuracy of the image classification model; when it is determined, based on the accuracy, that the image training samples of the image classification model satisfy the trigger condition for noise recognition, performing noise recognition on the image training samples through the noise recognition model, to obtain the probability that each image training sample belongs to a noise sample; taking the probability as the weight of the image training sample and weighting the image training sample, to obtain a weighted image training sample; and training the image classification model with the weighted image training samples until the second-stage training is completed;
After the image classification model completes the training of the second stage, the terminal 200-2 is configured to send an image classification instruction for the image to be classified;
the server 100 is configured to respond to the image classification instruction, perform classification prediction on the image to be classified through the image classification model that has completed the second-stage training, obtain a corresponding image classification result, and return the image classification result to the terminal 200-2.
In practical application, the server 100 may be a separately configured server supporting various services, or may be configured as a server cluster; the terminal (e.g., terminal 200-1) may be a smart phone, tablet, notebook, etc., various types of user terminals, as well as a wearable computing device, a Personal Digital Assistant (PDA), a desktop computer, a cellular phone, a media player, a navigation device, a game console, a television, or a combination of any two or more of these or other data processing devices.
The hardware structure of the electronic device of the training method of the image classification model provided by the embodiment of the invention is described in detail below, and the electronic device includes, but is not limited to, a server or a terminal. Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention, and the electronic device 300 shown in fig. 3 includes: at least one processor 310, a memory 350, at least one network interface 320, and a user interface 330. The various components in the electronic device 300 are coupled together by a bus system 340. It is understood that the bus system 340 is used to enable connected communications between these components. The bus system 340 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled in fig. 3 as bus system 340.
The processor 310 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, which may be a microprocessor or any conventional processor, a Digital Signal Processor (DSP), another programmable logic device, discrete gate or transistor logic, or discrete hardware components.
The user interface 330 includes one or more output devices 331 that enable presentation of media content, including one or more speakers and/or one or more visual displays. The user interface 330 also includes one or more input devices 332, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
Memory 350 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 350 optionally includes one or more storage devices physically located remote from processor 310.
Memory 350 includes volatile memory or non-volatile memory, and may also include both volatile and non-volatile memory. The non-volatile memory may be Read Only Memory (ROM), and the volatile memory may be Random Access Memory (RAM). The memory 350 described in embodiments of the present invention is intended to comprise any suitable type of memory.
In some embodiments, memory 350 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.
The operating system 351 including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
network communication module 352 for reaching other computing devices via one or more (wired or wireless) network interfaces 320; exemplary network interfaces 320 include: Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB), etc.;
a presentation module 353 for enabling presentation of information (e.g., a user interface for operating peripheral devices and displaying content and information) via one or more output devices 331 (e.g., a display screen, speakers, etc.) associated with the user interface 330;
an input processing module 354 for detecting one or more user inputs or interactions from one of the one or more input devices 332 and translating the detected inputs or interactions.
In some embodiments, the training device for an image classification model provided in the embodiments of the present invention may be implemented in a software manner, and fig. 3 shows a training device 355 for an image classification model stored in a memory 350, which may be software in the form of a program, a plug-in, or the like, including the following software modules: the detection module 3551, the noise recognition module 3552, the weighting module 3553 and the training module 3554 are logical, and thus may be arbitrarily combined or further split according to the implemented functions, and functions of the respective modules will be described below.
In other embodiments, the training apparatus for an image classification model provided by the embodiments of the present invention may be implemented by combining software and hardware. By way of example, it may be a processor in the form of a hardware decoding processor that is programmed to perform the training method for an image classification model provided by the embodiments of the present invention; for example, such a processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), or other electronic components.
Based on the above description of the implementation scenario and the electronic device of the training method for the image classification model according to the embodiment of the present invention, the training method for the image classification model provided by the embodiment of the present invention is described below. Referring to fig. 4, fig. 4 is a flowchart of a training method of an image classification model according to an embodiment of the present invention; in some embodiments, the training method of the image classification model may be implemented by a server or a terminal alone or in cooperation with the server and the terminal, and in an embodiment of the present invention, the training method of the image classification model includes:
step 401: the server uses image test samples to perform accuracy detection on the image classification model that has completed the first-stage training, so as to determine the accuracy of the image classification model.
Here, in practical application, an image classification model, such as a convolutional neural network model, is first constructed; secondly, the model parameters of the image classification model are set to a state requiring learning; and then a large number of image training samples are used to perform the first-stage training of the image classification model. Specifically, for the first-stage training, a corresponding number of training rounds is set, and the image classification model is trained for that number of rounds. For example, if the number of training rounds is set to 100, the image classification model is trained for 100 rounds on the image training samples, and when the number of completed rounds reaches 100, the first-stage training of the image classification model is considered complete.
The image classification model may be a multi-category image classification model, that is, the image classification model can perform classification prediction of multiple categories for one image; for example, it may simultaneously identify whether an image belongs to the "animal" category and whether it belongs to the "landscape" category, or identify whether an image belongs to the "clothing" category and whether it belongs to the "shirt" category.
In order to obtain an image classification model with higher classification prediction accuracy, after the first-stage training is completed, an image test sample is adopted to detect the accuracy of the image classification model which is completed with the first-stage training so as to determine the accuracy of the image classification model. Thereby facilitating a determination of whether continued training is required to improve the classification prediction accuracy of the image classification model.
In some embodiments, the server may determine the accuracy of the image classification model by: carrying out classification prediction on the image test sample through completing the image classification model trained in the first stage to obtain a corresponding classification result; determining the accuracy of a corresponding classification result based on the classification label of the image test sample; and determining the accuracy of the image classification model based on the accuracy of the classification result corresponding to the image test sample.
In practical application, a plurality of image test samples corresponding to the image classification model are firstly obtained, and the image test samples are marked with corresponding classification labels. And secondly, respectively inputting the plurality of image test samples into an image classification model which is trained in the first stage, and carrying out classification prediction on the corresponding image test samples through the image classification model to obtain corresponding classification results. And then determining the accuracy of the corresponding classification result according to the classification label of each image test sample and the corresponding classification result. And finally, determining the accuracy of the image classification model according to the accuracy of the classification result corresponding to each image test sample.
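The accuracy computation described above can be sketched in a few lines. This is an illustrative numpy snippet, not code from the patent; the toy scores and labels are invented for the example.

```python
import numpy as np

def evaluate_accuracy(predictions: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of image test samples whose predicted category matches the
    annotated classification label.

    predictions: (num_samples, num_classes) classification scores.
    labels:      (num_samples,) integer classification labels.
    """
    predicted_classes = predictions.argmax(axis=1)
    return float((predicted_classes == labels).mean())

# Toy scores for four test samples over three categories (invented values).
scores = np.array([[0.9, 0.05, 0.05],
                   [0.1, 0.80, 0.10],
                   [0.2, 0.30, 0.50],
                   [0.6, 0.30, 0.10]])
labels = np.array([0, 1, 2, 1])   # the last sample is misclassified
print(evaluate_accuracy(scores, labels))  # → 0.75
```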
Step 402: and when the accuracy determines that the image training sample of the image classification model meets the triggering condition of noise identification, the noise identification is carried out on the image training sample of the image classification model through the noise identification model, so that the probability that the image training sample belongs to the noise sample is obtained.
After determining the accuracy of the image classification model trained in the first stage, judging whether the image training sample of the image classification model meets the triggering condition of noise recognition according to the accuracy, namely judging whether the image training sample for training the image classification model needs to carry out noise recognition, thereby further determining whether the image training sample of the image classification model is a noise sample or whether noise exists.
In some embodiments, the server may determine whether the image training samples of the image classification model satisfy the trigger condition for noise identification by: comparing the accuracy of the image classification model with an accuracy threshold value to obtain a comparison result; and when the comparison result represents that the accuracy is lower than the accuracy threshold, determining that the image training sample of the image classification model meets the triggering condition of noise recognition.
Here, the accuracy threshold may be set in advance. After the accuracy of the image classification model trained in the first stage is determined, the accuracy of the image classification model is compared with the preset accuracy threshold to obtain a comparison result; when the comparison result indicates that the accuracy is lower than the accuracy threshold, that is, the accuracy of the image classification model trained in the first stage does not reach the preset standard, it is determined that the image training samples of the image classification model satisfy the trigger condition for noise identification, and noise identification is required.
In some embodiments, the server may also determine whether the image training samples of the image classification model satisfy the trigger condition for noise identification by: in the first stage training process, acquiring a first accuracy corresponding to an image classification model which completes the training of the round and a second accuracy corresponding to an image classification model which completes the training of the previous round; when the difference between the first accuracy and the second accuracy is lower than a difference threshold, determining that the image training sample of the image classification model meets the triggering condition of noise recognition.
In practical application, since the number of learning rounds for the first stage training is preset, in the process of the first stage training of the image classification model, when training of the target number of rounds (for example, the number of learning rounds is 100, and the target number of rounds may be 80 th round or 100 th round) is completed, a first accuracy corresponding to the image classification model completing the present round of training and a second accuracy corresponding to the image classification model completing the previous round of training may be obtained. And comparing the difference value between the first accuracy and the second accuracy with a preset difference threshold value to determine whether the image training sample of the image classification model meets the noise recognition condition. Here, the difference value is used to characterize the optimization degree of the image classification model obtained by the training of the present round compared with the image classification model obtained by the training of the previous round. When the difference value between the first accuracy and the second accuracy is lower than a difference value threshold, namely the optimization degree does not reach a preset optimization degree standard, determining that the image training sample of the image classification model meets the triggering condition of noise recognition.
Or, in the first stage training of the image classification model, when the training of each round of image classification model is completed, the first accuracy corresponding to the image classification model which completes the round of training and the second accuracy corresponding to the image classification model which completes the previous round of training are obtained. When it is determined that the optimization degree of the image classification model obtained by training of a previous round is poor and does not reach a preset optimization degree threshold, that is, in the training of the previous round, the difference value between the first accuracy and the second accuracy is lower than the difference value threshold, it is determined that the image training sample of the image classification model meets the triggering condition of noise identification.
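The two trigger conditions described above (accuracy below a preset threshold, or round-over-round improvement below a difference threshold) can be sketched as follows; the function name and the threshold values are illustrative placeholders, not values from the patent.

```python
def noise_trigger(accuracy, prev_accuracy=None,
                  accuracy_threshold=0.9, diff_threshold=0.01):
    """True if the image training samples satisfy a trigger condition for
    noise identification:
      1. the accuracy is below a preset accuracy threshold, or
      2. the improvement over the previous round's accuracy is below a
         preset difference threshold (training has stalled).
    """
    if accuracy < accuracy_threshold:
        return True
    if prev_accuracy is not None and accuracy - prev_accuracy < diff_threshold:
        return True
    return False

print(noise_trigger(0.85))                       # low accuracy
print(noise_trigger(0.95, prev_accuracy=0.949))  # stalled improvement
print(noise_trigger(0.95, prev_accuracy=0.90))   # still improving
```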
When it is determined that the image training samples of the image classification model satisfy the trigger condition for noise identification, it indicates that the image training samples may contain noise, which is why the accuracy of the image classification model obtained through multiple rounds of training is still insufficient. Thus, in some embodiments, the server may perform noise identification on the image training samples of the image classification model as follows: feature extraction is performed on the image training samples through the feature extraction layer of the noise identification model to obtain the features of the image training samples; class center features of at least two classes corresponding to the image training samples are acquired; and noise identification is performed through the noise identification layer of the noise identification model, based on the features of the image training samples and the class center features of the at least two classes, to obtain the probability that each image training sample belongs to a noise sample.
Here, the noise recognition model includes a feature extraction layer and a noise recognition layer, and its structure may be the same as or different from that of the underlying image classification model. In practical application, the noise recognition model may also be constructed as a Residual Network (ResNet). Referring to fig. 5, fig. 5 is a schematic diagram of a residual module of the residual network model provided by the embodiment of the present invention: the 256-dimensional input is first reduced to 64 dimensions by a 1×1 convolution, a 3×3 convolution is then applied at 64 dimensions, and a final 1×1 convolution restores the 256 dimensions. Constructing the noise recognition model from such residual modules reduces the number of parameters and the amount of computation.
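The parameter saving from the bottleneck design can be checked by counting weights. The sketch below compares a plain 3×3 convolution at 256 channels with a 1×1 → 3×3 → 1×1 bottleneck as in fig. 5 (biases omitted; this is illustrative arithmetic, not code from the patent).

```python
def conv_params(in_channels, out_channels, kernel):
    """Weight count of a kernel x kernel convolution (biases omitted)."""
    return in_channels * out_channels * kernel * kernel

# Plain 3x3 convolution operating directly at 256 channels.
plain = conv_params(256, 256, 3)

# Bottleneck: 1x1 reduce to 64, 3x3 at 64, 1x1 restore to 256.
bottleneck = (conv_params(256, 64, 1)
              + conv_params(64, 64, 3)
              + conv_params(64, 256, 1))

print(plain)       # 589824
print(bottleneck)  # 69632
```

The bottleneck uses roughly an eighth of the weights of the plain convolution, which is the computational saving the residual module is designed for.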
The noise recognition model includes a feature extraction layer and a noise recognition layer. When the noise recognition is carried out on the image training sample of the image classification model, the feature extraction is carried out on the input image training sample through the feature extraction layer of the constructed noise recognition model, so that the features of the image training sample are obtained; simultaneously acquiring class center characteristics of at least two classes corresponding to the image training sample; and carrying out noise recognition on the characteristics of the image training sample and the class center characteristics of at least two corresponding classes through a noise recognition layer of the noise recognition model, thereby obtaining the probability that the image training sample belongs to the noise sample.
In some embodiments, the server may obtain class center features corresponding to the image training samples by: selecting an image training sample with target proportion from a plurality of image training samples of the image classification model according to at least two categories corresponding to the image training samples; respectively extracting features of the selected image training samples through a feature extraction layer of the noise identification model to obtain corresponding sample features; clustering the obtained sample features to obtain sample features of all the classes; and selecting the class center features of the target number from the sample features corresponding to the classes to obtain class center features of at least two classes.
In practical application, firstly, image training samples of a target proportion are selected from all image training samples as reference samples. Here, the number of reference samples for each category needs to be several times larger than the preset number K of class centers; for example, more than K×50 image training samples may be selected as reference samples. In practical application, the multiplier 50 may be increased or decreased according to the specific situation, and the number K of class centers may also be determined from empirical values.
Then, feature extraction is performed on the selected image training samples through the feature extraction layer of the noise identification model to obtain corresponding sample features, and the obtained sample features are clustered to obtain class center features of at least two classes. Specifically, the obtained sample features are clustered, for example using the K-Means algorithm, to obtain the sample features corresponding to each category; then a target number of class center features are selected from the sample features of each category, for example, the centers of the K most concentrated clusters obtained after clustering may be selected as the target number of class center features.
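The clustering step can be sketched with a plain K-Means (Lloyd's algorithm) implementation over one category's sample features. The farthest-point initialization and all names here are illustrative assumptions, not details specified in the patent.

```python
import numpy as np

def class_centers(features: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Cluster one category's sample features and return k cluster centers
    as the class center features (plain Lloyd's K-Means).

    features: (n, d) sample features extracted for one category.
    """
    # Farthest-point initialization: deterministic and well spread out.
    centers = [features[0]]
    for _ in range(k - 1):
        dist = np.min([np.linalg.norm(features - c, axis=1) for c in centers],
                      axis=0)
        centers.append(features[int(dist.argmax())])
    centers = np.array(centers)

    for _ in range(iters):
        # Assign every feature to its nearest center, then recompute means.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers

# Two well-separated blobs of features -> one center recovered per blob.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                   rng.normal(10.0, 0.1, (50, 2))])
centers = class_centers(feats, k=2)
print(centers.round(1))
```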
Specifically, referring to fig. 6, fig. 6 is a flowchart of noise recognition of an image training sample according to an embodiment of the present invention. The specific flow is as follows:
Step 601: selecting a reference sample of a target proportion from all the image training samples;
step 602: extracting features of a reference sample to obtain sample features;
step 603: clustering the sample features, and selecting to obtain center-like features;
step 604: extracting features of all the image training samples to obtain features of the full image training samples;
step 605: and carrying out noise recognition based on the class center characteristics and the characteristics of the full image training samples to obtain the probability of the image training samples belonging to the noise samples, and taking the probability as the weight of the image training samples.
In some embodiments, the server may also determine the probability that the image training samples are attributed to noise samples by: respectively determining cosine distances between the features of the image training sample and various center features; and determining a target class center feature corresponding to the maximum cosine distance from class center features of at least two classes; based on the characteristics of the image training samples and the central characteristics of the target class, the probability that the image training samples belong to noise samples is determined.
In practical application, cosine distances between the features of the image training sample and various center features in the class center features of at least two categories can be calculated respectively, so that the class center feature corresponding to the maximum cosine distance is searched for in the various center features, and the class center feature corresponding to the maximum cosine distance is determined as the target class center feature; and further determining the probability that the image training sample belongs to the noise sample based on the characteristics of the image training sample and the target class center characteristics.
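The search for the target class center can be sketched as below. Note that the formula itself appears only as an image in the original patent; W = w_k · cos(a, b) is an assumed reading consistent with the variable definitions given in the text, not a confirmed reproduction of the patent's formula.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def noise_weight(feature, centers, center_weights):
    """Pick the target class center feature (largest cosine value among all
    class center features) and derive W for the sample.
    W = w_k * cos(a, b) is an ASSUMED reading of the patent's formula.
    """
    sims = [cosine(feature, c) for c in centers]
    k = int(np.argmax(sims))            # index of the target class center
    return center_weights[k] * sims[k]  # W, later used as the sample's weight

feature = np.array([1.0, 0.0])                          # feature a
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # class center features
print(noise_weight(feature, centers, center_weights=[1.0, 1.0]))  # → 1.0
```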
In some embodiments, the server may determine the probability that an image training sample belongs to a noise sample based on the characteristics of the image training sample and the target class center characteristics using the following formula:
wherein a is the feature of the image training sample, b is the target class center feature, and w_k is the weight of the target class center feature, which may be predefined empirically or set to a value determined from the calculated cosine distance (for example, inversely proportional to it); W is the probability that the image training sample belongs to a noise sample.
Step 403: and taking the probability as the weight of the image training sample, and carrying out weighting treatment on the image training sample to obtain a weighted image training sample.
Step 404: and training an image classification model by using the weighted image training samples until the training of the second stage is completed.
After the weighted image training samples are obtained, training the image classification model through the weighted image training samples until the training of the second stage is completed.
In some embodiments, the server may train the image classification model by: carrying out classified prediction on the weighted image training samples through an image classification model to obtain corresponding prediction results; obtaining the difference between the prediction result and the classification label of the weighted image training sample; determining a value of a loss function of the image classification model based on the acquired differences and the weights of the weighted image training samples; model parameters of the image classification model are updated based on the values of the loss function.
In the practical training process of the image classification model, classification prediction is performed on the weighted image training samples through the image classification model to obtain corresponding prediction results; the difference between each prediction result and the classification label of the corresponding weighted image training sample is obtained; further, the value of the loss function of the image classification model is determined based on the acquired differences and the weights of the weighted image training samples. Here, the value L_w of the loss function when the image classification model is trained on weighted image training samples and the value L_class of the loss function when it is trained on unweighted image training samples satisfy the following relationship:

L_w = w · L_class

where w is the weight of the weighted image training sample.
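The relationship L_w = w · L_class can be illustrated with a per-sample weighted loss. The choice of cross-entropy as L_class is an assumption for illustration, since the patent does not name the base loss function.

```python
import numpy as np

def weighted_loss(probs, labels, weights):
    """Batch loss with L_w = w * L_class, using cross-entropy as L_class.

    probs:   (n, c) predicted class probabilities.
    labels:  (n,) integer classification labels.
    weights: (n,) per-sample weights from the noise identification step.
    """
    l_class = -np.log(probs[np.arange(len(labels)), labels])
    return float(np.mean(weights * l_class))

probs = np.array([[0.50, 0.50],
                  [0.25, 0.75]])
labels = np.array([0, 1])
# Down-weighting the second sample removes its contribution entirely.
unweighted = weighted_loss(probs, labels, np.array([1.0, 1.0]))
weighted = weighted_loss(probs, labels, np.array([1.0, 0.0]))
print(unweighted, weighted)
```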
After the value of the loss function of the image classification model is obtained through calculation, when the value of the loss function exceeds a set loss threshold value, determining an error signal of the image classification model based on the value of the loss function; the error signal is counter-propagated in the image classification model, whereby model parameters of each layer in the image classification model are updated during the counter-propagation of the error signal. For example, error signals are reversely propagated in the image classification model by a random gradient descent method, and model parameters of the image classification model are updated and optimized in the process of reverse propagation.
The second stage of training is to train the image classification model based on the weighted image training samples. Referring to fig. 7, fig. 7 is a flowchart of a training method of an image classification model according to an embodiment of the present invention, including:
step 701: training in a first stage by adopting a full-scale image training sample;
step 702: performing accuracy detection on the image classification model which is trained in the first stage to obtain the accuracy of the image classification model;
step 703: judging whether the image training sample meets the triggering condition of noise identification or not based on accuracy, if so, executing step 704, and if not, returning to step 701;
step 704: carrying out noise recognition on the image training sample through a noise recognition model to obtain the probability of the image training sample belonging to the noise sample, and taking the probability as the weight of the image training sample;
step 705: weighting the image training samples by adopting the weights of the image training samples;
step 706: training an image classification model through the weighted image training samples;
step 707: judging whether the second stage training of the image classification model is finished, if so, executing step 708, and if not, returning to step 703;
In practical application, a threshold on the number of learning rounds of the second-stage training may be set; when the number of learning rounds of the second-stage training reaches this threshold, learning is terminated. If the threshold has not been reached, the process returns to the noise identification operation and the second-stage training of the image classification model continues until it is complete.
Step 708: and outputting the image classification model which completes the training of the second stage.
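The loop of steps 701–708 can be sketched as a control-flow skeleton. The callables and round counts are placeholders, and the skeleton is simplified: the branch of step 703 that returns to step 701 when the trigger is not met is collapsed into an early return, and the per-round re-estimation of weights stands in for the loop back from step 707.

```python
def train_two_stage(train_fn, eval_fn, noise_weights_fn, trigger_fn,
                    first_rounds=100, second_rounds=100):
    """Control-flow skeleton of steps 701-708. The callables stand in for the
    model-specific pieces:
      train_fn(weights, r)  - one training round (weights=None -> unweighted)
      eval_fn()             - accuracy of the current image classification model
      noise_weights_fn()    - per-sample weights from the noise identification model
      trigger_fn(acc)       - trigger condition for noise identification
    """
    for r in range(first_rounds):        # step 701: full-sample first stage
        train_fn(None, r)
    acc = eval_fn()                      # step 702: accuracy detection
    if not trigger_fn(acc):              # step 703: trigger not met
        return acc
    for r in range(second_rounds):       # steps 704-707: weighted training
        weights = noise_weights_fn()     # re-estimate sample weights
        train_fn(weights, r)
    return eval_fn()                     # step 708: output the trained model

log = []
train_two_stage(train_fn=lambda w, r: log.append(("train", w is not None)),
                eval_fn=lambda: 0.5,
                noise_weights_fn=lambda: [1.0, 0.2],
                trigger_fn=lambda acc: acc < 0.9,
                first_rounds=2, second_rounds=3)
print(len(log))  # 2 unweighted + 3 weighted rounds
```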
By applying the embodiment of the invention, an accuracy test is performed on the image classification model that has completed the first-stage training by using image test samples; when it is determined based on the accuracy that the image training samples of the image classification model satisfy the trigger condition for noise identification, the probability that each image training sample belongs to a noise sample is determined through the noise identification model, and that probability is used as the weight of the image training sample for weighting, so that the image classification model is trained on the weighted image training samples. In this way, on the basis of the completed first stage, the image classification model can undergo second-stage training according to the probability that the image training samples belong to noise samples, improving the prediction accuracy of the trained image classification model.
An exemplary application of the embodiments of the present invention in a practical application scenario will be described below. Referring to fig. 8, fig. 8 is a flowchart of a training method of an image classification model according to an embodiment of the present invention, where the training method of an image classification model according to an embodiment of the present invention includes:
step 801: the server adopts an image training sample to train the image classification model in a first stage.
In practical application, an image classification model is first constructed; for example, the image classification model may be constructed as a Residual Network (ResNet). Referring to fig. 5, fig. 5 is a schematic diagram of a residual module of the residual network model provided in an embodiment of the present invention: the 256-dimensional input is first reduced to 64 dimensions by a 1×1 convolution, a 3×3 convolution is then applied at 64 dimensions, and a final 1×1 convolution restores the 256 dimensions, so that the number of parameters and the amount of computation are reduced.
Specifically, the image classification model comprises a feature extraction layer and a classification prediction layer, and is constructed based on ResNet-101. As shown in table 1, the feature extraction layer of the image classification model comprises the five convolutional stages Conv1–Conv5; as shown in table 2, the classification prediction layer of the image classification model comprises a pooling layer and a fully connected layer. The image classification model is a multi-category image classification model, that is, it can perform classification prediction of multiple categories for one image; for example, it may simultaneously identify whether an image belongs to the "animal" category and whether it belongs to the "landscape" category, or identify whether an image belongs to the "clothing" category and whether it belongs to the "shirt" category.
TABLE 1 Structural table of the ResNet-101 feature extraction layer

TABLE 2 Structural table of the ResNet-101-based classification prediction layer (N is the number of categories learned):

Layer name    Output size    Layer
Pool_cr       1x2048         Max pool
Fc_cr         1xN            Full connection
After the image classification model is built, setting model parameters of the image classification model to a state needing to be learned, and then training the image classification model in a first stage by adopting a large number of image training samples. Specifically, for the first stage training, a corresponding training wheel number is set, and the image classification model is trained according to the training wheel number, for example, if the training wheel number is set to 100, then the image classification model is trained for 100 wheels through the image training sample, and when the training wheel number reaches 100, the first stage training of the image classification model is characterized to be completed.
Step 802: and after the first-stage training is finished, adopting the image test sample to detect the accuracy of the image classification model, and obtaining the accuracy of the image classification model.
Here, in order to obtain an image classification model with higher classification prediction accuracy, after the first stage training is completed, an image test sample is used to perform accuracy detection on the image classification model after the first stage training is completed, so as to determine the accuracy of the image classification model. Thereby facilitating a determination of whether continued training is required to improve the classification prediction accuracy of the image classification model.
In practical application, a plurality of image test samples corresponding to the image classification model are firstly obtained, and the image test samples are marked with corresponding classification labels. And secondly, respectively inputting the plurality of image test samples into an image classification model which is trained in the first stage, and carrying out classification prediction on the corresponding image test samples through the image classification model to obtain corresponding classification results. And then determining the accuracy of the corresponding classification result according to the classification label of each image test sample and the corresponding classification result. And finally, determining the accuracy of the image classification model according to the accuracy of the classification result corresponding to each image test sample.
Step 803: determining whether the image training samples of the image classification model meet trigger conditions for noise recognition based on accuracy.
After determining the accuracy of the image classification model trained in the first stage, judging whether the image training sample of the image classification model meets the triggering condition of noise recognition according to the accuracy, namely judging whether the image training sample for training the image classification model needs to carry out noise recognition, thereby further determining whether the image training sample of the image classification model is a noise sample or whether noise exists.
In practical applications, the accuracy threshold may be preset. After determining the accuracy of the image classification model trained in the first stage, comparing the accuracy of the image classification model with a preset accuracy threshold value to obtain a comparison result; when the accuracy of the comparison result representation is lower than the accuracy threshold, namely the accuracy of the image classification model trained in the first stage does not reach the preset standard, at the moment, the image training sample of the image classification model is determined to meet the triggering condition of noise identification, and the noise identification is needed.
Or, in practical application, since the number of learning rounds for the first stage training is preset, in the process of the first stage training of the image classification model, when the training of the target number of rounds (for example, the number of learning rounds is 100, and the target number of rounds may be the 80 th round, the 90 th round or the 100 th round) is completed, the first accuracy corresponding to the image classification model completing the training of the present round and the second accuracy corresponding to the image classification model completing the previous round may be obtained. And comparing the difference value between the first accuracy and the second accuracy with a preset difference threshold value to determine whether the image training sample of the image classification model meets the noise recognition condition. Here, the difference value is used to characterize the optimization degree of the image classification model obtained by the training of the present round compared with the image classification model obtained by the training of the previous round. When the difference value between the first accuracy and the second accuracy is lower than a difference threshold, namely, the optimization degree does not reach a preset optimization degree standard, determining that the image training sample of the image classification model meets the triggering condition of noise recognition.
Or, in the first stage training of the image classification model, when the training of each round of image classification model is completed, the first accuracy corresponding to the image classification model which completes the round of training and the second accuracy corresponding to the image classification model which completes the previous round of training are obtained. When it is determined that the optimization degree of the image classification model obtained by training of a previous round is poor and does not reach a preset optimization degree threshold, that is, in the training of the previous round, the difference value between the first accuracy and the second accuracy is lower than the difference value threshold, it is determined that the image training sample of the image classification model meets the triggering condition of noise identification.
Step 804: if not, the first stage training is continued.
Here, if the image training sample of the image classification model does not satisfy the trigger condition for noise recognition, the first-stage training of the image classification model is continued using the image training sample.
Step 805: and if so, selecting an image training sample with target proportion from a plurality of image training samples of the image classification model according to at least two categories corresponding to the image training samples.
Here, if the image training sample of the image classification model satisfies the trigger condition of noise recognition, noise may exist to characterize the image training sample, so that the accuracy of the image classification model obtained through multiple rounds of training is still insufficient. Thus requiring noise recognition of the image training samples.
Specifically, first, an image training sample of a target proportion is selected from all image training samples as a reference sample. Here, the reference sample size of each category needs to be several times larger than the preset number K of class centers, for example, more than k×50 image training samples can be selected as the reference samples, and in practical application, the 50 times can be increased or decreased according to specific situations.
Step 806: and respectively carrying out feature extraction on the selected image training samples through a feature extraction layer of the noise identification model to obtain corresponding sample features.
Here, the noise recognition model may have the same structure as the basic image classification model or may be different from the basic image classification model. The noise recognition model includes a feature extraction layer and a noise recognition layer. And extracting the characteristics of the selected image training sample through a characteristic extraction layer of the noise identification model to obtain corresponding sample characteristics.
Step 807: and clustering the obtained sample features to obtain class center features of at least two classes.
Specifically, the obtained sample features are clustered, for example using the K-Means algorithm, to obtain the sample features corresponding to each category; a target number of class center features are then selected from the sample features of each category. For example, the K cluster centers of the densest regions after clustering can be selected as the target number of class center features.
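A minimal K-Means sketch of this clustering step over NumPy feature vectors (the implementation and parameter names are illustrative, not the patent's):

```python
import numpy as np

def class_center_features(features, k, iters=10, seed=0):
    """Cluster sample features with K-Means and return the k cluster
    centers, used as class-center features."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(features), size=k, replace=False)
    centers = features[idx].astype(float)
    for _ in range(iters):
        # assign each feature to its nearest current center
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        # recompute each center as the mean of its assigned features
        for j in range(k):
            members = features[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers
```

In a production setting a library implementation (e.g. a K-Means from an ML toolkit) would replace this loop.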
Step 808: and extracting the characteristics of the image training sample through a characteristic extraction layer of the noise identification model to obtain the characteristics of the image training sample.
Here, the feature extraction layer of the noise recognition model performs feature extraction on all the image training samples to obtain features of the image training samples.
Step 809: and respectively determining cosine distances between the features of the image training sample and various center features.
Step 810: and determining the target class center feature corresponding to the maximum cosine distance from class center features of at least two classes.
Step 811: based on the characteristics of the image training samples and the central characteristics of the target class, the probability that the image training samples belong to noise samples is determined.
Based on the characteristics of the image training sample and the center characteristics of the target class, the probability that the image training sample belongs to the noise sample is determined by adopting the following formula:
wherein a is the characteristic of the image training sample, b is the central characteristic of the target class, and w k And W is the probability that the image training sample belongs to the noise sample.
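The formula itself is not reproduced in this text, so the sketch below of steps 809-811 uses one plausible cosine-similarity-based form consistent with the variables described (a, b, w_k, W); the exact mapping from similarity to W is an assumption, not the patent's formula:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def noise_weight(feature, centers, center_weights):
    """Pick the target class center with the maximum cosine similarity
    to the sample feature, then derive W from a, b, and w_k.
    The final mapping to [0, w_k] is hypothetical."""
    sims = [cosine(feature, c) for c in centers]
    k = int(np.argmax(sims))                    # target class center
    a, b, w_k = feature, centers[k], center_weights[k]
    # hypothetical: rescale similarity from [-1, 1] into [0, w_k]
    return w_k * (1.0 + cosine(a, b)) / 2.0
```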
Specifically, referring to fig. 6, fig. 6 is a flowchart of noise recognition of an image training sample according to an embodiment of the present invention. The specific flow is as follows:
Step 601: selecting a reference sample of a target proportion from all the image training samples;
step 602: extracting features of a reference sample to obtain sample features;
step 603: cluster the sample features and select the class center features;
step 604: extracting features of all the image training samples to obtain features of the full image training samples;
step 605: and carrying out noise recognition based on the class center characteristics and the characteristics of the full image training samples to obtain the probability of the image training samples belonging to the noise samples, and taking the probability as the weight of the image training samples.
Step 812: take the probability as the weight of the image training sample and perform weighting processing on the image training sample to obtain a weighted image training sample.
Step 813: and training an image classification model by using the weighted image training samples until the training of the second stage is completed.
In the actual training process of the image classification model, classification prediction is performed on the weighted image training samples through the image classification model to obtain corresponding prediction results; the difference between the prediction results and the classification labels of the weighted image training samples is obtained; further, the value of the loss function of the image classification model is determined based on the obtained difference and the weights of the weighted image training samples. Here, the value L_w of the loss function when the image classification model is trained with the weighted image training samples and the value L_class of the loss function when it is trained with the unweighted image training samples have the following relationship:
L_w = w · L_class
where w is the weight of the weighted image training sample.
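A sketch of the weighted loss L_w = w · L_class, assuming per-sample cross-entropy as L_class (the concrete classification loss is an assumption; the text only fixes the weighting relationship):

```python
import numpy as np

def weighted_cross_entropy(probs, labels, weights):
    """Per-sample cross-entropy scaled by the noise-derived weight:
    L_w = w * L_class, averaged over the batch."""
    # cross-entropy of the predicted probability for the true class
    l_class = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(weights * l_class))
```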
After the value of the loss function of the image classification model is calculated, when the value of the loss function exceeds a set loss threshold, an error signal of the image classification model is determined based on the value of the loss function; the error signal is back-propagated through the image classification model, and the model parameters of each layer in the image classification model are updated during the back-propagation of the error signal. For example, the error signal is back-propagated through the image classification model by stochastic gradient descent, and the model parameters of the image classification model are updated and optimized during the back-propagation.
Here, the second stage training refers to training an image classification model based on the weighted image training samples. Referring to fig. 7, fig. 7 is a flowchart of a training method of an image classification model according to an embodiment of the present invention, including:
step 701: training in a first stage by adopting a full-scale image training sample;
step 702: performing accuracy detection on the image classification model which is trained in the first stage to obtain the accuracy of the image classification model;
Step 703: judging, based on the accuracy, whether the image training samples meet the triggering condition of noise identification; if so, executing step 704, and if not, returning to step 701;
step 704: carrying out noise recognition on the image training sample through a noise recognition model to obtain the probability of the image training sample belonging to the noise sample, and taking the probability as the weight of the image training sample;
step 705: weighting the image training samples by adopting the weights of the image training samples;
step 706: training an image classification model through the weighted image training samples;
step 707: judging whether the second stage training of the image classification model is finished, if so, executing step 708, and if not, returning to step 703;
in practical application, a threshold on the number of learning rounds of the second-stage training can be set; when the number of learning rounds of the second-stage training reaches this threshold, training terminates. Otherwise, the process returns to the noise identification operation to continue the second-stage training of the image classification model until it is complete.
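The two-stage control flow of Fig. 7 (steps 701-708) can be sketched as a loop skeleton, with the concrete operations injected as callables (all names are illustrative):

```python
def train_two_stage(train_round, accuracy_of, trigger_met, reweight,
                    max_second_stage_rounds):
    """Control-flow sketch of Fig. 7; the callables stand in for the
    actual training, accuracy-detection, and re-weighting operations."""
    # first stage: full-sample training until the trigger fires
    while True:
        train_round(weighted=False)            # step 701
        acc = accuracy_of()                    # step 702
        if trigger_met(acc):                   # step 703
            break
    # second stage: noise re-weighting plus weighted training rounds
    for _ in range(max_second_stage_rounds):   # step 707 round threshold
        reweight()                             # steps 704-705
        train_round(weighted=True)             # step 706
    return "second-stage training complete"    # step 708
```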
Step 708: and outputting the image classification model which completes the training of the second stage.
In practical applications, the image classification model trained in the second stage can be used for performing classification prediction on the images to be classified.
Step 814: the terminal sends a classification instruction aiming at the image to be classified.
Step 815: the server responds to the classification instruction, performs classification prediction on the image to be classified through the image classification model that has completed the second-stage training, obtains the corresponding classification result, and returns the classification result to the terminal.
Referring to fig. 9, fig. 9 is a schematic diagram of an image classification model applied to clothing image classification according to an embodiment of the present invention. The image classification model is a classification model for clothing images, with categories such as shirt, T-shirt, and base-layer shirt. A user can import a clothing image into a front-end terminal, which uploads it to a background server; the background server classifies and identifies the received clothing image through the image classification model that has completed the second-stage training and obtains the corresponding classification result, for example that the clothing image input by the user belongs to the base-layer shirt category.
Continuing with the description of the training device 355 for the image classification model provided in the embodiments of the present invention: in some embodiments, the training device for the image classification model may be implemented by software modules. Referring to fig. 10, fig. 10 is a schematic structural diagram of a training device 355 for an image classification model according to an embodiment of the present invention. The training device 355 for an image classification model according to an embodiment of the present invention includes:
The detection module 3551 is configured to detect accuracy of the image classification model after the first-stage training by using an image test sample, so as to determine accuracy of the image classification model;
the noise recognition module 3552 is configured to, when it is determined based on the accuracy that the image training sample of the image classification model meets a trigger condition of noise recognition, perform noise recognition on the image training sample of the image classification model through the noise recognition model, so as to obtain a probability that the image training sample belongs to the noise sample;
the weighting module 3553 is configured to take the probability as a weight of the image training sample, and perform weighting processing on the image training sample to obtain a weighted image training sample;
and the training module 3554 is configured to train the image classification model by using the weighted image training samples until the training of the second stage is completed.
In some embodiments, the apparatus further comprises:
the determining module is used for comparing the accuracy of the image classification model with an accuracy threshold value to obtain a comparison result;
and when the comparison result represents that the accuracy is lower than an accuracy threshold, determining that the image training sample of the image classification model meets the triggering condition of noise identification.
In some embodiments, the determining module is further configured to obtain, during the first stage of training, a first accuracy corresponding to an image classification model that completes the training of the present round, and a second accuracy corresponding to an image classification model that completes the training of the previous round;
and when the difference value between the first accuracy and the second accuracy is lower than a difference threshold value, determining that the image training sample of the image classification model meets the triggering condition of noise recognition.
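The two trigger conditions described above (accuracy below an accuracy threshold, or the round-over-round accuracy gain falling below a difference threshold) can be sketched as follows; the function and threshold names are illustrative:

```python
def trigger_met(accuracy, acc_threshold=None, prev_accuracy=None,
                diff_threshold=None):
    """Noise-identification trigger: fires when accuracy is below the
    accuracy threshold, or when the gain over the previous training
    round is below the difference threshold."""
    if acc_threshold is not None and accuracy < acc_threshold:
        return True
    if (prev_accuracy is not None and diff_threshold is not None
            and accuracy - prev_accuracy < diff_threshold):
        return True
    return False
```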
In some embodiments, the noise recognition module 3552 is further configured to perform feature extraction on the image training sample through a feature extraction layer of the noise recognition model, so as to obtain features of the image training sample;
acquiring class center characteristics of at least two classes corresponding to the image training sample;
and carrying out noise recognition through a noise recognition layer of the noise recognition model based on the characteristics of the image training sample and the class center characteristics of the at least two classes to obtain the probability that the image training sample belongs to the noise sample.
In some embodiments, the noise recognition module 3552 is further configured to select, from a plurality of image training samples of the image classification model, an image training sample of a target proportion according to at least two categories corresponding to the image training sample;
Respectively extracting the characteristics of the selected image training samples through the characteristic extraction layer of the noise identification model to obtain corresponding sample characteristics;
clustering the obtained sample features to obtain the sample features corresponding to the categories;
and selecting a target number of class center features from the sample features corresponding to the classes to obtain class center features of the at least two classes.
In some embodiments, the noise identification module 3552 is further configured to determine cosine distances between the features of the image training sample and each of the class center features;
and determining a target class center feature corresponding to the maximum cosine distance from class center features of the at least two classes;
and determining the probability that the image training sample belongs to a noise sample based on the characteristics of the image training sample and the target class center characteristics.
In some embodiments, the noise recognition module 3552 is further configured to determine, based on the characteristics of the image training samples and the target class center characteristics, a probability that the image training samples belong to noise samples using the following formula:
where a is the feature of the image training sample, b is the target class center feature, w_k is the weight of the target class center feature, and W is the probability that the image training sample belongs to a noise sample.
In some embodiments, the training module 3554 is further configured to perform classification prediction on the weighted image training samples through the image classification model, so as to obtain a corresponding prediction result;
acquiring the difference between the prediction result and the classification label of the weighted image training sample;
determining a value of a loss function of the image classification model based on the obtained difference and the weight of the weighted image training sample;
and updating model parameters of the image classification model based on the value of the loss function.
In some embodiments, the detection module 3551 is further configured to perform classification prediction on the image test sample through an image classification model that completes the first stage training, so as to obtain a corresponding classification result;
determining the accuracy of a corresponding classification result based on the classification label of the image test sample;
and determining the accuracy of the image classification model based on the accuracy of the classification result corresponding to the image test sample.
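A minimal sketch of this accuracy detection, assuming a predict callable and labeled image test samples (names are illustrative):

```python
def model_accuracy(predict, test_samples, test_labels):
    """Accuracy of the first-stage model on labeled test samples:
    the fraction of test images whose predicted class matches
    the classification label."""
    correct = sum(1 for x, y in zip(test_samples, test_labels)
                  if predict(x) == y)
    return correct / len(test_labels)
```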
The embodiment of the invention also provides electronic equipment, which comprises:
a memory for storing executable instructions;
and the processor is used for realizing the training method of the image classification model provided by the embodiment of the invention when executing the executable instructions stored in the memory.
The embodiment of the invention also provides a computer readable storage medium which stores executable instructions, and when the executable instructions are executed by a processor, the training method of the image classification model provided by the embodiment of the invention is realized.
In some embodiments, the computer-readable storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disk, or a CD-ROM, or may be any device including one or any combination of the above memories. The computer may be any of a variety of computing devices, including smart terminals and servers.
In some embodiments, the executable instructions may be in the form of programs, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, the executable instructions may, but need not, correspond to files in a file system, may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a hypertext markup language (HTML, hyper Text Markup Language) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices located at one site or, alternatively, distributed across multiple sites and interconnected by a communication network.
The foregoing describes merely exemplary embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and scope of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A method of training an image classification model, the method comprising:
adopting an image test sample to detect the accuracy of the image classification model which is trained in the first stage so as to determine the accuracy of the image classification model;
When it is determined based on the accuracy that the image training samples of the image classification model satisfy the trigger condition for noise recognition,
selecting a reference sample of a target proportion from the image training samples; extracting the characteristics of the reference sample through a characteristic extraction layer of the noise identification model to obtain corresponding sample characteristics; clustering the sample features, and selecting class center features of at least two classes; extracting features of the image training samples to obtain features of the full-scale image training samples;
respectively determining cosine distances between the features of the full-scale image training samples and the class center features;
and determining a target class center feature corresponding to the maximum cosine distance from class center features of the at least two classes;
based on the characteristics of the full image training samples and the target class center characteristics, carrying out noise recognition through a noise recognition layer of the noise recognition model to obtain the probability that the image training samples belong to the noise samples;
taking the probability as the weight of the image training sample, and carrying out weighting treatment on the image training sample to obtain a weighted image training sample;
And training the image classification model by adopting the weighted image training sample until the training of the second stage is completed.
2. The method of claim 1, wherein the method further comprises:
comparing the accuracy of the image classification model with an accuracy threshold value to obtain a comparison result;
and when the comparison result represents that the accuracy is lower than an accuracy threshold, determining that the image training sample of the image classification model meets the triggering condition of noise identification.
3. The method of claim 1, wherein the method further comprises:
in the first-stage training process, acquiring a first accuracy corresponding to an image classification model which completes the training of the first stage and a second accuracy corresponding to an image classification model which completes the training of the last stage;
and when the difference value between the first accuracy and the second accuracy is lower than a difference threshold value, determining that the image training sample of the image classification model meets the triggering condition of noise recognition.
4. The method of claim 1, wherein the performing noise recognition by the noise recognition layer of the noise recognition model based on the features of the full-scale image training sample and the target class center feature, obtaining the probability that the image training sample belongs to a noise sample, comprises:
Based on the features of the full image training samples and the target class center features, the probability that the image training samples belong to noise samples is determined by adopting the following formula:
wherein a is the feature of the full-scale image training sample, b is the target class center feature, w_k is the weight of the target class center feature, and W is the probability that the image training sample belongs to a noise sample.
5. The method of claim 1, wherein training the image classification model using the weighted image training samples comprises:
carrying out classified prediction on the weighted image training samples through the image classification model to obtain corresponding prediction results;
acquiring the difference between the prediction result and the classification label of the weighted image training sample;
determining a value of a loss function of the image classification model based on the obtained difference and the weight of the weighted image training sample;
and updating model parameters of the image classification model based on the value of the loss function.
6. The method of claim 1, wherein using the image test sample to perform an accuracy test on the image classification model that has been trained in the first stage to determine the accuracy of the image classification model comprises:
Carrying out classification prediction on the image test sample through completing the image classification model trained in the first stage to obtain a corresponding classification result;
determining the accuracy of a corresponding classification result based on the classification label of the image test sample;
and determining the accuracy of the image classification model based on the accuracy of the classification result corresponding to the image test sample.
7. An apparatus for training an image classification model, the apparatus comprising:
the detection module is used for detecting the accuracy of the image classification model which is trained in the first stage by adopting an image test sample so as to determine the accuracy of the image classification model;
the noise recognition module is used for selecting a reference sample with a target proportion from the image training samples when it is determined based on the accuracy that the image training samples of the image classification model meet the triggering condition of noise recognition; extracting the characteristics of the reference sample through a characteristic extraction layer of the noise identification model to obtain corresponding sample characteristics; clustering the sample features, and selecting class center features of at least two classes; extracting features of the image training samples to obtain features of the full-scale image training samples; respectively determining cosine distances between the features of the full-scale image training samples and the class center features; and determining a target class center feature corresponding to the maximum cosine distance from class center features of the at least two classes; based on the characteristics of the full image training samples and the target class center characteristics, carrying out noise recognition through a noise recognition layer of the noise recognition model to obtain the probability that the image training samples belong to the noise samples;
The weighting module is used for weighting the image training samples by taking the probability as the weight of the image training samples to obtain weighted image training samples;
and the training module is used for training the image classification model by adopting the weighted image training samples until the training of the second stage is completed.
8. A computer readable storage medium storing executable instructions for implementing the training method of the image classification model of any one of claims 1 to 6 when executed by a processor.
CN202010306813.9A 2020-04-17 2020-04-17 Training method and device for image classification model Active CN111507407B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010306813.9A CN111507407B (en) 2020-04-17 2020-04-17 Training method and device for image classification model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010306813.9A CN111507407B (en) 2020-04-17 2020-04-17 Training method and device for image classification model

Publications (2)

Publication Number Publication Date
CN111507407A CN111507407A (en) 2020-08-07
CN111507407B true CN111507407B (en) 2024-01-12

Family

ID=71876245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010306813.9A Active CN111507407B (en) 2020-04-17 2020-04-17 Training method and device for image classification model

Country Status (1)

Country Link
CN (1) CN111507407B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112182269B (en) * 2020-09-27 2023-11-28 北京达佳互联信息技术有限公司 Training of image classification model, image classification method, device, equipment and medium
CN113139628B (en) * 2021-06-22 2021-09-17 腾讯科技(深圳)有限公司 Sample image identification method, device and equipment and readable storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103064941A (en) * 2012-12-25 2013-04-24 深圳先进技术研究院 Image retrieval method and device
CN107562742A (en) * 2016-06-30 2018-01-09 苏宁云商集团股份有限公司 A kind of image processing method and device
CN108229321A (en) * 2017-11-30 2018-06-29 北京市商汤科技开发有限公司 Human face recognition model and its training method and device, equipment, program and medium
CN109543713A (en) * 2018-10-16 2019-03-29 北京奇艺世纪科技有限公司 The modification method and device of training set
CN109753608A (en) * 2019-01-11 2019-05-14 腾讯科技(深圳)有限公司 Determine the method for user tag, the training method of autoencoder network and device
CN110738232A (en) * 2019-08-27 2020-01-31 国网四川省电力公司电力科学研究院 grid voltage out-of-limit cause diagnosis method based on data mining technology
CN110781934A (en) * 2019-10-15 2020-02-11 深圳市商汤科技有限公司 Supervised learning and label prediction method and device, electronic equipment and storage medium
CN110874604A (en) * 2018-08-30 2020-03-10 Tcl集团股份有限公司 Model training method and terminal equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7035431B2 (en) * 2002-02-22 2006-04-25 Microsoft Corporation System and method for probabilistic exemplar-based pattern tracking
JP2011013898A (en) * 2009-07-01 2011-01-20 Canon Inc Image processing apparatus, image processing method, and program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Semi-Supervised Affinity Propagation with Soft Instance-Level Constraints; Natalia M. Arzeno et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; vol. 37, no. 5; pp. 1041-1052 *
Research on Classification of Noisy-Label Images Based on Deep Learning; Qin Xiaoming; China Master's Theses Full-text Database (Information Science and Technology); vol. 2018, no. 08; pp. I138-562 *

Also Published As

Publication number Publication date
CN111507407A (en) 2020-08-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant